Native voice command detector is a multithreaded N-API based native NodeJS module that can be used to detect voice commands.
The entire audio processing queue is handled by this module to prevent N-API overhead.
Chunks of processing are distributed between worker threads and utilize parallelization for maximum performance.
The amount of created worker threads is logical_cpu_cores + 1
, where 1 thread schedules the work.
The project as of right now supports receiving RAW OPUS frames on Linux X86_64 platforms and invokes a callback with the text of the command whenever it's detected. Multiple audio streams are supported via unique IDs.
The module relies on Porcupine for hotword detection and Google Speech To Text API for speech recognition.
This project was primarily created for the Discord VoiceBot, since a NodeJS only solution wouldn't be able to provide the desired performance.
Demo (based on Discord VoiceBot)
Why Terminator
? It seems to be the most consistently detected hotword.
Why Mozart
? It's not copyrighted.
Consult my blog post about the design decisions if you're interested.
-
Install the external dependencies by relying on the system package manager:
- libopus
- libopusenc
- libcurl
- libssl
- libcrypto
-
Install the internal dependencies by running
git submodule update --init --recursive
-
Run
yarn
ornpm install
to setup the build environment -
Run
yarn build
ornpm run build
to build the module -
The module will be located in
build/Release/detector.node
-
NOTE: This project is in an alpha state and doesn't have prebuilt binaries ready. TO be able to
require
this module, rely onyarn link
ornpm link
.
First, initialize a Detector
instance with the correct configuration:
const Detector = require("native-voice-command-detector");
const voiceCommandDetector = new Detector(
pv_model_path,
pv_keyword_path,
pv_sensitivity,
gcloud_speech_to_text_api_key,
max_voice_buffer_ttl,
max_command_length,
max_command_silence_length_ms,
callback
);
pv_model_path
and pv_keyword_path
specify the hotword to detect. Consult Porcupine's documentation on how to generate these files.
pv_sensitivity
configures the sensitivity and should be a float between 0
(lowest sensitivity) and 1
(highest sensitivity).
gcloud_speech_to_text_api_key
is the API key that will be used for GCloud based operations. Consult the GCloud API Key documentation for details.
max_voice_buffer_ttl
is the amount of time in milliseconds a voice data buffer can spend being filled, without being processed by a worker thread.
max_command_length
specifies the maximum length of a voice command in milliseconds, after which the speech recognition will begin.
max_command_silence_length_ms
specifies the length of silence (no input) in milliseconds after which the command sequence will be treated as complete and the speech recognition will begin.
callback
will be called upon a keyword being detected:
const callback = (id, command) => {
// ID is a string containing the audio source identification
// command is the detected command text
console.log(id, command)
};
After the instance is initialized, submit audio data via:
commandDetector.addOpusFrame(id, buf);
Where buf
is a Buffer
containing the binary data of the OPUS frame.
TypeScript definitions are available out of the box in lib/index.d.ts
.
Copyright (c) 2019 Ruben Harutyunyan