Native Voice Command Detector

Native voice command detector is a multithreaded N-API based native NodeJS module that can be used to detect voice commands.

The entire audio processing queue is handled by this module to prevent N-API overhead.

Chunks of processing are distributed between worker threads and utilize parallelization for maximum performance.

The number of worker threads created is logical_cpu_cores + 1; the extra thread schedules the work.
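As a rough illustration of that formula, here is a small JavaScript sketch. The module itself does this natively; `expectedThreadCount` is a hypothetical helper name, and using `os.cpus().length` as the logical core count is an assumption about how the count maps to Node's view of the machine.

```javascript
const os = require("os");

// Hypothetical helper: mirrors the logical_cpu_cores + 1 formula.
// It does not query the native module.
function expectedThreadCount(logicalCores = os.cpus().length) {
    // One extra thread is reserved for scheduling the work.
    return logicalCores + 1;
}

console.log(expectedThreadCount());
```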

Currently, the project supports receiving raw Opus frames on Linux x86_64 platforms and invokes a callback with the text of the command whenever a command is detected. Multiple audio streams are supported via unique IDs.

The module relies on Porcupine for hotword detection and Google Speech To Text API for speech recognition.

This project was primarily created for the Discord VoiceBot, since a NodeJS only solution wouldn't be able to provide the desired performance.

Demo (based on Discord VoiceBot)

Why Terminator? It seems to be the most consistently detected hotword.

Why Mozart? It's not copyrighted.

How does this work

Consult my blog post about the design decisions if you're interested.

Building

  • Install the external dependencies using the system package manager:

    • libopus
    • libopusenc
    • libcurl
    • libssl
    • libcrypto
  • Install the internal dependencies by running git submodule update --init --recursive

  • Run yarn or npm install to set up the build environment

  • Run yarn build or npm run build to build the module

  • The module will be located in build/Release/detector.node

  • NOTE: This project is in an alpha state and doesn't have prebuilt binaries ready. To be able to require this module, rely on yarn link or npm link.

Usage

First, initialize a Detector instance with the correct configuration:

const Detector = require("native-voice-command-detector");

const voiceCommandDetector = new Detector(
    pv_model_path,
    pv_keyword_path,
    pv_sensitivity,
    gcloud_speech_to_text_api_key,
    max_voice_buffer_ttl,
    max_command_length,
    max_command_silence_length_ms,
    callback
);

pv_model_path and pv_keyword_path specify the hotword to detect. Consult Porcupine's documentation on how to generate these files.

pv_sensitivity configures the sensitivity and should be a float between 0 (lowest sensitivity) and 1 (highest sensitivity).

gcloud_speech_to_text_api_key is the API key that will be used for GCloud based operations. Consult the GCloud API Key documentation for details.

max_voice_buffer_ttl is the maximum amount of time, in milliseconds, that a voice data buffer can spend being filled without being processed by a worker thread.

max_command_length specifies the maximum length of a voice command in milliseconds, after which the speech recognition will begin.

max_command_silence_length_ms specifies the length of silence (no input) in milliseconds after which the command sequence will be treated as complete and the speech recognition will begin.
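Since these arguments are passed positionally into native code, it can help to check them in JavaScript first. The following is a minimal sketch, not part of the module: `validateDetectorConfig` is a hypothetical helper, the example values are placeholders rather than recommended defaults, and the requirement that the three time limits be positive is an assumption.

```javascript
// Hypothetical helper, not part of the module's API: returns a list of
// problems found in the constructor arguments (empty when none are found).
function validateDetectorConfig(config) {
    const errors = [];

    // File paths and the API key are expected to be non-empty strings.
    for (const key of ["pv_model_path", "pv_keyword_path", "gcloud_speech_to_text_api_key"]) {
        if (typeof config[key] !== "string" || config[key].length === 0) {
            errors.push(`${key} must be a non-empty string`);
        }
    }

    // Sensitivity is a float between 0 and 1.
    const sensitivity = config.pv_sensitivity;
    if (!Number.isFinite(sensitivity) || sensitivity < 0 || sensitivity > 1) {
        errors.push("pv_sensitivity must be a number between 0 and 1");
    }

    // Assumption: all three time limits are positive millisecond values.
    for (const key of ["max_voice_buffer_ttl", "max_command_length", "max_command_silence_length_ms"]) {
        if (!Number.isFinite(config[key]) || config[key] <= 0) {
            errors.push(`${key} must be a positive number of milliseconds`);
        }
    }

    if (typeof config.callback !== "function") {
        errors.push("callback must be a function");
    }
    return errors;
}

// Placeholder values for illustration only.
const exampleConfig = {
    pv_model_path: "/path/to/model/file",
    pv_keyword_path: "/path/to/keyword/file",
    pv_sensitivity: 0.5,
    gcloud_speech_to_text_api_key: "YOUR_API_KEY",
    max_voice_buffer_ttl: 1000,
    max_command_length: 5000,
    max_command_silence_length_ms: 500,
    callback: (id, command) => console.log(id, command),
};

console.log(validateDetectorConfig(exampleConfig));
```

If the returned list is empty, the values can be passed to the Detector constructor in the order shown above.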

callback will be called with the recognized command text once a command following the hotword has been detected:

const callback = (id, command) => {
    // id is a string identifying the audio source
    // command is the detected command text
    console.log(id, command);
};
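In practice the callback usually needs to map the recognized text to an action. One possible approach is sketched below; `createCommandCallback`, the command phrases, and the handlers are all hypothetical, and the trimming and lowercasing is an assumption about how one might want to match text, not something the module does.

```javascript
// Hypothetical helper: builds a callback that looks up the recognized text
// in a Map of handlers and falls back to onUnknown when nothing matches.
function createCommandCallback(handlers, onUnknown = () => {}) {
    return (id, command) => {
        // Normalize the text so "Play Mozart " and "play mozart" match.
        const key = String(command).trim().toLowerCase();
        const handler = handlers.get(key);
        if (handler) {
            handler(id);
        } else {
            onUnknown(id, key);
        }
    };
}

// Made-up command table for illustration.
const handlers = new Map([
    ["play mozart", (id) => console.log(`${id}: starting playback`)],
]);
const dispatchingCallback = createCommandCallback(
    handlers,
    (id, text) => console.log(`${id}: unrecognized command "${text}"`)
);

// Simulate two callback invocations from different audio sources.
dispatchingCallback("user-1", "Play Mozart");
dispatchingCallback("user-2", "open the door");
```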

After the instance is initialized, submit audio data via:

voiceCommandDetector.addOpusFrame(id, buf);

Where buf is a Buffer containing the binary data of the OPUS frame.
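The sketch below shows how frames from several audio streams might be submitted under different IDs. It is an illustration under stated assumptions: `submitFrame` is a hypothetical wrapper, `stubDetector` stands in for a real Detector instance so the example runs without the native build, and the bytes are dummy data rather than valid Opus frames.

```javascript
// Hypothetical helper, not part of the module: validates the arguments and
// forwards a single Opus frame to the detector under the given stream ID.
function submitFrame(detector, id, frame) {
    if (typeof id !== "string" || id.length === 0) {
        throw new TypeError("id must be a non-empty string");
    }
    if (!Buffer.isBuffer(frame)) {
        throw new TypeError("frame must be a Buffer");
    }
    detector.addOpusFrame(id, frame);
}

// Stand-in for a real Detector instance; it only records what it receives.
const received = [];
const stubDetector = {
    addOpusFrame(id, buf) {
        received.push({ id, bytes: buf.length });
    },
};

// Two independent audio streams, told apart by their IDs.
submitFrame(stubDetector, "speaker-a", Buffer.from([0x01, 0x02, 0x03]));
submitFrame(stubDetector, "speaker-b", Buffer.from([0x04, 0x05]));
console.log(received);
```

With a real instance, `stubDetector` would be replaced by the object returned from `new Detector(...)`.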

TypeScript

TypeScript definitions are available out of the box in lib/index.d.ts.

Copyright

Copyright (c) 2019 Ruben Harutyunyan
