
mrcp_client

This is an experimental Media Resource Control Protocol (MRCPv2) client that I'm writing in node.js for learning purposes.

The version of node used for development can be found in the package.json file.

Installation

First, install the non-npm dependencies. On Debian/Ubuntu:

apt install sox libasound2-dev

or on RHEL/CentOS:

yum install sox libasound2-devel

and then install the npm dependencies:

npm install

Then create the config file:

cp config/default.js.sample config/default.js
vim config/default.js # adjust parameters if necessary.

Testing

You can test by using either:

node speechsynth_client.js

or

node speechrecog_client.js

You can try them with https://github.com/MayamaTakeshi/mrcp_server

Once mrcp_server is installed, you can test Google Speech Synthesis like this:

node speechsynth_client.js 127.0.0.1 8070 en-US en-US-Wavenet-E "Hello World."

node speechsynth_client.js 127.0.0.1 8070 ja-JP ja-JP-Wavenet-A "おはようございます."

or like this to save audio to a wav file:

node speechsynth_client.js -w generated_speech.wav 127.0.0.1 8070 en-US en-US-Wavenet-E "Hello World."

If your machine doesn't have an audio device (no speaker), disable local audio playback by passing the -S option:

node speechsynth_client.js -S -w generated_speech.wav 127.0.0.1 8070 en-US en-US-Wavenet-E "Hello World."

To test Google Speech Recognition:

node speechrecog_client.js 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe

To pass a grammar file, prefix its path with @ (i.e. @PATH_TO_GRAMMAR_FILE):

node speechrecog_client.js 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav @artifacts/grammar.xml 
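The contents of the bundled artifacts/grammar.xml are not shown here. As a rough sketch, a minimal W3C SRGS grammar (the XML grammar format commonly used with MRCP recognizers) could be written out like this; the file name, rule name and phrases below are made up for illustration:

```shell
# Write a minimal SRGS grammar sketch to a file (hypothetical example):
cat > my_grammar.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="greeting">
  <rule id="greeting" scope="public">
    <one-of>
      <item>hello</item>
      <item>good morning</item>
    </one-of>
  </rule>
</grammar>
EOF

# Then reference it with the @ prefix:
# node speechrecog_client.js 127.0.0.1 8070 en-US some_audio.wav @my_grammar.xml
```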

If you use mrcp_server and don't have Google credentials, you can test using DTMF:

node speechsynth_client.js 127.0.0.1 8070 dtmf dtmf 1234567890abcd*#

node speechrecog_client.js 127.0.0.1 8070 dtmf artifacts/dtmf.0123456789ABCDEF.16000hz.wav builtin:speech/transcribe

or Morse Code:

node speechsynth_client.js 127.0.0.1 8070 morse 440hz 'stop and smell the roses'

node speechrecog_client.js 127.0.0.1 8070 morse artifacts/morse.stop_and_smell_the_roses.wav builtin:speech/transcribe

Obs: Morse speech recognition was tuned to match the speed of the output generated by the Morse speech synth. This will eventually be solved by MayamaTakeshi/morse-decoding-stream#2

You can also capture audio from your microphone by passing 'MIC' instead of a path to a wav file:

node speechrecog_client.js 127.0.0.1 8070 ja-JP MIC builtin:speech/transcribe

For speech synth you can use SSML:

node speechsynth_client.js 127.0.0.1 8070 en-US en-US-Standard-C "<speak><prosody rate='x-slow' pitch='3st'>I'm sad today.</prosody></speak>"

node speechsynth_client.js 127.0.0.1 8070 dtmf dtmf '<speak><prosody rate="50ms">1234</prosody><break time="500ms"/><prosody rate="100ms">1234</prosody></speak>'

node speechsynth_client.js 127.0.0.1 8070 morse C4 '<speak><prosody rate="50wpm">Save Our Souls</prosody><break time="500ms"/><prosody rate="70wpm">SOS SOS SOS</prosody></speak>'

To test Julius Speech Recognition with mrcp_server:

You will need to install julius_server.

Then update your mrcp_server/config.js with the information about the julius_server.

Then you can test it like this:

node speechrecog_client.js -r 'engine: julius' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe

To test Olaris Speech Recognition with mrcp_server:

Obtain credentials for the Olaris API (https://ncr.ernie-mlg.com/).

Set the credentials in the config/default.js file.

Then you can test it like this:

node speechrecog_client.js -r 'engine: olaris' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe

or like this:

node speechrecog_client.js -r 'engine: olaris' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav @artifacts/olaris_grammar.xml

To test Vosk Speech Recognition with mrcp_server:

You will need to have vosk_server instances running somewhere.

Then update your mrcp_server/config.js with the information about the vosk_server instances.

Then you can test it like this:

node speechrecog_client.js -r 'engine: vosk' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe

Load testing

While this tool was not developed with load testing in mind, if you need to make several calls to your MRCP server you can do it with something like this for speechsynth:

NUMBER_OF_CALLS=10; for i in $(seq 1 $NUMBER_OF_CALLS);do node speechsynth_client.js 127.0.0.1 8070 dtmf dtmf 1234 & sleep 0.1; done

or this for speechrecog:

NUMBER_OF_CALLS=10; for i in $(seq 1 $NUMBER_OF_CALLS);do node speechrecog_client.js 127.0.0.1 8070 dtmf artifacts/dtmf.0123456789ABCDEF.16000hz.wav builtin:speech/transcribe & sleep 0.1; done

Obs: the "sleep 0.1" is necessary to minimize the risk of failing to allocate the UDP port for the SIP stack due to a shortcoming in the sip.js library we are using. Ref: kirm/sip.js#147

And to keep generating calls in a loop you can use something like this for speechsynth:

NUMBER_OF_CALLS=10; while true; do for i in $(seq 1 $NUMBER_OF_CALLS); do node speechsynth_client.js -t 5000 127.0.0.1 8070 dtmf dtmf 1234 & sleep 0.1; done; sleep 2; done

or this for speechrecog:

NUMBER_OF_CALLS=10; while true; do for i in $(seq 1 $NUMBER_OF_CALLS); do node speechrecog_client.js -t 5000 127.0.0.1 8070 dtmf artifacts/dtmf.0123456789ABCDEF.16000hz.wav builtin:speech/transcribe & sleep 0.1; done; sleep 4; done

Obs: be careful when load testing an MRCP server that uses paid speech services like Google Speech, Amazon Polly, etc., as you might get a large bill if you leave the load test running for a long time.
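If you want the burst to finish cleanly before your script moves on, the one-liners above can be wrapped in a small helper function that staggers the launches and then waits for every background call to exit. The function name and parameters below are made up; it just generalizes the loop shown above:

```shell
# burst N DELAY CMD [ARGS...] - launch N copies of CMD in the background,
# staggered by DELAY seconds, then block until all of them finish.
# The stagger mitigates the UDP port allocation race in sip.js (kirm/sip.js#147).
burst() {
  n="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" &            # start one client call in the background
    sleep "$delay"
    i=$((i + 1))
  done
  wait                # block until every backgrounded call has exited
}

# Example (hypothetical): 10 staggered speechsynth calls, then wait for all:
# burst 10 0.1 node speechsynth_client.js -t 5000 127.0.0.1 8070 dtmf dtmf 1234
```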
