This is an experimental Media Resource Control Protocol (MRCPv2) client that I'm writing in Node.js for learning purposes.
The version of node used for development can be found in the package.json file.
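You can print it from the repo root like this (this assumes the version is recorded in an engines field, which may not be the case; if the command prints undefined, just open package.json):
node -p "require('./package.json').engines"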
First, install the non-npm dependencies. On Debian/Ubuntu:
apt install sox libasound2-dev
or on RHEL/CentOS:
yum install sox libasound2-devel
Then install the npm dependencies:
npm install
Then create the config file:
cp config/default.js.sample config/default.js
vim config/default.js # adjust parameters if necessary.
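Just as a rough illustration of what to expect (all key names below are hypothetical; the authoritative list of parameters is in config/default.js.sample itself):
// config/default.js - hypothetical sketch, NOT the real schema; see config/default.js.sample
module.exports = {
    local_ip: '127.0.0.1', // hypothetical: address to bind the SIP/RTP stack to
    sip_port: 5090,        // hypothetical: local SIP port
    rtp_port_min: 10000,   // hypothetical: start of the local RTP port range
    rtp_port_max: 10100,   // hypothetical: end of the local RTP port range
}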
You can test by using either:
node speechsynth_client.js
or
node speechrecog_client.js
You can try them with https://github.com/MayamaTakeshi/mrcp_server
Once mrcp_server is installed and running, you can test Google Speech Synthesis like this:
node speechsynth_client.js 127.0.0.1 8070 en-US en-US-Wavenet-E "Hello World."
node speechsynth_client.js 127.0.0.1 8070 ja-JP ja-JP-Wavenet-A "おはようございます."
or like this, to save the audio to a wav file:
node speechsynth_client.js -w generated_speech.wav 127.0.0.1 8070 en-US en-US-Wavenet-E "Hello World."
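sox (installed above) ships a play command, so you can listen to the saved file afterwards:
play generated_speech.wav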
If your machine doesn't have an audio device (no speaker), disable local audio playback by passing the -S option:
node speechsynth_client.js -S -w generated_speech.wav 127.0.0.1 8070 en-US en-US-Wavenet-E "Hello World."
To test Google Speech Recognition:
node speechrecog_client.js 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe
To pass a grammar file, use @PATH_TO_GRAMMAR_FILE:
node speechrecog_client.js 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav @artifacts/grammar.xml
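The repo provides artifacts/grammar.xml; if you want to write your own, a minimal SRGS grammar looks roughly like this (the rule name and phrases below are just placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="ja-JP" version="1.0" root="greeting" mode="voice">
  <rule id="greeting" scope="public">
    <one-of>
      <item>おはようございます</item>
      <item>こんにちは</item>
    </one-of>
  </rule>
</grammar>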
If you use mrcp_server and don't have Google credentials, you can test using DTMF:
node speechsynth_client.js 127.0.0.1 8070 dtmf dtmf 1234567890abcd*#
node speechrecog_client.js 127.0.0.1 8070 dtmf artifacts/dtmf.0123456789ABCDEF.16000hz.wav builtin:speech/transcribe
or Morse Code:
node speechsynth_client.js 127.0.0.1 8070 morse 440hz 'stop and smell the roses'
node speechrecog_client.js 127.0.0.1 8070 morse artifacts/morse.stop_and_smell_the_roses.wav builtin:speech/transcribe
Obs: Morse speech recognition was tuned to work with the output generated by the Morse speech synth (its speed). This will eventually be solved by MayamaTakeshi/morse-decoding-stream#2.
You can also capture audio from your microphone by passing 'MIC' instead of a path to a wav file:
node speechrecog_client.js 127.0.0.1 8070 ja-JP MIC builtin:speech/transcribe
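If you would rather record a wav file first and reuse it, sox's rec command can do that too (the sample rate here just mirrors the 16000hz artifacts in this repo; stop recording with Ctrl+C):
rec -r 16000 -c 1 -b 16 my_test.wav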
For speech synthesis, you can use SSML:
node speechsynth_client.js 127.0.0.1 8070 en-US en-US-Standard-C "<speak><prosody rate='x-slow' pitch='3st'>I'm sad today.</prosody></speak>"
node speechsynth_client.js 127.0.0.1 8070 dtmf dtmf '<speak><prosody rate="50ms">1234</prosody><break time="500ms"/><prosody rate="100ms">1234</prosody></speak>'
node speechsynth_client.js 127.0.0.1 8070 morse C4 '<speak><prosody rate="50wpm">Save Our Souls</prosody><break time="500ms"/><prosody rate="70wpm">SOS SOS SOS</prosody></speak>'
To test Julius Speech Recognition with mrcp_server:
You will need to install julius_server and update your mrcp_server/config.js with the julius_server connection details.
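As a purely hypothetical sketch of what that config.js addition might look like (the actual key names must come from the mrcp_server documentation):
// mrcp_server/config.js - hypothetical sketch, key names are made up
module.exports = {
    // ...existing mrcp_server settings...
    julius: {
        host: '127.0.0.1', // assumption: where your julius_server instance runs
        port: 10500,       // Julius module-mode default port
    },
}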
Then you can test it like this:
node speechrecog_client.js -r 'engine: julius' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe
To test Olaris Speech Recognition with mrcp_server:
Obtain credentials for the Olaris API (https://ncr.ernie-mlg.com/)
Set the credentials in the config/default.js file.
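The real key names are in config/default.js.sample; hypothetically, the entry could look like:
// config/default.js - hypothetical sketch, check config/default.js.sample for the real keys
module.exports = {
    // ...other client settings...
    olaris: {
        url: 'https://ncr.ernie-mlg.com/', // the Olaris API endpoint mentioned above
        api_key: 'YOUR_OLARIS_API_KEY',    // hypothetical credential field
    },
}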
Then you can test it like this:
node speechrecog_client.js -r 'engine: olaris' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe
or like this:
node speechrecog_client.js -r 'engine: olaris' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav @artifacts/olaris_grammar.xml
To test Vosk Speech Recognition with mrcp_server:
You will need to have vosk_server instances running somewhere.
Then update your mrcp_server/config.js with the information about the vosk_server instances.
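Again purely as a hypothetical sketch (key names are made up; vosk-server listens on WebSocket port 2700 by default):
// mrcp_server/config.js - hypothetical sketch, key names are made up
module.exports = {
    // ...existing mrcp_server settings...
    vosk: [
        { lang: 'ja-JP', url: 'ws://127.0.0.1:2700' }, // assumption: one vosk_server instance per language
        { lang: 'en-US', url: 'ws://127.0.0.1:2701' },
    ],
}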
Then you can test it like this:
node speechrecog_client.js -r 'engine: vosk' 127.0.0.1 8070 ja-JP artifacts/ohayou_gozaimasu.wav builtin:speech/transcribe
While this tool was not developed with load testing in mind, if you need to place several calls to your MRCP server, you can do it with something like this for speechsynth:
NUMBER_OF_CALLS=10; for i in $(seq 1 $NUMBER_OF_CALLS); do node speechsynth_client.js 127.0.0.1 8070 dtmf dtmf 1234 & sleep 0.1; done
or this for speechrecog:
NUMBER_OF_CALLS=10; for i in $(seq 1 $NUMBER_OF_CALLS); do node speechrecog_client.js 127.0.0.1 8070 dtmf artifacts/dtmf.0123456789ABCDEF.16000hz.wav builtin:speech/transcribe & sleep 0.1; done
Obs: the "sleep 0.1" is necessary to minimize the risk of failing to allocate the UDP port for the SIP stack due to a shortcoming in the sip.js library we are using. Ref: kirm/sip.js#147
And to keep generating calls in a loop you can use something like this for speechsynth:
NUMBER_OF_CALLS=10; while true; do for i in $(seq 1 $NUMBER_OF_CALLS); do node speechsynth_client.js -t 5000 127.0.0.1 8070 dtmf dtmf 1234 & sleep 0.1; done; sleep 2; done
or this for speechrecog:
NUMBER_OF_CALLS=10; while true; do for i in $(seq 1 $NUMBER_OF_CALLS); do node speechrecog_client.js -t 5000 127.0.0.1 8070 dtmf artifacts/dtmf.0123456789ABCDEF.16000hz.wav builtin:speech/transcribe & sleep 0.1; done; sleep 4; done
Obs: be careful when load testing an MRCP server that uses paid speech services like Google Speech, Amazon Polly, etc., as you might get a large bill if you forget and leave the load test running for a long time.