Performing speech recognition
This example shows how to perform speech recognition. Like DTMF recognition, it involves the Interaction, which is created with the Interaction.Builder.
The first thing we need to do is to create the SpeechRecognition. This object contains the speech grammar used for speech recognition. There are a few built-in grammars specified in the VoiceXML specification. We are going to use digits.
GrammarItem speechGrammar = new GrammarReference("builtin:grammar/digits");
SpeechRecognition speechRecognition = new SpeechRecognition(speechGrammar);
Then we need to build the interaction. We are going to create an instance of Interaction.Builder for this task.
Interaction interaction = interaction("get-speech")
Then we need to specify the prompts that we are going to use. This is done with the addPrompt method. Here we just use a SpeechSynthesis object.
To create the Interaction itself, we invoke the build method and pass it the SpeechRecognition object, meaning that we want to perform speech recognition immediately after the prompts are played. We also need to specify the time-out value, i.e. the time before a no-input event is raised.
    .addPrompt(new SpeechSynthesis("Say some digits."))
    .build(speechRecognition, Duration.seconds(5));
Now that we have an Interaction, we can execute it and receive the InputTurn which contains the result of this interaction.
VoiceXmlInputTurn inputTurn = DialogueUtils.doTurn(interaction, context);
We now need to inspect the result to find out what the user did. We are going to deal with the following three outcomes:
- The user said some digits
- The user said something that could not be understood
- The user didn't say anything
Other outcomes are possible (e.g. hang-up, error), but we'll ignore them for the sake of simplicity.
The InputTurn contains a recognitionInfo property which gives us access to the recognition result.
Logger logger = context.getLogger();
if (inputTurn.getRecognitionInfo() != null) {
    JsonArray recognitionResult = inputTurn.getRecognitionInfo().getRecognitionResult();
    // Extracting the "interpretation" of the first recognition hypothesis.
    String digits = recognitionResult.getJsonObject(0).getString("interpretation");
    logger.info("Digits spoken: " + digits);
} else if (VoiceXmlEvent.hasEvent(VoiceXmlEvent.NO_MATCH, inputTurn.getEvents())) {
    logger.info("Could not understand.");
} else if (VoiceXmlEvent.hasEvent(VoiceXmlEvent.NO_INPUT, inputTurn.getEvents())) {
    logger.info("Timeout.");
}
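The order of these checks can be tried out in isolation with a small stand-alone sketch. It does not use Rivr at all: Turn, Outcome and classify are hypothetical stand-ins introduced only for illustration, and the strings "nomatch" and "noinput" are the VoiceXML event names, assumed here to be what the VoiceXmlEvent.NO_MATCH and VoiceXmlEvent.NO_INPUT constants refer to.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OutcomeSketch {

    // The three outcomes handled in this example, plus a catch-all for the rest
    // (hang-up, error, ...).
    enum Outcome { DIGITS, NO_MATCH, NO_INPUT, OTHER }

    // Hypothetical stand-in for the parts of an input turn that the checks read.
    // This is NOT the Rivr VoiceXmlInputTurn class.
    static final class Turn {
        final String interpretation; // null when nothing was recognized
        final List<String> events;   // names of the events raised during the turn

        Turn(String interpretation, List<String> events) {
            this.interpretation = interpretation;
            this.events = events;
        }
    }

    // Same order as the if / else-if chain: recognition result first, then events.
    static Outcome classify(Turn turn) {
        if (turn.interpretation != null) {
            return Outcome.DIGITS;
        }
        if (turn.events.contains("nomatch")) {
            return Outcome.NO_MATCH;
        }
        if (turn.events.contains("noinput")) {
            return Outcome.NO_INPUT;
        }
        return Outcome.OTHER;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(classify(new Turn("1234", none)));
        System.out.println(classify(new Turn(null, Arrays.asList("nomatch"))));
        System.out.println(classify(new Turn(null, Arrays.asList("noinput"))));
        System.out.println(classify(new Turn(null, none)));
    }
}
```

Returning an explicit OTHER value is one way to make the ignored cases visible instead of letting them fall through silently.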
You can download or browse the complete code for this example on GitHub. This is a complete working application that you can build and run yourself.
You can also clone the Rivr Cookbook repository and check out this example:
git clone -b simple-speech-interaction git@github.com:nuecho/rivr-cookbook.git
Then, to build and run it:
cd rivr-cookbook
./gradlew jettyRun
The VoiceXML dialogue should be available at http://localhost:8080/rivr-cookbook/dialogue
To stop the application, press Control-C in the console.