Performing speech recognition

Jean-Philippe Gariépy edited this page May 1, 2018 · 3 revisions

This example shows how to perform speech recognition. Like DTMF recognition, it relies on an Interaction, which is created with an Interaction.Builder.

The first step is to create the SpeechRecognition object, which holds the speech grammar used for recognition. The VoiceXML specification defines a few built-in grammars; we are going to use digits.

Dialogue.java:

GrammarItem speechGrammar = new GrammarReference("builtin:grammar/digits");
SpeechRecognition speechRecognition = new SpeechRecognition(speechGrammar);

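Then we need to build the interaction. We are going to create an instance of Interaction.Builder for this task. Note that the statement below is intentionally left unfinished; the builder chain continues in the next snippet.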

Dialogue.java:

Interaction interaction = interaction("get-speech")

Then we need to specify the prompts that we are going to use. This is done with the addPrompt method. Here we just use a SpeechSynthesis object.

To create the Interaction itself, we invoke the build method and pass it the SpeechRecognition object, meaning that we want to perform speech recognition immediately after the prompts are played. We also need to specify the time-out value, i.e. the time before a no-input event is raised.

Dialogue.java:

.addPrompt(new SpeechSynthesis("Say some digits."))
.build(speechRecognition, Duration.seconds(5));

Now that we have an Interaction, we can execute it and receive the InputTurn which contains the result of this interaction.

Dialogue.java:

VoiceXmlInputTurn inputTurn = DialogueUtils.doTurn(interaction, context);

We now need to inspect the result to find out what the user did. We are going to deal with the following three outcomes:

  1. The user said some digits
  2. The user said something that could not be understood
  3. The user didn't say anything

Other outcomes are possible (e.g. hang-up, error), but we'll ignore those for the sake of simplicity. The InputTurn contains a recognitionInfo property which gives us access to the recognition result.

Dialogue.java:

Logger logger = context.getLogger();
if (inputTurn.getRecognitionInfo() != null) {
    JsonArray recognitionResult = inputTurn.getRecognitionInfo().getRecognitionResult();
    // Extract the "interpretation" of the first recognition hypothesis.
    String digits = recognitionResult.getJsonObject(0).getString("interpretation");
    logger.info("Digits spoken: " + digits);
} else if (VoiceXmlEvent.hasEvent(VoiceXmlEvent.NO_MATCH, inputTurn.getEvents())) {
    logger.info("Could not understand.");
} else if (VoiceXmlEvent.hasEvent(VoiceXmlEvent.NO_INPUT, inputTurn.getEvents())) {
    logger.info("Timeout.");
}
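
The branching above covers the three listed outcomes and does nothing for anything else. As a rough illustration of making that fall-through case explicit, here is a self-contained sketch in plain Java. It does not use the Rivr API: Turn, Outcome and classify are hypothetical stand-ins invented for this example, and the strings "nomatch" and "noinput" are the standard VoiceXML event names, used here in place of the Rivr constants.

```java
import java.util.List;

public class OutcomeSketch {

    // The three outcomes handled in this example, plus a catch-all
    // for anything else (hang-up, error, ...).
    enum Outcome { DIGITS, NO_MATCH, NO_INPUT, OTHER }

    // Hypothetical stand-in for an input turn (NOT a Rivr class):
    // interpretation is null when nothing was recognized.
    static final class Turn {
        final String interpretation;
        final List<String> events;

        Turn(String interpretation, List<String> events) {
            this.interpretation = interpretation;
            this.events = events;
        }
    }

    // Same order of checks as in Dialogue.java: recognition result first,
    // then no-match, then no-input. Everything else is reported as OTHER.
    static Outcome classify(Turn turn) {
        if (turn.interpretation != null) {
            return Outcome.DIGITS;
        } else if (turn.events.contains("nomatch")) {
            return Outcome.NO_MATCH;
        } else if (turn.events.contains("noinput")) {
            return Outcome.NO_INPUT;
        }
        return Outcome.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify(new Turn("1234", List.of())));
        System.out.println(classify(new Turn(null, List.of("nomatch"))));
        System.out.println(classify(new Turn(null, List.of("noinput"))));
        System.out.println(classify(new Turn(null, List.of("error"))));
    }
}
```

In a real dialogue, the OTHER branch would be a natural place to log the unexpected events instead of ignoring them.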

Running this example

You can download or browse the complete code for this example on GitHub. This is a complete working application that you can build and run for yourself.

You can also clone the Rivr Cookbook repository and check out this example:

git clone -b simple-speech-interaction git@github.com:nuecho/rivr-cookbook.git

Then, to build and run it:

cd rivr-cookbook

./gradlew jettyRun

The VoiceXML dialogue should be available at http://localhost:8080/rivr-cookbook/dialogue

To stop the application, press Control-C in the console.