Grammars
DialogOS speech recognizer nodes use grammars for restricting the possible utterances that the recognizer should consider. This greatly improves recognition accuracy, as long as the user sticks to the utterances that the grammar permits.
There are multiple ways in which you can specify the grammar:
- By listing keywords (DirectGrammar, "<Generate from alternatives>")
- By writing your own fixed grammar
- By creating a grammar in code (DynamicGrammar, "<Generate from expression>")
If you select the entry "<Generate from alternatives>" under "Grammar" in the recognizer node, DialogOS automatically constructs a grammar which recognizes exactly the alternatives you specify in the list below. The node will have as many output ports as it has possible recognition outcomes.
In the simplest case, an alternative is an individual keyword ("yes", "no", etc.) or a phrase ("I want pizza"). For example, you may specify two alternatives, "yes" and "no", and the recognizer will map whatever you say to one of these two possibilities. The node will have two ports, one for each alternative.
You may also add a time-out port (see "Options" tab) that is triggered if recognition is not successful within the predetermined time.
A grammar maps anything you say to one of the alternatives. For example, if you say "maybe" in the example above, it may well be recognized as "yes" or "no". Set a higher confidence threshold ("Options" tab) to reject hypotheses that are not good matches to the defined alternatives. Select "Allow extra words" in the "Options" tab to enable recognition even if the user speaks words that are not part of the template (e.g., template: "I want pizza", user: "I want a large pizza").
Alternatives can go beyond simple keyword matching: When specifying an alternative, you can use any of the symbols that you could also use on the right-hand side of a grammar rule (see below). For instance, the following alternatives are all allowed:
yes
one plus two
I want a (large|small) pizza.
yes +
However, you cannot use any more complex patterns than this. In particular, you cannot use the regular expression pattern /.../ = (variable) to extract a part of your recognition result and store it in a variable. If you need this, consider writing a full grammar (see below).
If you need more flexibility in specifying what the user can say, you can alternatively specify a grammar for the recognizer. You manage the grammars your dialogue knows about under the Graph -> Grammars menu.
Grammars in DialogOS are context-free grammars that are written in a dialect of the Speech Recognition Grammar Format (SRGF). The following example grammar (taken from Uwe Debacher's DialogOS page) accepts variants of "yes" and "no":
language "English";
root $YesNo;
$YesNo = $Yes | $No;
$Yes = yes | yep | yup;
$No = no | nope;
This grammar uses the nonterminal symbols $YesNo, $Yes, and $No to accept the words "yes", "yep", "yup", "no", or "nope". The symbol $YesNo is specified as the start symbol by the root declaration in the second line. The language is specified as English by the language declaration in the first line. Specifying the language in the grammar is optional; if a language declaration is present, it overrides the language set in the speech recognizer node.
Symbols such as | allow you to write regular expressions on the right-hand side of each rule. The following symbols are available:
| logical disjunction (match either the expression on the left or on the right)
( ) sequencing, groups expressions together
[ ] match this optionally: expression inside the brackets may be in the input string or not
+ repeat the expression on the left as many times as you want, at least once
* repeat the expression on the left as many times as you want, at least zero times
Thus, the rule $Yes = yes + would match yes, yes yes, yes yes yes, and so on.
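To get a feel for what a right-hand side accepts, it can help to enumerate the phrases it matches. The following is a minimal Python sketch of the operators listed above; it is not part of DialogOS and does not reproduce how the recognizer works internally. The function name expand and the max_repeat cutoff are assumptions made for this illustration: repetition with + and * is unbounded in a real grammar, so the sketch stops after max_repeat copies. Nonterminals such as $Yes are not resolved; the sketch handles only words and operators.

```python
import re
from itertools import product

# Operators are single characters; everything else between spaces is a word.
TOKEN = re.compile(r"[()\[\]|+*]|[^\s()\[\]|+*]+")


def expand(pattern, max_repeat=2):
    """Return the phrases matched by a right-hand side, as strings.

    Hypothetical helper for illustration only. Repetition (+ and *) is cut
    off at max_repeat copies so that the result stays finite.
    """
    tokens = TOKEN.findall(pattern)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def concat(left, right):
        # Every phrase on the left followed by every phrase on the right.
        return [l + r for l, r in product(left, right)]

    def parse_alternatives():
        phrases = parse_sequence()
        while peek() == "|":
            take()
            phrases = phrases + parse_sequence()
        return phrases

    def parse_sequence():
        phrases = [[]]  # start with one empty phrase
        while peek() not in (None, "|", ")", "]"):
            phrases = concat(phrases, parse_item())
        return phrases

    def parse_item():
        token = take()
        if token == "(":
            phrases = parse_alternatives()
            take()  # consume the closing ")"
        elif token == "[":
            # Optional: either nothing or one of the bracketed phrases.
            phrases = [[]] + parse_alternatives()
            take()  # consume the closing "]"
        else:
            phrases = [[token]]
        if peek() in ("+", "*"):
            lowest = 1 if take() == "+" else 0
            repeated = []
            for count in range(lowest, max_repeat + 1):
                block = [[]]
                for _ in range(count):
                    block = concat(block, phrases)
                repeated += block
            phrases = repeated
        return phrases

    return [" ".join(words) for words in parse_alternatives()]


print(expand("I want a (large|small) pizza"))
# ['I want a large pizza', 'I want a small pizza']
print(expand("[please] stop"))
# ['stop', 'please stop']
print(expand("yes +"))
# ['yes', 'yes yes']
```

With the default cutoff, yes + yields only the first two of the infinitely many phrases that the actual rule accepts.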
If the grammar recognizes the user's utterance as grammatical, it will by default return the string of words that was recognized. You can then match this string against any number of patterns, which you list under "Input patterns". The recognizer node will have an outgoing port for each pattern, and DialogOS will leave the recognizer node through the port for the pattern that matches the recognized string.
Note that your pattern may use a regular expression /.../ = (variable). This will allow you to extract a part of the recognized string and store it in a variable.
To use the example grammar from above in a speech recognizer node, you will probably want to specify two input patterns, corresponding to two output ports of the node: one for yes | yep | yup and one for no | nope. This can become unwieldy as your grammar grows.
If you want to simplify the pattern-matching process, you can also make the grammar return a tag instead of the literal string that it recognized. Here is an example of a grammar that uses tags:
language "English";
root $YesNo;
$YesNo = $Yes {$ = "positive"} | $No {$ = "negative"};
$Yes = yes | yep | yup;
$No = no | nope;
This grammar recognizes the same language as the one above ("yes", "nope", etc.). However, instead of returning these words directly, it returns the string "positive" if the user said "yes", "yep", or "yup", and returns the string "negative" if the user said "no" or "nope".
Observe that tags are returned by assigning values to the variable $. In the speech recognizer node, you can then simply have one pattern that matches "positive" and one that matches "negative". You can then make further changes to your grammar without having to modify the patterns.
Sometimes it is useful for a rule to simply pass the tag of a child to the parent. To do this, assign the variable $child (the value of the nonterminal $child on the right-hand side) to the variable $ (the value of the nonterminal on the left-hand side of the rule). For example, this grammar will return the DialogOS string "1" for the input one:
root $input;
$input = $number { $ = $number };
$number = one {$ = "1"};
It is important that the rule $input = $number specifies a value for $. Otherwise the grammar will simply return the matched input for the nonterminal $input, even if the rule for $number returned a custom value through a tag. For instance, the following grammar will not return "1", but will simply return "one" (= the input string itself) as the value of the input string one:
root $input;
$input = $number;
$number = one {$ = "1"};
This is because the second grammar does not specify a tag for the rule $input = $number; thus, the value of this rule will simply be the matched string.
Each tag in a DialogOS grammar has a data type. You can make DialogOS calculate the value of a tag by combining the values of other tags using arbitrary expressions. The following grammar illustrates this:
language "English";
root $calculator;
$calculator = ($numberA plus $numberB) { $ = $numberA + $numberB; };
$numberA = $number { $ = parseInt($number); };
$numberB = $number { $ = parseInt($number); };
$number = "one" { $ = "1"} | "two" { $ = "2" };
Observe the following details:
- The words "one" and "two" are assigned the tags "1" and "2" (i.e. strings).
- These are converted into int values in the rules for $numberA and $numberB.
- The rule for $calculator combines the tag values returned by $numberA and $numberB by addition and returns a tag value of type int.
You can then assign this int to a DialogOS variable in the input pattern of your recognizer node. Note that this variable must also have type int (not string) so it can hold the int value returned by the grammar.
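The flow of tag values through the calculator grammar can be traced with a small Python sketch. It only imitates the three steps listed above (string tags, conversion to int, addition); it is not DialogOS code, and the names NUMBER_TAGS and calculator_tag are made up for this illustration. Python's int() stands in for the parseInt call in the grammar.

```python
# Tags attached to the words in the $number rule: strings, as in the grammar.
NUMBER_TAGS = {"one": "1", "two": "2"}


def calculator_tag(utterance):
    """Imitate the tag that the $calculator rule computes for 'X plus Y'.

    Hypothetical helper for illustration only.
    """
    word_a, plus, word_b = utterance.split()
    if plus != "plus":
        raise ValueError("utterance is not covered by the grammar: " + utterance)
    # $numberA and $numberB turn the string tag of $number into an int.
    number_a = int(NUMBER_TAGS[word_a])
    number_b = int(NUMBER_TAGS[word_b])
    # $calculator adds the two int tags, so its own tag is an int as well.
    return number_a + number_b


result = calculator_tag("one plus two")
print(result, type(result).__name__)
# 3 int
```

The result is a number rather than text, which is why the receiving DialogOS variable needs type int.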
It is sometimes convenient to construct a grammar when the dialog is already running. For instance, you might read a list of names from a database and then construct a grammar that understands all of these names. In DialogOS, you can use a dynamic grammar for this purpose.
To use a dynamic grammar, proceed as follows.
- Define a string variable (say, grammar) in Graph -> Variables. This variable will contain your grammar.
- Add a script node and fill the grammar variable with your grammar.
- In your recognizer node, choose "<Generate from expression>" as the grammar type. Click on the "Edit" button to the right and enter the name of your variable (grammar).
The script node may contain arbitrary code; it can calculate your grammar however you want. Here is a very simple example:
grammar = "root $Input;\n";
grammar += "$Input = hello;\n";
This script sets the variable grammar to the following value.
root $Input;
$Input = hello;
Grammars that you construct in a script node follow the same syntax as ones that you write by hand. Thus, this grammar accepts the utterance "hello" and returns it as a string value.
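For the use case mentioned at the start of this section, a list of names read at run time, the grammar text has to be assembled from that list. The sketch below shows the string-building logic in Python rather than in the DialogOS script language, so it illustrates only which text to produce, not how to write the script node. The function name build_name_grammar and the sample names are assumptions, and the names are assumed to be plain lowercase words that need no quoting or escaping.

```python
def build_name_grammar(names):
    """Return grammar text whose root rule accepts any one of the names.

    Hypothetical helper for illustration only.
    """
    if not names:
        raise ValueError("need at least one name")
    # Join the names with the disjunction operator described above.
    alternatives = " | ".join(names)
    grammar = "root $Input;\n"
    grammar += "$Input = " + alternatives + ";\n"
    return grammar


print(build_name_grammar(["alice", "bob", "carol"]))
# root $Input;
# $Input = alice | bob | carol;
```

The resulting text has the same shape as the hand-built example above, with one alternative per name.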