
[CS] Move matching from the grammar model into a Method Object

Separates responsibility for evaluating a match against a grammar from the grammar itself
1 parent 73ae1f1 commit d3bf708a9059aee7681eb1359a82229aa364f1d4 @benlangfeld benlangfeld committed Mar 3, 2013
@@ -108,96 +108,64 @@ which becomes
#### Grammar matching
-It is possible to match some arbitrary input against a GRXML grammar. In order to do so, certain normalization routines should first be run on the grammar in order to prepare it for matching. These are reference inlining, tokenization and whitespace normalization, and are described [in the SRGS spec](http://www.w3.org/TR/speech-grammar/#S2.1). This process will transform the above grammar like so:
+It is possible to match some arbitrary input against a GRXML grammar, like so:
```ruby
-grammy.inline!
-grammy.tokenize!
-grammy.normalize_whitespace
-```
+require 'ruby_speech'
-```xml
-<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" mode="dtmf" root="pin">
- <rule id="pin" scope="public">
- <one-of>
- <item>
- <item repeat="4">
- <one-of>
- <item>
- <token>0</token>
- </item>
- <item>
- <token>1</token>
- </item>
- <item>
- <token>2</token>
- </item>
- <item>
- <token>3</token>
- </item>
- <item>
- <token>4</token>
- </item>
- <item>
- <token>5</token>
- </item>
- <item>
- <token>6</token>
- </item>
- <item>
- <token>7</token>
- </item>
- <item>
- <token>8</token>
- </item>
- <item>
- <token>9</token>
- </item>
- </one-of>
- </item>
- <token>#</token>
- </item>
- <item>
- <token>*</token>
- <token>9</token>
- </item>
- </one-of>
- </rule>
-</grammar>
-```
+>> grammar = RubySpeech::GRXML.draw mode: :dtmf, root: 'pin' do
+ rule id: 'digit' do
+ one_of do
+ ('0'..'9').map { |d| item { d } }
+ end
+ end
-Matching against some sample input strings then returns the following results:
+ rule id: 'pin', scope: 'public' do
+ one_of do
+ item do
+ item repeat: '4' do
+ ruleref uri: '#digit'
+ end
+ "#"
+ end
+ item do
+ "* 9"
+ end
+ end
+ end
+end
-```ruby
->> subject.match '*9'
+matcher = RubySpeech::GRXML::Matcher.new grammar
+
+>> matcher.match '*9'
=> #<RubySpeech::GRXML::Match:0x00000100ae5d98
@mode = :dtmf,
@confidence = 1,
@utterance = "*9",
@interpretation = "*9"
>
->> subject.match '1234#'
+>> matcher.match '1234#'
=> #<RubySpeech::GRXML::Match:0x00000100b7e020
@mode = :dtmf,
@confidence = 1,
@utterance = "1234#",
@interpretation = "1234#"
>
->> subject.match '5678#'
+>> matcher.match '5678#'
=> #<RubySpeech::GRXML::Match:0x00000101218688
@mode = :dtmf,
@confidence = 1,
@utterance = "5678#",
@interpretation = "5678#"
>
->> subject.match '1111#'
+>> matcher.match '1111#'
=> #<RubySpeech::GRXML::Match:0x000001012f69d8
@mode = :dtmf,
@confidence = 1,
@utterance = "1111#",
@interpretation = "1111#"
>
->> subject.match '111'
+>> matcher.match '111'
=> #<RubySpeech::GRXML::PotentialMatch:0x00000101371660>
```
@@ -14,6 +14,7 @@ module GRXML
end
autoload :Match
+ autoload :Matcher
autoload :NoMatch
autoload :PotentialMatch
@@ -149,99 +149,6 @@ def normalize_whitespace
end
end
- ##
- # Checks the grammar for a match against an input string
- #
- # @param [String] other the input string to check for a match with the grammar
- #
- # @return [NoMatch, Match] depending on the result of a match attempt. If a match can be found, it will be returned with appropriate mode/confidence/utterance and interpretation attributes
- #
- # @example A grammar that takes a 4 digit pin terminated by hash, or the *9 escape sequence
- # ```ruby
- # grammar = RubySpeech::GRXML.draw :mode => :dtmf, :root => 'pin' do
- # rule :id => 'digit' do
- # one_of do
- # ('0'..'9').map { |d| item { d } }
- # end
- # end
- #
- # rule :id => 'pin', :scope => 'public' do
- # one_of do
- # item do
- # item :repeat => '4' do
- # ruleref :uri => '#digit'
- # end
- # "#"
- # end
- # item do
- # "\* 9"
- # end
- # end
- # end
- # end
- #
- # >> subject.match '*9'
- # => #<RubySpeech::GRXML::Match:0x00000100ae5d98
- # @mode = :dtmf,
- # @confidence = 1,
- # @utterance = "*9",
- # @interpretation = "*9"
- # >
- # >> subject.match '1234#'
- # => #<RubySpeech::GRXML::Match:0x00000100b7e020
- # @mode = :dtmf,
- # @confidence = 1,
- # @utterance = "1234#",
- # @interpretation = "1234#"
- # >
- # >> subject.match '111'
- # => #<RubySpeech::GRXML::PotentialMatch:0x00000101371660>
- #
- # >> subject.match '11111'
- # => #<RubySpeech::GRXML::NoMatch:0x00000101371936>
- #
- # ```
- #
- def match(other)
- other = other.dup
- regex = to_regexp
- return check_for_potential_match(other) if regex == //
- match = regex.match other
- return check_for_potential_match(other) unless match
-
- Match.new :mode => mode,
- :confidence => dtmf? ? 1 : 0,
- :utterance => other,
- :interpretation => interpret_utterance(other)
- end
-
- def check_for_potential_match(other)
- potential_match?(other) ? PotentialMatch.new : NoMatch.new
- end
-
- def potential_match?(other)
- root_rule.children.each do |token|
- return true if other.length.zero?
- longest_potential_match = token.longest_potential_match other
- return false if longest_potential_match.length.zero?
- other.gsub! /^#{Regexp.escape longest_potential_match}/, ''
- end
- other.length.zero?
- end
-
- ##
- # Converts the grammar into a regular expression for matching
- #
- # @return [Regexp] a regular expression which is equivalent to the grammar
- #
- def to_regexp
- /^#{regexp_content.join}$/
- end
-
- def regexp_content
- root_rule.children.map &:regexp_content
- end
-
def dtmf?
mode == :dtmf
end
@@ -270,16 +177,6 @@ def has_matching_root_rule?
!root || root_rule
end
- def interpret_utterance(utterance)
- conversion = Hash.new { |hash, key| hash[key] = key }
- conversion['*'] = 'star'
- conversion['#'] = 'pound'
-
- utterance.chars.inject [] do |array, digit|
- array << "dtmf-#{conversion[digit]}"
- end.join ' '
- end
-
def split_tokens(element)
element.to_s.split(/(\".*\")/).reject(&:empty?).map do |string|
match = string.match /^\"(.*)\"$/
@@ -0,0 +1,133 @@
+module RubySpeech
+ module GRXML
+ class Matcher
+
+ BLANK_REGEX = //.freeze
+
+ attr_reader :grammar, :regex
+
+ def initialize(grammar)
+ @grammar = grammar
+ prepare_grammar
+ @regex = /^#{regexp_content.join}$/
+ end
+
+ ##
+ # Checks the grammar for a match against an input string
+ #
+ # @param [String] buffer the input string to check for a match with the grammar
+ #
+ # @return [NoMatch, PotentialMatch, Match] depending on the result of a match attempt. A full match is returned with appropriate mode/confidence/utterance and interpretation attributes; input that could still become a match yields a PotentialMatch
+ #
+ # @example A grammar that takes a 4 digit pin terminated by hash, or the *9 escape sequence
+ # ```ruby
+ # grammar = RubySpeech::GRXML.draw :mode => :dtmf, :root => 'pin' do
+ # rule :id => 'digit' do
+ # one_of do
+ # ('0'..'9').map { |d| item { d } }
+ # end
+ # end
+ #
+ # rule :id => 'pin', :scope => 'public' do
+ # one_of do
+ # item do
+ # item :repeat => '4' do
+ # ruleref :uri => '#digit'
+ # end
+ # "#"
+ # end
+ # item do
+ # "\* 9"
+ # end
+ # end
+ # end
+ # end
+ #
+ # matcher = RubySpeech::GRXML::Matcher.new grammar
+ #
+ # >> matcher.match '*9'
+ # => #<RubySpeech::GRXML::Match:0x00000100ae5d98
+ # @mode = :dtmf,
+ # @confidence = 1,
+ # @utterance = "*9",
+ # @interpretation = "dtmf-star dtmf-9"
+ # >
+ # >> matcher.match '1234#'
+ # => #<RubySpeech::GRXML::Match:0x00000100b7e020
+ # @mode = :dtmf,
+ # @confidence = 1,
+ # @utterance = "1234#",
+ # @interpretation = "dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-pound"
+ # >
+ # >> matcher.match '5678#'
+ # => #<RubySpeech::GRXML::Match:0x00000101218688
+ # @mode = :dtmf,
+ # @confidence = 1,
+ # @utterance = "5678#",
+ # @interpretation = "dtmf-5 dtmf-6 dtmf-7 dtmf-8 dtmf-pound"
+ # >
+ # >> matcher.match '1111#'
+ # => #<RubySpeech::GRXML::Match:0x000001012f69d8
+ # @mode = :dtmf,
+ # @confidence = 1,
+ # @utterance = "1111#",
+ # @interpretation = "dtmf-1 dtmf-1 dtmf-1 dtmf-1 dtmf-pound"
+ # >
+ # >> matcher.match '111'
+ # => #<RubySpeech::GRXML::PotentialMatch:0x00000101371660>
+ # ```
+ #
+ def match(buffer)
+ buffer = buffer.dup
+
+ return check_potential_match(buffer) if regex == BLANK_REGEX
+
+ check_full_match(buffer) || check_potential_match(buffer) || NoMatch.new
+ end
+
+ private
+
+ def prepare_grammar
+ grammar.inline!
+ grammar.tokenize!
+ grammar.normalize_whitespace
+ end
+
+ def check_full_match(buffer)
+ match = regex.match buffer
+
+ return unless match
+
+ Match.new :mode => grammar.mode,
+ :confidence => grammar.dtmf? ? 1 : 0,
+ :utterance => buffer,
+ :interpretation => interpret_utterance(buffer)
+ end
+
+ def check_potential_match(buffer)
+ grammar.root_rule.children.each do |token|
+ break if buffer.length.zero?
+ longest_potential_match = token.longest_potential_match buffer
+ return if longest_potential_match.length.zero?
+ buffer.gsub! /^#{Regexp.escape longest_potential_match}/, ''
+ end
+ buffer.length.zero? ? PotentialMatch.new : nil
+ end
+
+ def regexp_content
+ grammar.root_rule.children.map &:regexp_content
+ end
+
+ def interpret_utterance(utterance)
+ conversion = Hash.new { |hash, key| hash[key] = key }
+ conversion['*'] = 'star'
+ conversion['#'] = 'pound'
+
+ utterance.chars.inject [] do |array, digit|
+ array << "dtmf-#{conversion[digit]}"
+ end.join ' '
+ end
+ end
+ end
+end
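
For readers unfamiliar with the Method Object refactoring applied in this commit, the idea can be sketched independently of ruby_speech. The names below (`ToyGrammar`, `ToyMatcher`, and the symbol results) are hypothetical stand-ins and not part of the library: the grammar only holds data, while a separate object is built once per grammar, precomputes its regex in the constructor, and owns the full/potential/no-match decision.

```ruby
# Hypothetical stand-in for a grammar: it only holds data
# (a mode and the list of accepted input strings).
ToyGrammar = Struct.new(:mode, :phrases)

# Method Object: constructed once per grammar, owns all matching logic.
class ToyMatcher
  attr_reader :grammar, :regex

  def initialize(grammar)
    @grammar = grammar
    # Precompute an anchored regex once, in the constructor.
    alternatives = grammar.phrases.map { |phrase| Regexp.escape(phrase) }
    @regex = /\A(?:#{alternatives.join('|')})\z/
  end

  # Returns :match, :potential_match or :no_match.
  # Symbols stand in for the Match/PotentialMatch/NoMatch result classes.
  def match(buffer)
    return :match if regex.match(buffer)
    return :potential_match if potential_match?(buffer)
    :no_match
  end

  private

  # A buffer is a potential match when it is a proper prefix
  # of at least one accepted phrase.
  def potential_match?(buffer)
    grammar.phrases.any? { |phrase| phrase.start_with?(buffer) && phrase != buffer }
  end
end

grammar = ToyGrammar.new(:dtmf, ['1234#', '*9'])
matcher = ToyMatcher.new(grammar)

# One full match, one incomplete input, one input that can never match.
results = ['*9', '12', '99'].map { |input| matcher.match(input) }
```

Keeping the matcher separate means the grammar class no longer needs to know how matches are evaluated, and any one-time preparation cost is paid once per matcher rather than on every call to `match`.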