Description: A Ruby-based parsing DSL based on parsing expression grammars.
Homepage: http://treetop.rubyforge.org
Clone URL: git://github.com/nathansobo/treetop.git
Added some more details to the README, including advice about greedy 
matching and the effects of memoization.
cjheath (author)
Tue May 06 07:14:37 -0700 2008
commit  436225cce095d47e8c1bcffce59a6b409e6493bf
tree    d50ac9129d8bdfefb4f99ba9182ba0682180114b
parent  28f98a7c3c4bccfc3f0de0db7930240698aec8ba
@@ -15,6 +15,7 @@
 Next, you start filling your grammar with rules. Each rule associates a name with a parsing expression, like the following:
 
     # my_grammar.treetop
+    # You can use a .tt extension instead if you wish
     grammar MyGrammar
       rule hello
         'hello chomsky'
@@ -27,6 +28,8 @@
     require 'rubygems'
     require 'treetop'
     Treetop.load 'my_grammar'
+    # or just:
+    # require 'my_grammar' # This works because Polyglot hooks "require" to find and load Treetop files
     
     parser = MyGrammarParser.new
     puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
@@ -53,7 +56,9 @@
     puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
     puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
     puts parser.parse('silly generativists!') # => nil
-
+
+Note that once a choice rule has matched the text using a particular alternative at a particular location in the input and hence has succeeded, that choice will never be reconsidered, even if the chosen alternative causes another rule to fail where a later alternative wouldn't have. It's always a later alternative, since the first to succeed is final - why keep looking when you've found what you wanted? This is a feature of PEG parsers that you need to understand if you're going to succeed in using Treetop. In order to memoize successes and failures, such decisions cannot be reversed. Luckily Treetop provides a variety of clever ways you can tell it to avoid making the wrong decisions. But more on that later.
+
 Sequences
 ---------
 Sequences are composed of other parsing expressions separated by spaces. Using sequences, we can tighten up the above grammar.
@@ -65,8 +70,10 @@
       end
     end
 
-Node the use of parentheses to override the default precedence rules, which bind sequences more tightly than choices.
+Note the use of parentheses to override the default precedence rules, which bind sequences more tightly than choices.
 
+Once the whole sequence has been matched, the result is memoized and the details of the match will not be reconsidered for that location in the input.
+
 Nonterminal Symbols
 -------------------
 Here we leave regular expressions behind. Nonterminals allow expressions to refer to other expressions by name. A trivial use of this facility would allow us to make the above grammar more readable should the list of names grow longer.
@@ -93,6 +100,47 @@
 
 
 The `parens` expression simply states that a `parens` is a set of parentheses surrounding another `parens` expression or, if that doesn't match, the empty string. If you are uncomfortable with recursion, it's time to get comfortable, because it is the basis of language. Here's a tip: Don't try to imagine the parser circling round and round through the same rule. Instead, imagine the rule is *already* defined while you are defining it. If you imagine that `parens` already matches a string of matching parentheses, then it's easy to think of `parens` as opening and closing parentheses around another set of matching parentheses, which conveniently, you happen to be defining. You know that `parens` is supposed to represent a string of matched parentheses, so trust in that meaning, even if you haven't fully implemented it yet.
+
+Repetition
+----------
+Any item in a rule may be followed by a '+' or a '*' character, signifying one-or-more and zero-or-more occurrences of that item respectively. Beware though; the match is greedy, and if it matches too many items and causes subsequent items in the sequence to fail, the number matched will never be reconsidered. Here's a simple example of a rule that will never succeed:
+
+    # toogreedy.treetop
+    grammar TooGreedy
+      rule a_s
+        'a'* 'a'
+      end
+    end
+
+The 'a'* will always eat up any 'a's that follow, and the subsequent 'a' will find none there, so the whole rule will fail. You might need to use lookahead to avoid matching too much.
+
+Negative Lookahead
+------------------
+
+When you need to ensure that the following item *doesn't* match in some case where it might otherwise, you can use negat!ve lookahead, which is an item preceded by a ! - here's an example:
+
+    # postcondition.treetop
+    grammar PostCondition
+      rule conditional_sentence
+        ( !conditional_keyword word )+ conditional_keyword [ \t]+ word*
+      end
+
+      rule word
+        ([a-zA-Z]+ [ \t]+)
+      end
+
+      rule conditional_keyword
+        'if' / 'while' / 'until'
+      end
+    end
+
+Even though the rule `word` would match any of the conditional keywords, the first words of a `conditional_sentence` must not be `conditional_keyword`s. The negative lookahead prevents that matching, and prevents the repetition from matching too much input. Note that the lookahead may be a grammar rule of any complexity, including one that isn't used elsewhere in your grammar.
+
+Positive Lookahead
+------------------
+
+Sometimes you want an item to match, but only if the *following* text would match some pattern. You don't want to consume that following text, but if it's not there, you want this rule to fail. You can add a positive lookahead to a rule by appending the lookahead expression, preceded by an & character.
+
 
 
 Features to cover in the talk
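
The committed-choice, greedy-repetition and lookahead behaviour described in the added README text can be tried out without Treetop installed. The sketch below is not Treetop code and says nothing about how Treetop is implemented: it assumes that Ruby's possessive quantifiers (`*+`, `++`) and atomic groups (`(?>...)`) are a fair stand-in for the "never reconsidered" behaviour of PEG repetition and choice. The constant names are made up for illustration, and each comment gives the PEG expression the regexp is meant to mimic.

```ruby
# Stand-in for PEG matching built from plain Ruby regexps.
# Assumption: possessive quantifiers and atomic groups approximate
# the no-backtracking behaviour of PEG repetition and choice.

# PEG: 'a'* 'a'
# The possessive star keeps every 'a' it consumed, so the final 'a' never matches.
TOO_GREEDY = /\Aa*+a\z/

# PEG: ('a' &'a')* 'a'
# Positive lookahead makes the loop stop one 'a' early, leaving one for the tail.
WITH_LOOKAHEAD = /\A(?:a(?=a))*+a\z/

# PEG: ('a' / 'ab') 'c'
# The atomic group commits to the first alternative that succeeds.
SHORT_FIRST = /\A(?>a|ab)c\z/

# PEG: ('ab' / 'a') 'c'
# Listing the longer alternative first avoids the trap.
LONG_FIRST = /\A(?>ab|a)c\z/

# PEG: !'if' [a-z]+
# Negative lookahead rejects input that starts with the keyword.
NOT_IF = /\A(?!if)[a-z]++\z/

puts TOO_GREEDY.match?('aaa')      # => false
puts WITH_LOOKAHEAD.match?('aaa')  # => true
puts SHORT_FIRST.match?('abc')     # => false
puts LONG_FIRST.match?('abc')      # => true
puts NOT_IF.match?('then')         # => true
puts NOT_IF.match?('iffy')         # => false
```

The last line shows a side effect worth keeping in mind: a lookahead on a bare keyword also rejects longer words that merely begin with it, so in a real grammar the lookahead would probably need to check for a word boundary as well.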
