Fork of nathansobo/treetop
Description: A Ruby-based parsing DSL based on parsing expression grammars.
Homepage: http://treetop.rubyforge.org
Clone URL: git://github.com/juretta/treetop.git
First take on the site goes public
Nathan Sobo (author)
Fri Jan 11 17:56:25 -0800 2008
commit  1285386f253c20ff92d26faf3c98aeb946ae268f
tree    98a3342955a961dfd72ac6fcb99d8c3a758e003a
parent  c294162094cdb098e889dce18c5116140c54181c
0
@@ -1,31 +1,23 @@
0
 #Contributing
0
 I'd like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.
0
 
0
+The source code is currently stored in a git repository at <a href="http://repo.or.cz/w/treetop.git">http://repo.or.cz/w/treetop.git</a>
0
+
0
 ##Getting Started with the Code
0
 The Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around `metagrammar.treetop`, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to the definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.
0
 
0
-Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of the metagrammar to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change `metagrammar.treetop` and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it. This became an issue fairly recently when I closed the loop on the bootstrap. Serious iteration on the metagrammar will probably necessitate a more robust testing strategy, perhaps one that relies on the Treetop gem for compiling the metagrammar under test. I haven't done this because my changes since closing the metacircular loop have been minor enough to deal with the issue, but let me know if you need help on this front.
0
+Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change `metagrammar.treetop` and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.
0
 
0
 ##Tests
0
 Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile a piece of sample code. Then I attempt to parse input with the compiled output and test its results.
0
 
0
-Due to shortcomings in Ruby's semantics that scope constant definitions in a block's lexical environment rather than the environment in which it is module evaluated, I was unable to use Rspec without polluting a global namespace with const definitions. Rspec has recently improved to allow specs to reside within standard Ruby classes, but I have not yet migrated the tests back. Instead, they are built on a modified version of Test::Unit that allows tests to be defined as strings. It's not ideal but it worked at the time.
0
-
0
 #What Needs to be Done
0
 ##Small Stuff
0
-* Migrate the tests back to RSpec.
0
 * Improve the `tt` command line tool to allow `.treetop` extensions to be elided in its arguments.
0
 * Generate and load temp files with `Treetop.load` rather than evaluating strings to improve stack trace readability.
0
 * Allow `do/end` style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.
0
-* Allow the root of a grammar to be dynamically set for testing purposes.
0
 
0
 ##Big Stuff
0
-###Avoiding Excessive Object Instantiation
0
-Based on some preliminary profiling work, it is pretty apparent that a large percentage of a typical parse's time is spent instantiating objects. This needs to be avoided if parsing is to be more performant.
0
-
0
-####Avoiding Failure Result Instantiation
0
-Currently, every parse failure instantiates a failure object. Both success and failure objects propagate an array of the furthest-advanced terminal failures encountered during the parse. These are used to give feedback to the user in the event of a parse failure as to where the most likely source of the error was located. Rather than propagate them upward in the failure objects, it would be faster to just return false in the event of failure and instead write terminal failures to a mutable data structure that is global to the parse. Even this can be done only in the event that the index of the failure is greater than or equal to the current maximal failure index. In addition to minimizing failure object instantiation, this will probably reduce the time spent sorting propagated failures.
0
-
0
 ####Transient Expressions
0
 Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.
0
 
0
@@ -4,18 +4,79 @@ Treetop is a language for describing languages. Combining the elegance of Ruby w
0
 
0
 </p>
0
 
0
+ sudo gem install treetop
0
 
0
 #Intuitive Grammar Specifications
0
-Treetop's packrat parsers use _memoization_ to make the backtracking possible in linear time. This cuts the gordian knot of grammar design. There's no need to look ahead and no need to lex. Worry about the structure of the language, not the idiosyncrasies of the parser.
0
+Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR-1 grammars of traditional parser generators. There's no need for a tokenization phase, and _lookahead assertions_ can be used for a limited degree of context-sensitivity. Here's an extremely simple Treetop grammar that matches a subset of arithmetic, respecting operator precedence:
0
+
0
+ grammar Arithmetic
0
+ rule additive
0
+ multitive '+' additive / multitive
0
+ end
0
+
0
+ rule multitive
0
+ primary '*' multitive / primary
0
+ end
0
+
0
+ rule primary
0
+ '(' additive ')' / number
0
+ end
0
+
0
+ rule number
0
+ [1-9] [0-9]*
0
+ end
0
+ end
0
+
0
 
0
 #Syntax-Oriented Programming
0
-Rather than implementing semantic actions that construct parse trees, define methods on the trees that Treetop automatically constructs–and write this code directly inside the grammar.
0
+Rather than implementing semantic actions that construct parse trees, Treetop lets you define methods on trees that it constructs for you automatically. You can define these methods directly within the grammar...
0
+
0
+ grammar Arithmetic
0
+ rule additive
0
+ multitive '+' additive {
0
+ def value
0
+ multitive.value + additive.value
0
+ end
0
+ }
0
+ /
0
+ multitive
0
+ end
0
+
0
+ # other rules below ...
0
+ end
0
+
0
+...or associate rules with classes of nodes you wish your parsers to instantiate upon matching a rule.
0
+
0
+ grammar Arithmetic
0
+ rule additive
0
+ multitive '+' additive <AdditiveNode>
0
+ /
0
+ multitive
0
+ end
0
+
0
+ # other rules below ...
0
+ end
0
+
0
 
0
 #Reusable, Composable Language Descriptions
0
-Break grammars into modules and compose them via Ruby's mixin semantics. Or combine grammars written by others in novel ways. Or extend existing grammars with your own syntactic constructs by overriding rules with access to a `super` keyword. Compositionally means your investment of time into grammar writing is secure–you can always extend and reuse your code.
0
+Because PEGs are closed under composition, Treetop grammars can be treated like Ruby modules. You can mix them into one another and override rules with access to the `super` keyword. You can break large grammars down into coherent units or make your language's syntax modular. This is especially useful if you want other programmers to be able to reuse your work.
0
+
0
+ grammar RubyWithEmbeddedSQL
0
+ include SQL
0
+
0
+ rule string
0
+ quote sql_expression quote / super
0
+ end
0
+ end
0
+
0
 
0
 #Acknowledgements
0
-First, thank you to my employer Rob Mee of Pivotal Labs for funding a substantial portion of Treetop's development. He gets it.
0
+
0
+
0
+<a href="http://pivotallabs.com"><img id="pivotal_logo" src="./images/pivotal.gif"></a>
0
+
0
+First, thank you to my employer Rob Mee of <a href="http://pivotallabs.com">Pivotal Labs</a> for funding a substantial portion of Treetop's development. He gets it.
0
+
0
 
0
 I'd also like to thank:
0
 
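Note on the `Arithmetic` grammar added in the hunk above: the way its rules encode precedence through ordered choice can be traced by hand. The following is a minimal plain-Ruby sketch of the same four rules, assuming nothing about Treetop's generated code. The class name `TinyArithmetic` and its methods are hypothetical and exist only for illustration. Each rule method returns `[value, next_index]` on success and `nil` on failure.

```ruby
# Plain-Ruby sketch of the Arithmetic grammar; NOT Treetop's generated parser.
# TinyArithmetic is a hypothetical name used only for this illustration.
class TinyArithmetic
  def initialize(input)
    @input = input
  end

  # Return the value of the whole input, or nil if it does not fully match.
  def parse
    value, index = additive(0)
    value if index == @input.length
  end

  private

  # additive <- multitive '+' additive / multitive
  def additive(index)
    left, after_left = multitive(index)
    return nil unless after_left
    if @input[after_left] == '+'
      right, after_right = additive(after_left + 1)
      return [left + right, after_right] if after_right
    end
    [left, after_left] # fall back to the second alternative
  end

  # multitive <- primary '*' multitive / primary
  def multitive(index)
    left, after_left = primary(index)
    return nil unless after_left
    if @input[after_left] == '*'
      right, after_right = multitive(after_left + 1)
      return [left * right, after_right] if after_right
    end
    [left, after_left]
  end

  # primary <- '(' additive ')' / number
  def primary(index)
    if @input[index] == '('
      inner, after_inner = additive(index + 1)
      return [inner, after_inner + 1] if after_inner && @input[after_inner] == ')'
    end
    number(index)
  end

  # number <- [1-9] [0-9]*
  def number(index)
    match = /\A[1-9][0-9]*/.match(@input[index..-1])
    match && [match[0].to_i, index + match[0].length]
  end
end
```

Because `multitive` sits below `additive`, `TinyArithmetic.new('2+3*4').parse` groups the multiplication first. The sketch computes values directly, whereas Treetop would build a syntax tree and leave evaluation to methods such as `value`.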
0
@@ -128,7 +128,9 @@ Subexpressions can be given an explicit label to have an element accessor method
0
     rule labels
0
       first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
0
         def letters
0
- [first_letter] + rest_letters.map { |comma_and_letter| comma_and_letter.letter }
0
+ [first_letter] + rest_letters.map do |comma_and_letter|
0
+ comma_and_letter.letter
0
+ end
0
         end
0
       }
0
     end
0
@@ -59,8 +59,10 @@ class Documentation < Layout
0
         li { link_to 'Advanced Techniques', PitfallsAndAdvancedTechniques }
0
       end
0
     end
0
-
0
- documentation_content
0
+
0
+ div :id => 'documentation_content' do
0
+ documentation_content
0
+ end
0
   end
0
 end
0
 
0
@@ -90,7 +92,9 @@ end
0
 
0
 
0
 class Contribute < Layout
0
-
0
+ def content
0
+ bluecloth "contributing_and_planned_features.markdown"
0
+ end
0
 end
0
 
0
 
0
@@ -1 +1,111 @@
0
-<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li><a href="site/syntactic_recognition.html">Documentation</a></li><li>Contribute</li><li><a href="site/index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"></div></div><div id="bottom"></div></body></html>
0
\ No newline at end of file
0
+<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li>Contribute</li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><h1>Contributing</h1>
0
+
0
+<p>I'd like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.</p>
0
+
0
+<p>The source code is currently stored in a git repository at <a href="http://repo.or.cz/w/treetop.git">http://repo.or.cz/w/treetop.git</a></p>
0
+
0
+<h2>Getting Started with the Code</h2>
0
+
0
+<p>The Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around <code>metagrammar.treetop</code>, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to the definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.</p>
0
+
0
+<p>Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change <code>metagrammar.treetop</code> and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.</p>
0
+
0
+<h2>Tests</h2>
0
+
0
+<p>Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile a piece of sample code. Then I attempt to parse input with the compiled output and test its results.</p>
0
+
0
+<h1>What Needs to be Done</h1>
0
+
0
+<h2>Small Stuff</h2>
0
+
0
+<ul>
0
+<li>Improve the <code>tt</code> command line tool to allow <code>.treetop</code> extensions to be elided in its arguments.</li>
0
+<li>Generate and load temp files with <code>Treetop.load</code> rather than evaluating strings to improve stack trace readability.</li>
0
+<li>Allow <code>do/end</code> style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.</li>
0
+</ul>
0
+
0
+<h2>Big Stuff</h2>
0
+
0
+<h4>Transient Expressions</h4>
0
+
0
+<p>Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.</p>
0
+
0
+<h3>Generate Rule Implementations in C</h3>
0
+
0
+<p>Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.</p>
0
+
0
+<h3>Global Parsing State and Semantic Backtrack Triggering</h3>
0
+
0
+<p>Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on <em>extrasyntactic</em> grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.</p>
0
+
0
+<p>Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.</p>
0
+
0
+<pre><code>level_1:
0
+ level_2a:
0
+ level_2b:
0
+ level_3a:
0
+ level_2c:
0
+</code></pre>
0
+
0
+<p>Imagine a grammar like the following:</p>
0
+
0
+<pre><code>rule yaml_element
0
+ name ':' block
0
+ /
0
+ name ':' value
0
+end
0
+
0
+rule block
0
+ indent yaml_elements outdent
0
+end
0
+
0
+rule yaml_elements
0
+ yaml_element (samedent yaml_element)*
0
+end
0
+
0
+rule samedent
0
+ newline spaces {
0
+ def after_success(parser_state)
0
+ spaces.length == parser_state.indent_level
0
+ end
0
+ }
0
+end
0
+
0
+rule indent
0
+ newline spaces {
0
+ def after_success(parser_state)
0
+ if spaces.length == parser_state.indent_level + 2
0
+ parser_state.indent_level += 2
0
+ true
0
+ else
0
+ false # fail the parse on extrasyntactic grounds
0
+ end
0
+ end
0
+
0
+ def undo_success(parser_state)
0
+ parser_state.indent_level -= 2
0
+ end
0
+ }
0
+end
0
+
0
+rule outdent
0
+ newline spaces {
0
+ def after_success(parser_state)
0
+ if spaces.length == parser_state.indent_level - 2
0
+ parser_state.indent_level -= 2
0
+ true
0
+ else
0
+ false # fail the parse on extrasyntactic grounds
0
+ end
0
+ end
0
+
0
+ def undo_success(parser_state)
0
+ parser_state.indent_level += 2
0
+ end
0
+ }
0
+end
0
+</code></pre>
0
+
0
+<p>In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.</p>
0
+
0
+<p>I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.</p></div></div><div id="bottom"></div></body></html>
0
\ No newline at end of file
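Note on the `after_success` / `undo_success` proposal in the page above: the requirement that a node must reverse its own change to global state on backtracking can be exercised outside any parser. The following is a small plain-Ruby sketch under stated assumptions. `ParserState` and `IndentNode` are hypothetical names, and the two callbacks mirror the proposal's sketch rather than any existing Treetop API.

```ruby
# Hypothetical names throughout: ParserState, IndentNode and both callbacks
# follow the proposal's sketch and are not part of Treetop.
ParserState = Struct.new(:indent_level)

class IndentNode
  def initialize(spaces)
    @spaces = spaces
  end

  # Accept only if indentation grew by exactly two spaces, then record it.
  def after_success(parser_state)
    return false unless @spaces.length == parser_state.indent_level + 2
    parser_state.indent_level += 2
    true
  end

  # Reverse the state change when the parser backtracks over this node.
  def undo_success(parser_state)
    parser_state.indent_level -= 2
  end
end

state = ParserState.new(0)
node = IndentNode.new('  ')
accepted = node.after_success(state)   # indent_level is now 2
level_while_accepted = state.indent_level
node.undo_success(state) if accepted   # a later failure backtracks over the node
```

After the undo, `state.indent_level` is back at its starting value, which is the invariant the proposal depends on. A node whose `after_success` returned `false` never changed the state, so it must not be undone.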
0
@@ -1,24 +1,85 @@
0
-<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li><a href="site/syntactic_recognition.html">Documentation</a></li><li><a href="site/contribute.html">Contribute</a></li><li>Home</li></ul></div></div><div id="middle"><div id="content"><p class="intro_text">
0
+<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li><a href="contribute.html">Contribute</a></li><li>Home</li></ul></div></div><div id="middle"><div id="content"><p class="intro_text">
0
 
0
 Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge <em>parsing expression grammars</em>, it helps you analyze syntax with revolutionary ease.
0
 
0
 </p>
0
 
0
+<pre><code>sudo gem install treetop
0
+</code></pre>
0
+
0
 <h1>Intuitive Grammar Specifications</h1>
0
 
0
-<p>Treetop's packrat parsers use <em>memoization</em> to make the backtracking possible in linear time. This cuts the gordian knot of grammar design. There's no need to look ahead and no need to lex. Worry about the structure of the language, not the idiosyncrasies of the parser.</p>
0
+<p>Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR-1 grammars of traditional parser generators. There's no need for a tokenization phase, and <em>lookahead assertions</em> can be used for a limited degree of context-sensitivity. Here's an extremely simple Treetop grammar that matches a subset of arithmetic, respecting operator precedence:</p>
0
+
0
+<pre><code>grammar Arithmetic
0
+ rule additive
0
+ multitive '+' additive / multitive
0
+ end
0
+
0
+ rule multitive
0
+ primary '*' multitive / primary
0
+ end
0
+
0
+ rule primary
0
+ '(' additive ')' / number
0
+ end
0
+
0
+ rule number
0
+ [1-9] [0-9]*
0
+ end
0
+end
0
+</code></pre>
0
 
0
 <h1>Syntax-Oriented Programming</h1>
0
 
0
-<p>Rather than implementing semantic actions that construct parse trees, define methods on the trees that Treetop automatically constructs–and write this code directly inside the grammar.</p>
0
+<p>Rather than implementing semantic actions that construct parse trees, Treetop lets you define methods on trees that it constructs for you automatically. You can define these methods directly within the grammar...</p>
0
+
0
+<pre><code>grammar Arithmetic
0
+ rule additive
0
+ multitive '+' additive {
0
+ def value
0
+ multitive.value + additive.value
0
+ end
0
+ }
0
+ /
0
+ multitive
0
+ end
0
+
0
+ # other rules below ...
0
+end
0
+</code></pre>
0
+
0
+<p>...or associate rules with classes of nodes you wish your parsers to instantiate upon matching a rule.</p>
0
+
0
+<pre><code>grammar Arithmetic
0
+ rule additive
0
+ multitive '+' additive &lt;AdditiveNode&gt;
0
+ /
0
+ multitive
0
+ end
0
+
0
+ # other rules below ...
0
+end
0
+</code></pre>
0
 
0
 <h1>Reusable, Composable Language Descriptions</h1>
0
 
0
-<p>Break grammars into modules and compose them via Ruby's mixin semantics. Or combine grammars written by others in novel ways. Or extend existing grammars with your own syntactic constructs by overriding rules with access to a <code>super</code> keyword. Compositionally means your investment of time into grammar writing is secure–you can always extend and reuse your code.</p>
0
+<p>Because PEGs are closed under composition, Treetop grammars can be treated like Ruby modules. You can mix them into one another and override rules with access to the <code>super</code> keyword. You can break large grammars down into coherent units or make your language's syntax modular. This is especially useful if you want other programmers to be able to reuse your work.</p>
0
+
0
+<pre><code>grammar RubyWithEmbeddedSQL
0
+ include SQL
0
+
0
+ rule string
0
+ quote sql_expression quote / super
0
+ end
0
+end
0
+</code></pre>
0
 
0
 <h1>Acknowledgements</h1>
0
 
0
-<p>First, thank you to my employer Rob Mee of Pivotal Labs for funding a substantial portion of Treetop's development. He gets it.</p>
0
+<p><a href="http://pivotallabs.com"><img id="pivotal_logo" src="./images/pivotal.gif"></a></p>
0
+
0
+<p>First, thank you to my employer Rob Mee of <a href="http://pivotallabs.com">Pivotal Labs</a> for funding a substantial portion of Treetop's development. He gets it.</p>
0
 
0
 <p>I'd also like to thank:</p>
0
 
0
@@ -1,4 +1,4 @@
0
-<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="site/contribute.html">Contribute</a></li><li><a href="site/index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="site/syntactic_recognition.html">Syntax</a></li><li><a href="site/semantic_interpretation.html">Semantics</a></li><li><a href="site/using_in_ruby.html">Using In Ruby</a></li><li>Advanced Techniques</li></ul></div><h1>Pitfalls</h1>
0
+<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li><a href="semantic_interpretation.html">Semantics</a></li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li>Advanced Techniques</li></ul></div><div id="documentation_content"><h1>Pitfalls</h1>
0
 
0
 <h2>Left Recursion</h2>
0
 
0
@@ -58,4 +58,4 @@ end
0
 end
0
 </code></pre>
0
 
0
-<p>In general, when the syntax gets tough, it helps to focus on what you really mean. A keyword is a character not followed by another character that isn't a space.</p></div></div><div id="bottom"></div></body></html>
0
\ No newline at end of file
0
+<p>In general, when the syntax gets tough, it helps to focus on what you really mean. A keyword is a character not followed by another character that isn't a space.</p></div></div></div><div id="bottom"></div></body></html>
0
\ No newline at end of file
0
@@ -30,12 +30,11 @@ h3 {
0
   margin-bottom: .5em;
0
 }
0
 
0
-
0
-ul {
0
- list-style-type: none;
0
- padding: 0;
0
+a {
0
+ color: #ff8429;
0
 }
0
 
0
+
0
 div#top {
0
   background-image: url( "images/top_background.png" );
0
   height: 200px;
0
@@ -66,6 +65,11 @@ div#main_navigation {
0
   font-size: 90%;
0
 }
0
 
0
+div#main_navigation ul {
0
+ list-style-type: none;
0
+ padding: 0;
0
+}
0
+
0
 div#main_navigation a, div#main_navigation a:visited {
0
   color: white;
0
   text-decoration: none;
0
@@ -85,6 +89,11 @@ div#secondary_navigation {
0
   top: -10px;
0
 }
0
 
0
+div#secondary_navigation ul {
0
+ list-style-type: none;
0
+ padding: 0;
0
+}
0
+
0
 div#secondary_navigation li {
0
   display: inline;
0
   margin-left: 10px;
0
@@ -92,8 +101,6 @@ div#secondary_navigation li {
0
 }
0
 
0
 div#content {
0
- /*position: relative;
0
- left: -10px;*/
0
   width: 545px;
0
   margin: 0 auto 0 auto;
0
   padding: 0 60px 25px 60px;
0
@@ -113,4 +120,10 @@ p {
0
 p.intro_text {
0
   color: #C45900;
0
   font-size: 115%;
0
+}
0
+
0
+img#pivotal_logo {
0
+ border: none;
0
+ margin-left: auto;
0
+ margin-right: auto;
0
 }
0
\ No newline at end of file
0
@@ -1,4 +1,4 @@
0
-<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="site/contribute.html">Contribute</a></li><li><a href="site/index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="site/syntactic_recognition.html">Syntax</a></li><li>Semantics</li><li><a href="site/using_in_ruby.html">Using In Ruby</a></li><li><a href="site/pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><h1>Semantic Interpretation</h1>
0
+<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li>Semantics</li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Semantic Interpretation</h1>
0
 
0
 <p>Let's use the grammar below as an example. It describes parentheses wrapping a single character to an arbitrary depth.</p>
0
 
0
@@ -144,7 +144,9 @@ end
0
 <pre><code>rule labels
0
   first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
0
     def letters
0
- [first_letter] + rest_letters.map { |comma_and_letter| comma_and_letter.letter }
0
+ [first_letter] + rest_letters.map do |comma_and_letter|
0
+ comma_and_letter.letter
0
+ end
0
     end
0
   }
0
 end
0
@@ -202,4 +204,4 @@ end
0
       Available only on nonterminal nodes, returns the nodes parsed by the elements of the matched sequence.
0
     </td>
0
   </tr>
0
-</table></div></div><div id="bottom"></div></body></html>
0
\ No newline at end of file
0
+</table></div></div></div><div id="bottom"></div></body></html>
0
\ No newline at end of file
0
@@ -1,6 +1,6 @@
0
-<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="site/contribute.html">Contribute</a></li><li><a href="site/index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li>Syntax</li><li><a href="site/semantic_interpretation.html">Semantics</a></li><li><a href="site/using_in_ruby.html">Using In Ruby</a></li><li><a href="site/pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><h1>Syntactic Recognition</h1>
0
+<html><head><link rel="stylesheet" href="./screen.css" type="text/css"></link></head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li>Syntax</li><li><a href="semantic_interpretation.html">Semantics</a></li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Syntactic Recognition</h1>
0
 
0
-<p>Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of parsing expression grammars is useful in writing Treetop grammars.</p>
0
+<p>Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars</a> is useful in writing Treetop grammars.</p>
0
 
0
 <h1>Grammar Structure</h1>
0
 
0
@@ -30,11 +30,9 @@ end
0
 
0
 <p>Each rule associates a name with a <em>parsing expression</em>. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.</p>
0
 
0
-<h2>Atomic Expressions</h2>
0
+<h2>Terminal Symbols</h2>
0
 
0
-<h3>Terminal Symbols</h3>
0
-
0
-<h4>Strings</h4>
0
+<h3>Strings</h3>
0
 
0
 <p>Strings are surrounded in double or single quotes and must be matched exactly.</p>
0
 
0
@@ -43,7 +41,7 @@ end
0
 <li><code>'foo'</code></li>
0
 </ul>
0
 
0
-<h4>Character Classes</h4>
0
+<h3>Character Classes</h3>
0
 
0
 <p>Character classes are surrounded by brackets. Their semantics are identical to those used in Ruby's regular expressions.</p>
0
 
0
@@ -52,11 +50,11 @@ end
0
 <li><code>[0-9]</code></li>
0
 </ul>
0
 
0
-<h4>The Anything Symbol</h4>
0
+<h3>The Anything Symbol</h3>
0
 
0
 <p>The anything symbol is represented by a dot (<code>.</code>) and matches any single character.</p>
0
 
0
-<h3>Nonterminal Symbols</h3>
0
+<h2>Nonterminal Symbols</h2>
0
 
0
 <p>Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.</p>
0
 
0
@@ -76,9 +74,7 @@ end
0
 end
0
 </code></pre>
0
 
0
-<h2>Composite Expressions</h2>
0
-
0
-<h3>Ordered Choice</h3>
0
+<h2>Ordered Choice</h2>
0
 
0
 <p>Parsers attempt to match ordered choices in left-to-right order, and stop after the first successful match.</p>
0
 
0
@@ -87,16 +83,14 @@ end
0
 
0
 <p>Note that if <code>"foo"</code> in the above expression came first, <code>"foobar"</code> would never be matched.</p>
0
 
0
-<h3>Sequences</h3>
0
+<h2>Sequences</h2>
0
 
0
 <p>Sequences are a space-separated list of parsing expressions. They have higher precedence than choices, so choices must be parenthesized to be used as the elements of a sequence. </p>
0
 
0
 <pre><code>"foo" "bar" ("baz" / "bop")
0
 </code></pre>
0
 
0
-<h3>Repetitions</h3>
0
-
0
-<h4>Zero or More</h4>
0
+<h2>Zero or More</h2>
0
 
0
 <p>Parsers will greedily match an expression zero or more times if it is followed by the star (<code>*</code>) symbol.</p>
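Note on the ordered-choice remark in this hunk (that `"foobar"` would never be matched if `"foo"` came first): the effect can be reproduced with a few lines of plain Ruby. This is only a sketch of the ordered-choice rule, not Treetop's implementation. The helper names `ordered_choice` and `matches_fully?` are hypothetical.

```ruby
# Hypothetical helpers illustrating PEG ordered choice; not Treetop code.

# Try each literal in order at the start of the input and commit to the
# first one that matches. Later alternatives are never reconsidered.
def ordered_choice(input, alternatives)
  alternatives.find { |literal| input.start_with?(literal) }
end

# Succeeds only if the committed alternative consumed the entire input.
def matches_fully?(input, alternatives)
  ordered_choice(input, alternatives) == input
end
```

With the alternatives ordered as `['foobar', 'foo']`, the input `'foobar'` is consumed whole. With `['foo', 'foobar']`, the choice commits to `'foo'`, leaves `'bar'` unconsumed, and the longer alternative is never tried.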
0
 
0
@@ -104,7 +98,7 @@ end
0
 <li><code>'foo'*</code> matches the empty string, <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
0
 </ul>
0
 
0
-<h4>One or More</h4>
0
+<h2>One or More</h2>
0
 
0
 <p>Parsers will greedily match an expression one or more times if it is followed by the plus (<code>+</code>) symbol.</p>
0
 
0
@@ -112,7 +106,7 @@ end
0
 <li><code>'foo'+</code> does not match the empty string, but matches <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
0
 </ul>
0
 
0
-<h3>Optional Expressions</h3>
0
+<h2>Optional Expressions</h2>