Every repository with this icon (
Every repository with this icon (
tree fac482c5693e82454788b35c3da83c2864527944
parent be4848611026d224ce106ff3249214f957dcb63c
Syntactic Recognition
Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of parsing expression grammars is useful in writing Treetop grammars.
Grammar Structure
Treetop grammars look like this:
grammar GrammarName
rule rule_name
...
end
rule rule_name
...
end
...
end
The main keywords are:
grammar: This introduces a new grammar. It is followed by a constant name to which the grammar will be bound when it is loaded.rule: This defines a parsing rule within the grammar. It is followed by a name by which this rule can be referenced within other rules. It is then followed by a parsing expression defining the rule.
Parsing Expressions
Each rule associates a name with a parsing expression. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.
Terminal Symbols
Strings
Strings are surrounded in double or single quotes and must be matched exactly.
"foo"'foo'
Character Classes
Character classes are surrounded by brackets. Their semantics are identical to those used in Ruby's regular expressions.
[a-zA-Z][0-9]
The Anything Symbol
The anything symbol is represented by a dot (.) and matches any single character.
Nonterminal Symbols
Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.
rule foo
"the dog " bar
end
rule bar
"jumped"
end
The above grammar is equivalent to:
rule foo
"the dog jumped"
end
Ordered Choice
Parsers attempt to match ordered choices in left-to-right order, and stop after the first successful match.
"foobar" / "foo" / "bar"
Note that if "foo" in the above expression came first, "foobar" would never be matched.
Sequences
Sequences are a space-separated list of parsing expressions. They have higher precedence than choices, so choices must be parenthesized to be used as the elements of a sequence.
"foo" "bar" ("baz" / "bop")
Zero or More
Parsers will greedily match an expression zero or more times if it is followed by the star (*) symbol.
'foo'*matches the empty string,"foo","foofoo", etc.
One or More
Parsers will greedily match an expression one or more times if it is followed by the star (+) symbol.
'foo'+does not match the empty string, but matches"foo","foofoo", etc.
Optional Expressions
An expression can be declared optional by following it with a question mark (?).
'foo'?matches"foo"or the empty string.
Lookahead Assertions
Lookahead assertions can be used to give parsing expressions a limited degree of context-sensitivity. The parser will look ahead into the buffer and attempt to match an expression without consuming input.
Positive Lookahead Assertion
Preceding an expression with an ampersand (&) indicates that it must match, but no input will be consumed in the process of determining whether this is true.
"foo" &"bar"matches"foobar"but only consumes up to the end"foo". It will not match"foobaz".
Negative Lookahead Assertion
Preceding an expression with a bang (!) indicates that the expression must not match, but no input will be consumed in the process of determining whether this is true.
"foo" !"bar"matches"foobaz"but only consumes up to the end"foo". It will not match"foobar".




