HTTPS clone URL
Subversion checkout URL
- Behind the Curtain: How Pegged Works
- Declaring a Grammar
- Extended PEG Syntax
- Four Levels of Encapsulation
- Generating Code
- Grammar Composition
- Grammar Examples
- Grammar Testing
- Grammars as D Modules
- Named Captures
- Parameterized Rules
- Parse Result
- Parse Trees
- Parsing Levels
- PEG Basics
- Pegged Tutorial
- Predefined Parsers
- Semantic Actions
- Tree Decimation
- User Defined Parsers
- Using a Grammar
- Using the Parse Tree
- Writing Your Own Grammar
Clone this wiki locally
Pegged is a parsing expression grammar (PEG) generator implemented in the D programming language.
The idea is to give the generator a PEG, with the syntax presented in the reference article . From this grammar definition a set of related parsers will be created, to be used at runtime or compile time.
To use Pegged, just call the
grammar function with a PEG and mix it in. For example:
import pegged.grammar; mixin(grammar( "Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier"));
This creates the
Factor (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-').
Identifier is a pre-defined parser recognizing your basic C-style identifier (first a letter or underscore, then digits, letters or underscores). In the rest of this document, I'll call 'rule' a
Parser <- Parsing Expression expression and I'll use 'grammar' to designate the entire group of rules given to
To use a parser, use the
.parse method. It will return a parse tree containing the calls to the different rules:
// Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);
By default, the grammars are not space-sensitive, because I found it to be what I want most of the time. There is an opt-out, though. This may change in the future.
Here is a little Pegged Tutorial
- The complete set of operators described here are implemented, with the 'traditional' PEG syntax. See Peg Operations.
- Pegged can parse its input at compile time and generate a complete parse tree at compile time. See Advantages of Compile-Time Parsing for the power that gives you. In a word: compile-time string (read: D code) transformation and generation.
- You can parse at runtime also, you lucky you.
- Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise.
- But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too.
- Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match: see PEG Additions.
- Adding new parsers is easy. See User-Defined Parsers to see how to do that.
- Grammars are composable: you can put different
mixin(grammar(rules));in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. Grammar Composition
- That's why Pegged comes with some pre-defined grammars (JSON, etc). See Grammar Examples.
More advanced features, outside the standard PEG perimeter are there to bring more power in the mix:
"List(E, Sep) <- E (Sep E)*"is possible. The previous rule defines a parameterized parser taking two other parsers (namely,
Sep) to match a
Sep-separated list of
E's. See Parameterized Rules to see what's possible.
Named captures: any parser can be named with the
=operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. See Named Captures for more explanations on this.
- Semantic actions: can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. See Semantic Actions.
- No effort was done to accelerate the parsing or reduce the memory consumption. The main goal was to 1) use the PEG syntax 2) parse at compile-time. Packrat parsing could be done, maybe. Pegged does some basic optimization, though: one-element sequences and choices are replaced by their only element and sequences of sequences are flattened (the same is done for ordered choices).
- No left-factorization for now. That's on my todo list, though. But you can do recursive grammars, of course.
- Only works on strings (no wstrings, dstrings or ranges). That's also on my todo list. I'm not sure ranges will work OK at compile-time and I think I'll need forward ranges at a minimum (or else store the produced elements anyway).
- Error reporting is the same as for any in-my-own-free-time / just-one-guy parsing project (read: awful). You'll know when it crashes. I ave plans to put an error stack in the mix, to unwind when a grammar cannot parse the provided input.
- I couldn't find a way to use anonymous function templates as actions (
o => o). Too bad.
- The entire grammar is defined at compile-time: no user input for actions and such. That was a design decision from the very beginning.
- Add unit-tests. I tend to write docs and unittests as I go, D making that so easy. Pegged is an exception, which I'll kill with extreme prejudice.
- Directly reading from a file.
- And writing a grammar's code in a file, to create a stub module.
- A better error reporting would be good. Maybe with threading some error stack.
- Better rules to control the parse tree: right now, the 'fuse' (~) and 'discard' (:) rules are there. Maybe something to tell the grammar to simplify a branch or using a policy for the parse tree construction and the level of simplification before presenting it to the user.
- A way to indicate rule-level space sensitivity.
- Right now, grammars are 'open' in that you cannot defined multiple rules with the same name in the same module. Use D modules to, well, do modularization of your code. But I intend to put different levels of 'openness'. See The Four Levels. You can use qualified identifiers for the rules names, though.
- Parsing wstrings, dstrings and maybe also ranges.
- Packrat parsing?
- As a long-term goal, parsing structures, as presented in OMeta. Yeah, I know, but I find that wonderfully interesting. Rules could match not only strings by any D type inner structure (matching a struct if it contains an
- Hence, a pattern-matcher. if you used Haskell or ML, you know what I'm talking about.
- As a longer-term goal: implementing the complete D grammar and see if that flies.
- As an even longer-term goal: macros in D. Think Lisp and talking to God.
- parsing writefln format string ("%s, %.8f, %d")
- parsing csv ?
- parsing command-line options ("-d --debug -ofMyFile")
- D code, to get documentation comments
- A literal programming example?
- Hisayuki Mima's CTPG, very similar, also done in D. Have a look!
- Nick Sabalausky's Goldie.
- Benjamin Shropshire's dparser.
- Martin Nowak put these gists on the D newgroup: https://gist.github.com/1255439 - lexer generator https://gist.github.com/1262321 - complete and fast D lexer
- pegtl, the PEG Template Library, in C++.
- chilon::parser in C++ also.
- Parslet in Ruby and Treetop, in Ruby also.
Pegged is herein released with the Boost licence (like most D projects). See here for more details.
Author: Philippe Sigaud.