ENHANCEMENT: Update support files (documentation, examples) to reflect recent changes
Hamish Friedlander committed Feb 14, 2011
1 parent 534d799 commit afb28ca
Showing 6 changed files with 47 additions and 11 deletions.
48 changes: 42 additions & 6 deletions README.md
@@ -16,11 +16,31 @@ and lexing in a single top down grammar. For a basic overview of the subject, se

### Parser Format

Parsers are contained within a PHP file, in a special comment block that starts with `/*Parser:NameOfParser` and continues until the
comment is closed. During compilation this block will be replaced with a set of matching functions.
Parsers are contained within a PHP file, in one or more special comment blocks that start with `/*!* [name | !pragma]` (like a docblock, but with an
exclamation mark in the middle of the stars).

Lexically, the parser is a name token, a matching rule and a set of functions. The name token must not start with whitespace, contain no whitespace
and end with a `:` character. The rule and function set are on the same line or on the indented lines below.
Lexically, these blocks are a set of rules, each consisting of a name token, a matching rule and a set of attached functions.
The name token must contain no whitespace and end with a `:` character. The matching rule and functions are on the same line or on the indented lines below.

You can have multiple comment blocks, all of which are treated as contiguous for the purpose of compiling. During compilation these blocks will be replaced
with a set of "matching" functions (functions which match a string against their rules) for each rule in the block.

The optional name marks the start of a new set of parser rules. This is currently unused, but might be used in future for optimization & debugging purposes.
If unspecified, it defaults to the same name as the previous parser comment block, or 'Anonymous Parser' if no name has ever been set.

If the name starts with an '!' symbol, that comment block is a pragma, and is treated not as part of the parser, but as a special block of meta-data.
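
As a rough illustration of the format described above, an uncompiled source file might look like the sketch below. The class name `EqualRepeatSketch` is hypothetical, and the two rules are borrowed from the EqualRepeat example. The second block omits its name, so under the defaulting rule above it would take the name `Grammar1` from the previous block. Until the file is compiled, PHP sees only comments.

```php
<?php
// Hypothetical input file, shown before compilation. A real parser class
// would extend Parser or Packrat as the shipped examples do; the parent is
// left out here so the sketch stands alone.
class EqualRepeatSketch {

/*!* Grammar1
A: "a" A? "b"
*/

/*!*
B: "b" B? "c"
*/

}
```

Because all blocks are treated as contiguous, rule `B` should end up in the same rule set as rule `A`.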

##### Tricks and traps

We allow indenting a parser block, but only in a consistent manner - whatever the indent of the `/*!*` marker is becomes the "base" indent, and needs to be used
for all lines. You can mix tabs and spaces, but the indent must always be an exact match - if the "base" indent is a tab then two spaces, every line within the
block also needs indenting with a tab then two spaces, not two tabs (even if in your editor, that gives the same indent).

Any line with more than the "base" indent is considered a continuation of the previous rule.

Any line with less than the "base" indent is an error.

This might get looser if I get around to re-writing the internal "parser parser" in php-peg, bootstrapping the whole thing.
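
The indent rules above can be sketched as a small classifier. This is an illustration only: `classifyLine` is a hypothetical helper, not code from php-peg, and ignoring blank lines is an assumption the text above does not confirm.

```php
<?php
// Illustrative only: a hypothetical helper, not code from php-peg.
// $base is the exact whitespace that precedes the opening marker.
function classifyLine(string $base, string $line): string {
    if (trim($line) === '') {
        return 'blank';            // assumption: blank lines are ignored
    }
    // The indent must match byte for byte, so "\t\t" never satisfies "\t  ".
    if (strncmp($line, $base, strlen($base)) !== 0) {
        return 'error';
    }
    // Extra whitespace after the base indent marks a continuation line.
    $next = $line[strlen($base)];
    return ($next === ' ' || $next === "\t") ? 'continuation' : 'rule';
}

$base = "\t  "; // a tab, then two spaces
echo classifyLine($base, "\t  Number: /[0-9]+/"), "\n"; // rule
echo classifyLine($base, "\t    | '(' Expr ')'"), "\n"; // continuation
echo classifyLine($base, "\t\tNumber: /[0-9]+/"), "\n"; // error
echo classifyLine($base, "\tNumber: /[0-9]+/"), "\n";   // error
```

The third call is the trap described above: two tabs may look identical to a tab plus two spaces in an editor, but the bytes differ, so the line is rejected.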

### Rules

@@ -58,8 +78,11 @@ Tokens may be

##### Regular expression tokens

Automatically anchored to the current string start - do not include a string start anchor (`^`) anywhere.
Can specify flags on stand-alone regexs. Currently doesn't handle flags on regexs with rules.
Automatically anchored to the current string start - do not include a string start anchor (`^`) anywhere. Always behaves as if the 'x' flag were enabled in PHP -
whitespace is ignored unless escaped, and '#' starts a comment.

Be careful when ending a regular expression token - the '*/' pattern (as in /foo\s*/) will end a PHP comment. Since the 'x' flag is always active,
just split with a space (as in / foo \s* /).
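
The effect of the 'x' flag can be checked with plain `preg_match`, independent of php-peg. The sketch below shows only PCRE's behaviour; the subject strings are made up, and how php-peg builds and anchors its patterns internally is not shown here.

```php
<?php
// Plain PCRE through preg_match; nothing here calls php-peg.

// With 'x', unescaped whitespace in the pattern is ignored, so the spaced
// form matches exactly what the compact form matches.
preg_match('/foo\s*/x', "foo   bar", $compact);
preg_match('/ foo \s* /x', "foo   bar", $spaced);
var_dump($compact[0] === $spaced[0]); // bool(true)

// An escaped space is still a literal space; an unescaped one vanishes,
// leaving a pattern equivalent to /foobar/.
var_dump(preg_match('/foo\ bar/x', "foo bar")); // int(1)
var_dump(preg_match('/foo bar/x', "foo bar"));  // int(0)

// '#' starts a comment that runs to the end of the pattern line.
var_dump(preg_match("/foo # note\n bar/x", "foobar")); // int(1)
```

The spaced form is the one that is safe inside a parser comment block, since it never produces the comment terminator.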

### Expressions

@@ -183,6 +206,19 @@ You can also specify a rule-attached function called `*`, which will be called w
By default all matches are added to the 'text' property of a result. By prepending a member with `.` that match will not be added to the ['text'] member. This
doesn't affect the other result properties that named rules add.

### Pragmas

When opening a parser comment block, if instead of a name (or no name) you put a word starting with '!', that comment block is treated as a pragma - not
part of the parser language itself, but some other instruction to the compiler. These pragmas are currently understood:

!silent

This is a comment that should only appear in the source code. Don't output it in the generated code.

!insert_autogen_warning

Insert a warning comment into the generated code at this point, warning that the file is autogenerated and not to edit it.
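
A minimal sketch of how the two pragmas might sit alongside an ordinary parser block. The class name and the note text are hypothetical, and whether a pragma block may have an empty body is an assumption rather than something stated above.

```php
<?php
// Hypothetical input file, shown before compilation. Class name, note text
// and block layout are illustrative assumptions.
class PragmaSketch {

/*!* !insert_autogen_warning
*/

/*!* !silent
Note to maintainers: kept in the source, dropped from the generated file.
*/

/*!* Grammar1
A: "a" A? "b"
*/

}
```

Putting `!insert_autogen_warning` near the top of the class means the warning would appear before any generated matching functions.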

## TODO

- Allow configuration of whitespace - specify what matches, and whether it should be injected into results as-is, collapsed, or not at all
2 changes: 1 addition & 1 deletion examples/CalculatedLiterals.peg.inc
@@ -4,7 +4,7 @@ require '../Parser.php' ;

class CalculatedLiterals extends Parser {

/*Parser:CalculatedLiterals
/*!* CalculatedLiterals
string: ( /\\./ | /[^${parent.q}]/ )*
2 changes: 1 addition & 1 deletion examples/Calculator.peg.inc
@@ -4,7 +4,7 @@ require '../Parser.php' ;

class Calculator extends Parser {

/*Parser:Calculator
/*!* Calculator
Number: /[0-9]+/
Value: Number > | '(' > Expr > ')' >
2 changes: 1 addition & 1 deletion examples/EqualRepeat.peg.inc
@@ -10,7 +10,7 @@ class EqualRepeat extends Packrat {
* aabbacc - bad
*/

/*Parser:Grammar1
/*!* Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
2 changes: 1 addition & 1 deletion examples/Rfc822.peg.inc
@@ -8,7 +8,7 @@ require '../Parser.php';
*/
class Rfc822 extends Parser {

/*Parser:Rfc822
/*!* Rfc822
crlf: /\r\n/
2 changes: 1 addition & 1 deletion examples/Rfc822UTF8.peg.inc
@@ -8,7 +8,7 @@ require 'Rfc822.php';
*/
class Rfc822UTF8 extends Rfc822 {

/*Parser:Rfc822UTF8
/*!* Rfc822UTF8
crlf: /\r\n/u