TUTORIAL.md

On how to read grammars

BNF grammars are pretty easy to read: just replace the ::= sign with "is" or "matches". Error recovery attributes fit into this verbalisation as well:

paren_expr ::= '(' expr ')' {pin=1}

can be read like this: paren_expr is considered successfully matched even if only '(' is actually matched. Originally the parser would stop matching on any failure. Later the idea evolved into the extendedPin mode (ON by default), in which the parser tries to match the remaining parts of a sequence no matter what, i.e. the ')' token will be matched for the " ( ) " input. The parser still stops at the first failure if the pinned part has not been reached.

Thus the notion of pin helps the parser recover when the input is missing some parts.
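
As a further illustration, here is a minimal sketch of the same idea applied to a call-like construct. The rules call_expr and arg_list and the ',' token are hypothetical and are not part of this tutorial's grammar; only the pin attribute itself comes from the text above.

```
// Hypothetical rules for illustration only.
// pin=2: once id and '(' are matched, call_expr counts as matched,
// so an input like "foo(1, 2" should still yield a call_expr node
// with the missing ')' reported as an error.
call_expr ::= id '(' arg_list? ')' {pin=2}
private arg_list ::= expr (',' expr)*
```

Pinning on '(' rather than on id is a design choice in this sketch: a lone id could just as well be a plain reference, so the rule commits to call_expr only after '(' has been seen.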

property ::= id '=' expr  {pin=2 recoverWhile=rule_recover}
private rule_recover ::= !(';' | id '=')

can be read like this: property matches the sequence of id, '=' and expr. The match is considered successful once we get past the '=' part. Regardless of the result, the parser then skips all tokens while rule_recover matches, i.e. until it encounters ';' or the start of a rule (id followed by '='). Note that a recovery rule is always a predicate (usually a NOT predicate), hence it doesn't consume anything from the input.

Thus the notion of recoverWhile helps the parser recover when the input includes something unexpected.
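
The two attributes are typically combined on a rule that is repeated in a loop. Below is a minimal sketch for a list of hypothetical 'print' statements; the rule names and the 'print' keyword are assumptions made for illustration, and the layout mirrors the root_item pattern used in the grammar later in this tutorial.

```
// Hypothetical rules for illustration only.
statement_list ::= statement_item *

// The !<<eof>> predicate is the pinned part (pin=1), so the item counts
// as matched for any non-empty remaining input and the loop keeps going.
// After each attempt, tokens are skipped while statement_recover matches,
// i.e. until ';' or the next 'print' shows up.
private statement_item ::= !<<eof>> statement ';' {pin=1 recoverWhile=statement_recover}

statement ::= 'print' expr {pin=1}
private statement_recover ::= !(';' | 'print')
```

The recovery predicate lists the tokens at which parsing can sensibly resume; anything else in between is treated as garbage and skipped.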

Live Preview introduction

Suppose we want to create a grammar for some expression language like this:

expr=1 * 2 + (3 - 8.3!);
text='This is a ' + 'text';
// line comment

test_pin_results=;                        // expression expected
some garbage to test error recovering



recovered =1/2                            // missing semicolon
recovered_again=1/2;

To do this, let's create a new file, sample.bnf. We can invoke the Live Preview action via the context menu or the ctrl-alt-P/meta-alt-P shortcut and paste the sample text above right away.

The Structure tool window, the File Structure popup (ctrl-F12/meta-F12) and the PSI Viewer dialog can be used to observe the PSI tree as we modify the grammar.

The Start/Stop Grammar Highlighting action (ctrl-alt-F7/meta-alt-F7) highlights the grammar expressions that correspond to the current caret position in the preview editor.

In the end my IDE looked like this:

Live Preview

Here is the grammar I designed for the sample above. No Java coding, no generation, no test running. To generate a real parser I still need to add a lexer and some extra attributes, such as the package and some class names, as described in the main readme, but now I'm sure the BNF part is OK.

The fun part is that I can even inject this language into other files I work with to quickly test the syntax.

{
  tokens=[
    SEMI=';'
    EQ='='
    LP='('
    RP=')'

    space='regexp:\s+'
    comment='regexp://.*'
    number='regexp:\d+(\.\d*)?'
    id='regexp:\p{Alpha}\w*'
    string="regexp:('([^'\\]|\\.)*'|\"([^\"\\]|\\.)*\")"

    op_1='+'
    op_2='-'
    op_3='*'
    op_4='/'
    op_5='!'
  ]

  name(".*expr")='expression'
  extends(".*expr")=expr
}

root ::= root_item *
private root_item ::= !<<eof>> property ';' {pin=1 recoverWhile=property_recover}

property ::= id '=' expr  {pin=2}
private property_recover ::= !(';' | id '=')

expr ::= factor plus_expr *
left plus_expr ::= plus_op factor
private plus_op ::= '+'|'-'
private factor ::= primary mul_expr *
left mul_expr  ::= mul_op primary
private mul_op ::= '*'|'/'
private primary ::= primary_inner factorial_expr ?
left factorial_expr ::= '!'
private primary_inner ::= literal_expr | ref_expr | paren_expr
paren_expr ::= '(' expr ')' {pin=1}
ref_expr ::= id
literal_expr ::= number | string

Try playing with pin and recoverWhile attributes, tokens and rule modifiers to see how this all works.
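
As a starting point, here are two small variations one might try in Live Preview. Both are sketches rather than part of the grammar above: the '%' operator and the op_6 token name are hypothetical, and the described effect of the first variation is what the pin semantics suggest rather than a recorded result.

```
// Variation 1 (replaces the existing property rule):
// pin on id instead of '='. A line such as "some garbage ..."
// should then start a property node, with an error reported
// for the missing '='.
property ::= id '=' expr  {pin=1}

// Variation 2: a hypothetical remainder operator.
// Add op_6='%' to the tokens list, then extend mul_op:
private mul_op ::= '*'|'/'|'%'
```

Watching the PSI tree in the Structure tool window while switching between pin=1 and pin=2 shows how early the parser commits to a property.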

Note that since version 1.1.2 one can generate a *.flex file and then the lexer Java code using context menu items.