Left recursion #164

Merged
merged 6 commits into from
Nov 15, 2015

@veelo (Collaborator) commented Nov 15, 2015

Hi Philippe,

This is something I am quite excited about: this PR adds support for every kind of left-recursion (EDIT: actually just direct and indirect left-recursion. Hidden left-recursion is not properly supported: valid matches may not be found if the grammar cannot easily be rewritten to remove the hidden recursion). It is more of an RFC than a PR though, as I just got it to work and have yet to let it loose on a project that was blocked until now by indirect left-recursion. The implementation currently only covers non-memoizing, non-CTFE parsers. I am amazed that all it takes at the core is just 15 lines of code. The grammar is left unmodified.

The algorithm was developed by Medeiros et al. [1] and is implemented in IronMeta [3], written in C#. Reading about it in a recent publication by Laurent and Mens [2] helped a lot: the D code is a near look-alike of the pseudocode of Algorithm 1 in [2]. Their work is available as Autumn [4], written in Java.

The core observation by Medeiros et al. is that when in left-recursion, the part of the input that can be matched increases with every recursion, but only up to a certain depth. Recursing beyond this depth matches a shorter part, before increasing again up to the maximum. The trick is thus to allow bounded left-recursion as long as the "seed" can grow.
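To make the observation concrete, here is a minimal Python sketch (an illustration, not code from this PR). It assumes a toy grammar `E <- E '+' D / D` where `D` is a single digit, and a hypothetical helper `match_bounded` that allows at most `depth` nested left-recursive uses of `E`. The matched length grows with the depth, drops once the whole input has been consumed, and then grows again.

```python
def match_bounded(text, depth):
    """Length matched at position 0 by E <- E '+' D / D when E may
    recurse on its left at most `depth` times; None means failure."""
    if depth == 0:
        return None  # with no recursion budget left, E fails
    # First alternative: E '+' D, with the inner E bounded by depth - 1.
    left = match_bounded(text, depth - 1)
    if (left is not None
            and text[left:left + 1] == "+"
            and text[left + 1:left + 2].isdigit()):
        return left + 2
    # Second alternative: a single digit at the start of the input.
    if text[:1].isdigit():
        return 1
    return None


lengths = [match_bounded("1+2+3", depth) for depth in range(7)]
print(lengths)  # [None, 1, 3, 5, 1, 3, 5]
```

At depth 4 the inner `E` already consumes all five characters, so `'+'` cannot match, the first alternative fails, and the ordered choice falls back to the single digit. That drop is the signal that the seed has stopped growing.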

Memoization of previous recursions plays an important role in the implementation: Upon entrance of the expression parser on line 380 no previous recursions are in the set, so flow continues. The seed is initialised as failure, before starting an endless loop on line 384. On line 385 the expression is evaluated.

If the expression is not left-recursive, its result will become current and be stored as the seed. Because of the loop the expression will be evaluated once more, and since this second evaluation does not consume more input, the loop ends on line 391.

If the expression is left-recursive then evaluation will cause a recursion into line 380. Because at this point failure exists as the seed, recursion ends and result will contain the matches of a single evaluation of the expression. This is now stored in seed as the value of the previous recursion for the next evaluation of the expression on line 385, which will be retrieved when evaluation recurses to line 380. As long as more input can be consumed, recursion will continue and the seed will grow. When the seed stops growing recursion is ended.
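The control flow described above can be sketched in Python as follows. This is a hedged illustration, not the D code in this PR: the `Parser` class, the `seeds` dictionary, and the toy grammar `E <- E '-' D / D` are all assumptions made for the example. A result is a pair `(end position, tree)` and failure is `None`.

```python
FAIL = None


class Parser:
    def __init__(self, text):
        self.text = text
        self.seeds = {}  # (rule name, position) -> current seed

    def left_recursive(self, name, body, pos):
        key = (name, pos)
        if key in self.seeds:
            # Re-entered at the same position: hand back the seed.
            return self.seeds[key]
        self.seeds[key] = FAIL  # the seed starts out as failure
        while True:
            result = body(pos)
            seed = self.seeds[key]
            # Stop as soon as the new result does not consume more input.
            if result is FAIL or (seed is not FAIL and result[0] <= seed[0]):
                break
            self.seeds[key] = result  # the seed has grown
        return self.seeds.pop(key)

    def digit(self, pos):
        if pos < len(self.text) and self.text[pos].isdigit():
            return (pos + 1, self.text[pos])
        return FAIL

    def expr(self, pos):
        return self.left_recursive("expr", self.expr_body, pos)

    def expr_body(self, pos):
        # First alternative: E '-' D
        left = self.expr(pos)
        if left is not FAIL:
            end, left_tree = left
            if self.text[end:end + 1] == "-":
                right = self.digit(end + 1)
                if right is not FAIL:
                    return (right[0], ("-", left_tree, right[1]))
        # Second alternative: D
        return self.digit(pos)


print(Parser("7-2-1").expr(0))  # (5, ('-', ('-', '7', '2'), '1'))
```

For the input `7-2-1` the seed grows from `7` to `7-2` to `7-2-1`. The fourth evaluation falls back to the single digit, which is shorter than the seed, so the loop ends. The nested tuple groups as `(7-2)-1`, i.e. the result is left-associative.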

Left-recursive rules are left-associative, as is usually the intention, unless the rule is also right-recursive. Ad-hoc extensions can be added to give control over associativity, as explained in [1] and expanded on with expression clusters in [2], but I didn't go there.

Although this PR is backwards compatible with regular PEGs, parsing time may increase by a factor of 2, I guess. Therefore it may be good to do introspection on the grammar first, and only add this magic to left-recursive rules. Laurent has an algorithm in [4] to pick one expression in a cycle of left-recursive expressions; handling this expression with the algorithm suffices to break the cycle. Also, since introspection takes time, we may want an option to turn support for left-recursion off for grammars that are known to be traditional non-left-recursive PEGs.
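As a rough idea of what such introspection could look like, the Python sketch below flags rules that can reach themselves through the leftmost symbol of one of their alternatives. It is not Laurent's algorithm from [4] and does not pick a single expression per cycle. The grammar encoding (a dict from rule name to alternatives, each a list of symbols) and the function name `left_recursive_rules` are assumptions for this example. It also assumes no element is nullable, so hidden left-recursion is not detected.

```python
def left_recursive_rules(grammar):
    """Return the rules that reach themselves via leftmost symbols.

    `grammar` maps a rule name to a list of alternatives; each
    alternative is a list of symbols. A symbol that is a key of
    `grammar` is a rule reference, anything else is a terminal.
    """
    # For every rule, the rules that can appear in leftmost position.
    first = {
        rule: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for rule, alts in grammar.items()
    }

    def reaches_itself(start):
        seen, stack = set(), list(first[start])
        while stack:
            rule = stack.pop()
            if rule == start:
                return True
            if rule not in seen:
                seen.add(rule)
                stack.extend(first[rule])
        return False

    return {rule for rule in grammar if reaches_itself(rule)}


grammar = {
    "Expr": [["Expr", "'+'", "Term"], ["Term"]],   # direct left-recursion
    "Term": [["Factor"]],
    "Factor": [["'('", "Expr", "')'"], ["Number"]],
    "Number": [["'0'"]],
    "A": [["B", "'x'"], ["'a'"]],                  # indirect: A -> B -> A
    "B": [["A", "'y'"], ["'b'"]],
}

print(sorted(left_recursive_rules(grammar)))  # ['A', 'B', 'Expr']
```

Only the flagged rules would then need the seed-growing treatment; `Term`, `Factor` and `Number` could keep the plain, cheaper code path.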

Another point is making it work for memoizing parsers: [2] found that they had to block memoization while growing the seed to prevent intermediate results from entering the set. Whether supporting CTFE would be any different I don't know.

So, there is still work to be done, but what's there looks very promising to me. I hope you like it too, and look forward to any comments.

Best regards,
Bastiaan.

[1] Sergio Medeiros, Fabio Mascarenhas, Roberto Ierusalimschy, Left Recursion in Parsing Expression Grammars, Programming Languages, Volume 7554 of the series Lecture Notes in Computer Science, pp 27-41, 2012. http://www.inf.puc-rio.br/~roberto/docs/sblp2012.pdf
[2] Nicolas Laurent, Kim Mens, Parsing Expression Grammars Made Practical, To appear in SLE 2015. http://arxiv.org/pdf/1509.02439v1.pdf
[3] https://github.com/kulibali/ironmeta
[4] https://github.com/norswap/autumn

@PhilippeSigaud (Collaborator) commented

That's absolutely wonderful! Not having left recursion was a sore point for
me from the very beginning. I'll have to read your explanation, the
articles and the code more carefully, but kudos to you for making this
work! I'm so excited by this, thanks a lot!

I'll of course merge this at once.

(Now that direct, indirect (and hidden?) left-recursion are OK, I wonder if
parsing ambiguous grammars is possible (a la GLL or GLR). I tried to code
GLL in D in another project and got it working, but it was quite slow.)

PhilippeSigaud added a commit that referenced this pull request Nov 15, 2015
@PhilippeSigaud PhilippeSigaud merged commit 4d8b4da into dlang-community:master Nov 15, 2015
@veelo (Collaborator, Author) commented Nov 15, 2015

Wow that was quick! Thanks for your appreciation.

I forgot about hidden left-recursion. Should work, I'll add a unittest.

@veelo (Collaborator, Author) commented Nov 15, 2015

Hidden left-recursion has issues... :-(

@veelo (Collaborator, Author) commented Nov 16, 2015

I have asked Nicolas Laurent, https://github.com/norswap/autumn_dev/issues/1, to comment on a failing test case.
