Hi Philippe,
This is something I am quite excited about: this PR adds support for every kind of left-recursion (EDIT: actually just direct and indirect left-recursion. Hidden left-recursion is not properly supported: valid matches may not be found if the grammar cannot easily be rewritten to remove the hidden recursion). It is more of an RFC than a PR though, as I just got it to work and I have yet to let it loose on a project that was blocked until now due to indirect left-recursion. The implementation currently only covers non-memoizing, non-CTFE parsers. I am amazed that all it takes at the core is just 15 lines of code. The grammar is left unmodified.
The algorithm was developed by Medeiros et al. [1], who implemented it as IronMeta [3], written in C#. Reading about it in a recent publication by Laurent and Mens [2] helped a lot: the D code is a near look-alike of the pseudocode of Algorithm 1 in [2]. Their work is available as Autumn [4], written in Java.
The core observation by Medeiros et al. is that when in left-recursion, the part of the input that can be matched increases with every recursion, but only up to a certain depth. Recursing beyond this depth matches a shorter part, before increasing again up to the maximum. The trick is thus to allow bounded left-recursion as long as the "seed" can grow.
Memoization of previous recursions plays an important role in the implementation: upon entrance of the expression parser on line 380 no previous recursions are in the set, so flow continues. The `seed` is initialised as failure, before starting an endless loop on line 384. On line 385 the expression is evaluated.
If the expression is not left-recursive, its `result` will become `current` and be stored as the `seed`. Due to the loop the expression will be evaluated once more, and because this will not consume more input the loop will be ended on line 391.
If the expression is left-recursive then evaluation will cause a recursion into line 380. Because at this point failure exists as the `seed`, recursion ends and `result` will contain the matches of a single evaluation of the expression. This is now stored in `seed` as the value of the previous recursion for the next evaluation of the expression on line 385, which will be retrieved when evaluation recurses to line 380. As long as more input can be consumed, recursion will continue and the `seed` will grow. When the `seed` stops growing, recursion is ended.
Left-recursive rules are left-associative, as is usually the intention, unless the rule is also right-recursive. Ad-hoc extensions can be added to give control over associativity, as explained in [1] and expanded on with expression clusters in [2], but I didn't go there.
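To make the seed-growing loop described above concrete, here is a minimal sketch in Python (not the D code of this PR; the line numbers mentioned above refer to the PR, not to this sketch). The names `LeftRecursive`, `num`, `expr_body` and the tiny grammar `expr <- expr '-' num / num` are made up for illustration, and the sketch assumes one wrapper per rule with seeds keyed by input position.

```python
FAIL = None  # stands for a failed match


class LeftRecursive:
    """Hypothetical wrapper that lets a rule call itself at the same position."""

    def __init__(self, parse):
        self.parse = parse  # callable: (text, pos) -> (tree, end) or FAIL
        self.seeds = {}     # pos -> seed of the invocation currently growing there

    def __call__(self, text, pos):
        if pos in self.seeds:
            # Recursive entry at the same position: hand back the seed
            # instead of recursing forever.
            return self.seeds[pos]
        self.seeds[pos] = FAIL  # the seed starts out as failure
        current = FAIL
        while True:
            result = self.parse(text, pos)
            # Stop as soon as an evaluation fails or consumes no more input
            # than the previous one did.
            if result is FAIL or (current is not FAIL and result[1] <= current[1]):
                break
            current = result
            self.seeds[pos] = current  # grow the seed for the next round
        del self.seeds[pos]
        return current


def num(text, pos):
    """num <- [0-9]+"""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else FAIL


def expr_body(text, pos):
    """expr <- expr '-' num / num"""
    left = expr(text, pos)  # direct left-recursion
    if left is not FAIL:
        tree, end = left
        if end < len(text) and text[end] == "-":
            right = num(text, end + 1)
            if right is not FAIL:
                return (("-", tree, right[0]), right[1])
    return num(text, pos)


expr = LeftRecursive(expr_body)

print(expr("1-2-3", 0))  # (('-', ('-', 1, 2), 3), 5)
```

The nested tuple shows the left-associative result: `1-2` is grouped first. Wrapping a rule that is not left-recursive, as in `LeftRecursive(num)`, costs one extra evaluation before the loop ends, which is the overhead discussed next.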
Although this PR is backwards compatible with regular PEGs, my guess is that parsing time may increase by a factor of about 2, because every expression is evaluated once more than strictly needed. Therefore it may be good to do introspection on the grammar first, and only add this magic to left-recursive rules. Laurent has an algorithm in [4] to pick one expression in a cycle of left-recursive expressions; handling this expression with the algorithm suffices to break the cycle. Also, since introspection takes time, we may want an option to turn support for left-recursion off for grammars that are known to be traditional non-left-recursive PEGs.
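As a rough idea of what such introspection could look like, here is a small Python sketch. It is not Laurent's algorithm from [4]: it is a plain reachability check that flags every rule on a left-recursive cycle, rather than picking one rule per cycle. The grammar representation (a dict of alternatives), the function name `left_recursive_rules` and the `nullable` parameter are assumptions made for this example, and terminals are assumed to always consume input.

```python
def left_recursive_rules(grammar, nullable=frozenset()):
    """Return the rules that can reach themselves without consuming input.

    grammar:  rule name -> list of alternatives, each a list of symbols.
              Symbols that are keys of `grammar` are rules, others terminals.
    nullable: rules assumed to be able to match the empty string.
    """

    def leading_calls(rule):
        # Rules that may be invoked at the same input position as `rule`.
        calls = set()
        for alternative in grammar[rule]:
            for symbol in alternative:
                if symbol not in grammar:
                    break  # a terminal consumes input
                calls.add(symbol)
                if symbol not in nullable:
                    break  # later symbols start at a later position
        return calls

    edges = {rule: leading_calls(rule) for rule in grammar}

    def reaches_itself(start):
        seen, stack = set(), list(edges[start])
        while stack:
            rule = stack.pop()
            if rule == start:
                return True
            if rule not in seen:
                seen.add(rule)
                stack.extend(edges[rule])
        return False

    return {rule for rule in grammar if reaches_itself(rule)}


grammar = {
    "expr": [["expr", "'-'", "term"], ["term"]],  # direct left-recursion
    "term": [["num"]],
    "num": [["digit"]],
    "a": [["b", "'x'"]],                          # indirect: a -> b -> a
    "b": [["a", "'y'"], ["'z'"]],
}

print(sorted(left_recursive_rules(grammar)))  # ['a', 'b', 'expr']
```

Only the flagged rules would need the seed-growing treatment; `term` and `num` could keep the plain, cheaper code path.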
Another point is making it work for memoizing parsers: [2] found that they had to block memoization while growing the seed, to prevent intermediate results from entering the set. Whether supporting CTFE would be any different I don't know.
So, there is still work to be done, but what's there looks very promising to me. I hope you like it too, and look forward to any comments.
Best regards,
Bastiaan.
[1] Sergio Medeiros, Fabio Mascarenhas, Roberto Ierusalimschy, Left Recursion in Parsing Expression Grammars, Programming Languages, Volume 7554 of the series Lecture Notes in Computer Science, pp 27-41, 2012. http://www.inf.puc-rio.br/~roberto/docs/sblp2012.pdf
[2] Nicolas Laurent, Kim Mens, Parsing Expression Grammars Made Practical, To appear in SLE 2015. http://arxiv.org/pdf/1509.02439v1.pdf
[3] https://github.com/kulibali/ironmeta
[4] https://github.com/norswap/autumn