Document S05 treatment of alternations as atoms for ratchet #1962
Comments
moritz
added a commit
that referenced
this issue
Apr 25, 2018
ronaldxs
added a commit
to ronaldxs/Grammar-Modelica
that referenced
this issue
Apr 25, 2018
…d on alternatives being atomic to backtracking to be documented and tested resulting from new perl6 doc issue #1962: Raku/doc#1962
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
S05 explains that
|
and||
alternatives are atomic with respect to ratchet restriction on backtracking. In a discussion with @moritz on IRC #perl6 he seemed to explain that Perl 6 alternatives do not use backtracking at all (https://irclog.perlgeek.de/perl6/2018-04-12#i_16038420). After further research and testing I believe that to explain the problem discussed on IRC #perl6, the atomicity of alternatives with respect to ratchet discussed in S05 needs to be in the public facing documentation which I do not believe to be true now. I try to make the case below that there was adequate reason for me to be confused about the backtracking behavior of alternatives with ratchet. I am sorry if it is not so easy to read but it was not easy for me to write either and I felt it was worth the trouble.The question of backtracking and alternatives came up for me in practice for the Perl 6 Grammar::Modelica parser discussed in an advent calendar posting. A Modelica feature called "named arguments" allows a function to be called either as
CircleArea(x)
orCircleArea(radius=x)
. The grammar failed on the second form until I changed a grammarrule
which was an alternative, similar to/\w+ || \w+ '=' \w+/
, into a grammarregex
which can backtrack. Reversing the alternatives so that alternative atomicity for backtracking parses "named arguments" with arule
seems to be at the heart of the matter.The example below fails because the first alternative matches and ratchet prevents backtracking to match the end of the string. If you remove the ratchet modifier the match succeeds.
If we switch the order of alternatives both sides work fine with ratchet.
For
"a=b"
there is now no need to look at the second alternative so there is no question of backtracking. For"a"
the regex seems to first match\w+
, fail on'='
, and then move back its position in the string so it says'alt2'
before matching the second alternative. @moritz seemed to claim there is no backtracking here.A DFA, which makes one transition between states on each input symbol, could do the matching without backtracking. The transition from after \w+ to rematch the
a
with the second alternative does not look like DFA processing although adding the code blocks might effect the parsing engine and there might be some clever way to execute the code blocks in the right order and still only make one state transition. I have read that NFA engines can backtrack as is the case of the regex with the first ordering that parses"a=b"
only without ratchet. In the Regex Mechanics" chapter (Chapter 6) of @moritz fine book, "Parsing with Perl 6 Regexes and Grammars", it says that|
disjunctions are handled by an NFA engine which can't handle||
sequential disjunctions. Perl 5 backtracking for alternatives is also mentioned in the "Grouping things" section of the Perl regular expressions tutorial.It seems reasonable to me for a programmer to wonder about the possibility that in the last code example, accepting
"a"
, the regex might be backtracking. Documenting the relevant ratchet paragraph from Synopsis 5#Modifiers seems to allow individual regex alternatives to backtrack if helpful and cleanly fix the confusion:I think the feature should be explained in the public facing documentation. If my arguments are accepted I will later open an issue requesting roast tests for this feature in S05-modifier/ratchet.t.
The text was updated successfully, but these errors were encountered: