Mixed | and || in regexes (trap?) #1141
Roast test added yesterday.
Summary of RT #130562 for this doc issue. Initially the evaluation of

During further discussion with jnthn the next day, I learned that in S05 the last paragraph of the section on LTM contains this sentence: "The first || in a regex makes the token patterns on its left available to the outer longest-token matcher, but hides any subsequent tests from longest-token matching." So, whatever other interpretations might be possible, the implementation agrees with S05 since

Describing this behavior in traps, as suggested in this issue, was also mentioned. Perl 5 had only one kind of alternation, denoted by '|', which behaves like Perl 6 '||'. There is a 5to6-nutshell section on the topic, but it describes LTM as "a set of rules", which could use clarification. jnthn also suggested that an explanation of the concept of "declarative prefixes", which is somewhat particular to Perl 6, might help programmers who use the two kinds of regex alternation.

The 5to6-nutshell section simply says that Perl 5 programmers should use '||' in place of '|'. However, if a regex written with '||' is inherited by or composed into a grammar that uses '|', whether by design or by typo, the result may not work as expected. '|' may therefore be the better choice for grammar reuse, and eventually programmers may need some understanding of both.

There is an older related RT #125608.
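A few one-liners can make the difference between the two alternations concrete. This is a minimal sketch written for this summary (the literals `foo`, `fo` and `food` are arbitrary), not the Roast test. It relies on `||` binding more loosely than `|`, so the last regex parses as `foo || [fo | food]`.

```raku
# '|' is longest-token-matching (LTM) alternation: the alternative
# with the longest declarative match wins, regardless of its position.
say 'food' ~~ / foo | food /;          # ｢food｣

# '||' is sequential alternation, like '|' in Perl 5: alternatives are
# tried left to right and the first one that matches wins.
say 'food' ~~ / foo || food /;         # ｢foo｣

# Mixed: '||' has lower precedence than '|', so this is
# foo || [fo | food]. 'foo' is tried first and already matches.
say 'food' ~~ / foo || fo | food /;    # ｢foo｣
```

Reading the last line as if `|` bound more loosely (`[foo || fo] | food`) would predict ｢food｣ instead, which is exactly the kind of surprise a traps entry could warn about.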
This is reasonable enough advice when porting or writing regexes. While Perl 6 makes the on-ramp to writing grammars easy by making it feel like a small step up from writing regexes, in truth you will only get so far without grasping the way LTM works, because that's how your tokenizer is written for you. So perhaps it should note that for simple regexes just using |
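The point that LTM is what writes the tokenizer for you, and the earlier concern about reusing a grammar that mixes the two operators, can be sketched with two small grammars. `LTMTok` and `SeqTok` are hypothetical names invented for this illustration. `SeqTok` inherits everything from `LTMTok` and only swaps `|` for `||` in `TOP`. `subparse` is used so that a match need not consume the whole input.

```raku
# Hypothetical tokenizer: '|' lets LTM choose between the subrules,
# so the longer <ident> match wins over the keyword prefix 'if'.
grammar LTMTok {
    token TOP     { <keyword> | <ident> }
    token keyword { 'if' }
    token ident   { <[a..z]>+ }
}

# Same grammar, but TOP uses sequential alternation (Perl 5 style):
# <keyword> is tried first and wins as soon as 'if' matches.
grammar SeqTok is LTMTok {
    token TOP { <keyword> || <ident> }
}

say LTMTok.subparse('iffy');   # matches 'iffy' via <ident>
say SeqTok.subparse('iffy');   # matches only 'if' via <keyword>
```

With `||`, the word `iffy` is split into the keyword `if` followed by leftover input, which is the sort of result that may "not work as expected" when a Perl 5 habit leaks into a grammar.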
There was another mention of this issue recently: RT #131991.
@AlexDaniel, can this be closed via #1640, or do you think it should be documented as a trap?
@W4anD0eR96 I think mentioning it separately in traps is beneficial. Also, @ronaldxs++ mentioned the 5to6-nutshell page, which also has to mention this in one way or another. So two things need to be done:
See RT #130562.
We should make sure that it is documented. Maybe even add another section to traps?