Mixed | and || in regexes (trap?) #1141

Closed
AlexDaniel opened this issue Jan 16, 2017 · 6 comments
Labels
docs Documentation issue (primary issue type) trap

Comments

@AlexDaniel
Member

See RT #130562.

We should make sure that it is documented. Maybe even add another section to traps?

@AlexDaniel AlexDaniel added the docs Documentation issue (primary issue type) label Jan 16, 2017
@ronaldxs
Contributor

Roast test added yesterday.

@ronaldxs
Contributor

Summary of RT #130562 for this doc issue.

Initially the evaluation of ~(42 ~~ / [ 0 || 42 ] | 4/) to "4" rather than "42" was called into question. In rejecting the ticket, @jnthn replied that || is imperative and | is declarative, and that when the two are mixed only the first branch of a || alternation is significant to longest-token matching (LTM).

During further discussion with jnthn the next day I learned that the last paragraph of the S05 section on LTM contains the sentence: "The first || in a regex makes the token patterns on its left available to the outer longest-token matcher, but hides any subsequent tests from longest-token matching." So, whatever other interpretations might be possible, the implementation agrees with S05, since ~(42 ~~ / [ 42 || 0 ] | 4 /) evaluates to "42".
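The two cases above can be put side by side as a minimal sketch. The regexes are the ones quoted in this summary, and the expected strings are the results reported in the ticket; the variable names are only illustrative.

```raku
# Both regexes are copied from the summary above; the expected strings are
# the results reported in RT #130562.
my $zero-first      = 42 ~~ / [ 0 || 42 ] | 4 /;
my $forty-two-first = 42 ~~ / [ 42 || 0 ] | 4 /;

# Only the first branch of the inner || is visible to the outer | matcher.
say ~$zero-first;        # 4   ("0" cannot start "42", so the "4" branch wins)
say ~$forty-two-first;   # 42  ("42" is visible to LTM and is the longer prefix)
```

Swapping the order of the || branches is the only difference between the two lines, which is why this looks like a trap when | and || are mixed.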

Describing this behavior in traps, as suggested in this issue, was also mentioned. Perl 5 had only one kind of alternation, denoted by '|', which behaves like Perl 6 "||". There is a 5to6-nutshell section on the topic, but it describes LTM as "a set of rules", which could use clarification. jnthn also suggested that an explanation of the concept of "declarative prefixes", which is somewhat particular to Perl 6, might be helpful to programmers using the two kinds of regex alternation.
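For readers coming from Perl 5, the difference between the two operators on their own might be sketched as follows. The input string and variable names are made up for illustration and do not come from the ticket.

```raku
# Hypothetical example: the same two alternatives, combined with || and with |.
my $first-match = 'foobar' ~~ / foo || foobar /;
my $longest     = 'foobar' ~~ / foo |  foobar /;

say ~$first-match;   # foo     (|| tries alternatives left to right, like Perl 5 |)
say ~$longest;       # foobar  (| prefers the longest declarative prefix)
```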

The 5to6-nutshell section simply says that Perl 5 programmers should use '||' in place of '|'. However, if a regex written with "||" is inherited or composed into a grammar that uses '|', whether by design or by typo, the result may not work as expected. '|' may therefore be a better choice for grammar reuse, and eventually programmers may need some understanding of both.

There is an older related RT #125608.

@jnthn
Contributor

jnthn commented Jan 19, 2017

> The 5to6-nutshell section simply says that Perl 5 programmers should use '||' in place of '|'.

This is reasonable enough advice when porting or writing regexes. While Perl 6 makes the on-ramp to writing grammars easy by making it feel like a small step up from writing regexes, in truth you will only get so far without grasping the way LTM works - because that's how your tokenizer is written for you. So perhaps it should note that for simple regexes just using || instead will get you familiar semantics, but if writing grammars then it's useful to learn about LTM and declarative prefixes and prefer | (and link to something on the topic).

@AlexDaniel
Member Author

There was another mention of this issue recently: RT #131991.

@tisonkun
Member

tisonkun commented Nov 4, 2017

@AlexDaniel this can be closed via #1640, or do you think it should be documented as a trap?

@AlexDaniel
Member Author

AlexDaniel commented Nov 4, 2017

@W4anD0eR96 I think mentioning it separately in traps is beneficial. Also, @ronaldxs++ mentioned the 5to6-nutshell page, which also has to mention this in one way or another. So two things need to be done:

  • 5to6-nutshell
  • traps

4 participants