Regexp with recursive subpatterns matches incorrectly #18096
Comments
Had 5 mins to quickly check older release behaviour: v5.20.0 prints the desired output. Will try to bisect tonight, unless someone beats me to it.
@demerphq - bisect.pl says a51d618a82a7057c3aabb600a7a8691d27f44a34 is the commit that changed the behaviour
In addition to the case in the first comment, the same regexp also fails incorrectly. This should print According to
I checked just to be sure and immediately prior to a51d618, it did print
Oh, this is wrongly optimizing the central part of the regexp .. but CURLYM is supposed to be used only in cases where each iteration of the content to be iterated has the same width. When that is not the case, it will do the wrong thing.

The specific check that should be stopping it from trying to use CURLYM here is regcomp.c:5685: .. so the problem is that

I think the fix needs to be that GOSUBs not recursed into should be treated as matching widths in the range (0, \inf), with the potential risk that in some cases this may reject a valid pattern as infinite; the patch below appears to fix it, but I'm uncertain if there's some additional bookkeeping required (e.g. setting is_inf or similar).

If anyone can confirm the validity of this, I can add testcases and update the preceding comments to be a bit more accurate.
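
To make the width point concrete, here is a minimal sketch of why a recursive subpattern cannot be assumed to have a single width per iteration. The pattern and the helper `group_widths` are made up for illustration; they are not the regexp from the report, nor the testcase discussed in this thread. It only assumes a perl with recursion and possessive quantifiers (5.10 or later).

```perl
use strict;
use warnings;

# Hypothetical pattern, not the one from the report: group 1 matches one
# angle-bracketed group and recurses into itself via (?1) for nested groups.
my $group = qr/(<(?:[^<>]++|(?1))*>)/;

# Return the width of each successive top-level group found in $text.
sub group_widths {
    my ($text) = @_;
    my @widths;
    while ($text =~ /$group/g) {
        push @widths, length $1;
    }
    return @widths;
}

# Three groups with different nesting, hence different widths.
my @widths = group_widths('<a><b<c>><<d>ef>');
print "@widths\n";    # prints "3 6 7"
```

The same group matches 3, 6 and 7 characters in turn, so anything that iterates over a recursion like `(?1)` has to allow for a variable width.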
This seems to work for me. Below is the testcase I have tested (just rewrites of one-line tests above):
Thanks, in the absence of any confirmation from others I'll have a go tomorrow at tracing through what follows to see if I can verify the need or lack of need for additional bookkeeping.
I've now created PR #18138 for this. Looking at the code branch where we do recurse, I see it marks the same case with
I also tweaked your testcase to replace the parens with angle brackets, just to make the patterns marginally easier to read.
That PR is now applied as commit f4cd5e2.
Description
Regular expression with recursive subpatterns ((?PARNO)) matches incorrectly.
Steps to Reproduce
Expected behavior
In this regular expression, $1 is expected to match an expression-like string with balanced parentheses, and the whole pattern is to match the partial expression without the last (parenthesized in this example) term (a + b +).
So this one-liner should print a + b, but actually prints a + b + (c.
Perl configuration