regexec: super-linear cache can prevent a valid match #10823
From nick@cleaton.net
Created by nick@cleaton.net

This is a bug report for perl from nick@cleaton.net,
-----------------------------------------------------------------

print "yay\n" if 'xayxay' =~ /(q1|.)*(q2|.)*(x(a|bc)*y){2,}/;

This should match, but it doesn't because the cache fails to

This bug is also present in blead.
From nick@cleaton.net
Patch.

From nick@cleaton.net
Inline Patch
diff --git a/regcomp.c b/regcomp.c
index 7c7f526..779d0fc 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3217,13 +3217,16 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
f |= SCF_DO_STCLASS_AND;
f &= ~SCF_DO_STCLASS_OR;
}
- /* These are the cases when once a subexpression
- fails at a particular position, it cannot succeed
- even after backtracking at the enclosing scope.
-
- XXXX what if minimal match and we are at the
- initial run of {n,m}? */
- if ((mincount != maxcount - 1) && (maxcount != REG_INFTY))
+ /* Exclude from super-linear cache processing any {n,m}
+ regops for which the combination of input pos and regex
+ pos is not enough information to determine if a match
+ will be possible.
+
+ For example, in the regex /foo(bar\s*){4,8}baz/ with the
+ regex pos at the \s*, the prospects for a match depend not
+ only on the input position but also on how many (bar\s*)
+ repeats into the {4,8} we are. */
+ if ((mincount > 1) || (maxcount > 1 && maxcount != REG_INFTY))
f &= ~SCF_WHILEM_VISITED_POS;
/* This will finish on WHILEM, setting scan, or on NULL: */
diff --git a/t/re/re_tests b/t/re/re_tests
index 66a47cc..02da1e1 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1482,5 +1482,10 @@ abc\N{def - c - \\N{NAME} must be resolved by the lexer
[\0005] 5\000 y $& 5
[\_] _ y $& _
+# RT #79152
+(q1|.)*(q2|.)*(x(a|bc)*y){2,} xayxay y $& xayxay
+(q1|.)*(q2|.)*(x(a|bc)*y){2,3} xayxay y $& xayxay
+(q1|z)*(q2|z)*z{15}-.*?(x(a|bc)*y){2,3}Z zzzzzzzzzzzzzzzz-xayxayxayxayZ y $& zzzzzzzzzzzzzzzz-xayxayxayxayZ
+
(?:(?:)foo|bar|zot|rt78356) foo y $& foo
# vim: softtabstop=0 noexpandtab
From nick@cleaton.net
This seems to do the trick, but I don't know regcomp.c well enough to be

Patch against blead attached, review please.

Nick
From @cpansprout
On Mon Nov 22 00:16:34 2010, ncleaton wrote:

Thank you. Applied as 779bcb7.
The RT System itself - Status changed from 'new' to 'open'
@cpansprout - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#79152 (status was 'resolved')
Searchable as RT79152$