-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex loop for \K in lookbehind #14638
Comments
From ph10@hermes.cam.ac.ukCreated by ph10@hermes.cam.ac.ukThe command perl -e 'while("aaaa" =~ /(?<=\Ka)/g){};' puts Perl into an infinite loop. Without the \K in the assertion, it Regards, -- Perl Info
|
From @jkeenanOn Tue Apr 07 09:46:45 2015, ph10@hermes.cam.ac.uk wrote:
A little diagnostic: ##### while ("aaaa" =~ /(?<=a)/g) { say ''; # Below: Infinite loop The top part produces: ##### The bottom part loops forever, producing: #####
Can you provide a link to that fix? Thank you very much. -- |
The RT System itself - Status changed from 'new' to 'open' |
From @khwilliamsonI bisected this, and it appears that this bug has been there from the very first, so this is not a regression: ee9b8ea is the first bad commit Add Regexp::Keep \K functionality to regex engine as well as add \v and \V, cleanup and more docs for regatom() -- |
From @khwilliamsonAlso, -Dr output looks like EXECUTING... Matching REx "(?<=\Ka)" against "aaaa" ad infinitum |
From ph10@hermes.cam.ac.ukOn Mon, 27 Apr 2015, James E Keenan via RT wrote:
Not very easily (I'd have to check the SVN manual to figure out how to However, even after matching a non-empty string, there is still one The code now checks for this case and advances the /g loop by one Regards, -- |
From @demerphqOn 28 April 2015 at 13:14, <ph10@hermes.cam.ac.uk> wrote:
Sounds like we have the same problem in Perl. Karl, if you dont see a way to fix this then let me know and I will pick it up. Cheers, -- |
From @khwilliamsonOn 04/28/2015 06:40 AM, demerphq wrote:
I haven't ever looked at the \K code; I was hoping someone else would |
From @demerphqOn 28 April 2015 at 22:28, Karl Williamson <public@khwilliamson.com> wrote:
Ok, then I will try to find time to address this. cheers, -- |
From @khwilliamsonOn 4/28/2015 3:26 PM, demerphq wrote:
Note that this is a 5.23 item, so there isn't a real urgency |
From @tonycozOn Tue Apr 28 14:27:34 2015, demerphq wrote:
What's the intended behaviour of \K in a look-(ahead|behind)? My first thught was that it should be ignored, something like: --- a/regcomp.c (with some extra code needed to handle look-aheads.) Or should it control the region matched, so \K in the example would set $& to "a", Tony |
From @ap* Tony Cook via RT <perlbug-followup@perl.org> [2015-05-27 07:05]:
It seems to me like it can only really have reasonable semantics at the So I would be most inclined to treat it as a compile error, at the least I have, FWIW, used it plenty but never used it inside a group. Anyone? Regards, |
From @tonycozOn Tue May 26 22:03:58 2015, tonyc wrote:
Which was broken in at least one way. Attached is a "better" patch, depending on how much I've managed Tony |
From @tonycoz0001-prevent-K-working-in-lookahead-behind-assertions-and.patchFrom 40f4dee6393db72c33d89117b6e092e4e349366c Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 28 May 2015 16:03:50 +1000
Subject: [PATCH] prevent \K working in lookahead/behind assertions (and warn)
this is probably incorrect, since I'm clueless about the regexp
engine
It may be that in_lookbehind and in_lookahead can be combined, since
in_lookbehind appears to be only used to maintain its own value.
---
pod/perldiag.pod | 5 +++++
regcomp.c | 51 +++++++++++++++++++++++++++++++++++++-----------
t/lib/warnings/regcomp | 12 ++++++++++++
t/re/pat_advanced.t | 14 +++++++++++++
4 files changed, 71 insertions(+), 11 deletions(-)
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 93ae13b..fb00745 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -6556,6 +6556,11 @@ about the /d modifier.
(W misc) You have a \E in a double-quotish string without a C<\U>,
C<\L> or C<\Q> preceding it.
+=item Useless use of \K in lookbehind/lookahead in regex; marked by S<<-- HERE> in m/%s/
+
+(W regexp) Your regular expression used C<\K> in a lookhead or
+lookbehind assertion, where is has no effect.
+
=item Useless use of greediness modifier '%c' in regex; marked by S<<-- HERE> in m/%s/
(W regexp) You specified something like these:
diff --git a/regcomp.c b/regcomp.c
index 712c8ed7..945778d 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -177,6 +177,7 @@ struct RExC_state_t {
through */
U32 study_chunk_recursed_bytes; /* bytes in bitmap */
I32 in_lookbehind;
+ I32 in_lookahead;
I32 contains_locale;
I32 contains_i;
I32 override_recoding;
@@ -255,6 +256,7 @@ struct RExC_state_t {
#define RExC_study_chunk_recursed_bytes \
(pRExC_state->study_chunk_recursed_bytes)
#define RExC_in_lookbehind (pRExC_state->in_lookbehind)
+#define RExC_in_lookahead (pRExC_state->in_lookahead)
#define RExC_contains_locale (pRExC_state->contains_locale)
#define RExC_contains_i (pRExC_state->contains_i)
#define RExC_override_recoding (pRExC_state->override_recoding)
@@ -6633,6 +6635,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
RExC_seen = 0;
RExC_maxlen = 0;
RExC_in_lookbehind = 0;
+ RExC_in_lookahead = 0;
RExC_seen_zerolen = *exp == '^' ? -1 : 0;
RExC_extralen = 0;
RExC_override_recoding = 0;
@@ -9782,6 +9785,12 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
*flagp = 0; /* Tentatively. */
+ if (RExC_in_lookbehind) {
+ RExC_in_lookbehind++;
+ }
+ if (RExC_in_lookahead) {
+ RExC_in_lookahead++;
+ }
/* Make an OPEN node, if parenthesized. */
if (paren) {
@@ -10055,9 +10064,11 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
RExC_seen |= REG_LOOKBEHIND_SEEN;
RExC_in_lookbehind++;
RExC_parse++;
- /* FALLTHROUGH */
+ RExC_seen_zerolen++;
+ break;
case '=': /* (?=...) */
- RExC_seen_zerolen++;
+ RExC_in_lookahead++;
+ RExC_seen_zerolen++;
break;
case '!': /* (?!...) */
RExC_seen_zerolen++;
@@ -10685,6 +10696,9 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
if (RExC_in_lookbehind) {
RExC_in_lookbehind--;
}
+ if (RExC_in_lookahead) {
+ RExC_in_lookahead--;
+ }
if (after_freeze > RExC_npar)
RExC_npar = after_freeze;
return(ret);
@@ -11787,15 +11801,30 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
*flagp |= SIMPLE;
goto finish_meta_pat;
case 'K':
- RExC_seen_zerolen++;
- ret = reg_node(pRExC_state, KEEPS);
- *flagp |= SIMPLE;
- /* XXX:dmq : disabling in-place substitution seems to
- * be necessary here to avoid cases of memory corruption, as
- * with: C<$_="x" x 80; s/x\K/y/> -- rgs
- */
- RExC_seen |= REG_LOOKBEHIND_SEEN;
- goto finish_meta_pat;
+ if (!RExC_in_lookbehind && !RExC_in_lookahead) {
+ RExC_seen_zerolen++;
+ ret = reg_node(pRExC_state, KEEPS);
+ *flagp |= SIMPLE;
+ /* XXX:dmq : disabling in-place substitution seems to
+ * be necessary here to avoid cases of memory corruption, as
+ * with: C<$_="x" x 80; s/x\K/y/> -- rgs
+ */
+ RExC_seen |= REG_LOOKBEHIND_SEEN;
+ }
+ else {
+ if (PASS2) {
+ /* adjust offset so <-- points at the K */
+ ++RExC_parse;
+ ckWARNreg(RExC_parse, "Useless use of \\K in lookbehind/lookahead");
+ --RExC_parse;
+ }
+ /* originally I did goto tryagain here, but that failed
+ * with an Internal urp when a ) immediately followed the \K.
+ * So return something, even if it's NOTHING.
+ */
+ ret = reg_node(pRExC_state, NOTHING);
+ }
+ goto finish_meta_pat;
case 'Z':
ret = reg_node(pRExC_state, SEOL);
*flagp |= SIMPLE;
diff --git a/t/lib/warnings/regcomp b/t/lib/warnings/regcomp
index b9943a0..9d569cd 100644
--- a/t/lib/warnings/regcomp
+++ b/t/lib/warnings/regcomp
@@ -36,3 +36,15 @@ $a = qr/[\c,]/;
EXPECT
"\c," is more clearly written simply as "l" at - line 9.
"\c," is more clearly written simply as "l" at - line 10.
+########
+# regcomp.c - \K in assertion
+use warnings;
+$x = "aaaa";
+$x =~ /(?<=\Ka)/;
+$x =~ /(?=a\Ka)aa/;
+no warnings 'regexp';
+$x =~ /(?<=\Ka)/;
+$x =~ /(?=a\Ka)aa/;
+EXPECT
+Useless use of \K in lookbehind/lookahead in regex; marked by <-- HERE in m/(?<=\K <-- HERE a)/ at - line 4.
+Useless use of \K in lookbehind/lookahead in regex; marked by <-- HERE in m/(?=a\K <-- HERE a)aa/ at - line 5.
diff --git a/t/re/pat_advanced.t b/t/re/pat_advanced.t
index 891bb66..d8d5823 100644
--- a/t/re/pat_advanced.t
+++ b/t/re/pat_advanced.t
@@ -1498,6 +1498,20 @@ sub run_tests {
$x = "abcde";
$x =~ s/(.)\K/$1/g;
is($x, "aabbccddee", $message);
+
+ no warnings 'regexp';
+ $x = "aaaa";
+ $x =~ /(?<=\Ka)/;
+ is($&, "", "\\K in lookbehind meaningless");
+
+ $x =~ /(?<=(a)\K)/;
+ is($&, "", "\\K in lookbehind meaningless (with nesting)");
+
+ $x =~ /(?=a\Ka)aa/;
+ is($&, "aa", "\\K in lookahead meaningless");
+
+ $x =~ /(?=(a)\Ka)aa/;
+ is($&, "aa", "\\K in lookahead meaningless (nesting)");
}
{
--
1.7.10.4
|
From @TuxOn Wed, 27 May 2015 23:05:34 -0700, "Tony Cook via RT"
that looks like inconsistent indentation -- |
From @jkeenanOn Wed May 27 23:17:14 2015, hmbrand wrote:
Contributors to this RT: Do we have any more ideas on how to proceed further? Thank you very much. |
From @hvdsOn Wed Dec 09 15:49:38 2015, jkeenan wrote:
I think someone with regexp-engine knowledge needs to review the patch; ideally that would be Yves. If he doesn't have time, I can try to do this - it'll take a while, since I'm not yet familiar with the use or implementation of \K, but I should have some free time over the Christmas period. Hugo |
From @demerphqWhile I have not tested it the patch looks fine to me. I have to admit Yves On 28 May 2015 at 08:05, Tony Cook via RT <perlbug-followup@perl.org> wrote:
-- |
From @khwilliamsonOn Thu Dec 10 03:37:21 2015, demerphq wrote:
Tony, It looks to me like you can apply this patch, with a possible watchdog-timer test as well |
From @tonycozOn Sat, 06 Aug 2016 16:44:36 -0700, khw wrote:
Applied as 105c827, I haven't added a watchdog test, since the code doesn't get to that point anymore (the tests don't check against a string that would loop) Tony |
@tonycoz - Status changed from 'open' to 'pending release' |
It was disallowed because: - it was breaking things (GH Perl#14638) - at the time, no-one had the tuits and knowledge to select non-buggy semantics and implement them.
It was disallowed because: - it was breaking things (GH #14638) - at the time, no-one had the tuits and knowledge to select non-buggy semantics and implement them.
Migrated from rt.perl.org#124256 (status was 'pending release')
Searchable as RT124256$
The text was updated successfully, but these errors were encountered: