-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
/g failure with zero-width patterns #9831
Comments
From @ikegamiCreated by @ikegamiA regression was introduced into 5.10.0 concerning /g and zero-width
I expect the 5.8 behaviour. 5.6.x and 5.8.x behave like 5.8.9. I've been Eric "ikegami" Brine Perl Info
|
From @hvds"Eric Brine" (via RT) <perlbug-followup@perl.org> wrote: With -Dr the first difference between the two cases is that the failing The failure case searches a second time for a match starting from offset 0, I'll look further to see if there's an opportunity to improve stclass Hugo |
The RT System itself - Status changed from 'new' to 'open' |
From @demerphq2009/8/16 <hv@crypt.org>:
This rings a bell for me. Is this really a regression or is it a bug fix? Why do we assume the first is correct? cheers, -- |
From @iabynOn Mon, Aug 17, 2009 at 04:36:08PM +0200, demerphq wrote:
Well, I'd expect -- |
From Eirik-Berg.Hanssen@allverden.nodemerphq <demerphq@gmail.com> writes:
Because, since the match is zero-length, we expect[1] pos() to be sidhekin@blackbox[18:07:50] [1]: Disclaimer: Our expectations may well be the problem. :-\ Eirik |
From @ikegamiOn Mon, Aug 17, 2009 at 10:36 AM, demerphq <demerphq@gmail.com> wrote:
Well something is broken. With 5.10.0 (debian lenny): $ perl -wle'print for "abc " =~ /(?=(\S+))/g' $ perl -wle'print for "abc " =~ /(?{})(?=(\S+))/g' $ perl -wle'print for "abc " =~ /x*(?=(\S+))/g' These should all return the same. |
From @AbigailOn Mon, Aug 17, 2009 at 12:37:45PM -0400, Eric Brine wrote:
Something related (broken only in 5.10, fixed in 5.10.1-RC): $ /opt/perl/5.8.8/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g' Not sure what blead does - the last pull I did didn't build. Abigail |
From @cpansproutOn Sun Aug 16 01:15:58 2009, ikegami@adaelis.com wrote:
This was broken by commit 07be1b8, I tried following the code paths for + and for {1,} (which are meant to So I fixed it by not setting PREGf_SKIP if the + is inside a (?=). I really don’t understand this code, and would much appreciate any |
From @cpansproutFrom: Father Chrysostomos <sprout@cpan.org> [perl #68564] /g failure with zero-width patterns To make "abc " =~ /(?=(\S+))/g work again (it should return "abc", Inline Patchdiff --git a/pod/perldelta.pod b/pod/perldelta.pod
index b478273..bb1af44 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -698,6 +698,13 @@ Stringifying a scalar containing -0.0 no longer has the affect of turning
false into true
L<[perl #45133]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=45133>.
+=item *
+
+Using a regular expression lookhead assertion containing a subpattern
+quantified with a C<+> inside a global pattern (i.e.,
+C<"abc " =~ /(?=(\S+))/g>) now works again, as it did in 5.8.x
+L<[perl #68564]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=68564>.
+
=back
=head1 Known Problems
diff --git a/regcomp.c b/regcomp.c
index d6f3523..c3867e9 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -4595,6 +4595,7 @@ reStudy:
I32 last_close = 0; /* pointed to by data */
regnode *first= scan;
regnode *first_next= regnext(first);
+ bool lookahead = FALSE;
/*
* Skip introductions and multiplicators >= 1
@@ -4611,7 +4612,7 @@ reStudy:
/* An OR of *one* alternative - should not happen now. */
(OP(first) == BRANCH && OP(first_next) != BRANCH) ||
/* for now we can't handle lookbehind IFMATCH*/
- (OP(first) == IFMATCH && !first->flags) ||
+ (OP(first) == IFMATCH && !first->flags && (lookahead = 1)) ||
(OP(first) == PLUS) ||
(OP(first) == MINMOD) ||
/* An {n,m} with n>0 */
@@ -4699,7 +4700,8 @@ reStudy:
goto again;
}
if (sawplus && (!sawopen || !RExC_sawback)
- && !(RExC_seen & REG_SEEN_EVAL)) /* May examine pos and $& */
+ && !(RExC_seen & REG_SEEN_EVAL)
+ && !lookahead) /* May examine pos and $& */
/* x+ must match at the 1st pos of run of x's */
r->intflags |= PREGf_SKIP;
diff --git a/t/re/pat_rt_report.t b/t/re/pat_rt_report.t
index 33b6f7c..cf176ec 100644
--- a/t/re/pat_rt_report.t
+++ b/t/re/pat_rt_report.t
@@ -21,7 +21,7 @@ BEGIN {
}
-plan tests => 2510; # Update this when adding/deleting tests.
+plan tests => 2511; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1181,6 +1181,14 @@ sub run_tests {
iseq($first, $second);
}
+
+ {
+ local $BugId = 68564; # minimal CURLYM limited to 32767 matches
+ iseq
+ join("-", "abc " =~ /(?=(\S+))/g),
+ "abc-bc-b",
+ "[perl #68564] stclass optimisation does not break + inside (?=)";
+ }
} # End of sub run_tests
1; |
From @nwc10On Mon, Aug 17, 2009 at 10:55:09PM +0200, Abigail wrote:
I bisected with #!/bin/sh and found that this was fixed by commit 89c6a13 Re: [perl #52672] regexp failure: (?=) turns into OPFAIL Inline Patchdiff --git a/regcomp.c b/regcomp.c
index 05b54ee..07d5535 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5691,6 +5691,8 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
RExC_seen |= REG_SEEN_LOOKBEHIND;
RExC_parse++;
case '=': /* (?=...) */
+ RExC_seen_zerolen++;
+ break;
case '!': /* (?!...) */
RExC_seen_zerolen++;
if (*RExC_parse == ')') {
diff --git a/t/op/re_tests b/t/op/re_tests
index 3df3745..6894458 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -414,6 +414,7 @@ a[-]?c ac y $& ac
'(abc)\1'i ABCABC y $1 ABC
'([a-c]*)\1'i ABCABC y $1 ABC
a(?!b). abad y $& ad
+(?=)a a y $& a
a(?=d). abad y $& ad
a(?=c|d). abad y $& ad
a(?:b|c|d)(.) ace y $1 e
Nicholas Clark |
From @khwilliamsonFather Chrysostomos via RT wrote:
I'm afraid I don't know this code either. |
From @ikegamiOn Mon, Oct 11, 2010 at 10:39 AM, Nicholas Clark <nick@ccl4.org> wrote:
I don't see why not. Especially since it still fails for blead. Patch with test attached. - Eric |
From @ikegami0001-Test-for-RT-68564-perl-wle-print-for-abc-S-g.patchFrom 4508e9547d2d335f019b9cdfb4b48881749a46be Mon Sep 17 00:00:00 2001
From: Eric Brine <ikegami@adaelis.com>
Date: Mon, 11 Oct 2010 17:57:03 -0700
Subject: [PATCH] Test for RT##68564: perl -wle"print for 'abc' =~ /(?=(\S+))/g"
---
t/re/pat.t | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/t/re/pat.t b/t/re/pat.t
index c007880..b1d43f7 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -202,6 +202,15 @@ sub run_tests {
}
{
+ local $Message = "Test zero-width match with /g";
+ local $" = ":";
+ $_ = "abc";
+ my @words = 'abc' =~ /(?=(\S+))/g;
+ my $exp = "abc:cb:c";
+ iseq "@words", $exp;
+ }
+
+ {
$_ = "abcdefghi";
my $pat1 = 'def';
--
1.7.1.1
|
From @ikegamiOn Mon, Oct 11, 2010 at 9:23 PM, Eric Brine <ikegami@adaelis.com> wrote:
Forgot to increment number of tests. Updated patch attached. |
From @ikegamiFrom e81434cbf81a1bb93004b920bb35c502c51bf182 Mon Sep 17 00:00:00 2001 t/re/pat.t | 11 ++++++++++- diff --git a/t/re/pat.t b/t/re/pat.t -plan tests => 398; # Update this when adding/deleting tests. run_tests() unless caller; @@ -202,6 +202,15 @@ sub run_tests { { my $pat1 = 'def'; |
From @demerphqOn 10 October 2010 23:27, Father Chrysostomos via RT
I have looked into this more deeply and applied a modified version of Here is the commit message I wrote: commit e7f38d0 fix 68564: /g failure with zero-width patterns This is based on a patch by Father Chrysostomos <sprout@cpan.org> The start class optimisation has two modes, "try every valid start Consider /(\d+)X/ and the string "123456Y", now we know that if we fail Now consider the case with zero-width lookahead under /g: /(?=(\d+)X)/. print $1 for "123"=~/(?=(\d+))/g should first match "123". Since $& is zero length, pos() is not The point here is that it makes perfect sense to disable the IMO your patch was quite right, although I had to dig fairly deep to Thanks for the patch. BTW, I am a bit curious if there are any other flaws in the flip-flop logic. Cheers, -- |
@cpansprout - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#68564 (status was 'resolved')
Searchable as RT68564$
The text was updated successfully, but these errors were encountered: