Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong match order inside replacement #6149

Closed
p5pRT opened this issue Dec 12, 2002 · 14 comments
Closed

wrong match order inside replacement #6149

p5pRT opened this issue Dec 12, 2002 · 14 comments

Comments

@p5pRT
Copy link

@p5pRT p5pRT commented Dec 12, 2002

Migrated from rt.perl.org#19078 (status was 'resolved')

Searchable as RT19078$

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Dec 12, 2002

From perl-5.8.0@ton.iguana.be

Created by perl-5.8.0@ton.iguana.be

perl -le '$_="CCCGGG"; s!.!<@​a{print("$&"),/[$&]/g}>!g; print'
C
C
C
C

Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]/ at -e line 1.

$& should be the result of /./ each time. It however seems to
come from the /[$&]/g in later rounds

on perl perl, v5.7.0 built for sun4-solaris​:
perl -le '$_="CCCGGG"; s!.!<@​a{print("$&"),/[$&]/g}>!g; print'
C
C
C
C
C
C
< >< >< >< >< >< >

Here it is even stranger, the last three C's should have been G's

Giving an extra scope solves it​:
perl -le '$_="CCCGGG"; s!.!<@​a{do{print("$&"),/[$&]/g}}>!g; print'
C
C
C
G
G
G
< >< >< >< >< >< >

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.0:

Configured by ton at Tue Nov 12 01:56:18 CET 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.19, archname=i686-linux-thread-multi-64int-ld
    uname='linux quasar 2.4.19 #5 wed oct 2 02:34:25 cest 2002 i686 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=define
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fomit-frame-pointer',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -ldb -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
    libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i686-linux-thread-multi-64int-ld
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i686-linux-thread-multi-64int-ld
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    .


Environment for perl v5.8.0:
    HOME=/home/ton
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/local/jre/bin:/home/oracle/product/9.0.1/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
    PERL_BADLANG (unset)
    SHELL=/bin/bash

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jul 5, 2005

From @schwern

Confirmed to still exist in 5.8.6 and bleadperl.

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jul 5, 2005

The RT System itself - Status changed from 'new' to 'open'

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 1, 2010

From @cpansprout

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset PL_curpm itself appears to be incorrect. The attached patch fixes it.

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 1, 2010

From @cpansprout

Inline Patch
diff -Nup blead/pp_ctl.c blead-19078-subst-with-rhs-match-in-same-scope/pp_ctl.c
--- blead/pp_ctl.c	2010-07-21 05:59:10.000000000 -0700
+++ blead-19078-subst-with-rhs-match-in-same-scope/pp_ctl.c	2010-07-31 18:38:59.000000000 -0700
@@ -377,6 +377,7 @@ PP(pp_substcont)
 	(void)ReREFCNT_inc(rx);
     cx->sb_rxtainted |= RX_MATCH_TAINTED(rx);
     rxres_save(&cx->sb_rxres, rx);
+    PL_curpm = pm;
     RETURNOP(pm->op_pmstashstartu.op_pmreplstart);
 }
 
diff -Nurp blead/t/re/subst.t blead-19078-subst-with-rhs-match-in-same-scope/t/re/subst.t
--- blead/t/re/subst.t	2010-07-17 18:51:14.000000000 -0700
+++ blead-19078-subst-with-rhs-match-in-same-scope/t/re/subst.t	2010-07-31 22:28:49.000000000 -0700
@@ -7,7 +7,7 @@ BEGIN {
 }
 
 require './test.pl';
-plan( tests => 170 );
+plan( tests => 172 );
 
 # Stolen from re/ReTest.pl. Can't just use the file since it doesn't support
 # like() and it conflicts with test.pl
@@ -724,3 +724,25 @@ fresh_perl_is( '$_="abcef"; s/bc|(.)\G(.
     $string =~ s/./\777/;
     is($string, chr 0x1FF, "Verify that handles s/foo/\\777/");
 }
+
+# Scoping of s//the RHS/ when there is no /e
+# Tests based on [perl #19078]
+{
+ local *_;
+ my $output = ''; my %a;
+ no warnings 'uninitialized';
+
+ $_="CCCGGG";
+ s!.!<@a{$output .= ("$&"),/[$&]/g}>!g;
+ $output .= $_;
+ is(
+   $output, "CCCGGG<   ><  >< ><   ><  >< >",
+  's/// sets PL_curpm for each iteration even when the RHS has set it'
+ );
+ 
+ s/C/$a{m\G\}/;
+ is(
+  "$&", G =>
+  'Match vars reflect the last match after s/pat/$a{m|pat|}/ without /e'
+ );
+}

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 4, 2010

From @demerphq

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 4, 2010

From @khwilliamson

demerphq wrote​:

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

How will we find out more?

Yves

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 4, 2010

From @demerphq

On 4 August 2010 15​:47, karl williamson <public@​khwilliamson.com> wrote​:

demerphq wrote​:

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each
iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with
the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript
is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset
PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

How will we find out more?

The problem is that there is a lot of code around that tries to deal
with various logical inconsistancies related to qr// and PL_curpm, and
the code is not obvious or intuitive.

So, for instance I am wondering what happens with this patch and
things like recursive patterns on the LHS, or things like that.

PL_curpm and everything related to it really deserves some deep
analysis and some love.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 8, 2010

From @cpansprout

On Aug 4, 2010, at 7​:27 AM, demerphq wrote​:

On 4 August 2010 15​:47, karl williamson <public@​khwilliamson.com> wrote​:

demerphq wrote​:

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each
iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with
the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript
is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset
PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

How will we find out more?

The problem is that there is a lot of code around that tries to deal
with various logical inconsistancies related to qr// and PL_curpm, and
the code is not obvious or intuitive.

So, for instance I am wondering what happens with this patch and
things like recursive patterns on the LHS, or things like that.

PL_curpm and everything related to it really deserves some deep
analysis and some love.

Could you explain to me what the two PL_curpm assignments in regexec.c are for? One is in S_regtry; the other in restore_pos. When are those two functions called?

(All other assignments to PL_curpm look fine to me and do not conflict with the patch.)

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 10, 2010

From @demerphq

On 8 August 2010 21​:24, Father Chrysostomos <sprout@​cpan.org> wrote​:

On Aug 4, 2010, at 7​:27 AM, demerphq wrote​:

On 4 August 2010 15​:47, karl williamson <public@​khwilliamson.com> wrote​:

demerphq wrote​:

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each
iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with
the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript
is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset
PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

How will we find out more?

The problem is that there is a lot of code around that tries to deal
with various logical inconsistancies related to qr// and PL_curpm, and
the code is not obvious or intuitive.

So, for instance I am wondering what happens with this patch and
things like recursive patterns on the LHS, or things like that.

PL_curpm and everything related to it really deserves some deep
analysis and some love.

Could you explain to me what the two PL_curpm assignments in regexec.c are for? One is in S_regtry; the other in restore_pos. When are those two functions called?

(All other assignments to PL_curpm look fine to me and do not conflict with the patch.)

regtry() is a setup wrapper around regmatch(). Essentailly matching
consists of calling regtry at each successive character point in the
string until we get a match, possibly with the start points filtered
out by logic in find_byclass. It handles the book keeping tasks such
as setting up the capture buffer and things like that.

restore_pos() is called prior to exiting p_regexec().

To explain a bit....

Originally regexes were basically stored entirely in the opcode, there
were no recursive patterns and there were no qr objects. The regex
struct included allocation for the capture buffer offsets PL_curpm
was the most recent regex op that succeeded. Since the capture buffer
offsets were effectively stored in the opcode this all made sense.

At some point in the evolution of the regex engine we ended up with
both qr// objects (so now regex objects arent so strongly associated
with PMOP's) AND recursive patterns of the form (??{ $qr_object }).
This is where things get hairy, because this means that a regex object
can be PL_curpm in two different scopes but /against different
strings/. Yet each regex object has only one capture buffer associated
with it. Now, back in the day some of this was solved by some gnarly
stuff on the stack, some of it was solved with a hack I did called a
swap buffer (this is/was used to prevent clobbering the results of a
successful match when attempting a new match), and some of it was
solved with the idea of a "lite copy" of regex (see the mother_re
field etc) which effectively provides a new capture buffer with the
results.

Anyway, the reason why I asked to hold off on this, and I apologize
for taking so long to explain, is that the above is (obviously) a big
mess. The basic design is arguably broken at multiple levels, and I
fear that even slight changes to how it works can have deep and
surprising effects. We need to be very careful here, and we also need
to really consider somebody looking into it all quite carefully.
Currently we have to consider that match results, regex objects, and
PMOP's are all independent objects, yet they started off being one
object and we should consider redoing how we manage them so that it is
simpler and provably correct.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 15, 2010

From @cpansprout

On Aug 10, 2010, at 12​:57 AM, demerphq wrote​:

On 8 August 2010 21​:24, Father Chrysostomos <sprout@​cpan.org> wrote​:

On Aug 4, 2010, at 7​:27 AM, demerphq wrote​:

On 4 August 2010 15​:47, karl williamson <public@​khwilliamson.com> wrote​:

demerphq wrote​:

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each
iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with
the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript
is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset
PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

How will we find out more?

The problem is that there is a lot of code around that tries to deal
with various logical inconsistancies related to qr// and PL_curpm, and
the code is not obvious or intuitive.

So, for instance I am wondering what happens with this patch and
things like recursive patterns on the LHS, or things like that.

PL_curpm and everything related to it really deserves some deep
analysis and some love.

Could you explain to me what the two PL_curpm assignments in regexec.c are for? One is in S_regtry; the other in restore_pos. When are those two functions called?

(All other assignments to PL_curpm look fine to me and do not conflict with the patch.)

regtry() is a setup wrapper around regmatch(). Essentailly matching
consists of calling regtry at each successive character point in the
string until we get a match, possibly with the start points filtered
out by logic in find_byclass. It handles the book keeping tasks such
as setting up the capture buffer and things like that.

restore_pos() is called prior to exiting p_regexec().

I’ve had a look, and I see that S_regtry assigns PL_curpm to PL_reg_oldcurpm before clobbering the former.

restore_pos restores to PL_curpm the value saved in PL_reg_oldcurpm.

That all happens within the call to pregexec, so the value of PL_curpm should always be the same after pregexec as it was before.

To explain a bit....

Originally regexes were basically stored entirely in the opcode, there
were no recursive patterns and there were no qr objects. The regex
struct included allocation for the capture buffer offsets PL_curpm
was the most recent regex op that succeeded. Since the capture buffer
offsets were effectively stored in the opcode this all made sense.

That would explain the scoping bug.

$u = ",echle etn sJ";
$t = "\nrka rPrhoatu";
$_ = $u.$t;
sub foo { s/(.)//s or return; bar(); print chop $$1 }
sub bar { s/(.)//s or return; foo(); print chop $$1 }
foo

At some point in the evolution of the regex engine we ended up with
both qr// objects (so now regex objects arent so strongly associated
with PMOP's) AND recursive patterns of the form (??{ $qr_object }).
This is where things get hairy, because this means that a regex object
can be PL_curpm in two different scopes but /against different
strings/. Yet each regex object has only one capture buffer associated
with it. Now, back in the day some of this was solved by some gnarly
stuff on the stack, some of it was solved with a hack I did called a
swap buffer (this is/was used to prevent clobbering the results of a
successful match when attempting a new match), and some of it was
solved with the idea of a "lite copy" of regex (see the mother_re
field etc) which effectively provides a new capture buffer with the
results.

I know about the light copies, since I hacked around them to fix bug #70764. :-)

Anyway, the reason why I asked to hold off on this, and I apologize
for taking so long to explain, is that the above is (obviously) a big
mess. The basic design is arguably broken at multiple levels, and I
fear that even slight changes to how it works can have deep and
surprising effects. We need to be very careful here, and we also need
to really consider somebody looking into it all quite carefully.
Currently we have to consider that match results, regex objects, and
PMOP's are all independent objects, yet they started off being one
object and we should consider redoing how we manage them so that it is
simpler and provably correct.

Even if this is cleaned up (I’d like to do it, but not any time soon, as I need more time to digest the source code), I think my patch to pp_substcont would still apply, as pp_substcont is effectively ‘outside’ the regexp engine. In fact, if PL_curpm is replaced with some other global, it will still need to be set in pp_substcont, so in a sense my patch marks the right spot for future reference.

I’ve not been able to come up with a test case that my patch breaks.

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Aug 20, 2010

From @demerphq

On 15 August 2010 22​:24, Father Chrysostomos <sprout@​cpan.org> wrote​:

On Aug 10, 2010, at 12​:57 AM, demerphq wrote​:

On 8 August 2010 21​:24, Father Chrysostomos <sprout@​cpan.org> wrote​:

On Aug 4, 2010, at 7​:27 AM, demerphq wrote​:

On 4 August 2010 15​:47, karl williamson <public@​khwilliamson.com> wrote​:

demerphq wrote​:

On 1 August 2010 21​:19, Father Chrysostomos <sprout@​cpan.org> wrote​:

A slightly simpler case​:

$ perl -le '$_="CCCGGG"; s!.!@​a{print("[$&]"),/./}!g'
[C]
[C]
[C]
[C]
[C]
[C]

What’s happening is that the s/// does not reset PL_curpm for each
iteration, because it doesn’t usually have to.

The RHS’s scoping takes care of it most of the time. This happens with
the /e modifier and with @​{...}.

In this example, though, we have a subscript, not a block. This subscript
is in the same scope as the s/// itself.

The assumption that the substitution operator will never have to reset
PL_curpm itself appears to be incorrect. The attached patch fixes it.

Hi, thanks for the patch. Im a little concerned that this might have
deeper consequences than at first meet the eye, could we hold off on
this one until we know more?

How will we find out more?

The problem is that there is a lot of code around that tries to deal
with various logical inconsistancies related to qr// and PL_curpm, and
the code is not obvious or intuitive.

So, for instance I am wondering what happens with this patch and
things like recursive patterns on the LHS, or things like that.

PL_curpm and everything related to it really deserves some deep
analysis and some love.

Could you explain to me what the two PL_curpm assignments in regexec.c are for? One is in S_regtry; the other in restore_pos. When are those two functions called?

(All other assignments to PL_curpm look fine to me and do not conflict with the patch.)

regtry() is a setup wrapper around regmatch(). Essentailly matching
consists of calling regtry at each successive character point in the
string until we get a match, possibly with the start points filtered
out by logic in find_byclass. It handles the book keeping tasks such
as setting up the capture buffer and things like that.

restore_pos() is called prior to exiting p_regexec().

I’ve had a look, and I see that S_regtry assigns PL_curpm to PL_reg_oldcurpm before clobbering the former.

restore_pos restores to PL_curpm the value saved in PL_reg_oldcurpm.

That all happens within the call to pregexec, so the value of PL_curpm should always be the same after pregexec as it was before.

To explain a bit....

Originally regexes were basically stored entirely in the opcode, there
were no recursive patterns and there were no qr objects. The regex
struct included allocation for the capture buffer offsets  PL_curpm
was the most recent regex op that succeeded. Since the capture buffer
offsets were effectively stored in the opcode this all made sense.

That would explain the scoping bug.

$u = ",echle etn sJ";
$t = "\nrka rPrhoatu";
$_ = $u.$t;
sub foo { s/(.)//s or return; bar(); print chop $$1 }
sub bar { s/(.)//s or return; foo(); print chop $$1 }
foo

At some point in the evolution of the regex engine we ended up with
both qr// objects (so now regex objects arent so strongly associated
with PMOP's) AND recursive patterns of the form (??{ $qr_object }).
This is where things get hairy, because this means that a regex object
can be PL_curpm in two different scopes but /against different
strings/. Yet each regex object has only one capture buffer associated
with it. Now, back in the day some of this was solved by some gnarly
stuff on the stack, some of it was solved with a hack I did called a
swap buffer (this is/was used to prevent clobbering the results of a
successful match when attempting a new match), and some of it was
solved with the idea of a "lite copy" of regex (see the mother_re
field etc) which effectively provides a new capture buffer with the
results.

I know about the light copies, since I hacked around them to fix bug #70764. :-)

Anyway, the reason why I asked to hold off on this, and I apologize
for taking so long to explain, is that the above is (obviously) a big
mess. The basic design is arguably broken at multiple levels, and I
fear that even slight changes to how it works can have deep and
surprising effects. We need to be very careful here, and we also need
to really consider somebody looking into it all quite carefully.
Currently we have to consider that match results, regex objects, and
PMOP's are all independent objects, yet they started off being one
object and we should consider redoing how we manage them so that it is
simpler and provably correct.

Even if this is cleaned up (I’d like to do it, but not any time soon, as I need more time to digest the source code), I think my patch to pp_substcont would still apply, as pp_substcont is effectively ‘outside’ the regexp engine. In fact, if PL_curpm is replaced with some other global, it will still need to be set in pp_substcont, so in a sense my patch marks the right spot for future reference.

I’ve not been able to come up with a test case that my patch breaks.

Well if you have reviewed the source aware of my general concerns and
concluded its safe then im satisfied for sure.

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Sep 21, 2010

From @cpansprout

On Sun Aug 01 12​:19​:41 2010, sprout wrote​:

The attached patch fixes it.

Applied as af9838c.

Loading

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Sep 21, 2010

@cpansprout - Status changed from 'open' to 'resolved'

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant