Feature request: regexp flag to not set $1, $2 etc. #9451

Closed
p5pRT opened this issue Aug 18, 2008 · 12 comments

Comments

@p5pRT p5pRT commented Aug 18, 2008

Migrated from rt.perl.org#58072 (status was 'rejected')

Searchable as RT58072$

@p5pRT p5pRT commented Aug 18, 2008

From @epa

Created by @epa

When a regexp has capturing parentheses it sets match variables from
$1 upwards. This can sometimes cause trouble, for example

<http://rt.cpan.org/Ticket/Display.html?id=36956>
<http://rt.perl.org/rt3/Public/Bug/Display.html?id=23140>
<http://rt.perl.org/rt3/Public/Bug/Display.html?id=22369>

In most situations you can avoid the problem by following two rules:
do not pass $1 as an argument to subroutines, and any subroutine
should unpack @_ first before doing any regexp operations. But this
is not always possible, as in the first bug report above for NEXT.

It would simplify these situations to have a flag meaning do not
change the magical match variables. Perhaps it could be called /l but
the exact letter is not important at all. Then

  my $str = 'hello';
  my ($x, $y) = ($str =~ /(\w)(\w)/l);

would set $x to 'h' and $y to 'e' as now, but would not touch the old
values of $1 and $2.

In module code where you have to be cautious about trampling on other
people's $1 and $2, the /l flag would let you use regular expressions
without a lot of shenanigans.
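
To make the hazard concrete, here is a minimal sketch of what can go wrong when $1 is passed straight into a subroutine. The helper name `shout` and the sample strings are made up for illustration; the point is only that @_ aliases the caller's $1, so a match inside the subroutine changes what that alias reads.

```perl
use strict;
use warnings;

# Hypothetical helper that (unwisely) runs a regexp before unpacking @_.
sub shout {
    'xyz' =~ /(\w)/;     # a successful match inside the sub
    my ($word) = @_;     # $_[0] is an alias to $1, which now reads "x"
    return uc $word;
}

'hello world' =~ /(\w+)/;    # $1 is "hello" in the caller

my $got  = shout($1);        # passes an alias to the magical $1
my $safe = shout("$1");      # stringifying first passes a plain copy

print "aliased: $got, copied: $safe\n";
```

Copying $1 before the call, or unpacking @_ before any regexp work in the subroutine, sidesteps the surprise; both are the rules described above.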

Perl Info

Flags:
    category=core
    severity=wishlist

This perlbug was built using Perl 5.10.0 in the Fedora build system.
It is being executed now by Perl 5.10.0 - Mon Jul 21 06:53:59 EDT 2008.

Site configuration information for perl 5.10.0:

Configured by Red Hat, Inc. at Mon Jul 21 06:53:59 EDT 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.18-92.1.6.el5, archname=i386-linux-thread-multi
    uname='linux x86-7 2.6.18-92.1.6.el5 #1 smp fri jun 20 02:36:06 edt 2008 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -Dversion=5.10.0 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Dprivlib=/usr/lib/perl5/5.10.0 -Dsitelib=/usr/local/lib/perl5/site_perl/5.10.0 -Dvendorlib=/usr/lib/perl5/vendor_perl/5.10.0 -Darchlib=/usr/lib/perl5/5.10.0/i386-linux-thread-multi -Dsitearch=/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi -Dvendorarch=/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi -Darchname=i386-linux-thread-multi -Dotherlibdirs=/usr/lib/perl5/site_perl/5.10.0 -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_prot -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.3.0 20080428 (Red Hat 4.3.0-8)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.8'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /home/eda/lib/perl5/5.10.0/i386-linux-thread-multi
    /home/eda/lib/perl5/5.10.0
    /home/eda/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /home/eda/lib/perl5/site_perl/5.10.0
    /usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/site_perl/5.10.0/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    .


Environment for perl 5.10.0:
    HOME=/home/eda
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_GB.UTF-8
    LC_MESSAGES=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/eda/bin:/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin
    PERL5LIB=/home/eda/lib/perl5/5.10.0:/home/eda/lib/perl5/site_perl/5.10.0:/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi:/usr/local/lib/perl5/site_perl/5.10.0:/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi:/usr/lib/perl5/vendor_perl/5.10.0
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT p5pRT commented Aug 19, 2008

From @Abigail

You can already do this easily currently​:

  use feature 'say';   # say needs Perl 5.10 or later
  "foo" =~ /(\w)(\w)/;
  say "$1 $2";
  my $str = 'hello';
  my ($x, $y) = do {$str =~ /(\w)(\w)/};
  say "$x $y $1 $2";
  __END__
  f o
  h e f o

Abigail

@p5pRT p5pRT commented Aug 19, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Oct 23, 2008

From norbi@nix.hu

Yeah, that's exactly the method I chose for fixing NEXT
(http​://rt.cpan.org/Ticket/Display.html?id=36956). Unfortunately nobody
had the time to review and apply my patch. :-(

@p5pRT p5pRT commented Oct 11, 2012

From @cpansprout

We should optimise do {$str =~ /(...)/} not to bother with the pre-match
copy.

Likewise, do {$str =~ s/(...)//} should consider in-place substitution,
instead of skipping it because of the ().

--

Father Chrysostomos

@p5pRT p5pRT commented Jun 9, 2017

From @epa

I think this bug can be closed because of the suggested alternative of

  do { $str =~ /regexp/ };

@p5pRT p5pRT closed this Jun 10, 2017
@p5pRT p5pRT commented Jun 10, 2017

@iabyn - Status changed from 'open' to 'rejected'

@p5pRT p5pRT commented Jun 11, 2017

From @xsawyerx

On Fri, 09 Jun 2017 08:31:11 -0700, ed wrote:

I think this bug can be closed because of the suggested alternative of

  do { $str =~ /regexp/ };

There is also /n:

  n Prevent the grouping metacharacters "()" from capturing. This
  modifier, new in 5.22, will stop $1, $2, etc... from being filled in.

  "hello" =~ /(hi|hello)/; # $1 is "hello"
  "hello" =~ /(hi|hello)/n; # $1 is undef

  This is equivalent to putting "?:" at the beginning of every capturing
  group:

  "hello" =~ /(?:hi|hello)/; # $1 is undef

  "/n" can be negated on a per-group basis. Alternatively, named
  captures may still be used.

  "hello" =~ /(?-n:(hi|hello))/n; # $1 is "hello"
  "hello" =~ /(?<greet>hi|hello)/n; # $1 is "hello", $+{greet} is "hello"
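
One difference from the original request is worth a small sketch: /n stops the groups from capturing, but a successful /n match still replaces the previous match, so an earlier $1 reads as undef afterwards rather than being left alone. The do block, by contrast, confines the new captures to its own scope. The strings and variable names below are illustrative only, and the script assumes Perl 5.22 or later for /n.

```perl
use 5.022;       # /n was added in 5.22; this also enables strict
use warnings;

'foo' =~ /(\w)/;                                # $1 is "f"

my $ok       = do { 'hello' =~ /(hi|hello)/ };  # captures stay inside the do block
my $after_do = $1;                              # the outer $1 is still visible

'hello' =~ /(hi|hello)/n;                       # successful match with no capture groups
my $after_n  = $1;                              # $1 now belongs to this match

printf "after do: %s, after /n: %s\n",
    $after_do, defined $after_n ? $after_n : 'undef';
```

So /n fits when the captures are simply not wanted, while the do block fits when an earlier $1 must survive.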

@p5pRT p5pRT commented Jun 11, 2017

From @epa

There is some discussion in bug 127617 about why the /n modifier is not always usable.
It stops you using backreferences in your regular expression, even if you only wanted to use the backreference internally and not have it set $1 etc externally. It also stops you being able to use recursive subpatterns (the subject of that other bug).

However, the technique

  do { /(a)\1/ }

  do { /(a)(?1)/ }

stops $1 being set externally, but doesn't break the backreference \1 or the recursive subpattern (?1).
So in general it is a better option than /n.
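
A short sketch of that comparison, under the assumption of Perl 5.22 or later (needed for /n). The sample strings and variable names are invented for illustration. The /n pattern is compiled at run time from a string so that a compile failure can be trapped with eval; the script prints the outcome rather than assuming it.

```perl
use 5.022;       # /n was added in 5.22; this also enables strict
use warnings;

'outer' =~ /(\w+)/;                             # $1 is "outer"

my $backref   = do { 'aa' =~ /(a)\1/ };         # \1 refers back to group 1
my $recursive = do { 'aa' =~ /(a)(?1)/ };       # (?1) re-runs group 1's pattern
my $kept      = $1;                             # captures did not leak out

# With /n the parentheses no longer form group 1, so the thread reports
# that \1 stops working. Compile from a string so eval can trap the error.
my $pattern    = '(a)\1';
my $n_compiles = eval { my $re = qr/$pattern/n; 1 } ? 'yes' : 'no';

print "backref=$backref recursive=$recursive outer \$1=$kept\n";
print "pattern compiles under /n: $n_compiles\n";
```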

@p5pRT p5pRT commented Jun 11, 2017

From @khwilliamson

Would you care to submit a patch documenting this technique?
And doesn't using this also solve your request in 127617?
--
Karl Williamson

@p5pRT p5pRT commented Jun 11, 2017

From @cpansprout

I might suggest ‘more versatile, but slower’ rather than ‘better’. Whether it is better depends on your use case.

--

Father Chrysostomos
