On RHS of s///, ${9} works but ${10} does not #12948

p5pRT · 2013-05-07T14:14:03Z

Migrated from rt.perl.org#117907 (status was 'open')

Searchable as RT117907$

p5pRT · 2013-05-07T14:14:03Z

From @epa

Created by @epa

I found this surprising:

#!/usr/bin/perl
use 5.016;
use warnings;
use strict;
$_ = 'a' x 10;
s/(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)/${9}/ or die;
say $_;
$_ = 'a' x 10;
s/(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)/${10}/ or die;
say $_;

Why is ${9} accepted on the RHS of a substitution but ${10} not?
They should both be or neither be.

The same applies in ordinary code:

say ${9}; # ok
say ${10}; # fails

Perl Info


Flags:
    category=core
    severity=low

Site configuration information for perl 5.16.3:

Configured by Red Hat, Inc. at Thu Apr 11 09:48:29 UTC 2013.

Summary of my perl5 (revision 5 version 16 subversion 3) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-358.2.1.el6.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-08.phx2.fedoraproject.org 2.6.32-358.2.1.el6.x86_64 #1 smp wed feb 20 12:17:37 est 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4  -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4  -m64 -mtune=generic -Wl,-z,relro  -DDEBUGGING=-g -Dversion=5.16.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.2 20121109 (Red Hat 4.7.2-8)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.16'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches:
    


@INC for perl 5.16.3:
    /home/eda/lib/perl5/
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .


Environment for perl 5.16.3:
    HOME=/home/tradingsystems
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_GB.UTF-8
    LC_MESSAGES=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
    PERL5LIB=/home/eda/lib/perl5/
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT · 2013-05-07T14:36:10Z

From @iabyn

On Tue, May 07, 2013 at 07:14:03AM -0700, Ed Avis wrote:

I found this surprising:

#!/usr/bin/perl
use 5.016;
use warnings;
use strict;
$_ = 'a' x 10;
s/(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)/${9}/ or die;
say $_;
$_ = 'a' x 10;
s/(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)/${10}/ or die;
say $_;

Why is ${9} accepted on the RHS of a substitution but ${10} not?
They should both be or neither be.

The same applies in ordinary code:

say ${9};    # ok
say ${10};   # fails

It's just the braced variant of the variable name (${10} versus $10), and
it has been that way a long time:

$ perl589 -e'use strict; $9; ${9}; $10; ${10}'
Can't use string ("10") as a SCALAR ref while "strict refs" in use at -e line 1.

I don't know if there is any rationale behind this, but at first glance it
indeed seems like a bug.

--
O Unicef Clearasil!
Gibberish and Drivel!
-- "Bored of the Rings"

p5pRT · 2013-05-07T14:36:11Z

The RT System itself - Status changed from 'new' to 'open'

p5pRT · 2013-05-08T05:48:23Z

From @ikegami

On Tue, May 7, 2013 at 10:35 AM, Dave Mitchell <davem@iabyn.com> wrote:

$ perl589 -e'use strict; $9; ${9}; $10; ${10}'
Can't use string ("10") as a SCALAR ref while "strict refs" in use at -e
line 1.

I don't know if there is any rationale behind this, but at first glance it
indeed seems like a bug.

No other type of variable seems to generate a strict error.

perl -E"use strict; my $xyz; ${xyz} = 1; say 'ok'"
ok

perl -E"use strict; my $xyz; ${^I} = 1; say 'ok'"
ok

p5pRT · 2013-05-08T08:33:10Z

From @nwc10

On Wed, May 08, 2013 at 01:47:48AM -0400, Eric Brine wrote:

On Tue, May 7, 2013 at 10:35 AM, Dave Mitchell <davem@iabyn.com> wrote:

$ perl589 -e'use strict; $9; ${9}; $10; ${10}'
Can't use string ("10") as a SCALAR ref while "strict refs" in use at -e
line 1.

I don't know if there is any rationale behind this, but at first glance it
indeed seems like a bug.

No other type of variable seems to generate a strict error.

perl -E"use strict; my $xyz; ${xyz} = 1; say 'ok'"
ok

perl -E"use strict; my $xyz; ${^I} = 1; say 'ok'"
ok

Knowing how the implementation has to handle the multi-character numeric
variables specially (well, differently specially) this doesn't actually
surprise me. (Regular identifiers start with a letter or underscore.
Punctuation variables are one character. Control character variables start
with a control character. The sequence "10" is none of these.)
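
The categories listed above can be pictured as a small classifier over the text that follows the `$`. The sketch below is only a model of that description, not the logic in toke.c: the function name `classify_name` is hypothetical, and treating a lone digit as a one-character name is an assumption made to mirror the observation that ${9} is accepted while ${10} is not.

```perl
use strict;
use warnings;

# Hypothetical model of the name categories listed above.
# It is an illustration only and does not reproduce the logic in toke.c.
sub classify_name {
    my ($name) = @_;
    return 'regular identifier'         if $name =~ /\A[A-Za-z_]\w*\z/;
    return 'control-character variable' if $name =~ /\A\^[A-Z]\w*\z/;
    # Assumption: any other single character, including a lone digit,
    # is handled by the one-character rule.
    return 'one-character variable'     if length($name) == 1;
    return 'none of these';
}

printf "%-4s => %s\n", $_, classify_name($_) for qw(xyz ^I 9 10);
```

In this model "10" falls through every category, which matches the point that the multi-digit case needs its own handling.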

Still, I agree with Dave that it feels like a bug. It suggests that the
differently-special code needs to be added in yet another place.

Nicholas Clark

p5pRT · 2013-05-08T13:11:31Z

From @epa

Another interesting wrinkle to this bug is that $010 and ${010} do different
things. As with hash lookups, there are two things you can put inside the
curlies - a literal string or an expression - and Perl has to magically
decide which one you meant. However, since there is no builtin variable $010
I think this may not matter much. I would suggest that variable names
which begin with $0 but are not $0 should be prohibited, so that

$010 = 'a';

would be a syntax error.

--
Ed Avis <eda@waniasset.com>

p5pRT · 2013-05-08T13:20:01Z

From @nwc10

On Wed, May 08, 2013 at 01:10:37PM +0000, Ed Avis wrote:

Another interesting wrinkle to this bug is that $010 and ${010} do different
things. As with hash lookups, there are two things you can put inside the
curlies - a literal string or an expression - and Perl has to magically
decide which one you meant. However, since there is no builtin variable $010
I think this may not matter much. I would suggest that variable names
which begin with $0 but are not $0 should be prohibited, so that
$010 = 'a';
would be a syntax error.

Yes, particularly as they don't seem to offer anything other than
possibilities for obfuscation:

$ perl -le 'use strict; $00 = 2; $000 = 3; print foreach $0, $00, $000'
-e
2
3
$ perl -le 'use strict; "P" =~ /(.)/; $01 = 2; $001 = 3; print foreach $1, $01, $001'
P
2
3

And, they aren't even octal :-)

Nicholas Clark

p5pRT · 2013-07-06T21:38:28Z

From @cpansprout

On Wed May 08 06:20:01 2013, nicholas wrote:

On Wed, May 08, 2013 at 01:10:37PM +0000, Ed Avis wrote:
Another interesting wrinkle to this bug is that $010 and ${010} do different
things. As with hash lookups, there are two things you can put inside the
curlies - a literal string or an expression - and Perl has to magically
decide which one you meant. However, since there is no builtin variable $010
I think this may not matter much. I would suggest that variable names
which begin with $0 but are not $0 should be prohibited, so that

$010 = 'a';

would be a syntax error.

Yes, particularly as they don't seem to offer anything other than
possibilities for obfuscation:

It has long been documented that a variable name can start with a digit, in
which case all the characters must be digits. I would suggest we
deprecate use of octal in ${001} (which should affect nothing) and make
${123} auto-quote, just like any other simple variable name (as opposed
to an expression). Then later we can make ${001} follow the same rules.

--

Father Chrysostomos

p5pRT · 2019-09-20T15:50:56Z

From @epa

FTR, the behaviour is still the same with 5.28.2.

khwilliamson · 2019-12-07T15:51:03Z

I'm willing to work on this issue, but I don't understand enough about how things are tokenized, etc to efficiently get started. Can someone give me some tips?

Karl asked me for some help investigating #12948; this is what I came up with. I have not even run "make test" so likely this is broken in important ways, but it should be enough of a thread for Karl to pull on that hopefully the whole shirt comes apart. So to speak. :-) What it does do is make ${10} parse the same way as $10.

${10} and $10 were handled differently, this patch makes them be handled the same. It also forbids multi-digit numeric variables from starting with 0. Thus $00 is now a new fatal exception "Numeric variables with more than one digit may not start with '0'"
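
As a rough illustration of the rule described in this commit message, the check on an all-digit name might be modelled as follows. This is a sketch under assumptions, not the code in toke.c: the helper name `check_numeric_name` and its return convention are hypothetical, and only the error text is taken from the message above.

```perl
use strict;
use warnings;

# Hypothetical checker for an all-digit variable name such as "10" or "00".
# Returns an empty string when the name is acceptable, otherwise a message.
sub check_numeric_name {
    my ($name) = @_;
    die "not an all-digit name: $name\n" unless $name =~ /\A[0-9]+\z/;
    return "Numeric variables with more than one digit may not start with '0'"
        if length($name) > 1 && substr($name, 0, 1) eq '0';
    return '';
}

for my $name (qw(0 9 10 00 010)) {
    my $problem = check_numeric_name($name);
    print "\$$name: ", ($problem eq '' ? 'accepted' : $problem), "\n";
}
```

Under this model $0, $9 and $10 are accepted, while $00 and $010 are rejected with the quoted message.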

khwilliamson · 2020-03-09T03:48:48Z

This was fixed by the commit above

bram-perl · 2022-07-23T13:02:57Z

Looking at this bug report (because it recently came up on p5p) it wasn't entirely clear to me what the issues were.

What one should be aware of: when using ${...} then the '...' is/was parsed as number if it started with '0'.
Which means:

${0123}: treated as octal
${0x123}: treated as hex
${0b1001}: treated as binary
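
To make the consequence concrete: when the text inside the braces is read as a numeric literal, several different spellings collapse to the same value, and that value stringifies to the same name. A minimal sketch in plain Perl, independent of how `${...}` itself is parsed (the variable names are illustrative):

```perl
use strict;
use warnings;

# Octal, hex and binary literals that all denote the number ten.
my @values = (012, 0xa, 0b1010);

# Stringifying each value gives "10", the name of the tenth capture variable.
my @names = map { "$_" } @values;
print join(' ', @names), "\n";    # prints "10 10 10"
```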

An example (using an older perl):

    #!/usr/bin/perl -l

    use strict;
    no strict "refs";

    "abcdefghijklmnopqrstuvwxyz" =~ m/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/ or die;
    print $10;
    print ${10};
    print ${012};
    print ${0xa};
    print ${0b1010};

    __END__

    Output:
            j
            j
            j
            j
            j

Running the code with use strict "refs" fails with the error:

    Can't use string ("10") as a SCALAR ref while "strict refs" in use

It appears (I did not verify in code) to turn ${012} into ${"10"} which is not allowed when strict refs are used.
This is similar to:

    #!/usr/bin/perl

    use strict;
    use vars qw# $foo #;

    $foo = "bar";
    print ${"foo"};

    __END__
    Output:
            Can't use string ("foo") as a SCALAR ref while "strict refs" in use

What commit 60267e1 does:

it changes how an 'all digits string' inside ${} is parsed
it forbids numbers with a leading '0'

Forbidding octal (numbers with a leading '0') in ${} is more or less required, because the change could otherwise silently alter the behaviour of existing code.

Before that commit: ${012} appears to mean ${"10"} (failure under use strict "refs")
After that commit¹: ${012} appears to mean $::012 (no failure under use strict "refs")

Note: even today you can still use hex/binary inside ${} when strict refs is not in effect, which can cause some confusing behaviour.
Using blead (b5df4e0):

    #!/usr/bin/perl -l

    use strict;
    no strict "refs";

    "abcdefghijklmnopqrstuvwxyz" =~ m/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/ or die;
    print $10;
    print ${10};
    print ${0xa};
    print ${0b1010};
    __END__
    Output:
            j
            j
            j
            j

¹ and patching the code to not throw the error

In 60267e1 I patched toke.c to refuse $00 but did not properly handle ${00} and related cases when the code was unicode. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it is moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". A byproduct of this change is that the logic to detect duplicated leading zeros is now quite a bit simpler. This includes more tests for leading zero checks. See Issue #12948, Issue #19986, and Issue #19989.

demerphq · 2022-07-27T12:02:17Z

See #20000


demerphq · 2022-07-28T10:15:29Z

@bram-perl I added a bunch of tests to validate things. One thing you missed in your analysis was that my patch (somewhat accidentally) changes S_scan_ident so it parses an entire var more often than it used to, which is what is responsible for the change from a run time var for ${10} to a compile time var for ${10}. My latest version of the patch in #20000 now does it consistently for both unicode and not. If we wished we could apply the same logic for hex and binary identifiers. IMO as they are prefixed with a 0b or 0x there is no ambiguity.


bram-perl · 2022-07-28T12:24:56Z

I did not miss it, but maybe I didn't spell it out clearly enough.

As for the hex/binary identifiers: that's a separate discussion so might be best to leave it out of the discussion of this issue.

demerphq · 2022-07-28T13:29:27Z

On Thu, 28 Jul 2022 at 14:25, Bram ***@***.***> wrote:

I did not miss it but maybe I didn't spell it out clear enough;

As for the hex/binary identifiers: that's a separate discussion so might be best to leave it out of the discussion of this issue.

Heh. Too late. Running make test right now. ;-p

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

…strict. Executive summary: in ${ .. } style notation consistently forbid octal and allow multi-digit longer decimal values under strict. The vars ${1} through ${9} have always been allowed under strict, but ${10} threw an error unlike its equivalent variable $10. In 60267e1 I patched toke.c to refuse octal like $001 but did not properly handle ${001} and related cases when the code was under 'use utf8'. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it is moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". This includes more tests for allowed vars and forbidden var names. See Issue #12948, Issue #19986, and Issue #19989.


p5pRT added Severity Low distro-Linux type-core labels Oct 19, 2019

toddr removed the khw label Oct 25, 2019

khwilliamson closed this as completed Mar 9, 2020

This was referenced Jul 23, 2022

${00} should not be allowed #19986

Closed

${10}, ${11}, ... does not work when 'use utf8' and 'use strict' is in use #19989

Closed

demerphq mentioned this issue Jul 27, 2022

toke.c - improve handling of $00 and ${00} #20000

Merged
