toke.c - improve handling of $00 and ${00} #20000

demerphq · 2022-07-27T11:59:31Z

In 60267e1 I patched toke.c to refuse
$00 but did not properly handle ${00} and related cases when the code
was unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

A byproduct of this change is that the logic to detect duplicated
leading zeros is now quite a bit simpler.

This includes more tests for leading zero checks.

See Issue #12948, Issue #19986, and Issue #19989.

toke.c

demerphq · 2022-07-27T12:14:50Z

On Wed, 27 Jul 2022 at 14:10, Bram ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In toke.c <#20000 (comment)> : > d[1] = '\0'; } } + + /* make sure that we dont have a var starting with two zero digits + * which is not allowed as it introduces confusion with octal and + * decimal. */ + if (*d == '0' && *s == '0') This check doesn't look entirely right.. What it will do is disallow ${00} (and ${000}, ${0000}, ...) but the earlier check (lines 10117-10127) disallows things like: $00, $010, $09324`, ... i.e. the second digit doesn't have to be zero

Doh. Good catch, thanks bram. Fixup coming. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

toke.c

bram-perl

The code looks good to me;
Just one additional suggestion: maybe amend the commit description and mention something about the change in behavior for ${10}, ${11}, ... (i.e. that they now behave as expected under use strict`)

…strict. Executive summary: in ${ .. } style notation consistently forbid octal and allow multi-digit longer decimal values under strict. The vars ${1} through ${9} have always been allowed under strict, but ${10} threw an error unlike its equivalent variable $10. In 60267e1 I patched toke.c to refuse octal like $001 but did not properly handle ${001} and related cases when the code was under 'use utf8'. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". This includes more tests for allowed vars and forbidden var names. See Issue #12948, Issue #19986, and Issue #19989.

demerphq · 2022-07-28T14:24:01Z

I wanted to see if my understand of scan_ident was correct, so I made it handle binary and hex eg, ${0x10} and ${0b10000}. Both of these end up returning the same GV that $16 and ${16} do. If that proves controversial then we can drop that patch.

bram-perl · 2022-07-28T18:39:28Z

I wanted to see if my understand of scan_ident was correct, so I made it handle binary and hex eg, ${0x10} and ${0b10000}. Both of these end up returning the same GV that $16 and ${16} do. If that proves controversial then we can drop that patch.

I think it would be best to drop it;
Even tho just some small comments:

there is a 0o prefix for octal (since perl v5.34.0) (0o10 = 8)
in the commit message of the patch you correctly note that $0x10 is not allowed; one side remark on that: $::0x10 (for some reason) is allowed.

demerphq · 2022-07-29T06:56:43Z

On Thu, 28 Jul 2022 at 20:39, Bram ***@***.***> wrote: I wanted to see if my understand of scan_ident was correct, so I made it handle binary and hex eg, ${0x10} and ${0b10000}. Both of these end up returning the same GV that $16 and ${16} do. If that proves controversial then we can drop that patch. I think it would be best to drop it; Even tho just some small comments: - there is a 0o prefix for octal (since perl v5.34.0) (0o10 = 8) Ah, ok, ill add that and push this part in a separate PR for consideration

separately.

- in the commit message of the patch you correctly note that $0x10 is not allowed; one side remark on that: $::0x10 (for some reason) is allowed. It is not so much that $0x10 is not allowed as it is that it will be (and

should be?) parsed as $0 x 10 as $0 is the only var allowed to start with 0. That $::0x10 is not parsed similarly is because (I guess) that in $::0x10 the 0 is not the first character of the identifier, which is actually $main::0x10. Notice it does not resolve to a digit var, it resolves to a var named "0x10", unlike ${0x10} which resolves to the var $16. As far as I know the name after the colons does not have to follow IDFIRST conventions. perl -MO=Concise -e'print $::0x10' 6 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter v ->2 2 <;> nextstate(main 1 -e:1) v:{ ->3 5 <@> print vK ->6 3 <0> pushmark s ->4 - <1> ex-rv2sv sK/1 ->5 4 <$> gvsv(*0x10) s ->5 $ perl -MO=Concise -MDevel::Peek -e'Dump(${0x10})' 6 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter v ->2 2 <;> nextstate(main 44 -e:1) v:{ ->3 5 <2> Devel_Peek_Dump vK/1 ->6 4 <1> rv2sv sK/1 ->5 - <@> scope sK ->4 - <;> ex-nextstate(main 45 -e:1) v ->3 3 <$> const(IV 16) s ->4 -e syntax OK Anyway, I'll tweak the PR. Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq · 2022-07-29T12:08:44Z

K. Removed the controversial patch.

This was referenced Jul 27, 2022

On RHS of s///, ${9} works but ${10} does not #12948

Closed

${00} should not be allowed #19986

Closed

${10}, ${11}, ... does not work when 'use utf8' and 'use strict' is in use #19989

Closed

bram-perl reviewed Jul 27, 2022

View reviewed changes

toke.c Outdated Show resolved Hide resolved

demerphq force-pushed the yves/fix_19989 branch from 5d18a0d to 4ee5b7d Compare July 27, 2022 18:00

bram-perl reviewed Jul 27, 2022

View reviewed changes

toke.c Outdated Show resolved Hide resolved

demerphq force-pushed the yves/fix_19989 branch 2 times, most recently from 1c55371 to f4dad88 Compare July 28, 2022 10:25

bram-perl reviewed Jul 28, 2022

View reviewed changes

demerphq force-pushed the yves/fix_19989 branch from f4dad88 to 127d128 Compare July 28, 2022 14:15

demerphq force-pushed the yves/fix_19989 branch from 127d128 to a4bd3fd Compare July 29, 2022 12:08

demerphq mentioned this pull request Jul 29, 2022

Allow ${0x10}, ${0b10000} and ${0o20} to be used as aliases for $16 under use strict. #20014

Closed

demerphq merged commit c432f9f into blead Jul 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

toke.c - improve handling of $00 and ${00} #20000

toke.c - improve handling of $00 and ${00} #20000

demerphq commented Jul 27, 2022 •

edited

Loading

demerphq commented Jul 27, 2022 via email

bram-perl left a comment

demerphq commented Jul 28, 2022 •

edited

Loading

bram-perl commented Jul 28, 2022

demerphq commented Jul 29, 2022 via email

demerphq commented Jul 29, 2022

toke.c - improve handling of $00 and ${00} #20000

toke.c - improve handling of $00 and ${00} #20000

Conversation

demerphq commented Jul 27, 2022 • edited Loading

demerphq commented Jul 27, 2022 via email

bram-perl left a comment

Choose a reason for hiding this comment

demerphq commented Jul 28, 2022 • edited Loading

bram-perl commented Jul 28, 2022

demerphq commented Jul 29, 2022 via email

demerphq commented Jul 29, 2022

demerphq commented Jul 27, 2022 •

edited

Loading

demerphq commented Jul 28, 2022 •

edited

Loading