`${00}` should not be allowed #19986

bram-perl · 2022-07-23T15:06:25Z

Description

In commit 60267e1 (by @demerphq ) for #12948 code was added to disallow $00, $000, ${000}, ...
The perldiag entry added for it:

    Numeric variables with more than one digit may not start with '0'
            (F) The only numeric variable which is allowed to start with a 0 is $0,
            and you mentioned a variable that starts with 0 that has more than one digit.
            You probably want to remove the leading 0, or if the intent was to express a
            variable name in octal you should convert to decimal.

Using blead (b5df4e0):

    $ ./perl -e '$0;'
    $ ./perl -e '${0};'
    $ ./perl -e '$00;'
    Numeric variables with more than one digit may not start with '0' at -e line 1.
    $ ./perl -e '${00};'
    $ ./perl -e '${000};'
    Numeric variables with more than one digit may not start with '0' at -e line 1.
    $ ./perl -e '$000;'
    Numeric variables with more than one digit may not start with '0' at -e line 1.

Behaviour of $00 and ${00} is inconsistent.
I believe the intent was to also disallow ${00}.

Looking at the code there appears to be an off-by-one error;

In toke.c S_scan_ident there are two similar loops to handle this case:

    10113    if (isDIGIT(*s)) { /* handle $0 and $1 $2 and $10 and etc */
    10114        bool is_zero= *s == '0' ? TRUE : FALSE;
    10115        char *digit_start= d;
    10116        *d++ = *s++;
    10117        while (s < PL_bufend && isDIGIT(*s)) {
    10118            if (d >= e)
    10119                Perl_croak(aTHX_ "%s", ident_too_long);
    10120            *d++ = *s++;
    10121        }
    10122        if (is_zero && d - digit_start > 1)
    10123            Perl_croak(aTHX_ ident_var_zero_multi_digit);
    10124    }
    10125    else {  /* See if it is a "normal" identifier */
    10126        parse_ident(&s, &d, e, 1, is_utf8, FALSE, TRUE);
    10127    }
    10128    *d = '\0';

and

    10173        else {
    10174            *d = *s++;
    10175            /* special case to handle ${10}, ${11} the same way we handle ${1} etc */
    10176            if (isDIGIT(*d)) {
    10177                bool is_zero= *d == '0' ? TRUE : FALSE;
    10178                char *digit_start= d;
    10179                while (s < PL_bufend && isDIGIT(*s)) {
    10180                    d++;
    10181                    if (d >= e)
    10182                        Perl_croak(aTHX_ "%s", ident_too_long);
    10183                    *d= *s++;
    10184                }
    10185                if (is_zero && d - digit_start > 1)
    10186                    Perl_croak(aTHX_ ident_var_zero_multi_digit);
    10187            }
    10188            d[1] = '\0';
    10189        }

While the check in both cases is the same (is_zero && d - digit_start > 1) what is different is the handling of d.
In the first loop it always does d++
In the second loop it only does d++ when *s is a digit

Another way to notice this: to null terminate the string the first loop uses: *d = '\0'; while the second loop uses d[1] = '\0';

TLDR: the second check (L10185) should use >= 1 instead of > 1.

Steps to reproduce

$ ./perl -e '${00}'

Expected behavior

Expected error message:

    Numeric variables with more than one digit may not start with '0'

The text was updated successfully, but these errors were encountered:

demerphq · 2022-07-27T10:35:14Z

Note that the second case seems to be designed to handle single character vars ONLY. It only moves one character from *s into *d. I guess i noticed that it actually handles longer variables as well and patched the ascii variant to detect this particular error when I should have noticed that it is firing on variable names longer than one character.

In 60267e1 I patched toke.c to refuse $00 but did not properly handle ${00} and related cases when the code was unicode. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". A byproduct of this change is that the logic to detect duplicated leading zeros is now quite a bit simpler. This includes more tests for leading zero checks. See Issue #12948, Issue #19986, and Issue #19989.

demerphq · 2022-07-27T12:02:23Z

See #20000

In 60267e1 I patched toke.c to refuse $00 but did not properly handle ${00} and related cases when the code was unicode. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". A byproduct of this change is that the logic to detect duplicated leading zeros is now quite a bit simpler. This includes more tests for leading zero checks. See Issue #12948, Issue #19986, and Issue #19989.

In 60267e1 I patched toke.c to refuse $00 but did not properly handle ${00} and related cases when the code was unicode. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". This includes more tests for leading zero checks. See Issue #12948, Issue #19986, and Issue #19989.

…strict. Executive summary: in ${ .. } style notation consistently forbid octal and allow multi-digit longer decimal values under strict. The vars ${1} through ${9} have always been allowed under strict, but ${10} threw an error unlike its equivalent variable $10. In 60267e1 I patched toke.c to refuse octal like $001 but did not properly handle ${001} and related cases when the code was under 'use utf8'. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". This includes more tests for allowed vars and forbidden var names. See Issue #12948, Issue #19986, and Issue #19989.

bram-perl · 2022-08-01T14:01:20Z

Change was merged on blead (c432f9f), closing the issue

…strict. Executive summary: in ${ .. } style notation consistently forbid octal and allow multi-digit longer decimal values under strict. The vars ${1} through ${9} have always been allowed under strict, but ${10} threw an error unlike its equivalent variable $10. In 60267e1 I patched toke.c to refuse octal like $001 but did not properly handle ${001} and related cases when the code was under 'use utf8'. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". This includes more tests for allowed vars and forbidden var names. See Issue Perl#12948, Issue Perl#19986, and Issue Perl#19989.

bram-perl added the Needs Triage label Jul 23, 2022

demerphq mentioned this issue Jul 27, 2022

toke.c - improve handling of $00 and ${00} #20000

Merged

jkeenan removed the Needs Triage label Jul 27, 2022

bram-perl closed this as completed Aug 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`${00}` should not be allowed #19986

`${00}` should not be allowed #19986

bram-perl commented Jul 23, 2022

demerphq commented Jul 27, 2022

demerphq commented Jul 27, 2022

bram-perl commented Aug 1, 2022

${00} should not be allowed #19986

${00} should not be allowed #19986

Comments

bram-perl commented Jul 23, 2022

demerphq commented Jul 27, 2022

demerphq commented Jul 27, 2022

bram-perl commented Aug 1, 2022

`${00}` should not be allowed #19986

`${00}` should not be allowed #19986