Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

${00} should not be allowed #19986

Closed
bram-perl opened this issue Jul 23, 2022 · 3 comments
Closed

${00} should not be allowed #19986

bram-perl opened this issue Jul 23, 2022 · 3 comments

Comments

@bram-perl
Copy link

Description

In commit 60267e1 (by @demerphq ) for #12948 code was added to disallow $00, $000, ${000}, ...
The perldiag entry added for it:

    Numeric variables with more than one digit may not start with '0'
            (F) The only numeric variable which is allowed to start with a 0 is $0,
            and you mentioned a variable that starts with 0 that has more than one digit.
            You probably want to remove the leading 0, or if the intent was to express a
            variable name in octal you should convert to decimal.

Using blead (b5df4e0):

    $ ./perl -e '$0;'
    $ ./perl -e '${0};'
    $ ./perl -e '$00;'
    Numeric variables with more than one digit may not start with '0' at -e line 1.
    $ ./perl -e '${00};'
    $ ./perl -e '${000};'
    Numeric variables with more than one digit may not start with '0' at -e line 1.
    $ ./perl -e '$000;'
    Numeric variables with more than one digit may not start with '0' at -e line 1.

Behaviour of $00 and ${00} is inconsistent.
I believe the intent was to also disallow ${00}.

Looking at the code there appears to be an off-by-one error;

In toke.c S_scan_ident there are two similar loops to handle this case:

    10113    if (isDIGIT(*s)) { /* handle $0 and $1 $2 and $10 and etc */
    10114        bool is_zero= *s == '0' ? TRUE : FALSE;
    10115        char *digit_start= d;
    10116        *d++ = *s++;
    10117        while (s < PL_bufend && isDIGIT(*s)) {
    10118            if (d >= e)
    10119                Perl_croak(aTHX_ "%s", ident_too_long);
    10120            *d++ = *s++;
    10121        }
    10122        if (is_zero && d - digit_start > 1)
    10123            Perl_croak(aTHX_ ident_var_zero_multi_digit);
    10124    }
    10125    else {  /* See if it is a "normal" identifier */
    10126        parse_ident(&s, &d, e, 1, is_utf8, FALSE, TRUE);
    10127    }
    10128    *d = '\0';

and

    10173        else {
    10174            *d = *s++;
    10175            /* special case to handle ${10}, ${11} the same way we handle ${1} etc */
    10176            if (isDIGIT(*d)) {
    10177                bool is_zero= *d == '0' ? TRUE : FALSE;
    10178                char *digit_start= d;
    10179                while (s < PL_bufend && isDIGIT(*s)) {
    10180                    d++;
    10181                    if (d >= e)
    10182                        Perl_croak(aTHX_ "%s", ident_too_long);
    10183                    *d= *s++;
    10184                }
    10185                if (is_zero && d - digit_start > 1)
    10186                    Perl_croak(aTHX_ ident_var_zero_multi_digit);
    10187            }
    10188            d[1] = '\0';
    10189        }

While the check in both cases is the same (is_zero && d - digit_start > 1) what is different is the handling of d.
In the first loop it always does d++
In the second loop it only does d++ when *s is a digit

Another way to notice this: to null terminate the string the first loop uses: *d = '\0'; while the second loop uses d[1] = '\0';

TLDR: the second check (L10185) should use >= 1 instead of > 1.

Steps to reproduce

$ ./perl -e '${00}'

Expected behavior

Expected error message:

    Numeric variables with more than one digit may not start with '0'
@demerphq
Copy link
Collaborator

Note that the second case seems to be designed to handle single character vars ONLY. It only moves one character from *s into *d. I guess i noticed that it actually handles longer variables as well and patched the ascii variant to detect this particular error when I should have noticed that it is firing on variable names longer than one character.

demerphq added a commit that referenced this issue Jul 27, 2022
In 60267e1 I patched toke.c to refuse
$00 but did not properly handle ${00} and related cases when the code
was unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

A byproduct of this change is that the logic to detect duplicated
leading zeros is now quite a bit simpler.

This includes more tests for leading zero checks.

See Issue #12948, Issue #19986, and Issue #19989.
@demerphq
Copy link
Collaborator

See #20000

demerphq added a commit that referenced this issue Jul 27, 2022
In 60267e1 I patched toke.c to refuse
$00 but did not properly handle ${00} and related cases when the code
was unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

A byproduct of this change is that the logic to detect duplicated
leading zeros is now quite a bit simpler.

This includes more tests for leading zero checks.

See Issue #12948, Issue #19986, and Issue #19989.
demerphq added a commit that referenced this issue Jul 28, 2022
In 60267e1 I patched toke.c to refuse
$00 but did not properly handle ${00} and related cases when the code
was unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

This includes more tests for leading zero checks.

See Issue #12948, Issue #19986, and Issue #19989.
demerphq added a commit that referenced this issue Jul 28, 2022
In 60267e1 I patched toke.c to refuse
$00 but did not properly handle ${00} and related cases when the code
was unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

This includes more tests for leading zero checks.

See Issue #12948, Issue #19986, and Issue #19989.
demerphq added a commit that referenced this issue Jul 28, 2022
…strict.

Executive summary: in ${ .. } style notation consistently forbid octal
and allow multi-digit longer decimal values under strict. The vars
${1} through ${9} have always been allowed under strict, but ${10} threw
an error unlike its equivalent variable $10.

In 60267e1 I patched toke.c to refuse
octal like $001 but did not properly handle ${001} and related cases when
the code was under 'use utf8'. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

This includes more tests for allowed vars and forbidden var names.

See Issue #12948, Issue #19986, and Issue #19989.
demerphq added a commit that referenced this issue Jul 30, 2022
…strict.

Executive summary: in ${ .. } style notation consistently forbid octal
and allow multi-digit longer decimal values under strict. The vars
${1} through ${9} have always been allowed under strict, but ${10} threw
an error unlike its equivalent variable $10.

In 60267e1 I patched toke.c to refuse
octal like $001 but did not properly handle ${001} and related cases when
the code was under 'use utf8'. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

This includes more tests for allowed vars and forbidden var names.

See Issue #12948, Issue #19986, and Issue #19989.
@bram-perl
Copy link
Author

Change was merged on blead (c432f9f), closing the issue

scottchiefbaker pushed a commit to scottchiefbaker/perl5 that referenced this issue Nov 3, 2022
…strict.

Executive summary: in ${ .. } style notation consistently forbid octal
and allow multi-digit longer decimal values under strict. The vars
${1} through ${9} have always been allowed under strict, but ${10} threw
an error unlike its equivalent variable $10.

In 60267e1 I patched toke.c to refuse
octal like $001 but did not properly handle ${001} and related cases when
the code was under 'use utf8'. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

This includes more tests for allowed vars and forbidden var names.

See Issue Perl#12948, Issue Perl#19986, and Issue Perl#19989.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants