toke.c - improve handling of $00 and ${00} #20000

Merged
merged 1 commit into from
Jul 30, 2022

Conversation

demerphq
Collaborator

@demerphq demerphq commented Jul 27, 2022

In 60267e1 I patched toke.c to refuse
$00 but did not properly handle ${00} and related cases when the code
was Unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT(), which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement are confusing, I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it is moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. E.g., ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".
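
To make the description/name distinction concrete, here is a minimal sketch in C. It is not the code in toke.c: `caret_desc_to_name` is a hypothetical helper, it assumes ASCII, and it does not validate which characters may follow the caret. It only shows how "^Var" maps to "\026ar" ('V' XOR 64 is 0x16, octal 026) while a lone "^" stays "^".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of toke.c. Converts a variable
 * "description" such as "^Var" into the stored "name" "\026ar".
 * Assumes ASCII. Returns the length of the name written to out,
 * or 0 if desc is empty or out is too small. */
size_t caret_desc_to_name(const char *desc, char *out, size_t outsz)
{
    assert(desc != NULL && out != NULL);
    size_t len = strlen(desc);
    if (len == 0 || len + 1 > outsz)
        return 0;
    if (desc[0] == '^' && len > 1) {
        /* Drop the caret and fold the next character to its control
         * character, e.g. 'V' (0x56) becomes 0x16. */
        out[0] = (char)(desc[1] ^ 64);
        /* Copy the remaining len - 2 characters plus the NUL. */
        memcpy(out + 1, desc + 2, len - 1);
        return len - 1;
    }
    /* A lone "^" (as in $^) and ordinary names are kept unchanged. */
    memcpy(out, desc, len + 1);
    return len;
}
```

Called with "^Var" and a 16-byte buffer, the helper writes the three bytes 0x16, 'a', 'r'; called with "^" it writes "^" back unchanged.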

A byproduct of this change is that the logic to detect duplicated
leading zeros is now quite a bit simpler.

This includes more tests for leading zero checks.

See Issue #12948, Issue #19986, and Issue #19989.

@demerphq
Collaborator Author

demerphq commented Jul 27, 2022 via email

@demerphq demerphq force-pushed the yves/fix_19989 branch 2 times, most recently from 1c55371 to f4dad88 July 28, 2022 10:25

@bram-perl bram-perl left a comment

The code looks good to me;
Just one additional suggestion: maybe amend the commit description and mention something about the change in behavior for ${10}, ${11}, ... (i.e. that they now behave as expected under `use strict`)

…strict.

Executive summary: in ${ .. } style notation, consistently forbid octal
and allow multi-digit decimal values under strict. The vars
${1} through ${9} have always been allowed under strict, but ${10} threw
an error, unlike its equivalent variable $10.
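
As a rough illustration of the rule in the executive summary, the following C sketch classifies the all-digit text found inside ${ .. }. It is not the logic in toke.c: `numeric_ident_ok` is a hypothetical name, and the sketch only covers purely numeric names, accepting "0" and decimals without a leading zero and rejecting octal-looking forms such as "00" and "001".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of toke.c. Returns true when s is an
 * all-digit variable name that is either "0" or a decimal number
 * without a leading zero. */
bool numeric_ident_ok(const char *s)
{
    assert(s != NULL);
    if (!isdigit((unsigned char)s[0]))
        return false;                 /* empty, or not numeric at all */
    if (s[0] == '0' && s[1] != '\0')
        return false;                 /* leading zero: looks like octal */
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!isdigit((unsigned char)s[i]))
            return false;             /* mixed names are out of scope here */
    }
    return true;
}
```

Under this sketch "9" and "10" are accepted in the same way, which mirrors the intent that ${10} should behave like ${9} and $10.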

In 60267e1 I patched toke.c to refuse
octal like $001 but did not properly handle ${001} and related cases when
the code was under 'use utf8'. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT(), which despite its name does not restrict what it
matches to things which are one character long.

Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement are confusing, I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it is moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. E.g., ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".

This includes more tests for allowed vars and forbidden var names.

See Issue #12948, Issue #19986, and Issue #19989.
@demerphq
Collaborator Author

demerphq commented Jul 28, 2022

I wanted to see if my understanding of scan_ident was correct, so I made it handle binary and hex, e.g. ${0x10} and ${0b10000}. Both of these end up returning the same GV that $16 and ${16} do. If that proves controversial then we can drop that patch.

@bram-perl

> I wanted to see if my understanding of scan_ident was correct, so I made it handle binary and hex, e.g. ${0x10} and ${0b10000}. Both of these end up returning the same GV that $16 and ${16} do. If that proves controversial then we can drop that patch.

I think it would be best to drop it.
Even so, a few small comments:

  • there is a 0o prefix for octal (since perl v5.34.0) (0o10 = 8)
  • in the commit message of the patch you correctly note that $0x10 is not allowed; one side remark on that: $::0x10 (for some reason) is allowed.
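
For reference, the arithmetic behind the prefixes discussed here (0x10 and 0b10000 are both 16, 0o10 is 8) can be sketched in C as below. This is only an illustration of the number forms, not the experimental scan_ident patch, which was dropped from this PR. `parse_prefixed` is a hypothetical helper; it handles lowercase prefixes only, ignores overflow, and deliberately rejects "010"-style leading-zero octal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of toke.c. Parses "16", "0x10",
 * "0b10000" or "0o10" and returns the value, or -1 if the text is
 * malformed. Overflow is not checked. */
long parse_prefixed(const char *s)
{
    assert(s != NULL);
    int base = 10;
    if (s[0] == '0' && s[1] != '\0') {
        switch (s[1]) {
        case 'x': base = 16; s += 2; break;
        case 'b': base = 2;  s += 2; break;
        case 'o': base = 8;  s += 2; break;
        default:  return -1;   /* leading-zero forms such as "010" */
        }
    }
    if (*s == '\0')
        return -1;             /* empty input or a bare prefix */
    long v = 0;
    for (; *s != '\0'; s++) {
        int d;
        if (*s >= '0' && *s <= '9')      d = *s - '0';
        else if (*s >= 'a' && *s <= 'f') d = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F') d = *s - 'A' + 10;
        else return -1;
        if (d >= base)
            return -1;         /* digit not valid in this base */
        v = v * base + d;
    }
    return v;
}
```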

@demerphq
Collaborator Author

demerphq commented Jul 29, 2022 via email

@demerphq
Collaborator Author

K. Removed the controversial patch.
