-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
toke.c - improve handling of $00 and ${00} #20000
Conversation
On Wed, 27 Jul 2022 at 14:10, Bram ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In toke.c <#20000 (comment)>
:
> d[1] = '\0';
}
}
+
+ /* make sure that we dont have a var starting with two zero digits
+ * which is not allowed as it introduces confusion with octal and
+ * decimal. */
+ if (*d == '0' && *s == '0')
This check doesn't look entirely right..
What it will do is disallow ${00} (and ${000}, ${0000}, ...) but the
earlier check (lines 10117-10127) disallows things like: $00, $010, $09324`,
... i.e. the second digit doesn't have to be zero
Doh. Good catch, thanks bram. Fixup coming.
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
1c55371
to
f4dad88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me;
Just one additional suggestion: maybe amend the commit description and mention something about the change in behavior for ${10}
, ${11}, ... (i.e. that they now behave as expected under
use strict`)
…strict. Executive summary: in ${ .. } style notation consistently forbid octal and allow multi-digit longer decimal values under strict. The vars ${1} through ${9} have always been allowed under strict, but ${10} threw an error unlike its equivalent variable $10. In 60267e1 I patched toke.c to refuse octal like $001 but did not properly handle ${001} and related cases when the code was under 'use utf8'. Part of the reason was the confusing macro VALID_LEN_ONE_IDENT() which despite its name does not restrict what it matches to things which are one character long. Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its name and placement is confusing I have moved it back into the code inline as part of this fix. I have also added more comments about what is going on, and moved the related comment directly next to the code that it affects. If it moved out of this code then we should think of a better name and be more careful and clear about checking things like length. I would argue the logic is used to parse what might be called a variable "description", and thus it is not identical to code which might validate an actual parsed variable name. Eg, ${^Var} is a description of the variable whose "name" is "\026ar". The exception of course is $^ whose name actually is "^". This includes more tests for allowed vars and forbidden var names. See Issue #12948, Issue #19986, and Issue #19989.
I wanted to see if my understand of scan_ident was correct, so I made it handle binary and hex eg, |
I think it would be best to drop it;
|
On Thu, 28 Jul 2022 at 20:39, Bram ***@***.***> wrote:
I wanted to see if my understand of scan_ident was correct, so I made it
handle binary and hex eg, ${0x10} and ${0b10000}. Both of these end up
returning the same GV that $16 and ${16} do. If that proves controversial
then we can drop that patch.
I think it would be best to drop it;
Even tho just some small comments:
- there is a 0o prefix for octal (since perl v5.34.0) (0o10 = 8)
Ah, ok, ill add that and push this part in a separate PR for consideration
separately.
- in the commit message of the patch you correctly note that $0x10 is
not allowed; one side remark on that: $::0x10 (for some reason) is
allowed.
It is not so much that $0x10 is not allowed as it is that it will be (and
should be?) parsed as $0 x 10 as $0 is the only var allowed to start with
0. That $::0x10 is not parsed similarly is because (I guess) that in
$::0x10 the 0 is not the first character of the identifier, which is
actually $main::0x10. Notice it does not resolve to a digit var, it
resolves to a var named "0x10", unlike ${0x10} which resolves to the var
$16. As far as I know the name after the colons does not have to follow
IDFIRST conventions.
perl -MO=Concise -e'print $::0x10'
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter v ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
5 <@> print vK ->6
3 <0> pushmark s ->4
- <1> ex-rv2sv sK/1 ->5
4 <$> gvsv(*0x10) s ->5
$ perl -MO=Concise -MDevel::Peek -e'Dump(${0x10})'
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter v ->2
2 <;> nextstate(main 44 -e:1) v:{ ->3
5 <2> Devel_Peek_Dump vK/1 ->6
4 <1> rv2sv sK/1 ->5
- <@> scope sK ->4
- <;> ex-nextstate(main 45 -e:1) v ->3
3 <$> const(IV 16) s ->4
-e syntax OK
Anyway, I'll tweak the PR.
Yves
…--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
K. Removed the controversial patch. |
In 60267e1 I patched toke.c to refuse
$00
but did not properly handle${00}
and related cases when the codewas unicode. Part of the reason was the confusing macro
VALID_LEN_ONE_IDENT() which despite its name does not restrict what it
matches to things which are one character long.
Since the VALID_LEN_ONE_IDENT() macro is used in only one place and its
name and placement is confusing I have moved it back into the code
inline as part of this fix. I have also added more comments about what
is going on, and moved the related comment directly next to the code
that it affects. If it moved out of this code then we should think of a
better name and be more careful and clear about checking things like
length. I would argue the logic is used to parse what might be called a
variable "description", and thus it is not identical to code which might
validate an actual parsed variable name. Eg, ${^Var} is a description of
the variable whose "name" is "\026ar". The exception of course is $^
whose name actually is "^".
A byproduct of this change is that the logic to detect duplicated
leading zeros is now quite a bit simpler.
This includes more tests for leading zero checks.
See Issue #12948, Issue #19986, and Issue #19989.