Skip to content

Perl_check_hash_fields_and_hekify - hekifying NVs is not locale-safe#24252

Merged
richardleach merged 2 commits intoPerl:bleadfrom
richardleach:chfandhekify_locale_NVs
Mar 11, 2026
Merged

Perl_check_hash_fields_and_hekify - hekifying NVs is not locale-safe#24252
richardleach merged 2 commits intoPerl:bleadfrom
richardleach:chfandhekify_locale_NVs

Conversation

@richardleach
Copy link
Copy Markdown
Contributor

Perl_check_hash_fields_and_hekify used to convert IV, NV, and PV OP_CONST SV values into shared HEKs at compile time. However, the decimal separator character is locale dependent, so it is unsafe to perform a one-off conversion of an NV to a hash key using the compile time locale.

This commit adds a check to ensure that NVs are no longer hekified.


  • This set of changes does not require a perldelta entry.

@richardleach
Copy link
Copy Markdown
Contributor Author

Note: I'll try to add a test.

@richardleach richardleach marked this pull request as draft March 3, 2026 17:44
@richardleach
Copy link
Copy Markdown
Contributor Author

Hmm, exists($h{1.2}) still seems to get hekified. Converted this PR to a draft.

@richardleach
Copy link
Copy Markdown
Contributor Author

No, it does actually seem fine.

@richardleach richardleach marked this pull request as ready for review March 3, 2026 17:52
@khwilliamson
Copy link
Copy Markdown
Contributor

I'm trying to understand the motivation behind this. No matter what the locale, a dot is supposed to be accepted as the decimal radix character., in addition to whatever the locale's official character is. We shouldn't be outputting a dot but we should always accept them. Is there a real life example where this didn't happen?

@richardleach
Copy link
Copy Markdown
Contributor Author

This PR followed from the discussion here: #24228 (comment)

@tonycoz
Copy link
Copy Markdown
Contributor

tonycoz commented Mar 3, 2026

I'm trying to understand the motivation behind this. No matter what the locale, a dot is supposed to be accepted as the decimal radix character., in addition to whatever the locale's official character is. We shouldn't be outputting a dot but we should always accept them. Is there a real life example where this didn't happen?

The problem is the other direction, in $h{1.2} the 1.2 is treated as an NV, but is converted at compile time to a string.

This is a problem under use locale; since it uses the compile-time locale rather than the runtime locale:

# 5.40.1 because it's handy
$ LC_ALL=de_DE.UTF-8 perl -Mlocale -MPOSIX=setlocale,LC_ALL -le 'setlocale(LC_ALL, "en_AU.UTF-8") or die; $h{1.2} =1; print keys %h'
1,2

(we use . for the decimal in Australia)

This is more obviously a bug if we also add runtime evaluation:

$ LC_ALL=de_DE.UTF-8 perl -Mlocale -MPOSIX=setlocale,LC_ALL -le 'setlocale(LC_ALL, "en_AU.UTF-8") or die; $x=1.2; $h{1.2} =1; $h{$x} = 2; print for keys %h'
1.2
1,2

Note the problem isn't locales, it's that we convert to a string too early.

Comment thread op.c Outdated
&& SvTYPE(sv) < SVt_PVMG
&& SvOK(sv)
&& !SvROK(sv)
&& !SvNOK(sv) /* Decimal separator depends on runtime locale */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be safe to do the NV to string conversion if IN_LC_COMPILETIME(LC_NUMERIC) is false, so maybe:

  && !(SvNOK(sv) && IN_LC_COMPILETIME(LC_NUMERIC))

which I think is more readable than:

&& (!SvNOK(sv) || !IN_LC_COMPILETIME(LC_NUMERIC))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IN_LC_COMPILETIME() check should be cheap in the non-use locale; common case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does IN_LC_COMPILETIME work? Is it safe against things like hekification within BEGIN {} blocks before the use locale; is encountered, or require locale; (if that's allowed) at runtime?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I've added a couple of tests into t/run/locale.t and the one relevant to this PR fails if the !SvNOK(sv) test is changed to !(SvNOK(sv) && IN_LC_COMPILETIME(LC_NUMERIC)). Not sure if that's just the way the tests have been written though....

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't looked, but it looks like the run time vs compile time distinction in the macros is broken (or unneeded).

During compilation, while generating OPs, PL_curcop points at PL_compiling, and during peep-time (still compiling) PL_curcop points at the most recent COP, so it's always correct to check PL_curcop.

Functions like Perl_ckwarn_common() which can be called at compiler or run time only check PL_curcop.

@khwilliamson ^^

So this should be checking IN_LC_RUNTIME(LC_NUMERIC) and the new tests pass if I change it:

diff --git a/op.c b/op.c
index c36327cf67..5974518ab5 100644
--- a/op.c
+++ b/op.c
@@ -2806,7 +2806,7 @@ Perl_check_hash_fields_and_hekify(pTHX_ UNOP *rop, SVOP *key_op, int real)
             && SvTYPE(sv) < SVt_PVMG
             && SvOK(sv)
             && !SvROK(sv)
-            && !SvNOK(sv) /* Decimal separator depends on runtime locale */
+            && !(SvNOK(sv) && IN_LC_RUNTIME(LC_NUMERIC)) /* Decimal separator depends on runtime locale */
             && real)
         {
             SSize_t keylen;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed that change., Thanks Tony.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently IN_SOME_LOCALE_FORM_COMPILETIME is cBOOL(PL_hints & (HINT_LOCALE))
Are you @tonycoz saying that's wrong? like it should instead be (PL_curcop && CopHINTS_get(PL_curcop) & (HINT_LOCALE))

@tonycoz
Copy link
Copy Markdown
Contributor

tonycoz commented Mar 10, 2026

  • This set of changes does not require a perldelta entry.

I think it does - it changes user visible behaviour.

@richardleach
Copy link
Copy Markdown
Contributor Author

  • This set of changes does not require a perldelta entry.

I think it does - it changes user visible behaviour.

An entry has now been added.

`Perl_check_hash_fields_and_hekify` used to convert IV, NV, and PV
`OP_CONST` SV values into shared HEKs at compile time. However, the
decimal  separator character is locale dependent, so it is unsafe to
perform a one-off conversion of an NV to a hash key using the compile
time locale.

This commit adds a check at optimization time to ensure that NVs are
not unsafely hekified.

It also adds a couple of new regression tests.
@richardleach richardleach force-pushed the chfandhekify_locale_NVs branch from e8eed1e to 81dacdb Compare March 10, 2026 23:49
@richardleach richardleach merged commit 65e9d85 into Perl:blead Mar 11, 2026
33 checks passed
@richardleach richardleach deleted the chfandhekify_locale_NVs branch March 11, 2026 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants