When embedding Perl in C, the locale is switched to C/ASCII #21366
Comments
There have been significant fixes to the locale initialization code recently. I was hoping that they would fix this issue, and indeed in trying it out on blead, I get:
Could you check if it is now fixed for you?
@trygveaa could you try blead on this problem to verify that it has been fixed or not?
Thanks! Yes, I checked now, and the issue is indeed resolved on blead. Will there be a patch release on 5.38 for this, or will it only be fixed in the next standard release?
In order for the corrections to appear in maintenance release perl-5.38.1, we have to be able to identify the commit(s) made since perl-5.38.0 that corrected the problem. In other words, which commit(s) undo the harmful effects of 7af2d20 in the last production cycle without undoing that commit's benefits? A reverse bisection, in effect. Given that the defect shows up, not in just a regular Perl program, but when you embed Perl in a C program, this is non-trivial. (It's above my own pay grade.) @trygveaa, would it be possible for you to identify the monthly development release at which this problem cleared up for you? (E.g., @khwilliamson and @steve-m-hay, advice sought. Thanks.
In #21366, Trygve Aaberge reported on Aug 11 2023 that commit 7af2d20 (Oct 18 2022, during 5.37 dev cycle) had broken certain locale-related functionality when embedding Perl code into a C program. Subsequent investigation indicated that this problem had been corrected by commit bf38d1c (Jul 25 2023, during 5.39 dev cycle).
In the Perl 5 repository there is a branch called maint-votes in which committers propose commits for back-porting to maintenance releases. (The next such maint release would be perl-5.38.1.) In commit bfaeb30 in that branch, I have requested that bf38d1c be included in that maintenance release. @steve-m-hay and @tonycoz, please double-check that commit, as I have not often touched the maint-votes branch.
Thanks!
The maint-votes commit looks fine.
I believe this issue is currently breaking (the whole of, including C parts) irssi on Fedora 39, first reported in irssi/scripts.irssi.org#857 (basically the same use case as @trygveaa)
after discussion with @khwilliamson reverting 7af2d20 on top of 5.38.0 fixes this for me
Description
When embedding Perl in a C program in order to run Perl scripts in it, the locale is set to C/ANSI_X3.4-1968 (ASCII). This causes issues with using non-ASCII characters.
This is a regression introduced in commit 7af2d20. Before this commit, the chosen locale was kept.
Steps to Reproduce
This script is one of the examples of embedding Perl taken from https://perldoc.perl.org/perlembed. The only changes are that it sets the locale first, and prints the current charset before and after loading Perl.
When run, it prints:
If run with Perl before commit 7af2d20, it prints UTF-8 in both lines.
After running PERL_SYS_TERM() the locale is back to UTF-8 again, but the documentation says that it should only be called once after freeing the last interpreter. My use case (Perl scripts for extending functionality in WeeChat) is having long running Perl scripts that often are kept running for the whole lifetime of the application, so this doesn't help.
Expected behavior
That the locale is preserved after loading Perl and running Perl code.
Perl configuration