When embedding Perl in C, the locale is switched to C/ASCII #21366

trygveaa · 2023-08-11T22:25:32Z

Description

When embedding Perl in a C program in order to run Perl scripts in it, the locale is set to C/ANSI_X3.4-1968 (ASCII). This causes issues when using non-ASCII characters.

This is a regression introduced in commit 7af2d20. Before this commit, the chosen locale was kept.

Steps to Reproduce

This program is one of the examples of embedding Perl taken from https://perldoc.perl.org/perlembed. The only changes are that it sets the locale first and prints the current charset before and after loading Perl.

#include <EXTERN.h>
#include <locale.h>
#include <langinfo.h>
#include <perl.h>

static PerlInterpreter *my_perl;

int main(int argc, char **argv, char **env) {
  char *current_charset = NULL;
  setlocale(LC_ALL, "");

  current_charset = strdup (nl_langinfo (CODESET));
  printf("charset before loading perl: %s\n", current_charset);

  PERL_SYS_INIT3(&argc, &argv, &env);
  my_perl = perl_alloc();
  perl_construct(my_perl);
  PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
  perl_parse(my_perl, NULL, argc, argv, (char **)NULL);
  perl_run(my_perl);

  current_charset = strdup (nl_langinfo (CODESET));
  printf("charset after loading perl: %s\n", current_charset);

  perl_destruct(my_perl);
  perl_free(my_perl);
  PERL_SYS_TERM();
  exit(EXIT_SUCCESS);
}

When run, it prints:

charset before loading perl: UTF-8
charset after loading perl: ANSI_X3.4-1968

If run with Perl before commit 7af2d20, it prints UTF-8 in both lines.

After PERL_SYS_TERM() runs, the locale is back to UTF-8 again, but the documentation says that it should be called only once, after freeing the last interpreter. My use case (Perl scripts that extend WeeChat) involves long-running Perl scripts that are often kept running for the whole lifetime of the application, so this doesn't help.
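
For embedders who want to check for this symptom, or try to paper over it, a minimal sketch follows. It is not code from this report: `init_fn`, `fake_interpreter_init`, and `codeset_changed_by` are hypothetical names, and the stand-in callback only simulates the symptom by forcing the "C" locale with setlocale() instead of loading Perl. It also assumes the change is visible to, and reversible through, setlocale(). A Perl built with USE_THREAD_SAFE_LOCALE may switch the per-thread locale instead, in which case the restore step here might not be sufficient.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for strdup() and nl_langinfo() */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: stands in for the embedding steps
 * (perl_alloc, perl_construct, perl_parse, perl_run). */
typedef void (*init_fn)(void);

/* Stand-in that simulates the reported symptom. It does NOT load Perl. */
static void fake_interpreter_init(void) {
    setlocale(LC_ALL, "C");
}

/* Runs init() and reports whether nl_langinfo(CODESET) changed across it.
 * Returns 1 if it changed, 0 if not, -1 if memory could not be allocated.
 * If restore is nonzero and a change was seen, the saved locale is put back. */
static int codeset_changed_by(init_fn init, int restore) {
    assert(init != NULL);

    /* Both calls return pointers that later calls may overwrite, so copy. */
    const char *current = setlocale(LC_ALL, NULL);
    char *saved_locale = current ? strdup(current) : NULL;
    char *before = strdup(nl_langinfo(CODESET));
    if (saved_locale == NULL || before == NULL) {
        free(saved_locale);
        free(before);
        return -1;
    }

    init();

    int changed = strcmp(before, nl_langinfo(CODESET)) != 0;
    if (changed && restore) {
        setlocale(LC_ALL, saved_locale);
    }

    free(saved_locale);
    free(before);
    return changed;
}
```

In a real embedder, the callback would wrap the perl_alloc() through perl_run() calls from the reproduction above, and setlocale(LC_ALL, "") would be called first so that the starting codeset comes from the environment.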

Expected behavior

The locale should be preserved after loading Perl and running Perl code.

Perl configuration

Summary of my perl5 (revision 5 version 38 subversion 0) configuration:
   
  Platform:
    osname=linux
    osvers=5.12.15-arch1-1
    archname=x86_64-linux-thread-multi
    uname='archlinux'
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.38/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.38/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.38/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='13.1.1 20230714'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib/libc.so.6
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.37'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.38/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Built under linux
  Compiled at Jul 24 2023 22:17:30
  @INC:
    /usr/lib/perl5/5.38/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/5.38/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/5.38/core_perl
    /usr/share/perl5/core_perl

khwilliamson · 2023-09-06T00:38:51Z

There have been significant fixes to the locale initialization code recently. I was hoping that they would fix this issue, and indeed in trying it out on blead, I get:

charset before loading perl: UTF-8
charset after loading perl: UTF-8

Could you check if it is now fixed for you?

khwilliamson · 2023-09-09T17:05:06Z

@trygveaa, could you try blead on this problem to verify whether it has been fixed?

trygveaa · 2023-09-10T22:36:56Z

Thanks! Yes, I checked now, and the issue is indeed resolved on blead.

Will there be a patch release on 5.38 for this, or will it only be fixed in the next standard release?

jkeenan · 2023-09-15T17:35:48Z

> Thanks! Yes, I checked now, and the issue is indeed resolved on blead.
>
> Will there be a patch release on 5.38 for this, or will it only be fixed in the next standard release?

In order for the corrections to appear in maintenance release perl-5.38.1, we have to be able to identify the commit(s) made since perl-5.38.0 that corrected the problem. In other words, which commit(s) undo the harmful effects of 7af2d20 from the last production cycle without undoing that commit's benefits? A reverse bisection, in effect.

Given that the defect shows up, not in just a regular Perl program, but when you embed Perl in a C program, this is non-trivial. (It's above my own pay grade.) @trygveaa, would it be possible for you to identify the monthly development release at which this problem cleared up for you? (E.g., v5.39.1 on July 20 or v5.39.2 on August 20).

@khwilliamson and @steve-m-hay, advice sought. Thanks.

trygveaa · 2023-09-16T09:35:56Z

@jkeenan: I bisected it and found that the commit that fixes the issue is bf38d1c.

In #21366, Trygve Aaberge reported on Aug 11 2023 that commit 7af2d20 (Oct 18 2022, during 5.37 dev cycle) had broken certain locale-related functionality when embedding Perl code into a C program. Subsequent investigation indicated that this problem had been corrected by commit bf38d1c (Jul 25 2023, during 5.39 dev cycle).

jkeenan · 2023-09-16T12:45:56Z

> @jkeenan: I bisected it and found that the commit that fixes the issue is bf38d1c.

In the Perl 5 repository there is a branch called maint-votes in which committers propose commits for back-porting to maintenance releases. (The next such maint release would be perl-5.38.1.) In commit bfaeb30 in that branch, I have requested that bf38d1c be included in that maintenance release.

@steve-m-hay and @tonycoz, please double-check that commit, as I have not often touched the maint-votes branch.

trygveaa · 2023-09-16T13:18:30Z

Thanks!

tonycoz · 2023-09-16T23:22:29Z

> please double-check that commit, as I have not often touched the maint-votes branch.

The maint-votes commit looks fine.

ailin-nemui · 2023-09-21T10:42:58Z

I believe this issue is currently breaking all of irssi (including its C parts) on Fedora 39. It was first reported in irssi/scripts.irssi.org#857 (basically the same use case as @trygveaa's).

ailin-nemui · 2023-09-24T12:34:15Z

After discussion with @khwilliamson: reverting 7af2d20 on top of 5.38.0 fixes this for me.

trygveaa added the Needs Triage label Aug 11, 2023

trygveaa mentioned this issue Aug 14, 2023: Non-ascii characters (unicode, non-latin or accented characters, emojis etc.) broken after Perl upgrade weechat/weechat#1996 (Closed)

tonycoz mentioned this issue Aug 28, 2023: S_bool_setlocale_2008_i, S_querylocale_2008_i: Add entry assertions #21396 (Open)

jkeenan added type-locale and removed Needs Triage labels Sep 15, 2023

This was referenced Sep 21, 2023:
Perl 5.38 breaks Irssi locale [Negative repeat count does nothing at trackbar.pl line 435.] irssi/scripts.irssi.org#857 (Closed)
Restore locale after loading Perl irssi/irssi#1498 (Merged)

ailin-nemui mentioned this issue Dec 26, 2023: restore locale if perl breaks it irssi/irssi#1510 (Merged)
