When embedding Perl in C, the locale is switched to C/ASCII #21366

Open
trygveaa opened this issue Aug 11, 2023 · 10 comments

Comments

@trygveaa

Description

When Perl is embedded in a C program to run Perl scripts, the locale is switched to C, whose charset is ANSI_X3.4-1968 (ASCII). This causes problems with non-ASCII characters.

This is a regression introduced in commit 7af2d20. Before this commit, the chosen locale was kept.

Steps to Reproduce

This program is one of the Perl embedding examples from https://perldoc.perl.org/perlembed. The only functional changes are that it first sets the locale from the environment, and that it prints the current charset before and after loading Perl.

#include <EXTERN.h>
#include <locale.h>
#include <langinfo.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* free, exit, EXIT_SUCCESS */
#include <string.h>  /* strdup */
#include <perl.h>

static PerlInterpreter *my_perl;

int main(int argc, char **argv, char **env) {
  char *current_charset = NULL;
  /* Adopt the locale from the environment (LANG, LC_ALL, ...) */
  setlocale(LC_ALL, "");

  current_charset = strdup (nl_langinfo (CODESET));
  printf("charset before loading perl: %s\n", current_charset);
  free(current_charset);

  PERL_SYS_INIT3(&argc, &argv, &env);
  my_perl = perl_alloc();
  perl_construct(my_perl);
  PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
  perl_parse(my_perl, NULL, argc, argv, (char **)NULL);
  perl_run(my_perl);

  current_charset = strdup (nl_langinfo (CODESET));
  printf("charset after loading perl: %s\n", current_charset);
  free(current_charset);

  perl_destruct(my_perl);
  perl_free(my_perl);
  PERL_SYS_TERM();
  exit(EXIT_SUCCESS);
}

When run, it prints:

charset before loading perl: UTF-8
charset after loading perl: ANSI_X3.4-1968

When built against a Perl from before commit 7af2d20, it prints UTF-8 on both lines.
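
The reproducer only reads nl_langinfo(CODESET), which follows the calling thread's current locale. Since this build has USE_THREAD_SAFE_LOCALE, it may help to tell apart a change of the process-wide locale (setlocale()) from a per-thread locale installed with uselocale(). Below is a minimal standalone sketch of such a diagnostic; struct locale_report, get_locale_report() and print_locale_report() are hypothetical names, not Perl APIs, and the code does not link against Perl. It makes no claim about which mechanism 7af2d20 actually uses.

```c
#define _POSIX_C_SOURCE 200809L /* for uselocale(), locale_t, LC_GLOBAL_LOCALE */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic type, not part of Perl's API. */
struct locale_report {
    char global_name[256];   /* setlocale(LC_ALL, NULL): process-wide locale */
    char thread_codeset[64]; /* nl_langinfo(CODESET): follows uselocale() */
    int thread_uses_global;  /* 1 if no per-thread locale is installed */
};

/* Copy a possibly-NULL C string into a fixed buffer, truncating if needed. */
static void copy_str(char *dst, size_t n, const char *src)
{
    snprintf(dst, n, "%s", src ? src : "(null)");
}

/* Fill *r with the calling thread's view of the locale. */
void get_locale_report(struct locale_report *r)
{
    assert(r != NULL);
    copy_str(r->global_name, sizeof r->global_name, setlocale(LC_ALL, NULL));
    copy_str(r->thread_codeset, sizeof r->thread_codeset, nl_langinfo(CODESET));
    /* uselocale((locale_t)0) only queries the current setting. */
    r->thread_uses_global = (uselocale((locale_t)0) == LC_GLOBAL_LOCALE);
}

/* Possible drop-in replacement for the printf() calls in the reproducer. */
void print_locale_report(const char *label)
{
    struct locale_report r;
    get_locale_report(&r);
    printf("%s: global=%s codeset=%s thread=%s\n", label, r.global_name,
           r.thread_codeset, r.thread_uses_global ? "global" : "per-thread");
}
```

For instance, calling print_locale_report("after loading perl") in place of the second printf() would show whether the global locale name still reflects setlocale(LC_ALL, "") or whether only the thread's codeset changed.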

The locale is back to UTF-8 after PERL_SYS_TERM() runs, but the documentation says PERL_SYS_TERM() should be called only once, after the last interpreter has been freed. My use case (Perl scripts that extend WeeChat's functionality) involves long-running Perl scripts that are often kept loaded for the whole lifetime of the application, so this doesn't help.
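
Until a fixed release is available, a host application might try to put its own locale back after each call into the interpreter instead of relying on PERL_SYS_TERM(). This is only a sketch under stated assumptions: call_preserving_locale() and demo_switch_thread_to_c() are hypothetical names, the host is assumed to be single-threaded and to use only the global locale itself, and switching the locale behind Perl's back may conflict with Perl's own locale bookkeeping, so it would need testing against the actual Perl version in use.

```c
#define _POSIX_C_SOURCE 200809L /* for uselocale(), newlocale(), strdup() */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Perl's API: runs fn(arg), which stands in
 * for a call into the embedded interpreter (e.g. perl_run()), then restores
 * the calling thread's locale. */
int call_preserving_locale(void (*fn)(void *), void *arg)
{
    /* uselocale((locale_t)0) only queries; it changes nothing. */
    int was_global = (uselocale((locale_t)0) == LC_GLOBAL_LOCALE);

    /* setlocale() returns static storage that later calls may overwrite,
     * so keep a private copy of the current global locale name. */
    const char *name = setlocale(LC_ALL, NULL);
    char *saved = name ? strdup(name) : NULL;
    if (name && saved == NULL)
        return -1;

    fn(arg);

    int rc = 0;
    /* Only switch the thread back to the global locale if that is what it
     * used before; a per-thread locale_t installed earlier might have been
     * freed by fn, so it is not restored here. */
    if (was_global && uselocale(LC_GLOBAL_LOCALE) == (locale_t)0)
        rc = -1;
    if (saved) {
        if (setlocale(LC_ALL, saved) == NULL)
            rc = -1;
        free(saved);
    }
    return rc;
}

/* Stand-in for the interpreter: installs a per-thread "C" locale and hands
 * the object back through arg so the caller can free it later. */
void demo_switch_thread_to_c(void *arg)
{
    assert(arg != NULL);
    locale_t c = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c != (locale_t)0)
        uselocale(c);
    *(locale_t *)arg = c;
}
```

In the reproducer, perl_parse()/perl_run() (or a later call into Perl from the host's event loop) would be wrapped in a small callback passed to call_preserving_locale(). The trade-off is that Perl code then runs under the host's locale rather than the one Perl set up, which may or may not be what the scripts expect.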

Expected behavior

That the locale is preserved after loading Perl and running Perl code.

Perl configuration

Summary of my perl5 (revision 5 version 38 subversion 0) configuration:
   
  Platform:
    osname=linux
    osvers=5.12.15-arch1-1
    archname=x86_64-linux-thread-multi
    uname='archlinux'
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.38/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.38/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.38/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/perl/src=/usr/src/debug/perl -flto=auto'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='13.1.1 20230714'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib/libc.so.6
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.37'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.38/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Built under linux
  Compiled at Jul 24 2023 22:17:30
  @INC:
    /usr/lib/perl5/5.38/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/5.38/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/5.38/core_perl
    /usr/share/perl5/core_perl
@khwilliamson
Contributor

There have been significant fixes to the locale initialization code recently. I was hoping they would fix this issue, and indeed, trying it out on blead, I get:

charset before loading perl: UTF-8
charset after loading perl: UTF-8

Could you check if it is now fixed for you?

@khwilliamson
Contributor

@trygveaa, could you try blead to verify whether this problem has been fixed?

@trygveaa
Author

Thanks! Yes, I checked now, and the issue is indeed resolved on blead.

Will there be a patch release on 5.38 for this, or will it only be fixed in the next standard release?

@jkeenan
Contributor

jkeenan commented Sep 15, 2023

> Thanks! Yes, I checked now, and the issue is indeed resolved on blead.
>
> Will there be a patch release on 5.38 for this, or will it only be fixed in the next standard release?

In order for the corrections to appear in maintenance release perl-5.38.1, we have to be able to identify the commit(s) made since perl-5.38.0 that corrected the problem. In other words, we need to find which commit(s) undid the harmful effects that 7af2d20 introduced in the last production cycle, without undoing that commit's benefits: a reverse bisection, in effect.

Given that the defect shows up not in a regular Perl program but only when Perl is embedded in a C program, this is non-trivial. (It's above my own pay grade.) @trygveaa, would it be possible for you to identify the monthly development release in which this problem cleared up for you (e.g., v5.39.1 on July 20 or v5.39.2 on August 20)?

@khwilliamson and @steve-m-hay, advice sought. Thanks.

@trygveaa
Author

@jkeenan: I bisected it and found that the commit that fixes the issue is bf38d1c.

jkeenan added a commit that referenced this issue Sep 16, 2023
In #21366, Trygve Aaberge reported
on Aug 11 2023 that commit 7af2d20 (Oct 18 2022, during 5.37 dev
cycle) had broken certain locale-related functionality when embedding
Perl code into a C program.

Subsequent investigation indicated that this problem had been corrected
by commit bf38d1c (Jul 25 2023, during 5.39 dev cycle).
@jkeenan
Contributor

jkeenan commented Sep 16, 2023

> @jkeenan: I bisected it and found that the commit that fixes the issue is bf38d1c.

In the Perl 5 repository there is a branch called maint-votes in which committers propose commits for back-porting to maintenance releases. (The next such maint release would be perl-5.38.1.) In commit bfaeb30 in that branch, I have requested that bf38d1c be included in that maintenance release.

@steve-m-hay and @tonycoz, please double-check that commit, as I have not often touched the maint-votes branch.

@trygveaa
Author

Thanks!

@tonycoz
Contributor

tonycoz commented Sep 16, 2023

> please double-check that commit, as I have not often touched the maint-votes branch.

The maint-votes commit looks fine.

@ailin-nemui

I believe this issue is currently breaking irssi as a whole (including its C parts) on Fedora 39; it was first reported in irssi/scripts.irssi.org#857. It is basically the same use case as @trygveaa's.

@ailin-nemui

After discussion with @khwilliamson: reverting 7af2d20 on top of 5.38.0 fixes this for me.
