Skip to content

Commit

Permalink
Add ability to emulate thread-safe locale operations
Browse files Browse the repository at this point in the history
Locale information was originally global for an entire process.  Later,
it was realized that different threads could want to be running in
different locales.  Windows added this ability, and POSIX 2008 followed
suit (though using a completely different API).  When available, perl
automatically uses these capabilities.

But many platforms have neither, or their implementation, such as on
Darwin, is buggy.  This commit adds the capability for Perl programs to
operate as if the platform were thread-safe.

This implementation is based on the observation that the underlying
locale matters only to relatively few libc calls, and only during their
execution.  It can be anything at all at any other time.  perl keeps
what the proper locale should be for each category in a a per-thread
array.  Each locale-dependent operation must be wrapped in mutex
lock/unlock operations.  The lock additionally compares what libc knows
the locale to be, and what it should be for this thread at this time,
and changes the actual locale to the proper value if necessary.  That's
all that is needed.

This commit adds macros to perl.h, for example "MBTOWC_LOCK_", that
expand to do the mutex lock, and change the global locale to the
expected value.  On perls built without this emulation capability, they
are no-ops.  All code in the perl core (unless I've missed something),
are changed to use these macros (there weren't actually many places that
needed this).  Thus, any pure perl program will automatically become
locale-thread-safe under this Configuration.

In order for XS code to also become locale-thread-safe, it must use
these macros to wrap calls to locale-dependent functions.  Relatively
few modules call such functions.  For example, the only one I found that
ships with the perl core is Time::Piece, and it has more fundamental
issues with running under threads than this.  I am preparing pull
requests for it.

Thus, this is not completely transparent to code like native-thread-safe
locale handling is.  Therefore ${^SAFE_LOCALES} returns 2 (instead of 1)
for this type of thread-safety.

Another deficiency compared to the native thread safety is when a thread
calls a non-perl library that accesses the locale.  The typical example is
Gtk (though this particular application can be configured to not be
problematic).  With the native safe threads, everything works as long as
only one such thread is used per Perl program.  That thread would then
be the only one operating in the global locale, hence there are no
conflicts.  With this emulation, all threads are operating in the global
locale, and mutexes would have to be used to prevent conflicts.  To
minimize those, the code added in this commit restores the global locale
when through to the state it was in when started.

A major concern is the performance impact.  This is after all trading
speed for accuracy.  lib/locale_threads.t is noticeably slower when this
is being used.  But that is doing multiple threads constantly using
locale-dependent operations.  I don't notice any change with the rest of
the test suite.  In pure perl, this only comes into play while in the
scope of 'use locale' or when using some of the few POSIX:: functions
that are locale-dependent.  And to some extent when formatting, but the
regular overhead there should dwarf what this adds.

This commit leaves this feature off by default.  The next commit changes
that for the next few 5.39 development releases, so we can see if there
is actually an issue.
  • Loading branch information
khwilliamson committed Nov 22, 2023
1 parent bfc409c commit e4072a8
Show file tree
Hide file tree
Showing 14 changed files with 848 additions and 99 deletions.
28 changes: 20 additions & 8 deletions embed.fnc
Expand Up @@ -3779,6 +3779,17 @@ p |void |dump_sv_child |NN SV *sv
CRTip |unsigned int|variant_byte_number \
|PERL_UINTMAX_T word
#endif
#if defined(EMULATE_THREAD_SAFE_LOCALES)
Cp |void |category_lock_i|const locale_category_index index \
|NN const char *file \
|const line_t caller_line
Cp |void |category_unlock_i \
|const locale_category_index index \
|NN const char *file \
|const line_t caller_line
Cip |int |posix_LC_foo_ |const int c \
|const U8 classnum
#endif
#if defined(F_FREESP) && !defined(HAS_CHSIZE) && !defined(HAS_TRUNCATE)
ARdp |I32 |my_chsize |int fd \
|Off_t length
Expand Down Expand Up @@ -4457,6 +4468,13 @@ RS |char * |my_setlocale_debug_string_i \
|NULLOK const char *retval \
|const line_t line
# endif
# if defined(EMULATE_THREAD_SAFE_LOCALES) || \
( defined(USE_POSIX_2008_LOCALE) && !defined(USE_QUERYLOCALE) )
S |void |update_PL_curlocales_i \
|const locale_category_index index \
|NN const char *new_locale \
|const line_t caller_line
# endif
# if defined(HAS_NL_LANGINFO) || defined(HAS_NL_LANGINFO_L)
S |const char *|my_langinfo_i \
|const nl_item item \
Expand Down Expand Up @@ -4517,14 +4535,8 @@ S |const char *|querylocale_2008_i \
|const locale_category_index index \
|const line_t line
S |locale_t|use_curlocale_scratch
# if !defined(USE_QUERYLOCALE)
S |void |update_PL_curlocales_i \
|const locale_category_index index \
|NN const char *new_locale \
|const line_t caller_line
# endif
# elif defined(USE_LOCALE_THREADS) && !defined(USE_THREAD_SAFE_LOCALE) && \
!defined(USE_THREAD_SAFE_LOCALE_EMULATION)
# elif !defined(EMULATE_THREAD_SAFE_LOCALES) && \
defined(USE_LOCALE_THREADS) && !defined(USE_THREAD_SAFE_LOCALE)
S |bool |less_dicey_bool_setlocale_r \
|const int cat \
|NN const char *locale
Expand Down
17 changes: 11 additions & 6 deletions embed.h
Expand Up @@ -810,6 +810,11 @@
# if !defined(EBCDIC)
# define variant_byte_number Perl_variant_byte_number
# endif
# if defined(EMULATE_THREAD_SAFE_LOCALES)
# define category_lock_i(a,b,c) Perl_category_lock_i(aTHX_ a,b,c)
# define category_unlock_i(a,b,c) Perl_category_unlock_i(aTHX_ a,b,c)
# define posix_LC_foo_(a,b) Perl_posix_LC_foo_(aTHX_ a,b)
# endif
# if defined(F_FREESP) && !defined(HAS_CHSIZE) && !defined(HAS_TRUNCATE)
# define my_chsize(a,b) Perl_my_chsize(aTHX_ a,b)
# endif
Expand Down Expand Up @@ -1313,6 +1318,10 @@
# if defined(DEBUGGING)
# define my_setlocale_debug_string_i(a,b,c,d) S_my_setlocale_debug_string_i(aTHX_ a,b,c,d)
# endif
# if defined(EMULATE_THREAD_SAFE_LOCALES) || \
( defined(USE_POSIX_2008_LOCALE) && !defined(USE_QUERYLOCALE) )
# define update_PL_curlocales_i(a,b,c) S_update_PL_curlocales_i(aTHX_ a,b,c)
# endif
# if defined(HAS_NL_LANGINFO) || defined(HAS_NL_LANGINFO_L)
# define my_langinfo_i(a,b,c,d,e,f) S_my_langinfo_i(aTHX_ a,b,c,d,e,f)
# else
Expand Down Expand Up @@ -1343,12 +1352,8 @@
# define bool_setlocale_2008_i(a,b,c) S_bool_setlocale_2008_i(aTHX_ a,b,c)
# define querylocale_2008_i(a,b) S_querylocale_2008_i(aTHX_ a,b)
# define use_curlocale_scratch() S_use_curlocale_scratch(aTHX)
# if !defined(USE_QUERYLOCALE)
# define update_PL_curlocales_i(a,b,c) S_update_PL_curlocales_i(aTHX_ a,b,c)
# endif
# elif defined(USE_LOCALE_THREADS) && \
!defined(USE_THREAD_SAFE_LOCALE) && \
!defined(USE_THREAD_SAFE_LOCALE_EMULATION)
# elif !defined(EMULATE_THREAD_SAFE_LOCALES) && \
defined(USE_LOCALE_THREADS) && !defined(USE_THREAD_SAFE_LOCALE)
# define less_dicey_bool_setlocale_r(a,b) S_less_dicey_bool_setlocale_r(aTHX_ a,b)
# define less_dicey_setlocale_r(a,b) S_less_dicey_setlocale_r(aTHX_ a,b)
# endif
Expand Down
3 changes: 3 additions & 0 deletions embedvar.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 17 additions & 4 deletions handy.h
Expand Up @@ -1555,9 +1555,16 @@ or casts

# define HIGHEST_REGCOMP_DOT_H_SYNC_ CC_VERTSPACE_

/* The members of the third group below do not need to be coordinated with data
* structures in regcomp.[ch] and regexec.c. */
# define CC_IDFIRST_ 16
/* These three follow immediately after the final function that has a version
* defined by C, like isascii(), so they overlap with anything else. They are
* used in the 'PL_libc_char_fcns' data structure, along with the ones above
* them */
# define CC_IDFIRST_ 16
# define CC_TOLOWER_ (CC_IDFIRST_ + 1)
# define CC_TOUPPER_ (CC_TOLOWER_ + 1)

/* The members of the fourth group below do not need to be coordinated with
* data structures in regcomp.[ch] and regexec.c. */
# define CC_CHARNAME_CONT_ 17
# define CC_NONLATIN1_FOLD_ 18
# define CC_NONLATIN1_SIMPLE_FOLD_ 19
Expand Down Expand Up @@ -2025,7 +2032,7 @@ END_EXTERN_C
# define is_posix_XDIGIT(c) isxdigit((U8) (c))
#endif

/* Below is the next level up, which currently expands to nothing more
/* Below is the next level up, which on most platforms expands to nothing more
* than the previous layer. These are the macros to use if you really need
* something whose input domain is a byte, and the locale isn't UTF-8; that is,
* where you normally would have to use things like bare isalnum().
Expand All @@ -2037,7 +2044,13 @@ END_EXTERN_C
* (Note, proper general operation of the bare libc functions requires you to
* cast to U8. These do that for you automatically.) */

/* In this one circumstance, the macro is implemented with a lock; otherwise it
* expands to just the layer below */
#ifdef EMULATE_THREAD_SAFE_LOCALES
# define WRAP_U8_LC_(c, classnum, posix) posix_LC_foo_((c), (classnum))
#else
# define WRAP_U8_LC_(c, classnum, posix) posix(c)
#endif

#define isU8_ALPHANUMERIC_LC(c) \
WRAP_U8_LC_((c), CC_ALPHANUMERIC_, is_posix_ALPHANUMERIC)
Expand Down
43 changes: 43 additions & 0 deletions inline.h
Expand Up @@ -318,6 +318,49 @@ S_PerlEnv_putenv(pTHX_ char * str)

#endif

/* ------------------------------- handy.h ------------------------------- */

#ifdef EMULATE_THREAD_SAFE_LOCALES

PERL_STATIC_INLINE int
Perl_posix_LC_foo_(pTHX_ const int c, const U8 classnum) {
int result;

LC_CTYPE_LOCK;

/* All calls to this (so far) are with a 'classnum' known at compile time,
* so the compiler should constant fold this down to a single assignment */
switch (classnum) {
case CC_ALPHANUMERIC_:result = (bool) is_posix_ALPHANUMERIC(c); break;
case CC_ALPHA_: result = (bool) is_posix_ALPHA(c); break;
case CC_ASCII_: result = (bool) is_posix_ASCII(c); break;
case CC_BLANK_: result = (bool) is_posix_BLANK(c); break;
case CC_CASED_: result = (bool) is_posix_CASED(c); break;
case CC_CNTRL_: result = (bool) is_posix_CNTRL(c); break;
case CC_DIGIT_: result = (bool) is_posix_DIGIT(c); break;
case CC_GRAPH_: result = (bool) is_posix_GRAPH(c); break;
case CC_LOWER_: result = (bool) is_posix_LOWER(c); break;
case CC_PRINT_: result = (bool) is_posix_PRINT(c); break;
case CC_PUNCT_: result = (bool) is_posix_PUNCT(c); break;
case CC_SPACE_: result = (bool) is_posix_SPACE(c); break;
case CC_UPPER_: result = (bool) is_posix_UPPER(c); break;
case CC_WORDCHAR_: result = (bool) is_posix_WORDCHAR(c); break;
case CC_XDIGIT_: result = (bool) is_posix_XDIGIT(c); break;
case CC_IDFIRST_: result = (bool) is_posix_IDFIRST(c); break;
case CC_TOLOWER_: result = to_posix_LOWER(c); break;
case CC_TOUPPER_: result = to_posix_UPPER(c); break;

default:
LC_CTYPE_UNLOCK;
locale_panic_(Perl_form(aTHX_ "Unknown charclass %d", classnum));
}

LC_CTYPE_UNLOCK;
return result;
}

#endif

/* ------------------------------- mg.h ------------------------------- */

#if defined(PERL_CORE) || defined(PERL_EXT)
Expand Down
7 changes: 7 additions & 0 deletions intrpvar.h
Expand Up @@ -757,7 +757,14 @@ PERLVARI(I, cur_locale_obj, locale_t, LC_GLOBAL_LOCALE)
* is almost always toggled into the C locale, and the locale it nominally is
* is stored as PL_numeric_name. */
PERLVARA(I, curlocales, LOCALE_CATEGORIES_COUNT_ + 1, const char *)
#endif
#ifdef EMULATE_THREAD_SAFE_LOCALES
PERLVARA(I, restore_locale, LOCALE_CATEGORIES_COUNT_, const char *)
PERLVARA(I, restore_locale_depth, LOCALE_CATEGORIES_COUNT_, Size_t)
#endif

#if defined(USE_LOCALE) && (defined(WIN32) || ! defined(USE_THREAD_SAFE_LOCALE))
PERLVARI(I, perl_controls_locale, bool, true)
#endif
#ifdef USE_PL_CUR_LC_ALL
PERLVARI(I, cur_LC_ALL, const char *, NULL)
Expand Down

0 comments on commit e4072a8

Please sign in to comment.