Cache locale UTF8-ness lookups
Some locales are UTF-8, some are not.  Knowledge of this is needed in
various circumstances.  This commit saves the results of the last
several lookups so they don't have to be recalculated each time.

The full generality of POSIX locales is such that you can have error
messages be displayed in one locale, say Spanish, while other things are
in French.  To accommodate this generality, the program can loop through
all the locale categories, finding the UTF8ness of the locale each one
points to.  However, in almost all instances, people are going to be in
either French or Spanish, and not in some combination.  Suppose it is a
French UTF-8 locale for all categories.  This new cache will know that
the French locale is UTF-8, and the queries for all but the first
category can return that immediately.

This simple cache is a fixed 256-char array per interpreter
(PL_locale_utf8ness), which avoids the overhead of hashes.
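
A minimal sketch of how such a hash-free cache could work, not the code in this commit: records of the form separator, locale name, '0' or '1' are packed into one 256-char buffer, newest first, and found by a linear scan. The names cache_lookup, cache_store, slow_is_utf8 and locale_is_utf8, the '\v' separator, and the record layout are all assumptions made for illustration. The slow path here is a stand-in that only inspects the name; it assumes non-empty locale names that contain no '\v'.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 256          /* same size as the array added in the diff */
#define SEP '\v'                /* hypothetical record separator */

static char cache[CACHE_SIZE];  /* records: SEP name digit; NUL-terminated */
static int slow_calls;          /* counts slow-path lookups, for illustration */

/* Returns 1 or 0 if the locale name is cached, -1 if it is not. */
static int cache_lookup(const char *name)
{
    size_t len = strlen(name);
    const char *p = cache;

    while ((p = strchr(p, SEP)) != NULL) {
        p++;                                    /* step past the separator */
        if (strncmp(p, name, len) == 0
            && (p[len] == '0' || p[len] == '1')
            && (p[len + 1] == SEP || p[len + 1] == '\0'))
            return p[len] - '0';
    }
    return -1;
}

/* Puts a record at the front, dropping the oldest records if needed. */
static void cache_store(const char *name, int is_utf8)
{
    size_t rec = strlen(name) + 2;              /* SEP + name + digit */
    size_t used = strlen(cache);

    if (rec >= CACHE_SIZE)
        return;                                 /* too long to cache */

    while (used + rec >= CACHE_SIZE) {          /* make room at the tail */
        char *last = strrchr(cache, SEP);
        assert(last != NULL);
        *last = '\0';
        used = (size_t)(last - cache);
    }

    memmove(cache + rec, cache, used + 1);      /* shift, including the NUL */
    cache[0] = SEP;
    memcpy(cache + 1, name, rec - 2);
    cache[rec - 1] = is_utf8 ? '1' : '0';
}

/* Stand-in for the expensive check; a name-based guess, not perl's method. */
static int slow_is_utf8(const char *name)
{
    slow_calls++;
    return strstr(name, "UTF-8") != NULL;
}

/* Cached front end: only the first query for a name takes the slow path. */
static int locale_is_utf8(const char *name)
{
    int known = cache_lookup(name);

    if (known < 0) {
        known = slow_is_utf8(name);
        cache_store(name, known);
    }
    return known;
}
```

With every category set to the same French UTF-8 locale, only the first call to locale_is_utf8 would take the slow path in this sketch; the rest are answered by the scan.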

This also fixes a bug I realized exists in threaded perls, but haven't
reproduced.  We do not support locales in such perls, and the user must
not change the locale or 'use locale'.  But perl itself could change the
locale behind the scenes, leading to segfaults or incorrect results.
One such instance is the determination of UTF8ness.  But this could only
happen if the full generality of locales is used so that the categories
are not all in the same locale.  Provided the user doesn't change
locales, that requires the environment to set the categories to
different locales when the perl program starts up.  This commit
fixes this potential bug by caching the UTF8ness of each category at
startup, before any threads are instantiated, and so checking for it
later just looks it up in the cache, without perl changing the locale.
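
The startup snapshot idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this commit: snapshot_utf8ness, category_is_utf8, name_looks_utf8 and the startup_utf8ness table are hypothetical names, only the ISO C categories are listed, and UTF8ness is guessed from the locale name. Calling setlocale() with a NULL locale argument only queries the current setting and does not change it.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* ISO C categories only; POSIX adds others such as LC_MESSAGES. */
static const int categories[] = {
    LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_NUMERIC, LC_TIME
};
#define N_CATEGORIES (sizeof categories / sizeof categories[0])

/* One flag per category, filled in once at startup. */
static char startup_utf8ness[N_CATEGORIES];

/* Hypothetical name-based guess; a real check would be more thorough. */
static int name_looks_utf8(const char *name)
{
    return strstr(name, "UTF-8") != NULL || strstr(name, "utf8") != NULL;
}

/* Call once, before any threads exist: record each category's UTF8ness. */
static void snapshot_utf8ness(void)
{
    size_t i;

    for (i = 0; i < N_CATEGORIES; i++) {
        /* A NULL second argument queries without changing the locale. */
        const char *name = setlocale(categories[i], NULL);
        startup_utf8ness[i] = (name != NULL && name_looks_utf8(name));
    }
}

/* Later lookups read the table and never touch the process locale. */
static int category_is_utf8(size_t i)
{
    return i < N_CATEGORIES && startup_utf8ness[i];
}
```

Because category_is_utf8 only reads the table, a lookup made after threads have started does not need to switch locales in this sketch.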
khwilliamson committed Jan 18, 2018
1 parent fb713cc commit ce5b3ef
Showing 4 changed files with 217 additions and 43 deletions.
1 change: 1 addition & 0 deletions embedvar.h
@@ -187,6 +187,7 @@
#define PL_lastgotoprobe (vTHX->Ilastgotoprobe)
#define PL_laststatval (vTHX->Ilaststatval)
#define PL_laststype (vTHX->Ilaststype)
#define PL_locale_utf8ness (vTHX->Ilocale_utf8ness)
#define PL_localizing (vTHX->Ilocalizing)
#define PL_localpatches (vTHX->Ilocalpatches)
#define PL_lockhook (vTHX->Ilockhook)
2 changes: 2 additions & 0 deletions intrpvar.h
@@ -262,6 +262,8 @@ PERLVAR(I, exit_flags, U8) /* was exit() unexpected, etc. */
PERLVAR(I, utf8locale, bool) /* utf8 locale detected */
PERLVAR(I, in_utf8_CTYPE_locale, bool)
PERLVAR(I, in_utf8_COLLATE_locale, bool)
PERLVARA(I, locale_utf8ness, 256, char)

#ifdef USE_LOCALE_CTYPE
PERLVAR(I, warn_locale, SV *)
#endif