Fix isspace() handling on AIX
AIX incorrectly thinks the NBSP (no-break space) is a graphic in many locales.
Overriding that makes these locales behave like proper POSIX ones.
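
A minimal sketch of the idea behind the override, assuming plain `<ctype.h>` classification: a character counts as a space only if the C library does not also call it a graphic. The function name `space_not_graph` is hypothetical and is not part of handy.h; the actual change is the `is_porcelain_SPACE` macro in the diff.

```c
#include <ctype.h>

/* Hypothetical helper, not part of Perl's handy.h.  It mirrors the shape of
 * the AIX override: accept a character as a space only when the C library
 * does not also classify it as a graphic. */
int space_not_graph(int c)
{
    /* Cast to unsigned char first, the same role the (U8) cast plays in the
     * macros; passing a negative value to the ctype functions is undefined. */
    unsigned char uc = (unsigned char) c;

    return isspace(uc) && !isgraph(uc);
}
```

On a platform whose locale tables follow POSIX, the extra `!isgraph()` test should never change the result, since no character is meant to be both a space and a graphic. It only matters where the tables disagree, as the commit message says happens on AIX for the NBSP.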
khwilliamson committed May 5, 2021
1 parent 4b32b2c commit f7acdfc
Showing 2 changed files with 13 additions and 14 deletions.
17 changes: 10 additions & 7 deletions handy.h
@@ -1913,7 +1913,6 @@ END_EXTERN_C

 /* The next few are the same in all platforms. */
 #define is_porcelain_CNTRL(c) iscntrl((U8) (c))
-#define is_porcelain_SPACE(c) isspace((U8) (c))
 #define is_porcelain_IDFIRST(c) (UNLIKELY((c) == '_') || is_porcelain_ALPHA(c))
 #define is_porcelain_WORDCHAR(c) (UNLIKELY((c) == '_') || is_porcelain_ALPHANUMERIC(c))

@@ -1922,18 +1921,22 @@ END_EXTERN_C
 #define to_porcelain_UPPER(c) toupper((U8) (c))
 #define to_porcelain_FOLD(c) to_porcelain_LOWER(c)

+#ifdef _AIX /* Many AIX locales have this wrong */
+# define is_porcelain_SPACE(c) (isspace((U8) (c)) && ! is_porcelain_GRAPH(c))
+#else
+# define is_porcelain_SPACE(c) isspace((U8) (c))
+#endif
 #ifdef WIN32

 /* The Windows functions don't bother to follow the POSIX standard, which for
 * example says that something can't both be a printable and a control. But
 * Windows treats the \t control as a printable, and does such things as making
 * superscripts into both digits and punctuation. These #defines tame these
-* flaws by assuming that the definitions of both controls and space are
-* correct, and then making sure that other definitions don't have weirdnesses,
-* by adding a check that things that aren't \w, like ispunct(), arent't
-* controls, and that \w and its subsets aren't ispunct(). Not all possible
-* weirdnesses are checked for, just ones that were detected on actual
-* Microsoft code pages */
+* flaws by assuming that the definitions of controls are correct, and then
+* making sure that other definitions don't have weirdnesses, by adding a check
+* that things that aren't \w, like ispunct(), arent't controls, and that \w
+* and its subsets aren't ispunct(). Not all possible weirdnesses are checked
+* for, just ones that were detected on actual Microsoft code pages */
 # define is_porcelain_ALPHA(c) \
 (isalpha((U8) (c)) && ! is_porcelain_PUNCT(c))
 # define is_porcelain_ALPHANUMERIC(c) \
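
The WIN32 comment in this hunk describes a layering: trust the control classification, then define the other classes so they cannot overlap it or each other. A rough sketch of that layering follows, using hypothetical function names. Only the ALPHA shape (`isalpha() && ! PUNCT`) is visible in the diff; the punctuation rule here is an assumption based on the comment's wording, not the actual macro.

```c
#include <ctype.h>

/* Hypothetical names, for illustration only. */

/* Assumed from the comment: punctuation must not also be a control. */
int tamed_punct(int c)
{
    unsigned char uc = (unsigned char) c;

    return ispunct(uc) && !iscntrl(uc);
}

/* Same shape as is_porcelain_ALPHA in the diff: an alphabetic character
 * must not also be punctuation. */
int tamed_alpha(int c)
{
    unsigned char uc = (unsigned char) c;

    return isalpha(uc) && !tamed_punct(uc);
}
```

With conforming tables the added tests are no-ops. They only filter characters that a code page places in two classes at once.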
10 changes: 3 additions & 7 deletions lib/locale.t
@@ -64,15 +64,11 @@ if ($^O =~ /cygwin | darwin /xi) {

 # Certain tests have been shown to be problematical for a few locales. Don't
 # fail them unless at least this percentage of the tested locales fail.
-# On AIX machines, many locales call a no-break space a graphic.
-# (There aren't 1000 locales currently in existence, so 99.9 works)
 # EBCDIC os390 has more locales fail than normal, because it has locales that
 # move various critical characters like '['.
-my $acceptable_failure_percentage = ($os =~ / ^ ( aix ) $ /x)
-? 99.9
-: ($os =~ / ^ ( os390 ) $ /x)
-? 10
-: 5;
+my $acceptable_failure_percentage = ($os =~ / ^ ( os390 ) $ /x)
+? 10
+: 5;

 # The list of test numbers of the problematic tests.
 my %problematical_tests;
