Solaris Failing Some Locale Tests #16537

Closed
p5pRT opened this issue Apr 27, 2018 · 14 comments

Comments

@p5pRT p5pRT commented Apr 27, 2018

Migrated from rt.perl.org#133157 (status was 'resolved')

Searchable as RT133157$

@p5pRT p5pRT commented Apr 27, 2018

From carlos@carlosguevara.com

This is a bug report for perl from "Carlos Guevara" <carlos@carlosguevara.com>,
generated with the help of perlbug 1.41 running under perl 5.28.0.


Solaris is failing some locale tests:
http://perl5.test-smoke.org/report/65421



Flags:
  category=core
  severity=low


Site configuration information for perl 5.28.0:

Configured by cpan at Thu Apr 26 22:26:15 CDT 2018.

Summary of my perl5 (revision 5 version 28 subversion 0) configuration:
  Snapshot of: 5dbe8f0
  Platform:
  osname=solaris
  osvers=2.11
  archname=i86pc-solaris-64
  uname='sunos cjg-hipster 5.11 illumos-094e47e980 i86pc i386 i86pc '
  config_args='-des -Dprefix=/bin/perl-blead
-Dscriptdir=
/bin/perl-blead/bin -Dusedevel -Duse64bitall -Dcc=gcc'
  hint=recommended
  useposix=true
  d_sigaction=define
  useithreads=undef
  usemultiplicity=undef
  use64bitint=define
  use64bitall=define
  uselongdouble=undef
  usemymalloc=n
  default_inc_excludes_dot=define
  bincompat5005=undef
  Compiler:
  cc='gcc'
  ccflags ='-m64 -fwrapv -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/gnu/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -DPERL_USE_SAFE_PUTENV'
  optimize='-O'
  cppflags='-m64 -fwrapv -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/gnu/include'
  ccversion=''
  gccversion='6.4.0'
  gccosandvers=''
  intsize=4
  longsize=8
  ptrsize=8
  doublesize=8
  byteorder=12345678
  doublekind=3
  d_longlong=define
  longlongsize=8
  d_longdbl=define
  longdblsize=16
  longdblkind=3
  ivtype='long'
  ivsize=8
  nvtype='double'
  nvsize=8
  Off_t='off_t'
  lseeksize=8
  alignbytes=8
  prototype=define
  Linker and Libraries:
  ld='gcc'
  ldflags =' -m64 -fstack-protector-strong -L/usr/gnu/lib '
  libpth=/usr/gcc/6/lib /usr/lib /usr/gnu/lib /usr/ccs/lib
  libs=-lpthread -lsocket -lnsl -lgdbm -ldb -ldl -lm -lc
  perllibs=-lpthread -lsocket -lnsl -ldl -lm -lc
  libc=/lib/libc.so
  so=so
  useshrplib=true
  libperl=libperl.so
  gnulibc_version=''
  Dynamic Linking:
  dlsrc=dl_dlopen.xs
  dlext=so
  d_dlsymun=undef
  ccdlflags=' -R /home/cpan/bin/perl-blead/lib/5.28.0/i86pc-solaris-64/CORE'
  cccdlflags='-fPIC'
  lddlflags=' -shared -m64 -L/usr/gnu/lib -fstack-protector-strong'


@INC for perl 5.28.0:
  /home/cpan/bin/perl-blead/lib/site_perl/5.28.0/i86pc-solaris-64
  /home/cpan/bin/perl-blead/lib/site_perl/5.28.0
  /home/cpan/bin/perl-blead/lib/5.28.0/i86pc-solaris-64
  /home/cpan/bin/perl-blead/lib/5.28.0


Environment for perl 5.28.0:
  HOME=/home/cpan
  LANG=en_US
  LANGUAGE (unset)
  LC_ALL=C
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/cpan/bin/perl-blead/bin:/home/cpan/bin:/usr/bin:/usr/sbin:/sbin:/usr/gnu/bin
  PERL_BADLANG (unset)
  SHELL=/usr/bin/bash

@p5pRT p5pRT commented Apr 27, 2018

From @khwilliamson

On 04/26/2018 10:07 PM, Carlos Guevara (via RT) wrote:

# New Ticket Created by Carlos Guevara
# Please include the string: [perl #133157]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133157 >

This is a bug report for perl from "Carlos Guevara" <carlos@carlosguevara.com>,
generated with the help of perlbug 1.41 running under perl 5.28.0.

-----------------------------------------------------------------
Solaris is failing some locale tests:
http://perl5.test-smoke.org/report/65421

The problem is that these locales have a multi-byte UTF-8 decimal radix
character, and it appears that the OS doesn't properly handle this case.
A very similar issue was present in cygwin until we reported it to them,
and they have since fixed it. Attached is a short C program to verify
that it's an OS problem.

In the meantime, solaris smokes are failing. I've made this a 5.28
blocker. I have patches that skip or todo the failing tests. But I'll
wait until Carlos runs the program.

Also, this is openindiana solaris. I have no idea if Oracle solaris has
this issue. Oracle's bug tracker is not open to the public, which I find
astonishing and disconcerting. I did not find this issue in the
openindiana list.
-----------------------------------------------------------------


@p5pRT p5pRT commented Apr 27, 2018

From @khwilliamson

#include <stdio.h>
#include <locale.h>

int
main(int argc, char ** argv)
{
    char buf[100];
    unsigned int i;

    printf("%s\n", setlocale(LC_ALL, "ar_AE.UTF8"));
    snprintf(buf, sizeof(buf), "%g", 3.2);

    for (i = 0; i < sizeof(buf); i++) {
        if (buf[i] == '\0') break;
        printf(" %x", buf[i]);
    }
    printf("\n");
}

@p5pRT p5pRT commented Apr 27, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Apr 28, 2018

From carlos@carlosguevara.com

Revised radix.c:
#####
#include <stdio.h>
#include <locale.h>

int
main(int argc, char ** argv)
{
    unsigned char buf[100];
    unsigned int i;

    printf("%s\n", setlocale(LC_ALL, "ar_AE.UTF-8"));
    snprintf(buf, sizeof(buf), "%g", 3.2);

    for (i = 0; i < sizeof(buf); i++) {
        if (buf[i] == '\0') break;
        printf(" %x", buf[i]);
    }
    printf("\n");
}
#####

Output:
#####
ar_AE.UTF-8
33 d9 32
#####

@p5pRT p5pRT commented Apr 28, 2018

From @khwilliamson

On 04/27/2018 09:56 PM, Carlos Guevara wrote:

Revised radix.c:
#####
#include <stdio.h>
#include <locale.h>

int
main(int argc, char ** argv)
{
    unsigned char buf[100];
    unsigned int i;

    printf("%s\n", setlocale(LC_ALL, "ar_AE.UTF-8"));
    snprintf(buf, sizeof(buf), "%g", 3.2);

    for (i = 0; i < sizeof(buf); i++) {
        if (buf[i] == '\0') break;
        printf(" %x", buf[i]);
    }
    printf("\n");
}
#####

Output:
#####
ar_AE.UTF-8
33 d9 32
#####

That should instead have been 33 d9 ab 32, which indicates that the
problem is indeed with the OS. My guess is that it doesn't consider the
possibility of a multi-byte radix character, so it uses just the first
byte. But \xd9 is the start byte of a two-byte sequence, so this leads
to malformed UTF-8.

I'll submit a trouble ticket for them.

@p5pRT p5pRT commented Apr 28, 2018

From @khwilliamson

On 04/27/2018 10:24 PM, Karl Williamson wrote:

I'll submit a trouble ticket for them.

Now done as https://www.illumos.org/issues/9511

@p5pRT p5pRT commented Apr 28, 2018

From @khwilliamson

On 04/28/2018 10:02 AM, Karl Williamson wrote:

Now done as https://www.illumos.org/issues/9511

Attached are three patches that cause these tests to pass on solaris. A
version specification should probably be added to the one for
t/run/locale.t. But there are complications that I don't know how to
deal with. I don't know the version spec to use for openindiana which I
understand has a different kind of release deal. And I don't know if
this is a bug in the Oracle solaris, which has a very different version
number.

I think these patches, after the versioning is ironed out, should go in
5.28, so that this platform passes the test suite. These affect only
two .t files.

@p5pRT p5pRT commented Apr 28, 2018

From @khwilliamson

0002-t-run-locale.t-Skip-some-Solaris-locales.patch
From bd0d1ba4062ea201cb26e4d5690e76f00a3f9287 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Thu, 19 Apr 2018 14:43:43 -0600
Subject: [PATCH 2/4] t/run/locale.t: Skip some Solaris locales

Solaris is buggy in dealing with locales that have a multi-byte UTF-8
decimal radix character.  Skip using these, like we do on cygwin, which
has a similar problem.
---
 t/run/locale.t | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/t/run/locale.t b/t/run/locale.t
index 13bc25d7a7..282fbb5f86 100644
--- a/t/run/locale.t
+++ b/t/run/locale.t
@@ -88,6 +88,13 @@ if ($non_C_locale) {
         @test_numeric_locales = grep { $_ !~ m/ps_AF/i } @test_numeric_locales;
     }
 
+    # Similarly the arabic locales on solaris don't work right on the
+    # multi-byte radix character, generating malformed UTF-8.
+    if ($^O eq 'solaris') {
+        @test_numeric_locales = grep { $_ !~ m/ ^ ( ar_ | pa_ ) /x }
+                                                        @test_numeric_locales;
+    }
+
     fresh_perl_is("for (qw(@test_numeric_locales)) {\n" . <<'EOF',
         use POSIX qw(locale_h);
         use locale;
-- 
2.11.0

@p5pRT p5pRT commented Apr 28, 2018

From @khwilliamson

0003-lib-locale.t-Mark-a-test-problematic.patch
From 57e2dd7f14b426e28eab0b11640ba1b921daf080 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 28 Apr 2018 10:16:08 -0600
Subject: [PATCH 3/4] lib/locale.t: Mark a test problematic

We now have found a system that fails this test.  Tests that are listed
as problematic automatically get marked as TODO when they fail on
specified platforms.  The next commit will specify the platform that
this fails on.
---
 lib/locale.t | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/locale.t b/lib/locale.t
index 85843acae7..638e21cff0 100644
--- a/lib/locale.t
+++ b/lib/locale.t
@@ -2237,6 +2237,7 @@ foreach my $Locale (@Locale) {
 
     report_result($Locale, ++$locales_test_number, $ok15);
     $test_names{$locales_test_number} = 'Verify that a number with a UTF-8 radix has a UTF-8 stringification';
+    $problematical_tests{$locales_test_number} = 1;
 
     report_result($Locale, ++$locales_test_number, $ok16);
     $test_names{$locales_test_number} = 'Verify that a sprintf of a number with a UTF-8 radix yields UTF-8';
-- 
2.11.0

@p5pRT p5pRT commented Apr 28, 2018

From @khwilliamson

0004-lib-locale.t-TODO-some-locales-on-Solaris.patch
From 54749c361a30cfad35542ed0841956477ae3fa32 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 28 Apr 2018 10:18:05 -0600
Subject: [PATCH 4/4] lib/locale.t: TODO some locales on Solaris

There is a bug in Solaris with locales which have a multi-byte decimal
radix character.  Make these TODO, like we do cygwin, which has had a
similar problem.
---
 lib/locale.t | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/locale.t b/lib/locale.t
index 638e21cff0..17931c894d 100644
--- a/lib/locale.t
+++ b/lib/locale.t
@@ -78,6 +78,11 @@ my %known_bad_locales = (
                           darwin => qr/ ^ lt_LT.ISO8859 /ix,
                           os390 => qr/ ^ italian /ix,
                           netbsd => qr/\bISO8859-2\b/i,
+
+                          # This may be the same bug as the cygwin below; it's
+                          # generating malformed UTF-8 on the radix being
+                          # multi-byte
+                          solaris => qr/ ^ ( ar_ | pa_ ) /x,
                         );
 
 # cygwin isn't returning proper radix length in this locale, but supposedly to
-- 
2.11.0

@p5pRT p5pRT commented Apr 30, 2018

From @xsawyerx

On 04/28/2018 07:24 PM, Karl Williamson wrote:

Attached are three patches that cause these tests to pass on solaris.
A version specification should probably be added to the one for
t/run/locale.t.  But there are complications that I don't know how to
deal with.  I don't know the version spec to use for openindiana which
I understand has a different kind of release deal.  And I don't know
if this is a bug in the Oracle solaris, which has a very different
version number.

I think these patches, after the versioning is ironed out, should go
in 5.28, so that this platform passes the test suite.  These affect
only two .t files.

I'd like one of the committers to approve this before it is merged to
blead. Dave, Tony, Yves, Zefram, etc.?

@p5pRT p5pRT commented May 1, 2018

From @iabyn

On Mon, Apr 30, 2018 at 11:50:45PM +0300, Sawyer X wrote:

On 04/28/2018 07:24 PM, Karl Williamson wrote:

Attached are three patches that cause these tests to pass on solaris. 
A version specification should probably be added to the one for
t/run/locale.t.  But there are complications that I don't know how to
deal with.  I don't know the version spec to use for openindiana which
I understand has a different kind of release deal.  And I don't know
if this is a bug in the Oracle solaris, which has a very different
version number.

I think these patches, after the versioning is ironed out, should go
in 5.28, so that this platform passes the test suite.  These affect
only two .t files.

I'd like one of the committers to approve this before it is merged to
blead. Dave, Tony, Yves, Zefram, etc.?

I approve, and have just merged them, as

  v5.27.11-26-ge3e8c0d65c
  v5.27.11-27-ga6bc52d6f4
  v5.27.11-28-gb974d2c0b3

As regards the specifics of openindiana id and versions, that can
always be added later if we obtain that info.

--
Modern art:
  "That's easy, I could have done that!"
  "Ah, but you didn't!"

@p5pRT p5pRT commented May 1, 2018

@iabyn - Status changed from 'open' to 'resolved'
