[PATCH] I18N::Langinfo doesn't set UTF8 flag #15131
From @ntyni
This is a bug report for perl from Niko Tyni <ntyni@debian.org>,
I18N::Langinfo::langinfo() can return UTF-8 strings but doesn't set
LC_ALL=fr_FR.UTF-8 perl -MDevel::Peek -MI18N::Langinfo=langinfo,MON_12 -e 'Dump langinfo(MON_12())'
The attached somewhat clumsy set of two patches fixes this, but perhaps
(This was prompted by Time-Format test suite starting to fail on Perl
Site configuration information for perl 5.23.7:
Configured by niko at Sat Jan 16 13:32:56 EET 2016.
Summary of my perl5 (revision 5 version 23 subversion 7) configuration:
From @ntyni
0001-Make-__is_cur_LC_category_utf8-visible-to-I18N-Langi.patch
From b8ea0feef314dd5432298db82d9f2b8afed1442f Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Sat, 16 Jan 2016 10:41:29 +0200
Subject: [PATCH 1/2] Make __is_cur_LC_category_utf8() visible to
I18N::Langinfo
PERL_EXT_LANGINFO is to be used by ext/I18N-Langinfo/Langinfo.xs
in a future commit.
---
embed.fnc | 2 +-
embed.h | 4 ++--
proto.h | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/embed.fnc b/embed.fnc
index effe0ad..f840ecd 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2495,7 +2495,7 @@ s |char* |stdize_locale |NN char* locs
#endif
#if defined(USE_LOCALE) \
- && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+ && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
ApM |bool |_is_cur_LC_category_utf8|int category
# ifdef DEBUGGING
AMnPpR |char * |_setlocale_debug_string|const int category \
diff --git a/embed.h b/embed.h
index 73c02d2..7ce8691 100644
--- a/embed.h
+++ b/embed.h
@@ -784,7 +784,7 @@
#if defined(DEBUGGING)
#define pad_setsv(a,b) Perl_pad_setsv(aTHX_ a,b)
#define pad_sv(a) Perl_pad_sv(aTHX_ a)
-# if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+# if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
#define _setlocale_debug_string Perl__setlocale_debug_string
# endif
#endif
@@ -860,7 +860,7 @@
#define sv_dup(a,b) Perl_sv_dup(aTHX_ a,b)
#define sv_dup_inc(a,b) Perl_sv_dup_inc(aTHX_ a,b)
#endif
-#if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+#if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
#define _is_cur_LC_category_utf8(a) Perl__is_cur_LC_category_utf8(aTHX_ a)
#endif
#if defined(USE_LOCALE_COLLATE)
diff --git a/proto.h b/proto.h
index 1bbdace..8087c2b 100644
--- a/proto.h
+++ b/proto.h
@@ -3829,7 +3829,7 @@ STATIC int S_tokereport(pTHX_ I32 rv, const YYSTYPE* lvalp);
#define PERL_ARGS_ASSERT_TOKEREPORT \
assert(lvalp)
# endif
-# if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+# if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
PERL_CALLCONV char * Perl__setlocale_debug_string(const int category, const char* const locale, const char* const retval)
__attribute__warn_unused_result__
__attribute__pure__;
@@ -5505,7 +5505,7 @@ PERL_CALLCONV SV* Perl_sv_dup_inc(pTHX_ const SV *const sstr, CLONE_PARAMS *cons
assert(param)
#endif
-#if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+#if defined(USE_LOCALE) && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
PERL_CALLCONV bool Perl__is_cur_LC_category_utf8(pTHX_ int category);
#endif
#if defined(USE_LOCALE) && defined(PERL_IN_LOCALE_C)
--
2.6.4
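The gating that patch 1/2 extends can be pictured with a small standalone C sketch. This is an illustration only: every `DEMO_`/`demo_` name is a hypothetical stand-in, the "header" is simulated inside the same file, and the function body is a dummy rather than Perl's locale check. It only shows how defining an opt-in macro before the header makes a conditionally declared function visible.

```c
#include <assert.h>
#include <stdbool.h>

/* The extension opts in before the gated header is processed, the way
 * Langinfo.xs defines PERL_EXT_LANGINFO ahead of including perl.h. */
#define DEMO_EXT_LANGINFO

/* --- simulated core header (hypothetical names) --- */
#if defined(DEMO_IN_LOCALE_C) || defined(DEMO_EXT_POSIX) \
    || defined(DEMO_EXT_LANGINFO)
#  define DEMO_HAVE_UTF8_CHECK 1
bool demo_is_cur_category_utf8(int category);
#endif
/* --- end of simulated header --- */

/* Stand-in body; it always answers "no" and does not inspect the locale. */
bool demo_is_cur_category_utf8(int category)
{
    (void)category;
    return false;
}

/* Returns 1 when the gated declaration was visible to this file, else 0. */
int demo_declaration_visible(void)
{
#ifdef DEMO_HAVE_UTF8_CHECK
    return 1;
#else
    return 0;
#endif
}
```

Removing the `#define DEMO_EXT_LANGINFO` line makes `demo_declaration_visible()` return 0, which mirrors an extension that has not opted in.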
From @ntyni
0002-langinfo-Set-UTF-8-flag-appropriately-on-return.patch
From 67271c83612f3ab129c8326d07ca55104a2f23f8 Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Fri, 15 Jan 2016 21:20:24 +0200
Subject: [PATCH 2/2] langinfo: Set UTF-8 flag appropriately on return
nl_langinfo(3) can return non-ASCII strings in some locales. Set the
UTF-8 flag if this is the case and LC_TIME is set to a UTF8 locale.
Largely inspired by commit 9717af6d049902fc887c.
Bug-Debian: https://bugs.debian.org/811109
---
MANIFEST | 1 +
ext/I18N-Langinfo/Langinfo.pm | 2 +-
ext/I18N-Langinfo/Langinfo.xs | 17 ++++++++++++++++-
ext/I18N-Langinfo/t/utf8.t | 30 ++++++++++++++++++++++++++++++
4 files changed, 48 insertions(+), 2 deletions(-)
create mode 100644 ext/I18N-Langinfo/t/utf8.t
diff --git a/MANIFEST b/MANIFEST
index e75199d..2da10cc 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -3727,6 +3727,7 @@ ext/I18N-Langinfo/Langinfo.pm I18N::Langinfo
ext/I18N-Langinfo/Langinfo.xs I18N::Langinfo
ext/I18N-Langinfo/Makefile.PL I18N::Langinfo
ext/I18N-Langinfo/t/Langinfo.t See whether I18N::Langinfo works
+ext/I18N-Langinfo/t/utf8.t Test I18N::Langinfo utf8 handling
ext/IPC-Open3/lib/IPC/Open2.pm Open a two-ended pipe
ext/IPC-Open3/lib/IPC/Open3.pm Open a three-ended pipe
ext/IPC-Open3/t/fd.t See if IPC::Open3 works w/ file descriptors
diff --git a/ext/I18N-Langinfo/Langinfo.pm b/ext/I18N-Langinfo/Langinfo.pm
index 033d8de..e922035 100644
--- a/ext/I18N-Langinfo/Langinfo.pm
+++ b/ext/I18N-Langinfo/Langinfo.pm
@@ -72,7 +72,7 @@ our @EXPORT_OK = qw(
YESSTR
);
-our $VERSION = '0.13';
+our $VERSION = '0.14';
XSLoader::load();
diff --git a/ext/I18N-Langinfo/Langinfo.xs b/ext/I18N-Langinfo/Langinfo.xs
index 582b7fa..3910d6b 100644
--- a/ext/I18N-Langinfo/Langinfo.xs
+++ b/ext/I18N-Langinfo/Langinfo.xs
@@ -1,5 +1,8 @@
#define PERL_NO_GET_CONTEXT
+/* for _is_cur_LC_category_utf8() */
+#define PERL_EXT_LANGINFO
+
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
@@ -23,11 +26,23 @@ langinfo(code)
PROTOTYPE: _
CODE:
#ifdef HAS_NL_LANGINFO
+ char *buf;
+ STRLEN len;
+ SV *sv;
if (code < 0) {
SETERRNO(EINVAL, LIB_INVARG);
RETVAL = &PL_sv_undef;
} else {
- RETVAL = newSVpv(nl_langinfo(code), 0);
+ buf = nl_langinfo(code);
+ len = strlen(buf);
+ sv = newSVpv(buf, 0);
+ if (! is_ascii_string((U8*) buf, len)
+ && is_utf8_string((U8*) buf, len)
+ && _is_cur_LC_category_utf8(LC_TIME))
+ {
+ SvUTF8_on(sv);
+ }
+ RETVAL = sv;
}
#else
croak("nl_langinfo() not implemented on this architecture");
diff --git a/ext/I18N-Langinfo/t/utf8.t b/ext/I18N-Langinfo/t/utf8.t
new file mode 100644
index 0000000..2773a2e
--- /dev/null
+++ b/ext/I18N-Langinfo/t/utf8.t
@@ -0,0 +1,30 @@
+#!perl -T
+use strict;
+use Config;
+use Test::More;
+
+plan skip_all => "I18N::Langinfo or POSIX unavailable"
+ if $Config{'extensions'} !~ m!\bI18N/Langinfo\b!
+ or $Config{'extensions'} !~ m!\bPOSIX\b!;
+
+plan skip_all => "setlocale unavailable"
+ if $Config{'d_setlocale'} ne 'define';
+
+require POSIX;
+
+plan skip_all => "fr_FR.UTF-8 locale unavailable"
+ if !POSIX::setlocale(POSIX::LC_TIME(), 'fr_FR.UTF-8');
+
+plan tests => 5;
+
+use_ok('I18N::Langinfo', 'langinfo', 'MON_12');
+
+SKIP: {
+ my $string = eval { langinfo(MON_12()) };
+ is( $@, '', "calling langinfo() with MON_12" );
+ skip "returned string was empty, skipping next two tests", 2 unless $string;
+ ok( defined $string, "checking if the returned string is defined" );
+ cmp_ok( length($string), '>=', 1, "checking if the returned string has a positive length" );
+ ok($string =~ / ^ \p{ASCII}+ $ /x || utf8::is_utf8($string), "'$string' is pure ASCII or an UTF8 string");
+}
+
--
2.6.4
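The decision made in the Langinfo.xs hunk (flag the string only if it is non-ASCII, well-formed UTF-8, and the locale is a UTF-8 one) can be sketched outside of Perl as plain C. The helpers below are hypothetical stand-ins, not Perl's `is_ascii_string()` or `is_utf8_string()`: the validity check here follows strict Unicode rules (no overlongs, no surrogates, nothing above U+10FFFF), and the locale test is reduced to a boolean parameter supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if every byte is 7-bit ASCII. */
static bool demo_is_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x80) return false;
    return true;
}

/* True if the bytes form well-formed UTF-8 (strict Unicode rules). */
static bool demo_is_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;                      /* continuation bytes expected */
        if (c < 0x80) need = 0;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) need = 2;
        else if (c >= 0xF0 && c <= 0xF4) need = 3;
        else return false;                /* stray continuation or bad lead */
        if (len - i - 1 < need) return false;   /* truncated sequence */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80) return false;
        /* Reject overlong forms, surrogates, and code points > U+10FFFF. */
        if (c == 0xE0 && s[i + 1] < 0xA0) return false;
        if (c == 0xED && s[i + 1] > 0x9F) return false;
        if (c == 0xF0 && s[i + 1] < 0x90) return false;
        if (c == 0xF4 && s[i + 1] > 0x8F) return false;
        i += need + 1;
    }
    return true;
}

/* Mirrors the patch's three-way test; locale_is_utf8 is supplied by the
 * caller instead of being derived from the current locale. */
static bool demo_should_flag_utf8(const char *buf, bool locale_is_utf8)
{
    assert(buf != NULL);
    const unsigned char *s = (const unsigned char *)buf;
    size_t len = strlen(buf);
    return !demo_is_ascii(s, len) && demo_is_utf8(s, len) && locale_is_utf8;
}
```

With these helpers, the UTF-8 bytes for "décembre" are flagged only when the caller reports a UTF-8 locale, while "December" and the Latin-1 encoding of "décembre" are never flagged.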
From @khwilliamson
On 01/16/2016 05:15 AM, Niko Tyni (via RT) wrote:
I'm not sure about the best way to go about this. I'm still reluctant
(In pure perl, within the scope of 'use locale', you can do fc(\xdf) and
So if someone has an opinion about this, chime in.
Here's a couple problems with your patch.
It is possible for the various locale categories to be in different
For the .t file, looking in Configure for whether setlocale and POSIX,
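The point about locale categories matters because the patch always consults LC_TIME, while nl_langinfo(3) items belong to several categories. A rough C sketch of picking the category per item follows. The function name is hypothetical, only the items defined by POSIX are covered, and treating every remaining item as LC_TIME is a simplifying assumption (platform-specific items may belong elsewhere).

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Maps a POSIX nl_langinfo() item to the locale category that governs it.
 * Assumption: anything not listed explicitly is a date/time item. */
static int demo_category_for_item(nl_item item)
{
    if (item == CODESET)
        return LC_CTYPE;
    if (item == RADIXCHAR || item == THOUSEP)
        return LC_NUMERIC;
    if (item == CRNCYSTR)
        return LC_MONETARY;
    if (item == YESEXPR || item == NOEXPR)
        return LC_MESSAGES;
    return LC_TIME;   /* DAY_*, MON_*, D_T_FMT, ERA, and similar */
}
```

A caller could then test whether that particular category is in a UTF-8 locale before deciding how to treat the returned bytes, for example `demo_category_for_item(MON_12)` yields `LC_TIME` and `demo_category_for_item(RADIXCHAR)` yields `LC_NUMERIC`.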
The RT System itself - Status changed from 'new' to 'open'
From @khwilliamson
Although these patches were not used directly, this bug got fixed as part of the general overhaul of locale handling in 5.28. And there are tests for this in ext/I18N-Langinfo/t/Langinfo.t, where it goes through the UTF-8 locales, and looks for a non-ASCII time-related unit. It checks that the returned string for that is flagged as UTF-8, and stops looking for more
Karl Williamson
@khwilliamson - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#127288 (status was 'resolved')
Searchable as RT127288