\N{} incompatibility in 5.12+ #10367
From tokuhirom@gpath.example.org
Created by tokuhirom@gpath.example.org
The following one-liner fails with perl 5.12.0:
perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'
Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.
|
From @tokuhirom
Created by tokuhirom@gmail.com
The following one-liner works in perl 5.10.0, but it fails with perl 5.12.0:
% perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT
|
From @khwilliamson
Tokuhiro Matsuno (via RT) wrote:
Thanks for the bug report. I was the one who introduced the bug. I'm
|
The RT System itself - Status changed from 'new' to 'open' |
From @obra
I'm going to hold 5.12.1 RC1 for this. Best, |
From @khwilliamson
Attached is a minimal patch to fix this. There are two other commits
|
From @khwilliamson
0001-Comment-where-to-find-file-s-format.patch
From ce65c312b89d6f851ca46d24719e07bce288ee99 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Sat, 8 May 2010 13:12:53 -0600
Subject: [PATCH] Comment where to find file's format
---
t/re/re_tests | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/t/re/re_tests b/t/re/re_tests
index 1807ffc..b7471d9 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1,5 +1,5 @@
# This stops me getting screenfulls of syntax errors every time I accidentally
-# run this file via a shell glob
+# run this file via a shell glob. Format of this file is given in regexp.t
__END__
abc abc y $& abc
abc abc y $-[0] 0
--
1.5.6.3
|
From @khwilliamson
0002-Note-in-comment-that-many-N-.-tests-won-t-work-h.patch
From 50e44d09a829eed4eeabf9ce78d3374a5f785d4f Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Sat, 8 May 2010 13:38:27 -0600
Subject: [PATCH] Note in comment that many \N{...} tests won't work here
---
t/re/re_tests | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/t/re/re_tests b/t/re/re_tests
index b7471d9..c550b5a 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1,5 +1,7 @@
# This stops me getting screenfulls of syntax errors every time I accidentally
# run this file via a shell glob. Format of this file is given in regexp.t
+# Can't use \N{VALID NAME TEST} here because need 'use charnames'; but can use
+# \N{U+valid} here.
__END__
abc abc y $& abc
abc abc y $-[0] 0
--
1.5.6.3
|
From @khwilliamson
0003-PATCH-perl-74978-dot-after-breaks-N.patch
From 1bb86a94fea493dd6213e60ed8e19b51b8ceea0c Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Sat, 8 May 2010 14:06:10 -0600
Subject: [PATCH] PATCH [perl #74978] dot after } breaks \N{}
The problem is that a dot can come between the braces in \N{foo.bar},
but when searching for it, I didn't stop looking at the right brace, so
it generated an error inappropriately.
This is essentially a minimum patch; efficiency could be improved
slightly with a little more work.
---
regcomp.c | 8 +++-----
t/re/pat.t | 8 +++++++-
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index f665f0b..be5acdb 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -6762,11 +6762,10 @@ S_reg_namedseq(pTHX_ RExC_state_t *pRExC_state, UV *valuep, I32 *flagp)
| PERL_SCAN_DISALLOW_PREFIX
| (SIZE_ONLY ? PERL_SCAN_SILENT_ILLDIGIT : 0);
- char * endchar = strchr(RExC_parse, '.');
- if (endchar) {
+ char * endchar = RExC_parse + strcspn(RExC_parse, ".}");
+ if (endchar < endbrace) {
ckWARNreg(endchar, "Using just the first character returned by \\N{} in character class");
}
- else endchar = endbrace;
length_of_hex = (STRLEN)(endchar - RExC_parse);
*valuep = grok_hex(RExC_parse, &length_of_hex, &flags, NULL);
@@ -6817,8 +6816,7 @@ S_reg_namedseq(pTHX_ RExC_state_t *pRExC_state, UV *valuep, I32 *flagp)
/* Code points are separated by dots. If none, there is only one
* code point, and is terminated by the brace */
- endchar = strchr(RExC_parse, '.');
- if (! endchar) endchar = endbrace;
+ endchar = RExC_parse + strcspn(RExC_parse, ".}");
/* The values are Unicode even on EBCDIC machines */
length_of_hex = (STRLEN)(endchar - RExC_parse);
diff --git a/t/re/pat.t b/t/re/pat.t
index 40ae52e..7b9594c 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -23,7 +23,7 @@ BEGIN {
}
-plan tests => 297; # Update this when adding/deleting tests.
+plan tests => 299; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -987,6 +987,12 @@ sub run_tests {
ok "abbbbc" =~ m/\N{3,4}/ && $& eq "abbb", '"abbbbc" =~ m/\N{3,4}/ && $& eq "abbb"';
}
+ {
+ use charnames ":full";
+ local $Message = '[perl #74982] Period coming after \N{}';
+ ok "\x{ff08}." =~ m/\N{FULLWIDTH LEFT PARENTHESIS}./ && $& eq "\x{ff08}.";
+ ok "\x{ff08}." =~ m/[\N{FULLWIDTH LEFT PARENTHESIS}]./ && $& eq "\x{ff08}.";
+ }
} # End of sub run_tests
--
1.5.6.3
|
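The commit message above attributes the failure to searching for a dot without stopping at the right brace. The following is a minimal standalone sketch of that difference, not code from regcomp.c: the helper names `hex_len_strchr` and `hex_len_strcspn` are hypothetical, and `parse` is assumed to point just past `\N{U+` in the pattern text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the pre-patch logic: look for a '.' anywhere
 * in the rest of the pattern, and fall back to the closing brace only if no
 * dot exists at all. A dot AFTER the brace is therefore picked up by mistake. */
static size_t hex_len_strchr(const char *parse, const char *endbrace)
{
    const char *endchar;
    assert(parse != NULL && endbrace != NULL);
    endchar = strchr(parse, '.');
    if (!endchar)
        endchar = endbrace;
    return (size_t)(endchar - parse);
}

/* Hypothetical helper mirroring the patched logic: stop at whichever of
 * '.' or '}' comes first, so the scan never runs past the closing brace. */
static size_t hex_len_strcspn(const char *parse)
{
    assert(parse != NULL);
    return strcspn(parse, ".}");
}
```

For the text `FF08}.` (what follows `\N{U+` in `m/\N{U+FF08}./`), the `strchr` version reports a length of 5, so the `}` would be handed to the hex parser as if it were a digit, which is consistent with the "Invalid hexadecimal number" error in the report. The `strcspn` version reports 4, covering only `FF08`. For a dotted sequence such as `FF08.FF09}` both versions agree on 4.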
From @obra
On Sat, May 08, 2010 at 02:18:33PM -0600, karl williamson wrote:
Thanks. Applied. +1 to backport the code patch for .1. -Jesse
|
From @xdg
On Sat, May 8, 2010 at 5:34 PM, Jesse Vincent <jesse@fsck.com> wrote:
agreed. +1 to backport
|
@rgs - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#74978 (status was 'resolved')
Searchable as RT74978$