5.20 regression: '"X" !~ /[x]/i', when pattern is UTF-8 #14051
From @khwilliamson
This is a bug report for perl from khw@khw.(none), A regression was introduced by this: Convert more EXACTFish nodes to EXACT when possible Under /i matching, many characters match only themselves, such a This changes the alloc_maybe_populate() function to look for A patch is smoking. Flags: Site configuration information for perl 5.21.4: Configured by khw at Sat Aug 30 06:56:50 MDT 2014. Summary of my perl5 (revision 5 version 21 subversion 4) configuration: @INC for perl 5.21.4: Environment for perl 5.21.4: PATH=/home/khw/bin:/home/khw/perl5/perlbrew/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/usr/local/games:/home/khw/iands/www:/home/khw/cxoffice/bin
From @khwilliamson
I stumbled across this problem today. Since this is a 5.20 regression, it should go into a 5.20 maint release. Is it worth delaying 5.20.1 for? The patch commitdiff is The problem is that I should have been testing whether the character is part of a fold pair, instead of whether its fold differs from itself. The latter test only works on uppercase input.
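The difference between the two tests can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code in regcomp.c: `fold_of()` is a hypothetical stand-in for what `toFOLD()` does on ASCII input (fold to lowercase), and `fold_partner()` is a hypothetical stand-in for a `PL_fold[]`-style lookup that maps each ASCII letter to its other-case counterpart.

```c
#include <assert.h>
#include <ctype.h>

/* Assumption: stand-in for toFOLD() on ASCII input, which folds to lowercase. */
static unsigned char fold_of(unsigned char c)
{
    return (unsigned char) tolower(c);
}

/* Assumption: stand-in for a PL_fold[]-style lookup on ASCII input.
 * A letter maps to its other-case partner; anything else maps to itself. */
static unsigned char fold_partner(unsigned char c)
{
    if (islower(c)) return (unsigned char) toupper(c);
    if (isupper(c)) return (unsigned char) tolower(c);
    return c;
}

/* Old test: "the fold of this character is the character itself". */
static int old_is_nonfolding(unsigned char c)
{
    return fold_of(c) == c;
}

/* Fixed test: "this character is not part of a fold pair". */
static int new_is_nonfolding(unsigned char c)
{
    return fold_partner(c) == c;
}

/* Hypothetical self-check; call it from a main() to run the comparisons. */
void self_check(void)
{
    assert(!old_is_nonfolding('X'));  /* uppercase: old test gets it right    */
    assert(old_is_nonfolding('x'));   /* lowercase: old test wrongly says yes */
    assert(!new_is_nonfolding('X'));  /* fixed test: 'X' is in a fold pair    */
    assert(!new_is_nonfolding('x'));  /* fixed test: so is 'x'                */
    assert(new_is_nonfolding('1'));   /* a digit has no fold partner          */
}
```

Under these assumptions the old test treats a lowercase 'x' as non-folding, which is what would let the node be downgraded to EXACT and stop matching "X".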
The RT System itself - Status changed from 'new' to 'open'
From @jkeenan
On Sat Aug 30 10:34:24 2014, khw wrote:
Regression confirmed:
#####
$ perl 122655-regex.pl
$ perl 122655-regex.pl
From @jkeenan
On Sat Aug 30 10:34:24 2014, khw wrote:
Since, as I understand it, the main purpose of maintenance releases is to correct regressions that have crept into a major release, I vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not doing the release, so I don't know how much this burdens the Release Manager.) Thank you very much.
From @jkeenan
On 8/30/14 3:28 PM, James E Keenan via RT wrote:
I checked out the maint-5.20 branch, then checked out a new branch on I'm attaching the patch as it would apply to maint-5.20 branch. Since Thank you very much.
From @jkeenan
122655-Correct-5.20-regression-X-x-i.maint.5.20.patch
From 1081e0b54d547a5e2a62087689194e5c5f005876 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 30 Aug 2014 16:08:58 -0400
Subject: [PATCH] Correct 5.20 regression: '"X" !~ /[x]/i'
This problem occurs only when the pattern is UTF-8 and contains a single ASCII
lowercase letter. It does not match its uppercase counterpart.
For RT #122655
---
regcomp.c | 14 ++++++++++----
t/re/pat.t | 8 +++++++-
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index 8d4ebda..e991999 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -10976,10 +10976,16 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t *pRExC_state,
EBCDIC, but it works there, as the extra invariants
fold to themselves) */
*character = toFOLD((U8) code_point);
- if (downgradable
- && *character == code_point
- && ! HAS_NONLATIN1_FOLD_CLOSURE(code_point))
- {
+
+ /* We can downgrade to an EXACT node if this character
+ * isn't a folding one. Note that this assumes that
+ * nothing above Latin1 folds to some other invariant than
+ * one of these alphabetics; otherwise we would also have
+ * to check:
+ * && (! HAS_NONLATIN1_FOLD_CLOSURE(code_point)
+ * || ASCII_FOLD_RESTRICTED))
+ */
+ if (downgradable && PL_fold[code_point] == code_point) {
OP(node) = EXACT;
}
}
diff --git a/t/re/pat.t b/t/re/pat.t
index 04f8b84..51838f9 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
require './test.pl';
}
-plan tests => 721; # Update this when adding/deleting tests.
+plan tests => 722; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1582,6 +1582,12 @@ EOP
+ { # Was getting optimized into EXACT (non-folding node)
+ my $x = qr/[x]/i;
+ utf8::upgrade($x);
+ like("X", qr/$x/, "UTF-8 of /[x]/i matches upper case");
+ }
+
} # End of sub run_tests
1;
--
1.6.3.2
From @cpansprout
On Sat Aug 30 17:26:16 2014, jkeen@verizon.net wrote:
I would have to leave that to someone who knows the regexp engine better. But backporting gets my +1. (Whether it should delay 5.20.1 or go into 5.20.2 I decline to opine [hey, that rhymes!].)
--
Father Chrysostomos
From @khwilliamson
Now in blead as b6e093f
@khwilliamson - Status changed from 'open' to 'pending release'
From @khwilliamson
On Sat Aug 30 17:26:16 2014, jkeen@verizon.net wrote:
Your maint patch looks good to me.
From @jkeenan
On Mon Sep 01 08:04:18 2014, khw wrote:
I posted a request for it to go into 5.20.1 in one of the relevant perl5-porters mailing list threads.
From @steve-m-hay
On 1 Sep 2014 20:59, "James E Keenan via RT" <perlbug-followup@perl.org>
Thanks guys. I will roll out an RC2 with this when I return from vacation.
From @steve-m-hay
On Mon Sep 01 23:13:19 2014, shay wrote:
One day late, but now applied in commit db6d387.
From @khwilliamson
Thanks for submitting this ticket. The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket.
@khwilliamson - Status changed from 'pending release' to 'resolved'
Migrated from rt.perl.org#122655 (status was 'resolved')
Searchable as RT122655$