5.20 regression: '"X" !~ /[x]/i', when pattern is UTF-8 #14051
Comments
From @khwilliamson
This is a bug report for perl from khw@khw.(none),
A regression was introduced by this:
Convert more EXACTFish nodes to EXACT when possible
Under /i matching, many characters match only themselves, such a
This changes the alloc_maybe_populate() function to look for
A patch is smoking.
Flags:
Site configuration information for perl 5.21.4:
Configured by khw at Sat Aug 30 06:56:50 MDT 2014.
Summary of my perl5 (revision 5 version 21 subversion 4) configuration:
@INC for perl 5.21.4:
Environment for perl 5.21.4:
PATH=/home/khw/bin:/home/khw/perl5/perlbrew/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/usr/local/games:/home/khw/iands/www:/home/khw/cxoffice/bin |
From @khwilliamson
I stumbled across this problem today. Since this is a 5.20 regression, it should go into a 5.20 maint release. Is it worth delaying 5.20.1 for?
The patch commitdiff is
The problem is that I should have been testing whether the character is part of a fold pair, instead of whether its fold differs from itself. The latter test only works on uppercase input. |
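The difference between the two tests can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in regcomp.c: `fold_lower()` and `swap_case()` are hypothetical stand-ins for what `toFOLD()` and `PL_fold[]` do for ASCII input (as far as can be read from the patch), and the arithmetic assumes an ASCII platform.

```c
#include <stdbool.h>

/* Hypothetical stand-in for toFOLD(): an ASCII uppercase letter maps to
 * its lowercase form; every other byte maps to itself. */
unsigned char fold_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical stand-in for PL_fold[]: an ASCII letter maps to its
 * other-case partner; every other byte maps to itself. */
unsigned char swap_case(unsigned char c)
{
    if (c >= 'a' && c <= 'z') return (unsigned char)(c - 'a' + 'A');
    if (c >= 'A' && c <= 'Z') return (unsigned char)(c - 'A' + 'a');
    return c;
}

/* Old test: "the fold of the character equals the character".
 * True for 'x' as well, which is what wrongly allowed the downgrade. */
bool old_test_allows_exact(unsigned char c)
{
    return fold_lower(c) == c;
}

/* New test: "the character is not part of a fold pair".
 * False for both 'x' and 'X', so neither is downgraded to EXACT. */
bool new_test_allows_exact(unsigned char c)
{
    return swap_case(c) == c;
}
```

With these stand-ins, the old test accepts lowercase 'x' as safe to downgrade even though 'X' must still match it under /i, while the new test rejects both letters and still accepts a caseless character such as '1'.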
The RT System itself - Status changed from 'new' to 'open' |
From @jkeenan
On Sat Aug 30 10:34:24 2014, khw wrote:
Regression confirmed:
#####
$ perl 122655-regex.pl
$ perl 122655-regex.pl |
From @jkeenan
On Sat Aug 30 10:34:24 2014, khw wrote:
Since, as I understand it, the main purpose of maintenance releases is to correct regressions that have crept into a major release, I vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not doing the release, so I don't know how much this burdens the Release Manager.)
Thank you very much. |
From @jkeenan
On 8/30/14 3:28 PM, James E Keenan via RT wrote:
I checked out the maint-5.20 branch, then checked out a new branch on
I'm attaching the patch as it would apply to maint-5.20 branch. Since
Thank you very much. |
From @jkeenan
122655-Correct-5.20-regression-X-x-i.maint.5.20.patch
From 1081e0b54d547a5e2a62087689194e5c5f005876 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 30 Aug 2014 16:08:58 -0400
Subject: [PATCH] Correct 5.20 regression: '"X" !~ /[x]/i'
This problem occurs only when the pattern is UTF-8, contains a single ASCII
lowercase letter. It does not match its uppercase counterpart.
For RT #122655
---
regcomp.c | 14 ++++++++++----
t/re/pat.t | 8 +++++++-
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index 8d4ebda..e991999 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -10976,10 +10976,16 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t *pRExC_state,
EBCDIC, but it works there, as the extra invariants
fold to themselves) */
*character = toFOLD((U8) code_point);
- if (downgradable
- && *character == code_point
- && ! HAS_NONLATIN1_FOLD_CLOSURE(code_point))
- {
+
+ /* We can downgrade to an EXACT node if this character
+ * isn't a folding one. Note that this assumes that
+ * nothing above Latin1 folds to some other invariant than
+ * one of these alphabetics; otherwise we would also have
+ * to check:
+ * && (! HAS_NONLATIN1_FOLD_CLOSURE(code_point)
+ * || ASCII_FOLD_RESTRICTED))
+ */
+ if (downgradable && PL_fold[code_point] == code_point) {
OP(node) = EXACT;
}
}
diff --git a/t/re/pat.t b/t/re/pat.t
index 04f8b84..51838f9 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
require './test.pl';
}
-plan tests => 721; # Update this when adding/deleting tests.
+plan tests => 722; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1582,6 +1582,12 @@ EOP
+ { # Was getting optimized into EXACT (non-folding node)
+ my $x = qr/[x]/i;
+ utf8::upgrade($x);
+ like("X", qr/$x/, "UTF-8 of /[x]/i matches upper case");
+ }
+
} # End of sub run_tests
1;
--
1.6.3.2
|
From @cpansprout
On Sat Aug 30 17:26:16 2014, jkeen@verizon.net wrote:
I would have to leave that to someone who knows the regexp engine better. But backporting gets my +1. (Whether it should delay 5.20.1 or go into 5.20.2 I decline to opine [hey, that rhymes!].)
--
Father Chrysostomos |
From @khwilliamson
Now in blead as b6e093f |
@khwilliamson - Status changed from 'open' to 'pending release' |
From @khwilliamson
On Sat Aug 30 17:26:16 2014, jkeen@verizon.net wrote:
Your maint patch looks good to me |
From @jkeenan
On Mon Sep 01 08:04:18 2014, khw wrote:
I posted a request for it to go into 5.20.1 in one of the relevant perl5-porters mailing list threads.
-- |
From @steve-m-hay
On 1 Sep 2014 20:59, "James E Keenan via RT" <perlbug-followup@perl.org>
Thanks guys. I will roll out an RC2 with this when I return from vacation |
From @steve-m-hay
On Mon Sep 01 23:13:19 2014, shay wrote:
One day late, but now applied in commit db6d387. |
From @khwilliamson
Thanks for submitting this ticket.
The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket.
-- |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#122655 (status was 'resolved')
Searchable as RT122655$