-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regex: \c inside (?[]) causes panics and unexpected behavior #14934
Comments
From victor@drawall.ccCreated by @Grimy* /(?[(\c]) / panics: Read past end of '(?[ ])'. * /(?[\c#])/, /(?[\c[])/, /(?[\c\])/ and /(?[\c]])/ all report a syntax error. * /(?[(\c])/ matches a single "\c]". * /(?[(\c]) ]\b/ behaves like /\c]b/. * /(?[\c[]](])/ behaves like /\c[\]/. * /(?[\c#] All of these bugs were found on the current blead Perl Info
|
From victor@drawall.ccThe attached patch works-for-me™. Should I add new tests checking for |
From @jkeenanOn Wed Sep 30 09:15:21 2015, victor@drawall.cc wrote:
It appears that the patch did not get attached. Can you re-try? Thank you very much. -- |
The RT System itself - Status changed from 'new' to 'open' |
From victor@drawall.ccSorry for the mess-up. Here’s the patch again. I’m also including it From 96cb469f3930a276be8cedac7d23e9a26899be08 Mon Sep 17 00:00:00 2001 The \cX notation for control characters used to cause panics and unexpected The solution is to ignore the byte following \c during the first parsing pass AUTHORS | 1 + Inline Patchdiff --git a/AUTHORS b/AUTHORS
index 451c707..ebd9222 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -1222,6 +1222,7 @@ Unicode Consortium <unicode.org>
Vadim Konovalov <vkonovalov@lucent.com>
Valeriy E. Ushakov <uwe@ptc.spbu.ru>
Vernon Lyon <vlyon@cpan.org>
+Victor Adam <victor@drawall.cc>
Victor Efimov <victor@vsespb.ru>
Viktor Turskyi <koorchik@gmail.com>
Ville Skyttä <scop@cs132170.pp.htv.fi>
diff --git a/regcomp.c b/regcomp.c
index 4f4bb44..a8e80ee 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -13454,6 +13454,10 @@ S_handle_regex_sets(pTHX_ RExC_state_t
|
From victor@drawall.cc0001-regex-handle-cX-inside.patchFrom 96cb469f3930a276be8cedac7d23e9a26899be08 Mon Sep 17 00:00:00 2001
From: Victor Adam <victor@drawall.cc>
Date: Sun, 27 Sep 2015 10:22:08 +0200
Subject: [PATCH] regex: handle \cX inside (?[])
The \cX notation for control characters used to cause panics and unexpected
behavior when used insed an extended character class. See bug #126181.
The solution is to ignore the byte following \c during the first parsing pass
of a (?[]) construct.
---
AUTHORS | 1 +
regcomp.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/AUTHORS b/AUTHORS
index 451c707..ebd9222 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -1222,6 +1222,7 @@ Unicode Consortium <unicode.org>
Vadim Konovalov <vkonovalov@lucent.com>
Valeriy E. Ushakov <uwe@ptc.spbu.ru>
Vernon Lyon <vlyon@cpan.org>
+Victor Adam <victor@drawall.cc>
Victor Efimov <victor@vsespb.ru>
Viktor Turskyi <koorchik@gmail.com>
Ville Skyttä <scop@cs132170.pp.htv.fi>
diff --git a/regcomp.c b/regcomp.c
index 4f4bb44..a8e80ee 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -13454,6 +13454,10 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
* default: case next time and keep on incrementing until
* we find one of the invariants we do handle. */
RExC_parse++;
+ if (*RExC_parse == 'c') {
+ /* Skip the \cX notation for control characters */
+ RExC_parse++;
+ }
break;
case '[':
{
--
2.4.5
|
From @khwilliamsonOn 09/30/2015 02:21 AM, Victor ADAM wrote:
Yes, please. The easiest place to put them is in t/re/re_tests. Your patch looks good to me. I would split the author portion into its Karl Williamson |
From victor@drawall.ccAs discussed, I’ve split my patch in two : 0001*.patch adds my name to |
From victor@drawall.cc0001-AUTHORS-Add-myself.patchFrom 4c62e3b50838f5d7c1f5320046740d1f500b1737 Mon Sep 17 00:00:00 2001
From: Victor Adam <victor@drawall.cc>
Date: Fri, 2 Oct 2015 19:38:25 +0200
Subject: [PATCH 1/3] AUTHORS: Add myself
---
AUTHORS | 1 +
1 file changed, 1 insertion(+)
diff --git a/AUTHORS b/AUTHORS
index 451c707..ebd9222 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -1222,6 +1222,7 @@ Unicode Consortium <unicode.org>
Vadim Konovalov <vkonovalov@lucent.com>
Valeriy E. Ushakov <uwe@ptc.spbu.ru>
Vernon Lyon <vlyon@cpan.org>
+Victor Adam <victor@drawall.cc>
Victor Efimov <victor@vsespb.ru>
Viktor Turskyi <koorchik@gmail.com>
Ville Skyttä <scop@cs132170.pp.htv.fi>
--
2.5.3
|
From victor@drawall.cc0002-test-Add-regex-tests.patchFrom 648d5fc4a052dab62bdcc9f28af234b0ab163063 Mon Sep 17 00:00:00 2001
From: Victor Adam <victor@drawall.cc>
Date: Fri, 2 Oct 2015 19:35:19 +0200
Subject: [PATCH 2/3] test: Add regex tests
This adds tests for RT tickets #126177, #126178, #126179, #126180, #126185,
#126181, #126186, #126187 and #126222.
---
t/re/re_tests | 45 +++++++++++++++++++++++++++++++++++++++++++++
t/re/regex_sets.t | 19 +++++++++++++++++++
t/re/regexp.t | 1 +
3 files changed, 65 insertions(+)
diff --git a/t/re/re_tests b/t/re/re_tests
index 9d5fa73..7d83ee1 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1926,5 +1926,50 @@ A+(*PRUNE)BC(?{}) AAABC y $& AAABC
/(a+){1}+a/ aaa n - - # [perl #125825]
+# RT #126178: these should cause a syntax error
+(?i Bc - Unmatched (
+(?^ Bc - Unmatched (
+(?- Bc - Unmatched (
+(?a-a Bc - Unmatched (
+
+# RT #126177: (?n) should disable numbered captures
+(?n)(abc) abc y -$1 -
+'(?-n)(abc)'n abc y -$1 -abc
+
+# RT #126179: \N{} should cause an error (even when skipping the lexer)
+[\N{}] Bc - Zero length \N{}
+
+# RT #126180: those segfault inside (?[])
+[\ &!] Bsc - Incomplete expression
+[\ +!] Bsc - Incomplete expression
+[\ -!] Bsc - Incomplete expression
+[\ ^!] Bsc - Incomplete expression
+[\ |!] Bsc - Incomplete expression
+
+# RT #126185: (?-p) should be an error
+(?-p) Bc - Regexp modifier \"a\" may not appear after the \"-\"
+'(?-p)'p Bc - Regexp modifier \"a\" may not appear after the \"-\"
+
+# RT #126186: (*ACCEPT:arg) should either be an error or set $REGMARK correctly
+(*MARK:arg) y $REGMARK arg
+(*ACCEPT:arg) By $REGMARK arg
+
+# RT #126187: these produce warnings (even when skipping the tests!)
+# \p c - Can't find Unicode property TODO
+# \p^ c - Can't find Unicode property TODO
+# \P c - Can't find Unicode property TODO
+# \P^ c - Can't find Unicode property TODO
+
+# RT #126222
+(?(?!)) By - -
+
+# RT #126253
+.{1}?? Bc - Nested quantifiers
+.{1}?+ Bc - Nested quantifiers
+.{1}+? c - Nested quantifiers
+.{1}++ c - Nested quantifiers
+.{1}*? c - Nested quantifiers
+.{1}*+ c - Nested quantifiers
+
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
diff --git a/t/re/regex_sets.t b/t/re/regex_sets.t
index ee161b2..185c70f 100644
--- a/t/re/regex_sets.t
+++ b/t/re/regex_sets.t
@@ -138,6 +138,25 @@ if (! is_miniperl() && locales_enabled('LC_CTYPE')) {
}
}
+# RT #126181: \cX behaves strangely inside (?[])
+{
+ no warnings qw(syntax regexp);
+
+ eval { $_ = '/(?[(\c]) /'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c]) / should not panic');
+ eval { $_ = '(?[\c#]' . "\n])"; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c]) / should not panic');
+ eval { $_ = '(?[(\c])'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c])/ should be a syntax error');
+ eval { $_ = '(?[(\c]) ]\b'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c]) ]\b/ should be a syntax error');
+ eval { $_ = '(?[\c[]](])'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[\c[]](])/ should be a syntax error');
+ like("\c#", qr/(?[\c#])/, '\c# should match itself');
+ like("\c[", qr/(?[\c[])/, '\c[ should match itself');
+ like("\c\ ", qr/(?[\c\])/, '\c\ should match itself');
+ like("\c]", qr/(?[\c]])/, '\c] should match itself');
+}
done_testing();
diff --git a/t/re/regexp.t b/t/re/regexp.t
index f27a027..cbac03f 100644
--- a/t/re/regexp.t
+++ b/t/re/regexp.t
@@ -99,6 +99,7 @@ use strict;
use warnings FATAL=>"all";
use vars qw($bang $ffff $nulnul); # used by the tests
use vars qw($qr $skip_amp $qr_embed $qr_embed_thr $regex_sets); # set by our callers
+use vars qw($REGMARK); # set from tested patterns
--
2.5.3
|
From victor@drawall.cc0003-regex-handle-cX-inside.patchFrom afd2df3109b876d25120b34a47fc6d952d99380d Mon Sep 17 00:00:00 2001
From: Victor Adam <victor@drawall.cc>
Date: Sun, 27 Sep 2015 10:22:08 +0200
Subject: [PATCH 3/3] regex: handle \cX inside (?[])
The \cX notation for control characters used to cause panics and unexpected
behavior when used insed an extended character class. See bug #126181.
The solution is to ignore the byte following \c during the first parsing pass
of a (?[]) construct.
---
regcomp.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/regcomp.c b/regcomp.c
index 4f4bb44..a8e80ee 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -13454,6 +13454,10 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
* default: case next time and keep on incrementing until
* we find one of the invariants we do handle. */
RExC_parse++;
+ if (*RExC_parse == 'c') {
+ /* Skip the \cX notation for control characters */
+ RExC_parse++;
+ }
break;
case '[':
{
--
2.5.3
|
From victor@drawall.ccThanks for the pointers! I already love re_tests, it makes it really Unfortunately, it doesn’t look like I can test the last example using Anyway, I’m working right now on adding tests for this and the other |
From victor@drawall.ccAs discussed, I’ve split my patch in two : 0001*.patch adds my name to |
From @khwilliamsonOn 10/03/2015 12:25 AM, Victor ADAM wrote:
One of our rules is that individual patches can't break blead on a major Syntax error in (?[...]) in regex m/(?[\c#])/ at re/regex_sets.t line 155. I'm afraid this patch was problematic to apply, as you had the And thanks for doing all this. |
From victor@drawall.ccSorry for the long delay. I’ve rebased my work on top of the current |
From victor@drawall.cc0001-AUTHORS-Add-myself.patchFrom 969a278ddf8f298c12df76ace52ee5af27814cc6 Mon Sep 17 00:00:00 2001
From: Victor Adam <victor@drawall.cc>
Date: Fri, 2 Oct 2015 19:38:25 +0200
Subject: [PATCH 1/2] AUTHORS: Add myself
---
AUTHORS | 1 +
1 file changed, 1 insertion(+)
diff --git a/AUTHORS b/AUTHORS
index 451c707..ebd9222 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -1222,6 +1222,7 @@ Unicode Consortium <unicode.org>
Vadim Konovalov <vkonovalov@lucent.com>
Valeriy E. Ushakov <uwe@ptc.spbu.ru>
Vernon Lyon <vlyon@cpan.org>
+Victor Adam <victor@drawall.cc>
Victor Efimov <victor@vsespb.ru>
Viktor Turskyi <koorchik@gmail.com>
Ville Skyttä <scop@cs132170.pp.htv.fi>
--
2.5.3
|
From victor@drawall.cc0002-regex-handle-cX-inside.patchFrom 4e40d5385b4ac0dd89f159ef411f90068b39b3b4 Mon Sep 17 00:00:00 2001
From: Victor Adam <victor@drawall.cc>
Date: Sun, 27 Sep 2015 10:22:08 +0200
Subject: [PATCH 2/2] regex: handle \cX inside (?[])
The \cX notation for control characters used to cause panics and unexpected
behavior when used insed an extended character class. See bug #126181.
The solution is to ignore the byte following \c during the first parsing pass
of a (?[]) construct.
---
regcomp.c | 4 ++++
t/re/regex_sets.t | 19 +++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/regcomp.c b/regcomp.c
index 540f71c..beec98d 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -13444,6 +13444,10 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
* default: case next time and keep on incrementing until
* we find one of the invariants we do handle. */
RExC_parse++;
+ if (*RExC_parse == 'c') {
+ /* Skip the \cX notation for control characters */
+ RExC_parse++;
+ }
break;
case '[':
{
diff --git a/t/re/regex_sets.t b/t/re/regex_sets.t
index 5c683ba..a5941ba 100644
--- a/t/re/regex_sets.t
+++ b/t/re/regex_sets.t
@@ -139,6 +139,25 @@ if (! is_miniperl() && locales_enabled('LC_CTYPE')) {
}
}
+# RT #126181: \cX behaves strangely inside (?[])
+{
+ no warnings qw(syntax regexp);
+
+ eval { $_ = '/(?[(\c]) /'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c]) / should not panic');
+ eval { $_ = '(?[\c#]' . "\n])"; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c]) / should not panic');
+ eval { $_ = '(?[(\c])'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c])/ should be a syntax error');
+ eval { $_ = '(?[(\c]) ]\b'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[(\c]) ]\b/ should be a syntax error');
+ eval { $_ = '(?[\c[]](])'; qr/$_/ };
+ like($@, qr/^Syntax error/, '/(?[\c[]](])/ should be a syntax error');
+ like("\c#", qr/(?[\c#])/, '\c# should match itself');
+ like("\c[", qr/(?[\c[])/, '\c[ should match itself');
+ like("\c\ ", qr/(?[\c\])/, '\c\ should match itself');
+ like("\c]", qr/(?[\c]])/, '\c] should match itself');
+}
done_testing();
--
2.5.3
|
From @khwilliamsonThanks, applied as |
@khwilliamson - Status changed from 'open' to 'pending release' |
@mauke - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#126181 (status was 'resolved')
Searchable as RT126181$
The text was updated successfully, but these errors were encountered: