Skip to content

Commit

Permalink
fix #131649 - extended charclass can trigger assert
Browse files Browse the repository at this point in the history
The extended charclass parser makes some assumptions during the
first pass which are only true on well structured input, and it
does not properly catch various errors. later on the code assumes
that things the first pass will let through are valid, when in
fact they should trigger errors.
  • Loading branch information
demerphq committed Dec 7, 2017
1 parent af05e4c commit 19a498a
Show file tree
Hide file tree
Showing 6 changed files with 68 additions and 32 deletions.
27 changes: 26 additions & 1 deletion pod/perldiag.pod
Expand Up @@ -5945,7 +5945,7 @@ yourself.
a perl4 interpreter, especially if the next 2 tokens are "use strict"
or "my $var" or "our $var".

=item Syntax error in (?[...]) in regex m/%s/
=item Syntax error in (?[...]) in regex; marked by <-- HERE in m/%s/

(F) Perl could not figure out what you meant inside this construct; this
notifies you that it is giving up trying.
Expand Down Expand Up @@ -6441,6 +6441,31 @@ to find out why that isn't happening.
(F) The unexec() routine failed for some reason. See your local FSF
representative, who probably put it there in the first place.

=item Unexpected ']' with no following ')' in (?[... in regex; marked by <-- HERE in m/%s/

(F) While parsing an extended character class a ']' character was encountered
at a point in the definition where the only legal use of ']' is to close the
character class definition as part of a '])', you may have forgotten the close
paren, or otherwise confused the parser.

=item Expecting close paren for nested extended charclass in regex; marked by <-- HERE in m/%s/

(F) While parsing a nested extended character class like:

(?[ ... (?flags:(?[ ... ])) ... ])
^

we expected to see a close paren ')' (marked by ^) but did not.

=item Expecting close paren for wrapper for nested extended charclass in regex; marked by <-- HERE in m/%s/

(F) While parsing a nested extended character class like:

(?[ ... (?flags:(?[ ... ])) ... ])
^

we expected to see a close paren ')' (marked by ^) but did not.

=item Unexpected binary operator '%c' with no preceding operand in regex;
marked by S<<-- HERE> in m/%s/

Expand Down
4 changes: 2 additions & 2 deletions pod/perlrecharclass.pod
Expand Up @@ -1128,8 +1128,8 @@ hence both of the following work:
Any contained POSIX character classes, including things like C<\w> and C<\D>
respect the C<E<sol>a> (and C<E<sol>aa>) modifiers.

C<< (?[ ]) >> is a regex-compile-time construct. Any attempt to use
something which isn't knowable at the time the containing regular
Note that C<< (?[ ]) >> is a regex-compile-time construct. Any attempt
to use something which isn't knowable at the time the containing regular
expression is compiled is a fatal error. In practice, this means
just three limitations:

Expand Down
28 changes: 18 additions & 10 deletions regcomp.c
Expand Up @@ -14947,8 +14947,9 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
TRUE /* Force /x */ );

switch (*RExC_parse) {
case '?':
if (RExC_parse[1] == '[') nest_depth++, RExC_parse++;
case '(':
if (RExC_parse[1] == '?' && RExC_parse[2] == '[')
nest_depth++, RExC_parse+=2;
/* FALLTHROUGH */
default:
break;
Expand Down Expand Up @@ -15005,9 +15006,9 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
}

case ']':
if (nest_depth--) break;
RExC_parse++;
if (*RExC_parse == ')') {
if (RExC_parse[1] == ')') {
RExC_parse++;
if (nest_depth--) break;
node = reganode(pRExC_state, ANYOF, 0);
RExC_size += ANYOF_SKIP;
nextchar(pRExC_state);
Expand All @@ -15019,20 +15020,25 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,

return node;
}
goto no_close;
/* We output the messages even if warnings are off, because we'll fail
* the very next thing, and these give a likely diagnosis for that */
if (posix_warnings && av_tindex_skip_len_mg(posix_warnings) >= 0) {
output_or_return_posix_warnings(pRExC_state, posix_warnings, NULL);
}
RExC_parse++;
vFAIL("Unexpected ']' with no following ')' in (?[...");
}

RExC_parse += UTF ? UTF8SKIP(RExC_parse) : 1;
}

no_close:
/* We output the messages even if warnings are off, because we'll fail
* the very next thing, and these give a likely diagnosis for that */
if (posix_warnings && av_tindex_skip_len_mg(posix_warnings) >= 0) {
output_or_return_posix_warnings(pRExC_state, posix_warnings, NULL);
}

FAIL("Syntax error in (?[...])");
vFAIL("Syntax error in (?[...])");
}

/* Pass 2 only after this. */
Expand Down Expand Up @@ -15212,12 +15218,14 @@ S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV** return_invlist,
* inversion list, and RExC_parse points to the trailing
* ']'; the next character should be the ')' */
RExC_parse++;
assert(UCHARAT(RExC_parse) == ')');
if (UCHARAT(RExC_parse) != ')')
vFAIL("Expecting close paren for nested extended charclass");

/* Then the ')' matching the original '(' handled by this
* case: statement */
RExC_parse++;
assert(UCHARAT(RExC_parse) == ')');
if (UCHARAT(RExC_parse) != ')')
vFAIL("Expecting close paren for wrapper for nested extended charclass");

RExC_parse++;
RExC_flags = save_flags;
Expand Down
6 changes: 3 additions & 3 deletions t/lib/warnings/regcomp
Expand Up @@ -59,21 +59,21 @@ Unmatched [ in regex; marked by <-- HERE in m/abc[ <-- HERE fi[.00./ at - line
qr/(?[[[:word]]])/;
EXPECT
Assuming NOT a POSIX class since there is no terminating ':' in regex; marked by <-- HERE in m/(?[[[:word <-- HERE ]]])/ at - line 2.
syntax error in (?[...]) in regex m/(?[[[:word]]])/ at - line 2.
Unexpected ']' with no following ')' in (?[... in regex; marked by <-- HERE in m/(?[[[:word]] <-- HERE ])/ at - line 2.
########
# NAME qr/(?[ [[:digit: ])/
# OPTION fatal
qr/(?[[[:digit: ])/;
EXPECT
Assuming NOT a POSIX class since no blanks are allowed in one in regex; marked by <-- HERE in m/(?[[[:digit: ] <-- HERE )/ at - line 2.
syntax error in (?[...]) in regex m/(?[[[:digit: ])/ at - line 2.
syntax error in (?[...]) in regex; marked by <-- HERE in m/(?[[[:digit: ]) <-- HERE / at - line 2.
########
# NAME qr/(?[ [:digit: ])/
# OPTION fatal
qr/(?[[:digit: ])/
EXPECT
Assuming NOT a POSIX class since no blanks are allowed in one in regex; marked by <-- HERE in m/(?[[:digit: ] <-- HERE )/ at - line 2.
syntax error in (?[...]) in regex m/(?[[:digit: ])/ at - line 2.
syntax error in (?[...]) in regex; marked by <-- HERE in m/(?[[:digit: ]) <-- HERE / at - line 2.
########
# NAME [perl #126141]
# OPTION fatal
Expand Down
29 changes: 16 additions & 13 deletions t/re/reg_mesg.t
Expand Up @@ -234,8 +234,9 @@ my @death =
'/\b{gc}/' => "'gc' is an unknown bound type {#} m/\\b{gc{#}}/",
'/\B{gc}/' => "'gc' is an unknown bound type {#} m/\\B{gc{#}}/",

'/(?[[[::]]])/' => "Syntax error in (?[...]) in regex m/(?[[[::]]])/",
'/(?[[[:w:]]])/' => "Syntax error in (?[...]) in regex m/(?[[[:w:]]])/",

'/(?[[[::]]])/' => "Unexpected ']' with no following ')' in (?[... {#} m/(?[[[::]]{#}])/",
'/(?[[[:w:]]])/' => "Unexpected ']' with no following ')' in (?[... {#} m/(?[[[:w:]]{#}])/",
'/(?[[:w:]])/' => "",
'/([.].*)[.]/' => "", # [perl #127582]
'/[.].*[.]/' => "", # [perl #127604]
Expand All @@ -258,11 +259,12 @@ my @death =
'/(?[ \p{foo} ])/' => 'Can\'t find Unicode property definition "foo" {#} m/(?[ \p{foo}{#} ])/',
'/(?[ \p{ foo = bar } ])/' => 'Can\'t find Unicode property definition "foo = bar" {#} m/(?[ \p{ foo = bar }{#} ])/',
'/(?[ \8 ])/' => 'Unrecognized escape \8 in character class {#} m/(?[ \8{#} ])/',
'/(?[ \t ]/' => 'Syntax error in (?[...]) in regex m/(?[ \t ]/',
'/(?[ [ \t ]/' => 'Syntax error in (?[...]) in regex m/(?[ [ \t ]/',
'/(?[ \t ] ]/' => 'Syntax error in (?[...]) in regex m/(?[ \t ] ]/',
'/(?[ [ ] ]/' => 'Syntax error in (?[...]) in regex m/(?[ [ ] ]/',
'/(?[ \t + \e # This was supposed to be a comment ])/' => 'Syntax error in (?[...]) in regex m/(?[ \t + \e # This was supposed to be a comment ])/',
'/(?[ \t ]/' => "Unexpected ']' with no following ')' in (?[... {#} m/(?[ \\t ]{#}/",
'/(?[ [ \t ]/' => "Syntax error in (?[...]) {#} m/(?[ [ \\t ]{#}/",
'/(?[ \t ] ]/' => "Unexpected ']' with no following ')' in (?[... {#} m/(?[ \\t ]{#} ]/",
'/(?[ [ ] ]/' => "Syntax error in (?[...]) {#} m/(?[ [ ] ]{#}/",
'/(?[ \t + \e # This was supposed to be a comment ])/' =>
"Syntax error in (?[...]) {#} m/(?[ \\t + \\e # This was supposed to be a comment ]){#}/",
'/(?[ ])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[ {#}])/',
'm/(?[[a-\d]])/' => 'False [] range "a-\d" {#} m/(?[[a-\d{#}]])/',
'm/(?[[\w-x]])/' => 'False [] range "\w-" {#} m/(?[[\w-{#}x]])/',
Expand Down Expand Up @@ -452,10 +454,10 @@ my @death_utf8 = mark_as_utf8(

'/ネ\p{}ネ/' => 'Empty \p{} {#} m/ネ\p{{#}}ネ/',

'/ネ(?[[[:ネ]]])ネ/' => "Syntax error in (?[...]) in regex m/ネ(?[[[:ネ]]])ネ/",
'/ネ(?[[[:ネ: ])ネ/' => "Syntax error in (?[...]) in regex m/ネ(?[[[:ネ: ])ネ/",
'/ネ(?[[[::]]])ネ/' => "Syntax error in (?[...]) in regex m/ネ(?[[[::]]])ネ/",
'/ネ(?[[[:ネ:]]])ネ/' => "Syntax error in (?[...]) in regex m/ネ(?[[[:ネ:]]])ネ/",
'/ネ(?[[[:ネ]]])ネ/' => "Unexpected ']' with no following ')' in (?[... {#} m/ネ(?[[[:ネ]]{#}])ネ/",
'/ネ(?[[[:ネ: ])ネ/' => "Syntax error in (?[...]) {#} m/ネ(?[[[:ネ: ])ネ{#}/",
'/ネ(?[[[::]]])ネ/' => "Unexpected ']' with no following ')' in (?[... {#} m/ネ(?[[[::]]{#}])ネ/",
'/ネ(?[[[:ネ:]]])ネ/' => "Unexpected ']' with no following ')' in (?[... {#} m/ネ(?[[[:ネ:]]{#}])ネ/",
'/ネ(?[[:ネ:]])ネ/' => "",
'/ネ(?[ネ])ネ/' => 'Unexpected character {#} m/ネ(?[ネ{#}])ネ/',
'/ネ(?[ + [ネ] ])/' => 'Unexpected binary operator \'+\' with no preceding operand {#} m/ネ(?[ +{#} [ネ] ])/',
Expand All @@ -468,8 +470,9 @@ my @death_utf8 = mark_as_utf8(
'/(?[ \x{ネ} ])ネ/' => 'Non-hex character {#} m/(?[ \x{ネ{#}} ])ネ/',
'/(?[ \p{ネ} ])/' => 'Can\'t find Unicode property definition "ネ" {#} m/(?[ \p{ネ}{#} ])/',
'/(?[ \p{ ネ = bar } ])/' => 'Can\'t find Unicode property definition "ネ = bar" {#} m/(?[ \p{ ネ = bar }{#} ])/',
'/ネ(?[ \t ]/' => 'Syntax error in (?[...]) in regex m/ネ(?[ \t ]/',
'/(?[ \t + \e # ネ This was supposed to be a comment ])/' => 'Syntax error in (?[...]) in regex m/(?[ \t + \e # ネ This was supposed to be a comment ])/',
'/ネ(?[ \t ]/' => "Unexpected ']' with no following ')' in (?[... {#} m/ネ(?[ \\t ]{#}/",
'/(?[ \t + \e # ネ This was supposed to be a comment ])/' =>
"Syntax error in (?[...]) {#} m/(?[ \\t + \\e # ネ This was supposed to be a comment ]){#}/",
'm/(*ネ)ネ/' => q<Unknown verb pattern 'ネ' {#} m/(*ネ){#}ネ/>,
'/\cネ/' => "Character following \"\\c\" must be printable ASCII",
'/\b{ネ}/' => "'ネ' is an unknown bound type {#} m/\\b{ネ{#}}/",
Expand Down
6 changes: 3 additions & 3 deletions t/re/regex_sets.t
Expand Up @@ -158,13 +158,13 @@ for my $char ("Ω ", "Ω₯", "Ω©") {
eval { $_ = '/(?[(\c]) /'; qr/$_/ };
like($@, qr/^Syntax error/, '/(?[(\c]) / should not panic');
eval { $_ = '(?[\c#]' . "\n])"; qr/$_/ };
like($@, qr/^Syntax error/, '/(?[(\c]) / should not panic');
like($@, qr/^Unexpected/, '/(?[(\c]) / should not panic');
eval { $_ = '(?[(\c])'; qr/$_/ };
like($@, qr/^Syntax error/, '/(?[(\c])/ should be a syntax error');
eval { $_ = '(?[(\c]) ]\b'; qr/$_/ };
like($@, qr/^Syntax error/, '/(?[(\c]) ]\b/ should be a syntax error');
like($@, qr/^Unexpected/, '/(?[(\c]) ]\b/ should be a syntax error');
eval { $_ = '(?[\c[]](])'; qr/$_/ };
like($@, qr/^Syntax error/, '/(?[\c[]](])/ should be a syntax error');
like($@, qr/^Unexpected/, '/(?[\c[]](])/ should be a syntax error');
like("\c#", qr/(?[\c#])/, '\c# should match itself');
like("\c[", qr/(?[\c[])/, '\c[ should match itself');
like("\c\ ", qr/(?[\c\])/, '\c\ should match itself');
Expand Down

0 comments on commit 19a498a

Please sign in to comment.