-
Notifications
You must be signed in to change notification settings - Fork 235
Add runtime checks for invalid uses of \K in lookaround #812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
81b08f3
1621949
60a83f3
d5e178a
11afce6
ee12ec5
355c69c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -1010,11 +1010,28 @@ fprintf(stderr, "++ %2ld op=%3d %s\n", Fecode - mb->start_code, *Fecode, | |||
} | ||||
|
||||
#ifdef DEBUG_SHOW_OPS | ||||
fprintf(stderr, "++ Failed ACCEPT not at end (endanchnored set)\n"); | ||||
fprintf(stderr, "++ Failed ACCEPT not at end (endanchored set)\n"); | ||||
#endif | ||||
return MATCH_NOMATCH; /* (*ACCEPT) */ | ||||
} | ||||
|
||||
/* Fail if we detect that the start position was moved to be either after | ||||
the end position (\K in lookahead) or before the start offset (\K in | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure I understand the operation. Why start offset is a problem? I think the only issue is when ovector(0) > ovector(1), which confuses some simple implementations. Nobody complained about startoffset before. Do this happens when \K is executed, or as a post check? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This rule is strange to me. As long as the end of the matches are <= ordered, the beginning should not matter when you do a global match. This is true now regardless of \K. Substitute (string insert) should not care where the match starts as long as it is <= than the end. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does matter for substitute. Here is the check that Philip added, long before my involvement: Line 454 in d29e729
Here's one way to think about it: a match that ends before it starts is weird. But also, if the unmatched text (text in between matches) that is backwards is equally weird. Any application that does something with the text in between matches, rather than just throwing it away, wants the matches to be non overlapping. Example: Empty unmatched text at start Find and replace is based around the concept of string split. If you wanted to replace all those three matches with "zzz", what would the result even be? After replacing "cd" with "zzz", how do you even begin to then replace "de"? The invariant must be that for both matches and the unmatched text in between, the substring start is before the end (or "not after"). Finally, even if applications only want a single match, it is surprising (and possibly broken) to get a match starting at offset 3, if you ask for searching to start from offset 4. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using |
||||
lookbehind). If this occurs, the pattern must have used \K in a somewhat | ||||
sneaky way (e.g. by pattern recursion), because if the \K is actually | ||||
syntactically inside the lookaround, it's blocked at compile-time. */ | ||||
|
||||
if (Fstart_match < mb->start_subject + mb->start_offset || | ||||
Fstart_match > Feptr) | ||||
{ | ||||
/* The \K expression is fairly rare. We assert it was used so that we | ||||
catch any unexpected invalid data in start_match. */ | ||||
PCRE2_ASSERT(mb->hasbsk); | ||||
|
||||
if (!mb->allowlookaroundbsk) | ||||
return PCRE2_ERROR_BAD_BACKSLASH_K; | ||||
} | ||||
|
||||
/* We have a successful match of the whole pattern. Record the result and | ||||
then do a direct return from the function. If there is space in the offset | ||||
vector, set any pairs that follow the highest-numbered captured string but | ||||
|
@@ -7393,8 +7410,11 @@ mb->start_offset = start_offset; | |||
mb->end_subject = end_subject; | ||||
mb->true_end_subject = true_end_subject; | ||||
mb->hasthen = (re->flags & PCRE2_HASTHEN) != 0; | ||||
mb->hasbsk = (re->flags & PCRE2_HASBSK) != 0; | ||||
mb->allowemptypartial = (re->max_lookbehind > 0) || | ||||
(re->flags & PCRE2_MATCH_EMPTY) != 0; | ||||
mb->allowlookaroundbsk = | ||||
(re->extra_options & PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK) != 0; | ||||
mb->poptions = re->overall_options; /* Pattern options */ | ||||
mb->ignore_skip_arg = 0; | ||||
mb->mark = mb->nomatch_mark = NULL; /* In case never set */ | ||||
|
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better. However, TMP3 is emulated on x86-32, and better not use it whenever is possible:
(Or something similar)
The trick is that the flag register is set, then TMP1 is overwritten (but the flags are not), than jump based on the flag.