Extend ending position of capture search.
When searching for captures, we first use the DFA to find the start and
end of the match. We then pass just the matched region of text to the
NFA engine to find sub-capture locations. This is a key optimization
that, in some cases, keeps the NFA engine from searching far more text
than necessary.

One problem with this is that some instructions decide whether they
match based on whether the engine is at the boundary of the search text.
For example, `$` matches if and only if the engine is at EOF. If we only
provide the matched text region, then assertions like `\b` might not
work, since they need to examine at least one character past the end of
the match. If we provide the matched text region plus one character,
then `$` may match when it shouldn't, because the position just past
that extra character looks like EOF to the NFA. Therefore, we provide
the matched text plus (at most) two characters.
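
A minimal sketch of why the window size matters. The helpers `is_end` and `is_word_boundary` are hypothetical and simplified (the word test is ASCII-only for brevity; the regex crate's `\b` is Unicode-aware by default). Each assertion is evaluated against whatever slice the engine is handed, so truncating the haystack can change its answer. The haystack and match are taken from the regression test for #334 (`"abcbX"`, DFA match `(0, 1)`).

```rust
/// `$` (without multi-line mode) holds only at the very end of the haystack.
/// Hypothetical helper: "end" means the end of the slice passed in.
fn is_end(hay: &[u8], pos: usize) -> bool {
    pos == hay.len()
}

/// ASCII-only approximation of a "word" byte (assumption for this sketch).
fn is_word_byte(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}

/// Word boundary at `pos` (requires `pos <= hay.len()`): the bytes on either
/// side differ in "wordness". Positions outside the slice count as non-word,
/// so a truncated slice can report a boundary that the full text does not have.
fn is_word_boundary(hay: &[u8], pos: usize) -> bool {
    let before = pos > 0 && is_word_byte(hay[pos - 1]);
    let after = hay.get(pos).map_or(false, |&b| is_word_byte(b));
    before != after
}
```

With `match_end = 1`, the bare match `"a"` reports a spurious `\b` at position 1, one extra character (`"ab"`) fixes that but makes `$` hold at position 2, and two extra characters (`"abc"`) agree with the full text for both checks.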

Fixes rust-lang#334
BurntSushi committed Feb 18, 2017
1 parent d894c63 commit d813518
Showing 2 changed files with 10 additions and 3 deletions.
9 changes: 6 additions & 3 deletions src/exec.rs
@@ -850,9 +850,12 @@ impl<'c> ExecNoSync<'c> {
         match_start: usize,
         match_end: usize,
     ) -> Option<(usize, usize)> {
-        // We can't use match_end directly, because we may need to examine
-        // one "character" after the end of a match for lookahead operators.
-        let e = cmp::min(next_utf8(text, match_end), text.len());
+        // We can't use match_end directly, because we may need to examine one
+        // "character" after the end of a match for lookahead operators. We
+        // need to move two characters beyond the end, since some look-around
+        // operations may falsely assume a premature end of text otherwise.
+        let e = cmp::min(
+            next_utf8(text, next_utf8(text, match_end)), text.len());
         self.captures_nfa(slots, &text[..e], match_start)
     }
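
The new end bound can be sketched in isolation. `next_utf8` is called in the diff but not shown, so the version below is a hypothetical re-implementation: it assumes `i` sits on a character boundary, reads the character width from the leading byte without validating the rest, and returns `i + 1` once `i` is at or past the end, which is why the result must be clamped with `cmp::min`. `nfa_search_end` is a name introduced here for illustration.

```rust
use std::cmp;

/// Hypothetical `next_utf8`: index just past the UTF-8 character starting at
/// `i`, with the width taken from the leading byte only. At or past the end of
/// `text` it returns `i + 1`, so callers must clamp the result.
fn next_utf8(text: &[u8], i: usize) -> usize {
    let b = match text.get(i) {
        None => return i + 1,
        Some(&b) => b,
    };
    let width = if b <= 0x7F {
        1 // ASCII
    } else if b <= 0b1101_1111 {
        2 // 110xxxxx lead byte
    } else if b <= 0b1110_1111 {
        3 // 1110xxxx lead byte
    } else {
        4 // 11110xxx lead byte
    };
    i + width
}

/// End of the slice handed to the NFA: the DFA's match end advanced by (at
/// most) two characters, clamped to the length of the text.
fn nfa_search_end(text: &[u8], match_end: usize) -> usize {
    cmp::min(next_utf8(text, next_utf8(text, match_end)), text.len())
}
```

For the #334 input (`"abcbX"`, match end 1), the old bound stopped at index 2 and gave the NFA `"ab"`, while the new bound stops at index 3 and gives it `"abc"`. Multi-byte characters are skipped whole, and near the end of the text the clamp keeps the slice in range.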

4 changes: 4 additions & 0 deletions tests/regression.rs
@@ -86,3 +86,7 @@ mat!(wb_start_x, r"(?u:\b)^(?-u:X)", "X", Some((0, 1)));
 // See: https://github.com/rust-lang/regex/issues/321
 ismatch!(strange_anchor_non_complete_prefix, r"a^{2}", "", false);
 ismatch!(strange_anchor_non_complete_suffix, r"${2}a", "", false);
+
+// See: https://github.com/rust-lang/regex/issues/334
+mat!(captures_after_dfa_premature_end, r"a(b*(X|$))?", "abcbX",
+     Some((0, 1)), None, None);
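
To see the regression concretely, here is a toy, hand-written backtracking matcher for exactly `a(b*(X|$))?`, anchored at index 0 (mirroring how the NFA is started at the DFA's match start). It is not the regex crate's NFA, and `match_pattern` is a hypothetical name. It returns the match end and the span of group 1, and it treats the end of whatever slice it receives as EOF, just as a truncated haystack does.

```rust
/// Toy backtracking matcher for exactly `a(b*(X|$))?`, anchored at index 0.
/// Returns `(match_end, group1_span)`; `group1_span` is `None` when the
/// optional group does not participate. The end of `hay` is treated as EOF.
fn match_pattern(hay: &[u8]) -> Option<(usize, Option<(usize, usize)>)> {
    // The literal `a` must match first.
    if hay.first() != Some(&b'a') {
        return None;
    }
    let g1_start = 1;
    // Count the run of `b`s available to the greedy `b*`.
    let mut max_b = 0;
    while hay.get(g1_start + max_b) == Some(&b'b') {
        max_b += 1;
    }
    // Greedy: try the longest `b` run first, then backtrack one `b` at a time.
    for n in (0..=max_b).rev() {
        let p = g1_start + n;
        // The alternation tries `X` first...
        if hay.get(p) == Some(&b'X') {
            return Some((p + 1, Some((g1_start, p + 1))));
        }
        // ...then `$`, which only checks for the end of `hay`.
        if p == hay.len() {
            return Some((p, Some((g1_start, p))));
        }
    }
    // The optional group failed, so the match is just `a`.
    Some((1, None))
}
```

On the full text the group does not participate and the match is `(0, 1)`, which is what the new test asserts. Handing the matcher only `"a"` or `"ab"` lets `$` succeed at the artificial end of the slice and yields a wrong capture, while `"abc"` reproduces the full-text result.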
