11.0.0 regression: Seemingly infinite loop on non-Unicode files #1247

Closed
Deewiant opened this issue Apr 16, 2019 · 3 comments
Labels
bug A bug.

Comments

@Deewiant

What version of ripgrep are you using?

ripgrep 11.0.0 (rev d7f57d9aab)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

And I'm comparing it to:

ripgrep 0.10.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

From the binary releases for x86_64-unknown-linux-musl.

What operating system are you using ripgrep on?

Arch Linux

Describe your question, feature request, or bug.

I've run into a crippling performance regression on certain types of queries and non-UTF-8 files between 0.10.0 and 11.0.0, which looks like it might even be an infinite loop.

If this is a bug, what are the steps to reproduce the behavior?

A very simple way is to create a file containing only two bytes, "sä" encoded as ISO 8859-1, and then search for a pattern whose short prefix matches the "s" but whose remainder does not match, like '\bs(?:thiswillnotmatch|norwillthis)':

printf "s\xe4" > test.txt
rg '\bs(?:thiswillnotmatch|norwillthis)' test.txt

The \b does seem to be required at least in this case.
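For anyone retrying this, here is a minimal sketch of the same reproduction that also checks the file contents and bounds the run time, so an affected build cannot hang the terminal. It makes a few assumptions that are not part of the report above: the octal escape `\344` is used in place of `\xe4` because not every `printf` understands hex escapes, `timeout` is assumed to be the GNU coreutils tool, and the 5 second limit is arbitrary.

```shell
# Write the two-byte file: 0x73 ('s') followed by 0xE4 ('ä' in ISO 8859-1).
# \344 is the octal spelling of 0xE4.
printf 's\344' > test.txt

# Show the exact bytes on disk; the dump should list 73 and e4.
od -An -tx1 test.txt

# A lone 0xE4 byte is not valid UTF-8, so a strict decode should be rejected.
if iconv -f UTF-8 -t UTF-8 test.txt >/dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "invalid UTF-8 (or iconv unavailable)"
fi

# Run the search under a time limit (assumes GNU timeout and rg on PATH).
if command -v rg >/dev/null 2>&1 && command -v timeout >/dev/null 2>&1; then
  status=0
  timeout 5 rg '\bs(?:thiswillnotmatch|norwillthis)' test.txt || status=$?
  # With GNU timeout, 124 means the limit was hit; rg itself exits 1 on no match.
  echo "exit status: $status"
fi
```

An exit status of 1 would mean rg finished and found no match, which is the expected result for this pattern; 124 would suggest the search was still running when the limit expired.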

Another example file that reproduces this is sherlock.br in ripgrep's own source code, using the exact same pattern.

If this is a bug, what is the actual behavior?

11.0.0 seems to spin forever:

$ time rg-11.0 --debug '\bs(?:thiswillnotmatch|norwillthis)' test.txt >/dev/null
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:115: required literal found: "s"
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 3 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

<it's been 10 minutes and it's still spinning at 100% CPU>

If this is a bug, what is the expected behavior?

0.10.0 has no problems and gives a result in a few milliseconds:

$ time rg-0.10 --debug '\bs(?:thiswillnotmatch|norwillthis)' test.txt >/dev/null
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "s"
DEBUG|globset|globset/src/lib.rs:429: built glob set; 0 literals, 0 basenames, 8 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:429: built glob set; 3 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

0.00user 0.00kernel 0.003elapsed
BurntSushi added a commit to rust-lang/regex that referenced this issue Apr 16, 2019
This fixes a bug introduced by a bug fix for #557. In particular, the
termination condition wasn't exactly right, and this appears to have
slipped through the test suite. This probably reveals a hole in our test
suite, which is specifically the testing of Unicode regexes with
bytes::Regex on invalid UTF-8.

This bug was originally reported against ripgrep:
BurntSushi/ripgrep#1247
@BurntSushi BurntSushi added the bug A bug. label Apr 16, 2019
@BurntSushi
Owner

Thanks for reporting this bug! This was actually a regression introduced in the underlying regex engine (as a result of fixing an unrelated bug). I've published a fix for the regex engine and brought in the updated version on ripgrep master. I'll put out a new point release of ripgrep with this fix soon.

@BurntSushi
Owner

ripgrep 11.0.1 is out with this fix in it. Sorry about the regression!

@Deewiant
Author

No problem, thanks for the quick response and fix!
