Does RipGrep optimize queries sharing a common prefix? #2048
-
Just curious whether this is something I need to optimize on the input query, or whether RipGrep is smart enough to automatically optimize. Let's assume I want to search for:
And my argument to Is I expect that due to the ordering of the query, it will be satisfied by For files WITHOUT matches, it will search the entire file for any matches of I can optimize the query on the input-side to use the shortest prefix, I'm just curious whether I need to. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 2 replies
-
Have you read my blog post introducing ripgrep? Literal optimizations of this sort are a central part of what makes ripgrep fast. Also, by the way you're speaking, it kind of seems like you're assuming that the execution engine has backtracking semantics. I can't say for sure, but that's what it sounds like. Of course, if you use Of course, for a simple query like
You can see that the literal detector pulled out three complete literals from the regex. The easiest way to "confirm" this is to profile a ripgrep execution and look at where time is being spent. It helps to run it on a big file. So running In this case, the literals are passed to the In this case, the If you give ripgrep enough literals, then it will indeed ultimately just use Aho-Corasick and still bypass the regex engine entirely. Although other than construction time (with Aho-Corasick being much faster), the execution properties of Aho-Corasick and ripgrep's default regex engine are rather similar since they are both instantiations of finite automata. |
Beta Was this translation helpful? Give feedback.
Have you read my blog post introducing ripgrep? Literal optimizations of this sort are a central part of what makes ripgrep fast.
Also, by the way you're speaking, it kind of seems like you're assuming that the execution engine has backtracking semantics. I can't say for sure, but that's what it sounds like. Of course, if you use
-P/--pcre2
, this is a true. But the default engine uses finite automata, so matching an alternation doesn't tryBar
and thenBarBaz
and thenFoo
one after another. They are all matched simultaneously. Of course, that only occurs when the actual regex engine is used.Of course, for a simple query like
Foo(Bar|BarBaz|Foo)
, the regex engine shouldn't be used at all.…