Compiled regex exceeds size limit of 10485760 bytes. #362
Comments
You can't, as of now, without re-compiling and hard-coding a new limit. I would be interested to hear how well 8,000 entries performs. As of now, it is fed into a single regex as a single alternation. If they are all plain text strings and not patterns, then it might be better to use Aho-Corasick instead (which ripgrep should do, it just doesn't yet). |
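For readers unfamiliar with it, here is a minimal sketch of the Aho-Corasick idea mentioned above, written in Python purely for illustration. It is not ripgrep's code, and the names `build_automaton` and `find_all` are made up for this example. The point is that a set of plain-text strings can be matched in a single pass over the input without compiling them into one large regex alternation.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links for a list of literal patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

The automaton grows with the total length of the patterns, and the scan visits each input character once (plus the reported matches), which is why this approach tends to suit large lists of literals.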
If you were so inclined, the limit is here: https://github.com/BurntSushi/ripgrep/blob/master/grep/src/search.rs#L67 FYI, you can't "remove" the limit (but you can set it arbitrarily high). |
Thanks, I have run it with a 640-line file (11K) without problems and very fast. The lines of the file are like these:
I am fairly new to coding, so excuse me, but I need a "for dummies" how-to for setting the limit very high. Basically, what should I do and where? |
The limit probably should be exposed as a flag so that it's a knob you can turn easily. That feature is not currently available, so the only way for you to do it is change the source code of ripgrep and recompile it. Briefly:
In order for the above to work, you will need to install Rust. See: https://rustup.rs/ |
OK, solved, thanks. I changed the "size_limit" from 10 to 1000:
|
Cheers |
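A quick sanity check on the numbers, as a sketch: the 10485760 bytes in the error message is exactly 10 MiB, so the hard-coded 10 appears to be a count of mebibytes. Assuming that holds (the multiplier is inferred from the error message, not quoted from the source), changing it to 1000 gives a limit of roughly 1 GB:

```python
MIB = 1 << 20  # 1,048,576 bytes; assumed unit of the hard-coded limit


def limit_in_bytes(mebibytes):
    """Convert a limit expressed in MiB to bytes."""
    return mebibytes * MIB


default_limit = limit_in_bytes(10)    # the value quoted in the error message
raised_limit = limit_in_bytes(1000)   # the value after the change described above

print(default_limit)  # 10485760
print(raised_limit)   # 1048576000
```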
Can you try increasing the dfa limit as well? Might speed things up too
|
I'm going to re-open this because I think this limit should be configurable without re-compiling ripgrep. |
Nice! Down to seconds now! |
Wow. My guess is that you were previously exhausting the cache space of the DFA, so it probably did a lot of thrashing or dropped down to one of the (much) slower NFA engines. |
Just a comment (I have not tried resetting the limit, as detailed above): I ran into this exact issue today, comparing a ~15K list (individual words on separate lines) to another file (~272K; ditto). Large, I know, but grep trounced that task, whereas ripgrep failed:
Arch Linux x86_64; 32MB RAM + swap + tmpfs; ripgrep v.0.7.1; grep (GNU grep) v.3.1 |
@victoriastuart Increase the size limit using |
Hi Andrew; thank you for the project/code, comment -- appreciated. Love ripgrep (v. fast, generally)! :-D Some observations (just a FYI; I'm happy with using grep, here):
|
You need to increase --dfa-size-limit too. |
Noting #497,
|
@victoriastuart thanks! If possible, can you share the data you are searching and your regex queries as well? Or tell me how to get it? Can the problem be reproduced with publicly available data? |
Hi ... it's my own data (private), but simply lists of words; e.g.:
formatted for processing (~15K lines, this particular output):
|
Hi, I am trying to use a file with more than 8,000 entries (10-20 letter words, one per line; 132K) and get their corresponding lines in a big file (645,151 lines, 76M). I use:
rg -w -f query_file target_file
I get the error:
Compiled regex exceeds size limit of 10485760 bytes.
How can I configure it to allow rg to run without the limit?
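As a possible workaround when every entry is a plain word rather than a pattern, the same whole-word lookup can be sketched without any regex compilation by testing each token of a line against a set. This is a rough Python illustration, not something ripgrep provides; `load_words` and `matching_lines` are hypothetical helper names. It assumes the query words consist only of word characters (letters, digits, underscore) and that matching is case-sensitive.

```python
import re

# A run of word characters; for queries made only of word characters this
# mirrors a whole-word match.
WORD_RE = re.compile(r"\w+")


def load_words(lines):
    """Collect the non-empty query words, one per line."""
    return {line.strip() for line in lines if line.strip()}


def matching_lines(words, lines):
    """Yield each line that contains at least one query word as a whole word."""
    for line in lines:
        if any(token in words for token in WORD_RE.findall(line)):
            yield line


# Small in-memory demonstration; real use would pass open file objects.
words = load_words(["alpha\n", "beta\n", "\n"])
sample = ["alpha beta\n", "alphabet soup\n", "gamma, beta!\n"]
hits = list(matching_lines(words, sample))
print(hits)
```

With real files, both functions accept open file objects directly, for example `matching_lines(load_words(open("query_file")), open("target_file"))`. Set membership is a constant-time check on average, so the cost depends on the size of the target file rather than on the number of query words.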