Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config option to use `ripgrep` for scanning files #1086

Merged
merged 3 commits into from May 28, 2019

Conversation

Projects
None yet
2 participants
@rafeca
Copy link
Contributor

commented May 24, 2019

Summary

This PR adds a config option to enable ripgrep powered find and replace on this package. The integration with ripgrep has been done in Atom core on atom/atom#19348 and this PR just opts in the find and replace package to use it.

Screenshot 2019-05-24 at 12 14 38

Benefits

Using ripgrep speeds up drastically the time to find files on any kind of repository (we're taking about up to 22X faster times):

Type Num Files Time (standard) Time (ripgrep) Improvements
Small 2K 940ms 62ms 15X faster 馃帀
Medium 30K 7.7s 620ms 12X faster 馃帀
Large (returning 5 results) 270K 129s 5.9s 22X faster 馃帀
Large (returning 26k results) 270K 142s 17.5s 8X faster 馃帀

(the last measure has been done to check the less favourable case for ripgrep, which is a search that returns a lot of results (26k) which need to be passed from the ripgrep process to Atom. This is still 8X times faster than the current search logic. We could limit the number of results, but this can be done as a separate PR).

Possible Drawbacks

The search behaviour with ripgrep is slightly different than the one currently implemented, and there may be still some edge cases to polish. There's more information about the changes in atom/atom#19348.

To mitigate that, I've done quite extensive tests but I'll keep checking for potential edge cases. Also, I'm planning to leave this config flag at least for 1 version of Atom so we can catch as many issues as possible before shipping this to users by default .

Applicable Issues

#1075

@rafeca rafeca changed the title Add config Add config option to use `ripgrep` for scanning files May 24, 2019

@rafeca

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

Going to merge this one, if there's any feedback afterwards I'll handle it on a separate PR

@rafeca rafeca merged commit bdfdddd into master May 28, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@rafeca rafeca deleted the ripgrep-option branch May 28, 2019

@rafeca

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

As @maxbrunsfeld has mentioned on Slack, the ripgrep scanner also fixes #806, which is a long-standing issue regarding not ignoring VCS-ignored files when there's a filter path in the project search.

@dwelle

This comment has been minimized.

Copy link

commented Jun 17, 2019

Scanning the commit, it'd be great to allow to enable ripgrep's PCRE2 so that lookbehind assertions work, too (#571). AFAIK it has a small perf regression, so if that's a concern, best adding it as opt-in only (ala vscode).

@rafeca

This comment has been minimized.

Copy link
Contributor Author

commented Jun 17, 2019

@dwelle thanks for the suggestion!

Would you be willing to contribute adding this as a config option?

@dwelle

This comment has been minimized.

Copy link

commented Jun 17, 2019

Sure, provided it's not time-sensitive because I reckon it'll take a while --- I'll have to set up a dev enviro and figure out how to run it (plus I have little time ATM, but who does). Alternatively, I can write the code + tests blindly, cross my fingers and let the CI and reviewers do the rest :).

Btw, should a new event metadata field be added, too?

@rafeca

This comment has been minimized.

Copy link
Contributor Author

commented Jun 17, 2019

Thanks @dwelle! I don't think time sensitive, so it's fine 馃槃

Setting up the dev environment is quite easy, you can follow the instructions here and here, but if you prefer you can let the CI do the job 馃槶

Just take into account that the search logic is in Atom core. You can add a new option to the [workspace.scan() method] (https://github.com/atom/atom/blob/master/src/workspace.js#L2048) to use PCRE2, and then pass it from find-and-replace based on the config option that you create (as I did for ripgrep).

You don't need to care about the DefaultDirectorySearcher implementation (just mention on the configuration option description that it only works on ripgrep).

Also, make sure to create a new test on the workspace-spec.js file that asserts that the PCRE2 option works well (you can use any existing test as inspiration).

Regarding the metadata field, I don't think it's needed for now. Only if we see that the performance of the searches regress after this change we'll add that.

Thanks again for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can鈥檛 perform that action at this time.