Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary call to check if a path is ignored #366

Merged
merged 1 commit into from Feb 27, 2019

Conversation

Projects
None yet
2 participants
@rafeca
Copy link
Contributor

rafeca commented Feb 26, 2019

Description of the Change

Based on some profiling, the most expensive part of crawling the list of files is to check if each of the discovered files/folder are being ignored by git (the isIgnored() method). This is done by calling the git-utils package which internally calls libgit2.

For medium to large repos (>30K files), isIgnored() takes ~80-90% of the crawling time 馃拃.

Fortunately, the crawling logic is currently doing unnecessary checks to isIgnored(), which results in performing almost the double of checks than needed.

By removing the additional call to isIgnored() we can cut by ~40% the time that takes to open the fuzzy finder for the first time on medium and large repos.

This is a short table with the perf gains based on the repo size:

Type Num Files Time (master) Time (This PR) Improvements
Small 2K 1.3s 1.3s 0%
Medium 30K 10.3s 6.2s 39.8% 馃帀
Large 270K 97.3s 57.5s 40.9% 馃帀

(In order to measure the time, I've added a console.time() call before creating the crawling task and a console.timeEnd() on the callback of the task).

The reason why this change is not impactful on small repos is because there seems to be some overhead when creating the Task, which dominates the time to crawl when there are not many files to check.

This is some impactful low hanging fruit for #271, but there are a few other improvements that I think we can do to the fuzzy finder to make it faster.

Alternate Designs

N/A.

Benefits

Time to open the fuzzy finder after the editor launch gets reduced by ~40% on large repositories.

Verification Process

  • Added some logging on the call to the callback to print all the project files that were crawled, stored them in disk and compared the list that gets generated with this PR against the one that gets generated in master. Verified that both lists are the same for several projects with different .gitignore configs (Atom, Jest, Gecko-dev).
  • Manually verified that there are only two callsites to the pathLoaded() method (1 and 2). Both of them pass the same variable as argument (pathToLoad), which never gets reassigned. Before calling both callsites there's always a call to isIgnored() with the same variable argument (pathToLoad). This means that isIgnored() is called for the same exact paths than it was before (just less times).

Possible Drawbacks

N/A.

Applicable Issues

#271

@jasonrudolph
Copy link
Member

jasonrudolph left a comment

Nice find, @rafeca! And thanks for the awesome write-up. 馃専

I see that the test suite is passing. That's a great sign. 馃槄 In addition to having a green test suite, can you describe the process you'll follow to verify that the change has not introduced any regressions? (This is something we ask contributors to provide in performance-related pull requests in atom/atom, but it doesn't look like we've incorporated that into atom/fuzzy-finder's pull request template yet.)

@rafeca

This comment has been minimized.

Copy link
Contributor Author

rafeca commented Feb 27, 2019

I see that the test suite is passing. That's a great sign. 馃槄 In addition to having a green test suite, can you describe the process you'll follow to verify that the change has not introduced any regressions? (This is something we ask contributors to provide in performance-related pull requests in atom/atom, but it doesn't look like we've incorporated that into atom/fuzzy-finder's pull request template yet.)

Thanks for the suggestion! I've just added some information on the PR description about the verification steps that I followed.

@jasonrudolph
Copy link
Member

jasonrudolph left a comment

In addition to having a green test suite, can you describe the process you'll follow to verify that the change has not introduced any regressions?

Thanks for the suggestion! I've just added some information on the PR description about the verification steps that I followed.

Awesome. Thank you! :shipit:

@rafeca rafeca merged commit b4f0e97 into master Feb 27, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@rafeca rafeca deleted the improve-perf branch Feb 27, 2019

@rafeca rafeca self-assigned this Feb 28, 2019

@rafeca rafeca referenced this pull request Mar 12, 2019

Merged

Add option to use `ripgrep` for crawling the list of files #369

10 of 10 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can鈥檛 perform that action at this time.