Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--files performance regression in 0.8.0 #820

Closed
chrmarti opened this issue Feb 19, 2018 · 3 comments · Fixed by #827
Closed

--files performance regression in 0.8.0 #820

chrmarti opened this issue Feb 19, 2018 · 3 comments · Fixed by #827
Labels
bug A bug.

Comments

@chrmarti
Copy link
Contributor

What version of ripgrep are you using?

0.8.0

What operating system are you using ripgrep on?

Windows 10

If this is a bug, what are the steps to reproduce the behavior?

Using an old commit of the Chromium sources (git clone https://chromium.googlesource.com/chromium/src) as corpus (~230k files).

With 0.8.0 --files runs about 4x longer:

PS C:\devel\testWorkspace> (Measure-Command { & 'C:\devel\ripgrep-0.8.0-x86_64-pc-windows-msvc\rg.exe' --files | Measure-Object –Line | Out-Default }).TotalSeconds

 Lines Words Characters Property
 ----- ----- ---------- --------
234319 [files]

43.309153 [seconds]

Than with 0.7.1:

PS C:\devel\testWorkspace> (Measure-Command { & 'C:\devel\ripgrep-0.7.1-x86_64-pc-windows-msvc\rg.exe' --files | Measure-Object –Line | Out-Default }).TotalSeconds

 Lines Words Characters Property
 ----- ----- ---------- --------
234319 [files]

11.2432563 [seconds]

Using Process Monitor it looks like 0.8.0 accesses individual files a lot more:
image

Than 0.7.1, which seems to only access folders:
image

/cc @roblourens

@BurntSushi
Copy link
Owner

@chrmarti Thanks for another great bug report! I can indeed reproduce this. Your tip about ripgrep stat'ing more files is right on the money. Basically, this is fallout from fixing #705. In particular, in order to fix #705, I had to work-around the corresponding bug in Rust's standard library, and I was apparently careless enough that I introduced an extra stat call for each file that ripgrep visits. This should be possible to fix, but I'll need to take a closer look tomorrow.

Assuming I get this fixed, and after I get some of the PR backlog cleared, I can put out another release.

@BurntSushi
Copy link
Owner

Note that this regression is also in walkdir, so it impacts both single threaded and multi-threaded directory traversal.

@BurntSushi BurntSushi added the bug A bug. label Feb 20, 2018
BurntSushi added a commit to BurntSushi/walkdir that referenced this issue Feb 21, 2018
This commit fixes a performance regression introduced in commit 0f4441,
which aimed to fix OneDrive traversals. In particular, we added an
additional stat call to every directory entry, which can be quite
disastrous for performance. We fix this by being more fastidious about
reusing the Metadata that comes from fs::DirEntry, which is, conveniently,
cheap to acquire specifically on Windows.

The performance regression was reported against ripgrep:
BurntSushi/ripgrep#820
BurntSushi added a commit to BurntSushi/walkdir that referenced this issue Feb 21, 2018
This commit fixes a performance regression introduced in commit 0f4441,
which aimed to fix OneDrive traversals. In particular, we added an
additional stat call to every directory entry, which can be quite
disastrous for performance. We fix this by being more fastidious about
reusing the Metadata that comes from fs::DirEntry, which is, conveniently,
cheap to acquire specifically on Windows.

The performance regression was reported against ripgrep:
BurntSushi/ripgrep#820
BurntSushi added a commit that referenced this issue Feb 21, 2018
This commit fixes a performance regression in Windows that resulted from
fallout from fixing #705. In particular, we introduced an additional
stat call for every single directory entry, which can be quite
disastrous for performance.

There is a corresponding companion PR that fixes the same bug in
walkdir: BurntSushi/walkdir#96

Fixes #820
BurntSushi added a commit that referenced this issue Feb 21, 2018
This commit fixes a performance regression in Windows that resulted from
fallout from fixing #705. In particular, we introduced an additional
stat call for every single directory entry, which can be quite
disastrous for performance.

There is a corresponding companion PR that fixes the same bug in
walkdir: BurntSushi/walkdir#96

Fixes #820
@BurntSushi
Copy link
Owner

This should be fixed in master. I manually QAed this and confirmed that performance is back to 0.7.1 levels for your specific test case. (Both single and multi threaded.)

If all goes well, I'm going to try and squeak a 0.8.1 release out tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants