Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize glob evaluation #6151

Merged
merged 13 commits into from Mar 3, 2021
Merged

Conversation

@ladipro
Copy link
Contributor

@ladipro ladipro commented Feb 12, 2021

Fixes #6069

Context

Glob expansion has been identified as one of the performance bottlenecks when building .NET Core projects. The cost is not negligible with a simple Hello World project and gets progressively worse when the project directory (transitively) contains more files, be it source files or completely unrelated files. Since glob expansion is part of project evaluation, it directly impacts Visual Studio performance as well, especially solution load performance.

Changes Made

This PR is a collection of several changes to the globbing logic aiming to make .NET Core project evaluation faster.

  • Refactor, simplify, and micro-optimize hot code paths,
  • Eliminate exclude patterns early if it's clear that they don't intersect with the include,
  • Optimize most common glob patterns to match without regex (building a typical .NET Core app now does not instantiate a single regex for glob matching),
  • Cache the results of all filesystem enumeration calls, not just directories.

Testing

Existing unit tests for correctness, ETL traces for performance.

Performance:

The wins in terms of the Microsoft-Build/ExpandGlob event duration range from 16% (Hello World project built with MSBuild.exe) to 48% (Hello World project with ~4000 files in ~600 folders, all non-.cs and non-.resx, built with dotnet build). These numbers were measured with incremental build on the command line, the project was a .NET Core project in all cases, just with a different build engine. Here's a table with all numbers:

Project MSBuild.exe dotnet build
Hello World (no extra files) 16% 29%
Hello World (4000 extra files) 35% 48%

In terms of the overall build time, we're looking at a 1-2% improvement for a Hello World project and 8-10% for a project with thousands of extra files in the source tree.

Notes

This PR is recommended to be reviewed commit by commit. FileMatcher is a complex class and there is non-trivial breaking potential as projects may depend on all kinds of corner case behavior. Which is the primary reason for not rewriting the entire thing. The most impactful change in this PR - caching files to avoid repeated filesystem API calls - is behind a change wave condition, just in case.

@ladipro ladipro force-pushed the ladipro:6069-globbing-improvements branch from e0b1252 to e4d7133 Feb 12, 2021
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
@ladipro ladipro force-pushed the ladipro:6069-globbing-improvements branch from e4d7133 to 8bb636c Feb 16, 2021
@ladipro ladipro force-pushed the ladipro:6069-globbing-improvements branch 2 times, most recently from 4ac817f to 04974ae Feb 16, 2021
@ladipro ladipro changed the title [DRAFT] Optimize glob evaluation Optimize glob evaluation Feb 16, 2021
@ladipro ladipro marked this pull request as ready for review Feb 16, 2021
@ladipro ladipro force-pushed the ladipro:6069-globbing-improvements branch from 04974ae to e4e1371 Feb 17, 2021
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
Copy link
Member

@Forgind Forgind left a comment

This looks good to me overall, thanks!

The only part of this that seemed like a big (potential) gain was a heuristic. Do you know how helpful this was for users?

src/Build/Globbing/MSBuildGlob.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Show resolved Hide resolved
Copy link
Contributor

@cdmihai cdmihai left a comment

LGTM pending ensuring tests cover the new conditions / code paths

src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
Copy link
Contributor Author

@ladipro ladipro left a comment

Thank you all for the feedback. I will squash PR review changes into the right commits before merging to make git history look reasonable.

src/Shared/FileMatcher.cs Show resolved Hide resolved
src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
@ladipro ladipro force-pushed the ladipro:6069-globbing-improvements branch from a45184b to c5afa7c Mar 3, 2021
@rokonec rokonec merged commit 202f872 into dotnet:master Mar 3, 2021
7 checks passed
7 checks passed
license/cla All CLA requirements met.
Details
@azure-pipelines
msbuild-pr Build #20210303.1 succeeded
Details
@azure-pipelines
msbuild-pr (Linux Core) Linux Core succeeded
Details
@azure-pipelines
msbuild-pr (Windows Core) Windows Core succeeded
Details
@azure-pipelines
msbuild-pr (Windows Full Release (no bootstrap)) Windows Full Release (no bootstrap) succeeded
Details
@azure-pipelines
msbuild-pr (Windows Full) Windows Full succeeded
Details
@azure-pipelines
msbuild-pr (macOS Core) macOS Core succeeded
Details
ladipro added a commit that referenced this pull request Mar 10, 2021
### Context

#6151 introduced a regression where `FileMatcher` takes the pattern `*.*` too literally and returns only files that have a dot in the name. `*.*` should be special cased to mean all files in the directory, with or without an extension.

### Changes Made

Fixed the regression by explicitly testing for `*.*` and added test coverage.

### Testing

Existing and modified unit tests, repro project from dotnet/sdk#16185 (comment).

### Notes

Testing for both `*` and `*.*` is already happening elsewhere in the class. MSBuild calls `Directory.EnumerateFileSystemEntries` which under the covers uses `MatchType.Win32` and causes this behavior of unifying `*.*` with `*` on all platforms.
@ForNeVeR
Copy link

@ForNeVeR ForNeVeR commented May 31, 2021

Hey, folks, could this be related to dotnet/sdk#17976? Could someone please take a look?

rainersigwald added a commit to rainersigwald/msbuild that referenced this pull request Jun 8, 2021
Fixes dotnet#6502

Summary

This change fixes a regression in glob matching where files without extension are erroneously not matched when taking a specific globbing code path.

Customer impact

Any customer who uses a glob pattern susceptible to the bug and has files without extensions in their source tree is affected. The bug was reported by external customers.

Regression?

Yes, caused by dotnet#6151 where glob matching was optimized which internally made it take a different code path.

Changes Made

Fixes the regression by properly handling `*.*` to mean all files, not just files with a dot in the name. This convention is used in .NET APIs on all platforms and matches the pre-regression behavior.

Testing

Added unit test coverage. Also verified locally with the repro provided by the original bug reporter.

Risk

Low. The star patterns are special-cased to mean all files, other patterns are unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

7 participants