Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files regex processing boost #3953

Merged
merged 9 commits into from
Mar 29, 2021

Conversation

BlackGad
Copy link
Contributor

@BlackGad BlackGad commented Mar 14, 2021

Changed recursive wildcards regex pattern to faster form
Added exclude pattern TPL provessing
Fixed linq lazy regex objects spawn

Bug

NuGet/Home#5016

NuGet/Home#5706

Fixes:

Fixed include and exclude files performance

Regression? Last working version:

Description

Changed regex pattern for recursive wildcards to faster one.
Lazy quantifier in this case much faster. (.+\)* -> ([^\\]+\)*?

Added parallel processing in source files filtering
https://docs.microsoft.com/en-us/dotnet/api/system.linq.parallelenumerable.asparallel?view=net-5.0

Fixed linq lazy regex objects spawn.
Lazy filters enumeration in GetMatches method was called on every file entry. (Source file)

PR Checklist

  • [x ] PR has a meaningful title

  • [x ] PR has a linked issue.

  • [x ] Described changes

  • Tests

    • Automated tests added
    • OR
    • [x ] Test exception

    PathResolverTests already contains all needed tests.

    • OR
    • N/A
  • Documentation

    • [x ] Documentation PR or issue filled
    • OR
    • N/A

@BlackGad BlackGad requested a review from a team as a code owner March 14, 2021 22:01
@BlackGad
Copy link
Contributor Author

Here is some test measurements on 6k files pack

Initial version

Exclude pattern GetMatches time Matches
**\*.pdb 00:00:29.8375342 0
**\*.obj 00:00:28.1902447 0
**\*.ilk 00:00:27.8622501 0
Sources\node_modules\typescript\**\* 00:00:00.0015847 169
Sources\node_modules\@types\**\* 00:00:00.0008627 37
default excludes 00:00:00.0376492 0

AsParallel only optimization

Exclude pattern GetMatches time Matches
**\*.pdb 00:00:06.4921811 0
**\*.obj 00:00:06.0926885 0
**\*.ilk 00:00:06.4990284 0
Sources\node_modules\typescript\**\* 00:00:00.0011430 169
Sources\node_modules\@types\**\* 00:00:00.0007241 37
default excludes 00:00:00.0101420 0

AsParallel + Regex lazy quantifier

Exclude pattern GetMatches time Matches
**\*.pdb 00:00:00.0618149 0
**\*.obj 00:00:00.0421215 0
**\*.ilk 00:00:00.0433455 0
Sources\node_modules\typescript\**\* 00:00:00.0018229 169
Sources\node_modules\@types\**\* 00:00:00.0007414 37
default excludes 00:00:00.0154888 0

@zivkan zivkan added the Community PRs created by someone not in the NuGet team label Mar 15, 2021
@BlackGad
Copy link
Contributor Author

NuGet/Home#5706 another related defect. Must be fixed as well because of used regex for home directory

@erdembayar
Copy link
Contributor

@BlackGad
Please fill out PR template checklist.

@BlackGad
Copy link
Contributor Author

Added some fixes to existing unit tests because AsParallel linq method not guaranty result order.
All operations related to PathResolver methods are order independent so I think we can leave result operation as is.

@BlackGad
Copy link
Contributor Author

@BlackGad
Please fill out PR template checklist.

Done

@BlackGad
Copy link
Contributor Author

@erdembayar Is everything ok with PR template checklist?

@BlackGad
Copy link
Contributor Author

Any progress?

Copy link
Contributor

@zkat zkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, and I appreciate the benchmarks, which look awesome! I think you missed a spot, but this otherwise looks fine :) Thank you!

src/NuGet.Core/NuGet.Common/PathUtil/PathResolver.cs Outdated Show resolved Hide resolved
@erdembayar
Copy link
Contributor

@BlackGad
I just compared nuget.exe from your branch with official 5.8 nuget.exe. So far I don't see actual perf improvement it's very minimal

image

Here is official nuget5.8.exe:
image

Here is nuget.exe from your branch:
image

For the sake of completeness I measured other 2 nuget.exe which mentioned in NuGet/Home#5016, it clearly shows there was regression in nuget4.1.0.exe but it looks addressed (please looked at nuget5.8.exe result above):

nuget.3.5.0.exe
image

nuget.4.1.0.exe
image

@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 24, 2021

@erdembayar what did exactly you test?
Issue here when you trying to pack with excludes a lot of files and actual nuspec located deep in file system.

Example and benchmarks:

Current directory: "C:\"

NuGet Version: 5.8.1.7021

PS command line

Measure-Command -Expression {NuGet pack "C:\1\Test.nuspec" -OutputDirectory c:\temp\ -Version 1.0.0}

Result

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 6
Milliseconds      : 54
Ticks             : 60540642
TotalDays         : 7.00701875E-05
TotalHours        : 0.0016816845
TotalMinutes      : 0.10090107
TotalSeconds      : 6.0540642
TotalMilliseconds : 6054.0642

PS command line

Measure-Command -Expression {NuGet pack "C:\Temp\tests\111111\222222\33333\44444\Test.nuspec" -OutputDirectory c:\temp\ -Version 1.0.1}

Result

Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 29
Milliseconds      : 521
Ticks             : 895217841
TotalDays         : 0.00103613176041667
TotalHours        : 0.02486716225
TotalMinutes      : 1.492029735
TotalSeconds      : 89.5217841
TotalMilliseconds : 89521.7841

Patched NuGet

PS command line

Measure-Command -Expression {C:\Projects\GitHub\BlackGad\NuGet.Client\artifacts\NuGet.CommandLine\16.0\bin\Debug\net472\NuGet.exe pack "C:\1\Test.nuspec" -OutputDirectory c:\temp\ -Version 2.0.0}

Result

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 972
Ticks             : 29728922
TotalDays         : 3.4408474537037E-05
TotalHours        : 0.000825803388888889
TotalMinutes      : 0.0495482033333333
TotalSeconds      : 2.9728922
TotalMilliseconds : 2972.8922

PS command line

Measure-Command -Expression {C:\Projects\GitHub\BlackGad\NuGet.Client\artifacts\NuGet.CommandLine\16.0\bin\Debug\net472\NuGet.exe pack "C:\Temp\tests\111111\222222\33333\44444\Test.nuspec" -OutputDirectory c:\temp\ -Version 2.0.1}

Result

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 3
Milliseconds      : 86
Ticks             : 30864824
TotalDays         : 3.57231759259259E-05
TotalHours        : 0.000857356222222222
TotalMinutes      : 0.0514413733333333
TotalSeconds      : 3.0864824
TotalMilliseconds : 3086.4824

Comparison table:

CLI Short path Long path
5.8.1 6.0540642 sec 89.5217841 sec
Patched 2.9728922 sec 3.0864824 sec

This regex fix boost performance even in best cases for current 5.8.1 version also deep nested folder structure is not affect performance so hard.

@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 24, 2021

Forget to attach test data

Create 2 files in target folder:

package.json

{
  "name": "tests",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "devDependencies": {
    "@types/debug": "^4.1.5",
    "@types/jquery": "^3.3.38",
    "@types/node": "^8.0.14",
    "@types/yargs": "^15.0.4",
    "typescript": "^3.2.2"
  },
  "dependencies": {
    "debug": "^4.1.1",
    "node-red": "^1.2.6",
    "yargs": "^15.3.0"
  }
}

Test.nuspec

<?xml version="1.0"?>
<package>
  <metadata>
    <id>Test</id>
    <version>$version$</version>
    <authors>Volodymyr Shkolka</authors>
    <owners>Volodymyr Shkolka</owners>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <description>Test node-RED npm package pack</description>
  </metadata>
  <files>
    <file src="node_modules\**\*" target="libs\native\node_modules" exclude="**\*.pdb;**\*.obj;**\*.ilk;node_modules\typescript\**\*;node_modules\@types\**\*"/>
  </files>
</package>

Then in this folder call

npm install

After that you are ready to pack NuGet with nuspec specification

@erdembayar
Copy link
Contributor

@BlackGad
Thank you for providing good test data. Now I can see good improvement on test data you provided.

image

@erdembayar
Copy link
Contributor

@zkat @kartheekp-ms @dtivel
I can confirm there is perf improvement on test data, looks good for me. But I not sure about security aspect of regex, please give your review, I'm not familiar about regex quirks.

@BlackGad
Copy link
Contributor Author

Can you share me build details?

@erdembayar
Copy link
Contributor

erdembayar commented Mar 25, 2021

Can you share me build details?

@BlackGad
Sorry for inconvenience. I can see some errors related to ci/cd, it looks you have to rebase to latest dev again.
Your code rebased on dev from Mar 12, 2021 but since then there are some changes made to ci/cd.
image

@BlackGad BlackGad force-pushed the dev-BlackGad-ExcludeFilesPerformanceBoost branch from 32065ea to 072a39c Compare March 25, 2021 17:43
@BlackGad
Copy link
Contributor Author

Rebased. We have 8 hours time difference. For me to have a possibility to react during your working day, please write in the morning )

@BlackGad
Copy link
Contributor Author

Btw are there any plans to make tests build log accessible to contributors? It would be great feature for faster interaction without involving your team.

@BlackGad
Copy link
Contributor Author

Ok what is Build_and_UnitTest_RTM and what's going wrong with nix systems? TPL or regex?

@zkat
Copy link
Contributor

zkat commented Mar 25, 2021

image

On Linux:
image

On Windows:
image

Fixed nix and network windows systems regex (nix system absolute path starts with /, windows network path starts with \\ which were not matched with new regex)
@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 25, 2021

Figured out errors.

Fixed several unit tests because of incorrect assertion (relay on initial files ordering but AsParallel not guaranty it)
Also fixed new regex pattern. On nix systems local path starts from / which did not match to ([^/]+/)*? in order to respect this added initial /? to full regex. Now it looks like /?([^/]+/)*?

Same may happen in windows systems for network pathes which starts from \\. In order to support those situations (\\\\)? was added. Regex for windows platforms become (\\\\)?([^\\]+\\)*?

@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 25, 2021

@zkat could you review changes and rerun tests? :)

@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 26, 2021

Last build failed at NuGet.Client-PrivateDev. @zkat can you provide more details?

@zkat
Copy link
Contributor

zkat commented Mar 26, 2021

@BlackGad it was a hiccup. Should be fine once I rerun. And yes, I agree that this isn't a great situation re: CI. Thanks for your patience.

@BlackGad
Copy link
Contributor Author

:( again failed. @zkat Could you please provide details?

@zkat
Copy link
Contributor

zkat commented Mar 26, 2021

Flaky test. I'll rerun it when the rest succeed.

@BlackGad
Copy link
Contributor Author

Bad luck or something was broken?

@zkat
Copy link
Contributor

zkat commented Mar 26, 2021

rerunning now. It was just a hiccup. Sometimes we have to retry CI a few times before it actually works :(

@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 26, 2021

:( so unstable. I hoped situation become better from my last PR. I guess this side of nuget is closed for external contributors. Am I wrong?

@zkat
Copy link
Contributor

zkat commented Mar 26, 2021

no, it's just that since part of what we do HAS to be internal (because we insert stuff into VS itself), and for historical reasons, it's kinda tricky for us to have public CI. This definitely causes friction, though.

@BlackGad
Copy link
Contributor Author

BlackGad commented Mar 26, 2021

Well, integration parts with VS understandable but missing logs from build and basic unit test results on different platforms make me truly upset. As well as nix platform build and test instructions. We spent for 3 days because of this) Anyway hope you will solve this in future

@zkat
Copy link
Contributor

zkat commented Mar 26, 2021

We're actually actively working on this! It just takes time and there's a lot of internal resources we still use, and we need to stop doing that before CI can become public. But rest assured, it's coming. Thank you for bearing with us in the meantime. I'm keeping an eye on this build and things seem to be going ok so far. It's just those two flaky jobs running again.

@BlackGad
Copy link
Contributor Author

Yeah it would really make it easier to contribute! I am just trying to fix things that interferes me in my projects) Thank you :)

@zkat zkat merged commit c6ca6eb into NuGet:dev Mar 29, 2021
@zkat
Copy link
Contributor

zkat commented Mar 29, 2021

yay! We're green! Thanks so much for the contribution :)

@BlackGad
Copy link
Contributor Author

Yay! You are welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community PRs created by someone not in the NuGet team
Projects
None yet
4 participants