Reduce allocations in TokenSegment.TryMatch #12728

kartheekp-ms · 2023-07-05T17:03:04Z

Nigusu-Allehu · 2023-08-07T21:26:59Z

Proposed Solution

Removing substrings by using a ReadOnlySpan
Passing the full path and the indices to the parser.

Result

According to my attempts using Span (passing the full path and the indices) was able to help
me reduce allocation by the TryMatch function, However it resulted in the helper functions
making the allocations
I looked into replacing the helpers functions so that they also used Span. However the
helper functions eventually call parses that almost always do lookups in a dictionary, which
requires a string as a key(ReadOnlySpan can not be used for the dictionaries, it is stack only
data type). As a result, eventually some function will end up producing the string.

Conclusion
Even though I was able to mitigate producing substrings in the TryMatch function and
almost all the functions it calls, all the parsers it uses eventually almost always do a lookup
in a dictionary and as a result end up producing a substring.
As a result, improving the performance of TryMatch by removing the substrings it produces
will only result in the better performance of TryMatch, however the system in general does
not improve in performance

possible solution

We currently, store file paths for a package content as a string. For example, "contetFiles/any/img.jpg"
This is formed by looping through the directories that would lead to img.jpg and concatenating them.
The TryMatch function is then tasked with tokenizing this path by delimiter "/" when we could have actually kept the file path to img.jpg as a list of tokens(the directories) which will lead to img.jpg. This way TryMatch won't have to tokenize a long file path string and result in unnecessary allocations. In addition, TryMatch won't have to loop through each character in a file path, making it faster.
Drawback: I am not sure yet, but if we need the file paths as a string at any point, we will have to do concatenation; possibly multiple times.

davkean · 2023-10-23T00:56:59Z

@Nigusu-Allehu Can you point me to the dictionary lookup?

Nigusu-Allehu · 2023-10-23T15:50:16Z

@Nigusu-Allehu Nigusu Yenework FTE Can you point me to the dictionary lookup?

The TryMatch method eventually, most of the time, calls on of the parsers in the ManagedCodeConventions.cs. Here is an example : https://github.com/NuGet/NuGet.Client/blob/b02fdbb36d34947c24496b2c0b505aa785bc657f/src/NuGet.Core/NuGet.Packaging/ContentModel/ManagedCodeConventions.cs#L128

zivkan · 2023-10-23T16:13:46Z

Is this code path only used by packages that have contentFiles/, or is it also used by lib/, build/, and so on?

Nigusu-Allehu · 2023-10-23T16:21:08Z

It is used by all of them. https://github.com/NuGet/NuGet.Client/blob/b02fdbb36d34947c24496b2c0b505aa785bc657f/src/NuGet.Core/NuGet.Packaging/ContentModel/ManagedCodeConventions.cs#L411 in here there is a list of patterns that a package could have for its asset files. TokenSegments in these patterns always use a parser. For example for the pattern "lib/{tfm}/{assembly}", tfm and assembly are TokenSegments and as a result a parser would be used to identify the tfm and assembly.

zivkan · 2023-10-23T18:52:02Z

Ok. What I'm thinking is that NuGet could "pre-compute" assets lists per TFM on package extraction, and save the results in the values in the .nupkg.metadata computed file. Then, when NuGet is doing asset selection for a project, "all" it has to do is select the "closest TFM", then it already has the List<string> of matching files, and per-project restore won't need to use ManagedCodeConventions at all.

This will be a lot more difficult to implement than refactoring a single class for optimizations. But I think it has the potential for significant perf improvements. Other people should definately think about the suggestion, in case I'm way off base. CI restore might not benefit a whole lot, since they'll have to extract packages and do the asset pre-compute on every restore. But on developer boxes where the global packages folder isn't deleted frequently (for example, where customers use VS), I think it could really have a decent (maybe even big?) improvement.

I think NuGet's "convention based" approach makes manually creating packages a little easier, but I think it harms restore performance. We can't change the past (and I might be wrong that a more explicit package format would be faster to restore), but we can take advantage of NuGet's "generated on extraction" files. The .nupkg.metadata file already has a version property, so we can detect when an older version without the asset pre-compute exists, then do a one-time update.

Unfortunately, for read-only fallback folders, we'll need to keep using the convention based approach, but if whatever creates these fallback folder layouts updates to a newer NuGet, then the higher .nupkg.metadata file version will be pre-created there.

However, the big question is whether this idea will actually benefit perf. ManagedCodeConventions, and TokenSegment cause a lot of allocations, but so can deserializing "large" json files. Obviously, I suspect reading the json file will be better than doing convention based filtering every restore. But I don't think it's a 100% sure thing, so if there's a way to quickly prototype and benchmark to validate it's a good idea, that can tell us if it's worthwhile doing a big change.

davkean · 2023-10-24T01:16:21Z

@Nigusu-Allehu Is this the table you a referring to? This is currently defined using strings, but doesn't have to be, this could be:

        private readonly Dictionary<ReadOnlyMemory<Char>, Dictionary<ReadOnlyMemory<Char>, object>> _table = new Dictionary<ReadOnlyMemory<Char>, Dictionary<ReadOnlyMemory<Char>, object>>(ReadOnlyMemoryCharOrdinalComparer.Instance);

Where ReadOnlyMemoryCharOrdinalComparer is a comparer that delegates to MemoryExtensions.Equals and a custom GetHashCode. Or an implementation similar to Drew's StringPool which takes a Span and returns a "string".

davkean · 2023-10-24T01:28:04Z

I do want to say that @zivkan's idea is a good one, but I don't want to lose sight here that there might be an easier approach we can take to reduce the allocations on this path first.

zivkan · 2023-10-24T08:12:52Z

I intended to write "if we can't find easy wins" in my previous message. Looks like I forgot to 🤦‍♂️

Nigusu-Allehu · 2023-10-25T22:10:19Z

@Nigusu-Allehu Nigusu Yenework FTE Is this the table you a referring to? This is currently defined using strings, but doesn't have to be, this could be:
        private readonly Dictionary<ReadOnlyMemory<Char>, Dictionary<ReadOnlyMemory<Char>, object>> _table = new Dictionary<ReadOnlyMemory<Char>, Dictionary<ReadOnlyMemory<Char>, object>>(ReadOnlyMemoryCharOrdinalComparer.Instance);       
Where ReadOnlyMemoryCharOrdinalComparer is a comparer that delegates to MemoryExtensions.Equals and a custom GetHashCode. Or an implementation similar to Drew's StringPool which takes a Span and returns a "string".

Thank you, David! I will look into it

kartheekp-ms added Type:Bug Tenet:Performance Performance issues Functionality:Restore Area:ContentFiles PackageReference contentFiles folder labels Jul 5, 2023

kartheekp-ms self-assigned this Jul 5, 2023

kartheekp-ms added the TechDebt Technical debt label Jul 5, 2023

kartheekp-ms assigned Nigusu-Allehu Jul 17, 2023

nkolev92 unassigned kartheekp-ms Sep 5, 2023

aortiz-msft unassigned Nigusu-Allehu Feb 22, 2024

jeffkl self-assigned this Mar 4, 2024

jeffkl mentioned this issue Mar 8, 2024

Reduce allocations in TokenSegment.TryMatch NuGet/NuGet.Client#5676

Closed

8 tasks

jeffkl added the Category:BreakingChange label Mar 8, 2024

nkolev92 added Priority:2 Issues for the current backlog. Priority:1 High priority issues that must be resolved in the current sprint. and removed Priority:2 Issues for the current backlog. labels Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce allocations in TokenSegment.TryMatch #12728

Reduce allocations in TokenSegment.TryMatch #12728

kartheekp-ms commented Jul 5, 2023

Nigusu-Allehu commented Aug 7, 2023 •

edited

davkean commented Oct 23, 2023

Nigusu-Allehu commented Oct 23, 2023

zivkan commented Oct 23, 2023

Nigusu-Allehu commented Oct 23, 2023

zivkan commented Oct 23, 2023

davkean commented Oct 24, 2023 •

edited

davkean commented Oct 24, 2023 •

edited

zivkan commented Oct 24, 2023

Nigusu-Allehu commented Oct 25, 2023

Reduce allocations in TokenSegment.TryMatch #12728

Reduce allocations in TokenSegment.TryMatch #12728

Comments

kartheekp-ms commented Jul 5, 2023

Nigusu-Allehu commented Aug 7, 2023 • edited

davkean commented Oct 23, 2023

Nigusu-Allehu commented Oct 23, 2023

zivkan commented Oct 23, 2023

Nigusu-Allehu commented Oct 23, 2023

zivkan commented Oct 23, 2023

davkean commented Oct 24, 2023 • edited

davkean commented Oct 24, 2023 • edited

zivkan commented Oct 24, 2023

Nigusu-Allehu commented Oct 25, 2023

Nigusu-Allehu commented Aug 7, 2023 •

edited

davkean commented Oct 24, 2023 •

edited

davkean commented Oct 24, 2023 •

edited