Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookup/ItemDictionary make up a huge amount of allocations/CPU item #2587

Open
Tracked by #6940
davkean opened this issue Oct 5, 2017 · 10 comments
Open
Tracked by #6940

Lookup/ItemDictionary make up a huge amount of allocations/CPU item #2587

davkean opened this issue Oct 5, 2017 · 10 comments
Labels
backlog performance Performance-Scenario-Solution-Open This issue affects solution open performance. Priority:2 Work that is important, but not critical for the release size:7 triaged

Comments

@davkean
Copy link
Member

davkean commented Oct 5, 2017

What's happening is that the initial capability/current capability is not correct, and we're resizing the underlying items dictionary, possible multiple times with the same ImportItems call.

image

Not sure of a good way to fix this - there's no way to resize a dictionary after it's created.

@sharwell
Copy link
Member

sharwell commented Oct 5, 2017

📝 As of this writing, the collections referenced in this comment are in their earliest stages (API review), but I'm keeping an eye open for cases where they may help solve real-world problems.

💡 ItemDictionary<T> appears to be a candidate for the SortedTreeList<T> collection, which may be able serve the combined purpose of both dictionaries and the linked lists in ItemDictionary<T>. Outside of that approach, a more direct replacement with TreeDictionary<TKey, TValue> would address the immediate concerns of this issue with less code churn.

⚠️ The locks used in ItemDictionary<T> are incorrect and/or misleading. Given the increased sensitivity of B+-tree data structures to corruption and/or unpredictable behavior when used incorrectly in a multi-threaded environment, this would obviously need to be fixed before the above is an option.

@davkean
Copy link
Member Author

davkean commented Oct 11, 2017

Having spent a non-trivial time walking through this code, I'm convinced that ItemDictionary/Lookup needs to be rethought/rewritten with more appropriate collections. 11% of allocations and 6.5% comes from Lookup.

Memory:
image

CPU:
image

We spent a large non-trivial amount of time creating and merging collections, huge amount of allocations are us resizing the underlying Dictionary<TKey, TValue> because we couldn't figure out the up-front size. I'm going to change this bug to be about that.

@davkean davkean changed the title ItemDictionary<T>.ImportItems allocates up to 2.5% of all builds Lookup/ItemDictionary make up a huge amount of allocations/CPU item Oct 11, 2017
@yahorsi
Copy link

yahorsi commented Oct 11, 2017

Just interesting why the hell Dictionary needs to be resized? Can't instead of resizing we allocate just another array and use list of arrays instead of a single?

@davkean
Copy link
Member Author

davkean commented Oct 11, 2017

Tradeoff - that would increase the size of Dictionary, and eventually something has to increase the size of the list of arrays and we're back to square one. Though that would help avoiding the large object heap- but if your dictionary is 85k you're probably better off with a different object.

@yahorsi
Copy link

yahorsi commented Oct 11, 2017

You don't have to use List to store Arrays, simple linked list would be enough here. Anyway any collection is good at some particular use case. E.g. if collection is resized a lot then that collection is probably a bad choice for that scenario?

PS: Can't we specify Capacity here?

@davkean
Copy link
Member Author

davkean commented Oct 11, 2017

I'm assuming you don't mean a linked link as the entire back storing (which would be O(n) lookup) but rather as the store for the underlying arrays. If so, yes, that's heading towards Immutable collection territory - which is what I'm thinking.

I've made changes to set capacity in places - but we due to the way it was written, we don't know how much data we're going to add these dictionaries by the end of the scopes of evaluation, so we can't pick the "right size".

@AndyGerlicher AndyGerlicher added this to the MSBuild 15.6 milestone Nov 6, 2017
@danmoseley
Copy link
Member

Note @davkean 's request was implemented for .NET Core 2.1, so for that target you can call EnsureCapacity

https://github.com/dotnet/core/blob/master/release-notes/2.1/api-diff/2.0-vs-2.1_System.Collections.Generic.md

@panopticoncentral panopticoncentral added the Performance-Scenario-Solution-Open This issue affects solution open performance. label Mar 31, 2020
@panopticoncentral panopticoncentral removed this from the MSBuild 15.6 milestone May 21, 2020
@panopticoncentral panopticoncentral added the Priority:2 Work that is important, but not critical for the release label Mar 23, 2021
@ladipro ladipro added the size:1 label Oct 12, 2021
@ladipro
Copy link
Member

ladipro commented Oct 12, 2021

Labeling with size:1 as the cost of the initial investigation. Will open a follow-up issue if more work is identified.

@ladipro ladipro self-assigned this Dec 6, 2021
@ladipro ladipro moved this from To Do to In Progress in Perf sprint Dec 6th - Dec 19th 2021 Dec 13, 2021
@ladipro
Copy link
Member

ladipro commented Dec 13, 2021

12/2021 numbers as measured when clean-building the Ocelot solution:

Lookup ItemDictionary
CPU 1.8% 1.1%
Memory 7.8% 6.0%

Not as bad anymore but definitely worth optimizing.

@ladipro ladipro removed this from In Progress in Perf sprint Dec 6th - Dec 19th 2021 Jan 10, 2022
@ladipro ladipro removed their assignment Jan 10, 2022
@ladipro ladipro added size:7 and removed size:1 labels Jan 10, 2022
@ladipro
Copy link
Member

ladipro commented Jan 10, 2022

Back to backlog as I will not have time to work on this in the near future.

@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog performance Performance-Scenario-Solution-Open This issue affects solution open performance. Priority:2 Work that is important, but not critical for the release size:7 triaged
Projects
None yet
Development

No branches or pull requests

9 participants