Avoid allocations when adding existing items to the blob heap. #81059
Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
@teo-tsirpanis I was confused by this: is this PR ready for review? If it's ready, do you have numbers to show the improvements? If it's not ready yet, let's convert it into a draft.
@buyaa-n Yes, it's ready for review. I haven't run any benchmarks yet.
@teo-tsirpanis Thanks for this PR, as it looks like a significant perf improvement. However, we need benchmarks to show the removed allocations and the CPU impact. This can be done either through the official benchmarks or with one-off BenchmarkDotNet results. I do not see any existing official benchmarks for MetadataBuilder. I can assist by creating these and verifying them against this PR if needed. Thanks.
Thanks @steveharter, I would appreciate some help with the benchmarks.
@teo-tsirpanis can you merge to
I created some benchmarks and they do look good, so thanks! I'll create a PR for them once you verify them. It seems the comparisons are much faster with these changes; the "Write_NoReuse" benchmark below is almost 2x faster (perhaps because the first byte differs, which makes the comparison short). Allocations in the no-share case increased slightly (3%), but in the sharing case they went down to a ratio of 0.65 (roughly one-third fewer). Results:
To run the benchmarks:
…ffer. The blob builder's default starting size is 256 bytes, the same as the stack-allocated buffer, and making the buffer bigger has no benefit since such large blobs are rare.
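The buffer strategy for `GetOrAddBlob(BlobBuilder)` (use the builder's bytes directly when it has a single segment, otherwise copy a small blob into a fixed 256-byte buffer) can be modeled with a minimal Python sketch. It is a hypothetical analogue, not the PR's C# code: the builder is assumed to be a list of `bytes` segments, a reusable `bytearray` stands in for the stack-allocated buffer, and `contiguous_view`, `SCRATCH_SIZE`, and the allocate-for-large-blobs fallback are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical constant mirroring the 256-byte buffer from the commit message.
SCRATCH_SIZE = 256


def contiguous_view(segments: List[bytes], scratch: bytearray) -> Tuple[memoryview, bool]:
    """Return a contiguous view of a segmented blob and whether it allocated.

    Assumed model: `segments` plays the role of a BlobBuilder's chunks and
    `scratch` plays the role of a stack-allocated buffer.
    """
    if len(segments) == 1:
        # Single segment: its bytes are already contiguous, so no copy is needed.
        return memoryview(segments[0]), False
    total = sum(len(seg) for seg in segments)
    if total <= len(scratch):
        # Small multi-segment blob: copy into the reusable scratch buffer.
        offset = 0
        for seg in segments:
            scratch[offset:offset + len(seg)] = seg
            offset += len(seg)
        return memoryview(scratch)[:total], False
    # Large multi-segment blob (assumed rare): fall back to a fresh allocation.
    return memoryview(b"".join(segments)), True
```

A view into `scratch` is only valid until the next call overwrites it, which mirrors why a stack buffer can only back a lookup, while a blob that turns out to be new still needs its own owned copy.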
Looks good. @teo-tsirpanis can you merge this or should I? Thanks
I can't, I'm not a Microsoft employee. 😅
Thank you @teo-tsirpanis, I am looking forward to that proposal and the new `ReadOnlySpan`-based API, which we could use for emitting custom attributes.
Test failure is unrelated and reported, merging.
To avoid adding the same blob many times, `System.Reflection.Metadata` keeps track of the blobs added to the blob heap through a `Dictionary<ImmutableArray<byte>, BlobHandle>` and a custom equality comparer that compares the blobs by their content. The problem with this approach is that we have to allocate an immutable array for every blob we are going to add, even if it already exists in the heap. The most obvious solution to this problem would be to wait for #2010 or to implement a full-featured hash table with span keys for our own internal use.
This PR uses a different approach to solve this problem. Instead of a `Dictionary<ImmutableArray<byte>, BlobHandle>`, we have a `Dictionary<int, KeyValuePair<ImmutableArray<byte>, BlobHandle>>`, where the key is the hash code of our blob and the value is the blob itself and its handle.
If we want to see if a blob exists:
The simplicity of this solution comes from using a "hash table within a hash table". The `Dictionary` gets keys across their full 32-bit range and will do its best to let us access them efficiently, so we don't have to do all the heavy lifting ourselves.
With this data structure in hand, the `MetadataBuilder.GetOrAddBlob*` methods were rewritten. An internal overload that takes a `ReadOnlySpan<byte>` was introduced, and the other overloads were reimplemented around it:
- `GetOrAddBlobUTF16` casts the string straight to a `ReadOnlySpan<byte>` if we are on a little-endian system, avoiding any allocations and copies.
- `GetOrAddBlob(BlobBuilder)` cannot always avoid the allocation, but it does if the blob builder consists of a single segment.
- `GetOrAddBlob(ImmutableArray<byte>)` takes care to reuse the provided immutable array, like before.
Let's see what CI thinks. I will open an API proposal to add span APIs that would take advantage of this. I also left the optimization of the string and user string heaps for another time (they don't currently have a public API that would take advantage of this).
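The "hash table within a hash table" idea, a dictionary keyed by the blob's hash code whose values hold the blob and its handle, can be illustrated with a minimal Python sketch (an analogue, not the PR's C# code). Assumptions: `BlobDictionary`, `get_or_add`, and the 32-bit FNV-1a hash are hypothetical choices, and collisions between different blobs are resolved by probing the next integer key. What it shows is that a lookup needs only a byte view of the candidate blob; an owned copy is made only when the blob is new.

```python
from typing import Dict, Tuple, Union

BytesLike = Union[bytes, bytearray, memoryview]


class BlobDictionary:
    """Hypothetical model of Dictionary<int, KeyValuePair<ImmutableArray<byte>, BlobHandle>>.

    Outer keys are 32-bit hash codes; each value is an owned (blob, handle) pair.
    """

    def __init__(self) -> None:
        self._entries: Dict[int, Tuple[bytes, int]] = {}

    @staticmethod
    def _hash(data: memoryview) -> int:
        # 32-bit FNV-1a; the choice of hash function is an assumption.
        h = 0x811C9DC5
        for byte in data:
            h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
        return h

    def get_or_add(self, data: BytesLike, new_handle: int) -> Tuple[int, bool]:
        """Return (handle, already_existed); copy `data` only if it is new."""
        view = memoryview(data).cast("B")
        key = self._hash(view)
        while True:
            entry = self._entries.get(key)
            if entry is None:
                # Free slot: this is the only place an owned copy is made.
                self._entries[key] = (bytes(view), new_handle)
                return new_handle, False
            blob, handle = entry
            if view == blob:
                # Same content is already in the heap: reuse its handle.
                return handle, True
            # Different blob under the same key: probe the next key (assumption).
            key = (key + 1) & 0xFFFFFFFF
```

Because the outer dictionary only ever sees integer keys, the content comparison happens in `get_or_add` against a borrowed view, which is the property that removes the per-lookup key allocation.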
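The little-endian condition on `GetOrAddBlobUTF16` comes down to byte order: UTF-16 blobs hold little-endian code units, so on a little-endian machine a string's in-memory code units already have the blob's byte layout and can simply be reinterpreted, while a big-endian machine would need a byte-swapped copy. A hypothetical Python analogue follows; it assumes an `array("H")` of 16-bit code units stands in for a .NET string, and `utf16_blob_view` is an illustrative name.

```python
import sys
from array import array


def utf16_blob_view(code_units: array) -> memoryview:
    """View native-endian UTF-16 code units as little-endian blob bytes.

    `code_units` is assumed to be array("H"), standing in for a .NET string.
    """
    if code_units.typecode != "H" or code_units.itemsize != 2:
        raise TypeError("expected an array('H') of 16-bit code units")
    if sys.byteorder == "little":
        # Native layout already matches UTF-16LE: reinterpret without copying.
        return memoryview(code_units).cast("B")
    # Big-endian host: the bytes must be swapped, which requires a copy.
    swapped = array("H", code_units)
    swapped.byteswap()
    return memoryview(swapped).cast("B")
```

On a little-endian host the returned view shares memory with `code_units`, so it reflects later changes to the array, the Python counterpart of casting a string to a `ReadOnlySpan<byte>`.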