
Avoid allocations when adding existing items to the blob heap. #81059

Merged
merged 8 commits into dotnet:main from srm-blob-opt on Apr 19, 2023

Conversation

@teo-tsirpanis (Contributor) commented Jan 23, 2023

To avoid adding them multiple times, System.Reflection.Metadata keeps track of the blobs added to the blob heap through a Dictionary<ImmutableArray<byte>, BlobHandle> with a custom equality comparer that compares the blobs by their content. The problem with this approach is that we have to allocate an immutable array for every blob we are going to add, even if it already exists in the heap. The most obvious solutions to this problem would be to wait for #2010 or to implement a full-featured hash table with span keys for our own internal use.

This PR uses a different approach to solve this problem. Instead of a Dictionary<ImmutableArray<byte>, BlobHandle> we have a Dictionary<int, KeyValuePair<ImmutableArray<byte>, BlobHandle>> where the key is the hash code of our blob, and the value is the blob itself and its handle.

If we want to see if a blob exists:

  1. We set the dictionary key to the blob's hash code.
  2. We check whether the dictionary has an item with this key.
    1. If it doesn't, the blob does not exist. We can add it at the current dictionary key; only then do we have to allocate the immutable array.
    2. If it does, we compare the stored value's key with the user-provided blob.
      1. If they are equal, the blob already exists.
      2. If they are not equal, we derive the next dictionary key (at the moment by running the current one through a linear congruential generator) and go back to step 2.

The simplicity of this solution comes from using a "hash table within a hash table". The Dictionary accepts keys across their full 32-bit range and does its best to let us access them efficiently, so we don't have to do all the heavy lifting ourselves.
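
A minimal sketch of this lookup, assuming a hypothetical BlobHeapSketch class (GetContentHash, AddToHeap, and the LCG constants are illustrative, not the code this PR actually adds):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Runtime.InteropServices;

internal sealed class BlobHeapSketch
{
    // Outer "hash table": keyed by a probing key derived from the blob's content hash.
    private readonly Dictionary<int, KeyValuePair<ImmutableArray<byte>, BlobHandle>> _blobs = new();
    private readonly List<byte> _heap = new() { 0 }; // offset 0 is the empty blob

    public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> blob)
    {
        int key = GetContentHash(blob); // step 1: start at the blob's hash code

        while (true) // step 2: probe the dictionary
        {
            if (!_blobs.TryGetValue(key, out KeyValuePair<ImmutableArray<byte>, BlobHandle> entry))
            {
                // Step 2.1: nothing at this key, so the blob is new. Only now do we
                // pay for the ImmutableArray allocation (wrapping a fresh copy).
                BlobHandle handle = AddToHeap(blob);
                ImmutableArray<byte> copy = ImmutableCollectionsMarshal.AsImmutableArray(blob.ToArray()); // .NET 8+
                _blobs.Add(key, new(copy, handle));
                return handle;
            }

            if (entry.Key.AsSpan().SequenceEqual(blob))
            {
                return entry.Value; // step 2.2.1: same content, the blob already exists
            }

            // Step 2.2.2: different content under the same key; advance the key with a
            // linear congruential generator and probe again.
            key = unchecked(key * 1103515245 + 12345);
        }
    }

    private BlobHandle AddToHeap(ReadOnlySpan<byte> blob)
    {
        // Placeholder for the real heap append; MetadataTokens fabricates a handle from the offset.
        int offset = _heap.Count;
        _heap.AddRange(blob.ToArray());
        return MetadataTokens.BlobHandle(offset);
    }

    private static int GetContentHash(ReadOnlySpan<byte> blob)
    {
        // FNV-1a, purely for illustration; the real code uses its own hash function.
        int hash = unchecked((int)2166136261);
        foreach (byte b in blob)
            hash = unchecked((hash ^ b) * 16777619);
        return hash;
    }
}
```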

With this data structure in hand, the MetadataBuilder.GetOrAddBlob* methods were rewritten. An internal overload that takes a ReadOnlySpan<byte> was introduced, and the other overloads were reimplemented on top of it (a rough sketch of these paths follows the list):

  • GetOrAddBlobUTF16 will reinterpret the string directly as a ReadOnlySpan<byte> if we are on a little-endian system, avoiding any allocations and copies.
  • GetOrAddBlob(BlobBuilder) cannot always avoid the allocation, but it will if the blob builder is small or consists of a single segment (in the latter case it also avoids any copies).
    • Other overloads use this one and were indirectly optimized too.
  • GetOrAddBlob(ImmutableArray<byte>) takes care to reuse the provided immutable array, as before.
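
To make the list above concrete, here is a hedged sketch of how the overloads can funnel into a span-based core, reusing the hypothetical BlobHeapSketch from the earlier snippet (this is not the exact MetadataBuilder implementation):

```csharp
using System;
using System.Reflection.Metadata;
using System.Runtime.InteropServices;

internal static class GetOrAddBlobSketch
{
    public static BlobHandle GetOrAddBlobUTF16(this BlobHeapSketch heap, string value)
    {
        if (BitConverter.IsLittleEndian)
        {
            // The heap stores UTF-16LE, so on little-endian machines the string's chars
            // can be viewed directly as bytes: no allocation, no copy.
            return heap.GetOrAddBlob(MemoryMarshal.AsBytes(value.AsSpan()));
        }

        // Big-endian machines still need an explicit little-endian copy.
        byte[] bytes = new byte[value.Length * sizeof(char)];
        for (int i = 0; i < value.Length; i++)
        {
            bytes[2 * i] = (byte)value[i];
            bytes[2 * i + 1] = (byte)(value[i] >> 8);
        }
        return heap.GetOrAddBlob(bytes);
    }

    public static BlobHandle GetOrAddBlob(this BlobHeapSketch heap, BlobBuilder builder)
    {
        // If the builder's content lives in a single segment, look it up straight from
        // that segment's bytes; the copy happens only if the blob turns out to be new.
        foreach (Blob blob in builder.GetBlobs())
        {
            ArraySegment<byte> segment = blob.GetBytes();
            if (segment.Count == builder.Count)
            {
                return heap.GetOrAddBlob(segment.AsSpan());
            }
            break; // more than one segment; fall back to the copying path
        }

        return heap.GetOrAddBlob(builder.ToArray());
    }
}
```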

Let's see what CI thinks. I will open an API proposal to add span APIs that would take advantage of this. I also left the optimization of the string and user string heaps for another time (they don't currently have a public API that would take advantage of this).

@ghost added the community-contribution label Jan 23, 2023
@ghost commented Jan 23, 2023

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.


Author: teo-tsirpanis
Assignees: -
Labels: area-System.Reflection.Metadata, community-contribution

Milestone: -

@buyaa-n (Member) commented Mar 6, 2023

Let's see what CI thinks. I will open an API proposal to add span APIs that would take advantage of this.

@teo-tsirpanis I was confused by this; is this PR ready for review? If it's ready, do you have numbers to show the improvements? If it's not ready yet, let's convert it to a draft.

@teo-tsirpanis (Contributor, Author)

@buyaa-n Yes, it's ready for review. I haven't run any benchmarks yet.

@steveharter (Member)

@teo-tsirpanis thanks for this PR; it looks like a significant perf improvement. However, we need benchmarks to show the removed allocations and the CPU impact. This can be done either through the official benchmarks or with one-off BenchmarkDotNet results.

I do not see any existing official benchmarks for MetadataBuilder. I can assist by creating these and verifying them against this PR if needed. Thanks.

@steveharter added the needs-author-action label Apr 3, 2023
@teo-tsirpanis (Contributor, Author)

Thanks @steveharter, I would appreciate some help with the benchmarks.

@ghost removed the needs-author-action label Apr 4, 2023
@steveharter (Member)

@teo-tsirpanis can you merge main into this branch again, please? CI is a bit of a mess here, with almost 2,000 failures. Thanks

@steveharter (Member)

I created some benchmarks and they look good, so thanks! I'll open a PR for them once you verify these results.

It seems the new comparison logic is much faster; the Write_NoReuse benchmark below is almost 2x faster (perhaps because the first byte differs, causing an early exit from the comparison).

Allocations in the no-reuse case increased slightly (3%), but in the reuse case they dropped to a ratio of 0.65 (roughly a one-third reduction).

results:

|        Method |        Job |               Toolchain |      Mean |    Error |   StdDev |   Median |       Min |      Max | Ratio | RatioSD |    Gen0 |    Gen1 | Allocated | Alloc Ratio |
|-------------- |----------- |------------------------ |----------:|---------:|---------:|---------:|----------:|---------:|------:|--------:|--------:|--------:|----------:|------------:|
| Write_NoReuse | Job-VRUVWC |  \TEO_AFTER\corerun.exe | 105.65 us | 0.966 us | 0.904 us | 106.0 us | 104.29 us | 107.2 us |  0.55 |    0.01 | 26.1954 | 10.8108 | 269.83 KB |        1.03 |
| Write_NoReuse | Job-FAUVYJ | \TEO_BEFORE\corerun.exe | 193.31 us | 2.292 us | 2.144 us | 193.0 us | 190.17 us | 197.1 us |  1.00 |    0.00 | 25.2765 |  9.4787 | 261.21 KB |        1.00 |
|               |            |                         |           |          |          |          |           |          |       |         |         |         |           |             |
|   Write_Reuse | Job-VRUVWC |  \TEO_AFTER\corerun.exe |  99.95 us | 2.096 us | 2.414 us | 100.5 us |  96.51 us | 104.8 us |  0.95 |    0.03 | 16.0156 |  3.1250 | 167.72 KB |        0.65 |
|   Write_Reuse | Job-FAUVYJ | \TEO_BEFORE\corerun.exe | 104.73 us | 1.129 us | 1.001 us | 104.7 us | 103.04 us | 106.2 us |  1.00 |    0.00 | 25.4167 |  5.0000 | 259.65 KB |        1.00 |
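
For context, a purely illustrative sketch of the shape such benchmarks could take (the real ones live in the dotnet/performance branch mentioned below and may differ in blob sizes, counts, and what they serialize):

```csharp
using System;
using System.Reflection.Metadata.Ecma335;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class BlobHeapBenchmarks
{
    private byte[][] _distinctBlobs = null!;
    private byte[] _sharedBlob = null!;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(42);
        _distinctBlobs = new byte[1000][];
        for (int i = 0; i < _distinctBlobs.Length; i++)
        {
            _distinctBlobs[i] = new byte[64];
            random.NextBytes(_distinctBlobs[i]);
        }
        _sharedBlob = _distinctBlobs[0];
    }

    [Benchmark]
    public MetadataBuilder Write_NoReuse()
    {
        // Every blob is unique, so each call appends to the heap.
        var builder = new MetadataBuilder();
        foreach (byte[] blob in _distinctBlobs)
            builder.GetOrAddBlob(blob);
        return builder;
    }

    [Benchmark]
    public MetadataBuilder Write_Reuse()
    {
        // The same blob is added repeatedly; after this PR the repeated
        // lookups no longer allocate an ImmutableArray per call.
        var builder = new MetadataBuilder();
        for (int i = 0; i < _distinctBlobs.Length; i++)
            builder.GetOrAddBlob(_sharedBlob);
        return builder;
    }
}
```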

To run the benchmarks:

  • Build your main branch and copy the binaries to a folder, e.g. copy artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\ c:\repos\TEO_BEFORE
  • Build your srm-blob-opt branch and copy the binaries to a folder, e.g. copy artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\ c:\repos\TEO_AFTER
  • Clone the performance repo and pull my branch
  • From the performance repo, go to the src\benchmarks\micro directory
  • Run dotnet run -c Release -f net7.0 --filter System.Reflection.Metadata.Blob* --join --corerun c:\repos\TEO_BEFORE\corerun.exe c:\repos\TEO_AFTER\corerun.exe

…ffer.

The blob builder's default starting size is 256 bytes, the same as the stack-allocated buffer; making it bigger does not have any benefit since such large blobs are rare.
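
A hedged sketch of the pattern that commit describes, continuing the GetOrAddBlobSketch class from earlier (the helper name and exact control flow are illustrative): builders of up to 256 bytes are copied into a stack-allocated buffer so the lookup itself never allocates.

```csharp
public static BlobHandle GetOrAddSmallBlob(this BlobHeapSketch heap, BlobBuilder builder)
{
    const int StackBufferSize = 256; // matches BlobBuilder's default starting capacity

    if (builder.Count <= StackBufferSize)
    {
        // Gather the (possibly multi-segment) content into a stack buffer.
        Span<byte> buffer = stackalloc byte[StackBufferSize];
        int written = 0;
        foreach (Blob blob in builder.GetBlobs())
        {
            ArraySegment<byte> segment = blob.GetBytes();
            segment.AsSpan().CopyTo(buffer.Slice(written));
            written += segment.Count;
        }
        return heap.GetOrAddBlob(buffer.Slice(0, written));
    }

    // Larger blobs are rare; fall back to the heap-allocating copy.
    return heap.GetOrAddBlob(builder.ToArray());
}
```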
@steveharter (Member) left a comment


Looks good. @teo-tsirpanis can you merge this or should I? Thanks

@teo-tsirpanis (Contributor, Author)

I can't, I'm not a Microsoft employee. 😅

@buyaa-n (Member) commented Apr 19, 2023

I will open an API proposal to add span APIs that would take advantage of this.

Thank you @teo-tsirpanis, I am looking forward to that proposal and the new ReadOnlySpan-based API that we could use for emitting custom attributes.

The test failure is unrelated and has been reported; merging.

@buyaa-n merged commit 6149ca0 into dotnet:main Apr 19, 2023
104 of 107 checks passed
@teo-tsirpanis deleted the srm-blob-opt branch April 20, 2023 09:11
@steveharter added the tenet-performance label Apr 24, 2023
@dotnet locked as resolved and limited conversation to collaborators May 24, 2023