
Optimize memory usage when building the blob heap.#127304

Open
teo-tsirpanis wants to merge 4 commits into dotnet:main from teo-tsirpanis:srm-blob-heap-opt

@teo-tsirpanis commented Apr 22, 2026

Background

When building the blob heap, MetadataBuilder keeps track of the blobs already added so that it does not add them multiple times. Originally, this was done with a Dictionary<ImmutableArray<byte>, BlobHandle> and a custom comparer that compared keys by value. The drawback was that calling GetOrAddBlob with anything other than an immutable array always allocated an ImmutableArray<byte>. #81059 improved this and eliminated most allocations when the blob already exists. However, there are still several optimization opportunities in how we build the blob heap:

  • We still get an allocation when we call GetOrAddBlob with a multi-chunk BlobBuilder, even if the blob already exists.
  • Adding a new blob to the heap still allocates.
  • Unlike other heap types, the blob heap gets written in random order, which requires allocating a contiguous memory block as large as the entire blob heap. This bypasses BlobBuilder's pooling and chunking facilities and, for large heaps, results in an allocation on the large object heap (LOH).
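To illustrate the original Dictionary<ImmutableArray<byte>, BlobHandle> scheme, here is a minimal sketch assuming .NET 8 or later (for EqualityComparer<T>.Create). It is not the actual BlobDictionary code: the integer handles are hypothetical stand-ins for BlobHandle. It only shows why keying by ImmutableArray<byte> forces a copy on every lookup that starts from a plain byte array.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Simplified original-style scheme: keys are ImmutableArray<byte>, compared by content.
var comparer = EqualityComparer<ImmutableArray<byte>>.Create(
    (x, y) => x.AsSpan().SequenceEqual(y.AsSpan()),
    key =>
    {
        var hash = new HashCode();
        hash.AddBytes(key.AsSpan());
        return hash.ToHashCode();
    });
var blobs = new Dictionary<ImmutableArray<byte>, int>(comparer);

int GetOrAddBlob(byte[] blob)
{
    // ImmutableArray.Create copies the input into a new array, so every call
    // allocates a key, even when an equal blob is already in the dictionary.
    ImmutableArray<byte> key = ImmutableArray.Create(blob);
    if (!blobs.TryGetValue(key, out int handle))
    {
        handle = blobs.Count + 1; // hypothetical stand-in for a real BlobHandle
        blobs.Add(key, handle);
    }
    return handle;
}

int a = GetOrAddBlob(new byte[] { 1, 2, 3 });
int b = GetOrAddBlob(new byte[] { 1, 2, 3 }); // same content, same handle, but still a new copy
Console.WriteLine(a == b); // True
```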

This PR fixes all of the above.

Changes

Instead of keeping track of each blob as an ImmutableArray<byte> and writing the blob heap at the end, we write the blob heap to a BlobBuilder as each blob gets added, and keep track of each blob by its position within that BlobBuilder.
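A minimal sketch of the idea, with two simplifications that are not the PR's code: a plain List<byte> stands in for BlobBuilder, and a hypothetical hash-bucket index stands in for BlobDictionary. Each blob is written once, prefixed with its ECMA-335 compressed length, and from then on is identified only by its offset in the heap. Offset 0 holds the empty blob (a single 0x00 byte), as in a real #Blob heap.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Simplified heap: one append-only byte list. Offset 0 is the empty blob.
var heap = new List<byte> { 0 };
// Hypothetical dedup index: content hash -> offsets of blobs with that hash.
var offsetsByHash = new Dictionary<int, List<int>>();

int GetOrAddBlob(ReadOnlySpan<byte> blob)
{
    if (blob.IsEmpty) return 0;
    int hash = HashOf(blob);
    if (offsetsByHash.TryGetValue(hash, out var candidates))
    {
        foreach (int offset in candidates)
        {
            if (ReadBlob(offset).SequenceEqual(blob)) return offset; // already in the heap
        }
    }
    else
    {
        candidates = new List<int>();
        offsetsByHash.Add(hash, candidates);
    }

    // New blob: append length prefix and bytes, and remember only the offset.
    int newOffset = heap.Count;
    WriteCompressedLength(blob.Length);
    foreach (byte b in blob) heap.Add(b);
    candidates.Add(newOffset);
    return newOffset;
}

// ECMA-335 II.23.2 compressed unsigned integer: 1, 2 or 4 bytes, big-endian.
void WriteCompressedLength(int length)
{
    if (length < 0x80)
    {
        heap.Add((byte)length);
    }
    else if (length < 0x4000)
    {
        heap.Add((byte)(0x80 | (length >> 8)));
        heap.Add((byte)length);
    }
    else
    {
        heap.Add((byte)(0xC0 | (length >> 24)));
        heap.Add((byte)(length >> 16));
        heap.Add((byte)(length >> 8));
        heap.Add((byte)length);
    }
}

// Decodes the length prefix at the given offset and returns the blob's bytes.
ReadOnlySpan<byte> ReadBlob(int offset)
{
    ReadOnlySpan<byte> h = CollectionsMarshal.AsSpan(heap)[offset..];
    if ((h[0] & 0x80) == 0) return h.Slice(1, h[0]);
    if ((h[0] & 0xC0) == 0x80) return h.Slice(2, ((h[0] & 0x3F) << 8) | h[1]);
    return h.Slice(4, ((h[0] & 0x1F) << 24) | (h[1] << 16) | (h[2] << 8) | h[3]);
}

int HashOf(ReadOnlySpan<byte> bytes)
{
    var hc = new HashCode();
    hc.AddBytes(bytes);
    return hc.ToHashCode();
}

int first = GetOrAddBlob(new byte[] { 1, 2, 3 });
int again = GetOrAddBlob(new byte[] { 1, 2, 3 });
int other = GetOrAddBlob(new byte[] { 4 });
Console.WriteLine($"{first} {again} {other} {heap.Count}"); // 1 1 5 7
```

Because the heap is written in insertion order, nothing has to be re-serialized at the end; the handle for a blob is simply where it was written.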

In order to do that, BlobBuilder was extended to support writing data that can later be referenced using a BlobBuilder.Segment struct. This is internal-only functionality that slightly alters some invariants of BlobBuilder, but is invisible to external consumers. Segment-addressable buffers are written in chunks of increasing size, up to 8K bytes, matching the behavior of StringBuilder. This chunking logic will be made user-configurable and extended to all BlobBuilder APIs as part of #100418.
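The chunking rule can be sketched as follows. Only the 8K cap comes from the description above; the 256-byte starting size, the doubling growth, and the rule that a segment never straddles two chunks are assumptions made for illustration. The last rule is one plausible reason the BlobBuilder invariants change, since it can leave a chunk partially filled, but this sketch is not a description of the PR's implementation.

```csharp
using System;
using System.Collections.Generic;

const int MinChunkSize = 256;  // assumption: illustrative starting size
const int MaxChunkSize = 8192; // the 8K cap described above

var chunks = new List<byte[]>();
int used = 0; // bytes already used in the last chunk

// Appends data and returns a hypothetical segment (chunk index, offset, length).
(int Chunk, int Offset, int Length) Append(ReadOnlySpan<byte> data)
{
    byte[]? current = chunks.Count > 0 ? chunks[^1] : null;
    if (current is null || current.Length - used < data.Length)
    {
        // Start a new chunk: double the previous size up to the cap, but never smaller
        // than the data, so every segment lives in a single contiguous chunk.
        int next = current is null ? MinChunkSize : Math.Min(current.Length * 2, MaxChunkSize);
        chunks.Add(new byte[Math.Max(next, data.Length)]);
        used = 0;
    }
    data.CopyTo(chunks[^1].AsSpan(used));
    var segment = (chunks.Count - 1, used, data.Length);
    used += data.Length;
    return segment;
}

// A segment can be read back as one span without copying.
ReadOnlySpan<byte> Get((int Chunk, int Offset, int Length) s) =>
    chunks[s.Chunk].AsSpan(s.Offset, s.Length);

var a = Append(new byte[200]); // fits in the first 256-byte chunk
var b = Append(new byte[100]); // 56 bytes left, so a new 512-byte chunk is started
Console.WriteLine($"{b.Chunk} {chunks[1].Length} {Get(a).Length}"); // 1 512 200
```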

Afterwards, BlobDictionary was updated to use BlobBuilder.Segment as its key type, and to append to the BlobBuilder to obtain a segment when a blob does not already exist. Also, the modern .NET implementation of BlobDictionary was significantly simplified by making use of the AlternateLookup API.
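The AlternateLookup pattern can be sketched like this, assuming .NET 9 or later (Dictionary<TKey, TValue>.GetAlternateLookup and IAlternateEqualityComparer<TAlternate, T>). Segment and SegmentComparer are hypothetical stand-ins for BlobBuilder.Segment and the PR's comparer, and a List<byte> stands in for the BlobBuilder. The dictionary stores only an (offset, length) pair, yet a ReadOnlySpan<byte> can be looked up directly without allocating a key.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var buffer = new List<byte>(); // stand-in for the heap's BlobBuilder
var handles = new Dictionary<Segment, int>(new SegmentComparer(buffer));
var lookup = handles.GetAlternateLookup<ReadOnlySpan<byte>>();

int GetOrAdd(ReadOnlySpan<byte> blob)
{
    // Hit: the span is hashed and compared in place, so no key is allocated.
    if (lookup.TryGetValue(blob, out int handle))
        return handle;

    // Miss: append the bytes once and key the entry by their position.
    var segment = new Segment(buffer.Count, blob.Length);
    foreach (byte b in blob) buffer.Add(b);
    handle = handles.Count + 1; // hypothetical stand-in for a real BlobHandle
    handles.Add(segment, handle);
    return handle;
}

int h1 = GetOrAdd(new byte[] { 1, 2, 3 });
int h2 = GetOrAdd(new byte[] { 1, 2, 3 });
int h3 = GetOrAdd(new byte[] { 9 });
Console.WriteLine($"{h1} {h2} {h3} {buffer.Count}"); // 1 1 2 4

// Hypothetical key: a position in the shared buffer, not a copy of the bytes.
readonly record struct Segment(int Offset, int Length);

// Compares segments by the bytes they refer to, and spans against segments.
sealed class SegmentComparer(List<byte> buffer)
    : IEqualityComparer<Segment>, IAlternateEqualityComparer<ReadOnlySpan<byte>, Segment>
{
    private ReadOnlySpan<byte> Bytes(Segment s) =>
        CollectionsMarshal.AsSpan(buffer).Slice(s.Offset, s.Length);

    private static int Hash(ReadOnlySpan<byte> bytes)
    {
        var hc = new HashCode();
        hc.AddBytes(bytes);
        return hc.ToHashCode();
    }

    public bool Equals(Segment x, Segment y) => Bytes(x).SequenceEqual(Bytes(y));
    public int GetHashCode(Segment s) => Hash(Bytes(s));

    public bool Equals(ReadOnlySpan<byte> alternate, Segment other) => alternate.SequenceEqual(Bytes(other));
    public int GetHashCode(ReadOnlySpan<byte> alternate) => Hash(alternate);

    // Turning a span into a Segment would mean writing to the buffer from inside the
    // comparer; this sketch keeps that write explicit in GetOrAdd instead.
    public Segment Create(ReadOnlySpan<byte> alternate) =>
        throw new NotSupportedException("Add entries through the Segment key.");
}
```

Both GetHashCode overloads hash the same bytes, which is what lets a span and a stored segment with equal content land in the same bucket.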

TODO

  • Benchmark

Copilot AI review requested due to automatic review settings April 22, 2026 23:01
@dotnet-policy-service (bot) added the community-contribution label (Indicates that the PR has been added by a community member) on Apr 22, 2026
@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.


Copilot AI left a comment


Pull request overview

This PR refactors how System.Reflection.Metadata builds the #Blob heap to reduce allocations and avoid a large contiguous buffer allocation by writing blob data incrementally into a BlobBuilder and deduplicating by referencing written segments.

Changes:

  • Write #Blob heap content incrementally into a dedicated HeapBlobBuilder as blobs are added, and compute heap sizes from _blobBuilder.Count.
  • Extend BlobBuilder with internal “Segment” APIs to allow later referencing of previously written data for deduplication.
  • Update BlobDictionary to use BlobBuilder.Segment keys (and AlternateLookup on .NET) instead of ImmutableArray<byte> keys.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/MetadataBuilder.cs: Switches serialized heap size accounting to use _blobBuilder.Count.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/MetadataBuilder.Heaps.cs: Reworks blob heap accumulation/writing to use _blobBuilder and removes the "write blob heap at end" path.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/BlobDictionary.cs: Changes blob dedup dictionary to key by BlobBuilder.Segment and uses AlternateLookup on .NET.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobWriterImpl.cs: Adds span-based compressed-integer writer used by segment writing.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs: Adjusts invariants / chunk expansion behavior to support segment-writing scenarios.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.Segment.cs: New internal segment-writing implementation and Segment struct for stable references.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/Hash.cs: Refactors FNV hashing to add an "accumulate" helper.
  • src/libraries/System.Reflection.Metadata/src/System.Reflection.Metadata.csproj: Includes the new BlobBuilder.Segment.cs file (and normalizes the first line).
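The Hash.cs change is described only as adding an "accumulate" helper for FNV hashing. As a hedged illustration of why such a helper is useful here, the sketch below (hypothetical names, standard FNV-1a 32-bit constants) folds bytes into a running hash, so a multi-chunk blob could be hashed chunk by chunk without first copying it into one array.

```csharp
using System;

// FNV-1a (32-bit) constants.
const uint FnvOffsetBasis = 2166136261;
const uint FnvPrime = 16777619;

// Folds more bytes into an existing hash state. FNV-1a consumes one byte at a time,
// so hashing chunk A and then chunk B yields the same value as hashing A followed by B.
uint Accumulate(uint hash, ReadOnlySpan<byte> data)
{
    foreach (byte b in data)
    {
        hash ^= b;
        hash *= FnvPrime; // wraps on overflow (unchecked by default)
    }
    return hash;
}

byte[] whole = { 1, 2, 3, 4, 5 };
uint oneShot = Accumulate(FnvOffsetBasis, whole);
uint chunked = Accumulate(Accumulate(FnvOffsetBasis, whole.AsSpan(0, 2)), whole.AsSpan(2));
Console.WriteLine(oneShot == chunked); // True
```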


_blobs.GetOrAdd(ReadOnlySpan<byte>.Empty, ImmutableArray<byte>.Empty, default, out _);
_blobHeapSize = 1;
_blobs = new BlobDictionary(_blobBuilder, 32);

Copilot AI Apr 22, 2026


The initial capacity for _blobs dropped from 1024 to 32. If the blob heap commonly contains hundreds/thousands of unique blobs (as the previous default implied), this will cause more dictionary resizes and allocations. Consider keeping the previous capacity (or deriving it from an existing heuristic) unless there’s data showing 32 is sufficient.

Suggested change:
- _blobs = new BlobDictionary(_blobBuilder, 32);
+ _blobs = new BlobDictionary(_blobBuilder, 1024);


@teo-tsirpanis (Contributor Author) Apr 22, 2026


This is open for discussion. Other heaps used multiples of 1024 as their capacity in bytes, not elements. Now that we can set the capacity of the blob heap being built, I moved the use of 1024 there, and set the dictionary's initial capacity to $\sqrt{1024} = 32$ elements.

Copilot AI review requested due to automatic review settings April 23, 2026 17:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

@teo-tsirpanis (Contributor Author)

@EgorBot -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers.Binary;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;

BenchmarkSwitcher.FromAssembly(typeof(BlobHeapBenchmarks).Assembly).Run(args);

[MemoryDiagnoser]
public class BlobHeapBenchmarks
{
    const int BlobSize = 20;

    [Benchmark]
    [Arguments(2_000)]
    [Arguments(20_000)]
    public int Run(int blobCount)
    {
        var mdBuilder = new MetadataBuilder();
        byte[] buffer = new byte[BlobSize];
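        for (int i = 0; i < blobCount; i++)
        {
            // Each blob is distinct: the first 4 bytes hold the loop index, the rest stay zero.
            BinaryPrimitives.WriteInt32LittleEndian(buffer, i);
            _ = mdBuilder.GetOrAddBlob(buffer);
        }
        // Serializing the metadata root also writes out the blob heap, so that cost is measured too.
        var mdRootBuilder = new MetadataRootBuilder(mdBuilder, suppressValidation: true);
        BlobBuilder output = new BlobBuilder();
        mdRootBuilder.Serialize(output, 0, 0); // method body stream RVA and mapped field data RVA are unused here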
        return output.Count;
    }
}


Labels: area-System.Reflection.Metadata, community-contribution (Indicates that the PR has been added by a community member), tenet-performance (Performance related issue)

3 participants