Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Extend BlobBuilder so consumers can better control allocations #100418

Open
jaredpar opened this issue Mar 28, 2024 · 1 comment
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Reflection.Metadata
Milestone

Comments

@jaredpar
Copy link
Member

jaredpar commented Mar 28, 2024

Background and motivation

The BlobBuilder type is a mix between:

  1. Trying to emulate the underlying mechanics and allocation profile of StringBuilder
  2. Extensible so that consumers of System.Reflection.Metadata can control allocations of BlobBuilder (with pooling)

In its current configuration it doesn't fully achieve either of these goals due the following reasons:

  1. BlobBuilder has no enforced maximum internal chunk size. Instead during write operations it has a much simpler strategy of use rest of current BlobBuilder then allocate a single BlobBuilder to hold the rest. That results in lots of LOH allocations during build.
  2. There are many types in System.Reflection.Metadata has no mechanism for consumers to provide derived BlobBuilder instances and instead allocate BlobBuilder types directly. This subverts attempts by consumers to pool allocations.
  3. The LinkSuffix / LinkPrefix APIs can end up silently mixing the types of BlobBuilder instances in a chain. That makes advanced caching like pooling array allocations impossible because types with different caching strategies get silently inserted into the chain. When these insertions happen the byte[] underlying the instances are swapped.
  4. There is to mechanism to control the underlying byte[] allocation which prevents these from being pooled. Only the BlobBuilder instances can be pooled which means their underlying byte[] is inefficiently managed because it can't be re-used when the containing BlobBuilder is at rest. This is in contrast to StringBuilder which leverages the ArrayPool<char> for allocations.
  5. There is no easy mechanism for derived types to control zeroing of underlying byte[] when a BlobBuilder instance from a pool is re-used. Can lead to difficult issues like 99244.

The below proposed changes are meant to address these problems such that consumers of System.Reflection.Metadata can do the following:

  1. Control the allocation of all BlobBuilder instances used in a emit pass.
  2. Control and manage the underlying byte[] in the BlobBuilder.
  3. Detect when BlobBuilder instances are linked with BlobBuilder instances of a different type.

Using the below changes I've been able to significantly improve the allocation profile of VBCSCompiler. For building a solution the scale of compilers.slnf (~500 compilation events, large, small and medium projects) I've been able to remove ~200MB of LOH for byte[] and reduce GC pause time by 1.5%.

API Proposal

namespace System.Reflection.Metadata;

public class BlobBuilder
{
+    /// <summary>
+    /// The byte array underpinning the <see cref="BlobBuilder"/>. This can only be called on
+    /// the head of a chain of <see cref="BlobBuilder"/> instances. Calling the setter will reset
+    /// the <see cref="Length"> to zero.
+    /// </summary>
+    protected byte[] Buffer { get; set; }

+    /// <summary>
+    /// Derived types can override this to restrict maximum chunk size to allocate when writing
+    /// a contiguous set of bytes through the WriteBytes APIs. When unset the default is to allocate
+    /// a chunk for the rest of the bytes that don't fit into the current chunk.
+    /// </summary>
+    protected virtual int? MaxChunkSize => null;

+    /// <summary>
+    /// Set the capacity of the <see cref="BlobBuilder"/>.
+    // </summary>
+    public int Capacity { get; set; }

+    protected BlobBuilder(byte[] buffer);

+    /// <summary>
+    /// This method is called in <see cref="LinkSuffix"> or <see cref="LinkPrefix"> for both the
+    /// current instance as well as the target of the link method. This allows derived types to 
+    /// detect when a link is being made between two different types of <see cref="BlobBuilder"/>
+    /// and take appropriate action.
+    /// </summary>
+    /// <remarks>
+    /// This method is called before the underlying buffers are swapped.
+    /// </remarks>
+    protected virtual void BeforeSwap(BlobBuilder other);

+    /// <summary>
+    /// Derived types can override this to control the allocation when <see cref="Capacity"> is 
+    /// changed.
+    // </summary>
+    protected virtual void SetCapacity(int capacity);

+    protected void WriteBytes(ReadOnlySpan<byte> buffer);
}

public class MetadataBuilder
{
+    public MetadataBuilder(
+        int userStringHeapStartOffset,
+        int stringHeapStartOffset,
+        int blobHeapStartOffset,
+        int guidHeapStartOffset,
+        Func<int, BlobBuilder>? createBlobBuilderFunc);
}

public class DebugDirectoryBuilder
{
+    public DebugDirectoryBuilder(BlobBuilder blobBuilder);
}

public class ManagedPEBuilder
{
+    /// <summary>
+    /// Dervied types can override this to control how <see cref="BlobBuilder"> instances are 
+    /// allocated during the emit pass. This allows consumers to pool <see cref="BlobBuilder">
+    /// instances more effectively.
+    /// </summary>
+    protected virtual BlobBuilder CreateBlobBuilder(int? minimumSize = null);
}

API Usage

Can see a full implementation of a PooledBlobBuilder. That branch contains the other changes necessary to use this new API.

Alternative Designs

One alternative design is to limit the ability to control the underlying byte[] allocation and have consumers focus on pooling BlobBuilders only. That will provide some benefit but it is inefficient. It means that a large number of byte[] are unused in the pooled BlobBuilder instances and hence other parts of the program end up allocating them instead.

Risks

There are a few risks to consider:

  1. Other teams besides Roslyn can provide derived instances of BlobBuilder, ManagedPEBuilder, etc ... These changes are careful to ensure that those consumers are not impacted by these changes. The behavior of the existing code only changes when the new hooks are used in derived types.
  2. That the changes don't fully hook all the places BlobBuilders are allocated. That would mean LinkSuffix / LinkPrefix are called with differing types thus limiting potential gains. In my local tests I hooked BeforeSwap such that it fails when linked with different types. Was able to successfully rebuild Roslyn with these changes so I'm confident these hooks are thorough.
  3. Taking advantage of BlobBuilder.MaxChunkSize does significantly increase the number of allocated BlobBuilder during emit. That will require changes to pooling strategies if leveraged. However the new APIs give consumers the flexibility to pursue several strategies here.
@jaredpar jaredpar added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 28, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 28, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 28, 2024
@jaredpar jaredpar added area-System.Reflection.Metadata api-ready-for-review API is ready for review, it is NOT ready for implementation and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 28, 2024
@teo-tsirpanis teo-tsirpanis removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 29, 2024
@steveharter steveharter removed the untriaged New issue has not been triaged by the area owner label Apr 17, 2024
@steveharter steveharter added this to the 9.0.0 milestone Apr 17, 2024
@terrajobst terrajobst added the blocking Marks issues that we want to fast track in order to unblock other important work label Apr 29, 2024
@bartonjs
Copy link
Member

bartonjs commented May 2, 2024

Video

  • WriteBytes should have been public (typo in the proposal)
  • There was a question as to whether MaxChunkSize should be a ctor-set field. It got moved to a ctor parameter only for the byte[]-accepting protected ctor
  • ManagedPEBuilder.CreateBlobBulder changed from int?=null to int=0 as zero has no alternative meaning
  • ManagedPEBuilder's new ctor should have all the parameters defaulted, and should "un-default" and hide the previous 4 argument ctor.
  • BeforeSwap => OnLinking
public partial class BlobBuilder
{
     protected byte[] Buffer { get; set; }
     public int Capacity { get; set; }

     protected BlobBuilder(byte[] buffer, int maxChunkSize = 0);

     protected virtual void OnLinking(BlobBuilder other);
     protected virtual void SetCapacity(int capacity);

     public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public partial class DebugDirectoryBuilder
{
     public DebugDirectoryBuilder(BlobBuilder blobBuilder);
}

public partial class ManagedPEBuilder
{
     protected virtual BlobBuilder CreateBlobBuilder(int minimumSize = 0);
}
public partial class MetadataBuilder
{
+    [EditorBrowsable(EditorBrowsableState.Never)]
     public MetadataBuilder(
-        int userStringHeapStartOffset = 0,
+        int userStringHeapStartOffset,
-        int stringHeapStartOffset = 0,
+        int stringHeapStartOffset,
-        int blobHeapStartOffset = 0,
+        int blobHeapStartOffset,
-        int guidHeapStartOffset = 0);
+        int guidHeapStartOffset);

+    public MetadataBuilder(
+        int userStringHeapStartOffset = 0,
+        int stringHeapStartOffset = 0,
+        int blobHeapStartOffset = 0,
+        int guidHeapStartOffset = 0,
+        Func<int, BlobBuilder>? createBlobBuilderFunc = null);
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Reflection.Metadata
Projects
None yet
Development

No branches or pull requests

5 participants