Skip to content

Refactors VariantValueWriter#329

Merged
CurtHagenlocher merged 4 commits intoapache:mainfrom
CurtHagenlocher:VariantValueWriter
Apr 26, 2026
Merged

Refactors VariantValueWriter#329
CurtHagenlocher merged 4 commits intoapache:mainfrom
CurtHagenlocher:VariantValueWriter

Conversation

@CurtHagenlocher
Copy link
Copy Markdown
Contributor

What's Changed

Refactors VariantValueWriter to use a new Buffer<> type to manage the output buffers instead of using MemoryStream.

Refactors `VariantValueWriter` to use a new `Buffer<>` type to manage the output buffers instead of using `MemoryStream`.
Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs Outdated
Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors VariantValueWriter to replace MemoryStream-based output buffering with a new pooled Buffer<T> implementation, aiming to reduce allocations and improve reuse.

Changes:

  • Introduce Buffer<T> + byte-writing extensions to manage pooled, growable buffers.
  • Refactor VariantValueWriter to use Buffer<byte>/Buffer<int>, add per-writer pools, and implement IDisposable.
  • Update call sites to dispose the writer, and add CopyValue(VariantReader) for transcoding.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs Replaces MemoryStream with pooled Buffer<>, adds disposal, and adds CopyValue transcoding.
src/Apache.Arrow.Scalars/Variant/VariantBuilder.cs Disposes VariantValueWriter via using to return pooled arrays.
src/Apache.Arrow.Scalars/Variant/Buffer.cs Adds the new pooled, growable Buffer<T> and byte-oriented write helpers.
src/Apache.Arrow.Operations/VariantJson/VariantJsonReader.cs Disposes VariantValueWriter via using to return pooled arrays.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs
Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs
Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs
Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs
Comment thread src/Apache.Arrow.Scalars/Variant/VariantValueWriter.cs Outdated
Copy link
Copy Markdown
Contributor

@adamreeve adamreeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me thanks Curt.

Is the motivation for this just to increase performance by reducing allocations, and do you have any measurements of the effect of this change on allocations?

I saw you made a comment about wanting to refactor VariantValueWriter before adding shredding support, but I'm not sure how this helps with shredding.

/// within one owner — unlike <see cref="ArrayPool{T}.Shared"/>, which
/// buckets by size class and may hand back a different array than the one
/// last returned.
/// </summary>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a comment that arrays may be added to the pool from ArrayPool<T>.Shared by Acquire and the caller is responsible for returning all arrays in the Stack back to ArrayPool<T>.Shared.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This got me to realize that the arrays being rented in Buffer but returned in VariantValueWriter was unsatisfying if not actively dangerous. I've moved the drainage code to Buffer and added a comment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 that's much nicer having the array pool interactions all within Buffer

@CurtHagenlocher
Copy link
Copy Markdown
Contributor Author

I saw you made a comment about wanting to refactor VariantValueWriter before adding shredding support, but I'm not sure how this helps with shredding.

There is shredding-related code which needs to write a ReadOnlySpan<byte> and because VariantValueWriter used a MemoryStream the span needed to be materialized into an array first. This got me to change how VariantValueWriter works, and in doing so I realized that the existing pooled arrays it used were very susceptible to leaking. I have a bias towards fixing already-checked-in code before adding new code, so I submitted this PR first.

@CurtHagenlocher CurtHagenlocher merged commit 4f60cc6 into apache:main Apr 26, 2026
14 checks passed
@CurtHagenlocher CurtHagenlocher deleted the VariantValueWriter branch April 26, 2026 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants