
Add VariantArray extension type and introduce IBinaryArray/IIndexes interfaces #325

Merged
CurtHagenlocher merged 17 commits into apache:main from CurtHagenlocher:VariantArray on Apr 23, 2026

CurtHagenlocher (Contributor) commented Apr 22, 2026

What's Changed

  • Adds a VariantArray extension type.
  • Adds a Builder class for VariantArray that encodes VariantValue instances into the variant binary format and constructs the backing StructArray.
  • Introduces internal interfaces IBinaryArray and IIndexes to decouple VariantArray and other consumers from concrete array types. IBinaryArray unifies BinaryArray, LargeBinaryArray, and BinaryViewArray behind a common GetBytes API. IIndexes abstracts index resolution for DictionaryArray and RunEndEncodedArray, enabling efficient sequential enumeration for run-end encoded (REE) arrays. The integer array types implement IIndexes through GetPhysicalIndex and EnumeratePhysicalIndices methods.
  • Removes support for .NET 6.0, which has reached end of support, and replaces it with .NET 8.0 where appropriate.
  • Adds support for .NET Framework 4.6.2 (net462) to the Scalars assembly and its tests.
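To make the IBinaryArray idea concrete, here is a minimal, self-contained sketch. The real interface is internal and only its `GetBytes` member is visible in this PR (the review thread below shows a call of the form `GetBytes(physicalIndex, out bool isNull)`); everything else here (`IBinaryArraySketch`, `OffsetBinarySketch`, `ViewBinarySketch`, `ViewSlot`, `BinaryConsumer`) is hypothetical and models the layouts only loosely.

```csharp
using System;

// Hypothetical stand-in for the internal IBinaryArray: callers only need the bytes
// at an index plus a null flag, whatever the physical layout.
interface IBinaryArraySketch
{
    int Length { get; }
    ReadOnlySpan<byte> GetBytes(int index, out bool isNull);
}

// Offsets-based layout (in the spirit of BinaryArray/LargeBinaryArray):
// value i occupies data[offsets[i] .. offsets[i + 1]).
sealed class OffsetBinarySketch : IBinaryArraySketch
{
    private readonly byte[] _data;
    private readonly int[] _offsets;
    private readonly bool[] _valid;

    public OffsetBinarySketch(byte[] data, int[] offsets, bool[] valid)
    {
        _data = data;
        _offsets = offsets;
        _valid = valid;
    }

    public int Length => _valid.Length;

    public ReadOnlySpan<byte> GetBytes(int index, out bool isNull)
    {
        isNull = !_valid[index];
        if (isNull) return ReadOnlySpan<byte>.Empty;
        int start = _offsets[index];
        return new ReadOnlySpan<byte>(_data, start, _offsets[index + 1] - start);
    }
}

// View-style layout (loosely like BinaryViewArray): each slot points into some buffer.
sealed record ViewSlot(byte[] Buffer, int Start, int Length);

sealed class ViewBinarySketch : IBinaryArraySketch
{
    private readonly ViewSlot[] _slots; // a null slot means a null value

    public ViewBinarySketch(ViewSlot[] slots) => _slots = slots;

    public int Length => _slots.Length;

    public ReadOnlySpan<byte> GetBytes(int index, out bool isNull)
    {
        ViewSlot slot = _slots[index];
        isNull = slot is null;
        if (slot is null) return ReadOnlySpan<byte>.Empty;
        return new ReadOnlySpan<byte>(slot.Buffer, slot.Start, slot.Length);
    }
}

static class BinaryConsumer
{
    // Written once against the interface instead of type-switching on concrete arrays.
    public static int TotalNonNullBytes(IBinaryArraySketch array)
    {
        int total = 0;
        for (int i = 0; i < array.Length; i++)
        {
            ReadOnlySpan<byte> bytes = array.GetBytes(i, out bool isNull);
            if (!isNull) total += bytes.Length;
        }
        return total;
    }
}
```

The consumer never asks which layout it was given, which is the decoupling the bullet describes.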

CurtHagenlocher and others added 11 commits April 21, 2026 07:03
Rename assembly, project, namespace, and test project:
- src/Apache.Arrow.Variant -> src/Apache.Arrow.Scalars
- test/Apache.Arrow.Variant.Tests -> test/Apache.Arrow.Scalars.Tests
- Update all namespace references across the solution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move all Variant*.cs files into a Variant/ subdirectory within
src/Apache.Arrow.Scalars and update the namespace from
Apache.Arrow.Scalars to Apache.Arrow.Scalars.Variant.

Update using directives in all consumers: Operations, Tests, and
Benchmarks projects.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove net6.0 target from Apache.Arrow
- Update Apache.Arrow.Flight.AspNetCore from net6.0 to net8.0
- Add net462 target to Apache.Arrow.Scalars and Apache.Arrow.Scalars.Tests
- Add ProjectReference from Apache.Arrow to Apache.Arrow.Scalars
- Implement VariantExtensionDefinition, VariantType, and VariantArray
  for the arrow.parquet.variant extension type, backed by
  struct<metadata: binary, value: binary>
- Storage validation accepts Binary, LargeBinary, or BinaryView fields
  and looks up fields by name (not index) per spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
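The storage rule in the commit above (find `metadata` and `value` by name, and accept any of the three binary layouts) can be sketched as follows. `StorageKind`, `FieldSketch`, and `ValidateVariantStorage` are hypothetical simplifications, not Apache.Arrow's field or type classes; the sketch also simply ignores any other struct fields, which is an assumption rather than something stated here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified type tags; the real code works with Arrow's type system.
enum StorageKind { Int32, Utf8, Binary, LargeBinary, BinaryView }

sealed class FieldSketch
{
    public FieldSketch(string name, StorageKind kind)
    {
        Name = name;
        Kind = kind;
    }

    public string Name { get; }
    public StorageKind Kind { get; }
}

static class VariantStorageSketch
{
    static bool IsBinaryLike(StorageKind kind) =>
        kind == StorageKind.Binary || kind == StorageKind.LargeBinary || kind == StorageKind.BinaryView;

    // Returns the positions of "metadata" and "value" within the struct, found by
    // name rather than by assuming they are fields 0 and 1. Other fields are ignored (assumption).
    public static (int Metadata, int Value) ValidateVariantStorage(IReadOnlyList<FieldSketch> structFields)
    {
        int metadata = -1, value = -1;
        for (int i = 0; i < structFields.Count; i++)
        {
            if (structFields[i].Name == "metadata") metadata = i;
            else if (structFields[i].Name == "value") value = i;
        }

        if (metadata < 0 || value < 0)
            throw new ArgumentException("Variant storage must contain 'metadata' and 'value' fields.");
        if (!IsBinaryLike(structFields[metadata].Kind) || !IsBinaryLike(structFields[value].Kind))
            throw new ArgumentException("'metadata' and 'value' must be Binary, LargeBinary, or BinaryView.");

        return (metadata, value);
    }
}
```

Looking fields up by name means a struct written as `struct<value, metadata>` is still accepted.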

Copilot AI left a comment

Pull request overview

Adds first-class support for Parquet Variant values in Apache.Arrow via a new VariantArray extension type, and introduces internal abstraction interfaces (IBinaryArray, IIndexes) to decouple decoding/enumeration code from concrete array implementations (including Dictionary and Run-End Encoded layouts).
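For readers new to the format: each Parquet Variant value is stored as a (metadata, value) byte pair. The sketch below hand-encodes a few scalars, assuming the layout in the Parquet Variant encoding specification (low 2 bits of the first value byte hold the basic type, upper 6 bits the header; primitive IDs 0 = null, 1/2 = true/false, 3 = int8, 5 = int32; basic type 1 = short string of up to 63 bytes; metadata with an empty key dictionary is `0x01 0x00 0x00`). `VariantEncodingSketch` is hypothetical and is not how VariantArray.Builder or VariantValue are implemented; objects and arrays are not covered.

```csharp
using System;
using System.Text;

static class VariantEncodingSketch
{
    // Basic types are stored in the low 2 bits of the first value byte.
    const byte BasicPrimitive = 0;
    const byte BasicShortString = 1;

    // A few primitive type IDs (stored in the upper 6 bits for primitives).
    const byte PrimitiveNull = 0, PrimitiveTrue = 1, PrimitiveFalse = 2, PrimitiveInt8 = 3, PrimitiveInt32 = 5;

    // Version 1, 1-byte offsets, empty key dictionary: header, dictionary_size = 0, one offset = 0.
    public static byte[] EmptyMetadata() => new byte[] { 0x01, 0x00, 0x00 };

    static byte Header(byte basicType, byte valueHeader) => (byte)((valueHeader << 2) | basicType);

    public static byte[] Null() => new[] { Header(BasicPrimitive, PrimitiveNull) };

    public static byte[] Bool(bool b) => new[] { Header(BasicPrimitive, b ? PrimitiveTrue : PrimitiveFalse) };

    // Uses the narrowest of int8/int32 that fits; payloads are little-endian.
    public static byte[] Int(int v)
    {
        if (v >= sbyte.MinValue && v <= sbyte.MaxValue)
            return new[] { Header(BasicPrimitive, PrimitiveInt8), unchecked((byte)(sbyte)v) };

        return new[]
        {
            Header(BasicPrimitive, PrimitiveInt32),
            (byte)v, (byte)(v >> 8), (byte)(v >> 16), (byte)(v >> 24),
        };
    }

    // Short strings carry their UTF-8 byte length in the header.
    public static byte[] ShortString(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        if (utf8.Length > 63) throw new ArgumentException("Short strings hold at most 63 bytes.");
        byte[] result = new byte[1 + utf8.Length];
        result[0] = Header(BasicShortString, (byte)utf8.Length);
        utf8.CopyTo(result, 1);
        return result;
    }
}
```

For example, `Int(-1)` yields `0x0C 0xFF` and `ShortString("hi")` yields `0x09 0x68 0x69`, each paired with `EmptyMetadata()` because no object keys are involved.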

Changes:

  • Added VariantType/VariantArray (plus VariantArray.Builder) and corresponding unit tests, including IPC round-trip coverage.
  • Introduced internal IBinaryArray and IIndexes interfaces, and updated binary and integer array types (plus RunEndEncodedArray) to implement them.
  • Updated decoding helpers to use IIndexes instead of type-switching on index array types; updated project files/TFMs and added a project reference to Apache.Arrow.Scalars.
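The IIndexes abstraction can be pictured roughly as below. Only the member names GetPhysicalIndex and EnumeratePhysicalIndices come from the PR description; their signatures, the -1-for-null convention (taken from the Copilot review comment further down), and all class names are assumptions, and the sketch ignores array offsets and slicing. The REE half shows why sequential enumeration is worth a separate method: a random lookup needs a binary search over the run ends, while enumeration can walk the runs in order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of an index abstraction: map a logical row to a physical slot
// in the dictionary/values array, returning -1 for a null index.
interface IIndexesSketch
{
    int Length { get; }
    int GetPhysicalIndex(int logicalIndex);
    IEnumerable<int> EnumeratePhysicalIndices();
}

// Dictionary-style indices: one stored index per row; null rows have no index.
sealed class DictionaryIndexesSketch : IIndexesSketch
{
    private readonly int?[] _indices;

    public DictionaryIndexesSketch(int?[] indices) => _indices = indices;

    public int Length => _indices.Length;

    public int GetPhysicalIndex(int logicalIndex) => _indices[logicalIndex] ?? -1;

    public IEnumerable<int> EnumeratePhysicalIndices()
    {
        for (int i = 0; i < _indices.Length; i++) yield return GetPhysicalIndex(i);
    }
}

// Run-end encoded indices: runEnds[k] is the exclusive logical end of run k,
// and run k maps to physical value k.
sealed class RunEndIndexesSketch : IIndexesSketch
{
    private readonly int[] _runEnds;

    public RunEndIndexesSketch(int[] runEnds) => _runEnds = runEnds;

    public int Length => _runEnds.Length == 0 ? 0 : _runEnds[_runEnds.Length - 1];

    // Random access: binary search for the first run end greater than the index.
    public int GetPhysicalIndex(int logicalIndex)
    {
        if ((uint)logicalIndex >= (uint)Length) throw new ArgumentOutOfRangeException(nameof(logicalIndex));
        int lo = 0, hi = _runEnds.Length - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_runEnds[mid] > logicalIndex) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    // Sequential access: advance through the runs in order, no searching per row.
    public IEnumerable<int> EnumeratePhysicalIndices()
    {
        int run = 0;
        for (int i = 0; i < Length; i++)
        {
            while (_runEnds[run] <= i) run++;
            yield return run;
        }
    }
}
```

With run ends `{ 2, 5, 6 }`, both paths produce physical indices `0, 0, 1, 1, 1, 2`; enumerating all rows costs O(n) total instead of O(n log r) for repeated lookups.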

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| test/Apache.Arrow.Tests/VariantArrayTests.cs | Adds unit tests for building/reading VariantArray and IPC round-trip behavior. |
| test/Apache.Arrow.Scalars.Tests/Apache.Arrow.Scalars.Tests.csproj | Adds net462 test target on Windows and overrides xUnit VS runner version for that TFM. |
| src/Apache.Arrow/Interfaces/IIndexes.cs | Introduces internal index abstraction for dictionary/REE index resolution and sequential enumeration. |
| src/Apache.Arrow/Interfaces/IBinaryArray.cs | Introduces internal binary abstraction to unify GetBytes(...) across binary array variants. |
| src/Apache.Arrow/Extensions/IArrowArrayExtensions.cs | Refactors decoded list support to use IIndexes rather than ArrowTypeId-based casting. |
| src/Apache.Arrow/Arrays/VariantArray.cs | Implements VariantType, VariantArray, extension definition, decoding, and builder. |
| src/Apache.Arrow/Arrays/UInt8Array.cs | Implements IIndexes for dictionary/REE index abstraction. |
| src/Apache.Arrow/Arrays/UInt64Array.cs | Implements IIndexes for dictionary/REE index abstraction (with checked casts). |
| src/Apache.Arrow/Arrays/UInt32Array.cs | Implements IIndexes for dictionary/REE index abstraction (with checked casts). |
| src/Apache.Arrow/Arrays/UInt16Array.cs | Implements IIndexes for dictionary/REE index abstraction. |
| src/Apache.Arrow/Arrays/RunEndEncodedArray.cs | Implements IIndexes to allow physical index mapping via the new abstraction. |
| src/Apache.Arrow/Arrays/LargeBinaryArray.cs | Implements IBinaryArray to unify binary access via GetBytes(...). |
| src/Apache.Arrow/Arrays/Int8Array.cs | Implements IIndexes for dictionary/REE index abstraction. |
| src/Apache.Arrow/Arrays/Int64Array.cs | Implements IIndexes for dictionary/REE index abstraction (with checked casts). |
| src/Apache.Arrow/Arrays/Int32Array.cs | Implements IIndexes for dictionary/REE index abstraction. |
| src/Apache.Arrow/Arrays/Int16Array.cs | Implements IIndexes for dictionary/REE index abstraction. |
| src/Apache.Arrow/Arrays/DictionaryArray.cs | Adds GetIndexes() helper and exposes physical index enumeration via the new abstraction. |
| src/Apache.Arrow/Arrays/BinaryViewArray.cs | Implements IBinaryArray to unify binary access via GetBytes(...). |
| src/Apache.Arrow/Arrays/BinaryArray.cs | Implements IBinaryArray to unify binary access via GetBytes(...). |
| src/Apache.Arrow/Apache.Arrow.csproj | Drops net6.0 TFM and adds project reference to Apache.Arrow.Scalars. |
| src/Apache.Arrow.Scalars/Apache.Arrow.Scalars.csproj | Adds net462 target and required package refs for older frameworks. |
| src/Apache.Arrow.Flight.AspNetCore/Apache.Arrow.Flight.AspNetCore.csproj | Raises target framework from net6.0 to net8.0. |
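Several rows above mention "checked casts" for UInt32, Int64, and UInt64. Array positions in .NET are `int`, so an index stored in one of those types may not fit; a checked conversion turns that into an `OverflowException` instead of silently wrapping to a wrong index. The narrower types (and Int32) always fit. `IndexCastSketch` and `ToPhysicalIndex` are hypothetical names for illustration, not the code in those files.

```csharp
using System;

static class IndexCastSketch
{
    // Narrow unsigned types always fit in an int, so a plain widening conversion suffices.
    public static int ToPhysicalIndex(ushort stored) => stored;

    // uint, long and ulong values can fall outside the int range, so convert in a
    // checked context and let OverflowException surface.
    public static int ToPhysicalIndex(uint stored) => checked((int)stored);
    public static int ToPhysicalIndex(long stored) => checked((int)stored);
    public static int ToPhysicalIndex(ulong stored) => checked((int)stored);

    // For comparison: an unchecked conversion silently wraps to a negative index.
    public static int UncheckedToPhysicalIndex(uint stored) => unchecked((int)stored);
}
```

A wrapped negative index would later fail far from its cause, which is why failing at the cast is the more debuggable choice.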


Comment thread test/Apache.Arrow.Tests/VariantArrayTests.cs Outdated
Comment thread test/Apache.Arrow.Tests/VariantArrayTests.cs Outdated
Comment on lines +123 to +127
```csharp
public ReadOnlySpan<byte> GetMetadataBytes(int index)
{
    int physicalIndex = _metadataIndexes.GetPhysicalIndex(index);
    return _metadataArray.GetBytes(physicalIndex, out bool isNull);
}
```

Copilot AI Apr 22, 2026


GetMetadataBytes computes a physical index via IIndexes; for dictionary/run-end encodings IIndexes can return -1 (null index) or refer to an underlying value that is null. Calling GetBytes with a negative index will throw an ArgumentOutOfRangeException that’s hard to interpret. Consider validating the physicalIndex and/or the returned isNull flag and throwing a clearer InvalidOperationException (or mapping to VariantValue.Null) when the storage is inconsistent.
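For concreteness, the validation this comment proposes might look like the sketch below. The author did not adopt it (per the replies that follow, an `ArgumentOutOfRangeException` on malformed data was considered acceptable). `MetadataLookupSketch`, `GetBytesFn`, and the delegate parameters are hypothetical stand-ins for `IIndexes.GetPhysicalIndex` and `IBinaryArray.GetBytes`.

```csharp
using System;

static class MetadataLookupSketch
{
    // Hypothetical delegate standing in for IBinaryArray.GetBytes(int, out bool).
    public delegate ReadOnlySpan<byte> GetBytesFn(int index, out bool isNull);

    // Same lookup as GetMetadataBytes, but with explicit checks on the physical index
    // and on the null flag, as the review comment suggests.
    public static ReadOnlySpan<byte> GetMetadataBytesChecked(
        int index, Func<int, int> getPhysicalIndex, GetBytesFn getBytes)
    {
        int physicalIndex = getPhysicalIndex(index);
        if (physicalIndex < 0)
            throw new InvalidOperationException($"Variant metadata at row {index} has a null index.");

        ReadOnlySpan<byte> bytes = getBytes(physicalIndex, out bool isNull);
        if (isNull)
            throw new InvalidOperationException($"Variant metadata at row {index} is null.");
        return bytes;
    }
}
```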

CurtHagenlocher (Contributor, Author) Apr 22, 2026

I'm fine with this consequence for malformed data but would welcome other opinions.

Contributor

Yeah getting an ArgumentOutOfRangeException in this scenario sounds reasonable.

Comment thread src/Apache.Arrow/Arrays/VariantArray.cs Outdated
Comment thread src/Apache.Arrow/Arrays/VariantArray.cs
Comment thread src/Apache.Arrow/Arrays/VariantArray.cs
Comment thread src/Apache.Arrow/Apache.Arrow.csproj

```csharp
namespace Apache.Arrow
{
    internal interface IBinaryArray
```
Contributor, Author

This could perhaps be public at some point.

adamreeve (Contributor) left a comment

This all looks good to me, thanks Curt.

CurtHagenlocher merged commit 90aed52 into apache:main on Apr 23, 2026 (14 checks passed).
CurtHagenlocher deleted the VariantArray branch on April 23, 2026 at 13:36.

3 participants