POCO serialization #284

Open
cmettler wants to merge 5 commits into apache:main from cmettler:feature/serializer
@cmettler
Contributor

Resolves #186.

I needed Arrow POCO serialization for an internal cross-language interop project (C# ↔ vgi-rpc-python). I took inspiration from System.Text.Json's source generator and MessagePack-CSharp's attribute model, iterated on it with Claude as a coding assistant, and arrived at this implementation.

Figured it might be useful upstream — please take a look and let me know what you think.

See README.md for full documentation and examples.

Christoph Mettler and others added 5 commits March 11, 2026 13:29
Adds Apache.Arrow.Serialization with a Roslyn incremental source generator
that emits compile-time Arrow schema derivation, serialization, and
deserialization for types marked with [ArrowSerializable].

- Runtime library (Apache.Arrow.Serialization): attributes, helpers,
  IPC extension methods, reflection-based RecordBatchBuilder
- Source generator (Apache.Arrow.Serialization.Generator): code emission
  for 31+ type mappings, polymorphism, custom converters, callbacks
- Test suite: 197 tests covering all supported types and features
- Integrated into solution, central package management, Apache 2.0 headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Align with upstream CI which uses .NET 8.0 SDK. All 197 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove src/test solution folders from sln so serialization projects
appear at root level like all other projects. Change serialization
library target from net8.0;net10.0 to net8.0 for CI compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@CurtHagenlocher requested a review from Copilot on March 14, 2026 23:03
@CurtHagenlocher
Contributor

Sorry, I created a merge conflict by checking in the Parquet variant projects :(.

Copilot AI left a comment

Pull request overview

This PR adds a new POCO serialization subsystem for Apache Arrow in .NET, resolving issue #186. It provides two serialization paths: a source-generator-based AOT-safe approach using [ArrowSerializable] attributes and a reflection-based RecordBatchBuilder for anonymous types/prototyping.

Changes:

  • New Apache.Arrow.Serialization runtime library with attributes, interfaces (IArrowSerializer<T>, IArrowConverter<T>), helper classes, reflection-based RecordBatchBuilder, and extension methods for Arrow IPC serialization
  • New Apache.Arrow.Serialization.Generator Roslyn incremental source generator that emits schema derivation, serialization, deserialization code including polymorphic type support and JSON schema emission
  • Comprehensive test suite covering primitives, collections, nested types, enums, polymorphism, custom converters, callbacks, datetime types, diagnostics, and the reflection-based builder

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/Apache.Arrow.Serialization/Attributes.cs | Defines all serialization attributes (ArrowSerializable, ArrowField, ArrowType, ArrowIgnore, ArrowMetadata, ArrowPolymorphic, ArrowDerivedType) and callback interface |
| src/Apache.Arrow.Serialization/IArrowSerializer.cs | IArrowSerializer<T> interface with static abstract members and IArrowConverter<T> for custom converters |
| src/Apache.Arrow.Serialization/ArrowArrayHelper.cs | Utility methods for building null arrays, Guid/TimeOnly/TimeSpan/Decimal arrays, and DateTime normalization |
| src/Apache.Arrow.Serialization/ArrowSerializerExtensions.cs | Extension methods for IPC byte/stream serialization and collection convenience methods |
| src/Apache.Arrow.Serialization/RecordBatchBuilder.cs | Reflection-based serializer for anonymous types and non-attributed objects |
| src/Apache.Arrow.Serialization/README.md | Comprehensive documentation covering all features |
| src/Apache.Arrow.Serialization/Apache.Arrow.Serialization.csproj | Runtime library project (net8.0) |
| src/Apache.Arrow.Serialization.Generator/ArrowSerializerGenerator.cs | Main incremental generator: type analysis, diagnostics, and orchestration |
| src/Apache.Arrow.Serialization.Generator/CodeEmitter.cs | Emits serialization/deserialization code for [ArrowSerializable] types |
| src/Apache.Arrow.Serialization.Generator/PolymorphicCodeEmitter.cs | Emits code for [ArrowPolymorphic] type hierarchies |
| src/Apache.Arrow.Serialization.Generator/JsonSchemaEmitter.cs | Emits optional JSON schema descriptors |
| src/Apache.Arrow.Serialization.Generator/Models.cs | Internal model classes for the generator pipeline |
| src/Apache.Arrow.Serialization.Generator/Apache.Arrow.Serialization.Generator.csproj | Generator project (netstandard2.0) |
| test/Apache.Arrow.Serialization.Tests/SerializationTests.cs | Tests for source-generated serialization round-trips |
| test/Apache.Arrow.Serialization.Tests/RecordBatchBuilderTests.cs | Tests for reflection-based builder |
| test/Apache.Arrow.Serialization.Tests/DiagnosticTests.cs | Tests for generator diagnostic reporting |
| test/Apache.Arrow.Serialization.Tests/TestTypes.cs | Shared test type definitions |
| test/Apache.Arrow.Serialization.Tests/Apache.Arrow.Serialization.Tests.csproj | Test project |
| Directory.Packages.props | Adds CodeAnalysis package versions |
| Apache.Arrow.sln | Adds new projects to solution |

Comment on lines +35 to 37
<PackageVersion Include="Microsoft.CodeAnalysis.Analyzers" Version="3.3.4" />
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp" Version="4.11.0" />
<PackageVersion Include="Microsoft.Bcl.AsyncInterfaces" Version="8.0.0" />
public void Append(object? value)
{
    if (value is null) _b.AppendNull();
    else _b.Append(new DateTimeOffset((DateTime)value, TimeSpan.Zero));
Comment on lines +620 to +623
// For null slots, we need a stand-in value (first non-null item)
object? standIn = _items.FirstOrDefault(v => v is not null);
foreach (var item in _items)
typedList.Add(item ?? standIn!);
Comment on lines +456 to +461
    Line($"if ({access} is {{ }} v_{index}) {{ bld_{index}_idx.Append((short)bld_{index}_dict.Count); bld_{index}_dict.Add(v_{index}.ToString()); }} else bld_{index}_idx.AppendNull();");
else
{
    Line($"bld_{index}_idx.Append((short)bld_{index}_dict.Count);");
    Line($"bld_{index}_dict.Add({access}.ToString());");
}
Comment on lines +465 to +466
// fall back to AppendNull for now — these are less common in polymorphic scenarios
Line($"bld_{index}.AppendNull(); // TODO: complex type {prop.Type.Kind}");
break;
}
default:
Line($"object? prop_{propIndex} = null; // TODO: unsupported type {prop.Type.Kind}");
Comment on lines +83 to +84
// Remove trailing newline, add comma
sb.Length -= sb.ToString().EndsWith("\r\n") ? 2 : 1;
<TargetFramework>net8.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<Description>Source-generated Apache Arrow serialization for .NET. Provides [ArrowSerializable] attribute and IArrowSerializer&lt;T&gt; interface for compile-time Arrow schema derivation, serialization, and deserialization.</Description>
Contributor

I think we don't need to worry about net6.0 as it's out of support and we'll probably remove it as a build target after the next release. The inability to use with net472 or netstandard2.0 is a greater loss and it might be worth a quick test to see how hard it would be to add support for those.

Contributor Author

Thanks for the review! I'll take a look at your comments over the next few days and follow up.

Contributor

@CurtHagenlocher left a comment

Thanks, this is great and has long been missing.

Can you please fix the merge conflict and the white space that our linter doesn't like? I think the Documentation task failure can be addressed by editing ci/scripts/docs.sh and doing something like

pushd "${source_dir}/src/Apache.Arrow.Serialization"
dotnet build -c Release
popd

before trying to build the documentation.

We should probably also add validation for the release by adding something to dev/release/verify_rc.sh like

reference_package "Apache.Arrow.Serialization" "Apache.Arrow.Serialization.Tests"

but I'm not sure whether reference_package will handle the PrivateAssets="all" in the project file. If it doesn't, we could consider figuring that out later, after the bulk of the change is checked in.


namespace Apache.Arrow.Serialization.Tests;

public class DiagnosticTests
Contributor

Nice tests!

Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Debug|x64 = Debug|x64
Contributor

It would be nice to avoid adding all these targets. Is there something bitness-specific in these changes?

{
    var list = items as IReadOnlyList<T> ?? items.ToList();
    if (list.Count == 0)
        throw new ArgumentException("Cannot infer schema from empty collection.", nameof(items));
Contributor

Is this really true though? We use the type to infer the schema, not the data. It would be annoying for someone to have to special case an empty list if they want to serialize it.
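
To illustrate the point, here is a minimal sketch of deriving column descriptions from the static type alone, so that an empty collection yields zero rows rather than an exception. It uses only reflection from the base class library. `SchemaFromTypeSketch`, `DescribeColumns`, `Describe`, and `Reading` are hypothetical names and not part of this PR; the real builder would map each CLR type to an Arrow type instead of returning it as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class SchemaFromTypeSketch
{
    // Hypothetical helper: lists the public readable instance properties of T.
    // It never looks at an instance, so it works for an empty collection.
    public static IReadOnlyList<(string Name, Type ClrType)> DescribeColumns<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Select(p => (Name: p.Name, ClrType: p.PropertyType))
            .ToList();
    }

    // Hypothetical entry point: columns come from T, the row count from the data.
    public static (IReadOnlyList<(string Name, Type ClrType)> Columns, int RowCount)
        Describe<T>(IEnumerable<T> items)
    {
        var list = items as IReadOnlyList<T> ?? items.ToList();
        return (DescribeColumns<T>(), list.Count);
    }
}

// Illustrative row type, assumed for this example only.
public sealed record Reading(string Sensor, double Value);
```

Calling `SchemaFromTypeSketch.Describe(Array.Empty<Reading>())` returns the two columns with a row count of zero. One caveat: this only helps when `T` is the real element type. If the caller passes `IEnumerable<object>`, the static type carries no column information and the data would still be needed.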

builders.Add(CreateColumnBuilder(propType, arrowType));
}

var schema = new Schema.Builder();
Contributor

Consider moving schema above the foreach and adding the fields directly into the schema builder instead of a temporary list.

<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<TargetFramework>net8.0</TargetFramework>
Contributor

What would it take to make this work for .NET 4.7.2? Is that even plausible?

| `float` | `Float32` | |
| `double` | `Float64` | |
| `Half` | `Float16` | |
| `decimal` | `Decimal128(38, 18)` | Configurable via `[ArrowType("decimal128(28, 10)")]` |
Contributor

It might be worth pointing out in the documentation that a CLR decimal is not a perfect match for an Arrow decimal.
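
For context on the mismatch: a CLR `decimal` stores a 96-bit integer with a per-value scale of 0 to 28, while an Arrow decimal128 column has one fixed precision and scale for every value. With the default `Decimal128(38, 18)` that leaves 20 digits before the decimal point, so very large values overflow and values with more than 18 fractional digits would be rounded. The sketch below checks whether a value survives the conversion unchanged. `DecimalFitSketch` and `FitsDecimal128` are hypothetical names, not part of this PR, and the check assumes a scale of zero or more.

```csharp
using System;
using System.Globalization;

public static class DecimalFitSketch
{
    // Hypothetical helper: true when `value` can be stored in an Arrow
    // decimal128(precision, scale) column without rounding or overflow.
    // Assumes 0 <= scale <= precision.
    public static bool FitsDecimal128(decimal value, int precision = 38, int scale = 18)
    {
        // decimal.Round accepts 0..28 places, which covers every CLR scale.
        int places = Math.Min(scale, 28);
        bool fractionPreserved = decimal.Round(value, places) == value;

        // Count the digits to the left of the decimal point.
        decimal whole = decimal.Truncate(Math.Abs(value));
        int integerDigits = whole == 0m
            ? 0
            : whole.ToString(CultureInfo.InvariantCulture).Length;

        return fractionPreserved && integerDigits <= precision - scale;
    }
}
```

Under these assumptions `FitsDecimal128(123.45m)` is true, while `FitsDecimal128(decimal.MaxValue)` is false because the value has 29 integer digits, and `FitsDecimal128(0.0000000000000000000000000001m)` is false because it needs 28 fractional digits.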


namespace Apache.Arrow.Serialization.Generator
{
internal enum TypeKind2
Contributor

Consider a more descriptive name. What about ArrowTypeKind?

Development

Successfully merging this pull request may close these issues.

How to serialize POCOs to a Table?

3 participants