
Support IPC Message custom_metadata in ArrowStreamWriter and ArrowStreamReader Body #283

Open

cmettler wants to merge 3 commits into apache:main from cmettler:feature/ipc-message-custom-metadata


Conversation

@cmettler
Contributor

Christoph Mettler and others added 2 commits March 8, 2026 20:14
The Arrow IPC format supports custom_metadata on each Message (including
RecordBatch messages), but the C# implementation currently ignores it on read.
This adds a LastBatchCustomMetadata property to ArrowStreamReader that
exposes the key-value pairs from the most recently read batch's Message.

This is the read-side counterpart to pyarrow's
read_next_batch_with_custom_metadata() and enables use cases like
RPC frameworks that embed method routing or log metadata in per-batch
custom_metadata fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds WriteRecordBatch(batch, customMetadata) and its async counterpart
to ArrowStreamWriter, allowing callers to attach per-message
custom_metadata key-value pairs when writing IPC streams.

The Arrow IPC flatbuf Message already defines a custom_metadata field,
and pyarrow supports writing it via write_batch(batch, custom_metadata).
This brings the C# writer to parity.

Includes round-trip tests verifying custom_metadata survives
write → read through ArrowStreamWriter/ArrowStreamReader.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@CurtHagenlocher
Contributor

Is there a way we could validate the interoperability by using pythonnet the same way we use it to validate the C API in test/Apache.Arrow.Tests/CDataInterfacePythonTests.cs?

@cmettler
Contributor Author

> Is there a way we could validate the interoperability by using pythonnet the same way we use it to validate the C API in test/Apache.Arrow.Tests/CDataInterfacePythonTests.cs?

Sure, I'll take a look at how you set it up in CDataInterfacePythonTests.cs

- C# writes an IPC stream with custom_metadata → Python reads it via read_next_batch_with_custom_metadata()
- Python writes an IPC stream with custom_metadata → C# reads it via LastBatchCustomMetadata
@CurtHagenlocher
Contributor

Thanks! It looks like pythonnet doesn't like being reinitialized in the same process after it has already been disposed. As we now have two test classes trying to use it, we'll need to share the initialization of the test fixture. If AI is to be believed, this can be done by defining a test collection, moving ownership of the fixture to the collection, and then applying that collection to each of the test classes that use it.

I can do that as a separate change if you like.

@CurtHagenlocher
Contributor

The other test failures here are a bit of a mystery to me so far. I can reproduce them locally but haven't yet been able to explain them.

@CurtHagenlocher
Contributor

CurtHagenlocher commented Mar 14, 2026

> The other test failures here are a bit of a mystery to me so far. I can reproduce them locally but haven't yet been able to explain them.

Okay @cmettler, I debugged two copies side by side and found the problem. Three classes derive from ArrowStreamWriter; of these, ArrowFileWriter overrides WriteRecordBatch and WriteRecordBatchAsync, and FlightDataStream overrides WriteMessageAsync. These will need to be updated to take the new overloads from the base class into account.
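
One general way this kind of breakage can arise is sketched below as a deliberately simplified, hypothetical Python model. It is not the Apache.Arrow C# code and does not claim to reproduce the actual failures. Python has no overloading, so the old and new C# overloads are modelled as two methods (`write_batch` and `write_batch_with_metadata`, both invented names). The base writer routes the old entry point through the new one, and a derived "file" writer that overrides only the old entry point (for example, to record where each batch starts so a footer can index it) skips that bookkeeping whenever a caller uses the new entry point.

```python
class StreamWriter:
    """Hypothetical base writer: stores (batch, metadata) 'messages' in order."""

    def __init__(self):
        self.messages = []

    def write_batch(self, batch):
        # The old entry point delegates to the new metadata-aware one.
        self.write_batch_with_metadata(batch, None)

    def write_batch_with_metadata(self, batch, metadata):
        self.messages.append((batch, metadata))


class NaiveFileWriter(StreamWriter):
    """Overrides only the old entry point, so the new one bypasses its bookkeeping."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # index of each batch, as a file footer would record it

    def write_batch(self, batch):
        self.blocks.append(len(self.messages))
        super().write_batch(batch)


class FixedFileWriter(StreamWriter):
    """Overrides the shared core method, so every entry point records a block."""

    def __init__(self):
        super().__init__()
        self.blocks = []

    def write_batch_with_metadata(self, batch, metadata):
        self.blocks.append(len(self.messages))
        super().write_batch_with_metadata(batch, metadata)


naive = NaiveFileWriter()
naive.write_batch("batch-0")
naive.write_batch_with_metadata("batch-1", {"method": "echo"})

fixed = FixedFileWriter()
fixed.write_batch("batch-0")
fixed.write_batch_with_metadata("batch-1", {"method": "echo"})

print(len(naive.messages), naive.blocks)  # 2 [0]
print(len(fixed.messages), fixed.blocks)  # 2 [0, 1]
```

Overriding the single method that every public entry point funnels through, as FixedFileWriter does, is one common way to keep derived writers consistent when a base class gains overloads; whether that shape fits ArrowFileWriter and FlightDataStream depends on how ArrowStreamWriter is actually structured.
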

@CurtHagenlocher
Contributor

CurtHagenlocher left a comment

Thanks for this work! Changes are needed in the tests, to accommodate the limitations of pythonnet, and in the product code, to handle ArrowFileWriter and FlightDataStream.


Development

Successfully merging this pull request may close these issues.

Support IPC Message custom_metadata in ArrowStreamWriter and ArrowStreamReader Body

2 participants