Support IPC Message custom_metadata in ArrowStreamWriter and ArrowStreamReader Body#283
cmettler wants to merge 3 commits into apache:main
The Arrow IPC format supports custom_metadata on each Message (RecordBatch), but the C# implementation currently ignores it on read. This adds a LastBatchCustomMetadata property to ArrowStreamReader that exposes the key-value pairs from the most recently read batch's Message. This is the read-side counterpart to pyarrow's read_next_batch_with_custom_metadata() and enables use cases like RPC frameworks that embed method routing or log metadata in per-batch custom_metadata fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds WriteRecordBatch(batch, customMetadata) and its async counterpart to ArrowStreamWriter, allowing callers to attach per-message custom_metadata key-value pairs when writing IPC streams. The Arrow IPC flatbuf Message already defines a custom_metadata field, and pyarrow supports writing it via write_batch(batch, custom_metadata). This brings the C# writer to parity. Includes round-trip tests verifying custom_metadata survives write → read through ArrowStreamWriter/ArrowStreamReader.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
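For readers following along, a round trip through the two additions described in these commits might look roughly like the sketch below. It only compiles against this PR's branch. The `WriteRecordBatch(batch, customMetadata)` overload and the `LastBatchCustomMetadata` property are named in the commit messages, but their exact types are not. The dictionary argument and the string indexer used here are assumptions, and the `CustomMetadataRoundTrip` class and the `"method"` key are purely illustrative.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Ipc;

// Hypothetical helper, not part of the PR.
public static class CustomMetadataRoundTrip
{
    public static string Run()
    {
        // Build a small single-column batch with the existing builder API.
        RecordBatch batch = new RecordBatch.Builder()
            .Append("id", false, col => col.Int32(array => array.AppendRange(Enumerable.Range(0, 3))))
            .Build();

        // Assumption: the new overload accepts a string-to-string dictionary.
        var metadata = new Dictionary<string, string> { ["method"] = "echo" };

        using var stream = new MemoryStream();
        using (var writer = new ArrowStreamWriter(stream, batch.Schema, leaveOpen: true))
        {
            // Overload proposed in this PR: attaches custom_metadata to this batch's Message.
            writer.WriteRecordBatch(batch, metadata);
            writer.WriteEnd();
        }

        stream.Position = 0;
        using var reader = new ArrowStreamReader(stream);
        using RecordBatch read = reader.ReadNextRecordBatch();

        // Property proposed in this PR: metadata of the most recently read batch.
        // Assumption: it is dictionary-like and indexable by key.
        return reader.LastBatchCustomMetadata["method"];
    }
}
```

Because the property reflects only the most recently read batch, a caller would need to inspect it after each `ReadNextRecordBatch()` call and before reading the next one.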
Is there a way we could validate the interoperability by using pythonnet the same way we use it to validate the C API in test/Apache.Arrow.Tests/CDataInterfacePythonTests.cs?
Sure, I'll take a look at how you set it up in CDataInterfacePythonTests.cs.
- C# writes IPC stream with custom_metadata → Python reads via read_next_batch_with_custom_metadata()
- Python writes IPC stream with custom_metadata → C# reads via LastBatchCustomMetadata
Thanks! It looks like pythonnet doesn't like being reinitialized in the same process after it's already been disposed. As we now have two test classes trying to use it, we'll need to share the initialization of the test fixture. If AI is to be believed, this can be done by defining a test collection and moving the ownership of the fixture to the collection -- then applying that collection to each of the test classes using it. I can do that as a separate change if you like.
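The collection approach described above corresponds to xUnit's collection fixtures. A minimal sketch, assuming xUnit is the test framework: `PythonRuntimeFixture`, the collection name `"PythonRuntime"`, and both test classes are hypothetical stand-ins, and the fixture only counts constructions rather than actually starting pythonnet.

```csharp
using System;
using System.Threading;
using Xunit;

// Hypothetical stand-in for the fixture that owns the embedded Python runtime.
public sealed class PythonRuntimeFixture : IDisposable
{
    private static int s_initializations;

    // Number of times the fixture has been constructed in this process.
    public static int Initializations => s_initializations;

    public PythonRuntimeFixture()
    {
        // Real code would initialize pythonnet here.
        Interlocked.Increment(ref s_initializations);
    }

    public void Dispose()
    {
        // Real code would shut the runtime down here, after every class
        // in the collection has finished running.
    }
}

// The collection owns the fixture; this class has no body and is never instantiated.
[CollectionDefinition("PythonRuntime")]
public sealed class PythonRuntimeCollection : ICollectionFixture<PythonRuntimeFixture>
{
}

// Each test class joins the collection and receives the shared fixture instance.
[Collection("PythonRuntime")]
public class FirstPythonTests
{
    public FirstPythonTests(PythonRuntimeFixture fixture) { }

    [Fact]
    public void RuntimeInitializedOnce() => Assert.Equal(1, PythonRuntimeFixture.Initializations);
}

[Collection("PythonRuntime")]
public class SecondPythonTests
{
    public SecondPythonTests(PythonRuntimeFixture fixture) { }

    [Fact]
    public void RuntimeInitializedOnce() => Assert.Equal(1, PythonRuntimeFixture.Initializations);
}
```

xUnit creates the fixture once before the first test in the collection and disposes it after the last, so the runtime is never reinitialized between the two classes. Classes in the same collection also do not run in parallel with each other.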
The other test failures here are a bit of a mystery to me so far. I can produce them locally but haven't yet been able to explain them. |
Okay @cmettler, I debugged two copies side-by-side and found the problem. There are three classes that derive from |
CurtHagenlocher left a comment
Thanks for this work! Changes are needed to the tests to accommodate the limitations of pythonnet and to the product code to handle ArrowFileWriter and FlightDataStream.
Closes #282 (comment)