[C#] Cannot round-trip record batch with PyArrow #27924
Comments
Antoine Pitrou / @pitrou: You should probably use the |
Tanguy Fautre: I've replaced |
Antoine Pitrou / @pitrou: |
Antoine Pitrou / @pitrou: |
Tanguy Fautre: |
Antoine Pitrou / @pitrou: In the short term, though, I think we can relax the following check to allow "null children member" to be synonymous with "empty children array": https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata_internal.cc#L757 @wesm What do you think?
Antoine Pitrou / @pitrou: |
Tanguy Fautre: |
Tanguy Fautre: Here is a version that just does that. I've just tested it on Ubuntu 20.04 as well: ArrowSharedMemory_20210329.zip. You can run it just by typing |
Antoine Pitrou / @pitrou: |
Wes McKinney / @wesm: |
Antoine Pitrou / @pitrou: |
Eric Erhardt / @eerhardt: I found the change that broke this: 3e71ea0. The change in C# unintentionally went from an "empty" vector of tables to a I think it is time to get the C# implementation into the integration tests. Can someone give me a pointer to how to enable that?
Antoine Pitrou / @pitrou: Integration testing uses an internal tool written in Python named Archery (see here for install instructions: https://arrow.apache.org/docs/developers/archery.html). You'll find the Archery bits related to integration testing in the The C# implementation needs to expose endpoints (command line APIs) for four functionalities:
Has anyone ever tried to round-trip a record batch between Arrow C# and PyArrow? I can't get PyArrow to read the data correctly.
For context, I'm trying to do Arrow data-frames inter-process communication between C# and Python using shared memory (local TCP/IP is also an alternative). Ideally, I wouldn't even have to serialise the data and could just share the Arrow in-memory representation directly, but I'm not sure this is even possible with Apache Arrow. Full source code as attachment.
C#
Python
Unfortunately, it fails with the following error:
pyarrow.lib.ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 1230
I can see that the memory content starts with
ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00
It seems that, using the API calls above, PyArrow reads "ARRO" as the length of the metadata. I assume I'm using the API incorrectly. Has anyone got a working example?
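The number in the error message can be checked by hand. The sketch below uses only the Python standard library to decode the quoted header bytes. It assumes the usual Arrow IPC layout: the file format starts with the magic string `ARROW1` padded to 8 bytes, while a stream starts directly with a message, that is, a `0xFFFFFFFF` continuation marker followed by a little-endian 32-bit metadata length. The helper name `describe_ipc_header` is made up for this illustration and is not part of PyArrow.

```python
import struct

FILE_MAGIC = b"ARROW1"     # magic string at the start of an Arrow IPC file
CONTINUATION = 0xFFFFFFFF  # marker that precedes each encapsulated message


def describe_ipc_header(buf: bytes) -> dict:
    """Hypothetical helper: classify the first bytes of an Arrow IPC buffer."""
    # What a reader sees if it treats the first 4 bytes as a metadata length.
    first_word = struct.unpack_from("<i", buf, 0)[0]

    if buf[:6] == FILE_MAGIC:
        # File format: magic padded to 8 bytes, then the first message.
        marker, metadata_len = struct.unpack_from("<Ii", buf, 8)
        return {
            "format": "file",
            "misread_length": first_word,
            "has_continuation": marker == CONTINUATION,
            "metadata_length": metadata_len,
        }

    if struct.unpack_from("<I", buf, 0)[0] == CONTINUATION:
        # Stream format: continuation marker, then the metadata length.
        metadata_len = struct.unpack_from("<i", buf, 4)[0]
        return {"format": "stream", "metadata_length": metadata_len}

    return {"format": "unknown", "misread_length": first_word}


# The bytes quoted in the report above.
header = b"ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00"
info = describe_ipc_header(header)
print(info["format"])           # file
print(info["misread_length"])   # 1330795073, i.e. b"ARRO" read as an int32
print(info["metadata_length"])  # 264
```

Under those assumptions, the buffer looks like an IPC file rather than a stream, and 1330795073 is exactly `b"ARRO"` read as a little-endian integer. That would suggest opening the buffer with `pyarrow.ipc.open_file` rather than a stream or message reader, although the replies preserved here do not show which call was actually recommended.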
Reporter: Tanguy Fautre
Assignee: Antoine Pitrou / @pitrou
Original Issue Attachments:
PRs and other links:
Note: This issue was originally created as ARROW-12100. Please see the migration documentation for further details.