[R] Handle ChunkedArray and Table in C data interface #24492
Comments
Antoine Pitrou / @pitrou: Also cc @wesm for advice.
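Wes McKinney / @wesm: So the iteration API looks something like this:
```c
struct ArrowSchema;  /* defined by the C data interface */
struct ArrowArray;

struct ArrowArrayStream {
  /* Each callback receives the stream itself so it can reach private_data. */
  void (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema*);  /* export the schema */
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray*);      /* export the next chunk */
  void (*release)(struct ArrowArrayStream*);                          /* free the stream */
  void* private_data;
};
```
Consider a canonical use case: a database query returning a sequence of RecordBatches. You could say that this interface should be defined and redefined on an application-by-application basis, but that seems rather tedious to me.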
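To make the iteration API concrete, here is a minimal C sketch of a producer that exposes one chunked int32 column as a stream, which is roughly what exporting a ChunkedArray needs. It assumes two things. First, every callback receives the stream itself as its first argument, so it can reach private_data. Second, get_next signals the end of the stream by leaving the output array released (release == NULL). Both assumptions match the ArrowArrayStream struct that Arrow later standardized as its C stream interface, and that struct also adds the get_last_error callback used here. The ArrowSchema and ArrowArray layouts follow the C data interface specification. ChunkStreamState, make_chunk_stream and the stream_* functions are hypothetical names, and the chunk data is borrowed rather than owned.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layouts as defined by the Arrow C data interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Every callback receives the stream itself, so it can reach private_data. */
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

/* Hypothetical producer state over borrowed int32 chunks of one column; the
   caller must keep the chunk data alive until every array is released. */
typedef struct {
  const int32_t* const* chunks;
  const int64_t* lengths;
  int64_t n_chunks;
  int64_t next;
} ChunkStreamState;

static void release_schema(struct ArrowSchema* s) { s->release = NULL; }

static void release_array(struct ArrowArray* a) {
  free(a->buffers); /* only the list of buffer pointers is owned */
  a->release = NULL;
}

static int stream_get_schema(struct ArrowArrayStream* self, struct ArrowSchema* out) {
  (void)self;
  memset(out, 0, sizeof *out);
  out->format = "i"; /* int32; name and metadata stay NULL */
  out->release = release_schema;
  return 0;
}

static int stream_get_next(struct ArrowArrayStream* self, struct ArrowArray* out) {
  ChunkStreamState* st = self->private_data;
  memset(out, 0, sizeof *out); /* release == NULL marks the end of the stream */
  if (st->next == st->n_chunks) return 0;
  const void** bufs = malloc(2 * sizeof *bufs);
  if (bufs == NULL) return ENOMEM;
  bufs[0] = NULL; /* no validity bitmap, so null_count stays 0 */
  bufs[1] = st->chunks[st->next];
  out->length = st->lengths[st->next];
  out->n_buffers = 2;
  out->buffers = bufs;
  out->release = release_array;
  st->next++;
  return 0;
}

/* This sketch never records an error message. */
static const char* stream_get_last_error(struct ArrowArrayStream* self) { (void)self; return NULL; }

static void stream_release(struct ArrowArrayStream* self) {
  free(self->private_data);
  self->release = NULL;
}

/* Hypothetical helper: exposes n_chunks int32 chunks as a single stream. */
static int make_chunk_stream(const int32_t* const* chunks, const int64_t* lengths,
                             int64_t n_chunks, struct ArrowArrayStream* out) {
  ChunkStreamState* st = malloc(sizeof *st);
  if (st == NULL) return ENOMEM;
  *st = (ChunkStreamState){chunks, lengths, n_chunks, 0};
  *out = (struct ArrowArrayStream){stream_get_schema, stream_get_next,
                                   stream_get_last_error, stream_release, st};
  return 0;
}
```

A consumer calls get_next until it receives an array whose release is NULL, and it releases each chunk once it is done with it. Because the C data interface exports a RecordBatch as a struct-typed array, the same loop could carry a Table as a sequence of record batches.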
Neal Richardson / @nealrichardson:
```
table.export_schema()
for col in table.chunked_arrays():
    for a in col.chunks():
        a.export_array()
```
and reassemble the Table. Looking at the R and Python code we have now that does the Array and RecordBatch work, I'm not sure how simple that would be to do, and I wonder if there's a better way.
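With this chunk-by-chunk approach, the importer has to collect the exported chunks and take ownership of them. The following C sketch shows that step for a single column. It assumes int32 chunks with no nulls and leaves schema handling out. It relies on the C data interface's rule that a consumer may move an ArrowArray by copying the struct bitwise, as long as it then marks the source as released. export_int32_chunk, ChunkedColumn and the chunked_column_* functions are hypothetical names, not Arrow APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ArrowArray layout as defined by the Arrow C data interface specification. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical producer side: exports borrowed int32 values (no nulls) as one chunk. */
static void release_chunk(struct ArrowArray* a) {
  free(a->buffers); /* only the list of buffer pointers is owned */
  a->release = NULL;
}

static int export_int32_chunk(const int32_t* values, int64_t length, struct ArrowArray* out) {
  const void** bufs = malloc(2 * sizeof *bufs);
  if (bufs == NULL) return -1;
  bufs[0] = NULL; /* no validity bitmap */
  bufs[1] = values;
  memset(out, 0, sizeof *out);
  out->length = length;
  out->n_buffers = 2;
  out->buffers = bufs;
  out->release = release_chunk;
  return 0;
}

/* Hypothetical consumer side: one column reassembled from exported chunks. */
typedef struct {
  struct ArrowArray* chunks;
  int64_t n_chunks;
  int64_t capacity;
  int64_t length; /* total rows across all chunks */
} ChunkedColumn;

/* Takes ownership of *src by moving it: copy the struct, then mark src released. */
static int chunked_column_append(ChunkedColumn* col, struct ArrowArray* src) {
  if (col->n_chunks == col->capacity) {
    int64_t cap = col->capacity > 0 ? 2 * col->capacity : 4;
    struct ArrowArray* grown = realloc(col->chunks, (size_t)cap * sizeof *grown);
    if (grown == NULL) return -1;
    col->chunks = grown;
    col->capacity = cap;
  }
  col->chunks[col->n_chunks++] = *src;
  col->length += src->length;
  src->release = NULL;
  return 0;
}

/* Releases each chunk through its own callback, then frees the container. */
static void chunked_column_release(ChunkedColumn* col) {
  for (int64_t i = 0; i < col->n_chunks; i++) {
    struct ArrowArray* a = &col->chunks[i];
    if (a->release != NULL) a->release(a);
  }
  free(col->chunks);
  memset(col, 0, sizeof *col);
}
```

Moving the structs keeps the reassembly zero-copy. Only the small ArrowArray headers are copied, and each chunk's buffers stay with the producer until its release callback runs.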
Antoine Pitrou / @pitrou: @nealrichardson TableBatchReader gives you a stream of record batches from a Table. It's reasonably efficient as well (no data is copied).
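The thread doesn't show how a Table whose columns are chunked differently can be read as record batches without copying. One way is to end each batch at the nearest chunk boundary in any column, so that every column's slice falls within a single chunk. We believe TableBatchReader works along these lines, but the C sketch below is only an illustration and not its implementation. It computes the batch lengths and nothing else. ColumnLayout and plan_batches are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one column: the length of each of its chunks. */
typedef struct {
  const int64_t* chunk_lengths;
  int64_t n_chunks;
} ColumnLayout;

/* Writes the length of each zero-copy batch to out (room for max_out entries)
   and returns the number of batches, or -1 if out is too small or allocation
   fails. Assumes all columns have the same total length. */
static int64_t plan_batches(const ColumnLayout* cols, int n_cols,
                            int64_t* out, int64_t max_out) {
  if (n_cols <= 0) return 0;
  int64_t* chunk = calloc((size_t)n_cols, sizeof *chunk);   /* current chunk per column */
  int64_t* offset = calloc((size_t)n_cols, sizeof *offset); /* rows already used in it */
  if (chunk == NULL || offset == NULL) {
    free(chunk);
    free(offset);
    return -1;
  }
  int64_t n_out = 0;
  for (;;) {
    int64_t step = INT64_MAX;
    for (int c = 0; c < n_cols && step > 0; c++) {
      /* Skip chunks that are used up (this also skips empty chunks). */
      while (chunk[c] < cols[c].n_chunks &&
             offset[c] == cols[c].chunk_lengths[chunk[c]]) {
        chunk[c]++;
        offset[c] = 0;
      }
      if (chunk[c] == cols[c].n_chunks) {
        step = 0; /* this column is exhausted, so the table is too */
      } else {
        int64_t remaining = cols[c].chunk_lengths[chunk[c]] - offset[c];
        if (remaining < step) step = remaining;
      }
    }
    if (step == 0) break;
    if (n_out == max_out) {
      n_out = -1;
      break;
    }
    /* The batch ends at the nearest chunk boundary across all columns. */
    out[n_out++] = step;
    for (int c = 0; c < n_cols; c++) offset[c] += step;
  }
  free(chunk);
  free(offset);
  return n_out;
}
```

Take one column chunked as 3 + 2 rows and another chunked as 1 + 4. This sketch yields batches of 1, 2 and 2 rows, and each batch can be sliced out of every column without copying.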
Wes McKinney / @wesm: In any case, perhaps we could propose a standard interface to add to the ABI for this.
Currently the C data interface does Array and RecordBatch, but we're also going to need ChunkedArray and Table.
Reporter: Neal Richardson / @nealrichardson
Assignee: Neal Richardson / @nealrichardson
PRs and other links:
Note: This issue was originally created as ARROW-8301. Please see the migration documentation for further details.