ARROW-8301: [R] Handle ChunkedArray and Table in C data interface #7648
nealrichardson wants to merge 1 commit into apache:master from
```r
colnames <- maybe_py_to_r(x$column_names)
r_cols <- maybe_py_to_r(x$columns)
names(r_cols) <- colnames
Table$create(!!!r_cols)
```
Out of curiosity, what does the !!! mean?
It's like doing **kwargs
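For readers who don't use rlang: `!!!` splices a list into individual arguments, but only in functions that collect `...` with rlang's dynamic-dots semantics (for example `rlang::list2()`); the snippet above assumes `Table$create()` does. A minimal sketch, using a made-up `collect_args()` stand-in rather than arrow itself:

```r
library(rlang)

# Hypothetical stand-in for a function like Table$create() that takes named
# arguments via `...` and collects them with rlang::list2(), which understands `!!!`.
collect_args <- function(...) list2(...)

r_cols <- list(x = 1:3, y = c("a", "b", "c"))

# `!!!` splices the list so each element becomes its own named argument,
# like f(**kwargs) in Python.
spliced <- collect_args(!!!r_cols)
explicit <- collect_args(x = 1:3, y = c("a", "b", "c"))
identical(spliced, explicit)

# Base R equivalent that works for any function: do.call()
identical(do.call(collect_args, r_cols), explicit)
```

Both comparisons print `TRUE`.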
The main source of potential inefficiency here is that the Schema is exported/imported once for each chunk. We may or may not care about this immediately. Also, note that you can transfer a Table as a sequence of RecordBatches rather than a sequence of ChunkedArrays. If you have many columns, that would probably be more efficient, since the schema would be exported once per batch instead of once per column chunk. (in C++, you can use
(otherwise, the code here looks OK, but I'm not an R expert at all :-))
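A rough way to see the difference is to count how many array/schema pairs cross the C data interface under each strategy. The `count_round_trips()` helper below is hypothetical and counts export/import calls only, not bytes: each RecordBatch export still carries a struct schema describing every column.

```r
# Hypothetical helper: number of export/import round trips under each strategy.
# `chunk_counts` holds the number of chunks in each column's ChunkedArray;
# `n_batches` is the number of RecordBatches the Table would be split into.
count_round_trips <- function(chunk_counts, n_batches) {
  c(
    per_chunked_array = sum(chunk_counts),  # one array + schema per chunk per column
    per_record_batch  = n_batches           # one array + (struct) schema per batch
  )
}

# 100 columns, each with 4 chunks whose boundaries line up into 4 batches
trips <- count_round_trips(rep(4, 100), n_batches = 4)
trips
```

With 100 columns of 4 aligned chunks each, that is 400 round trips versus 4.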
But Tables are a sequence of ChunkedArrays, right? Those ChunkedArrays may not be chunked the same way, and dictionaries within a ChunkedArray may not be the same, etc. Can TableBatchReader handle those cases robustly and with zero-copy?
It should :-)
However, it may use the offset member of arrays, which might not work with the C Data Interface...
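The alignment question can be pictured with a small sketch. The `batch_lengths()` helper is hypothetical, not the C++ TableBatchReader code, and it ignores any maximum batch size: when columns are chunked differently, a batch has to end wherever any column's chunk ends.

```r
# Hypothetical sketch: cut differently chunked columns into aligned batches.
# `chunk_lengths` is a list with one vector of chunk lengths per column.
batch_lengths <- function(chunk_lengths) {
  ends <- lapply(chunk_lengths, cumsum)             # chunk end positions per column
  boundaries <- sort(unique(c(0L, unlist(ends))))   # union of all chunk boundaries
  diff(boundaries)                                  # length of each resulting batch
}

# Column a is chunked as 3 + 2 rows, column b as 1 + 4 rows (5 rows total)
batch_lengths(list(a = c(3L, 2L), b = c(1L, 4L)))
```

This gives batches of 1, 2 and 2 rows. The second batch (rows 2 and 3, 1-based) starts one row into column a's first chunk, so it would typically be a zero-copy slice of that chunk with a nonzero offset, which is the case that might interact badly with the C Data Interface.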
@wesm do you have an opinion on this?
As long as the implementation details / semantics aren't exposed (they don't seem to be), this seems sufficient to me to have the feature established, and we can always return later and make it more efficient (or replace it with the iterator mechanism that will hopefully get some discussion on the mailing list)
OK, then I'll merge this and we can come back and improve it later.
In terms of lines of code, this wasn't bad, though I don't know how efficient these methods are. Maybe there's a better way.
The one thing that would be lost is any metadata attached to the Table schema, because the Table is reconstructed from its ChunkedArrays without a schema. I wonder if we could export the Schema on its own; the existing _export_to_c / _import_from_c methods take both an array pointer and a schema pointer.