ARROW-8301: [R] Handle ChunkedArray and Table in C data interface #7648

Closed
nealrichardson wants to merge 1 commit into apache:master from nealrichardson:r-py-tables

Conversation

@nealrichardson
Member

In terms of lines of code, this wasn't bad, though I don't know how efficient these methods are. Maybe there's a better way.

The one thing that would be lost is any metadata attached to the Table's schema, because the Table is reconstructed from its ChunkedArrays without a schema. I wonder if we could export the Schema on its own: the existing _export_to_c/_import_from_c methods take both an array pointer and a schema pointer.
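For reference, a minimal C++ sketch of what exporting a Schema on its own looks like through the C data interface, using ExportSchema/ImportSchema from arrow/c/bridge.h. The RoundTripSchema helper is illustrative only, not part of this PR:

```cpp
// Illustrative sketch: move a Schema across the C data interface by itself,
// so Table-level metadata need not be lost.
#include <arrow/api.h>
#include <arrow/c/bridge.h>

arrow::Result<std::shared_ptr<arrow::Schema>> RoundTripSchema(
    const std::shared_ptr<arrow::Schema>& schema) {
  ArrowSchema c_schema;
  // Export fills in the C struct; key-value metadata travels with it.
  ARROW_RETURN_NOT_OK(arrow::ExportSchema(*schema, &c_schema));
  // Import consumes (releases) the C struct and rebuilds an arrow::Schema.
  return arrow::ImportSchema(&c_schema);
}
```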

@nealrichardson nealrichardson requested a review from pitrou July 6, 2020 19:35
@github-actions

github-actions bot commented Jul 6, 2020

colnames <- maybe_py_to_r(x$column_names)  # column names from the Python Table
r_cols <- maybe_py_to_r(x$columns)         # converts each column as a ChunkedArray
names(r_cols) <- colnames                  # reattach the names
Table$create(!!!r_cols)                    # splice the named list into Table$create()
Member

Out of curiosity, what does the !!! mean?

Member Author

It's rlang's splice operator: Table$create(!!!r_cols) expands the named list r_cols into individual named arguments. It's like doing **kwargs in Python.

@pitrou
Member

pitrou commented Jul 7, 2020

The main source of potential inefficiency here is that the Schema is exported/imported once for each chunk. We may or may not care immediately about this.

Also, note that you can transfer a Table as a sequence of RecordBatches rather than as a sequence of ChunkedArrays. If you have very many columns, that would probably be more efficient.

(In C++, you can use TableBatchReader to export a Table as RecordBatches, and Table::FromRecordBatches to recreate a Table from the RecordBatches; see the sketch below.)
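As a hedged sketch of that round trip (the RoundTripTable helper is illustrative, not part of this PR, but TableBatchReader and Table::FromRecordBatches are the C++ APIs named above):

```cpp
// Illustrative sketch: export a Table as RecordBatches and rebuild it.
// TableBatchReader re-chunks columns by slicing, so each emitted batch
// is aligned across columns.
#include <arrow/api.h>
#include <memory>
#include <vector>

arrow::Result<std::shared_ptr<arrow::Table>> RoundTripTable(
    const std::shared_ptr<arrow::Table>& table) {
  arrow::TableBatchReader reader(*table);
  std::vector<std::shared_ptr<arrow::RecordBatch>> batches;
  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(reader.ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    batches.push_back(batch);
  }
  // Passing the original schema keeps its metadata on the rebuilt Table.
  return arrow::Table::FromRecordBatches(table->schema(), batches);
}
```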

@pitrou
Member

pitrou commented Jul 7, 2020

(Otherwise, the code here looks OK, but I'm not an R expert at all :-))

@nealrichardson
Member Author

> Also, note that you can transfer a Table as a sequence of RecordBatches, rather than a sequence of ChunkedArrays.

But Tables are a sequence of ChunkedArrays, right? Those ChunkedArrays may not be chunked the same way, and dictionaries within a ChunkedArray may not be the same, etc. Can TableBatchReader handle those cases robustly and with zero-copy?

@pitrou
Member

pitrou commented Jul 7, 2020

It should :-)

@pitrou
Member

pitrou commented Jul 7, 2020

However, it may use the offset member of arrays, which might not work with the C Data Interface...

@nealrichardson
Member Author

@wesm do you have an opinion on this?

@wesm
Member

wesm commented Jul 7, 2020

As long as the implementation details / semantics aren't exposed (they don't seem to be), this seems sufficient to me to have the feature established, and we can always return later and make it more efficient (or replace it with the iterator mechanism that will hopefully get some discussion on the mailing list).

@nealrichardson
Member Author

Ok then I'll merge this and we can come back and improve it later.
