Use nanoarrow instead of arrow #1

Closed
JosiahParry opened this issue Nov 22, 2023 · 14 comments

Comments

@JosiahParry
Owner

I would probably encourage writers of Rust extensions to go through nanoarrow (e.g., via as_nanoarrow_array_stream() or as_nanoarrow_array()) rather than arrow directly.

@paleolimbot

@JosiahParry
Owner Author

Probably related: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html

Prior to this, many libraries simply provided export to PyArrow data structures, using the _import_from_c and _export_to_c methods. However, this always required PyArrow to be installed. In addition, those APIs could cause memory leaks if handled improperly.

@paleolimbot

Yes, that's very recent (just being implemented!). The idea is that libraries producing an array only have to produce something that implements __arrow_c_array__ instead of an actual pyarrow.Array. On the pyarrow side, anything that expected an Array will (eventually) be able to accept anything that implements __arrow_c_array__ by checking hasattr(x, "__arrow_c_array__").

In R we don't have the ability to do hasattr()...the closest we can do is define generics. The as_nanoarrow_array() generic is easier for an arbitrary library to implement than arrow::as_arrow_array() because nanoarrow is easier to depend on (and it would be a required dependency because nanoarrow is where the S3 method is defined). The adbcdrivermanager package takes advantage of this...you can do write_adbc(<anything that implements as_nanoarrow_array_stream()>, con) and S3 dispatch takes care of the rest.

@eitsupi
Contributor

eitsupi commented Nov 22, 2023

Ref: pola-rs/r-polars#5

@JosiahParry
Owner Author

ToArrowRobj is now implemented using {nanoarrow} instead of {arrow}

It is implemented for:

  • DataType
  • ArrayData
  • PrimitiveArray
  • Field
  • Schema
  • RecordBatch

It is less clear how to handle FromArrowRobj. Right now it expects arrow class objects. The approach I am leaning towards right now is to check the class of the object and process accordingly.

Meaning the arrow class objects DataType, Field, Schema, RecordBatch, and ArrayData will be processed into their corresponding arrow-rs types. A nanoarrow_array will be processed into ArrayData, and a nanoarrow_schema can be processed into Field, Schema, or DataType. I think nanoarrow_array_stream will need to be processed into a RecordBatchReader.
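As a rough illustration of the class-checking approach described above, here is a minimal std-only Rust sketch. The `Target` enum and `target_for_class` function are hypothetical names, not part of arrow-extendr or arrow-rs, and the class strings are assumptions about what the R objects report from `class()`. A real implementation would read the class vector from the R object and then perform the FFI import.

```rust
/// Hypothetical conversion targets on the arrow-rs side (illustrative only).
#[derive(Debug, PartialEq)]
pub enum Target {
    /// Field, Schema, or DataType, all of which arrive as a schema.
    DataTypeFieldOrSchema,
    ArrayData,
    RecordBatch,
    RecordBatchReader,
    Unsupported,
}

/// Pick a conversion target from an R object's class vector.
/// The class vector is scanned in order, so the most specific
/// recognised class wins.
pub fn target_for_class(classes: &[&str]) -> Target {
    for class in classes {
        match *class {
            // Class names below are assumptions, not a verified list.
            "nanoarrow_schema" | "DataType" | "Field" | "Schema" => {
                return Target::DataTypeFieldOrSchema
            }
            "nanoarrow_array" | "Array" => return Target::ArrayData,
            "RecordBatch" => return Target::RecordBatch,
            "nanoarrow_array_stream" | "RecordBatchReader" => {
                return Target::RecordBatchReader
            }
            _ => {}
        }
    }
    Target::Unsupported
}
```

For example, `target_for_class(&["nanoarrow_array"])` yields `Target::ArrayData`, while an unrecognised class vector such as `["data.frame"]` falls through to `Target::Unsupported`. Every new input type needs another match arm here, which is the maintenance cost of this design.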

@paleolimbot

The To/From thing is still new to me, but if I were in Rust and I wanted an arrow DataType, Field, or Schema from arbitrary user SEXP input, I'd want to call as_nanoarrow_schema() on the SEXP and then do the FFI import based on the C object. I think the same pattern applies for ArrayData...I'm less clear what the arrow-rs equivalents of Table and ChunkedArray are, but those would use as_nanoarrow_array_stream() (as would RecordBatchReader).

That will get you all Arrow objects for free (because as_nanoarrow_XXX() are implemented for them already) plus any objects that have as_nanoarrow_array() methods defined in other packages (e.g., sfc objects as of five minutes ago in geoarrow/geoarrow-c/r!)

@JosiahParry
Owner Author

To my knowledge there is no concept of a Table or a ChunkedArray in arrow-rs as of yet. The RecordBatch serves the purpose of the Table.


Another question, if you feel so kind: getting an arrow array using {arrow} isn't so bad with the export_to_c() function, which takes pointers to a schema and an array and moves them (I think that's what is happening).

Using nanoarrow, I'm not so sure how to move the single pointer of the array into schema + array (or maybe that just doesn't happen?)

@eitsupi
Contributor

eitsupi commented Nov 23, 2023

Recently, the polars package has started using the R! macro to execute as_* functions on the R side and then load Arrow objects on the Rust side.
With this method, we don't need a match arm on the Rust side; we just define the S3 method for as_nanoarrow_array_stream on the R side. Isn't that simpler, with a wider range of support?

@paleolimbot

I think nanoarrow::nanoarrow_pointer_export(<the_nanoarrow_object>, <the address of the arrow-rs FFI object as a string>) is what you want!

To my knowledge there is no concept of a Table or a ChunkedArray in arrow-rs as of yet.

Good to know! It's a bit of a bummer...the ability to leave chunks as they are is often helpful (but not something you have to deal with now 🙂 )

@paleolimbot

Oh, and for an array you can get the schema from nanoarrow::infer_nanoarrow_schema() 🙂 .

@JosiahParry
Owner Author

I think nanoarrow::nanoarrow_pointer_export(<the_nanoarrow_object>, <the address of the arrow-rs FFI object as a string>) is what you want!

Yeah, this did the trick! It turns out that the arrow-rs FFI module requires a schema. Those aren't present on the array so I used infer_nanoarrow_schema() and also exported that pointer.
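The "address as a string" step can be sketched in std-only Rust. This is not the arrow-extendr implementation: `FakeFfiArray` is a hypothetical stand-in for an arrow-rs FFI struct, `address_string` and `fill_from_address` are illustrative helpers, and the decimal formatting of the address is an assumption. In the real flow the string is passed to nanoarrow::nanoarrow_pointer_export() in R, and nanoarrow's C code does the writing; the same would be repeated for the schema struct.

```rust
/// Hypothetical stand-in for an FFI struct owned by Rust. The real struct
/// follows the Arrow C Data Interface layout and has more members.
#[repr(C)]
#[derive(Debug, Default)]
pub struct FakeFfiArray {
    pub length: i64,
    pub null_count: i64,
}

/// Format the address of `target` as a decimal string, a form that can be
/// handed to R as a character scalar.
pub fn address_string<T>(target: &mut T) -> String {
    (target as *mut T as usize).to_string()
}

/// Simulates the exporting side: parse the address and write through it.
///
/// # Safety
/// `addr` must hold the address of a live `FakeFfiArray` that is not
/// accessed through any other reference during the call.
pub unsafe fn fill_from_address(addr: &str, length: i64) {
    let ptr = addr
        .parse::<usize>()
        .expect("address should be decimal digits") as *mut FakeFfiArray;
    unsafe {
        (*ptr).length = length;
        (*ptr).null_count = 0;
    }
}
```

A caller would create an empty `FakeFfiArray`, pass `address_string(&mut ffi)` across the language boundary, and read the struct only after the other side has filled it in. The Rust side keeps ownership of the allocation throughout, so the struct must outlive the export call.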

@JosiahParry
Owner Author

Jinx

@JosiahParry
Owner Author

@eitsupi If I understand correctly, that's exactly what I'm aiming for here! There should be no matching necessary!

@JosiahParry
Owner Author

@eitsupi
I used the DBI example (thank you!!!!!) in the docs. Does this look like what you're after?
https://josiahparry.github.io/arrow-extendr/arrow_extendr/index.html


Aside: closing this issue, since Rust -> R now uses nanoarrow, while both arrow -> Rust and nanoarrow -> Rust are still allowed.

@eitsupi
Contributor

eitsupi commented Nov 23, 2023

I used the DBI example (thank you!!!!!) in the docs. Does this look like what you're after?
https://josiahparry.github.io/arrow-extendr/arrow_extendr/index.html

Looks great!
