[Python] Useful scope of the python bindings? #53
Comments
This is awesome! The scope of the Python bindings certainly doesn't have to be the same as the scope of the R bindings; however I'll offer up what I had in mind for the scope of the R bindings in case that helps with the conversation. Basically:
For me, the R bindings are all about facilitating the development of libraries that expose data sources as Arrays or ArrayStreams. The dream is that Arrow is the in-memory standard for this kind of thing; however, without a lightweight way to convert those objects to a data frame or numpy array there is not much incentive to do so. The nanoarrow C library makes it relatively easy to create
Yes, that was the first thing I was planning to do, and I opened a PR with an initial version of this for Array: #62 (I will have to do something similar for ArrayStream).
Yeah, the fact that
We now have Python bindings with some specific bullets on the roadmap (https://arrow.apache.org/nanoarrow/main/roadmap.html#python-bindings). Feel free to reopen an issue or PR modifying those!
Is there also a section of the docs that says what's in scope for the Python bindings? As an example use case, pyarrow is so large that it's hard to use in Lambda and stay under the 250 MB limit. For example, those docs mention that IPC isn't currently exposed, but is it intended or in scope for it to be exposed in the future?
I think it's a little bit up to who has time to do all the implementing. The things I put on the roadmap are a few things that have come up (but until there's some concrete plan/resources allocated I was going to keep a lid on the issues). There's certainly no technical limitation to exposing the IPC reader!
I opened #52 as a start for Python bindings for nanoarrow, currently just a package scaffold.
I am planning to subsequently add basic introspection/consumption of ArrowArray structs and basic conversions of buffers/arrays to numpy.
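To make "introspection/consumption of ArrowArray structs" a bit more concrete, here is a rough sketch using only the standard library's `ctypes`. It is not the nanoarrow API and not what the PR implements. The struct layout follows `struct ArrowArray` from the Arrow C data interface; the helper `int32_array_to_list` and the toy buffers are hypothetical names made up for illustration, and the sketch assumes a primitive int32 array with no children or dictionary.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """Mirror of `struct ArrowArray` from the Arrow C data interface."""


# Fields are assigned after the class exists because the struct refers to itself.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # function pointer; left NULL in this sketch
    ("private_data", ctypes.c_void_p),
]

# Toy int32 array [10, 20, null, 40]. The validity bitmap uses least-significant
# bit numbering, with 1 meaning "valid": bits 0, 1 and 3 are set.
values = (ctypes.c_int32 * 4)(10, 20, 30, 40)
validity = (ctypes.c_uint8 * 1)(0b00001011)
buffers = (ctypes.c_void_p * 2)(
    ctypes.addressof(validity),  # buffer 0: validity bitmap
    ctypes.addressof(values),    # buffer 1: data
)

arr = ArrowArray(
    length=4,
    null_count=1,
    offset=0,
    n_buffers=2,
    n_children=0,
    buffers=ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p)),
)


def int32_array_to_list(array):
    """Hypothetical helper: read an int32 ArrowArray into a list (None for nulls)."""
    validity_addr = array.buffers[0]  # None when the validity pointer is NULL
    data_addr = array.buffers[1]
    end = array.offset + array.length

    # from_address wraps the existing memory without copying it.
    data = (ctypes.c_int32 * end).from_address(data_addr)
    bitmap = None
    if validity_addr is not None:
        bitmap = (ctypes.c_uint8 * ((end + 7) // 8)).from_address(validity_addr)

    out = []
    for i in range(array.offset, end):
        is_valid = bitmap is None or (bitmap[i // 8] >> (i % 8)) & 1
        out.append(data[i] if is_valid else None)
    return out


result = int32_array_to_list(arr)
```

A numpy conversion would presumably wrap the same data buffer (for example via `numpy.frombuffer`) and treat the validity bitmap as a separate mask, rather than building a Python list.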
Some more elaborate notes / thoughts can be found at https://docs.google.com/document/d/1119poLwF0r4AN19dGt9U8vLM07zEVdlRpP2omL1PXdc/edit?usp=sharing
More generally, it would be interesting to discuss or get feedback on what functionality would be useful / needed for other potential use cases.