[Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects #15573

Closed
asfimport opened this issue Jun 25, 2016 · 1 comment

Comments

@asfimport

In practice, IO interfaces in PyArrow will need to be bidirectional:

  • Exposing internal IO interfaces written purely in C++ to Python users as file-like objects

  • Exposing Python file-like objects to the C++ IO subsystem

To do this efficiently, we may want to introduce an arrow::Buffer subclass that manages the lifetime of a PyBytes object in a GIL-safe way (i.e., on destruction, the GIL is acquired and the object's refcount is decremented). We can still implement a Read method that copies bytes into some other buffer, after which the PyBytes is immediately destroyed.

Outside of these byte buffer management issues, wrapping a file-like object (having read() -> bytes, seek(), tell(), and other basic file methods) is fairly straightforward, and will allow any of the current or upcoming IO adapters to read either from native classes (file system, HDFS, etc.) or arbitrary Python streams.
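A minimal sketch of the wrapping described above, written in pure Python rather than in the C++ layer this issue targets. The class name `FileLikeReader` and its methods `size`, `read_at`, and `read_into` are hypothetical names chosen for illustration; they are not the PyArrow API. The sketch assumes only that the wrapped object provides read(), seek(), and tell(), and that read() returns bytes.

```python
import io


class FileLikeReader:
    """Hypothetical adapter giving random-access reads over a Python file-like object."""

    def __init__(self, handle):
        # Fail early if the object lacks the basic file methods relied on below.
        for name in ("read", "seek", "tell"):
            if not callable(getattr(handle, name, None)):
                raise TypeError(f"handle is missing required method {name}()")
        self._handle = handle

    def size(self):
        # Find the stream length by seeking to the end, then restore the position.
        pos = self._handle.tell()
        self._handle.seek(0, io.SEEK_END)
        end = self._handle.tell()
        self._handle.seek(pos)
        return end

    def read_at(self, offset, nbytes):
        # Position the stream, then read at most nbytes as an immutable bytes object.
        self._handle.seek(offset)
        data = self._handle.read(nbytes)
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("read() must return bytes")
        return bytes(data)

    def read_into(self, dest, offset=0):
        # Copy up to len(dest) bytes into a caller-owned writable buffer.
        # The temporary bytes object is released as soon as this method returns.
        view = memoryview(dest)
        data = self.read_at(offset, len(view))
        view[:len(data)] = data
        return len(data)


reader = FileLikeReader(io.BytesIO(b"arrow columnar data"))
middle = reader.read_at(6, 8)      # bytes 6..13 of the stream
scratch = bytearray(5)
copied = reader.read_into(scratch)  # fills scratch from offset 0
```

The `read_into` method mirrors the copying Read variant mentioned earlier: the bytes returned by the Python object live only long enough to be copied into the destination buffer.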

To give a concrete example: consider the output of an HTTP GET request. It can be put in an io.BytesIO object and then treated as a first-class citizen alongside the native (C++) IO classes.
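The idea can be illustrated in plain Python, under some loud assumptions: the helper `count_bytes` is a hypothetical consumer, the payload is a stand-in for an HTTP response body (no network request is made), and a file opened with the built-in `open()` stands in for a native IO class. Nothing here is PyArrow code; it only shows that a consumer written against read() cannot tell the two sources apart.

```python
import io
import os
import tempfile


def count_bytes(source, chunk_size=4096):
    """Drain any object exposing read(n) -> bytes and return the total byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


# Stand-in for the body of an HTTP GET response.
payload = b"response body bytes" * 100
in_memory = io.BytesIO(payload)

# The same payload written to a real file on disk, for comparison.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    with open(path, "rb") as on_disk:
        disk_total = count_bytes(on_disk)
finally:
    os.remove(path)

memory_total = count_bytes(in_memory)
```

Both calls go through the same code path, which is the property the issue asks for on the C++ side.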

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-228. Please see the migration documentation for further details.

@asfimport
Author

Wes McKinney / @wesm:
This was resolved as part of ARROW-302 c7e6a07#diff-cc328f934b174e9384e6c81cf4942046
