In practice, IO interfaces in PyArrow will need to be bidirectional:

- Exposing internal IO interfaces written purely in C++ to Python users as file-like objects
- Exposing Python file-like objects to the C++ IO subsystem
To do the latter efficiently (that is, without copying the bytes object that each Python read() call returns), we may want to introduce an arrow::Buffer subclass that manages the lifetime of a PyBytes object in a GIL-safe way (i.e., on destruction, the GIL is acquired and the object's refcount is decremented). We can still implement a Read method that copies bytes into some other buffer, after which the PyBytes is immediately destroyed.
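The copying Read path described above can be illustrated in pure Python. This is only a minimal sketch of the idea, not the proposed C++ implementation: `read_into` and its parameters are hypothetical names, and the destination is assumed to be a writable buffer such as a `bytearray`.

```python
import io


def read_into(source, nbytes, out, offset=0):
    """Copy up to nbytes from a file-like source into the writable buffer out.

    Hypothetical helper: it mirrors the copying Read method from the text.
    Returns the number of bytes actually copied.
    """
    if offset + nbytes > len(out):
        raise ValueError("destination buffer is too small")
    # read() returns a temporary bytes object (a PyBytes in CPython).
    chunk = source.read(nbytes)
    n = len(chunk)
    with memoryview(out) as view:
        view[offset:offset + n] = chunk
    # The local reference to chunk is dropped on return, so the temporary
    # bytes object can be reclaimed immediately.
    return n


destination = bytearray(8)
stream = io.BytesIO(b"abcdef")
first = read_into(stream, 4, destination)
second = read_into(stream, 4, destination, offset=4)
```

The zero-copy alternative from the text would instead keep the bytes object alive for as long as the consumer needs it, which is what the GIL-safe buffer subclass is for.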
Outside of these byte buffer management issues, wrapping a file-like object (having read() -> bytes, seek(), tell(), and other basic file methods) is fairly straightforward, and will allow any of the current or upcoming IO adapters to read either from native classes (file system, HDFS, etc.) or arbitrary Python streams.
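As a rough illustration of what such a wrapper has to check and provide, here is a small Python sketch. The class `FileLikeReader` and its methods `size` and `read_at` are hypothetical names chosen for this example; they are not part of PyArrow, and the real wrapper would live on the C++ side.

```python
import io


class FileLikeReader:
    """Hypothetical adapter around any object with read(), seek(), and tell()."""

    REQUIRED = ("read", "seek", "tell")

    def __init__(self, handle):
        missing = [
            name for name in self.REQUIRED
            if not callable(getattr(handle, name, None))
        ]
        if missing:
            raise TypeError("not file-like, missing: " + ", ".join(missing))
        self._handle = handle

    def size(self):
        """Return the total size in bytes, restoring the current position."""
        position = self._handle.tell()
        try:
            self._handle.seek(0, io.SEEK_END)
            return self._handle.tell()
        finally:
            self._handle.seek(position)

    def read_at(self, position, nbytes):
        """Seek to position and read up to nbytes, insisting on bytes output."""
        self._handle.seek(position)
        data = self._handle.read(nbytes)
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("read() must return bytes, got " + type(data).__name__)
        return bytes(data)


reader = FileLikeReader(io.BytesIO(b"hello world"))
total = reader.size()
tail = reader.read_at(6, 5)
```

The type check in `read_at` rejects streams opened in text mode, whose `read()` returns `str` rather than `bytes`.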
To give a concrete example: the output of an HTTP GET request can be put in an io.BytesIO object and then treated as a first-class citizen alongside the native (C++) IO classes.
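The following sketch shows the same consumer code handling an in-memory stream and an on-disk file identically. It makes several assumptions: `read_prefix` is a hypothetical helper, the payload is a placeholder standing in for a real response body, and no network request is actually made.

```python
import io
import os
import tempfile


def read_prefix(source, nbytes):
    """Return the first nbytes of a seekable binary stream, restoring its position."""
    position = source.tell()
    source.seek(0)
    prefix = source.read(nbytes)
    source.seek(position)
    return prefix


# Placeholder for the body of an HTTP GET response, for example the bytes
# returned by urllib.request.urlopen(url).read().
payload = b"example response body"
in_memory = io.BytesIO(payload)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.bin")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as on_disk:
        # The consumer does not care which kind of stream it was given.
        same = read_prefix(on_disk, 7) == read_prefix(in_memory, 7)
```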
Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-228. Please see the migration documentation for further details.