While one of Arrow's promises is that it makes it easy to read/write data bigger than memory, it's not immediately obvious from the pyarrow documentation how to deal with memory-mapped files.
The docs hint that you can open files as memory mapped ( https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files ) but then don't explain how to read/write Arrow Arrays or Tables from there.
While most high-level functions that read/write other formats (Parquet, Feather, ...) have an easy-to-guess `memory_map=True` option, the docs don't seem to have any example of how that is meant to work for the Arrow format itself, for example how you can do that using `RecordBatchFile*`.
An addition to the memory mapping section with a more meaningful example that reads/writes actual Arrow data (instead of plain bytes) would probably be helpful.
Reporter: Alessandro Molina / @amol-
Assignee: Alessandro Molina / @amol-
PRs and other links:
Note: This issue was originally created as ARROW-12650. Please see the migration documentation for further details.