ARROW-13572 (#10991) added basic support for the ORC file format in the Datasets API, but the reader still reads all columns regardless of the ScanOptions. Since ORC is a columnar format that supports reading only specific fields, we can optimize this step.
The tricky part is converting the field names of the Arrow schema to the corresponding column indices in the ORC schema. Currently, this logic is included in the Python bindings (
arrow/python/pyarrow/orc.py, lines 36 to 59 in 5ca62b9).
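To illustrate the name-to-index conversion, here is a minimal sketch in plain Python. It is not the code from `orc.py` and does not use the pyarrow API. It assumes a toy schema model (a `dict` for a struct, a `("list", element_type)` tuple for a list, a string for a primitive), and the helper names `orc_column_ids` and `resolve_top_level` are hypothetical. The one fact it relies on is that ORC numbers its types by a pre-order traversal of the type tree, with the root struct at id 0.

```python
from itertools import count


def orc_column_ids(typ, counter=None, prefix=()):
    """Yield (path, column_id) pairs using ORC's pre-order numbering.

    Toy type model (an assumption for this sketch, not pyarrow types):
      - dict               -> struct, keys are field names
      - ("list", elem)     -> list with one element child
      - any other value    -> primitive leaf
    """
    if counter is None:
        counter = count()  # the root struct receives id 0
    yield prefix, next(counter)
    if isinstance(typ, dict):
        for name, child in typ.items():
            yield from orc_column_ids(child, counter, prefix + (name,))
    elif isinstance(typ, tuple) and typ[0] == "list":
        # "item" is a hypothetical label for the list's element child
        yield from orc_column_ids(typ[1], counter, prefix + ("item",))


def resolve_top_level(schema, names):
    """Map top-level field names to their ORC column ids."""
    ids = dict(orc_column_ids(schema))
    return [ids[(name,)] for name in names]


schema = {
    "a": "int64",
    "b": {"x": "string", "y": ("list", "double")},
    "c": "bool",
}
selected = resolve_top_level(schema, ["a", "c"])
```

Because nested fields consume ids of their own, a top-level field's ORC id generally differs from its position in the Arrow schema: here `c` is the third field but comes after the four ids taken by `b` and its children. This is why the mapping cannot be a simple positional lookup.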
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche
Note: This issue was originally created as ARROW-13797. Please see the migration documentation for further details.