[Python] Order of columns in pyarrow.feather.read_table #32634
Comments
Alenka Frim / @AlenkaF: I would say this is not the expected behaviour. If we look at the Parquet and Feather readers, the columns come back in the requested order:
import pyarrow as pa
table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]})
import pyarrow.parquet as pq
pq.write_table(table, 'example.parquet')
pq.read_table('example.parquet', columns=['b', 'a'])
# pyarrow.Table
# b: string
# a: int64
# ----
# b: [["a","b","c"]]
# a: [[1,2,3]]
import pyarrow.feather as feather
feather.write_feather(table, 'example_feather')
feather.read_table('example_feather', columns=['b', 'a'])
# pyarrow.Table
# b: string
# a: int64
# ----
# b: [["a","b","c"]]
# a: [[1,2,3]]
FWIU, looking at the code in pyarrow/_orc.pyx and arrow/adapters/orc/adapter.cc, the behaviour comes from Apache ORC, so an issue could be opened there (it follows the column order of the original schema rather than the requested order). Nevertheless there are two options we have to make this work correctly:
Joris Van den Bossche / @jorisvandenbossche:
Alenka Frim / @AlenkaF:
> result4 = orc_file.read(columns=["struct.middle.inner"])
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_orc.py:584:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/orc.py:189: in read
table = table.select(columns)
pyarrow/table.pxi:3053: in pyarrow.lib.Table.select
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E KeyError: 'Field "struct.middle.inner" does not exist in table schema'
To close this issue I will add information to the
A workaround for a user with the ordering issue:
Joris Van den Bossche / @jorisvandenbossche:
xref pandas-dev/pandas#47944
Reporter: Matthew Roeschke / @mroeschke
Assignee: Alenka Frim / @AlenkaF
Note: This issue was originally created as ARROW-17360. Please see the migration documentation for further details.