ARROW-17360: [Python] Reorder columns in pyarrow.orc.read_table #14493

Closed

@AlenkaF AlenkaF commented Oct 25, 2022

Before this PR:

import pyarrow as pa
from pyarrow import orc

table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]})
orc.write_table(table, 'example.orc')
orc.read_table('example.orc', columns=['b', 'a'])
# pyarrow.Table
# a: int64
# b: string
# ----
# a: [[1,2,3]]
# b: [["a","b","c"]]

After this PR:

table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]})
orc.write_table(table, 'example.orc')
orc.read_table('example.orc', columns=['b', 'a'])
# pyarrow.Table
# b: string
# a: int64
# ----
# b: [["a","b","c"]]
# a: [[1,2,3]]

@jorisvandenbossche jorisvandenbossche left a comment

Thanks!

@jorisvandenbossche

Hmm, the failures are actually related here, and I am not sure offhand how to solve this. We allow nested columns to be selected using a "dotted path", but that doesn't work for Table.select().

AlenkaF commented Oct 27, 2022

Yeah, that's unfortunate. A "dotted path" works in ORCFile.read() but not in pyarrow.Table.select(), so the test fails:

>       result4 = orc_file.read(columns=["struct.middle.inner"])

opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_orc.py:584: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/orc.py:189: in read
    table = table.select(columns)
pyarrow/table.pxi:3053: in pyarrow.lib.Table.select
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   KeyError: 'Field "struct.middle.inner" does not exist in table schema'

Because of that, the easy solution for reordering the columns (calling select() on the table after reading) isn't feasible anymore. I will close this PR and open another one that documents in the docstrings that orc.read_table() always follows the column order of the file.
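
For illustration only, here is a rough sketch of one way the requested order could be reduced to top-level field names, so that a dotted path such as "struct.middle.inner" maps to its root field "struct". This is not what the PR implements. The helper name `top_level_order` is hypothetical, and the sketch assumes that a dot always separates nesting levels, which is ambiguous if a column name itself contains a dot.

```python
def top_level_order(columns):
    """Return the top-level field names of `columns` in first-seen order.

    Hypothetical helper: each entry may be a plain column name or a
    dotted path into a nested field (assumption: "." only ever
    separates nesting levels).
    """
    seen = set()
    ordered = []
    for path in columns:
        # Keep only the part before the first dot, i.e. the root field.
        root = path.split(".", 1)[0]
        if root not in seen:
            seen.add(root)
            ordered.append(root)
    return ordered


print(top_level_order(["b", "struct.middle.inner", "a", "struct.other"]))
# ['b', 'struct', 'a']
```

The resulting list contains only top-level names, so in principle it could be passed to Table.select() after reading. Whether that gives the desired result for every nested selection has not been verified here.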
