[Python] Table.slice not updating pandas_metadata #15178
Comments
Worth noting that performing a …

```
In [18]: df.index = pd.RangeIndex(2, 10, 2)

In [19]: table = pa.Table.from_pandas(df)

In [20]: table.schema.pandas_metadata
Out[20]:
{'index_columns': [{'kind': 'range',
   'name': None,
   'start': 2,
   'stop': 10,
   'step': 2}],
 'column_indexes': [{'name': None,
   'field_name': None,
   'pandas_type': 'unicode',
   'numpy_type': 'object',
   'metadata': {'encoding': 'UTF-8'}}],
 'columns': [{'name': 'n_legs',
   'field_name': 'n_legs',
   'pandas_type': 'int64',
   'numpy_type': 'int64',
   'metadata': None},
  {'name': 'animals',
   'field_name': 'animals',
   'pandas_type': 'unicode',
   'numpy_type': 'object',
   'metadata': None}],
 'creator': {'library': 'pyarrow', 'version': '10.0.1'},
 'pandas_version': '1.5.2'}

In [21]: table.slice(0, 2)
Out[21]:
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,4]]
animals: [["Flamingo","Horse"]]

In [22]: table.slice(0, 2).to_pandas()
Out[22]:
   n_legs   animals
0       2  Flamingo
1       4     Horse

In [23]: df
Out[23]:
   n_legs        animals
2       2       Flamingo
4       4          Horse
6       5  Brittle stars
8     100      Centipede
```
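The fallback in Out[22] can be modelled in a few lines. The following is a hypothetical, stdlib-only sketch: it does not call pyarrow, and `reconstruct_index_labels` is an illustrative name. It assumes that the converter rebuilds the index from the stored `start`/`stop`/`step` and discards the descriptor when its length no longer matches the number of rows. Under that assumption, the stale descriptor from Out[20] would explain why the sliced table comes back with labels 0, 1 instead of 2, 4.

```python
from typing import Any


def reconstruct_index_labels(pandas_metadata: dict[str, Any], num_rows: int) -> list[int]:
    """Hypothetical model of turning a 'range' index descriptor back into labels.

    Assumption: if the stored range no longer has num_rows elements, the
    descriptor is treated as stale and a default 0..num_rows-1 index is used.
    """
    for descr in pandas_metadata.get("index_columns", []):
        if isinstance(descr, dict) and descr.get("kind") == "range":
            labels = range(descr["start"], descr["stop"], descr["step"])
            if len(labels) == num_rows:
                return list(labels)
    # No usable descriptor: fall back to a default 0-based index.
    return list(range(num_rows))


# Index descriptor copied from Out[20] (other metadata keys omitted).
meta = {"index_columns": [{"kind": "range", "name": None, "start": 2, "stop": 10, "step": 2}]}

print(reconstruct_index_labels(meta, 4))  # full table: [2, 4, 6, 8]
print(reconstruct_index_labels(meta, 2))  # after table.slice(0, 2): [0, 1]
```

Because the descriptor still says `range(2, 10, 2)` (four labels) while the sliced table has two rows, the model discards it, which is consistent with the 0/1 index shown in Out[22].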
The pandas metadata is a rather primitive mechanism, originally implemented to ensure a correct roundtrip between pandas and Arrow/Parquet. That works for exact roundtrips, but once you perform intermediate operations on the Arrow table, it can easily break down (e.g. you could also change the columns), and we currently don't guarantee that this metadata is updated by such operations. So I would tend to label this as "won't fix".

For slice itself, it might be relatively easy to update the pandas metadata to reflect the change. But what about a similar operation, for example filtering the table with some condition? Given that there are so many ways the metadata can get out of sync, I am hesitant to special-case slicing. When converting with …
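The filtering case raised above can be made concrete in plain Python. This hypothetical sketch uses illustrative helper names and makes no pyarrow calls. It applies a boolean mask to the labels of the `RangeIndex(2, 10, 2)` used earlier in this issue and checks whether the surviving labels can still be described by a `start`/`stop`/`step` triple. When they cannot, the index would have to be stored some other way (for example as a materialized column), so a slice-only fix would not cover this case.

```python
from typing import Optional


def labels_after_filter(start: int, stop: int, step: int, mask: list[bool]) -> list[int]:
    """Index labels of a range-indexed table that survive a boolean row filter."""
    labels = range(start, stop, step)
    if len(labels) != len(mask):
        raise ValueError("mask length must match the number of rows")
    return [label for label, keep in zip(labels, mask) if keep]


def as_range(labels: list[int]) -> Optional[range]:
    """Return an equivalent range if labels form an arithmetic progression, else None."""
    if not labels:
        return range(0)
    if len(labels) == 1:
        return range(labels[0], labels[0] + 1)
    step = labels[1] - labels[0]
    if step == 0:
        return None  # duplicate labels cannot be described by a range
    candidate = range(labels[0], labels[-1] + (1 if step > 0 else -1), step)
    return candidate if list(candidate) == labels else None


n_legs = [2, 4, 5, 100]  # the n_legs column of df, whose index is range(2, 10, 2)

kept = labels_after_filter(2, 10, 2, [n <= 4 for n in n_legs])
print(kept, as_range(kept))  # [2, 4] range(2, 5, 2)

kept = labels_after_filter(2, 10, 2, [n % 2 == 0 for n in n_legs])
print(kept, as_range(kept))  # [2, 4, 8] None
```

The first filter happens to keep a contiguous run, so a range descriptor would still work. The second keeps labels 2, 4 and 8, which no single `start`/`stop`/`step` can express.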
Describe the bug, including details regarding any error messages, version, and platform.
The Table.slice API will need to update the index-related metadata in pandas_metadata correctly.
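For the RangeIndex case specifically, the required update is simple arithmetic. Taking rows `[offset, offset + length)` of a range with a given `start` and `step` gives a new start of `start + offset * step` and a stop `length * step` past that. Below is a minimal sketch of this update on a plain metadata dict. `slice_pandas_metadata` is a hypothetical helper, not a pyarrow function. Clamping `offset` and `length` to the table's row count, and treating `length=None` as "to the end", are assumptions about how such a helper should behave. Only `'kind': 'range'` descriptors are rewritten; every other entry is copied unchanged.

```python
import copy
from typing import Any, Optional


def slice_pandas_metadata(
    pandas_metadata: dict[str, Any],
    offset: int,
    length: Optional[int],
    num_rows: int,
) -> dict[str, Any]:
    """Hypothetical helper: adjust 'range' index descriptors for table.slice(offset, length).

    Returns a deep copy; the input metadata is not modified.
    """
    # Assumption: clamp the requested window into [0, num_rows] and treat
    # length=None as "until the end of the table".
    offset = min(max(offset, 0), num_rows)
    remaining = num_rows - offset
    new_len = remaining if length is None else max(0, min(length, remaining))

    updated = copy.deepcopy(pandas_metadata)
    for descr in updated.get("index_columns", []):
        # Materialized index columns are listed by column name (strings);
        # only RangeIndex descriptors (dicts with kind == 'range') need rewriting.
        if isinstance(descr, dict) and descr.get("kind") == "range":
            step = descr["step"]
            descr["start"] = descr["start"] + offset * step
            descr["stop"] = descr["start"] + new_len * step  # uses the new start
    return updated


meta = {"index_columns": [{"kind": "range", "name": None, "start": 2, "stop": 10, "step": 2}]}
sliced = slice_pandas_metadata(meta, offset=0, length=2, num_rows=4)
d = sliced["index_columns"][0]
print(list(range(d["start"], d["stop"], d["step"])))  # [2, 4], matching df.index[0:2]
```

Returning a deep copy keeps the original table's schema metadata untouched, which matters because a sliced table and its parent would otherwise share the same descriptor dicts.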
Component(s)
Python