[Python] Extra metadata gets added after multiple conversions between pd.DataFrame and pa.Table

We have a unit test that verifies that loading a dataframe from a .parq file and saving it back with no changes produces the same result as the original file. It started failing with pyarrow 0.8.0.

After digging into it, I discovered that after the first conversion from pd.DataFrame to pa.Table, the table contains the following metadata (among other things):

```json
"column_indexes": [{"metadata": null, "field_name": null, "name": null, "numpy_type": "object", "pandas_type": "bytes"}]
```

However, after converting it to a pd.DataFrame and back into a pa.Table a second time, the metadata gains an encoding field, and pandas_type changes from "bytes" to "unicode":

```json
"column_indexes": [{"metadata": {"encoding": "UTF-8"}, "field_name": null, "name": null, "numpy_type": "object", "pandas_type": "unicode"}]
```
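
To make the difference between the two entries explicit, here is a minimal sketch that parses both `column_indexes` values exactly as quoted in this report and lists the fields that changed. It uses only the standard `json` module; the helper name `diff_fields` is made up for illustration and is not part of pyarrow or pandas.

```python
import json

# The two "column_indexes" values quoted in this report.
first = json.loads(
    '[{"metadata": null, "field_name": null, "name": null, '
    '"numpy_type": "object", "pandas_type": "bytes"}]'
)
second = json.loads(
    '[{"metadata": {"encoding": "UTF-8"}, "field_name": null, "name": null, '
    '"numpy_type": "object", "pandas_type": "unicode"}]'
)


def diff_fields(before, after):
    """Return {field: (before_value, after_value)} for fields that differ."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Compare the single column-index entry from each conversion.
changed = diff_fields(first[0], second[0])
for field, (old, new) in changed.items():
    print(f"{field}: {old!r} -> {new!r}")
```

For the two entries above, only `metadata` and `pandas_type` differ; `field_name`, `name`, and `numpy_type` are unchanged.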

See the attached file for a test case.

So specifically, it appears that a dataframe->table->dataframe->table round trip produces different metadata from a single dataframe->table conversion, which I think is unexpected.


**Reporter**: [Dima Ryazanov](https://issues.apache.org/jira/browse/ARROW-1940) / @dimaryaz
**Assignee**: [Phillip Cloud](https://issues.apache.org/jira/browse/ARROW-1940) / @cpcloud
#### Original Issue Attachments:
- [fail.py](https://issues.apache.org/jira/secure/attachment/12902985/fail.py)
#### PRs and other links:
- [GitHub Pull Request #1728](https://github.com/apache/arrow/pull/1728)

<sub>**Note**: *This issue was originally created as [ARROW-1940](https://issues.apache.org/jira/browse/ARROW-1940). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
