[Python] Incorrect inferred schema from pandas dataframe with length 0.

We use pandas (with the pyarrow engine) to write Parquet files that are consumed by other applications, such as Java apps using `org.apache.parquet.hadoop.ParquetFileReader`. We found that for some empty dataframes, those applications see an incorrect schema for string columns. After some investigation, we narrowed the issue down to pyarrow's schema inference:
```python
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([['a', 1, 1.0]], columns=['a', 'b', 'c'])
In [3]: import pyarrow as pa
In [4]: pa.Schema.from_pandas(df)
Out[4]:
a: string
b: int64
c: double
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 562
In [5]: pa.Schema.from_pandas(df.head(0))
Out[5]:
a: null
b: int64
c: double
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 560
In [6]: pa.__version__
Out[6]: '5.0.0'
```
As you can see, column 'a', which should be string type, is inferred as null type and is converted to int32 when writing to Parquet files.

Is this expected behavior? Is there a workaround for this issue? Could anyone take a look, please? Thanks!

**Environment**: OS: Windows 10, CentOS 7
**Reporter**: [Yuan Zhou](https://issues.apache.org/jira/browse/ARROW-14488)

<sub>**Note**: *This issue was originally created as [ARROW-14488](https://issues.apache.org/jira/browse/ARROW-14488). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
