[Python] handle timestamp type in parquet file for compatibility with older HiveQL

Hi there,


I ran into an issue when writing a Parquet file with PyArrow.

Older versions of Hive can only recognize timestamps stored as INT96, so I use `pyarrow.parquet.write_to_dataset` with the `use_deprecated_int96_timestamps=True` option to save the Parquet file. But Hive skips the timestamp conversion when the `created_by` field in the Parquet file metadata does not say "parquet-mr".

[hive/ParquetRecordReaderBase.java at f1ff99636a5546231336208a300a114bcf8c5944 · apache/hive (github.com)](https://github.com/apache/hive/blob/f1ff99636a5546231336208a300a114bcf8c5944/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L137-L139)

 

As a workaround, I have to save the timestamp columns with timezone info (padded to UTC+8).

But when `pyarrow.parquet` reads a directory that contains Parquet files created by both PyArrow and parquet-mr, the resulting `pyarrow.Table` ignores the timezone info for the parquet-mr files.

 

Maybe PyArrow could expose the `created_by` option (**preferred**; `parquet::WriterProperties::created_by` is already available in C++).

Or could it handle timezone-aware timestamp columns when some of the files were created by parquet-mr?


Maybe related to https://issues.apache.org/jira/browse/ARROW-14422

**Reporter**: [nero](https://issues.apache.org/jira/browse/ARROW-15492)

<sub>**Note**: *This issue was originally created as [ARROW-15492](https://issues.apache.org/jira/browse/ARROW-15492). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
