[format] Support reading Parquet TIMESTAMP(NANOS) as timestamp(9) #7845

Merged
JingsongLi merged 1 commit into apache:master from leaves12138:fix-parquet-timestamp-nanos-reader
May 13, 2026

@leaves12138
Contributor

Purpose

Native Parquet writers such as Arrow can encode TIMESTAMP(9) as INT64 with the TIMESTAMP(NANOS) logical annotation. This is valid Parquet, but Paimon's vectorized Parquet reader ignored the logical time unit when decoding INT64 timestamp columns, so nanosecond timestamps could not be read correctly as Paimon timestamp(9).

Changes

  • Decode INT64 Parquet timestamps according to their logical time unit (MILLIS, MICROS, or NANOS).
  • Convert TIMESTAMP(NANOS) schema annotations to Paimon timestamp precision 9.
  • Add regression coverage for top-level timestamp(9) and array<timestamp(9)> written as Parquet INT64 TIMESTAMP(NANOS).
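
As a rough illustration of the unit-aware decoding described above, here is a minimal standalone sketch. It is not the code in this PR: the class `Int64TimestampDecoder`, the `Unit` enum, the `Decoded` holder, and both method names are hypothetical. It assumes a decoded value is held as epoch milliseconds plus nanoseconds within that millisecond, and that MILLIS and MICROS map to precisions 3 and 6 (only the NANOS to 9 mapping is stated in this PR).

```java
/** Hypothetical helper for illustration only; not Paimon's actual reader code. */
final class Int64TimestampDecoder {

    /** Logical units that a Parquet INT64 TIMESTAMP annotation can carry. */
    enum Unit { MILLIS, MICROS, NANOS }

    /** Decoded value: epoch milliseconds plus the nanoseconds inside that millisecond. */
    static final class Decoded {
        final long epochMillis;
        final int nanoOfMilli;

        Decoded(long epochMillis, int nanoOfMilli) {
            this.epochMillis = epochMillis;
            this.nanoOfMilli = nanoOfMilli;
        }
    }

    /** Splits a raw INT64 value according to its logical unit. */
    static Decoded decode(long raw, Unit unit) {
        switch (unit) {
            case MILLIS:
                return new Decoded(raw, 0);
            case MICROS:
                // floorDiv/floorMod keep the sub-millisecond part non-negative
                // for values before the epoch.
                return new Decoded(
                        Math.floorDiv(raw, 1_000L),
                        (int) Math.floorMod(raw, 1_000L) * 1_000);
            case NANOS:
                return new Decoded(
                        Math.floorDiv(raw, 1_000_000L),
                        (int) Math.floorMod(raw, 1_000_000L));
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    /** Timestamp precision implied by the unit (3 and 6 are assumptions, see above). */
    static int precisionFor(Unit unit) {
        switch (unit) {
            case MILLIS:
                return 3;
            case MICROS:
                return 6;
            case NANOS:
                return 9;
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }
}
```

Floor-based division is used here so that pre-epoch values still produce a sub-millisecond component in the range 0 to 999,999; plain `/` and `%` would give a negative remainder for such inputs.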

Tests

  • mvn -pl paimon-format -DskipTests compile
  • mvn -pl paimon-format -Pfast-build -Dtest=ParquetReadWriteTest,ParquetSchemaConverterTest test

Contributor

@JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit b97425a into apache:master May 13, 2026
11 of 12 checks passed

2 participants