
Conversation

@AbinayaJayaprakasam (Contributor) commented Nov 25, 2025

Convert TIMESTAMP(NANOS,*) to LongType regardless of nanosAsLong config to allow reading Parquet files with nanosecond precision timestamps.

### What changes were proposed in this pull request?

Simplified the TIMESTAMP(NANOS) handling in ParquetSchemaConverter to always convert to LongType, removing the nanosAsLong condition check that caused TIMESTAMP(NANOS,false) files to be unreadable.

### Why are the changes needed?

SPARK-40819 added spark.sql.legacy.parquet.nanosAsLong as a workaround for TIMESTAMP(NANOS,true), but:

- Only worked for TIMESTAMP(NANOS,true), not for TIMESTAMP(NANOS,false)
- Required users to know about an obscure internal config flag
- Still required manual casting from Long to Timestamp

This fix makes all NANOS timestamps readable by default. Since Spark cannot fully support nanosecond precision in its type system, converting to LongType preserves precision while allowing files to be read.
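For context, the pre-fix workaround from SPARK-40819 looked roughly like this (a hedged sketch; the file path is illustrative, and the flag only ever helped for TIMESTAMP(NANOS,true) files):

```python
# Pre-fix workaround (sketch): opt in to the legacy flag so that
# TIMESTAMP(NANOS,true) columns are read as LongType.
# TIMESTAMP(NANOS,false) files still failed even with this flag set.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.legacy.parquet.nanosAsLong", "true")
         .getOrCreate())

df = spark.read.parquet("/tmp/nanos_utc.parquet")  # NANOS column -> LongType
```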

### Does this PR introduce any user-facing change?

Yes - Parquet files with TIMESTAMP(NANOS,*) are now readable by default without configuration. Values are read as LongType (nanoseconds since epoch). Users can convert to timestamp if needed: `(col('nanos') / 1e9).cast('timestamp')`
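A minimal end-to-end sketch of that conversion (the file path and the column name `nanos` are illustrative):

```python
# Read a Parquet file whose TIMESTAMP(NANOS,*) column now surfaces as
# LongType, then derive a TimestampType column from it. Dividing by 1e9
# yields seconds since the epoch; the cast truncates to Spark's
# microsecond timestamp precision.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/tmp/nanos.parquet")
df.printSchema()  # nanos: long

df = df.withColumn("ts", (col("nanos") / 1e9).cast("timestamp"))
df.show(truncate=False)
```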

### How was this patch tested?

- Updated ParquetSchemaSuite test expectations (lines 1112-1121)
- All 110 tests in ParquetSchemaSuite pass
- Manually tested with a TIMESTAMP(NANOS,false) Parquet file generated via PyArrow

### Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Nov 25, 2025
@AbinayaJayaprakasam (Contributor, Author) commented Nov 25, 2025

What problem does this solve?
- Parquet files with TIMESTAMP(NANOS,false) exist and are completely unreadable
- SPARK-40819 only fixed TIMESTAMP(NANOS,true), and only behind a config flag
- No workaround exists for affected users

Testing procedure:

Step 1: Generated a test Parquet file
[screenshot]
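A sketch of what that generation step presumably looked like (file path, column name, and values are illustrative; `version="2.6"` is passed so PyArrow keeps nanosecond precision rather than coercing it):

```python
# Write a Parquet file with a TIMESTAMP(NANOS,false) column. A tz-naive
# pyarrow timestamp("ns") type maps to isAdjustedToUTC=false.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "nanos": pa.array([1, 1_700_000_000_000_000_001],
                      type=pa.timestamp("ns")),
})
pq.write_table(table, "/tmp/nanos.parquet", version="2.6")
```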

Step 2: Read it with PySpark
[screenshot]
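And a sketch of the read-back (path illustrative): with the fix, schema conversion succeeds and the column arrives as LongType:

```python
# Read the file back with PySpark; the TIMESTAMP(NANOS,false) column is
# exposed as LongType (nanoseconds since epoch) instead of failing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/tmp/nanos.parquet")
df.printSchema()  # nanos: long
df.show(truncate=False)
```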

Step 3: Before the fix, the read fails
[screenshot]

Step 4: After the fix, the read succeeds with the column as LongType
[screenshot]

Test coverage:
Updated the existing ParquetSchemaSuite test: changed the expectation from "error" to "success with LongType"
[screenshot]

Behavior Matrix

| Scenario | Before | After | Breaking? |
| --- | --- | --- | --- |
| NANOS + nanosAsLong=true | LongType | LongType | No |
| NANOS + nanosAsLong=false | ERROR | LongType | No (fix!) |
| MICROS/MILLIS timestamps | TimestampType | TimestampType | No |

@AbinayaJayaprakasam (Contributor, Author) commented

All build failures are due to CI infrastructure issues during the "Free up disk space" setup step (exit code 100 - package download failures from Ubuntu mirrors).

The failures occurred before code compilation and tests could run, as evidenced by:

- No test log files generated
- No test result files uploaded
- All failures in the same setup phase

This is unrelated to the code changes, so I pushed a dummy commit [059c360] to retrigger the CI.
