
[GH-2704] Disable TransformNestedUDTParquet on Spark 4.1+ #2703

Merged
jiayuasu merged 1 commit into apache:master from james-willis:fix/disable-vectorized-reader-nested-udt-sedona on Mar 10, 2026


james-willis (Collaborator) commented Mar 10, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Disable the TransformNestedUDTParquet optimizer rule on Spark 4.1+, where the root cause (SPARK-48942) has been fixed natively by SPARK-52651.

PR #2359 introduced TransformNestedUDTParquet to work around SPARK-48942, which caused the vectorized Parquet reader to crash on nested UDTs. SPARK-52651 (merged in Spark 4.1) fixes this at the Spark level by recursively stripping UDTs in ColumnVector, making our workaround unnecessary on 4.1+.

This PR version-gates the workaround so it is only registered on Spark < 4.1. It uses defensive version parsing (Try/getOrElse, .lift()) to avoid exceptions on malformed version strings.
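A minimal sketch of the defensive version check described above, written in Scala because the description refers to Scala idioms (Try/getOrElse, .lift()). This is not the code merged in this PR: the object NestedUdtWorkaroundGate and its methods are hypothetical, and treating an unparseable version string as "keep the workaround registered" is an assumption, since the description does not say what the fallback is.

```scala
import scala.util.Try

// Hypothetical helper, not the actual Sedona source. It decides whether the
// TransformNestedUDTParquet workaround should be registered for a given
// Spark version string.
object NestedUdtWorkaroundGate {

  // Reads the leading digits of a version component, e.g. "1-preview" -> 1.
  // Throws NumberFormatException when there are no leading digits;
  // callers wrap the call in Try.
  private def leadingInt(component: String): Int =
    component.takeWhile(_.isDigit).toInt

  // Parses "major.minor" defensively: .lift() yields None instead of throwing
  // on a missing component, and Try absorbs non-numeric components.
  def parseMajorMinor(version: String): Option[(Int, Int)] = {
    val parts = version.trim.split('.')
    for {
      major <- parts.lift(0).flatMap(p => Try(leadingInt(p)).toOption)
      minor <- parts.lift(1).flatMap(p => Try(leadingInt(p)).toOption)
    } yield (major, minor)
  }

  // The workaround is only needed before Spark 4.1 (SPARK-52651 fixes the
  // underlying issue in 4.1+).
  // Assumption: an unparseable version keeps the workaround enabled.
  def needsWorkaround(sparkVersion: String): Boolean =
    parseMajorMinor(sparkVersion)
      .map { case (major, minor) => major < 4 || (major == 4 && minor < 1) }
      .getOrElse(true)
}
```

In a Spark session extension, a check such as needsWorkaround(org.apache.spark.SPARK_VERSION) could guard the call that registers the rule (for example via SparkSessionExtensions.injectOptimizerRule); that wiring is left out so the sketch runs without Spark on the classpath.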

How was this patch tested?

  • Verified compilation with mvn clean install -Dspark=3.5 -Dscala=2.12 -DskipTests

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API, so no documentation changes are needed.

SPARK-48942: nested UDTs crash the vectorized Parquet reader on Spark < 4.1.
SPARK-52651 fixes this in Spark 4.1+ by recursively stripping UDTs in
ColumnVector, making the TransformNestedUDTParquet workaround unnecessary.

Only register the TransformNestedUDTParquet optimizer rule when running
on Spark < 4.1. Uses defensive version parsing with Try/getOrElse and
.lift() to avoid exceptions on malformed version strings.

james-willis requested a review from jiayuasu as a code owner March 10, 2026 17:13
james-willis marked this pull request as draft March 10, 2026 17:15
james-willis changed the title from "[SPARK-48942] Disable TransformNestedUDTParquet on Spark 4.1+" to "[GH-2704] Disable TransformNestedUDTParquet on Spark 4.1+" Mar 10, 2026
james-willis marked this pull request as ready for review March 10, 2026 17:17
jiayuasu added this to the sedona-1.9.0 milestone Mar 10, 2026
jiayuasu merged commit 1b7a804 into apache:master Mar 10, 2026
40 of 41 checks passed

Development

Successfully merging this pull request may close these issues.

Disable TransformNestedUDTParquet workaround on Spark 4.1+

2 participants