[GH-2608] Fix RasterUDT JSON schema serialization for Delta/Parquet write #2636
+65
−2
Did you read the Contributor Guide?
Is this PR related to a ticket?
Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2608 (RasterUDT Failing to Write to Delta Format; But works with output of RS_Union_Aggr). Closes #2347 (Raster Data Types in dataframe columns).

What changes were proposed in this PR?
Add a `jsonValue` override to `RasterUDT` that strips the trailing `$` from the Scala case object class name, fixing Delta Lake and Parquet write failures.

`RasterUDT` is defined as a Scala `case object`, so `getClass.getName` returns `org.apache.spark.sql.sedona_sql.UDT.RasterUDT$` (with a trailing `$`). Spark's `UserDefinedType.jsonValue` stores this class name in the JSON schema. When Delta Lake or Parquet tries to reconstruct the UDT during deserialization via `Class.forName(...).getConstructor().newInstance()`, it fails with:

- `NoSuchMethodException: RasterUDT$.<init>()` (JSON schema round-trip)
- `UNSUPPORTED_DATATYPE` referencing `RasterUDT$` (Parquet/Delta write)

This is the same issue that was previously fixed in `GeometryUDT` and `GeographyUDT`.

Note: `RS_Union_Aggr` was not affected because it uses `ExpressionEncoder` resolved via `UDTRegistration`, which stores `classOf[RasterUDT].getName` (without the `$` suffix). Other raster functions (e.g., `RS_MakeEmptyRaster`) use `InferredExpression`, which references the `case object` singleton directly.

How was this patch tested?
Added 3 new tests to `RasterUDTSuite`:

1. Serializes the schema to JSON, deserializes it with `DataType.fromJson`, and verifies round-trip equality
2. Creates a raster with `RS_MakeEmptyRaster`, writes to Parquet, reads it back, and verifies schema and row count
3. Verifies that `RS_Union_Aggr` output can be written to and read from Parquet (this already worked before the fix, serving as a control test)

All tests pass after the fix. Tests 1 and 2 fail without the fix, reproducing the reported issue.
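For reviewers who want to see the root cause without Spark on the classpath, the following is a minimal sketch in plain Scala. `DemoUDT` and `CaseObjectNameDemo` are hypothetical names used only for illustration; they are not Sedona classes and this is not the code in the patch. It assumes the same shape as the UDTs discussed here: a regular class with a public no-arg constructor, plus a `case object` of the same name extending it.

```scala
import scala.util.Try

// Hypothetical stand-in for a UDT class; NOT Sedona's RasterUDT.
class DemoUDT {
  def typeName: String = "demo"
}

// Singleton of the same name, mirroring the class + case object pattern.
case object DemoUDT extends DemoUDT

object CaseObjectNameDemo {
  // JVM class name of the singleton; for a Scala object this ends in '$'.
  def rawName: String = DemoUDT.getClass.getName

  // The name a schema would need to store so reflection can rebuild the type.
  def schemaName: String = rawName.stripSuffix("$")

  // Mimics the reflective reconstruction path described above:
  // Class.forName(...).getConstructor().newInstance().
  def canInstantiate(className: String): Boolean =
    Try(Class.forName(className).getConstructor().newInstance()).isSuccess

  def main(args: Array[String]): Unit = {
    println(s"raw name:      $rawName")
    println(s"schema name:   $schemaName")
    // The object's class has no public no-arg constructor, so this fails.
    println(s"raw works:     ${canInstantiate(rawName)}")
    // The plain class does have one, so the stripped name can be instantiated.
    println(s"stripped works: ${canInstantiate(schemaName)}")
  }
}
```

The sketch only demonstrates why the stored class name matters; in the actual fix the stripping happens inside the `jsonValue` override, so every schema written by Spark carries the instantiable name.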
Did this PR include necessary documentation updates?