jiayuasu commented Feb 10, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Add a jsonValue override to RasterUDT that strips the trailing $ from the Scala case object class name, fixing Delta Lake and Parquet write failures.

RasterUDT is defined as a Scala case object, so getClass.getName returns org.apache.spark.sql.sedona_sql.UDT.RasterUDT$ (with a trailing $). Spark's UserDefinedType.jsonValue stores this class name in the JSON schema. When Delta Lake or Parquet tries to reconstruct the UDT during deserialization via Class.forName(...).getConstructor().newInstance(), it fails with:

  • NoSuchMethodException: RasterUDT$.<init>() (JSON schema round-trip)
  • UNSUPPORTED_DATATYPE referencing RasterUDT$ (Parquet/Delta write)

This is the same issue that was previously fixed in GeometryUDT and GeographyUDT.
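The failure mode and the idea behind the fix can be reproduced without Spark. The following is a minimal sketch, not the actual patch: `DemoUDT`, `defaultClassName`, `fixedClassName`, and `ReflectionDemo` are hypothetical stand-ins, and the snippet assumes the definitions are compiled as top-level code.

```scala
import scala.util.Try

// Hypothetical stand-ins for illustration only; these are not Spark or Sedona classes.
class DemoUDT {
  // What a default implementation would record: the raw runtime class name.
  def defaultClassName: String = getClass.getName

  // The idea behind the fix: drop the trailing '$' that marks a Scala singleton's class.
  def fixedClassName: String = getClass.getName.stripSuffix("$")
}

// Same shape as described above: a case object, whose runtime class name ends in '$'.
case object DemoUDT extends DemoUDT

object ReflectionDemo {
  // Mimics the reconstruction path quoted above:
  // Class.forName(...).getConstructor().newInstance()
  def reconstruct(className: String): Try[Any] =
    Try(Class.forName(className).getConstructor().newInstance())

  def main(args: Array[String]): Unit = {
    // The singleton's class name carries the '$' suffix.
    println(DemoUDT.defaultClassName)
    // The singleton class has no public no-arg constructor, so this is a Failure.
    println(reconstruct(DemoUDT.defaultClassName).isFailure)
    // The stripped name points at the plain class, which can be instantiated.
    println(reconstruct(DemoUDT.fixedClassName).isSuccess)
  }
}
```

In this sketch the stripped name resolves to the ordinary class, which has a public no-argument constructor, so reflection-based reconstruction succeeds.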

Note: RS_Union_Aggr was not affected because it uses an ExpressionEncoder resolved via UDTRegistration, which stores classOf[RasterUDT].getName (without the $ suffix). Other raster functions (e.g., RS_MakeEmptyRaster) use InferredExpression, which references the case object singleton directly and therefore picks up the RasterUDT$ class name.
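The difference between the two lookup paths in the note comes down to how the class name is obtained. A small sketch, assuming a class with a same-named case object; `SampleUDT` and `NameLookupDemo` are hypothetical names used only for illustration:

```scala
// Hypothetical stand-ins for illustration only; not Sedona's actual definitions.
class SampleUDT
case object SampleUDT extends SampleUDT

object NameLookupDemo {
  // Name taken from the class itself, as classOf[...] does: no '$' suffix.
  val fromClassOf: String = classOf[SampleUDT].getName

  // Name taken from the singleton instance: this is the object's own runtime class.
  val fromSingleton: String = SampleUDT.getClass.getName

  def main(args: Array[String]): Unit = {
    println(fromClassOf)
    println(fromSingleton)
    // The two names differ only by the trailing '$'.
    println(fromSingleton == fromClassOf + "$")
  }
}
```

A code path that records the name via `classOf` never sees the suffix, while one that starts from the singleton instance does, which is consistent with only some raster functions hitting the failure.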

How was this patch tested?

Added 3 new tests to RasterUDTSuite:

  1. JSON schema round-trip — serializes RasterUDT to JSON, parses it back via DataType.fromJson, and verifies round-trip equality
  2. Parquet write/read — creates a raster DataFrame via RS_MakeEmptyRaster, writes to Parquet, reads it back, and verifies schema and row count
  3. RS_Union_Aggr Parquet write/read — confirms that RS_Union_Aggr output can be written to and read from Parquet (this already worked before the fix, serving as a control test)

All tests pass after the fix. Tests 1 and 2 fail without the fix, reproducing the reported issue.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

RasterUDT is a Scala case object whose getClass.getName returns
'RasterUDT$' with a trailing $ sign. Spark's UserDefinedType.jsonValue
uses this class name in the JSON schema. When Delta/Parquet tries to
reconstruct the UDT via Class.forName(...).getConstructor().newInstance(),
it fails because the singleton object's constructor is private.

This fix adds a jsonValue override (identical to the existing fix in
GeometryUDT and GeographyUDT) that strips the trailing $ from the
class name, allowing correct round-trip serialization.

Closes #2608
Closes #2347
jiayuasu force-pushed the fix/2608-rasterudt-delta-write branch from cb6c34c to 5fdb559 on Feb 10, 2026 08:18
jiayuasu added this to the sedona-1.9.0 milestone Feb 10, 2026
jiayuasu added the bug label Feb 10, 2026
jiayuasu merged commit bfcc147 into master Feb 10, 2026
48 checks passed
Successfully merging this pull request may close these issues.

  • RasterUDT Failing to Write to Delta Format; But works with output of RS_Union_Aggr
  • Raster Data Types in dataframe columns
