[SPARK-55681][SQL] Fix singleton DataType equality after deserialization #54475

Closed
timlee0119 wants to merge 1 commit into apache:master from timlee0119:spark-55681-datatype-equality

@timlee0119
Contributor

What changes were proposed in this pull request?

Override equals() and hashCode() on 14 singleton DataType classes so that non-singleton instances compare equal to the case object singletons:

BinaryType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DateType, TimestampType, TimestampNTZType, NullType, CalendarIntervalType, VariantType

For each type:

override def equals(obj: Any): Boolean = obj.isInstanceOf[XType]
override def hashCode(): Int = classOf[XType].getSimpleName.hashCode

getSimpleName is used because Scala's auto-generated hashCode for 0-arity case objects returns productPrefix.hashCode (the simple class name). This preserves the exact same hash values, avoiding any change in hash-dependent code paths.
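
The hash-preservation argument above can be checked with a minimal sketch. `XType` here is the placeholder name from the description, not a real Spark class, and `PlainType` and `HashDemo` are hypothetical names introduced only for illustration. The sketch assumes the types are declared at the top level of a file, so that `getSimpleName` returns the bare class name.

```scala
// Hypothetical stand-in for one of the singleton DataType classes (not Spark code).
class XType private () {
  // Any instance of the class is considered equal, singleton or not.
  override def equals(obj: Any): Boolean = obj.isInstanceOf[XType]
  // Same value the compiler would have generated for the case object.
  override def hashCode(): Int = classOf[XType].getSimpleName.hashCode
}
case object XType extends XType

// A plain case object that keeps the compiler-generated hashCode, for comparison.
case object PlainType

object HashDemo {
  def main(args: Array[String]): Unit = {
    // Compiler-generated hashCode of a 0-arity case object is the hash of its name.
    assert(PlainType.hashCode == "PlainType".hashCode)
    assert(PlainType.hashCode == PlainType.productPrefix.hashCode)

    // The override yields the same value the generated method would have produced.
    assert(XType.hashCode == "XType".hashCode)
    assert(XType.hashCode == XType.productPrefix.hashCode)
  }
}
```

Because the class supplies concrete `equals()` and `hashCode()` definitions, the case object inherits them instead of receiving compiler-generated ones, which is why the override has to reproduce the old hash value explicitly.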

Other DataTypes did not need this change:

  • VarcharType, CharType, TimeType, GeometryType, GeographyType: already matched by type (case _: XType =>) across the codebase
  • StringType: has custom equals() comparing collationId
  • DecimalType, ArrayType, MapType, StructType: case classes with auto-generated equals()

Why are the changes needed?

Scala case object pattern matching (e.g., case BinaryType =>) relies on equals(), which for case objects defaults to reference equality. If a non-singleton instance of a DataType class is created at runtime — through any serialization framework that bypasses readResolve() — every case BinaryType => match in the codebase silently falls through, leading to errors like (code pointer):

IllegalStateException: The data type 'binary' is not supported in generating a writer function...

Although the constructors are private, this is a compile-time guard only — serialization frameworks
bypass constructors at runtime, so non-singleton instances can be created.
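
The failure mode can be reproduced outside Spark with a small sketch. `LegacyType` and `FallthroughDemo` are hypothetical names, and reflection stands in for a serialization framework that skips both the constructor and readResolve(); this illustrates the mechanism and is not the code path from the reported error.

```scala
// Hypothetical stand-in for a singleton DataType WITHOUT the equals/hashCode overrides.
class LegacyType private ()
case object LegacyType extends LegacyType

object FallthroughDemo {
  // A stable-identifier pattern compiles to an equals() call on the singleton,
  // which here is the default reference comparison.
  def describe(t: Any): String = t match {
    case LegacyType => "matched singleton"
    case _          => "fell through"
  }

  // Creates a second instance even though the constructor is declared private:
  // the restriction is enforced by the compiler, not at runtime.
  def viaReflection(): LegacyType = {
    val ctor = classOf[LegacyType].getDeclaredConstructor()
    ctor.setAccessible(true)
    ctor.newInstance()
  }

  def main(args: Array[String]): Unit = {
    val rogue = viaReflection()
    assert(!(rogue eq LegacyType))                       // a distinct object of the same class
    assert(describe(LegacyType) == "matched singleton")
    assert(describe(rogue) == "fell through")            // the silent fallthrough
  }
}
```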

Does this PR introduce any user-facing change?

Yes. Before this change, if a non-singleton DataType instance was created through deserialization, pattern matches like case BinaryType => would silently fail, leading to non-deterministic runtime errors. After this change, non-singleton instances are correctly recognized as equal to the singleton, and pattern matching works as expected.

How was this patch tested?

  1. Unit test in DataTypeSuite (singleton DataType equality after deserialization): creates non-singleton instances via reflection for all 14 types, verifies bidirectional equality with singletons, matching hashCode, and correct PhysicalDataType resolution (no fallthrough to UninitializedPhysicalType).
  2. Also verified that ExpressionSetSuite passes unchanged (its magic hashCode values are calibrated against IntegerType.hashCode, confirming hash preservation).
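
A rough sketch of the kind of checks the unit test describes, using two hypothetical stand-in types (`AType`, `BType`) rather than the 14 Spark types. The helper `nonSingleton` and the object `EqualityCheck` are assumptions for illustration, and the PhysicalDataType check is omitted because it depends on Spark internals.

```scala
// Hypothetical stand-ins that apply the override pattern from this PR.
class AType private () {
  override def equals(obj: Any): Boolean = obj.isInstanceOf[AType]
  override def hashCode(): Int = classOf[AType].getSimpleName.hashCode
}
case object AType extends AType

class BType private () {
  override def equals(obj: Any): Boolean = obj.isInstanceOf[BType]
  override def hashCode(): Int = classOf[BType].getSimpleName.hashCode
}
case object BType extends BType

object EqualityCheck {
  // Builds a non-singleton instance by invoking the no-arg constructor reflectively.
  def nonSingleton[T](cls: Class[T]): T = {
    val ctor = cls.getDeclaredConstructor()
    ctor.setAccessible(true)
    ctor.newInstance()
  }

  def isA(t: Any): Boolean = t match {
    case AType => true
    case _     => false
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq[(AnyRef, AnyRef)](
      (AType, nonSingleton(classOf[AType])),
      (BType, nonSingleton(classOf[BType]))
    )
    pairs.foreach { case (singleton, copy) =>
      assert(!(copy eq singleton))                  // really a different object
      assert(copy == singleton && singleton == copy) // bidirectional equality
      assert(copy.hashCode == singleton.hashCode)    // matching hashCode
    }
    // Pattern matching now recognizes the copy, and distinct types stay distinct.
    assert(isA(nonSingleton(classOf[AType])))
    assert(!isA(nonSingleton(classOf[BType])))
  }
}
```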

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.6)

@timlee0119 changed the title from "fix type" to "[SPARK-55681][SQL] Fix singleton DataType equality after deserialization" on Feb 25, 2026
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan closed this in dd22010 on Feb 25, 2026

2 participants