[SPARK-51961][SQL] Fix from_avro to handle union schemas #55412

Open: yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-51961-avro-union

@yadavay-amzn

### What changes were proposed in this pull request?

Added union schema resolution to AvroDataToCatalyst: a union with a single non-null branch is unwrapped before the schema is passed to AvroDeserializer, mirroring the existing logic in AvroSerializer.resolveNullableType.
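
The unwrapping step can be illustrated outside Spark. The sketch below is not the code from this PR or from AvroSerializer.resolveNullableType; it is a minimal stdlib-only model that works on an Avro schema parsed from JSON, where a union appears as a JSON array. The function name `unwrap_nullable_union` and the `Person` record are hypothetical, chosen only for illustration.

```python
import json


def unwrap_nullable_union(schema):
    """Return (nullable, resolved_schema) for a parsed Avro JSON schema.

    In Avro's JSON encoding a union is a list of branch schemas. If the
    union has exactly one non-null branch, that branch is returned;
    otherwise the schema is returned unchanged.
    """
    if not isinstance(schema, list):
        # Not a union: nothing to unwrap.
        return False, schema
    non_null = [b for b in schema if b != "null" and b != {"type": "null"}]
    has_null = len(non_null) < len(schema)
    if len(non_null) == 1:
        return has_null, non_null[0]
    # Zero or several non-null branches: leave the union as it is.
    return has_null, schema


# Hypothetical union schema of the shape described above: [record, "null"].
union_schema = json.loads("""
[
  {"type": "record", "name": "Person",
   "fields": [{"name": "name", "type": "string"}]},
  "null"
]
""")

nullable, resolved = unwrap_nullable_union(union_schema)
print(nullable, resolved["type"], resolved["name"])
```

After unwrapping, the resolved schema is the record itself, which is the shape a record-oriented deserializer expects at the root.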

### Why are the changes needed?

from_avro throws IncompatibleSchemaException when given a union schema like [{record}, "null"], even though to_avro works with the same schema. The root cause is that AvroSerializer unwraps the union to find the single non-null record type, but AvroDataToCatalyst passes the union directly to AvroDeserializer, which expects a RECORD.
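
To make the mismatch concrete, here is a toy model of the failure mode, assuming a reader that accepts only a record at the root. `require_record` is a hypothetical stand-in and not Spark's AvroDeserializer; `ValueError` stands in for IncompatibleSchemaException, and the `Person` schema is invented for the example.

```python
def top_level_type(schema):
    """Classify a parsed Avro JSON schema by its top-level type."""
    if isinstance(schema, list):
        return "union"          # a JSON array is an Avro union
    if isinstance(schema, dict):
        return schema["type"]   # e.g. "record", "array", "map"
    return schema               # primitive type name such as "string"


def require_record(schema):
    """Toy stand-in for a reader that accepts only a record at the root."""
    kind = top_level_type(schema)
    if kind != "record":
        raise ValueError(f"expected record at the root, got {kind}")
    return schema


record = {
    "type": "record",
    "name": "Person",
    "fields": [{"name": "name", "type": "string"}],
}
union = [record, "null"]

require_record(record)  # accepted: the root is a record

try:
    require_record(union)  # rejected: the root is a union
    rejected = False
except ValueError:
    rejected = True
```

The union is rejected even though its only non-null branch is the same record that was accepted on its own, which is the asymmetry between the write and read paths described above.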

### Does this PR introduce _any_ user-facing change?

Yes. from_avro now correctly handles union schemas containing a single record type and null, matching the behavior of to_avro.

### How was this patch tested?

Added a new test in AvroFunctionsSuite that verifies roundtrip serialization with a union schema of [record, null].

### Was this patch authored or co-authored using generative AI tooling?

Yes.

When a union schema like [{record}, "null"] is passed to both to_avro and
from_avro, to_avro succeeds because AvroSerializer.resolveNullableType
unwraps the union to extract the single non-null record type. However,
from_avro (AvroDataToCatalyst) passes the union schema directly to
AvroDeserializer, which expects a RECORD type, causing an
IncompatibleSchemaException.

The fix adds the same union-unwrapping logic to AvroDataToCatalyst: when
the expected schema is a UNION with a single non-null branch, unwrap it
before passing to AvroDeserializer.

Closes #XXXXX

@yadavay-amzn force-pushed the fix/SPARK-51961-avro-union branch from 1978d29 to 55a6bfb on April 21, 2026 01:41