fix: ProtoConversionUtil$AvroSupport static init under Avro 1.12 #18571
RECURSION_OVERFLOW_SCHEMA passes new byte[0] (via getUTF8Bytes("")) as
the default value of a BYTES HoodieSchemaField, which wraps an Avro
Schema.Field. Avro 1.12.0's Schema.validateDefault rejects byte[] for
BYTES defaults — it now requires a String (interpreted as ISO-8859-1
bytes, Avro's canonical JSON form for BYTES defaults). Under 1.11.x the
validator was lenient.
Because the failure is in a static initializer, the JVM caches the
ExceptionInInitializerError and every subsequent class access throws
NoClassDefFoundError: Could not initialize class
org.apache.hudi.utilities.sources.helpers.ProtoConversionUtil$AvroSupport.
Any downstream service loading this class (e.g. DeltaStreamer using
ProtoClassBasedSchemaProvider) is permanently broken on JVMs that have
Avro 1.12+ on the classpath, even though Hudi itself still pins 1.11.4.
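
The failure mode described above can be reproduced with the JDK alone. The following is a minimal sketch, not Hudi code: BrokenHolder is a hypothetical stand-in for ProtoConversionUtil$AvroSupport, and the IllegalStateException stands in for whatever Avro's validator throws. It only illustrates that the first access surfaces ExceptionInInitializerError and later accesses surface NoClassDefFoundError.

```java
public class StaticInitFailureDemo {

    // Hypothetical stand-in for a class whose static field cannot be built,
    // e.g. because a schema default is rejected during construction.
    static class BrokenHolder {
        static final Object SCHEMA = build();

        private static Object build() {
            throw new IllegalStateException("invalid default for BYTES field");
        }

        // Calling a static method forces class initialization.
        static void touch() { }
    }

    /** Accesses BrokenHolder and returns the simple name of the Throwable raised. */
    static String access() {
        try {
            BrokenHolder.touch();
            return "none";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First access runs the static initializer, which fails.
        System.out.println(access()); // ExceptionInInitializerError
        // The class is now marked as failed; the initializer is not retried.
        System.out.println(access()); // NoClassDefFoundError
    }
}
```

Because the initializer is never retried within the same class loader, catching the first error does not help; only a JVM restart with a compatible classpath (or the fix) clears the condition.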
Fix: use "" (empty string) instead of getUTF8Bytes(""). The default
value is never read at runtime — it is always overwritten by
ByteBuffer.wrap(messageValue.toByteArray()) at line ~365 before the
field is populated — and "" serializes bit-for-bit identically to
new byte[0] on the wire (both produce "default":"" in the JSON schema),
so behavior is strictly preserved under Avro 1.11 while satisfying
1.12's stricter validator.
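
As a rough illustration of why "" and new byte[0] are interchangeable here, the sketch below shows the ISO-8859-1 mapping between a byte array and its String form using only the JDK. It does not call Avro; the class and method names are hypothetical and chosen for this example only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesDefaultEncodingDemo {

    /** Maps each byte to one char (ISO-8859-1), the String form described for BYTES defaults. */
    static String toDefaultString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Inverse mapping: each char in the range 0..255 becomes one byte. */
    static byte[] fromDefaultString(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // An empty byte array corresponds to the empty string.
        System.out.println(toDefaultString(new byte[0]).equals("")); // true

        // ISO-8859-1 round-trips every byte value, including 0x00 and 0xFF.
        byte[] sample = {0x00, 0x41, (byte) 0xFF};
        byte[] roundTripped = fromDefaultString(toDefaultString(sample));
        System.out.println(Arrays.equals(sample, roundTripped)); // true
    }
}
```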
Rejected alternatives (experimental testing against both Avro versions):
| default arg | 1.11.x | 1.12.0 |
| new byte[0] (current) | OK | FAIL |
| "" (fix) | OK | OK |
| ByteBuffer.wrap(...) | OK | FAIL |
| null | OK | OK, but strips the default entirely |
Also drops the now-unused getUTF8Bytes static import.
Fixes: apache#18569
Signed-off-by: tiennguyen-onehouse <tien@onehouse.ai>
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR fixes a static initializer failure in ProtoConversionUtil$AvroSupport under Avro 1.12.0 by changing the BYTES field default from new byte[0] to "", satisfying the stricter validateDefault in 1.12 while preserving on-wire behavior under the pinned 1.11.4. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #18571 +/- ##
============================================
- Coverage 68.87% 68.86% -0.01%
- Complexity 28482 28493 +11
============================================
Files 2478 2478
Lines 136699 136685 -14
Branches 16634 16629 -5
============================================
- Hits 94150 94129 -21
- Misses 34980 34985 +5
- Partials 7569 7571 +2
Describe the issue this Pull Request addresses
Closes #18569
Summary and Changelog
RECURSION_OVERFLOW_SCHEMA in ProtoConversionUtil$AvroSupport passes new byte[0] (via getUTF8Bytes("")) as the default value of a BYTES HoodieSchemaField, which wraps an Avro Schema.Field. Avro 1.12.0's Schema.validateDefault rejects byte[] for BYTES defaults; it now requires a String (interpreted as ISO-8859-1 bytes, Avro's canonical JSON form for BYTES defaults). Under 1.11.x the validator was lenient.
Because the failure occurs in a static initializer, the JVM caches the ExceptionInInitializerError and every subsequent class access throws NoClassDefFoundError: Could not initialize class org.apache.hudi.utilities.sources.helpers.ProtoConversionUtil$AvroSupport. Any downstream service loading this class (e.g. DeltaStreamer using ProtoClassBasedSchemaProvider) is permanently broken on JVMs that have Avro 1.12+ on the classpath, even though Hudi itself still pins 1.11.4.
Observed stack trace:
→ ExceptionInInitializerError → NoClassDefFoundError on every subsequent class access.
Fix: use "" (empty string) instead of getUTF8Bytes(""). The default value is never read at runtime, because it is always overwritten by ByteBuffer.wrap(messageValue.toByteArray()) at ProtoConversionUtil.java:~365 before the field is populated, and "" serializes bit-for-bit identically to new byte[0] on the wire under Avro 1.11 (both produce "default":"" in the JSON schema). Behavior is strictly preserved under Avro 1.11 while satisfying 1.12's stricter validator.
Rejected alternatives (experimental testing against both Avro versions):
| default arg | 1.11.x | 1.12.0 |
| new byte[0] (current) | OK | FAIL |
| "" (fix) | OK | OK |
| ByteBuffer.wrap(...) | OK | FAIL (Unknown datum class: HeapByteBuffer) |
| null | OK | OK, but strips the default entirely |
Also drops the now-unused getUTF8Bytes static import.
Scope check: grepped the repo for BYTES-default patterns with byte[]/ByteBuffer defaults; there is exactly one occurrence (this line). No other call sites need fixing.
Impact
"" serializes identically to new byte[0] on the wire.
Resolves the NoClassDefFoundError that currently bricks any pipeline using ProtoClassBasedSchemaProvider.
Risk Level
none
The changed default value is never materialized into records (always overwritten before the field is populated) and produces the same wire form on Avro 1.11. The change is inert for any consumer on the pinned Avro version and fixes a hard failure on newer Avro versions.
Documentation Update
none
Contributor's checklist