
fix: ProtoConversionUtil$AvroSupport static init under Avro 1.12#18571

Merged
danny0405 merged 1 commit into
apache:masterfrom
tiennguyen-onehouse:ENG-38902-fix-proto-avro-1.12-oss
Apr 24, 2026


@tiennguyen-onehouse
Contributor

@tiennguyen-onehouse tiennguyen-onehouse commented Apr 23, 2026

Describe the issue this Pull Request addresses

Closes #18569

Summary and Changelog

RECURSION_OVERFLOW_SCHEMA in ProtoConversionUtil$AvroSupport passes new byte[0] (via getUTF8Bytes("")) as the default value of a BYTES HoodieSchemaField, which wraps an Avro Schema.Field. Avro 1.12.0's Schema.validateDefault rejects byte[] for BYTES defaults — it now requires a String (interpreted as ISO-8859-1 bytes, Avro's canonical JSON form for BYTES defaults). Under 1.11.x the validator was lenient.
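To make the canonical JSON form concrete: the Avro specification defines a BYTES default as a JSON string whose code points 0-255 map one-to-one onto byte values 0-255. The sketch below decodes such a string using only the JDK. `decodeBytesDefault` is a hypothetical helper for illustration. It is not an Avro API and does not reproduce Avro's own validator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesDefaultDemo {

  /**
   * Hypothetical helper (not an Avro API): decodes a BYTES default from its
   * JSON-string form, mapping each char in the range 0-255 to one byte.
   */
  static byte[] decodeBytesDefault(String jsonDefault) {
    for (int i = 0; i < jsonDefault.length(); i++) {
      if (jsonDefault.charAt(i) > 0xFF) {
        // Chars above 255 have no single-byte ISO-8859-1 encoding.
        throw new IllegalArgumentException("char > 255 at index " + i);
      }
    }
    // ISO-8859-1 maps chars 0-255 to the byte with the same numeric value.
    return jsonDefault.getBytes(StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    // The fixed default "" decodes to the same empty array that
    // getUTF8Bytes("") (i.e. new byte[0]) supplied before the fix.
    System.out.println(Arrays.equals(decodeBytesDefault(""), new byte[0])); // true
    // Non-empty example: 'A' -> 65, char 0xFF -> byte 0xFF (-1 as a signed Java byte).
    System.out.println(Arrays.toString(decodeBytesDefault("A\u00ff"))); // [65, -1]
  }
}
```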

Because the failure occurs in a static initializer, the JVM caches the ExceptionInInitializerError and every subsequent class access throws NoClassDefFoundError: Could not initialize class org.apache.hudi.utilities.sources.helpers.ProtoConversionUtil$AvroSupport. Any downstream service loading this class (e.g. DeltaStreamer using ProtoClassBasedSchemaProvider) is permanently broken on JVMs that have Avro 1.12+ on the classpath, even though Hudi itself still pins 1.11.4.

Observed stack trace:

org.apache.avro.AvroTypeException: Invalid default for field proto_bytes: "" not a "bytes"
    at org.apache.avro.Schema.validateDefault(Schema.java:1681)
    at org.apache.avro.Schema$Field.<init>(Schema.java:539)
    ...

ExceptionInInitializerError on first access, followed by NoClassDefFoundError on every subsequent class access.
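The caching comes from the JVM's class-initialization rules, not from Avro: once a static initializer throws, the class is left in an erroneous state and initialization is never retried. Below is a minimal JDK-only reproduction. `AvroSupportLike` is a hypothetical stand-in for ProtoConversionUtil$AvroSupport, and an IllegalArgumentException simulates the Avro validation failure.

```java
public class StaticInitFailureDemo {

  // Hypothetical stand-in for ProtoConversionUtil$AvroSupport: its static field
  // initializer throws, like RECURSION_OVERFLOW_SCHEMA does under Avro 1.12.
  static class AvroSupportLike {
    static final String SCHEMA = buildSchema();

    private static String buildSchema() {
      // Simulates Schema.validateDefault rejecting the byte[] default.
      throw new IllegalArgumentException("Invalid default for field proto_bytes");
    }

    static String schema() {
      return SCHEMA;
    }
  }

  /** Touches the class and returns the simple name of any Throwable raised. */
  static String touch() {
    try {
      AvroSupportLike.schema();
      return "ok";
    } catch (Throwable t) {
      return t.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println("first access:  " + touch()); // ExceptionInInitializerError
    System.out.println("second access: " + touch()); // NoClassDefFoundError
  }
}
```

Only the first access ever sees the underlying cause. Every later access reports a bare "Could not initialize class" error, which is why the failure looks permanent once it has occurred.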

Fix: use "" (empty string) instead of getUTF8Bytes(""). The default value is never read at runtime: the field is always set to ByteBuffer.wrap(messageValue.toByteArray()) at ProtoConversionUtil.java:~365, so the default never reaches a record. Under Avro 1.11, "" also serializes identically to new byte[0] in the JSON schema (both produce "default":""). Behavior is therefore strictly preserved under Avro 1.11, and 1.12's stricter validator is satisfied.
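The claim that both defaults render the same JSON schema follows from the ISO-8859-1 mapping: an empty byte array maps to the empty string. The following rough JDK-only sketch assumes a simplified JSON string quoter. `toJsonDefault` and `quote` are hypothetical helpers, and Avro's real schema serializer may escape non-ASCII or control characters differently.

```java
import java.nio.charset.StandardCharsets;

public class BytesDefaultJsonDemo {

  /** Hypothetical: renders a byte[] default as a JSON "default" fragment via ISO-8859-1. */
  static String toJsonDefault(byte[] bytes) {
    return "\"default\":" + quote(new String(bytes, StandardCharsets.ISO_8859_1));
  }

  /** Hypothetical: renders a String default (already in ISO-8859-1 form) the same way. */
  static String toJsonDefault(String isoString) {
    return "\"default\":" + quote(isoString);
  }

  // Minimal JSON string quoting: escapes quotes, backslashes, and control chars only.
  private static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      if (c == '"' || c == '\\') {
        sb.append('\\').append(c);
      } else if (c < 0x20) {
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.append('"').toString();
  }

  public static void main(String[] args) {
    String before = toJsonDefault(new byte[0]); // old default: getUTF8Bytes("")
    String after = toJsonDefault("");           // new default: ""
    System.out.println(before);                 // "default":""
    System.out.println(before.equals(after));   // true
  }
}
```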

Rejected alternatives (experimental testing against both Avro versions):

| Default argument      | Avro 1.11.x | Avro 1.12.0 |
|-----------------------|-------------|-------------|
| new byte[0] (current) | OK          | FAIL        |
| "" (fix)              | OK          | OK          |
| ByteBuffer.wrap(...)  | OK          | FAIL (Unknown datum class: HeapByteBuffer) |
| null                  | OK          | OK, but strips the default entirely (changes on-disk schema shape) |

Also drops the now-unused getUTF8Bytes static import.

Scope check: grepped the repo for BYTES-default patterns with byte[]/ByteBuffer defaults — there is exactly one occurrence (this line). No other call sites need fixing.

Impact

  • Hudi's default builds (Avro 1.11.4): no-op; "" serializes identically to new byte[0] in the JSON schema.
  • Hudi consumers bumping Avro to 1.12+ — fixes a static-init NoClassDefFoundError that currently bricks any pipeline using ProtoClassBasedSchemaProvider.
  • No schema or on-disk format change.

Risk Level

none

The changed default value is never materialized into records (the field is always overwritten when it is populated) and produces the same JSON schema on Avro 1.11. The change is inert for any consumer on the pinned Avro version and fixes a hard failure on newer Avro versions.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Fixes: apache#18569
Signed-off-by: tiennguyen-onehouse <tien@onehouse.ai>
@github-actions github-actions Bot added the size:XS PR with lines of changes in <= 10 label Apr 23, 2026
@tiennguyen-onehouse tiennguyen-onehouse marked this pull request as ready for review April 23, 2026 23:08
Contributor

@hudi-agent hudi-agent left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR fixes a static initializer failure in ProtoConversionUtil$AvroSupport under Avro 1.12.0 by changing the BYTES field default from new byte[0] to "", satisfying the stricter validateDefault in 1.12 while preserving on-wire behavior under the pinned 1.11.4. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.86%. Comparing base (ace2871) to head (6a1ee3a).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18571      +/-   ##
============================================
- Coverage     68.87%   68.86%   -0.01%     
- Complexity    28482    28493      +11     
============================================
  Files          2478     2478              
  Lines        136699   136685      -14     
  Branches      16634    16629       -5     
============================================
- Hits          94150    94129      -21     
- Misses        34980    34985       +5     
- Partials       7569     7571       +2     
Flag Coverage Δ
common-and-other-modules 44.47% <100.00%> (+<0.01%) ⬆️
hadoop-mr-java-client 44.78% <ø> (+<0.01%) ⬆️
spark-client-hadoop-common 48.54% <ø> (+<0.01%) ⬆️
spark-java-tests 49.44% <0.00%> (-0.01%) ⬇️
spark-scala-tests 45.32% <0.00%> (+<0.01%) ⬆️
utilities 38.03% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...utilities/sources/helpers/ProtoConversionUtil.java 79.35% <100.00%> (ø)

... and 17 files with indirect coverage changes


@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit 2092890 into apache:master Apr 24, 2026
57 of 58 checks passed

Labels

size:XS PR with lines of changes in <= 10

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] ProtoConversionUtil$AvroSupport static init fails under Avro 1.12 with 'Invalid default for field proto_bytes: ""'

5 participants