fix(flink): Support read non-VECTOR columns from table containing VECTOR columns #18712

Merged
danny0405 merged 2 commits into
apache:masterfrom
cshuo:add_vector_type_guard
May 11, 2026
@cshuo
Collaborator

@cshuo cshuo commented May 9, 2026

…TOR columns

Describe the issue this Pull Request addresses

Flink table reads could fail for Hudi tables created by Spark when the table schema contains VECTOR columns, even when the query only projects non-VECTOR fields. The failure came from Flink-side schema conversion and reader validation not distinguishing between the full table schema and the required/projected read schema.

This PR improves Flink compatibility with Spark-written VECTOR tables by converting VECTOR schema metadata into Flink array element types while still rejecting actual VECTOR column reads until Flink reader support is implemented.
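The distinction between the full table schema and the projected read schema can be illustrated with a small, self-contained sketch. It is not the code from this PR: the `ProjectedReadGuard` class, the `Kind` enum, and the `validateProjection` method are hypothetical names that only model the idea of checking the projected fields rather than every field in the table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the idea only; these are not Hudi or Flink classes.
public class ProjectedReadGuard {

    // Simplified stand-in for a column's logical type.
    enum Kind { INT, STRING, VECTOR }

    // Validates only the projected columns, so a VECTOR column elsewhere
    // in the table schema does not block the read.
    static void validateProjection(Map<String, Kind> tableSchema, List<String> projected) {
        for (String name : projected) {
            Kind kind = tableSchema.get(name);
            if (kind == null) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            if (kind == Kind.VECTOR) {
                throw new UnsupportedOperationException(
                    "Reading VECTOR column '" + name + "' is not supported yet");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Kind> tableSchema = new LinkedHashMap<>();
        tableSchema.put("id", Kind.INT);
        tableSchema.put("name", Kind.STRING);
        tableSchema.put("embedding", Kind.VECTOR);

        // Projection without the VECTOR column: accepted.
        validateProjection(tableSchema, Arrays.asList("id", "name"));

        // Projection that includes the VECTOR column: rejected.
        try {
            validateProjection(tableSchema, Arrays.asList("id", "embedding"));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Validating the full table schema instead of the projection would reject both reads in this sketch, which corresponds to the failure described above.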

Summary and Changelog

  • Adds VECTOR handling to HoodieSchemaConverter, mapping VECTOR element types to Flink arrays:
    • FLOAT -> ARRAY<FLOAT>
    • DOUBLE -> ARRAY<DOUBLE>
    • INT8 -> ARRAY<TINYINT>
  • Adds Flink reader validation via DataTypeUtils.validateReaderSupportedDataTypes, rejecting reads only when projected fields include VECTOR columns.
  • Updates HoodieTableSource to validate the produced/read data type before building the source pipeline, and caches resolved table schema.
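The element type mapping listed above can be written out as a minimal sketch. It assumes nothing about the real `HoodieSchemaConverter` internals: the `VectorTypeMapping` class, the `VectorElementType` enum, and the `toFlinkArrayType` method are hypothetical, and the Flink types are represented as plain SQL type strings so the example runs without Flink on the classpath.

```java
// Hypothetical illustration of the mapping in the changelog; not the PR's code.
public class VectorTypeMapping {

    // The three VECTOR element types named in the changelog.
    enum VectorElementType { FLOAT, DOUBLE, INT8 }

    // Returns the Flink SQL array type string for a VECTOR element type.
    static String toFlinkArrayType(VectorElementType elementType) {
        switch (elementType) {
            case FLOAT:
                return "ARRAY<FLOAT>";
            case DOUBLE:
                return "ARRAY<DOUBLE>";
            case INT8:
                // INT8 is a one-byte integer, which Flink SQL calls TINYINT.
                return "ARRAY<TINYINT>";
            default:
                throw new IllegalArgumentException(
                    "Unsupported VECTOR element type: " + elementType);
        }
    }

    public static void main(String[] args) {
        for (VectorElementType type : VectorElementType.values()) {
            System.out.println(type + " -> " + toFlinkArrayType(type));
        }
    }
}
```

In this model the conversion always succeeds for the three listed element types; whether a given read is allowed is a separate decision made against the projected fields.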

Impact

  • Flink can build read pipelines and query non-VECTOR columns from Hudi tables that contain VECTOR columns; direct VECTOR column reads still fail with a clear validation error.

Risk Level

low

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@cshuo cshuo requested review from danny0405, rahil-c and yihua May 9, 2026 08:55
@cshuo cshuo force-pushed the add_vector_type_guard branch from e5db426 to 8c684aa Compare May 9, 2026 09:14
@github-actions github-actions Bot added the size:L PR with lines of changes in (300, 1000] label May 9, 2026
@cshuo cshuo force-pushed the add_vector_type_guard branch from 8c684aa to ec8fff1 Compare May 9, 2026 11:56
danny0405
danny0405 previously approved these changes May 9, 2026
@danny0405 danny0405 dismissed their stale review May 9, 2026 13:06

have one concern.

@cshuo cshuo force-pushed the add_vector_type_guard branch from ec8fff1 to 440de18 Compare May 11, 2026 03:17
@github-actions github-actions Bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels May 11, 2026
@hudi-bot
Collaborator

CI report:


@codecov-commenter

Codecov Report

❌ Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.09%. Comparing base (f2f6203) to head (440de18).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...va/org/apache/hudi/util/HoodieSchemaConverter.java | 60.00% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18712   +/-   ##
=========================================
  Coverage     68.09%   68.09%           
- Complexity    29106    29123   +17     
=========================================
  Files          2528     2528           
  Lines        141467   141477   +10     
  Branches      17544    17541    -3     
=========================================
+ Hits          96330    96339    +9     
+ Misses        37217    37216    -1     
- Partials       7920     7922    +2     
| Flag | Coverage Δ |
| --- | --- |
| common-and-other-modules | 44.41% <60.00%> (-0.01%) ⬇️ |
| hadoop-mr-java-client | 45.01% <ø> (+<0.01%) ⬆️ |
| spark-client-hadoop-common | 48.35% <ø> (+<0.01%) ⬆️ |
| spark-java-tests | 49.01% <ø> (+0.01%) ⬆️ |
| spark-scala-tests | 44.90% <ø> (-0.01%) ⬇️ |
| utilities | 37.64% <ø> (+0.01%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...va/org/apache/hudi/util/HoodieSchemaConverter.java | 69.04% <60.00%> (-0.38%) ⬇️ |

... and 8 files with indirect coverage changes

@danny0405 danny0405 merged commit 6bd2ace into apache:master May 11, 2026
63 checks passed
Labels

size:M PR with lines of changes in (100, 300]

4 participants