fix(spark): align CTAS partition fields by table partition order by fhan688 · Pull Request #18899 · apache/hudi

fhan688 · 2026-06-02T14:15:16Z

Describe the issue this Pull Request addresses

Spark SQL CTAS for Hudi tables can write incorrect values for multi-level partition fields when the partition columns in the SELECT output are not ordered the same as the table partition spec.

For example, a table created with:

partitioned by (year, month, day)

can receive a CTAS query whose output is:

select ..., month, day, year

The CTAS path currently forwards the resolved query output as-is, so the downstream write path may interpret partition field values by position rather than by the table's declared partition order.
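To make the failure mode concrete, here is a small, Spark-free Scala sketch. For illustration only, it assumes a writer that treats the trailing N output columns as the partition values in declared order. The object and method names (`PositionalMismatchDemo`, `positionalPartitionValues`, `nameBasedPartitionValues`) are hypothetical and do not come from Hudi.

```scala
// Hypothetical illustration of positional vs. name-based partition value mapping.
object PositionalMismatchDemo {
  // Table declared with PARTITIONED BY (year, month, day).
  val partitionSpec: Seq[String] = Seq("year", "month", "day")

  // CTAS SELECT output with the partition columns in a different order.
  val queryOutput: Seq[String] = Seq("id", "name", "month", "day", "year")
  val row: Seq[Any] = Seq(1, "a1", "06", "02", "2026")

  // Assumed positional reader: pairs the trailing N values with the spec in order.
  def positionalPartitionValues(values: Seq[Any], spec: Seq[String]): Map[String, Any] =
    spec.zip(values.takeRight(spec.size)).toMap

  // Name-based lookup: finds each partition field by its column name.
  def nameBasedPartitionValues(output: Seq[String], values: Seq[Any], spec: Seq[String]): Map[String, Any] =
    spec.map(field => field -> values(output.indexOf(field))).toMap

  def main(args: Array[String]): Unit = {
    // year gets "06", month gets "02", day gets "2026": values land in the wrong fields.
    println(positionalPartitionValues(row, partitionSpec))
    // year gets "2026", month gets "06", day gets "02": values match their fields.
    println(nameBasedPartitionValues(queryOutput, row, partitionSpec))
  }
}
```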

This PR fixes the issue inline.

Summary and Changelog

This change aligns CTAS query output with the Hudi table partition field order before creating CreateHoodieTableAsSelectCommand.

Changes:

Reorder CTAS partition attributes according to table.partitionColumnNames in ResolveImplementationsEarly.
Preserve non-partition columns in their original query output order.
Use Spark's session resolver for partition field matching.
Avoid adding a projection when the CTAS output is already aligned.
Add Spark SQL DDL tests for multi-level partition CTAS with both ordered and out-of-order partition columns.
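The changelog above amounts to a small reordering rule. The Spark-free Scala sketch below shows one way to satisfy its bullets. It works on plain column names instead of Catalyst `Attribute`s, and `CtasPartitionAlignment`, `alignPartitionColumns`, and `caseInsensitive` are hypothetical names, not the code merged into `HoodieAnalysis.scala`. The `Resolver` alias mirrors the shape of Spark's `org.apache.spark.sql.catalyst.analysis.Resolver`, `(String, String) => Boolean`. As an assumption, partition columns keep the slots they already occupy in the SELECT output and are only permuted among themselves; the merged rule may place them differently.

```scala
// Hypothetical sketch of CTAS partition-column alignment; not Hudi's actual implementation.
object CtasPartitionAlignment {
  // Same shape as Spark's org.apache.spark.sql.catalyst.analysis.Resolver.
  type Resolver = (String, String) => Boolean

  // Stand-in for the session resolver when spark.sql.caseSensitive is false.
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** Returns Some(newOrder) if a re-projection is needed, or None if the output can stay as-is. */
  def alignPartitionColumns(
      output: Seq[String],
      partitionColumnNames: Seq[String],
      resolver: Resolver): Option[Seq[String]] = {
    if (partitionColumnNames.isEmpty) {
      None // Non-partitioned CTAS: nothing to align.
    } else {
      // Resolve each declared partition field against the query output, in table order.
      val partitionCols = partitionColumnNames.map { name =>
        output.find(col => resolver(col, name)).getOrElse(
          throw new IllegalArgumentException(s"Partition column '$name' not found in CTAS output"))
      }
      val partitionSet = partitionCols.toSet

      // Non-partition columns stay where they are; the slots holding partition
      // columns are refilled in the table's declared partition order.
      val inTableOrder = partitionCols.iterator
      val aligned = output.map(col => if (partitionSet.contains(col)) inTableOrder.next() else col)

      // Already aligned: no extra projection is needed.
      if (aligned == output) None else Some(aligned)
    }
  }
}
```

Returning `None` for non-partitioned and already-aligned outputs corresponds to the "avoid adding a projection" bullet; in the real analyzer rule, a `Some` result would presumably become a `Project` over the CTAS query plan.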

Impact

No public API, config, or storage format changes.

This fixes Spark SQL CTAS behavior for Hudi partitioned tables. CTAS now correctly handles multi-level partition columns even when the SELECT list orders partition fields differently from the PARTITIONED BY clause.

Risk Level

low

The change is scoped to Hudi Spark SQL CTAS analysis for resolved Hudi tables. Non-partitioned CTAS and already-aligned CTAS plans keep the existing behavior. Verification was added for both COW and MOR table types through the existing TestCreateTable CTAS coverage.
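The verification described above (COW and MOR, ordered and out-of-order partition columns) can be pictured as a 2×2 matrix of CTAS statements. This Scala sketch only builds the SQL strings. The table names, column values, and the `tblproperties` keys used (`type`, `primaryKey`) are assumptions for illustration, not copied from TestCreateTable.

```scala
// Hypothetical CTAS statements covering COW/MOR x ordered/out-of-order partition columns.
object CtasTestMatrix {
  // Declared table partition order.
  val partitionSpec: Seq[String] = Seq("year", "month", "day")

  // One SELECT order aligned with the spec, one out of order.
  val selectOrders: Seq[(String, Seq[String])] = Seq(
    "ordered" -> Seq("year", "month", "day"),
    "outOfOrder" -> Seq("month", "day", "year"))

  // Literal value for each partition column, as SQL string literals.
  val values: Map[String, String] = Map("year" -> "'2026'", "month" -> "'06'", "day" -> "'02'")

  def ctasSql(tableName: String, tableType: String, selectOrder: Seq[String]): String = {
    val partitionExprs = selectOrder.map(c => s"${values(c)} as $c").mkString(", ")
    s"""create table $tableName using hudi
       |partitioned by (${partitionSpec.mkString(", ")})
       |tblproperties (type = '$tableType', primaryKey = 'id')
       |as select 1 as id, 'a1' as name, $partitionExprs""".stripMargin
  }

  def main(args: Array[String]): Unit =
    for (tableType <- Seq("cow", "mor"); (label, order) <- selectOrders) {
      println(ctasSql(s"h_${tableType}_$label", tableType, order))
      println()
    }
}
```

In an actual test, each statement would presumably run through `spark.sql(...)`, followed by a query checking that `year`, `month`, and `day` hold `2026`, `06`, and `02` regardless of the SELECT order.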

Documentation Update

none

This is a bug fix with no new feature, config, or public API change.

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

voonhous

Thank you for the fix!

LGTM

Would love another set of eyes to take a look at this before it gets merged in :)

hudi-agent

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR aligns the CTAS query output partition column order with the table's declared partition spec to fix a positional misinterpretation bug in multi-level partitioned tables. The logic is straightforward, scoped to the analyzer rule, and includes regression tests for both ordered and out-of-order partition columns. No correctness issues found; there are only minor readability suggestions in the inline comments, otherwise the code is clean. Please take a look; after that, this should be ready for a Hudi committer or PMC member to take it from here.

cc @yihua

voonhous · 2026-06-03T07:28:26Z

@fhan688 Can you also try to address the nit comments if possible?

codecov-commenter · 2026-06-03T09:31:51Z

Codecov Report

❌ Patch coverage is 77.77778% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.81%. Comparing base (af38b88) to head (1abd797).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
...pache/spark/sql/hudi/analysis/HoodieAnalysis.scala	77.77%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18899      +/-   ##
============================================
+ Coverage     67.01%   68.81%   +1.80%     
- Complexity    28461    29172     +711     
============================================
  Files          2520     2520              
  Lines        140046   140073      +27     
  Branches      17197    17213      +16     
============================================
+ Hits          93850    96393    +2543     
+ Misses        38529    35902    -2627     
- Partials       7667     7778     +111

Flag	Coverage Δ
common-and-other-modules	`44.31% <0.00%> (-0.03%)`	⬇️
hadoop-mr-java-client	`44.87% <ø> (-0.04%)`	⬇️
spark-client-hadoop-common	`48.16% <ø> (-0.07%)`	⬇️
spark-java-tests	`49.36% <0.00%> (+0.02%)`	⬆️
spark-scala-tests	`45.25% <77.77%> (-0.01%)`	⬇️
utilities	`37.40% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
...pache/spark/sql/hudi/analysis/HoodieAnalysis.scala	`73.89% <77.77%> (+0.17%)`	⬆️

... and 160 files with indirect coverage changes

hudi-bot · 2026-06-03T09:40:52Z

CI report:

1abd797 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

fix(spark): fix CTAS partition field order

c9bfa6a

github-actions Bot added the size:S PR with lines of changes in (10, 100] label Jun 2, 2026

voonhous approved these changes Jun 2, 2026

hudi-agent reviewed Jun 2, 2026

Comment thread ...datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala

fix(spark): add a short comment

1abd797

voonhous approved these changes Jun 3, 2026

voonhous merged commit ba8c4c7 into apache:master Jun 3, 2026
58 checks passed

voonhous merged 2 commits into apache:master from fhan688:fix-CTAS-partition-field-order

5 participants
