Skip to content

fix(spark): fix mor bulk insert commit type error#18878

Open
fhan688 wants to merge 1 commit into
apache:masterfrom
fhan688:fix-mor-bulk-insert-commit-type-error
Open

fix(spark): fix mor bulk insert commit type error#18878
fhan688 wants to merge 1 commit into
apache:masterfrom
fhan688:fix-mor-bulk-insert-commit-type-error

Conversation

@fhan688
Copy link
Copy Markdown
Contributor

@fhan688 fhan688 commented May 29, 2026

Describe the issue this Pull Request addresses

Spark datasource writes can pass MOR table type through hoodie.datasource.write.table.type, while the row-writer bulk_insert
path later reads hoodie.table.type from HoodieWriteConfig#getTableType. When the table config key is absent, the write config
can fall back to COW and select commit instead of deltacommit for MOR row-writer bulk_insert.

Summary and Changelog

Populate hoodie.table.type from hoodie.datasource.write.table.type in mergeParamsAndGetHoodieConfig only when
hoodie.table.type is not already present. This preserves explicit table-config precedence and adds a regression test for MOR row-
writer bulk_insert verifying the completed write instant is deltacommit. No code was copied.

Impact

Fixes Spark datasource MOR row-writer bulk_insert behavior. No public API change, no storage format change, no new config, and no
expected performance impact.

Risk Level

low
The change is limited to Spark writer parameter normalization and only fills a missing table config key from the existing datasource
table type option. Verified with mvn -Pspark3.5 -pl hudi-spark-datasource/hudi-spark -am -Dtest=org.apache.hudi.TestHoodieSparkSqlWriter#testMorRowWriterBulkInsertUsesDeltaCommitAction -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false -Dcheckstyle.skip=true -DskipUTs=true test.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Copy link
Copy Markdown
Contributor

@hudi-agent hudi-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! The fix propagates the datasource hoodie.datasource.write.table.type option to the table-config key hoodie.table.type when only the former is set, which addresses the MOR row-writer bulk_insert producing commit instead of deltacommit. The change preserves precedence for explicitly set values and is symmetric with the existing reverse propagation in fetchMissingWriteConfigsFromTableConfig. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One small style inconsistency in the production change; the test is clean and well-commented.

cc @yihua

}
val mergedParams = mutable.Map.empty ++ HoodieWriterUtils.parametersWithWriteDefaults(translatedOptsWithMappedTableConfig.toMap)
if (!mergedParams.contains(HoodieTableConfig.TYPE.key()) && mergedParams.contains(TABLE_TYPE.key())) {
mergedParams.put(HoodieTableConfig.TYPE.key(), mergedParams(TABLE_TYPE.key()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: the immediately following block uses the idiomatic Scala mergedParams(key) = value assignment syntax — could you use that here too (mergedParams(HoodieTableConfig.TYPE.key()) = mergedParams(TABLE_TYPE.key())) to stay consistent?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@github-actions github-actions Bot added the size:S PR with lines of changes in (10, 100] label May 29, 2026
@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@fhan688 fhan688 closed this May 29, 2026
@fhan688 fhan688 reopened this May 29, 2026
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.82%. Comparing base (588998f) to head (bf9c56f).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18878      +/-   ##
============================================
+ Coverage     68.78%   68.82%   +0.03%     
- Complexity    29138    29149      +11     
============================================
  Files          2515     2515              
  Lines        139973   139940      -33     
  Branches      17193    17188       -5     
============================================
+ Hits          96286    96313      +27     
+ Misses        35906    35851      -55     
+ Partials       7781     7776       -5     
Flag Coverage Δ
common-and-other-modules 44.33% <0.00%> (+<0.01%) ⬆️
hadoop-mr-java-client 44.91% <ø> (-0.02%) ⬇️
spark-client-hadoop-common 48.22% <ø> (-0.01%) ⬇️
spark-java-tests 49.39% <50.00%> (+0.03%) ⬆️
spark-scala-tests 45.29% <50.00%> (+0.02%) ⬆️
utilities 37.45% <50.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala 78.53% <50.00%> (+0.07%) ⬆️

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants