fix(spark): fix mor bulk insert commit type error #18878
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! The fix propagates the datasource option hoodie.datasource.write.table.type to the table-config key hoodie.table.type when only the former is set, which addresses the MOR row-writer bulk_insert producing commit instead of deltacommit. The change preserves precedence for explicitly set values and is symmetric with the existing reverse propagation in fetchMissingWriteConfigsFromTableConfig. No correctness issues found. There is one small style inconsistency in the production change, noted in the inline comments; the test is clean and well-commented. Please take a look, and this should then be ready for a Hudi committer or PMC member to take it from here.
cc @yihua
    }
    val mergedParams = mutable.Map.empty ++ HoodieWriterUtils.parametersWithWriteDefaults(translatedOptsWithMappedTableConfig.toMap)
    if (!mergedParams.contains(HoodieTableConfig.TYPE.key()) && mergedParams.contains(TABLE_TYPE.key())) {
      mergedParams.put(HoodieTableConfig.TYPE.key(), mergedParams(TABLE_TYPE.key()))
🤖 nit: the immediately following block uses the idiomatic Scala mergedParams(key) = value assignment syntax — could you use that here too (mergedParams(HoodieTableConfig.TYPE.key()) = mergedParams(TABLE_TYPE.key())) to stay consistent?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
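To make the nit concrete, here is a minimal sketch of the two spellings, assuming a plain scala.collection.mutable.Map with literal string keys standing in for TABLE_TYPE.key() and HoodieTableConfig.TYPE.key(). The object and method names are hypothetical and not part of the PR. In Scala, m(k) = v is sugar for m.update(k, v) and stores the same entry as m.put(k, v); put additionally returns the previous value as an Option.

```scala
import scala.collection.mutable

object MapAssignmentSketch {
  // Assumption: literal keys stand in for TABLE_TYPE.key() and HoodieTableConfig.TYPE.key().
  val datasourceKey = "hoodie.datasource.write.table.type"
  val tableConfigKey = "hoodie.table.type"

  // Spelling used in the diff: put returns the previous value (Option[String]), ignored here.
  def viaPut(): mutable.Map[String, String] = {
    val m = mutable.Map(datasourceKey -> "MERGE_ON_READ")
    m.put(tableConfigKey, m(datasourceKey))
    m
  }

  // Spelling suggested in the review: assignment syntax, which desugars to m.update(key, value).
  def viaAssignment(): mutable.Map[String, String] = {
    val m = mutable.Map(datasourceKey -> "MERGE_ON_READ")
    m(tableConfigKey) = m(datasourceKey)
    m
  }
}
```

Both methods leave the map in the same state, so the choice between them is purely stylistic.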
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #18878     +/-   ##
============================================
+ Coverage     68.78%    68.82%   +0.03%
- Complexity    29138     29149      +11
============================================
  Files          2515      2515
  Lines        139973    139940      -33
  Branches      17193     17188       -5
============================================
+ Hits          96286     96313      +27
+ Misses        35906     35851      -55
+ Partials       7781      7776       -5
Describe the issue this Pull Request addresses
Spark datasource writes can pass the MOR table type through hoodie.datasource.write.table.type, while the row-writer bulk_insert path later reads hoodie.table.type from HoodieWriteConfig#getTableType. When the table config key is absent, the write config can fall back to COW and select commit instead of deltacommit for MOR row-writer bulk_insert.
Summary and Changelog
Populate hoodie.table.type from hoodie.datasource.write.table.type in mergeParamsAndGetHoodieConfig only when hoodie.table.type is not already present. This preserves explicit table-config precedence and adds a regression test for MOR row-writer bulk_insert verifying that the completed write instant is deltacommit. No code was copied.
Impact
Fixes Spark datasource MOR row-writer bulk_insert behavior. No public API change, no storage format change, no new config, and no expected performance impact.
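The normalization described in this PR can be sketched in isolation. This is a simplified illustration, not the Hudi implementation: it uses plain string maps instead of Hudi config objects, fillTableType and commitAction are hypothetical helper names, and the values MERGE_ON_READ and COPY_ON_WRITE are assumed to be the table type strings involved.

```scala
import scala.collection.mutable

object TableTypeFillSketch {
  val DatasourceTableTypeKey = "hoodie.datasource.write.table.type"
  val TableConfigTypeKey = "hoodie.table.type"

  // Fill the table-config key from the datasource option only when it is absent,
  // so an explicitly set hoodie.table.type always keeps precedence.
  def fillTableType(params: Map[String, String]): Map[String, String] = {
    val merged = mutable.Map(params.toSeq: _*)
    if (!merged.contains(TableConfigTypeKey) && merged.contains(DatasourceTableTypeKey)) {
      merged(TableConfigTypeKey) = merged(DatasourceTableTypeKey)
    }
    merged.toMap
  }

  // Hypothetical stand-in for the action selection: when the table-config key is
  // missing, fall back to COW, which yields "commit" rather than "deltacommit".
  def commitAction(params: Map[String, String]): String =
    params.getOrElse(TableConfigTypeKey, "COPY_ON_WRITE") match {
      case "MERGE_ON_READ" => "deltacommit"
      case _               => "commit"
    }
}
```

With only the datasource option set, commitAction on the raw parameters returns commit in this sketch, while commitAction on the output of fillTableType returns deltacommit, mirroring the behavior change the PR describes.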
Risk Level
low
The change is limited to Spark writer parameter normalization and only fills a missing table config key from the existing datasource table type option. Verified with:
mvn -Pspark3.5 -pl hudi-spark-datasource/hudi-spark -am -Dtest=org.apache.hudi.TestHoodieSparkSqlWriter#testMorRowWriterBulkInsertUsesDeltaCommitAction -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false -Dcheckstyle.skip=true -DskipUTs=true test
Documentation Update
none
Contributor's checklist