fix: clear Hive work map after combine split failures #18719

Merged
danny0405 merged 3 commits into apache:master from officialasishkumar:fix-hive-combine-cleanup on May 15, 2026
@officialasishkumar
Contributor

Describe the issue this Pull Request addresses

Closes #17985.

Summary and Changelog

HoodieCombineHiveInputFormat.getSplits now clears Hive's work map in a finally block so ThreadLocal work state is cleaned up even when split generation fails.

Added a regression test that forces split classification to fail and verifies Utilities.clearWorkMapForConf is still invoked.
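
As a rough illustration of the fix and its regression check, the sketch below uses plain Java with no Hive or Hudi dependencies. `SketchInputFormat`, `classifySplits`, and `clearWorkMap` are hypothetical stand-ins (the last one plays the role of Utilities.clearWorkMapForConf); the actual test in this PR may be structured differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CleanupOnFailureCheck {

  // Hypothetical input format with an overridable cleanup hook.
  static class SketchInputFormat {
    String[] getSplits() {
      try {
        return classifySplits();
      } finally {
        // Runs on both the success path and the failure path.
        clearWorkMap();
      }
    }

    // Stand-in for split classification; the default succeeds.
    String[] classifySplits() {
      return new String[] {"split-0"};
    }

    // Stand-in for Utilities.clearWorkMapForConf (assumption, not the Hive API).
    void clearWorkMap() {
    }
  }

  // Forces classification to fail and returns how many times cleanup ran.
  static int cleanupCallsOnFailure() {
    AtomicInteger calls = new AtomicInteger();
    SketchInputFormat format = new SketchInputFormat() {
      @Override
      String[] classifySplits() {
        throw new IllegalStateException("forced classification failure");
      }

      @Override
      void clearWorkMap() {
        calls.incrementAndGet();
      }
    };
    try {
      format.getSplits();
      throw new AssertionError("getSplits should have failed");
    } catch (IllegalStateException expected) {
      // The forced failure propagates to the caller after cleanup has run.
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("cleanup calls after failure: " + cleanupCallsOnFailure());
  }
}
```

In this sketch `cleanupCallsOnFailure()` returns 1, because the finally block runs exactly once even though classification throws.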

Validation:

mvn -pl hudi-hadoop-mr -am -Dtest=TestHoodieCombineHiveInputFormat#clearWorkMapForConfOnGetSplitsFailure -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false -DskipITs=true -DskipFTs=true test

Impact

Prevents stale Hive work-map state from leaking across reused threads after exceptional split generation. There is no public API change.
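
To make the leak concrete, here is a minimal sketch, assuming a single-threaded pool and a plain ThreadLocal<String> in place of Hive's work map. All names are hypothetical; it only shows how state set by a failed task stays visible to the next task on the same thread unless it is removed in a finally block.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadReuseLeakSketch {

  // Hypothetical stand-in for the per-thread Hive work state.
  private static final ThreadLocal<String> WORK = new ThreadLocal<>();

  // Runs a failing task and then a probe task on the same pooled thread.
  // Returns the value the probe task observes in the ThreadLocal.
  static String observeAfterFailure(boolean cleanupInFinally) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Runnable failing = () -> {
        try {
          WORK.set("stale-plan");
          throw new IllegalStateException("simulated split generation failure");
        } finally {
          if (cleanupInFinally) {
            WORK.remove();
          }
        }
      };
      try {
        pool.submit(failing).get();
      } catch (ExecutionException expected) {
        // The simulated failure surfaces here; the worker thread stays alive.
      }
      Callable<String> probe = WORK::get;
      return pool.submit(probe).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("without cleanup: " + observeAfterFailure(false));
    System.out.println("with cleanup:    " + observeAfterFailure(true));
  }
}
```

Without the finally-block cleanup the second task sees the value left by the failed one; with it, the second task sees null.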

Risk Level

Low. The change preserves the existing cleanup on the success path and extends it to failure paths.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions (bot) added the size:M label (PR with lines of changes in (100, 300]) on May 11, 2026
Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR wraps HoodieCombineHiveInputFormat.getSplits in a try/finally so Utilities.clearWorkMapForConf runs even when split generation fails, preventing stale ThreadLocal work-map state from leaking across reused threads. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One comment about the finally-block comment still saying "after splits are generated" when the fix is specifically about the failure case.

cc @yihua

} finally {
  // Clear work from ThreadLocal after splits are generated in case the thread is reused in a pool.
  Utilities.clearWorkMapForConf(job);
}
🤖 nit: the comment says "after splits are generated" but since this is now in a finally block, it runs even when generation fails, which is the whole point of the fix. Could you update it to something like "// Clear work from ThreadLocal after getSplits completes (successfully or not), in case the thread is reused in a pool."?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

HoodieCombineHiveInputFormat cleared Hive's work map only after successful split generation, leaving ThreadLocal work state behind when split classification or generation failed.

Move the cleanup into a finally block and add a regression test that forces getSplits to fail before verifying Utilities.clearWorkMapForConf is still invoked.
@officialasishkumar force-pushed the fix-hive-combine-cleanup branch from 34717d0 to e9438ee on May 14, 2026 04:04
Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! The PR wraps getSplits in a try/finally so the Hive work-map ThreadLocal is cleared even when split generation throws. The change is narrowly scoped and preserves the prior success-path cleanup behavior. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One minor naming inconsistency in the new test method; the rest of the change is clean.

cc @yihua


@Test
public void clearWorkMapForConfOnGetSplitsFailure() throws Exception {
  StorageConfiguration<Configuration> conf = HoodieTestUtils.getDefaultStorageConf();
🤖 nit: could you prefix this with test to match the convention used by the other methods in this class (testInternalSchemaCacheForMR, testHoodieRealtimeCombineHoodieInputFormat, etc.)? Something like testClearWorkMapForConfOnGetSplitsFailure.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@officialasishkumar
Contributor Author

Updated in e9438ee: clarified the ThreadLocal cleanup comment so it describes every getSplits attempt, including failure paths.

Validation passed: mvn -pl hudi-hadoop-mr -am -DskipTests -DskipITs -Drat.skip=true -Dcheckstyle.skip=true compile.

@officialasishkumar
Contributor Author

Follow-up nit addressed in 28ecd48: renamed the new test method to testClearWorkMapForConfOnGetSplitsFailure to match the class convention.

Validation passed: mvn -pl hudi-hadoop-mr -am -Dtest=TestHoodieCombineHiveInputFormat#testClearWorkMapForConfOnGetSplitsFailure -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false -DskipITs test.

Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR wraps getSplits in a try/finally so Utilities.clearWorkMapForConf is invoked on both success and failure paths, preventing stale ThreadLocal work state from leaking across reused threads. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

@codecov-commenter

Codecov Report

❌ Patch coverage is 73.33333% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.15%. Comparing base (f2fdca2) to head (28ecd48).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java | 73.33% | 8 Missing and 8 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18719   +/-   ##
=========================================
  Coverage     68.14%   68.15%           
+ Complexity    29051    29036   -15     
=========================================
  Files          2516     2516           
  Lines        140935   140935           
  Branches      17472    17475    +3     
=========================================
+ Hits          96047    96049    +2     
+ Misses        36993    36990    -3     
- Partials       7895     7896    +1     
| Flag | Coverage Δ |
|---|---|
| common-and-other-modules | 44.41% <0.00%> (+<0.01%) ⬆️ |
| hadoop-mr-java-client | 45.00% <73.33%> (+0.02%) ⬆️ |
| spark-client-hadoop-common | 48.35% <0.00%> (+<0.01%) ⬆️ |
| spark-java-tests | 48.98% <0.00%> (-0.03%) ⬇️ |
| spark-scala-tests | 44.90% <0.00%> (+<0.01%) ⬆️ |
| utilities | 37.64% <0.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java | 51.82% <73.33%> (+0.91%) ⬆️ |

... and 10 files with indirect coverage changes

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 merged commit 071b3f1 into apache:master on May 15, 2026
63 checks passed
yihua pushed a commit that referenced this pull request May 20, 2026
* fix: clear Hive work map after combine split failures

dwshmilyss pushed a commit to dwshmilyss/hudi that referenced this pull request May 21, 2026
* fix: clear Hive work map after combine split failures

Labels

size:M PR with lines of changes in (100, 300]

Development

Successfully merging this pull request may close these issues.

[Bug] ThreadLocal Leak in HoodieCombineHiveInputFormat due to unsafe cleanup

5 participants