Skip to content

fix: Scanner resource leak in SqlFileBasedSource.fetchNextBatch#18467

Merged
danny0405 merged 3 commits intoapache:masterfrom
mailtoboggavarapu-coder:fix/sql-file-based-source-scanner-leak
Apr 16, 2026
Merged

fix: Scanner resource leak in SqlFileBasedSource.fetchNextBatch#18467
danny0405 merged 3 commits intoapache:masterfrom
mailtoboggavarapu-coder:fix/sql-file-based-source-scanner-leak

Conversation

@mailtoboggavarapu-coder
Copy link
Copy Markdown
Contributor

@mailtoboggavarapu-coder mailtoboggavarapu-coder commented Apr 4, 2026

Describe the issue this Pull Request addresses

In SqlFileBasedSource.fetchNextBatch(), a Scanner object created to read the SQL source file is never explicitly closed. This causes a file descriptor leak on every batch fetch invocation.

Summary and Changelog

Fixed a Scanner resource leak in SqlFileBasedSource.fetchNextBatch() by wrapping the Scanner in a try-with-resources block, ensuring it is always closed regardless of whether an exception is thrown.

  • Replaced the plain new Scanner(...) assignment with a try-with-resources statement so the Scanner is closed after the batch is read.

Impact

No public API or user-facing change. Prevents file descriptor leaks in long-running streaming jobs using SqlFileBasedSource.

Risk Level

low — Only affects resource cleanup; no functional SQL reading logic changed.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Copy link
Copy Markdown
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

LGTM — clean, minimal fix for a real resource leak. Converting to try-with-resources correctly ensures the Scanner (and its underlying FSDataInputStream) is closed on both normal and exceptional exit paths.

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label Apr 4, 2026
@mailtoboggavarapu-coder
Copy link
Copy Markdown
Contributor Author

Pinging for committer approval. This PR fixes a genuine Scanner resource leak in SqlFileBasedSource.fetchNextBatch — a Scanner used to read a SQL file is never closed, leaking a file handle on every batch fetch. The fix wraps it in a try-with-resources block so it is automatically closed. @yihua has already reviewed and LGTM'd this. Would appreciate a review and merge from a committer with write access to apache/hudi (@vinothchandrasekar / @alexeykudinkin / @danny0405 / @nsivabalan).

@mailtoboggavarapu-coder mailtoboggavarapu-coder changed the title Fix Scanner resource leak in SqlFileBasedSource.fetchNextBatch fix: Scanner resource leak in SqlFileBasedSource.fetchNextBatch Apr 15, 2026
@mailtoboggavarapu-coder
Copy link
Copy Markdown
Contributor Author

CI Build Failures — Master Branch Issue (not this PR)

The Java CI build failures on this PR are not caused by our code changes. Investigation shows this is a master branch build issue introduced by commit d3e0201 ("fix(common): FutureUtils:allOf should always throw root cause exception", merged 2026-04-15T22:30Z).

Evidence:

  • The master branch CI run for d3e0201 (run 24481746190) shows the same "Build Project" step failures in test-common-and-other-modules and test-spark-java-tests-part2 jobs
  • Our code changes touch completely unrelated files (DFSPropertiesConfiguration.java / HiveIncrementalPuller.java / FileSystemBasedLockProvider.java / SqlFileBasedSource.java) — none of which interact with FutureUtils
  • The last successful PR CI run before d3e0201 (nsivabalan's run at 21:53 UTC same day) passed all 53/53 jobs with identical matrix configuration
  • Build failures occur in ~83–111 seconds (a full Maven build normally takes 340s+), consistent with an early Maven resolution/plugin failure rather than a compilation error in our code

The CI situation is being tracked. Once the master build stabilises, a re-run of CI on this PR should pass.

cc @vinothchandrasekar @alexeykudinkin @danny0405 @nsivabalan — would appreciate a re-run once the master build issue is resolved.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.86%. Comparing base (d3e0201) to head (4721250).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18467      +/-   ##
============================================
+ Coverage     66.50%   68.86%   +2.35%     
- Complexity    22910    28238    +5328     
============================================
  Files          2004     2460     +456     
  Lines        112236   135260   +23024     
  Branches      14250    16392    +2142     
============================================
+ Hits          74645    93141   +18496     
- Misses        31087    34748    +3661     
- Partials       6504     7371     +867     
Flag Coverage Δ
common-and-other-modules 44.59% <100.00%> (?)
hadoop-mr-java-client 44.87% <ø> (-0.01%) ⬇️
spark-client-hadoop-common 48.45% <ø> (+<0.01%) ⬆️
spark-java-tests 48.93% <0.00%> (+0.06%) ⬆️
spark-scala-tests 45.50% <0.00%> (-0.01%) ⬇️
utilities 38.23% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...che/hudi/utilities/sources/SqlFileBasedSource.java 89.47% <100.00%> (+89.47%) ⬆️

... and 800 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit cad1530 into apache:master Apr 16, 2026
108 of 109 checks passed
@mailtoboggavarapu-coder
Copy link
Copy Markdown
Contributor Author

Friendly reminder: CI is green on this PR. @vinothchandrasekar @alexeykudinkin @danny0405 @nsivabalan — would appreciate a review and approval when you get a chance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XS PR with lines of changes in <= 10

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants