Skip to content

Fix AWS SDK credential provider thread leak in S3PinotFS#17869

Merged
KKcorps merged 1 commit intoapache:masterfrom
suvodeep-pyne:spyne/s3-thread-leak
Mar 13, 2026
Merged

Fix AWS SDK credential provider thread leak in S3PinotFS#17869
KKcorps merged 1 commit intoapache:masterfrom
suvodeep-pyne:spyne/s3-thread-leak

Conversation

@suvodeep-pyne
Copy link
Contributor

@suvodeep-pyne suvodeep-pyne commented Mar 12, 2026

Summary

  • Fix thread leak caused by StsClient and StsAssumeRoleCredentialsProvider being created as local variables in initOrRefreshS3Client() — never stored as fields, never closed on refresh, and not closed in close(). Each leaked provider spawns sdk-cache-* background threads that accumulate indefinitely.
  • Observed in production: 91 sdk-cache threads (1,338 total) on a controller over 4.6 days.
  • Store STS resources as instance fields and close old ones after building the new client (avoids a window where concurrent S3 operations could hit a closed credential provider).
  • Synchronize initOrRefreshS3Client() and close() on _clientLock to prevent races between credential refresh and shutdown.

Test plan

  • mvn compile passes
  • Existing unit tests pass (tests use init(S3Client) path — no IAM role — so they validate no regression on the non-IAM path)

StsClient and StsAssumeRoleCredentialsProvider were created as local
variables in initOrRefreshS3Client(), never stored as fields, and never
closed on refresh or in close(). Each leaked provider spawned sdk-cache-*
background threads that accumulated indefinitely (observed: 91 threads
over 4.6 days on a production controller).

- Store StsClient and StsAssumeRoleCredentialsProvider as instance fields
- Close old credential resources after building new client (not before,
  to avoid a window where concurrent S3 operations hit a closed provider)
- Synchronize initOrRefreshS3Client() and close() on _clientLock to
  prevent races between credential refresh and shutdown
- Add closeQuietly() helper for safe resource cleanup
@codecov-commenter
Copy link

codecov-commenter commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 19.11765% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.27%. Comparing base (111cbf7) to head (ff3b681).
⚠️ Report is 14 commits behind head on master.

Files with missing lines Patch % Lines
.../org/apache/pinot/plugin/filesystem/S3PinotFS.java 19.11% 55 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17869      +/-   ##
============================================
+ Coverage     63.25%   63.27%   +0.02%     
- Complexity     1460     1468       +8     
============================================
  Files          3190     3190              
  Lines        192022   192127     +105     
  Branches      29413    29433      +20     
============================================
+ Hits         121455   121564     +109     
+ Misses        61056    61052       -4     
  Partials       9511     9511              
Flag Coverage Δ
custom-integration1 100.00% <ø> (?)
integration 100.00% <ø> (+100.00%) ⬆️
integration1 100.00% <ø> (?)
integration2 0.00% <ø> (ø)
java-11 63.22% <19.11%> (+0.01%) ⬆️
java-21 63.24% <19.11%> (+0.01%) ⬆️
temurin 63.27% <19.11%> (+0.02%) ⬆️
unittests 63.26% <19.11%> (+0.01%) ⬆️
unittests1 55.58% <ø> (+<0.01%) ⬆️
unittests2 34.27% <19.11%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

setMultiPartUploadConfigs(_s3Config);
} catch (S3Exception e) {
throw new RuntimeException("Could not initialize S3PinotFS", e);
// Close old resources after new client is live (order: provider → STS client → S3 client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@KKcorps KKcorps merged commit cb6ea77 into apache:master Mar 13, 2026
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants