Optimize RetentionManager performance by replacing List with Set for segment lookups #16824
Merged
xiangfu0 merged 1 commit on Sep 16, 2025
xiangfu0 approved these changes on Sep 16, 2025
Pull Request Overview
This PR optimizes RetentionManager performance by replacing List with Set for segment lookups to eliminate O(n) performance bottlenecks when processing large numbers of segments.
Key changes:
- Replaced List with Set for segment collections to convert O(n) contains() operations to O(1) HashSet lookups
- Added performance test to validate handling of 400,000 segments within 30 seconds
- Made findUntrackedSegmentsToDeleteFromDeepstore method package-private with @VisibleForTesting annotation
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| RetentionManager.java | Updated segment collection types from List to Set and modified method signature for performance optimization |
| RetentionManagerTest.java | Added comprehensive performance test with mock filesystem to validate optimization with 400,000 segments |
Codecov Report
❌ Patch coverage is
Coverage diff, master vs. #16824:
| Metric | master | #16824 | +/- |
|---|---|---|---|
| Coverage | 63.33% | 63.33% | |
| Complexity | 1399 | 1399 | |
| Files | 3057 | 3057 | |
| Lines | 179191 | 179221 | +30 |
| Branches | 27456 | 27463 | +7 |
| Hits | 113489 | 113518 | +29 |
| Misses | 56944 | 56933 | -11 |
| Partials | 8758 | 8770 | +12 |
xiangfu0 pushed a commit to xiangfu0/pinot that referenced this pull request on Nov 13, 2025
Problem
RetentionManager ran slowly when processing large numbers of segments because each segment exclusion check called List.contains(), which is O(n) per call.
Solution
Replaced List with Set for segment collections, converting O(n) lookups to expected O(1) HashSet lookups. This eliminates the performance bottleneck in findUntrackedSegmentsToDeleteFromDeepstore().
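A minimal sketch of the kind of lookup change described above, not the actual RetentionManager code. The class name `UntrackedSegmentSketch`, the method `findUntracked`, and its parameters are hypothetical and exist only for illustration; the assumption is that segments found in the deep store are filtered against a collection of segments that must be excluded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names do not come from the Pinot code base.
class UntrackedSegmentSketch {

  // Returns the deep-store segments that are not in the exclusion set.
  // Because trackedSegments is a Set, each contains() call is an expected
  // O(1) hash lookup rather than an O(n) scan over a List.
  static List<String> findUntracked(List<String> deepStoreSegments, Set<String> trackedSegments) {
    List<String> untracked = new ArrayList<>();
    for (String segment : deepStoreSegments) {
      if (!trackedSegments.contains(segment)) {
        untracked.add(segment);
      }
    }
    return untracked;
  }

  public static void main(String[] args) {
    List<String> deepStore = Arrays.asList("seg_0", "seg_1", "seg_2");
    Set<String> tracked = new HashSet<>(Arrays.asList("seg_1"));
    // seg_1 is tracked, so only seg_0 and seg_2 are reported as untracked.
    System.out.println(findUntracked(deepStore, tracked));
  }
}
```

With a List as the exclusion collection, the same loop does a linear scan per segment, so total work grows with the product of the two collection sizes. With a HashSet it grows roughly linearly with the number of deep-store segments.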
Impact
Dramatically improves performance for large-scale deployments. The added performance test validates that 400,000 segments are handled within 30 seconds, guarding against future regressions.
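As a rough sketch of how such a time-budget check could look in plain Java, the example below times a Set-based exclusion scan over 400,000 synthetic segment names and fails if it exceeds 30 seconds. It is not the test added in this PR: the class `RetentionLookupBudgetSketch`, its methods, and the `segment_<i>` naming scheme are assumptions, and the real test uses a mock filesystem that is not reproduced here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the test from RetentionManagerTest.java.
class RetentionLookupBudgetSketch {
  static final int SEGMENT_COUNT = 400_000;
  static final long BUDGET_MILLIS = 30_000L;

  // Counts segments that are absent from the exclusion set.
  // Stand-in for the real retention scan.
  static int countUntracked(int segmentCount, Set<String> excluded) {
    int untracked = 0;
    for (int i = 0; i < segmentCount; i++) {
      if (!excluded.contains("segment_" + i)) {
        untracked++;
      }
    }
    return untracked;
  }

  // Builds an exclusion set holding every even-numbered segment, runs the
  // scan, and returns the elapsed wall-clock time in milliseconds.
  static long elapsedMillis(int segmentCount) {
    Set<String> excluded = new HashSet<>();
    for (int i = 0; i < segmentCount; i += 2) {
      excluded.add("segment_" + i);
    }
    long start = System.nanoTime();
    int untracked = countUntracked(segmentCount, excluded);
    long elapsed = (System.nanoTime() - start) / 1_000_000L;
    // Sanity check: exactly the odd-numbered segments should be untracked.
    if (untracked != segmentCount / 2) {
      throw new IllegalStateException("unexpected untracked count: " + untracked);
    }
    return elapsed;
  }

  public static void main(String[] args) {
    long elapsed = elapsedMillis(SEGMENT_COUNT);
    if (elapsed > BUDGET_MILLIS) {
      throw new AssertionError("scan exceeded budget: " + elapsed + " ms");
    }
    System.out.println("scan finished within budget");
  }
}
```

A wall-clock budget like this is a coarse regression guard: it is generous enough to pass on slow CI machines while still catching a return to quadratic behavior.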
Sample runs using List and Set