Skip to content

Optimize RetentionManager performance by replacing List with Set for segment lookups#16824

Merged
xiangfu0 merged 1 commit into
apache:masterfrom
9aman:faster_lookups_for_existing_segments
Sep 16, 2025
Merged

Optimize RetentionManager performance by replacing List with Set for segment lookups#16824
xiangfu0 merged 1 commit into
apache:masterfrom
9aman:faster_lookups_for_existing_segments

Conversation

@9aman
Copy link
Copy Markdown
Contributor

@9aman 9aman commented Sep 16, 2025

Problem

RetentionManager was experiencing high runtime when processing large numbers of segments due to O(n) List.contains() operations during segment exclusion checks.

03:18:05.742 INFO [RetentionManager] [pool-16-thread-12] Found: 394300 segments in deepstore for table: <tableName>. Time taken to list segments: 37021 ms
03:49:30.354 INFO [RetentionManager] [pool-16-thread-12] Took: 1921633 ms to identify 56 segments for deletion from deep store for table: <tableName> as they have no corresponding entry in the property store.

Solution

Replaced List with Set for segment collections, converting O(n) lookups to O(1) HashSet operations. This eliminates the performance bottleneck in findUntrackedSegmentsToDeleteFromDeepstore().

Impact

Dramatically improves performance for large-scale deployments. Added performance test validates handling 400,000 segments within 30 seconds, preventing future regressions.

Sample runs using List and Set

image image

Copy link
Copy Markdown
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@xiangfu0 xiangfu0 added enhancement Improvement to existing functionality performance Related to performance optimization labels Sep 16, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes RetentionManager performance by replacing List with Set for segment lookups to eliminate O(n) performance bottlenecks when processing large numbers of segments.

Key changes:

  • Replaced List with Set for segment collections to convert O(n) contains() operations to O(1) HashSet lookups
  • Added performance test to validate handling of 400,000 segments within 30 seconds
  • Made findUntrackedSegmentsToDeleteFromDeepstore method package-private with @VisibleForTesting annotation

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
RetentionManager.java Updated segment collection types from List to Set and modified method signature for performance optimization
RetentionManagerTest.java Added comprehensive performance test with mock filesystem to validate optimization with 400,000 segments

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 16, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.33%. Comparing base (05c685b) to head (13b79ac).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...troller/helix/core/retention/RetentionManager.java 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #16824   +/-   ##
=========================================
  Coverage     63.33%   63.33%           
  Complexity     1399     1399           
=========================================
  Files          3057     3057           
  Lines        179191   179221   +30     
  Branches      27456    27463    +7     
=========================================
+ Hits         113489   113518   +29     
+ Misses        56944    56933   -11     
- Partials       8758     8770   +12     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.31% <33.33%> (-0.01%) ⬇️
java-21 63.31% <33.33%> (+<0.01%) ⬆️
temurin 63.33% <33.33%> (+<0.01%) ⬆️
unittests 63.33% <33.33%> (+<0.01%) ⬆️
unittests1 56.32% <ø> (+0.01%) ⬆️
unittests2 33.39% <33.33%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 merged commit 700cf05 into apache:master Sep 16, 2025
18 checks passed
xiangfu0 pushed a commit to xiangfu0/pinot that referenced this pull request Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality performance Related to performance optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants