Please Do Not Review yet
This is a work in progress. I am creating the PR so it can go through testing before I ask for review.
When brokers do pruning and selection, they pass segment name String objects around as handles to represent segments. This is fine when a table has a few thousand segments.
But when we have 30k+ segments, every pass over the segment names handles 30k+ String objects and does per-segment lookups keyed by name. The CPU cost is high even though each loop is O(n).
In a small performance test we wrote, PartitionSegmentPruner needs roughly 2-5 ms to iterate over all segments and compare them.
Our changes here are to:
Create a SegmentBrokerView class (the name SegmentMetadata is already taken) to represent each segment in the broker; instances of this class are passed around in the broker selector/pruner pipeline
Store all the information the pruners need in SegmentBrokerView, so no lookup operations are needed during pruning; this saves the extra String handling when looping through all segments
Make RoutingManager responsible for maintaining the list of all SegmentBrokerViews and for filling in their information
Refactor PartitionSegmentPruner so it computes the "ValidPartitions" first and then does a quick lookup per segment
The integration test shows an improvement from 150-200 prunings/second to 1200+ prunings/second, which is roughly a 6-8x improvement.
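The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the fields of SegmentBrokerView, the class name PartitionPrunerSketch, the method names, and the hash-modulo partition function are all assumptions made for the example. It shows the two ideas together: the per-segment information travels with the view object, and the set of valid partitions is computed once before the loop over segments.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view object: carries what a pruner needs, so no
// name-keyed lookup is required inside the pruning loop.
final class SegmentBrokerView {
  private final String segmentName;
  private final int partitionId; // assumption: one partition id per segment

  SegmentBrokerView(String segmentName, int partitionId) {
    this.segmentName = segmentName;
    this.partitionId = partitionId;
  }

  String getSegmentName() {
    return segmentName;
  }

  int getPartitionId() {
    return partitionId;
  }
}

final class PartitionPrunerSketch {
  private PartitionPrunerSketch() {
  }

  // Step 1: compute the valid partitions once per query.
  // The hash-modulo partition function is an assumption for this sketch.
  static Set<Integer> computeValidPartitions(List<String> filterValues, int numPartitions) {
    Set<Integer> validPartitions = new HashSet<>();
    for (String value : filterValues) {
      validPartitions.add(Math.floorMod(value.hashCode(), numPartitions));
    }
    return validPartitions;
  }

  // Step 2: a single pass over the views with a set-membership check per segment.
  static List<SegmentBrokerView> prune(List<SegmentBrokerView> segments, Set<Integer> validPartitions) {
    List<SegmentBrokerView> selected = new ArrayList<>();
    for (SegmentBrokerView segment : segments) {
      if (validPartitions.contains(segment.getPartitionId())) {
        selected.add(segment);
      }
    }
    return selected;
  }
}
```

In this sketch the list of views would be built and kept up to date by whatever component owns routing state (RoutingManager in this PR), so the pruner only reads fields that are already populated.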
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
Yes (Please label as backward-incompat, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
Yes (Please label this as backward-incompat, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes? Things to consider:
New configuration options
Deprecation of configurations
Signature changes to public methods/interfaces
New plugins added or old plugins removed
Yes (Please label this PR as release-notes and complete the section on Release Notes)
Merging #7377 (aff0d5b) into master (8fab3be) will decrease coverage by 2.16%.
The diff coverage is 21.05%.
❗ Current head aff0d5b differs from pull request most recent head f9151bd. Consider uploading reports for the commit f9151bd to get more accurate results
Thanks for taking this up; this would be really useful for large tables. Just for your reference, here's a link to the PEP guidelines. You may want to start by creating an issue and including your proposal/idea there with details.