Skip to content

[WIP] Refactor broker for PartitionSegmentPruner 10x perf if we have 30k+ segments#7377

Closed
dongxiaoman wants to merge 13 commits intoapache:masterfrom
dongxiaoman:xd-broker-refactor
Closed

[WIP] Refactor broker for PartitionSegmentPruner 10x perf if we have 30k+ segments#7377
dongxiaoman wants to merge 13 commits intoapache:masterfrom
dongxiaoman:xd-broker-refactor

Conversation

@dongxiaoman
Copy link
Contributor

@dongxiaoman dongxiaoman commented Aug 29, 2021

Description

Please Do Not Review yet
This is a Work In Progress. Wish to create the PR to go through the testing before asking for Review

When brokers are doing pruning and selection, it passes SegmentName String objects around as a handle to represent Segments. This is OK when we have roughly thousands of segments.
But when we have 30k+ segments, each loop through the Segment name will create 30k+ Strings because String objects are Copy On Assignment. The CPU cost is actually high even if we have O(n) loop.
In our written small performance test, PartitionSegmentPruner needs roughly 2~5 ms to finish iterating the whole Segment comparison.

Our changes here is to

  1. Create a SegmentBrokerView class (because SegmentMetadata is taken already) to represent all the Segments in Broker, and this class will be passed around in the Broker Selector/Pruner pipeline
  2. The SegmentBrokerView class will have all the needed information for Pruners stored in it, so there is no need to do lookup operations when doing pruning; it saves extra String copying when looping through all segments
  3. The RoutingManager will be in charge of maintaining the list of all SegmentBrokerViews, and filling all information into that class
  4. Refactor PartitionSegmentPruner so it will compute "ValidPartitions" first and then lookup quickly.

The Integration test shows the improvement from 150~200 pruning/second to 1200+ pruning/second. That is roughly 6-8x improvement.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Aug 29, 2021

Codecov Report

Merging #7377 (aff0d5b) into master (8fab3be) will decrease coverage by 2.16%.
The diff coverage is 21.05%.

❗ Current head aff0d5b differs from pull request most recent head f9151bd. Consider uploading reports for the commit f9151bd to get more accurate results
Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7377      +/-   ##
============================================
- Coverage     71.81%   69.65%   -2.17%     
+ Complexity     3349     3256      -93     
============================================
  Files          1517     1124     -393     
  Lines         75007    52954   -22053     
  Branches      10914     7971    -2943     
============================================
- Hits          53867    36884   -16983     
+ Misses        17513    13452    -4061     
+ Partials       3627     2618    -1009     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 69.65% <21.05%> (-0.05%) ⬇️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ment/spi/partition/ByteArrayPartitionFunction.java 37.50% <0.00%> (-37.50%) ⬇️
...gment/spi/partition/HashCodePartitionFunction.java 40.00% <0.00%> (-45.72%) ⬇️
...segment/spi/partition/ModuloPartitionFunction.java 45.45% <0.00%> (-25.98%) ⬇️
...segment/spi/partition/MurmurPartitionFunction.java 80.00% <33.33%> (-13.55%) ⬇️
...t/segment/spi/partition/PartitionFunctionType.java 100.00% <100.00%> (ø)
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...pache/pinot/common/utils/grpc/GrpcQueryClient.java 0.00% <0.00%> (-100.00%) ⬇️
... and 578 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8fab3be...f9151bd. Read the comment docs.

@mayankshriv
Copy link
Contributor

mayankshriv commented Aug 30, 2021

Thanks for taking this up, this would be really useful for large tables. Just for your reference, here's a link to the PEP guidelines, you may want to start by creating an issue and include your proposal/idea there with details.

@ashishkf
Copy link

Any updates on this?

@dongxiaoman
Copy link
Contributor Author

Any updates on this?

I don't think this is valid anymore. Will try later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants