Skip to content

Fix intermediate stage routing for multi-stage engine#11393

Merged
Jackie-Jiang merged 2 commits intoapache:masterfrom
Jackie-Jiang:fix_multi_stage_routing
Aug 20, 2023
Merged

Fix intermediate stage routing for multi-stage engine#11393
Jackie-Jiang merged 2 commits intoapache:masterfrom
Jackie-Jiang:fix_multi_stage_routing

Conversation

@Jackie-Jiang
Copy link
Contributor

Currently we use server tag to determine the worker for intermediate stage, and it has the following issues:

  • Maintaining the table to tag mapping in BrokerRoutingManager is super expensive. It reads all table configs for each instance config change (this can result in millions of read when rolling restart all servers for a large cluster)
  • It has a bug that only change from new enabled servers can be picked up. Changes to existing servers cannot be picked up
  • When instance assignment config is used, server tag is ignored, so the mapping will be wrong

To address the above issues, instead of using tag to determine servers, directly picking the serving servers from the InstanceSelector.

@kishoreg
Copy link
Member

Are we losing any functionality/feature? What about brokers as intermediate nodes.. also, how can we mark some nodes as shuffle servers in future for expensive queries

@codecov-commenter
Copy link

codecov-commenter commented Aug 20, 2023

Codecov Report

Merging #11393 (373b649) into master (1783f2a) will increase coverage by 0.00%.
Report is 2 commits behind head on master.
The diff coverage is 71.73%.

@@            Coverage Diff            @@
##             master   #11393   +/-   ##
=========================================
  Coverage     61.44%   61.44%           
+ Complexity     6515     6514    -1     
=========================================
  Files          2234     2234           
  Lines        120174   120154   -20     
  Branches      18240    18238    -2     
=========================================
- Hits          73838    73829    -9     
+ Misses        40911    40894   -17     
- Partials       5425     5431    +6     
Flag Coverage Δ
integration1 0.00% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 61.41% <71.73%> (+0.02%) ⬆️
java-17 61.29% <71.73%> (+0.01%) ⬆️
java-20 61.29% <71.73%> (-0.02%) ⬇️
temurin 61.44% <71.73%> (+<0.01%) ⬆️
unittests1 66.95% <72.00%> (-0.02%) ⬇️
unittests2 14.57% <32.60%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...ker/routing/instanceselector/InstanceSelector.java 100.00% <ø> (ø)
...che/pinot/broker/routing/BrokerRoutingManager.java 58.74% <20.00%> (-0.45%) ⬇️
...broker/routing/instanceselector/SegmentStates.java 87.50% <50.00%> (-12.50%) ⬇️
.../org/apache/pinot/query/routing/WorkerManager.java 68.82% <72.00%> (-0.63%) ⬇️
...routing/instanceselector/BaseInstanceSelector.java 92.97% <92.85%> (-0.47%) ⬇️

... and 16 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang
Copy link
Contributor Author

Jackie-Jiang commented Aug 20, 2023

Are we losing any functionality/feature? What about brokers as intermediate nodes.. also, how can we mark some nodes as shuffle servers in future for expensive queries

@kishoreg We are not losing any functionality, and this should be the correct way to handle it. Without this fix, it can only support tables with the basic balanced assignment.
In the future if we want to introduce dedicated shuffle servers, we can add a new tag for it. Initially I tried to solve the problem by maintaining a tag to servers map (no table config read), but realize it won't work for more advanced assignment, e.g. one table referencing the instance partitions from another table (this is a common setup to colocate tables). That won't be big change, so we can do it that way when adding the shuffle server feature.

@Jackie-Jiang Jackie-Jiang added the multi-stage Related to the multi-stage query engine label Aug 20, 2023
}
}
}
serverInstances = new ArrayList<>(servers.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actual serverInstances size might be smaller than servers.size()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but this should give a very good estimation. It doesn't need to be exact

_segmentStates = new SegmentStates(instanceCandidatesMap, servingInstances, unavailableSegments);
}

private List<SegmentInstanceCandidate> getEnabledCandidates(List<SegmentInstanceCandidate> candidates,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some comments since this method is doing more than the method name or fix the method name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Jackie-Jiang Jackie-Jiang merged commit 2c02389 into apache:master Aug 20, 2023
@Jackie-Jiang Jackie-Jiang deleted the fix_multi_stage_routing branch August 20, 2023 18:47
@walterddr
Copy link
Contributor

this is a thorough fix over #11386

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix enhancement multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants