
Refactor usage of WorkerMetadata and StageMetadata#10756

Closed
xiangfu0 wants to merge 2 commits into apache:master from xiangfu0:separate-stagemetadata

Conversation


@xiangfu0 xiangfu0 commented May 11, 2023

  • Refactor StageMetadata from pinot-query-planner to pinot-query-runtime package
  • Only set necessary WorkerMetadata for OpChainExecutionContext
  • Move WorkerMetadata out from StageMetadata to DistributedStagePlan
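The third bullet can be sketched as below. The class shapes are hypothetical stand-ins for illustration only, not Pinot's actual definitions: each DistributedStagePlan carries the stage-wide StageMetadata plus only its own WorkerMetadata, instead of every worker's metadata being nested inside StageMetadata.

```java
import java.util.List;

// Hypothetical stand-in: per-worker info, kept out of StageMetadata.
final class WorkerMetadata {
  final int workerId;
  WorkerMetadata(int workerId) { this.workerId = workerId; }
}

// Hypothetical stand-in: stage-wide info shared by all workers of a stage.
final class StageMetadata {
  final List<WorkerMetadata> allWorkers;
  StageMetadata(List<WorkerMetadata> allWorkers) { this.allWorkers = allWorkers; }
}

// Hypothetical stand-in: the plan shipped to one worker carries the shared
// stage metadata plus only that worker's own slice.
final class DistributedStagePlan {
  final int stageId;
  final StageMetadata stageMetadata;   // shared, stage-wide
  final WorkerMetadata workerMetadata; // only this worker's metadata
  DistributedStagePlan(int stageId, StageMetadata stageMetadata, WorkerMetadata workerMetadata) {
    this.stageId = stageId;
    this.stageMetadata = stageMetadata;
    this.workerMetadata = workerMetadata;
  }
}
```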

@xiangfu0 xiangfu0 requested a review from walterddr May 11, 2023 09:34

codecov-commenter commented May 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.34%. Comparing base (ca37a1e) to head (2a0b6c6).
Report is 3183 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #10756      +/-   ##
============================================
+ Coverage     68.48%   70.34%   +1.86%     
- Complexity     6463     6475      +12     
============================================
  Files          2152     2152              
  Lines        115789   115780       -9     
  Branches      17500    17498       -2     
============================================
+ Hits          79295    81445    +2150     
+ Misses        30885    28661    -2224     
- Partials       5609     5674      +65     
Flag          Coverage Δ
integration1  24.14% <0.00%> (+0.10%) ⬆️
integration2  23.75% <0.00%> (?)
unittests1    67.88% <100.00%> (-0.01%) ⬇️
unittests2    13.70% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

@xiangfu0 xiangfu0 force-pushed the separate-stagemetadata branch from aacac59 to a58d3c7 on May 11, 2023 10:25
@xiangfu0 xiangfu0 changed the title from "Refactor StageMetadata to pinot-query-runtime package" to "Refactor usage of WorkerMetadata and StageMetadata" on May 11, 2023
@walterddr walterddr added the multi-stage (Related to the multi-stage query engine) label on May 11, 2023
@xiangfu0 xiangfu0 force-pushed the separate-stagemetadata branch from a58d3c7 to 36eee3a on May 14, 2023 21:28
@xiangfu0 xiangfu0 force-pushed the separate-stagemetadata branch from 36eee3a to 2a0b6c6 on May 15, 2023 21:42
    return new DistributedStagePlan(stageId, serverAddress,
        dispatchableSubPlan.getQueryStageList().get(stageId).getPlanFragment().getFragmentRoot(),
    -   StageMetadata.from(dispatchableSubPlan.getQueryStageList().get(stageId)),
    +   dispatchableSubPlan.getQueryStageList().get(stageId).toStageMetadata());
walterddr (Contributor)

The right way to solve this is to not construct DistributedStagePlan on the broker. Instead, construct the proto object directly, and when deserializing, split it into multiple DistributedStagePlans.

walterddr (Contributor)

e.g. instead of doing this in Dispatcher:

    _executorService.submit(() -> client.submit(Worker.QueryRequest.newBuilder().setStagePlan(
            QueryPlanSerDeUtils.serialize(
                constructDistributedStagePlan(dispatchableSubPlan, finalStageId, virtualServerAddress)))
        .putMetadata(QueryConfig.KEY_OF_BROKER_REQUEST_ID, String.valueOf(requestId))
        .putMetadata(QueryConfig.KEY_OF_BROKER_REQUEST_TIMEOUT_MS, String.valueOf(timeoutMs))
        .putAllMetadata(queryOptions).build(), finalStageId, queryServerInstance, deadline,
        dispatchCallbacks::offer));

  1. directly serialize the dispatchableSubPlan to proto format:

    QueryPlanSerDeUtils.serialize(dispatchableSubPlan, finalStageId, virtualServerAddress)

  2. when deserializing, deserialize it into multiple distributed plans:

    List<DistributedStagePlan> QueryPlanSerDeUtils.deserialize(Worker.StagePlan stagePlan)
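The suggested serialize-once / fan-out-on-deserialize shape can be sketched as follows. All names here (StagePlanRequest, PerWorkerPlan, PlanSerDe) are hypothetical stand-ins for the real Worker.StagePlan, DistributedStagePlan, and QueryPlanSerDeUtils, not Pinot's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the wire message (Worker.StagePlan): one request
// per server carries all workers assigned to that server for a stage.
final class StagePlanRequest {
  final int stageId;
  final List<Integer> workerIds;
  StagePlanRequest(int stageId, List<Integer> workerIds) {
    this.stageId = stageId;
    this.workerIds = workerIds;
  }
}

// Hypothetical stand-in for DistributedStagePlan: one plan per worker.
final class PerWorkerPlan {
  final int stageId;
  final int workerId;
  PerWorkerPlan(int stageId, int workerId) {
    this.stageId = stageId;
    this.workerId = workerId;
  }
}

// Hypothetical stand-in for QueryPlanSerDeUtils.
final class PlanSerDe {
  // Broker side: build the request once for the whole stage, not per worker.
  static StagePlanRequest serialize(int stageId, List<Integer> workerIds) {
    return new StagePlanRequest(stageId, List.copyOf(workerIds));
  }

  // Server side: expand the single request into one plan per worker.
  static List<PerWorkerPlan> deserialize(StagePlanRequest request) {
    List<PerWorkerPlan> plans = new ArrayList<>();
    for (int workerId : request.workerIds) {
      plans.add(new PerWorkerPlan(request.stageId, workerId));
    }
    return plans;
  }
}
```

This keeps the broker-to-server wire protocol list-shaped, so launching N workers on a server costs one request rather than N.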

string virtualAddress = 2;
StageNode stageRoot = 3;
StageMetadata stageMetadata = 4;
WorkerMetadata workerMetadata = 5;
walterddr (Contributor)

This might not be the right abstraction for the proto. We want to send a single proto request from the broker to each server to launch all of its workers, rather than one gRPC request per worker, which would be inefficient. Let's keep the wire protocol as a list.

@walterddr

This can be closed as covered by #10791.

@walterddr walterddr closed this Jun 2, 2023
@xiangfu0 xiangfu0 deleted the separate-stagemetadata branch July 11, 2023 08:17

Labels

multi-stage (Related to the multi-stage query engine)

3 participants