New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not serialize metrics in each Operator #10473
Do not serialize metrics in each Operator #10473
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10473 +/- ##
============================================
+ Coverage 69.97% 70.17% +0.20%
- Complexity 6118 6254 +136
============================================
Files 2092 2112 +20
Lines 112712 113050 +338
Branches 17143 17007 -136
============================================
+ Hits 78874 79338 +464
+ Misses 28260 28167 -93
+ Partials 5578 5545 -33
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 210 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@walterddr one query: Are you fine if I simply create OpChainStats inside OpChainExecutionContext and then use it everywhere instead of using The only downside I see is context should be immutable while stats are mutable. |
1351cd8
to
3f16d70
Compare
TransferableBlock eosBlockWithStats = TransferableBlockUtils.getEndOfStreamTransferableBlock( | ||
OperatorUtils.getMetadataFromOperatorStats(_opChainStats.getOperatorStatsMap())); | ||
_exchange.send(eosBlockWithStats); | ||
return eosBlockWithStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to return this instead of just transferableBlock? we don't have any further processing of the stats right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that it gets serialized and sent to next OpChain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_exchange.send(eosBlockWithStats)
sends to the next opChainreturn eosBlockWithStats
is returning to the opChainScheduler.
do we process anything related to stats on opChainScheduler? if not returning the original transferableBlock seems more reasonable
...-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultiStageOperator.java
Outdated
Show resolved
Hide resolved
@@ -39,21 +35,18 @@ public abstract class MultiStageOperator implements Operator<TransferableBlock>, | |||
|
|||
// TODO: Move to OperatorContext class. | |||
protected final OperatorStats _operatorStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need this as final reference? can we also put this in OpChainStats and access via OpChainStats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we still need it as OperatorStats contains stats only for a single operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, there will be 0 or 1 OperatorStats object corresponding to this operator in the OpChainStats map. if so why do we keep strong reference to both objects in 2 classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. So now I have made the change to use directly via the map in OpChainStats rather than creating and storing a strong reference in the base class.
if (_opChainStats != null && !block.getResultMetadata().isEmpty()) { | ||
for (Map.Entry<String, OperatorStats> entry : block.getResultMetadata().entrySet()) { | ||
_opChainStats.getOperatorStatsMap().compute(entry.getKey(), (_key, _value) -> entry.getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- this is deserializing the metadata from received metadata from the previous stage and putting them into the current ones. if this is true. when there's no trace enabled, skip this step
- where is the opChain stats being encoded? or it is also in the map but when trace is enabled there will be more keys in the map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracing thing is being taken care of in the next PR. Need to rebase that PR with the changes in this one.
8b26fcd
to
68a3787
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following up my previous comments. also could you fix conflict and CI failures? thank you
TransferableBlock eosBlockWithStats = TransferableBlockUtils.getEndOfStreamTransferableBlock( | ||
OperatorUtils.getMetadataFromOperatorStats(_opChainStats.getOperatorStatsMap())); | ||
_exchange.send(eosBlockWithStats); | ||
return eosBlockWithStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_exchange.send(eosBlockWithStats)
sends to the next opChainreturn eosBlockWithStats
is returning to the opChainScheduler.
do we process anything related to stats on opChainScheduler? if not returning the original transferableBlock seems more reasonable
@@ -39,21 +35,18 @@ public abstract class MultiStageOperator implements Operator<TransferableBlock>, | |||
|
|||
// TODO: Move to OperatorContext class. | |||
protected final OperatorStats _operatorStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, there will be 0 or 1 OperatorStats object corresponding to this operator in the OpChainStats map. if so why do we keep strong reference to both objects in 2 classes?
…the send operator
1838382
to
919eafe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
...query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MailboxSendOperator.java
Outdated
Show resolved
Hide resolved
* WIP: Do not serialize metrics * No need to pass stats between operator. Only collected in the end at the send operator * Use opchain stats to record operatorStats * No need to serialie metrics in receive operator * Remove attachStats method and create stats object inside context itself * Make stats thread safe * Add test for opchain stats * Ensure SendOperator stats are populated before serializing stats * Fix variable scope * Use operator stats map directly from opchain stats * unify return statements outside inner for loop in MailboxSendOperator --------- Co-authored-by: Kartik Khare <kharekartik@Kartiks-MacBook-Pro.local>
There's no need to serialize the metrics unless they are sent or received. We can simply use the OperatorStats objects for rest of the operators.