[Gluten-986] Remove unimportant metrics in Agg and Join operators and input metrics for all operators. #998
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/gluten/issues Then could you also rename the commit message and pull request title in the following format?
Some of the metrics are used in our analysis tool. Let me check later.
"concatTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_coalescebatch"), | ||
"numInputBatches" -> SQLMetrics.createMetric(sparkContext, "number of input batches"), | ||
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"), | ||
"collectTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time to collect batch"), |
Our script needs "totaltime" as a prefix. Let's change this to "totaltime to collect batch".
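As a concrete sketch of the requested rename (assuming the usual SQLMetrics import and a sparkContext in scope on the operator, as in the diff above):

```scala
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// The analysis script keys on the "totaltime" prefix to find timing
// metrics, so the display name must start with it.
val timingMetrics: Map[String, SQLMetric] = Map(
  "collectTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime to collect batch"),
  "concatTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime to coalesce batch")
)
```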
"numInputBatches" -> SQLMetrics.createMetric(sparkContext, "number of input batches"), | ||
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"), | ||
"collectTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time to collect batch"), | ||
"concatTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time to coalesce batch"), |
Change this one to "totaltime to coalesce batch".
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "output_batches"), | ||
"processTime" -> SQLMetrics.createTimingMetric(sparkContext, "totaltime_rowtoarrowcolumnar") | ||
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"), | ||
"convertTime" -> SQLMetrics.createTimingMetric(sparkContext, "time to convert") |
change to "totaltime to convert"
"aggWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_aggregation"), | ||
sparkContext, "wall time"), |
change to "totaltime of aggregation" We need to breakdown the operator time by its name
@@ -180,11 +180,11 @@ case class ColumnarShuffleExchangeAdaptor(override val outputPartitioning: Parti
   override lazy val metrics: Map[String, SQLMetric] = Map(
     "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size"),
     "bytesSpilled" -> SQLMetrics.createSizeMetric(sparkContext, "shuffle bytes spilled"),
-    "computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_computepid"),
-    "splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_split"),
+    "computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compute pid time"),
Since we moved computePid to the Velox pipeline, it's not used now. We can delete it.
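A sketch of how the shuffle metrics map could look inside the exchange operator after dropping the dead entry (names taken from the diff above; the "totaltime to split" spelling follows the reviewer's suggestion below):

```scala
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

override lazy val metrics: Map[String, SQLMetric] = Map(
  "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size"),
  "bytesSpilled" -> SQLMetrics.createSizeMetric(sparkContext, "shuffle bytes spilled"),
  // "computePidTime" removed: computePid now runs inside the Velox
  // pipeline, so the Spark-side timer is never populated.
  "splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime to split")
)
```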
"computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_computepid"), | ||
"splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_split"), | ||
"computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compute pid time"), | ||
"splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "split time"), |
"totaltime to split"
"spillTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "shuffle spill time"), | ||
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_compress"), | ||
"prepareTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_prepare"), | ||
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compress time"), |
Change to "totaltime to compress".
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_compress"), | ||
"prepareTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_prepare"), | ||
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compress time"), | ||
"prepareTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "prepare time"), |
Change to "totaltime to prepare".
"preProjectionCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "preProjection cpu wall time count"), | ||
"preProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_preProjection"), |
Let's keep this one; maybe change it to "totaltime of preProjection".
"postProjectionCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "postProjection cpu wall time count"), | ||
"postProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_postProjection"), |
Let's keep this one and change it to "totaltime of postProjection".
"buildCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "build side cpu wall time count"), | ||
"buildWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_build_input"), |
We need this. It actually counts the Arrow-to-Velox conversion.
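So the entry stays; a documenting comment makes the non-obvious name less surprising (a sketch following the reviewer's explanation, with sparkContext assumed in scope):

```scala
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Kept: despite the "build_input" name, this timer measures the
// Arrow-to-Velox conversion of build-side batches.
val buildMetrics: Map[String, SQLMetric] = Map(
  "buildWallNanos" -> SQLMetrics.createNanoTimingMetric(
    sparkContext, "totaltime_build_input")
)
```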
"streamCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "stream side cpu wall time count"), | ||
"streamWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_stream_input"), |
Keep this; change it to "totaltime stream input".
"streamWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_stream_input"), | ||
"streamVeloxToArrow" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_velox_to_arrow_converter"), |
Keep it; it only counts part of the conversion.
"streamPreProjectionCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "stream preProjection cpu wall time count"), | ||
"streamPreProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_stream_preProjection"), |
Keep this.
"buildPreProjectionCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "build preProjection cpu wall time count"), | ||
"buildPreProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_build_preProjection"), |
Keep it.
"hashBuildWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_hashbuild"), | ||
sparkContext, "hash build wall time"), |
Change to "totaltime of hash build".
"hashProbeWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_hashprobe"), | ||
sparkContext, "hash probe wall time"), |
Change to "totaltime of probe".
"postProjectionCount" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "postProjection cpu wall time count"), | ||
"postProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric( | ||
sparkContext, "totaltime_postProjection"), |
Keep it and use "totaltime of postProjection".
sparkContext, "number of build preProjection memory allocations"), | ||
|
||
"hashBuildInputRows" -> SQLMetrics.createMetric( | ||
sparkContext, "number of hash build input rows"), |
Keep this. Then we needn't check the previous operator to get the input rows.
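A hedged sketch of the payoff: with the metric registered on the join itself, build-side input rows show up directly on the join's SQL UI node. The updater method and its wiring below are hypothetical, not Gluten's actual code; longMetric comes from SparkPlan:

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical updater inside the join operator: count build-side input
// rows on the join node itself, so nobody has to look up the upstream
// operator's output metric to recover this number.
def onBuildInput(batch: ColumnarBatch): Unit = {
  longMetric("hashBuildInputRows") += batch.numRows()
}
```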
@@ -248,26 +136,16 @@ trait HashJoinLikeExecTransformer
   "hashBuildSpilledFiles" -> SQLMetrics.createMetric(
     sparkContext, "total spilled files of hash build"),
   "hashProbeInputRows" -> SQLMetrics.createMetric(
     sparkContext, "number of hash probe input rows"),
Same here; let's keep this one too.
"hashProbeRawInputRows" -> SQLMetrics.createMetric( | ||
sparkContext, "number of hash probe raw input rows"), | ||
"hashProbeRawInputBytes" -> SQLMetrics.createSizeMetric( | ||
sparkContext, "number of hash probe raw input bytes"), | ||
"hashProbeOutputRows" -> SQLMetrics.createMetric( | ||
sparkContext, "number of hash probe output rows"), | ||
"hashProbeOutputVectors" -> SQLMetrics.createMetric( | ||
sparkContext, "number of hash probe output vectors"), | ||
"hashProbeOutputBytes" -> SQLMetrics.createSizeMetric( | ||
sparkContext, "number of hash probe output bytes"), |
Looks like it's broken; let's remove it. The reported value:
number of hash probe output bytes: -39,682,000,181,513,076,736 (495,594,061,042,182,848.0)
"postProjectionRawInputBytes" -> SQLMetrics.createSizeMetric( | ||
sparkContext, "number of postProjection raw input bytes"), | ||
"postProjectionOutputRows" -> SQLMetrics.createMetric( | ||
sparkContext, "number of postProjection output rows"), |
Let's keep it. It's the final output row count from the hash join.
sparkContext, "number of postProjection raw input bytes"), | ||
"postProjectionOutputRows" -> SQLMetrics.createMetric( | ||
sparkContext, "number of postProjection output rows"), | ||
"postProjectionOutputVectors" -> SQLMetrics.createMetric( |
And this one too.
We can remove the other metrics. Thank you for your cleanup, @Yohahaha.
Force-pushed from e4b579e to 77b20a3.
@waitinfuture OK to me.
LGTM, thanks!
…input metrics for all operator. (apache#998) clean up the metrics
What changes were proposed in this pull request?
Agg should not care about the metrics of preProject, postProject, and extractionNeeded; these nodes do not change row counts (they may only change total bytes). We should keep the metrics clear and remove these.
An operator's input metrics duplicate the upstream operator's output metrics, so we only need the output metrics; this also aligns with Spark.
https://github.com/facebookincubator/velox/blob/db4ea520893c30ac9197e75b88a3b85bfc801758/velox/exec/Driver.cpp#L346-L373
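Under that reasoning, a trimmed operator metrics map keeps only the output side, mirroring what Spark's own operators expose (a sketch; the exact metric names are illustrative, and sparkContext comes from the enclosing operator):

```scala
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Only output metrics: the input side would just duplicate the upstream
// operator's output numbers.
override lazy val metrics: Map[String, SQLMetric] = Map(
  "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
  "numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"),
  "outputBytes" -> SQLMetrics.createSizeMetric(sparkContext, "number of output bytes")
)
```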
How was this patch tested?