
[Gluten-986] Remove unimportant metrics in Agg and Join operator and input metrics for all operator. #998

Merged
12 commits merged into apache:main on Feb 28, 2023

Conversation

@Yohahaha Yohahaha (Contributor) commented Feb 22, 2023

What changes were proposed in this pull request?

Agg should not track the metrics of preProject, postProject, and extractionNeeded: these nodes do not change the row count (though they may change total bytes). To keep the metrics clear, we should remove them.

An operator's input metrics and output metrics are equal, so we only need the output metrics; this also aligns with Spark.

https://github.com/facebookincubator/velox/blob/db4ea520893c30ac9197e75b88a3b85bfc801758/velox/exec/Driver.cpp#L346-L373
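As a minimal sketch of the intent (hypothetical metric names; the actual changes are in the diff hunks reviewed below), a trimmed metrics map keeps only the output-side counters plus the operator's own wall time:

import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Hypothetical sketch of a trimmed metrics map for the Agg operator:
// input metrics and pre/postProject metrics are dropped.
override lazy val metrics: Map[String, SQLMetric] = Map(
  "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
  "numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"),
  "aggWallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime of aggregation"))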

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and the pull request title in the following format?

[Gluten-${ISSUES_ID}] ${detailed message}
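For reference, this PR's final title follows that format: [Gluten-986] Remove unimportant metrics in Agg and Join operator and input metrics for all operator.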


@Yohahaha Yohahaha changed the title from "Remove unimportant metrics in Agg operator." to "[Gluten-986]Remove unimportant metrics in Agg operator." on Feb 22, 2023
@github-actions

#986

@zhztheplayer zhztheplayer changed the title from "[Gluten-986]Remove unimportant metrics in Agg operator." to "[Gluten-986] Remove unimportant metrics in Agg operator." on Feb 22, 2023
@Yohahaha Yohahaha changed the title from "[Gluten-986] Remove unimportant metrics in Agg operator." to "[WIP][Gluten-986] Remove unimportant metrics in Agg and Join operator." on Feb 22, 2023
@FelixYBW FelixYBW (Contributor) commented:

Some of the metrics are used in our analysis tool. Let me check later.

@Yohahaha Yohahaha changed the title from "[WIP][Gluten-986] Remove unimportant metrics in Agg and Join operator." to "[WIP][Gluten-986] Remove unimportant metrics in Agg and Join operator and input metrics for all operator." on Feb 23, 2023
@Yohahaha Yohahaha changed the title from "[WIP][Gluten-986] Remove unimportant metrics in Agg and Join operator and input metrics for all operator." to "[Gluten-986] Remove unimportant metrics in Agg and Join operator and input metrics for all operator." on Feb 23, 2023
"concatTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_coalescebatch"),
"numInputBatches" -> SQLMetrics.createMetric(sparkContext, "number of input batches"),
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"),
"collectTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time to collect batch"),
Review comment (Contributor):

Our script needs totaltime as a prefix. Let's change this to "totaltime to collect batch".
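For instance, the rename applied to the line above would look like this (a sketch of the suggestion, not the merged code):

"collectTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime to collect batch"),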

"numInputBatches" -> SQLMetrics.createMetric(sparkContext, "number of input batches"),
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"),
"collectTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time to collect batch"),
"concatTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time to coalesce batch"),
Review comment (Contributor):

Change this to "totaltime to coalesce batch".

"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "output_batches"),
"processTime" -> SQLMetrics.createTimingMetric(sparkContext, "totaltime_rowtoarrowcolumnar")
"numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"),
"convertTime" -> SQLMetrics.createTimingMetric(sparkContext, "time to convert")
Review comment (Contributor):

Change this to "totaltime to convert".

"aggWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_aggregation"),
sparkContext, "wall time"),
Review comment (Contributor):

Change this to "totaltime of aggregation". We need to break down the operator time by its name.
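Applied to the diff above, the suggestion would read (illustrative only):

"aggWallNanos" -> SQLMetrics.createNanoTimingMetric(
  sparkContext, "totaltime of aggregation"),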

@@ -180,11 +180,11 @@ case class ColumnarShuffleExchangeAdaptor(override val outputPartitioning: Parti
override lazy val metrics: Map[String, SQLMetric] = Map(
"dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size"),
"bytesSpilled" -> SQLMetrics.createSizeMetric(sparkContext, "shuffle bytes spilled"),
"computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_computepid"),
"splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_split"),
"computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compute pid time"),
Review comment (Contributor):

Since we moved computepid into the Velox pipeline, it's not used now. We can delete it.

"computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_computepid"),
"splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_split"),
"computePidTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compute pid time"),
"splitTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "split time"),
Review comment (Contributor):

"totaltime to split"

"spillTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "shuffle spill time"),
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_compress"),
"prepareTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_prepare"),
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compress time"),
Review comment (Contributor):

Change this to "totaltime to compress".

"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_compress"),
"prepareTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_prepare"),
"compressTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "compress time"),
"prepareTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "prepare time"),
Review comment (Contributor):

Change this to "totaltime to prepare".

"preProjectionCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "preProjection cpu wall time count"),
"preProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_preProjection"),
Review comment (Contributor):

Let's keep this one; maybe change it to "totaltime of preProjection".

"postProjectionCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "postProjection cpu wall time count"),
"postProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_postProjection"),
Review comment (Contributor):

Let's keep this one; change it to "totaltime of postProjection".

"buildCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "build side cpu wall time count"),
"buildWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_build_input"),
Review comment (Contributor):

We need this. It actually counts the Arrow-to-Velox conversion.

"streamCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "stream side cpu wall time count"),
"streamWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_stream_input"),
Review comment (Contributor):

Keep this; change it to "totaltime stream input".

"streamWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_stream_input"),
"streamVeloxToArrow" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_velox_to_arrow_converter"),
Review comment (Contributor):

Keep it; it only counts part of the conversion.

"streamPreProjectionCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "stream preProjection cpu wall time count"),
"streamPreProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_stream_preProjection"),
Review comment (Contributor):

Keep this.

"buildPreProjectionCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "build preProjection cpu wall time count"),
"buildPreProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_build_preProjection"),
Review comment (Contributor):

Keep it.

"hashBuildWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_hashbuild"),
sparkContext, "hash build wall time"),
Review comment (Contributor):

Change this to "totaltime of hash build".

"hashProbeWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_hashprobe"),
sparkContext, "hash probe wall time"),
Review comment (Contributor):

Change this to "totaltime of probe".

"postProjectionCount" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "postProjection cpu wall time count"),
"postProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric(
sparkContext, "totaltime_postProjection"),
Review comment (Contributor):

Keep it; use "totaltime of postProjection".

sparkContext, "number of build preProjection memory allocations"),

"hashBuildInputRows" -> SQLMetrics.createMetric(
sparkContext, "number of hash build input rows"),
Review comment (Contributor):

Keep this. Then we needn't check the previous operator to get the input rows.

@@ -248,26 +136,16 @@ trait HashJoinLikeExecTransformer
"hashBuildSpilledFiles" -> SQLMetrics.createMetric(
sparkContext, "total spilled files of hash build"),

"hashProbeInputRows" -> SQLMetrics.createMetric(
sparkContext, "number of hash probe input rows"),
Review comment (Contributor):

Same here; let's keep this one as well.

"hashProbeRawInputRows" -> SQLMetrics.createMetric(
sparkContext, "number of hash probe raw input rows"),
"hashProbeRawInputBytes" -> SQLMetrics.createSizeMetric(
sparkContext, "number of hash probe raw input bytes"),
"hashProbeOutputRows" -> SQLMetrics.createMetric(
sparkContext, "number of hash probe output rows"),
"hashProbeOutputVectors" -> SQLMetrics.createMetric(
sparkContext, "number of hash probe output vectors"),
"hashProbeOutputBytes" -> SQLMetrics.createSizeMetric(
sparkContext, "number of hash probe output bytes"),
Review comment (Contributor):

Looks like it's broken; let's remove it. It currently shows:

number of hash probe output bytes -39,682,000,181,513,076,736 (495,594,061,042,182,848.0)

"postProjectionRawInputBytes" -> SQLMetrics.createSizeMetric(
sparkContext, "number of postProjection raw input bytes"),
"postProjectionOutputRows" -> SQLMetrics.createMetric(
sparkContext, "number of postProjection output rows"),
Review comment (Contributor):

Let's keep it. It's the final output row count from the hash join.

sparkContext, "number of postProjection raw input bytes"),
"postProjectionOutputRows" -> SQLMetrics.createMetric(
sparkContext, "number of postProjection output rows"),
"postProjectionOutputVectors" -> SQLMetrics.createMetric(
Review comment (Contributor):

And this one.

@FelixYBW FelixYBW (Contributor) commented Feb 25, 2023

We can remove the other metrics. Thank you for your cleanup, @Yohahaha.

@FelixYBW FelixYBW (Contributor) commented:

@waitinfuture OK with me.

@waitinfuture waitinfuture (Contributor) left a comment:

LGTM, thanks!

@FelixYBW FelixYBW merged commit c826496 into apache:main Feb 28, 2023
zhli1142015 pushed a commit to zhli1142015/gluten that referenced this pull request Mar 5, 2023
…input metrics for all operator. (apache#998)

clean up the metrics