
[SPARK-31253][SQL][FOLLOW-UP] Improve the partition data size metrics in CustomShuffleReaderExec #28175

Closed
wants to merge 11 commits

Conversation

JkSelf (Contributor) commented Apr 10, 2020

### What changes were proposed in this pull request?

Currently the partition data size metrics show up as three separate entries (min/max/avg) in the Spark UI, which is not user friendly. This PR combines min/max/avg into a single metrics entry by calling SQLMetrics.postDriverMetricUpdates multiple times, once per partition.
Before this PR, the Spark UI shows:
[screenshot: partition data size as three separate metric entries]

After this PR, the Spark UI shows:
[screenshot: partition data size as a single entry with min/med/max]
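The mechanism, roughly: a SIZE metric in the SQL UI is rendered as a single "total (min, med, max)" line, and each posted update contributes one data point. A minimal sketch of the idea (partitionSizes is an assumed name, not from the diff):

```scala
// Hedged sketch, not the exact PR diff: post the "partitionDataSize"
// SIZE metric once per partition; the SQL UI aggregates the posted
// values into one "total (min, med, max)" entry.
val partitionMetrics = metrics("partitionDataSize")
partitionSizes.foreach { dataSize =>
  partitionMetrics.set(dataSize)
  SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, Seq(partitionMetrics))
}
```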

### Why are the changes needed?

Improves the UI.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing unit tests.

JkSelf (Contributor Author) commented Apr 10, 2020

@cloud-fan @maryannxue

@HyukjinKwon HyukjinKwon changed the title [SPARK-31253][SQL][followup] Improve the partition data size metrics in CustomShuffleReaderExec [SPARK-31253][SQL][FOLLOW-UO] Improve the partition data size metrics in CustomShuffleReaderExec Apr 10, 2020
@HyukjinKwon HyukjinKwon changed the title [SPARK-31253][SQL][FOLLOW-UO] Improve the partition data size metrics in CustomShuffleReaderExec [SPARK-31253][SQL][FOLLOW-UP] Improve the partition data size metrics in CustomShuffleReaderExec Apr 10, 2020
metrics("partitionDataSize").set(dataSize)
SQLMetrics.postDriverMetricUpdates(
sparkContext, executionId,
metrics.filter(_._1 == "partitionDataSize").values.toSeq)
Review comment (Contributor):
can we look up the partitionDataSize SQLMetric at the beginning of this method? then here we can simply write Seq(metric).
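A minimal sketch of that suggestion:

```scala
// Resolve the SQLMetric once at the top of the method...
val partitionMetrics = metrics("partitionDataSize")
// ...so every post site can simply be:
SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, Seq(partitionMetrics))
```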

@cloud-fan (Contributor):
can you put the before/after screenshots?


SparkQA commented Apr 10, 2020

Test build #121054 has finished for PR 28175 at commit 806a143.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def sendPartitionDataSizeMetrics(
    executionId: String,
    partitionMetrics: SQLMetric): Unit = {
  val mapStats = shuffleStage.get.mapStats.get.bytesByPartitionId
Review comment (Contributor):
Let's follow the previous code: https://github.com/apache/spark/pull/28175/files#diff-a42cafdbb5870e28c4e03df50ffc44f6L111

If shuffleStage.get.mapStats.isEmpty, we send the metric value as 0 only once.


SparkQA commented Apr 10, 2020

Test build #121076 has finished for PR 28175 at commit 4aabfaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 10, 2020

Test build #121071 has finished for PR 28175 at commit bbd6324.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

val mapStats = shuffleStage.get.mapStats
if (mapStats.isEmpty) {
  metrics("partitionDataSize").set(0)
  SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, Seq{partitionMetrics})
Review comment (Contributor):
Seq{partitionMetrics} ?

should be Seq(partitionMetrics)

val dataSize = startReducerIndex.until(endReducerIndex).map(
  mapStats.get.bytesByPartitionId(_)).sum
metrics("partitionDataSize").set(dataSize)
SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, Seq{partitionMetrics})
Review comment (Contributor):
ditto

    partitionMetrics: SQLMetric): Unit = {
  val mapStats = shuffleStage.get.mapStats
  if (mapStats.isEmpty) {
    metrics("partitionDataSize").set(0)
Review comment (Contributor):
partitionMetrics.set(0)

    sum += dataSize
  case p: PartialReducerPartitionSpec =>
    metrics("partitionDataSize").set(p.dataSize)
    SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, Seq{partitionMetrics})
Review comment (Contributor):
ditto

  case CoalescedPartitionSpec(startReducerIndex, endReducerIndex) =>
    val dataSize = startReducerIndex.until(endReducerIndex).map(
      mapStats.get.bytesByPartitionId(_)).sum
    metrics("partitionDataSize").set(dataSize)
Review comment (Contributor):
ditto

  case p => throw new IllegalStateException("unexpected " + p)
}
// Set sum value to "partitionDataSize" metric.
metrics("partitionDataSize").set(sum)
Review comment (Contributor):
ditto

  metrics.filter(_._1 != "partitionDataSize").values.toSeq)

if (!isLocalReader && shuffleStage.get.mapStats.isDefined) {
  sendPartitionDataSizeMetrics(executionId, metrics.get("partitionDataSize").get)
Review comment (Contributor):
why not do val partitionMetrics = metrics("partitionDataSize") inside the method instead of passing as a parameter?
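A sketch of the suggested shape (the PR later adopts this signature, as the diff below shows):

```scala
private def sendPartitionDataSizeMetrics(executionId: String): Unit = {
  // Look the metric up inside the method instead of passing it in.
  val partitionMetrics = metrics("partitionDataSize")
  // ... compute per-partition sizes and post them via partitionMetrics ...
}
```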


SparkQA commented Apr 13, 2020

Test build #121168 has finished for PR 28175 at commit e6c50df.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 13, 2020

Test build #121173 has finished for PR 28175 at commit ae1824a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -128,6 +104,34 @@ case class CustomShuffleReaderExec private(
Map("numSkewedPartitions" -> metrics)
}

private def sendPartitionDataSizeMetrics(
    executionId: String): Unit = {
Review comment (Contributor):
we can merge this to the previous line now.


SparkQA commented Apr 13, 2020

Test build #121184 has finished for PR 28175 at commit fda6846.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

JkSelf (Contributor Author) commented Apr 13, 2020

retest this please


SparkQA commented Apr 13, 2020

Test build #121196 has finished for PR 28175 at commit fda6846.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):

retest this please


SparkQA commented Apr 13, 2020

Test build #121214 has finished for PR 28175 at commit fda6846.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maryannxue (Contributor):

After offline discussions with @cloud-fan, we agree that it'd be more efficient to add a new metric type rather than posting metrics for all the partitions. With the new type, we can have max, min, avg, median (or anything you want).

JkSelf (Contributor Author) commented Apr 14, 2020

@maryannxue Instead of creating a new metric type, we can add a new method postDriverMetricsUpdatedByValue that passes all the partition data sizes at once, reducing the overhead.
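A sketch of what such a helper might look like; the signature matches the diff below, while the body is an assumption about how driver-side accumulator updates are posted:

```scala
def postDriverMetricsUpdatedByValue(
    sc: SparkContext,
    executionId: String,
    accumUpdates: Seq[(Long, Long)]): Unit = {  // (accumulator id, value) pairs
  if (executionId != null) {
    // A single listener event carries every (id, value) pair, instead of
    // mutating and re-posting a SQLMetric once per partition.
    sc.listenerBus.post(
      SparkListenerDriverAccumUpdates(executionId.toLong, accumUpdates))
  }
}
```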

@@ -222,6 +222,15 @@ object SQLMetrics {
}
}

def postDriverMetricsUpdatedByValue(
    sc: SparkContext, executionId: String,
Review comment (Contributor):
nit: one parameter per line


SparkQA commented Apr 14, 2020

Test build #121264 has finished for PR 28175 at commit efdd9b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 14, 2020

Test build #121270 has finished for PR 28175 at commit 3ee19f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val id = partitionMetrics.id
val accumUpdates = sizes.map(value => (id, value))
SQLMetrics.postDriverMetricsUpdatedByValue(sparkContext, executionId, accumUpdates)
Review comment (Contributor):
Why can't we send all metrics together?

Reply (Contributor Author):
Makes sense, and already updated.
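The batched form might look like the following sketch, where driverMetrics is an assumed name for the other driver-side metrics being posted alongside the partition sizes:

```scala
// Combine per-partition sizes with the remaining driver metrics and post
// them together in one call rather than one call per metric.
val accumUpdates = sizes.map(size => (partitionMetrics.id, size)) ++
  driverMetrics.map(m => (m.id, m.value))
SQLMetrics.postDriverMetricsUpdatedByValue(sparkContext, executionId, accumUpdates)
```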


SparkQA commented Apr 16, 2020

Test build #121347 has finished for PR 28175 at commit 72d46eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 16, 2020

Test build #121345 has finished for PR 28175 at commit 6c70108.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 16, 2020

Test build #121351 has finished for PR 28175 at commit 2e2dfb8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):

retest this please


SparkQA commented Apr 16, 2020

Test build #121362 has finished for PR 28175 at commit 2e2dfb8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

JkSelf (Contributor Author) commented Apr 17, 2020

retest this please


SparkQA commented Apr 17, 2020

Test build #121385 has finished for PR 28175 at commit 2e2dfb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):

thanks, merging to master!

@cloud-fan cloud-fan closed this in d136b72 Apr 17, 2020
jiangxb1987 pushed a commit that referenced this pull request Apr 17, 2020
…etrics of AQE shuffle

### What changes were proposed in this pull request?

A followup of #28175:
1. use mutable collection to store the driver metrics
2. don't send size metrics if there is no map stats, as UI will display size as 0 if there is no data
3. calculate partition data size separately, to make the code easier to read.

### Why are the changes needed?

code simplification

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #28240 from cloud-fan/refactor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>