[SPARK-35529][SQL] Add fallback metrics for hash aggregate #32671

Closed
wants to merge 2 commits into from

Contributor

@c21 c21 commented May 26, 2021

What changes were proposed in this pull request?

Add a metric that records how many tasks fall back to sort-based aggregation during hash aggregation. This will help developers and users debug and optimize queries. Object hash aggregation already has a similar metric.
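
To illustrate the idea of a per-task fallback counter, here is a minimal standalone sketch in plain Scala. It is not the Spark implementation: `CounterMetric`, `FallbackMetricSketch`, `runTask`, and the `maxHashEntries` cap are hypothetical names introduced for this example, and the fallback path simply re-aggregates the sorted input rather than spilling and merging as a real engine would. It only shows that the counter is incremented once per task that abandons the hash path.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a SQL metric: a plain counter (not Spark's SQLMetric).
final class CounterMetric(val name: String) {
  private var total = 0L
  def add(n: Long): Unit = total += n
  def value: Long = total
}

object FallbackMetricSketch {
  // Simulates one task summing values per key.
  // maxHashEntries is an assumed cap that stands in for memory pressure.
  def runTask(
      rows: Seq[(String, Long)],
      maxHashEntries: Int,
      numTasksFallBacked: CounterMetric): Map[String, Long] = {
    val hashMap = mutable.HashMap.empty[String, Long]

    // Try the hash path; forall stops at the first row that does not fit.
    val fits = rows.forall { case (key, value) =>
      if (hashMap.contains(key) || hashMap.size < maxHashEntries) {
        hashMap.update(key, hashMap.getOrElse(key, 0L) + value)
        true
      } else {
        false
      }
    }

    if (fits) {
      hashMap.toMap
    } else {
      // Count the fallback once per task, not once per row.
      numTasksFallBacked.add(1)
      // Simplified sort-based path: sort by key, then merge adjacent rows.
      rows.sortBy(_._1).foldLeft(List.empty[(String, Long)]) {
        case ((k, acc) :: tail, (key, value)) if k == key => (k, acc + value) :: tail
        case (done, row) => row :: done
      }.toMap
    }
  }

  def main(args: Array[String]): Unit = {
    val metric = new CounterMetric("numTasksFallBacked")
    runTask(Seq("a" -> 1L, "a" -> 2L), maxHashEntries = 4, metric)
    runTask(Seq("a" -> 1L, "b" -> 2L, "c" -> 3L), maxHashEntries = 2, metric)
    println(s"${metric.name} = ${metric.value}")
  }
}
```

In this sketch the first task fits in the hash map and leaves the counter untouched, while the second exceeds the cap and increments it by one.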

Why are the changes needed?

Help developers and users debug and optimize queries that use hash aggregation.

Does this PR introduce any user-facing change?

Yes, the added metric will show up in the Spark web UI.
Example:
[Image: Screen Shot 2021-05-26 at 12 17 08 AM]

How was this patch tested?

Updated the unit test in SQLMetricsSuite.scala.

@github-actions github-actions bot added the SQL label May 26, 2021
Contributor Author

c21 commented May 26, 2021

@cloud-fan could you help take a look when you have time? Thanks.

-    SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"))
+    SQLMetrics.createAverageMetric(sparkContext, "avg hash probe bucket list iters"),
+    "numTasksFallBacked" -> SQLMetrics.createMetric(sparkContext,
+      "number of tasks fall-backed to sort-based aggregation"))
Contributor

Is this name the same as the one used for object hash agg? It is super long.

Probably "number of sort fallback tasks".

Contributor Author

Yes, same as in #31340. Let me change them together.

Contributor Author

Updated. Thanks.

SparkQA commented May 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43492/

SparkQA commented May 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43492/

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in dd67777 May 26, 2021

SparkQA commented May 26, 2021

Test build #138976 has finished for PR 32671 at commit 0456d2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 26, 2021

Test build #138972 has finished for PR 32671 at commit 638cbf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

c21 commented May 26, 2021

Thank you @cloud-fan for review!

@c21 c21 deleted the agg-metrics branch May 26, 2021 17:11