[SPARK-38354][SQL] Add hash probes metric for shuffled hash join #35686
c21 wants to merge 1 commit into apache:master from c21/probe-metrics
Conversation
cc @cloud-fan could you help take a look when you have time? Thanks.
```scala
 */
private def updateIndex(key: Long, address: Long): Unit = {
  numKeyLookups += 1
  numProbes += 1
```
hmm, do we need to track the probe time when building the hash relation?
@cloud-fan - This matches the behavior of UnsafeHashedRelation, which also updates the lookup/probe metrics while building the hash relation. I guess it would be good to keep LongHashedRelation consistent with UnsafeHashedRelation here?
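As context for the `numKeyLookups`/`numProbes` counters being discussed, here is a minimal sketch of how an open-addressing map counts probes per lookup. This is an illustration only, not Spark's actual `LongToUnsafeRowMap` code: the `ProbeCountingMap` name, the linear-probing scheme, and the `-1L` empty-slot sentinel are all assumptions.

```scala
// A minimal open-addressing hash set that counts probes, loosely modeled on
// the numKeyLookups/numProbes counters above. Illustrative sketch only.
class ProbeCountingMap(capacity: Int) {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of two")
  private val keys = Array.fill[Long](capacity)(-1L) // -1L marks an empty slot
  private val mask = capacity - 1
  var numKeyLookups = 0L // incremented once per put call
  var numProbes = 0L     // incremented once per slot visited, so >= numKeyLookups

  def put(key: Long): Unit = {
    numKeyLookups += 1
    var pos = key.hashCode & mask
    numProbes += 1
    // Linear probing: advance until we find the key or an empty slot.
    while (keys(pos) != -1L && keys(pos) != key) {
      pos = (pos + 1) & mask
      numProbes += 1
    }
    keys(pos) = key
  }

  // "avg hash probes per key": 1.0 means every key landed on its first slot.
  def avgProbesPerKey: Double =
    if (numKeyLookups == 0) 0.0 else numProbes.toDouble / numKeyLookups
}
```

With a power-of-two capacity of 8, keys `0L` and `8L` collide on slot 0, so the second insert needs two probes and the average rises to 1.5.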
thanks, merging to master!
Thank you @cloud-fan for the review!
### What changes were proposed in this pull request?

Hash aggregate already has a SQL metric that tracks the number of hash probes per looked-up key. This PR adds a similar metric for shuffled hash join, to give some insight into hash probing performance. It also renames the existing hash aggregate metric (and related method names) from `avg hash probe bucket list iters` to `avg hash probes per key`, as the original name is quite obscure.

### Why are the changes needed?

To surface shuffled hash join probing performance in the Spark web UI (and allow metrics collection). The closer the metric is to 1.0, the better the probing performance.

### Does this PR introduce _any_ user-facing change?

Yes, the added SQL metric. Will attach a screenshot later.

### How was this patch tested?

The modified unit test in `SQLMetricsSuite.scala`.

Closes apache#35686 from c21/probe-metrics.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
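To illustrate the "closer to 1.0 is better" reading of the metric, here is a hedged sketch of how per-task probe counts could be merged into a single average. This is an illustration only, not Spark's `SQLMetrics` implementation; `ProbeStats` is a hypothetical name.

```scala
// Each task reports (probes, lookups); merging sums both sides, and the
// average is computed from the merged totals. Illustrative sketch only.
final case class ProbeStats(probes: Long, lookups: Long) {
  def merge(other: ProbeStats): ProbeStats =
    ProbeStats(probes + other.probes, lookups + other.lookups)

  // 1.0 means every looked-up key was found on its first probe.
  def avgProbesPerKey: Double =
    if (lookups == 0) 0.0 else probes.toDouble / lookups
}
```

Merging a task that did 3 probes over 2 lookups with one that did 5 probes over 4 lookups gives 8 probes over 6 lookups, i.e. an average of about 1.33 probes per key.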
@c21 @cloud-fan This has caused a performance regression in our tests, where broadcast hash join is 5x slower. I could not figure out why it caused a regression, but it clearly goes away when the commit is reverted.
Probably because adding a new metric in a critical code path has perf overhead. @c21 can you open a PR to revert it? That gives us more time to think about how to add this metric without significant perf overhead in Spark 3.4.
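As a sketch of the kind of mitigation being discussed here (illustrative only; `probeAll` and the use of `AtomicLong` are assumptions, not Spark's actual metric plumbing): accumulating counts in a plain local variable and flushing once per task keeps the hot loop free of shared-state writes.

```scala
import java.util.concurrent.atomic.AtomicLong

// Accumulate in a local counter inside the hot loop, then publish once.
// One contended write per task instead of one per row. Illustrative sketch.
def probeAll(keys: Array[Long], sharedProbes: AtomicLong): Unit = {
  var localProbes = 0L
  var i = 0
  while (i < keys.length) {
    // ... the actual hash lookup would happen here ...
    localProbes += 1 // cheap: a register increment, no cross-thread sharing
    i += 1
  }
  sharedProbes.addAndGet(localProbes) // single flush at the end of the task
}
```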
@cloud-fan and @somani - makes sense, let me revert this to unblock the release for now.
Revert "[SPARK-38354][SQL] Add hash probes metric for shuffled hash join"

This reverts commit 1584366, as the original PR caused the performance regression reported in #35686 (comment).

Closes #36338 from c21/revert-metrics.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Revert "[SPARK-38354][SQL] Add hash probes metric for shuffled hash join"

This reverts commit 1584366, as the original PR caused the performance regression reported in #35686 (comment).

Closes #36338 from c21/revert-metrics.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6b5a1f9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>