
[SPARK-21052][SQL] Add hash map metrics to join #18301

Closed
wants to merge 12 commits

Conversation

viirya (Member) commented Jun 14, 2017

What changes were proposed in this pull request?

This adds an average hash map probe metric to join operators such as `BroadcastHashJoin` and `ShuffledHashJoin`.

This PR adds an API to `HashedRelation` to get the average hash map probe count.

How was this patch tested?

Related test cases are added.
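As a rough illustration of the change's shape (hypothetical names such as `HashedRelationLike`, `AvgMetric`, and `HashJoinSketch`; not the actual Spark code): the hashed relation exposes an average-probe figure, and the join operator records it in a SQL metric at the end of the task.

```scala
// Minimal sketch, assuming hypothetical names; not the actual Spark code.
trait HashedRelationLike {
  // The API this PR adds to HashedRelation.
  def getAverageProbesPerLookup: Double
}

final class AvgMetric(val name: String) {
  private var value: Double = 0.0
  def set(v: Double): Unit = { value = v }
  def get: Double = value
}

class HashJoinSketch(relation: HashedRelationLike) {
  val avgHashProbe = new AvgMetric("avg hash probe")
  // Called once per task, after the probe side has been fully consumed.
  def onTaskEnd(): Unit = avgHashProbe.set(relation.getAverageProbesPerLookup)
}
```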

SparkQA commented Jun 14, 2017

Test build #78047 has finished for PR 18301 at commit 14e65e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya changed the title from [SPARK-21052][SQL][WIP] Add hash map metrics to join to [SPARK-21052][SQL] Add hash map metrics to join on Jun 14, 2017
rxin (Contributor) commented Jun 15, 2017

Can you put a screenshot of the UI up, for both join and aggregate?

viirya (Member, author) commented Jun 15, 2017

The screenshot of BroadcastHashJoin:

[screenshot]

viirya (Member, author) commented Jun 15, 2017

The screenshot of HashAggregate:

[screenshot]

viirya (Member, author) commented Jun 15, 2017

The screenshot of ShuffledHashJoin:

[screenshot]

@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
private def updateIndex(key: Long, address: Long): Unit = {
var pos = firstSlot(key)
assert(numKeys < array.length / 2)
numKeyLookups += 1
Contributor:

you should also add this code to the get and the getValue methods.

viirya (Member, author):

Should we? It seems to me that we should only care about hash collisions that happen when inserting data into the hash map.

Contributor:

IMO we should. The number of required probes differs per key, and also depends on the order in which the map was constructed. If you combine this with some skew and missing keys, the number of probes can be much higher than expected.

You could even argue that we do not really care about the number of probes when building the map.

viirya (Member, author):

Yeah, OK, I think you're right. We should also care about collisions when looking up keys in the join operator. I'll update this in the next commit.
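As context for this thread, a minimal self-contained sketch (a hypothetical `ProbeCountingMap`, not the actual `LongToUnsafeRowMap` code) of what the agreed-upon counting looks like: one lookup per key and one probe per slot visited, on both the insert path and the lookup path. It assumes the table stays under half full (as the assert in the diff above enforces), so the probe loops terminate.

```scala
// Hypothetical open-addressing map illustrating the probe counting discussed above.
class ProbeCountingMap(capacity: Int) {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of two")
  private val keys = new Array[Long](capacity)
  private val used = new Array[Boolean](capacity)
  private var numKeyLookups = 0L
  private var numProbes = 0L

  // Insert path: count one lookup per key and one probe per slot inspected.
  def updateIndex(key: Long): Unit = {
    numKeyLookups += 1
    numProbes += 1
    var pos = key.hashCode & (capacity - 1)
    while (used(pos) && keys(pos) != key) {   // linear probing
      pos = (pos + 1) & (capacity - 1)
      numProbes += 1
    }
    keys(pos) = key
    used(pos) = true
  }

  // Lookup path: per the review above, counted the same way.
  def contains(key: Long): Boolean = {
    numKeyLookups += 1
    numProbes += 1
    var pos = key.hashCode & (capacity - 1)
    while (used(pos)) {
      if (keys(pos) == key) return true
      pos = (pos + 1) & (capacity - 1)
      numProbes += 1
    }
    false
  }

  def getAverageProbesPerLookup: Double = numProbes.toDouble / numKeyLookups
}
```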

Contributor:

Ain't you on a beach somewhere?!

viirya (Member, author):

Thanks for the review, even while you're on a beach.

rxin (Contributor) commented Jun 15, 2017

I'd shorten it to "avg hash probe". Also do we really need min, med, max? Maybe just a single global avg?

viirya (Member, author) commented Jun 15, 2017

So just show the global average of the avg hash probe metrics across all tasks? If there's skew, wouldn't we want to see min, med, max?

rxin (Contributor) commented Jun 15, 2017

yes but i just feel it is getting very long and verbose ..

rxin (Contributor) commented Jun 15, 2017

also the avg probe probably shouldn't be an integer. at least we should show something like 1.9?

viirya (Member, author) commented Jun 15, 2017

Because `SQLMetric` only stores a long value, I was using a trick: multiply the avg probe by 1000 to get a long.

When preparing the values for the UI, divide the long by 1000 to get a float back.

So it's a workaround for the long-based `SQLMetric`, but in the end I didn't use it.

Does it sound too hacky to you?

viirya (Member, author) commented Jun 15, 2017

Maybe just min and max? Or med and max?

viirya (Member, author) commented Jun 15, 2017

The screenshot of the UI showing float numbers instead of integers:

[screenshot]

SparkQA commented Jun 15, 2017

Test build #78088 has finished for PR 18301 at commit bf4618a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78089 has finished for PR 18301 at commit 438d0e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78091 has finished for PR 18301 at commit 69e8216.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Because `SQLMetric` only stores a long value, in order to store a double average metric we
// multiply the given double by a base integer. When showing the metric, it is divided by the
// base integer to restore the double.
def setWithDouble(v: Double): Unit = _value = (v * SQLMetrics.baseForAvgMetric).toLong
viirya (Member, author) commented Jun 15, 2017:

Not sure if you think this is a bit hacky. To store a float value into `SQLMetric`, I currently have no better idea. Any suggestions welcome.
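For the display side, a hedged sketch of the inverse transformation (assuming `baseForAvgMetric` is 1000, matching the "multiply by 1000" trick described above; `AvgMetricDisplay` is a hypothetical name):

```scala
// Hypothetical inverse of setWithDouble for rendering in the UI: the stored
// long is scaled back down and formatted with one decimal place.
object AvgMetricDisplay {
  val baseForAvgMetric = 1000  // assumed value, per the "multiply by 1000" trick above

  def stringValue(stored: Long): String =
    "%.1f".format(stored.toDouble / baseForAvgMetric)
}

// e.g. an average of 1.7 probes is stored as 1700L:
// AvgMetricDisplay.stringValue(1700L) == "1.7"
```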

viirya force-pushed the SPARK-21052 branch 2 times, most recently from eb979dd to 59c3e93, on June 15, 2017
SparkQA commented Jun 15, 2017

Test build #78105 has finished for PR 18301 at commit 6b71956.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78107 has finished for PR 18301 at commit eb979dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78109 has finished for PR 18301 at commit 59c3e93.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member, author) commented Jun 16, 2017

cc @cloud-fan @gatorsmile for review.

rxin (Contributor) commented Jun 29, 2017

hey i didn't track super closely, but it is pretty important to show at least one more digit, e.g. 1.7, rather than just 2.

viirya (Member, author) commented Jun 29, 2017

@rxin I just reverted it in the previous commits. @cloud-fan should I bring it back?

SparkQA commented Jun 29, 2017

Test build #78859 has finished for PR 18301 at commit 9cbd627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor):
@viirya ok let's add it back

//
// WholeStageCodegen enabled:
// ... ->
// WholeStageCodegen(nodeId = 0, Filter(nodeId = 4) -> Project(nodeId = 3) ->
Contributor:

Can you format it a little bit, to indicate that we only have a WholeStageCodegen and all the other plans are the inner children of WholeStageCodegen?

SparkQA commented Jun 29, 2017

Test build #78862 has finished for PR 18301 at commit 9a048f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 29, 2017

Test build #78878 has finished for PR 18301 at commit 27cf740.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member, author) commented Jun 29, 2017

Verified with the UI. It now shows float values with one decimal place.

[screenshot]

viirya (Member, author) commented Jun 29, 2017

retest this please.

cloud-fan (Contributor):
LGTM, pending jenkins

SparkQA commented Jun 29, 2017

Test build #78882 has finished for PR 18301 at commit 27cf740.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member, author) commented Jun 29, 2017

retest this please.

SparkQA commented Jun 29, 2017

Test build #78893 has finished for PR 18301 at commit 27cf740.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor):
thanks, merging to master!

asfgit closed this in 18066f2 on Jun 29, 2017
viirya (Member, author) commented Jun 29, 2017

Thanks! @cloud-fan @rxin @hvanhovell @dongjoon-hyun

robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#18301 from viirya/SPARK-21052.
/**
* Returns the average number of probes per key lookup.
*/
def getAverageProbesPerLookup(): Double
Member:

def getAverageProbesPerLookup(): Double -> def getAverageProbesPerLookup: Double

viirya (Member, author):

If you insist on this change, I can do it in a related PR or a follow-up PR.

@@ -273,6 +279,8 @@ private[joins] class UnsafeHashedRelation(
override def read(kryo: Kryo, in: Input): Unit = Utils.tryOrIOException {
read(in.readInt, in.readLong, in.readBytes)
}

override def getAverageProbesPerLookup(): Double = binaryMap.getAverageProbesPerLookup()
Member:

override def getAverageProbesPerLookup: Double = binaryMap.getAverageProbesPerLookup

gatorsmile (Member) commented Jun 30, 2017

How about getting rid of numHashCollisions and timeSpentResizingNs in BytesToBytesMap in a follow-up PR?

Also remove the useless () in the same PR.

gatorsmile (Member):
A dumb question: why not report numHashCollisions/numKeyLookups?

viirya (Member, author) commented Jun 30, 2017

numHashCollisions is increased only when the hash of the key being inserted equals the hash of an existing key in the map but the actual keys are not equal.

But we want to show the average number of probes, which includes both cases, different key hashes and equal key hashes (hash collisions), before finding an empty slot in the map or verifying that the key already exists in it.

gatorsmile (Member):
The average number of probes will be 1 if there is no collision, right?

gatorsmile (Member):
Let me rephrase my question: why would users care about the average number of probes if they already know the average number of collisions?

viirya (Member, author) commented Jun 30, 2017

My understanding is that even if a key has no collision when inserting, it can still need multiple probes to find an empty slot. Personally I think the number of collisions doesn't tell users much; it reflects how well the hash function is designed.
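A tiny worked example of this point (REPL-style sketch with hypothetical numbers, assuming the slot is computed as `hashCode & (capacity - 1)`): two keys with different hash codes can still land in the same slot once the hash is masked to the table size, so a second probe is needed even though numHashCollisions stays 0.

```scala
// Different hashes, same masked slot: probing without any "hash collision"
// in the strict numHashCollisions sense.
object SameSlotExample extends App {
  val capacity = 16
  val k1 = 1L
  val k2 = 17L
  assert(k1.hashCode != k2.hashCode)                    // 1 vs 17: hashes differ
  assert((k1.hashCode & (capacity - 1)) ==
         (k2.hashCode & (capacity - 1)))                // both map to slot 1
  // Inserting k1 then k2 costs 1 + 2 = 3 probes while numHashCollisions stays 0.
}
```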

gatorsmile (Member):
@viirya Could you show me the code that needs multiple probes to find an empty slot?

viirya (Member, author) commented Jul 1, 2017

In the probe at L473, if the slot pointed to by the hash code is not empty, it's possible there's a hash collision (equal hash codes, different keys), but it's also possible that the slot is occupied by a key with a different hash (the if condition at L475 is false). In either case, we continue looking for an empty slot by stepping forward at L492, and increase the number of probes.
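A hedged sketch of the loop being described here (illustrative names, not the exact BytesToBytesMap code): an occupied slot can fail either the full-hash check ("different hash, same masked slot") or the key check (a true collision), and each forward step is one more probe. It assumes a power-of-two capacity and a table that is never completely full, so the loop terminates.

```scala
// Illustrative probe loop: slotHash keeps each occupied slot's full hash.
object ProbeLoopSketch {
  def lookup(hash: Int, keyMatches: Int => Boolean,
             slotHash: Array[Int], slotUsed: Array[Boolean]): (Int, Long) = {
    val mask = slotUsed.length - 1          // capacity is a power of two
    var pos = hash & mask
    var probes = 1L
    while (slotUsed(pos)) {
      if (slotHash(pos) == hash && keyMatches(pos)) return (pos, probes)
      pos = (pos + 1) & mask                // the forward step described above
      probes += 1
    }
    (-1, probes)                            // reached an empty slot: key absent
  }
}
```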

viirya deleted the SPARK-21052 branch on December 27, 2023