
[SPARK-21052][SQL] Add hash map metrics to join #18301

Closed
wants to merge 12 commits

Conversation

viirya (Member) commented Jun 14, 2017

What changes were proposed in this pull request?

This adds an average hash map probe metric to join operators such as `BroadcastHashJoin` and `ShuffledHashJoin`.

This PR adds an API to `HashedRelation` to get the average hash map probe count.

How was this patch tested?

Related test cases are added.
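As a rough illustration of the change's shape (hypothetical names such as `HashedRelationLike`, `AvgMetric`, and `HashJoinSketch`; not the actual Spark code): the hashed relation exposes an average-probe figure, and the join operator records it in a SQL metric at the end of the task.

```scala
// Minimal sketch, assuming hypothetical names; not the actual Spark code.
trait HashedRelationLike {
  // The API this PR adds to HashedRelation.
  def getAverageProbesPerLookup: Double
}

final class AvgMetric(val name: String) {
  private var value: Double = 0.0
  def set(v: Double): Unit = { value = v }
  def get: Double = value
}

class HashJoinSketch(relation: HashedRelationLike) {
  val avgHashProbe = new AvgMetric("avg hash probe")
  // Called once per task, after the probe side has been fully consumed.
  def onTaskEnd(): Unit = avgHashProbe.set(relation.getAverageProbesPerLookup)
}
```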

SparkQA commented Jun 14, 2017

Test build #78047 has finished for PR 18301 at commit 14e65e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya changed the title from [SPARK-21052][SQL][WIP] Add hash map metrics to join to [SPARK-21052][SQL] Add hash map metrics to join on Jun 14, 2017
rxin (Contributor) commented Jun 15, 2017

Can you put a screenshot of the UI up, for both join and aggregate?

viirya (Member, author) commented Jun 15, 2017

The screenshot of BroadcastHashJoin:

[screenshot]

viirya (Member, author) commented Jun 15, 2017

The screenshot of HashAggregate:

[screenshot]

viirya (Member, author) commented Jun 15, 2017

The screenshot of ShuffledHashJoin:

[screenshot]

@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
private def updateIndex(key: Long, address: Long): Unit = {
var pos = firstSlot(key)
assert(numKeys < array.length / 2)
numKeyLookups += 1
Contributor:

you should also add this code to the get and the getValue methods.

viirya (Member, author):

Should we? It seems to me that we should only care about hash collisions that happen when inserting data into the hash map.

Contributor:

IMO we should. The number of required probes differs per key, and also depends on the order in which the map was constructed. If you combine this with some skew and missing keys, the number of probes can be much higher than expected.

You could even argue that we do not really care about the number of probes when building the map.

viirya (Member, author):

Yeah, OK, I think you're right. We should also care about collisions when looking up keys in the join operator. I'll update this in the next commit.
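As context for this thread, a minimal self-contained sketch (a hypothetical `ProbeCountingMap`, not the actual `LongToUnsafeRowMap` code) of what the agreed-upon counting looks like: one lookup per key and one probe per slot visited, on both the insert path and the lookup path. It assumes the table stays under half full (as the assert in the diff above enforces), so the probe loops terminate.

```scala
// Hypothetical open-addressing map illustrating the probe counting discussed above.
class ProbeCountingMap(capacity: Int) {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of two")
  private val keys = new Array[Long](capacity)
  private val used = new Array[Boolean](capacity)
  private var numKeyLookups = 0L
  private var numProbes = 0L

  // Insert path: count one lookup per key and one probe per slot inspected.
  def updateIndex(key: Long): Unit = {
    numKeyLookups += 1
    numProbes += 1
    var pos = key.hashCode & (capacity - 1)
    while (used(pos) && keys(pos) != key) {   // linear probing
      pos = (pos + 1) & (capacity - 1)
      numProbes += 1
    }
    keys(pos) = key
    used(pos) = true
  }

  // Lookup path: per the review above, counted the same way.
  def contains(key: Long): Boolean = {
    numKeyLookups += 1
    numProbes += 1
    var pos = key.hashCode & (capacity - 1)
    while (used(pos)) {
      if (keys(pos) == key) return true
      pos = (pos + 1) & (capacity - 1)
      numProbes += 1
    }
    false
  }

  def getAverageProbesPerLookup: Double = numProbes.toDouble / numKeyLookups
}
```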

Contributor:

Ain't you on a beach somewhere?!

viirya (Member, author):

Thanks for the review, even while you're on a beach.

rxin (Contributor) commented Jun 15, 2017

I'd shorten it to "avg hash probe". Also do we really need min, med, max? Maybe just a single global avg?

viirya (Member, author) commented Jun 15, 2017

So just show the global average of the avg hash probe metrics across all tasks? If there's skew, wouldn't we want to see min, med, max?

rxin (Contributor) commented Jun 15, 2017

yes but i just feel it is getting very long and verbose ..

rxin (Contributor) commented Jun 15, 2017

also the avg probe probably shouldn't be an integer. at least we should show something like 1.9?

viirya (Member, author) commented Jun 15, 2017

Because `SQLMetric` only stores a long value, I was using a trick: multiply the avg probe by 1000 to get a long.

When preparing the values for the UI, divide the long by 1000 to get a float back.

So it's a workaround for the long-based `SQLMetric`, but in the end I didn't use it.

Does it sound too hacky to you?

viirya (Member, author) commented Jun 15, 2017

Maybe just min and max? Or med and max?

viirya (Member, author) commented Jun 15, 2017

The screenshot of the UI showing float numbers instead of integers:

[screenshot]

SparkQA commented Jun 15, 2017

Test build #78088 has finished for PR 18301 at commit bf4618a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78089 has finished for PR 18301 at commit 438d0e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78091 has finished for PR 18301 at commit 69e8216.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Because `SQLMetric` only stores a long value, in order to store a double average metric we
// multiply the given double by a base integer. When showing the metric, it is divided by the
// base integer to restore the double.
def setWithDouble(v: Double): Unit = _value = (v * SQLMetrics.baseForAvgMetric).toLong
viirya (Member, author) commented Jun 15, 2017:

Not sure if you think this is a bit hacky. To store a float value into `SQLMetric`, I currently have no better idea. Any suggestions welcome.
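For the display side, a hedged sketch of the inverse transformation (assuming `baseForAvgMetric` is 1000, matching the "multiply by 1000" trick described above; `AvgMetricDisplay` is a hypothetical name):

```scala
// Hypothetical inverse of setWithDouble for rendering in the UI: the stored
// long is scaled back down and formatted with one decimal place.
object AvgMetricDisplay {
  val baseForAvgMetric = 1000  // assumed value, per the "multiply by 1000" trick above

  def stringValue(stored: Long): String =
    "%.1f".format(stored.toDouble / baseForAvgMetric)
}

// e.g. an average of 1.7 probes is stored as 1700L:
// AvgMetricDisplay.stringValue(1700L) == "1.7"
```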

viirya force-pushed the SPARK-21052 branch 2 times, most recently from eb979dd to 59c3e93, on June 15, 2017
SparkQA commented Jun 15, 2017

Test build #78105 has finished for PR 18301 at commit 6b71956.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78107 has finished for PR 18301 at commit eb979dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78109 has finished for PR 18301 at commit 59c3e93.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member, author) commented Jun 16, 2017

cc @cloud-fan @gatorsmile for review.

rxin (Contributor) commented Jun 29, 2017

hey i didn't track super closely, but it is pretty important to show at least one more digit, e.g. 1.7, rather than just 2.

viirya (Member, author) commented Jun 29, 2017

@rxin I just reverted it in the previous commits. @cloud-fan should I bring it back?

SparkQA commented Jun 29, 2017

Test build #78859 has finished for PR 18301 at commit 9cbd627.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor):
@viirya ok let's add it back

//
// WholeStageCodegen enabled:
// ... ->
// WholeStageCodegen(nodeId = 0, Filter(nodeId = 4) -> Project(nodeId = 3) ->
Contributor:

Can you format it a little bit, to indicate that we only have a WholeStageCodegen and all the other plans are the inner children of WholeStageCodegen?

SparkQA commented Jun 29, 2017

Test build #78862 has finished for PR 18301 at commit 9a048f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 29, 2017

Test build #78878 has finished for PR 18301 at commit 27cf740.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member, author) commented Jun 29, 2017

Verified with the UI. It now shows float values with one decimal place.

[screenshot]

viirya (Member, author) commented Jun 29, 2017

retest this please.

cloud-fan (Contributor):
LGTM, pending jenkins

SparkQA commented Jun 29, 2017

Test build #78882 has finished for PR 18301 at commit 27cf740.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member, author) commented Jun 29, 2017

retest this please.

SparkQA commented Jun 29, 2017

Test build #78893 has finished for PR 18301 at commit 27cf740.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor):
thanks, merging to master!

asfgit closed this in 18066f2 on Jun 29, 2017
viirya (Member, author) commented Jun 29, 2017

Thanks! @cloud-fan @rxin @hvanhovell @dongjoon-hyun

robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#18301 from viirya/SPARK-21052.
/**
* Returns the average number of probes per key lookup.
*/
def getAverageProbesPerLookup(): Double
Member:

def getAverageProbesPerLookup(): Double -> def getAverageProbesPerLookup: Double

viirya (Member, author):

If you insist on this change, I can do it in a related PR or a follow-up PR.

@@ -273,6 +279,8 @@ private[joins] class UnsafeHashedRelation(
override def read(kryo: Kryo, in: Input): Unit = Utils.tryOrIOException {
read(in.readInt, in.readLong, in.readBytes)
}

override def getAverageProbesPerLookup(): Double = binaryMap.getAverageProbesPerLookup()
Member:

override def getAverageProbesPerLookup: Double = binaryMap.getAverageProbesPerLookup

gatorsmile (Member) commented Jun 30, 2017

How about getting rid of numHashCollisions and timeSpentResizingNs in BytesToBytesMap in a follow-up PR?

Also remove the useless () in the same PR.

gatorsmile (Member):
A dumb question: why not report numHashCollisions/numKeyLookups?

viirya (Member, author) commented Jun 30, 2017

numHashCollisions is increased only when the hash of the key being inserted equals the hash of an existing key in the map but the actual keys are not equal.

But we want to show the average number of probes, which includes both cases, different key hashes and equal key hashes (hash collisions), before finding an empty slot in the map or verifying that the key already exists in it.

gatorsmile (Member):
The average number of probes will be 1 if there is no collision, right?

gatorsmile (Member):
Let me rephrase my question: why would users care about the average number of probes if they already know the average number of collisions?

viirya (Member, author) commented Jun 30, 2017

My understanding is that even if a key has no collision when inserting, it can still need multiple probes to find an empty slot. Personally I think the number of collisions doesn't tell users much; it reflects how well the hash function is designed.
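A tiny worked example of this point (REPL-style sketch with hypothetical numbers, assuming the slot is computed as `hashCode & (capacity - 1)`): two keys with different hash codes can still land in the same slot once the hash is masked to the table size, so a second probe is needed even though numHashCollisions stays 0.

```scala
// Different hashes, same masked slot: probing without any "hash collision"
// in the strict numHashCollisions sense.
object SameSlotExample extends App {
  val capacity = 16
  val k1 = 1L
  val k2 = 17L
  assert(k1.hashCode != k2.hashCode)                    // 1 vs 17: hashes differ
  assert((k1.hashCode & (capacity - 1)) ==
         (k2.hashCode & (capacity - 1)))                // both map to slot 1
  // Inserting k1 then k2 costs 1 + 2 = 3 probes while numHashCollisions stays 0.
}
```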

gatorsmile (Member):
@viirya Could you show me the code that needs multiple probes to find an empty slot?

viirya (Member, author) commented Jul 1, 2017

In the probe at L473, if the slot pointed to by the hash code is not empty, it's possible there's a hash collision (equal hash codes, different keys), but it's also possible that the slot is occupied by a key with a different hash (the if condition at L475 is false). In either case, we continue looking for an empty slot by stepping forward at L492, and increase the number of probes.
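A hedged sketch of the loop being described here (illustrative names, not the exact BytesToBytesMap code): an occupied slot can fail either the full-hash check ("different hash, same masked slot") or the key check (a true collision), and each forward step is one more probe. It assumes a power-of-two capacity and a table that is never completely full, so the loop terminates.

```scala
// Illustrative probe loop: slotHash keeps each occupied slot's full hash.
object ProbeLoopSketch {
  def lookup(hash: Int, keyMatches: Int => Boolean,
             slotHash: Array[Int], slotUsed: Array[Boolean]): (Int, Long) = {
    val mask = slotUsed.length - 1          // capacity is a power of two
    var pos = hash & mask
    var probes = 1L
    while (slotUsed(pos)) {
      if (slotHash(pos) == hash && keyMatches(pos)) return (pos, probes)
      pos = (pos + 1) & mask                // the forward step described above
      probes += 1
    }
    (-1, probes)                            // reached an empty slot: key absent
  }
}
```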

viirya deleted the SPARK-21052 branch on December 27, 2023