[SPARK-12888][SQL][follow-up] benchmark the new hash expression #10917
Conversation
@nongli maybe we should just use the simpler multiplication and addition?
@cloud-fan Simple is just a single int, right? It's not even doing anything in the previous case?
@nongli It's not doing anything to get the hash code of the int field, but it does a simple multiplication and addition to get the hash code of the row.
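The multiply-and-add row hash being discussed can be sketched as follows. This is a hypothetical illustration in the style of `java.util.Arrays.hashCode`, not Spark's actual implementation; the seed 17 and the class name are assumptions for the example:

```java
// Hypothetical sketch of a multiply-and-add row hash over int fields,
// in the style of java.util.Arrays.hashCode. Not Spark's actual code.
public class RowHashSketch {
    static int simpleRowHash(int[] fields) {
        int h = 17;  // arbitrary non-zero seed (assumption)
        for (int v : fields) {
            // An int field's hash code is the value itself, so there is
            // no per-field work beyond this single combine step.
            h = 31 * h + v;
        }
        return h;
    }
}
```

For a single-int row this reduces to one multiply and one add, which is why the interpreted path is hard to beat on the `simple` case.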
Test build #50072 has finished for PR 10917 at commit
LGTM. We can have different hash functions with different entropy later, but this seems okay to me.
No, let's re-run them when the results are easier to explain. Can you also tune the iteration count to a higher value? The harness does some rounding with the less significant digits.
Test build #50634 has started for PR 10917 at commit
test this please
Test build #50640 has finished for PR 10917 at commit
retest this please
Test build #50642 has finished for PR 10917 at commit
So many flaky tests...
retest this please
Test build #50653 has finished for PR 10917 at commit
```
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
Hash For map:                  Avg Time(ms)    Avg Rate(M/s)    Relative Rate
-------------------------------------------------------------------------------
interpreted version                64709.73             0.00           1.00 X
```
How long does this benchmark take to run? This looks really long. I think we should keep benchmarks running in a low number of seconds total if possible.
cc @cloud-fan on follow-up
Force-pushed from e443f32 to 315af8c.
```diff
@@ -124,7 +124,7 @@ private[spark] object Benchmark {
   }
   val best = runTimes.min
   val avg = runTimes.sum / iters
-  Result(avg / 1000000, num / (best / 1000), best / 1000000)
+  Result(avg / 1000000, num.toDouble / (best / 1000), best / 1000000)
```
We need to keep the precision here. The `bestMs`/`avgMs` can be well controlled within an appropriate range, but the `rate` can't. And we use `rate` as a divisor later, so if `rate` is small (assume we are benchmarking some slow operations), we will get a large deviation. BTW, we use `%10.1f` to print `rate`, but previously `rate` was always integral.
cc @davies
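A small illustration of the precision issue with hypothetical numbers (not taken from the benchmark): with all-integer arithmetic, a slow operation's rate truncates to zero, while widening to double (as `num.toDouble` does in the patch) keeps the fraction:

```java
// Illustration of the precision fix in the rate computation.
// "bestNs" plays the role of the best run time in nanoseconds; the
// concrete numbers in the test are made up for demonstration.
public class RatePrecision {
    // Old behavior: integer division truncates toward zero.
    static long truncatedRate(long num, long bestNs) {
        return num / (bestNs / 1000);
    }
    // New behavior: widen the dividend to double first, as in num.toDouble.
    static double preciseRate(long num, long bestNs) {
        return (double) num / (bestNs / 1000);
    }
}
```

Anything later divided by the truncated rate (for example the relative-speed column) inherits the full error, which is the "large deviation" mentioned above.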
Hot fixed in master.
Test build #50858 has finished for PR 10917 at commit
```
Hash For normal:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
interpreted version             2209 / 2271          0.9        1053.4       1.0X
codegen version                 1887 / 2018          1.1         899.9       1.2X
```
Why is the generated version slower?
The codegen version is 20% faster, because it doesn't have runtime reflection.
Sorry, I read it wrong (maybe I commented on the wrong line; I meant the previous result).
Test build #50899 has finished for PR 10917 at commit
LGTM, merging this into master, thanks!
Adds the benchmark results as comments.

The codegen version is slower than the interpreted version for the `simple` case because of 3 reasons:

1. `Murmur3_x86_32.hashInt` vs simple multiplication and addition.
2. A `GenerateHasher` that can generate code to return the hash value directly got about a 60% speed up for the `simple` case; is it worth it?
3. The `simple` case only has one int field, so the runtime reflection may be removed by branch prediction, which makes the interpreted version faster.

The `array` case is also slow for similar reasons, e.g. array elements are of the same type, so the interpreted version can probably get rid of runtime reflection via branch prediction.
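To make the cost gap in reason 1 concrete, here is a standalone reimplementation of the MurmurHash3 x86_32 single-int path (the same published algorithm that `Murmur3_x86_32.hashInt` implements; the constants are the standard MurmurHash3 ones), next to the interpreted version's one-multiply-one-add combine. The class and method names are for illustration only:

```java
// Standalone sketch contrasting the two hashing costs discussed above.
public class HashCost {
    // MurmurHash3 x86_32 for a single 4-byte int, using the standard
    // published constants: mix the word, mix into the running hash,
    // then run the finalization avalanche.
    static int murmur3HashInt(int input, int seed) {
        int k1 = input * 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;
        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;
        h1 ^= 4;           // total length in bytes
        h1 ^= h1 >>> 16;   // finalization (avalanche) mix
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // The interpreted version's per-field combine: one multiply, one add.
    static int simpleCombine(int h, int v) {
        return 31 * h + v;
    }
}
```

The Murmur path costs several multiplies, rotates, and xors per int, so for a row with a single int field the extra mixing alone can outweigh what codegen saves by avoiding reflection.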