[SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark #32357
@cloud-fan could you help take a look when you have time? Thanks.
Case                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F                                    10873          10998         176         7.7         129.6       1.0X
codegen = T, hashmap = F                        5906           6005          95        14.2          70.4       1.8X
codegen = T, row-based hashmap = T              2325           2410          94        36.1          27.7       4.7X
codegen = T, vectorized hashmap = T             1185           1259          78        70.8          14.1       9.2X
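As a quick sanity check on the numbers above, the sketch below recomputes the last column from the first numeric column. It assumes the first numeric column is the best time in milliseconds and that the last column is the speedup relative to the first row ("codegen = F"); the helper name `relative_speedups` is hypothetical and not part of Spark.

```python
# Best time (ms) per benchmark case, copied from the results quoted above.
# Assumption: the first numeric column of each row is the best time in ms.
best_ms = {
    "codegen = F": 10873,
    "codegen = T, hashmap = F": 5906,
    "codegen = T, row-based hashmap = T": 2325,
    "codegen = T, vectorized hashmap = T": 1185,
}


def relative_speedups(times, baseline):
    """Return each case's speedup over the baseline case, rounded to 0.1."""
    base = times[baseline]
    return {name: round(base / t, 1) for name, t in times.items()}


speedups = relative_speedups(best_ms, "codegen = F")
for name, speedup in speedups.items():
    print(f"{name}: {speedup}X")

# How much faster the vectorized hash map is than the row-based one here.
vectorized_vs_row = (
    best_ms["codegen = T, row-based hashmap = T"]
    / best_ms["codegen = T, vectorized hashmap = T"]
)
print(f"vectorized vs row-based: {vectorized_vs_row:.2f}x")
```

Under these assumptions the recomputed ratios agree with the reported column, and the vectorized hash map comes out roughly twice as fast as the row-based one for this benchmark case.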
Interesting, we should probably pick the vectorized hashmap under certain conditions.
e.g. single group key.
@cloud-fan - this sounds interesting to me. I created SPARK-35241 as a follow-up, thanks.
thanks, merging to master!
Thank you @cloud-fan for review!
What changes were proposed in this pull request?
AggregateBenchmark currently tests only the vectorized fast hash map, not the row-based hash map (which is used by default). This PR adds the row-based hash map to the benchmark.
Java 8 benchmark run - https://github.com/c21/spark/actions/runs/787731549
Java 11 benchmark run - https://github.com/c21/spark/actions/runs/787742858
Why are the changes needed?
To establish and track a performance baseline for the different fast hash maps used in hash aggregate.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests, as this PR only touches benchmark code.