[SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark #32357

c21 · 2021-04-27T03:57:33Z

What changes were proposed in this pull request?

AggregateBenchmark currently measures only the vectorized fast hash map, not the row-based hash map, which is the one used by default. This PR adds the row-based hash map to the benchmark.

Java 8 benchmark run - https://github.com/c21/spark/actions/runs/787731549
Java 11 benchmark run - https://github.com/c21/spark/actions/runs/787742858
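
For readers who want to see how the four benchmark cases differ, here is a minimal sketch of the settings each case label is assumed to correspond to. It is not the benchmark code itself: `benchmark_cases` is a hypothetical helper, and the three config keys are my assumption about the Spark SQL settings involved (whole-stage codegen, the two-level aggregate map, and the vectorized hash map switch), so please check them against `SQLConf` before relying on them.

```python
# Assumed Spark SQL config keys (not stated in this PR; verify against SQLConf).
WHOLESTAGE = "spark.sql.codegen.wholeStage"
TWOLEVEL_MAP = "spark.sql.codegen.aggregate.map.twolevel.enabled"
VECTORIZED_MAP = "spark.sql.codegen.aggregate.map.vectorized.enable"


def benchmark_cases():
    """Hypothetical helper: map each case label to its assumed settings."""
    return {
        # Whole-stage codegen disabled; the fast hash map settings are not relevant here.
        "codegen = F": {WHOLESTAGE: "false"},
        # Codegen enabled, no fast hash map in front of the regular aggregation map.
        "codegen = T, hashmap = F": {
            WHOLESTAGE: "true",
            TWOLEVEL_MAP: "false",
            VECTORIZED_MAP: "false",
        },
        # Fast hash map enabled, row-based variant (the case this PR adds).
        "codegen = T, row-based hashmap = T": {
            WHOLESTAGE: "true",
            TWOLEVEL_MAP: "true",
            VECTORIZED_MAP: "false",
        },
        # Fast hash map enabled, vectorized variant (the case already covered).
        "codegen = T, vectorized hashmap = T": {
            WHOLESTAGE: "true",
            TWOLEVEL_MAP: "true",
            VECTORIZED_MAP: "true",
        },
    }


if __name__ == "__main__":
    for label, conf in benchmark_cases().items():
        print(label, conf)
```

Under these assumptions, the only difference between the row-based and vectorized cases is the vectorized switch.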

Why are the changes needed?

To establish and track baseline performance for the different fast hash maps used in hash aggregation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests, since this change only touches benchmark code.

SparkQA · 2021-04-27T05:26:16Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42501/

SparkQA · 2021-04-27T05:26:18Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42501/

c21 · 2021-04-27T06:43:13Z

@cloud-fan could you help take a look when you have time? Thanks.

cloud-fan · 2021-04-27T06:50:44Z

sql/core/benchmarks/AggregateBenchmark-jdk11-results.txt

+codegen = F                                       10873          10998         176          7.7         129.6       1.0X
+codegen = T, hashmap = F                           5906           6005          95         14.2          70.4       1.8X
+codegen = T, row-based hashmap = T                 2325           2410          94         36.1          27.7       4.7X
+codegen = T, vectorized hashmap = T                1185           1259          78         70.8          14.1       9.2X


Interesting. We should probably pick the vectorized hashmap under certain conditions, e.g. a single group key.

@cloud-fan - this sounds interesting to me. Created SPARK-35241 as a follow-up, thanks.
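
As a quick sanity check on the numbers quoted in this thread, the sketch below recomputes the last column from the first numeric column of each row. It assumes the first numeric column is the best time in milliseconds and the last is the speedup relative to the `codegen = F` row; `relative_speedups` is a hypothetical helper, not part of the benchmark.

```python
# First numeric column of each quoted row (assumed to be best time in ms).
BEST_MS = {
    "codegen = F": 10873,
    "codegen = T, hashmap = F": 5906,
    "codegen = T, row-based hashmap = T": 2325,
    "codegen = T, vectorized hashmap = T": 1185,
}


def relative_speedups(best_ms, baseline):
    """Hypothetical helper: baseline time divided by each case's time, to one decimal."""
    base = best_ms[baseline]
    return {name: round(base / ms, 1) for name, ms in best_ms.items()}


speedups = relative_speedups(BEST_MS, "codegen = F")

# Ratio of the two fast hash map variants on this run.
vectorized_vs_row = round(
    BEST_MS["codegen = T, row-based hashmap = T"]
    / BEST_MS["codegen = T, vectorized hashmap = T"],
    1,
)

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: {factor}X")
    print(f"vectorized vs row-based: {vectorized_vs_row}X")
```

The recomputed factors match the quoted 1.0X, 1.8X, 4.7X and 9.2X, and on this JDK 11 run the vectorized map comes out roughly 2x faster than the row-based one, which is the gap the comment above is pointing at.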

cloud-fan · 2021-04-27T06:53:37Z

thanks, merging to master!

SparkQA · 2021-04-27T08:59:00Z

Test build #137981 has finished for PR 32357 at commit 469ac63.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

c21 · 2021-04-27T21:08:12Z

Thank you @cloud-fan for review!

c21 added 2 commits April 26, 2021 20:13

Add row-based hash map into aggregate benchmark

4db9c34

Update benchmark result

469ac63

github-actions bot added the SQL label Apr 27, 2021

cloud-fan reviewed Apr 27, 2021

cloud-fan approved these changes Apr 27, 2021

cloud-fan changed the title ~~[SPARK-35235][SQL] Add row-based hash map into aggregate benchmark~~ [SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark Apr 27, 2021

cloud-fan closed this in c4ad86f Apr 27, 2021

c21 deleted the agg-benchmark branch April 27, 2021 21:08
