[SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark #32357

Closed
wants to merge 2 commits into from

@c21 c21 commented Apr 27, 2021

What changes were proposed in this pull request?

AggregateBenchmark currently measures only the vectorized fast hash map, not the row-based hash map (which is used by default). This PR adds the row-based hash map to the benchmark.

java 8 benchmark run - https://github.com/c21/spark/actions/runs/787731549
java 11 benchmark run - https://github.com/c21/spark/actions/runs/787742858

Why are the changes needed?

To establish and track baseline performance numbers for the different fast hash maps used in hash aggregate.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests, as this PR only touches benchmark code.

@github-actions github-actions bot added the SQL label Apr 27, 2021

SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42501/

SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42501/

c21 commented Apr 27, 2021

@cloud-fan could you help take a look when you have time? Thanks.

Case                                   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
codegen = F                                    10873         10998        176        7.7        129.6      1.0X
codegen = T, hashmap = F                        5906          6005         95       14.2         70.4      1.8X
codegen = T, row-based hashmap = T              2325          2410         94       36.1         27.7      4.7X
codegen = T, vectorized hashmap = T             1185          1259         78       70.8         14.1      9.2X
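
As a quick sanity check on the rows above, here is a minimal Python sketch that recomputes the relative speedups. It assumes the first numeric column is the best time in milliseconds and that the last column is the baseline best time divided by each case's best time, as in Spark's usual benchmark output. The dictionary and the function name `relative_speedups` are illustrative only and are not part of the PR.

```python
# Best time (ms) per case, copied from the benchmark rows above.
# Assumption: the first numeric column is the best time in milliseconds.
best_ms = {
    "codegen = F": 10873,
    "codegen = T, hashmap = F": 5906,
    "codegen = T, row-based hashmap = T": 2325,
    "codegen = T, vectorized hashmap = T": 1185,
}


def relative_speedups(times, baseline):
    """Return baseline_time / case_time for every case, rounded to one decimal."""
    base = times[baseline]
    return {name: round(base / t, 1) for name, t in times.items()}


speedups = relative_speedups(best_ms, "codegen = F")
for name, factor in speedups.items():
    print(f"{name:<38}{factor:.1f}X")

# Direct comparison of the two fast hash maps in this particular run.
vectorized_vs_row = (
    best_ms["codegen = T, row-based hashmap = T"]
    / best_ms["codegen = T, vectorized hashmap = T"]
)
print(f"vectorized vs row-based: {vectorized_vs_row:.2f}x")
```

The recomputed factors agree with the reported 1.0X, 1.8X, 4.7X and 9.2X. In this run the vectorized map's best time is roughly half that of the row-based map.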
Contributor

interesting, we should probably pick vectorized hashmap under certain conditions.

Contributor

e.g. single group key.

Contributor Author

@cloud-fan - this sounds interesting to me. Created SPARK-35241 for followup, thanks.

@cloud-fan cloud-fan changed the title from "[SPARK-35235][SQL] Add row-based hash map into aggregate benchmark" to "[SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark" Apr 27, 2021
@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in c4ad86f Apr 27, 2021

SparkQA commented Apr 27, 2021

Test build #137981 has finished for PR 32357 at commit 469ac63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

c21 commented Apr 27, 2021

Thank you @cloud-fan for review!

@c21 c21 deleted the agg-benchmark branch April 27, 2021 21:08