[BEAM-14464] More efficient grouping keys in precombiner table. #17641

Merged Jun 24, 2022 (11 commits)

Conversation

robertwb
Contributor

The current code constructs, hashes, and compares full WindowedValues
for the grouping key, which ends up dominating the time spent in the
combining table when using trivial combiners (like Sum.integers()).
We only need to compare the (structural value of the) key and windows,
and can omit the windows in the global case.

Per added microbenchmarks, this is roughly a 50% improvement for
singly-windowed values, and roughly 2.5x improvement for the common
GlobalWindows case.
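
As a rough illustration of the idea above (not the code in this PR), the sketch below keys a toy summing table on only the structural key and the windows, and drops the windows entirely when everything is in the global window. A full WindowedValue also carries a timestamp and pane info, which are irrelevant for grouping. All names here (`GroupingKeySketch`, `WindowedGroupingKey`, `SumTable`) are hypothetical, and the sketch assumes the key passed in is already a structural value with sound `equals`/`hashCode`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Illustrative only: none of these names come from Beam's precombine table. */
class GroupingKeySketch {

  /** Grouping key for the windowed case: structural key plus windows, nothing else. */
  static final class WindowedGroupingKey {
    private final Object structuralKey;
    private final List<?> windows;
    // Cached because the table hashes the key on every lookup.
    // Assumes the windows list is not mutated after construction.
    private final int hash;

    WindowedGroupingKey(Object structuralKey, List<?> windows) {
      this.structuralKey = structuralKey;
      this.windows = windows;
      this.hash = 31 * Objects.hashCode(structuralKey) + windows.hashCode();
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof WindowedGroupingKey)) {
        return false;
      }
      WindowedGroupingKey that = (WindowedGroupingKey) o;
      return Objects.equals(structuralKey, that.structuralKey) && windows.equals(that.windows);
    }

    @Override
    public int hashCode() {
      return hash;
    }
  }

  /** Toy summing table keyed by the cheap grouping key. */
  static final class SumTable {
    private final boolean globallyWindowed;
    private final Map<Object, Long> sums = new HashMap<>();

    SumTable(boolean globallyWindowed) {
      this.globallyWindowed = globallyWindowed;
    }

    // In the global case the structural key alone identifies the group,
    // so no wrapper object is allocated and no windows are hashed or compared.
    private Object groupingKey(Object structuralKey, List<?> windows) {
      return globallyWindowed ? structuralKey : new WindowedGroupingKey(structuralKey, windows);
    }

    void add(Object structuralKey, List<?> windows, long value) {
      sums.merge(groupingKey(structuralKey, windows), value, Long::sum);
    }

    long get(Object structuralKey, List<?> windows) {
      return sums.getOrDefault(groupingKey(structuralKey, windows), 0L);
    }

    int size() {
      return sums.size();
    }
  }
}
```

A single table is either globally windowed or not, so the two kinds of grouping key never mix within one map.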


…owedValueCoder.

The latter skips a whole bunch of checks to generate the correctly specialized
WindowedValue type, especially for elements in the global window.
@github-actions github-actions bot added the java label May 12, 2022
@robertwb
Contributor Author

R: @y1chi

@lukecwik
Member

I have been looking for someone to review #17327, do you mind if we get that merged first and then take on this change?

Contributor

@y1chi y1chi left a comment

LGTM

@robertwb
Contributor Author

Thanks, Yichi! As mentioned, I'm going to hold off merging this until that other PR goes in.

@aaltay
Member

aaltay commented Jun 2, 2022

The other PR is merged. Could this be merged?

@robertwb
Contributor Author

Failures

org.apache.beam.sdk.io.aws2.kinesis.KinesisIOWriteTest.testWriteFailure
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRequestByteSizeLimits
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest.classMethod

look unrelated.

@robertwb
Contributor Author

Run Java PreCommit

@robertwb robertwb merged commit 305537f into apache:master Jun 24, 2022
@Abacn
Contributor

Abacn commented Jun 24, 2022

precommit is broken again due to checkStyle warnings

@robertwb
Contributor Author

robertwb commented Oct 11, 2022 via email
