
[BEAM-14104] Support shard aware aggregation in Kinesis writer. #17113

Merged: 1 commit into apache:master from the BEAM-14104-ShardAwareAggregation branch on Apr 7, 2022


@mosche (Member) commented Mar 17, 2022

This PR introduces shard-aware aggregation to improve aggregation results, making better use of the Kinesis API limits.

If random partitioning is sufficient, this can already be achieved using the explicit random partitioner. The benefit of that approach is that it doesn't require pulling shard details through the API.
However, the random partitioner uses a static configuration that cannot handle any resharding that may occur.
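To illustrate why a static configuration breaks down, here is a minimal Java sketch of a random partitioner that precomputes explicit hash keys for a fixed `numShards`. The class and method names are hypothetical and this is not Beam's actual implementation; it assumes the shards split the 128-bit hash key space into equally sized ranges, which only holds until the stream is resharded.

```java
import java.math.BigInteger;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: explicit hash keys computed once from a static shard count.
final class StaticRandomHashKeysSketch {
  private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128); // 2^128

  private final String[] explicitHashKeys;

  StaticRandomHashKeysSketch(int numShards) {
    if (numShards < 1) {
      throw new IllegalArgumentException("numShards must be >= 1");
    }
    explicitHashKeys = new String[numShards];
    BigInteger n = BigInteger.valueOf(numShards);
    for (int i = 0; i < numShards; i++) {
      // Lower bound of the i-th of numShards equally sized hash key ranges (assumed layout).
      explicitHashKeys[i] = HASH_SPACE.multiply(BigInteger.valueOf(i)).divide(n).toString();
    }
  }

  // Picks one of the precomputed keys uniformly at random for the next record.
  String nextExplicitHashKey() {
    return explicitHashKeys[ThreadLocalRandom.current().nextInt(explicitHashKeys.length)];
  }

  // Defensive copy of the precomputed keys.
  String[] keys() {
    return explicitHashKeys.clone();
  }
}
```

Because the keys are computed once, a later shard split or merge moves the shard boundaries, and the precomputed keys no longer line up with the actual shards.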

Shard-aware aggregation is implemented as follows:

  • Periodically, the writer pulls the hash key ranges assigned to shards. These are shared statically using an ObjectPool to minimize the number of API calls.

  • The configured partitioner is used to generate the partitionKey and the optional explicitHashKey for each record. If explicitHashKey is set, it is used as is, with no change in behavior. Otherwise, the lower bound of the hash key range of the target shard (determined from the hashed partitionKey) is used for aggregation and set as the explicitHashKey.

This ensures that an aggregated record only ever contains records that belong to the shard it is routed to, even while the stream is being resharded.
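The routing step can be sketched as follows. This is a hypothetical illustration, not the code merged in this PR: it assumes a snapshot of the open shards' hash key ranges (startingHashKey to endingHashKey) is already available, and it hashes the partition key with MD5 into a 128-bit unsigned integer, as Kinesis does. Hashing the UTF-8 bytes of the key, and returning null when no known range covers the hash, are assumptions of this sketch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of shard-aware explicit hash key selection.
final class ShardAwareHashKeySketch {
  // Snapshot of open shard ranges: startingHashKey -> endingHashKey (both inclusive).
  private final TreeMap<BigInteger, BigInteger> ranges = new TreeMap<>();

  ShardAwareHashKeySketch(Map<BigInteger, BigInteger> openShardRanges) {
    ranges.putAll(openShardRanges);
  }

  // Maps a partition key to a 128-bit unsigned integer using MD5 (UTF-8 bytes assumed).
  static BigInteger hashKey(String partitionKey) {
    try {
      byte[] digest =
          MessageDigest.getInstance("MD5").digest(partitionKey.getBytes(StandardCharsets.UTF_8));
      return new BigInteger(1, digest); // signum 1: interpret the digest as unsigned
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  // Returns the explicit hash key to aggregate and route the record under.
  String explicitHashKeyFor(String partitionKey, String explicitHashKey) {
    if (explicitHashKey != null) {
      return explicitHashKey; // an explicit hash key from the partitioner is used as is
    }
    BigInteger hash = hashKey(partitionKey);
    // The candidate shard is the one with the greatest starting key <= hash.
    Map.Entry<BigInteger, BigInteger> range = ranges.floorEntry(hash);
    if (range == null || hash.compareTo(range.getValue()) > 0) {
      return null; // no known shard covers this hash; route by partition key only
    }
    // The lower bound of the target shard's range groups all records for that shard.
    return range.getKey().toString();
  }
}
```

Using the range's lower bound as the grouping key means all records hashing into the same shard share one explicit hash key, so they can be packed into the same aggregated record.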

To simplify sharing the shard state statically, ClientPool was generalized to ObjectPool. This should be fine, as the code hasn't been released yet.
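A generic, reference-counted object pool along these lines might look like the sketch below. `ObjectPoolSketch`, `retain` and `release` are hypothetical names, not the actual Beam API. The idea it illustrates is that all writers configured with the same key share one instance (for example, a client or a cache of shard hash key ranges), which is created on first use and closed when the last user releases it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of a reference-counted pool of shared objects keyed by configuration.
final class ObjectPoolSketch<K, V> {
  private static final class Entry<T> {
    final T value;
    int refCount;

    Entry(T value) {
      this.value = value;
    }
  }

  private final Map<K, Entry<V>> pool = new HashMap<>();
  private final Function<K, V> factory; // creates the shared object for a key
  private final Consumer<V> finalizer; // closes it once nobody uses it anymore

  ObjectPoolSketch(Function<K, V> factory, Consumer<V> finalizer) {
    this.factory = factory;
    this.finalizer = finalizer;
  }

  // Returns the shared object for key, creating it on first use, and bumps its ref count.
  synchronized V retain(K key) {
    Entry<V> entry = pool.computeIfAbsent(key, k -> new Entry<>(factory.apply(k)));
    entry.refCount++;
    return entry.value;
  }

  // Drops one reference; returns true if this was the last one and the object was finalized.
  synchronized boolean release(K key) {
    Entry<V> entry = pool.get(key);
    if (entry == null) {
      return false;
    }
    if (--entry.refCount == 0) {
      pool.remove(key);
      finalizer.accept(entry.value);
      return true;
    }
    return false;
  }

  synchronized int size() {
    return pool.size();
  }
}
```

Keying the pool by configuration rather than by client type is what lets the same mechanism share both clients and shard state.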


@mosche (Member Author) commented Mar 17, 2022

R: @aromanenko-dev

@mosche force-pushed the BEAM-14104-ShardAwareAggregation branch 2 times, most recently from 13bf1c0 to 22724c9 on March 18, 2022 08:44
@mosche (Member Author) commented Mar 21, 2022

@aromanenko-dev Not sure whether a review is feasible in time, but it would be great to get this into the 2.38 release as well :)

@mosche force-pushed the BEAM-14104-ShardAwareAggregation branch from 22724c9 to 7dec404 on March 21, 2022 08:02
@mosche (Member Author) commented Mar 21, 2022

Run Java PostCommit


@mosche (Member Author) commented Mar 21, 2022

The same BigQuery integration tests are failing.

@mosche force-pushed the BEAM-14104-ShardAwareAggregation branch from 7dec404 to e4630b6 on March 21, 2022 13:12
@aromanenko-dev (Contributor) left a review comment

Thanks, sorry for the delay.
LGTM, just a few minor notes.

@mosche force-pushed the BEAM-14104-ShardAwareAggregation branch from e4630b6 to 416fc9b on April 7, 2022 09:24
@aromanenko-dev (Contributor) commented

Run Java PostCommit

@aromanenko-dev (Contributor) commented

LGTM, just a very minor note on the non-obvious constant.
Please merge it yourself once that is fixed and Java PostCommit has passed.

@mosche (Member Author) commented Apr 7, 2022

All significant tests passed; the failures are unrelated.

@mosche merged commit ad4561e into apache:master on Apr 7, 2022
@mosche deleted the BEAM-14104-ShardAwareAggregation branch on April 7, 2022 16:28