
Swap to using CoGBK as grouping primitive instead of GBK #18032

Open
kennknowles opened this issue Jun 3, 2022 · 0 comments

The intent is to leave the semantics of both GroupByKey (GBK) and CoGroupByKey (CoGBK) unchanged and only swap which of the two is the primitive.

CoGBK is a more powerful operator than GBK, which allows for two key benefits:

  1. SDKs become simpler: expressing a GBK as a CoGBK over a single input is trivial, while expressing a CoGBK in terms of GBK is not (the inputs must be tagged, flattened, grouped, and split apart again).
  2. It will be easier for runners to provide efficient implementations of CoGBK, since each runner becomes responsible for the logic that maps its own internal grouping implementation onto CoGBK.
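
To make the trivial direction in benefit 1 concrete, here is a minimal sketch that models a PCollection as an in-memory list of (key, value) pairs. `co_group_by_key` is a hypothetical stand-in for the primitive and `group_by_key` is the composite built on it; neither is a Beam API.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical model: a "PCollection" is just a list of (key, value) pairs.
KV = Tuple[Hashable, object]


def co_group_by_key(inputs: Dict[str, List[KV]]) -> Dict[Hashable, Dict[str, list]]:
    """Stand-in for a primitive CoGBK: for every key seen in any input,
    collect that key's values separately for each tagged input."""
    result = defaultdict(lambda: {tag: [] for tag in inputs})
    for tag, pcoll in inputs.items():
        for key, value in pcoll:
            result[key][tag].append(value)
    return dict(result)


def group_by_key(pcoll: List[KV]) -> Dict[Hashable, list]:
    """GBK as a composite: run CoGBK over a single tagged input, then unwrap it."""
    tag = "only"
    grouped = co_group_by_key({tag: pcoll})
    return {key: per_tag[tag] for key, per_tag in grouped.items()}
```

For example, `group_by_key([("a", 1), ("b", 2), ("a", 3)])` yields `{"a": [1, 3], "b": [2]}`. The composite adds no logic of its own beyond wrapping and unwrapping a single tag.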

This requires the following modifications to the Beam code base:

  1. Make GBK a composite transform in terms of CoGBK.
  2. Move the existing CoGBK implementation (a composite built on GBK) from contrib to runners-core as an adapter*. Runners that more naturally support GBK can use this adapter, and everything executes exactly as before.

*Just like GroupByKeyViaGroupByKeyOnly and UnboundedReadFromBoundedSource.
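
The adapter in item 2 goes in the opposite direction: it builds CoGBK out of a runner's existing GBK, which is why it takes more steps. A minimal sketch, assuming a PCollection is modeled as an in-memory list of (key, value) pairs and that the hypothetical `native_group_by_key` stands in for a runner's own grouping; none of these names are Beam APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical model: a "PCollection" is just a list of (key, value) pairs.
KV = Tuple[Hashable, object]
GroupFn = Callable[[List[KV]], Dict[Hashable, list]]


def native_group_by_key(pcoll: List[KV]) -> Dict[Hashable, list]:
    """Stand-in for a runner's native GBK (hypothetical)."""
    grouped: Dict[Hashable, list] = defaultdict(list)
    for key, value in pcoll:
        grouped[key].append(value)
    return dict(grouped)


def co_group_by_key_via_gbk(
    inputs: Dict[str, List[KV]], gbk: GroupFn = native_group_by_key
) -> Dict[Hashable, Dict[str, list]]:
    """Adapter that expresses CoGBK using only a GBK."""
    # 1. Tag every value with the name of the input it came from, and
    # 2. flatten all tagged inputs into one collection.
    flattened = [
        (key, (tag, value)) for tag, pcoll in inputs.items() for key, value in pcoll
    ]
    result = {}
    # 3. Group once by key, then 4. split each key's values back out by tag.
    for key, tagged_values in gbk(flattened).items():
        per_tag = {tag: [] for tag in inputs}
        for tag, value in tagged_values:
            per_tag[tag].append(value)
        result[key] = per_tag
    return result
```

Because `gbk` is a parameter, a runner that prefers GBK could plug its own grouping into this adapter unchanged, while a runner with a native CoGBK would skip the adapter entirely.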

Imported from Jira BEAM-490. Original Jira may contain additional context.
Reported by: lcwik.
