apache / beam Public

[Feature Request]: Add hotkey shuffle/fanout support to BeamSQL #28186

Open

1 of 15 tasks

Polber opened this issue Aug 28, 2023 · 0 comments


Labels

awaiting triage java new feature P2 sql

Polber (Contributor) commented Aug 28, 2023

What would you like to happen?

BeamSQL currently has no way to reshuffle data that contains hot keys inside the sub-transforms it expands to. For example, when a pipeline performs a JOIN as part of a SqlTransform, the expanded JOIN includes a GroupByKey, and a single key can carry a disproportionate share of the data fed into that GroupByKey. Adding hot-key detection and reshuffling/fanout, similar to Combine.PerKey.withHotKeyFanout, should improve performance on skewed inputs.
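To make the request concrete, here is a minimal plain-Java sketch of the two-stage idea behind hot-key fanout for a combine: values for a key are first spread over several shards and pre-aggregated, then the partial results are merged per key. It uses no Beam classes. The class name `HotKeyFanoutSketch`, both method names, and the round-robin shard choice are assumptions made for illustration, not how Beam or BeamSQL implements it. It also only covers an associative, commutative combine (a sum); how fanout would be applied to the GroupByKey inside an expanded JOIN is not specified in this issue.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Beam code.
public class HotKeyFanoutSketch {

    // Single-stage sum: every value for a key is folded into one accumulator,
    // so a hot key concentrates all of its work in one place.
    static Map<String, Long> sumPerKey(List<Map.Entry<String, Long>> input) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> e : input) {
            sums.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return sums;
    }

    // Two-stage sum with fanout: stage 1 pre-aggregates per (key, shard),
    // stage 2 merges the at most `fanout` partial sums of each key.
    static Map<String, Long> sumPerKeyWithFanout(
            List<Map.Entry<String, Long>> input, int fanout) {
        if (fanout < 1) {
            throw new IllegalArgumentException("fanout must be >= 1");
        }
        // Stage 1: one slot per shard for each key. Round-robin shard
        // assignment is a simplification chosen for this sketch.
        Map<String, long[]> partials = new HashMap<>();
        int counter = 0;
        for (Map.Entry<String, Long> e : input) {
            int shard = counter++ % fanout;
            partials.computeIfAbsent(e.getKey(), k -> new long[fanout])[shard]
                    += e.getValue();
        }
        // Stage 2: merge the partial sums for each key.
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, long[]> e : partials.entrySet()) {
            long total = 0;
            for (long partial : e.getValue()) {
                total += partial;
            }
            sums.put(e.getKey(), total);
        }
        return sums;
    }
}
```

Both methods return the same per-key totals; the difference is that in a distributed setting the stage-1 partial sums for a hot key could be computed on different workers, leaving only a small merge for the final step.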

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner


Polber added awaiting triage new feature labels

github-actions bot added java P2 labels

liferoad added the sql label


Assignees: No one assigned
Projects: None yet
Milestone: No milestone
Development: No branches or pull requests
Participants: 2
