[FLINK-35533][runtime] Support Flink hybrid shuffle integration with Apache Celeborn #24900

Merged
merged 4 commits into from
Jun 12, 2024

TanYuxin-tyx
Contributor

What is the purpose of the change

Flink hybrid shuffle can transition between memory, disk, and remote storage to improve performance and job stability. Apache Celeborn, in turn, provides a stable, performant, and scalable remote shuffle service. This integration aims to combine the benefits of hybrid shuffle and Celeborn. This change adds the pluggable remote tier implementation on the Flink side.

Brief change log

  • Create tier factories according to the configured remote tier class
  • Refactor the APIs and implementations to support remote tier plugins
  • Add an option to configure the external remote tier factory class
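
For readers unfamiliar with the pattern, the first and third items can be illustrated with a minimal, self-contained sketch of loading a factory from a configured class name. It is not the code in this PR: the `TierFactory` interface, the three factory classes, and `createTierFactories` are hypothetical stand-ins, and the choice to use only the external factory when one is configured is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class TierFactorySketch {

    /** Hypothetical stand-in for a tier factory; not Flink's actual interface. */
    public interface TierFactory {
        String name();
    }

    /** Hypothetical built-in tiers used when no external class is configured. */
    public static class MemoryTierFactory implements TierFactory {
        public String name() { return "memory"; }
    }

    public static class DiskTierFactory implements TierFactory {
        public String name() { return "disk"; }
    }

    /** Hypothetical external remote tier, e.g. one shipped by a plugin. */
    public static class ExampleRemoteTierFactory implements TierFactory {
        public String name() { return "example-remote"; }
    }

    /**
     * Builds the list of tier factories. If an external factory class name is
     * configured, this sketch uses only that factory (an assumption); otherwise
     * it falls back to the built-in tiers.
     */
    public static List<TierFactory> createTierFactories(String externalFactoryClassName) {
        List<TierFactory> factories = new ArrayList<>();
        if (externalFactoryClassName != null && !externalFactoryClassName.isEmpty()) {
            factories.add(load(externalFactoryClassName));
            return factories;
        }
        factories.add(new MemoryTierFactory());
        factories.add(new DiskTierFactory());
        return factories;
    }

    /** Instantiates the named class through its public no-argument constructor. */
    static TierFactory load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(TierFactory.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate tier factory: " + className, e);
        }
    }

    public static void main(String[] args) {
        // No external class configured: the built-in tiers are used.
        for (TierFactory f : createTierFactories(null)) {
            System.out.println("default tier: " + f.name());
        }
        // External class configured by its fully qualified name.
        String configured = ExampleRemoteTierFactory.class.getName();
        for (TierFactory f : createTierFactories(configured)) {
            System.out.println("configured tier: " + f.name());
        }
    }
}
```

Resolving the class by name keeps the shuffle code free of a compile-time dependency on the remote tier, which is the point of making it pluggable. A misconfigured name fails fast with an `IllegalArgumentException` in this sketch.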

Verifying this change

  • Added test TierFactoryInitializerTest that validates the creation of tier factories

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (Yes)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

@flinkbot
Collaborator

flinkbot commented Jun 6, 2024

@reswqa changed the title from "[Draft][FLINK-35533][runtime] Support Flink hybrid shuffle integration with Apache Celeborn" to "[FLINK-35533][runtime] Support Flink hybrid shuffle integration with Apache Celeborn" on Jun 11, 2024
Member

@reswqa reswqa left a comment

Thanks for this PR, overall looks good. I only left a couple of comments.

Member

@reswqa reswqa left a comment

Thanks @TanYuxin-tyx for the update, LGTM.

@reswqa reswqa merged commit 23fa2ae into apache:master Jun 12, 2024
3 participants