[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation#23866

Closed
zhengzhili333 wants to merge 11 commits into apache:master from zhengzhili333:Flink-33500

Conversation

@zhengzhili333

What is the purpose of the change

Currently, job submission starts by storing the JobGraph in the JobGraphStore (in HA setups), which includes writing a file to S3 or another remote file system. The submission runs in the Dispatcher's main thread, so a slow JobGraph write blocks every other operation on the Dispatcher.

Brief change log

  • The ZooKeeperStateHandleStore creates the path in ZooKeeper and locks it, then writes the state asynchronously on an Executor
  • The KubernetesStateHandleStore stores the key in the ConfigMap and writes the state asynchronously on an Executor
  • The Dispatcher puts the JobGraph asynchronously on the ioExecutor
  • The Dispatcher writes to the ExecutionGraphInfoStore asynchronously on the ioExecutor
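
The general pattern behind these changes can be sketched as follows. This is a minimal illustration, not the code in this pull request: `BlockingStore`, `submitJob`, and the two executors are hypothetical stand-ins, and the in-memory map only simulates a slow remote write. The blocking store call runs on an I/O executor, and the follow-up step is handed back to a single-threaded "main thread" executor.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncJobGraphStoreSketch {

    /** Hypothetical stand-in for a store whose write may block on remote I/O. */
    static final class BlockingStore {
        private final Map<String, byte[]> persisted = new ConcurrentHashMap<>();

        void put(String jobId, byte[] serializedGraph) {
            // A real store would write to S3 or another remote file system here.
            persisted.put(jobId, serializedGraph);
        }

        boolean contains(String jobId) {
            return persisted.containsKey(jobId);
        }
    }

    /**
     * Runs the blocking write on ioExecutor, then continues on
     * mainThreadExecutor once the write has completed.
     */
    static CompletableFuture<String> submitJob(
            BlockingStore store,
            String jobId,
            byte[] serializedGraph,
            Executor ioExecutor,
            Executor mainThreadExecutor) {
        return CompletableFuture
                .runAsync(() -> store.put(jobId, serializedGraph), ioExecutor)
                .thenApplyAsync(ignored -> "ACCEPTED:" + jobId, mainThreadExecutor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioExecutor = Executors.newFixedThreadPool(2);
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            BlockingStore store = new BlockingStore();
            String ack = submitJob(
                    store, "job-1", new byte[] {1, 2, 3}, ioExecutor, mainThread).get();
            System.out.println(ack + " stored=" + store.contains("job-1"));
        } finally {
            ioExecutor.shutdown();
            mainThread.shutdown();
        }
    }
}
```

While the write is in flight, the single-threaded executor stays free to process other work, which is the property the description above is after.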

Verifying this change

This change added tests and can be verified as follows:

  • Added a Dispatcher JobSubmission test that uses ZooKeeperStateHandleStore as the JobGraphStore
  • Added a Dispatcher JobSubmission test that uses KubernetesStateHandleStore as the JobGraphStore

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

flinkbot commented Dec 4, 2023

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@zhengzhili333
Author

@flinkbot run azure

1 similar comment
@zhengzhili333
Author

@flinkbot run azure

zhengzhili333 and others added 4 commits December 4, 2023 21:26
…tion

[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation

[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation

[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation

[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation

[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation

[FLINK-33500][Runtime] Run storing the JobGraph an asynchronous operation
@zhengzhili333
Author

@flinkbot run azure

@zhengzhili333 zhengzhili333 reopened this Dec 5, 2023
@zhengzhili333
Author

@flinkbot run azure

3 participants