-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-17075][coordination] Reconcile deployed Executions #12815
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit c7057fb (Fri Jul 03 08:18:15 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this PR @zentol. The changes look already good. I had a couple of minor comments. Please take a look.
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
Outdated
Show resolved
Hide resolved
...-runtime/src/main/java/org/apache/flink/runtime/jobmaster/ExecutionDeploymentReconciler.java
Outdated
Show resolved
Hide resolved
...-runtime/src/main/java/org/apache/flink/runtime/jobmaster/ExecutionDeploymentReconciler.java
Outdated
Show resolved
Hide resolved
...-runtime/src/test/java/org/apache/flink/runtime/jobmaster/JobMasterPartitionReleaseTest.java
Outdated
Show resolved
Hide resolved
...org/apache/flink/runtime/taskexecutor/TaskExecutorExecutionDeploymentReconciliationTest.java
Outdated
Show resolved
Hide resolved
...org/apache/flink/runtime/taskexecutor/TaskExecutorExecutionDeploymentReconciliationTest.java
Outdated
Show resolved
Hide resolved
...org/apache/flink/runtime/taskexecutor/TaskExecutorExecutionDeploymentReconciliationTest.java
Outdated
Show resolved
Hide resolved
...org/apache/flink/runtime/taskexecutor/TaskExecutorExecutionDeploymentReconciliationTest.java
Outdated
Show resolved
Hide resolved
Please also make sure that the failed e2e has not failed because of these changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments. I had one minor comment. This means the only thing which is left to decide is the interface of the ExecutionDeploymentReconciliationHandler
. Concretely whether we want to encourage a call per Execution
or also support batched calls.
With this PR the
JobMaster
can detect and reconcile deployments that are missing or unknown.The
TaskExecutor
is periodically informing theJobMaster
about the set of deployed executions, submitted with the heartbeat.The
Scheduler
tracks the set of expected deployments with a newExecutionDeploymentTracker
component. The tracker is updated of deployed/terminated deployments by 2 new listeners in theExecutionGraph
.The
JobMaster
reconciles deployments with a newExecutionDeploymentReconciler
component.If the
TaskExecutor
is hosting an unexpected execution, then that execution will be canceled.If the
TaskExecutor
is unexpectedly not hosting an execution, then the execution is failed in theExecutionGraph.