[FLINK-21258] Add Canceling state for DeclarativeScheduler#14909
[FLINK-21258] Add Canceling state for DeclarativeScheduler#14909rmetzger wants to merge 3 commits intoapache:masterfrom
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit b1053be (Tue Feb 09 11:32:29 UTC 2021) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for creating this PR @rmetzger. The changes look already quite good. I had a couple of comments concerning the tests. Please take a look.
| public void testTransitionToFinishedWhenCancellationCompletes() throws Exception { | ||
| try (MockStateWithExecutionGraphContext ctx = new MockStateWithExecutionGraphContext()) { | ||
| Canceling canceling = createCancelingState(ctx); | ||
|
|
There was a problem hiding this comment.
here we don't have to call onEnter?
| @Test | ||
| public void testTransitionToFinishedWhenCancellationCompletes() throws Exception { | ||
| try (MockStateWithExecutionGraphContext ctx = new MockStateWithExecutionGraphContext()) { | ||
| Canceling canceling = createCancelingState(ctx); |
There was a problem hiding this comment.
It is a bit hidden why this test works. I assume it is the case because the ExecutionGraph does not contain any Executions which are in DEPLOYING or RUNNING state. That's why a ExecutionGraph.cancel can directly cancel all Executions. Maybe we can make this a bit more explicit.
| public void testTransitionToSuspend() throws Exception { | ||
| try (MockStateWithExecutionGraphContext ctx = new MockStateWithExecutionGraphContext()) { | ||
| Canceling canceling = createCancelingState(ctx); | ||
|
|
There was a problem hiding this comment.
I assume that you don't call onEnter because it would make the state go to Finished. I think that we should stick to the convention of State and call onEnter. Otherwise we risk that we are testing something which is impossible to reach in production. The proper solution would be to ensure that the ExecutionGraph does not directly go to CANCELLED when onEnter is called.
| ExecutionState.FAILED, | ||
| new RuntimeException())); | ||
| canceling.updateTaskExecutionState(update); | ||
| ctx.assertNoStateTransition(); |
There was a problem hiding this comment.
We should also assert that the ExecutionGraph has been updated. Otherwise a passing implementation of the Canceling.updateTaskExecutionState could be an empty method.
Technically, we would also have to do this for the other StateWithExecutionGraph sub classes.
| public void notifyGlobalFailure(Throwable t) { | ||
| canceling.handleGlobalFailure(t); | ||
| } |
There was a problem hiding this comment.
For which test do we need this implementation?
There was a problem hiding this comment.
For the testTaskFailuresAreIgnored() test.
I'm actually proposing to remove that test. It doesn't belong here. We need to test the proper handling of the updateTaskExecutionState() on the DeclarativeScheduler itself (which registers an InternalFailuresListener and forwards failures to the handleGlobalFailure() method of the State).
Since we are testing the handleGlobalFailure() in a unit test, having a test for updateTaskExecutionState() here doesn't make much sense.
@zentol: Can we add such a test to the DeclarativeScheduler skeleton PR?
There was a problem hiding this comment.
If I am not mistaken, then we don't need an InternalFailuresListener for testing the updateTaskExecutionState. I thought that his listener is only used when a failure in some other operation (e.g. deploy occurs). Testing that updateTaskExecutionState stays in the CANCELING state makes sense to me.
|
I rebased this change to the latest master (thus fewer commits are included), and addressed all comments I once again introduced a MockExecutionGraph for this test. But extracting an interface from the ExecutionGraph is a really involved change that would not be worth it for my needs in this PR. |
b1053be to
41d226e
Compare
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for updating this PR @rmetzger. I think it makes sense to keep testTaskFailuresAreIgnored because we want to ensure that we don't leave the state on a failed task state update. Moreover, I think we shouldn't need TestInternalFailuresListener.
|
Thanks for your review! I'll remove the |
This PR is based on #14879
What is the purpose of the change
Declarative Scheduler consists of a number of internal states.
Note that this change is currently not usable as-is, as the other parts of declarative scheduler are not merged yet (See for the prototype this PR is based on: https://github.com/tillrohrmann/flink/tree/declarative-scheduler)
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)Documentation
Will be handled in a separate PR.