[FLINK-31133][tests] Fix timeouts in PartiallyFinishedSourcesITCase #22022
@gaoyunhaii would you be able to take a look at this PR?
dmvk left a comment
Thanks for the patch, Roman; this looks very sane (though I'm not an expert in this part of Flink)!
Apart from the inline questions:
- The first commit message should be rephrased:
  Prevent timeouts when waiting for a status of a …
  ->
  Prevent timeouts when waiting for a result of a …
- For the second commit: "Otherwise, any command with SINGLE_SUBTASK scope might be dispatched to a finished source. This will result in a timeout while waiting for this command to be executed."
  Can you please point me to where precisely the timeout is supposed to happen? As far as I can tell, on the source side we'd only add the command to the queue (there is most likely something on the executor side that I'm missing).
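To illustrate the failure mode the second commit describes, here is a minimal Java sketch. All names in it (`SketchSource`, `SketchCommandDispatcher`, `SketchScenario`) are hypothetical and do not mirror the actual `TestEventSource` or `TestJobExecutor` code. It assumes that a SINGLE_SUBTASK command is delivered to exactly one subscribed source and that a finished source never executes or ACKs commands left in its queue.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for a test source that receives commands. */
final class SketchSource {
    final int subtaskIndex;
    volatile boolean finished;
    final Queue<String> inbox = new ConcurrentLinkedQueue<>();

    SketchSource(int subtaskIndex) {
        this.subtaskIndex = subtaskIndex;
    }
}

/** Hypothetical dispatcher: a single-subtask command goes to exactly one subscriber. */
final class SketchCommandDispatcher {
    private final List<SketchSource> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(SketchSource source) {
        subscribers.add(source);
    }

    // The fix being illustrated: a finished source unsubscribes itself,
    // so it can no longer be picked as the target of a single-subtask command.
    void unsubscribe(SketchSource source) {
        subscribers.remove(source);
    }

    /** Enqueues the command at the first subscriber and returns it, or null if there is none. */
    SketchSource dispatchToSingleSubtask(String command) {
        SketchSource target = subscribers.isEmpty() ? null : subscribers.get(0);
        if (target != null) {
            target.inbox.add(command);
        }
        return target;
    }
}

final class SketchScenario {
    /** Returns true if the command reached a source that can still execute and ACK it. */
    static boolean commandCanBeAcked(boolean unsubscribeOnFinish) {
        SketchCommandDispatcher dispatcher = new SketchCommandDispatcher();
        SketchSource first = new SketchSource(0);
        SketchSource second = new SketchSource(1);
        dispatcher.subscribe(first);
        dispatcher.subscribe(second);

        first.finished = true; // subtask 0 has reached the end of its input
        if (unsubscribeOnFinish) {
            dispatcher.unsubscribe(first);
        }
        SketchSource target = dispatcher.dispatchToSingleSubtask("FAIL");
        // A finished source no longer runs its loop, so a command queued there is never ACKed
        // and whoever waits for the ACK eventually times out.
        return target != null && !target.finished;
    }
}
```

In this sketch, `commandCanBeAcked(false)` returns `false` because the command lands on the finished subtask 0, while `commandCanBeAcked(true)` returns `true`, which is the effect the unsubscribe change aims for.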
...-tests/src/test/java/org/apache/flink/runtime/operators/lifecycle/graph/TestEventSource.java
flink-tests/src/test/java/org/apache/flink/runtime/operators/lifecycle/TestJobExecutor.java
Thanks for the review @dmvk! I've updated both commit messages and marked
Sure, the timeout happens in
...-tests/src/test/java/org/apache/flink/runtime/operators/lifecycle/graph/TestEventSource.java
dmvk left a comment
LGTM, good job 👍 Can you please squash the hotfixes into the original two commits before merging?
- Only obtain execution exception if the job is in globally terminal state
- [hotfix] Unsubscribe finished TestEventSources from test commands. Otherwise, any command with SINGLE_SUBTASK scope might be dispatched to a finished source. This will result in TestJobExecutor.waitForFailover timing out while waiting for the command to be executed and ACKed.
- [hotfix] Mark TestEventSource.scheduledCommands volatile
- [hotfix] Make sure to process all commands in TestEventSource
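The "Make sure to process all commands in TestEventSource" hotfix can be illustrated with a minimal sketch, assuming a source loop that polls queued commands on each iteration. The class and method names are hypothetical. The sketch keeps the queue in a `final` `ConcurrentLinkedQueue` field, which already makes it safely visible to both the scheduling thread and the source thread; the actual field type behind the `scheduledCommands` volatile hotfix is not shown in this thread, so that detail is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical source-side command handling: one command per iteration vs. draining the queue. */
final class SketchCommandLoop {
    // Written by the thread that receives test commands, read by the source thread.
    private final Queue<String> scheduledCommands = new ConcurrentLinkedQueue<>();
    private final List<String> acked = new ArrayList<>();

    /** Called from the command-receiving thread. */
    void schedule(String command) {
        scheduledCommands.add(command);
    }

    /** Processes at most one command; anything else stays queued if the source exits afterwards. */
    void processOne() {
        String command = scheduledCommands.poll();
        if (command != null) {
            acked.add(command); // stand-in for executing the command and ACKing it
        }
    }

    /** Drains the queue so that no scheduled command is left without an ACK. */
    void processAll() {
        String command;
        while ((command = scheduledCommands.poll()) != null) {
            acked.add(command); // stand-in for executing the command and ACKing it
        }
    }

    List<String> acked() {
        return acked;
    }
}
```

After scheduling three commands, `processOne()` ACKs only the first; if the source finished right then, the remaining two would never be ACKed and the waiting side would time out. `processAll()` drains the rest in FIFO order.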
Thanks for the review @dmvk!
If a checkpoint fails for some reason, PartiallyFinishedSourcesITCase might hang (and indirectly affect other tests). There are two issues in case of a checkpoint failure:
This mostly happens in 1.15, because later versions have increased timeouts/attempts to upload state.
This PR fixes the root causes of the test hanging described above.
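The commit "Only obtain execution exception if the job is in globally terminal state" can be sketched as a guard in front of the result lookup. Flink's `JobStatus` provides `isGloballyTerminalState()` (true for FINISHED, CANCELED and FAILED, but not for SUSPENDED); to keep this example runnable without Flink on the classpath, it uses a hypothetical local enum `SketchJobStatus` that mirrors a subset of those states, plus a hypothetical helper `executionExceptionIfTerminal`. It assumes that asking for the execution result of a job that has not terminated would wait until it does, which is what the guard avoids.

```java
import java.util.Optional;

/** Hypothetical local stand-in for a subset of Flink's JobStatus values. */
enum SketchJobStatus {
    RUNNING(false),
    RESTARTING(false),
    SUSPENDED(false), // terminal only locally, not globally
    FAILED(true),
    CANCELED(true),
    FINISHED(true);

    private final boolean globallyTerminal;

    SketchJobStatus(boolean globallyTerminal) {
        this.globallyTerminal = globallyTerminal;
    }

    boolean isGloballyTerminalState() {
        return globallyTerminal;
    }
}

final class SketchResultCheck {
    /**
     * Returns the failure cause only once the job is globally terminal. While the job is still
     * running or restarting (e.g. after a failed checkpoint), return empty instead of waiting.
     */
    static Optional<Throwable> executionExceptionIfTerminal(SketchJobStatus status, Throwable failureCause) {
        if (!status.isGloballyTerminalState()) {
            return Optional.empty();
        }
        return Optional.ofNullable(failureCause);
    }
}
```

A caller polling for failover can therefore keep looping while the job is RESTARTING and only inspect the exception after the job has FAILED, CANCELED or FINISHED.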