Skip to content

Conversation

zentol
Copy link
Contributor

@zentol zentol commented Jul 17, 2018

What is the purpose of the change

This PR hardens the YarnTestBase against jobs that just don't want to shut down that quickly (i.e. within 500ms).
The maximum waiting time has been increased to 10 seconds, during which we periodically check the state of all applications.

Additionally, the failure condition from @Before was moved to the @After method.

This change will allow us to better differentiate between simple timing issues and unsuccessful job shutdowns.

@zentol zentol changed the title [FLINK-8163][yarn][tests] Harden tests against slow job shutdowns [FLINK-9815][yarn][tests] Harden tests against slow job shutdowns Jul 17, 2018
@zentol
Copy link
Contributor Author

zentol commented Jul 17, 2018

the test failures may highlight tests that weren't shutting down the last application properly; previously this would've succeeded since the check was done in @Before.

@zentol
Copy link
Contributor Author

zentol commented Jul 17, 2018

will look into them tomorrow

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving the YarnTestBase @zentol. The changes look good to me. Before merging, it would be good to fix the failing test cases.

if (runningApp.isPresent()) {
final ApplicationReport app = runningApp.get();
Assert.fail("There is at least one application on the cluster that is not finished." +
"App " + app.getApplicationId() + " is in state " + app.getYarnApplicationState());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we log all running applications instead of any?

@tillrohrmann
Copy link
Contributor

The failing test case seems to be caused by YARNSessionFIFOITCase#testJavaAPI not calling yarnCluster.shutdownCluster. I've fixed this problem in master and the release branch. Rebasing onto the latest master should hopefully give a green Travis build.

@zentol
Copy link
Contributor Author

zentol commented Jul 18, 2018

The YARNHighAvailabilityITCase still has the same problem.

@tillrohrmann
Copy link
Contributor

Interesting. I thought I would have executed flink-yarn-tests locally with all tests. But it looks as if YARNHighAvailabilityITCase suffers from the same problem as the YARNSessionFIFOITcase#testJavaAPI test.

@zentol
Copy link
Contributor Author

zentol commented Jul 18, 2018

yay travis is green.

asfgit pushed a commit that referenced this pull request Jul 19, 2018
@asfgit asfgit closed this in 230f817 Jul 19, 2018
@zentol zentol deleted the 8163 branch July 19, 2018 07:09
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants