[FLINK-6521] Add per job cleanup methods to HighAvailabilityServices #4376

FangYongs · 2017-07-20T07:09:41Z

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

General
- The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
- The pull request addresses only one issue
- Each commit in the PR has a meaningful commit message (including the JIRA id)
Documentation
- Documentation has been added for new functionality
- Old documentation affected by the pull request has been updated
- JavaDoc for public methods has been added
Tests & Build
- Functionality added by the pull request is covered by tests
- mvn clean verify has been executed successfully locally or a Travis build has passed

tillrohrmann · 2017-07-28T09:43:38Z

...e/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperHaServices.java

+		try {
+			this.submittedJobGraphStore = ZooKeeperUtils.createSubmittedJobGraphs(client, configuration, executor);
+		} catch (Exception e) {
+			throw new RuntimeException(e);


We should not throw RuntimeException but instead a meaningful checked exception.
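
As an illustration of the suggestion above, here is a minimal sketch of wrapping the failure in a checked exception that carries context and preserves the cause. The names `HaServicesInitializationException` and `HaServicesSketch` are hypothetical stand-ins, not classes from Flink or from this PR; the actual change could equally use an existing checked exception type.

```java
import java.util.concurrent.Callable;

public class CheckedExceptionSketch {

    /** Hypothetical checked exception; the name is illustrative only. */
    static class HaServicesInitializationException extends Exception {
        HaServicesInitializationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for a service whose constructor may fail. */
    static class HaServicesSketch {
        private final String store;

        HaServicesSketch(Callable<String> storeFactory) throws HaServicesInitializationException {
            try {
                this.store = storeFactory.call();
            } catch (Exception e) {
                // Add context and keep the original cause instead of throwing a bare RuntimeException.
                throw new HaServicesInitializationException(
                        "Could not create the submitted job graph store.", e);
            }
        }

        String getStore() {
            return store;
        }
    }

    public static void main(String[] args) {
        try {
            new HaServicesSketch(() -> {
                throw new IllegalStateException("ZooKeeper unavailable");
            });
        } catch (HaServicesInitializationException e) {
            // Callers are forced by the compiler to handle or declare the failure.
            System.out.println(e.getMessage() + " Cause: " + e.getCause().getMessage());
        }
    }
}
```

The trade-off is that every caller of the constructor must now declare or handle the exception, which is the point of the review comment: the failure becomes part of the signature rather than a surprise at runtime.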

tillrohrmann · 2017-07-28T09:44:44Z

...st/java/org/apache/flink/runtime/highavailability/zookeeper/HighAvailabilityServiceTest.java

+import org.junit.Assert;
+import org.junit.rules.ExpectedException;
+
+public class HighAvailabilityServiceTest {


If we let the test case extend TestLogger, we get nice test log statements on Travis.

tillrohrmann

Thanks for your contribution @zjureel. The changes look good to me. I had some minor comments which we could address. Moreover, there are some test cases failing most likely due to your changes:

ZooKeeperRegistryTest.testZooKeeperRegistry
ZooKeeperLeaderRetrievalTest.before

tillrohrmann · 2017-07-28T09:46:49Z

...st/java/org/apache/flink/runtime/highavailability/zookeeper/HighAvailabilityServiceTest.java

+
+		SubmittedJobGraph recoverJobGraph2 = submittedJobGraphStore.recoverJobGraph(jobGraph2.getJobId());
+		Assert.assertEquals(recoverJobGraph2.getJobId(), jobGraph2.getJobId());
+		thrown.expectMessage("Could not retrieve the submitted job graph state handle for /" +


Could we rather check for the exception type? Matching exception messages is really brittle.
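
To make the point concrete, here is a small self-contained sketch of asserting on the exception type rather than on its message. `JobGraphNotFoundException`, `recoverJobGraph`, and `throwsType` are hypothetical names used only for illustration; the exception the real store throws may differ. In a JUnit 4 test that already uses the `ExpectedException` rule, the analogous call would be `thrown.expect(SomeException.class)`.

```java
public class ExceptionTypeCheckSketch {

    /** Hypothetical exception type standing in for whatever the store throws. */
    static class JobGraphNotFoundException extends Exception {
        JobGraphNotFoundException(String message) {
            super(message);
        }
    }

    /** An action that may throw a checked exception. */
    @FunctionalInterface
    interface ThrowingAction {
        void run() throws Exception;
    }

    /** Hypothetical stand-in for recovering a job graph that has already been cleaned up. */
    static String recoverJobGraph(String jobId) throws JobGraphNotFoundException {
        throw new JobGraphNotFoundException("Could not retrieve the state handle for /" + jobId);
    }

    /** Returns true only if the action throws an exception of the expected type. */
    static boolean throwsType(Class<? extends Throwable> expected, ThrowingAction action) {
        try {
            action.run();
            return false; // nothing was thrown
        } catch (Throwable t) {
            return expected.isInstance(t);
        }
    }

    public static void main(String[] args) {
        // The check keeps passing even if the message wording changes later.
        boolean ok = throwsType(JobGraphNotFoundException.class, () -> recoverJobGraph("job-1"));
        System.out.println("threw expected type: " + ok);
    }
}
```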

tillrohrmann · 2017-07-31T10:02:55Z

The test case JobManagerHACheckpointRecoveryITCase.testCheckpointedStreamingProgramIncrementalRocksDB seems to be failing on Travis. It might be something caused by the changes.

FangYongs · 2017-08-01T04:30:54Z

I found the following in the CI output, and it does not seem to be related to this change. What do you think, @tillrohrmann?

Running org.apache.flink.test.recovery.JobManagerHACheckpointRecoveryITCase
java.lang.RuntimeException: org.apache.zookeeper.server.ZooKeeperServer class is frozen
	at javassist.CtClassType.checkModify(CtClassType.java:288)
	at javassist.CtBehavior.setBody(CtBehavior.java:432)
	at javassist.CtBehavior.setBody(CtBehavior.java:412)
	at org.apache.curator.test.ByteCodeRewrite.fixMethods(ByteCodeRewrite.java:91)
	at org.apache.curator.test.ByteCodeRewrite.<clinit>(ByteCodeRewrite.java:50)
	at org.apache.curator.test.TestingServer.<clinit>(TestingServer.java:33)
	at org.apache.flink.test.recovery.JobManagerHACheckpointRecoveryITCase.testCheckpointedStreamingProgram(JobManagerHACheckpointRecoveryITCase.java:350)
	at org.apache.flink.test.recovery.JobManagerHACheckpointRecoveryITCase.testCheckpointedStreamingProgramIncrementalRocksDB(JobManagerHACheckpointRecoveryITCase.java:336)

tisonkun · 2018-12-14T07:25:44Z

@tillrohrmann @zjureel

I think this functionality was incidentally implemented by #6587 (FLINK-10011).

However, it is still a valid question who the proper actor is to do the clean-up. SubmittedJobGraph is managed by the Dispatcher, but RunningJobsRegistry is tricky because both JobManagerRunner and Dispatcher can write to it.

Under the topic of "per job clean up", I'd like to raise a question: how does Flink determine the status of a job? If we say that "per job clean up" is the Dispatcher's responsibility, then we should prevent the JM from writing to the RunningJobsRegistry, which also means that the status held by the Dispatcher is what we (users) consider to be the status of a given job.

tillrohrmann · 2019-09-27T10:24:44Z

Closed for inactivity.

FangYongs added 3 commits July 12, 2017 13:33

add cleanupData(JobID jobID) in HighAvailabilityServices

0290f34

add test case

7e9860e

add test case

e3fd500

tillrohrmann reviewed Jul 28, 2017

tillrohrmann requested changes Jul 28, 2017

FangYongs added 2 commits July 31, 2017 10:31

add Exception check for test && throw exception in constructor of ZooKeeperHaServices

0b5ae66

Use TemporaryFolder to set HA_STORAGE_PATH

83ad085

rmetzger added the component=Runtime/Coordination label Mar 14, 2019

tillrohrmann closed this Sep 27, 2019
