[FLINK-11813] Introduces cleanup of dirty jobs in Dispatcher #2
Conversation
…Factory.createJobResultStore
… PartialDispatcherServicesWithJobPersistenceComponents
…thJobPersistenceComponents
Instead, null checks are added
…SlotPoolServicesSchedulerFactory
…overyFactory.createRecoveredCompletedCheckpointStore
8ccb2b0 to 5ac5028
… JobManager configuration into the interface
5ac5028 to 30ac5c1
30ac5c1 to d8b8782
        return false;
    }

    private void cleanupDirtyJobs() {
It might be worth moving all the Dispatcher's cleanup logic into its own class, e.g. DispatcherCleanup.
I realized that it's not smart to use the same …
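A minimal sketch of what such an extracted helper could look like; the class name DispatcherCleanup comes from the comment above, but the constructor and method names are hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.apache.flink.api.common.JobID;

/** Hypothetical helper bundling the Dispatcher's dirty-job cleanup logic. */
final class DispatcherCleanup {

    private final Executor ioExecutor;

    DispatcherCleanup(Executor ioExecutor) {
        this.ioExecutor = ioExecutor;
    }

    /** Triggers the cleanup of a single dirty job asynchronously on the I/O executor. */
    CompletableFuture<Void> cleanupJobAsync(JobID jobId) {
        return CompletableFuture.runAsync(() -> cleanupJob(jobId), ioExecutor);
    }

    private void cleanupJob(JobID jobId) {
        // BlobServer, HA-services and JobResultStore cleanup would be triggered here.
    }
}
```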
        }));

        cleanupTaskResults.add(
                CompletableFuture.supplyAsync(
We should never use async future operations without explicitly providing an executor. Otherwise the task runs on the common pool, which could result in weird memory leaks (e.g. with thread locals) and other unexpected behaviors.
Good point, that should use the ioExecutor.
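For illustration, a minimal sketch of the suggested change, assuming an ioExecutor field like the Dispatcher's (class and method names here are illustrative only):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

class CleanupExample {

    private final Executor ioExecutor;

    CleanupExample(Executor ioExecutor) {
        this.ioExecutor = ioExecutor;
    }

    CompletableFuture<Boolean> cleanup() {
        // Discouraged: without an executor argument the task would run on
        // ForkJoinPool.commonPool(), which the application does not control.
        // CompletableFuture.supplyAsync(this::doCleanup);

        // Preferred: explicitly run the blocking cleanup work on the I/O executor.
        return CompletableFuture.supplyAsync(this::doCleanup, ioExecutor);
    }

    private boolean doCleanup() {
        return true; // placeholder for the actual cleanup work
    }
}
```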
        return blobServer.cleanupJob(jobId, jobGraphRemoved);
    }

    private boolean cleanupHighAvailabilityServices(JobID jobId) {
All of these cleanup methods look alike. Would it be possible to extract them behind a common interface and simplify this? (e.g. having something along the lines of List<CleanupStage> ...)
Having an interface was my initial approach. I backed off from it because I realized that FutureUtils's retry mechanism only relies on passing in some callback. The interface didn't bring any value, so I left it out for now.
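For context, the reviewer's CleanupStage idea could have looked roughly like this sketch; the interface and helper are hypothetical and not part of the PR:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.apache.flink.api.common.JobID;

/** Hypothetical common abstraction over the individual cleanup steps. */
interface CleanupStage {
    /** Returns true if the job's resources handled by this stage were cleaned up. */
    boolean cleanup(JobID jobId);
}

/** Hypothetical helper that runs all stages and reports the overall outcome. */
final class CleanupStages {
    static CompletableFuture<Boolean> runAll(
            List<CleanupStage> stages, JobID jobId, Executor executor) {
        return CompletableFuture.supplyAsync(
                () ->
                        stages.stream()
                                .map(stage -> stage.cleanup(jobId))
                                // do not short-circuit: every stage should get a chance to run
                                .reduce(true, Boolean::logicalAnd),
                executor);
    }
}
```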
        try {
            jobResultStore.markResultAsClean(jobId);
        } catch (IOException e) {
            log.warn("Could not properly mark job {} result as clean.", jobId, e);
What are the consequences of ignoring the failure here?
In case of failure, the job should be picked up again for cleanup.
        }

        startRecoveredJobs();
        cleanupDirtyJobs();
I'm not sure about having a separate code path for already terminated jobs as the dispatcher is already fairly complex.
What do you think about unifying these and having a custom JobManagerRunner implementation that only performs checkpoint cleanup?
I guess that's a good point. It should be doable, it just didn't cross my mind. The JobManagerRunner could be in charge of handling the job's lifecycle entirely, i.e. also including the cleanup. I will work on it.
I discussed the JobManagerRunner approach with Chesnay. We looked at the issue from a user's point of view: we somehow want to visualize both the running jobs and the jobs that are completed but still in the cleanup phase. Having everything in Dispatcher.runningJobs makes sense if we want the cleanup phase to still be part of the general job lifecycle. Dirty jobs that need to be cleaned up would be listed as not-finally-completed jobs until the cleanup is successful. The JobManagerRunner could be responsible for maintaining this lifecycle (even after the JobMaster has finished). That would mean moving the cleanup logic into the JobManagerRunner and having it used by both the JobMasterServiceLeadershipRunner and the new JobManagerRunner implementation that is in charge of cleaning up the checkpoint-related artefacts.
The flaw of this approach is that running jobs are meant to have an ExecutionGraphInfo, which is provided by the JobMaster. For the cleanup, we don't have this information. It is accessible in the ExecutionGraphInfoStore in the dispatcher, though, so we could use that store in the new implementation. But the ExecutionGraphInfoStore is only persisted locally; hence, the data would be gone in case of failover.
One workaround for that would be moving the ExecutionGraphInfo into the JobResultStore as additional metadata. Essentially, we would merge the ExecutionGraphInfoStore and the JobResultStore into a single store. The issue with that approach is that the ExecutionGraphInfoStore has no requirement to be backwards-compatible right now. Persisting the ExecutionGraph in the JobResultStore would change that, which is not what we want, I guess. Having a JobResultStore that stores only a limited amount of metadata per job makes it easier to maintain backwards compatibility.
Another approach is treating the two phases separately: Dispatcher.runningJobs only lists jobs with a JobMaster providing the ExecutionGraph, while the dirty jobs are handled through a separate member (and accessed through a dedicated REST API endpoint as well), like it's currently implemented in this prototype PR. We can move the cleanup logic into its own class to remove complexity from the Dispatcher.
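A rough sketch of the "treat the two phases separately" variant described above; the member name dirtyJobCleanups and the value types are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.api.common.JobID;

/** Illustrative excerpt of a Dispatcher keeping dirty jobs outside of runningJobs. */
class DispatcherSketch {

    // Jobs with a live JobMaster that can provide an ExecutionGraph.
    private final Map<JobID, Object> runningJobs = new ConcurrentHashMap<>();

    // Globally-terminated jobs whose artefacts still need cleanup; these would be
    // exposed through a dedicated REST endpoint rather than through runningJobs.
    private final Map<JobID, CompletableFuture<Void>> dirtyJobCleanups = new ConcurrentHashMap<>();
}
```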
How is the user then informed about problems?
There's going to be a dedicated REST endpoint (and a new section in the Flink UI) listing the jobs from the JobResultStore. Any dirty jobs can be labeled as "cleanup pending". The Dispatcher retries the cleanup indefinitely, and the user is informed about issues through the logs. We could also think about adding some kind of retry counter that is persisted in the JobResultStore.
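Purely as an assumption about how such a persisted retry counter might be modeled (none of these names exist in Flink):

```java
import org.apache.flink.api.common.JobID;

/** Hypothetical metadata kept next to a dirty entry in the JobResultStore. */
final class DirtyJobResultEntry {

    private final JobID jobId;
    private final int cleanupAttempts; // incremented after every failed cleanup attempt

    DirtyJobResultEntry(JobID jobId, int cleanupAttempts) {
        this.jobId = jobId;
        this.cleanupAttempts = cleanupAttempts;
    }

    DirtyJobResultEntry withAdditionalAttempt() {
        return new DirtyJobResultEntry(jobId, cleanupAttempts + 1);
    }
}
```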
How does the user handle failures?
The cluster shutdown could either block until all jobs are cleaned up, or fail with a non-zero exit code if some jobs could not be cleaned up. The latter would trigger a restart in case of HA, which would pick up the dirty jobs again for cleanup. Here, it might make sense to set some limit on the retries after which a shutdown is triggered.
Thinking about it once more, we could actually follow your approach by extracting a new interface out of the JobManagerRunner interface which only contains getResultFuture(), start() and getJobID(). JobManagerRunner could extend this interface. A new implementation of the extracted interface could be used to implement the checkpoint-related cleanup. After that is done, the common cleanup of the other components can be triggered. We could add a method cancelCleanup() to the new interface to enable explicit cancelling of the cleanup phase. Chesnay voted against reusing JobManagerRunner.cancel() because of the different semantics (cancelling the job results in a cancelled job vs. cancelling the job cleanup results in a still globally-terminated job).
That would enable us to cancel the cleanup of a single job without shutting down the cluster (through a new REST endpoint). We still have to decide how a running job behaves when the cancellation of the cleanup is called for it.
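Such an extracted interface might look like the following sketch; the three existing methods and cancelCleanup() are taken from the discussion above, while the interface name and exact signatures are assumptions:

```java
import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.JobID;

/** Hypothetical interface extracted out of JobManagerRunner. */
interface JobLifecycleRunner {

    /** Starts the runner (the job execution, or only the cleanup in the new implementation). */
    void start() throws Exception;

    /** The job this runner is responsible for. */
    JobID getJobID();

    /** Completes once the runner reaches a terminal state. */
    CompletableFuture<?> getResultFuture();

    /** Cancels the cleanup phase; deliberately separate from cancelling the job itself. */
    void cancelCleanup();
}
```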
    }

    @VisibleForTesting
    static CompletedCheckpointStore createCompletedCheckpointStore(
What is the intuition behind moving this into CheckpointRecoveryFactory?
The intuition was that I wanted to reuse the CheckpointsCleaner instance in the cleanup of the CompletedCheckpointsStore. But that's not possible as is, because the CheckpointsCleaner is closed on the JobMaster. Instead, we should use a CheckpointsCleanerFactory that is passed into the different components that need a CheckpointsCleaner instance.
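A hedged sketch of that factory idea; a CheckpointsCleanerFactory does not exist at this point, so both the interface and the trivial implementation below are assumptions:

```java
import org.apache.flink.runtime.checkpoint.CheckpointsCleaner;

/** Hypothetical factory so that each component can own (and close) its own cleaner. */
interface CheckpointsCleanerFactory {
    CheckpointsCleaner create();
}

/** Simplest possible implementation: hand out a fresh cleaner per request. */
class PerRequestCheckpointsCleanerFactory implements CheckpointsCleanerFactory {
    @Override
    public CheckpointsCleaner create() {
        return new CheckpointsCleaner();
    }
}
```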
e6037a0 to 59ac153
Closing this draft in favor of PR Draft #3, which implements the JobManagerRunner interface.
…dirty JobResults: We don't want to retrigger jobs that finished already based on the JobResultStore.
…erProcess: This change now covers all cases for recovery.
…o be used by both the local and global cleanup
What is the purpose of the change
This PR introduces the cleanup of the dirty job results in the Dispatcher. For now, only the ioExecutor is used to trigger the cleanup once.
Brief change log
See the individual commit messages for further details on each change.
Verifying this change
TODO: No extensive tests added, yet.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation