-
Notifications
You must be signed in to change notification settings - Fork 13.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-4150] [runtime] Don't clean up BlobStore on BlobServer shut down #2256
Conversation
Just a quick question. Do we want to remove also failed jobs from the BlobStore and ZK? Or only finished or cancelled jobs? |
I don't know if we "want to", but it is the current behaviour. A job should only fail if its restart strategy is exhausted though. Do you think we should change that behaviour? |
In general, I think it would be helpful for users to be able to retrieve checkpoints of a failed job. I could imagine a scenario where a job is faulty but one only runs into after some time. Being then able to transform a checkpoint into a savepoint and then restarting the failed job with a corrected jar could be helpful. Thus, I think we should only remove the persisted job data if the job has reached FINISHED or CANCELED. Admittedly, this is a very conservative approach, but then users are less likely to lose data. However, this should be out of the scope of this PR. |
@@ -77,7 +77,7 @@ public BlobLibraryCacheManager(BlobService blobService, long cleanupInterval) { | |||
|
|||
// Initializing the clean up task | |||
this.cleanupTimer = new Timer(true); | |||
this.cleanupTimer.schedule(this, cleanupInterval); | |||
this.cleanupTimer.schedule(this, cleanupInterval, cleanupInterval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch 👍
Changes look good to me :-). Really good work @uce. I'm just wondering whether we could remove empty folders upon shutdown of the |
Thank you for your review. I've addressed your comment and now parent directories are deleted if empty, resulting in an empty storage folder after regular cleanup. If there are no objections, I would like to merge this later today. |
The `BlobServer` acts as a local cache for uploaded BLOBs. The life-cycle of each BLOB is bound to the life-cycle of the `BlobServer`. If the BlobServer shuts down (on JobManager shut down), all local files will be removed. With HA, BLOBs are persisted to another file system (e.g. HDFS) via the `BlobStore` in order to have BLOBs available after a JobManager failure (or shut down). These BLOBs are only allowed to be removed when the job that requires them enters a globally terminal state (`FINISHED`, `CANCELLED`, `FAILED`). This commit removes the `BlobStore` clean up call from the `BlobServer` shutdown. The `BlobStore` files will only be cleaned up via the `BlobLibraryCacheManager`'s' clean up task (periodically or on BlobLibraryCacheManager shutdown). This means that there is a chance that BLOBs will linger around after the job has terminated, if the job manager fails before the clean up.
The
BlobServer
acts as a local cache for uploaded BLOBs. The life-cycle of each BLOB is bound to the life-cycle of theBlobServer
. If the BlobServer shuts down (on JobManager shut down), all local files will be removed.With HA, BLOBs are persisted to another file system (e.g. HDFS) via the
BlobStore
in order to have BLOBs available after a JobManager failure (or shut down). These BLOBs are only allowed to be removed when the job that requires them enters a globally terminal state (FINISHED
,CANCELLED
,FAILED
).This commit removes the
BlobStore
clean up call from theBlobServer
shutdown. TheBlobStore
files will only be cleaned up via theBlobLibraryCacheManager
's' clean up task (periodically or on BlobLibraryCacheManager shutdown). This means that there is a chance that BLOBs will linger around after the job has terminated, if the job manager fails before the clean up.