Cleanup script needs to check if run is active or not #4245
Comments
The Active state is "calculated" on the client side of WMStats, so a direct verification is not possible. However, the input data (a huge JSON document) received by the client is available via REST, I think. The criterion used to mark a run as "active" is whether any of its workflows is in the "new" state. This could be checked from the cleanup script. |
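A minimal sketch of that criterion, assuming the JSON document has already been fetched and reduced to a list of workflow records for one run. The `status` field name and the record layout are assumptions for illustration, not the actual WMStats schema:

```python
from typing import Iterable, Mapping


def is_run_active(workflows: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any workflow of the run is still in the "new" state.

    The "status" key is a hypothetical field name; adapt it to whatever
    the real JSON document uses.
    """
    return any(wf.get("status") == "new" for wf in workflows)


# Example with made-up workflow records for a single run.
example = [{"status": "completed"}, {"status": "new"}]
print(is_run_active(example))
```

A run with no workflows at all is reported as inactive here; whether that is the safe default for a deletion script would need to be decided separately.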
The cleanup script in its current form (cron job) will likely never be fully integrated into the Tier0 repacking status or send any alarms. As such, what is asked here won't be provided, but we might be able to implement another way to do streamer deletions (from within the Tier0, for instance). Have to think about it a bit. |
I think we must check if it's safe to delete data. If the current tools cannot support that, we have to develop new tools. So let's identify what exactly needs to be done and define a timeline. |
It's desirable. OTOH, unmerged is cleaned up on a 14-day timer. We don't check there whether the data is safe either. And the unmerged space at CERN is in the RAW data path the same way as the streamer buffer. So it's not like we don't already rely on following procedures in a timely manner... |
While I agree that if everything is done in a timely manner everything works fine, I do think we need to protect from a potential unrecoverable data loss by checking if it's safe to delete. |
Will see what we can do. Just saying that unmerged cleanup is the same and we have no hooks there into the Tier0 either (nor do we necessarily want to). |
I would say we need the protection for anything that cannot be recovered, i.e. anything that may lead to loss of RAW data. For recoverable data, i.e. all other data tiers and processing types, the current system is good enough. |
RAW goes through unmerged like anything else, therefore it's not recoverable... |
OK, just to make sure there is no misunderstanding, we need a system that would prevent deletion of:
|
The first part of this is implemented: I added a run/stream processing completion publication to the Tier0 Data Service. It will take a while before this becomes available in cmsweb, though. Once it does, I can look at using it in the t0streamer cleanup script. |
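As a rough sketch of how the cleanup script could use such completion information: only streamer files whose (run, stream) pair is reported as completed are offered for deletion. How the completed pairs are obtained from the Tier0 Data Service is deliberately left out; the `StreamerFile` type, the `deletable` helper, and the example paths are hypothetical names for illustration:

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class StreamerFile(NamedTuple):
    """Hypothetical record describing one streamer file on disk."""
    path: str
    run: int
    stream: str


def deletable(files: Iterable[StreamerFile],
              completed: Set[Tuple[int, str]]) -> List[StreamerFile]:
    """Keep only files whose (run, stream) is marked as fully processed.

    `completed` is assumed to come from the completion publication;
    anything not listed there is treated as unsafe to delete.
    """
    return [f for f in files if (f.run, f.stream) in completed]


# Example with made-up runs and streams.
files = [
    StreamerFile("/buffer/run1/A.dat", 1, "A"),
    StreamerFile("/buffer/run1/B.dat", 1, "B"),
    StreamerFile("/buffer/run2/A.dat", 2, "A"),
]
completed = {(1, "A"), (2, "A")}
for f in deletable(files, completed):
    print(f.path)
```

Treating "not listed" as "not safe" means a missing or delayed publication blocks deletion rather than allowing it, which matches the concern about unrecoverable data raised above.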
This was deployed long ago. |
We need to make sure that data are not lost if Tier-0 cannot keep up with either the streamer file repacking or transferring data out to custodial tape storage. We need a clear warning when a certain level of available space utilization is reached, so that preventive action can be taken.
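A minimal sketch of such a utilization warning, assuming the buffer is visible as a locally mounted filesystem. The 80% and 90% thresholds and the function names are illustrative assumptions, not agreed values:

```python
import shutil


def utilization_level(used: int, total: int,
                      warn: float = 0.80, critical: float = 0.90) -> str:
    """Classify space usage as "ok", "warning" or "critical".

    The default thresholds are placeholders chosen for this example.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = used / total
    if fraction >= critical:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"


def check_path(path: str) -> str:
    """Classify the usage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return utilization_level(usage.used, usage.total)


# Example: report the level for the current directory's filesystem.
print(check_path("."))
```

The "warning" level is meant to leave time for preventive action before the "critical" level is reached; where the alert is sent is outside the scope of this sketch.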