Cleanup script needs to check if run is active or not #4245

Closed
drkovalskyi opened this issue Dec 14, 2015 · 11 comments

@drkovalskyi

We need to make sure that data are not lost if Tier-0 cannot keep up with either the streamer file repacking or the transfer of data out to custodial tape storage. We need a clear warning when a certain level of available space utilization is reached, so that preventive action can be taken.

@johnhcasallasl
Contributor

The Active state is "calculated" on the client side of WMStats, so a direct verification is not possible. However, the input data (a huge JSON document) received by the client is available via REST, I think. The criterion used to mark a run as "active" is whether any of its workflows is in the "new" state. This could be checked from the cleanup script.
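As an illustration of that check, here is a minimal sketch, assuming the WMStats input data can be fetched as JSON from a REST endpoint; the URL and the field names (`Run`, `RequestStatus`) are assumptions, not the actual WMStats schema:

```python
import json
import urllib.request

# Hypothetical WMStats-style endpoint; the real URL and document layout may differ.
WMSTATS_URL = "https://cmsweb.cern.ch/wmstatsserver/data/requestcache"

def run_is_active(run_number, wmstats_url=WMSTATS_URL):
    """Return True if any workflow for the run is still in the 'new' state."""
    with urllib.request.urlopen(wmstats_url) as resp:
        workflows = json.load(resp).get("result", [])  # assumed top-level key

    for wf in workflows:
        # Assumed fields: each workflow document carries its run number and status.
        if wf.get("Run") == run_number and wf.get("RequestStatus") == "new":
            return True
    return False

# Example: the cleanup script would skip runs that are still active.
if run_is_active(262345):
    print("run 262345 still has workflows in 'new' state, skipping cleanup")
```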

@hufnagel
Member

The cleanup script in its current form (cron job) will likely never be fully integrated with the Tier0 repacking status or send any alarms. As such, what is asked here won't be provided, but we might be able to implement another way to do streamer deletions (from within the Tier0, for instance).

Have to think about it a bit.

@drkovalskyi
Author

I think we must check if it's safe to delete data. If the current tools cannot support that, we have to develop new tools. So let's identify what exactly needs to be done and define a timeline.

@hufnagel
Member

It's desirable. OTOH, unmerged is cleaned up on a 14-day timer, and we don't check there whether the data is safe either. And the unmerged space at CERN is in the RAW data path the same way as the streamer buffer.

So it's not like we don't already rely on following procedures in a timely manner...

@drkovalskyi
Author

While I agree that everything works fine if everything is done in a timely manner, I do think we need to protect against potential unrecoverable data loss by checking whether it's safe to delete.

@hufnagel
Member

Will see what we can do. Just saying that unmerged cleanup is the same, and we have no hooks into the Tier0 there either (nor do we necessarily want to).

@drkovalskyi
Author

I would say we need the protection for anything that cannot be recovered, i.e. anything that may lead to loss of RAW data. For recoverable data, i.e. all other data tiers and processing types, the current system is good enough.

@hufnagel
Member

RAW goes through unmerged like anything else, therefore it's not recoverable...

@drkovalskyi
Author

Ok, just to make sure there is no misunderstanding: we need a system that would prevent deletion of:

  1. streamer files until they are no longer needed to produce the RAW unmerged files
  2. RAW unmerged files until they are no longer needed to produce the merged RAW files
  3. RAW files on EOS until we have a custodial copy on tape.

It's possible that we are talking about multiple tools, but they are all part of one main objective: protect unrecoverable data and create back-pressure in Tier-0 processing (see the sketch below).
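To make the intent concrete, here is a minimal sketch of such a combined gate, assuming a status record per run/stream; the field names are illustrative placeholders for whatever the Tier0 Data Service and the transfer system would actually report:

```python
from dataclasses import dataclass

@dataclass
class RunStreamStatus:
    # Hypothetical status record; field names are illustrative, not an actual schema.
    repack_completed: bool         # streamers turned into RAW unmerged files
    merge_completed: bool          # RAW unmerged files turned into merged RAW files
    custodial_copy_on_tape: bool   # merged RAW confirmed on custodial tape

def allowed_deletions(status: RunStreamStatus) -> dict:
    """Map each storage area to whether deletion is safe, per the three rules above."""
    return {
        "t0streamer": status.repack_completed,        # rule 1
        "unmerged_raw": status.merge_completed,       # rule 2
        "raw_on_eos": status.custodial_copy_on_tape,  # rule 3
    }

# Example: repacking is done, but merging and the tape copy are still pending.
print(allowed_deletions(RunStreamStatus(True, False, False)))
# -> {'t0streamer': True, 'unmerged_raw': False, 'raw_on_eos': False}
```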

@hufnagel
Member

hufnagel commented Feb 5, 2016

First part of this is implemented: I added a run/stream processing completion publication to the Tier0 Data Service. It will take a while before this becomes available in cmsweb, though. Once it does, I can look at using it in the t0streamer cleanup script.
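For illustration, the cleanup script could consult such a completion record before removing a run/stream area. This is a sketch only; the endpoint path, response fields, and directory layout below are assumptions, not the actual Tier0 Data Service API:

```python
import json
import shutil
import urllib.request

# Assumed endpoint name and payload; the real Tier0 Data Service API may differ.
T0_DATASVC = "https://cmsweb.cern.ch/t0wmadatasvc/prod/run_stream_done"

def stream_processing_done(run, stream):
    """Check the (assumed) run/stream completion publication."""
    url = f"{T0_DATASVC}?run={run}&stream={stream}"
    with urllib.request.urlopen(url) as resp:
        records = json.load(resp).get("result", [])
    return any(rec.get("done") for rec in records)

def cleanup_streamer_area(run, stream, base="/eos/cms/store/t0streamer"):
    """Delete streamer files for a run/stream only once processing has completed."""
    if not stream_processing_done(run, stream):
        print(f"run {run} stream {stream} not done yet, keeping streamer files")
        return
    # Assumed directory layout under the t0streamer area.
    shutil.rmtree(f"{base}/Data/{stream}/{run:09d}", ignore_errors=True)
```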

@hufnagel
Copy link
Member

hufnagel commented Jun 3, 2017

This was deployed long ago.

hufnagel closed this as completed Jun 3, 2017