Cleaning /scratch of old files #249
Thanks for noticing this! We should definitely have a cleaning policy and announce it. Deleting files older than six months with a cron job sounds great for now. We can then figure out if there is a better practical policy for the future.
Well, I was looking at something else, but I'll cron something up or extend tmpwatch (the one that handles /tmp and /var/tmp).
I would rather treat this generally as temporary space. In general, if somebody wants to write temporary data during a job, each job creates a job-specific temporary directory that is present in the TMPDIR environment variable:
This directory will be removed upon completion of the job. So the only use case I see for the remainder of Could we just have a designated directory for this (e.g.,
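As a rough illustration of "use $TMPDIR for temporary data", here is a minimal Python sketch. It assumes the scheduler exports TMPDIR (e.g. /scratch/<JOBID>) as described above; `job_scratch_dir` and `write_intermediate` are hypothetical helper names, not part of any cluster tooling.

```python
import os
import tempfile
from pathlib import Path


def job_scratch_dir() -> Path:
    """Return the directory a job should use for temporary data.

    Uses $TMPDIR when the scheduler has set it (assumed to be a per-job
    directory such as /scratch/<JOBID>); otherwise falls back to Python's
    default temporary location.
    """
    tmpdir = os.environ.get("TMPDIR")
    return Path(tmpdir) if tmpdir else Path(tempfile.gettempdir())


def write_intermediate(name: str, data: bytes) -> Path:
    """Write intermediate data under the job's scratch dir and return its path."""
    path = job_scratch_dir() / name
    path.write_bytes(data)
    return path
```

Inside a job, `write_intermediate("partial.bin", payload)` lands in the per-job directory, so the file goes away together with that directory when the job completes and needs no manual cleanup.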
+1 on @akahles's suggestion for policy. I had totally forgotten about the automatic setting of $TMPDIR. It may be worth an email announcement that people should use $TMPDIR (with a pointer to the wiki docs), or else the safety of their files cannot be guaranteed. The addition of a /scratch/shared directory that is periodically cleaned of old files is also a great idea.
+1
Noted above. I will draft up some items here and prepare the defined cleaning items. Nothing will happen for a while, so others might note this.
/scratch/shared was created. Working on the logic for cleaning it and the dirs above, and on the announcement. No cleaning will take place for a while. This is a back-burner item, but I'm moving it along.
Is $TMPDIR supposed to refer to the /scratch/$jobid folder as specified in the wiki? Currently it seems to just point to /scratch/ despite a job-specific folder being created. e.g.:
I don't see that behavior, so I would check your wrapper script in case it is resetting or clearing the environment.
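One way to check this from inside a job is a small Python sketch like the following. It assumes the PBS/Torque `PBS_JOBID` variable is set and that the per-job directory is named exactly after it under /scratch, matching the /scratch/$jobid layout from the wiki; `tmpdir_is_job_specific` is a hypothetical helper, and a site that names the directory differently would need a different comparison.

```python
import os
from pathlib import Path


def tmpdir_is_job_specific(env=None, scratch_root: str = "/scratch") -> bool:
    """Heuristic: does $TMPDIR look like <scratch_root>/<PBS_JOBID>?

    Assumes PBS_JOBID is exported and the per-job directory is named after
    it; returns False when either variable is missing, e.g. because a
    wrapper script cleared the environment.
    """
    env = os.environ if env is None else env
    tmpdir = env.get("TMPDIR")
    jobid = env.get("PBS_JOBID")
    if not tmpdir or not jobid:
        return False  # not inside a job, or the variables were cleared
    path = Path(tmpdir)
    return path.parent == Path(scratch_root) and path.name == jobid
```

Calling it at the top of the job script, and again after any wrapper runs, shows where TMPDIR gets reset to the bare /scratch root.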
Right as usual, @tatarsky. Thanks.
Well, I don't know about that ;) BTW, I've never gotten back to the original goal of this Git entry. I'll re-engage it shortly.
Note that shortly (and I can announce this more widely) any items in the top level of /scratch will be deleted by an age-based cron job. My current proposal is 60 days with no mtime change. I am in no hurry to implement this but figured I'd move it along a notch. Items in Items correctly using PBS $TMPDIR will be deleted based on age if left behind after a job, since by default those are rooted in /scratch itself. Items in Docker image areas will be handled with a method still being discussed in #288. Questions/concerns can continue to go here. I will make a louder statement when implementation is near.
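A minimal sketch of the proposed age-based cleaning, assuming a Python script run from cron on each node. `MAX_AGE_DAYS = 60` mirrors the 60-day proposal; the function names, the dry-run default, and the choice to judge a directory by the newest mtime anywhere inside it (so an active TMPDIR area with recently written files is never removed) are assumptions of this sketch, not the agreed script.

```python
import os
import shutil
import time
from pathlib import Path

# Mirrors the 60-day proposal above; adjust to whatever policy is adopted.
MAX_AGE_DAYS = 60


def newest_mtime(path: Path) -> float:
    """Most recent mtime of `path` or anything below it (symlinks not followed)."""
    newest = path.lstat().st_mtime
    if path.is_dir() and not path.is_symlink():
        for dirpath, dirnames, filenames in os.walk(path):
            for name in dirnames + filenames:
                try:
                    newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
                except FileNotFoundError:
                    pass  # removed between listing and stat, e.g. by a running job
    return newest


def stale_entries(root: Path, max_age_days: float = MAX_AGE_DAYS):
    """Yield top-level entries of `root` with nothing modified in max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for entry in sorted(root.iterdir()):
        if newest_mtime(entry) < cutoff:
            yield entry


def clean(root: Path, max_age_days: float = MAX_AGE_DAYS, dry_run: bool = True):
    """Delete stale top-level entries; with dry_run=True only report them."""
    stale = list(stale_entries(root, max_age_days))
    for path in stale:
        if dry_run:
            print(f"would remove {path}")
        elif path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)
        else:
            path.unlink()
    return stale
```

Running it with the default `dry_run=True` first gives a trial run that only lists candidates; the cron job would later call `clean(Path("/scratch"), dry_run=False)`.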
Just for clarification: $TMPDIR defaults to /scratch/JOBID. This directory is normally removed when the job finishes. Are there exceptions to this rule?
Sorry, that was unclear in my comment above. I believe I've seen a few instances, perhaps due to some of our past "lost jobs", of dirs in /scratch that looked like orphaned TMPDIR items. I wanted to make sure people understood I wasn't going to delete their TMPDIR areas unless they were really, really old and obviously not in use. It may have been another cluster, but basically active TMPDIR areas won't match the cron mtime rule in this plan.
Thanks for clarifying.
Some examples on cpu-6-1 which I suspect were orphaned during some of the issues that machine had:
Such obvious orphans would match the cron job and be deleted. If you feel such a dir is NOT an orphan, now would be the time to see what's out there. For example, on the same system I can clearly see the TMPDIR areas for your running jobs, all with Oct 14 timestamps.
Helpful pattern to observe such animals on a node:
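The exact pattern posted here was not preserved. As a rough stand-in, this Python sketch lists the top-level directories under a scratch root, oldest first, so stale TMPDIR-style dirs stand out next to the fresh ones of running jobs; `age_report` and its `min_age_days` filter are hypothetical names chosen for this example.

```python
import os
import time
from pathlib import Path


def age_report(root: Path = Path("/scratch"), min_age_days=None):
    """Return (mtime, path) for top-level directories of `root`, oldest first.

    Only each directory's own mtime is used. With min_age_days set, only
    directories not modified within that many days are returned.
    """
    cutoff = None if min_age_days is None else time.time() - min_age_days * 86400
    rows = []
    with os.scandir(root) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                if cutoff is None or mtime < cutoff:
                    rows.append((mtime, Path(entry.path)))
    return sorted(rows)
```

For a quick look on a node, the rows can be printed with `datetime.fromtimestamp(mtime)` next to each path; anything far older than the node's running jobs is a likely orphan.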
I am going to document the concept of this in the Wiki and then announce a trial run. I've left this idle for too long, but the space out there isn't used much, so I've ignored it. Still, we should have a policy and a cleaning script in case that changes.
I have attempted to define the above and will likely re-issue a Git request when the actual script is ready to run. This is not viewed as urgent, but it's probably something we will need someday. https://github.com/cBio/cbio-cluster/wiki/MSKCC-cBio-Cluster-User-Guide#scratch-disk-space
There appears to be no concept of removing old files in /scratch, and it is frankly littered with such files. It's not a space issue, but the large number of "top dir" files makes it a bit pokey to stat/ls.
But I don't know the policy that was adopted for /scratch on the nodes, so before I add a cron job to, say, remove any file more than six months old, please comment.
Alternatively, I can "not care" and let folks clean up after themselves, but the fact that there are files as far back as 2013 suggests that isn't being done ;)
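Before picking a cutoff, it may help to see how old the top-level clutter actually is. A small Python sketch, assuming read access to /scratch on a node; `entries_by_year` is a hypothetical helper that buckets top-level entries by the year of their own mtime.

```python
import os
from collections import Counter
from datetime import datetime
from pathlib import Path


def entries_by_year(root: Path) -> Counter:
    """Count top-level entries of `root` by the (local-time) year of their mtime."""
    counts = Counter()
    with os.scandir(root) as it:
        for entry in it:
            mtime = entry.stat(follow_symlinks=False).st_mtime
            counts[datetime.fromtimestamp(mtime).year] += 1
    return counts
```

Printing `sorted(entries_by_year(Path("/scratch")).items())` gives a per-year count of top-level entries, which shows both how many entries slow down `ls` and how far back they go.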