This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Multiple instances of the cache cleaner can run simultaneously on large caches #1337

Closed
jeffkaufman opened this issue Jul 6, 2016 · 3 comments

Comments

@jeffkaufman
Contributor

Our cache cleaning algorithm is:

  • walk the whole cache, collecting paths, sizes, and last-modified times [1]
  • sort the whole list by last-modified
  • delete the oldest-modified ones until our total size is under the limit
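As a rough illustration of the three steps above, here is a minimal Python sketch. It is not the actual cleaner; the function name `clean_cache` and the `target_bytes` parameter are assumptions made up for this example, and it orders files by `st_mtime` only.

```python
import os


def clean_cache(root, target_bytes):
    """Delete the oldest-modified files under root until the total size
    is at or below target_bytes. Returns the list of deleted paths.

    Hypothetical helper for illustration; names are not from the real code.
    """
    entries = []
    total = 0
    # Step 1: walk the whole cache, collecting path, size, and mtime.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # the file disappeared while we were walking
            entries.append((st.st_mtime, st.st_size, path))
            total += st.st_size

    # Step 2: sort the whole list, oldest modification time first.
    entries.sort(key=lambda entry: entry[0])

    # Step 3: delete from the oldest end until we are under the limit.
    deleted = []
    for _mtime, size, path in entries:
        if total <= target_bytes:
            break
        try:
            os.remove(path)
        except OSError:
            continue  # someone else removed it; its size stays counted
        total -= size
        deleted.append(path)
    return deleted
```

Note that nothing is deleted until the full walk and sort have finished, which is why the walk time dominates on a large cache.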

We do this once an hour, and we start a new run even if the previous one hasn't completed. This means that if your cache is large enough that we can't walk it in an hour, you end up with multiple cleaners running at the same time, and you're going to have a bad time.

What we should do instead is either include our PID in the lockfile or keep pinging it every so often to indicate we're still working, so that we know not to start a new cache-cleaning run in the middle of cleaning the cache.
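To make the "keep pinging it" option concrete, here is a minimal Python sketch of a heartbeat lockfile. All names (`try_acquire`, `heartbeat`, `release`) and the 300-second staleness timeout are assumptions for illustration, not what the project implements. The running cleaner bumps the lockfile's mtime periodically, and a new run only starts if the lock is missing or its last ping is older than the timeout. The stale-lock takeover here is not race-free, so treat it as a sketch of the idea only.

```python
import os
import time

STALE_AFTER_SECONDS = 300  # hypothetical timeout, not from the source


def try_acquire(lock_path, now=None, stale_after=STALE_AFTER_SECONDS):
    """Return True if this process may start a cleaning run."""
    now = time.time() if now is None else now
    try:
        # O_EXCL makes creation fail if the lockfile already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        os.utime(lock_path, (now, now))
        return True
    except FileExistsError:
        pass
    try:
        age = now - os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return False  # lock vanished between checks; retry on the next run
    if age < stale_after:
        return False  # the previous cleaner pinged recently; still working
    # Lock is stale: assume the previous cleaner died and take over.
    # Two processes could both reach this point; a real version needs more care.
    os.utime(lock_path, (now, now))
    return True


def heartbeat(lock_path, now=None):
    """Called every so often by the running cleaner to show it is alive."""
    now = time.time() if now is None else now
    os.utime(lock_path, (now, now))


def release(lock_path):
    """Remove the lock when the cleaning run finishes."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

Because this relies only on the file's mtime rather than a PID, it does not depend on the two processes being on the same machine.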

[1] This is dependent on the fs being mounted with the atime option, which isn't that common. Without it we move from least-recently used to last-modified.

@jeffkaufman
Contributor Author

If we switch to a PID-based system, then we don't handle cases where two machines share a disk cache.

@jeffkaufman jeffkaufman self-assigned this Jul 13, 2016
@jeffkaufman
Contributor Author

I have a draft of this I'll be sending out for review soon.

@jmarantz
Contributor

I think we should tell users explicitly not to put the file cache on a shared FS. In fact, I think we already discourage it in the doc.

So designing a locking system that assumes the cache isn't shared seems OK to me. Having said that, I think centralizing the file cache cleaning in the controller process should be considered.

@jeffkaufman jeffkaufman changed the title from "Cache cleaning spawns multiple times on large caches" to "Multiple instances of the cache cleaner can run simultaneously on large caches" Oct 11, 2016