This repository was archived by the owner on Apr 10, 2025. It is now read-only.

Multiple instances of the cache cleaner can run simultaneously on large caches #1337

@jeffkaufman

Our cache cleaning algorithm is:

  • walk the whole cache, collecting paths, sizes, and last-modified times [1]
  • sort the whole list by last-modified
  • delete the oldest-modified ones until our total size is under the limit
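
As a rough illustration of the three steps above, here is a minimal Python sketch. It is not the project's actual implementation; the function name `clean_cache` and the `max_bytes` parameter are made up for this example, and it sorts on `st_mtime` only (swap in `st_atime` if the filesystem tracks access times).

```python
import os

def clean_cache(root, max_bytes):
    """Hypothetical sketch: delete oldest-modified files until total <= max_bytes."""
    entries = []  # (mtime, size, path) for every file in the cache
    total = 0
    # Step 1: walk the whole cache, collecting paths, sizes, and mtimes.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished while we were walking
            entries.append((st.st_mtime, st.st_size, path))
            total += st.st_size

    # Step 2: sort the whole list, oldest-modified first.
    entries.sort()

    # Step 3: delete from the oldest end until we are under the limit.
    deleted = []
    for _mtime, size, path in entries:
        if total <= max_bytes:
            break
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone; still subtract its size
        total -= size
        deleted.append(path)
    return deleted
```

Note that nothing is deleted until the full walk and sort have finished, which is why the walk time dominates on a large cache.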

We do this once an hour, and if the previous run hasn't completed we start a new run anyway. This means that if your cache is large enough that we can't walk it in an hour, cleaners pile up on top of each other and you're going to have a bad time.

What we should do instead is either include our PID in the lockfile or keep pinging it every so often to indicate we're still working, so that we know not to start a new cache-cleaning run in the middle of cleaning the cache.
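
A sketch of what such a lockfile could look like, combining both ideas: the PID is written into the file, and the file's mtime serves as the heartbeat. Everything here is hypothetical (the names `try_acquire`, `heartbeat`, `release`, and the `stale_after` threshold are assumptions), it assumes a POSIX system where `os.kill(pid, 0)` probes for a process, and it does not close every race when two processes try to steal a stale lock at the same moment.

```python
import os
import time

def pid_alive(pid):
    """POSIX-only probe: signal 0 checks that the process exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True

def read_owner(lock_path):
    """Return the PID stored in the lockfile, or None if unreadable."""
    try:
        with open(lock_path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None

def try_acquire(lock_path, stale_after=300.0):
    """Return True if we now hold the lock, False if a live cleaner does."""
    for _ in range(2):  # second pass retries after removing a stale lock
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            owner = read_owner(lock_path)
            try:
                age = time.time() - os.stat(lock_path).st_mtime
            except FileNotFoundError:
                continue  # holder released it just now; retry
            # Held if the heartbeat is fresh and the owner is not known dead.
            if age < stale_after and (owner is None or pid_alive(owner)):
                return False
            try:
                os.remove(lock_path)  # stale: dead owner or no heartbeat
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False

def heartbeat(lock_path):
    """Bump the lockfile's mtime to show the cleaner is still working."""
    os.utime(lock_path)

def release(lock_path):
    """Drop the lock when the cleaning run finishes."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

In this sketch the hourly job would call `try_acquire` and skip the run on `False`, while a running cleaner would call `heartbeat` periodically during the walk and `release` at the end.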

[1] This is dependent on the fs being mounted with the atime option, which isn't that common. Without it we move from least-recently used to last-modified.