
Storage limit support? #64

Closed
mjarkk opened this issue Oct 31, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

mjarkk commented Oct 31, 2022

🚀 Feature Proposal

Is there a way to limit the amount of storage that is used, and if not, is there any chance this feature might get added?

Motivation

I only have a finite amount of cloud storage, and I'm worried that using this will quickly fill it all up.
So I would like to set a limit on the amount of storage the cache can take up.

Example

I would like to have, for example, a remote cache with a maximum size of 20 GiB.
If that size is reached and new cache entries are added, the oldest entries are removed.

Alternatives

I presume I can clear the cache myself, and I would be fine doing that once a week or month, but there is currently no documentation on this :(

fox1t added the enhancement (New feature or request) label Nov 2, 2022
fox1t commented Nov 2, 2022

Hi! I like this feature. It can be tricky to implement since we support several storage targets and deployment environments. Do you have something in mind?

Which storage provider are you using?

mjarkk commented Nov 2, 2022

I'm currently just testing this and thus using the local storage, but I would prefer to use the S3 storage.

Maybe instead of enforcing the limit here, you could hand that over to the storage system the user is using.
Then, when you hit the storage limit (by trying to upload a new file), you loop over all the entries and remove, for example, the oldest 10% of the cache entries.
Why remove 10% of all caches?
This action probably takes a lot of time, so you don't want to execute it every time a user tries to upload a new cache entry; removing the oldest 10% lets the end user upload at least x more cache entries before this function runs again.
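
Something like the following rough TypeScript sketch against the S3 API is what I have in mind (the bucket/prefix parameters and where it would hook into the server are assumptions, not anything that exists in this project):

```ts
import {
  S3Client,
  ListObjectsV2Command,
  DeleteObjectsCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// List every object under the cache prefix (handles pagination).
async function listCacheObjects(bucket: string, prefix: string) {
  const objects: { Key: string; LastModified: Date }[] = [];
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      if (obj.Key && obj.LastModified) {
        objects.push({ Key: obj.Key, LastModified: obj.LastModified });
      }
    }
    token = page.NextContinuationToken;
  } while (token);
  return objects;
}

// Once the configured limit is hit, delete the oldest ~10% of cache entries.
async function evictOldestTenPercent(bucket: string, prefix: string) {
  const objects = await listCacheObjects(bucket, prefix);
  objects.sort((a, b) => a.LastModified.getTime() - b.LastModified.getTime()); // oldest first
  const toDelete = objects.slice(0, Math.ceil(objects.length * 0.1));
  // DeleteObjects accepts at most 1000 keys per request.
  for (let i = 0; i < toDelete.length; i += 1000) {
    await s3.send(
      new DeleteObjectsCommand({
        Bucket: bucket,
        Delete: { Objects: toDelete.slice(i, i + 1000).map((o) => ({ Key: o.Key })) },
      })
    );
  }
}
```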

I have 2 concerns with the approach above:

  1. What if the limit is, for example, 1 TiB and that limit is reached: will looping through all cache files be a problem? I think so, but at the same time, is it even possible to create that many cache entries in a real environment?
  2. I presume every platform has a different error message for reaching the maximum storage capacity, and finding all of those errors is a bit annoying.

An alternative would be to put some sort of TTL on the cache entries and have a job that runs every half day or so, fetches a list of all caches, and removes any that have outlived their TTL.
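
That could be as simple as this sketch (reusing the hypothetical `listCacheObjects` helper and `s3` client from the eviction example above; the 14-day TTL is just a placeholder):

```ts
const TTL_MS = 1000 * 60 * 60 * 24 * 14; // placeholder: 14 days

// Run on a schedule (e.g. twice a day) and drop every entry older than the TTL.
async function sweepExpiredEntries(bucket: string, prefix: string) {
  const cutoff = Date.now() - TTL_MS;
  const expired = (await listCacheObjects(bucket, prefix)).filter(
    (o) => o.LastModified.getTime() < cutoff
  );
  for (let i = 0; i < expired.length; i += 1000) {
    await s3.send(
      new DeleteObjectsCommand({
        Bucket: bucket,
        Delete: { Objects: expired.slice(i, i + 1000).map((o) => ({ Key: o.Key })) },
      })
    );
  }
}
```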

matteovivona (Collaborator) commented

We currently save the cache in an S3 bucket, and to clean up the old cache we have set up a lifecycle rule at the bucket level. It is the easiest thing to do, and all the major providers support it: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html
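
As a sketch of what such a rule can look like when set through the AWS SDK (the bucket name and the 30-day retention are placeholders; the same rule can just as well be created from the console or the CLI):

```ts
import { S3Client, PutBucketLifecycleConfigurationCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Expire every object in the cache bucket 30 days after it was created.
async function configureCacheExpiry() {
  await s3.send(
    new PutBucketLifecycleConfigurationCommand({
      Bucket: "my-turborepo-cache", // placeholder bucket name
      LifecycleConfiguration: {
        Rules: [
          {
            ID: "expire-old-cache-artifacts",
            Status: "Enabled",
            Filter: { Prefix: "" }, // apply to the whole bucket
            Expiration: { Days: 30 }, // pick whatever retention fits your usage
          },
        ],
      },
    })
  );
}
```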

fox1t commented Dec 14, 2022

I am closing this due to no further follow-up.

fox1t closed this as completed Dec 14, 2022