Implement asynchronous cleanup of unreferenced blob data #462
cc @ncdc
This would be really useful for us at Shopify. We had a hacked-together GC script for our V1 registry but haven't bothered to write anything for V2 yet. Our growth is a little unsustainable though and we're definitely going to need the ability to prune old blobs sooner or later. (cc @sirupsen)
In the meantime, here is a bash script that can delete a v2 registry image, its metadata and blobs, if you are storing data on the local filesystem (not S3).
@burnettk That looks great. I'm surprised I haven't seen more implementations like that. We're working on a read-only mode to make this style of GC easier to implement, which should work nicely with that script. It would be great to get updates on your experience with the script so we can inform the GC implementation.
@stevvooe Cool. The script deleted 30GB of the 100GB of data we were using in one hour on its inaugural run. I had it delete the 400 oldest of our 1200 images (we don't really tag, but just create new images/repos with each build). No sign of data corruption yet. :)
@burnettk Great to hear!
I really don't recommend creating new image names for every build. The registry is designed to handle a high cardinality of tags rather than a high cardinality of repositories. You'll also pay an extra price in bandwidth to push the layers to a new repository each time.
Just to be sure I understand how the content-indexed storage works: if one were to push an identical stack of layers to a new repository in the same registry, would there be any transfer required? Would it be different if it were a new tag in the same repository?
@jonathanperret This is more about the security model than content-addressed storage. For a layer to be available in a given repository, the client must prove it has the data before that layer becomes accessible through the repository. This requires a transfer of the data into the repository, where it is verified. A new tag in the same repository does not require this transfer. This is required to prevent layer ids from becoming a "secret", as is the case in v1 of the protocol; otherwise, simply knowing a given layer id would give any client access. An optimization, #634, is in the works to avoid this transfer in the future.
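For illustration, here is a minimal client-side sketch of that optimization, assuming it takes the shape of the cross-repository blob mount that later appeared in the V2 API; the registry address, repository names, and digest are all placeholders:

```go
package main

import (
	"fmt"
	"net/http"
)

// mountBlob asks the registry to mount an existing blob from
// sourceRepo into targetRepo instead of re-uploading the data.
// Everything here is illustrative; the mount/from query parameters
// follow the registry V2 API.
func mountBlob(registry, targetRepo, sourceRepo, digest string) error {
	url := fmt.Sprintf("%s/v2/%s/blobs/uploads/?mount=%s&from=%s",
		registry, targetRepo, digest, sourceRepo)
	resp, err := http.Post(url, "application/octet-stream", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusCreated:
		return nil // 201: mounted, no layer data transferred
	case http.StatusAccepted:
		// 202: mount refused; the registry opened a regular upload
		// session, so the client must fall back to pushing the blob.
		return fmt.Errorf("mount refused, fall back to a regular upload")
	default:
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
}
```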
+1 for this. I've got a few remarks:
The 2nd option favors scenarios where the system of record lies outside of the registry.
+1 |
@MohsenBs Please avoid extraneous commentary. We understand this is a needed feature and it is in the works. If you would like to get updates, you can subscribe to notifications on this issue.
@stevvooe Would it be feasible to consider some model of a kv (file) store where the key is the blob-id and the value is a list of the manifests/images that reference that blob? In such a scenario the entry would be created or updated (appended to) when a new image that references the blob is pushed, and updated when an image that references the blob is deleted (the image is removed from the key's value). GC could look for empty keys and, one by one, lock them (mechanism TBD), remove the blobs, and remove the keys. Thank you for your time!
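To make the proposal concrete, a toy in-memory version of such an index might look like the following (names are hypothetical, and a real implementation would still need the locking and consistency treatment discussed in the next comment):

```go
package main

// refIndex is a toy version of the proposed kv store: each blob
// digest maps to the set of manifest digests that reference it.
type refIndex map[string]map[string]struct{}

// addManifest records that manifest references each of blobs,
// mirroring a push of a new image.
func (ix refIndex) addManifest(manifest string, blobs []string) {
	for _, d := range blobs {
		if ix[d] == nil {
			ix[d] = make(map[string]struct{})
		}
		ix[d][manifest] = struct{}{}
	}
}

// removeManifest drops manifest from each blob's reference set;
// blobs left with an empty set become GC candidates.
func (ix refIndex) removeManifest(manifest string, blobs []string) {
	for _, d := range blobs {
		delete(ix[d], manifest)
	}
}

// collectable returns digests whose reference set is empty, i.e.
// the "empty keys" the GC would lock and remove.
func (ix refIndex) collectable() []string {
	var out []string
	for d, refs := range ix {
		if len(refs) == 0 {
			out = append(out, d)
		}
	}
	return out
}
```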
The issue is keeping the set of "empty" keys consistent with the set of items. Without transactions, what we thought was an "empty" key may become referenced again in the meantime, invalidating the ability to delete. The problem is that the GC may encounter a new blob before it gets added to the reference list, and mark it for deletion.

The other issue is the failure of dependent writes. For example, what if one writes the blob but the update to the reference list fails? What if two writes are made to the same bucket, resulting in a data race where one list addition is lost? In both cases, the GC will delete data that was intended to be kept. At an even higher level, where do you hold the locks?

Both of the above issues can be solved by transactions. Let's take adding a new blob reference:
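(The enumerated steps seem to have been lost from this comment; the sketch below is a hedged reconstruction of the idea, written against a hypothetical SQL schema, blobs(digest, data) and refs(digest, count), that is not the registry's actual storage. Placeholder and upsert syntax assume a SQLite-style driver.)

```go
package main

import (
	"context"
	"database/sql"
)

// addBlobRef writes a blob and bumps its reference count inside a
// single serializable transaction, so a concurrent GC can never
// observe the blob without its reference.
func addBlobRef(ctx context.Context, db *sql.DB, digest string, data []byte) (err error) {
	tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tx.Rollback() // either both writes land, or neither does
		}
	}()
	if _, err = tx.ExecContext(ctx,
		`INSERT INTO blobs (digest, data) VALUES (?, ?)
		 ON CONFLICT (digest) DO NOTHING`, digest, data); err != nil {
		return err
	}
	if _, err = tx.ExecContext(ctx,
		`INSERT INTO refs (digest, count) VALUES (?, 1)
		 ON CONFLICT (digest) DO UPDATE SET count = refs.count + 1`, digest); err != nil {
		return err
	}
	return tx.Commit()
}
```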
Note that we are using reference counting in this example, since content-addressable data is always a DAG, but I hope the concept is clear. Deleting a manifest:
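(Again reconstructed, continuing the same hypothetical schema plus a manifests table; blobDigests would be computed from the manifest document itself.)

```go
// deleteManifest removes a manifest and decrements the count of
// every blob it referenced, atomically.
func deleteManifest(ctx context.Context, db *sql.DB, manifestDigest string, blobDigests []string) (err error) {
	tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tx.Rollback()
		}
	}()
	for _, d := range blobDigests {
		if _, err = tx.ExecContext(ctx,
			`UPDATE refs SET count = count - 1 WHERE digest = ?`, d); err != nil {
			return err
		}
	}
	if _, err = tx.ExecContext(ctx,
		`DELETE FROM manifests WHERE digest = ?`, manifestDigest); err != nil {
		return err
	}
	return tx.Commit()
}
```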
Now, we have a set of references, with counts. To collect blobs that are no longer referenced, we do the following:
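(Steps reconstructed as above. Serializability is what prevents a concurrent push from re-referencing a blob between the zero-count check and the delete.)

```go
// collectUnreferenced deletes every blob whose reference count has
// dropped to zero, in one serializable transaction.
func collectUnreferenced(ctx context.Context, db *sql.DB) (err error) {
	tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tx.Rollback()
		}
	}()
	if _, err = tx.ExecContext(ctx,
		`DELETE FROM blobs WHERE digest IN
		   (SELECT digest FROM refs WHERE count = 0)`); err != nil {
		return err
	}
	if _, err = tx.ExecContext(ctx,
		`DELETE FROM refs WHERE count = 0`); err != nil {
		return err
	}
	return tx.Commit()
}
```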
With such an approach, we could have fairly safe online GC. The requirements are that transactions are completely serializable and that we have the ability to calculate references from a provided document (i.e. a manifest). I hope this clarifies the problem in general.

If you're curious about the concept of GC, Go's blog has a very gentle introduction. They couple concurrent, tri-color mark and sweep with mutation capture while a GC is ongoing. While not super easy to implement, understanding that solution lends itself to understanding the main problem here. It is very different from the example above (it handles cycles and other conditions), but it would be equivalent to keeping track of every addition and deletion while sweeping all manifests.

BTW, we've merged an implementation of the mark and sweep collector. It does require the registry backend to be locked in some way, but it is ready for use.
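For a rough picture of what a lock-and-sweep collector does conceptually, here is a toy sketch over in-memory maps; it is not the registry's actual code, and it assumes the backend is locked (or read-only) for the duration, as discussed above:

```go
// markAndSweep deletes every blob not reachable from any manifest.
// manifests maps a manifest digest to the blob digests it
// references; blobs maps a blob digest to its data.
func markAndSweep(manifests map[string][]string, blobs map[string][]byte) {
	marked := make(map[string]bool)
	for _, refs := range manifests {
		for _, d := range refs {
			marked[d] = true // mark phase: blob is reachable
		}
	}
	for d := range blobs {
		if !marked[d] {
			delete(blobs, d) // sweep phase: blob is unreferenced
		}
	}
}
```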
closed by #1386
@RichardScothern I am not sure if #1386 covers the "lock" portion of this issue. We may want to keep another issue open so that important detail isn't lost.
Hi, I may understand this issue all wrong, but I thought scenarios like this simple example should be covered by it (running this example in boot2docker):
1. Pull two different images.
2. Run the registry.
3. Tag both images as "test:latest" and push them.
4. Now there should be unreferenced blobs in my registry, right? Stop the registry and run garbage collection.
5. The blobs were not deleted.
Am I missing something here? Thanks!
@vehovmar Please open another issue.
@stevvooe Why another? Shouldn't this one be re-opened? Or am I misunderstanding this issue, and the use case I posted should not be covered by it?
@vehovmar This issue is a feature request and you are looking for support of the feature. What you describe may be a bug or it could be completely unrelated. This is our policy and I'd appreciate you following it.
A process should lock and sweep registry data to garbage collect unreferenced blobs. #422 (comment) covers many of the issues with implementing such a solution. During this time, the registry may be down or in read-only mode.
#461 would be a prerequisite of this implementation.