
Implement asynchronous cleanup of unreferenced blob data #462

Closed

stevvooe opened this issue Apr 28, 2015 · 22 comments

@stevvooe (Collaborator)

A process should lock and sweep registry data to garbage collect unreferenced blobs. #422 (comment) covers many of the issues with implementing such a solution. During the sweep, the registry may be in downtime or read-only mode.
#461 would be a prerequisite of this implementation.

@stevvooe (Collaborator Author)

cc @ncdc

@burke commented Jun 1, 2015

This would be really useful for us at Shopify. We had a hacked-together GC script for our V1 registry but haven't bothered to write anything for V2 yet. Our growth is a little unsustainable, though, and we're definitely going to need the ability to prune old blobs sooner or later. (cc @sirupsen)

@burnettk (Contributor) commented Sep 3, 2015

In the meantime, here's a bash script that can delete a v2 registry image, its metadata, and its blobs, if you are storing data on the local filesystem (not S3).

@stevvooe (Collaborator Author) commented Sep 3, 2015

@burnettk That looks great. I am surprised I haven't seen more implementations like that. We're working on a read-only mode to make this style of GC easier to implement, which should work nicely with that script. It would be great to get updates on your experience with the script so we can inform the GC implementation.

@burnettk (Contributor) commented Sep 4, 2015

@stevvooe Cool. The script deleted 30GB of the 100GB of data we were using in one hour in its inaugural run. I had it delete the 400 oldest of our 1200 images (we don't really tag, but just create new images/repos with each build). No sign of data corruption yet. :)

@stevvooe (Collaborator Author) commented Sep 4, 2015

@burnettk Great to hear!

> we don't really tag, but just create new images/repos with each build

I really don't recommend creating new image names for every build. The registry is designed to handle a high cardinality of tags rather than a high cardinality of repositories. You'll also pay an extra price in bandwidth to push the layers to a new repository each time.

@jonathanperret

> You'll also pay an extra price in bandwidth to push the layers to a new repository each time.

Just to be sure I understand how the content-addressed storage works: if one were to push an identical stack of layers to a new repository in the same registry, would there be any transfer required? Would it be different if it were a new tag in the same repository?

@stevvooe (Collaborator Author) commented Sep 8, 2015

@jonathanperret This is more about the security model than content-addressed storage. For a layer to be available in a given repository, the client must prove it has the data before the layer becomes accessible through that repository. This requires a verified transfer of the data into the repository. A new tag in the same repository does not require this transfer.

This is required to prevent layer ids from becoming a "secret", as is the case in v1 of the protocol. Otherwise, simply knowing a given layer id would give any client access.

An optimization, #634, is in the works to avoid this case in the future.
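
For illustration, here is a rough sketch of how a client might use the cross-repository blob mount from #634 once available: the registry links an existing blob into the target repository without any data transfer, provided the client can pull from the source repository. The host, repository names, and digest below are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholder digest of a blob already present under "oldrepo".
	digest := "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"

	// Ask the registry to mount the blob into "newrepo" instead of
	// re-uploading it. A 201 Created response should mean the blob was
	// linked with no data transfer; a 202 Accepted means the client must
	// fall back to a regular upload.
	url := fmt.Sprintf(
		"http://localhost:5000/v2/newrepo/blobs/uploads/?mount=%s&from=oldrepo",
		digest,
	)
	resp, err := http.Post(url, "application/octet-stream", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```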

@miminar (Contributor) commented Sep 14, 2015

+1 for this. I've got a few remarks:

  1. Wouldn't it be more client-friendly to prune a blob immediately (or with some delay) once it becomes unreferenced? It would require keeping a graph of objects in memory, though, each guarded by an atomic reference counter to avoid global locking. Are there any drawbacks apart from the code's complexity?
  2. Could there be an alternative in the form of a pruning API?
    • e.g. POST */vacuum/pruneBlobs - on demand, lock the world and remove unreferenced blobs
    • e.g. POST */vacuum/recursiveRemove/<repo>/<name>?tag=*

The second option favors scenarios where the system of record lies outside the registry.

@stevvooe (Collaborator Author)

@miminar The main issue with such an approach is that it requires consensus. I'd like to avoid any extra APIs, as we can likely do without them. Please see the roadmap for details on deletes.

@mohsen0 commented Jan 13, 2016

+1

@stevvooe (Collaborator Author)

@MohsenBs Please avoid extraneous commentary. We understand this is a needed feature and it is in the works. If you would like to get updates, you can subscribe to notifications on this issue.

@ghost commented Mar 17, 2016

@stevvooe
I've been looking over the issues and constraints around deletion, as I have experienced some storage problems and housekeeping came up as a concern.

Would it be feasible to consider some model of kv (file) store where the key could be the blob-id and the value a list of the manifests/images that reference this blob?

In such a scenario, the entry would be created or appended to when a new image that references the blob is pushed, and updated (the image removed from the key's value) when an image that references the blob is deleted.

GC could look for empty keys and, one by one, lock them (mechanism TBD), remove the blobs, and remove the keys.
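
To make this concrete, here is a minimal Go sketch of such a reverse index; every name in it is hypothetical and purely illustrative, not actual registry code:

```go
package main

import (
	"fmt"
	"sync"
)

// refIndex is a hypothetical reverse index: blob digest -> set of
// manifest digests that reference it.
type refIndex struct {
	mu   sync.Mutex
	refs map[string]map[string]bool
}

func newRefIndex() *refIndex {
	return &refIndex{refs: map[string]map[string]bool{}}
}

// addManifest records that a manifest references each of the given blobs;
// it would be called when a new image is pushed.
func (ix *refIndex) addManifest(manifest string, blobs []string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, b := range blobs {
		if ix.refs[b] == nil {
			ix.refs[b] = map[string]bool{}
		}
		ix.refs[b][manifest] = true
	}
}

// removeManifest drops the manifest from each blob's reference set and
// returns the blobs whose keys are now "empty" (GC candidates).
func (ix *refIndex) removeManifest(manifest string, blobs []string) (empty []string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, b := range blobs {
		delete(ix.refs[b], manifest)
		if len(ix.refs[b]) == 0 {
			empty = append(empty, b)
		}
	}
	return empty
}

func main() {
	ix := newRefIndex()
	ix.addManifest("sha256:m1", []string{"sha256:aaa"})
	fmt.Println(ix.removeManifest("sha256:m1", []string{"sha256:aaa"})) // [sha256:aaa]
}
```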

Thank you for your time!

@stevvooe (Collaborator Author)

@realreadmedotcom

> GC could look for empty keys and one by one lock them (mechanism TBD), remove the blobs and remove the keys.

The issue is keeping the set of "empty" keys consistent with the set of items. Without transactions, what we thought was an "empty" key may become referenced again in the meantime, invalidating the deletion. The problem is that the GC may encounter a new blob before it gets added to the reference list, and would mark it for deletion.

The other issue is the failure of dependent writes. For example, what if the blob write succeeds but the update to the reference list does not? What if two writes are made to the same bucket, resulting in a data race where one list addition is lost? In both cases, the GC will delete data that was intended to be kept. And at a higher level, where do you hold the locks?

Both of the above issues can be solved by transactions. Let's take adding a new blob reference:

```
tx := begin()
tx.add(blob)
references := tx.add(manifest)
increment(references)
commit(tx) // must apply all changes or none
```

Note that we are using reference counting in this example, since content addressable data is always a DAG, but I hope the concept is clear. Deleting a manifest:

```
tx := begin()
references := tx.delete(manifest)
decrement(references)
commit(tx)
```

Now, we have a set of references, with counts. To collect blobs that are no longer referenced, we do the following:

```
tx := begin()
deletable := find(blob joined with references where count=0)
tx.delete(deletable)
commit(tx)
```

With such an approach, we could have fairly safe online GC. The requirements are that transactions are completely serializable and that we have the ability to calculate references from a provided document (i.e. a manifest).
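
To make the bookkeeping concrete, here is a toy Go sketch in which a single mutex stands in for serializable transactions. It is illustrative only, not registry code; real registry backends provide no such transactional guarantee, which is exactly the problem discussed above.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy in-memory blob store with reference counts. The mutex
// serializes every operation, playing the role of begin()/commit().
type store struct {
	mu       sync.Mutex
	blobs    map[string][]byte // digest -> data
	refcount map[string]int    // digest -> number of referencing manifests
}

func newStore() *store {
	return &store{blobs: map[string][]byte{}, refcount: map[string]int{}}
}

// addManifest atomically stores the manifest's blobs and increments
// their reference counts: either all changes apply, or none are visible.
func (s *store) addManifest(blobs map[string][]byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for d, data := range blobs {
		s.blobs[d] = data
		s.refcount[d]++
	}
}

// deleteManifest atomically decrements the counts for the manifest's blobs.
func (s *store) deleteManifest(digests []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, d := range digests {
		s.refcount[d]--
	}
}

// collect deletes every blob whose reference count has dropped to zero.
func (s *store) collect() (deleted []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for d, n := range s.refcount {
		if n == 0 {
			delete(s.blobs, d)
			delete(s.refcount, d)
			deleted = append(deleted, d)
		}
	}
	return deleted
}

func main() {
	s := newStore()
	s.addManifest(map[string][]byte{"sha256:aaa": []byte("layer")})
	s.deleteManifest([]string{"sha256:aaa"})
	fmt.Println("collected:", s.collect()) // collected: [sha256:aaa]
}
```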

I hope this clarifies the problem, in general.

If you're curious about GC in general, the Go blog has a very gentle introduction. They couple a concurrent, tri-color mark-and-sweep collector with mutation capture while a GC is ongoing. While not super easy to implement, understanding that solution lends itself to understanding the main problem here. It is very different from the example above (it handles cycles and other conditions), but it would be equivalent to keeping track of every remove and delete while sweeping all manifests.

BTW, we've merged an implementation of the mark and sweep collector. It does require the registry backend to be locked in some way, but it is ready for use.
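
For intuition, here is a sketch of what such a mark-and-sweep pass looks like under that locking assumption. The names are illustrative; this is not the merged implementation.

```go
package main

import "fmt"

// markAndSweep assumes the registry is locked (or read-only) so the set
// of manifests cannot change mid-scan. manifests maps a manifest digest
// to the blob digests it references; blobs is the full set of stored blobs.
func markAndSweep(manifests map[string][]string, blobs []string) (swept []string) {
	marked := map[string]bool{}

	// Mark: every blob reachable from any manifest survives.
	for _, referenced := range manifests {
		for _, d := range referenced {
			marked[d] = true
		}
	}

	// Sweep: anything unmarked is unreferenced and can be deleted.
	for _, d := range blobs {
		if !marked[d] {
			swept = append(swept, d)
		}
	}
	return swept
}

func main() {
	manifests := map[string][]string{"sha256:manifest1": {"sha256:aaa"}}
	blobs := []string{"sha256:aaa", "sha256:bbb"}
	fmt.Println(markAndSweep(manifests, blobs)) // [sha256:bbb]
}
```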

@RichardScothern (Contributor)

Closed by #1386.

@stevvooe (Collaborator Author) commented Apr 4, 2016

@RichardScothern I am not sure if #1386 covers the "lock" portion of this issue. We may want to keep another issue open so that important detail isn't lost.

@vehovsky commented Apr 5, 2016

Hi,

I may be misunderstanding this issue, but I thought scenarios like the simple example below should be covered by it. (I'm running this example in boot2docker.)

1/ Pull two different images:

```
docker pull distribution/registry:master
docker pull ubuntu
```

2/ Run the registry:

```
docker volume create --name registry
docker run -d -p 5000:5000 -p 5001:5001 -v /root/registry-config.yml:/etc/docker/registry/config.yml -v registry:/var/lib/registry --name registry distribution/registry:master
```

3/ Tag both images as "test:latest" and push them:

```
docker tag distribution/registry:master localhost:5000/test
docker push localhost:5000/test
docker tag ubuntu localhost:5000/test
docker push localhost:5000/test
```

4/ Now there should be unreferenced blobs in my registry, right? Stop the registry and run garbage collection:

```
docker stop registry
docker run -v /root/registry-config.yml:/etc/docker/registry/config.yml -v registry:/var/lib/registry --rm distribution/registry:master garbage-collect /etc/docker/registry/config.yml
```

5/ The blobs were not deleted.

Am I missing something here?

Thanks!

@RichardScothern (Contributor)

@stevvooe : #1386 does not enforce locking (though the documentation in that PR does mandate it). Enforcing locking/read-only mode across a registry cluster would be a sizable undertaking and I think the move toward live GC is a more worthwhile effort.

@stevvooe (Collaborator Author) commented Apr 5, 2016

@vehovmar Please open another issue.

@vehovsky commented Apr 5, 2016

@stevvooe Why another? Shouldn't this one be re-opened? Or am I misunderstanding this issue, and the use case I posted should not be covered by it?

@stevvooe (Collaborator Author) commented Apr 5, 2016

@vehovmar This issue is a feature request, and you are looking for support for the feature. What you describe may be a bug, or it could be completely unrelated.

This is our policy, and I appreciate you following it.

@vehovsky commented Apr 6, 2016

All right @stevvooe, here it is: #1600. Thanks!

thaJeztah pushed a commit to thaJeztah/distribution that referenced this issue Apr 22, 2021
thaJeztah pushed a commit to thaJeztah/distribution that referenced this issue Jan 19, 2022