
Implement asynchronous cleanup of unreferenced blob data #462

Closed

stevvooe opened this issue Apr 28, 2015 · 22 comments

@stevvooe (Collaborator)

A process should lock and sweep registry data to garbage collect unreferenced blobs. #422 (comment) covers many of the issues with implementing such a solution. During the sweep, the registry may be in downtime or read-only mode.
#461 would be a prerequisite of this implementation.

@stevvooe (Collaborator Author)

cc @ncdc

@burke commented Jun 1, 2015

This would be really useful for us at Shopify. We had a hacked-together GC script for our V1 registry but haven't bothered to write anything for V2 yet. Our growth is a little unsustainable, though, and we're definitely going to need the ability to prune old blobs sooner or later. (cc @sirupsen)

@burnettk (Contributor) commented Sep 3, 2015

In the meantime, here's a bash script that can delete a v2 registry image, its metadata, and its blobs, if you are storing data on the local filesystem (not S3).

@stevvooe (Collaborator Author) commented Sep 3, 2015

@burnettk That looks great. I am surprised I haven't seen more implementations like that. We're working on a read-only mode to make this style of GC easier to implement, which should work nicely with that script. It would be great to get updates on your experience with the script so we can inform the GC implementation.

@burnettk (Contributor) commented Sep 4, 2015

@stevvooe Cool. The script deleted 30GB of the 100GB of data we were using in one hour in its inaugural run. I had it delete the 400 oldest of our 1200 images (we don't really tag, but just create new images/repos with each build). No sign of data corruption yet. :)

@stevvooe (Collaborator Author) commented Sep 4, 2015

@burnettk Great to hear!

> we don't really tag, but just create new images/repos with each build

I really don't recommend creating new image names for every build. The registry is designed to handle a high cardinality of tags rather than a high cardinality of repositories. You'll also pay an extra price in bandwidth to push the layers to a new repository each time.

@jonathanperret

> You'll also pay an extra price in bandwidth to push the layers to a new repository each time.

Just to be sure I understand how the content-addressed storage works: if one were to push an identical stack of layers to a new repository in the same registry, would there be any transfer required? Would it be different if it were a new tag in the same repository?

@stevvooe (Collaborator Author) commented Sep 8, 2015

@jonathanperret This is more about the security model than content-addressed storage. For a layer to be available in a given repository, the client must prove it has the data before the layer becomes accessible through that repository. This requires a verified transfer of the data into the repository. A new tag in the same repository does not require this transfer.

This is required to prevent layer ids from becoming a "secret", as is the case in v1 of the protocol. Otherwise, simply knowing a given layer id would give any client access.

An optimization, #634, is in the works to avoid this case in the future.
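
For illustration, here is a rough sketch of how a client might use the cross-repository blob mount from #634 once available: the registry links an existing blob into the target repository without any data transfer, provided the client can pull from the source repository. The host, repository names, and digest below are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholder digest of a blob already present under "oldrepo".
	digest := "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"

	// Ask the registry to mount the blob into "newrepo" instead of
	// re-uploading it. A 201 Created response should mean the blob was
	// linked with no data transfer; a 202 Accepted means the client must
	// fall back to a regular upload.
	url := fmt.Sprintf(
		"http://localhost:5000/v2/newrepo/blobs/uploads/?mount=%s&from=oldrepo",
		digest,
	)
	resp, err := http.Post(url, "application/octet-stream", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```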

@miminar (Contributor) commented Sep 14, 2015

+1 for this. I've got a few remarks:

  1. Wouldn't it be more client-friendly to prune a blob immediately (or with some delay) once it becomes unreferenced? It would require keeping a graph of objects in memory, though, each guarded by an atomic reference counter to avoid global locking. Are there any drawbacks apart from the code's complexity?
  2. Could there be an alternative in the form of a pruning API?
    • e.g. POST */vacuum/pruneBlobs - on demand, lock the world and remove unreferenced blobs
    • e.g. POST */vacuum/recursiveRemove/<repo>/<name>?tag=*

The second option favors scenarios where the system of record lies outside the registry.

@stevvooe (Collaborator Author)

@miminar The main issue with such an approach is that it requires consensus. I'd like to avoid any extra APIs, as we can likely do without them. Please see the roadmap for details on deletes.

@mohsen0 commented Jan 13, 2016

+1

@stevvooe (Collaborator Author)

@MohsenBs Please avoid extraneous commentary. We understand this is a needed feature and it is in the works. If you would like to get updates, you can subscribe to notifications on this issue.

@ghost commented Mar 17, 2016

@stevvooe
I've been looking over the issues and constraints around deletion, as I have experienced some storage problems and housekeeping came up as a concern.

Would it be feasible to consider some model of kv (file) store where the key could be the blob-id and the value a list of the manifests/images that reference this blob?

In such a scenario, the entry would be created or appended to when a new image that references the blob is pushed, and updated (the image removed from the key's value) when an image that references the blob is deleted.

GC could look for empty keys and, one by one, lock them (mechanism TBD), remove the blobs, and remove the keys.
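
To make this concrete, here is a minimal Go sketch of such a reverse index; every name in it is hypothetical and purely illustrative, not actual registry code:

```go
package main

import (
	"fmt"
	"sync"
)

// refIndex is a hypothetical reverse index: blob digest -> set of
// manifest digests that reference it.
type refIndex struct {
	mu   sync.Mutex
	refs map[string]map[string]bool
}

func newRefIndex() *refIndex {
	return &refIndex{refs: map[string]map[string]bool{}}
}

// addManifest records that a manifest references each of the given blobs;
// it would be called when a new image is pushed.
func (ix *refIndex) addManifest(manifest string, blobs []string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, b := range blobs {
		if ix.refs[b] == nil {
			ix.refs[b] = map[string]bool{}
		}
		ix.refs[b][manifest] = true
	}
}

// removeManifest drops the manifest from each blob's reference set and
// returns the blobs whose keys are now "empty" (GC candidates).
func (ix *refIndex) removeManifest(manifest string, blobs []string) (empty []string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, b := range blobs {
		delete(ix.refs[b], manifest)
		if len(ix.refs[b]) == 0 {
			empty = append(empty, b)
		}
	}
	return empty
}

func main() {
	ix := newRefIndex()
	ix.addManifest("sha256:m1", []string{"sha256:aaa"})
	fmt.Println(ix.removeManifest("sha256:m1", []string{"sha256:aaa"})) // [sha256:aaa]
}
```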

Thank you for your time!

@stevvooe (Collaborator Author)

@realreadmedotcom

> GC could look for empty keys and one by one lock them (mechanism TBD), remove the blobs and remove the keys.

The issue is keeping the set of "empty" keys consistent with the set of items. Without transactions, what we thought was an "empty" key may become referenced again in the meantime, invalidating the deletion. The problem is that the GC may encounter a new blob before it gets added to the reference list, and would mark it for deletion.

The other issue is the failure of dependent writes. For example, what if the blob write succeeds but the update to the reference list does not? What if two writes are made to the same bucket, resulting in a data race where one list addition is lost? In both cases, the GC will delete data that was intended to be kept. And at a higher level, where do you hold the locks?

Both of the above issues can be solved by transactions. Let's take adding a new blob reference:

```
tx := begin()
tx.add(blob)
references := tx.add(manifest)
increment(references)
commit(tx) // must apply all changes or none
```

Note that we are using reference counting in this example, since content addressable data is always a DAG, but I hope the concept is clear. Deleting a manifest:

```
tx := begin()
references := tx.delete(manifest)
decrement(references)
commit(tx)
```

Now, we have a set of references, with counts. To collect blobs that are no longer referenced, we do the following:

```
tx := begin()
deletable := find(blob joined with references where count=0)
tx.delete(deletable)
commit(tx)
```

With such an approach, we could have fairly safe online GC. The requirements are that transactions are completely serializable and that we have the ability to calculate references from a provided document (i.e. a manifest).
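
To make the bookkeeping concrete, here is a toy Go sketch in which a single mutex stands in for serializable transactions. It is illustrative only, not registry code; real registry backends provide no such transactional guarantee, which is exactly the problem discussed above.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy in-memory blob store with reference counts. The mutex
// serializes every operation, playing the role of begin()/commit().
type store struct {
	mu       sync.Mutex
	blobs    map[string][]byte // digest -> data
	refcount map[string]int    // digest -> number of referencing manifests
}

func newStore() *store {
	return &store{blobs: map[string][]byte{}, refcount: map[string]int{}}
}

// addManifest atomically stores the manifest's blobs and increments
// their reference counts: either all changes apply, or none are visible.
func (s *store) addManifest(blobs map[string][]byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for d, data := range blobs {
		s.blobs[d] = data
		s.refcount[d]++
	}
}

// deleteManifest atomically decrements the counts for the manifest's blobs.
func (s *store) deleteManifest(digests []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, d := range digests {
		s.refcount[d]--
	}
}

// collect deletes every blob whose reference count has dropped to zero.
func (s *store) collect() (deleted []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for d, n := range s.refcount {
		if n == 0 {
			delete(s.blobs, d)
			delete(s.refcount, d)
			deleted = append(deleted, d)
		}
	}
	return deleted
}

func main() {
	s := newStore()
	s.addManifest(map[string][]byte{"sha256:aaa": []byte("layer")})
	s.deleteManifest([]string{"sha256:aaa"})
	fmt.Println("collected:", s.collect()) // collected: [sha256:aaa]
}
```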

I hope this clarifies the problem, in general.

If you're curious about GC in general, the Go blog has a very gentle introduction. They couple a concurrent, tri-color mark-and-sweep collector with mutation capture while a GC is ongoing. While not super easy to implement, understanding that solution lends itself to understanding the main problem here. It is very different from the example above (it handles cycles and other conditions), but it would be equivalent to keeping track of every remove and delete while sweeping all manifests.

BTW, we've merged an implementation of the mark and sweep collector. It does require the registry backend to be locked in some way, but it is ready for use.
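
For intuition, here is a sketch of what such a mark-and-sweep pass looks like under that locking assumption. The names are illustrative; this is not the merged implementation.

```go
package main

import "fmt"

// markAndSweep assumes the registry is locked (or read-only) so the set
// of manifests cannot change mid-scan. manifests maps a manifest digest
// to the blob digests it references; blobs is the full set of stored blobs.
func markAndSweep(manifests map[string][]string, blobs []string) (swept []string) {
	marked := map[string]bool{}

	// Mark: every blob reachable from any manifest survives.
	for _, referenced := range manifests {
		for _, d := range referenced {
			marked[d] = true
		}
	}

	// Sweep: anything unmarked is unreferenced and can be deleted.
	for _, d := range blobs {
		if !marked[d] {
			swept = append(swept, d)
		}
	}
	return swept
}

func main() {
	manifests := map[string][]string{"sha256:manifest1": {"sha256:aaa"}}
	blobs := []string{"sha256:aaa", "sha256:bbb"}
	fmt.Println(markAndSweep(manifests, blobs)) // [sha256:bbb]
}
```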

@RichardScothern (Contributor)

Closed by #1386.

@stevvooe (Collaborator Author) commented Apr 4, 2016

@RichardScothern I am not sure if #1386 covers the "lock" portion of this issue. We may want to keep another issue open so that important detail isn't lost.

@vehovsky commented Apr 5, 2016

Hi,

I may be misunderstanding this issue, but I thought scenarios like the simple example below should be covered by it. (I'm running this example in boot2docker.)

1/ Pull two different images:

```
docker pull distribution/registry:master
docker pull ubuntu
```

2/ Run the registry:

```
docker volume create --name registry
docker run -d -p 5000:5000 -p 5001:5001 -v /root/registry-config.yml:/etc/docker/registry/config.yml -v registry:/var/lib/registry --name registry distribution/registry:master
```

3/ Tag both images as "test:latest" and push them:

```
docker tag distribution/registry:master localhost:5000/test
docker push localhost:5000/test
docker tag ubuntu localhost:5000/test
docker push localhost:5000/test
```

4/ Now there should be unreferenced blobs in my registry, right? Stop the registry and run garbage collection:

```
docker stop registry
docker run -v /root/registry-config.yml:/etc/docker/registry/config.yml -v registry:/var/lib/registry --rm distribution/registry:master garbage-collect /etc/docker/registry/config.yml
```

5/ The blobs were not deleted.

Am I missing something here?

Thanks!

@RichardScothern (Contributor)

@stevvooe : #1386 does not enforce locking (though the documentation in that PR does mandate it). Enforcing locking/read-only mode across a registry cluster would be a sizable undertaking and I think the move toward live GC is a more worthwhile effort.

@stevvooe (Collaborator Author) commented Apr 5, 2016

@vehovmar Please open another issue.

@vehovsky commented Apr 5, 2016

@stevvooe Why another? Shouldn't this one be re-opened? Or am I misunderstanding this issue, and the use case I posted should not be covered by it?

@stevvooe (Collaborator Author) commented Apr 5, 2016

@vehovmar This issue is a feature request, and you are looking for support for the feature. What you describe may be a bug, or it could be completely unrelated.

This is our policy, and I appreciate you following it.

@vehovsky commented Apr 6, 2016

All right @stevvooe, here it is: #1600. Thanks!

thaJeztah pushed a commit to thaJeztah/distribution that referenced this issue Apr 22, 2021
thaJeztah pushed a commit to thaJeztah/distribution that referenced this issue Jan 19, 2022