Garbage collect untagged manifests #1844

Open
Vanuan opened this Issue Jul 16, 2016 · 17 comments

9 participants

Vanuan commented Jul 16, 2016

As suggested in #1600, there should be a garbage collection option to delete all manifests that are not referenced by any tag (dangling manifests).

Use case 1:

  • push a new tag
  • delete an old tag (by deleting a manifest)
  • reclaim disk space by running garbage collection

Use case 2:

  • push a latest tag (creating "manifest 1")
  • push a latest tag (creating "manifest 2")
  • reclaim disk space by running garbage collection (delete "manifest 1" automatically)

Vanuan changed the title from "Garbage collect untagged images" to "Garbage collect untagged manifets" on Jul 16, 2016

Vanuan changed the title from "Garbage collect untagged manifets" to "Garbage collect untagged manifests" on Jul 16, 2016

Author

Vanuan commented Jul 17, 2016

#1813 is a pre-requisite to this feature

Author

Vanuan commented Jul 17, 2016

I assume that the first use case should already be working:

  1. fetch a list of tags: GET /v2/<name>/tags/list (sorted by create/update date)
  2. for each tag, fetch a manifest digest: GET /v2/<name>/manifests/<reference> (use tag for reference)
  3. delete all tags except the most recently created/updated one: DELETE /v2/<name>/manifests/<reference> (use digest for reference)

But there's no way to implement the second one.
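
The three steps for the first use case could be scripted roughly as follows. This is a minimal sketch, not a tested tool: it assumes a registry with deletion enabled, no authentication, and schema 2 manifests. The names `http_request`, `delete_tags_except`, and the `keep` argument are hypothetical. The tags/list response carries no dates, so the caller has to decide which tags to keep.

```python
import json
import urllib.request

# Media type requested so the registry reports the schema 2 manifest digest.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def http_request(method, url, headers=None):
    """Send one request; return (status, lower-cased headers, body)."""
    req = urllib.request.Request(url, method=method, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        hdrs = {k.lower(): v for k, v in resp.headers.items()}
        return resp.status, hdrs, resp.read()


def delete_tags_except(base_url, name, keep, request=http_request):
    """Delete the manifests behind all tags of `name` not listed in `keep`."""
    # Step 1: list the tags of the repository.
    _, _, body = request("GET", f"{base_url}/v2/{name}/tags/list")
    tags = json.loads(body).get("tags") or []

    # Step 2: resolve every tag to its manifest digest.
    digest_of = {}
    for tag in tags:
        _, hdrs, _ = request(
            "HEAD",
            f"{base_url}/v2/{name}/manifests/{tag}",
            {"Accept": MANIFEST_V2},
        )
        digest_of[tag] = hdrs["docker-content-digest"]

    # Never delete a digest that a kept tag still points to.
    protected = {d for t, d in digest_of.items() if t in keep}
    doomed = sorted({d for t, d in digest_of.items() if t not in keep} - protected)

    # Step 3: delete by digest, as described in the steps above.
    for digest in doomed:
        request("DELETE", f"{base_url}/v2/{name}/manifests/{digest}")
    return doomed
```

The `request` parameter exists only so the HTTP layer can be swapped out, for example to add authentication or to exercise the logic without a live registry. Disk space is still only reclaimed once garbage collection runs.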

mhornbech commented Jul 18, 2016

In our company we use tags that match the branch name of the code, which corresponds to your use case 2. I was actually quite surprised that this wasn't the standard behaviour, since as far as I can see there is no way to retrieve the old manifest digests from the API. Our current workaround is to delete the current manifest before pushing a new one.
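
A workaround of this kind might look like the sketch below, run just before each push. It is an illustration under assumptions, not the commenter's actual script: the helper names are hypothetical, authentication is omitted, and deletion must be enabled on the registry. Note that deleting a manifest also removes every other tag that points to the same digest.

```python
import urllib.error
import urllib.request

# Media type requested so the registry reports the schema 2 manifest digest.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def http_request(method, url, headers=None):
    """Send one request; return (status, lower-cased headers), even on 4xx/5xx."""
    req = urllib.request.Request(url, method=method, headers=headers or {})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, {k.lower(): v for k, v in resp.headers.items()}
    except urllib.error.HTTPError as err:
        return err.code, {k.lower(): v for k, v in err.headers.items()}


def delete_current_manifest(base_url, name, tag, request=http_request):
    """Delete the manifest `tag` points to; return its digest, or None if absent."""
    status, hdrs = request(
        "HEAD", f"{base_url}/v2/{name}/manifests/{tag}", {"Accept": MANIFEST_V2}
    )
    if status == 404:
        return None  # tag was never pushed, nothing to clean up
    if status != 200:
        raise RuntimeError(f"unexpected status {status} while resolving {tag}")
    digest = hdrs["docker-content-digest"]
    request("DELETE", f"{base_url}/v2/{name}/manifests/{digest}")
    return digest
```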

bwb commented Jan 19, 2017

Use case 2 is extremely common.

Blobs that would be eligible for garbage collection were it not for untagged manifests consume the majority of storage space allocated to the private registries I'm responsible for.

Is anyone working on this?

pszczekutowicz commented Aug 2, 2017

Correct me if I'm wrong, but the second use case scenario is:

For each repository:

  1. get the list of all manifests
  2. for each tag, remove the tag's manifest from that list
  3. remove the remaining manifests from storage

After the above steps, the garbage collector will reclaim disk space.
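
Steps 1 and 2 of this procedure can be sketched by reading the registry's storage directly, since the HTTP API offers no way to list untagged manifests. This sketch assumes the filesystem storage driver and its usual on-disk layout (`_manifests/revisions/sha256/<hex>` and `_manifests/tags/<tag>/current/link`). The function name `untagged_manifests` is hypothetical. It only reports candidates and deletes nothing, and it would also report manifests that are referenced solely by a manifest list.

```python
from pathlib import Path


def untagged_manifests(repo_dir):
    """Return digests stored for one repository that no tag currently points to.

    `repo_dir` is the repository directory, for example
    <root>/docker/registry/v2/repositories/<name> (assumed layout).
    """
    manifests = Path(repo_dir) / "_manifests"

    # Step 1: every manifest revision known to the repository.
    revisions = manifests / "revisions" / "sha256"
    all_digests = set()
    if revisions.is_dir():
        all_digests = {f"sha256:{p.name}" for p in revisions.iterdir() if p.is_dir()}

    # Step 2: digests that a tag currently points to.
    tagged = {
        link.read_text().strip()
        for link in (manifests / "tags").glob("*/current/link")
    }

    # What remains is the candidate list for step 3.
    return sorted(all_digests - tagged)
```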

Author

Vanuan commented Aug 3, 2017

Yeah that would be how to solve it. Though there are 3 more steps:

  1. Stop registry
  2. Run gc
  3. Start registry

Author

Vanuan commented Sep 15, 2017

This should solve point 1: #2199
That way we can implement steps 1-3 through the HTTP API.
Steps 4-6 are still impossible without some docker commands.

rdalverny commented Jan 23, 2018

Fixed by #2302 ?

Author

Vanuan commented Jan 23, 2018

Is it released yet?

Author

Vanuan commented Jan 23, 2018

Does it require putting registry to read only mode or restarting registry?

rdalverny commented Jan 23, 2018

It was merged into master ~two weeks ago. Latest release was in July 2017.
It's worth testing it.

I tried a build of the registry (through https://github.com/docker/distribution-library-image) and it seems to work: I pushed several images on the same tag, and all but the latest push were removed. I didn't try it on a large setup, though.

It still requires a docker command: docker exec -i -t registry /bin/registry garbage-collect /etc/docker/registry/config.yml -m will do (the key part is the new -m option).

It works without changing the registry mode, but I am not experienced enough with Docker to say whether that's safe; feedback welcome. I may be wrong, but I'd try running it as a daily cron task rather than having a remote script trigger it.
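
For the cron idea, a small wrapper around the command quoted above could look like this. It is a sketch under assumptions: the container name `registry` and the config path come from the command in this comment, while the function names are hypothetical. The `-i -t` flags are left out because a cron job typically has no terminal attached, and `--dry-run` is included as an option so the effect can be inspected before anything is deleted.

```python
import subprocess


def gc_command(container="registry",
               config="/etc/docker/registry/config.yml",
               delete_untagged=True,
               dry_run=False):
    """Build the `docker exec` argument list for a garbage-collect run."""
    cmd = ["docker", "exec", container,
           "/bin/registry", "garbage-collect", config]
    if delete_untagged:
        cmd.append("-m")  # the new option: also remove untagged manifests
    if dry_run:
        cmd.append("--dry-run")  # report what would be removed, delete nothing
    return cmd


def run_gc(**options):
    """Run garbage collection and return its output; raises on a non-zero exit."""
    result = subprocess.run(gc_command(**options),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Calling `run_gc(dry_run=True)` first and reading the output is a cautious way to start, given the open question of whether running without read-only mode is safe.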

travisghansen commented Jun 13, 2018

I'd love to see a new release tagged so this functionality can start to permeate the ecosystem :)

sf-project-io pushed a commit to redhat-cip/dci-ansible-agent that referenced this issue Jun 18, 2018

avoid a disk space leak of the registry storage
If we remove all the tags of the existing image, we won't be able to actually
delete it from the registry. As a result, the registry size will grow indefinitely.
Any tag can potentially be already used by an image and by the last "string"
attached to it. So before we apply a tag, we delete any potential existing tag.
See: docker/distribution#1844

Change-Id: I43ab70660a68d230487124a270dfd635bda16469

Contributor

sargun commented Jul 11, 2018

Wouldn’t this be problematic for people that are just pushing digests?

taladar commented Jul 11, 2018

Maybe if they only pushed digests, they should not call the garbage-collect command. However, I really do not see a reason to only push digests.

Contributor

sargun commented Jul 23, 2018

@taladar Why push tags, if your entire pipeline just needs digests?

Author

Vanuan commented Jul 24, 2018

@sargun It's just a feature. You're not required to use it. If you can afford infinite disk space you would just not use the "garbage collect untagged manifests" option. But people running small projects actually run out of space sometimes :)

Yes, there's a drawback: if you're running the tag "latest" and you push another "latest" (thus automatically deleting the previous digest if it isn't tagged otherwise), your services will fail because the digest they're running can no longer be found.

As a workaround you could just tag every image you push with a timestamp. This way you would essentially have human-readable digests and the ability to remove, say, all digests older than a week. To implement the same functionality now, you have to go through the unbearable pain of writing a service that talks to the registry through its API, which still doesn't support proper deletion, and then restart the registry to garbage collect.
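
The timestamp-tag idea can be sketched as two small helpers: one that produces a tag at push time and one that picks out tags older than a week. The tag format and both function names are assumptions made for illustration. A compact UTC format is used because tag names cannot contain colons.

```python
from datetime import datetime, timedelta, timezone

# Assumed tag format, e.g. 20180724T120000Z (only characters valid in a tag).
TAG_FORMAT = "%Y%m%dT%H%M%SZ"


def timestamp_tag(now=None):
    """Return a tag encoding the push time in UTC."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime(TAG_FORMAT)


def expired_tags(tags, max_age=timedelta(days=7), now=None):
    """Return the timestamp tags older than `max_age`; other tags are ignored."""
    now = now or datetime.now(timezone.utc)
    expired = []
    for tag in tags:
        try:
            pushed = datetime.strptime(tag, TAG_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a timestamp tag (for example "latest"), leave it alone
        if now - pushed > max_age:
            expired.append(tag)
    return expired
```

The tags returned by `expired_tags` would then be resolved to digests and deleted through the API, followed by a garbage collection run.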

sf-project-io pushed a commit to redhat-cip/dci-openstack-agent that referenced this issue Sep 24, 2018

avoid a disk space leak of the registry storage
If we remove all the tags of the existing image, we won't be able to actually
delete it from the registry. As a result, the registry size will grow indefinitely.
Any tag can potentially be already used by an image and by the last "string"
attached to it. So before we apply a tag, we delete any potential existing tag.
See: docker/distribution#1844

Change-Id: I43ab70660a68d230487124a270dfd635bda16469

glensc commented Dec 4, 2018

2.7.0 has been officially released
