Remove the layer's link by garbage-collect #2288

m-masataka
Contributor

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

The garbage-collect command should remove the layers' link files (_layers/<algorithm>/<digest>/link).
These link files no longer have blobs to link to.
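
For readers unfamiliar with the layout, the link path named in the description can be sketched as follows. This is a minimal illustration, assuming digests are written as `<algorithm>:<hex>` and that repositories live under a `repositories/` directory below the storage root; `layerLinkPath` and its `root` parameter are hypothetical names, not the registry's own path-mapping code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// layerLinkPath is a hypothetical helper (not part of the registry code).
// It builds <root>/repositories/<repo>/_layers/<algorithm>/<hex>/link from a
// digest written as "<algorithm>:<hex>".
func layerLinkPath(root, repo, dgst string) (string, error) {
	algorithm, hex, ok := strings.Cut(dgst, ":")
	if !ok || algorithm == "" || hex == "" {
		return "", fmt.Errorf("invalid digest %q", dgst)
	}
	return path.Join(root, "repositories", repo, "_layers", algorithm, hex, "link"), nil
}

func main() {
	p, err := layerLinkPath("/v2", "alpine", "sha256:2aecc7e1714b6fad58d13aedb0639011b37b86f743ba7b6a52d82bd03014b78e")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```

A link file at such a path is only useful while the blob it names still exists in the shared blob store, which is the situation this PR is concerned with.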

@codecov

codecov bot commented Jun 1, 2017

Codecov Report

Merging #2288 into master will decrease coverage by 9.66%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #2288      +/-   ##
==========================================
- Coverage   60.25%   50.58%   -9.67%     
==========================================
  Files         125      125              
  Lines       14353    14389      +36     
==========================================
- Hits         8648     7279    -1369     
- Misses       4821     6361    +1540     
+ Partials      884      749     -135
| Impacted Files | Coverage Δ | |
|---|---|---|
| registry/storage/driver/base/regulator.go | 100% <100%> (ø) | ⬆️ |
| registry/storage/linkedblobstore.go | 73.37% <100%> (ø) | ⬆️ |
| registry/storage/registry.go | 90.05% <100%> (+0.15%) | ⬆️ |
| registry/storage/vacuum.go | 37.2% <40%> (+0.84%) | ⬆️ |
| registry/storage/paths.go | 75.67% <66.66%> (-0.15%) | ⬇️ |
| registry/storage/garbagecollect.go | 55.55% <70%> (-7.74%) | ⬇️ |
| registry/storage/driver/gcs/gcs.go | 0.32% <0%> (-66.13%) | ⬇️ |
| registry/storage/driver/oss/oss.go | 0.45% <0%> (-56.5%) | ⬇️ |
| registry/storage/driver/s3-aws/s3.go | 4.07% <0%> (-55.4%) | ⬇️ |
| registry/storage/driver/s3-goamz/s3.go | 0.4% <0%> (-52.4%) | ⬇️ |

... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5cfdfbd...9016620.

manifests.go Outdated
@@ -67,6 +67,7 @@ type ManifestService interface {
type ManifestEnumerator interface {
	// Enumerate calls ingester for each manifest.
	Enumerate(ctx context.Context, ingester func(digest.Digest) error) error
	LayersEnumerate(ctx context.Context, repoName string, ingester func(digest.Digest, string) error) error
Collaborator

We can't just add this method to an interface called ManifestEnumerator.

@@ -65,6 +66,11 @@ func MarkAndSweep(ctx context.Context, storageDriver driver.StorageDriver, regis
return nil
})

err = manifestEnumerator.LayersEnumerate(ctx, repoName, func(dgst digest.Digest, linkpath string) error {
Collaborator

All you need to do here is find the layers that aren't referenced in the repository and remove them. It should look similar to the global blob collection, but can be local to a repository.

@stevvooe
Collaborator

In the course of reviewing #2302, it looks like this needs to do a collection round inside each repo, then one over the global blobs directory. This should just belong in MarkAndSweep. Unfortunately, this PR does nothing to ensure that the layer links are not currently in use, so it would remove data that is still required by certain images.
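
The two rounds described in this comment can be modelled in memory as follows. This is a sketch under stated assumptions, not the registry's `MarkAndSweep`: the `repo` type, the `markAndSweep` function, and the plain string digests are all hypothetical, and the storage driver is replaced by maps.

```go
package main

import "fmt"

// repo is a toy in-memory stand-in for one repository on disk.
type repo struct {
	manifests  map[string][]string // reachable manifest digest -> layer digests it references
	layerLinks map[string]bool     // digests that have a link file under _layers
}

// markAndSweep is a hypothetical model of the two collection rounds.
// It mutates repos and blobs in place and returns how many layer links
// and how many global blobs were removed.
func markAndSweep(repos map[string]*repo, blobs map[string]bool) (links, removedBlobs int) {
	marked := map[string]bool{} // global mark set shared by all repositories

	// Round 1: inside each repository, mark what its manifests reference,
	// then drop layer links that nothing in that repository uses.
	for _, r := range repos {
		used := map[string]bool{}
		for manifest, layers := range r.manifests {
			marked[manifest] = true
			for _, l := range layers {
				used[l] = true
				marked[l] = true
			}
		}
		for l := range r.layerLinks {
			if !used[l] {
				delete(r.layerLinks, l)
				links++
			}
		}
	}

	// Round 2: sweep the global blob store against the combined mark set.
	for b := range blobs {
		if !marked[b] {
			delete(blobs, b)
			removedBlobs++
		}
	}
	return links, removedBlobs
}

func main() {
	repos := map[string]*repo{
		"alpine": {
			manifests:  map[string][]string{"m1": {"l1"}},
			layerLinks: map[string]bool{"l1": true, "l2": true},
		},
	}
	blobs := map[string]bool{"m1": true, "l1": true, "l2": true}
	links, removed := markAndSweep(repos, blobs)
	fmt.Println("links removed:", links, "blobs removed:", removed)
}
```

Because a link is removed only when no reachable manifest in its own repository references it, the sketch never drops a link that an image still needs, which is the safety property the comment asks for.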

@zetaab
Contributor

zetaab commented Jun 12, 2017

I wonder whether it could be as simple as this: when a blob with digest X is deleted, scan the layer folder for an entry named X and delete it if present. I'm not sure that is the ideal approach, but from what I checked, my layers are named the same way as some of the blobs.

some test data:

docker tag alpine:3.4 localhost:5000/alpine:latest
docker push localhost:5000/alpine:latest
docker tag alpine:latest localhost:5000/alpine:latest
docker push localhost:5000/alpine:latest

 v2 % du -h
4.0K	./blobs/sha256/0b/0b94d1d1b5eb130dd0253374552445b39470653fb1a1ec2d81490948876e462c <- manifest 
4.0K	./blobs/sha256/0b
1.9M	./blobs/sha256/2a/2aecc7e1714b6fad58d13aedb0639011b37b86f743ba7b6a52d82bd03014b78e <- layer
1.9M	./blobs/sha256/2a
4.0K	./blobs/sha256/36/36fbf18a36d0410de46201a76b7cfb1d69d44a8f407d6a0814517247b896f8e9 <- manifest
4.0K	./blobs/sha256/36
2.3M	./blobs/sha256/48/486a8e636d6250a74d15cdb3582f4dd198271a80118f5a2f59de3d9cd8433611 <- layer
2.3M	./blobs/sha256/48
4.0K	./blobs/sha256/60/6008ce38ddc131d5fc1e35b4a06b78fa1388ca94fd8efc1246fcbc44b45b6b74 <- layer
4.0K	./blobs/sha256/60
4.0K	./blobs/sha256/a4/a41a7446062d197dd4b21b38122dcc7b2399deb0750c4110925a7dd37c80f118 <- layer
4.0K	./blobs/sha256/a4
4.2M	./blobs/sha256
4.2M	./blobs
4.0K	./repositories/alpine/_layers/sha256/2aecc7e1714b6fad58d13aedb0639011b37b86f743ba7b6a52d82bd03014b78e
4.0K	./repositories/alpine/_layers/sha256/486a8e636d6250a74d15cdb3582f4dd198271a80118f5a2f59de3d9cd8433611
4.0K	./repositories/alpine/_layers/sha256/6008ce38ddc131d5fc1e35b4a06b78fa1388ca94fd8efc1246fcbc44b45b6b74
4.0K	./repositories/alpine/_layers/sha256/a41a7446062d197dd4b21b38122dcc7b2399deb0750c4110925a7dd37c80f118
 16K	./repositories/alpine/_layers/sha256
 16K	./repositories/alpine/_layers
4.0K	./repositories/alpine/_manifests/revisions/sha256/0b94d1d1b5eb130dd0253374552445b39470653fb1a1ec2d81490948876e462c
4.0K	./repositories/alpine/_manifests/revisions/sha256/36fbf18a36d0410de46201a76b7cfb1d69d44a8f407d6a0814517247b896f8e9
8.0K	./repositories/alpine/_manifests/revisions/sha256
8.0K	./repositories/alpine/_manifests/revisions
4.0K	./repositories/alpine/_manifests/tags/latest/current
4.0K	./repositories/alpine/_manifests/tags/latest/index/sha256/0b94d1d1b5eb130dd0253374552445b39470653fb1a1ec2d81490948876e462c
4.0K	./repositories/alpine/_manifests/tags/latest/index/sha256/36fbf18a36d0410de46201a76b7cfb1d69d44a8f407d6a0814517247b896f8e9
8.0K	./repositories/alpine/_manifests/tags/latest/index/sha256
8.0K	./repositories/alpine/_manifests/tags/latest/index
 12K	./repositories/alpine/_manifests/tags/latest
 12K	./repositories/alpine/_manifests/tags
 20K	./repositories/alpine/_manifests
  0B	./repositories/alpine/_uploads
 36K	./repositories/alpine
 36K	./repositories
4.2M	.

@stevvooe
Collaborator

> I wonder whether it could be as simple as this: when a blob with digest X is deleted, scan the layer folder for an entry named X and delete it if present. I'm not sure that is the ideal approach, but from what I checked, my layers are named the same way as some of the blobs.

That would work in the wrong direction.

You really need to scan the blobs that are referenced via manifests and then delete the unreferenced ones in _layers.
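
One scenario, assumed here for illustration, shows why the direction matters: a blob that one repository still uses survives global collection, so a cleanup triggered by blob deletion never reaches a stale link to that blob in another repository. The sketch below starts from the manifests instead; `staleLinks` and the single-letter digests are hypothetical.

```go
package main

import "fmt"

// staleLinks returns the link digests in one repository that no reachable
// manifest in that same repository references. It is a hypothetical helper
// for illustration, not the registry's implementation.
func staleLinks(referenced map[string]bool, links []string) []string {
	var stale []string
	for _, l := range links {
		if !referenced[l] {
			stale = append(stale, l)
		}
	}
	return stale
}

func main() {
	// Blob "x" is still referenced by a manifest in repository "a",
	// so it stays in the shared blob store and is never deleted.
	// Repository "b" has a link to "x" but no manifest that uses it.
	referencedInB := map[string]bool{}
	linksInB := []string{"x"}

	// Starting from manifests finds the stale link even though
	// the blob itself was not removed.
	fmt.Println(staleLinks(referencedInB, linksInB))
}
```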

@zetaab
Contributor

zetaab commented Jun 13, 2017

But that is how the garbage collector currently works, right?

It first scans all manifests, and any blobs that are not referenced by a manifest are deleted?

@stevvooe
Collaborator

@zetaab Yes, but there is a set of layer links in each repository that isn't getting cleaned up by the current garbage collection (unless I missed it). These links provide access control for the blob. If the link is present, the repository has access to that blob. These link out to the main blobs store, which is shared by everyone.

For this to work correctly, you must ensure that no reachable manifests reference these linked blobs before removing them.

@m-masataka
Contributor Author

m-masataka commented Jun 14, 2017

Sorry for the late reply.
I made some changes:

  • LayersEnumerate now just finds the layers that aren't referenced in the repository
  • and removes those layers.

Is that OK?

@stevvooe
Collaborator

@m-masataka Unfortunately, exposing that on the manifest interface doesn't make a whole lot of sense. It exposes internal implementation details to the external model.

This should be done internally, inside of the garbage collection function. There may already be a function that lists the blobs that are already part of a repository.

@@ -231,7 +231,47 @@ func (lbs *linkedBlobStore) Delete(ctx context.Context, dgst digest.Digest) erro
return nil
}

func (lbs *linkedBlobStore) Enumerate(ctx context.Context, ingestor func(digest.Digest) error) error {
// LayersEnumerate find layer's digest in the Repository.
func LayersEnumerate(ctx context.Context, storageDriver driver.StorageDriver, repoName string, ingester func(digest.Digest, string) error) error {
Contributor

If you add linkDirectoryPathSpec to linkedBlobStore in repository.Blobs, you will be able to use linkedBlobStore.Enumerate instead.

Contributor Author

@dmage Thank you for your review.
I deleted LayersEnumerate() and now use linkedBlobStore.Enumerate.

@GordonTheTurtle

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "garbagecollection-remove-layers" git@github.com:m-masataka/distribution.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354643304
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@m-masataka
Contributor Author

m-masataka commented Jul 18, 2017

@stevvooe
This PR got the dco/no label because it wasn't signed (my mistake).
How can I remove this label?

@stevvooe
Collaborator

@m-masataka There are several commits that are not signed. I'd recommend squashing them all into a single commit.

@m-masataka
Contributor Author

@stevvooe
Thank you for your advice.
Is this all right?

@stevvooe
Collaborator

@m-masataka No, please make this PR into a single commit.

@m-masataka m-masataka force-pushed the garbagecollection-remove-layers branch from 3128dd3 to 815a0bc Compare July 20, 2017 13:49
Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

Remove ther layer's link by garbage-collect

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

fix typo

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

modify code format

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

Delete LayerEnumerator from ManifestEnumerator interface.

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

add comment.

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

just find the layers that aren't referenced and remove.

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

Delete LayerEnumrate func

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

Fix signalling Wait in regulator.enter

In some conditions, regulator.exit may not send a signal to blocked
regulator.enter.

Let's assume we are in the critical section of regulator.exit and r.available
is equal to 0, and there are three more goroutines. One goroutine also executes
regulator.exit and waits for the lock. The rest run regulator.enter and wait for
the signal.

We send the signal, and after releasing the lock, there will be lock
contention:

  1. Wait from regulator.enter
  2. Lock from regulator.exit

If the winner is Lock from regulator.exit, we will not send another signal to
unlock the second Wait.

Signed-off-by: Oleg Bulatov <obulatov@redhat.com>
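
The race described in this commit message can be modelled with a small counting limiter built on `sync.Cond`. This is a simplified sketch, not the registry's regulator: the `limiter` type and `newLimiter` constructor are hypothetical names, and it assumes the fix is for exit to signal on every call rather than only when no slots were available.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a simplified, hypothetical stand-in for the regulator:
// it allows at most `available` concurrent holders.
type limiter struct {
	*sync.Cond
	available uint64
}

func newLimiter(limit uint64) *limiter {
	return &limiter{Cond: sync.NewCond(&sync.Mutex{}), available: limit}
}

// enter blocks until a slot is free, then takes it.
func (l *limiter) enter() {
	l.L.Lock()
	for l.available == 0 {
		l.Wait() // releases the lock while waiting, reacquires on wake-up
	}
	l.available--
	l.L.Unlock()
}

// exit returns a slot and always signals. If it signalled only when
// available was 0, a second exit that wins the lock before the woken
// waiter would see available == 1 and skip the signal, leaving another
// waiter blocked even though a slot is free.
func (l *limiter) exit() {
	l.L.Lock()
	l.Signal()
	l.available++
	l.L.Unlock()
}

func main() {
	l := newLimiter(2)
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.enter()
			defer l.exit()
		}()
	}
	wg.Wait()
	fmt.Println("slots available after all goroutines finished:", l.available)
}
```

Signalling with no waiter present is a no-op, so the unconditional call costs little while removing the window in which a waiter can be left asleep.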

fix error

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>

check sign

Signed-off-by: Masataka Mizukoshi <m.mizukoshi.wakuwaku@gmail.com>
@m-masataka m-masataka force-pushed the garbagecollection-remove-layers branch from 815a0bc to 9016620 Compare July 20, 2017 14:18
@AndreaGiardini

Can this PR go forward now?

@@ -38,11 +38,7 @@ func (r *regulator) enter() {

func (r *regulator) exit() {
r.L.Lock()
// We only need to signal to a waiting FS operation if we're already at the
Collaborator

Are you sure this can be removed?

"time"
)

func TestRegulatorEnterExit(t *testing.T) {
Collaborator

This seems unrelated. What is this testing?

@@ -367,6 +369,12 @@ type layerLinkPathSpec struct {

func (layerLinkPathSpec) pathSpec() {}

type layerDirectoryPathSpec struct {
Collaborator

Please document this type. Notice the other comments around here and please try to follow that style.

@@ -289,6 +289,8 @@ func (repo *repository) Blobs(ctx context.Context) distribution.BlobStore {
statter = repo.registry.blobDescriptorServiceFactory.BlobAccessController(statter)
}

LayerDirectoryPathSpec := layerDirectoryPathSpec{name: repo.name.Name()}
Collaborator

Please choose a better variable name.

@@ -300,6 +302,7 @@ func (repo *repository) Blobs(ctx context.Context) distribution.BlobStore {
// TODO(stevvooe): linkPath limits this blob store to only layers.
// This instance cannot be used for manifest checks.
linkPathFns: []linkPathFunc{blobLinkPath},
linkDirectoryPathSpec: LayerDirectoryPathSpec,
Collaborator

Was this unset in all cases? Is this the actual fix here?

@stevvooe
Collaborator

@AndreaGiardini Looking at it closely. The biggest blocker is testing. Have you tried out these changes?

@dmcgowan dmcgowan added the area/gc Related to garbage collection label Dec 17, 2019
@wy65701436
Collaborator

Hi @m-masataka, is it possible to move this PR forward? It would be great if we could merge it before v2.8.0.
Another question: can this PR remove the orphaned layer links?

Base automatically changed from master to main January 27, 2021 15:50
@milosgajdos
Member

milosgajdos commented Mar 18, 2021

@m-masataka can you please rebase? I'd like to pick this up and properly look into this.

@jianzhangbjz

/cc

@zhangguanzhang

any update?

@milosgajdos
Member

This needs a rebase.

@microyahoo
Contributor

hi @m-masataka, any updates?

nawfg214 referenced this pull request Apr 30, 2024
Signed-off-by: David Karlsson <35727626+dvdksn@users.noreply.github.com>
@microyahoo
Contributor

hi @milosgajdos @corhere, @m-masataka, I've noticed that this PR has been inactive for a while. If you don't mind, I can take over this PR. However, if the original author wishes to continue, I am also willing to step back. Thanks. :)

@milosgajdos
Member

@microyahoo go for it!

microyahoo added a commit to microyahoo/distribution that referenced this pull request May 8, 2024
The garbage-collect should remove unsed layer link file

P.S. This was originally contributed by @m-masataka, now I would like to take over it.
Thanks @m-masataka efforts with PR distribution#2288

Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
microyahoo added a commit to microyahoo/distribution that referenced this pull request May 8, 2024
The garbage-collect should remove unsed layer link file

P.S. This was originally contributed by @m-masataka, now I would like to take over it.
Thanks @m-masataka efforts with PR distribution#2288

Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
microyahoo added a commit to microyahoo/distribution that referenced this pull request Jul 2, 2024
The garbage-collect should remove unsed layer link file

P.S. This was originally contributed by @m-masataka, now I would like to take over it.
Thanks @m-masataka efforts with PR distribution#2288

Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
@milosgajdos
Member

Closing in favour of #4344

@milosgajdos milosgajdos closed this Jul 2, 2024