Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete S3 keys incrementally in batches #3635

Merged
merged 1 commit into from Nov 4, 2022
Merged

Delete S3 keys incrementally in batches #3635

merged 1 commit into from Nov 4, 2022

Conversation

milosgajdos
Copy link
Member

Instead of first collecting all S3 keys first and then batch deleting them, we will do incremental delete online per max allowed batch. Doing this prevents frequent Go heap allocations for large S3 keyspaces.

NOTE: we also make sure we Skip S3 tests for List unless explicitly triggered.

@codecov-commenter
Copy link

codecov-commenter commented May 27, 2022

Codecov Report

Merging #3635 (e3d3e15) into main (5fe6934) will increase coverage by 0.99%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #3635      +/-   ##
==========================================
+ Coverage   56.33%   57.33%   +0.99%     
==========================================
  Files         101      105       +4     
  Lines        7313    10847    +3534     
==========================================
+ Hits         4120     6219    +2099     
- Misses       2536     3944    +1408     
- Partials      657      684      +27     
Impacted Files Coverage Δ
registry/storage/driver/storagedriver.go 0.00% <0.00%> (ø)
...ribution/distribution/registry/handlers/helpers.go
...m/distribution/distribution/reference/reference.go
...distribution/distribution/registry/handlers/app.go
...ion/distribution/registry/storage/manifeststore.go
...tribution/distribution/registry/proxy/proxyauth.go
...bution/distribution/registry/api/errcode/errors.go
...m/distribution/distribution/reference/normalize.go
...com/distribution/distribution/registry/registry.go
...istribution/registry/storage/ocimanifesthandler.go
... and 197 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Quiet: aws.Bool(false),
},
})
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this API, you need to go through the response and see if there are any errors. The status code can still be 200 is some files fail. We've done a similar implementation, so there's an example: https://gitlab.com/gitlab-org/container-registry/-/blob/c9c2b20eb45250a0a5336038fdd053423490a502/registry/storage/driver/s3-aws/s3.go#L1046

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is, that resp.Errors will contain errors ONLY if DeleteObjects call fails with error so you can inspect which of the keys have caused the deletion failure. In other words, in any key deletion failure, DeleteObjects will always return error and set resp.Errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible our comments are wrong, but I'm not really sure 🤔

@joaodrp you worked on this more closely, do you remember the specifics?

Copy link
Collaborator

@Jamstah Jamstah Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sample in the AWS API doc is this:

            HTTP/1.1 200 OK
            x-amz-id-2: 5h4FxSNCUS7wP5z92eGCWDshNpMnRuXvETa4HH3LvvH6VAIr0jU7tH9kM7X+njXx
            x-amz-request-id: A437B3B641629AEE
            Date: Fri, 02 Dec 2011 01:53:42 GMT
            Content-Type: application/xml
            Server: AmazonS3
            Content-Length: 251

            <?xml version="1.0" encoding="UTF-8"?>
            <DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
             <Deleted>
               <Key>sample1.txt</Key>
             </Deleted>
             <Error>
              <Key>sample2.txt</Key>
              <Code>AccessDenied</Code>
              <Message>Access Denied</Message>
             </Error>
            </DeleteResult>

And the go code to invoke the delete is this, which is fairly generic and i'm not sure it would parse for DeleteErrors:
https://github.com/aws/aws-sdk-go-v2/blob/main/service/s3/api_op_DeleteObjects.go#L66

To me, it looks like there doesn't have to be an err object even if there are DeleteErrors.

It could be something to do with adding the middleware, but I didn't think so having a quick look.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milosgajdos I had a go at this one here: Jamstah@60bec85

Testing by remove the delete permission from the user, the code in this PR currently misses the error, and the code in my branch does not, so I think it is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I did make the same test a few weeks ago, but somehow forgot to update the comment -- actually not quite the same test but similar. I shall update this PR soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just pushed this e3d3e15 which makes me rather sad about AWS SDK still (I thought I had reached the bottom, but there are clearly new depths). @deleteriousEffect can you please have another look? 🙇

listObjectsInput := &s3.ListObjectsV2Input{
Bucket: aws.String(d.Bucket),
Prefix: aws.String(s3Path),
}
ListLoop:

for {
// list all the objects
resp, err := d.S3.ListObjectsV2(listObjectsInput)

// resp.Contents can only be empty on the first call
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the resp.IsTruncated validating at here?

Copy link
Collaborator

@wy65701436 wy65701436 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, I'm ok with the change.

Instead of first collecting all keys and then batch deleting them,
we will do the incremental delete _online_ per max allowed batch.
Doing this prevents frequent allocations for large S3 keyspaces
and OOM-kills that might happen as a result of those.

This commit introduces storagedriver.Errors type that allows to return
multierrors as a single error from any storage driver implementation.

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@deleteriousEffect deleteriousEffect merged commit e3509fc into distribution:main Nov 4, 2022
@milosgajdos milosgajdos deleted the make-s3-driver-delete-faster branch November 4, 2022 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants