New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete S3 keys incrementally in batches #3635
Delete S3 keys incrementally in batches #3635
Conversation
Quiet: aws.Bool(false), | ||
}, | ||
}) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this API, you need to go through the response and see if there are any errors. The status code can still be 200
is some files fail. We've done a similar implementation, so there's an example: https://gitlab.com/gitlab-org/container-registry/-/blob/c9c2b20eb45250a0a5336038fdd053423490a502/registry/storage/driver/s3-aws/s3.go#L1046
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is, that resp.Errors
will contain errors ONLY if DeleteObjects
call fails with error
so you can inspect which of the keys have caused the deletion failure. In other words, in any key deletion failure, DeleteObjects
will always return error
and set resp.Errors
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible our comments are wrong, but I'm not really sure 🤔
@joaodrp you worked on this more closely, do you remember the specifics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample in the AWS API doc is this:
HTTP/1.1 200 OK
x-amz-id-2: 5h4FxSNCUS7wP5z92eGCWDshNpMnRuXvETa4HH3LvvH6VAIr0jU7tH9kM7X+njXx
x-amz-request-id: A437B3B641629AEE
Date: Fri, 02 Dec 2011 01:53:42 GMT
Content-Type: application/xml
Server: AmazonS3
Content-Length: 251
<?xml version="1.0" encoding="UTF-8"?>
<DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Deleted>
<Key>sample1.txt</Key>
</Deleted>
<Error>
<Key>sample2.txt</Key>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
</Error>
</DeleteResult>
And the go code to invoke the delete is this, which is fairly generic and i'm not sure it would parse for DeleteErrors:
https://github.com/aws/aws-sdk-go-v2/blob/main/service/s3/api_op_DeleteObjects.go#L66
To me, it looks like there doesn't have to be an err object even if there are DeleteErrors.
It could be something to do with adding the middleware, but I didn't think so having a quick look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milosgajdos I had a go at this one here: Jamstah@60bec85
Testing by remove the delete permission from the user, the code in this PR currently misses the error, and the code in my branch does not, so I think it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I did make the same test a few weeks ago, but somehow forgot to update the comment -- actually not quite the same test but similar. I shall update this PR soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just pushed this e3d3e15 which makes me rather sad about AWS SDK still (I thought I had reached the bottom, but there are clearly new depths). @deleteriousEffect can you please have another look? 🙇
listObjectsInput := &s3.ListObjectsV2Input{ | ||
Bucket: aws.String(d.Bucket), | ||
Prefix: aws.String(s3Path), | ||
} | ||
ListLoop: | ||
|
||
for { | ||
// list all the objects | ||
resp, err := d.S3.ListObjectsV2(listObjectsInput) | ||
|
||
// resp.Contents can only be empty on the first call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the resp.IsTruncated validating at here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, I'm ok with the change.
Instead of first collecting all keys and then batch deleting them, we will do the incremental delete _online_ per max allowed batch. Doing this prevents frequent allocations for large S3 keyspaces and OOM-kills that might happen as a result of those. This commit introduces storagedriver.Errors type that allows to return multierrors as a single error from any storage driver implementation. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Instead of first collecting all S3 keys first and then batch deleting them, we will do incremental delete online per max allowed batch. Doing this prevents frequent
Go
heap allocations for large S3 keyspaces.NOTE: we also make sure we Skip
S3
tests forList
unless explicitly triggered.