Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object_store: Conditional put and rename_if_not_exist on S3 #6285

Closed
wjones127 opened this issue Aug 21, 2024 · 9 comments · Fixed by #6682
Closed

object_store: Conditional put and rename_if_not_exist on S3 #6285

wjones127 opened this issue Aug 21, 2024 · 9 comments · Fixed by #6682
Labels
enhancement Any new improvement worthy of a entry in the changelog object-store Object Store Interface

Comments

@wjones127
Copy link
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This is now supported on S3: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/

Describe the solution you'd like

We should make sure this works out-of-the-box. Also need to maintain compatibility with Minio, Localstack, and CloudFlare R2.

Describe alternatives you've considered

Additional context

@wjones127 wjones127 added enhancement Any new improvement worthy of a entry in the changelog object-store Object Store Interface labels Aug 21, 2024
@bentsku
Copy link

bentsku commented Aug 22, 2024

We are implementing it on LocalStack, should be available in latest by tomorrow: localstack/localstack#11402

From what I've seen, it's less powerful than R2, as you cannot specify any value other than * in If-None-Match (or it raises an exception), so you might need different code path between S3 and R2, for example. Also, they didn't implement it for CopyObject apparently, but it is also implemented for CompleteMultipartUpload. The logic is somewhat a bit complicated for multiparts, especially the "delete after initiation", and you'll often need to recreate a complete multipart upload.

I was already looking for the PR/feature request here when I read about that feature 😄

@rtyler
Copy link
Contributor

rtyler commented Sep 9, 2024

@wjones127 👋 if you find yourself too busy with other things, I could probably take a crack at this 😄

@wjones127 wjones127 removed their assignment Sep 9, 2024
@wjones127
Copy link
Member Author

I haven't finished this.

But from what's I've looked at, it's already implemented. It's just a matter of changing the default behavior / configuration and documenting and testing that.

@fhilgers
Copy link

fhilgers commented Sep 13, 2024

How do you plan on implementing ObjectStore::copy_if_not_exists. The new conditional features from aws only allow checking etag on the source for copy operations. x-amz-copy-source-if-match and x-amz-copy-source-if-match only check the source, not the destination as far as i understand (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html). Is there a workaround through the conditonal Put operations?

Minio already supported these headers prior to AWS, but has anyone tried to use Minio for the conditional operations without dynamodb?

@fhilgers
Copy link

fhilgers commented Sep 13, 2024

One idea maybe would be to use the conditional Put with the if-none-match header to create a dummy file and if that succeeds, copy the file. However this is not atomic and in the case where a client willingly overwrote the dummy file prior to the copy completing, their file would be overwritten by the copy?

Is this that bad? In the cases where clients use conditonal operations everything is safeguarded and in the cases where clients already just overwrite a file it does not change anything because there is no guarantee that another client will not overwrite anything. In terms of clients making sure that there is no overwriting happening, conditional put on a client and then force copying should semantically be the same as conditional copy.

Deleting and other operations would be a problem though. What happens if the dummy file is deleted prior to copying? Get with have empty files, etc.

Clients need to then ignore markers on list, delete and get and only honor them for put and copy.

@fhilgers
Copy link

After playing around a bit, I found out that this can be worked around just by using multipart copy:

  1. Initiate Multipart Upload
  2. Upload Part Copy
  3. Supply if-none-match to Complete Multipart Upload

Using the header just for CopyObject results in an NotImplemented error.

benesch added a commit to benesch/arrow-rs that referenced this issue Nov 5, 2024
Add support for `PutMode::Create` and `copy_if_not_exists` on native AWS
S3, which uses the underlying conditional write primitive that Amazon
launched earlier this year [0].

The conditional write primitive is simpler than what's available in
other S3-like products (e.g., R2), so new modes for
`s3_copy_if_not_exists` and `s3_conditional_put` are added to select the
native S3-specific behavior.

To maintain strict backwards compatibility (e.g. with older versions of
LocalStack), the new behavior is not on by default. It must be
explicitly requested by the end user.

The implementation for `PutMode::Create` is straightforward. The
implementation of `copy_if_not_exists` is a bit more involved, as it
requires managing a multipart upload that uses the UploadPartCopy
operation, which was not previously supported by this crate's S3 client.

To ensure test coverage, the object store workflow now runs the AWS
integration tests with conditional put both disabled and enabled.

Fix apache#6285.

[0]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
@benesch
Copy link
Contributor

benesch commented Nov 5, 2024

I put up #6682, which implements both PutMode::Create and copy_if_not_exists for native Amazon S3. I'm unfamiliar with the maintainership for object_store so if someone can tag in the right folks to review, I'd be much obliged!

After playing around a bit, I found out that this can be worked around just by using multipart copy:

  1. Initiate Multipart Upload
  2. Upload Part Copy
  3. Supply if-none-match to Complete Multipart Upload

@fhilgers thanks for the pointer here. Indeed that seems to work as expected.

benesch added a commit to benesch/arrow-rs that referenced this issue Nov 6, 2024
Add support for `PutMode::Create` and `copy_if_not_exists` on native AWS
S3, which uses the underlying conditional write primitive that Amazon
launched earlier this year [0].

The conditional write primitive is simpler than what's available in
other S3-like products (e.g., R2), so new modes for
`s3_copy_if_not_exists` and `s3_conditional_put` are added to select the
native S3-specific behavior.

To maintain strict backwards compatibility (e.g. with older versions of
LocalStack), the new behavior is not on by default. It must be
explicitly requested by the end user.

The implementation for `PutMode::Create` is straightforward. The
implementation of `copy_if_not_exists` is a bit more involved, as it
requires managing a multipart upload that uses the UploadPartCopy
operation, which was not previously supported by this crate's S3 client.

To ensure test coverage, the object store workflow now runs the AWS
integration tests with conditional put both disabled and enabled.

Fix apache#6285.

[0]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
@tustvold
Copy link
Contributor

FYI the support has been broadened to not just wildcards

#6799

@benesch
Copy link
Contributor

benesch commented Nov 26, 2024

PRs are up! #6799 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog object-store Object Store Interface
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants