Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support native S3 conditional writes #6682

Merged

Conversation

benesch
Copy link
Contributor

@benesch benesch commented Nov 5, 2024

Add support for PutMode::Create and copy_if_not_exists on native AWS S3, which uses the underlying conditional write primitive that Amazon launched earlier this year 0.

The conditional write primitive is simpler than what's available in other S3-like products (e.g., R2), so new modes for s3_copy_if_not_exists and s3_conditional_put are added to select the native S3-specific behavior.

To maintain strict backwards compatibility (e.g. with older versions of LocalStack), the new behavior is not on by default. It must be explicitly requested by the end user.

The implementation for PutMode::Create is straightforward. The implementation of copy_if_not_exists is a bit more involved, as it requires managing a multipart upload that uses the UploadPartCopy operation, which was not previously supported by this crate's S3 client.

To ensure test coverage, the object store workflow now runs the AWS integration tests with conditional put both disabled and enabled.

Which issue does this PR close?

Fix #6285.

@github-actions github-actions bot added the object-store Object Store Interface label Nov 5, 2024
@benesch
Copy link
Contributor Author

benesch commented Nov 5, 2024

@tustvold are you the right person to review this? I saw you implemented quite a bit of the previous conditional put/get support.

@benesch
Copy link
Contributor Author

benesch commented Nov 5, 2024

To maintain strict backwards compatibility (e.g. with older versions of LocalStack), the new behavior is not on by default. It must be explicitly requested by the end user.

This isn't ideal, but it seemed best for the short term to avoid breaking backcompat for anyone who might be using a version of S3 that doesn't support CAS. In the long term, I think the ideal would definitely be to have the native S3 CAS support enabled by default (i.e. unless overridden explicitly by the user).

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks good to me, mostly just some minor nits.

I agree with the decision to leave this off by default for a couple of reasons:

  • S3 compatible stores support these mechanisms with fewer drawbacks (e.g. full conditional update, native copy_if_not_exists, etc...) and users should prefer these
  • AWS may eventually bring S3 into feature parity with literally every other object store, at which point this will provide a coherent API to converge on

@@ -651,6 +651,12 @@ pub async fn put_opts(storage: &dyn ObjectStore, supports_update: bool) {
assert_eq!(b.as_ref(), b"a");

if !supports_update {
let err = storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// Encoded as `etag-create-only` ignoring whitespace.
///
/// [announcement]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
ETagCreateOnly,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ETagCreateOnly,
ETagPutIfNotExists,

Maybe? I don't feel strongly though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, works for me.

Comment on lines 53 to 54
/// WARNING: When using this mode, `copy_if_not_exists` does not copy
/// tags or attributes from the source object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I had a brief scan and couldn't see a way around this

}
}
}

pub(crate) enum PutPartPayload<'a> {
Inline(PutPayload),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Inline(PutPayload),
Part(PutPayload),

Maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no strong feels from me on the name here!

PutPartPayload::Inline(payload) => request.with_payload(payload),
PutPartPayload::Copy(path) => request.header(
"x-amz-copy-source",
&format!("{}/{}", self.config.bucket, path),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use encode_path here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, seems like it, thanks!

object_store/src/aws/client.rs Show resolved Hide resolved
)
.await
{
Err(e @ Error::Precondition { .. }) => Err(Error::AlreadyExists {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to abort the multipart upload in this case? Does this happen automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, good call. It does not happen automatically. I added code to do a best effort multipart upload, and a docs note that this isn't guaranteed and that the user should configure automatic multipart expiration via a lifecycle rule.

///
/// WARNING: When using this mode, `copy_if_not_exists` does not copy
/// tags or attributes from the source object.
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could possibly also link to the module docs around lifecycle rules for cleaning up incomplete multipart uploads - https://docs.rs/object_store/latest/object_store/aws/index.html#multipart-uploads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done!

CompleteMultipartRequest { source: crate::client::retry::Error },
CompleteMultipartRequest {
source: crate::client::retry::Error,
path: String,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the path also be in the displayed error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done.

Add support for `PutMode::Create` and `copy_if_not_exists` on native AWS
S3, which uses the underlying conditional write primitive that Amazon
launched earlier this year [0].

The conditional write primitive is simpler than what's available in
other S3-like products (e.g., R2), so new modes for
`s3_copy_if_not_exists` and `s3_conditional_put` are added to select the
native S3-specific behavior.

To maintain strict backwards compatibility (e.g. with older versions of
LocalStack), the new behavior is not on by default. It must be
explicitly requested by the end user.

The implementation for `PutMode::Create` is straightforward. The
implementation of `copy_if_not_exists` is a bit more involved, as it
requires managing a multipart upload that uses the UploadPartCopy
operation, which was not previously supported by this crate's S3 client.

To ensure test coverage, the object store workflow now runs the AWS
integration tests with conditional put both disabled and enabled.

Fix apache#6285.

[0]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
@benesch benesch force-pushed the object-store-preconditions-native-s3 branch 2 times, most recently from dc2a634 to 075c524 Compare November 6, 2024 02:55
@benesch benesch force-pushed the object-store-preconditions-native-s3 branch from 075c524 to 9cf76bc Compare November 6, 2024 02:56
Copy link
Contributor Author

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the quick review, @tustvold! Addressed all your feedback, I believe.

CompleteMultipartRequest { source: crate::client::retry::Error },
CompleteMultipartRequest {
source: crate::client::retry::Error,
path: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done.

}
}
}

pub(crate) enum PutPartPayload<'a> {
Inline(PutPayload),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no strong feels from me on the name here!

PutPartPayload::Inline(payload) => request.with_payload(payload),
PutPartPayload::Copy(path) => request.header(
"x-amz-copy-source",
&format!("{}/{}", self.config.bucket, path),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, seems like it, thanks!

object_store/src/aws/client.rs Show resolved Hide resolved
///
/// WARNING: When using this mode, `copy_if_not_exists` does not copy
/// tags or attributes from the source object.
///
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done!

/// Encoded as `etag-create-only` ignoring whitespace.
///
/// [announcement]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
ETagCreateOnly,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, works for me.

)
.await
{
Err(e @ Error::Precondition { .. }) => Err(Error::AlreadyExists {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, good call. It does not happen automatically. I added code to do a best effort multipart upload, and a docs note that this isn't guaranteed and that the user should configure automatic multipart expiration via a lifecycle rule.

To a version that supports conditional writes.
@benesch
Copy link
Contributor Author

benesch commented Nov 8, 2024

@tustvold if you can give me another re-run here I think we’ll be all green on CI. I fixed a minor clippy failure and upgraded to a version of localstack that supports conditional put.

@benesch
Copy link
Contributor Author

benesch commented Nov 8, 2024

Whew, all tests here are green! @tustvold let me know if there's anything else here you'd like to see, but from my perspective this is ready to (squash) merge.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small nit, then I think we're good to go. Thank you for this

let part_id = self
.client
.put_part(to, &upload_id, 0, PutPartPayload::Copy(from))
.await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error should also trigger cleanup, might be worth encapsulating the steps into a function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, doesn't it? If anything in this async block (L306-329) returns an error, the cleanup on L336 should fire.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed that this is wrapped in an async block, cunning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that async blocks can function like try blocks is one of my favorite Rust tricks!

@tustvold tustvold merged commit c36ff79 into apache:master Nov 8, 2024
33 checks passed
@criccomini
Copy link
Contributor

Amazing. Any idea when this might go out 🔥

@tustvold
Copy link
Contributor

tustvold commented Nov 8, 2024

#6596 tracks the next release

@benesch
Copy link
Contributor Author

benesch commented Nov 9, 2024

Thanks very much for the review and merge, @tustvold!

/// Encoded as `etag-put-if-not-exists` ignoring whitespace.
///
/// [announcement]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
ETagPutIfNotExists,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3 now supports the full header so I wonder if we need this anymore?

#6799

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! I put up #6802 to remove it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

object_store: Conditional put and rename_if_not_exist on S3
4 participants