[S3 Client] Skip upload when uploaded object will overwrite identical object #243

Open
psantus opened this issue Dec 4, 2020 · 9 comments
Assignees
Labels
feature-request New feature or request s3 service-api This issue pertains to the AWS API

Comments

@psantus

psantus commented Dec 4, 2020

Since PUT requests are more expensive than GET requests, it would be great to check, before uploading an object, that a fully identical object doesn't already exist.

Describe the Feature

For some workloads to be repeatable/idempotent, we sometimes need to be able to attempt an object upload without the pain of first checking whether the object already exists.

Is your Feature Request related to a problem?

Currently, API users who want to do that

  • need to write their own implementation
  • need to either GET the object (which may incur transfer charges) or LIST the bucket's objects (same price as PUT),
    so it ends up being quite an expensive operation.

Proposed Solution

  • Would require AWS to implement a get-object-metadata / get-object-summary service in the S3 API (with veeery cheap pricing).
  • Then AmazonS3Client.putObject implementations should compare object properties (including the MD5 ETag) and, if they are identical, return early without attempting the PUT API request.
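
The ETag comparison proposed above could look roughly like the sketch below. It uses only the JDK; `ETagCheck`, `md5Hex` and `matchesETag` are hypothetical names, not part of any AWS SDK. It assumes the stored ETag is the plain MD5 of the object, which holds for single-part uploads without SSE-KMS or SSE-C encryption but not for multipart uploads (whose ETags carry a `-N` suffix), so those are treated as "not comparable".

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not an AWS SDK class.
final class ETagCheck {

    /** Returns the lowercase hex MD5 digest of the payload. */
    static String md5Hex(byte[] payload) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(payload);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Returns true only when the ETag is a plain MD5 that equals the
     * payload's MD5. ETags are usually returned wrapped in double quotes.
     */
    static boolean matchesETag(byte[] payload, String eTag) {
        if (eTag == null) {
            return false;
        }
        String stripped = eTag.replace("\"", "");
        if (stripped.contains("-")) {
            return false; // multipart-style ETag: not a plain MD5
        }
        return stripped.equalsIgnoreCase(md5Hex(payload));
    }
}
```

A caller would skip the upload only when `matchesETag` returns true; any `false` result, including the multipart case, falls back to uploading.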

Describe alternatives you've considered

None :)

Additional Context

We use S3 to archive request/response with external 3rd parties for audit purposes

Your Environment

  • AWS Java SDK version used:
  • JDK version used: 2.15.33
  • Operating System and version: irrelevant
@debora-ito debora-ito transferred this issue from aws/aws-sdk-java Dec 4, 2020
@debora-ito
Member

The HeadObject operation is what you're looking for: it retrieves metadata from an object without returning the object itself.

@debora-ito debora-ito added feature-request New feature or request response-requested This issue requires a response to continue labels Dec 4, 2020
@psantus
Author

psantus commented Dec 6, 2020

Thank you! Then I guess my feature request is for the putObject action to have an option that, when enabled, makes the client call HeadObject first and call the PutObject API only if the object to upload differs from the object currently stored in S3.
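
As an application-side illustration of that "HeadObject first, PutObject only if different" flow, here is a minimal sketch. Everything in it is hypothetical: `ObjectStore` is a stand-in interface that models the two S3 operations, `InMemoryStore` is a fake used only to make the example runnable, and `putIfChanged` is not an SDK method. It also assumes the ETag is the quoted MD5 of the body, which is not true for multipart or some encrypted objects.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical minimal view of an object store; not an AWS SDK type. */
interface ObjectStore {
    Optional<String> headETag(String key); // models HeadObject
    void put(String key, byte[] body);     // models PutObject
}

final class Md5 {
    static String hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** In-memory fake so the sketch runs without any network access. */
final class InMemoryStore implements ObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    int putCalls = 0; // counts simulated PutObject requests

    @Override
    public Optional<String> headETag(String key) {
        byte[] body = objects.get(key);
        return body == null
                ? Optional.empty()
                : Optional.of("\"" + Md5.hex(body) + "\"");
    }

    @Override
    public void put(String key, byte[] body) {
        putCalls++;
        objects.put(key, body.clone());
    }
}

final class SkippingUploader {
    /** Uploads only when the stored ETag differs; returns true if a PUT was made. */
    static boolean putIfChanged(ObjectStore store, String key, byte[] body) {
        String localETag = "\"" + Md5.hex(body) + "\"";
        Optional<String> remoteETag = store.headETag(key);
        if (remoteETag.isPresent() && remoteETag.get().equals(localETag)) {
            return false; // identical object already stored: skip the PUT
        }
        store.put(key, body);
        return true;
    }
}
```

Note that the check and the upload are two separate requests, so this is not atomic: another writer could change the object between the HEAD and the PUT.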

@github-actions github-actions bot removed the response-requested This issue requires a response to continue label Dec 6, 2020
@debora-ito
Member

@psantus that would be a feature request for S3; the SDK does not customize API operations. I can forward your request to the S3 team if you want.

If you're trying to avoid accidental overwrites, take a look at Object Versioning: every putObject call will create a new version.

@psantus
Author

psantus commented Dec 9, 2020

Hi @debora-ito,

No, I meant for the S3 client to first check the existing file's ETag before trying to upload. I'm not trying to avoid accidental overwrites (otherwise I'd indeed have used Object Versioning) but rather to avoid the charge for PutObject API calls when they are unnecessary.

@debora-ito
Member

Yes, that is what I meant by customizing API operations :) SDK clients are automatically generated; they basically call the service API operations as they are, with the same options available for each operation. The clients don't have additional logic or validation on the SDK side. So if you need a new option on S3 PutObject, it needs to be implemented by the service before it can be available in the SDK.

@psantus
Author

psantus commented Dec 15, 2020

That I wasn't aware of ;)

@debora-ito debora-ito added the service-api This issue pertains to the AWS API label Jun 17, 2021
@debora-ito
Member

For visibility to other SDKs, I'm moving this to the aws-sdk repository.

@debora-ito debora-ito transferred this issue from aws/aws-sdk-java-v2 May 17, 2022
@debora-ito debora-ito self-assigned this May 17, 2022
@debora-ito debora-ito added the s3 label May 17, 2022
@steve-o

steve-o commented Apr 9, 2024

The main issue here is that the AWS S3 service does not support conditional requests per RFC 7232 for PUT or POST requests, only for GET. One should be able to PUT with an If-None-Match header carrying the MD5 digest of the object to prevent the re-upload. This may need an Expect: 100-continue header to defer the body transfer; otherwise the payload is still sent, just ignored.
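
To make the shape of such a request concrete, here is a small sketch that only builds (and never sends) a conditional PUT with the JDK's `java.net.http` API. The class name `ConditionalPutSketch`, the `build` helper, and any URL passed to it are illustrative assumptions; the sketch says nothing about whether a given server honors the headers, and per the comment above S3 did not for PUT at the time of writing.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: constructs the request described above without sending it.
final class ConditionalPutSketch {

    /**
     * Builds a PUT that asks the server to store the body only if no current
     * representation has the given entity tag (RFC 7232, section 3.2), and to
     * confirm with "100 Continue" before the client transmits the body.
     */
    static HttpRequest build(URI target, byte[] body, String quotedETag) {
        return HttpRequest.newBuilder(target)
                .header("If-None-Match", quotedETag)
                .expectContinue(true) // adds "Expect: 100-continue" when sent
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

Under RFC 7232, a server that supports this and already holds a representation with that entity tag would answer 412 Precondition Failed instead of accepting the upload.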

https://datatracker.ietf.org/doc/html/rfc7232#section-3.2

@kellertk
Contributor

P126357928
