
S3 upload stream with http throws stream mark and reset error #1748

Closed
thisarattr opened this issue Sep 4, 2018 · 10 comments
Labels
feature-request A feature should be added or improved.

Comments

@thisarattr

thisarattr commented Sep 4, 2018

I am trying to stream a file straight into S3 rather than upload/buffer it onto our own server and re-upload it to S3.
When I use HTTP, the AWS client tries to calculate the message digest and then fails to reset the stream. I haven't set an explicit read limit, so it defaults to 128 KB, and I'm uploading a stream larger than that.
As per the AWS client code, it sets the mark() at the request read limit, then reads the whole stream (well beyond the mark) and tries to reset() it, which is obviously going to fail and throw the reset error.

Note: when I'm using HTTPS this won't happen, because payload signing is disabled by default, but you will face the same issue over HTTPS if you enable signing.
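
For context, a minimal sketch (not from the original report) of the kind of upload that hits this; the endpoint, bucket, key, contentLength, and requestInputStream are placeholders:

    // Client pointed at a plain-HTTP endpoint, so the signer must hash the payload
    // (com.amazonaws.services.s3.AmazonS3ClientBuilder / AmazonS3 from the v1 SDK).
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                    "http://s3.us-east-1.amazonaws.com", "us-east-1"))
            .build();

    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(contentLength); // placeholder: known size of the incoming stream

    // requestInputStream is larger than the default 128 KB read limit, so the signer's
    // reset() after hashing fails with "Resetting to invalid mark".
    s3.putObject(new PutObjectRequest("my-bucket", "my-key", requestInputStream, metadata));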

AWS4Signer.java

    protected String calculateContentHash(SignableRequest<?> request) {
        InputStream payloadStream = getBinaryRequestPayloadStream(request);
        ReadLimitInfo info = request.getReadLimitInfo();
        payloadStream.mark(info == null ? -1 : info.getReadLimit());
        String contentSha256 = BinaryUtils.toHex(hash(payloadStream));
        try {
            payloadStream.reset();
        } catch (IOException e) {
            throw new SdkClientException(
                    "Unable to reset stream after calculating AWS4 signature",
                    e);
        }
        return contentSha256;
    }

AbstractAWSSigner.java

    protected byte[] hash(InputStream input) throws SdkClientException {
        try {
            MessageDigest md = getMessageDigestInstance();
            @SuppressWarnings("resource")
            DigestInputStream digestInputStream = new SdkDigestInputStream(input, md);
            byte[] buffer = new byte[1024];
            while (digestInputStream.read(buffer) > -1)
                ;
            return digestInputStream.getMessageDigest().digest();
        } catch (Exception e) {
            throw new SdkClientException(
                    "Unable to compute hash while signing request: "
                            + e.getMessage(), e);
        }
    }

Exception thrown:

Caused by: com.amazonaws.SdkClientException: Unable to reset stream after calculating AWS4 signature
at com.amazonaws.auth.AWS4Signer.calculateContentHash(AWS4Signer.java:562)
at com.amazonaws.services.s3.internal.AWSS3V4Signer.calculateContentHash(AWSS3V4Signer.java:118)
at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:233)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1210)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1749)
at com.platform.common.services.S3BinaryUploadService.uploadBinaryToUploadBucket(S3BinaryUploadService.java:61)
... 84 common frames omitted
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.util.LengthCheckInputStream.reset(LengthCheckInputStream.java:126)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.auth.AWS4Signer.calculateContentHash(AWS4Signer.java:560)
... 98 common frames omitted
@varunnvs92
Contributor

This is a known issue and a current limitation of the SDK. There are similar posts with workarounds. Please refer to them and see if they work for you.
#427 (comment)
#474

@thisarattr
Author

thisarattr commented Sep 5, 2018

Thanks a lot for the response. I saw your answer before, but what I am trying to do here is stream a file straight from the user into S3 rather than download/buffer it onto our server. Thus, I don't have the file, so option 1 is out for me.
Yes, I can set the read limit beyond the maximum expected file size, but in that case the aws-sdk will read the whole file into memory to do the signing (and fail with the exception), which is what I want to avoid, because this API expects large binaries that can get close to a GB.
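
(A rough sketch of that read-limit workaround, with placeholder bucket/key/stream names and a placeholder limit; as noted above, the signer may then buffer up to the whole payload in memory:)

    // Sketch only: raise the mark/read limit above the largest object we expect,
    // so reset() after hashing succeeds, at the cost of buffering the payload in memory.
    PutObjectRequest request = new PutObjectRequest("my-bucket", "my-key", inputStream, metadata);
    request.getRequestClientOptions().setReadLimit(1_000_000_000); // placeholder upper bound
    s3.putObject(request);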

By the way, I know this can be worked around by using HTTPS, but I wanted to raise it so it gets solved in the future (or at least stops failing with the mark-and-reset issue).

@dagnir
Contributor

dagnir commented Sep 14, 2018

@thisarattr Unfortunately there's no way around this as the SDK needs to consume the full contents of the stream (which in this case requires buffering the stream to memory) to be able to set the checksum as part of the request signature. The easiest way around this would be to switch to using an HTTPS endpoint if possible.
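
(For reference, a sketch of that suggestion; the region is a placeholder, and the endpoint the builder resolves by default is HTTPS, so the payload hash, and therefore mark()/reset(), is not needed:)

    // Sketch only: with the default HTTPS endpoint the payload is not hashed up front,
    // so the upload stream never has to be rewound.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withRegion("us-east-1") // placeholder region
            .build();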

@dagnir
Contributor

dagnir commented Sep 14, 2018

> By the way, I know this can be worked around by using HTTPS, but I wanted to raise it so it gets solved in the future (or at least stops failing with the mark-and-reset issue).

It sounds like this is a feature request so I'll mark it as such for now, but I'm not sure how we'll be able to avoid this.

@dagnir dagnir added the feature-request A feature should be added or improved. label Sep 14, 2018
@thisarattr
Author

thisarattr commented Sep 17, 2018

@dagnir I agree that when it uses HTTP there is no way to calculate the hash/checksum without buffering in memory. But still, it should not fail by throwing a mark-and-reset exception, right?

Hashing is the client library's responsibility; the API consumer does not need to know about it. It should throw a meaningful error message instead of a mark-and-reset exception, which does not mean much to the consumer without looking at the client library code.

@dagnir
Contributor

dagnir commented Sep 17, 2018

Okay I see; we can certainly throw/log a more descriptive error message.

@steveloughran

Could we actually have a specific subclass of SdkClientException for these retryable signing/hashing problems? The Hadoop S3A client already splits failures into those which may be recoverable (no response, throttle errors, socket timeouts, etc.) and then decides which to retry.
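
(A hypothetical shape for such a type, purely for illustration; nothing like this exists in the v1 SDK, and the name is made up:)

    // Hypothetical only: a dedicated exception type that callers such as the S3A client
    // could recognise when deciding whether a failed signing attempt is safe to retry.
    public class RetryableSigningException extends SdkClientException {
        public RetryableSigningException(String message, Throwable cause) {
            super(message, cause);
        }

        public boolean isRetryable() {
            return true;
        }
    }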

@debora-ito
Member

We are closing stale v1 issues before going into Maintenance Mode.

If this issue is still relevant in v2 please open a new issue in the v2 repo.

Reference:

  • Announcing end-of-support for AWS SDK for Java v1.x effective December 31, 2025 - blog post

debora-ito closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 29, 2024

This issue is now closed.

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@steveloughran

FYI, as HADOOP-19221 shows, the v2 SDK actually makes things worse in terms of S3 upload recoverability.
