
S3 upload stream with http throws stream mark and reset error #1748

Closed
thisarattr opened this issue Sep 4, 2018 · 10 comments
Labels
feature-request A feature should be added or improved.

Comments

@thisarattr

thisarattr commented Sep 4, 2018

I am trying to stream a file straight into S3 rather than upload/buffer it onto our own server and re-upload it to S3.
When I use HTTP, the AWS client tries to calculate the message digest and then fails to reset the stream. I haven't set an explicit read limit, so it defaults to 128 KB, and I'm uploading a stream larger than that.
As per the AWS client code, it sets the mark() at the request read limit, then reads the whole stream (well beyond the mark) and tries to reset() it, which is obviously going to fail and throw the reset error.

Note: when I'm using HTTPS this won't happen, because payload signing is disabled by default, but you will face the same issue over HTTPS if you enable signing.
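
For context, a minimal sketch (not from the original report) of the kind of upload that hits this; the endpoint, bucket, key, contentLength, and requestInputStream are placeholders:

    // Client pointed at a plain-HTTP endpoint, so the signer must hash the payload
    // (com.amazonaws.services.s3.AmazonS3ClientBuilder / AmazonS3 from the v1 SDK).
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                    "http://s3.us-east-1.amazonaws.com", "us-east-1"))
            .build();

    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(contentLength); // placeholder: known size of the incoming stream

    // requestInputStream is larger than the default 128 KB read limit, so the signer's
    // reset() after hashing fails with "Resetting to invalid mark".
    s3.putObject(new PutObjectRequest("my-bucket", "my-key", requestInputStream, metadata));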

AWS4Signer.java

    protected String calculateContentHash(SignableRequest<?> request) {
        InputStream payloadStream = getBinaryRequestPayloadStream(request);
        ReadLimitInfo info = request.getReadLimitInfo();
        payloadStream.mark(info == null ? -1 : info.getReadLimit());
        String contentSha256 = BinaryUtils.toHex(hash(payloadStream));
        try {
            payloadStream.reset();
        } catch (IOException e) {
            throw new SdkClientException(
                    "Unable to reset stream after calculating AWS4 signature",
                    e);
        }
        return contentSha256;
    }

AbstractAWSSigner.java

    protected byte[] hash(InputStream input) throws SdkClientException {
        try {
            MessageDigest md = getMessageDigestInstance();
            @SuppressWarnings("resource")
            DigestInputStream digestInputStream = new SdkDigestInputStream(input, md);
            byte[] buffer = new byte[1024];
            while (digestInputStream.read(buffer) > -1)
                ;
            return digestInputStream.getMessageDigest().digest();
        } catch (Exception e) {
            throw new SdkClientException(
                    "Unable to compute hash while signing request: "
                            + e.getMessage(), e);
        }
    }

Exception thrown:

Caused by: com.amazonaws.SdkClientException: Unable to reset stream after calculating AWS4 signature
at com.amazonaws.auth.AWS4Signer.calculateContentHash(AWS4Signer.java:562)
at com.amazonaws.services.s3.internal.AWSS3V4Signer.calculateContentHash(AWSS3V4Signer.java:118)
at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:233)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1210)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1749)
at com.platform.common.services.S3BinaryUploadService.uploadBinaryToUploadBucket(S3BinaryUploadService.java:61)
... 84 common frames omitted
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.util.LengthCheckInputStream.reset(LengthCheckInputStream.java:126)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.auth.AWS4Signer.calculateContentHash(AWS4Signer.java:560)
... 98 common frames omitted
@varunnvs92
Contributor

This is a known issue and a current limitation of the SDK. There are similar posts with workarounds. Please refer to them and see if they work for you.
#427 (comment)
#474

@thisarattr
Author

thisarattr commented Sep 5, 2018

Thanks a lot for the response. I saw your answer before, but what I am trying to do here is stream a file straight from the user into S3 rather than download/buffer it onto our server. Thus, I don't have the file, so option 1 is out for me.
Yes, I can set the read limit beyond the maximum expected file size, but in that case the aws-sdk will read the whole file into memory to do the signing (and fail with the exception), which is what I want to avoid, because this API expects large binaries that can get close to a GB.
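
(A rough sketch of that read-limit workaround, with placeholder bucket/key/stream names and a placeholder limit; as noted above, the signer may then buffer up to the whole payload in memory:)

    // Sketch only: raise the mark/read limit above the largest object we expect,
    // so reset() after hashing succeeds, at the cost of buffering the payload in memory.
    PutObjectRequest request = new PutObjectRequest("my-bucket", "my-key", inputStream, metadata);
    request.getRequestClientOptions().setReadLimit(1_000_000_000); // placeholder upper bound
    s3.putObject(request);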

By the way, I know this can be worked around by using HTTPS, but I wanted to raise it so it gets solved in the future (or at least stops failing with the mark-and-reset issue).

@dagnir
Contributor

dagnir commented Sep 14, 2018

@thisarattr Unfortunately there's no way around this as the SDK needs to consume the full contents of the stream (which in this case requires buffering the stream to memory) to be able to set the checksum as part of the request signature. The easiest way around this would be to switch to using an HTTPS endpoint if possible.
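
(For reference, a sketch of that suggestion; the region is a placeholder, and the endpoint the builder resolves by default is HTTPS, so the payload hash, and therefore mark()/reset(), is not needed:)

    // Sketch only: with the default HTTPS endpoint the payload is not hashed up front,
    // so the upload stream never has to be rewound.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withRegion("us-east-1") // placeholder region
            .build();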

@dagnir
Contributor

dagnir commented Sep 14, 2018

> By the way, I know this can be worked around by using HTTPS, but I wanted to raise it so it gets solved in the future (or at least stops failing with the mark-and-reset issue).

It sounds like this is a feature request so I'll mark it as such for now, but I'm not sure how we'll be able to avoid this.

@dagnir dagnir added the feature-request A feature should be added or improved. label Sep 14, 2018
@thisarattr
Author

thisarattr commented Sep 17, 2018

@dagnir I agree that when it uses HTTP there is no way to calculate the hash/checksum without buffering in memory. But still, it should not fail by throwing a mark-and-reset exception, right?

Hashing is the client library's responsibility; the API consumer does not need to know about it. It should throw a meaningful error message instead of a mark-and-reset exception, which does not mean much to the consumer without looking at the client library code.

@dagnir
Contributor

dagnir commented Sep 17, 2018

Okay I see; we can certainly throw/log a more descriptive error message.

@steveloughran

Could we actually have a specific subclass of SdkClientException for these retryable signing/hashing problems? The Hadoop S3A client already splits failures into those which may be recoverable (no response, throttle errors, socket timeouts, etc.) and then decides which to retry.
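
(A hypothetical shape for such a type, purely for illustration; nothing like this exists in the v1 SDK, and the name is made up:)

    // Hypothetical only: a dedicated exception type that callers such as the S3A client
    // could recognise when deciding whether a failed signing attempt is safe to retry.
    public class RetryableSigningException extends SdkClientException {
        public RetryableSigningException(String message, Throwable cause) {
            super(message, cause);
        }

        public boolean isRetryable() {
            return true;
        }
    }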

@debora-ito
Member

We are closing stale v1 issues before going into Maintenance Mode.

If this issue is still relevant in v2 please open a new issue in the v2 repo.

Reference:

  • Announcing end-of-support for AWS SDK for Java v1.x effective December 31, 2025 - blog post

debora-ito closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 29, 2024

This issue is now closed.

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@steveloughran

FYI, as HADOOP-19221 shows, the v2 SDK actually makes things worse in terms of S3 upload recoverability.
