
TransferManager multipart upload from a FileInputStream instance fails with ResetException #427

Closed
lolski opened this issue May 9, 2015 · 31 comments

Comments

@lolski

lolski commented May 9, 2015

Multipart upload of a FileInputStream using the following code fails with ResetException: Failed to reset the request input stream. I also tried, with no luck, wrapping the FileInputStream in a BufferedInputStream, which supports marking (confirmed by checking that BufferedInputStream.markSupported() indeed returns true).

object S3TransferExample {
  // in main class
  def main(args: Array[String]): Unit = {
    ...
    val file = new File("/mnt/2gbfile.zip")

    val in = new FileInputStream(file) // new BufferedInputStream(new FileInputStream(file)) --> FYI, using a buffered input stream still results in the same error
    upload("mybucket", "mykey", in, file.length, "application/zip").waitForUploadResult
    ...
  }

  val awsCred = new BasicAWSCredentials("access_key", "secret_key")
  val s3Client = new AmazonS3Client(awsCred)
  val tx = new TransferManager(s3Client)

  def upload(bucketName: String,
             keyName: String,
             inputStream: InputStream,
             contentLength: Long,
             contentType: String,
             serverSideEncryption: Boolean = true,
             storageClass: StorageClass = StorageClass.ReducedRedundancy): Upload = {
    val metaData = new ObjectMetadata
    metaData.setContentType(contentType)
    metaData.setContentLength(contentLength)

    if (serverSideEncryption) {
      metaData.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION)
    }

    val putRequest = new PutObjectRequest(bucketName, keyName, inputStream, metaData)
    putRequest.setStorageClass(storageClass)
    putRequest.getRequestClientOptions.setReadLimit(100000)

    tx.upload(putRequest)
  }
}

Here is the stack trace:

Unable to execute HTTP request: mybucket.s3.amazonaws.com failed to respond
org.apache.http.NoHttpResponseException: mybucket.s3.amazonaws.com failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) ~[httpcore-4.3.2.jar:4.3.2]
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) ~[httpcore-4.3.2.jar:4.3.2]
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) ~[httpcore-4.3.2.jar:4.3.2]
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66) ~[aws-java-sdk-core-1.9.13.jar:na]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.3.2.jar:4.3.2]
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) ~[httpclient-4.3.4.jar:4.3.4]
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:685) [aws-java-sdk-core-1.9.13.jar:na]
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460) [aws-java-sdk-core-1.9.13.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295) [aws-java-sdk-core-1.9.13.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3710) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:2799) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:2784) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:259) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:193) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:125) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:129) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:50) [aws-java-sdk-s3-1.9.13.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
com.amazonaws.ResetException: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:636)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3710)
  at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:2799)
  at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:2784)
  at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:259)
  at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:193)
  at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:125)
  at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:129)
  at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:50)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Resetting to invalid mark
  at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
  at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
  at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:103)
  at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:139)
  at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:103)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:634) 
@hansonchar
Contributor

Hi @lolski,

The failure is caused by the default mark/reset buffer limit of the BufferedInputStream you used to wrap the underlying file input stream. One way to fix this is to simply use:

val in = new FileInputStream(file)

and pass it to the request instead of the buffered input stream. The S3 Java Client is able to handle a FileInputStream without being constrained by any mark-and-reset limit.

However, in this case we recommend a simpler approach: specify the original file directly in the PutObjectRequest instead of an input stream. The S3 Java Client will then figure out the optimal way to handle the file upload, free of any mark-and-reset limit.
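
For illustration, a minimal sketch of that file-based variant (in Java; the bucket/key names are assumed placeholders and s3Client is an existing client):

// Hedged sketch, not the exact fix: pass the File itself so the SDK can
// re-read it from disk on retries instead of buffering the stream in memory.
File file = new File("/mnt/2gbfile.zip");
PutObjectRequest putRequest = new PutObjectRequest("mybucket", "mykey", file)
        .withStorageClass(StorageClass.ReducedRedundancy);
Upload upload = new TransferManager(s3Client).upload(putRequest);
upload.waitForUploadResult();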

For completeness: if you had an input stream that is not associated with a file, it would still NOT be necessary to wrap it in a BufferedInputStream. Given an input stream in the request, the S3 Java Client wraps it automatically as necessary. In that case, however, you would need to set the "read limit" (the maximum buffer size that could be consumed) as suggested in the error message:

com.amazonaws.ResetException: Failed to reset the request input stream;  
If the request involves an input stream, the maximum stream buffer size can be configured
via request.getRequestClientOptions().setReadLimit(int)

Hope this makes sense.

Regards,
Hanson

@lolski
Author

lolski commented May 10, 2015

Hi @hansonchar,

I have confirmed that on SDK version 1.9.33, the exact same error happens even when I use a FileInputStream instead of a BufferedInputStream instance.

However, specifying a File instance works with no problem.

Here's the code excerpt:

object S3TransferExample {
  // in main class
  def main(args: Array[String]): Unit = {
    ...
    val file = new File("/mnt/2gbfile.zip")
    val in = new FileInputStream(file)
    upload("mybucket", "mykey", in, file.length, "application/zip").waitForUploadResult
    ...
  }

  val awsCred = new BasicAWSCredentials("access_key", "secret_key")
  val s3Client = new AmazonS3Client(awsCred)
  val tx = new TransferManager(s3Client)

  def upload(bucketName: String,
             keyName: String,
             inputStream: InputStream,
             contentLength: Long,
             contentType: String,
             serverSideEncryption: Boolean = true,
             storageClass: StorageClass = StorageClass.ReducedRedundancy): Upload = {
    val metaData = new ObjectMetadata
    metaData.setContentType(contentType)
    metaData.setContentLength(contentLength)

    if (serverSideEncryption) {
      metaData.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION)
    }

    val putRequest = new PutObjectRequest(bucketName, keyName, inputStream, metaData)
    putRequest.setStorageClass(storageClass)
    putRequest.getRequestClientOptions.setReadLimit(100000)

    tx.upload(putRequest)
  }
}

@hansonchar
Contributor

Hi @lolski,

If you look at the release notes for 1.9.34, you will see there is a bug fix for exactly this FileInputStream issue. Please give it a try when you get a chance.

(But, of course, specifying a file is the recommended approach.)

Regards,
Hanson

@hansonchar
Contributor

On a side note: if you have a (non-file) input stream with a maximum expected size of 100,000 bytes, the read limit needs to be one extra byte, i.e. 100,001, so that mark and reset will always work for 100,000 bytes or less.
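
A minimal sketch of that rule (in Java; the bucket, key, and inputStream are assumed placeholders):

// Hedged example: assumes the stream's maximum size is known up front.
long maxExpectedSize = 100000L;
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(maxExpectedSize);
PutObjectRequest request = new PutObjectRequest("mybucket", "mykey", inputStream, metadata);
// One byte more than the largest possible stream, so mark/reset always succeeds on retry.
request.getRequestClientOptions().setReadLimit((int) maxExpectedSize + 1);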

Regards,
Hanson

@lolski
Author

lolski commented May 11, 2015

@hansonchar Does that rule apply to a file input stream too? I think it would be good to add this info to the docs: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/RequestClientOptions.html#setReadLimit(int)

@hansonchar
Contributor

Nope. It should just work if you specify a FileInputStream without any additional configuration, assuming 1.9.34+.

Agree on the javadoc.

@lolski
Author

lolski commented May 11, 2015

@hansonchar then I want to confirm that the problem still persists on 1.9.34. I will edit the title of this issue accordingly.

You should be able to reproduce it by allocating a large file, e.g. fallocate -l 10G /mnt/10gbfile, and uploading it with TransferManager using the code above.

Also, the upload sometimes succeeds, especially when the file is not that large, e.g. 1 GB or 2 GB. I've had 1 success out of 4 attempts at uploading a 2 GB file, which suggests the retry logic is the cause.

@lolski lolski changed the title TransferManager multipart upload from a BufferedInputStream(FileInputStream) instance fails with ResetException TransferManager multipart upload from a FileInputStream instance fails with ResetException May 11, 2015
@hansonchar
Contributor

Hi @lolski,

This is because the FileInputStream gets wrapped by TransferManager into a different type of stream for multipart uploads before being passed to the low-level S3 client, so the stream gets treated as if it needed memory buffering. I think the fix should be rather straightforward. Will look into this.

@hansonchar
Contributor

I just tested a fix and got a 10G file uploaded using TransferManager with FileInputStream as the input in the request. Will include the fix in the next release.

@lolski
Author

lolski commented May 13, 2015

@hansonchar thanks

@lyfbitsadmin

@hansonchar Do you know which version of the SDK the fix was released in? I am still seeing this error with AWS Java SDK version 1.10.20.

@juiceblender

@hansonchar I'm also seeing this with AWS Java SDK 1.10.15, particularly for large files (> 60 GB) uploaded from a plain InputStream: the transfer manager seems to wrap the stream into a mark-supported stream which eventually fails with the same error.

@lolski
Author

lolski commented Dec 3, 2015

@hansonchar I agree with the posts above that the error still recurs. I am fortunate that it's feasible for me to simply use the file-based method instead.

@garretthall

I'm having success setting the multi-part size to the buffer size (this way the part can always be reset in case of connection failure):

      val uploader = new TransferManager(...)
      val request = new PutObjectRequest(...)

      // set the buffer size (ReadLimit) equal to the multipart upload size, allowing us to resend data if the connection breaks
      request.getRequestClientOptions.setReadLimit(TEN_MB)
      uploader.getConfiguration.setMultipartUploadThreshold(TEN_MB)

      val upload = uploader.upload(request)

adamhooper added a commit to overview/overview-server that referenced this issue Dec 11, 2015
This should fix sporadic uploading errors, and errors that occurred when
uploading huge files (aws/aws-sdk-java#427).
Funnily enough, we were always using tempfiles for uploads anyway.

[finishes #109975976]
@juiceblender

We managed to work around the problem by implementing our own mark-and-resettable stream that wraps a FileChannel:

public class SeekableByteChannelInputStream extends ChannelInputStream {
    public final SeekableByteChannel ch;
    public long markPos = -1;

    public SeekableByteChannelInputStream(final SeekableByteChannel channel) {
        super(channel);
        this.ch = channel;
    }

    @Override
    public long skip(final long n) throws IOException {
        final long position = Math.max(0, Math.min(ch.size(), ch.position() + n));
        final long skipped = Math.abs(position - ch.position());

        ch.position(position);

        return skipped;
    }


    @Override
    public synchronized void mark(final int readlimit) {
        try {
            markPos = ch.position();

        } catch (IOException e) {
            throw Throwables.propagate(e);
        }
    }

    @Override
    public synchronized void reset() throws IOException {
        if (markPos < 0)
            throw new IOException("Resetting to invalid mark");

        ch.position(markPos);
    }

    @Override
    public boolean markSupported() {
        return true;
    }
}

We passed this stream to the TransferManager (final InputStream s = new SeekableByteChannelInputStream(FileChannel.open(targetFile))) and there have been no problems so far. Hope it's a feasible alternative for anyone else still suffering from the same problem.
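
For reference, a hedged usage sketch of that wrapper (Java; the path, bucket, and key are illustrative, and it assumes the ChannelInputStream base class plus an existing s3Client are available):

Path targetFile = Paths.get("/mnt/2gbfile.zip"); // illustrative file
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(Files.size(targetFile));

try (InputStream in = new SeekableByteChannelInputStream(
        FileChannel.open(targetFile, StandardOpenOption.READ))) {
    PutObjectRequest request = new PutObjectRequest("mybucket", "mykey", in, metadata);
    // The wrapper reports markSupported() == true and seeks the channel on reset(),
    // so the SDK's retry-time reset succeeds regardless of part size.
    new TransferManager(s3Client).upload(request).waitForUploadResult();
}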

@rdifalco

I'm having this issue using TransferManager to transfer an S3Object retrieved from one account into another account. The code looks like this:

    try (InputStream input = new BufferedInputStream(s3Object.getObjectContent())) {
      UploadResult result = archiver
          .upload(bucketName, archivePath, input, uploadObjectMetadata)
          .waitForUploadResult();

Not using the BufferedInputStream in this case seems incredibly slow. Maybe there is a better way to transfer an S3 object from one account to another?

@spieden

spieden commented Apr 5, 2016

@rdifalco Did you try @garretthall's suggestion? I have the same use case as you and it's working perfectly.

One small adjustment: I believe the read limit needs to be set to the part size, not the multipart threshold. You can get this value via TransferManagerUtils.calculateOptimalPartSize, which should be the maximum number of bytes that will be buffered for a given upload.
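
Roughly, that adjustment could look like the sketch below (Java; note that TransferManagerUtils lives in an internal SDK package, so treat this as an assumption rather than a supported API, and the bucket, key, inputStream, and metadata are placeholders):

TransferManager tx = new TransferManager(s3Client);
PutObjectRequest request = new PutObjectRequest("mybucket", "mykey", inputStream, metadata);

// Ask the SDK what part size it will actually use, then size the reset buffer
// so any single part can be replayed after a connection failure.
long partSize = TransferManagerUtils.calculateOptimalPartSize(request, tx.getConfiguration());
request.getRequestClientOptions().setReadLimit((int) (partSize + 1));

Upload upload = tx.upload(request);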

@rdifalco

rdifalco commented Apr 8, 2016

@spieden are you suggesting the following?

    archiver.getConfiguration().setMultipartUploadThreshold(TEN_MB);

And then set the BufferedInputStream buffer size and the client options readLimit to the calculateOptimalPartSize result? Doesn't that create a chicken-and-egg problem, since calculateOptimalPartSize needs the PutObjectRequest, which requires the BufferedInputStream you are trying to size?

@rdifalco

rdifalco commented Apr 8, 2016

Now I'm starting to question the value of this. Is it better to have optimal part sizes, or part sizes I feel comfortable buffering completely? If there is a reset error, I just retry the entire operation myself instead of relying solely on the AWS SDK to retry it for me. What do you think @hansonchar?

@stevematyas

stevematyas commented Nov 4, 2016

What's the official/unofficial fix or implementation approach to avoid the ResetException when using an InputStream (not a FileInputStream or File)?

@kiiadi : What do you or your colleagues recommend?

@mcanlas

mcanlas commented Jan 5, 2017

We've been running into this issue even after putting @garretthall's fix in place. Any ideas?

@varunnvs92
Contributor

varunnvs92 commented Jan 18, 2017

Just a quick summary of the issue and best practices:
Problem summary
When uploading objects to Amazon S3 using streams (either through the S3 client or Transfer Manager), it is possible to run into network connectivity or timeout issues. By default the AWS Java SDK attempts to retry these failed transfers: the input stream is marked before the start of the transfer and reset before retrying. The SDK recommends that customers use resettable streams (streams that support mark and reset). If the stream does not support mark and reset, the SDK throws a ResetException when there is a transient failure and retries are enabled.

Best Practices

  1. The most reliable way to avoid ResetException is to provide data via a File or FileInputStream, which the Java SDK can handle without being constrained by any mark-and-reset limit.
  2. If the stream is not a FileInputStream but supports mark/reset, you can set the mark limit using RequestClientOptions#setReadLimit. The default value is 128 KB. Setting this value to one byte greater than the size of the stream will reliably avoid ResetExceptions. For example, if the maximum expected size of the stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes so that mark and reset will always work for 100,000 bytes or less. Be aware that this might cause some streams to buffer that number of bytes in memory.

@dakshinrajavel

I am not sure what the ideal value for setReadLimit() would be. The data files I'd like to upload to S3 are in the range of 8 GB to 15 GB, and I have set the initial part size to 5 GB. In that case, what would be the ideal read limit? For now, I have set it to 10 MB. I'd like recommendations on the right value for my use case.
Example:

UploadPartRequest uploadRequest = new UploadPartRequest()
        .withBucketName(bucketName)
        .withKey(bucketKey)
        .withUploadId(initResponse.getUploadId())
        .withPartNumber(i)
        .withInputStream(streamUpload)
        .withPartSize(partSize);
int readLimit = 10485760;
uploadRequest.getRequestClientOptions().setReadLimit(readLimit);
partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());

@varunnvs92
Contributor

If you are uploading an object from a stream, the SDK does a single upload and can't upload in parts, so in that case the read limit would be the object size (8-15 GB in your case) + 1. If no content length is specified, the HTTP client might buffer the entire stream in memory, so it is recommended to provide the content length when uploading via a stream.

@shorea
Contributor

shorea commented Sep 21, 2017

Please note that when uploading from a stream, the readLimit determines how much data will be buffered in memory, so it's recommended to set it conservatively. Uploading from a file is a more reliable and performant option, since we know the content length from the length of the file and can reproduce the content as many times as needed for retries.

@xerikssonx

Hi, I've faced almost the same problem when using S3ObjectInputStream: https://stackoverflow.com/questions/46360321/unable-to-reset-stream-after-calculating-aws4-signature

@MikeFHay

MikeFHay commented Nov 10, 2017

Best Practices

The most reliable way to avoid ResetException is to provide data via a File or FileInputStream, which the Java SDK can handle without being constrained by any mark-and-reset limit.

If the stream is not a FileInputStream but supports mark/reset, you can set the mark limit using RequestClientOptions#setReadLimit. The default value is 128 KB. Setting this value to one byte greater than the size of the stream will reliably avoid ResetExceptions. For example, if the maximum expected size of the stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes so that mark and reset will always work for 100,000 bytes or less. Be aware that this might cause some streams to buffer that number of bytes in memory.

I don't see how this is an acceptable workaround. There are many reasons why I wouldn't want to write data to a temporary file (disk usage, file permissions, security concerns), and obviously not all data has a known-in-advance size or fits into memory, so having to permit TransferManager to buffer the whole thing in-memory is also inadequate.

Why doesn't TransferManager simply buffer each part of data that it sends? Then retrying a part upload would be trivial.
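
For what it's worth, that idea can be approximated today with the low-level multipart API by buffering one part at a time; here is a hedged sketch (Java; the bucket, key, and 10 MB part size are illustrative, and readFully is a hypothetical helper defined below):

static void uploadInBufferedParts(AmazonS3 s3Client, String bucket, String key,
                                  InputStream in) throws IOException {
    byte[] buffer = new byte[10 * 1024 * 1024]; // one part's worth of memory
    List<PartETag> partETags = new ArrayList<>();
    InitiateMultipartUploadResult init =
            s3Client.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));

    int partNumber = 1;
    int read;
    while ((read = readFully(in, buffer)) > 0) {
        UploadPartRequest part = new UploadPartRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withUploadId(init.getUploadId())
                .withPartNumber(partNumber++)
                .withInputStream(new ByteArrayInputStream(buffer, 0, read))
                .withPartSize(read);
        // A ByteArrayInputStream supports mark/reset trivially, so the SDK's retries
        // never hit ResetException for an individual part.
        partETags.add(s3Client.uploadPart(part).getPartETag());
    }
    s3Client.completeMultipartUpload(
            new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), partETags));
}

// Hypothetical helper: fill the buffer as far as possible; returns bytes read, 0 at end of stream.
static int readFully(InputStream in, byte[] buf) throws IOException {
    int total = 0;
    while (total < buf.length) {
        int n = in.read(buf, total, buf.length - total);
        if (n < 0) break;
        total += n;
    }
    return total;
}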

@jjqq2013

jjqq2013 commented Jan 26, 2018

I've investigated this issue; it's a long story.

The conclusion is: pass a system property to the JVM by adding the following option to the java command line

-Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=YOUR_MAX_PUT_SIZE

See https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1668

This tells AmazonS3Client to use an appropriately sized rewindable buffer.

Edit 2018-11-02: the link should point to setReadLimit.
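
If editing the java command line isn't convenient, the same property can presumably be set programmatically before the client issues any requests; a minimal sketch (the 100 MB value is an assumption, sized to the largest part you expect to upload):

// Hedged sketch: set the property early, before the SDK reads it for an upload.
System.setProperty("com.amazonaws.sdk.s3.defaultStreamBufferSize",
        String.valueOf(100 * 1024 * 1024));
AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();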

@thauk-copperleaf

I've investigated this issue; it's a long story.

The conclusion is: pass a system property to the JVM by adding the following option to the java command line

-Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=YOUR_MAX_PUT_SIZE

See https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1668

This tells AmazonS3Client to use an appropriately sized rewindable buffer.

Thanks for posting the explanation and the link! However, your link is now incorrect. I believe the correct canonical link is:

@jjqq2013

jjqq2013 commented Nov 2, 2018

@thauk-copperleaf thank you, you are right.

@HagarJNode

HagarJNode commented Sep 23, 2022

I just had this problem streaming content larger than 5 GB from an FTP server to S3. I tried most of what people wrote here and in the linked issues and sites, and none of it worked, but I finally got it working. It looks like AWS needs some time after the upload is done to do its work, and while that happens the connection gets closed. I found a setting that makes it work:

public AmazonS3 amazonS3(final Regions _regions,
                         final AWSCredentialsProvider _awsCredentialsProvider)
  {
    final ClientConfiguration clientConfiguration = new ClientConfiguration();

    clientConfiguration.setConnectionMaxIdleMillis(300_000); // Give AWS 5 minutes to do their stuff

    return
            AmazonS3Client
                    .builder()
                    .withRegion(_regions)
                    .withClientConfiguration(clientConfiguration)
                    .withCredentials(_awsCredentialsProvider)
                    .build();
  }

Maybe this is an answer to the question @stevematyas asked in #427 (comment)
