
S3Utils can only upload files with max size of 5GB #11059

@splunk-tschetter

Affected Version

0.18 and main branch (`final PutObjectRequest putObjectRequest = new PutObjectRequest(bucket, key, file);` in S3Utils)

Description

If S3Utils.uploadFileIfPossible is used to push up a file larger than 5GB, it hits the S3 object-size limit: a single PutObject request can handle at most 5GB, and a multipart upload must be used for files that are larger.
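
For illustration, here is a minimal sketch of one way around the limit, assuming the plain AmazonS3 client is reachable (Druid actually wraps it in ServerSideEncryptingAmazonS3, so this is not a drop-in patch): the v1 SDK's TransferManager switches to multipart uploads automatically once a file crosses a configurable threshold.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

import java.io.File;

// Hypothetical helper, not Druid code: TransferManager splits large files into
// parts under the hood, so a single call handles both small and >5GB files.
public class TransferManagerSketch
{
  public static void uploadFile(AmazonS3 s3, String bucket, String key, File file)
      throws InterruptedException
  {
    final TransferManager transferManager = TransferManagerBuilder.standard()
        .withS3Client(s3)
        .build(); // default multipart threshold is well below 5GB
    try {
      final Upload upload = transferManager.upload(bucket, key, file);
      upload.waitForCompletion(); // blocks until the (possibly multipart) upload finishes
    }
    finally {
      transferManager.shutdownNow(false); // false: leave the caller's S3 client open
    }
  }
}
```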

We ran into this when a task log being uploaded exceeded the size limit. I don't actually know the exact size of the file, but it failed with:

java.lang.RuntimeException: com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooLarge; Request ID: XXXX; S3 Extended Request ID:XXXX; Proxy: null), S3 Extended Request ID: XXXX
	at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:145) ~[?:?]
	at org.apache.druid.storage.s3.S3TaskLogs.pushTaskLog(S3TaskLogs.java:122) ~[?:?]
	at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:369) [druid-indexing-service-0.18.0]
	at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:132) [druid-indexing-service-0.18.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooLarge; Request ID: XXXX; S3 Extended Request ID: XXXX; Proxy: null)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[aws-java-sdk-core-1.11.837.jar:?]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062) ~[aws-java-sdk-s3-1.11.837.jar:?]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008) ~[aws-java-sdk-s3-1.11.837.jar:?]
	at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394) ~[aws-java-sdk-s3-1.11.837.jar:?]
	at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950) ~[aws-java-sdk-s3-1.11.837.jar:?]
	at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812) ~[aws-java-sdk-s3-1.11.837.jar:?]
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772) ~[aws-java-sdk-s3-1.11.837.jar:?]
	at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.putObject(ServerSideEncryptingAmazonS3.java:110) ~[?:?]
	at org.apache.druid.storage.s3.S3Utils.uploadFileIfPossible(S3Utils.java:225) ~[?:?]
	at org.apache.druid.storage.s3.S3TaskLogs.lambda$pushTaskFile$0(S3TaskLogs.java:138) ~[?:?]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.18.0]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.18.0]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.18.0]
	at org.apache.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:87) ~[?:?]
	at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:136) ~[?:?]
	... 7 more

This Stack Overflow answer has me believing the limit is 5GB, so that's what we were hitting: https://stackoverflow.com/questions/26319815/entitytoolarge-error-when-uploading-a-5g-file-to-amazon-s3

This AWS documentation agrees that a single PutObject maxes out at 5GB: https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html

It points to this guide for how to do a multipart upload:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
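
For reference, the low-level flow from that guide looks roughly like the sketch below. This is illustrative only: the bucket, key, and part size are placeholders, and abort-on-failure handling is omitted for brevity.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of the low-level multipart flow from the linked AWS guide; a real
// implementation should call abortMultipartUpload on failure.
public class LowLevelMultipartSketch
{
  public static void upload(AmazonS3 s3, String bucket, String key, File file)
  {
    // 1. Initiate the upload and remember the upload ID.
    final InitiateMultipartUploadResult init =
        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));
    final String uploadId = init.getUploadId();

    // 2. Upload the file in parts; every part except the last must be >= 5MB.
    final long partSize = 100L * 1024 * 1024; // 100MB parts, an arbitrary choice
    final List<PartETag> partETags = new ArrayList<>();
    long filePosition = 0;
    for (int partNumber = 1; filePosition < file.length(); partNumber++) {
      final long size = Math.min(partSize, file.length() - filePosition);
      final UploadPartRequest partRequest = new UploadPartRequest()
          .withBucketName(bucket)
          .withKey(key)
          .withUploadId(uploadId)
          .withPartNumber(partNumber)
          .withFileOffset(filePosition)
          .withFile(file)
          .withPartSize(size);
      partETags.add(s3.uploadPart(partRequest).getPartETag());
      filePosition += size;
    }

    // 3. Complete the upload by sending back the collected part ETags.
    s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags)
    );
  }
}
```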
