Description
If S3Utils.uploadFileIfPossible is used to upload a file larger than 5GB, it will hit S3's object-size limit. A single PutObject request can handle at most 5GB; a multipart upload must be used for files that are larger.
Affected Version
0.18 and the main branch (druid/extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3Utils.java, line 272 at commit 8296123):
final PutObjectRequest putObjectRequest = new PutObjectRequest(bucket, key, file);
We ran into this when a task log being uploaded exceeded the size limit. I don't know the exact size of the file, but it failed with:
java.lang.RuntimeException: com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooLarge; Request ID: XXXX; S3 Extended Request ID:XXXX; Proxy: null), S3 Extended Request ID: XXXX
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:145) ~[?:?]
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskLog(S3TaskLogs.java:122) ~[?:?]
at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:369) [druid-indexing-service-0.18.0]
at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:132) [druid-indexing-service-0.18.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooLarge; Request ID: XXXX; S3 Extended Request ID: XXXX; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[aws-java-sdk-core-1.11.837.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062) ~[aws-java-sdk-s3-1.11.837.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008) ~[aws-java-sdk-s3-1.11.837.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394) ~[aws-java-sdk-s3-1.11.837.jar:?]
at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950) ~[aws-java-sdk-s3-1.11.837.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812) ~[aws-java-sdk-s3-1.11.837.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772) ~[aws-java-sdk-s3-1.11.837.jar:?]
at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.putObject(ServerSideEncryptingAmazonS3.java:110) ~[?:?]
at org.apache.druid.storage.s3.S3Utils.uploadFileIfPossible(S3Utils.java:225) ~[?:?]
at org.apache.druid.storage.s3.S3TaskLogs.lambda$pushTaskFile$0(S3TaskLogs.java:138) ~[?:?]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.18.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.18.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.18.0]
at org.apache.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:87) ~[?:?]
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:136) ~[?:?]
... 7 more
This Stack Overflow answer suggests the limit is 5GB, which matches what we were hitting: https://stackoverflow.com/questions/26319815/entitytoolarge-error-when-uploading-a-5g-file-to-amazon-s3
The AWS documentation confirms that a single PutObject maxes out at 5GB: https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html
It points to this guide for performing a multipart upload:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
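A minimal sketch of the kind of size check that could gate between a single PutObject and a multipart upload (the class and method names here are hypothetical illustrations, not Druid's actual code):

```java
public class S3UploadSizeCheck {
    // S3 rejects single PutObject requests above 5 GiB with EntityTooLarge.
    static final long MAX_SINGLE_PUT_BYTES = 5L * 1024 * 1024 * 1024;

    // Returns true when the file is too large for one PutObject request
    // and must go through the multipart upload API instead.
    static boolean needsMultipartUpload(long fileSizeBytes) {
        return fileSizeBytes > MAX_SINGLE_PUT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(needsMultipartUpload(4L * 1024 * 1024 * 1024)); // 4 GiB file
        System.out.println(needsMultipartUpload(6L * 1024 * 1024 * 1024)); // 6 GiB file
    }
}
```

In practice, rather than hand-rolling the part bookkeeping, the AWS SDK v1 TransferManager (com.amazonaws.services.s3.transfer.TransferManager) switches to multipart upload automatically once a file exceeds a configurable threshold, so routing uploads through it would likely be the simplest fix here.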