S3TransferManager with CRT is failing with throttling issue #4048

Closed
siva-subramani opened this issue May 28, 2023 · 7 comments
Labels: bug, p3, response-requested, transfer-manager

siva-subramani commented May 28, 2023

Describe the bug

Our storage structure in S3 is YYYY/MM/DD/FOLDER1/SUBFOLDER1/EVENTID/EVENT_REQ and YYYY/MM/DD/FOLDER1/SUBFOLDER1/EVENTID/EVENT_RES. For every event, the request and response are uploaded to S3 before the event is published to the broker.

YYYY/MM/DD/FOLDER1/SUBFOLDER1/EVENTID/ is the prefix and EVENT_REQ and EVENT_RES are object names.

Expected Behavior

We expect uploads to S3 to succeed under load.

Current Behavior

We see errors like the following:

software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: Response code indicates throttling

The message seems to come from handleError in S3CrtResponseHandlerAdapter:

private void handleError(int crtCode, int responseStatus, byte[] errorPayload) {
    if (isErrorResponse(responseStatus) && errorPayload != null) {
        onErrorResponseComplete(errorPayload);
    } else {
        SdkClientException sdkClientException =
            SdkClientException.create("Failed to send the request: " +
                                      CRT.awsErrorString(crtCode));
        failResponseHandlerAndFuture(sdkClientException);
    }
}
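
Since the failure surfaces as a generic SdkClientException, a caller that wants to treat throttling differently has little to go on besides the message text. Below is a minimal sketch of such a check. The class ThrottlingCheck and its method are hypothetical helpers, not part of the AWS SDK, and matching on message text is a heuristic that could break if the wording changes.

```java
// Hypothetical helper, not part of the AWS SDK.
final class ThrottlingCheck {
    // Message text taken from the exception reported in this issue.
    static final String THROTTLING_TEXT = "Response code indicates throttling";

    private ThrottlingCheck() {}

    // Walks the cause chain (async failures are usually wrapped, for example in
    // CompletionException) and reports whether any message contains the text.
    static boolean looksLikeThrottling(Throwable error) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable t = error; t != null && depth < 20; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message != null && message.contains(THROTTLING_TEXT)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could use this inside whenComplete on the transfer's future to decide whether to slow down before resubmitting.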

Reproduction Steps

Our S3TransferManager configuration is below. With these settings, we see the issue in our Spring Boot project when we make around 3,500 PUT/GET requests per second to S3.

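import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.utils.ThreadFactoryBuilder;

public S3TransferManager s3TransferManager(AwsCredentialsProvider awsCredentialsProvider) {
    // CRT-based async client. KB is a constant defined elsewhere in the project.
    S3AsyncClient s3AsyncClient = S3AsyncClient.crtBuilder()
            .credentialsProvider(awsCredentialsProvider)
            .maxConcurrency(200)
            .checksumValidationEnabled(true)
            .targetThroughputInGbps(10.0)
            .minimumPartSizeInBytes(1500 * KB)
            .build();

    // Return the transfer manager (the original snippet built it but never returned it).
    return S3TransferManager.builder()
            .s3Client(s3AsyncClient)
            .executor(executor())
            .build();
}

private Executor executor() {
    // Note: with corePoolSize 0 and a bounded queue, ThreadPoolExecutor starts
    // threads beyond the first only once the queue (1200 slots) is full.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(0, 300,
            60, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(1200),
            new ThreadFactoryBuilder().threadNamePrefix("exec").build());
    executor.allowCoreThreadTimeOut(true);
    return executor;
}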
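
One side note on the executor above: with a core size of 0 and a bounded queue, java.util.concurrent.ThreadPoolExecutor prefers queuing over starting new threads, so the pool may run on a single thread until the queue fills. The sketch below uses only the JDK to show that behavior. PoolGrowthDemo and its parameters are made up for illustration, and this is not a claim that the executor contributes to the throttling error.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, JDK only.
final class PoolGrowthDemo {
    private PoolGrowthDemo() {}

    // Submits `tasks` blocking tasks and returns how many worker threads the pool started.
    static int threadsStarted(int corePoolSize, int maxPoolSize, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Each task blocks until released, so workers stay busy.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Core size 0: tasks are queued and a single worker drains them.
        System.out.println(threadsStarted(0, 8, 100, 20)); // prints 1
        // Core size equal to max: one worker per task until the core size is reached.
        System.out.println(threadsStarted(8, 8, 100, 20)); // prints 8
    }
}
```

If more parallelism is wanted, one option is to set the core size equal to the maximum and keep allowCoreThreadTimeOut(true) so idle threads still exit.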

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.0

JDK version used

18

Operating System and version

Linux

@siva-subramani siva-subramani added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 28, 2023
@siva-subramani
Author

Adding to the problem statement: we use KMS, and CRR to replicate to another bucket.

@siva-subramani
Author

Disabling CRR got us past this issue. Any thoughts?

@debora-ito
Member

@siva-subramani can you provide the full stacktrace with the error?

@debora-ito debora-ito self-assigned this Jun 8, 2023
@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 8, 2023
@siva-subramani
Author

@debora-ito

Thank you for looking into this.

This is the stack trace when we use the CRT client with transfer manager:

"software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: Response code indicates throttling" stackTrace=
software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:65),
software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51),
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934),
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911),
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510),
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162),
software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79),
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863),
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841),
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510),
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162),
software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103),
software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184),
software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159),
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863),
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841),
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510),
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162),
software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79),
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863),
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841),
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510),
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162),
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:103),
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863),
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841),
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510),
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162),
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:165),
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863),
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841),
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482),
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136),
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635),
java.base/java.lang.Thread.run(Thread.java:833)

We also tried changing the key pattern to C[1-9]YYYY/C[1-9]MMM/C[1-9]DD/FOLDER1/SUB_FOLDER1/FILE, which worked a little better with NettyNioAsyncHttpClient.
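
For readers who want to try a similar layout, here is a rough sketch of how a key with a C[1-9] shard marker could be built. It assumes the shard digit is derived from a hash of the event ID and reused at each date level; the thread does not say how the digit was actually chosen. ShardedKeys and its methods are hypothetical names, and the path segments reuse the FOLDER1/SUBFOLDER1/EVENTID layout from the issue description.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class ShardedKeys {
    // Nine shards, matching the C[1-9] range in the pattern above.
    static final int SHARDS = 9;

    private ShardedKeys() {}

    // Maps an event ID to a stable shard number in 1..9.
    // Assumption: hashing the event ID is an acceptable way to spread load.
    static int shardFor(String eventId) {
        return Math.floorMod(eventId.hashCode(), SHARDS) + 1;
    }

    // Builds C<n>YYYY/C<n>MM/C<n>DD/FOLDER1/SUBFOLDER1/<eventId>/<objectName>.
    static String key(LocalDate date, String eventId, String objectName) {
        String shard = "C" + shardFor(eventId);
        return String.format(Locale.ROOT,
                "%s%04d/%s%02d/%s%02d/FOLDER1/SUBFOLDER1/%s/%s",
                shard, date.getYear(),
                shard, date.getMonthValue(),
                shard, date.getDayOfMonth(),
                eventId, objectName);
    }
}
```

Because the shard depends only on the event ID, EVENT_REQ and EVENT_RES for the same event land under the same prefix, and a reader can recompute the key from the date and event ID alone.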

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jun 10, 2023
@debora-ito
Member

I did a little investigation of the CRT code. CRT reports "Response code indicates throttling" when S3 returns the status code 503 Slow Down (references in the aws-c-s3 repo: ref.1 and ref.2).

S3 returns 503 Slow Down errors when the request rate approaches the rate limit. The 3,500 requests per second you are executing is close to the S3 limits in the documentation (at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix): https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html. The documentation also describes ways to prevent 503s.
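
As a back-of-the-envelope aid, the per-prefix rates in that documentation (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix) can be turned into a rough estimate of how many prefixes a workload would need. This is only a sketch: PrefixEstimate is a hypothetical helper, it assumes the write and read rates apply independently, and S3 partitions prefixes on its own schedule, so 503s can still occur while it scales.

```java
// Hypothetical helper for a rough estimate only.
final class PrefixEstimate {
    // Per-prefix request rates quoted in the S3 performance guidelines.
    static final int WRITES_PER_PREFIX = 3_500; // PUT/COPY/POST/DELETE per second
    static final int READS_PER_PREFIX = 5_500;  // GET/HEAD per second

    private PrefixEstimate() {}

    // Smallest number of prefixes whose combined rates cover the given load,
    // assuming requests are spread evenly across prefixes.
    static int prefixesNeeded(int writesPerSecond, int readsPerSecond) {
        int forWrites = ceilDiv(writesPerSecond, WRITES_PER_PREFIX);
        int forReads = ceilDiv(readsPerSecond, READS_PER_PREFIX);
        return Math.max(1, Math.max(forWrites, forReads));
    }

    // Integer division rounded up, for non-negative inputs.
    private static int ceilDiv(int value, int divisor) {
        return (value + divisor - 1) / divisor;
    }
}
```

For example, prefixesNeeded(7000, 0) gives 2, while a load right at 3,500 writes per second still fits in one prefix on paper but leaves no headroom.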

S3TransferManager would automatically retry with exponential backoff. Is it retrying?

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. transfer-manager p3 This is a minor priority issue labels Jul 6, 2023
@siva-subramani
Author

@debora-ito That is helpful. The issue occurs when we sustain 1200 TPS with CRR in place to replicate to another bucket; in that setup we hit the 3,500 limit at around 1200 TPS. We have added full jitter and have also worked with support on this. For now, we are trying options such as different prefix patterns for the S3 key, which seem to help a bit.
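
For anyone landing here later, "full jitter" means sleeping a random time between zero and an exponentially growing ceiling before each retry. A minimal JDK-only sketch follows. FullJitterBackoff, the base delay, and the cap are illustrative assumptions, not the values or code we run and not an SDK API.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; base and cap values are up to the caller.
final class FullJitterBackoff {
    private FullJitterBackoff() {}

    // Ceiling for retry `attempt` (0-based): min(capMillis, baseMillis * 2^attempt).
    // Expects non-negative baseMillis and capMillis.
    static long ceilingMillis(long baseMillis, long capMillis, int attempt) {
        int shift = Math.min(Math.max(attempt, 0), 62);
        // Comparing against cap >> shift avoids overflow in base << shift.
        if (baseMillis > (capMillis >> shift)) {
            return capMillis;
        }
        return baseMillis << shift;
    }

    // Full jitter: a uniformly random delay in [0, ceiling].
    static long delayMillis(long baseMillis, long capMillis, int attempt) {
        long ceiling = ceilingMillis(baseMillis, capMillis, attempt);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

With a base of 100 ms and a cap of 20,000 ms, the ceilings run 100, 200, 400, and so on, reaching the cap from the ninth attempt (index 8) onward. Randomizing over the whole range spreads retries from many clients so they do not arrive in synchronized bursts.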

Thank you for looking further into this and pointing to the right articles. I will close this issue now.

