Description
Describe the bug
As part of moving from our on-premise proxies to new cloud proxies, all S3 traffic is now supposed to go through Gateway VPC Endpoints instead of the on-premise proxies. Since this change, we have started seeing intermittent socket timeout errors when writing to S3. We have several microservices running in EKS that use the AWS SDK for Java v2 to communicate with S3. Specifically, we write with the S3TransferManager, configured with an S3AsyncClient created using crtBuilder().
As a side note, we had to upgrade the AWS SDK v2 from version 2.28.6 to 2.36.3, and software.amazon.awssdk.crt:aws-crt from 0.30.9 to 0.39.4, so that the library would honor our NO_PROXY environment variable, shown below. This ensures that our S3 requests do not go through the proxy and use the Gateway VPC Endpoint instead.
export NO_PROXY=.jpmchase.net,.jpmorganchase.com,localhost,127.0.0.1,169.254.169.254,10.100.0.1,.us-east-1.es.amazonaws.com,.eks.amazonaws.com,sts.us-east-1.amazonaws.com,secretsmanager.us-east-1.amazonaws.com,lambda.us-east-1.amazonaws.com,.s3.us-east-1.amazonaws.com,*.s3.amazonaws.com,.s3.amazonaws.com,s3.amazonaws.com
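To help reason about which S3 hostnames the entries above are meant to cover, here is a minimal sketch of a NO_PROXY check. It assumes strict suffix matching for entries starting with `.` or `*.` and exact matching otherwise. That rule is an assumption made for illustration and is not confirmed to be what aws-crt implements; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;

public class NoProxyCheck {
    // Hypothetical helper: simplified matching rule, NOT confirmed to be the CRT's logic.
    static boolean bypassesProxy(String host, String noProxy) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String raw : noProxy.split(",")) {
            String entry = raw.trim().toLowerCase(Locale.ROOT);
            if (entry.isEmpty()) {
                continue;
            }
            // Treat "*.example.com" the same as ".example.com".
            if (entry.startsWith("*")) {
                entry = entry.substring(1);
            }
            if (entry.startsWith(".")) {
                // Leading dot: match any host that ends with this suffix.
                if (h.endsWith(entry)) {
                    return true;
                }
            } else if (h.equals(entry)) {
                // No leading dot: exact host match only.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only the S3-related entries from our NO_PROXY value.
        String noProxy = ".s3.us-east-1.amazonaws.com,*.s3.amazonaws.com,.s3.amazonaws.com,s3.amazonaws.com";
        List<String> hosts = List.of(
                "my-bucket.s3.us-east-1.amazonaws.com", // hypothetical bucket name
                "s3.us-east-1.amazonaws.com",
                "my-bucket.s3.amazonaws.com");
        for (String host : hosts) {
            System.out.println(host + " -> " + bypassesProxy(host, noProxy));
        }
    }
}
```

Under this strict rule, the bare regional endpoint `s3.us-east-1.amazonaws.com` is not covered by `.s3.us-east-1.amazonaws.com`. Some NO_PROXY implementations do treat a leading-dot entry as also matching the bare domain, so whether this matters depends on the actual matching logic in use.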
We also have a network policy in place to allow egress from EKS to s3:
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: default.aws-s3-service
  namespace: hc
spec:
  order: 301
  selector: all()
  types:
    - Egress
  egress:
    - action: Allow
      protocol: TCP
      destination:
        domains:
          - '.s3.amazonaws.com'
          - '.s3.us-east-1.amazonaws.com'
        ports:
          - 443
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
We expected to use the S3CrtAsyncClient like we were before without any issues. We were not getting intermittent timeouts previously.
Current Behavior
The current behavior is intermittent socket timeout errors. We can write several files to S3 before getting a timeout exception. The same file can succeed on one attempt and fail on a later one.
This is the stacktrace for the error we get intermittently:
software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: socket operation timed out.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:184)
at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:148)
at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:25)
Reproduction Steps
@Bean
S3TransferManager s3TransferManager() {
    log.info("Creating S3 Transfer Manager to the aws region - {}", region);
    // CRT-based async client; this is the variant that shows the intermittent timeouts.
    S3AsyncClient s3AsyncClient =
        S3AsyncClient.crtBuilder()
            .credentialsProvider(DefaultCredentialsProvider.create())
            .region(Region.of(region))
            .targetThroughputInGbps(20.0)
            .minimumPartSizeInBytes(partSizeInMB * MB)
            .build();
    // Transfer manager backed by the CRT client above.
    S3TransferManager transferManager =
        S3TransferManager.builder()
            .s3Client(s3AsyncClient)
            .build();
    return transferManager;
}
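Because the failure is intermittent, a small harness that repeats the same upload and counts failures can make it easier to reproduce. The following is a minimal sketch under stated assumptions: `IntermittentFailureHarness`, `countFailures`, and the `uploadOnce` supplier are hypothetical names, and `main` uses a fake supplier that fails every third call so that the sketch runs without AWS access.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class IntermittentFailureHarness {
    // Runs the supplied async operation `attempts` times, one at a time,
    // and returns how many attempts completed exceptionally.
    static int countFailures(Supplier<CompletableFuture<?>> uploadOnce, int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                // join() rethrows failures as unchecked exceptions
                // (CompletionException or CancellationException).
                uploadOnce.get().join();
            } catch (RuntimeException e) {
                failures++;
                Throwable cause = e.getCause() != null ? e.getCause() : e;
                System.err.println("attempt " + i + " failed: " + cause);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in for a real upload: fails every third call.
        // In the real service, the supplier would be assumed to wrap something like
        // transferManager.uploadFile(request).completionFuture().
        int[] calls = {0};
        Supplier<CompletableFuture<?>> fakeUpload = () -> {
            if (++calls[0] % 3 == 0) {
                return CompletableFuture.failedFuture(
                        new RuntimeException("socket operation timed out"));
            }
            return CompletableFuture.completedFuture("ok");
        };
        System.out.println("failures: " + countFailures(fakeUpload, 9));
    }
}
```

Running the same harness against both client variants with the same file and attempt count would give a like-for-like failure rate to compare.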
Possible Solution
This is not a solution, but a workaround that points to the S3CrtAsyncClient possibly having a bug.
Please note that if we do not use the S3CrtAsyncClientBuilder, but instead use the DefaultS3AsyncClientBuilder like below, we don't get the intermittent socket timeout - everything works fine:
// Java-based async client with multipart enabled; no timeouts observed with this variant.
S3AsyncClient s3AsyncClient =
    S3AsyncClient.builder()
        .multipartEnabled(true)
        .credentialsProvider(DefaultCredentialsProvider.create())
        .region(Region.of(region))
        .build();
Additional Information/Context
I have some wire trace logs, but I can't paste them here. I can provide them to whoever looks into this issue.
AWS Java SDK version used
2.36.3
JDK version used
java version "17.0.16" 2025-07-15 LTS Java(TM) SE Runtime Environment (build 17.0.16+12-LTS-247) Java HotSpot(TM) 64-Bit Server VM (build 17.0.16+12-LTS-247, mixed mode, sharing)
Operating System and version
Red Hat Enterprise Linux 8.10 (Ootpa)