Issue on sdk v2 - Getting java.net.SocketTimeout Error when loading Token from IMDSv2( http://169.254.169.254/latest/api/token ) #3846

Closed
gvsharma opened this issue Mar 20, 2023 · 5 comments
Labels
bug This issue is a bug. p3 This is a minor priority issue

gvsharma commented Mar 20, 2023

Describe the bug

I am running a Kotlin Ktor service on AWS SDK for Java 2.17.100 with Java 11.

The exception below is thrown while creating any of these AWS clients: DynamoDbAsyncClient, S3Client, CloudWatchMetricPublisher.

Code Snippet:

@Provides
@Singleton
fun provideSimpleDynamoDbClient(): DynamoDbAsyncClient {
    return DynamoDbAsyncClient.create()
}

@Provides
@Singleton
fun provideCloudWatchMetricPublisher(): CloudWatchMetricPublisher {
    return CloudWatchMetricPublisher.builder().apply {
        namespace(config.getProperty("meterRegistry.cloudwatch.sqs.namespace"))
        metricLevel(MetricLevel.TRACE)
    }.build()
}

@Provides
@Singleton
fun provideS3Client(): S3Client = S3ClientImpl(
    s3AsyncClientWrapper = S3AsyncClientWrapper(S3AsyncClient.create()),
)

Error Message: java.net.SocketTimeoutException: Read timed out from this API http://169.254.169.254/latest/api/token

I have added the error trace below; the call has a 1-second timeout.

I have checked the AWS source files here and here: the calls for fetching the token are blocking, and the timeout is not a configurable value.
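
For reference, the request that times out in the trace is the IMDSv2 session-token call: a PUT to the token URL with a TTL header. Below is a minimal Kotlin sketch of that call using only the JDK. It is not the SDK's own code; `fetchImdsToken` and its parameters are hypothetical names, and the 1,000 ms default simply mirrors the timeout reported above.

```kotlin
import java.io.IOException
import java.net.HttpURLConnection
import java.net.URI

// Documented IMDSv2 token endpoint and TTL request header.
const val IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"
const val TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"

/**
 * Hypothetical helper: requests an IMDSv2 session token with explicit timeouts.
 * Returns null on any I/O failure (including a read timeout) instead of throwing,
 * so the caller decides whether to retry or fail.
 */
fun fetchImdsToken(
    url: String = IMDS_TOKEN_URL,
    timeoutMillis: Int = 1_000, // assumption: mirrors the 1-second timeout in the trace
    ttlSeconds: Int = 21_600,   // 6 hours, the maximum TTL IMDSv2 accepts
): String? {
    val connection = URI(url).toURL().openConnection() as HttpURLConnection
    return try {
        connection.requestMethod = "PUT"
        connection.connectTimeout = timeoutMillis
        connection.readTimeout = timeoutMillis
        connection.setRequestProperty(TTL_HEADER, ttlSeconds.toString())
        // Reading the body is where "Read timed out" surfaces if IMDS is slow.
        connection.inputStream.bufferedReader().use { it.readText() }
    } catch (e: IOException) {
        null
    } finally {
        connection.disconnect()
    }
}
```

Calling `fetchImdsToken()` on an EC2 instance should return the token text; anywhere the endpoint is unreachable or slow it returns `null` once the timeout or connection error occurs.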

I am also adding the SDK metric "CredentialsFetchDuration" below.

Screenshot 2023-03-20 at 3 57 10 PM

I have looked at #3448, but it is not the same as what I am facing.

I have also looked here; in my case every AWS resource is throwing this error.

Please advise on how to fix this issue.

Expected Behavior

The API should handle the error itself, or AWS should provide a configurable way to reduce the

Current Behavior

High latency and error rate in the service because of this exception.

Reproduction Steps

AWS SDK 2.17.100
Java 11
Ktor service
Create a singleton S3 client or DynamoDB client and use it to send or receive messages.

Possible Solution

  • Instead of throwing this to the caller, the SDK should handle the error itself and log a warning.
  • The SDK could provide a configurable timeout.
  • Since this is an issue with AWS IMDSv2, it could be mentioned in the documentation.

Additional Information/Context

Error trace:
java.net.SocketTimeoutException: Read timed out
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:788)
at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:723)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1615)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
at software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:116)
at software.amazon.awssdk.regions.internal.util.EC2MetadataUtils.getToken(EC2MetadataUtils.java:412)
at software.amazon.awssdk.regions.internal.util.EC2MetadataUtils.getItems(EC2MetadataUtils.java:379)
at software.amazon.awssdk.regions.internal.util.EC2MetadataUtils.getData(EC2MetadataUtils.java:348)
at software.amazon.awssdk.regions.internal.util.EC2MetadataUtils.getData(EC2MetadataUtils.java:344)
at software.amazon.awssdk.regions.internal.util.EC2MetadataUtils.getEC2InstanceRegion(EC2MetadataUtils.java:228)
at software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider.tryDetectRegion(InstanceProfileRegionProvider.java:68)
at software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider.getRegion(InstanceProfileRegionProvider.java:52)
at software.amazon.awssdk.regions.providers.AwsRegionProviderChain.getRegion(AwsRegionProviderChain.java:51)
at software.amazon.awssdk.awscore.client.builder.AwsDefaultClientBuilder.regionFromDefaultProvider(AwsDefaultClientBuilder.java:217)
at software.amazon.awssdk.awscore.client.builder.AwsDefaultClientBuilder.resolveRegion(AwsDefaultClientBuilder.java:199)
at software.amazon.awssdk.awscore.client.builder.AwsDefaultClientBuilder.finalizeChildConfiguration(AwsDefaultClientBuilder.java:145)
at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.asyncClientConfiguration(SdkDefaultClientBuilder.java:184)
at software.amazon.awssdk.services.cloudwatch.DefaultCloudWatchAsyncClientBuilder.buildClient(DefaultCloudWatchAsyncClientBuilder.java:29)
at software.amazon.awssdk.services.cloudwatch.DefaultCloudWatchAsyncClientBuilder.buildClient(DefaultCloudWatchAsyncClientBuilder.java:22)
at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.build(SdkDefaultClientBuilder.java:133)
at software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient.create(CloudWatchAsyncClient.java:142)

AWS Java SDK version used

2.17.100

JDK version used

11

Operating System and version

AWS Linux

gvsharma added the bug and needs-triage labels Mar 20, 2023

gvsharma commented Mar 21, 2023

Here is the EndPointProvider.

Line 116 here is causing the SocketTimeoutException with a 1-second timeout. Please have a look.

debora-ito (Member) commented

@gvsharma I'm sorry to hear you're frustrated with the IMDS token fetching.

The high latency is coming from the IMDS side, and since the SDK needs to fetch the security credentials and/or the region from the IMDS endpoint, the SDK client can't be instantiated without that info. We can't simply throw a warning in this case.

The client-side metrics show that you occasionally see the 1s timeout, which can be attributed to network connectivity issues. If you are not doing this already, we recommend you make the most use of the fetched region by reusing the same client during the whole application run (instead of creating a new client for each request, for example). If you are fetching credentials from IMDS, setting asyncCredentialUpdateEnabled to true in the InstanceProfileCredentialsProvider will reduce the chances of getting blocked waiting for new credentials; also, the failed requests should be retried by default.
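
The two recommendations above could be combined roughly as follows. This is a sketch under assumptions, not a drop-in fix: the `AwsClients` holder and the `Region.US_EAST_1` value are hypothetical, and the credentials provider shown only applies if credentials really do come from IMDS. Setting the region explicitly is an additional choice not mentioned in the comment; it keeps client construction from falling through to the region provider chain seen in the stack trace.

```kotlin
import software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient

// Hypothetical holder: each lazy property is built once and reused for the whole run.
object AwsClients {
    // Refreshes IMDS credentials in the background so request threads are
    // less likely to block while new credentials are loaded.
    val credentials: InstanceProfileCredentialsProvider by lazy {
        InstanceProfileCredentialsProvider.builder()
            .asyncCredentialUpdateEnabled(true)
            .build()
    }

    // Assumption: the region is known at deploy time. An explicit region means
    // the builder does not need to ask IMDS for it.
    val dynamoDb: DynamoDbAsyncClient by lazy {
        DynamoDbAsyncClient.builder()
            .region(Region.US_EAST_1) // hypothetical region, replace with your own
            .credentialsProvider(credentials)
            .build()
    }
}
```

Every caller then uses `AwsClients.dynamoDb` rather than calling `DynamoDbAsyncClient.create()` again.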

For context, the timeout was 5s in the past, but we received requests to reduce it because 5s was too high; some teams wanted the request to fail faster.

debora-ito added the p3 and response-requested labels and removed the needs-triage label Mar 30, 2023

gvsharma commented Mar 30, 2023

@debora-ito Thanks for replying.

Even though the client-side metrics show the 1s timeout, my uptime is being affected by this error count. I am using singleton scope for all clients.

I am using four clients (DynamoDB client, SDK metrics client, S3 client, SQS client) in each ECS task. I believe the token is fetched from IMDS every 6 hours or whenever a task is created (please correct me if I am wrong here).

If you look at the attached APM screenshot, the IMDS call for the same instance is happening within 2 hours, which I guess it shouldn't.

Note: I am not fetching credentials from IMDS in client-side code.

github-actions bot removed the response-requested label Mar 30, 2023
gvsharma (Author) commented

This issue is closed. The reason for the bug was mishandled scoping of objects in my application.
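
The comment does not say what the scoping mistake was. As a purely illustrative sketch, the difference between a per-call client and a shared one can be shown with a stand-in class; `FakeClient`, `buildCount`, `perCallClient`, `sharedClient`, and `buildsAfter` are all hypothetical names, and each "build" stands in for one region or token lookup against IMDS.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Counts how many times a "client" has been constructed.
val buildCount = AtomicInteger(0)

// Hypothetical stand-in for an SDK client such as DynamoDbAsyncClient.
class FakeClient {
    init {
        buildCount.incrementAndGet()
    }
}

// Mis-scoped: a new client (and a new lookup) on every call.
fun perCallClient(): FakeClient = FakeClient()

// Shared scope: built on first access, then reused for the process lifetime.
val sharedClient: FakeClient by lazy { FakeClient() }

// Returns how many clients were built while fetching a client `calls` times.
fun buildsAfter(calls: Int, getClient: () -> FakeClient): Int {
    val before = buildCount.get()
    repeat(calls) { getClient() }
    return buildCount.get() - before
}
```

With these definitions, `buildsAfter(3, ::perCallClient)` returns 3, while the first `buildsAfter(3) { sharedClient }` returns 1 and later calls return 0.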
