Reduced timeout and log output for IMDS timeout #192

willarmiros · 2020-08-07T19:26:30Z

Description of changes:
Currently, EC2Plugin.isEnabled calls the IMDS endpoint to determine whether it is operating on EC2. This is fine when we are operating on EC2, since from my tests it takes <3 milliseconds to reach IMDS on an instance.

However, this is problematic as we move toward an opt-out plugin model. In this model, we check isEnabled for each plugin regardless of environment, so both failing and passing should be fast. Currently EC2Plugin.isEnabled() takes 4 seconds to fail, because the timeout for the IMDS endpoint check is 2 seconds and it checks IMDSv1 and v2 before failing. This is far too long for non-EC2 customers using opt-out plugins.

This PR lowers the timeout quite a bit and suppresses logging if we hit it to be friendlier to non-EC2, opt-out plugin customers. Given how fast successful IMDS requests are, the lower timeout should not impact the plugin's detection ability. To be even faster, we could avoid checking the second check to IMDSv1 when the v2 request times out, but that would require a bit of weird throwing logic.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

anuraaga · 2020-08-08T02:16:12Z

aws-xray-recorder-sdk-core/src/main/java/com/amazonaws/xray/plugins/EC2MetadataFetcher.java

@@ -46,7 +47,7 @@
        AMI_ID,
    }

-    private static final int TIMEOUT_MILLIS = 2000;
+    private static final int TIMEOUT_MILLIS = 100;


What do you think about keeping a higher (maybe not this high) read timeout and only lowering connect?

I'm wondering wether there's a better detection mechanism for EC2, before we actually start up the plug-in... similar to how we detect Lambda environments from env vars. Failing fast is good too though.

I have a feeling it's possible to speed up the positive case but we may be stuck on the negative case. EC2 provides low level customization of everything so env cars, file system can't be relied. I found a post where the hostname will end with an Amazon suffix by default but that's only if it's not overridden...

@willarmiros Out of curiosity what JVM did you test the timeout on? Java 11+ have a completely revamped network stack and I wonder if this may have improved. I think most tools fail fast when an IP is completely unreachable, which should be what's happening with metadata service which uses a link local, not public, IP address.

@awssandra as Rag said, can't rely on much other than IMDS. There's no single file to check for that exists on all instances on Windows & Linux. Hostname is overridden in VPCs which many EC2 Customers use.

@anuraaga I tested the timeout using Java 8. I can change it to have say a 100 ms Connect timeout and 1 second read timeout?

I think most tools fail fast when an IP is completely unreachable

Do you mean they throw an exception before even timing out? Would this reliably happen in Java 8 & 11?

Ah I just meant that newer network stack might through an exception right away regardless of the timeout value. Not a big deal mostly just curious :)

reduced timeout and log output for imds

afbcb56

willarmiros requested review from anuraaga and shengxil as code owners August 7, 2020 19:26

anuraaga approved these changes Aug 8, 2020

View reviewed changes

separated read and connect timeouts

410e5ea

anuraaga merged commit 3d2c733 into aws:master Aug 10, 2020

willarmiros mentioned this pull request Aug 17, 2020

added lazy initialization to improve withDefaultPlugins performance #194

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduced timeout and log output for IMDS timeout #192

Reduced timeout and log output for IMDS timeout #192

willarmiros commented Aug 7, 2020

anuraaga Aug 8, 2020

awssandra Aug 8, 2020 •

edited

anuraaga Aug 8, 2020

willarmiros Aug 10, 2020

willarmiros Aug 10, 2020

anuraaga Aug 10, 2020

Reduced timeout and log output for IMDS timeout #192

Reduced timeout and log output for IMDS timeout #192

Conversation

willarmiros commented Aug 7, 2020

anuraaga Aug 8, 2020

Choose a reason for hiding this comment

awssandra Aug 8, 2020 • edited

Choose a reason for hiding this comment

anuraaga Aug 8, 2020

Choose a reason for hiding this comment

willarmiros Aug 10, 2020

Choose a reason for hiding this comment

willarmiros Aug 10, 2020

Choose a reason for hiding this comment

anuraaga Aug 10, 2020

Choose a reason for hiding this comment

awssandra Aug 8, 2020 •

edited