Lightsail credentials fetching from IMDS is slow #2678

Closed
2 of 3 tasks
midnight-wonderer opened this issue Mar 9, 2022 · 26 comments
Labels
service-api General API label for AWS Services.

Comments

@midnight-wonderer
Contributor

midnight-wonderer commented Mar 9, 2022


Describe the bug
AWS STS takes seconds to respond to credential requests.
This is only my assumption; check below for the observed behavior.

Gem name
aws-sdk-s3 (1.111.2)

Version of Ruby, OS environment
ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x86_64-linux]

To Reproduce (observed behavior)
I am running a Docker container on Lightsail that makes use of Lightsail object storage.
I granted access to the storage via "Resource access" by attaching Lightsail instances to the bucket.
I reuse the S3 client properly:

class S3ClientCache
  include ::Singleton

  def s3_client
    @_s3_client ||= ::Aws::S3::Client.new(
      region: ENV['S3_REGION'],
    )
  end
end

S3ClientCache.instance.s3_client

On the very first S3 request, it takes 2 seconds in my environment to presign an S3 URL.
The same is true for subsequent credential updates.
Here I capture the moment when X-Amz-Credential changes:

I, [2022-03-08T22:45:12.250483 #80] INFO -- : [4f8ad353-b102-42c1-ac04-c366db624dc6] Processing by Tenant::APIController#entry as HTML
I, [2022-03-08T22:45:12.250549 #80] INFO -- : [4f8ad353-b102-42c1-ac04-c366db624dc6] Parameters: {"tenant"=>"MamyPoko"}
I, [2022-03-08T22:45:12.258026 #80] INFO -- : [4f8ad353-b102-42c1-ac04-c366db624dc6] Redirected to https://bucket-example.s3.ap-southeast-1.amazonaws.com/path/to/file/cfb936de1799956e43497e24143112d280ee0a37542860a2.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQ2SOE3UGR6ERY4BJ%2F20220308%2Fap-southeast-1%2Fs3%2Faws4_request&X-Amz-Date=20220308T224512Z&X-Amz-Expires=180&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEFkaDmFwLXNvdXRoZWFzdC0xIkgwRgIhAMfBG5GLgaN%2BgW11Kmw%2FXCoJ1FExxRw4SnVweJgdDnHSAiEAo%2Bl9MeG6kBirNBxakZdbk%2BzZX2uDIvv8fYC1CuGD5qkqjQQIwv%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARACGgwwNTcwNzIxNjYxNTciDPvzW5DkJutlREbflCrhAx%2FO%2F3I7KGHpNBYBqpsq%2B8a4L%2B7kBimTanjBCJdBy%2Bvjo3%2B73D32qhV6A71wCVz1tCQJBVS7oSLcle%2FFMXTA7lfBU%2FOhaxU%2FXJiVukJjTh4KpTZcwvgichUVxALagWFQAfDk2ks2%2FUzpYCD1T76%2Fqfow4PxrtlBNazH17%2BJoJ6kmlfgUxraCnhfX%2F%2Bg3%2BorcOFS5fl5Urj%2F94tPWBINwcOiS5K239NazDMOGsDopfFsE3hl5hm8084kUGQSI1PFSeX2pXqiuGsNiN%2FqEPuKz2z7gAIr7VFVZqzJ9tzuHQkWsXLsd6Ug8D7QO%2B3dLEROo3Kb1iRq%2BnP7VGW4YIN7pV2E6hehm0%2BoCdY8xS8DJ20%2BZf0EnuMPU%2BVcv%2F12%2Bc8BJNHpUDxMVLICC9cG2r%2Ff%2BvuRcB50xKlQ87LGEbbqTSgxh8qELIqk9xm%2FqqO%2BOk1uoAYV2tUQ%2B0JX2%2BLeSKz5zc8j6xUqXHY7OP%2B%2BLeGKQhuLicLVFqxWLSUuGCB7sZ8TbYW0fMdLJqCtgp1Kf9HkIO8N4lwH3QU%2FnMOzQRIpT42BvZ4FA6TF%2FpbCrlFIXFkFONMuvxwUAmCjymTuSjzfJ0lHipUtawVbF0y5X4Vi0dzfTZpCqodHto45xbuKsWWYT7F4wr4yekQY6pAFVWYuE7ZIXKCT8c8n3TwLtV%2BB%2BvrCq3n%2FlpJicdeE%2FdcUxftOqnlEWPFpwP%2FZBEPCi%2FLduFlt8g1mHdnfe6tDt1jPlrlsI0VkQHK3hLFjJKnkdxe1MV8RsgF85x39gXOsTZ9%2B%2FfxIuLiNaCxAKQG40W9wfs1eevF3CbZsqoWeE8S%2B7AtNVBbgNl7b54voduEyZ1zcngI%2FmlbR29gcW%2FBgA%2Fc4cOA%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=bddf6357fd9d119cec222e5d336d79c8e0ec3b26e5c41a9da9ea4636e1d2be11
I, [2022-03-08T22:45:12.258229 #80] INFO -- : [4f8ad353-b102-42c1-ac04-c366db624dc6] Completed 302 Found in 8ms (MongoDB: 0.0ms | Allocations: 2463)
--
I, [2022-03-08T22:46:10.726935 #73] INFO -- : [b1809e63-b1b6-4598-86ec-e03dfa84278d] Processing by Tenant::APIController#entry as HTML
I, [2022-03-08T22:46:10.727178 #73] INFO -- : [b1809e63-b1b6-4598-86ec-e03dfa84278d] Parameters: {"tenant"=>"MamyPoko"}
I, [2022-03-08T22:46:12.750322 #73] INFO -- : [b1809e63-b1b6-4598-86ec-e03dfa84278d] Redirected to https://bucket-example.s3.ap-southeast-1.amazonaws.com/path/to/file/eecba882f9292553e49cca4c1842600b8a020f5dfb78d4bf.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQ2SOE3UGYS2UJBEZ%2F20220308%2Fap-southeast-1%2Fs3%2Faws4_request&X-Amz-Date=20220308T224612Z&X-Amz-Expires=180&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEF8aDmFwLXNvdXRoZWFzdC0xIkcwRQIhALO%2Bs19ZZLZ3N7N4p9uBRc2FStbeyhvUigHmb1lKWjoDAiB1iIGhcYdF0IvSu5KAHMRfO6t37tQ4lpTqIBU1MWEY6CqNBAjI%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAIaDDA1NzA3MjE2NjE1NyIMDC5KhdK%2F3kA6H6TrKuEDiLln50%2BdoQIby73nGAAa%2Fw4PQkPs6DJzZccOPkxxaPY7OhWTnwb4iNHm6HwP6u%2BBHHTg7qNqdatGsUGlU1Rm3p6NmICU49lShcXpB3rzxBHVjK56mTtnVhPjOHEjhkuTeucdtmLLtVSaaiw9CvuyNDnrFlyWOSM1TFskLT0hZ%2Fpqy548iHpFgOieCDgzpccrGoR3UjscBab7%2FM2sYJzHjQFGH3H%2BQSCjocA5rEGuPDnKucS1cJ7KZ2Yd6keGqEWWvkZjHg1s4zIdoJIuxGqXvmJafJlmMaR%2FOWFVI5y8Zz%2FOamwa%2FWDaEvPBT6yZixCt0gheXmfM%2BDe89xNjwdtuZPNmMfsQ07il6wcaRVB6YDU9aF%2F76l0V2wPupVGHpHhA%2Fnwir8xv%2BHB46jx8Y7aYx1E3KMfktndtCkZx%2FD3TPp%2Fu1YmJpZ0b4rK1qwNuMKwtfp7BOA8Bow53Ik05hdwwqQa42dDc92S8sQJl7%2F7haAD4wG6JGK%2BVOJVIeid%2F%2Bve5tvmX2ZBur2KHi2JjvKLAnTWFu8oToYIo9zoMP789L0ZBnTXgIBB8Pwc%2F9ZcpzvHEMdX74HF86N7ppoChaygIH30MxwZ9wNJfUvyXTt0PehXBU3LuwbRyWwjTkiX0p%2FKiwDDDs5%2BRBjqlARNwiIesLwp2EzwzCa1AhUl5fGG3dHF7bZ%2Fh8B8P6PmRhzzXaxVAX5ZtEX5IObQxzSMZ2GhnjEQuv0g2oJIHrUO6YbHumilPUZ%2BbO8HKruC5ANaRm1BX0P8jc%2BepKCvt6o5gyNUoguM5VuM0Nc%2BJeH%2FSBvj2%2FIMXPRGxnOqmOg%2FDwDcxBMdz5820WzCSobdKO7GDKDQtIj0rrkwKOmrjQHd5Qyps5A%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=dcb5668939995c17c148491dc879acd38a4c16d70dfbca0d2806a1b5e890e667
I, [2022-03-08T22:46:12.750548 #73] INFO -- : [b1809e63-b1b6-4598-86ec-e03dfa84278d] Completed 302 Found in 2023ms (MongoDB: 0.0ms | Allocations: 4046)

The regular request takes only 8ms to presign and redirect.
The one with the STS credential update takes 2023ms.
Please note that both use the same S3 client instance to presign.
Is this a performance bug? Two seconds seems too long to me, even if it only happens occasionally.

Expected behavior
Faster response from aws-sdk-s3.

@midnight-wonderer midnight-wonderer added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 9, 2022
@alextwoods
Contributor

2 seconds does seem like a long time to refresh credentials - is S3_REGION the same as the region the container is running in? Do you see the same 2-second delay when creating other service clients?

We have an in-progress PR (#2642) that refreshes credentials asynchronously - this won't help with the startup time, but it should help with requests after that.

@midnight-wonderer
Contributor Author

The instance and the S3 bucket are in the same region, as that is required by the Lightsail "Resource access" permission type.

I don't use any other service with a similar access type apart from Lightsail object storage (S3). So I can't create other service clients, or at least can't test them in a meaningful way. However, if you want to narrow down the issue, I am happy to run any code via the Rails console; you may test STS in isolation if you know the code to run.

Please note the timezone difference and optimize test cases for minimal round trips. I don't mind running them multiple times, but faster is better.

Not sure if this is related: aws/aws-sdk-go#2972
I read about it but don't understand it; the issue has something to do with HttpPutResponseHopLimit, which, I suppose, is outside my control for a Lightsail instance.

Glad to hear about the improvement underway.

@mullermp
Contributor

mullermp commented Mar 9, 2022

Perhaps STS credential fetching is being retried many times, resulting in the delay? When you construct your credentials (which kind are you using?), you can pass in an STS client with http_wire_trace: true. Maybe that will provide some insight.
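
If you happen to be using assume-role credentials, that would look roughly like this (untested sketch; the role ARN and session name are placeholders):

require 'aws-sdk-s3'

# Placeholders for illustration only - substitute your real role ARN / session name.
sts = Aws::STS::Client.new(region: ENV['S3_REGION'], http_wire_trace: true)
credentials = Aws::AssumeRoleCredentials.new(
  client: sts,
  role_arn: 'arn:aws:iam::123456789012:role/example-role',
  role_session_name: 'wire-trace-debug'
)
s3_client = Aws::S3::Client.new(region: ENV['S3_REGION'], credentials: credentials)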

@danielvdao
Contributor

danielvdao commented Mar 9, 2022

There are a couple of optimizations that we had in mind and a few that we've tackled:

  1. Singleton - it looks like you already did that.
  2. Depending on the circumstances of your application and how you're interacting with the client, we instantiated the client on Rails application boot (e.g. using a Rails initializer; rough sketch at the end of this comment). I'm not too familiar with Lightsail, so I don't know if you can do something similar.
  3. Warming the S3 connection - I believe S3 might need to warm / prime its DNS lookup upon the first request (First DynamoDB request is very slow aws-sdk-java-v2#1340).

The async refresh will help with long-tail latency like @alextwoods mentioned, but for the initial latency cost, we saw that initializing on boot helped as well.
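
For item 2, a rough sketch of the boot-time warm-up, reusing your S3ClientCache singleton (assuming a standard Rails initializer; the file path is just illustrative):

# config/initializers/aws_warmup.rb (illustrative path)
# Instantiating the client at boot pays the credential-resolution cost once,
# before the first user-facing request has to presign a URL.
Rails.application.config.after_initialize do
  S3ClientCache.instance.s3_client
end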

@mullermp
Contributor

One thing I do notice is that you aren't providing any credentials directly, and you're using the provider chain. Is it possible you're fetching credentials with IMDS, and if so, are requests failing?

@midnight-wonderer
Contributor Author

OK, here are some updates:

http_wire_trace

I tried http_wire_trace: true; nothing was printed because no request is sent when presigning.

The actual command
s3_client = ::Aws::S3::Client.new(region: ENV['S3_REGION'], http_wire_trace: true)
test_object = ::Aws::S3::Object.new(ENV['S3_BUCKET'], 'testfile.bin', client: s3_client)
test_object.presigned_url(:get, expires_in: 5.seconds.to_i)

But I also identified that, for the first presigning, the slowness is in ::Aws::S3::Client.new, not actually in the first S3 operation.

irb(main):007:0> puts ::Benchmark.measure { ::Aws::S3::Client.new(region: ENV['S3_REGION']) }
0.001648 0.004603 0.006251 ( 2.011745)

So I can, potentially, warm up the singleton on application start.

IMDS

And when I tried fetching directly from IMDS (from within the container)

curl -v http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance

I got the result instantaneously.

$ curl -v http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance
*   Trying 169.254.169.254:80...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET /latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Accept-Ranges: bytes
< Content-Length: 1410
< Content-Type: text/plain
< Date: Thu, 10 Mar 2022 03:17:56 GMT
< Last-Modified: Thu, 10 Mar 2022 02:32:02 GMT
< Connection: close
< Server: EC2ws
< 
{
  "Code" : "Success",
  "LastUpdated" : "2022-03-10T02:32:21Z",
  "Type" : "AWS-HMAC",
  "AccessKeyId" : "ASIAQ2SOE3UG6RKWHR7V",
  "SecretAccessKey" : [snipped],
  "Token" : "IQoJb3JpZ2luX2VjEHsaDmFwLXNvdXRoZWFzdC0xIkgwRgIhAOa3jLdXXLYpZNL7ZT/QflIfDD32bM6n2Q1XG2zD7uFhAiEA8kmVi1B89DwC1gU7UzGLipZ7iChUMA8RE9cG9jqt+BYq4wMI5P//////////ARACGgwwNTcwNzIxNjYxNTciDD+D9lQJjqKIwNQYbyq3A0oqvgE/oS+TCVp45nq9JADD0z2W+Ie7B/I73fUSGl4+fXXAeke+Hxlyr99f4AN6ms9wOm6Wrp+oIGGbMfzrc5KeI2jqRDWqHjqrw3wrQjKQ03ise9Md1sfSnJKJGEoyFkss6a6bkPaROgQ87M7OtOMF1m0tgvVyc/2slMFgWkoRt3w07jMxAJ1iNDi3bCXqRt5V3XFBGGQCXfGjBabZeSMuMSRB3NTt3K2hH90050ACx907ceI8wPpmIF4I7SU1APomGhurBnfY+WQHgrMEaP1+ZVzUX4SANDNxVNdWD/Nw+yfpArs+0D89GXifcJ78+feIrsq+PTlkJNLxr03SSNoqFzUG+Gl+C38awC3YI64iQjijAjzI987nJiYiZf7QhTZ0WNmgO7Z931AVpjqu8+i6u37XOOsGa+GA74c+8w1rmQFZdtgL1p+mEXaKjUiLdzBqYO6i/2AMmhZox8XGVoXEvJG1jImqeJ3QhGHRw87WzKKoRVGyrR6aUr8OpT1JatG/Cc5f5Gv7lwN8Lx0TdHA604QdTy0/4h8JbPfcMOM3d2uFzIGu2hEfTAaidfVT3OGZZ9jhsK0wosalkQY6igLDtjBu/OvNwJHrbgGFVqXeD1q0cIaABV/MdyIegKsoV/xcedkZ48FG2fF2NFNea/KgjftG7bqo8FI9lx651lnLwcJWFq02qHSedd8mlqZvbGHYb60Lo2ZZ8VFsO5C6RJWBC7G+CF4RRBcoTvFTnebdr1jJmqsioJiscZHx8u/msqqE1oFTVcikBEC8dshjUqkde+jaX1fs9bTiyz2pWWlSb8XsRT3FeebePHCAP5b4Ges3WLbubsKAg48DA7vUQC8ee4eVMNATIKoQsGJ7YPUVtuhhcUyGAFRdFG5ZXWCPDEViQ1Q4iFFK7EEFkVk3hGhtdZNMBv/pAW/EJDlVpD9kCXo7EqrV6jnOQw==",
  "Expiration" : "2022-03-10T08:48:32Z"
* Closing connection 0

@alextwoods
Contributor

Since the slowness for the first presigning is in client creation, the time is likely being taken by the CredentialProviderChain (which is attempting to figure out which type of credentials are available). And http_wire_trace on the client config wouldn't help here, since the issue may be in the credential provider used. It sounds like you're using InstanceProfileCredentials and not STS (assume role) credentials?

Sorry for the back and forth, but can you try:

require 'benchmark'

client_explicit = nil
puts ::Benchmark.measure {
  client_explicit = Aws::S3::Client.new(
    region: ENV['S3_REGION'],
    credentials: Aws::InstanceProfileCredentials.new(http_debug_output: $stdout)
  )
}
puts client_explicit.config.credentials.inspect

client_default = nil
puts ::Benchmark.measure {
  client_default = Aws::S3::Client.new(region: ENV['S3_REGION'])
}
puts client_default.config.credentials.inspect

@alextwoods
Contributor

Also good to see confirmation that the issue is likely not related to setting the hops correctly (based on the curl).

@mullermp
Contributor

mullermp commented Mar 10, 2022

I was suggesting a wire trace on the STS client used for AssumeRoleCredentials. Of course it won't print anything for presigning.

@mullermp
Contributor

It's not clear what credentials are being used here. The issue post says the slowness is with STS, but I don't see any STS-related logging that indicates it's being used.

@midnight-wonderer
Contributor Author

midnight-wonderer commented Mar 10, 2022

Luckily, I am still up today; it is almost 2 AM here.

puts ::Benchmark.measure {
  client_explicit = Aws::S3::Client.new(region: ENV['S3_REGION'], credentials: Aws::InstanceProfileCredentials.new(http_debug_output: $stdout))
}

prints

opening connection to 169.254.169.254:80...
opened                                                                                                                                
<- "PUT /latest/api/token HTTP/1.1\r\nUser-Agent: aws-sdk-ruby3/3.125.5\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: 169.254.169.254\r\nContent-Length: 0\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n"                                              
<- ""                                                                                                                                 
Conn close because of error Net::ReadTimeout, and retry                                                                               
opening connection to 169.254.169.254:80...                                                                                           
opened                                                                                                                                
<- "PUT /latest/api/token HTTP/1.1\r\nUser-Agent: aws-sdk-ruby3/3.125.5\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: 169.254.169.254\r\nContent-Length: 0\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n"                                              
<- ""                                                                                                                                 
Conn close because of error Net::ReadTimeout                                                                                          
Conn close because of error Net::ReadTimeout                                                                                          
opening connection to 169.254.169.254:80...                                                                                           
opened
<- "PUT /latest/api/token HTTP/1.1\r\nUser-Agent: aws-sdk-ruby3/3.125.5\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: 169.254.169.254\r\nContent-Length: 0\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n"
<- ""
Conn close because of error Net::ReadTimeout, and retry
opening connection to 169.254.169.254:80...
opened
<- "PUT /latest/api/token HTTP/1.1\r\nUser-Agent: aws-sdk-ruby3/3.125.5\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: 169.254.169.254\r\nContent-Length: 0\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n"
<- ""
Conn close because of error Net::ReadTimeout
Conn close because of error Net::ReadTimeout
opening connection to 169.254.169.254:80...
opened
<- "GET /latest/meta-data/iam/security-credentials/ HTTP/1.1\r\nUser-Agent: aws-sdk-ruby3/3.125.5\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: 169.254.169.254\r\n\r\n"
-> "HTTP/1.0 200 OK\r\n"
-> "Accept-Ranges: bytes\r\n"
-> "Content-Length: 26\r\n"
-> "Content-Type: text/plain\r\n"
-> "Date: Thu, 10 Mar 2022 18:40:52 GMT\r\n"
-> "Last-Modified: Thu, 10 Mar 2022 18:15:13 GMT\r\n"
-> "Connection: close\r\n"
-> "Server: EC2ws\r\n"
-> "\r\n"
reading 26 bytes...
-> "bucket-example.obj-mgmt"
read 26 bytes
Conn close
opening connection to 169.254.169.254:80...
opened
<- "GET /latest/meta-data/iam/security-credentials/bucket-example.obj-mgmt HTTP/1.0\r\nUser-Agent: aws-sdk-ruby3/3.125.5\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: 169.254.169.254\r\n\r\n"
-> "HTTP/1.0 200 OK\r\n"
-> "Accept-Ranges: bytes\r\n"
-> "Content-Length: 1342\r\n"
-> "Content-Type: text/plain\r\n"
-> "Date: Thu, 10 Mar 2022 18:40:52 GMT\r\n"
-> "Last-Modified: Thu, 10 Mar 2022 18:15:13 GMT\r\n"
-> "Connection: close\r\n"
-> "Server: EC2ws\r\n"
-> "\r\n"
reading 1342 bytes...
-> "{\n  \"Code\" : \"Success\",\n  \"LastUpdated\" : \"2022-03-10T18:15:18Z\",\n  \"Type\" : \"AWS-HMAC\",\n  \"AccessKeyId\" : \"ASIAQ2SOE3UGZYX64QCY\",\n  \"SecretAccessKey\" : \"[snipped]\",\n  \"Token\" : \"IQoJb3JpZ2luX2VjEIr//////////wEaDmFwLXNvdXRoZWFzdC0xIkcwRQIhAMO9i56bSk1MXw42MCEoxDsEtjBdpYWKXCPeNy/fP3X0AiBtOfvDFeTOf2k9HK8lV5F5e4sZaak1dy4X3ZWMrjG3JSqNBAjz//////////8BEAIaDDA1NzA3MjE2NjE1NyIMugxje0LpMCvCgvwGKuED1Mt55KNZ49zK6tXYXKZ9qFKp1wMXruy7aoxspsJijpB+nPkLcWN/XOsjiXDuWI6/BiR2bpdONwqD7sAMi7yE4CZIZOeaMXk9TOk1MH32JmnnxxrZ7uowYU6nWsl9nsL2HCFz7SQdohvDQC7Fov4bICvkOnlr+dxbEIRa+8NEBfmMcTJWQDCBFCH+fYg8ZsscEd3B9d1qmpJIjkdcqdb7EuywD9J6na9vUqX5VcuKAEl6JP3aadIL2XvKBynvhqciLDDWy/dtMqgvStNplbxdi6bSR6Vy7g2dDx8KfpXn2Yjg+N7k1VuRznNhnvVct1iGEbMWGIKxtZAta3ga1/zRAu48z/AgnD1OjYFf6SezbgLKEuqHRWyvL0QrwH2vD6B7FXUsPdJi4HjxujhdkLzq2avyB4qMa1RZRwVtA/MqNGfeeB6KzUOtVprMc+1e24sHg7WKC+f71CDEVYW2XdciFYAvfXR4yYi98BTOmsT3kLwj4r9fjngQGkrvIzIOKXX8rD33DsqC4KDTxOuupOaYI4inKeuQLDVX1Bt07HwN3rZKEy30Q9zLgF3/TyUxkiBnBfr6RIOZh7GrBGG2HK0mtdRPWsaPEWwH80Q4BEEi8RoBVyq4WNVTnOefYhpl7CiyRDCxgKmRBjqlAbO4SNiINEtZ56Kx8jVz6L2Zt8eTbJ792eNHmAha+YV6DLmwMgWdGY3XN7zm83l3qj9DeP0AGWb5n+4Y6UDl1UmKQ9nRfFvmvPAuR7GGQ3nD6w+KX5ToQE2GMWfnzQqxXtXGgWuEaRthJrgJ+vbGUGoYDfP5fguKaTXRCfnWPbVxjpxmh2/pABgwn6cpX0rjC1KLATRwLx1ZNrznwwZjl++7CELaqw==\",\n  \"Expiration\" : \"2022-03-11T00:38:22Z\"\n}"
read 1342 bytes
Conn close
  0.012732   0.005216   0.017948 (  5.031752)

then

puts ::Benchmark.measure {
  client_default = Aws::S3::Client.new(region: ENV['S3_REGION'])
}

prints

  0.005731   0.000248   0.005979 (  2.012086)

then

puts client_default.config.credentials.inspect

prints

#<Aws::InstanceProfileCredentials:0x00007ff951622be8 @retries=0, @endpoint="http://169.254.169.254", @port=80, @http_open_timeout=1, @http_read_timeout=1, @http_debug_output=nil, @backoff=#<Proc:0x00007ff951622a58 /usr/local/bundle/gems/aws-sdk-core-3.125.5/lib/aws-sdk-core/instance_profile_credentials.rb:119 (lambda)>, @token_ttl=21600, @token=nil, @mutex=#<Thread::Mutex:0x00007ff951622a30>, @credentials=#<Aws::Credentials access_key_id="ASIAQ2SOE3UGZYX64QCY">, @expiration=2022-03-11 00:38:22 UTC>

This is why I like Ruby; it makes debugging way easier than its peers.

I don't know for sure whether it is STS; I just presumed so. I did not use STS myself; I just thought it is called internally by the client.
Using S3 from the container without providing any form of credentials is still magical to me.

I hope the output from the first one provides you with insights.

@mullermp
Contributor

The slowness looks to be from read timeouts on InstanceProfileCredentials. Are you using a proxy? Do you need to configure hop limits? https://docs.aws.amazon.com/cli/latest/reference/ec2/modify-instance-metadata-options.html
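
For reference, on a plain EC2 instance that you own, raising the hop limit from the Ruby SDK would look roughly like this (untested sketch using the aws-sdk-ec2 gem; the instance ID is a placeholder, and as discussed further down, this isn't directly possible for Lightsail-managed instances):

require 'aws-sdk-ec2'

ec2 = Aws::EC2::Client.new(region: ENV['S3_REGION'])
ec2.modify_instance_metadata_options(
  instance_id: 'i-0123456789abcdef0',  # placeholder
  http_tokens: 'required',
  http_put_response_hop_limit: 2       # allow one extra hop for Docker's NAT
)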

@midnight-wonderer
Contributor Author

Just a quick update: I only read about the hop-limiting mechanism today; I had been using the IMDS IP endpoint without knowing such a capability exists.
I use Docker, which creates a NAT for its containers, which means there is indeed an extra hop to IMDS.

However, the EC2 instance bundled as a Lightsail instance is not owned by my account; it is under Amazon's. I cannot interact with it via aws ec2, only via aws lightsail, which has no capability to modify metadata options.

To summarize:

  • I am behind NAT
  • I can't configure metadata options

I may be able to help with the situation if I understand a bit more.
Do you know why the SDK gets the required token eventually despite being effectively blocked by the IP hop limit?

@midnight-wonderer
Contributor Author

midnight-wonderer commented Mar 11, 2022

I hacked it just for myself; basically, an iptables rule needs to be added on the Lightsail host.

iptables -t mangle -A PREROUTING -s 169.254.169.254 -d 172.26.0.0/16 -m ttl --ttl-lt 2 -j TTL --ttl-inc 1
# the metadata server address is always `169.254.169.254`, and Lightsail instance addresses are always inside `172.26.0.0/16`

Dangerous rules like this are only possible because the company put so much trust in me.
I wonder if the general public has this luxury.

My previous question still stands. I might be able to come up with a solution that works for the general public.

@mullermp
Contributor

If you're using EC2 (even as part of Lightsail), you should be able to modify the instance metadata from anywhere, not just from the instance you are using. You have the EC2 instance ID, and you have the AWS account that created the instance, right?

@mullermp
Contributor

mullermp commented Mar 11, 2022

Do you know why the SDK gets the required token eventually despite being effectively blocked by the IP hop limit?

Unsure. You are experiencing read timeouts and it eventually succeeds. That makes me less suspicious about hop limits. Either the instance is not available (unlikely) or the packets are lost during routing through the NAT?

Edit: It looks like it's falling back to IMDSv1, so maybe it is hop limits.

@mullermp mullermp added investigating Issue is being investigated and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 11, 2022
@midnight-wonderer
Contributor Author

I am technically able to obtain the instance ID, but the owner of the EC2 instance is Amazon.
Please see the peering connection made by AWS; you'll understand it right away.

Screenshot

To add some context: the Lightsail network can peer with the default VPC. When peered, we can see the account ID of the peered network.
From the screenshot, the owner of the Lightsail network is 057072166157, and my company account ID is 7536....

The account with ID 057072166157 is owned by Amazon. Whenever I have to interact with resources in that account, I have to go through the API provided by Amazon. IAM roles and credentials from my company won't work there except through the Lightsail API.

From further reading, it seems that if IMDSv2 fails, the SDK will fall back to IMDSv1, which does not implement such security measures. I haven't looked into how to implement it yet or what the interface will look like. But how about we add a user-supplied hint as an option somewhere to make the SDK try IMDSv1 first?

From what I have read, IMDSv1 should be supported by Amazon indefinitely.

@mullermp
Contributor

Ah! You're right that it's falling back to IMDSv1 and succeeding. I think the solution here is that you do need to increase the hop limit on the EC2 instance. The SDKs do not allow turning off IMDSv2, and that was deliberate.

Even though the owner of the instance is Amazon, do you really not have access to that instance using an assumed role (provided by Lightsail) with permissions?

Otherwise, I think this is an issue for Lightsail and not for the SDK. Could you maybe try AWS support in the console?

@midnight-wonderer
Contributor Author

midnight-wonderer commented Mar 14, 2022

I don't know how to try an assumed role yet, but I'll be surprised if that works; the whole point of Lightsail is that we don't have to deal with EC2 hassles. It is highly unlikely that AWS would let us directly mess with the underlying EC2 beneath Lightsail. But I'll try to learn more about it.

It is not that I don't want to communicate, but apart from service limit increases, AWS charges customers for any other kind of support ticket, including (obvious) bug reports. (They call it "premium support".)

After the company spends a large amount of money on bandwidth and some other products, they don't feel like spending a percentage on top of it, especially to discuss issues we can solve on our end but that benefit Amazon as a collective.
Thanks to you guys, GitHub is the only place I can have meaningful discussions about the issues I encounter. The other places are mostly one-sided; either we ask and no one answers, or someone answers but ignores further interaction.

Good work, as always.
There is nothing to be done in the SDK then; I'll go forward with the packet mangling.
Thank you for helping me navigate the issue; I would be blind without your insights.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@mullermp
Contributor

I'm poking around in Lightsail to see what we can do. Let me see if I can make some noise internally.

@mullermp mullermp reopened this Mar 14, 2022
@mullermp
Contributor

@midnight-wonderer I have a point of contact on the Lightsail team. They want to know whether you're using your own Docker container or the Lightsail containers feature (which leverages Fargate)?

@midnight-wonderer
Contributor Author

Thank you for the follow-up.
We are running our own Docker cluster; we set up a Docker Swarm on basic Lightsail instances.

@mullermp
Contributor

mullermp commented Mar 15, 2022

I've talked to some members of the Lightsail team and also poked around a bit. As background, per my understanding, Lightsail EC2 instances (like all EC2 instances) have a linked IAM role (AmazonLightsailInstanceRole/instance-id). You can view it by calling aws sts get-caller-identity from an EC2 instance. The credentials returned by IMDS (instance metadata service) can be fetched with curl http://169.254.169.254/latest/meta-data/iam/security-credentials/AmazonLightsailInstanceRole, and the SDK tries to fetch and use these credentials. The issue is that the SDK cannot fetch these using IMDSv2 in a container unless you modify your hop limit. I tried convincing the Lightsail team to grant blanket permissions to this role for EC2's modify-metadata operation, but they declined. So that leaves us with a few options.
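
(In Ruby, the equivalent of that identity check is roughly the following - an untested sketch:)

require 'aws-sdk-core'

# Should print an ARN referencing the linked AmazonLightsailInstanceRole.
puts Aws::STS::Client.new(region: ENV['S3_REGION']).get_caller_identity.arn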

Option 1) Lightsail will modify the metadata on your behalf if you can provide a customer ID, region, instance ID, and the value of the hop limit (it depends on your container routing). The downside of this option is that it's manual, and new instances will not have your hop limit. If you want to do this, I can give you my email or some other way to securely provide me that information.

Option 2, preferred IMO) Migrate your containers to EC2 instances without Lightsail. You can export snapshots but you will have to wire up storage and other expected features. The Lightsail engineer said that your use case may be more complex than Lightsail wants to offer. If you are running your own containers, you may as well just run them on vanilla EC2 instances, and then you have full control over them. You can then use the AWS CLI to modify your instance metadata.

Option 3) No architecture changes; try to fail fast and fall back to V1 credential fetching. It does seem like you expect your credentials from IMDS on the EC2 host. You can initialize an instance of Aws::InstanceProfileCredentials with 0 retries and a smaller http_open_timeout, and pass it to your S3 client: Aws::S3::Client.new(credentials: Aws::InstanceProfileCredentials.new(..)). The major benefit is that this is "quick and dirty", though it is not necessarily a solution to the underlying problem. A rough sketch follows below.
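
Rough sketch of Option 3 (untested; the timeout values are illustrative, and as noted in the follow-up comment below, http_read_timeout may matter as much as http_open_timeout here):

credentials = Aws::InstanceProfileCredentials.new(
  retries: 0,             # don't retry the IMDSv2 token request
  http_open_timeout: 0.5, # illustrative value
  http_read_timeout: 0.5  # illustrative value
)
s3_client = Aws::S3::Client.new(region: ENV['S3_REGION'], credentials: credentials)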

@mullermp mullermp changed the title AWS STS credential request is slow Lightsail credentials fetching from IMDS is slow Mar 15, 2022
@mullermp mullermp added service-api General API label for AWS Services. and removed investigating Issue is being investigated labels Mar 15, 2022
@midnight-wonderer
Contributor Author

Thanks for the offer of Lightsail reconfiguration; I really do appreciate it. Still, I prefer regular instances over special treatment, so that people who maintain the servers after me will have an easier time figuring things out.

Some of our services are running in a regular VPC; we still need IPsec peering and such. However, we are heading in the other direction; we think EC2 is overly complicated for our workload and have started moving over to Lightsail. Lightsail is actually doing a better job than our self-managed VPC setup in some respects. Weighing the cost-benefit, I don't think that is the way to go for us.

Option 3 seems to be the way to go for most people. If you came to this thread from Google, that is probably what you are looking for. I also want to add that you probably need to configure http_read_timeout instead of http_open_timeout, because the packet with the low hop limit is the one containing the secret, not the HTTP headers and such.

Otherwise, TTL mangling is relatively safe too. The reason is that the condition -s 169.254.169.254 -d 172.26.0.0/16 ensures that at least one additional node is required to form a loop, and that node will be the one that decreases the TTL value. --ttl-inc 1 is unlikely to flood the network, but you may need to double-check your setup if you want more hops.

I understand every party, including the Lightsail team, who decided against granting more EC2 permissions.
Kudos to the Ruby SDK team for sorting things out.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
