
error occurred when starting amazon-ssm-agent: Failed to fetch region. Data from vault is empty #48

Closed
ratidzidziguri opened this issue Apr 28, 2017 · 12 comments

Comments

@ratidzidziguri

ratidzidziguri commented Apr 28, 2017

Recently I launched a new EC2 instance (Windows Server 2016) that has trouble starting the SSM agent. Whenever I try to start the SSM service it fails. I looked inside the logs and see the following errors:

2017-04-28 20:24:14 ERROR [Execute @ agent_windows.go.169] Failed to start agent. Failed to fetch region. Data from vault is empty. Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2017-04-28 20:24:51 ERROR [NewCoreManager @ coremanager.go.63] error fetching the region, Failed to fetch region. Data from vault is empty. Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2017-04-28 20:24:51 ERROR [start @ agent.go.61] error occured when starting core manager: Failed to fetch region. Data from vault is empty. Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2017-04-28 20:24:51 ERROR [Execute @ agent_windows.go.169] Failed to start agent. Failed to fetch region. Data from vault is empty. Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

(the same NewCoreManager / start / Execute error sequence repeats at 20:25:28 and 20:26:05)

Instance information is not displayed on the desktop either.

I tried to reinstall the agent but got the same error, so there might be something wrong with it.
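For anyone debugging this, the failing call can be reproduced outside the agent. Below is a minimal sketch (the `fetch_region` helper is made up for illustration, not part of the agent) that requests the same instance-identity document shown in the log, with a short timeout:

```python
# Minimal reproduction of the request the agent times out on. Run on the
# affected instance: a healthy instance prints its region; a broken route to
# 169.254.169.254 raises a timeout, matching the agent's error above.
import json
import urllib.request

IDENTITY_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"

def fetch_region(timeout: float = 2.0) -> str:
    # The instance-identity document is JSON and includes a "region" field.
    with urllib.request.urlopen(IDENTITY_URL, timeout=timeout) as resp:
        return json.load(resp)["region"]

if __name__ == "__main__":
    print(fetch_region())
```

If this times out while `netstat -rn` shows a route for 169.254.169.254, the route's gateway is the next thing to check.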

@liath

liath commented May 4, 2017

I'm struggling with this too. The same AMI in a different VPC doesn't seem to be affected. I haven't the fuzziest idea why, as everything else in both VPCs is fine.

@ratidzidziguri
Author

What I found initially was that the image I created was in one VPC, but I started the new machine from that image in a different VPC.

@liath

liath commented May 5, 2017 via email

@nelsestu

No question about it, this is a bug with the SSM agent on Windows. I have a system that relies on user data for configuration details, and the application is unable to start without this config. When I create an AMI of an instance in, say, AZ us-west-2a, and attempt to use it in an Auto Scaling group where us-west-2b and us-west-2c are also possible AZs, only instances that scale out in us-west-2a will work. AZs b and c fail with the errors that started this thread. Splunking through the EC2Launch logs, I can see it binding 169.254.169.254 to the gateway of the subnet associated with AZ a. The binding reports success, but since that gateway doesn't exist in the other AZs' subnets, the address is never reachable.

@mmendonca3
Contributor

Thank you for posting here. Our team is investigating this issue and will provide you with a fix or ETA.

@liath

liath commented May 30, 2017

Just as a follow-up to my resolution above: we now run InitializeInstance.ps1 -Schedule before building images, which resolves the network issues that prevented us from talking to 169.254.169.254.

Another part of the puzzle for us was that, because we use autologon to get a user session on these instances, InitializeInstance.ps1 fails at line 125: Restart-Computer needs the -Force flag in order to work while a user session exists. Without the reboot, the instance is identical to the imaged instance and has the same route table, which was causing the network issue above. Adding the -Force flag in InitializeInstance.ps1 fixes the rest of our problems.

@yogeshdengle

yogeshdengle commented Jul 11, 2017

That's basically what happened with me. We built an image in our dev VPC and promoted it to our staging VPC, where its route table no longer made sense (dev is 10.4.x.x and staging is 10.40.x.x). Changing the route to the right subnet fixed everything.

I feel like this used to work on the pre-Server 2016 base AMIs. Perhaps this is part of the new EC2Launch configuration?

@liath Can you elaborate on what kind of route-table changes you needed to make? TIA

@liath

liath commented Jul 14, 2017

@yogeshdengle Copying the routing table from a working instance in the same VPC should fix things, but as I said in my last comment, running InitializeInstance.ps1 -Schedule before imaging an instance that will be moved between VPCs resolves the route-table issues.

@lasitha-petthawadu

I got the same issue and was able to solve it by checking the routing table within Windows using:

netstat -rn

I noticed that the gateway address in the persistent routes was incorrect and did not match the subnet in use.

So I updated the persistent route entry by executing the following command:

route -p add 169.254.169.254 mask 255.255.255.255 <correct gateway IP> metric 25 if 2

Finally, with the route-table change in place, I was able to access 169.254.169.254 via a web browser.
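If you need the `<correct gateway IP>` for that command, it can be derived from the subnet's CIDR: AWS reserves the first usable host address in every VPC subnet for its router. A small sketch (the helper name is made up for illustration):

```python
# Compute the VPC router ("gateway") address for a subnet, per the AWS
# convention that the first host address in each subnet is reserved for it.
import ipaddress

def aws_default_gateway(cidr: str) -> str:
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net.network_address + 1)

# e.g. an instance in 10.0.12.0/24 should route 169.254.169.254 via 10.0.12.1
print(aws_default_gateway("10.0.12.0/24"))  # -> 10.0.12.1
```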

@NJITman

NJITman commented Dec 5, 2017

UPDATE - After doing some tests and viewing the logs on the instance, we can see that InitializeInstance.ps1 does run on boot, regardless of whether the instance comes from an Amazon AMI or one that you created.

InitializeInstance.ps1 already does all of the work in Correct-Routes.ps1, so there is no need to run both.

What is strange is that you can see the routes being updated to the wrong gateway in the log: when you access the instance for the first time (via RDP) and run netstat -rn, the routes point to the gateway for the subnet that the AMI was baked from (the source instance). Both InitializeInstance.ps1 (which ran first) and Correct-Routes.ps1 (which ran second) added the wrong routes to the instance's routing table.

Just some history:

  1. We created an AMI with all of the settings and applications needed for our web application. It was created in AZ #1, in subnet #1.
  2. We created a launch configuration using that AMI and launched 2 instances from an Auto Scaling group in AZ #2 (subnet #2) and AZ #3 (subnet #3).
  3. Our app gets the instance data from the AWS SDK and displays the unique host part of the IP (last 8 digits in decimal form) in the footer so that we can see that the app is truly running in multiple AZs via the ELB.
  4. The source instance would show the IP.
  5. The launched instances from the AMI would not (SDK was returning null).
  6. Upon investigation, we found that SSM was hung and that the routes for the 3 default IPs were pointing to the gateway for subnet #1 (the source), instead of subnet #2 or #3.
  7. Once the routes were updated to the proper gateway, SSM would restart and run. SDK would then properly show the IP address of the instance.

For example, the subnet of the source instance (AMI) is 10.0.20.0, but the subnet of the launched instance is 10.0.12.0. Here are the log entries:
2017/12/05 03:46:23Z: Successfully added the Route: 169.254.169.254/32, gateway: 10.0.20.1, NIC index: 3, Metric: 25
2017/12/05 03:46:23Z: Successfully added the Route: 169.254.169.250/32, gateway: 10.0.20.1, NIC index: 3, Metric: 25
2017/12/05 03:46:23Z: Successfully added the Route: 169.254.169.251/32, gateway: 10.0.20.1, NIC index: 3, Metric: 25

And here are the log entries after connecting to the instance via RDP and manually running Correct-Routes.ps1:
2017/12/06 00:05:53Z: Successfully added the Route: 169.254.169.254/32, gateway: 10.0.12.1, NIC index: 3, Metric: 25
2017/12/06 00:05:53Z: Successfully added the Route: 169.254.169.250/32, gateway: 10.0.12.1, NIC index: 3, Metric: 25
2017/12/06 00:05:53Z: Successfully added the Route: 169.254.169.251/32, gateway: 10.0.12.1, NIC index: 3, Metric: 25

Will continue testing and figuring this out and report back.
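The mismatch above can be spotted mechanically: any route in the EC2Launch log whose gateway falls outside the instance's own subnet is stale. A rough sketch (log format taken from the entries above; the `stale_routes` function is hypothetical):

```python
# Flag EC2Launch "Successfully added the Route" entries whose gateway does not
# belong to the instance's subnet - i.e. routes inherited from the source AMI.
import ipaddress
import re

ROUTE_RE = re.compile(r"Route: (\S+), gateway: (\S+),")

def stale_routes(log_lines, instance_subnet):
    net = ipaddress.ip_network(instance_subnet)
    bad = []
    for line in log_lines:
        m = ROUTE_RE.search(line)
        if m and ipaddress.ip_address(m.group(2)) not in net:
            bad.append((m.group(1), m.group(2)))
    return bad

log = [
    "2017/12/05 03:46:23Z: Successfully added the Route: 169.254.169.254/32, gateway: 10.0.20.1, NIC index: 3, Metric: 25",
]
# Instance actually lives in 10.0.12.0/24, so the 10.0.20.1 gateway is stale:
print(stale_routes(log, "10.0.12.0/24"))  # -> [('169.254.169.254/32', '10.0.20.1')]
```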

@mmendonca3
Contributor

The behavior described by NJITman is expected behavior with the current EC2Launch.

EC2Launch is executed at first launch and will not be executed on subsequent instance starts unless you explicitly schedule it. This means the explicit routes to the metadata and KMS servers are not updated between instance stop/start.

If you create an image from an instance without re-scheduling EC2Launch, EC2Launch will not be executed on instances launched from that image. This means those instances may not have correct routes to the metadata or KMS servers.

To prevent this, you should sysprep the instance, or re-schedule EC2Launch to execute at next launch by running '.\InitializeInstance.ps1 -Schedule' on the instance before creating an image.

See our public document for more information about EC2Launch:
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2launch.html#ec2launch-inittasks
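In script form, the pre-imaging step looks like this (script path per the EC2Launch documentation linked above; run from an elevated PowerShell prompt, and verify the path on your AMI):

```shell
# Re-arm EC2Launch before creating the AMI so new instances recompute their
# routes to 169.254.169.254 on first boot.
cd C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts

# Either schedule initialization for the next boot...
.\InitializeInstance.ps1 -Schedule

# ...or sysprep the instance (also re-runs the launch tasks at next boot):
.\SysprepInstance.ps1
```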

@mmendonca3
Contributor

Please reopen this issue if you have any further questions.


7 participants