Slow startup of AL2023 AMI #1751
Comments
@cartermckinnon suggested it might have something to do with EBS lazy block fetching resulting in slow startup for the executables we need: #1696 (comment)
One way we can speed things up is to make sure that kubelet and containerd are loaded earlier in the chain.
We can test the EBS hypothesis by enabling fast snapshot restore on the snapshot: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-fast-snapshot-restore.html
This is pretty expensive, so I understand if you don't want to give it a try; but if you can update the PR description with your launch steps, we can have consistency across our experiments @stijndehaes
@cartermckinnon I am willing to give it a try.
Here is the new result, with fast restore on the snapshot enabled.
@stijndehaes from the output of the initial log, it seems like amazon-eks-ami/nodeadm/internal/containerd/sandbox.go, lines 20 to 32 in f55411c
After trying the AL2023 AMI I noticed startup is way slower compared to the AL2 image.
After some help from @ndbaker1 I made the following analysis using `systemd-analyze`.

I added a plot using `systemd-analyze`; the nodeadm component appears to be in the hot path. `nodeadm-config.service` takes 20s before it allows `cloud-init.service` (7.2s) to start, and then `nodeadm-run.service` takes 10.53s, though I think it is waiting for containerd-service (8.7s) to start.

Looking at the logs of nodeadm-config, it appears to take a long time to start before the first log (5s), but it also takes 13s between starting to configure kubelet and getting the kubelet version.
Looking at the logs of nodeadm-run, most of the time appears to be spent looking up the sandbox image. I noticed in the code that this is done using `containerd config dump` and a regex; for some reason this takes a long time.

Not sure what can be done to optimize these steps, but I am willing to help :) I can write golang code and make custom AMIs with a custom `nodeadm` if needed to test things out.

nodeadm-config logs, via `journalctl -u nodeadm-config`:

nodeadm-run logs, via `journalctl -u nodeadm-run`:

Originally posted by @stijndehaes in #1696 (comment)
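To see where the time in the sandbox image lookup goes, the two halves (running `containerd config dump` and matching the regex) can be timed separately. The Go sketch below is not the nodeadm implementation; the regex and the function name are assumptions chosen for illustration. If the command dominates, that would be consistent with the containerd binary itself being slow to load, not with the regex being slow.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
	"time"
)

// sandboxImageRe matches a line like: sandbox_image = "registry/pause:tag".
// This pattern is an assumption; nodeadm's own pattern may differ.
var sandboxImageRe = regexp.MustCompile(`sandbox_image\s*=\s*"([^"]*)"`)

// findSandboxImage extracts the sandbox image reference from the text
// produced by `containerd config dump`.
func findSandboxImage(dump []byte) (string, error) {
	m := sandboxImageRe.FindSubmatch(dump)
	if m == nil {
		return "", fmt.Errorf("sandbox_image not found in config dump")
	}
	return string(m[1]), nil
}

func main() {
	// Time the external command on its own.
	start := time.Now()
	dump, err := exec.Command("containerd", "config", "dump").Output()
	execTook := time.Since(start)
	if err != nil {
		fmt.Fprintln(os.Stderr, "containerd config dump:", err)
		os.Exit(1)
	}

	// Time the regex match on its own.
	start = time.Now()
	image, err := findSandboxImage(dump)
	parseTook := time.Since(start)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	fmt.Printf("sandbox image: %s\n", image)
	fmt.Printf("config dump took %s, regex took %s\n", execTook, parseTook)
}
```

Running it twice in a row on a fresh node should also show whether the second invocation of `containerd` is much faster than the first.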