aws-node should untaint node #2808

Open
runningman84 opened this issue Feb 23, 2024 · 11 comments

@runningman84

What would you like to be added:
In our EKS Bottlerocket use case we see Karpenter provisioning nodes which get the aws-node pod and some application pods upon start. Unfortunately, the aws-node pod takes several dozen seconds to become ready. The application pods try to start in the meantime and fail with errors because they do not get an IP address.

Should we use Karpenter to taint the nodes until aws-node is ready? Is the CNI plugin able to remove a startup taint once it is ready? (See the sketch below.)

Why is this needed:
Pods constantly fail during startup due to missing IP addresses because the aws-node pod is not ready.
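
To make the ask concrete, something like the following Karpenter NodePool snippet is what I have in mind (a minimal sketch showing only the relevant field, assuming the v1beta1 NodePool API; the taint key is made up, and today nothing would remove it, which is exactly the question):

```yaml
# Sketch only: shows just the startupTaints field of a Karpenter v1beta1
# NodePool. The taint key is hypothetical; something would still have to
# remove this taint once aws-node is ready.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      startupTaints:
        - key: example.com/cni-not-ready   # hypothetical key
          effect: NoSchedule
```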

@jdn5126
Contributor

jdn5126 commented Feb 23, 2024

@runningman84 the node should not be marked as "Ready" until the aws-node pod copies the CNI config to /etc/cni/net.d/, which it does after it finishes initialization. So the scheduler should not schedule application pods on a "Not Ready" node (unless those pods tolerate the node being not ready for some reason).
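
For example, a blanket toleration like the one below would let a pod land on a node that still carries the node.kubernetes.io/not-ready or node.kubernetes.io/network-unavailable taints (a generic sketch, not taken from any specific workload; the pod name and image are placeholders):

```yaml
# Sketch only: a pod with a broad toleration like this can be scheduled
# onto nodes that are still tainted node.kubernetes.io/not-ready or
# node.kubernetes.io/network-unavailable.
apiVersion: v1
kind: Pod
metadata:
  name: example-app   # hypothetical
spec:
  tolerations:
    - operator: Exists   # tolerates every taint, including not-ready
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest
      command: ["sleep", "3600"]
```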

@jdn5126
Contributor

jdn5126 commented Feb 23, 2024

The VPC CNI plugin is not able to modify node taints, as that would be a sizable security risk.
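
For context, removing a taint means patching the Node object, so every aws-node pod on every node would need cluster-wide RBAC along these lines (an illustrative sketch only, not the actual aws-node ClusterRole):

```yaml
# Illustrative sketch only: the kind of ClusterRole that taint removal
# would require. This is not the shipped aws-node ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cni-taint-remover   # hypothetical
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch", "update"]
```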

@runningman84
Author

Okay, what could be the reason for us seeing this behaviour?

@jdn5126
Contributor

jdn5126 commented Feb 23, 2024

> Okay, what could be the reason for us seeing this behaviour?

Do you see the nodes as "Not Ready" during this time window? Do these application pods have their own tolerations?

@runningman84
Author

I have double-checked that...

t = point in time... there can be seconds or even minutes between two numbers

t0 node appears as not ready
t1 daemon set pods are scheduled to it
t2 aws-node pod is in initializing
t2 kube-proxy pod is in initializing
t3 kube-proxy is ready
t4 node becomes ready (aws-node still not in running state)
t5 additional pods are scheduled to it
t6 all additional pods stay in state ContainerCreating (warnings due to "failed to assign IP address to container")
t7 aws-node pod is ready
t8 new pods get ready within seconds
t9 the additional pods from t6 start to get ready

It looks like the node becomes ready too fast, without waiting for the aws-node pod...

@jdn5126
Contributor

jdn5126 commented Feb 26, 2024

@runningman84 can you please share the node logs during this timeline? Mainly we would need to look at the CNI and IPAMD logs in /var/log/aws-routed-eni/. You can email them to k8s-awscni-triage@amazon.com and we can take a look.

@runningman84
Author

runningman84 commented Feb 27, 2024

I just sent the logs and we also have an open case id: 170893944201879

@jdn5126
Contributor

jdn5126 commented Feb 27, 2024

Thanks @runningman84, let's work through the support case, as the support team will triage and then bring in the service team if needed.

@mathieuherbert

Hi, any news on this one? We have the same issue.

@runningman84
Author

The AWS support case did not really solve that; we got the suggestion to try prefix delegation mode or similar to speed up IP allocation. The general question is: should a node be unready until aws-node is up and running?
I could imagine that that behaviour would also have downsides, because images would not be pulled before the node becomes ready. The current situation is that pods might not start due to the IP address issue, but at least all images are already pulled by the time they eventually start fine...
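
For reference, the suggestion boils down to setting an environment variable on the aws-node DaemonSet, roughly like this (a sketch showing only the relevant container env; ENABLE_PREFIX_DELEGATION and WARM_PREFIX_TARGET are the documented VPC CNI settings):

```yaml
# Sketch: only the relevant env entries of the aws-node container in the
# kube-system/aws-node DaemonSet are shown.
env:
  - name: ENABLE_PREFIX_DELEGATION
    value: "true"
  - name: WARM_PREFIX_TARGET
    value: "1"
```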

@tooptoop4

tooptoop4 commented May 18, 2024

I am facing the same issue on EKS 1.28.

kubelet shows Ready status on the node
pods start being scheduled onto that node
a few seconds later the node goes into NetworkNotReady state
the pods above get stuck forever
a few seconds later kubelet switches back to Ready state and new pods work, but the earlier ones don't

Seems I will have to make my own workaround by monitoring new nodes myself and assigning a label aged=y after they have been there for a minute.

Then make all my pods have a nodeAffinity looking for that label (see the sketch below).

Ideally the AWS pods would add such a label to the node themselves.
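
Something like this is what I mean (a sketch only; the aged=y label, pod name, and image are made up, and some external process would still have to add the label after a minute):

```yaml
# Sketch only: the label key/value (aged=y) and pod name are placeholders
# matching the workaround described above.
apiVersion: v1
kind: Pod
metadata:
  name: example-app   # hypothetical
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: aged
                operator: In
                values: ["y"]
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest
      command: ["sleep", "3600"]
```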

Any ideas @jdn5126?
