Calico deletes routes when using CNI v1.7.0 #1159
Thank you for reporting the issue @kgtw. Is it possible to run the CNI log collection tool and send us the files? I'm wondering if there was a delete request from kubelet for this pod, or whether two pods (running and terminating?) got the same IP and deleting one of them caused the IP to be cleaned up. Logs from ipamd and the plugin will help us dig deeper here.
Hi @SaranBalaji90, I've re-applied the manifests and manually deployed a test pod. Unfortunately, we cannot provide the entire debug information from that script due to company policies, but here are the relevant logs for the period when the debug pod was created.
plugin.log
ipamd.log
kubelet
Docker Version
You can see from the kubelet log that the daemonsets which had pods scheduled on this new node are also in a failing state. Let me know if you need anything further.
Hi @kgtw, we set up some new Ubuntu clusters and tested with v1.7.0 just now. Using the same config as you pasted above, pods came up fine, so we could not reproduce the issue. The ipamd logs above also do not show any errors, and if you run It would help us a lot to debug this if you could provide us with the complete logs from the If you apply the v1.6.3 config to the node, does it recover?
Hi @mogren, thanks for the speedy reply :) Re-applying the 1.6.3 manifests does appear to recover the cluster, although I did terminate most of the nodes to speed it up. I have attached the debug information in a support ticket, ID Logs were taken from a newly created node, on which, once it registered as ready within k8s, I manually deployed a debug pod named This pod has the IP address
Hope this helps.
@kgtw I'm guessing you upgraded by applying the YAML file, right?
@kgtw Thanks for providing the logs. It seems like the routes get added correctly, but later they get removed by Calico. We are still investigating why this happens.
@SaranBalaji90 - Yes, although we have copied them so we can add additional labels/tolerations etc. I have double-checked what we have and all looks correct. @mogren Yes, I am seeing this as well now. Adding the debug pod
ipamd.log
plugin.log
calico-node
We can see that calico-node is removing the route because it considers it unexpected for some reason, along with the routes for all other pods on the node. I did notice that we are running a newer version of Calico One thing to note: while we do have Calico running in the cluster, we don't have any network policies applied for this cluster. To summarise:
Reverting the CNI to
Update: I just noticed that Calico wasn't printing meaningful debug information. With debug on, I noticed these two lines from Felix which stood out for a new test pod. It might be nothing though...
Afterwards we start seeing the log entries about unexpected routes.
@kgtw Thanks for reporting and debugging this issue. Can we update the title of this ticket to something like
Thanks for the speedy fix @jayanthvn - I'll let you know if we still have issues with this once a new release has been made.
Yes please. Thanks :)
PR has all the details, but in short, Calico assumes that the ENI name is generated by using this exact formula:
@mogren - Yes, this did fix the issue. Thanks!
@kgtw - Thanks for verifying this :)
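To make the failure mode discussed in this thread more concrete, here is a toy model of a reconciler that deletes routes on interfaces it does not expect. This is not Felix's code, and it does not reproduce the naming formula mentioned above: `reconcile`, `expected_ifaces`, and the interface name `eni0123456789a` are hypothetical, used only to illustrate how a mismatch between the interface name one component assumes and the name another component actually creates could get a valid pod route flagged as unexpected.

```python
def reconcile(routes, expected_ifaces):
    """Split routes into (kept, removed) the way a simple reconciler might.

    routes: list of (destination, interface) tuples observed on the host.
    expected_ifaces: set of interface names the reconciler believes are valid.
    """
    kept, removed = [], []
    for dest, iface in routes:
        if iface in expected_ifaces:
            kept.append((dest, iface))
        else:
            # The route itself may be fine; only the interface name is unknown.
            removed.append((dest, iface))
    return kept, removed


# The route and interface name reported in this issue.
observed = [("172.30.164.149", "eni7c64954e155")]

# Hypothetical: the reconciler derived a different name for the same pod.
kept, removed = reconcile(observed, expected_ifaces={"eni0123456789a"})
```

In this model the pod route ends up in `removed` even though it was set up correctly, which matches the shape of the symptom reported here: routes are added and then deleted shortly afterwards.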
We recently applied the 1.7.0 manifests in one of our clusters, and we are seeing new pods get interfaces created with no IP address associated with them on the host.
Logs from the AWS CNI indicate it has created the new interface eni7c64954e155 with IP address 172.30.164.149/32; however, when inspecting the routing table and interface we can see that it has not been set up correctly.
AWS CNI Config:
Kubernetes: 1.17.9
AWS CNI: 1.7.0
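One way to confirm the symptom described above is to look for a host route to the pod IP in the output of `ip route show` on the node. The sketch below is a minimal example under stated assumptions: the function name `find_host_route` and the sample routing table are hypothetical (only the interface name and pod IP come from this report), and it parses text you pass in rather than running any command itself.

```python
import ipaddress


def find_host_route(route_output, pod_ip):
    """Return the device of the host route to pod_ip, or None if absent.

    route_output: text in the format printed by `ip route show`.
    """
    target = ipaddress.ip_address(pod_ip)
    for line in route_output.splitlines():
        fields = line.split()
        if not fields or "dev" not in fields:
            continue
        try:
            network = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # e.g. "default" or a route-type keyword
        is_host_route = network.prefixlen == network.max_prefixlen
        if is_host_route and network.network_address == target:
            idx = fields.index("dev")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


# Hypothetical routing table; only the pod route line mirrors this report.
sample = (
    "default via 172.30.160.1 dev eth0\n"
    "172.30.164.149 dev eni7c64954e155 scope link\n"
    "172.30.160.0/19 dev eth0 proto kernel scope link src 172.30.161.10\n"
)
device = find_host_route(sample, "172.30.164.149")
```

On an affected node, passing the real `ip route show` output would be expected to return `None` for the pod IP, since the route is missing.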