System reboot pulls new cilium/cilium:latest image, resulting in crashloopbackoff with wireguard interface #12063
Comments
In general, this issue is not limited to reboots: the same thing can happen when you add new nodes to the Kubernetes cluster, leaving you with different image versions for the same service. I only wonder: if you used the Helm chart to install Cilium, you should already be locked to a release.
@Combustible you should really avoid using latest. Re wireguard: do you use the wireguard tunnel for communication among the k8s nodes?
This is what I'm wondering too :) - I am almost certain I never specified latest myself. I know the recommendation is not to do this, for exactly the breakage reasons I experienced here. I assumed that the helm chart would set me to a reasonable docker image (i.e. using 1.7.0 would point to v1.7). If the image pull policy was set to IfNotPresent, there would never have been any updating, even within the same major version - it'd have just re-used the locally present image. I'd have noticed the problem with the 1.7 docker image/wireguard when I went to add a new node, but that would have been much less disruptive.
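The pinning described above can be sketched at the DaemonSet level. This is a minimal illustration only; the tag v1.7.4 and the container name cilium-agent are assumptions for the example, not values taken from this issue:

```yaml
# Sketch: pin the Cilium agent to a specific tag instead of :latest,
# and reuse the locally cached image instead of re-pulling on every start.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cilium
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cilium-agent              # container name is illustrative
          image: cilium/cilium:v1.7.4     # a pinned tag, never :latest
          imagePullPolicy: IfNotPresent   # a node reboot won't pull a new image
```

With this combination, a rebooting node keeps running the image it already has, and an upgrade only happens when the tag in the spec is deliberately changed.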
Yes, I use wireguard for all of my inter-node communication between k8s nodes. My setup is somewhat simple - I have two nodes - a master and an auxiliary, both baremetal servers hosted on the public internet, with a public link between them. The auxiliary has no externally accessible ports - all of the IPTables rules allow only connections from the wireguard interface or the 10.x.x.x traffic that kubernetes tunnels over it. All external traffic goes to the master node.

This took some fiddling to get set up but seems to work very nicely, and performance is adequate for my needs. I have a lot of NFS mounts between the two servers, which is why I'm using wireguard to protect everything instead of cilium's built-in encryption scheme. I'm sure this is entirely overkill, but I appreciate having my internal traffic between nodes be integrity protected / encrypted.

I'm running a ton of stuff, the vast majority on the master node, and primarily using the auxiliary node for overflow when the master server doesn't have enough RAM. I use 172.30.200.x for my wireguard subnet, and 10.x.x.x for the kubernetes CIDR.
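A setup like the one described above could look roughly like the following wg-quick config on the master node. This is a hypothetical sketch: the addresses match the subnets mentioned in the comment, but the hostname, port, and keys are placeholders, not values from this issue:

```ini
# Hypothetical /etc/wireguard/wg0.conf on the master node
[Interface]
Address = 172.30.200.1/24          # master's address on the wireguard subnet
ListenPort = 51820
PrivateKey = <master-private-key>  # placeholder

[Peer]
# the auxiliary node
PublicKey = <aux-public-key>       # placeholder
# route the peer's wireguard address and the kubernetes CIDR over the tunnel
AllowedIPs = 172.30.200.2/32, 10.0.0.0/8
Endpoint = aux.example.org:51820   # placeholder hostname
PersistentKeepalive = 25
```

The key point for Cilium is that the kubernetes pod CIDR is included in AllowedIPs, so all inter-node pod traffic flows through wg0 rather than the public interface.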
Closing in favor of #12317.
Bug report
Summary
One of my kubernetes nodes rebooted unexpectedly - which has in the past been fine/recoverable. Today though, it pulled a new cilium image (cilium/cilium:latest) because the ImagePullPolicy is Always instead of IfNotPresent. This new image resulted in a crash loop for cilium, breaking my cluster until I figured out that the image had changed, tagged the prior image, and edited the daemonset to force it to be used.
There are three problems here:
General Information
Log from cilium pod as it tries to start up and fails:
https://gist.github.com/Combustible/2868225d86b3080d460478745c718b49
Relevant snippet:
Actual output from ip link show wg0: