Apiserver stopped working (no changes made) #2667
Comments
@osirisguitar thanks for reporting. Kubelite wraps most, if not all, of the kube control plane components; that's why the log you see for the apiserver says it is not starting. Looking at the logs, it looks like dqlite isn't starting or is unable to find the leader.
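For later readers, a hedged sketch of how to pull those logs on a kubelite-based node (the service name below is the standard MicroK8s snap unit, assumed here rather than quoted from this thread):

```shell
# Kubelite bundles the apiserver, controller-manager, scheduler, proxy and kubelet,
# so control-plane errors (including dqlite/leader problems) show up in its journal
# instead of a separate apiserver unit.
sudo journalctl -u snap.microk8s.daemon-kubelite -n 200 --no-pager

# The same logs via snapd
sudo snap logs microk8s.daemon-kubelite -n 100
```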
Could you share the output of …? It is strange that both clusters stopped working at the same time. Can you think of anything that might have changed?
That no one made any changes to the machines is what stresses me out the most... The only thing I know is that they have been automatically migrated between Hyper-V hosts in a cluster, but that's supposed to be completely undetectable for the VMs. The only thing I've found is that a VM could get a new MAC address in a migration (I don't know if that has happened with these machines...). And: I really appreciate the help! From the single-node cluster (the same one the inspect above is from):
Empty
From the multi-node cluster
Was this node shut down for a while and then started again? Can you provide the contents of …?
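For reference, the dqlite state being asked about lives in the snap's data directory; a sketch of how to dump it, assuming the usual MicroK8s dqlite backend path rather than anything quoted in this thread:

```shell
# dqlite's node and cluster metadata live next to the database files
ls -l /var/snap/microk8s/current/var/kubernetes/backend/

# The small YAML files describe this node's identity and the cluster membership
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/info.yaml
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/localnode.yaml
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
```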
Yes, node 1 in the multi-node cluster has been turned off for a while. I had actually forgotten about that. I'll fix cluster.yaml on the single node and check the files on the other nodes in the multi-node cluster ASAP.
Single-node cluster seems to be back up, thank you so much for the help! microk8s.status now mostly says microk8s is running (not every time), the API is responding on port 16443 and the pods seem to be starting. My big question though is how and why this happened... and how can I prevent it from happening again?

Multi-node cluster:

Node 2: cluster.yaml:
localnode.yaml:
Node 3: Doesn't have a /var/snap/microk8s/current directory and says microk8s isn't installed. What the h.. happened here? It does have directories 2407, 2487 and common in /var/snap/microk8s... This could probably explain why there's no leader for dqlite...
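The attached file contents were lost from this thread; for orientation only, a healthy set of these files looks roughly like the made-up example below (addresses and IDs are placeholders, 19001 is dqlite's usual port in MicroK8s, and `Role: 0` marks a voter):

```yaml
# cluster.yaml - one entry per known dqlite node
- Address: 10.0.0.11:19001
  ID: 3297041220608546238
  Role: 0
- Address: 10.0.0.12:19001
  ID: 6914382931847261003
  Role: 0
```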
So, microk8s is disabled on node 3 because of a failed auto-refresh... Could auto-refreshes be what broke my clusters? I had no idea that was even enabled.
Tried to force-abort the stuck snap job and rebooted the machine, and the multi-node cluster is also back up now! Super happy to have everything running again, but worried about stability. One cluster killed itself by emptying cluster.yaml, the other by getting stuck in a snap auto-refresh...
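For the record, a rough sketch of that "force abort" step using plain snapd commands (the change ID below is a placeholder, not taken from the actual machine):

```shell
# Find the stuck refresh and its change ID
snap changes microk8s

# Abort the stuck change (replace 123 with the ID from the output above)
sudo snap abort 123

# Then bring MicroK8s back up and wait for it to become ready
sudo snap start microk8s      # or: microk8s start
microk8s status --wait-ready
```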
So, any ideas why this happened in the single-node cluster? Why did it just lose the contents of cluster.yaml?
The fix on the dqlite side to have a more robust write to …
My final question: is there an auto-refresh always active for the microk8s snap? I read somewhere there is and that it can't be turned off...
I remember that if you set up a snap proxy you have more control over when the updates happen.
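A snap proxy is one option; on a single machine the refresh schedule can also be managed with stock snapd settings. A hedged sketch (exact behaviour depends on the snapd version, and `refresh.hold` only defers updates for a limited time on older snapd releases):

```shell
# Show the current refresh schedule and when the next refresh is due
snap refresh --time

# Move refreshes into a maintenance window, e.g. Saturday night
sudo snap set system refresh.timer="sat,01:00-03:00"

# Or defer refreshes until a given date
sudo snap set system refresh.hold="2021-12-31T00:00:00Z"
```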
I have two clusters that have both stopped working in the same way: kubectl can't connect to them and the apiserver is not running. One of the clusters is a single-node cluster (the inspection report comes from that one), the other has three nodes. No changes have been made to the machines where they are running.
I tried upgrading the single-node cluster from v1.21 to v1.22, which didn't change anything. It's not the problem from #2486; both info.yaml and cluster.yaml have the expected contents...
I don't know why the apiserver isn't included in the inspect... This is what it says (after upgrading to 1.22, which is why it's 2585 here instead of 2546 as in the inspect tarball that was created before the upgrade).
inspection-report-20211018_080210.tar.gz