Debian - after ungraceful restart kube-apiserver and kubelite files are empty in /var/snap/microk8s/current/args/ #4089

Open
naphtalidavies opened this issue Jul 18, 2023 · 12 comments


@naphtalidavies

Summary

I have a single-node cluster running on Debian 11 with microk8s 1.25.
After an ungraceful restart the cluster stopped working: every microk8s command returned a "connection refused" error.
In the syslog I found the following error:
debian microk8s.daemon-kubelite[2576]: Error: [--etcd-servers must be specified, service-account-issuer is a required flag, --service-account-signing-key-file and --service-account-issuer are required flags]

Looking for those settings, I found that they should be under /var/snap/microk8s/current/args/. Comparing with a clean installation, I saw that the kube-apiserver and kubelite files were empty.
After replacing them with the files from the clean install and restarting microk8s, the system was up and running again.
What could have caused these files to be replaced or emptied, and how can I prevent this from happening again?
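The manual workaround described above (replacing the empty files with copies from a clean install, then restarting microk8s) can be scripted. What follows is a minimal sketch. It assumes you took a known-good copy of the args files while the node was healthy. The helper names `find_empty_args` and `restore_from_backup` and the backup directory are hypothetical and are not part of microk8s:

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The two args files reported empty in this issue.
WATCHED = ("kube-apiserver", "kubelite")


def find_empty_args(args_dir: Path, names: tuple[str, ...] = WATCHED) -> list[str]:
    """Return the names of watched args files that are missing or zero bytes long."""
    empty = []
    for name in names:
        path = args_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            empty.append(name)
    return empty


def restore_from_backup(args_dir: Path, backup_dir: Path, names: list[str]) -> None:
    """Copy known-good files from backup_dir (hypothetical location) over the named files."""
    for name in names:
        src = backup_dir / name
        # Refuse to "restore" from a backup that is itself empty.
        if src.stat().st_size == 0:
            raise ValueError(f"backup {src} is empty")
        shutil.copy2(src, args_dir / name)
```

For example, run microk8s stop, call `restore_from_backup(args_dir, backup_dir, find_empty_args(args_dir))` with `args_dir = Path("/var/snap/microk8s/current/args")`, and then run microk8s start.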

What Should Happen Instead?

After an ungraceful restart, the system should come back up and run normally.

Reproduction Steps

Ungraceful restart

@berkayoz
Member

Hey @naphtalidavies, thank you for reaching out.

We have not come across a similar scenario. Were there any operations, such as a snap refresh or an addon being enabled, in progress at the time of the ungraceful restart? It is possible that the write/copy operations for these files were interrupted, although this is just a guess.
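An interrupted write of this kind can leave a zero-byte file if the target is truncated first and the new content has not yet reached disk. Below is a generic defensive pattern, shown as a sketch of the technique and not as what microk8s actually does. It writes to a temporary file in the same directory, fsyncs it, and then atomically renames it over the target. A reader, or the system after a power cut, then sees either the old file or the new one. `atomic_write` is a hypothetical helper name, and the directory fsync assumes a POSIX system:

```python
import os
import shutil
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace path's contents so readers see either the old or the new file, never a truncated one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new data is on disk before the rename
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)  # mkstemp creates files as 0600; keep the old mode
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power cut (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is a few extra syscalls per write. For small config files that are rarely rewritten, this is usually negligible.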

If there is a consistent way of reproducing this, we can issue a bug fix for it. Other than that, the usual warnings about interrupting updates/write operations apply.

Many thanks!

@naphtalidavies
Author

Hi,
I can now see the following lines in the logs:
Jul 14 12:51:48 debian microk8s.daemon-apiserver-kicker[727]: CSR change detected. Restarting the cluster-agent
Jul 14 12:51:48 debian microk8s.daemon-apiserver-kicker[1424]: error: error running snapctl: snap "microk8s" has "service-control" change in progress
Jul 14 12:51:48 debian systemd[1]: snap.microk8s.daemon-apiserver-kicker.service: Main process exited, code=exited, status=1/FAILURE
Jul 14 12:51:48 debian systemd[1]: snap.microk8s.daemon-apiserver-kicker.service: Failed with result 'exit-code'.

This is around the time of the error.
Could someone please explain this error?
In addition, this environment has problems with the external network, and the log contains many NTP sync error messages. I also got the error "debian snapd[686]: devicemgr.go:2300: no NTP sync after 10m0s, trying auto-refresh anyway" even though I switched off refreshes.

@ktsakalozos
Copy link
Member

Hi @naphtalidavies

For the snap refresh issue I opened a forum topic at https://forum.snapcraft.io/t/no-ntp-sync-trying-auto-refresh-anyway/36093. The snappy people will get back to us. It would help to mention how you disabled the refreshes: which exact commands did you use?

For the empty files, I would like to know what caused the ungraceful restarts. Is it possible the node ran out of disk space?

On the "error running snapctl: snap "microk8s" has "service-control" change in progress" error: the microk8s.daemon-apiserver-kicker service runs a reconciliation loop. In that loop it detected an IP/network change and had to reconfigure the K8s services, but the reconfiguration failed.

@naphtalidavies
Author

Hi,
Thanks for your reply.
Snap refresh: I posted on their forum as well. We use sudo snap refresh --hold.
Empty files: there was no disk issue; it was a full power shutdown.
The error: there was an IP change, but some time before the power cut. I can't recall how long before.

@sachinkumarsingh092
Contributor

Hi @naphtalidavies, could you share the full logs for the apiserver-kicker via microk8s inspect? In particular, we'd like to know whether systemd restarted the apiserver-kicker service. Your current logs suggest it did not, but when I tried the latest 1.25 build, the service restarted after exiting.

@naphtalidavies
Author

Hi,
We no longer have the environment or the logs.
Closing the issue.
Thanks for the help!

@shoshi-revivo

shoshi-revivo commented Jan 22, 2024

Hi, @sachinkumarsingh092 @berkayoz
I have a similar problem.

I deployed an OVF file with microk8s running on a VM.
Immediately after the deployment and power-on of the host, I powered off the VM.
When I then powered on the VM again, microk8s did not run.

From a check I made, the three files were emptied:
the kube-apiserver and kubelite files in /var/snap/microk8s/current/args/ are empty.

It happens consistently every time the machine is powered off immediately after being powered on (with microk8s running).

inspection-report-20240125_105802.tar.gz

@sachinkumarsingh092 From the inspection, we can see that the apiserver-kicker restarted as you expected.

Which process is responsible for these files? What is writing to or overwriting them?
Thank you

@shoshi-revivo

Hi,
Is there any update on this?

@john-terrell

I just had this happen on two different nodes on two consecutive days. As reported, both the kube-apiserver and kubelite files were empty.

The only way I could restore the nodes was to uninstall/reinstall microk8s.

@shoshi-revivo

Hi, when I copy those files back manually and stop/start microk8s, it works,
but it happens again on the next ungraceful shutdown.

@jackywu

jackywu commented Jul 31, 2024

I'm curious: why are these configuration files emptied? If these files are only read when microk8s starts, why are other processes opening them? Could that be the reason these configuration files end up empty?
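One mechanism consistent with this question is offered here as a hypothesis, not as a confirmed root cause in microk8s. A process that rewrites a file in place by opening it with truncation (Python's mode "w", or `O_TRUNC` at the syscall level) empties the file the moment it opens it, before any new content is written. If the process is killed, or the machine loses power before the new data is flushed to disk, a zero-byte file can remain. The sketch below uses a scratch file to show the truncation happening at open time. The function name `demo_truncate_on_open` is purely illustrative:

```python
from __future__ import annotations

import os
import tempfile


def demo_truncate_on_open() -> tuple[int, int]:
    """Return (size before reopening, size right after open(..., "w")) for a scratch file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("--example-flag=value\n")
        before = os.path.getsize(path)
        with open(path, "w"):
            # Nothing has been written yet, but mode "w" has already truncated the file.
            # A crash or power cut at this point would leave a zero-byte file behind.
            during = os.path.getsize(path)
        return before, during
    finally:
        os.remove(path)
```

Calling `demo_truncate_on_open()` returns a non-zero first value and `0` as the second value.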
