[BUG] Microk8s crashes when joining a node using ha-cluster #1807
Comments
It would be helpful if you could attach the inspect tarball. |
Yes, same problem here. Sometimes it works, other times it doesn't. No clue as to why, but I find that to get out of it, I have to reinstall microk8s. I wish there were a better workaround. |
@bbarclay just trying to get more info. |
My issue is because of the HA add-on. As soon as I disabled it, it worked (but I would love to use it). |
@balchua I was wrong. It is indeed an issue with the Windows distribution. These error messages occurred after a fresh installation (multipass & microk8s removed) when joining the Linux master (Ubuntu). Therefore, I'm pretty sure this is reproducible. Machine: Windows 10 Enterprise
I've attached the current error log below. Another configuration (channel 19), both nodes with the ha-cluster add-on disabled: |
@wsdt in a non-HA cluster, all those "FAIL" services are not supposed to run, hence they are reported as FAIL, since there is only one control plane. |
@balchua Exactly, the tarball belongs to the joining node. |
For a non-HA cluster, you can only enable addons from the node with the control plane. The DNS issue you mentioned above may be a different issue. |
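For reference, a minimal sketch of enabling the addon from the control-plane node. These are standard microk8s commands; the `command -v` guard just lets the snippet exit cleanly on machines where microk8s is not installed.

```shell
#!/bin/sh
# Addons can only be enabled from the control-plane (main) node in a
# non-HA cluster, so run this there, not on the worker.
if command -v microk8s >/dev/null 2>&1; then
    microk8s enable dns                       # deploys CoreDNS into kube-system
    microk8s status --wait-ready              # block until the cluster reports ready
    microk8s kubectl get pods -n kube-system  # the CoreDNS pod should be Running
fi
dns_steps_done=1
```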
I know, I did enable the dns addon on the master node. Might be, but the ha-cluster didn't work either (as the initial error messages above indicate). |
Thanks @wsdt for the clarifications. Is it okay to provide the inspect tarball of the main node? The one you are joining to? |
@balchua |
@wsdt ok, the worker node has successfully joined your main node in this particular setup. What's failing at this moment is the DNS. You may have to do this on all nodes. |
@balchua Thank you, I executed the commands proposed on both nodes & even disabled the network firewall on the Windows worker node completely. The nodes can definitely talk to each other. Below is the tarball of the master node, after trying to enable dns again (it is listed as enabled, but the worker node cannot be reached):
PS: Pod logs cannot be retrieved either (timed out -> worker node not reachable). |
@wsdt can you also get the inspect tarball of the Windows node? I think you need to use a multipass command. Sorry, I don't know much about it, as I don't use Windows a lot. |
@balchua |
So far I don't see anything wrong with the cluster. As I see it, only CoreDNS is deployed. |
Interesting, as logs cannot be retrieved (the docker container in the pod is failing -> the container works 100%, and I saw last time that something was wrong with the internet connection within the container). All of that seems very buggy to me. Hmm.. thank you anyway. |
Definitely need an extra pair of eyes and brains. Maybe @ktsakalozos will be able to help. 😊 Btw, I don't see your pod deployed in the cluster. |
@balchua Yes, I've removed it afterwards. If necessary, I'll start the pod & attach the tarballs. Thank you! :-) |
Ended up using Kubernetes natively and now everything seems fine. |
Hey, I am late to the party and most likely wrong, but I also had microk8s crash silently on nodes when trying to join them into a cluster. The one thing that resolved it for me (I assume) was renaming the hosts. Being a dummy like me and just getting some Raspberry Pis and putting Ubuntu Server on them, the hosts are by default all simply called "ubuntu". When I first tried to create the cluster and they all had the same name, things just crashed silently: the nodes I tried to join did not report any error, but joining did nothing, and afterwards microk8s was not running anymore and could not be started either, unless you had left the cluster and/or reset microk8s on that node. After renaming all four of them to be unique and trying again, things worked fine. As you are mixing Windows and Ubuntu machines, I guess your issue was a different one, but as I have not seen the host-name issue mentioned anywhere yet, this comment may serve some confused Pi users. Cheers! |
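Since the host-name collision above can bite anyone flashing several identical Pi images, here is a small illustrative sketch: a pure-shell duplicate check over a list of hostnames (the names below are placeholders, not from this thread), with the `hostnamectl` rename shown as a comment.

```shell
#!/bin/sh
# Every node in a cluster needs a unique hostname; fresh Ubuntu Server
# images all default to "ubuntu", which made joins fail silently above.
# Placeholder list standing in for your nodes' hostnames:
nodes="ubuntu ubuntu pi-three pi-four"

# Print each name on its own line, sort, and keep only duplicated entries.
dupes=$(printf '%s\n' $nodes | sort | uniq -d)
if [ -n "$dupes" ]; then
    echo "duplicate hostname(s): $dupes"
    # On each affected node, pick a unique name before joining, e.g.:
    #   sudo hostnamectl set-hostname pi-one
else
    echo "all hostnames unique"
fi
```

Running this with the placeholder list prints `duplicate hostname(s): ubuntu`.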
I set up 2 microk8s nodes today with Ubuntu Linux 20.04 LTS (virtual machines on Proxmox), and when joining the 2nd node to the first with `microk8s join ...`, I had repeated crashes on the first node. |
@devZer0 can you upload the inspect tarball? Thanks |
ok, I reinstalled everything and re-joined the node. To my curiosity, joining worked this time. But something seems to have crashed anyway, as dashboard-proxy disconnected while joining with this message:
`E0318 00:12:54.343951 842232 portforward.go:385] error copying from local connection to remote stream: read tcp4 172.16.31.207:10443->172.22.3.6:61181: read: connection reset by peer`
There are lots of errors like `Exec process "566247c54dfcee017b3c4e4605c6b5b1c48f47deadd16b65b43ec152d0a8281d" exits with exit code 0 and error <nil>` in the syslog. I have attached the inspect tarball: inspection-report-20210318_002035.tar.gz
I'm also curious: the system has 4 vCPUs and there is a loadavg of >1 (one day later, >2), so the CPU of the VM is constantly being hogged, and the syslog is growing very large, already at >60 MB. On the second node, loadavg is at about 0.5, and there are also lots of messages in the syslog. |
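For anyone reproducing this, the standard two-step join flow is sketched below. These are standard microk8s commands, guarded so the snippet is a no-op on machines without microk8s; the `<main-node-ip>` and `<token>` placeholders are filled in by `add-node` itself.

```shell
#!/bin/sh
# Step 1 runs on the existing (main) node; step 2 on the node being added.
if command -v microk8s >/dev/null 2>&1; then
    # Step 1: prints a one-time command of the form
    #   microk8s join <main-node-ip>:25000/<token>
    microk8s add-node
    # Step 2: run the printed join command on the new node, then verify
    # from the main node that it appeared:
    microk8s kubectl get nodes
fi
join_flow_done=1
```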
I'm not sure if the problem occurs because my master node is an Ubuntu machine and the worker is Windows 10 Enterprise (WSL enabled), but I thought this might be of interest.
Version: 1.19/stable
Steps to reproduce:
No error message is output.
Output of microk8s status before joining:
Output of microk8s inspect before joining:
Output of join (finishes without further output):
Output of microk8s status after joining:
Output of microk8s inspect after joining:
And as you can imagine, the node is not added on the master node.
I reinstalled microk8s & removed the VM. Then everything seems to be fine again, but after trying to join, microk8s crashes again.
FAIL: Service snap.microk8s.daemon-apiserver is not running
Approximately 15 minutes later, microk8s seemed to be up and running again (but the api-server was still down). After trying again to join the cluster, I received a Python stacktrace. Maybe just because the api-server was down, but I thought I'd append it here just in case.
NOTE: Resolved in the meantime by disabling the ha-cluster add-on on both nodes. Would be great if this issue could be fixed soon!
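The workaround above can be sketched as follows (run on both nodes; standard microk8s commands, guarded for machines without microk8s):

```shell
#!/bin/sh
# Workaround from this issue: turn off the ha-cluster addon on both nodes
# before joining. This drops the cluster back to a single control plane.
if command -v microk8s >/dev/null 2>&1; then
    microk8s disable ha-cluster           # run this on each node
    microk8s status | grep -i ha-cluster  # should now list it as disabled
fi
workaround_done=1
```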