kubelite restart loop on pristine install (after remove --purge) #4342
It is hard to say what the issue may be from the attached log. Could you share an inspection report? Have you tried the suggestion in https://stackoverflow.com/questions/44133503/kubelet-error-failed-to-start-containermanager-failed-to-initialise-top-level-q ?
Thank you for looking into it.

**The report:** I don't have the inspection report anymore.

**Stack Overflow reference:** No, I haven't tried that one. I forgot to mention that MicroK8s behaved normally the first time I installed it on the server. Something triggers the behavior afterward, and once triggered, it persists.

**My decision:** After the ordeal, I deem I am not ready for full-blown MicroK8s. I couldn't do anything if the same situation happened in an in-use production cluster. I decided to run my cluster without control-plane HA: just a single manager node with multiple workers.

**My situation:** We are a small software company, and maintaining a K8s cluster is already a bit too big for us. However, we have no choice, since we grew out of Docker Stack on a Docker Swarm cluster. The single-node MicroK8s cluster is as big as we can chew right now, and it is much more stable than the HA one.

**The thread:** I'll leave it up to you whether the issue should be left open or, better, closed.

**Last info:** The following is from memory only (I won't have any more input).

**Speculation:** My cluster differs from others in that I set up a point-to-point WireGuard mesh and run MicroK8s through WireGuard. The instability might be triggered by systemd service ordering. Maybe the wg-quick interfaces show up later than MicroK8s initializes? Also, FWIW, my WireGuard mesh is IPv4-in-IPv6.

P.S. I still run a MicroK8s cluster and continue to be a fan.
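If service ordering is indeed the trigger, one possible workaround to test (a sketch, not a confirmed fix; it assumes a WireGuard interface named `wg0`, so adjust for your mesh) is a systemd drop-in that delays the kubelite daemon until the WireGuard interface is up:

```ini
# /etc/systemd/system/snap.microk8s.daemon-kubelite.service.d/10-wireguard.conf
[Unit]
# Start kubelite only after the wg0 interface has been brought up
After=wg-quick@wg0.service
Wants=wg-quick@wg0.service
```

Run `sudo systemctl daemon-reload` afterwards for the drop-in to take effect.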
FWIW, I get a loop like that as well; it continues for some time (multiple minutes, perhaps as much as 10!) and then things just start up fine. Taking a super quick look at the logs:

```
root@node03:~# journalctl --boot=0 --unit=snap.microk8s.daemon-kubelite.service | grep "Failed with result 'exit-code'." | wc -l
125
```

Just before the failure, I get this:
Hi @kquinsland, can you check if the `br_netfilter` and `overlay` kernel modules are loaded?
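For reference, one way to make sure those modules are loaded at every boot (a sketch; the filename is arbitrary) is a modules-load.d entry:

```
# /etc/modules-load.d/microk8s.conf
# Load the bridge-netfilter and overlayfs kernel modules at boot
br_netfilter
overlay
```

They can also be loaded immediately with `sudo modprobe br_netfilter` and `sudo modprobe overlay`.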
Hi, @neoaggelos. Good timing on your reply! We've had some stormy weather here and my power was just cut, so I am in a good position to start the cluster up from a cold boot.

```
karl@node03:~$ sudo lsmod | grep br_
br_netfilter           32768  0
bridge                307200  1 br_netfilter
karl@node03:~$ sudo lsmod | grep overlay
<not loaded>
```

My boot loop is now slightly different:
I let things "sit" for a few minutes as I had to dash off to deal with some other matter, and a few minutes later the cluster had come up on its own. Take a look at the two `ls -lah` listings of `/proc/sys/net/netfilter`, taken a few minutes apart:

```
root@node03:/proc/sys/net/netfilter# ls -lah
total 0
dr-xr-xr-x 1 root root 0 Mar 5 08:28 .
dr-xr-xr-x 1 root root 0 Mar 5 08:28 ..
dr-xr-xr-x 1 root root 0 Mar 5 08:45 nf_log
-rw-r--r-- 1 root root 0 Mar 5 08:45 nf_log_all_netns
```
```
root@node03:/proc/sys/net/netfilter# ls -lah
total 0
dr-xr-xr-x 1 root root 0 Mar 5 08:28 .
dr-xr-xr-x 1 root root 0 Mar 5 08:28 ..
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_acct
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_buckets
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_checksum
-r--r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_count
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_loose
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_closereq
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_closing
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_open
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_partopen
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_request
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_respond
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_dccp_timeout_timewait
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_events
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_expect_max
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_frag6_high_thresh
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_frag6_low_thresh
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_frag6_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_generic_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_gre_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_gre_timeout_stream
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_helper
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_icmp_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_icmpv6_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_log_invalid
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_max
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_closed
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_cookie_echoed
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_cookie_wait
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_established
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_heartbeat_sent
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_shutdown_ack_sent
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_shutdown_recd
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_sctp_timeout_shutdown_sent
-rw-r--r-- 1 root root 0 Mar 5 08:45 nf_conntrack_tcp_be_liberal
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_ignore_invalid_rst
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_loose
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_max_retrans
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_close
-rw-r--r-- 1 root root 0 Mar 5 08:45 nf_conntrack_tcp_timeout_close_wait
-rw-r--r-- 1 root root 0 Mar 5 08:45 nf_conntrack_tcp_timeout_established
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_fin_wait
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_last_ack
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_max_retrans
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_syn_recv
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_syn_sent
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_time_wait
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_tcp_timeout_unacknowledged
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_timestamp
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_udp_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_conntrack_udp_timeout_stream
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_flowtable_tcp_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_flowtable_udp_timeout
-rw-r--r-- 1 root root 0 Mar 5 08:50 nf_hooks_lwtunnel
dr-xr-xr-x 1 root root 0 Mar 5 08:45 nf_log
-rw-r--r-- 1 root root 0 Mar 5 08:45 nf_log_all_netns
```

This smells like some order-of-operations / dependency issue where the `nf_conntrack` sysctls only show up some minutes after boot.
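To confirm that timing theory on the next cold boot, a small sketch (a hypothetical helper, not part of MicroK8s) can measure how long after boot the conntrack sysctls take to appear:

```shell
# wait_for_path: block until the given path exists, then print how many
# seconds we waited. Useful for timing when a sysctl file shows up.
wait_for_path() {
  local path="$1"
  local start
  start=$(date +%s)
  until [ -e "$path" ]; do
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Example (would block until the conntrack sysctl appears):
#   wait_for_path /proc/sys/net/netfilter/nf_conntrack_max
```

Running it from a boot-time unit would show whether the sysctls appear only after kubelite has already started failing.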
I also ran into this issue (and have hit it before, but didn't dig into it quickly the first time, and it resolved itself overnight then).
**Summary**

kubelite is stuck in a loop of restarts.

**Reproduction Steps**

1. `snap remove --purge microk8s`
2. `reboot now`
3. `snap install microk8s --classic --channel=1.29`
4. Configure the Calico IP autodetection method per https://microk8s.io/docs/change-cidr#configure-calico-ip-autodetection-method-4 by editing & applying `/var/snap/microk8s/current/args/cni-network/cni.yaml`, changing `first-found` to `can-reach=<reference address>`.
5. `add-node` and `join`. The cluster consists of 3 nodes.
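The edit in step 4 amounts to changing Calico's autodetection environment variable on the calico-node container. A sketch of the relevant fragment (the placeholder address must be replaced with a real reachable address on your network):

```yaml
# Fragment of /var/snap/microk8s/current/args/cni-network/cni.yaml,
# inside the calico-node container's env list:
- name: IP_AUTODETECTION_METHOD
  # was: value: "first-found"
  value: "can-reach=<reference address>"
```

The edited manifest is then applied with `microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml`, per the linked document.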
**Logs excerpt**

Run `snap logs -f microk8s` and you will see the loop. I captured a doctored instance of the loop here:

**Long logs**

Too long; can't post it here. See the Gist instead.