
2 single node microk8s on different servers suddenly do not create pods after working perfectly for months... #3545

Closed
javad87 opened this issue Nov 7, 2022 · 7 comments

javad87 commented Nov 7, 2022

Summary

Two single-node MicroK8s installations on different servers suddenly stopped creating pods after working perfectly for months! My guess is that this happened after the servers were rebooted.
Looking at "describe pods", the pod events, "journalctl", and "microk8s inspect", the major error I encountered is:

Warning FailedCreatePodSandBox 7m46s (x3 over 7m59s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown

What Should Happen Instead?

Pods should be running in the kube-system namespace (such as the Calico pods for CNI), along with my application pods in other namespaces.

[root@localhost containerd]# kubectl get all --namespace kube-system
NAME                                          READY   STATUS    RESTARTS   AGE
pod/calico-node-z2npr                         0/1     Unknown   13         308d
pod/hostpath-provisioner-5c65fbdb4f-t8hz5     0/1     Unknown   13         308d
pod/coredns-7f9c69c78c-kcsxz                  0/1     Unknown   11         307d
pod/calico-kube-controllers-c85f8f74b-stth6   0/1     Unknown   13         308d

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   321d

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/calico-node   1         1         0       1            0           kubernetes.io/os=linux   321d

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hostpath-provisioner      0/1     1            0           321d
deployment.apps/calico-kube-controllers   0/1     1            0           321d
deployment.apps/coredns                   0/1     1            0           321d

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/calico-kube-controllers-f7868dd95    0         0         0       321d
replicaset.apps/calico-kube-controllers-59b46f8b57   0         0         0       321d
replicaset.apps/hostpath-provisioner-5c65fbdb4f      1         1         0       321d
replicaset.apps/calico-kube-controllers-c85f8f74b    1         1         0       312d
replicaset.apps/coredns-7f9c69c78c                   1         1         0       321d

Reproduction Steps

1. Installed MicroK8s offline following this page: https://microk8s.io/docs/install-offline
2. # cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
3. # uname -sr
Linux 3.10.0-1160.49.1.el7.x86_64
4. # microk8s version
MicroK8s v1.25.3 revision 4094
5. # microk8s kubectl version --short
Client Version: v1.25.3
Kustomize Version: v4.5.7
Server Version: v1.25.3
6. # docker version
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:22 2018
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:19:08 2018
OS/Arch: linux/amd64
Experimental: false

7. # containerd
WARN[0000] containerd config version 1 has been deprecated and will be removed in containerd v2.0, please switch to version 2, see https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header
INFO[2022-11-07T16:20:26.025791559+03:30] starting containerd revision=1c90a442489720eec95342e1789ee8a5e1b9536f version=1.6.9
INFO[2022-11-07T16:20:26.050331270+03:30] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2022-11-07T16:20:26.050387334+03:30] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.060427872+03:30] skip loading plugin "io.containerd.snapshotter.v1.aufs"... error="aufs is not supported (modprobe aufs failed: exit status 1 "modprobe: FATAL: Module aufs not found.\n"): skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.060881302+03:30] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.061799698+03:30] skip loading plugin "io.containerd.snapshotter.v1.btrfs"... error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.061847774+03:30] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2022-11-07T16:20:26.061879368+03:30] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2022-11-07T16:20:26.061942871+03:30] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.062041394+03:30] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.062490054+03:30] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.063331978+03:30] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.063379114+03:30] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2022-11-07T16:20:26.063424130+03:30] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2022-11-07T16:20:26.063448575+03:30] metadata content store policy set policy=shared
WARN[2022-11-07T16:20:36.064254925+03:30] waiting for response from boltdb open plugin=bolt

8. # yum info installed containerd.io
Loaded plugins: copr, fastestmirror, remove-with-leaves
Installed Packages
Name : containerd.io
Arch : x86_64
Version : 1.6.9
Release : 3.1.el7
Size : 112 M
Repo : installed
From repo : docker-ce-stable
Summary : An industry-standard container runtime
URL : https://containerd.io
License : ASL 2.0
Description : containerd is an industry-standard container runtime with an emphasis on
: simplicity, robustness and portability. It is available as a daemon for Linux
: and Windows, which can manage the complete container lifecycle of its host
: system: image transfer and storage, container execution and supervision,
: low-level storage and network attachments, etc.

Introspection Report

inspection-report-20221107_162301.tar.gz

Can you suggest a fix?

I searched for the aforementioned error and found a suggested solution: due to a runc vulnerability fix, the kernel version should be greater than 3.xx; as a workaround, one can downgrade to an old Docker version (docker-ce-18.09.0-3.el7, docker-ce-cli-18.09.0-3.el7, containerd.io-1.2.0-3.el7). I did downgrade to Docker 18.09.0 (yum downgrade docker-ce-cli -y; yum downgrade docker-ce -y; for i in {1..30}; do yum downgrade containerd.io -y; done) but still had the same problem.
Pages that talk about this issue (some are in Chinese, please use Google Translate):

https://medium.com/@dirk.avery/docker-error-response-from-daemon-1d46235ff61d
#2223
https://cloud.tencent.com/developer/article/1411527
https://www-cnblogs-com.translate.goog/mrnx2004/p/10601490.html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp

Are you interested in contributing with a fix?

Yes, @timgreen.
This error is mentioned in these links too:

#2223
#3221

@ktsakalozos (Member)

Hi @javad87, MicroK8s packages its own containerd, so it ignores any containerd/docker installed on the host. Since you are hitting this error and have identified a potential fix, you may want to edit the kubelet arguments [1] found in /var/snap/microk8s/current/args/kubelet and configure MicroK8s to use the containerd you have manually installed. Unfortunately I have no experience with that setup, but it should be possible. Of course a safer choice is to get an Ubuntu LTS VM, but I suspect this might not be possible.
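
Roughly, such an edit could look like the sketch below (untested; it assumes the host containerd listens on the default /run/containerd/containerd.sock, so adjust the socket path to your setup), followed by a restart:

sudo sed -i \
  -e 's|^--container-runtime-endpoint=.*|--container-runtime-endpoint=unix:///run/containerd/containerd.sock|' \
  -e 's|^--containerd=.*|--containerd=unix:///run/containerd/containerd.sock|' \
  /var/snap/microk8s/current/args/kubelet
microk8s stop
microk8s start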

Another point you should be aware of is that MicroK8s (in its default containerd setup) consumes storage from /var/snap/microk8s/common. This should explain the pod evictions you are seeing as your system seems to be running out of disk even though your /home partition is almost empty.

[1] https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

javad87 (Author) commented Nov 8, 2022

Hi @ktsakalozos, thanks again for the reply.
About storage: MicroK8s uses tmpfs for /var/snap/microk8s/common/var/lib/kubelet/*, which is volatile and of course gets erased on reboot. The containerd data, however, is stored under /var/snap/microk8s/common/var/lib/containerd, which in my case is mounted on /. I freed up more space, so containerd (and MicroK8s) now have 18G free, yet the problem remains the same. I think we can safely rule out a lack of resources!
# df -h
Filesystem               Size   Used  Avail  Use%  Mounted on
devtmpfs                  30G      0    30G    0%  /dev
tmpfs                     30G      0    30G    0%  /dev/shm
tmpfs                     30G   492M    29G    2%  /run
tmpfs                     30G      0    30G    0%  /sys/fs/cgroup
/dev/mapper/centos-root   50G    33G    18G   65%  /
/dev/loop4                56M    56M      0  100%  /var/lib/snapd/snap/core18/2620
/dev/loop0               210M   210M      0  100%  /var/lib/snapd/snap/microk8s/2848
/dev/loop2               100M   100M      0  100%  /var/lib/snapd/snap/core/11993
/dev/loop1               115M   115M      0  100%  /var/lib/snapd/snap/core/13886
/dev/loop3               167M   167M      0  100%  /var/lib/snapd/snap/microk8s/4094
/dev/loop5                56M    56M      0  100%  /var/lib/snapd/snap/core18/2284
/dev/sda1               1014M   194M   821M   20%  /boot
/dev/mapper/centos-home  952G    27G   926G    3%  /home
tmpfs                    5.9G      0   5.9G    0%  /run/user/0
tmpfs                     59G    24K    59G    1%  /var/snap/microk8s/common/var/lib/kubelet/pods/5b91a2a7-6f05-4818-bacd-1f399f332ffa/volumes/kubernetes.io~secret/secret
tmpfs                     59G    12K    59G    1%  /var/snap/microk8s/common/var/lib/kubelet/pods/0f8dc418-19c9-4a18-a4ef-8278f203dc

# du -sch /var/snap/microk8s/common/var/lib/*
21G    /var/snap/microk8s/common/var/lib/containerd
712K   /var/snap/microk8s/common/var/lib/kubelet
21G    total

I tried to use the containerd on my host (or even Docker) as the container runtime by editing the file at /var/snap/microk8s/4094/args/kubelet as follows, then running microk8s stop and microk8s start, but the runtime did not change to the containerd version running on my host!

# kubectl describe node
still shows the MicroK8s version
# journalctl -u snap.microk8s.daemon-containerd --since 17:20 -r -o json-pretty
"_CMDLINE" : "/snap/microk8s/4094/bin/containerd --config /var/snap/microk8s/4094/args/containerd.toml --root
still running the containerd from MicroK8s!

# cat /var/snap/microk8s/4094/args/kubelet
--kubeconfig=${SNAP_DATA}/credentials/kubelet.config
--cert-dir=${SNAP_DATA}/certs
--client-ca-file=${SNAP_DATA}/certs/ca.crt
--anonymous-auth=false
--root-dir=${SNAP_COMMON}/var/lib/kubelet
--fail-swap-on=false
--feature-gates=DevicePlugins=true
--eviction-hard="memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi"
--container-runtime=remote
--container-runtime-endpoint=unix:///run/containerd/containerd.sock <--------> change it to host containerd
--containerd=unix:///run/containerd/containerd.sock <--------> change it to host containerd
--node-labels="microk8s.io/cluster=true"
--authentication-token-webhook=true
--cluster-domain=cluster.local
--cluster-dns=10.152.183.10

I did not change the "/var/snap/microk8s/4094/args/containerd.toml" config file.
We need a guide that walks through the parameters and files that must be changed so that the container runtime can be different from the one run by MicroK8s, like:

https://dev.to/stack-labs/how-to-switch-container-runtime-in-a-kubernetes-cluster-1628
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd

I also tried using Docker as the container runtime in the following ways, with no success:
https://stackoverflow.com/questions/61119975/is-it-possible-to-set-microk8s-to-use-docker-engine-instead-of-containerd
https://github.com/canonical/microk8s/blob/1.13/microk8s-resources/default-args/kubelet

I will post a separate issue asking how to make MicroK8s use the host Docker engine or containerd as its container runtime, or even a different version of containerd. Can I just replace the binary files under /snap/microk8s/4094/bin/?

I also checked my kernel for container compatibility and the result is OK:
https://blog.hypriot.com/post/verify-kernel-container-compatibility/
https://github.com/containerd/containerd/releases?after=v1.0.0-alpha1
https://containerd.io/releases/
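
The check itself follows the first link above; a minimal sketch using the kernel-config checker script from the moby repository:

curl -fsSL https://raw.githubusercontent.com/moby/moby/master/contrib/check-config.sh -o check-config.sh
chmod +x check-config.sh
./check-config.sh /boot/config-$(uname -r)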

The other option left for me is upgrading my kernel. I still have doubts about doing that, since my Docker engine (and, before the reboot, MicroK8s) worked well with the 3.xx kernel.

javad87 (Author) commented Nov 9, 2022

Unfortunately, the problem is not solved by a kernel update either. I updated the kernel to kernel-ml, which at this time is version 6.0.7-1.el7.elrepo.x86_64.

kubectl get nodes -o wide

NAME                    STATUS   ROLES    AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
localhost.localdomain   Ready    <none>   323d   v1.25.3   192.168.112.25   <none>        CentOS Linux 7 (Core)   6.0.7-1.el7.elrepo.x86_64   containerd://1.6.6

I first stopped MicroK8s, then updated the kernel, and started it again.
I also stopped and disabled containerd and docker (# systemctl disable docker) on the host machine so that no shim conflict can happen. The only containerd running on my server is the one from MicroK8s:

ps -aux|grep containerd

root 2547 4.1 0.1 2795668 73204 ? Ssl 11:23 0:13 /snap/microk8s/4094/bin/containerd --config /var/snap/microk8s/4094/args/containerd.toml --root /var/snap/microk8s/common/var/lib/containerd --state /var/snap/microk8s/common/run/containerd --address /var/snap/microk8s/common/run/containerd.sock
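
For anyone following along, the stop/disable step looks roughly like this sketch (unit names may differ per install):

sudo systemctl stop docker containerd
sudo systemctl disable docker containerd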

I have run out of options; it seems this bug is related to containerd:

containerd/containerd#4068

How can I update the version of containerd in MicroK8s? The new version is 1.6.9 and MicroK8s uses 1.6.6...
Is it possible to just replace the containerd binaries under /snap/microk8s/4094/bin/?
#3536

javad87 (Author) commented Nov 9, 2022

I carefully examined the result of "microk8s inspect" for both of my servers, which got this problem after a reboot. The following errors (in the folder snap.microk8s.daemon-containerd) are common to both reports:

server 1:
inspection-report-20221109_141931.tar.gz
server 2:
inspection-report-20221109_142214.tar.gz

level=error msg="failed to delete" cmd="/snap/microk8s/4055/bin/containerd-shim-runc-v1 -namespace k8s.io -address /var/snap/microk8s/common/run/containerd.sock -publish-binary /var/lib/snapd/snap/microk8s/4055/bin/containerd -id 1b951cd56c001742864cd6c1d8843b5c3b3fbfa0042064d228c052417f016ef0 -bundle /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/1b951cd56c001742864cd6c1d8843b5c3b3fbfa0042064d228c052417f016ef0 delete" error="exit status 1"


level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:openebs-ndm-w2hl4,Uid:dd17245f-dbf8-4928-8179-2dc878e29a9e,Namespace:openebs,Attempt:21,} failed, error" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown"


level=error msg="copy shim log" error="read /proc/self/fd/19: file already closed"


[ERROR][6879] customresource.go 136: Error updating resource Key=IPAMBlock(10-1-103-128-26) Name="10-1-103-128-26" Resource="IPAMBlocks" Value=&v3.IPAMBlock{TypeMeta:v1.TypeMeta{Kind:"IPAMBlock", APIVersion:"crd.projectcalico.org/v1"}, ObjectMeta:v1.ObjectMeta{Name:"10-1-103-128-26", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"45540869", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.IPAMBlockSpec{CIDR:"10.1.103.128/26", Affinity:(*string)(0xc00028c410), StrictAffinity:false, Allocations:[]*int{(*int)(0xc0003cfe28), (*int)(0xc0003cfd08), (*int)(0xc0003cfef8), (*int)(0xc0003cfe88), (*int)(0xc0003cffc8), (*int)(0xc0003cfe98), (*int)(0xc0003cff38), (*int)(0xc0003cfea8), (*int)(0xc0003cff58), (*int)(0xc0003cfff8), (*int)(0xc0003cfd98), (*int)(0xc0003cfdd8), (*int)(0xc0003cfc38), (*int)(0xc0003cfc48), (*int)(0xc0003f6138), (*int)(0xc0003cfde8), (*int)(0xc

javad87 (Author) commented Nov 10, 2022

My problem is solved. LONG STORY SHORT: this problem appeared exactly after a reboot. The reason is that snap automatically updates installed packages; there is a discussion about this forceful behaviour (https://forum.snapcraft.io/t/disabling-automatic-refresh-for-snap-from-store/707/322). Fortunately, I could revert back to the previous version (https://askubuntu.com/questions/1198022/how-to-run-a-previous-version-of-a-snap-package):
# snap list "microk8s" --all
Name      Version  Rev   Tracking  Publisher   Notes
microk8s  v1.23.1  2848  -         canonical✓  classic
microk8s  v1.25.3  4094  -         canonical✓  disabled,classic
# sudo snap revert microk8s

You can see that after the reboot, snap changed MicroK8s from v1.23.1 to v1.25.3, after which the MicroK8s containerd no longer works on my system. It is also not related to the kernel version (as I explained above, I updated my kernel to 6.xx and the error remained); I moved back to kernel 3.xx, and all my containers were created after reverting to the previous MicroK8s version! @ktsakalozos, could you please ask the contributors to this project to look into what is going on with the new version, where containerd does not work correctly? And could you also add to the installation guide that if you want your cluster to stay alive after a reboot, you should first stop (hold) snap from automatically updating (refreshing) MicroK8s and ruining everything?
These are some workarounds to hold snap's auto-update:
https://askubuntu.com/questions/1131182/how-can-i-disable-automatic-update-for-a-single-snap-in-ubuntu
https://snapcraft.io/docs/keeping-snaps-up-to-date#:~:text=Snaps%20update%20automatically%2C%20and%20by,with%20the%20snap%20refresh%20command.
https://snapcraft.io/blog/how-to-manage-snap-updates
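
For example, a sketch based on the links above (snap refresh --hold needs a recent snapd, roughly 2.58 or newer; refresh.hold works on older versions):

sudo snap refresh --hold microk8s
sudo snap set system refresh.hold="$(date --date='+60 days' +%Y-%m-%dT%H:%M:%S%:z)"

The first command holds refreshes for the microk8s snap indefinitely; the second defers all snap refreshes for up to 60 days.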

containerd has a great tool named ctr, with which you can test whether the runtime is working by running a container:
https://iximiuz.com/en/posts/containerd-command-line-clients/
https://github.com/containerd/containerd/blob/main/docs/getting-started.md
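
A minimal smoke test could look like this sketch (the image has to be pulled first; "demo" is just an arbitrary container ID):

# microk8s ctr image pull docker.io/library/busybox:latest
# microk8s ctr run --rm docker.io/library/busybox:latest demo echo hello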

# microk8s ctr run docker.io/library/busybox:latest javad87
# microk8s ctr container ls
f82c42a040ce4594515c2e0ee5cbad0b114129a1364d58023562e19666d60684 docker.io/openebs/provisioner-localpv:2.12.1 io.containerd.runc.v1
f9c2ea2b97af5100b82887b7b45843b3331099a258e69b3c72249797ec741da6 docker.io/jupyterhub/configurable-http-proxy:4.5.0 io.containerd.runc.v1
fb9914a1ad8be600f9bc2b62ea5f0d2cd3a9bbc35b2d50a07bffe490f7e672fc k8s.gcr.io/pause:3.1 io.containerd.runc.v1
fcc29dbd6631a4b3aa5c301380ae1738d424071ccac8df4e9ff662e7cea0737f docker.io/calico/cni:v3.13.2 io.containerd.runc.v1
fd1a09f971b3af5a68b0ccf87a867f391e575431109546752ee71bb10a09d844 k8s.gcr.io/pause:3.1 io.containerd.runc.v1
fe7497b3530622571383ee7e167929f6cb8ba312e93dc494c24a51b2bba0a533 k8s.gcr.io/pause:3.1 io.containerd.runc.v1
gorge docker.io/library/busybox:latest io.containerd.runc.v2
javad docker.io/library/busybox:latest io.containerd.runc.v2
javad87 docker.io/library/busybox:latest io.containerd.runc.v2
v1 docker.io/library/busybox:latest io.containerd.runc.v2
v2 docker.io/library/busybox:latest io.containerd.runc.v2

With the new version of MicroK8s, the same ctr run command produces the error:
error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown"

but it still creates containers with io.containerd.runc.v2. There seems to be a problem with "shim" creation for the container's stdout/stderr. I don't know the nuts and bolts of container creation; this guy explains what is going on under the hood:
https://iximiuz.com/en/posts/journey-from-containerization-to-orchestration-and-beyond/#containerd

@neoaggelos (Member)

First of all, MicroK8s will not auto-update system packages; it comes with the required packages and binaries included in the snap bundle. That said, I noticed the following:

# snap list "microk8s" --all
Name      Version  Rev   Tracking  Publisher   Notes
microk8s  v1.23.1  2848  -         canonical✓  classic
microk8s  v1.25.3  4094  -         canonical✓  disabled,classic

you can see after reboot snap changed microk8s from v1.23.1 to v1.25.3

This should not be the case in general. By default, MicroK8s installs and pins to a minor version (e.g. 1.23). You can see this by running:

$ sudo snap info microk8s
....
tracking:    1.2X/stable

Minor version upgrades are not automatic, and would only happen if (a) you changed the channel with snap refresh microk8s --channel 1.25, or (b) you installed MicroK8s with snap install microk8s --channel latest, which is not recommended for production setups.

In any case, mind the following:

  • MicroK8s upgrades should not skip minor versions. If you are on 1.23 and want to move to 1.25, you should first upgrade to 1.24, ensure all workloads are fine, and only then move to 1.25 (see the sketch after this list). This is a requirement of the upstream Kubernetes project, and not a peculiarity of MicroK8s itself.
  • MicroK8s comes with all packages and binaries it needs, no packages are installed/updated in the underlying system.
  • For production setups, make sure to install from a specific minor version (snap install microk8s --channel 1.2X/stable), so that minor version upgrades are managed properly. See the documentation for more details on upgrading a production cluster.
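
As a concrete example, the stepwise upgrade path from 1.23 to 1.25 could look like this (a sketch; verify your workloads between steps):

sudo snap refresh microk8s --channel=1.24/stable
microk8s status --wait-ready
sudo snap refresh microk8s --channel=1.25/stable
microk8s status --wait-ready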

Hope this helps the OP, and any future onlookers.

stale bot commented Oct 31, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the inactive label Oct 31, 2023
stale bot closed this as completed Nov 30, 2023