2 single node microk8s on different servers suddenly do not create pods after working perfectly for months... #3545
Comments
Hi @javad87, MicroK8s packages its own containerd, so it ignores any containerd/docker installed on the host. Since you are hitting this error and you have identified a potential fix, you may want to edit the kubelet arguments [1] found in /var/snap/microk8s/4094/args/kubelet. Another point you should be aware of is that MicroK8s (in its default containerd setup) consumes storage from /var/snap/microk8s/common/var/lib/containerd.

[1] https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
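As a sketch of where to look, the kubelet arguments live inside the snap's data directory (the revision path 4094 is the one from this report; `current` is a symlink maintained by snapd and should also work):

```shell
# Inspect the kubelet arguments shipped with the snap
cat /var/snap/microk8s/current/args/kubelet

# After editing them, restart MicroK8s so kubelet picks up the change
microk8s stop && microk8s start
```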
Hi @ktsakalozos, thanks again for the reply. I tried to use the containerd on my host (or even Docker as the container runtime) by editing the file at /var/snap/microk8s/4094/args/kubelet and then running `microk8s stop` and `microk8s start`, but containerd did not change to the containerd version running on my host. I checked with:

# du -sch /var/snap/microk8s/common/var/lib/*
# kubectl describe node
# cat /var/snap/microk8s/4094/args/kubelet

I did not change the "/var/snap/microk8s/4094/args/containerd.toml" config file (the approach described here: https://dev.to/stack-labs/how-to-switch-container-runtime-in-a-kubernetes-cluster-1628). I also tried using Docker as the container runtime, with no success. I will post a separate issue asking how to make MicroK8s use the host's Docker engine or containerd, or a different containerd version. Can I just swap the binary in /snap/microk8s/4094/bin/? I also checked my kernel for container compatibility and the result is OK. The only option left is upgrading my kernel, but I still have doubts about doing it, since my Docker engine and (before the reboot) MicroK8s worked well with kernel 3.xx.
Unfortunately, the problem is not solved by a kernel update either. I updated to kernel-ml, which at this time is version 6.0.7-1.el7.elrepo.x86_64 (I first stopped MicroK8s, then updated the kernel, then started MicroK8s again):

# kubectl get nodes -o wide
NAME   STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE   KERNEL-VERSION   CONTAINER-RUNTIME

# ps -aux | grep containerd
root 2547 4.1 0.1 2795668 73204 ? Ssl 11:23 0:13 /snap/microk8s/4094/bin/containerd --config /var/snap/microk8s/4094/args/containerd.toml --root /var/snap/microk8s/common/var/lib/containerd --state /var/snap/microk8s/common/run/containerd --address /var/snap/microk8s/common/run/containerd.sock

I have run out of options; it seems this bug is related to containerd. How can I update the version of containerd in MicroK8s? The new version is 1.6.9 and MicroK8s uses 1.6.6.
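To confirm which containerd binary and version the snap is actually using (a sketch; the binary path is the one shown in the `ps` output above, with `current` as the snapd-maintained symlink):

```shell
# The snap bundles its own containerd; check its version directly,
# independently of any containerd installed on the host
/snap/microk8s/current/bin/containerd --version
```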
I examined the result of "microk8s inspect" carefully for both of my servers that hit this problem after the reboot. In both reports, the following errors (in the folder snap.microk8s.daemon-containerd) are common:

server 1:

level=error msg="failed to delete" cmd="/snap/microk8s/4055/bin/containerd-shim-runc-v1 -namespace k8s.io -address /var/snap/microk8s/common/run/containerd.sock -publish-binary /var/lib/snapd/snap/microk8s/4055/bin/containerd -id 1b951cd56c001742864cd6c1d8843b5c3b3fbfa0042064d228c052417f016ef0 -bundle /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/1b951cd56c001742864cd6c1d8843b5c3b3fbfa0042064d228c052417f016ef0 delete" error="exit status 1"

level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:openebs-ndm-w2hl4,Uid:dd17245f-dbf8-4928-8179-2dc878e29a9e,Namespace:openebs,Attempt:21,} failed, error" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown"

level=error msg="copy shim log" error="read /proc/self/fd/19: file already closed"

[ERROR][6879] customresource.go 136: Error updating resource Key=IPAMBlock(10-1-103-128-26) Name="10-1-103-128-26" Resource="IPAMBlocks" Value=&v3.IPAMBlock{TypeMeta:v1.TypeMeta{Kind:"IPAMBlock", APIVersion:"crd.projectcalico.org/v1"}, ObjectMeta:v1.ObjectMeta{Name:"10-1-103-128-26", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"45540869", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.IPAMBlockSpec{CIDR:"10.1.103.128/26", Affinity:(*string)(0xc00028c410), StrictAffinity:false, Allocations:[]*int{(*int)(0xc0003cfe28), (*int)(0xc0003cfd08), (*int)(0xc0003cfef8), (*int)(0xc0003cfe88), (*int)(0xc0003cffc8), (*int)(0xc0003cfe98), (*int)(0xc0003cff38), (*int)(0xc0003cfea8), (*int)(0xc0003cff58), (*int)(0xc0003cfff8), (*int)(0xc0003cfd98), (*int)(0xc0003cfdd8), (*int)(0xc0003cfc38), (*int)(0xc0003cfc48), (*int)(0xc0003f6138), (*int)(0xc0003cfde8), (*int)(0xc
My problem is solved. LONG STORY SHORT: this problem appeared exactly after a reboot, because snap automatically updates installed packages. There is a discussion about this forceful behaviour (https://forum.snapcraft.io/t/disabling-automatic-refresh-for-snap-from-store/707/322). Fortunately, I could revert back to the previous version (https://askubuntu.com/questions/1198022/how-to-run-a-previous-version-of-a-snap-package). You can see that after the reboot, snap changed MicroK8s from v1.23.1 to v1.25.3, and the MicroK8s containerd then no longer works on my system. It is also not related to the kernel version: as I explained above, I updated my kernel to 6.xx and the error remained; I moved back to kernel 3.xx, and all my containers were created after reverting to the previous MicroK8s version.

@ktsakalozos, could you please ask the contributors to this project to look into what is going on with the new version, where containerd does not work correctly? Also, please add to the installation guide that if you want your cluster to remain alive after a reboot, you should first stop (hold) snap from automatically updating (refreshing) MicroK8s and ruining everything.

containerd has a great tool named ctr, with which you can test whether running a container works:

# microk8s ctr run docker.io/library/busybox:latest test

With the new version of MicroK8s this produces the error, but still creates the container with io.containerd.runc.v2. There is a problem with "shim" creation for the container's stdout/stderr. I don't know the nuts and bolts of container creation; this guy explained what is going on under the hood:
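The revert-and-hold workflow described above can be sketched as follows (a non-authoritative sketch: `snap revert` picks the previously installed revision, and `snap refresh --hold` requires snapd 2.58 or newer; on older snapd you would need another workaround from the linked forum thread):

```shell
# List all installed revisions of microk8s, including the pre-refresh one
snap list --all microk8s

# Roll back to the previously installed revision (v1.23.1 in this report)
sudo snap revert microk8s

# Prevent snapd from auto-refreshing microk8s again (snapd >= 2.58)
sudo snap refresh --hold microk8s
```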
To clarify: MicroK8s will not auto-update system packages; it comes with the required packages and binaries included in the snap bundle. Instead, I noticed that:
This should not be the case in general. By default, MicroK8s installs and pins to a minor version (e.g. 1.23). You can see this by running:
Minor version upgrades are not automatic, and would only happen if (a) you changed the channel with a `snap refresh` command. In any case, mind the following:
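As a sketch of verifying the pinned track (command names are standard snapd; the exact channel shown will vary with your installation):

```shell
# Show which channel the snap is tracking, e.g. "tracking: 1.23/stable"
snap info microk8s | grep -i tracking

# Explicitly pin to a specific minor track so refreshes stay within it
sudo snap refresh microk8s --channel=1.23/stable
```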
Hope this helps the OP, and any future onlookers.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Summary
2 single-node MicroK8s installations on different servers suddenly do not create pods after working perfectly for months. It seems to have happened after a server reboot.

Looking at `kubectl describe pod`, the pod events, `journalctl`, and `microk8s inspect`, the major error I encountered is:
Warning FailedCreatePodSandBox 7m46s (x3 over 7m59s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown
What Should Happen Instead?
Pod should be running in kube-system namespace like calico pods for CNI and my application pods in other namespaces.
[root@localhost containerd]# kubectl get all --namespace kube-system
NAME READY STATUS RESTARTS AGE
pod/calico-node-z2npr 0/1 Unknown 13 308d
pod/hostpath-provisioner-5c65fbdb4f-t8hz5 0/1 Unknown 13 308d
pod/coredns-7f9c69c78c-kcsxz 0/1 Unknown 11 307d
pod/calico-kube-controllers-c85f8f74b-stth6 0/1 Unknown 13 308d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.152.183.10 53/UDP,53/TCP,9153/TCP 321d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/calico-node 1 1 0 1 0 kubernetes.io/os=linux 321d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hostpath-provisioner 0/1 1 0 321d
deployment.apps/calico-kube-controllers 0/1 1 0 321d
deployment.apps/coredns 0/1 1 0 321d
NAME DESIRED CURRENT READY AGE
replicaset.apps/calico-kube-controllers-f7868dd95 0 0 0 321d
replicaset.apps/calico-kube-controllers-59b46f8b57 0 0 0 321d
replicaset.apps/hostpath-provisioner-5c65fbdb4f 1 1 0 321d
replicaset.apps/calico-kube-controllers-c85f8f74b 1 1 0 312d
replicaset.apps/coredns-7f9c69c78c 1 1 0 321d
Reproduction Steps
1. Installed MicroK8s offline following this page: https://microk8s.io/docs/install-offline
2.# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
3.# uname -sr
Linux 3.10.0-1160.49.1.el7.x86_64
5. # microk8s version
MicroK8s v1.25.3 revision 4094
6.# microk8s kubectl version --short
Client Version: v1.25.3
Kustomize Version: v4.5.7
Server Version: v1.25.3
7.# docker version
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:22 2018
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:19:08 2018
OS/Arch: linux/amd64
Experimental: false
8.# containerd
WARN[0000] containerd config version 1 has been deprecated and will be removed in containerd v2.0, please switch to version 2, see https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header
INFO[2022-11-07T16:20:26.025791559+03:30] starting containerd revision=1c90a442489720eec95342e1789ee8a5e1b9536f version=1.6.9
INFO[2022-11-07T16:20:26.050331270+03:30] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2022-11-07T16:20:26.050387334+03:30] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.060427872+03:30] skip loading plugin "io.containerd.snapshotter.v1.aufs"... error="aufs is not supported (modprobe aufs failed: exit status 1 "modprobe: FATAL: Module aufs not found.\n"): skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.060881302+03:30] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.061799698+03:30] skip loading plugin "io.containerd.snapshotter.v1.btrfs"... error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.061847774+03:30] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2022-11-07T16:20:26.061879368+03:30] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2022-11-07T16:20:26.061942871+03:30] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.062041394+03:30] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.062490054+03:30] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.063331978+03:30] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-11-07T16:20:26.063379114+03:30] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2022-11-07T16:20:26.063424130+03:30] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2022-11-07T16:20:26.063448575+03:30] metadata content store policy set policy=shared
WARN[2022-11-07T16:20:36.064254925+03:30] waiting for response from boltdb open plugin=bolt
9.# yum info installed containerd.io
Loaded plugins: copr, fastestmirror, remove-with-leaves
Installed Packages
Name : containerd.io
Arch : x86_64
Version : 1.6.9
Release : 3.1.el7
Size : 112 M
Repo : installed
From repo : docker-ce-stable
Summary : An industry-standard container runtime
URL : https://containerd.io
License : ASL 2.0
Description : containerd is an industry-standard container runtime with an emphasis on
: simplicity, robustness and portability. It is available as a daemon for Linux
: and Windows, which can manage the complete container lifecycle of its host
: system: image transfer and storage, container execution and supervision,
: low-level storage and network attachments, etc.
Introspection Report
inspection-report-20221107_162301.tar.gz
Can you suggest a fix?
I searched for the aforementioned error and found a suggested solution: due to a runc vulnerability, the kernel version should be greater than 3.xx,
or as a workaround one can downgrade to an old Docker version (docker-ce-18.09.0-3.el7, docker-ce-cli-18.09.0-3.el7, containerd.io-1.2.0-3.el7). I downgraded to Docker 18.09.0 (`yum downgrade docker-ce-cli -y`, `yum downgrade docker-ce -y`) but still had the same problem. To step containerd.io back through multiple releases I ran: `for i in {1..30}; do yum downgrade containerd.io -y; done`
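A hedged sketch of pinning the downgraded runtime so yum does not pull it forward again on the next update (assumes the `yum-plugin-versionlock` plugin is available on CentOS 7; the version string is the one quoted in the workaround above):

```shell
# Install the versionlock plugin (assumption: available in the base repos)
sudo yum install -y yum-plugin-versionlock

# Downgrade containerd.io to the known-good release quoted above
sudo yum downgrade -y containerd.io-1.2.0-3.el7

# Lock the package at its current version so future updates skip it
sudo yum versionlock add containerd.io
```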
Pages discussing this issue (some are in Chinese; please use Google Translate):
https://medium.com/@dirk.avery/docker-error-response-from-daemon-1d46235ff61d
#2223
https://cloud.tencent.com/developer/article/1411527
https://www-cnblogs-com.translate.goog/mrnx2004/p/10601490.html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp
Are you interested in contributing with a fix?
yes @timgreen
this error is mentioned in these links too:
#2223
#3221