This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

kubelet-wrapper leaves behind Orphaned Pods #1831

Closed
chrigl opened this issue Feb 25, 2017 · 5 comments
chrigl commented Feb 25, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.12.0
VERSION_ID=1235.12.0
BUILD_ID=2017-02-23-0222
PRETTY_NAME="Container Linux by CoreOS 1235.12.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

Cloud provider and VirtualBox VMs, but the issue should be the same in all environments.

Expected Behavior

When Pods are deleted, they should go away and not leave orphaned pods behind.

Actual Behavior

Orphaned pods stay on the system.

Reproduction Steps

  1. Schedule a Pod with a Secret mounted
  2. Go to the Node and stop kubelet
  3. Start kubelet

You should see a new kubelet container in rkt list, and the old one stopped:

rkt list
UUID            APP             IMAGE NAME                                      STATE           CREATED         STARTED         NETWORKS
73a545fc        flannel         quay.io/coreos/flannel:v0.6.2                   running         1 day ago       1 day ago
dbde2e4e        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        exited          1 hour ago      1 hour ago
ff9dd15e        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        running         28 minutes ago  28 minutes ago
  4. kubectl delete $POD
  5. See the logs:
Feb 25 16:26:05 kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1.novalocal kubelet-wrapper[27093]: I0225 16:26:05.829431   27093 kubelet_volumes.go:104] Orphaned pod "d2ddd075-f2b2-11e6-808d-fa163eac0cd0" found, but volumes are not cleaned up
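
Condensed, the whole loop looks roughly like this (a sketch; $POD stands for the pod scheduled in step 1):

# systemctl stop kubelet
# systemctl start kubelet
# rkt list                                  # old hyperkube pod exited, new one running
chris@home $ kubectl delete pod $POD
# journalctl -u kubelet | grep -i orphaned  # shows "Orphaned pod ... found"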

I did not test this with persistent storage, but it may well be the same issue there. That could be a real problem, because the volume may then not be attachable to another node.

With Secrets and ConfigMaps it is less critical, because the orphaned pods are cleaned up on the next reboot anyway.

Other Information

Copy & paste from: kubernetes/kubernetes#38498 (comment)

TL;DR: It seems to be a problem when running kubelet in rkt fly on CoreOS.

This is currently happening on the system:

Feb 25 14:22:54 kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1.novalocal kubelet-wrapper[12194]: E0225 14:22:54.170472   12194 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/d2ddd075-f2b2-11e6-808d-fa163eac0cd0-heapster-token-f1g9p\" (\"d2ddd075-f2b2-11e6-808d-fa163eac0cd0\")" failed. No retries permitted until 2017-02-25 14:24:54.170358703 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/d2ddd075-f2b2-11e6-808d-fa163eac0cd0-heapster-token-f1g9p" (volume.spec.Name: "heapster-token-f1g9p") pod "d2ddd075-f2b2-11e6-808d-fa163eac0cd0" (UID: "d2ddd075-f2b2-11e6-808d-fa163eac0cd0") with: rename /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/wrapped_heapster-token-f1g9p.deleting~394175716: device or resource busy
Feb 25 14:22:54 kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1.novalocal kubelet-wrapper[12194]: E0225 14:22:54.170592   12194 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/b539bd4a-f2b2-11e6-808d-fa163eac0cd0-default-token-q6jpp\" (\"b539bd4a-f2b2-11e6-808d-fa163eac0cd0\")" failed. No retries permitted until 2017-02-25 14:24:54.170569069 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/b539bd4a-f2b2-11e6-808d-fa163eac0cd0-default-token-q6jpp" (volume.spec.Name: "default-token-q6jpp") pod "b539bd4a-f2b2-11e6-808d-fa163eac0cd0" (UID: "b539bd4a-f2b2-11e6-808d-fa163eac0cd0") with: rename /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/wrapped_default-token-q6jpp.deleting~172107507: device or resource busy

But it can't be renamed, since it is still mounted. So, as you mentioned, kubelet does not consider the volume to be a tmpfs.

# lsof -n | grep "token-" || echo "Nothing"
Nothing
# mount | grep "token-"
tmpfs on /var/lib/rkt/pods/run/20bc48e1-cf9a-4dae-9c33-c89dd4e2cfc3/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/20bc48e1-cf9a-4dae-9c33-c89dd4e2cfc3/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p type tmpfs (rw,relatime,seclabel)

Hmm, there was a crashed kubelet (out of disk space on this node):

# rkt list
UUID            APP             IMAGE NAME                                      STATE   CREATED         STARTED         NETWORKS
20bc48e1        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        exited  1 day ago       1 day ago
73a545fc        flannel         quay.io/coreos/flannel:v0.6.2                   running 1 day ago       1 day ago
e67f6189        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        running 25 minutes ago  25 minutes ago
kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1 containers # rkt rm 20bc48e1
"20bc48e1-cf9a-4dae-9c33-c89dd4e2cfc3"
# mount | grep token-
tmpfs on /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/d2ddd075-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/heapster-token-f1g9p type tmpfs (rw,relatime,seclabel)

Some mounts are gone.

Turning on more logging:

 14:47:16.475315   13150 empty_dir_linux.go:38] Determining mount medium of /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp
 14:47:16.476148   13150 empty_dir_linux.go:48] Statfs_t of /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp: {Type:61267 Bsize:4096 Blocks:4474386 Bfree:3517504 Bavail:3312627 Files:4625792 Ffree:4471664 Fsid:{X__val:[-2141875238 -1373838413]}

Okay, let's drill down on this one, following https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/empty_dir/empty_dir_linux.go#L37, just to be sure this works as expected.

package main

import (
        "flag"
        "fmt"
        "os"
        "syscall"
)

// TMPFS_MAGIC from <linux/magic.h>, the same constant kubelet's
// empty_dir volume plugin compares against.
const linuxTmpfsMagic = 0x01021994

func main() {
        path := ""
        flag.StringVar(&path, "path", "", "Path of file")
        flag.Parse()
        if path == "" {
                fmt.Println("Provide a path")
                os.Exit(1)
        }

        // Ask the kernel for the filesystem statistics of the mount backing path.
        buf := syscall.Statfs_t{}
        if err := syscall.Statfs(path, &buf); err != nil {
                fmt.Printf("statfs(%q): %v\n", path, err)
                os.Exit(1)
        }

        fmt.Printf("Statfs_t of %q: %+v\n", path, buf)
        // The filesystem type magic tells us whether this is a tmpfs mount.
        if buf.Type == linuxTmpfsMagic {
                fmt.Printf("%q is a tmpfs\n", path)
        } else {
                fmt.Printf("%q NOT a tmpfs\n", path)
        }
}
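
Container Linux ships no Go toolchain, so the binary has to be built off the node and copied over (a sketch; $NODE is a placeholder):

$ GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o statfs statfs.go
$ scp statfs core@$NODE: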

This works:

# ./statfs -path /dev
Statfs_t of "/dev": {Type:16914836 Bsize:4096 Blocks:502378 Bfree:502378 Bavail:502378 Files:502378 Ffree:502048 Fsid:{X__val:[0 0]} Namelen:255 Frsize:4096 Flags:34 Spare:[0 0 0 0]}
"/dev" is a tmpfs

Trying this on the affected node, with the real path:

# ./statfs -path /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp
Statfs_t of "/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp": {Type:16914836 Bsize:4096 Blocks:506397 Bfree:506394 Bavail:506394 Files:506397 Ffree:506388 Fsid:{X__val:[0 0]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
"/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp" is a tmpfs

Surprise: this is a tmpfs, as expected. So what else could it be? I noticed that Type:61267 tells us we are looking at an ext4 mount point, so kubelet is likely hitting / instead.
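
A quick sanity check on those magic numbers (hex of the decimal Type values):

# printf '%x\n' 61267
ef53
# printf '%x\n' 16914836
1021994

0xEF53 is the ext2/3/4 superblock magic, and 0x01021994 is TMPFS_MAGIC from <linux/magic.h>, matching linuxTmpfsMagic above.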

Sure enough, kubelet is running as a rkt fly container:

# chroot /proc/$(pgrep kubelet)/root
# /run/statfs -path /var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp
Statfs_t of "/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp": {Type:61267 Bsize:4096 Blocks:4474386 Bfree:3510932 Bavail:3306055 Files:4625792 Ffree:4471667 Fsid:{X__val:[-2141875238 -1373838413]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
"/var/lib/kubelet/pods/b539bd4a-f2b2-11e6-808d-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-q6jpp" NOT a tmpfs
# mount | grep "token-" || echo "Nothing"
Nothing

Well, this would have been discoverable without writing any code, but anyway.
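
For the record, the kubelet's mount-namespace view is also visible directly via procfs, without copying a binary into the chroot:

# grep "token-" /proc/$(pgrep kubelet)/mounts || echo "Nothing"
Nothing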

It is quite simple to reproduce this behavior. Just start a Pod with a Secret, then stop kubelet on the node; I left it off until the API server noticed. Start it again (via kubelet-wrapper, of course) and wait until the API server shows the node as Ready. I made sure the pod was still running on this node. After that, I just deleted it with kubectl. Voilà, one more orphaned pod with the same symptoms.

@lucab
Copy link

lucab commented Feb 28, 2017

You should see a new kubelet container in rkt list, and the old one stopped.

You should not see the old exited one, as the example service unit does a rm in ExecStartPre to clean up the environment in case of restarts. How are you running the kubelet?
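
For reference, these are the relevant lines of the recommended unit (quoted in full further down):

Environment="RKT_RUN_ARGS=--uuid-file-save=/var/lib/coreos/kubelet-wrapper.uuid"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/kubelet-wrapper.uuid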

chrigl commented Feb 28, 2017

Because of my old setup there was still a kubelet.service without the rm, but I tested it this morning with the recommended kubelet.service, and the problem still exists. kubernetes/kubernetes#38498 (comment)

One more thing I just realized: one more duplicate mount shows up every time kubelet is restarted.

# mount | grep 5c8aed80-fd9d-11e6-8be
tmpfs on /var/lib/rkt/pods/run/43c638e8-c227-43d6-8367-af014ceb4b22/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
# grep 5c8aed80-fd9d-11e6-8be /proc/$(pgrep kubelet)/mounts
tmpfs /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt tmpfs rw,seclabel,relatime 0 0
# systemctl restart kubelet

# rkt list
UUID            APP             IMAGE NAME                                      STATE   CREATED         STARTED         NETWORKS
0a19fcca        flannel         quay.io/coreos/flannel:v0.6.2                   running 2 days ago      2 days ago
51868386        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        running 4 minutes ago   4 minutes ago

# mount | grep 5c8aed80-fd9d-11e6-8be
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/51868386-b927-491b-89ac-0208ec41f0e1/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)

# systemctl restart kubelet
# rkt list
UUID            APP             IMAGE NAME                                      STATE   CREATED         STARTED         NETWORKS
0a19fcca        flannel         quay.io/coreos/flannel:v0.6.2                   running 2 days ago      2 days ago
f3bb86c0        hyperkube       quay.io/coreos/hyperkube:v1.5.2_coreos.2        running 1 second ago    now
# mount | grep 5c8aed80-fd9d-11e6-8be
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/f3bb86c0-5279-4629-8cf5-528b7eac11c5/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)

Unmounting all of them by hand, and waiting for kubelet to recreate it:
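
Something like this does the unmounting (a sketch; the mount point is field 3 of mount's output):

# mount | grep 5c8aed80-fd9d-11e6-8be | awk '{print $3}' | xargs -r -n1 umount

kubelet then re-creates the mount: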

# journalctl -u kubelet -f
Feb 28 12:07:22 kubi-kube-worker-ycnoukyptmzw-0-xuflmcdypg5d1.novalocal kubelet-wrapper[4956]: I0228 12:07:22.351442    4956 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0-default-token-7j7wt" (spec.Name: "default-token-7j7wt") pod "5c8aed80-fd9d-11e6-8bec-fa163eac0cd0" (UID: "5c8aed80-fd9d-11e6-8bec-fa163eac0cd0").

# mount | grep 5c8aed80-fd9d-11e6-8be
tmpfs on /var/lib/rkt/pods/run/f3bb86c0-5279-4629-8cf5-528b7eac11c5/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/5c8aed80-fd9d-11e6-8bec-fa163eac0cd0/volumes/kubernetes.io~secret/default-token-7j7wt type tmpfs (rw,relatime,seclabel)

chris@home $ kubectl delete pod/nginx-test-3315381049-pnf9f
pod "nginx-test-3315381049-pnf9f" deleted

# sleep 30
# mount | grep 5c8aed80-fd9d-11e6-8bec || echo Nothing
Nothing

No Orphaned Pod!

chrigl commented Mar 1, 2017

@lucab I did another test, running hyperkube directly on the node without rkt. Without rkt, the problem does not exist.

# mkdir -p /opt/bin && cp /proc/$(pgrep kubelet)/root/hyperkube /opt/bin/
# systemctl stop kubelet.service
# systemctl cat kubelet-no-rkt.service
# /etc/systemd/system/kubelet-no-rkt.service
[Service]
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
EnvironmentFile=/etc/environment
EnvironmentFile=/etc/environment-kubelet
EnvironmentFile=/etc/environment-os-servername
ExecStart=/opt/bin/hyperkube kubelet \
  --require-kubeconfig \
  --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
  --network-plugin-dir=/etc/kubernetes/cni/net.d \
  --register-node=true \
  --allow-privileged=true \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --hostname-override=${OS_SERVER_NAME} \
  --tls-cert-file=/etc/kubernetes/ssl/worker.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem \
  --cluster_dns=10.3.0.53 \
  --cluster_domain=cluster.local

Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

# systemctl start kubelet-no-rkt.service
# ps -ef | grep kubelet
root      2355     1  7 09:14 ?        00:00:01 /opt/bin/hyperkube kubelet --require-kubeconfig --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --network-plugin-dir=/etc/kubernetes/cni/net.d --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=kubi2-kube-worker-re3egr3ck7tl-0-zysdhjsaglf51 --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --cluster_dns=10.3.0.53 --cluster_domain=cluster.local

# docker ps
CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
e526aaa5eb1c        nginx:latest                                         "nginx -g 'daemon off"   22 seconds ago      Up 22 seconds                           k8s_nginx-test.4e20122b_nginx-test-3245317857-hl2v6_default_cbabdb5c-fe5f-11e6-a0e1-fa163ed1b02a_b9af961c
1c60bbb2bad9        gcr.io/google_containers/pause-amd64:3.0             "/pause"                 24 seconds ago      Up 24 seconds                           k8s_POD.d8dbe16c_nginx-test-3245317857-hl2v6_default_cbabdb5c-fe5f-11e6-a0e1-fa163ed1b02a_e81cdc03

# mount | grep cbabdb5c-fe5f-11e6-a0e1-fa1
tmpfs on /var/lib/kubelet/pods/cbabdb5c-fe5f-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/dfede8ae-a2e4-411f-a2cc-729020b60b5c/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/cbabdb5c-fe5f-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)

# systemctl stop kubelet-no-rkt.service
# sleep 10
# systemctl start kubelet-no-rkt.service
# mount | grep cbabdb5c-fe5f-11e6-a0e1-fa1
tmpfs on /var/lib/kubelet/pods/cbabdb5c-fe5f-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/dfede8ae-a2e4-411f-a2cc-729020b60b5c/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/cbabdb5c-fe5f-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)

chris@home $ kubectl delete pod/nginx-test-3245317857-hl2v6
pod "nginx-test-3245317857-hl2v6" deleted

# mount | grep cbabdb5c-fe5f-11e6-a0e1-fa1 || echo "Nothing"
Nothing

While it still exists when using rkt:

# systemctl stop kubelet-no-rkt.service
# ps -ef | grep kubelet
root     12914  6882  0 09:27 pts/0    00:00:00 grep --colour=auto kubelet
# systemctl start kubelet.service
# ps -ef | grep kubelet
root     13021     1 99 09:27 ?        00:00:02 /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/kubelet-wrapper.uuid --trust-keys-from-https --volume etc-kubernetes,kind=host,source=/etc/kubernetes,readOnly=false --volume etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true --volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true --volume var-lib-docker,kind=host,source=/var/lib/docker,readOnly=false --volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,readOnly=false --volume os-release,kind=host,source=/usr/lib/os-release,readOnly=true --volume run,kind=host,source=/run,readOnly=false --mount volume=etc-kubernetes,target=/etc/kubernetes --mount volume=etc-ssl-certs,target=/etc/ssl/certs --mount volume=usr-share-certs,target=/usr/share/ca-certificates --mount volume=var-lib-docker,target=/var/lib/docker --mount volume=var-lib-kubelet,target=/var/lib/kubelet --mount volume=os-release,target=/etc/os-release --mount volume=run,target=/run --stage1-from-dir=stage1-fly.aci quay.io/coreos/hyperkube:v1.5.2_coreos.2 --exec=/kubelet -- --require-kubeconfig --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --network-plugin-dir=/etc/kubernetes/cni/net.d --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=kubi2-kube-worker-re3egr3ck7tl-0-zysdhjsaglf51 --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --cluster_dns=10.3.0.53 --cluster_domain=cluster.local
root     13066  6882  0 09:27 pts/0    00:00:00 grep --colour=auto kubelet


# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Service]
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
EnvironmentFile=/etc/environment
EnvironmentFile=/etc/environment-kubelet
EnvironmentFile=/etc/environment-os-servername
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/lib/coreos/kubelet-wrapper.uuid"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/kubelet-wrapper.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --require-kubeconfig \
  --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
  --network-plugin-dir=/etc/kubernetes/cni/net.d \
  --register-node=true \
  --allow-privileged=true \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --hostname-override=${OS_SERVER_NAME} \
  --tls-cert-file=/etc/kubernetes/ssl/worker.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem \
  --cluster_dns=10.3.0.53 \
  --cluster_domain=cluster.local
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/lib/coreos/kubelet-wrapper.uuid

Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

# docker ps
CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
ac7e9a5725fa        nginx:latest                                         "nginx -g 'daemon off"   1 seconds ago       Up 1 seconds                            k8s_nginx-test.4e20122b_nginx-test-3245317857-36hd9_default_8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a_c3e78ffd
8fe71b21e537        gcr.io/google_containers/pause-amd64:3.0             "/pause"                 3 seconds ago       Up 2 seconds                            k8s_POD.d8dbe16c_nginx-test-3245317857-36hd9_default_8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a_92e3d2da

# mount | grep 8b5a54aa-fe61-11e6-a0e1-
tmpfs on /var/lib/rkt/pods/run/a3ce979c-cbed-44a4-b8f3-4c736fdbda3a/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)

After another kubelet restart:

# mount | grep 8b5a54aa-fe61-11e6-a0e1-
tmpfs on /var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/rkt/pods/run/f3d0b790-9ab1-463a-9e25-930d7dc50949/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)

chris@home $ kubectl delete pod/nginx-test-3245317857-36hd9
pod "nginx-test-3245317857-36hd9" deleted

# journalctl -u kubelet -f
Mar 01 09:31:36 kubi2-kube-worker-re3egr3ck7tl-0-zysdhjsaglf51.novalocal kubelet-wrapper[15818]: E0301 09:31:36.857857   15818 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a-default-token-rv8md\" (\"8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a\")" failed. No retries permitted until 2017-03-01 09:31:40.857818576 +0000 UTC (durationBeforeRetry 4s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a-default-token-rv8md" (volume.spec.Name: "default-token-rv8md") pod "8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a" (UID: "8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a") with: rename /var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md /var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/wrapped_default-token-rv8md.deleting~760049944: device or resource busy

# mount | grep 8b5a54aa-fe61-11e6-a0e1-
tmpfs on /var/lib/kubelet/pods/8b5a54aa-fe61-11e6-a0e1-fa163ed1b02a/volumes/kubernetes.io~secret/default-token-rv8md type tmpfs (rw,relatime,seclabel)

lucab commented Mar 1, 2017

/cc @pbx0

lucab commented Mar 29, 2017

As per kubernetes/kubernetes#38498 (comment), the var-lib-kubelet volume in kubelet-wrapper may require an additional recursive=true option so that existing mounts in the host mount namespace are exposed as mounts in the kubelet mount namespace. Without this, kubelet gets confused by leftover pod volumes appearing as plain directories without their mountpoint counterpart.

The only bit I'd like to double-check before turning this on by default is ensuring that the recursive option doesn't propagate an additional mount back to the host namespace.
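
Concretely, that would mean extending the var-lib-kubelet volume declaration in kubelet-wrapper along these lines (a sketch of the rkt run flags):

--volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,readOnly=false,recursive=true \
--mount volume=var-lib-kubelet,target=/var/lib/kubelet \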

/cc @euank @aaronlevy

lucab added a commit to lucab/coreos-overlay that referenced this issue Apr 5, 2017
… mount

So far `/var/lib/kubelet` was mounted as an implicit non-recursive mount.
This changes the wrapper to an explicit recursive mount.

As shown in kubernetes/kubernetes#38498 (comment),
current non-recursive behavior seems to confuse the kubelet, which
is incapable of cleaning up resources for orphaned pods, as the
existing mountpoints for them are not available inside the kubelet
chroot.
With `recursive=true`, those mounts are made available in the
chroot and can be unmounted on the host side from the kubelet
chroot via shared back-propagation.

Fixes coreos/bugs#1831
euank pushed a commit to euank/coreos-overlay that referenced this issue May 30, 2017
… mount
euank pushed a commit to euank/coreos-overlay that referenced this issue May 30, 2017
… mount
ChrisMcKenzie pushed a commit to ChrisMcKenzie/coreos-overlay that referenced this issue Dec 9, 2017
… mount