kube-dns keep CrashLoopBackOff #352

@emiliajin

kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:13:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

kubectl get pods -n=kube-system
NAME                                    READY   STATUS             RESTARTS   AGE
heapster-584cf4954-gthsf                2/2     Running            0          3d
kube-dns-v20-7c556f89c5-4bg8m           1/3     CrashLoopBackOff   2066       3d
kube-dns-v20-7c556f89c5-nhl6k           1/3     CrashLoopBackOff   1301       3d
kube-proxy-6trjw                        1/1     Running            0          3d
kube-proxy-9ktmb                        1/1     Running            0          3d
kube-proxy-qs8l5                        1/1     Running            0          3d
kube-svc-redirect-6wsw6                 1/1     Running            0          3d
kube-svc-redirect-bcttt                 1/1     Running            0          3d
kube-svc-redirect-l2dns                 1/1     Running            0          3d
kubernetes-dashboard-546f987686-qj2w5   1/1     Running            0          3d
tunnelfront-7b75c8bcf8-shv6k            1/1     Running            319        3d

kube-dns on our AKS cluster in eastus has never worked properly. I upgraded to the newest version available in Azure to see whether that would solve the problem, but the pods still keep crashing. The strange part is the timing: if I delete the kube-dns pods to force the Deployment to recreate them, the new pods run fine for about the first 3 hours and then start to crash. For a while after that (within 3 days) one pod stays running while the other crashes; after 3 days both pods have crashed and never return to the Running state.

Check pod status:
kubectl describe pods kube-dns -n=kube-system
Type Reason Age From Message


Normal Pulled 53m (x956 over 3d) kubelet, aks-agentpool-35862059-3 Container image "k8s-gcrio.azureedge.net/k8s-dns-kube-dns-amd64:1.14.8" already present on machine
Warning Unhealthy 37m (x4427 over 3d) kubelet, aks-agentpool-35862059-3 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 27m (x13588 over 3d) kubelet, aks-agentpool-35862059-3 Back-off restarting failed container
Warning BackOff 12m (x11916 over 3d) kubelet, aks-agentpool-35862059-3 Back-off restarting failed container
Normal Killing 7m (x966 over 3d) kubelet, aks-agentpool-35862059-3 Killing container with id docker://kubedns:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 3m (x4087 over 3d) kubelet, aks-agentpool-35862059-3 Liveness probe failed: Get http://10.244.3.5:8080/healthz-kubedns: dial tcp 10.244.3.5:8080: getsockopt: connection refused

Type Reason Age From Message


Normal Killing 50m (x599 over 3d) kubelet, aks-agentpool-35862059-0 Killing container with id docker://kubedns:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 30m (x700 over 3d) kubelet, aks-agentpool-35862059-0 Container image "k8s-gcrio.azureedge.net/exechealthz-amd64:1.2" already present on machine
Warning Unhealthy 15m (x2595 over 3d) kubelet, aks-agentpool-35862059-0 Liveness probe failed: Get http://10.244.4.7:8080/healthz-kubedns: dial tcp 10.244.4.7:8080: getsockopt: connection refused
Warning Unhealthy 10m (x2829 over 3d) kubelet, aks-agentpool-35862059-0 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 4m (x7453 over 3d) kubelet, aks-agentpool-35862059-0 Back-off restarting failed container
Warning BackOff 4s (x8589 over 3d) kubelet, aks-agentpool-35862059-0 Back-off restarting failed container
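
To compare how often each kind of event fired on the two nodes, the repeat counts can be pulled out of the pasted `kubectl describe` text. This is only a minimal sketch: the function name `summarize_events` and the regular expression are my own, and the pattern assumes the exact column layout shown above (type, reason, age, `(xN over D)`, `kubelet, <node>`, message). Other kubectl versions may print events differently.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the events pasted above, e.g.
# "Warning Unhealthy 37m (x4427 over 3d) kubelet, <node> <message>"
EVENT_RE = re.compile(
    r"^(?P<type>Normal|Warning)\s+(?P<reason>\S+)\s+\S+\s+"
    r"\(x(?P<count>\d+) over (?P<span>\S+)\)\s+"
    r"kubelet, (?P<node>\S+)\s+(?P<message>.*)$"
)


def summarize_events(describe_output):
    """Sum the repeat counts per (node, reason) pair.

    Lines that do not match the assumed event layout are skipped.
    """
    totals = defaultdict(int)
    for line in describe_output.splitlines():
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue
        key = (match.group("node"), match.group("reason"))
        totals[key] += int(match.group("count"))
    return dict(totals)


if __name__ == "__main__":
    sample = """\
Warning Unhealthy 37m (x4427 over 3d) kubelet, aks-agentpool-35862059-3 Liveness probe failed: HTTP probe failed with statuscode: 503
Normal Killing 7m (x966 over 3d) kubelet, aks-agentpool-35862059-3 Killing container with id docker://kubedns:Container failed liveness probe.. Container will be killed and recreated.
"""
    for (node, reason), count in sorted(summarize_events(sample).items()):
        print(node, reason, count)
```

Summing per reason merges the two different liveness-probe failures (the 503 and the refused connection) into one `Unhealthy` total; keying on the message as well would keep them apart.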

Check logs:
==== START logs for container kubedns of pod kube-system/kube-dns-v20-7c556f89c5-4bg8m ====
I0508 08:44:37.557004 1 dns.go:48] version: 1.14.8
I0508 08:44:37.560846 1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0508 08:44:37.560897 1 server.go:119] FLAG: --alsologtostderr="false"
I0508 08:44:37.560938 1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0508 08:44:37.560955 1 server.go:119] FLAG: --config-map=""
I0508 08:44:37.560961 1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0508 08:44:37.560967 1 server.go:119] FLAG: --config-period="10s"
I0508 08:44:37.561050 1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0508 08:44:37.561069 1 server.go:119] FLAG: --dns-port="10053"
I0508 08:44:37.561078 1 server.go:119] FLAG: --domain="cluster.local."
I0508 08:44:37.561085 1 server.go:119] FLAG: --federations=""
I0508 08:44:37.561092 1 server.go:119] FLAG: --healthz-port="8081"
I0508 08:44:37.561098 1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0508 08:44:37.561111 1 server.go:119] FLAG: --kube-master-url=""
I0508 08:44:37.561142 1 server.go:119] FLAG: --kubecfg-file="/config/kubeconfig"
I0508 08:44:37.561150 1 server.go:119] FLAG: --log-backtrace-at=":0"
I0508 08:44:37.561166 1 server.go:119] FLAG: --log-dir=""
I0508 08:44:37.561173 1 server.go:119] FLAG: --log-flush-frequency="5s"
I0508 08:44:37.561179 1 server.go:119] FLAG: --logtostderr="true"
I0508 08:44:37.561277 1 server.go:119] FLAG: --nameservers=""
I0508 08:44:37.561284 1 server.go:119] FLAG: --stderrthreshold="2"
I0508 08:44:37.561290 1 server.go:119] FLAG: --v="2"
I0508 08:44:37.561295 1 server.go:119] FLAG: --version="false"
I0508 08:44:37.561312 1 server.go:119] FLAG: --vmodule=""
I0508 08:44:37.561493 1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0508 08:44:37.561652 1 server.go:222] Skydns metrics not enabled
I0508 08:44:37.561726 1 dns.go:146] Starting endpointsController
I0508 08:44:37.561756 1 dns.go:149] Starting serviceController
I0508 08:44:37.562467 1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0508 08:44:37.561842 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0508 08:44:37.562612 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0508 08:44:38.062754 1 dns.go:170] Initialized services and endpoints from apiserver
I0508 08:44:38.062811 1 server.go:135] Setting up Healthz Handler (/readiness)
I0508 08:44:38.062823 1 server.go:140] Setting up cache handler (/cache)
I0508 08:44:38.062839 1 server.go:126] Status HTTP port 8081
==== END logs for container kubedns of pod kube-system/kube-dns-v20-7c556f89c5-4bg8m ====
==== START logs for container dnsmasq of pod kube-system/kube-dns-v20-7c556f89c5-4bg8m ====
I0504 11:46:36.291639 1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=-] true} /kube-dns-config 10000000000}
I0504 11:46:36.291907 1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0504 11:46:36.291956 1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=- --server 8.8.8.8 --server 8.8.4.4 --no-resolv]
I0504 11:46:37.216866 1 nanny.go:119]
W0504 11:46:37.216909 1 nanny.go:120] Got EOF from stdout
I0504 11:46:37.216955 1 nanny.go:116] dnsmasq[11]: started, version 2.78 cachesize 1000
I0504 11:46:37.216969 1 nanny.go:116] dnsmasq[11]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0504 11:46:37.216981 1 nanny.go:116] dnsmasq[11]: using nameserver 8.8.4.4#53
I0504 11:46:37.216987 1 nanny.go:116] dnsmasq[11]: using nameserver 8.8.8.8#53
I0504 11:46:37.216999 1 nanny.go:116] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0504 11:46:37.217005 1 nanny.go:116] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0504 11:46:37.217011 1 nanny.go:116] dnsmasq[11]: using nameserver 127.0.0.1#10053
I0504 11:46:37.217078 1 nanny.go:116] dnsmasq[11]: read /etc/hosts - 7 addresses
==== END logs for container dnsmasq of pod kube-system/kube-dns-v20-7c556f89c5-4bg8m ====
==== START logs for container healthz of pod kube-system/kube-dns-v20-7c556f89c5-4bg8m ====
==== END logs for container healthz of pod kube-system/kube-dns-v20-7c556f89c5-4bg8m ====
==== START logs for container kubedns of pod kube-system/kube-dns-v20-7c556f89c5-nhl6k ====
I0508 08:45:21.673568 1 dns.go:48] version: 1.14.8
I0508 08:45:21.678171 1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0508 08:45:21.678225 1 server.go:119] FLAG: --alsologtostderr="false"
I0508 08:45:21.678235 1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0508 08:45:21.678251 1 server.go:119] FLAG: --config-map=""
I0508 08:45:21.678256 1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0508 08:45:21.678262 1 server.go:119] FLAG: --config-period="10s"
I0508 08:45:21.678269 1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0508 08:45:21.678275 1 server.go:119] FLAG: --dns-port="10053"
I0508 08:45:21.678289 1 server.go:119] FLAG: --domain="cluster.local."
I0508 08:45:21.678297 1 server.go:119] FLAG: --federations=""
I0508 08:45:21.678304 1 server.go:119] FLAG: --healthz-port="8081"
I0508 08:45:21.678310 1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0508 08:45:21.678323 1 server.go:119] FLAG: --kube-master-url=""
I0508 08:45:21.678330 1 server.go:119] FLAG: --kubecfg-file="/config/kubeconfig"
I0508 08:45:21.678336 1 server.go:119] FLAG: --log-backtrace-at=":0"
I0508 08:45:21.678351 1 server.go:119] FLAG: --log-dir=""
I0508 08:45:21.678357 1 server.go:119] FLAG: --log-flush-frequency="5s"
I0508 08:45:21.678363 1 server.go:119] FLAG: --logtostderr="true"
I0508 08:45:21.678369 1 server.go:119] FLAG: --nameservers=""
I0508 08:45:21.678374 1 server.go:119] FLAG: --stderrthreshold="2"
I0508 08:45:21.678387 1 server.go:119] FLAG: --v="2"
I0508 08:45:21.678393 1 server.go:119] FLAG: --version="false"
I0508 08:45:21.678408 1 server.go:119] FLAG: --vmodule=""
I0508 08:45:21.678630 1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0508 08:45:21.678695 1 server.go:222] Skydns metrics not enabled
I0508 08:45:21.678703 1 dns.go:146] Starting endpointsController
I0508 08:45:21.678708 1 dns.go:149] Starting serviceController
I0508 08:45:21.678798 1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0508 08:45:21.679430 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0508 08:45:21.679439 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0508 08:45:22.178969 1 dns.go:170] Initialized services and endpoints from apiserver
I0508 08:45:22.178992 1 server.go:135] Setting up Healthz Handler (/readiness)
I0508 08:45:22.179001 1 server.go:140] Setting up cache handler (/cache)
I0508 08:45:22.179015 1 server.go:126] Status HTTP port 8081
I0508 08:47:01.114531 1 server.go:160] Ignoring signal terminated (can only be terminated by SIGKILL)
==== END logs for container kubedns of pod kube-system/kube-dns-v20-7c556f89c5-nhl6k ====
==== START logs for container dnsmasq of pod kube-system/kube-dns-v20-7c556f89c5-nhl6k ====
I0504 12:01:28.529855 1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=-] true} /kube-dns-config 10000000000}
I0504 12:01:28.530121 1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0504 12:01:28.530174 1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --log-facility=- --server 8.8.8.8 --server 8.8.4.4 --no-resolv]
I0504 12:01:29.489200 1 nanny.go:119]
W0504 12:01:29.489470 1 nanny.go:120] Got EOF from stdout
I0504 12:01:29.489325 1 nanny.go:116] dnsmasq[10]: started, version 2.78 cachesize 1000
I0504 12:01:29.490781 1 nanny.go:116] dnsmasq[10]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0504 12:01:29.491161 1 nanny.go:116] dnsmasq[10]: using nameserver 8.8.4.4#53
I0504 12:01:29.491426 1 nanny.go:116] dnsmasq[10]: using nameserver 8.8.8.8#53
I0504 12:01:29.491454 1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0504 12:01:29.491492 1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0504 12:01:29.491507 1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053
I0504 12:01:29.491513 1 nanny.go:116] dnsmasq[10]: read /etc/hosts - 7 addresses
==== END logs for container dnsmasq of pod kube-system/kube-dns-v20-7c556f89c5-nhl6k ====
==== START logs for container healthz of pod kube-system/kube-dns-v20-7c556f89c5-nhl6k ====
2018/05/08 08:42:34 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-05-08 08:42:07.361242339 +0000 UTC, error exit status 1
2018/05/08 08:42:44 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-05-08 08:42:07.361242339 +0000 UTC, error exit status 1
2018/05/08 08:42:54 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-05-08 08:42:47.386510431 +0000 UTC, error exit status 1
2018/05/08 08:43:04 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2018-05-08 08:42:47.386510431 +0000 UTC, error exit status 1
==== END logs for container healthz of pod kube-system/kube-dns-v20-7c556f89c5-nhl6k ====

I have no idea how to resolve this. Any ideas?
