
CoreDNS pod dies when trying to resolve? #1986

Closed
ivanovaleksandar opened this issue Jul 19, 2018 · 18 comments · 4 participants
@ivanovaleksandar (Author) commented Jul 19, 2018

When using Kubernetes v1.11 and executing a simple DNS query within the CoreDNS container (or from any other container that tries to resolve a hostname), the pod dies immediately with no additional logs:

k logs -n kube-system coredns-78fcdf6894-4kv6n
.:53
2018/07/19 15:23:06 [INFO] CoreDNS-1.1.3
2018/07/19 15:23:06 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/07/19 15:23:06 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c

Some simple troubleshooting:

k exec -it coredns-78fcdf6894-4kv6n -n kube-system sh
/ # dig google.com
command terminated with exit code 137

Any idea what goes wrong here?

@chrisohaver (Member) commented Jul 19, 2018

Does the same thing happen when you query from a different pod?

Does it do the same if you query the pod directly via its pod IP? e.g. dig @pod-ip google.com

Do you get a crash if you do an internal name lookup? e.g. dig kubernetes.default.svc.cluster.local.

@ivanovaleksandar (Author) commented Jul 19, 2018

It crashes when I query from another pod and when I query the CoreDNS pod's IP directly (dig @pod-ip google.com).

But I do not get a crash when I do the internal name lookup:

k exec -it -n kube-system coredns-78fcdf6894-4kv6n sh
/ # dig kubernetes.default.svc.cluster.local.

; <<>> DiG 9.11.2-P1 <<>> kubernetes.default.svc.cluster.local.
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28438
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c93b86c1f3221b89 (echoed)
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 5	IN A	10.96.0.1

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Thu Jul 19 15:35:18 UTC 2018
;; MSG SIZE  rcvd: 129

@chrisohaver (Member) commented Jul 19, 2018

Can you query your upstream DNS server? e.g. dig @upstream-dns-ip google.com

@miekg (Member) commented Jul 19, 2018

@ivanovaleksandar (Author) commented Jul 19, 2018

Hmmm... same thing, I guess. It dies immediately.

dig @127.0.0.53 google.com
command terminated with exit code 137
dig @8.8.8.8 google.com

; <<>> DiG 9.11.2-P1 <<>> @8.8.8.8 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45514
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		145	IN	A	172.217.17.142

;; Query time: 1 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Jul 19 15:43:20 UTC 2018
;; MSG SIZE  rcvd: 55

and the full /var/log/syslog:

Jul 19 15:50:30 kubernetes-master kernel: [189136.073938] coredns invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=992
Jul 19 15:50:30 kubernetes-master kernel: [189136.073939] coredns cpuset=867fd56120b2e88f86d1e39c22b9bd4f153dbfb6820bf8fa3c1817607f8d7357 mems_allowed=0
Jul 19 15:50:30 kubernetes-master kernel: [189136.073946] CPU: 0 PID: 29170 Comm: coredns Not tainted 4.15.0-23-generic #25-Ubuntu
Jul 19 15:50:30 kubernetes-master kernel: [189136.073947] Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
Jul 19 15:50:30 kubernetes-master kernel: [189136.073947] Call Trace:
Jul 19 15:50:30 kubernetes-master kernel: [189136.073957]  dump_stack+0x63/0x8b
Jul 19 15:50:30 kubernetes-master kernel: [189136.073963]  dump_header+0x71/0x285
Jul 19 15:50:30 kubernetes-master kernel: [189136.073966]  oom_kill_process+0x220/0x440
Jul 19 15:50:30 kubernetes-master kernel: [189136.073968]  out_of_memory+0x2d1/0x4f0
Jul 19 15:50:30 kubernetes-master kernel: [189136.073971]  mem_cgroup_out_of_memory+0x4b/0x80
Jul 19 15:50:30 kubernetes-master kernel: [189136.073974]  mem_cgroup_oom_synchronize+0x2e8/0x320
Jul 19 15:50:30 kubernetes-master kernel: [189136.073976]  ? mem_cgroup_css_online+0x40/0x40
Jul 19 15:50:30 kubernetes-master kernel: [189136.073979]  pagefault_out_of_memory+0x36/0x7b
Jul 19 15:50:30 kubernetes-master kernel: [189136.073983]  mm_fault_error+0x90/0x180
Jul 19 15:50:30 kubernetes-master kernel: [189136.073985]  __do_page_fault+0x4a5/0x4d0
Jul 19 15:50:30 kubernetes-master kernel: [189136.073990]  ? aa_sock_opt_perm+0x1c/0x30
Jul 19 15:50:30 kubernetes-master kernel: [189136.073992]  do_page_fault+0x2e/0xe0
Jul 19 15:50:30 kubernetes-master kernel: [189136.073996]  ? async_page_fault+0x2f/0x50
Jul 19 15:50:30 kubernetes-master kernel: [189136.074000]  do_async_page_fault+0x51/0x80
Jul 19 15:50:30 kubernetes-master kernel: [189136.074002]  async_page_fault+0x45/0x50
Jul 19 15:50:30 kubernetes-master kernel: [189136.074005] RIP: 0033:0x45ad5a
Jul 19 15:50:30 kubernetes-master kernel: [189136.074006] RSP: 002b:000000c42001dc88 EFLAGS: 00010202
Jul 19 15:50:30 kubernetes-master kernel: [189136.074009] RAX: 0000000000000080 RBX: 0000000000001be0 RCX: 000000c428d7c000
Jul 19 15:50:30 kubernetes-master kernel: [189136.074010] RDX: 000000c428d7a000 RSI: 000000c428d7a420 RDI: 000000c42d3c63a0
Jul 19 15:50:30 kubernetes-master kernel: [189136.074011] RBP: 000000c42001de38 R08: 000000c42d38c900 R09: 0000000000000000
Jul 19 15:50:30 kubernetes-master kernel: [189136.074012] R10: 000000c42d3c6388 R11: 0000000000000018 R12: 0000000000000000
Jul 19 15:50:30 kubernetes-master kernel: [189136.074014] R13: 00000000000000f7 R14: 0000000000000077 R15: 0000000000000000
Jul 19 15:50:30 kubernetes-master kernel: [189136.074016] Task in /kubepods/burstable/podd932ca48-8b58-11e8-a796-7625bd182864/867fd56120b2e88f86d1e39c22b9bd4f153dbfb6820bf8fa3c1817607f8d7357 killed as a result of limit of /kubepods/burstable/podd932ca48-8b58-11e8-a796-7625bd182864
Jul 19 15:50:30 kubernetes-master kernel: [189136.074023] memory: usage 174080kB, limit 174080kB, failcnt 2332
Jul 19 15:50:30 kubernetes-master kernel: [189136.074025] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Jul 19 15:50:30 kubernetes-master kernel: [189136.074026] kmem: usage 15140kB, limit 9007199254740988kB, failcnt 0
Jul 19 15:50:30 kubernetes-master kernel: [189136.074027] Memory cgroup stats for /kubepods/burstable/podd932ca48-8b58-11e8-a796-7625bd182864: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 19 15:50:30 kubernetes-master kernel: [189136.074039] Memory cgroup stats for /kubepods/burstable/podd932ca48-8b58-11e8-a796-7625bd182864/b492f1a750ff8c8484b26daca99543eb2f9bccbc28de3cc22198f69cfdd6126a: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:44KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 19 15:50:30 kubernetes-master kernel: [189136.074052] Memory cgroup stats for /kubepods/burstable/podd932ca48-8b58-11e8-a796-7625bd182864/825ad93a46212dc692ddd2dba9099f69e375fddcde2cb619f91fc4656906155c: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 19 15:50:30 kubernetes-master kernel: [189136.074064] Memory cgroup stats for /kubepods/burstable/podd932ca48-8b58-11e8-a796-7625bd182864/867fd56120b2e88f86d1e39c22b9bd4f153dbfb6820bf8fa3c1817607f8d7357: cache:0KB rss:158896KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:158820KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 19 15:50:30 kubernetes-master kernel: [189136.074076] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jul 19 15:50:30 kubernetes-master kernel: [189136.074168] [12712]     0 12712      256        1    32768        0          -998 pause
Jul 19 15:50:30 kubernetes-master kernel: [189136.074184] [29096]     0 29096    66016    44328   577536        0           992 coredns
Jul 19 15:50:30 kubernetes-master kernel: [189136.074188] [29192]     0 29192      386        1    45056        0           992 sh
Jul 19 15:50:30 kubernetes-master kernel: [189136.074190] [29261]     0 29261     7147      496    98304        0           992 dig
Jul 19 15:50:30 kubernetes-master kernel: [189136.074191] Memory cgroup out of memory: Kill process 29096 (coredns) score 2001 or sacrifice child
Jul 19 15:50:30 kubernetes-master kernel: [189136.076939] Killed process 29096 (coredns) total-vm:264064kB, anon-rss:156668kB, file-rss:20644kB, shmem-rss:0kB
Jul 19 15:50:30 kubernetes-master kernel: [189136.085174] oom_reaper: reaped process 29096 (coredns), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jul 19 15:50:30 kubernetes-master dockerd[1050]: time="2018-07-19T15:50:30Z" level=info msg="shim reaped" id=867fd56120b2e88f86d1e39c22b9bd4f153dbfb6820bf8fa3c1817607f8d7357 module="containerd/tasks"
Jul 19 15:50:30 kubernetes-master dockerd[1050]: time="2018-07-19T15:50:30.893358553Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jul 19 15:50:31 kubernetes-master kubelet[9505]: I0719 15:50:31.523676    9505 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-w9qr8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 19 15:50:31 kubernetes-master kubelet[9505]: I0719 15:50:31.524631    9505 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:50:31 kubernetes-master kubelet[9505]: I0719 15:50:31.524805    9505 kuberuntime_manager.go:767] Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)
Jul 19 15:50:31 kubernetes-master kubelet[9505]: E0719 15:50:31.524859    9505 pod_workers.go:186] Error syncing pod d932ca48-8b58-11e8-a796-7625bd182864 ("coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:50:32 kubernetes-master /usr/bin/topbeat[995]: single.go:126: Connecting error publishing events (retrying): dial tcp 178.237.33.86:5002: i/o timeout
Jul 19 15:50:32 kubernetes-master /usr/bin/topbeat[995]: single.go:154: send fail
Jul 19 15:50:32 kubernetes-master /usr/bin/topbeat[995]: single.go:161: backoff retry: 1m0s
Jul 19 15:50:32 kubernetes-master kubelet[9505]: I0719 15:50:32.545852    9505 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-w9qr8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 19 15:50:32 kubernetes-master kubelet[9505]: I0719 15:50:32.546484    9505 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:50:32 kubernetes-master kubelet[9505]: I0719 15:50:32.546757    9505 kuberuntime_manager.go:767] Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)
Jul 19 15:50:32 kubernetes-master kubelet[9505]: E0719 15:50:32.546901    9505 pod_workers.go:186] Error syncing pod d932ca48-8b58-11e8-a796-7625bd182864 ("coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:50:47 kubernetes-master kubelet[9505]: I0719 15:50:47.575541    9505 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-w9qr8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 19 15:50:47 kubernetes-master kubelet[9505]: I0719 15:50:47.575712    9505 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:50:47 kubernetes-master kubelet[9505]: I0719 15:50:47.575882    9505 kuberuntime_manager.go:767] Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)
Jul 19 15:50:47 kubernetes-master kubelet[9505]: E0719 15:50:47.575919    9505 pod_workers.go:186] Error syncing pod d932ca48-8b58-11e8-a796-7625bd182864 ("coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:51:01 kubernetes-master kubelet[9505]: I0719 15:51:01.578317    9505 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-w9qr8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 19 15:51:01 kubernetes-master kubelet[9505]: I0719 15:51:01.578488    9505 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"
Jul 19 15:51:01 kubernetes-master kubelet[9505]: I0719 15:51:01.578670    9505 kuberuntime_manager.go:767] Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)
Jul 19 15:51:01 kubernetes-master kubelet[9505]: E0719 15:51:01.578706    9505 pod_workers.go:186] Error syncing pod d932ca48-8b58-11e8-a796-7625bd182864 ("coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-78fcdf6894-4kv6n_kube-system(d932ca48-8b58-11e8-a796-7625bd182864)"

The strange thing is that it is the OOM killer, but the (master) node is under no memory or CPU pressure.

@miekg (Member) commented Jul 19, 2018

@chrisohaver (Member) commented Jul 19, 2018

Ah, OK... 127.0.0.53, from the CoreDNS pod's perspective, is "itself", not the host node's local DNS resolver (systemd-resolved) process. So basically, it is looping forever on itself.

You will need to change your upstream to the actual IP of your upstream server, not the host node's local resolver.

@chrisohaver (Member) commented Jul 19, 2018

You will need to change your upstream to the actual IP of your upstream server, not the host node's local resolver.

In other words, in your CoreDNS Corefile, change the proxy lines to point to the real IPs of the upstream servers, not 127.0.0.53...
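For illustration only, a sketch of what that could look like in a kubeadm-style Corefile; the addresses 8.8.8.8 and 8.8.4.4 below are placeholders, substitute the real IPs of your upstream resolvers:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    # was: proxy . /etc/resolv.conf   (on a systemd-resolved host this resolves to 127.0.0.53)
    proxy . 8.8.8.8 8.8.4.4
    cache 30
    reload
}

The Corefile typically lives in the coredns ConfigMap in kube-system (kubectl -n kube-system edit configmap coredns); with the reload plugin enabled CoreDNS should pick up the change, otherwise restart the CoreDNS pods.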

@miekg (Member) commented Jul 19, 2018

@chrisohaver (Member) commented Jul 19, 2018

We can't fix the core issue in CoreDNS, but we can make CoreDNS "fail better" here. For example, we could make proxy and forward return a SERVFAIL and log an error/warning before trying to forward requests to any 127.0.0.0/8 address. We'd also want to introduce an option to these plugins to turn this off, thus allowing local forwarding (e.g. "allow local").

... or instead we could detect it at Corefile read time, when the proxy / forward plugins read their configuration, and log errors / die before starting.

@miekg (Member) commented Jul 19, 2018

@chrisohaver (Member) commented Jul 19, 2018

So that approach just papers over the problem.

I understand that. It's a quick patch to this specific problem, which has affected several people. This is kind of how this conversation went the last time, which resulted in no fix at all.

@chrisohaver (Member) commented Jul 19, 2018

While we look for the perfect fix, how about we just warn in the logs if we see a local address in proxy/forward at config load time... No operational behavior change, just a warning in the logs. That way, the failure is less mysterious.

@chrisohaver (Member) commented Jul 19, 2018

@ivanovaleksandar (Author) commented Jul 20, 2018

As @chrisohaver suggested, I have resolved this by removing the loopback address and pointing the kubelet at a different file via --resolv-conf=/run/systemd/resolve/resolv.conf. As I recall, the DNS pod picks up /etc/resolv.conf from the host.
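For anyone hitting the same problem, a rough sketch of that kubelet change on a kubeadm/systemd setup; the file location and the pod label below are illustrative and may differ on your install:

# /etc/default/kubelet (or a systemd drop-in for kubelet.service) -- illustrative location
KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf

# restart the kubelet, then recreate the CoreDNS pods so they inherit the new resolv.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet
kubectl -n kube-system delete pod -n kube-system -l k8s-app=kube-dns

The point is that /run/systemd/resolve/resolv.conf lists the real upstream nameservers, whereas the default /etc/resolv.conf on systemd-resolved hosts only lists the 127.0.0.53 stub, which is what CoreDNS was looping on.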

Anyway, better logging to indicate that using a loopback upstream should be avoided would help, because if the Kubernetes deployment YAML (or I) had not set resource limits, the node would have crashed as well.

Thanks guys!

Should I close this issue (because there are others with the same core problem floating around)?

@miekg (Member) commented Jul 20, 2018

@ivanovaleksandar (Author) commented Jul 20, 2018

Perfect!
Closing now.

@tathagatk22 commented Nov 14, 2018

As @chrisohaver suggested, I have resolved this by removing the loopback address and pointing the kubelet at a different file via --resolv-conf=/run/systemd/resolve/resolv.conf. As I recall, the DNS pod picks up /etc/resolv.conf from the host.

Anyway, better logging to indicate that using a loopback upstream should be avoided would help, because if the Kubernetes deployment YAML (or I) had not set resource limits, the node would have crashed as well.

Thanks guys!

Should I close this issue (because there are others with the same core problem floating around)?

Can you please tell me how you removed the loopback and referenced a different file in the kubelet config?

Can you please elaborate on this process for newbies... ;)
