
Plugin [loop] does not work with systemd-resolved running #2087

Closed
mritd opened this issue Sep 6, 2018 · 32 comments

Comments

@mritd commented Sep 6, 2018

When systemd-resolved is running, the nameserver in /etc/resolv.conf defaults to 127.0.0.53.
The loop plugin then detects its own query coming back, and CoreDNS fails to start.

Environment:

  • Ubuntu 18.04.1
  • Kubernetes 1.11.2
  • CoreDNS 1.2.2

Error log:

docker1.node ➜  kubectl logs coredns-55f86bf584-7sbtj -n kube-system
.:53
2018/09/06 13:02:45 [INFO] CoreDNS-1.2.2
2018/09/06 13:02:45 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/09/06 13:02:45 [INFO] plugin/reload: Running configuration MD5 = 86e5222d14b17c8b907970f002198e96
2018/09/06 13:02:45 [FATAL] plugin/loop: Seen "HINFO IN 2050421060481615995.5620656063561519376." more than twice, loop detected

Deploy with deploy.sh

@chrisohaver (Member) commented Sep 6, 2018

This is working as intended. The loop plugin has detected a forwarding loop, caused by systemd-resolved. If CoreDNS didn't exit, it would loop "forever" on the first upstream query it receives and get OOM killed.

The best fix is to add a flag to kubelet, to let it know that it should use the original resolv.conf:
--resolv-conf=/run/systemd/resolve/resolv.conf. Then restart the CoreDNS pods.
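On a kubeadm-installed node, one way to wire that flag in is a kubelet systemd drop-in. This is only a sketch: the drop-in filename below is hypothetical, and some kubeadm versions read the flag from /var/lib/kubelet/kubeadm-flags.env instead.

```ini
# /etc/systemd/system/kubelet.service.d/20-resolv-conf.conf (hypothetical name)
[Service]
Environment="KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"
```

After saving it, run `systemctl daemon-reload` and `systemctl restart kubelet`, then delete the CoreDNS pods so they are recreated with the corrected resolv.conf.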

@mritd (Author) commented Sep 6, 2018

Thanks for your answer, this is a good idea. (I just solved it by stopping systemd-resolved, stupid me 😂).

mritd closed this Sep 6, 2018
@miekg (Member) commented Sep 8, 2018

@avaikararkin commented Sep 28, 2018

I am facing the same issue:

[root@faas-cent1 ~]# kubectl logs coredns-7f4b9fccc6-6bg7s -n kube-system
.:53
2018/09/28 09:24:50 [INFO] CoreDNS-1.2.2
2018/09/28 09:24:50 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/09/28 09:24:50 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/09/28 09:24:56 [FATAL] plugin/loop: Seen "HINFO IN 6010196033322906137.8653621564656081764." more than twice, loop detected

This is on CentOS 7, and no, my /etc/resolv.conf does not have a 127.x entry.
It is this:

[root@faas-cent1 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 10.148.20.5
[root@faas-cent1 ~]#

[root@faas-cent1 ~]# docker --version
Docker version 18.06.1-ce, build e68fc7a
[root@faas-cent1 ~]#

[root@faas-cent1 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T17:02:38Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
[root@faas-cent1 ~]#

[root@faas-cent1 ~]# uname -a
Linux faas-cent1 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@faas-cent1 ~]#

I don't have a file /run/systemd/resolve/resolv.conf on my system to try the workaround.
dnsmasq seems to be running on the system, though; would that be causing this issue?

@Asisranjan commented Oct 4, 2018

I am getting the same error too.
.:53
2018/10/04 12:18:47 [INFO] CoreDNS-1.2.2
2018/10/04 12:18:47 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/10/04 12:18:47 [INFO] plugin/reload: Running configuration MD5 = 486384b491cef6cb69c1f57a02087363
2018/10/04 12:18:53 [FATAL] plugin/loop: Seen "HINFO IN 7533478916006617590.6696743068873483726." more than twice, loop detected

@chrisohaver (Member) commented Oct 4, 2018

This is the loop detection detecting a loop, and exiting. This is the intended behavior, unless of course there is no loop.

If you doubt there is a loop, you may try removing the loop detection (remove loop from the coredns configuration), and then test DNS resolution from pods (i.e. test resolution to external domains from the command line of a pod running in the cluster).

@johnbelamaric (Member) commented Oct 4, 2018

@miekg (Member) commented Oct 4, 2018

@avaikararkin commented Oct 8, 2018

In my case, it seemed to be a problem with IPv6: the VM I had created had IPv6 turned on by default, and there was a matching entry in /etc/resolv.conf. I turned IPv6 off and removed the ::1 entries, and things seem to be working.

@chrisohaver (Member) commented Oct 12, 2018

> Seems like the error message needs to be clearer. It should say something like ...

LOL, I just saw this now, after I submitted a PR for it.

@johnbelamaric (Member) commented Oct 12, 2018

No problem. @avaikararkin you could add details to the README Troubleshooting section...

@ahalimkara commented Oct 20, 2018

Removing the loop plugin worked for me. Is there any side effect of removing loop from the CoreDNS configuration?

> If you doubt there is a loop, you may try removing the loop detection (remove loop from the coredns configuration), and then test DNS resolution from pods (i.e. test resolution to external domains from the command line of a pod running in the cluster).

@miekg (Member) commented Oct 20, 2018

@spitfire88 commented Oct 23, 2018

> remove loop from the coredns configuration

How do you do that?

@chrisohaver (Member) commented Oct 23, 2018

@spitfire88, remove loop from the Corefile (in k8s, the Corefile is in the coredns configmap)

@chrisohaver (Member) commented Oct 23, 2018

e.g.

kubectl -n kube-system edit configmap coredns

Then delete the line that says loop, and save the configuration. It can take several minutes for k8s to propagate the config change to the coredns pods.
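For orientation, a kubeadm-generated Corefile from the CoreDNS 1.2.x era looked roughly like the sketch below (your configmap's exact contents vary by version); loop sits on a line of its own:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```

Keep in mind that deleting loop only disables the detection; if a real loop exists, it will resurface as runaway upstream queries. Once the change propagates, a quick check is to resolve an external name from a pod, e.g. `kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.io` (the pod name and image here are just examples).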

@zhuziying commented Nov 7, 2018

Hi @chrisohaver, what is the meaning of loop in the Corefile?

@chrisohaver (Member) commented Nov 7, 2018

@zhuziying commented Nov 8, 2018

Thanks, @chrisohaver.

@SiddheshRane commented Nov 12, 2018

I recently faced this problem. It is not specific to systemd-resolved. On Ubuntu 16.04, which does not have systemd-resolved, resolv.conf contains a localhost DNS server.
My question is: why don't we simply ignore any IP that points to localhost, like 127.0.0.1, ::1, etc.?
Right now I need to use fragile hacks like pointing to /var/run/systemd/resolve/resolv.conf.

@chrisohaver (Member) commented Nov 12, 2018

@SiddheshRane, I think in 16.04, DNS is managed by NetworkManager, which can do essentially the same thing as systemd-resolved as it pertains to DNS: it can run a local DNS cache (dnsmasq).

Skipping over loopbacks such as 127.0.0.1 would not solve the larger problem, because these configurations typically contain only a local address in /etc/resolv.conf. Skipping it would still result in non-functional DNS for upstream queries, because no upstream server would be configured. Functionally, the correct resolv.conf file to use is the one that contains the actual upstream servers used by the host.
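The shape of the problem is easy to check mechanically. The sketch below (the check_resolv helper and the /tmp paths are made up for illustration) classifies a resolv.conf by counting loopback versus non-loopback nameservers; with systemd-resolved's stub, every nameserver is a loopback address, so skipping loopbacks would leave nothing to forward to.

```shell
#!/bin/sh
# Classify a resolv.conf: count loopback vs. non-loopback nameservers.
# If only loopback entries exist, a resolver on the same host that
# forwards to this file would end up querying itself in a loop.
check_resolv() {
  file="$1"
  loop=0 upstream=0
  while read -r key addr _; do
    [ "$key" = "nameserver" ] || continue
    case "$addr" in
      127.*|::1) loop=$((loop + 1)) ;;
      *)         upstream=$((upstream + 1)) ;;
    esac
  done < "$file"
  if [ "$loop" -gt 0 ] && [ "$upstream" -eq 0 ]; then
    echo "only loopback nameservers: forwarding to $file would loop"
  else
    echo "ok: $upstream upstream nameserver(s)"
  fi
}

# systemd-resolved's stub file typically contains just 127.0.0.53:
printf 'nameserver 127.0.0.53\n' > /tmp/resolv.demo
check_resolv /tmp/resolv.demo
```

This is why the fix is to point kubelet at the file containing the real upstream servers rather than to filter addresses.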

In the context of Kubernetes, the best fix is to properly configure kubelet, so it can pass the correct resolv.conf file to all Pods using the Default DNS policy.

@bwillcox commented Nov 19, 2018

I've tried the extra config with and without quotes on the parameter, and it prevents the kubelet from starting. I'm sure it's a newbie mistake, and apologies if this isn't the right place for this:
sudo minikube start --vm-driver=none --extra-config=kubelet.ResolverConfig="/var/run/systemd/resolve/resolv.conf"

@chrisohaver (Member) commented Nov 19, 2018

Probably best to ask in the minikube repo, but that syntax seems correct, from what I just read.
Do the kubelet logs reveal any hints?

@bwillcox commented Nov 19, 2018

This is from syslog; it looks like the flag is not being passed as expected (maybe my expectations, set by https://kubernetes.io/docs/setup/minikube/#quickstart, are incorrect):
Nov 19 16:10:53 ubuntu kubelet[16413]: F1119 16:10:53.060353 16413 server.go:145] unknown flag: --ResolverConfig

That gave me the idea to try this:
ubuntu % sudo minikube start --vm-driver=none --extra-config=kubelet.resolv-conf=/var/run/systemd/resolve/resolv.conf

That seems to have worked; coredns and kube-dns are now much happier.

Thanks for the nudge!

@chrisohaver (Member) commented Nov 19, 2018

> maybe my expectations, set by https://kubernetes.io/docs/setup/minikube/#quickstart, are incorrect

Yes, it seems those docs are incorrect.

@utkuozdemir commented Nov 27, 2018

I shared the solution that has worked for me here: https://stackoverflow.com/a/53414041/1005102

@GOOD21 commented Dec 4, 2018

@chrisohaver Is there any way to disable the loop plugin when I initialize the cluster, such as some configuration for "kubeadm init"?

@csuxh commented Jun 4, 2019

Hi guys,
I removed loop but still get errors. How can I solve this:
E0604 06:56:14.691993 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Endpoints: Get https://10.254.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: x509: certificate is valid for 127.0.0.1, 10.211.55.20, 10.211.55.21, 10.211.55.22, not 10.254.0.1
E0604 06:56:14.743608 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:320: Failed to list *v1.Namespace: Get https://10.254.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: x509: certificate is valid for 127.0.0.1, 10.211.55.20, 10.211.55.21, 10.211.55.22, not 10.254.0.1

@csuxh commented Jun 4, 2019

The kubectl describe output is like this:
Normal Scheduled 8m16s default-scheduler Successfully assigned kube-system/coredns-784f8f5b7b-c9nc7 to kube-node3
Warning Unhealthy 6m32s (x5 over 7m12s) kubelet, kube-node3 Liveness probe failed: HTTP probe failed with statuscode: 503
Normal Killing 6m32s kubelet, kube-node3 Container coredns failed liveness probe, will be restarted
Normal Pulled 6m2s (x2 over 8m15s) kubelet, kube-node3 Container image "coredns/coredns:1.1.3" already present on machine
Normal Created 6m2s (x2 over 8m15s) kubelet, kube-node3 Created container coredns
Normal Started 6m1s (x2 over 8m14s) kubelet, kube-node3 Started container coredns
Warning Unhealthy 3m7s (x31 over 8m7s) kubelet, kube-node3 Readiness probe failed: HTTP probe failed with statuscode: 503

@mritd (Author) commented Jun 4, 2019

@csuxh This does not seem to be a problem with CoreDNS. The cause is that the certificate used by your API server does not include the IP 10.254.0.1.
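A quick way to confirm which addresses a certificate covers is to read its Subject Alternative Names with openssl. The block below generates a throwaway certificate so it is self-contained (the /tmp paths and SAN values are illustrative); on a kubeadm master the real apiserver certificate typically lives at /etc/kubernetes/pki/apiserver.crt, though treat that path as an assumption for your setup.

```shell
# Create a demo cert with IP SANs (openssl req -addext needs OpenSSL 1.1.1+),
# then print its Subject Alternative Name extension. Run the second command
# against the real apiserver certificate to see which IPs it is valid for.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=IP:10.254.0.1,IP:127.0.0.1"
openssl x509 -in /tmp/demo.crt -noout -text | grep -A1 'Subject Alternative Name'
```

If the service IP (10.254.0.1 in the error above) is missing from the list, the apiserver certificate needs to be regenerated with that SAN included.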

@Ramane19 commented Jun 17, 2019

I had this same issue after deleting the loop.

Can someone help me with this?

kubectl logs coredns-fb8b8dccf-j6mjl -n kube-system
Error from server (BadRequest): container "coredns" in pod "coredns-fb8b8dccf-j6mjl" is waiting to start: ContainerCreating
master@master:~$ sudo kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS              RESTARTS   AGE
kube-system   coredns-fb8b8dccf-j6mjl                 0/1     ContainerCreating   0          7m31s
kube-system   coredns-fb8b8dccf-lst4v                 0/1     ContainerCreating   0          7m31s
kube-system   etcd-master.testcluster.com             1/1     Running             0          25m
kube-system   kube-apiserver-master.testcluster.com   1/1     Running

@chrisohaver (Member) commented Jun 17, 2019

@Ramane19, your pods are stuck in "ContainerCreating", which is a different issue.

coredns locked this issue as resolved and limited conversation to collaborators Jun 18, 2019