Plugin [loop] does not work when systemd-resolved is running #2087

Closed
mritd opened this Issue Sep 6, 2018 · 27 comments

mritd commented Sep 6, 2018

When systemd-resolved is running, the nameserver in /etc/resolv.conf defaults to 127.0.0.53.
The loop plugin detects the looping DNS queries, and CoreDNS fails to start.

Environment:

  • Ubuntu 18.04.1
  • Kubernetes 1.11.2
  • CoreDNS 1.2.2

Error log:

docker1.node ➜  kubectl logs coredns-55f86bf584-7sbtj -n kube-system
.:53
2018/09/06 13:02:45 [INFO] CoreDNS-1.2.2
2018/09/06 13:02:45 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/09/06 13:02:45 [INFO] plugin/reload: Running configuration MD5 = 86e5222d14b17c8b907970f002198e96
2018/09/06 13:02:45 [FATAL] plugin/loop: Seen "HINFO IN 2050421060481615995.5620656063561519376." more than twice, loop detected

Deployed with deploy.sh.
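
For reference, on Ubuntu 18.04 you can confirm the systemd-resolved stub is what the pods inherit (paths below are the systemd-resolved defaults):

readlink -f /etc/resolv.conf      # -> /run/systemd/resolve/stub-resolv.conf
grep nameserver /etc/resolv.conf  # -> nameserver 127.0.0.53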

chrisohaver (Member) commented Sep 6, 2018

This is working as intended. The loop plugin has detected a forwarding loop, caused by systemd-resolved. If CoreDNS didn't exit, it would loop "forever" on the first upstream query it receives and get OOM killed.

The best fix is to add a flag to kubelet, to let it know that it should use the original resolv.conf: --resolv-conf=/run/systemd/resolve/resolv.conf. Then restart the coredns pods.
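
On a kubeadm-installed node, one way to pass that flag is via KUBELET_EXTRA_ARGS; a sketch assuming the Debian/Ubuntu packaging (RHEL-family packages read /etc/sysconfig/kubelet instead):

# add to /etc/default/kubelet:
# KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf

sudo systemctl daemon-reload
sudo systemctl restart kubelet
# recreate the coredns pods (label matches the stock kubeadm deployment)
kubectl -n kube-system delete pod -l k8s-app=kube-dns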

mritd (Author) commented Sep 6, 2018

Thanks for your answer, this is a good idea. (I just solved it by stopping systemd-resolved, stupid me 😂).

mritd closed this Sep 6, 2018

avaikararkin commented Sep 28, 2018

I am facing the same issue:

[root@faas-cent1 ~]# kubectl logs coredns-7f4b9fccc6-6bg7s -n kube-system
.:53
2018/09/28 09:24:50 [INFO] CoreDNS-1.2.2
2018/09/28 09:24:50 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/09/28 09:24:50 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/09/28 09:24:56 [FATAL] plugin/loop: Seen "HINFO IN 6010196033322906137.8653621564656081764." more than twice, loop detected

This is on CentOS 7, and no, my /etc/resolv.conf does not have a 127.* entry.
It is this:

[root@faas-cent1 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 10.148.20.5
[root@faas-cent1 ~]#

[root@faas-cent1 ~]# docker --version
Docker version 18.06.1-ce, build e68fc7a
[root@faas-cent1 ~]#

[root@faas-cent1 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T17:02:38Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
[root@faas-cent1 ~]#

[root@faas-cent1 ~]# uname -a
Linux faas-cent1 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@faas-cent1 ~]#

I don't have a /run/systemd/resolve/resolv.conf file on my system to try the workaround.
dnsmasq seems to be running on the system though; could that be causing this issue?
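
A quick way to check what is answering locally on port 53 (both commands are standard CentOS 7 tools):

sudo ss -lntup | grep ':53 '   # list TCP/UDP listeners on port 53 with owning process
ps -ef | grep '[d]nsmasq'      # is dnsmasq running?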

Asisranjan commented Oct 4, 2018

I am getting the same error too.
.:53
2018/10/04 12:18:47 [INFO] CoreDNS-1.2.2
2018/10/04 12:18:47 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/10/04 12:18:47 [INFO] plugin/reload: Running configuration MD5 = 486384b491cef6cb69c1f57a02087363
2018/10/04 12:18:53 [FATAL] plugin/loop: Seen "HINFO IN 7533478916006617590.6696743068873483726." more than twice, loop detected

chrisohaver (Member) commented Oct 4, 2018

This is the loop detection detecting a loop, and exiting. This is the intended behavior, unless of course there is no loop.

If you doubt there is a loop, you may try removing the loop detection (remove loop from the coredns configuration), and then test DNS resolution from pods (i.e. test resolution to external domains from the command line of a pod running in the cluster).
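
For example, with a throwaway test pod (busybox here is just a convenient image choice):

kubectl run -it --rm dnstest --image=busybox --restart=Never -- nslookup kubernetes.default
kubectl run -it --rm dnstest --image=busybox --restart=Never -- nslookup example.com

If external lookups hang or time out after removing loop, the loop was probably real.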

avaikararkin commented Oct 8, 2018

In my case, it seemed to be a problem with IPv6: the VM I had created had IPv6 turned on by default, and there was an entry for it in /etc/resolv.conf. I turned IPv6 off and removed the entries for ::1, and things seem to be working.
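
For anyone in the same spot, a minimal sketch of that change (the sysctl names are standard Linux; the exact steps the commenter used are an assumption):

# disable IPv6 at runtime
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
# then delete any "nameserver ::1" lines from /etc/resolv.conf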

chrisohaver (Member) commented Oct 12, 2018

Seems like the error message needs to be clearer. It should say something like ...

LOL - I just saw this now, after I submitted a PR for it.

johnbelamaric (Member) commented Oct 12, 2018

No problem. @avaikararkin, you could add details to the README Troubleshooting section...

ahalimkara commented Oct 20, 2018

Removing the loop plugin worked for me. Are there any side effects of removing loop from the coredns configuration?

If you doubt there is a loop, you may try removing the loop detection (remove loop from the coredns configuration), and then test DNS resolution from pods (i.e. test resolution to external domains from the command line of a pod running in the cluster).

spitfire88 commented Oct 23, 2018

remove loop from the coredns configuration

How do you do that?

chrisohaver (Member) commented Oct 23, 2018

@spitfire88, remove loop from the Corefile (in k8s, the Corefile is in the coredns configmap)

chrisohaver (Member) commented Oct 23, 2018

e.g.

kubectl -n kube-system edit configmap coredns

Then delete the line that says loop, and save the configuration. It can take several minutes for k8s to propagate the config change to the coredns pods.
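
For orientation, the Corefile in that configmap looks roughly like this in a stock kubeadm deployment of this era (your plugin list may differ); loop is the line to delete:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

Note that removing loop only silences the detection; if a real loop exists, queries will still cycle through proxy . /etc/resolv.conf.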

zhuziying commented Nov 7, 2018

Hi @chrisohaver, what's the meaning of loop in the Corefile?

zhuziying commented Nov 8, 2018

Thanks, @chrisohaver.

SiddheshRane commented Nov 12, 2018

I recently faced this problem. It is not specific to systemd-resolved: on Ubuntu 16.04, which does not have systemd-resolved, resolv.conf still contains a localhost DNS server.
My question is why we don't simply ignore any IP which points to localhost, like 127.0.0.1, ::1, etc.
Right now I need to use fragile hacks like pointing to /var/run/systemd/resolve/resolv.conf.

chrisohaver (Member) commented Nov 12, 2018

@SiddheshRane, I think in 16.04, DNS is managed by NetworkManager, which can essentially do the same thing as systemd-resolved as it pertains to DNS; it can run a local DNS cache (dnsmasq).

Skipping over loopbacks such as 127.0.0.1 would not solve the larger problem, because these configurations typically contain only a local address in /etc/resolv.conf. Skipping it would still result in non-functional DNS for upstream queries, because no upstream server would be configured. Functionally, the correct resolv.conf file to use is the one that contains the actual upstream servers used by the host.

In the context of Kubernetes, the best fix is to properly configure kubelet, so it can pass the correct resolv.conf file to all Pods using the Default DNS policy.
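
With kubeadm, that usually means either the --resolv-conf flag mentioned above or the resolvConf field in the kubelet config file; a sketch, assuming the default kubeadm path:

# point the kubelet at the real resolv.conf, then restart it
sudo sed -i 's|^resolvConf:.*|resolvConf: /run/systemd/resolve/resolv.conf|' /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet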

bwillcox commented Nov 19, 2018

I've tried the extra config with and without quotes on the parameter, and it prevents the kubelet from starting. I'm sure it's a newbie mistake, and apologies if this isn't the right place for this:
sudo minikube start --vm-driver=none --extra-config=kubelet.ResolverConfig="/var/run/systemd/resolve/resolv.conf"

chrisohaver (Member) commented Nov 19, 2018

Probably best to ask in the minikube repo, but... that syntax seems correct, from what I just read.
Do the kubelet logs reveal any hints?

bwillcox commented Nov 19, 2018

This is from syslog; it looks like the flag is not being passed as expected (maybe my expectations, set by https://kubernetes.io/docs/setup/minikube/#quickstart, are incorrect):
Nov 19 16:10:53 ubuntu kubelet[16413]: F1119 16:10:53.060353 16413 server.go:145] unknown flag: --ResolverConfig

This gave me an idea to try this:
ubuntu % sudo minikube start --vm-driver=none --extra-config=kubelet.resolv-conf=/var/run/systemd/resolve/resolv.conf

And that seems to have worked; coredns and kube-dns are now much happier.

Thanks for the nudge...

chrisohaver (Member) commented Nov 19, 2018

maybe my expectations, set by https://kubernetes.io/docs/setup/minikube/#quickstart, are incorrect

Yes, it seems those docs are incorrect.

utkuozdemir commented Nov 27, 2018

I shared the solution that has worked for me here: https://stackoverflow.com/a/53414041/1005102

GOOD21 commented Dec 4, 2018

@chrisohaver Is there any way to disable the loop plugin when I init the cluster, such as some configuration option for kubeadm init?
