k8s: loop detected with 8.8.8.8 upstream and no systemd-resolved #2354
Comments
There's no response? Is coredns tripping itself up by resending, seeing its own query twice, and calling it a loop?
AFAICT there is no response to HINFO? in tcpdump. Each time coredns starts I see 3 HINFO? queries going to 8.8.8.8, no response, and then coredns crashes. It's possible the network NS inside the pod is doing something stupid, but I'd need to exec into the coredns container to figure that out. Is there a coredns container image somewhere that is FROM debian or FROM alpine, so I can exec in and poke?
> AFAICT there is no response to HINFO? in tcpdump. Each time coredns starts I see 3 HINFO? going to 8.8.8.8, no response, and then coredns crashes.

Odd. For the record: without *loop*, everything works?
> It's possible the network NS inside the pod is doing something stupid, but I'd need to exec into the coredns container to figure that out. Is there a coredns container image somewhere that is FROM debian or FROM alpine, so I can exec in and poke?

Not an official one; we stopped shipping the alpine + dig container we did back in the day.
Also, put `log` in your config; you'll see the incoming queries being logged.
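For reference, enabling query logging is a one-line addition to the server block. A minimal sketch (the reporter's actual Corefile is not shown in this thread; the `forward` target here is illustrative):

```
.:53 {
    log
    forward . 8.8.8.8
}
```

With `log` active, every incoming query and its response code are printed to coredns's stdout, which makes it easy to confirm whether the HINFO probe is looping back in.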
Okay, without loop coredns stays up, but nothing can reach DNS. I think I have a more severe networking problem in this cluster, and whatever's happening there is probably also breaking coredns. I'll go investigate now, sorry for the distraction :/
Ack. Still think this is odd from the loop plugin's perspective. I'll take a look.
On Fri, 30 Nov 2018, Dave Anderson wrote:
> Okay, without loop coredns stays up, but nothing can reach DNS. I think I have a more severe networking problem in this cluster, and whatever's happening there is probably also breaking coredns. I'll go investigate now, sorry for the distraction :/
Erroneous detection of loops when the upstream is unresponsive during startup was fixed in 1.2.6. #2255
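For context, the loop plugin's check can be sketched in Python. This is an illustration of the idea only, not CoreDNS's actual Go implementation: the plugin sends an HINFO probe for a randomly generated name out through the forwarding path, and if that same query arrives back in through the server, the forwarding path must loop back to us.

```python
import random
import string

def random_probe(zone="."):
    # The probe name is made of random labels, so a match with an
    # incoming query can't be a coincidence.
    labels = ["".join(random.choices(string.digits, k=10)) for _ in range(2)]
    return ".".join(labels) + "." + zone.lstrip(".")

class LoopDetector:
    def __init__(self):
        self.probe = random_probe()
        self.seen = 0

    def on_incoming_query(self, qname, qtype):
        # If our own probe query shows up as an *incoming* query,
        # the upstream is (directly or indirectly) pointing back at us.
        if qtype == "HINFO" and qname == self.probe:
            self.seen += 1
        return self.seen > 0

detector = LoopDetector()
assert not detector.on_incoming_query("example.org.", "A")
assert detector.on_incoming_query(detector.probe, "HINFO")
```

The subtlety this issue exercised: in 1.2.2, a probe that got no answer at all from a dead upstream could still be misclassified, which is what the 1.2.6 fix addressed.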
Okay, found the root cause. It has nothing to do with CoreDNS; it's an iptables version mismatch between the host OS and the networking containers (e.g. calico, kube-proxy, weave, ...). Evil details at projectcalico/calico#2322. Closing this issue, as the loop startup behavior was improved in 1.2.6, so there's nothing more for coredns to do.
This is with CoreDNS version 1.2.2, running as the DNS server for a kubeadm Kubernetes cluster. Corefile config is:
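The actual Corefile was not preserved in this transcript. For orientation, the kubeadm default of that era (an assumption, not the reporter's verified config) looked roughly like:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    loop
}
```

The `proxy . /etc/resolv.conf` line is what forwards non-cluster queries to the host's resolver, and `loop` is the plugin crashing the pod here.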
I'm using the Calico network addon, and on the host machines, /etc/resolv.conf statically points to 8.8.8.8:
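Reconstructed from the description above, the host resolver config amounts to a single line:

```
nameserver 8.8.8.8
```

Crucially, this does not point at a local stub resolver like systemd-resolved's 127.0.0.53, which rules out the usual cause of coredns loop reports on kubeadm clusters.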
In this deployment, coredns is crashlooping because of loop detection:
On the host, I see the HINFO? queries going from the coredns pod to 8.8.8.8, and no responses:
So... Based on this, I don't see any evidence of a DNS forwarding loop, but CoreDNS still seems to see one. I looked through the issue trackers for coredns, k8s and kubeadm, and all the issues I could find were because of /etc/resolv.conf pointing to systemd-resolved, which is not the case here. I also tried to exec into the coredns container to look at the universe inside the container, but it looks like the container doesn't have any rootfs, so I can't exec a shell :(
The problem also seems to be non-deterministic: sometimes, if I destroy the cluster and build a new one, coredns seems to be stable and non-looping. This smells like a race condition somewhere, possibly in cluster setup rather than coredns, but how to diagnose?
The only unusual piece of my environment is that this is a qemu virtualized cluster. If you're really lucky, you can reproduce this by cloning https://github.com/danderson/virtuakube , and running:
Virtuakube requires qemu, docker, guestfish, and vde_switch to work, and will consume ~2-3 GB constructing the VM base image for the cluster. It's also pretty alpha and nobody but me has ever run it, so it might not work at all :/. If it does work, the simple-cluster command might hang after the node joins the cluster, because virtuakube waits for deployments to become 100% available, and the coredns crashloop might prevent that. Even if the setup hangs, you can
ssh -p50000 root@localhost
(password "root") to connect to the k8s master VM, and -p50003 to connect to the node VM. You can also `export KUBECONFIG=/tmp/virtuakube*/cluster*/kubeconfig to get kubectl to talk to the virtual cluster and examine it that way.Any suggestions on where to go from here to debug? I'm happy to iterate with virtuakube if you can give me some ideas of what to explore, my main problem right now is I have no idea what to do :)