CoreDNS misbehaving for outside Kubernetes queries #2289

Closed
Tabrizian opened this issue Nov 12, 2018 · 27 comments

@Tabrizian

CoreDNS fails to respond to queries for names outside the Kubernetes cluster. I have enabled logging in my configuration:

10.244.1.155:40796 - [12/Nov/2018:08:39:03 +0000] 19104 "A IN acme-v02.api.letsencrypt.org.ingress.svc.cluster.local. udp 72 false 512" NXDOMAIN qr,rd,ra 165 0.000410577s
10.244.1.155:51013 - [12/Nov/2018:08:39:03 +0000] 29141 "AAAA IN acme-v02.api.letsencrypt.org.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,rd,ra 157 0.000425219s
10.244.1.155:58880 - [12/Nov/2018:08:39:03 +0000] 35055 "A IN acme-v02.api.letsencrypt.org.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,rd,ra 157 0.000842194s
10.244.1.155:53611 - [12/Nov/2018:08:39:03 +0000] 279 "AAAA IN acme-v02.api.letsencrypt.org.cluster.local. udp 60 false 512" NXDOMAIN qr,rd,ra 153 0.000158279s
10.244.1.155:57644 - [12/Nov/2018:08:39:03 +0000] 26214 "A IN acme-v02.api.letsencrypt.org.cluster.local. udp 60 false 512" NXDOMAIN qr,rd,ra 153 0.000582892s
10.244.1.155:53330 - [12/Nov/2018:08:39:03 +0000] 23888 "AAAA IN acme-v02.api.letsencrypt.org.local. udp 52 false 512" NXDOMAIN qr,rd,ra 52 0.001675627s
10.244.1.155:41214 - [12/Nov/2018:08:39:03 +0000] 49764 "A IN acme-v02.api.letsencrypt.org.local. udp 52 false 512" NXDOMAIN qr,rd,ra 52 0.002842832s
10.244.1.155:58355 - [12/Nov/2018:08:39:03 +0000] 589 "A IN acme-v02.api.letsencrypt.org. udp 46 false 512" SERVFAIL qr,rd 46 0.008358253s
2018/11/12 08:39:03 [ERROR] 0 acme-v02.api.letsencrypt.org. A: unreachable backend: no upstream host
@Tabrizian
Author

Tabrizian commented Nov 13, 2018

It is related, but a little different. dig works, but nslookup fails with the same error. Could it be related to the IPv6 addresses assigned to pods? I'm using flannel with the default configuration, and I installed Kubernetes with kubeadm.

DNS Tools nslookup Output

dnstools# nslookup acme-v01.api.letsencrypt.org 10.96.0.10
Server:         10.96.0.10
Address:        10.96.0.10#53
** server can't find acme-v01.api.letsencrypt.org: SERVFAIL

DNS Tools dig Output

; <<>> DiG 9.11.3 <<>> acme-v01.api.letsencrypt.org @10.96.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54093
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 13, ADDITIONAL: 12
;; QUESTION SECTION:
;acme-v01.api.letsencrypt.org.  IN      A
;; ANSWER SECTION:
acme-v01.api.letsencrypt.org. 19 IN     CNAME   api.letsencrypt.org-ng.edgekey.net.
api.letsencrypt.org-ng.edgekey.net. 19 IN CNAME e14990.dscx.akamaiedge.net.
e14990.dscx.akamaiedge.net. 19  IN      A       104.117.139.171
;; AUTHORITY SECTION:
.                       19      IN      NS      g.root-servers.net.
.                       19      IN      NS      m.root-servers.net.
.                       19      IN      NS      f.root-servers.net.
.                       19      IN      NS      e.root-servers.net.
.                       19      IN      NS      h.root-servers.net.
.                       19      IN      NS      l.root-servers.net.
.                       19      IN      NS      i.root-servers.net.
.                       19      IN      NS      a.root-servers.net.
.                       19      IN      NS      d.root-servers.net.
.                       19      IN      NS      c.root-servers.net.
.                       19      IN      NS      b.root-servers.net.
.                       19      IN      NS      j.root-servers.net.
.                       19      IN      NS      k.root-servers.net.

;; ADDITIONAL SECTION:
f.root-servers.net.     19      IN      A       192.5.5.241
b.root-servers.net.     19      IN      A       199.9.14.201
j.root-servers.net.     19      IN      A       192.58.128.30
l.root-servers.net.     19      IN      A       199.7.83.42
i.root-servers.net.     19      IN      A       192.36.148.17
g.root-servers.net.     19      IN      A       192.112.36.4
k.root-servers.net.     19      IN      A       193.0.14.129
m.root-servers.net.     19      IN      A       202.12.27.33
h.root-servers.net.     19      IN      A       198.97.190.53
d.root-servers.net.     19      IN      A       199.7.91.13
a.root-servers.net.     19      IN      A       198.41.0.4
c.root-servers.net.     19      IN      A       192.33.4.12
;; Query time: 108 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Nov 13 14:29:31 UTC 2018

@chrisohaver
Member

Assuming you are not routing local. to a different DNS server, the log shows at least two successful upstream queries. Both receive an NXDOMAIN response:

10.244.1.155:53330 - [12/Nov/2018:08:39:03 +0000] 23888 "AAAA IN acme-v02.api.letsencrypt.org.local. udp 52 false 512" NXDOMAIN qr,rd,ra 52 0.001675627s
10.244.1.155:41214 - [12/Nov/2018:08:39:03 +0000] 49764 "A IN acme-v02.api.letsencrypt.org.local. udp 52 false 512" NXDOMAIN qr,rd,ra 52 0.002842832s

nslookup makes these queries because it follows the search domains in /etc/resolv.conf. I'm guessing you have local defined as a search domain on the node?
By default, dig does not follow the search domains in /etc/resolv.conf.

So it seems that, with nslookup, the upstream DNS server answers two queries and then ignores the third.
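The search-list behavior described above can be sketched roughly as follows. This is a simplified model of a glibc-style stub resolver, not CoreDNS or nslookup code: the function name candidate_queries is hypothetical, ndots:5 is assumed to be the option Kubernetes usually writes into pod resolv.conf files, and the search list is read off the CoreDNS log in the first comment. Real resolvers stop walking the list once a candidate resolves, and libc implementations differ in the details.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """Names a glibc-style stub resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot makes the name absolute: the search list is skipped.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # At least ndots dots: try the name as typed first, then the search list.
        return [absolute] + expanded
    # Fewer than ndots dots: walk the search list first, bare name last.
    return expanded + [absolute]


# Search list as it appears in the CoreDNS log (the pod seems to live in the
# "ingress" namespace; "local" comes from the node's resolv.conf).
search = ["ingress.svc.cluster.local", "svc.cluster.local", "cluster.local", "local"]
for query in candidate_queries("acme-v02.api.letsencrypt.org", search, ndots=5):
    print(query)
```

With these inputs the names come out in the same order as the queries in the log, ending with the bare name; dig without +search sends only that last query.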

@Tabrizian
Author

Exactly, I have defined local in resolv.conf. Will removing the search domain fix the problem?

@chrisohaver
Member

Will removing the search domain fix the problem?

No, I don't think it would. Perhaps there are intermittent connectivity issues with the upstream server...

When you use nslookup, does it always fail on the same domain, or does it fail randomly?
And when you use dig, does it always succeed?

@Tabrizian
Author

As you said, removing local did not fix the problem. nslookup always fails on acme-v02.api.letsencrypt.org, and dig always succeeds on it. Both nslookup and dig work perfectly fine on google.com. Also, nslookup works for a couple of queries after I run dig on acme-v02.api.letsencrypt.org, but after that its results change to the following:

After 1st execution of nslookup after dig

dnstools# nslookup acme-v02.api.letsencrypt.org
;; Truncated, retrying in TCP mode.
Server:         10.96.0.10
Address:        10.96.0.10#53
Non-authoritative answer:
acme-v02.api.letsencrypt.org    canonical name = api.letsencrypt.org-ng.edgekey.net.
api.letsencrypt.org-ng.edgekey.net      canonical name = e14990.dscx.akamaiedge.net.
Name:   e14990.dscx.akamaiedge.net
Address: 104.117.139.171
** server can't find e14990.dscx.akamaiedge.net: SERVFAIL

After 2nd execution of nslookup after dig

dnstools# nslookup acme-v02.api.letsencrypt.org
Server:         10.96.0.10
Address:        10.96.0.10#53
** server can't find acme-v02.api.letsencrypt.org: SERVFAIL

@chrisohaver
Member

nslookup always fails on acme-v02.api.letsencrypt.org ...
nslookup and dig work perfectly fine on google.com

Confounding.

What happens when you query the upstream servers directly using nslookup? e.g.

nslookup acme-v02.api.letsencrypt.org <upstream-dns-ip>

@johnbelamaric
Member

johnbelamaric commented Nov 13, 2018 via email

@Tabrizian
Author

@chrisohaver

What happens when you query the upstream servers directly using nslookup? e.g.
nslookup acme-v02.api.letsencrypt.org <upstream-dns-ip>

DNS query works perfectly. Here is the output:

dnstools# nslookup acme-v02.api.letsencrypt.org 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
acme-v02.api.letsencrypt.org    canonical name = api.letsencrypt.org-ng.edgekey.net.
api.letsencrypt.org-ng.edgekey.net      canonical name = e14990.dscx.akamaiedge.net.
Name:   e14990.dscx.akamaiedge.net
Address: 104.117.139.171
Name:   e14990.dscx.akamaiedge.net
Address: 2600:1417:7a:28d::3a8e
Name:   e14990.dscx.akamaiedge.net
Address: 2600:1417:7a:291::3a8e

@johnbelamaric

What about with dig +search?

dnstools# dig +search acme-v02.api.letsencrypt.org
;; expected opt record in response

; <<>> DiG 9.11.3 <<>> +search acme-v02.api.letsencrypt.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61019
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 13, ADDITIONAL: 13

;; QUESTION SECTION:
;acme-v02.api.letsencrypt.org.  IN      A
;; ANSWER SECTION:
acme-v02.api.letsencrypt.org. 19 IN     CNAME   api.letsencrypt.org-ng.edgekey.net.
api.letsencrypt.org-ng.edgekey.net. 19 IN CNAME e14990.dscx.akamaiedge.net.
e14990.dscx.akamaiedge.net. 19  IN      A       104.117.139.171

;; AUTHORITY SECTION:
.                       19      IN      NS      h.root-servers.net.
.                       19      IN      NS      l.root-servers.net.
.                       19      IN      NS      i.root-servers.net.
.                       19      IN      NS      a.root-servers.net.
.                       19      IN      NS      d.root-servers.net.
.                       19      IN      NS      c.root-servers.net.
.                       19      IN      NS      b.root-servers.net.
.                       19      IN      NS      j.root-servers.net.
.                       19      IN      NS      k.root-servers.net.
.                       19      IN      NS      g.root-servers.net.
.                       19      IN      NS      m.root-servers.net.
.                       19      IN      NS      f.root-servers.net.
.                       19      IN      NS      e.root-servers.net.
;; ADDITIONAL SECTION:
e.root-servers.net.     19      IN      A       192.203.230.10
d.root-servers.net.     19      IN      A       199.7.91.13
c.root-servers.net.     19      IN      A       192.33.4.12
b.root-servers.net.     19      IN      A       199.9.14.201
l.root-servers.net.     19      IN      A       199.7.83.42
m.root-servers.net.     19      IN      A       202.12.27.33
i.root-servers.net.     19      IN      A       192.36.148.17
g.root-servers.net.     19      IN      A       192.112.36.4
j.root-servers.net.     19      IN      A       192.58.128.30
k.root-servers.net.     19      IN      A       193.0.14.129
a.root-servers.net.     19      IN      A       198.41.0.4
f.root-servers.net.     19      IN      A       192.5.5.241
h.root-servers.net.     19      IN      A       198.97.190.53
;; Query time: 108 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Nov 13 19:58:05 UTC 2018
;; MSG SIZE  rcvd: 1083

@chrisohaver
Member

chrisohaver commented Nov 13, 2018

The response is quite large. Could this be related to compression/truncation? One of the responses above also shows a TCP fallback due to truncation. We recently fixed some compression/truncation issues.

What CoreDNS version are you using?
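To make the size argument concrete: a UDP response that does not fit in the client's buffer has to be truncated (TC bit set), and the client is expected to retry over TCP, which is what the ";; Truncated, retrying in TCP mode." line from nslookup shows. The sketch below assumes the limits from RFC 1035 (512 bytes without EDNS) and RFC 6891 (the payload size advertised in the OPT record, never less than 512). The function names are hypothetical, and 4096 is assumed to be the default buffer size advertised by the dig 9.11 builds used in this thread.

```python
from typing import Optional

# RFC 1035: without EDNS, a DNS message sent over UDP carries at most 512 bytes.
CLASSIC_UDP_LIMIT = 512


def udp_limit(edns_udp_size: Optional[int]) -> int:
    """Largest UDP response the client will accept.

    edns_udp_size is the payload size advertised in the query's OPT record,
    or None if the query had no OPT record (e.g. dig +noedns).
    RFC 6891 says advertised values below 512 are treated as 512.
    """
    if edns_udp_size is None:
        return CLASSIC_UDP_LIMIT
    return max(edns_udp_size, CLASSIC_UDP_LIMIT)


def must_truncate(response_size: int, edns_udp_size: Optional[int]) -> bool:
    """True if the server must set TC and the client should retry over TCP."""
    return response_size > udp_limit(edns_udp_size)


# 1083 bytes is the "MSG SIZE rcvd" reported by the dig +search run above.
print(must_truncate(1083, 4096))  # False: fits in a 4096-byte EDNS buffer
print(must_truncate(1083, None))  # True: no EDNS, so the 512-byte limit applies
```

So the same 1083-byte answer is fine for dig's default EDNS query but must be truncated for a client that sends no OPT record; the CoreDNS log reports a 512-byte buffer for nslookup's bare query.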

@johnbelamaric
Member

johnbelamaric commented Nov 13, 2018 via email

@johnbelamaric
Member

johnbelamaric commented Nov 13, 2018 via email

@chrisohaver
Member

dig +search +noedns

Yeah - if this is a compression/truncation issue, then the expectation is that this would fail in the same way as nslookup. Actually, just dig +noedns should do it.

It would be good to confirm this before upgrading to the latest CoreDNS, if indeed you are on an older version.
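For reference, the only difference between a plain query and an EDNS query is one OPT pseudo-record in the additional section. The hypothetical helpers below build both forms by hand with the standard library (no DNS library assumed), following the wire format in RFC 1035 and RFC 6891. It is a sketch: it does not validate label lengths or use name compression.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels (no compression)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return bytes(out + b"\x00")  # zero-length root label terminates the name


def build_query(name: str, qtype: int = 1, query_id: int = 0,
                edns_udp_size: Optional[int] = None) -> bytes:
    """Build a recursive query (RD set); add an EDNS0 OPT record if a size is given."""
    arcount = 1 if edns_udp_size is not None else 0
    # Header: ID, flags (only RD), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # QCLASS=IN
    msg = header + question
    if edns_udp_size is not None:
        # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size,
        # TTL=extended RCODE/version/flags (all zero), RDLENGTH=0.
        msg += b"\x00" + struct.pack("!HHIH", 41, edns_udp_size, 0, 0)
    return msg


print(len(build_query("acme-v02.api.letsencrypt.org.")))                      # 46
print(len(build_query("acme-v02.api.letsencrypt.org.", edns_udp_size=4096)))  # 57
```

The plain query is 46 bytes, which matches the "udp 46 false 512" entry CoreDNS logged for nslookup's bare A query and is consistent with nslookup sending no OPT record. A dig +noedns query for the same name should have the same shape, while the EDNS form adds 11 bytes.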

@Tabrizian
Author

The response is quite large. Could be related to compression/truncation?

Yes, maybe. I wanted to try DoT (DNS over TLS) to possibly fix these issues and also prevent possible interference from my ISP. Do you think this might help in this case?

What CoreDNS version are you using?

k8s.gcr.io/coredns:1.2.2

dig +noedns

dnstools# dig  acme-v02.api.letsencrypt.org +noedns

; <<>> DiG 9.11.3 <<>> acme-v02.api.letsencrypt.org +noedns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47720
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;acme-v02.api.letsencrypt.org.  IN      A
;; Query time: 10 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Nov 13 20:54:18 UTC 2018
;; MSG SIZE  rcvd: 46

dig +noedns +search

dnstools# dig  acme-v02.api.letsencrypt.org +noedns +search
; <<>> DiG 9.11.3 <<>> acme-v02.api.letsencrypt.org +noedns +search
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 45697
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;acme-v02.api.letsencrypt.org.  IN      A
;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Nov 13 21:02:50 UTC 2018
;; MSG SIZE  rcvd: 46

@chrisohaver
Member

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 45697

Yes, they are servfailing ...

k8s.gcr.io/coredns:1.2.2

Try k8s.gcr.io/coredns:1.2.6; it has a fix for compression/truncation (#2261). I don't know for certain that it will fix this specific issue, but it's worth a try.
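The flags line in the +noedns output is worth reading bit by bit: the response carries qr and rd but not ra, which is why dig warns that recursion was requested but not available. A minimal decoder for the 12-byte header (RFC 1035, section 4.1.1) is sketched below. parse_header and RCODE_NAMES are hypothetical names, only the main flag bits are decoded, and the sample header is rebuilt from the printed dig output rather than captured off the wire.

```python
import struct

# RCODE values from RFC 1035, section 4.1.1.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def parse_header(msg: bytes) -> dict:
    """Decode the DNS header: ID, main flag bits, RCODE and section counts."""
    if len(msg) < 12:
        raise ValueError("DNS message is shorter than its 12-byte header")
    qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    # Bit positions within the 16-bit flags word (bit 15 is the most significant).
    bits = [("qr", 15), ("aa", 10), ("tc", 9), ("rd", 8), ("ra", 7)]
    rcode = flags & 0xF
    return {
        "id": qid,
        "flags": [name for name, bit in bits if (flags >> bit) & 1],
        "status": RCODE_NAMES.get(rcode, str(rcode)),
        "counts": {"QUERY": qd, "ANSWER": an, "AUTHORITY": ns, "ADDITIONAL": ar},
    }


# Header matching the dig +noedns +search output: id 45697, qr+rd, SERVFAIL.
hdr = struct.pack("!HHHHHH", 45697, 0x8000 | 0x0100 | 2, 1, 0, 0, 0)
print(parse_header(hdr))
```

Run against a real capture, the tc entry and the AUTHORITY/ADDITIONAL counts are the fields to watch when comparing CoreDNS answers with direct upstream answers.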

@Tabrizian
Author

I will try and share my results here. Thank you very much for your support.

@miekg
Member

miekg commented Nov 13, 2018

The dig output in #2289 (comment) looks bat shit crazy. Either something is interfering with your request or CoreDNS is doing something very wrong.

@chrisohaver
Member

bat shit crazy

I assumed it was a copy paste issue, or misplaced newlines due to scrubbing.

@Tabrizian
Author

Tabrizian commented Nov 14, 2018

There was a small copy-paste problem. I fixed it, sorry.

I updated CoreDNS to k8s.gcr.io/coredns:1.2.6, but it didn't help. However, I noticed something that might help.

I ran the same DNS queries on a host without CoreDNS. Both queries succeeded, but the responses were very large, and both had the same authority section. When I ran these queries on a host on a different network, there was no authority section (which is how it should be) and the response was smaller. It seems there may be a problem with how CoreDNS handles large DNS responses.

I say this because both nslookup and dig succeed when querying the upstream server defined in /etc/resolv.conf.

@miekg
Member

miekg commented Nov 14, 2018 via email

@zhangguanzhang

I also had this problem

@zhangguanzhang

Both image: coredns/coredns:1.2.2 and image: coredns/coredns:1.2.6 have this problem.

@liupeng0518

liupeng0518 commented Dec 18, 2018

Same problem. Using the busybox image fails, but using the busyboxplus:curl image works:

[root@k8s-m1 k8s-manual-files]# kubectl exec -ti curl -- sh
[ root@curl:/ ]$ nslookup kubernetes
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[ root@curl:/ ]$
[root@k8s-m1 k8...]# kubectl exec -ti busybox -- nslookup kubernetes
Server:		10.96.0.10
Address:	10.96.0.10:53

** server can't find kubernetes: NXDOMAIN

*** Can't find kubernetes: No answer

@ducttapecoder-vt

@liupeng0518 In my travels I've seen a lot of other posts about a busybox-specific issue. You might want to look into that and rule it out as a possibility.

@chrisohaver
Member

chrisohaver commented Mar 15, 2019

@liupeng0518, Any version of busybox > 1.28 has a broken nslookup that ignores search domains.

@zhangguanzhang

It's not busybox. There is no error when I test against kube-dns using the image with net-tools; I only see the errors when I use CoreDNS with the same images.

@miekg
Member

miekg commented Jun 20, 2019

This looks stale or solved. Closing

@miekg miekg closed this as completed Jun 20, 2019
@KeithTt

KeithTt commented Jan 21, 2021

In my case it is very strange: the AUTHORITY SECTION appears sometimes, but not every time.

I wonder: when is the AUTHORITY SECTION included in the output?

CoreDNS version: 1.6.2

Here is the dig output:

# dig www.baidu.com @10.3.0.10

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> www.baidu.com @10.3.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36859
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 13, ADDITIONAL: 27

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.baidu.com.			IN	A

;; ANSWER SECTION:
www.baidu.com.		22	IN	CNAME	www.a.shifen.com.
www.a.shifen.com.	22	IN	A	110.242.68.3
www.a.shifen.com.	22	IN	A	110.242.68.4

;; AUTHORITY SECTION:
com.			22	IN	NS	g.gtld-servers.net.
com.			22	IN	NS	h.gtld-servers.net.
com.			22	IN	NS	k.gtld-servers.net.
com.			22	IN	NS	d.gtld-servers.net.
com.			22	IN	NS	m.gtld-servers.net.
com.			22	IN	NS	i.gtld-servers.net.
com.			22	IN	NS	c.gtld-servers.net.
com.			22	IN	NS	j.gtld-servers.net.
com.			22	IN	NS	b.gtld-servers.net.
com.			22	IN	NS	e.gtld-servers.net.
com.			22	IN	NS	f.gtld-servers.net.
com.			22	IN	NS	a.gtld-servers.net.
com.			22	IN	NS	l.gtld-servers.net.

;; ADDITIONAL SECTION:
k.gtld-servers.net.	22	IN	A	192.52.178.30
b.gtld-servers.net.	22	IN	A	192.33.14.30
e.gtld-servers.net.	22	IN	AAAA	2001:502:1ca1::30
g.gtld-servers.net.	22	IN	A	192.42.93.30
a.gtld-servers.net.	22	IN	AAAA	2001:503:a83e::2:30
m.gtld-servers.net.	22	IN	AAAA	2001:501:b1f9::30
m.gtld-servers.net.	22	IN	A	192.55.83.30
c.gtld-servers.net.	22	IN	A	192.26.92.30
g.gtld-servers.net.	22	IN	AAAA	2001:503:eea3::30
h.gtld-servers.net.	22	IN	AAAA	2001:502:8cc::30
k.gtld-servers.net.	22	IN	AAAA	2001:503:d2d::30
f.gtld-servers.net.	22	IN	AAAA	2001:503:d414::30
h.gtld-servers.net.	22	IN	A	192.54.112.30
a.gtld-servers.net.	22	IN	A	192.5.6.30
b.gtld-servers.net.	22	IN	AAAA	2001:503:231d::2:30
f.gtld-servers.net.	22	IN	A	192.35.51.30
d.gtld-servers.net.	22	IN	A	192.31.80.30
i.gtld-servers.net.	22	IN	A	192.43.172.30
d.gtld-servers.net.	22	IN	AAAA	2001:500:856e::30
l.gtld-servers.net.	22	IN	A	192.41.162.30
l.gtld-servers.net.	22	IN	AAAA	2001:500:d937::30
j.gtld-servers.net.	22	IN	AAAA	2001:502:7094::30
i.gtld-servers.net.	22	IN	AAAA	2001:503:39c1::30
j.gtld-servers.net.	22	IN	A	192.48.79.30
c.gtld-servers.net.	22	IN	AAAA	2001:503:83eb::30
e.gtld-servers.net.	22	IN	A	192.12.94.30

;; Query time: 0 msec
;; SERVER: 10.3.0.10#53(10.3.0.10)
;; WHEN: Thu Jan 21 17:24:23 CST 2021
;; MSG SIZE  rcvd: 897
