Kubernetes autopath plugin bug - CoreDNS fails to resolve A records under circumstances #2842
All versions of CoreDNS are affected, including the latest one.
The parts relevant to the bug are:
Relevant notes about the OS:
Kubernetes plugin, autopath section.
Occasionally, A record resolution fails within the cluster. If you use curl in your Kubernetes cluster for whatever reason (many SDKs do), you'll see sporadic `Could not resolve host` errors.
The scenario is fairly complex, and it took a few days to pin down the exact reason behind the symptom. I won't go deep into the investigation here; I'll present only the result, which makes sense even without it.
The root cause of the problem is that autopath is applied only if the name we want to resolve equals, or is a subdomain of, the first entry of the source Pod's search path. This is checked here, using this function here.
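In sketch form, the check looks roughly like this (simplified; the real function lives in the autopath plugin and may differ in detail):

```go
package autopath

import "github.com/miekg/dns"

// firstInSearchPath reports whether the query name equals, or is a
// subdomain of, the first entry in the reconstructed search path.
func firstInSearchPath(name string, searchPath []string) bool {
	if len(searchPath) == 0 {
		return false
	}
	first := searchPath[0]
	return name == first || dns.IsSubDomain(first, name)
}
```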
This search path function is implemented in the plugins which support autopath, including the Kubernetes plugin. The Kubernetes plugin is responsible for building the search path list based on the source Pod of the request; this is what is used to figure out what the source Pod's search path looks like. This function is here.
This function takes the source IP of the request, then reaches out to Kubernetes to find out which namespace the Pod with that IP address belongs to. Using that, it reconstructs what the source Pod's search path looks like.
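Roughly, the rebuilt search path looks like this (a sketch; the real plugin derives the zone from its configuration and also appends the host's own search domains):

```go
// searchPathFor sketches how the Kubernetes plugin rebuilds a source Pod's
// search path from the namespace it found for the Pod's IP.
func searchPathFor(namespace, zone string) []string {
	return []string{
		namespace + ".svc." + zone, // e.g. "default.svc.cluster.local"
		"svc." + zone,
		zone,
	}
}
```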
This works fine most of the time, but not all of the time. When it doesn't work, autopath won't kick in, and the CoreDNS cache will store NXDOMAIN for the query.
So, when only one record type expires and the cache has to be repopulated for that specific type and name, the new answer depends on whether the source Pod triggers autopath: it will be either NOERROR or NXDOMAIN. If the other type (A/AAAA) of that same name, which still sits in the cache, was originally populated the same way (both with autopath, or both without), everything is fine: both A and AAAA represent the same truth, either autopath'd (NOERROR) or not (NXDOMAIN). Both work, as autopath returns an entry the host can work with, and an NXDOMAIN for both A and AAAA makes the client try the next entry in its search path.
The problem happens when they end up being different "types": one was originally resolved by autopath (NOERROR) while the other wasn't (NXDOMAIN). Which one is which is fairly irrelevant, although it's luckier to have the A as NOERROR and the AAAA as NXDOMAIN than the other way around.
This results in a scenario where, for a single external query, CoreDNS replies with NXDOMAIN for one record type and NOERROR for the other. This is already inconsistent: we received half the truth, a record for whichever of A or AAAA was resolved by autopath, and an NXDOMAIN for the other, which was resolved outside of autopath.
Now, if we're lucky and we ended up having the A record (NOERROR) while the AAAA missed out (NXDOMAIN), this won't even show up as an error in your environment. If it's the other way around, when A is NXDOMAIN and AAAA is NOERROR, that's when problems start to happen. Many hosts return an AAAA answer containing only a CNAME, which doesn't lead to any actual AAAA records at the end of the chain.
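To make the combinations concrete, here are the four possible cache states for a single name (summarizing the above; the two mixed rows are the inconsistent ones):

| A (cached) | AAAA (cached) | Effect on the client |
|---|---|---|
| NOERROR (autopath) | NOERROR (autopath) | consistent; works |
| NXDOMAIN | NXDOMAIN | consistent; client tries the next search suffix |
| NOERROR (autopath) | NXDOMAIN | inconsistent, but usually harmless: the A record is there |
| NXDOMAIN | NOERROR (autopath) | inconsistent; breaks when the AAAA answer is only a CNAME |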
So, the client can't connect. It has an AAAA record with a CNAME only and no A record at all. One example is api.twilio.com:
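You can see the shape of the problem from any host with dig (actual answers will of course vary over time):

```sh
# The A lookup follows the CNAME chain down to real A records.
dig +short A api.twilio.com

# The AAAA lookup may return only the CNAME target, with no AAAA
# records at the end of the chain.
dig +short AAAA api.twilio.com
```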
As long as the CoreDNS cache has an inconsistent state for a single query, resolution will fail from the Kubernetes cluster. Once the cache expires, things clear up and it starts working again, until another case where two Pods in the same namespace try to resolve the same entry and one of them triggers autopath while the other doesn't.
So, now back to the bug: why would autopath serve some Pods but not others in the same namespace trying to resolve the same entry?
Because when it looks for the source Pod's namespace, it looks the Pod up by its IP and takes that Pod's namespace. The problem is that Kubernetes reuses Pod IPs, and the Pod list may contain other, previously exited/completed Pods with the same IP address in different namespaces. As a result, the Kubernetes autopath plugin builds an incorrect search path which won't match the entry the source Pod is trying to resolve, and autopath is skipped.
An example where you have multiple Pods with the same IP, two historical and one running:
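One way to observe this yourself (a sketch; the IP here is made up, and `status.podIP` is a supported field selector for Pods):

```sh
# List every Pod in the cluster -- including Succeeded/Failed ones --
# that holds the given IP in its status.
kubectl get pods --all-namespaces -o wide \
  --field-selector=status.podIP=10.32.0.13
```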
When I make a request from the running Pod, the lookup may return one of the historical Pods instead, and the plugin builds the search path for the wrong namespace.
As a result, hosts will occasionally fail to resolve DNS names while the cache stays inconsistent, with the A and AAAA entries of a single query one served by autopath and one not, depending on which source Pod triggered the query.
I have raw packet dumps where I can see how the inconsistent cache state, caused by autopath being applied sometimes but not always for queries from the same namespace, makes the source Pod fail to resolve hostnames.
We need to have the following items in place to trigger this bug:
Write a shell script like the following:
```sh
#!/bin/sh
while true
do
  date | tr '\n' ' '
  curl https://api.twilio.com 2>&1 | egrep "TwilioResponse|Could not resolve" | sed 's/^.*curl/curl/g;s/<Versions>.*$//g'
  sleep 5
done
```
This script will give us two types of output:
```
Sat May 25 03:29:44 UTC 2019 <TwilioResponse>
Sat May 25 03:29:49 UTC 2019 <TwilioResponse>
```
When we were able to resolve api.twilio.com.
```
Sat May 25 03:29:54 UTC 2019 curl: (6) Could not resolve host: api.twilio.com
```
When we failed.
Also, it wouldn't matter much if we increased the resolution frequency, as the inconsistency can only hit when a CoreDNS cache entry has expired and has to be repopulated.
Set up two test Pods in the default namespace, and give one of them the following dnsConfig:
```yaml
dnsConfig:
  searches:
    - test2.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    - eu-west-1.compute.internal
```
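For context, a minimal Pod manifest carrying that dnsConfig could look like this (name, image, and the DNS Service IP are placeholders for your cluster; `dnsPolicy: None` is needed so the searches replace, rather than extend, the generated search path):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: autopath-test          # placeholder name
  namespace: default
spec:
  containers:
    - name: test
      image: curlimages/curl   # any image with curl and sh works
      command: ["sleep", "86400"]
  dnsPolicy: None              # replace the generated search path entirely
  dnsConfig:
    nameservers:
      - 10.96.0.10             # your cluster DNS Service IP; varies per cluster
    searches:
      - test2.svc.cluster.local
      - svc.cluster.local
      - cluster.local
      - eu-west-1.compute.internal
    options:
      - name: ndots
        value: "5"
```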
This will make Kubernetes autopath ignore this Pod: when we try to resolve api.twilio.com, the client's first query will be for api.twilio.com.test2.svc.cluster.local, while the plugin, seeing a Pod in the default namespace, expects the search path to start with default.svc.cluster.local. The names don't match, so autopath is skipped.
With this we simulate the case where autopath picks up a historical Pod which used to have our IP and which lived in a different namespace, like test2.
Start running our test script in both Pods. The failure messages randomly come and go, in sync with each other, depending on whether the current cache state is consistent or inconsistent. It can stay broken anywhere from a few seconds up to minutes, depending on the request patterns, cache settings, and TTLs.
When that hits, no one in the cluster can resolve api.twilio.com.
As you can see, the bug is highly contextual.
Note: it's interesting to see why
# Solution

It is not enough to query the Pod based on its IP address [here](https://github.com/coredns/coredns/blob/master/plugin/kubernetes/autopath.go#L31); we also have to filter by status, so that we only look at Pods with that IP that are actually running. Instead of returning the first Pod we find with that IP, we have to iterate through them and return the first one that is in Running state.

As a result you'll always get the correct namespace, autopath will build the correct search path, and the cache remains consistent.
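A minimal sketch of that filtering, using the upstream corev1.Pod type (the real plugin works against its own indexed object cache, so treat this as illustrative):

```go
package kubernetes

import corev1 "k8s.io/api/core/v1"

// podWithIP returns the first Pod holding the given IP that is actually
// running, skipping historical (Succeeded/Failed) Pods whose IP may have
// been reused by a Pod in another namespace.
func podWithIP(pods []*corev1.Pod, ip string) *corev1.Pod {
	for _, p := range pods {
		if p.Status.PodIP == ip && p.Status.Phase == corev1.PodRunning {
			return p
		}
	}
	return nil
}
```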
Nicely debugged! Thanks!
You have a PR ready?
@miekg Trying to push a new branch, but it doesn't go through.
Could you please advise? If you have a guide I could follow on how to push this, I would appreciate it. Or if it's just a permission issue, could you allow me to push? Also, I can rename the branch; I see we prefer - over _ in branch names.