ares_getaddrinfo for IPv6 not iterating all domains under specific conditions #426
Comments
This is always one of those confusing things about DNS. I'd argue that any server responding to the query with NODATA is in the wrong, since NODATA means it knows the domain name but has no record of the requested type. Typically this occurs when a host has an entry for IPv6 but not IPv4 and the missing type was requested. A CNAME is handled differently here and will not result in c-ares returning NODATA. Some more discussion here... Now, if the getaddrinfo() spec really requires this to be ignored, continuing as if an NXDOMAIN had been hit, by all means we should be doing that. Looking at, say, Android's bionic library, I do see they treat a NODATA the same as NXDOMAIN, except they also track it to account for a final error code override:
Can you try commit ea68b1b and see if it resolves your issue?
Hi, Thanks for the fast commit. I tested it, but it still does not work; the behavior is the same as without the patch. I believe the cause, for this scenario, is that the code added by the patch relies on the fact
That is, several Maybe since Thanks,
In that case, try this latest commit. I don't have a system set up to simulate that particular behavior to test.
Hi, I tested commit 9aacffe and it worked! Many thanks for your time and help,
…s ARES_ENODATA

Some DNS servers may behave badly and return a valid response with no data. In this case, continue on to the next search domain, but cache the result.

Fixes Bug: c-ares#426
Fix By: Brad House (@bradh352)
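The behaviour described in that commit message can be illustrated with a minimal sketch. This is not the c-ares implementation: the status enum, `search_candidates`, and the `canned_lookup` stub are all hypothetical names invented for illustration. The sketch also assumes that a hard failure stops the search, and that a remembered NODATA decides the final status when no candidate yields an address.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch only; they mirror the idea of
 * the c-ares statuses but are NOT the library's definitions. */
typedef enum {
    LOOKUP_OK,        /* at least one address was returned */
    LOOKUP_NODATA,    /* NOERROR with zero answer records */
    LOOKUP_NOTFOUND,  /* NXDOMAIN */
    LOOKUP_FAILED     /* hard failure, e.g. SERVFAIL or timeout */
} lookup_status;

/* Hypothetical callback that resolves one fully qualified candidate name. */
typedef lookup_status (*lookup_fn)(const char *candidate, void *ctx);

/* Try every candidate in order. NODATA is remembered but does not stop the
 * search; NXDOMAIN simply moves on; a hard failure aborts (an assumption of
 * this sketch). `tried` may be NULL; otherwise it receives the query count. */
lookup_status search_candidates(const char *const *candidates, size_t count,
                                lookup_fn lookup, void *ctx, size_t *tried)
{
    int saw_nodata = 0;
    size_t queries = 0;
    lookup_status result = LOOKUP_NOTFOUND;
    size_t i;

    for (i = 0; i < count; i++) {
        lookup_status st = lookup(candidates[i], ctx);
        queries++;
        if (st == LOOKUP_NODATA) {
            saw_nodata = 1;   /* cache the result, keep searching */
            continue;
        }
        if (st == LOOKUP_NOTFOUND)
            continue;         /* try the next search domain */
        result = st;          /* success or hard failure ends the loop */
        break;
    }
    if (i == count)           /* every candidate was exhausted */
        result = saw_nodata ? LOOKUP_NODATA : LOOKUP_NOTFOUND;
    if (tried != NULL)
        *tried = queries;
    return result;
}

/* Demo stub: replays canned statuses in order instead of sending queries. */
typedef struct {
    const lookup_status *replies;
    size_t next;
} canned_replies;

lookup_status canned_lookup(const char *candidate, void *ctx)
{
    canned_replies *c = ctx;
    (void)candidate;          /* the stub ignores the name itself */
    return c->replies[c->next++];
}
```

With replies of NXDOMAIN, NODATA, and then success, the loop reaches the third candidate instead of stopping at the NODATA reply, which is the sequence seen in the working capture from the report.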
We would be very thankful if you could give us your opinion about an issue that seems related to the c-ares library implementation of `ares_getaddrinfo`.

The issue was first found in Kubernetes IPv6 clusters, in containers running envoy-proxy. We noticed that, after an envoy-proxy version upgrade, DNS resolution operations requested by envoy-proxy processes were failing in some of our IPv6 environments. The problem was not present in the same environments when using the non-upgraded envoy-proxy containers.

After some investigation, we found the envoy-proxy commit that introduced the problem, which pointed to an update of the c-ares library version used by envoy-proxy. envoy-proxy uses `ares_getaddrinfo` to resolve names. After more investigation, our issue seems to be related to the following c-ares commit: dbd4c44 Parallel A and AAAA lookups in `ares_getaddrinfo` (#290)

While troubleshooting, we reproduced the behaviour described next both in a build with the c-ares version corresponding to that commit and in a build using the latest c-ares release, v1.17.2.
Our setup has the following relevant configuration for the case:
`ares_getaddrinfo` is called to find only an IPv6 family address for the name:
kkkk-ingressgw-app-traffic.test-site2.svc.cluster.local
For the case where resolution works, that is, using a c-ares version prior to the specified commit (dbd4c44), the tcpdump capture is the following:
For the scenario where resolution is NOT done, with the current implementation, the tcpdump capture is the following:
The difference between the two captures is that, in the working one, all domains in resolv.conf are tried, followed by a final query for the specified name without any domain appended (as expected). That final query is the one that returns the IP address. In the scenario where no IP address is returned, however, no further query is sent after the DNS server's response for the name kkkk-ingressgw-app-traffic.test-site2.svc.cluster.local.dddd.ccc.bbbbbbbb.aa.
In both captures the response to that particular query is the same: a NOERROR response with 0 answer records, that is, a NODATA response. In that case, `ares_getaddrinfo` returns (via the callback) `ARES_ENODATA` (1), which is not reflected in the documentation.

My question is whether, in this case, `ares_getaddrinfo` should stop issuing queries (calling `ares_query`) when an `ares_query` returns `ARES_ENODATA`, as happens in the current implementation, or whether it should continue with the rest of the queries as in the previous implementation. According to some references, a NOERROR response without data (NODATA) complies with the DNS protocol and depends on the specific record types that exist in a DNS server (e.g. https://prefetch.net/blog/2016/09/28/the-subtleties-between-the-nxdomain-noerror-and-nodata-dns-response-codes/).

So, in my humble opinion, I believe `ares_getaddrinfo` may be modified in one of the following ways:

- Change the `ares_getaddrinfo` implementation to treat an `ARES_ENODATA` status from `ares_query` the same as `ARES_ENOTFOUND` and proceed to the next domain/query, so that all domains are tried; if no invocation of `ares_query` returns a valid address, return `ARES_ENOTFOUND` (from `ares_getaddrinfo`).
- Update the `ares_getaddrinfo` documentation to state explicitly that it may return `ARES_ENODATA`, and under which circumstances, so that users of the c-ares library know and can fall back to another resolution mechanism.

Please let us know your opinion.
Thanks & Best Regards,
Daniel.
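For readers less familiar with the distinction this report hinges on, here is a minimal sketch of how a response header can be classified. The function and enum names are hypothetical and not part of c-ares. It also simplifies: it treats any NOERROR response with zero answer records as NODATA and ignores CNAME-only answers and referrals.

```c
/* RCODE values as assigned in RFC 1035. */
enum { RCODE_NOERROR = 0, RCODE_NXDOMAIN = 3 };

/* Hypothetical classification used only in this sketch. */
typedef enum {
    RESP_ANSWER,      /* NOERROR with at least one answer record */
    RESP_NODATA,      /* NOERROR, name exists, no record of this type */
    RESP_NXDOMAIN,    /* the name does not exist at all */
    RESP_OTHER_ERROR  /* SERVFAIL, REFUSED, and so on */
} resp_kind;

/* Classify a response from its RCODE and answer count (ANCOUNT). */
resp_kind classify_response(unsigned rcode, unsigned ancount)
{
    if (rcode == RCODE_NXDOMAIN)
        return RESP_NXDOMAIN;
    if (rcode != RCODE_NOERROR)
        return RESP_OTHER_ERROR;
    return ancount > 0 ? RESP_ANSWER : RESP_NODATA;
}
```

Under this classification, the reply that ends the failing capture (NOERROR, 0 answers) is `RESP_NODATA`, while the earlier search-domain misses would be `RESP_NXDOMAIN`.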