CoreDNS resiliency with non-compliant upstream #5953
The philosophical matter of enabling other misbehaved DNS implementations aside ... this seems reasonable on a first take. We'd need to define precisely what a "UDP payload failure" is - perhaps just this specific issue. Anyone else see any gotchas?
The only case I can think of for now would be an overlarge packet. The thing I don't like about automatically re-querying via TCP is that we'd still be dealing with a UDP response to the client, so we'd have to take the response we received via TCP, truncate it appropriately, and then send it via UDP. The client then has to come back via TCP and get the data again, but at least we'd be able to have it in cache if that's enabled. I see two options that could be added on to deal with this in a more efficient way.

Manually override buffer size

First, and simpler: add an option to force a minimum buffer size on the request object, like the cache plugin used to do, so that users could set it large enough to handle whatever size their misbehaving DNS server might send over UDP. This falls back into that "enabling misbehavior" question, which is why I don't prefer it. We should then take the response, truncate it, and send it back to the client at 512 bytes so that we're giving a compliant response even if the upstream server isn't. This is different from the bufsize plugin, because bufsize currently only limits the maximum buffer size but doesn't increase it. Suggested implementation:
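A rough sketch of the idea (helper names are hypothetical and stdlib-only; the real change would operate on dns.Msg objects in the forward plugin):

```go
package main

import "fmt"

// forcedBufsize raises the advertised EDNS0 UDP payload size on the upstream
// query to a configured minimum (similar to what the cache plugin used to do),
// so an oversized-but-unmarked UDP response from a misbehaving upstream still
// fits in our receive buffer.
func forcedBufsize(advertised, configuredMin uint16) uint16 {
	if advertised < configuredMin {
		return configuredMin
	}
	return advertised
}

// clampForClient caps the wire-format response we relay to a plain-UDP
// (non-EDNS0) client at 512 bytes, per RFC 1035, so our own response stays
// compliant even if the upstream's was not. A real implementation would trim
// whole records and set the TC bit; this sketch only models the size cap.
func clampForClient(resp []byte, clientEdns0 bool) []byte {
	if clientEdns0 || len(resp) <= 512 {
		return resp
	}
	return resp[:512]
}

func main() {
	fmt.Println(forcedBufsize(512, 4096)) // forced up to the configured minimum
	fmt.Println(len(clampForClient(make([]byte, 2000), false)))
}
```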
End result: we use more memory than 1.10.1 (but potentially less memory than 1.10.0), but shield downstream clients from a misbehaving upstream DNS server without increasing external query counts. The client might have to requery via TCP, but we could serve the response from cache.

Utilize eDNS0

Depending on the upstream server, we might also be able to fix this by adding an option in the forward plugin to use eDNS0 on the upstream request if cache is enabled, even if the client doesn't request eDNS0 in the first place. It would have the same behavior as the automatic requery from the client's perspective, but it would mean that coredns won't have to requery the upstream via TCP for a medium-length (512 < request < 1232, assuming default bufsize settings) request, even if the client doesn't support eDNS0. Additionally, the DNS server I'm working with operates properly if eDNS0 is sent - so if we use eDNS0 on all requests, even with a bufsize of 512, the problem doesn't show up in the first place. I like this because most of the latency on our end is on the forwarded query. The client having to requery via TCP and get a cached response from coredns is a minimal amount of overhead compared to making the upstream query in the first place, so getting the full response from upstream in the first request and caching it means the response via TCP will be much faster. This is basically going back to #5366 and solving it another way - that bug arose because we always sent eDNS0, which is what I'm proposing here, but we didn't strip it back off and truncate the response to the client. Suggested implementation:
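The decision logic could be sketched like this (the helper name and the 1232-byte default are assumptions modeled on common bufsize defaults, not CoreDNS's actual API):

```go
package main

import "fmt"

// upstreamEdns0 decides whether the forwarded query should carry EDNS0 and
// with what advertised UDP payload size. We synthesize EDNS0 for non-EDNS0
// clients only when cache is enabled: without a cache, the full-sized answer
// could not be reused when the client retries over TCP, so there is nothing
// to gain. Hypothetical helper, illustrative only.
func upstreamEdns0(clientHasEdns0 bool, clientSize uint16, cacheEnabled bool) (use bool, size uint16) {
	const defaultSize = 1232 // assumed default, matching common bufsize settings
	if clientHasEdns0 {
		return true, clientSize // pass the client's advertised size through
	}
	if cacheEnabled {
		return true, defaultSize // synthesize EDNS0 so upstream can answer in one UDP packet
	}
	return false, 0
}

func main() {
	use, size := upstreamEdns0(false, 0, true)
	fmt.Println(use, size)
}
```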
Thoughts, @chrisohaver?
I just realized that providing an option to turn eDNS0 on for all forwarded queries is basically what the old "always request DNSSEC" code did - since DNSSEC requires eDNS0, that turned on eDNS0 for every query. Accordingly, can we re-enable just that send-eDNS0-by-default behavior, which could then respect bufsize, without turning the DNSSEC behavior back on?
IIRC, the need to enable edns0 was in itself the root of several of the problems the DNSSEC cache hack caused. I think a better fix would be: in forward, when the response is invalid due to overflow, trim the answer section to size, mark the response as truncated, and send it to the client. This would cause the client to retry over TCP. The problem with having forward try to short-cut this and do the retry itself over TCP is that, after retrying, it would still have to trim the full response down to the original UDP request size when sending the response to the client over UDP. The client then sees the truncated response and retries over TCP. So it ends up being more work.
Sorry - I just noticed you already said the same thing.
Actually, this doesn't work either. The entire answer needs to be cached, and truncated appropriately to client spec when serving from cache. That means when cache is enabled, we need to set the edns buffer size to max - otherwise we'll get truncated answers from cache when answering TCP queries. Or never write truncated responses to cache ... I need to spend some time reviewing what cache and forward actually do currently.
Sorry for spamming this issue: My reasoning is flawed here. The original suggested retry in forward works fine because we would cache the full answer. cache (should already) trim the answer down to size for udp clients.
Isn't this always required? Ultimately we have to give a valid answer to a client that has a buffer size smaller than the cached answer. The only way to do that is to truncate the answer.
On caching: from RFC1035 section 7.4:
What about this, for simplicity - if the UDP response from the DNS server fails to unmarshal due to an overflow error, we return an empty message with the TC flag set. This avoids having to figure out what maximum buffer size to pass to miekg/dns to make sure we can actually unmarshal the message, since RFC 6891 doesn't actually set a maximum UDP message size (it suggests 4096 bytes). This also matches what other large providers do. For example, Cloudflare and Google just send back a response with no answer records and the TC flag set:
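That empty-TC reply can be sketched at the wire level with nothing but the stdlib (illustrative only, not the actual CoreDNS change):

```go
package main

import "fmt"

// buildTruncatedReply builds a minimal DNS reply for when the upstream's UDP
// response overflows and cannot be unmarshalled: copy the request's 12-byte
// header, set QR (response) and TC (truncated), and zero all section counts.
// The client then retries over TCP, matching Cloudflare's and Google's
// behavior. A real reply would also echo the question section; this sketch
// sends a header-only message.
func buildTruncatedReply(req []byte) []byte {
	if len(req) < 12 {
		return nil // not even a full DNS header
	}
	resp := make([]byte, 12)
	copy(resp, req[:12]) // keep the ID and opcode
	resp[2] |= 0x80      // QR = 1: this is a response
	resp[2] |= 0x02      // TC = 1: truncated, retry over TCP
	for i := 4; i < 12; i++ {
		resp[i] = 0 // QDCOUNT/ANCOUNT/NSCOUNT/ARCOUNT = 0: empty message
	}
	return resp
}

func main() {
	req := []byte{0x12, 0x34, 0x01, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0, 0}
	resp := buildTruncatedReply(req)
	fmt.Printf("TC set: %v\n", resp[2]&0x02 != 0)
}
```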
I don't think we are currently caching truncated messages. This is from memory, but I think the way it works now is ...
The issue is when the result is invalid in step 1. The proposed solution in
Yup, I was just agreeing with your earlier comment about not caching truncated entries - that agrees with the RFC. As far as retrying goes, forward only switches to TCP on a UDP truncated response if prefer_udp is set; otherwise it breaks and returns to the consumer. My only concern with doing an automatic TCP retry is that it's only useful if caching is enabled - if cache isn't enabled, there's no point in making the TCP request in forward, as we won't be able to do anything with the response. However, cache runs before forward - is there a good way to check that? In either case, my main concern with an automatic retry during the UDP exchange is that it's wasted effort if either there's no cache or the client doesn't come back via TCP - why front-load the effort when we may not need it? It seems cleaner to just say "if the response doesn't unmarshal properly because it's too long, signal the client to switch to TCP by returning an empty response with the truncate bit set." We already have the behavior to do the automatic retry if you set prefer_udp, so should the default be the simpler case?
@chrisohaver I did a quick PR for the simplest handling case - just responding with an empty truncated message. If we want to go with a more complicated solution, please feel free to reject it - but this seemed like the simplest way to fix the overall issue, and it's still something we should do even with my two alternate solutions.
This is related to PR #6183. I think it is not well understood how to handle EDNS0 from clients and its relation to the EDNS0 used in forwarded queries. It is okay to add an EDNS0 header to forwarded queries; DNS recursors like named, unbound, or even dnsmasq often do that. But what is important is stripping such a header, and possibly the DNSSEC OK bit, from the response delivered to the client. Queries with EDNS0 are faster than TCP queries, so they should be preferred when possible. But the server must not send a response bigger than 512 bytes to a client that did not include an EDNS0 header in its query, even if the forwarded query had one.
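The 512-byte rule above can be sketched as a small helper (hypothetical name, stdlib only, not CoreDNS's implementation): compute the largest UDP response we may send a given client, regardless of what EDNS0 size we used upstream.

```go
package main

import "fmt"

// clientPayloadLimit returns the maximum UDP response size we may send to a
// client: 512 bytes if the client's query carried no EDNS0 OPT record
// (RFC 1035), otherwise the client's advertised payload size, with values
// below 512 treated as 512 (RFC 6891, section 6.2.3).
func clientPayloadLimit(clientEdns0 bool, advertised uint16) int {
	if !clientEdns0 {
		return 512 // plain-UDP client: classic DNS limit applies
	}
	if advertised < 512 {
		return 512 // RFC 6891: values lower than 512 are treated as 512
	}
	return int(advertised)
}

func main() {
	fmt.Println(clientPayloadLimit(false, 0))   // plain client
	fmt.Println(clientPayloadLimit(true, 1232)) // EDNS0 client: its advertised size
}
```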
What I would recommend to make possible is:
I think it is currently not possible to add EDNS0 and/or DNSSEC OK to forwarded queries while removing those extensions from the answers sent to clients.
my bad - I thought this was the PR. This is the issue. Reopening.
I think there is one important difference between the DNSSEC OK bit and the EDNS0 header. The EDNS0 header adds just a few bytes to the query. DNSSEC OK (DO) causes RRSIG and NSEC(3) records to be added on signed zones, and that increases the response size significantly. I would recommend adding EDNS0 even to forwarded queries which did not have EDNS0 on the original query; the upstream can then respond with a single packet. It would be nice if such EDNS0 addition could be turned off, but it should work on most deployments. It is something ISC BIND9 has set to yes by default.
So your argument is that for non-EDNS0 clients, it's better to add EDNS0, with the UDP payload size set, on forwarded queries. The benefit here is really found in caching, right? When a non-EDNS0 client makes the same query again, CoreDNS should retrieve the existing full-sized response from the cache, then truncate it before sending it to the client. Instead of doing a non-EDNS0 forward for a non-EDNS0 client, which potentially results in a truncated response from upstream (CoreDNS does not cache truncated responses), CoreDNS should do the "full-sized" EDNS0 request and store that in the cache for other requests (including non-EDNS0 ones) to use. This means CoreDNS is more likely to retrieve a cacheable, non-truncated response from upstream, improving cache efficiency. Is this a feature of the bufsize plugin? How would this work if the bufsize plugin is not enabled? Though I'm interested in this conversation and it's relevant, to be clear, I don't think it's a solution to this issue about CoreDNS resiliency with non-compliant upstreams, as I believe we were already using EDNS0 everywhere. Interested in creating a separate issue to continue this discussion on EDNS0?
Okay, created a separate issue for that. But it seems to me that the current version of coredns is itself non-compliant: it copies the incoming EDNS0 header, but it should not. Created #6188 for it.
I think that to improve resiliency in general, coredns should store the known featureset of the forwarders it uses. If they send wrong answers, it should log an anomaly and store it in a featureset cache for some time - that would prevent logging the same error on every reply. It might store that a server does not understand EDNS0 and stop sending it in requests, if the server correctly refuses packets with it via a FORMERR return code. It would be nice if such a featureset could be fetched on demand. But since coredns does not implement iterative queries itself, forwarders are always known in advance.

I would say receiving responses up to 1500 bytes in size should be possible even without EDNS0. Support for EDNS0 was created for devices which allocate just 512 bytes for the response and cannot use more; the intention was backward compatibility - to not break them. It may make sense to implement a strict policy, where any response larger than requested is dropped, but I think that should not be mandatory. A relaxed policy on the receiving side would help: I think UDP responses larger than 4096 bytes should be dropped, but at least 1500 bytes should be acceptable even if the requested size was smaller. It should just log that the forwarder is doing something wrong, but it should be able to process an ethernet-frame-sized response. If the message is otherwise valid, I think it can be accepted.
Especially if retries are done, it may make sense to relax requirements on the retry, unless it is a security-sensitive matter.
It would be great if DNS-Compliance-Testing were used to test both the authoritative and forwarding modes of coredns.
{
"data": [
{
"zone": "ipv4only.arpa",
"address": "127.0.0.1",
"servername": "localhost",
"tests": {
"dns": "aa",
"edns": "aa",
"edns@512": "aa,notc",
"ednsopt": "aa",
"do": "aa",
"ednsflags": "aa",
"optlist": "aa,expire-bad",
"signed": "aa",
"ednstcp": "aa"
}
}
]
}

I am not sure how a 3rd-party tool can be integrated into the checks.
#6277 has merged, though slightly differently than @pemensik described in #5953 (comment). If the received packet is oversized, it truncates the response so the client can retry over TCP. I'll close this issue since I feel my original request has been met, but feel free to follow up with another issue.
What happened:
I'm trying to work through an issue in which an OKD user reported (https://issues.redhat.com/browse/OCPBUGS-6829) CoreDNS failing on upstream/forwarded requests with the failure:
The only way this logic could be getting executed is if the upstream is non-compliant regarding UDP payload sizes. We specify `bufsize: 512`, and it appears the upstream is coming back with more than 512 bytes and not setting the truncated bit. The user is using a Windows 2016 or 2019 AD DC integrated DNS server for upstream DNS.

I've reproduced this error with 2 CoreDNS binaries: one being a vanilla CoreDNS (first hop/downstream) and another being a hacked CoreDNS (upstream) that doesn't truncate and sends UDP responses larger than the UDP payload requested. I "hacked" CoreDNS to never truncate by putting a `return` here. So it looks like:

Dig Client --> Vanilla Downstream CoreDNS 1.10.0 --> Upstream Hacked-to-not-truncate CoreDNS 1.10.0
In this reproducer, the chain of events is as follows:
What you expected to happen:
I understand the upstream is non-compliant, but it seems non-compliant servers exist in the wild (see #3941 and #3305). The user's solution here is to increase `bufsize` (which seems a bit arbitrary) or turn on `force_tcp`.

My question here is: could CoreDNS retry over TCP after a UDP payload failure automatically? It helplessly retries for 5 seconds, but would it increase resiliency to attempt a TCP retry upon one of these failure cases?
How to reproduce it (as minimally and precisely as possible):
I described the reproducer in the "What happened" section above.
Anything else we need to know?:
N/A
Environment:
- the version of CoreDNS: 1.10.0
- OS (e.g: `cat /etc/os-release`):