unbound is not friendly enough to support certain non-standard zones. #1074
Comments
There are several v6 addresses. Since my machine has no public IPv6 connectivity, the UDP connect fails. In that case the RTO of the v6 addresses should also grow, but in practice it does not grow at all.
As I understand it, the servers are intentionally dropping packets, triggering timeout logic in Unbound. Unbound then applies the exponential backoff timer to wait longer and longer between packets until the timeout reaches the configured upper limit. At that point Unbound considers the servers offline and does not waste traffic on them.
Now with that out of the way, failure to communicate and resolver timeouts introduced by the upstream nameservers are not Unbound's problem. There is also a relevant RFC about that: https://datatracker.ietf.org/doc/rfc8906/. The exponential backoff logic kicks in when a request times out; if your system does not support IPv6, the query is likely not getting out at all. In that case I would like to close this as a non-issue, but I'll leave it open in case I misunderstood something from your text :)
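The backoff behavior described above can be sketched roughly like this (a minimal illustration, not Unbound's actual code; the 12 s cap is the default upper limit mentioned in this thread, and the 376 ms starting RTO is a hypothetical value):

```python
# Illustrative sketch of exponential RTO backoff on query timeout.
# Assumptions: 12 s cap (per this thread), 376 ms initial RTO (hypothetical).

MAX_RTO_MS = 12_000

def backoff_rto(rto_ms: int) -> int:
    """Double the retransmission timeout after a lost query, capped."""
    return min(rto_ms * 2, MAX_RTO_MS)

rto = 376  # hypothetical fresh RTO in milliseconds
history = []
while rto < MAX_RTO_MS:
    rto = backoff_rto(rto)
    history.append(rto)

# Only a handful of consecutive timeouts are needed to pin the
# RTO at the cap, at which point the server counts as down.
```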
You're right, and I understand the logic, but I still expect better behavior. I tested BIND and Knot Resolver; both hold up better in the same attack scenario.
If servers time out, Unbound first backs off aggressively to give them time to come back up or to deal with a potential traffic spike. If they still time out after the configured maximum (12 seconds by default), they are considered down and may be probed again in the future. This is fine for a server under pressure: it stays out of Unbound's selection and may be reprobed based on configuration. This case, however, is for all misbehaving upstream nameservers. Fixing it just for them brings unneeded server-selection shenanigans and retries to Unbound for all other nameservers that are simply down. I understand the "attack" you are talking about, but I don't see how this is Unbound's problem.
The infra cache's exponential RTO backoff is not a good algorithm. I maintain that this should be a logarithmic backoff: the RTO would get infinitely close to the upper limit, but never reach it.
With exponential backoff Unbound is trying to be generous to poorly connected nameservers by doubling the timeout while waiting for an answer. It also makes Unbound aggressive about dropping non-responding nameservers from server selection, by reaching the top configured timeout faster. Non-responding nameservers are servers that are either under load and can't keep up, or broken. For the former it is good that Unbound stops contacting them and prefers other nameservers. For the latter it is good that Unbound stops contacting them because they are broken and wasting Unbound's time.

Also, a server's timeout needs to reach the top configured timeout (after several timeouts), since this is the criterion for a non-responsive nameserver, so that the server is removed from selection. Then Unbound can spend its time on responsive nameservers.

There are two distinct cases in your issue though:

For a) there is nothing for Unbound to do. There is an RFC that clearly states this is wrong behavior; the faulty behavior is with the upstream.

For b) maybe Unbound needs to do something differently and facilitate more attempts, rather than the current none, to such a delegation, but this needs some thinking because it can have unexpected results in certain scenarios. We have plans to augment server selection for configured forward/stub zones in the future, and we can also revisit server selection for common nameservers.
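To make the disagreement concrete, here is a toy comparison of the two growth shapes (purely illustrative, not code from any resolver; the 376 ms start is hypothetical and the 12 s cap comes from this thread). The "logarithmic" variant here halves the remaining gap to the cap on each timeout, so it approaches the cap asymptotically without ever reaching it:

```python
# Toy comparison: exponential (current) vs. asymptotic (proposed) backoff.
# Assumptions: 376 ms initial RTO (hypothetical), 12 s cap (per this thread).

MAX_RTO_MS = 12_000.0

def exponential_backoff(rto: float) -> float:
    """Current shape: double per timeout, capped at the maximum."""
    return min(rto * 2.0, MAX_RTO_MS)

def asymptotic_backoff(rto: float) -> float:
    """Proposed shape: halve the remaining gap to the cap each
    timeout, so the RTO approaches but never reaches the cap."""
    return rto + (MAX_RTO_MS - rto) / 2.0

exp = asym = 376.0
for _ in range(10):  # ten consecutive timeouts
    exp = exponential_backoff(exp)
    asym = asymptotic_backoff(asym)

# exp has pinned at the cap (server treated as down);
# asym is still strictly below it (server stays selectable).
```

The trade-off the maintainer raises is visible here: the asymptotic variant never satisfies the "RTO reached the cap" criterion, so a genuinely dead server would never be fully dropped from selection.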
Describe the bug
Scenario: when an authoritative server does not respond for a specific domain name, Unbound imposes a very severe penalty on that authoritative server.
The process of marking the authoritative NS as timed out is also very fast, because it is exponential.
E.g. taobao.com:
When I send Unbound a request for https://taobao.com, Unbound polls taobao.com's four NS servers (ns4.taobao.com., ns5.taobao.com., ns6.taobao.com., ns7.taobao.com.) with that request, and all of the queries time out.
This causes the RTO to double. When I request again, the RTO doubles again. Soon, taobao.com is marked as timed out in the infra cache.
Normal requests under the taobao.com domain then also get a fast SERVFAIL. This is a loophole: we cannot control client-side requests, but we must ensure that client requests do not affect Unbound's normal service.
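The failure mode described in this report can be modeled as follows (an illustrative sketch, not Unbound's code; `server_usable` and `resolve` are hypothetical helpers, and 12 s is the cap mentioned elsewhere in this thread):

```python
# Illustrative model of the fast-SERVFAIL failure mode: once every
# NS of a zone has its RTO driven to the cap, further queries for
# that zone fail immediately. Helper names are hypothetical.

MAX_RTO_MS = 12_000

def server_usable(rto_ms: int) -> bool:
    """A server whose RTO reached the cap is treated as down."""
    return rto_ms < MAX_RTO_MS

def resolve(ns_rtos: list) -> str:
    # If every NS of the zone is marked down, answer SERVFAIL at
    # once instead of querying anyone.
    if not any(server_usable(r) for r in ns_rtos):
        return "SERVFAIL"
    return "query"

# Four NS servers all driven to the cap by the timing-out queries:
result = resolve([12_000, 12_000, 12_000, 12_000])  # "SERVFAIL"
```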
To reproduce
Steps to reproduce the behavior:
Expected behavior
Like BIND: the RTO grows, but does not hit the upper limit so quickly, and once the infra cache TTL expires the RTO is reset to a smaller value.
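A minimal sketch of this expected behavior, under assumed values (the TTL, RTO, and names here are hypothetical, not BIND's or Unbound's actual ones):

```python
# Sketch of the expected reset: once the cached infra entry's TTL
# lapses, the RTO restarts from a small value instead of keeping
# the inflated one. All values and names are hypothetical.

INFRA_TTL_S = 900
INITIAL_RTO_MS = 376

class InfraEntry:
    def __init__(self, now: float):
        self.rto_ms = INITIAL_RTO_MS
        self.expires = now + INFRA_TTL_S

def get_rto(entry: InfraEntry, now: float) -> int:
    if now >= entry.expires:           # TTL expired:
        entry.rto_ms = INITIAL_RTO_MS  # reset to a small RTO
        entry.expires = now + INFRA_TTL_S
    return entry.rto_ms

entry = InfraEntry(now=0.0)
entry.rto_ms = 12_000                          # inflated by a burst of timeouts
fresh = get_rto(entry, now=INFRA_TTL_S + 1.0)  # TTL has lapsed -> small again
```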
System:
unbound -V
output:

Additional information
The problem is probably in the rtt_lost function: