Forwarder returns servfail on upstream response noerror no data #946

Closed
sylviedev opened this issue Oct 4, 2023 · 9 comments

sylviedev commented Oct 4, 2023

Describe the bug

With unbound 1.18.0 configured as a forwarder, when the upstream server replies with rcode NOERROR and zero answers, the forwarder replies to the client with rcode SERVFAIL.

To reproduce

  • an unbound server configured as recursive resolver on interface 127.0.0.1@8053, configured with
    local-zone: "example.com" always_nodata
  • an unbound server configured as forwarder on interface 127.0.0.1@7053, config:
server:
    username: ""
    chroot: ""
    directory: "/var/lib"
    do-daemonize: no
    do-not-query-localhost: no
    module-config: "iterator"
    interface: 127.0.0.1@7053
    interface-action: 127.0.0.1@7053 allow
    use-syslog: no

forward-zone:
    name: "."
    forward-addr: 127.0.0.1@8053
  • query the forwarder: status SERVFAIL
$ dig @127.0.0.1 -p 7053 example.com

; <<>> DiG 9.16.42-Debian <<>> @127.0.0.1 -p 7053 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28215
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;example.com.			IN	A
  • query the recursive resolver: status NOERROR
$ dig @127.0.0.1 -p 8053 example.com 

; <<>> DiG 9.16.42-Debian <<>> @127.0.0.1 -p 8053 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53473
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;example.com.			IN	A
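
For anyone who wants to script the comparison above instead of reading two dig outputs, here is a minimal sketch using only the Python standard library. It assumes the two instances from the reproduction steps are listening on 127.0.0.1 ports 7053 and 8053; the function names are made up for this example, and it only reads the reply header (rcode, ANSWER count, AUTHORITY count), nothing more.

```python
import socket
import struct

# Only the rcodes that matter for this issue; anything else is shown as a number.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query: RD flag set, one question, class IN, no EDNS."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(packet):
    """Return (rcode name, ANSWER count, AUTHORITY count) from a DNS reply."""
    _txid, flags, _qd, ancount, nscount, _ar = struct.unpack("!HHHHHH", packet[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount, nscount


def query(server, port, name, timeout=3.0):
    """Send one UDP query for an A record and summarise the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        reply, _addr = sock.recvfrom(4096)
    return parse_header(reply)


if __name__ == "__main__":
    # Ports taken from the reproduction steps: forwarder, then recursive resolver.
    for port in (7053, 8053):
        print(port, query("127.0.0.1", port, "example.com"))
```

On an affected 1.18.0 forwarder the first line should report SERVFAIL and the second NOERROR, both with zero answer and authority records, matching the dig output above.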

Expected behavior
Return rcode NOERROR with zero answers to the client, as unbound 1.17.1 did.

System:

  • OS: Debian 11
  • unbound -V output:
Version 1.18.0

Configure line: 
Linked libs: mini-event internal (it uses select), OpenSSL 1.1.1n  15 Mar 2022
Linked modules: dns64 respip validator iterator

Additional information

log for unbound forwarder 1.18.0

[1696431803] unbound[112655:0] info: iterator operate: query example.com. A IN
[1696431803] unbound[112655:0] info: response for example.com. A IN
[1696431803] unbound[112655:0] info: reply from <.> 127.0.0.1#8053
[1696431803] unbound[112655:0] info: query response was THROWAWAY

log for unbound forwarder 1.17.1

[1696431738] unbound[112637:0] info: iterator operate: query example.com. A IN
[1696431738] unbound[112637:0] info: response for example.com. A IN
[1696431738] unbound[112637:0] info: reply from <.> 127.0.0.1#8053
[1696431738] unbound[112637:0] info: query response was nodata ANSWER
@wcawijngaards
Member

Yes, this is true. It would be nice to be able to support this, but it conflicts with the item that changed the behaviour in 1.18.0, which notes: "Fix to ignore entirely empty responses, and try at another authority. This turns completely empty responses, a type of noerror/nodata into a servfail, but they do not conform to RFC2308, and the retry can fetch improved content."
So it would be nice to be able to tell one case from the other.
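
To make "one case from the other" concrete, here is a rough sketch of the distinction as I read it from this thread. This is not unbound's classification code; `ReplySummary`, `classify` and the returned labels are hypothetical names for illustration. It assumes the reply has already been parsed and scrubbed down to its rcode and the RR types left in the answer and authority sections.

```python
from dataclasses import dataclass, field

NOERROR = 0
SOA = 6  # RR type number for SOA


@dataclass
class ReplySummary:
    """Hypothetical summary of a parsed reply: rcode plus RR types per section."""
    rcode: int
    answer_types: list = field(default_factory=list)
    authority_types: list = field(default_factory=list)


def classify(reply):
    """Return an illustrative label for the reply (labels are assumptions)."""
    if reply.rcode != NOERROR:
        return "error"
    if reply.answer_types:
        return "answer"
    if SOA in reply.authority_types:
        # Negative answer that carries a SOA in the authority section.
        return "nodata-with-soa"
    if not reply.authority_types:
        # NOERROR, nothing in answer, nothing in authority: the case in this issue.
        return "entirely-empty"
    return "no-soa-authority"
```

The reply from the always_nodata upstream in the report (ANSWER: 0, AUTHORITY: 0) lands in "entirely-empty", which is the class that 1.18.0 turns into a servfail.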

@wcawijngaards
Member

Adding a SOA record would also solve the issue, e.g. if the upstream server includes a SOA record in the answer. Unbound could perhaps already do this when a local-data statement with a suitable SOA record is included in the config, for example local-data: "example.com. SOA localhost. nobody.invalid. 1 3600 1200 604800 10".

@wcawijngaards
Member

There is a fix committed. Thanks for the detailed report.

The fix notices the empty answer and marks the first such nodata answer to be retried. It then probes for another server. Because the upstream here is always_nodata, that server also returns nodata, or unbound finds no better server to select and tries again. The second time it sees an empty answer, it accepts it. This makes the setup from this issue work, while still retrying at another server as in the issue that was solved for 1.18.0, so it works for both cases.
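
The behaviour described above can be sketched roughly as follows. This is a simplified model and not the committed code: `resolve_with_empty_retry`, the dict shape of a reply and the `max_empty_retries` parameter are assumptions made for the example, and server selection is reduced to "call the upstream again".

```python
def is_empty(reply):
    """True for NOERROR with nothing in the answer or authority sections."""
    return (
        reply["rcode"] == "NOERROR"
        and not reply["answers"]
        and not reply["authority"]
    )


def resolve_with_empty_retry(ask_upstream, max_empty_retries=1):
    """Query upstream; retry an empty reply once, then accept it.

    ask_upstream is any callable returning a dict with the keys
    "rcode", "answers" and "authority" (a hypothetical shape).
    Returns (reply, number of queries sent).
    """
    attempts = 0
    while True:
        reply = ask_upstream()
        attempts += 1
        if not is_empty(reply):
            return reply, attempts  # got real content, use it
        if attempts > max_empty_retries:
            return reply, attempts  # second empty reply: accept as nodata


if __name__ == "__main__":
    empty = {"rcode": "NOERROR", "answers": [], "authority": []}
    reply, attempts = resolve_with_empty_retry(lambda: empty)
    print(reply["rcode"], attempts)
```

With a single upstream that always answers nodata, this sends two queries and then returns the empty NOERROR reply, which is the behaviour for the setup in this issue.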

@sylviedev
Author

Thanks for the fix.

Now the reply is correct, with rcode NOERROR. As only one upstream server is configured, it gets queried twice. Would it be possible to accept the first answer directly in this case?

@wcawijngaards
Member

I do not have sufficient information about the case behind the 1.18.0 fix to know whether that would be helpful for it. It might be possible to implement by tracking the number of servers; right now it just attempts once more. The current approach is more certain to work.

@firefrei

firefrei commented Oct 9, 2023

Hi,
I think I am currently running into this issue in my setup. Can you recommend any workaround that I (and maybe also others :-) ) can implement/configure until this fix has made its way into the next release?

Thanks a lot!
Matthias

@wcawijngaards
Member

I am not sure what configuration workaround exists. Unbound offers the option to tailor the results for particular queries with local-zone and local-data statements, and this certainly works to work around particular queries by setting answers for them. Otherwise, incorporating the fix, e.g. applying the code commit or building unbound from the source repo, could fix it.
Documentation links: https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html#unbound-conf-local-zone
https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html#unbound-conf-local-data

@smaeda-ks

Hi @wcawijngaards, this isn't an ongoing issue, but could you help me to better understand the issue for my future reference? 🙇

I'd like to make sure I understand your comment right, as you previously mentioned:

Adding a SOA record would also solve the issue, eg. for the upstream server to include a SOA record into the answer.

Basically, we started seeing some issues with Let's Encrypt cert issuance, as they switched to Unbound 1.18 last month:
https://community.letsencrypt.org/t/dns-resolver-upgraded-to-unbound-1-18-empty-responses-require-soa-sections/210417

It started failing to resolve some domains that were otherwise fine with the prior version.
An example condition is below:

  • example.com zone is hosted on Route53
  • acme-delegated subdomain is delegated to ACME DNS via NS records on the above Route53 zone
  • example.com zone also exists in ACME DNS
  • there's no RR on this acme-delegated label in ACME DNS
  • I trigger new cert issuance for foo.acme-delegated.example.com
  • Let’s Encrypt (Unbound) checks CAA on foo.acme-delegated.example.com
  • we have a default system wildcard RR * at the root of the zone (apex) in ACME DNS, and it is applied/expanded to any subdomain unless there is an explicit RR set (correct me if I'm wrong on this). This means querying CAA for foo.acme-delegated.example.com returns NOERROR with an empty answer instead of NXDOMAIN. This NOERROR response does contain a SOA record in the AUTHORITY section, but its owner name is example.com
  • Unbound ignores and removes this SOA RR from the response, because example.com is not a child of the originating zone - code (I'm not 100% sure about my understanding here)
  • therefore, Unbound 1.18 started exhibiting the issue due to this change
  • Unbound changed the retry logic in 1.19 - code, so it doesn't throw SERVFAIL even with no SOA

Am I understanding this right?

@wcawijngaards
Member

Yes, I think you have this correct. The issue you cite differs from the top-post issue in that the SOA RR is removed from the response by unbound because it is out-of-zone. It then becomes a similar case, because what remains is an empty result. The fix allows unbound to retry instead of always ignoring such a response, and to use the retried result as the answer, which makes it work again as before, whilst still retrying empty responses to see if the content can be improved.
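
As an illustration of the out-of-zone removal described above, here is a small sketch of an in-zone check on owner names. It is not unbound's scrubber; `is_in_zone`, `scrub_authority` and the `(owner, type)` tuple shape are hypothetical, and the sketch assumes the zone being queried is the delegated acme-delegated.example.com from the example.

```python
def labels(name):
    """Split a domain name into lower-cased labels, ignoring a trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def is_in_zone(owner, zone):
    """True if owner is equal to zone or a subdomain of it."""
    o, z = labels(owner), labels(zone)
    return len(o) >= len(z) and o[len(o) - len(z):] == z


def scrub_authority(authority, zone):
    """Keep only (owner, type) records whose owner is at or below zone."""
    return [rr for rr in authority if is_in_zone(rr[0], zone)]


if __name__ == "__main__":
    zone = "acme-delegated.example.com."
    authority = [("example.com.", "SOA")]
    # The SOA owner is above the delegated zone, so nothing survives.
    print(scrub_authority(authority, zone))
```

Once the SOA for example.com is dropped, the reply has NOERROR with nothing in the answer or authority sections, so it looks the same as the empty response from the top post.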
