processQueryResponse() THROWAWAY should be mindful of fail_reply #923
The commit fixes it by having a copy of the address.
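As a rough illustration of the "copy of the address" approach, the sketch below stores the failing server's address by value in the per-query state, so later reuse of the reply storage cannot change what gets logged. The struct and function names (`reply_sketch`, `query_state_sketch`, `remember_fail_addr`) are hypothetical and are not unbound's actual definitions.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical reply record whose storage may be reused after the
 * reply has been processed. Not unbound's struct comm_reply. */
struct reply_sketch {
    struct sockaddr_storage remote_addr;
    socklen_t remote_addrlen;
};

/* Hypothetical per-query state that owns a copy of the failing
 * server's address instead of pointing into a reply record. */
struct query_state_sketch {
    struct sockaddr_storage fail_addr;
    socklen_t fail_addrlen; /* 0 means no failing address recorded */
};

/* Copy the address out of the reply so that later reuse of the reply
 * storage cannot affect what is logged for this query. */
static void remember_fail_addr(struct query_state_sketch *qs,
                               const struct reply_sketch *rep)
{
    assert(rep->remote_addrlen <= sizeof(qs->fail_addr));
    memcpy(&qs->fail_addr, &rep->remote_addr, rep->remote_addrlen);
    qs->fail_addrlen = rep->remote_addrlen;
}
```

In use, the caller records the address once when an attempt fails; overwriting the original reply record afterwards leaves `fail_addr` and `fail_addrlen` intact.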
jedisct1 added a commit to jedisct1/unbound that referenced this issue on Sep 4, 2023
* nlnet/master: (44 commits)
- Fix NLnetLabs#927: unbound 1.18.0 make test error. Fix make test without SHA1.
- Fix autoconf 2.69 warnings in configure.
- Fix for WKS call to getservbyname that creates allocation on exit in unit test by testing numbers first and testing from the services list later. Tag 1.18.0rc1 became the 1.18.0 release on 30 aug 2023, with the fix from 25 aug, fix compile on NetBSD included. The repository continues with version 1.18.1.
- Fix for version generation race condition that ignored changes.
- Fix compile error on NetBSD in util/netevent.h.
- Tag for 1.18.0rc1 release.
- Set version number to 1.18.0.
- Fix unit test for unbound-control to work when threads are disabled, and fix cache dump check.
- Fix NLnetLabs#923: processQueryResponse() THROWAWAY should be mindful of fail_reply.
- Fix for NLnetLabs#925: unbound.service: Main process exited, code=killed, status=11/SEGV. Fixes cachedb configuration handling.
- Fix windows ci workflow to install bison and flex. Further debug for windows ci workflow.
- Debug Windows ci workflow.
- Fix stat_values test to work with dig that enables DNS cookies.
- Fix uninitialized memory passed in padding bytes of cmsg to sendmsg. Changelog for commit.
- Fix for iter_dec_attempts that could cause a hang, part of capsforid and qname minimisation, depending on the settings.
- Fix for iter_dec_attempts that could cause a hang, part of capsforid and qname minimisation, depending on the settings.
- Fix ip_ratelimit test to work with dig that enables DNS cookies.
- Fix regional_alloc_init for potential unaligned source of the copy.
...
Oracle Solaris ships unbound on SPARC platforms built with ADI (Application Data Integrity) enabled. This SPARC hardware feature detects memory corruption and invalid memory accesses, and it is this feature that is triggering the unbound SEGVs.
NOTE: In the following examples, "XXX" is a redaction.
Analysis of the resulting core file points to a problem when attempting to log a SERVFAIL. The unbound configuration has:
log-servfail: yes
Disabling ADI (by using elfedit(1) on the unbound binary) instead results in the SERVFAIL messages being logged. A real-world example:
Jul 8 05:15:01 XXX unbound: [ID 993594 daemon.error] [11684:0] error: SERVFAIL <XXX. A IN>: all the configured stub or forward servers failed, at zone . from (inet_ntop_error) upstream server timeout
The "inet_ntop_error" is the message of interest here.
This string comes from addr_to_str(), which can also be seen in the stack trace from the core file.
Walking through the call sequence: errinf_reply() attempts to "add response specific error information for log servfail". It calls addr_to_str(), passing "fail_reply" (a copy of a pointer to a struct comm_reply). addr_to_str() in turn calls inet_ntop(), which first validates the address family; if that validation fails, inet_ntop() returns NULL, and it is this NULL that makes addr_to_str() produce the "(inet_ntop_error)" string.
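The failure path can be reproduced in isolation. The following is a simplified stand-in for addr_to_str(), written for this illustration rather than copied from unbound (`addr_to_str_sketch` is a hypothetical name). It selects the address field by family and falls back to "(inet_ntop_error)" when inet_ntop() fails, for example on the family value 0 (AF_UNSPEC) found in zeroed or reused memory.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for addr_to_str(); not unbound's code.
 * Formats the address in *addr into buf, or writes "(inet_ntop_error)"
 * when inet_ntop() fails (for example, on an unsupported family). */
static void addr_to_str_sketch(const struct sockaddr_storage *addr,
                               char *buf, size_t len)
{
    int af = addr->ss_family;
    /* Default to the IPv4 address field; switch to IPv6 if needed. */
    const void *sinaddr = &((const struct sockaddr_in *)addr)->sin_addr;

    assert(buf != NULL && len > 0);
    if (af == AF_INET6)
        sinaddr = &((const struct sockaddr_in6 *)addr)->sin6_addr;

    /* inet_ntop() returns NULL (errno EAFNOSUPPORT) for an unsupported
     * family, such as 0 (AF_UNSPEC) read from zeroed memory. */
    if (inet_ntop(af, sinaddr, buf, (socklen_t)len) == NULL) {
        buf[0] = '\0';
        strncat(buf, "(inet_ntop_error)", len - 1);
    }
}
```

Called on a populated AF_INET address such as 192.0.2.1, it yields "192.0.2.1"; called on a zeroed sockaddr_storage, it yields "(inet_ntop_error)", the same string seen in the log.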
So why does the address family validation fail?
Using debug logging, it appeared that lookups were failing with both THROWAWAY and timeouts. Code inspection led to the following few lines:
iterator/iterator.c
In particular, note the "without resetting anything" comment.
Rather than attempt to craft a DNS environment which results in response_type_from_server() returning RESPONSE_TYPE_THROWAWAY, response_type_from_server() was modified to always return RESPONSE_TYPE_THROWAWAY.
Next, unbound.conf was set up with two forward-addr settings: one pointing at a working DNS server, the other at a machine with no DNS service.
Finally, a script which fires a number of dig(1) queries at unbound completes the test case.
Without ADI enabled, log messages like the following were seen:
[1692363449] unbound[1500:0] error: SERVFAIL <XXX. A IN>: all the configured stub or forward servers failed, at zone . from (inet_ntop_error) upstream server timeout
This closely matches the original failure. The suspect code and comment suggest what is happening:
Changing processQueryResponse() to clear "fail_reply" gives us a quick fix, as we no longer leave an old pointer lying around.
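The hazard and the quick fix can be modelled in a few lines. In this simplified sketch (all names hypothetical, not unbound's), one reply record is reused for every incoming reply, and the query state keeps a pointer to it in the way fail_reply does. Once the record is reused, a stale pointer reads whatever the next reply left behind; clearing the pointer when a reply is thrown away lets the reader report nothing instead. Real unbound memory handling differs, and in the reported case ADI turned the stale access into a SEGV rather than silently returning junk.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical reply record; in this model a single record is reused
 * for every incoming reply. */
struct reply_sketch {
    struct sockaddr_storage remote_addr;
    socklen_t remote_addrlen;
};

/* Hypothetical per-query state holding a pointer, like fail_reply. */
struct query_state_sketch {
    const struct reply_sketch *fail_reply;
};

/* Remember the reply of a failed attempt by pointer. */
static void note_failure(struct query_state_sketch *qs,
                         const struct reply_sketch *rep)
{
    assert(rep != NULL);
    qs->fail_reply = rep;
}

/* The quick fix: when a reply is thrown away, drop the pointer so that
 * nothing later reads a record that is about to be reused. */
static void throw_away_reply(struct query_state_sketch *qs)
{
    qs->fail_reply = NULL;
}

/* Reader in the spirit of errinf_reply(): returns the recorded address
 * family, or -1 when no failing address is available. */
static int fail_family(const struct query_state_sketch *qs)
{
    if (qs->fail_reply == NULL || qs->fail_reply->remote_addrlen == 0)
        return -1;
    return qs->fail_reply->remote_addr.ss_family;
}
```

For example, after note_failure() records an AF_INET reply, overwriting the reused record makes fail_family() return 0 (AF_UNSPEC) through the stale pointer, a family inet_ntop() rejects; calling throw_away_reply() first makes it return -1 instead.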
Note: the change to response_type_from_server() is an ugly hack to make the circumstances easier to reproduce; in the real world, the DNS environment was producing a RESPONSE_TYPE_THROWAWAY return value from time to time.
Finally, this is only a quick fix; I'm sure a more elegant or complete solution can be implemented.