crash due to rbtree_remove of a node never inserted #558
Comments
Hi there, thanks for reporting this! You mention that this may be related to event handling, but I don't have the time to look closer atm; will do later today or tomorrow. From a quick look I see that you compiled with libevent support, but unbound couldn't find it and uses the builtin mini-event instead. Another option to test is to actually use libevent and see if you hit the same bug. If you can reproduce it in the meantime, that would be great!
I meant embedded minievent actually. Please take a look at the pull request, I think I have nailed the problem. Can you please point at the 1.13.2 commit that fixed random crashes in the TCP reuse code?
These mostly work on the reuse rbtree, and we have a crash in the timeout rbtree. Some of them, e.g. 7396eff, we already have in our tree. We use the unbound that is supplied with FreeBSD.
I had a closer look at this and tried to reproduce it (not successfully). Indeed, this is different from the issues fixed previously. This seems to be in the out-of-order processing for clients, whereas the previous fixes were for the stream reuse for upstreams. Were you able to stumble upon it again or reproduce it? Calling ub_event_add() with a NULL timeval is fine. In that case the event does not have the UB_EV_TIMEOUT bit set by the caller. I wouldn't want to use the fix in the PR, since it changes the event bits in the add routine, which is something the calling code does not expect. But did it solve the issue for you?
Frames 0-2. Not much info here...
I now have only a single core and don't have logs. But I've been told that unbound crashed on several machines within the fleet.
… reclaimed more than once during callbacks.
I believe I solved the issue with the above commits. For me the problem was really solved when I fixed the potential loop in the comm_point->tcp_free list. I also included changes to clear the UB_EV_TIMEOUT bit on event creation because it is a nice to have fix. I am closing the related PR but I am leaving this issue open; it would be good if we get feedback that these changes solve the issue for you as well.
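For readers following along, here is a minimal sketch of how a free list can end up with a loop when the same entry is reclaimed twice, and one way to guard against it. This is not the actual unbound code or the actual fix: `sketch_conn`, `sketch_pool`, the `on_free_list` flag and both functions are hypothetical names used only to illustrate the failure mode described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection handler kept on a free list. */
struct sketch_conn {
    struct sketch_conn *next_free; /* next entry on the free list */
    bool on_free_list;             /* assumption: a marker the guarded variant relies on */
};

/* Hypothetical stand-in for the owner of the free list. */
struct sketch_pool {
    struct sketch_conn *free_head;
};

/* Naive push: reclaiming the same entry twice in a row makes it point
 * at itself, so any walk over the free list never terminates. */
void sketch_reclaim_naive(struct sketch_pool *p, struct sketch_conn *c)
{
    assert(p != NULL && c != NULL);
    c->next_free = p->free_head;
    p->free_head = c;
}

/* Guarded push: a second reclaim of an entry that is already on the
 * free list is ignored. Returns true if the entry was actually pushed. */
bool sketch_reclaim_guarded(struct sketch_pool *p, struct sketch_conn *c)
{
    assert(p != NULL && c != NULL);
    if (c->on_free_list)
        return false;
    c->on_free_list = true;
    c->next_free = p->free_head;
    p->free_head = c;
    return true;
}
```

With the naive variant, two consecutive reclaims of one entry leave `c->next_free == c`. The guarded variant leaves the list untouched on the second call.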
* nlnet/master: (23 commits)
Document PR NLnetLabs#563 to changelog
Clarify KEEPALIVE EDNS0 option operation
Make explicit whether edns options are parsed from queries or responses
add missing return code
Remove wrongly added EDE comments
Update util/data/msgparse.c
add potential EDE spots
complete renaming of the modules edns list
Apply suggestions from code review
Changelog note for NLnetLabs#565
- Merge NLnetLabs#565: unbound.service.in: Disable ProtectKernelTunables again.
- Fix to remove unused code from rpz resolve client and action function.
- Fix analyzer review failure in rpz action override code to not crash on unlocking the local zone lock.
- Fix for NLnetLabs#558: clear the UB_EV_TIMEOUT bit before adding an event.
- Fix for NLnetLabs#558: fix loop in comm_point->tcp_free when a comm_point is reclaimed more than once during callbacks.
- Fix that forward-zone name is documented as the full name of the zone. It is not relative but a fully qualified domain name.
Disable ProtectKernelTunables again
- Fix NLnetLabs#552: Unbound assumes index.html exists on RPZ host.
Fix keepalive logic
Move option handling to parse-time
split edns_data.opt_list in opt_list_in and opt_list_out
...
Closing this as inactive; the observed issue is already resolved.
Unbound crashes due to NULL pointer dereference. This definitely is associated with TCP requests, however my reading of the code says the bug is in the event library and is not specific to TCP.
Looking into the node, we see that it was not removed before, since a removed node would have its pointers set to <rbtree_null_node>. Thus, the node was never inserted into the tree.
The node insertion predicate is that a timeout pointer is passed and the EV_TIMEOUT flag is set; see event_add() at mini_event.c:311.
The node removal predicate is just the flag; see event_del() at mini_event.c:332.
So we could potentially create an offending event if event_add() is ever called with a NULL tv. This is possible in netevent.c:4165.
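To make the mismatch concrete, here is a minimal sketch of the two predicates as described above. It is not the mini_event.c source: `sketch_event`, `sketch_timeval`, the `in_tree` field, the flag value and both functions are hypothetical stand-ins, and the rbtree itself is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EV_TIMEOUT 0x01 /* hypothetical flag value for this sketch */

/* Hypothetical stand-in for struct timeval. */
struct sketch_timeval {
    long sec;
    long usec;
};

/* Hypothetical stand-in for the event structure. */
struct sketch_event {
    short ev_events; /* flags requested by the caller */
    bool in_tree;    /* stands in for "node is linked into the timeout rbtree" */
};

/* Insertion predicate as described: flag set AND a non-NULL timeout. */
void sketch_event_add(struct sketch_event *ev, const struct sketch_timeval *tv)
{
    assert(ev != NULL);
    if (tv != NULL && (ev->ev_events & SKETCH_EV_TIMEOUT))
        ev->in_tree = true;
}

/* Removal predicate as described: the flag alone.
 * Returns true when removal would touch a node that was never inserted. */
bool sketch_event_del_is_unsafe(const struct sketch_event *ev)
{
    assert(ev != NULL);
    bool would_remove = (ev->ev_events & SKETCH_EV_TIMEOUT) != 0;
    return would_remove && !ev->in_tree;
}
```

In this sketch, an event that carries the timeout flag but is added with a NULL timeout is never marked as inserted, yet the removal check still fires for it. That is the combination suspected here.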
To reproduce
We don't have a reproduction case for this. It happens infrequently across a large fleet of machines.
Expected behavior
Not crash!
System:
OS: FreeBSD 14
local-unbound -V
Version 1.13.1
Configure line: --with-ssl=/usr --with-libexpat=/usr --disable-dnscrypt --disable-dnstap --enable-ecdsa --disable-event-api --enable-gost --with-libevent --disable-subnet --disable-tfo-client --disable-tfo-server --with-pthreads --prefix=/usr --localstatedir=/var/unbound --mandir=/usr/share/man --build=freebsd
Linked libs: mini-event internal (it uses select), OpenSSL 1.1.1l-freebsd 24 Aug 2021
Linked modules: dns64 respip validator iterator
BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues