
Unbound 1.13.1 crashes with SIGABRT #469

@iruzanov


Hello, Wouter!

I am actively using unbound-1.13.1 (with our DNSTAP patches, see issue #367), and under high load, with massive numbers of recursive TCP requests, my unbound sometimes crashes. All of these abnormal terminations originate in the services/outside_network.c code. I now have a core dump from one of them:
(gdb) bt
#0 0x0000000800955c2a in thr_kill () from /lib/libc.so.7
#1 0x0000000800954084 in raise () from /lib/libc.so.7
#2 0x00000008008ca279 in abort () from /lib/libc.so.7
#3 0x0000000800464641 in ?? () from /usr/local/lib/libevent-2.1.so.7
#4 0x0000000800464939 in event_errx () from /usr/local/lib/libevent-2.1.so.7
#5 0x000000080045ec54 in evmap_io_del_ () from /usr/local/lib/libevent-2.1.so.7
#6 0x0000000800457e8f in event_del_nolock_ () from /usr/local/lib/libevent-2.1.so.7
#7 0x000000080045ada8 in event_del () from /usr/local/lib/libevent-2.1.so.7
#8 0x000000000030e25b in ub_event_del (ev=<optimized out>) at ./util/ub_event.c:395
#9 comm_point_close (c=0xdc97b7c00) at ./util/netevent.c:3860
#10 0x0000000000315bab in decommission_pending_tcp (outnet=<optimized out>, pend=0xdc9494980)
at ./services/outside_network.c:945
#11 0x00000000003147d6 in reuse_cb_and_decommission (outnet=0x18e75, pend=0x6, error=-2)
at ./services/outside_network.c:986
#12 0x0000000000317491 in outnet_tcptimer (arg=0xee67c2300) at ./services/outside_network.c:2033
#13 0x000000080045e0ed in ?? () from /usr/local/lib/libevent-2.1.so.7
#14 0x000000080045a09c in event_base_loop () from /usr/local/lib/libevent-2.1.so.7
#15 0x000000000024dc54 in thread_start (arg=0x8014c0800) at ./util/ub_event.c:280
#16 0x0000000800780fac in ?? () from /lib/libthr.so.3
#17 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000
(gdb)
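Frames #2 to #5 show that the abort is raised by libevent itself: event_errx() is libevent's fatal-error logging function, reached here from evmap_io_del_() during event_del(). My reading of the libevent 2.1 sources (worth verifying against the exact version installed) is that evmap_io_del_() keeps per-fd counts of registered read and write events and asserts that they never drop below zero. The toy model below uses hypothetical names (toy_base, toy_io_add, toy_io_del), not libevent's API, and illustrates only that invariant: removing a read event from an fd that has none registered is rejected.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model (hypothetical names, NOT libevent's API) of per-fd bookkeeping:
 * each fd slot counts how many registered events are interested in reads. */
#define TOY_MAX_FD 16

struct toy_fd_slot {
	int nread; /* number of active read events on this fd */
};

struct toy_base {
	struct toy_fd_slot slots[TOY_MAX_FD];
};

/* Register a read event on fd; returns 0 on success. */
static int toy_io_add(struct toy_base *base, int fd)
{
	assert(fd >= 0 && fd < TOY_MAX_FD);
	base->slots[fd].nread++;
	return 0;
}

/* Remove a read event from fd. Where a real event library might abort on a
 * failed internal assertion, this model reports the error and returns -1
 * so the behaviour can be tested. */
static int toy_io_del(struct toy_base *base, int fd)
{
	assert(fd >= 0 && fd < TOY_MAX_FD);
	if (base->slots[fd].nread <= 0) {
		fprintf(stderr, "toy_io_del: fd %d has no registered read event\n", fd);
		return -1;
	}
	base->slots[fd].nread--;
	return 0;
}
```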

If we select frame 12 (outnet_tcptimer) and print the pend pointer, we see the following:
(gdb) print pend
$15 = (struct pending_tcp *) 0x6
(gdb) print *pend
Cannot access memory at address 0x6
(gdb)
This corrupt pend pointer is then passed to reuse_cb_and_decommission() (frame 11) and on to the functions above it in the backtrace.

In outnet_tcptimer() (services/outside_network.c) we find the following code:
/* it was in use */
struct pending_tcp* pend=(struct pending_tcp*)w->next_waiting;
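Casts like this are a common C idiom when a pointer field is overloaded: while an object sits on a list the field links to the next element, and once it is taken off the list the same field stores a pointer to an object of another type, with a flag recording which meaning is current. The sketch below is a hypothetical, simplified model of that idiom. The type names and the next_waiting and query fields come from this issue; the on_tcp_waiting_list flag and the helpers toy_use_slot() and toy_slot_of() are assumed names, not necessarily unbound's. Whether outnet_tcptimer() relies on exactly this needs to be checked against the comments on struct waiting_tcp in services/outside_network.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified structures modeled on the issue text. */
struct pending_tcp;

struct waiting_tcp {
	/* If on_tcp_waiting_list is nonzero this links to the next waiting
	 * query; otherwise it is reused to hold a struct pending_tcp*.
	 * (on_tcp_waiting_list is an assumed discriminator flag.) */
	struct waiting_tcp *next_waiting;
	int on_tcp_waiting_list;
};

struct pending_tcp {
	struct waiting_tcp *query; /* the query currently using this slot */
};

/* Attach w to a free pending_tcp slot, storing the back pointer in the
 * overloaded next_waiting field. */
static void toy_use_slot(struct waiting_tcp *w, struct pending_tcp *pend)
{
	assert(w != NULL && pend != NULL);
	pend->query = w;
	w->on_tcp_waiting_list = 0;
	w->next_waiting = (struct waiting_tcp *)pend;
}

/* Recover the slot; only meaningful when w is not on the waiting list.
 * The pointer is converted back to its original type, never dereferenced
 * as a waiting_tcp. */
static struct pending_tcp *toy_slot_of(struct waiting_tcp *w)
{
	if (w->on_tcp_waiting_list)
		return NULL;
	return (struct pending_tcp *)w->next_waiting;
}
```

With this idiom the debugger's declared type for next_waiting says nothing about what the field currently holds; only the flag does.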

But w->next_waiting is declared as a pointer to struct waiting_tcp:
(gdb) print w->next_waiting
$18 = (struct waiting_tcp *) 0xdc9494980
(gdb)

So my questions are: is the type cast in outnet_tcptimer() correct? And could this corrupt pend pointer be what triggers event_errx() in libevent?
In case it helps, I also found a pending_tcp structure at the same address as w->next_waiting, reachable from w via w->outnet->tcp_free:
(gdb) print w->outnet->tcp_free
$23 = (struct pending_tcp *) 0xdc9494980
(gdb)
(gdb) print *w->outnet->tcp_free
$24 = {next_free = 0xdc9493e40, pi = 0xd7da2c000, c = 0xdc97b7c00, query = 0x0, reuse = {node = {parent = 0xdc94953a0,
left = 0x3287d0 <rbtree_null_node>, right = 0x3287d0 <rbtree_null_node>, key = 0x0, color = 1 '\001'}, addr = {
ss_len = 0 '\000', ss_family = 2 '\002', __ss_pad1 = "\000\065X\320\017\067", __ss_align = 0,
__ss_pad2 = "\000\000\000\000\000\000\000\016", '\000' <repeats 103 times>}, addrlen = 16, is_ssl = 0,
lru_next = 0xdc9494ae0, lru_prev = 0x0, item_on_lru_list = 0, pending = 0xdc9494980, cp_more_read_again = 0,
cp_more_write_again = 0, tree_by_id = {root = 0x3287d0 <rbtree_null_node>, count = 0,
cmp = 0x3133e0 <reuse_id_cmp>}, write_wait_first = 0x0, write_wait_last = 0x0, outnet = 0xd7d805000}}
(gdb)
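In this dump, w->outnet->tcp_free and w->next_waiting hold the same address, 0xdc9494980, which is also the pend value shown in frame 10. If tcp_free is the head of a singly linked free list chained through next_free (an assumption based on the field names in the dump), then the pending_tcp being decommissioned by the timer is already on the free list. The following is a minimal, hypothetical model of such a free list with a membership check; toy_pending, toy_outnet, toy_free_push() and toy_is_free() are illustrative names, and only next_free and tcp_free mirror fields from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified slot: next_free mirrors the field in the dump. */
struct toy_pending {
	struct toy_pending *next_free;
};

struct toy_outnet {
	struct toy_pending *tcp_free; /* head of the free list */
};

/* Return a slot to the head of the free list. */
static void toy_free_push(struct toy_outnet *o, struct toy_pending *p)
{
	assert(o != NULL && p != NULL);
	p->next_free = o->tcp_free;
	o->tcp_free = p;
}

/* Nonzero if p is currently on the free list, i.e. no in-flight query
 * should still be referencing it. */
static int toy_is_free(const struct toy_outnet *o, const struct toy_pending *p)
{
	const struct toy_pending *it;
	for (it = o->tcp_free; it != NULL; it = it->next_free)
		if (it == p)
			return 1;
	return 0;
}
```

Walking the list is linear in its length, so a check like this is better suited as a temporary debugging aid than as something to run on every timer callback.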

Many thanks in advance!

PS: I have not attached the core file itself because it is 31 GB in size.
