alloc_reg_obtain() core dump #728

Closed
kerolasa opened this issue Aug 1, 2022 · 2 comments

kerolasa commented Aug 1, 2022

Describe the bug
The Unbound allocator appears to store a value to an unallocated address, and the daemon crashes with a segmentation fault.

Core was generated by `/usr/sbin/unbound -d -c /etc/unbound-upper/unbound.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/345613' in core file too small.
#0  0x000055e76da1b539 in alloc_reg_obtain (alloc=0x55e76eb23650) at util/alloc.c:333
333     util/alloc.c: No such file or directory.
[Current thread is 1 (Thread 0x7f32d199f740 (LWP 345613))]
(gdb) bt
#0  0x000055e76da1b539 in alloc_reg_obtain (alloc=0x55e76eb23650) at util/alloc.c:333
#1  outnet_serviced_query (callback=<optimized out>, was_ratelimited=0x7ffd9a229670, env=0x55e76eb24b00, buff=0x55e76ecd8760, 
    callback_arg=0x55e76f4d9da0, qstate=0x55e76f4d8800, zonelen=15, zone=0x55e76f4d8bd8 "\tawsdns-49\003org", addrlen=28, addr=0x55e76f4d9870,
    tls_auth_name=0x0, ssl_upstream=0, tcp_upstream=0, check_ratelimit=1, nocaps=0, want_dnssec=0, dnssec=32784, flags=16, 
    qinfo=0x55e76f4d8b50, outnet=0x55e76ecd8490) at services/outside_network.c:3359
#2  worker_send_query (qinfo=0x55e76f4d8b50, flags=<optimized out>, dnssec=32784, want_dnssec=0, nocaps=0, check_ratelimit=1, 
    addr=0x55e76f4d9870, addrlen=28, zone=0x55e76f4d8bd8 "\tawsdns-49\003org", zonelen=15, tcp_upstream=0, ssl_upstream=0, tls_auth_name=0x0, 
    q=0x55e76f4d8800, was_ratelimited=0x7ffd9a229670) at daemon/worker.c:2196
#3  0x000055e76da266d2 in processQueryTargets (qstate=<optimized out>, iq=<optimized out>, ie=<optimized out>, id=<optimized out>)
    at iterator/iterator.c:2642
#4  0x000055e76da2d2ca in iter_handle (qstate=<optimized out>, iq=0x55e76f4d8a50, ie=0x55e76e6a6900, id=<optimized out>)
    at iterator/iterator.c:3717
#5  0x000055e76da39def in mesh_run (mesh=0x55e76f520870, mstate=0x55e76f4d87b0, ev=<optimized out>, e=0x0) at services/mesh.c:1880
#6  0x000055e76da0ad44 in mesh_report_reply (what=0, reply=0x7ffd9a229cd0, e=0x55e76f4843a8, mesh=<optimized out>) at services/mesh.c:889
#7  worker_handle_service_reply (c=0x55e76f346ec0, arg=0x55e76f4843a8, error=0, reply_info=0x7ffd9a229cd0) at daemon/worker.c:266
#8  0x000055e76da83f8f in serviced_callbacks (sq=<optimized out>, error=0, c=0x55e76f346ec0, rep=<optimized out>)
    at services/outside_network.c:2993
#9  0x000055e76da84d70 in serviced_udp_callback (c=0x55e76f346ec0, arg=0x55e76e4fea40, error=error@entry=0, rep=rep@entry=0x7ffd9a229cd0)
    at services/outside_network.c:3333
#10 0x000055e76da807ec in outnet_udp_cb (c=<optimized out>, arg=0x55e76ecd8490, error=<optimized out>, reply_info=0x7ffd9a229cd0)
    at services/outside_network.c:1466
#11 0x000055e76da78d5d in comm_point_udp_callback (fd=277, event=<optimized out>, arg=<optimized out>) at util/netevent.c:784
#12 0x00007f32d1ea8b4f in ?? () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#13 0x00007f32d1ea928f in event_base_loop () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#14 0x000055e76daa1a29 in ub_event_base_dispatch (base=<optimized out>) at util/ub_event.c:280
#15 comm_base_dispatch.isra.0 (b=<optimized out>, b=<optimized out>) at util/netevent.c:256
#16 0x000055e76d9fb5af in worker_work (worker=<optimized out>) at daemon/worker.c:2135
#17 daemon_fork (daemon=<optimized out>) at daemon/daemon.c:701
#18 0x000055e76d9f43b8 in run_daemon (need_pidfile=1, debug_mode=1, cmdline_verbose=0, 
    cfgfile=0x7ffd9a22bf02 "/etc/unbound-upper/unbound.conf") at daemon/unbound.c:736
#19 main (argc=<optimized out>, argv=<optimized out>) at daemon/unbound.c:838

To reproduce
I have not been able to reproduce this issue.

Expected behavior
No core dump when a DNS name is queried.

System:

  • Unbound version: 1.16.0
  • OS: Linux
  • unbound -V output:
    Version 1.16.0
    Configure line: --enable-dnstap --disable-rpath --with-libevent --with-pidfile=/var/run/unbound.pid --prefix=/usr --sysconfdir=/etc
    Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 1.1.1n 15 Mar 2022
    Linked modules: dns64 respip validator iterator

Additional information
This does not happen very often. My hunch is that the memory allocator is
trying to store data to a variable that another thread has already freed,
i.e. a thread race that needs locking or something similar (sorry, I haven't
followed how Unbound is implemented; maybe hazard pointers or RCU should be
used, or something there is going wrong).

This issue might be the same as #586.

kerolasa commented Aug 1, 2022

Another server in another location produced a very similar, but not identical, backtrace.

#0  0x00005574db53b11e in alloc_reg_obtain (alloc=<optimized out>) at util/alloc.c:333
#1  mesh_state_create (env=0x5574ddc8eef0, qinfo=qinfo@entry=0x7ffc5e4f4b90, cinfo=cinfo@entry=0x0, qflags=qflags@entry=272, prime=prime@entry=0, valrec=valrec@entry=1) at services/mesh.c:897
#2  0x00005574db53b5d2 in mesh_add_sub (qstate=qstate@entry=0x5574de656d70, qinfo=0x7ffc5e4f4b90, qflags=<optimized out>, prime=0, valrec=1, newq=0x7ffc5e4f4b80, sub=0x7ffc5e4f4b48)
    at services/mesh.c:1141
#3  0x00005574db53b684 in mesh_attach_sub (qstate=0x5574de656d70, qinfo=<optimized out>, qflags=<optimized out>, prime=<optimized out>, valrec=<optimized out>, newq=<optimized out>)
    at services/mesh.c:1176
#4  0x00005574db522c2e in generate_sub_request (qname=qname@entry=0x5574de659578 "\001c\ricann-servers\003net", qnamelen=<optimized out>, qtype=qtype@entry=28, qclass=qclass@entry=1,
    qstate=qstate@entry=0x5574de656d70, id=id@entry=1, iq=0x5574de656fe8, initial_state=INIT_REQUEST_STATE, finalstate=FINISHED_STATE, subq_ret=0x7ffc5e4f4c40, v=0, detached=0)
    at iterator/iterator.c:782
#5  0x00005574db52344c in generate_target_query (qclass=1, qtype=28, namelen=<optimized out>, name=0x5574de659578 "\001c\ricann-servers\003net", id=1, iq=0x5574de656fe8, qstate=0x5574de656d70)
    at iterator/iterator.c:1808
#6  query_for_targets (qstate=qstate@entry=0x5574de656d70, iq=iq@entry=0x5574de656fe8, ie=ie@entry=0x5574dd810ca0, id=id@entry=1, maxtargets=maxtargets@entry=3, num=num@entry=0x7ffc5e4f4f30)
    at iterator/iterator.c:1892
#7  0x00005574db524da7 in processQueryTargets (qstate=0x5574de656d70, iq=0x5574de656fe8, ie=0x5574dd810ca0, id=1) at iterator/iterator.c:2502
#8  0x00005574db52d2ca in iter_handle (qstate=<optimized out>, iq=0x5574de656fe8, ie=0x5574dd810ca0, id=<optimized out>) at iterator/iterator.c:3717
#9  0x00005574db5301d3 in process_response (qstate=<optimized out>, iq=<optimized out>, ie=<optimized out>, id=<optimized out>, outbound=<optimized out>, event=<optimized out>)
    at iterator/iterator.c:3951
#10 0x00005574db539def in mesh_run (mesh=0x5574de68ad90, mstate=0x5574de656d20, ev=<optimized out>, e=0x5574de658db0) at services/mesh.c:1880
#11 0x00005574db50ad44 in mesh_report_reply (what=0, reply=0x7ffc5e4f57c0, e=0x5574de658db0, mesh=<optimized out>) at services/mesh.c:889
#12 worker_handle_service_reply (c=0x5574de4b44c0, arg=0x5574de658db0, error=0, reply_info=0x7ffc5e4f57c0) at daemon/worker.c:266
#13 0x00005574db583f8f in serviced_callbacks (sq=<optimized out>, error=0, c=0x5574de4b44c0, rep=<optimized out>) at services/outside_network.c:2993
#14 0x00005574db584d70 in serviced_udp_callback (c=0x5574de4b44c0, arg=0x5574dd9e34f0, error=error@entry=0, rep=rep@entry=0x7ffc5e4f57c0) at services/outside_network.c:3333
#15 0x00005574db5807ec in outnet_udp_cb (c=<optimized out>, arg=0x5574dde42830, error=<optimized out>, reply_info=0x7ffc5e4f57c0) at services/outside_network.c:1466
#16 0x00005574db578d5d in comm_point_udp_callback (fd=266, event=<optimized out>, arg=<optimized out>) at util/netevent.c:784
#17 0x00007fa124e43b4f in ?? () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#18 0x00007fa124e4428f in event_base_loop () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#19 0x00005574db5a1a29 in ub_event_base_dispatch (base=<optimized out>) at util/ub_event.c:280
#20 comm_base_dispatch.isra.0 (b=<optimized out>, b=<optimized out>) at util/netevent.c:256
#21 0x00005574db4fb5af in worker_work (worker=<optimized out>) at daemon/worker.c:2135
#22 daemon_fork (daemon=<optimized out>) at daemon/daemon.c:701
#23 0x00005574db4f43b8 in run_daemon (need_pidfile=1, debug_mode=1, cmdline_verbose=0, cfgfile=0x7ffc5e4f6f03 "/etc/unbound-upper/unbound.conf") at daemon/unbound.c:736
#24 main (argc=<optimized out>, argv=<optimized out>) at daemon/unbound.c:838

wcawijngaards (Member) commented

It turns out that the code calls alloc_reg_release twice when creating a serviced query fails. That should happen only very sporadically, which matches the description and the stack traces that you report. I fixed it in the commit, but I have not been able to test that it fixes these stack traces. I hope that it fixes the problem.
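
To illustrate why a double release can surface later as a crash in an obtain call, here is a minimal sketch. It assumes the cache is a singly linked free list with a counter. All names (`struct region`, `struct pool`, `pool_release`, `pool_obtain`, `double_release_demo`) are hypothetical stand-ins for illustration and are not Unbound's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not Unbound's real types. */
struct region {
    struct region *next;   /* free-list link, meaningful only while cached */
};

struct pool {
    struct region *head;   /* cached regions ready for reuse */
    size_t count;          /* how many regions the pool believes it holds */
};

struct demo_result {
    int aliased;           /* 1 if two obtains returned the same region */
    int list_empty;        /* 1 if the free list ended up empty */
    size_t count;          /* counter value after the two obtains */
};

/* Put a region back in the cache. Nothing here detects a repeated release. */
static void pool_release(struct pool *p, struct region *r)
{
    assert(p != NULL && r != NULL);
    r->next = p->head;
    p->head = r;
    p->count++;
}

/* Take a region from the cache, or return NULL if the list is empty.
 * This sketch checks the list head; an obtain that relied on the counter
 * alone would dereference a NULL head once counter and list disagree. */
static struct region *pool_obtain(struct pool *p)
{
    struct region *r = p->head;
    if (r == NULL)
        return NULL;
    p->head = r->next;
    r->next = NULL;
    p->count--;
    return r;
}

/* Cache one region, then release a second region twice and obtain twice. */
struct demo_result double_release_demo(void)
{
    struct region x = { NULL }, r = { NULL };
    struct pool p = { NULL, 0 };
    struct demo_result d;

    pool_release(&p, &x);   /* legitimate cached region */
    pool_release(&p, &r);   /* first release: r -> x */
    pool_release(&p, &r);   /* the bug: r->next now points at r, x is lost */

    struct region *a = pool_obtain(&p);
    struct region *b = pool_obtain(&p);

    d.aliased = (a != NULL && a == b);
    d.list_empty = (p.head == NULL);
    d.count = p.count;
    return d;
}
```

In this model the second release makes the region point at itself, so two later callers receive the same region, and the region that was cached earlier drops off the list while the counter still includes it. The list is then empty although the counter is nonzero, which is the kind of inconsistency that could lead to a segmentation fault inside a later obtain.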
