Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional abort related to dnssec-trigger #24

Open
pemensik opened this issue Apr 26, 2019 · 4 comments
Open

Occasional abort related to dnssec-trigger #24

pemensik opened this issue Apr 26, 2019 · 4 comments

Comments

@pemensik
Copy link
Contributor

It seems this issue is related to some action done by dnssec-trigger we are using. It happens sometime on Fedora, bug #1667387. We were not able to find a reason for it, maybe you could help us?

(gdb) where
#0  0x00007f8f19bb053f in raise () from /lib64/libc.so.6
#1  0x00007f8f19b9a895 in abort () from /lib64/libc.so.6
#2  0x00007f8f19bf3927 in __libc_message () from /lib64/libc.so.6
#3  0x00007f8f19c870f5 in __fortify_fail_abort () from /lib64/libc.so.6
#4  0x00007f8f19c87127 in __fortify_fail () from /lib64/libc.so.6
#5  0x00007f8f19c850e6 in __chk_fail () from /lib64/libc.so.6
#6  0x00005583269dca0d in memmove (__len=<optimized out>, __src=<optimized out>, __dest=0x7fffad5ea930)
    at /usr/include/bits/string_fortified.h:40
#7  xfr_probe_send_probe (xfr=xfr@entry=0x5583287edea0, env=env@entry=0x5583288151c8, timeout=timeout@entry=100)
    at services/authzone.c:5793
#8  0x00005583269dd760 in xfr_probe_send_or_end (xfr=xfr@entry=0x5583287edea0, env=env@entry=0x5583288151c8)
    at services/authzone.c:6074
#9  0x00005583269ddf6d in xfr_probe_send_or_end (env=0x5583288151c8, xfr=0x5583287edea0) at services/authzone.c:6147
#10 auth_xfer_probe_lookup_callback (arg=0x5583287edea0, rcode=<optimized out>, buf=0x0, sec=<optimized out>, 
    why_bogus=<optimized out>, was_ratelimited=<optimized out>) at services/authzone.c:6147
#11 0x000055832699b70e in mesh_state_cleanup (mstate=0x558328d644c0) at services/mesh.c:744
#12 0x000055832699d0d3 in mesh_state_delete (qstate=<optimized out>) at services/mesh.c:795
#13 mesh_state_delete (qstate=<optimized out>) at services/mesh.c:761
#14 0x000055832699d219 in mesh_delete_helper (n=<optimized out>) at services/mesh.c:293
#15 mesh_delete_all (mesh=0x558328e60860) at services/mesh.c:293
#16 0x0000558326973ce6 in do_flush_requestlist (worker=0x558328813d10, ssl=0x7fffad5eb260) at daemon/remote.c:2947
#17 execute_cmd (rc=rc@entry=0x5583286efa40, ssl=ssl@entry=0x7fffad5eb260, 
    cmd=cmd@entry=0x7fffad5eae20 " flush_requestlist", worker=0x558328813d10) at daemon/remote.c:2947
#18 0x0000558326974b6f in handle_req (rc=rc@entry=0x5583286efa40, res=res@entry=0x7fffad5eb260, s=<optimized out>)
    at daemon/remote.c:3095
#19 0x0000558326974c72 in remote_control_callback (c=0x5583287e0b40, arg=arg@entry=0x558328eb9e20, err=err@entry=0, 
    rep=rep@entry=0x0) at daemon/remote.c:3177
#20 0x0000558326974f45 in remote_accept_callback (err=0, rep=<optimized out>, arg=0x5583286efa40, c=<optimized out>)
    at daemon/remote.c:521
#21 remote_accept_callback (c=<optimized out>, arg=0x5583286efa40, err=<optimized out>, rep=<optimized out>)
    at daemon/remote.c:443
#22 0x00007f8f1a3d6031 in event_process_active_single_queue () from /lib64/libevent-2.1.so.6
#23 0x00007f8f1a3d67c7 in event_base_loop () from /lib64/libevent-2.1.so.6
#24 0x0000558326a14530 in comm_base_dispatch (b=<optimized out>) at util/netevent.c:244
#25 0x00005583269796ed in worker_work (worker=<optimized out>) at daemon/worker.c:1884
#26 0x000055832696d26b in daemon_fork (daemon=0x5583286bb260) at daemon/daemon.c:666
#27 0x0000558326968c6f in run_daemon (need_pidfile=1, log_default_identity=0x7fffad5ecdf3 "unbound", debug_mode=1, 
    cmdline_verbose=0, cfgfile=0x558326a2fe1f "/etc/unbound/unbound.conf") at daemon/unbound.c:652
#28 main (argc=<optimized out>, argv=<optimized out>) at daemon/unbound.c:749
(gdb) frame 7
#7  xfr_probe_send_probe (xfr=xfr@entry=0x5583287edea0, env=env@entry=0x5583288151c8, timeout=timeout@entry=100)
    at services/authzone.c:5793
5793			memmove(&addr, &xfr->task_probe->scan_addr->addr, addrlen);
(gdb) p *xfr->task_probe->scan_addr
$1 = {next = 0x558328f599b0, addr = {ss_family = 161, 
    __ss_padding = "\000\000\000\000\000\000\360\\\361(\203U\000\000 \022~(\203U\000\000\200\000\000\000\000\000\000\000\340@\241&\203U\000\000`\205\353(\203U\000\000\377\377\377\377", '\000' <repeats 12 times>, "\377\377\377\377\000\000\000\000\360\364\200(\203U", '\000' <repeats 34 times>, "\001\000\000\000\000\000\000", __ss_align = 56728}, 
  addrlen = 151524}
(gdb) p *xfr->task_probe->scan_addr->next
$2 = {next = 0x0, addr = {ss_family = 177, 
    __ss_padding = "\000\000\000\000\000\000`\215\323\031\217\177\000\000pT\365(\203U\000\000\200\000\000\000\000\000\000\000\340@\241&\203U\000\000p\261\355(\203U\000\000\377\377\377\377", '\000' <repeats 12 times>, "\377\377\377\377\000\000\000\000\360\364\200(\203U", '\000' <repeats 34 times>, "\001\000\000\000\000\000\000", __ss_align = 56727}, 
  addrlen = 862559}
(gdb) p *xfr->task_probe->scan_target
$3 = {next = 0x5583287ee120, host = 0x5583287ee100 "k.root-servers.net", file = 0x0, http = 0, ixfr = 1, 
  allow_notify = 0, ssl = 0, port = 0, list = 0x0}
(gdb)

Happens often in version 1.8.3, but I think it happens sometime in 1.9.x sometime.

@wcawijngaards
Copy link
Member

This looks like a problem that is fixed in one of these issues.
474afc9
c6369e9
specifically the first one one address type that could cause it to fail.

So, updating to the code repository version may fix that problem.
It is the auth-zone code, and not really dnssec-trigger. dnssec-trigger is also active I guess, during network start.
Thanks for the stacktrace and report, by the way.

@wcawijngaards
Copy link
Member

wcawijngaards commented Apr 26, 2019

In any case, the new c26fc84 auth zone logging code could be useful to diagnose the trouble more clearly if the existing fixes do not cover the issues you are facing. They log at verbosity high levels, with more information why unbound is doing lookups. (edit to fix commit link)

@pemensik
Copy link
Contributor Author

pemensik commented May 7, 2019

Thanks for points to release. Unfortunately, it crashed to me again recently with version containing all those commits. Again on root zone check. Backtrace on bugzilla comment

wcawijngaards added a commit that referenced this issue Jun 17, 2019
@wcawijngaards
Copy link
Member

Added a new commit that fixes the scan_addr that is referenced in the memmove from the backtrace, so that it is zeroed when the list is freed. That should stop that from getting used if the lookups fail and it points to the result of a previous scan. But I did not reproduce the issue, so although this fixes elements from the failure and stacktrace, I cannot say for sure. Thanks for the detailed stack traces, by the way, those are very helpful.

jedisct1 added a commit to jedisct1/unbound that referenced this issue Jun 29, 2019
* nlnet/master: (22 commits)
  Nicer spelling and layout.
  - For NLnetLabs#45, check that 127.0.0.1 and ::1 are not used in unbound.conf   when do-not-query-localhost is turned on, or at default on,   unbound-checkconf prints a warning if it is found in forward-addr or   stub-addr statements.
  - Fix memleak in unit test, reported from the clang 8.0 static analyzer.
  - Fix python dict reference and double free in config.
  - Merge PR NLnetLabs#6: Python module: support multiple instances - Merge PR NLnetLabs#5: Python module: define constant MODULE_RESTART_NEXT - Merge PR NLnetLabs#4: Python module: assign something useful to the   per-query data store 'qdata' Noted in Changelog.
  - Added documentation to the ipset files (for doxygen output).
  - make depend
  - Fix to make unbound-control with ipset, remove unused variable,   use unsigned type because of comparison, and assign null instead   of compare with it.  Remade lex and yacc output.
  - PR NLnetLabs#28: IPSet module, by Kevin Chou.  Created a module to support   the ipset that could add the domain's ip to a list easily.   Needs libmnl, and --enable-ipset and config it, doc/README.ipset.md. - Fix to omit RRSIGs from addition to the ipset.
  - Fix for NLnetLabs#24: Fix abort due to scan of auth zone masters using old   address from previous scan.
  - Fix NLnetLabs#39: In libunbound, leftover logfile is close()d unpredictably.
  - Master contains version 1.9.3 in development.
  fix segmentation fault
  rollback the code
  bugfix
  performance improvement
  edit config parser to support ipset
  Add support for ipset
  Document how to configure multiple python modules
  Support multiple python module instances
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants