unbound crashing with SIGABRT #38

bleve · 2019-06-06T05:21:41Z

Jun 06 07:46:51 mx3-fi1 unbound[20686]: *** stack smashing detected ***: /usr/sbin/unbound terminated
Jun 06 07:46:51 mx3-fi1 unbound[20686]: ======= Backtrace: =========
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /lib64/libc.so.6(__fortify_fail+0x37)[0x7fcfe5cf0b67]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /lib64/libc.so.6(+0x117b22)[0x7fcfe5cf0b22]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x2f351)[0x55cc1f6e1351]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x36822)[0x55cc1f6e8822]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x2e703)[0x55cc1f6e0703]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x2a23e)[0x55cc1f6dc23e]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x3cdd8)[0x55cc1f6eedd8]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(iter_operate+0x335)[0x55cc1f6efb05]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x4d782)[0x55cc1f6ff782]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(worker_handle_request+0x1b45)[0x55cc1f6da055]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0xc91db)[0x55cc1f77b1db]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0xc5868)[0x55cc1f777868]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(comm_point_tcp_handle_callback+0xf4)[0x55cc1f777ba4]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /lib64/libevent-2.0.so.5(event_base_loop+0x774)[0x7fcfe6a00a14]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0xc2eac)[0x55cc1f774eac]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x1ddf1)[0x55cc1f6cfdf1]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x1995f)[0x55cc1f6cb95f]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fcfe5bfb495]
Jun 06 07:46:51 mx3-fi1 unbound[20686]: /usr/sbin/unbound(+0x1a542)[0x55cc1f6cc542]

This is new issue, This didn't happen yet with svn trunk@5175 but happens with git a4f4d7b

The text was updated successfully, but these errors were encountered:

bleve · 2019-06-06T05:23:03Z

svn trunk@5175 is git 2a78803

wcawijngaards · 2019-06-06T07:19:41Z

Hi bleve,
Could you get a stack trace with debug symbols available? (perhaps install the debug symbol package). Or compile with debug symbols. Is this reproducible somehow, eg. in valgrind or gdb? If so that would be nice, or perhaps verbosity 4 logging of the events preceding the failure?

bleve · 2019-06-06T07:23:28Z

Problem is I only see that on heavily loaded production email servers where I can't experiment. Upgraded to new snapshot from previous one last night and more than half of the servers had unbound crash this morning.

wcawijngaards · 2019-06-06T07:30:20Z

The debug symbol package should only affect the printout and not the running server. On some package systems this is just a package you can install next to it, with the debug symbols, it would make all the anonymous unbound+xx functions have a name.

bleve · 2019-06-06T07:35:10Z

I know - but I can't run crashing version on production. I downgaded already to previous working commit.

bleve · 2019-06-06T07:39:26Z

Ok. I downgraded unbound on one server to crashing version and installed debuginfos - we can only hope it hits the issue.

wcawijngaards · 2019-06-06T07:39:41Z

Thanks!

bleve · 2019-06-06T08:54:50Z

Crahser happened this fast first time but no better backtrace, installing abrtd to get proper coredump.

bleve · 2019-06-06T14:51:42Z

(gdb) bt full
#0 0x00007f0d130b22c7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
resultvar = 0
pid = 25268
selftid = 25273
#1 0x00007f0d130b39b8 in __GI_abort () at abort.c:90
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0x7f0d13204071, sa_sigaction = 0x7f0d13204071}, sa_mask = {__val = {3, 139694080928315, 5, 139694132179495, 1,
139694132188209, 3, 139694080928292, 12, 139694132188213, 2, 139694132188213, 2, 139694080928976, 139694080928976, 139694080930736}}, sa_flags = 4,
sa_restorer = 0x7f0d10122ed0}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f0d130f4e17 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f0d13205492 "*** %s ***: %s terminated\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7f0d101233c0, reg_save_area = 0x7f0d101232d0}}
ap_copy = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7f0d101233c0, reg_save_area = 0x7f0d101232d0}}
fd = 2
on_2 =
list =
nlist =
cp =
written =
#3 0x00007f0d13193b67 in __GI___fortify_fail (msg=msg@entry=0x7f0d1320547a "stack smashing detected") at fortify_fail.c:30
No locals.
#4 0x00007f0d13193b22 in __stack_chk_fail () at stack_chk_fail.c:28
No locals.
#5 0x0000556a51981351 in dname_query_hash (dname=0x7f0cf875fcb2 "", h=654970236) at util/data/dname.c:295
labuf = "\240\373u\370\f\177\000\000"\000\000\000\000\000\000\000\020\000\001", '\000' <repeats 14 times>, "\001", '\000' <repeats 29 times>
lablen =
i =
#6 0x0000000000000000 in ?? ()
No symbol table info available.

wcawijngaards · 2019-06-06T14:55:54Z

Hi bleve,
Another person has reported a similar bug report. His bug report included information that makes me conclude there was a build failure. Did you make clean? Can you rebuild in a new directory? Can you disable link time optimizations with --disable-flto option for configure. Or disable more optimizations with CFLAGS="-g" and (if you passed custom compile flags) not pass other optimizer flags?
That should give a cleaner stack trace or maybe fix the bug if it is related to compiler processing. With CFLAGS="-g3" for configure there is even more debug information. The performance is not really that different (I mean, not like high levels of logging and so on, the compiler optimizer is obviously good at making it faster, but the non-optimized code can run too).

bleve · 2019-06-06T14:59:20Z

I build rpm packages in mock build which is completely clean for all builds.

bleve · 2019-06-06T15:00:26Z

And previous commit id works just fine with exactly same build falgs.

wcawijngaards · 2019-06-06T15:00:51Z

That is excellent. So I would like to disable the optimizations, eg. custom CFLAGS with optimization options, and with --disable-flto passed to configure.... what really?

wcawijngaards · 2019-06-06T15:01:56Z

You mean that last commit, I mean that one commit is the cause of the problem? Or you mean the older version that you cited above built fine. Yeah I guess the code changed in the meantime...

bleve · 2019-06-06T15:02:18Z

I mean the previous non-problematic commit id works.

wcawijngaards · 2019-06-06T15:03:41Z

Ok, but I still think it could be the optimizer. Could try optimization options disabled. Or could try to 'bisect' to search which commit(s) cause the problem, eg. what caused the problem to appear.

bleve · 2019-06-06T15:07:13Z

Used build flags are:

CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -fPIE -pie'
CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -fPIE -pie'
LDFLAGS='-Wl,-z,relro,-z,now -pie'

bleve · 2019-06-06T15:09:57Z

I cna bisect that but it takes some time, it takes about one hour to crash and there is about five steps to bisect.

wcawijngaards · 2019-06-07T06:10:26Z

Okay thanks for the build flags. So the --disable-flto flag to configure disables another optimization argument that is added by configure and not in the list put there. Trying that or the bisect could maybe show up something, or show a better stack trace (i.e. #6 is not ??).

bleve · 2019-06-07T07:18:40Z

I removed -O2 and added --disable-flto for bisect builds. d4f697f was good.

bleve · 2019-06-07T10:39:01Z

Tested with latest git to make sure I get crash with new build options.

(gdb) bt full
#0 x86_64_fallback_frame_state (context=0x7f5f300758c0, context=0x7f5f300758c0, fs=0x7f5f300759b0) at ./md-unwind-support.h:58
pc = 0x100000000 <Address 0x100000000 out of bounds>
sc =
new_cfa =
#1 uw_frame_state_for (context=context@entry=0x7f5f300758c0, fs=fs@entry=0x7f5f300759b0) at ../../../libgcc/unwind-dw2.c:1253
fde = 0x0
cie =
aug =
insn =
end =
#2 0x00007f5f2f670fb9 in _Unwind_Backtrace (trace=0x7f5f330e2f20 <backtrace_helper>, trace_argument=0x7f5f30075b70) at ../../../libgcc/unwind.inc:290
fs = {regs = {reg = {{loc = {reg = 0, offset = 0, exp = 0x0}, how = REG_UNSAVED} <repeats 18 times>}, prev = 0x0, cfa_offset = 0, cfa_reg = 0,
cfa_exp = 0x0, cfa_how = CFA_UNSET}, pc = 0x0, personality = 0x0, data_align = 0, code_align = 0, retaddr_column = 0,
fde_encoding = 0 '\000', lsda_encoding = 0 '\000', saw_z = 0 '\000', signal_frame = 0 '\000', eh_ptr = 0x0}
context = {reg = {0x0, 0x0, 0x0, 0x7f5f300764e8, 0x0, 0x0, 0x7f5f300764f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f5f30076410, 0x7f5f30076418,
0x7f5f30076420, 0x7f5f30076428, 0x7f5f300764f8, 0x0}, cfa = 0x7f5f30076500, ra = 0x100000000, lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0,
func = 0x5574cf3d0710 <dname_query_hash>}, flags = 4611686018427387904, version = 0, args_size = 0, by_value = '\000' <repeats 17 times>}
code =
#3 0x00007f5f330e3096 in __GI___backtrace (array=array@entry=0x7f5f30075d50, size=size@entry=64) at ../sysdeps/x86_64/backtrace.c:107
arg = {array = 0x7f5f30075d50, cfa = 140046804411504, cnt = 4, size = 64}
once = 2
#4 0x00007f5f33047e34 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f5f33158492 "*** %s ***: %s terminated\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:176
addrs = {0x7f5f33047e34 <__libc_message+1012>, 0x7f5f330e6b67 <__GI___fortify_fail+55>, 0x7f5f330e6b22, 0x5574cf3d07d7 <dname_query_hash+199>,
0x1, 0x7f5f336bae68 <int_rsa_verify+808>, 0x7f5f30075ed0, 0x7f5f198859e0, 0x339f7f00, 0x7f5f1987f160, 0x33, 0x339f7f0000000020,
0xed1bffff7bfac1d3, 0x7f5f19885a13, 0x7f5f1987f160, 0xb120852805306200, 0x7f5f198851a0, 0x7f5f19590ec0, 0x20,
0x5574cf420151 <ub_crypto_lock_cb+162>, 0x80, 0x1871987c97d, 0x7f5f337a6c8f, 0xa0000000a, 0xa, 0x337a6c8f, 0x7f5f337a6c8f,
0x7f5f33625f56 <CRYPTO_add_lock+102>, 0x7f5f1986f510, 0x18719590ec0, 0x7f5f19884540, 0x7f5f336dd589 <getrn+137>, 0x7f5f19884540,
0x5574d042edb0, 0x6, 0x5574cf420151 <ub_crypto_lock_cb+162>, 0x7f5f19885200, 0x152198851a0, 0x7f5f3375e50e, 0xa00000002, 0x5574d120e590, 0x6,
0x6, 0x5574cf420151 <ub_crypto_lock_cb+162>, 0x6, 0x20530076bc8, 0x7f5f3375e50e, 0x7f5f19864eb0, 0x1, 0x7f5f30076bc8, 0x0, 0x7f5f19886100,
0x7f5f30075fd0, 0x5574cf3d0198 <query_dname_compare+154>, 0x5574cf4e2ad1, 0x0, 0x0, 0x0, 0x7f5f30075f60,
0x5574cf3d8471 <query_info_compare+116>, 0x7f5f19238220, 0x7f5f300760c0, 0x7f5f18000960, 0xffffffff5d0331cf}
n =
ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7f5f30076440, reg_save_area = 0x7f5f30076350}}
ap_copy = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7f5f30076440, reg_save_area = 0x7f5f30076350}}
fd = 2
on_2 =
list =
nlist =
cp =
written =
#5 0x00007f5f330e6b67 in __GI___fortify_fail (msg=msg@entry=0x7f5f3315847a "stack smashing detected") at fortify_fail.c:30
No locals.
#6 0x00007f5f330e6b22 in __stack_chk_fail () at stack_chk_fail.c:28
No locals.
#7 0x00005574cf3d07d7 in dname_query_hash (dname=0x7f5f188a4156 "", h=569085103) at util/data/dname.c:295
labuf = "\177", '\000' <repeats 22 times>, "\b", '\000' <repeats 39 times>
lablen = 0 '\000'
i = 127
#8 0x0000000100000000 in ?? ()
No symbol table info available.
#9 0x0000000000000000 in ?? ()
No symbol table info available.

wcawijngaards · 2019-06-07T10:46:49Z

So that #8 looks like it is no information at all. Is it stripping information from the executable or something? Or the other options damage the stack trace somehow? CFLAGS="-g" without the other options (that excludes the frame protection), perhaps that gives a better backtrace? Perhaps try that after the bisect.

bleve · 2019-06-07T10:55:32Z

There is no other app, just unbound caching dns.

Antonio-Prado · 2019-06-07T19:30:35Z

over here we decided to downgrade to 1.8.3 in the meantime

bjovereinder · 2019-06-07T19:34:55Z

Hi Antonio,

Thanks for reporting. You are downgrading to 1.8.3 for the same reason (that is SIGABRT)?

bleve · 2019-06-08T07:05:48Z

git bisect start
good: [2a78803] - Fix wrong query name in local zone redirect answers with a CNAME, the copy of the local alias is in unpacked form.
git bisect good 2a78803
bad: [a4f4d7b] - Fix to guard _OPENBSD_SOURCE from redefinition.
git bisect bad a4f4d7b
good: [d4f697f] Merge branch 'master' into features/XoT
git bisect good d4f697f
good: [779b86f] - Fix doxygen output error on readme markdown vignettes.
git bisect good 779b86f
good: [3103961] - Fix #31: swig 4.0 and python module.
git bisect good 3103961
bad: [a03f0a3] - Fix double file close in tcp pipelined response code.
git bisect bad a03f0a3
bad: [0b77c9d] - Fix that spoolbuf is not used to store tcp pipelined response between mesh send and callback end.
git bisect bad 0b77c9d
skip: [1c3ba0c] - Note that so-reuseport at extreme load is better turned off, otherwise queries are not distributed evenly, on Linux 4.4.x.
git bisect skip 1c3ba0c
only skipped commits left to test
possible first bad commit: [0b77c9d] - Fix that spoolbuf is not used to store tcp pipelined response between mesh send and callback end.
possible first bad commit: [1c3ba0c] - Note that so-reuseport at extreme load is better turned off, otherwise queries are not distributed evenly, on Linux 4.4.x.
possible first bad commit: [3103961] - Fix #31: swig 4.0 and python module.
possible first bad commit: [a08fe8c] - Attempt to fix malformed tcp response.
possible first bad commit: [a95f5fd] - Squelch log messages from tcp send about connection reset by peer. They can be enabled with verbosity at higher values for diagnosing network connectivity issues.
possible first bad commit: [a922a19] - Revert fix for oss-fuzz, error is in that build script that unconditionally includes .o files detected by configure, also when the machine architecture uses different LIBOBJS files.
possible first bad commit: [a8d0177] - Attempt to fix build failure in oss-fuzz
good: [1c3ba0c] - Note that so-reuseport at extreme load is better turned off, otherwise queries are not distributed evenly, on Linux 4.4.x.
git bisect good 1c3ba0c
first bad commit: [0b77c9d] - Fix that spoolbuf is not used to store tcp pipelined response between mesh send and callback end.

bleve · 2019-06-08T07:53:51Z

Now testing with commit 0b77c9d reverted to verify bisect.

Antonio-Prado · 2019-06-08T07:59:09Z

Thanks for reporting. You are downgrading to 1.8.3 for the same reason (that is SIGABRT)?

yes, for the very same reason on both our 1.9.1 instances

bleve · 2019-06-08T08:19:06Z

This issue is not in 1.9.1. You are talking about different issue which has been fixed already.

Antonio-Prado · 2019-06-08T08:26:34Z

you mean this one is a different issue?

May 29 11:11:44 ns2-rec kernel: pid 78000 (unbound), uid 59: exited on signal 6

bleve · 2019-06-08T08:31:56Z

That is different because that issue we are talking here is introduced in last two weeks in git version.

bleve · 2019-06-08T08:35:24Z

Can you get a backtrace from your crasher? And if it's not same as this one, you could report that as a separate issue - and of course if it's same, please give that information too.

Antonio-Prado · 2019-06-08T09:45:30Z

well, as previously mentioned, we just downgraded to 1.8.3 so currently we're no more observing that issue.
will report updates, of course

bleve · 2019-06-09T11:30:10Z

No crashes since reverting bisected bad commit.

wcawijngaards · 2019-06-11T07:36:51Z

Thank you very much for the bisect! The commit has to be removed. Issue is that the bug it fixes needs a different fix now...

wcawijngaards · 2019-06-11T10:17:57Z

Fix with the original fix and a fix for the fix, this makes several error cases also use the correct buffer. In 6067ce6
That should work without the bisect commit removed. And I intend to release that in the rc2, and that to become the release. (Further fixes either as rc3 or for the release after, I want to get the current bugfixes out).

bleve · 2019-06-11T11:52:37Z

Testing fix, I'll report back in few hours.

bleve · 2019-06-11T14:06:41Z

Looks good in production.

wcawijngaards · 2019-06-11T14:19:49Z

Wow Thanks! That is good to hear, since rc2 already contains it, that should go to the released version soon. I'll close the issue then.

wcawijngaards closed this as completed Jun 11, 2019

unbound crashing with SIGABRT #38

unbound crashing with SIGABRT #38

Comments

bleve commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

bleve commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

bleve commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

bleve commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 6, 2019

bleve commented Jun 6, 2019

bleve commented Jun 6, 2019

wcawijngaards commented Jun 7, 2019

bleve commented Jun 7, 2019

bleve commented Jun 7, 2019

wcawijngaards commented Jun 7, 2019

bleve commented Jun 7, 2019

Antonio-Prado commented Jun 7, 2019

bjovereinder commented Jun 7, 2019 • edited Loading

bleve commented Jun 8, 2019 • edited Loading

bleve commented Jun 8, 2019

Antonio-Prado commented Jun 8, 2019

bleve commented Jun 8, 2019

Antonio-Prado commented Jun 8, 2019

bleve commented Jun 8, 2019

bleve commented Jun 8, 2019 • edited Loading

Antonio-Prado commented Jun 8, 2019

bleve commented Jun 9, 2019 • edited Loading

wcawijngaards commented Jun 11, 2019

wcawijngaards commented Jun 11, 2019

bleve commented Jun 11, 2019

bleve commented Jun 11, 2019

wcawijngaards commented Jun 11, 2019

bjovereinder commented Jun 7, 2019 •

edited

Loading

bleve commented Jun 8, 2019 •

edited

Loading

bleve commented Jun 8, 2019 •

edited

Loading

bleve commented Jun 9, 2019 •

edited

Loading