Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbound core dump on SIGSEGV #1099

Closed
iruzanov opened this issue Jul 5, 2024 · 4 comments
Closed

Unbound core dump on SIGSEGV #1099

iruzanov opened this issue Jul 5, 2024 · 4 comments

Comments

@iruzanov
Copy link

iruzanov commented Jul 5, 2024

Hello, colleagues!

There is some strange issue has been obsorved under loaded Unbound's work: the program was crashed suddenly.
Unbound version: 1.20.0
System version: 4.18.0-240.el8.x86_64 (Centos 8)
Compiled with gcc: gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
Compiled options and modules:
Version 1.20.0
Configure line: --disable-dnscrypt --enable-dnstap --enable-ecdsa --disable-event-api --enable-gost --with-libevent --disable-subnet --disable-tfo-client --disable-tfo-server --with-pthreads --prefix=/usr/local --localstatedir=/var --mandir=/usr/local/man --infodir=/usr/local/share/info/
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 1.1.1g FIPS 21 Apr 2020
Linked modules: dns64 respip validator iterator

Below there are some details:

  1. The code on that program was crashed inside of current trhead:
    Core was generated by `/usr/local/sbin/unbound -c /srvs/unbound/etc/unbound/unbound.conf'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 0x00000000004a5cb0 in store_rrsets (env=, env=, qrep=, qstarttime=1719417033, region=0x7f7a7a0c6180, pside=, leeway=, now=1719417033, rep=0x7f7ad81fb3b0)
    at services/cache/dns.c:119
    119 if(ttl < min_ttl) min_ttl = ttl;
    [Current thread is 1 (Thread 0x7f7f9f7fe700 (LWP 518126))]
    (gdb) list
    114 rep->rrsets[i] = rep->ref[i].key;
    115 }
    116 /* if ref was updated make sure the message ttl is updated to
    117 * the minimum of the current rrsets. /
    118 ttl = ((struct packed_rrset_data
    )rep->rrsets[i]->entry.data)->ttl;
    119 if(ttl < min_ttl) min_ttl = ttl;
    120 }
    121 if(min_ttl < rep->ttl) {
    122 rep->ttl = min_ttl;
    123 rep->prefetch_ttl = PREFETCH_TTL_CALC(rep->ttl);

This is simple if(ttl < min_ttl) min_ttl = ttl;
However min_ttl was not changed although min_ttl > ttl:
(gdb) print ttl
$1 = 1719417633
(gdb) print min_ttl
$2 = 1719503433

  1. stack trace of current thread:
    (gdb) bt
    0 0x00000000004a5cb0 in store_rrsets (env=, env=, qrep=, qstarttime=1719417033, region=0x7f7a7a0c6180, pside=, leeway=, now=1719417033, rep=0x7f7ad81fb3b0)
    at services/cache/dns.c:119
    1 dns_cache_store_msg (qstarttime=1719417033, region=0x7f7a7a0c6180, flags=272, qrep=0x7f7a7a0c8ed0, pside=, leeway=, rep=0x7f7ad81fb3b0, hash=1152309568, qinfo=0x7f7f9f7fd440, env=0x23efc20)
    at services/cache/dns.c:164
    2 dns_cache_store (env=0x23efc20, msgqinf=, msgrep=0x7f7a7a0c8ed0, is_referral=, leeway=, pside=, region=0x7f7a7a0c6180, flags=272, qstarttime=1719417033)
    at services/cache/dns.c:1092
    3 0x000000000048a7f7 in iter_dns_store (env=, msgqinf=, msgrep=, is_referral=, leeway=, pside=, region=0x7f7a7a0c6180, flags=272,
    qstarttime=1719417033) at iterator/iter_utils.c:697
    4 0x0000000000494d4b in processQueryResponse (ie=, ie=, id=2, iq=0x7f7a7a0c6468, qstate=0x0) at iterator/iterator.c:3222
    5 iter_handle (qstate=qstate@entry=0x7f7a7a0c6210, iq=iq@entry=0x7f7a7a0c6468, ie=ie@entry=0x23af880, id=id@entry=2) at iterator/iterator.c:4145
    6 0x00000000004968f3 in process_response (qstate=, iq=, ie=, id=, outbound=, event=) at iterator/iterator.c:4377
    7 0x0000000000483386 in mesh_run (mesh=0x7f7b912fd840, mstate=0x7f7a7a0c61c0, ev=, e=) at services/mesh.c:1932
    8 0x00000000004ae613 in mesh_report_reply (what=0, reply=0x7f7f9f7fdbb0, e=0x7f7a7a0c8e80, mesh=) at services/mesh.c:856
    9 worker_handle_service_reply (c=0x7f7b90f63790, arg=0x7f7a7a0c8e80, error=0, reply_info=0x7f7f9f7fdbb0) at daemon/worker.c:268
    10 0x000000000042647f in serviced_callbacks (sq=, error=0, c=0x7f7b90f63790, rep=) at services/outside_network.c:3051
    11 0x0000000000426bc0 in serviced_udp_callback (c=0x7f7b90f63790, arg=0x7f7b24f7b410, error=, rep=0x7f7f9f7fdbb0) at services/outside_network.c:3392
    12 0x0000000000430fe7 in outnet_udp_cb (c=c@entry=0x7f7b90f63790, arg=0x7f7be5d06bc0, error=error@entry=0, reply_info=reply_info@entry=0x7f7f9f7fdbb0) at services/outside_network.c:1537
    13 0x00000000004316a7 in comm_point_udp_callback (fd=2967, event=, arg=) at util/netevent.c:1145
    14 0x00007f7fb0131d3a in event_persist_closure (ev=, base=0x7f7f68000c20) at event.c:1623
    15 event_process_active_single_queue (base=base@entry=0x7f7f68000c20, activeq=0x7f7f68001070, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1682
    16 0x00007f7fb01323e7 in event_process_active (base=0x7f7f68000c20) at event.c:1783
    17 event_base_loop (base=0x7f7f68000c20, flags=) at event.c:2006
    18 0x00000000004348cc in ub_event_base_dispatch (base=) at util/ub_event.c:280
    19 comm_base_dispatch (b=) at util/netevent.c:282
    20 0x00000000004b7b01 in worker_work (worker=0x23ee700) at daemon/worker.c:2325
    21 thread_start (arg=0x23ee700) at daemon/daemon.c:636
    22 0x00007f7fafef814a in start_thread () from /lib64/libpthread.so.0
    23 0x00007f7faf538dc3 in clone () from /lib64/libc.so.6

So, i have investigted any variables and structures in the core file. And i have not found any strange values. By the way, the lines (dns.c):
/* if ref was updated make sure the message ttl is updated to
* the minimum of the current rrsets. /
ttl = ((struct packed_rrset_data
)rep->rrsets[i]->entry.data)->ttl;
if(ttl < min_ttl) min_ttl = ttl;
}
if(min_ttl < rep->ttl) {
rep->ttl = min_ttl;
rep->prefetch_ttl = PREFETCH_TTL_CALC(rep->ttl);
rep->serve_expired_ttl = rep->ttl + SERVE_EXPIRED_TTL;
}
do present in Unbound 1.20.0 only. In early program versions there are no such lines. But i have seen the switch() block with three case-sections and i have intersted in case 1:
case 1: /* ref updated, item inserted /
rep->rrsets[i] = rep->ref[i].key;
Is there need for lock_rw_rdlock() call?
Just like its done in case 2:
case 2: /
ref updated, cache is superior /
if(region) {
struct ub_packed_rrset_key
ck;
lock_rw_rdlock(&rep->ref[i].key->entry.lock);
[...]

Please excuse me for maybe stupid question. But i have no clue why the progam was crashed on simple operation over time_t variables (that are not even pointers).

Thank you a lot in advance!

@wcawijngaards
Copy link
Member

What seems to be wrong with the time_t variable access is that it is not locked. The structure is locked just before when it is accessed. But this new access in 1.20.0, is not guarded by locks. I guess that the RRset can be updated by another thread in the mean time and then the pointer of the data becomes invalid while the thread is using it. The fix commit adds locking for the access.

The fix commit makes it through the unit test set, but I have not been able to replicate the same situation as that is reported. Hopefully the fix actually fixes the issue that is reported. It is certainly an improvement to lock this item before accessing it, much like it is locked some lines earlier.

@iruzanov
Copy link
Author

iruzanov commented Jul 5, 2024

BIG thank you, Wouter!

wcawijngaards added a commit that referenced this issue Jul 5, 2024
  is updated and fetched after it is stored, and also check for a
  changed RRset.
@wcawijngaards
Copy link
Member

Another commit to fix it, and this one instead of only locking it, also tests for deleted RRset. That could crash, because the pointer dereference would fail. It also adds checks on the id value, also for the earlier lock code, so that it does not pick up a modified RRset, if it was very quickly deleted and reused for another piece of data. It also passes the unit tests, and I think it is an improvement.

@iruzanov
Copy link
Author

iruzanov commented Jul 8, 2024

Great improvement! Many Thanks and waiting for new release! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants