Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional crash with 3.0.19 #2732

Closed
pauldekkers opened this issue Jun 6, 2019 · 19 comments
Closed

Occasional crash with 3.0.19 #2732

pauldekkers opened this issue Jun 6, 2019 · 19 comments

Comments

@pauldekkers
Copy link

pauldekkers commented Jun 6, 2019

Issue type

  • Defect - Crash or memory corruption.

Defect

How to reproduce the issue

There's no trigger I'm aware of, or no special configuration. It just happens after a couple of weeks. It does happen occasionally. It happened with earlier releases too, I never got to capturing the event: but now I have a core dump :-)

(While it happened with earlier releases, somewhere above 3.0.15 I think, the exact same configuration did run without issues somewhere in the past. And while this is a busy server, the crashes are really occasional.)

I'm using the 3.0.19 docker container these days, on an Ubuntu 18.04 LTS host. (Previously it crashed with self-compiled releases and no containers too.) Looking at system graphs, I see no issues with resources (memory, disk spikes) whatsoever at the time of the crash. (It was actually at a quiet time, in the middle of the night).

This is a proxy server (it does no EAP, just RADIUS/UDP and TLS, sqlite and rlm_cache_rbtree).

Full backtrace from LLDB or GDB

# gdb /usr/sbin/freeradius /tmp/core.1
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/freeradius...Reading symbols from /usr/lib/debug/.build-id/34/960d4eec3b9f81521a5e219541668773d33c98.debug...done.
done.
[New LWP 86]
[New LWP 88]
[New LWP 1]
[New LWP 92]
[New LWP 85]
[New LWP 90]
[New LWP 84]
[New LWP 87]
[New LWP 89]
[New LWP 83]
[New LWP 91]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `freeradius -f -d /etc/freeradius'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7f65b17fa700 (LWP 86))]
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000560a22119575 in request_handler_thread (arg=0x560a257ccc60) at src/main/threads.c:826
#2  0x00007f65e913b6db in start_thread (arg=0x7f65b17fa700) at pthread_create.c:463
#3  0x00007f65e89ae88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)

Hope this gives some direction to look at. If there's a new event (hopefully similar) I'll add it to this issue, but for this crash I was already waiting for over 6 weeks. (And before that, core dumps were not happening for various reasons.)

jpereira added a commit to jpereira/freeradius-server that referenced this issue Jun 7, 2019
jpereira added a commit to jpereira/freeradius-server that referenced this issue Jun 7, 2019
jpereira added a commit to jpereira/freeradius-server that referenced this issue Jun 7, 2019
@pauldekkers
Copy link
Author

@jpereira just checking; none of your patches ended up in a PR, right?

@jpereira
Copy link
Member

@pauldekkers exactly. Alan mentioned some good points and it needs more investigation.

@mcnewton
Copy link
Member

No comments on this for a few months, is this still an issue?

@pauldekkers
Copy link
Author

@mcnewton I did not test with 3.0.20 yet, but I did not see commits to threads.c and am not aware of another fix (that would have resolved this in 3.0.19). Happy to try for a while with 3.0.20 though (it may take some time to occur).

@mcnewton
Copy link
Member

Thanks. Could you try with v3.0.x HEAD? There's been a couple of fixes since 3.0.20 which affect proxying.

@pauldekkers
Copy link
Author

Sorry it took a while. I used v3.0.x from March 2, up to commit 395a247 - it still crashes in the same way after 2 weeks:

GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/freeradius...Reading symbols from /usr/lib/debug/.build-id/2f/cbc2ddf5f383e5d1c406405f50beb6d1407be3.debug...done.
done.
[New LWP 159]
[New LWP 162]
[New LWP 164]
[New LWP 157]
[New LWP 160]
[New LWP 165]
[New LWP 158]
[New LWP 156]
[New LWP 163]
[New LWP 161]
[New LWP 1]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `freeradius -f -d /etc/freeradius'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7fcef8ff9700 (LWP 159))]
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000561875dd0dc5 in request_handler_thread (arg=0x561879e5e170) at src/main/threads.c:826
#2  0x00007fcf58fd96db in start_thread (arg=0x7fcef8ff9700) at pthread_create.c:463
#3  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)```

@jpereira
Copy link
Member

@pauldekkers can you share the result of the GDB command "bt all"? even if you have the .core file and don't mind, please compact with gzip /path/core.1 and share with us the core.1.gz file.

@pauldekkers
Copy link
Author

@jpereira I'll send you an e-mail

@jpereira
Copy link
Member

Thank you @pauldekkers, you could send it to jpereira@freeradius.org and I will share it with the others.

@jpereira
Copy link
Member

jpereira commented Mar 19, 2020

Ah! if you faced any issues to send the file due to the size. you could try to send it to Google Drive, Dropbox and share it with us. if you don't mind, of course. :)

@pauldekkers
Copy link
Author

@jpereira I sent a message first with the additional bt, hope it got through spam filters ;-)

@jpereira
Copy link
Member

jpereira commented Mar 19, 2020

I got it @pauldekkers , The .core would be better, but it is fine to see the full backtrace as the below snip.

(gdb) thread apply all bt

Thread 11 (Thread 0x7fcf5a3e97c0 (LWP 1)):
#0  0x00007fcf5883c03f in __GI___select (nfds=nfds@entry=141,readfds=readfds@entry=0x7fffdc0f3770, writefds=writefds@entry=0x0,exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffdc0f3760) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00007fcf59d895b5 in fr_event_loop (el=0x561877d80c50) at src/lib/event.c:608
#2  0x0000561875db2295 in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:634

Thread 10 (Thread 0x7fcedb7fe700 (LWP 161)):
#0  0x00007fcf58fe4607 in __libc_sendmsg (fd=fd@entry=18, msg=msg@entry=0x7fcedb7fdae0, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/sendmsg.c:28
#1  0x00007fcf59d84c9b in sendfromto (s=s@entry=18, buf=buf@entry=0x5618782a2880, len=len@entry=20, flags=flags@entry=0, from=from@entry=0x7fcedb7fdd00, fromlen=<optimized out>, to=0x7fcedb7fdc80, tolen=16) at src/lib/udpfromto.c:448
#2  0x00007fcf59d7aa98 in rad_sendto (sockfd=18, data=0x5618782a2880, data_len=20, src_ipaddr=src_ipaddr@entry=0x5618782a2664, src_port=<optimized out>, dst_ipaddr=dst_ipaddr@entry=0x5618782a2680, dst_port=2458, flags=0) at src/lib/radius.c:308
#3  0x00007fcf59d80a9b in rad_send (packet=0x5618782a2660, original=0x5618782a2320, secret=0x561877cad1b0 "..................") at src/lib/radius.c:2113
#4  0x0000561875dbd347 in auth_socket_send (listener=<optimized out>, request=0x5618782a24b0) at src/main/listen.c:1348
#5  0x0000561875dd8058 in request_finish (request=0x5618782a24b0, action=1) at src/main/process.c:1564
#6  0x0000561875dd0dc5 in request_handler_thread (arg=0x561878333070) at src/main/threads.c:826
#7  0x00007fcf58fd96db in start_thread (arg=0x7fcedb7fe700) at pthread_create.c:463
#8  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7fceda7fc700 (LWP 163)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x561877f68580) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fceda7fc700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7fcefa7fc700 (LWP 156)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x5618780f7820) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fcefa7fc700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fcef97fa700 (LWP 158)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x5618794fe220) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fcef97fa700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fced97fa700 (LWP 165)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x5618787e7140) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fced97fa700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fcedbfff700 (LWP 160)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
---Type <return> to continue, or q <return> to quit---
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x5618783f1e20) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fcedbfff700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fcef9ffb700 (LWP 157)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow(sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x561878dba720) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fcef9ffb700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fced9ffb700 (LWP 164)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow(sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x561878862330) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fced9ffb700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fcedaffd700 (LWP 162)):
#0  0x00007fcf58fe26d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x561876010f08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fcf58fe27c8 in __new_sem_wait_slow (sem=sem@entry=0x561876010f08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007fcf58fe2839 in __new_sem_wait (sem=sem@entry=0x561876010f08 <thread_pool+168>) at sem_wait.c:42
#4  0x0000561875dd0c5b in request_handler_thread (arg=0x561878897760) at src/main/threads.c:755
#5  0x00007fcf58fd96db in start_thread (arg=0x7fcedaffd700) at pthread_create.c:463
#6  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fcef8ff9700 (LWP 159)):
#0  0x0000000000000000 in ?? ()
#1  0x0000561875dd0dc5 in request_handler_thread (arg=0x561879e5e170) at src/main/threads.c:826
#2  0x00007fcf58fd96db in start_thread (arg=0x7fcef8ff9700) at pthread_create.c:463
#3  0x00007fcf5884688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)

@pauldekkers
Copy link
Author

Here's another one that crashed last night. Also gives an indication of how often it occurs (after two weeks or so). Hope it helps.


Thread 11 (Thread 0x7f47a77fe700 (LWP 648)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab3395efa0) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47a77fe700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 10 (Thread 0x7f47c7f3b700 (LWP 9)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab338b10d0) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47c7f3b700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7f47c5f37700 (LWP 644)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab338ef580) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47c5f37700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f47d454d7c0 (LWP 1)):
#0  0x00007f47d3146ab4 in futex_wake (private=0, processes_to_wake=1, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:231
#1  __new_sem_post (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_post.c:57
#2  0x000055ab311cf898 in request_enqueue (request=request@entry=0x55ab33d2a1f0) at src/main/threads.c:554
#3  0x000055ab311d4298 in request_queue_or_run (request=request@entry=0x55ab33d2a1f0, process=process@entry=0x55ab311d7800 <request_running>) at src/main/process.c:1093
#4  0x000055ab311d4f87 in request_receive (ctx=ctx@entry=0x55ab33d29f80, listener=listener@entry=0x55ab338c1540, packet=<optimized out>, client=client@entry=0x55ab33656ee0, fun=fun@entry=0x55ab311b2070 <rad_authenticate>) at src/main/process.c:1892
#5  0x000055ab311bd636 in auth_socket_recv (listener=0x55ab338c1540) at src/main/listen.c:1597
#6  0x000055ab311d148e in event_socket_handler (xel=<optimized out>, fd=<optimized out>, ctx=<optimized out>) at src/main/process.c:4867
#7  0x00007f47d3eed65f in fr_event_loop (el=0x55ab33823c50) at src/lib/event.c:649
#8  0x000055ab311b1295 in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:634

Thread 7 (Thread 0x7f47a7fff700 (LWP 647)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab339ef190) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47a7fff700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f47c4f35700 (LWP 646)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab33eda250) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47c4f35700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f47c5736700 (LWP 645)):
---Type <return> to continue, or q <return> to quit---
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab338a4690) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47c5736700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f47c6f39700 (LWP 11)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab338b13d0) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47c6f39700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f47c773a700 (LWP 10)):
#0  0x00007f47d31466d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55ab3140ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f47d31467c8 in __new_sem_wait_slow (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f47d3146839 in __new_sem_wait (sem=sem@entry=0x55ab3140ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055ab311cfc5b in request_handler_thread (arg=0x55ab338b1250) at src/main/threads.c:755
#5  0x00007f47d313d6db in start_thread (arg=0x7f47c773a700) at pthread_create.c:463
#6  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f47c873c700 (LWP 8)):
#0  xlat_process (out=out@entry=0x7f47c873a8f0, request=request@entry=0x55ab33d2a1f0, head=0x55ab338a3410, escape=escape@entry=0x0, escape_ctx=escape_ctx@entry=0x0) at src/main/xlat.c:2446
#1  0x00007f47d4122436 in xlat_expand_struct (out=out@entry=0x7f47c873a990, outlen=outlen@entry=0, request=request@entry=0x55ab33d2a1f0, node=<optimized out>, escape=escape@entry=0x0, escape_ctx=escape_ctx@entry=0x0) at src/main/xlat.c:2509
#2  0x00007f47d4123f1a in radius_axlat_struct (out=out@entry=0x7f47c873a990, request=request@entry=0x55ab33d2a1f0, xlat=<optimized out>, escape=escape@entry=0x0, ctx=ctx@entry=0x0) at src/main/xlat.c:2650
#3  0x00007f47d41176b5 in map_to_vp (ctx=0x55ab33d29fe0, out=0x7f47c873ae60, request=0x55ab33d2a1f0, map=0x55ab3389bfe0, uctx=<optimized out>) at src/main/map.c:806
#4  0x00007f47d4118277 in map_to_request (request=request@entry=0x55ab33d2a1f0, map=map@entry=0x55ab3389bfe0, func=func@entry=0x7f47d41174d0 <map_to_vp>, ctx=ctx@entry=0x0) at src/main/map.c:1124
#5  0x000055ab311c6965 in modcall_recurse (request=0x55ab33d2a1f0, component=MOD_AUTHORIZE, depth=1, entry=entry@entry=0x7f47c873b528, do_next_sibling=true) at src/main/modcall.c:602
#6  0x000055ab311c6193 in modcall_child (request=<optimized out>, component=<optimized out>, depth=<optimized out>, entry=0x7f47c873b510, c=<optimized out>, result=0x7f47c873b2f4, do_next_sibling=true) at src/main/modcall.c:410
#7  0x000055ab311c633c in modcall_recurse (request=request@entry=0x55ab33d2a1f0, component=component@entry=MOD_AUTHORIZE, depth=depth@entry=0, entry=entry@entry=0x7f47c873b510, do_next_sibling=do_next_sibling@entry=true) at src/main/modcall.c:795
#8  0x000055ab311c8f8f in modcall (component=component@entry=MOD_AUTHORIZE, c=c@entry=0x55ab3389a5e0, request=request@entry=0x55ab33d2a1f0) at src/main/modcall.c:1140
#9  0x000055ab311c39fa in indexed_modcall (comp=comp@entry=MOD_AUTHORIZE, idx=idx@entry=0, request=request@entry=0x55ab33d2a1f0) at src/main/modules.c:1022
#10 0x000055ab311c4adf in process_authorize (autz_type=autz_type@entry=0, request=request@entry=0x55ab33d2a1f0) at src/main/modules.c:2154
#11 0x000055ab311b212e in rad_authenticate (request=0x55ab33d2a1f0) at src/main/auth.c:515
#12 0x000055ab311d791f in request_running (request=0x55ab33d2a1f0, action=<optimized out>) at src/main/process.c:1628
#13 0x000055ab311cfdc5 in request_handler_thread (arg=0x55ab33885ab0) at src/main/threads.c:826
#14 0x00007f47d313d6db in start_thread (arg=0x7f47c873c700) at pthread_create.c:463
#15 0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f47c6738700 (LWP 12)):
#0  0x000000005e866d0e in ?? ()
#1  0x000055ab311cfdc5 in request_handler_thread (arg=0x55ab338b1550) at src/main/threads.c:826
#2  0x00007f47d313d6db in start_thread (arg=0x7f47c6738700) at pthread_create.c:463
#3  0x00007f47d29aa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95```

@alandekok
Copy link
Member

#0 xlat_process (out=out@entry=0x7f47c873a8f0, request=request@entry=0x55ab33d2a1f0, head=0x55ab338a3410, escape=escape@entry=0x0, escape_ctx=escape_ctx@entry=0x0) at src/main/xlat.c:2446

That's good... except I have no idea what's at line 2446 of xlat.c

i.e. all of the releases I checked have line 2446 which is empty space, brackets, simple assignment, etc. Nothing which would cause a crash.

So what version are you running, and what's on line 2446?

@pauldekkers
Copy link
Author

This was 395a247:

radiusd: FreeRADIUS Version 3.0.21 (git #395a247), for host x86_64-pc-linux-gnu

That (still) seems to be:
https://github.com/FreeRADIUS/freeradius-server/blob/v3.0.x/src/main/xlat.c#L2446
if I'm not mistaken. Which would indeed be a simple assignment. The previous core dump was different I see, so I'm not sure this is the cause of the crash or just where that thread was stuck in?

I upgraded to 9c36e20 BTW, to stay current. Next crash in two weeks perhaps ;-)

@pauldekkers
Copy link
Author

Well, that didn't take long:

(gdb) thread apply all bt

Thread 9 (Thread 0x7f919f616700 (LWP 28)):
#0  0x00007f91ac8256d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55756336ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f91ac8257c8 in __new_sem_wait_slow (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f91ac825839 in __new_sem_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055756312fc7b in request_handler_thread (arg=0x557565d6b340) at src/main/threads.c:755
#5  0x00007f91ac81c6db in start_thread (arg=0x7f919f616700) at pthread_create.c:463
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f91adc2c7c0 (LWP 1)):
#0  0x00007f91ac07f03f in __GI___select (nfds=nfds@entry=68, readfds=readfds@entry=0x7ffceacf8410, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffceacf8400) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00007f91ad5cc5b5 in fr_event_loop (el=0x557565af5550) at src/lib/event.c:608
#2  0x0000557563111295 in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:634

Thread 7 (Thread 0x7f91a161a700 (LWP 7)):
#0  0x00007f91ac8256d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55756336ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f91ac8257c8 in __new_sem_wait_slow (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f91ac825839 in __new_sem_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055756312fc7b in request_handler_thread (arg=0x557565b829d0) at src/main/threads.c:755
#5  0x00007f91ac81c6db in start_thread (arg=0x7f91a161a700) at pthread_create.c:463
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f919ee15700 (LWP 29)):
#0  0x00007f91ac8256d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55756336ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f91ac8257c8 in __new_sem_wait_slow (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f91ac825839 in __new_sem_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055756312fc7b in request_handler_thread (arg=0x557565ffd840) at src/main/threads.c:755
#5  0x00007f91ac81c6db in start_thread (arg=0x7f919ee15700) at pthread_create.c:463
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f919e614700 (LWP 30)):
#0  0x00007f91ac8256d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55756336ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f91ac8257c8 in __new_sem_wait_slow (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f91ac825839 in __new_sem_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055756312fc7b in request_handler_thread (arg=0x557565ba9dd0) at src/main/threads.c:755
#5  0x00007f91ac81c6db in start_thread (arg=0x7f919e614700) at pthread_create.c:463
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f91a1e1b700 (LWP 6)):
#0  0x00007f91ac8256d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55756336ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f91ac8257c8 in __new_sem_wait_slow (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f91ac825839 in __new_sem_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055756312fc7b in request_handler_thread (arg=0x557565b573b0) at src/main/threads.c:755
#5  0x00007f91ac81c6db in start_thread (arg=0x7f91a1e1b700) at pthread_create.c:463
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f91a0e19700 (LWP 8)):
#0  0x00007f91ad5c6a16 in strlcpy (dst=dst@entry=0x7f91a0e18d4c "Status-SeU", src=src@entry=0x55756314ab50 "Status-Server", siz=siz@entry=129) at src/lib/strlcpy.c:47
#1  0x00007f91ad5b0d2e in dict_valbyname (attr=attr@entry=1011, vendor=vendor@entry=0, name=name@entry=0x55756314ab50 "Status-Server") at src/lib/dict.c:3391
#2  0x0000557563116187 in rad_status_server (request=0x557565c2bd00) at src/main/listen.c:368
#3  0x000055756313793f in request_running (request=0x557565c2bd00, action=<optimized out>) at src/main/process.c:1628
#4  0x000055756312fde5 in request_handler_thread (arg=0x557565b82b50) at src/main/threads.c:826
#5  0x00007f91ac81c6db in start_thread (arg=0x7f91a0e19700) at pthread_create.c:463
---Type <return> to continue, or q <return> to quit---
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f91a0618700 (LWP 9)):
#0  0x00007f91ac8256d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55756336ff08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f91ac8257c8 in __new_sem_wait_slow (sem=sem@entry=0x55756336ff08 <thread_pool+168>, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f91ac825839 in __new_sem_wait (sem=sem@entry=0x55756336ff08 <thread_pool+168>) at sem_wait.c:42
#4  0x000055756312fc7b in request_handler_thread (arg=0x557565b82cd0) at src/main/threads.c:755
#5  0x00007f91ac81c6db in start_thread (arg=0x7f91a0618700) at pthread_create.c:463
#6  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f919fe17700 (LWP 10)):
#0  0x0000000000000000 in ?? ()
#1  0x000055756312fde5 in request_handler_thread (arg=0x557565b82e50) at src/main/threads.c:826
#2  0x00007f91ac81c6db in start_thread (arg=0x7f919fe17700) at pthread_create.c:463
#3  0x00007f91ac08988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) quit
# freeradius -v
radiusd: FreeRADIUS Version 3.0.22 (git #9c36e20), for host x86_64-pc-linux-gnu

@alandekok
Copy link
Member

#0  0x00007f91ad5c6a16 in strlcpy (dst=dst@entry=0x7f91a0e18d4c "Status-SeU", src=src@entry=0x55756314ab50 "Status-Server", siz=siz@entry=129) at src/lib/strlcpy.c:47
#1  0x00007f91ad5b0d2e in dict_valbyname (attr=attr@entry=1011, vendor=vendor@entry=0, name=name@entry=0x55756314ab50 "Status-Server") at src/lib/dict.c:3391

There is no way that line could possibly cause a crash. It's doing a strlcpy() of a static string compiled into the binary into a buffer on the stack.

Similarly, the previous crash was in a line which assigned a value to a variable on the stack.

The only way that the system crashes when accessing the stack is (a) stack overflow, or (b) bad memory.

I would run a memory checker on the hardware. I suspect bad memory.

Also, try running the server on completely different hardware. If it doesn't crash, you know that the original hardware was faulty.

@pauldekkers
Copy link
Author

It's a VMware virtual machine. I can have it migrated to a different host. It's the only process that has been crashing since mid 2019 though. And I believe it also happened to the workload in a completely different datacenter, but I'll try. Thanks for the followup!

@pauldekkers
Copy link
Author

Last crash was April 25th (around 04:00 at night, relatively quiet), and while it should have been on different hardware: I'm not 100% sure if VMware didn't migrate things back before that time. I would expect VMware to give warnings about memory corruption either way, or have the ECC memory of the servers repair things, and there was nothing in the event logs. Meanwhile the entire hardware cluster is replaced/VM migrated.

As far as I'm concerned we could close this issue and I'd open it again if it reoccurs.

For completeness, this crash was:

Thread 3 (Thread 0x7f76a11e97c0 (LWP 1)):
#0  0x00007f769fde3ab4 in futex_wake (private=0, processes_to_wake=1, futex_word=0x55baeba2cf08 <thread_pool+168>) at ../sysdeps/unix/sysv/linux/futex-internal.h:231
#1  __new_sem_post (sem=sem@entry=0x55baeba2cf08 <thread_pool+168>) at sem_post.c:57
#2  0x000055baeb7ec8b8 in request_enqueue (request=request@entry=0x55baed4f9210) at src/main/threads.c:554
#3  0x000055baeb7f12b8 in request_queue_or_run (request=request@entry=0x55baed4f9210, process=process@entry=0x55baeb7f4820 <request_running>) at src/main/process.c:1093
#4  0x000055baeb7f1fa7 in request_receive (ctx=ctx@entry=0x55baed4f9020, listener=listener@entry=0x55baed29baf0, packet=<optimized out>, client=client@entry=0x55baed127730, fun=fun@entry=0x55baeb7d30a0 <rad_status_server>) at src/main/process.c:1892
#5  0x000055baeb7da656 in auth_socket_recv (listener=0x55baed29baf0) at src/main/listen.c:1597
---Type <return> to continue, or q <return> to quit---
#6  0x000055baeb7ee4ae in event_socket_handler (xel=<optimized out>, fd=<optimized out>, ctx=<optimized out>) at src/main/process.c:4867
#7  0x00007f76a0b8a65f in fr_event_loop (el=0x55baed1fe200) at src/lib/event.c:649
#8  0x000055baeb7ce295 in main (argc=<optimized out>, argv=<optimized out>) at src/main/radiusd.c:634

... and no new crashes since.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants