Core dump while logging (master) #1472

danpascu · 2018-09-21T08:59:59Z

I got some backtraces from crashes that happen while logging. This crash happened multiple times, but I do not know what triggers it. In both of the backtraces below the crash happened while executing this line in the script:

xlog("L_INFO", "[CONFIG] ----- Start routing\n");

but the crash doesn't happen on every execution of that line, it happens randomly.

Core was generated by `/usr/sbin/opensips -w /run/opensips -P opensips.pid -m 512'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb7799cf9 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7799cf9 in __kernel_vsyscall ()
#1  0xb75d3dd0 in __libc_signal_restore_set (set=0xbfb30a00) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb75d5297 in __GI_abort () at abort.c:89
#4  0xb760f38f in __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
#5  0xb7615fc7 in malloc_printerr (action=<optimized out>, str=0xb7705cc6 "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=0xb775b780 <main_arena>) at malloc.c:5049
#6  0xb76184d5 in _int_malloc (av=av@entry=0xb775b780 <main_arena>, bytes=bytes@entry=8192) at malloc.c:3765
#7  0xb761a6df in __libc_calloc (n=1, elem_size=8192) at malloc.c:3274
#8  0xb760ddb0 in __GI___open_memstream (bufloc=0xbfb30f1c, sizeloc=0xbfb30f20) at memstream.c:83
#9  0xb768af35 in __GI___vsyslog_chk (pri=134, flag=1, fmt=0x64fdf4 "%.*s", ap=0xbfb30fcc "\035") at ../misc/syslog.c:167
#10 0xb768b449 in __syslog_chk (pri=134, flag=1, fmt=0x64fdf4 "%.*s") at ../misc/syslog.c:129
#11 0x0051cd6b in syslog (__fmt=0x64fdf4 "%.*s", __pri=<optimized out>) at /usr/include/i386-linux-gnu/bits/syslog.h:31
#12 xlog_2 (msg=0x18a4970, lev=0x1842a80 "", frm=0x1842ac0 " jv\001\035") at xlog.c:230
#13 0x004e0c5a in do_action (a=0x1766aa8, msg=0x18a4970) at action.c:1638
#14 0x004e400a in run_action_list (a=0x1766aa8, msg=0x18a4970) at action.c:172
#15 0x004e3041 in do_action (a=0x1767678, msg=0x18a4970) at action.c:1130
#16 0x004e400a in run_action_list (a=0x1760ad0, msg=0x18a4970) at action.c:172
#17 0x004e430a in run_actions (msg=0x18a4970, a=0x1760ad0) at action.c:137
#18 run_top_route (a=0x1760ad0, msg=0x18a4970) at action.c:214
#19 0x004eaa33 in receive_msg (buf=<optimized out>, len=<optimized out>, rcv_info=<optimized out>, existing_context=<optimized out>, flags=<optimized out>) at receive.c:209
#20 0x00633878 in tcp_handle_req (_max_msg_chunks=<optimized out>, con=0x97e06170, req=0x76cea0 <tcp_current_req>) at net/proto_tcp/tcp_common.h:411
#21 tcp_read_req (con=0x97e06170, bytes_read=0xbfb31a88) at net/proto_tcp/proto_tcp.c:1179
#22 0x00613a8e in handle_io (fm=<optimized out>, idx=idx@entry=0, event_type=event_type@entry=1) at net/net_tcp_proc.c:241
#23 0x006160d5 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at net/../io_wait_loop.h:280
#24 tcp_worker_proc_loop () at net/net_tcp_proc.c:386
#25 0x00621993 in tcp_start_processes (chd_rank=0x749980 <chd_rank>, startup_done=0x0) at net/net_tcp.c:1887
#26 0x004c742c in main_loop () at main.c:761
#27 main (argc=<optimized out>, argv=<optimized out>) at main.c:1407
(gdb) frame 12
#12 xlog_2 (msg=0x18a4970, lev=0x1842a80 "", frm=0x1842ac0 " jv\001\035") at xlog.c:230
230	xlog.c: No such file or directory.
(gdb) print log_len
$1 = 29
(gdb) print log_buf
$2 = 0x18412d8 "[CONFIG] ----- Start routing\n"
(gdb) 

Core was generated by `/usr/sbin/opensips -w /run/opensips -P opensips.pid -m 512'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb770ccf9 in __kernel_vsyscall ()
(gdb) bt
#0  0xb770ccf9 in __kernel_vsyscall ()
#1  0xb7546dd0 in __libc_signal_restore_set (set=0xbfb81eb0) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb7548297 in __GI_abort () at abort.c:89
#4  0xb758238f in __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
#5  0xb7588fc7 in malloc_printerr (action=<optimized out>, str=0xb7678cc6 "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=0xb76ce780 <main_arena>) at malloc.c:5049
#6  0xb758b4d5 in _int_malloc (av=av@entry=0xb76ce780 <main_arena>, bytes=bytes@entry=360) at malloc.c:3765
#7  0xb758cbf5 in __GI___libc_malloc (bytes=360) at malloc.c:2928
#8  0xb7580d8b in __GI___open_memstream (bufloc=0xbfb823ac, sizeloc=0xbfb823b0) at memstream.c:76
#9  0xb75fdf35 in __GI___vsyslog_chk (pri=134, flag=1, fmt=0x65edf4 "%.*s", ap=0xbfb8245c "\035") at ../misc/syslog.c:167
#10 0xb75fe449 in __syslog_chk (pri=134, flag=1, fmt=0x65edf4 "%.*s") at ../misc/syslog.c:129
#11 0x0052bd6b in syslog (__fmt=0x65edf4 "%.*s", __pri=<optimized out>) at /usr/include/i386-linux-gnu/bits/syslog.h:31
#12 xlog_2 (msg=0x190d480, lev=0x18dba80 "", frm=0x18dbac0 " \372\177\001\035") at xlog.c:230
#13 0x004efc5a in do_action (a=0x17ffaa8, msg=0x190d480) at action.c:1638
#14 0x004f300a in run_action_list (a=0x17ffaa8, msg=0x190d480) at action.c:172
#15 0x004f2041 in do_action (a=0x1800678, msg=0x190d480) at action.c:1130
#16 0x004f300a in run_action_list (a=0x17f9ad0, msg=0x190d480) at action.c:172
#17 0x004f330a in run_actions (msg=0x190d480, a=0x17f9ad0) at action.c:137
#18 run_top_route (a=0x17f9ad0, msg=0x190d480) at action.c:214
#19 0x004f9a33 in receive_msg (buf=<optimized out>, len=<optimized out>, rcv_info=<optimized out>, existing_context=<optimized out>, flags=<optimized out>) at receive.c:209
#20 0x00642878 in tcp_handle_req (_max_msg_chunks=<optimized out>, con=0x97d793ec, req=0x77bea0 <tcp_current_req>) at net/proto_tcp/tcp_common.h:411
#21 tcp_read_req (con=0x97d793ec, bytes_read=0xbfb82f18) at net/proto_tcp/proto_tcp.c:1179
#22 0x00622a8e in handle_io (fm=<optimized out>, idx=idx@entry=0, event_type=event_type@entry=1) at net/net_tcp_proc.c:241
#23 0x006250d5 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at net/../io_wait_loop.h:280
#24 tcp_worker_proc_loop () at net/net_tcp_proc.c:386
#25 0x00630993 in tcp_start_processes (chd_rank=0x758980 <chd_rank>, startup_done=0x0) at net/net_tcp.c:1887
#26 0x004d642c in main_loop () at main.c:761
#27 main (argc=<optimized out>, argv=<optimized out>) at main.c:1407
(gdb) frame 12
#12 xlog_2 (msg=0x190d480, lev=0x18dba80 "", frm=0x18dbac0 " \372\177\001\035") at xlog.c:230
230	xlog.c: No such file or directory.
(gdb) print log_len
$1 = 29
(gdb) print log_buf
$2 = 0x18da2d8 "[CONFIG] ----- Start routing\n"
(gdb)

The text was updated successfully, but these errors were encountered:

razvancrainea · 2018-09-21T09:20:50Z

Hi, Dan!

Can you send us the output of opensips -V?

Best regards,
Răzvan

danpascu · 2018-09-21T10:39:41Z

I run the latest master branch from git:

version: opensips 3.0.0-dev (i386/linux)
flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: unknown
main.c compiled on with gcc 6.3.0

razvancrainea · 2018-09-21T11:23:07Z

Hi, Dan!

I see that you don't have PKG_MALLOC defined, nor USE_SHM_MEM, which means you are using the system malloc() for pkg memory. So in case there's a memory corruption, it is caught by the system allocator, which actually happens (see the "corrupted size vs. prev_size" error in frame #5).

In order to track this down, I would suggest:

define PKG_MALLOC and use QM_MALLOC to check for any over/underflows
run in log_level=4, which will dump all malloc memory operations - perhaps we can gather some information to understand how that pointer/zone became corrupted.

Best regards,
Răzvan

bogdan-iancu · 2018-09-27T15:47:07Z

@danpascu , any luck in using QM_MALLOC with DBG_MALLOC to see if the memory fault is detected and reported?

danpascu · 2018-09-28T09:50:17Z

I won't be able to test this for the next 2-3 weeks. I'll get back when I have some more info.

danpascu · 2018-10-16T11:09:31Z

QM_MALLOC with DBG_MALLOC didn't detect any memory fault.

In the meanwhile I didn't get this anymore. I don't know if it's because of the various recent commits since I first had the problem, or it's just luck. I'll keep an eye on it and let you know when I have more info.

danpascu · 2018-12-05T10:17:20Z

I have not seen this in a while, so I'll assume some of the memory fixes that were committed since the bug was reported might have fixed it. I'll reopen if I run into this again.

danpascu added the bug label Sep 21, 2018

This was referenced Sep 21, 2018

Crash in master for inserting sip trace record in mysql #1473

Closed

Segmentation fault in master while sending reply #1471

Closed

danpascu closed this as completed Dec 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core dump while logging (master) #1472

Core dump while logging (master) #1472

danpascu commented Sep 21, 2018 •

edited by razvancrainea

Loading

razvancrainea commented Sep 21, 2018

danpascu commented Sep 21, 2018

razvancrainea commented Sep 21, 2018

bogdan-iancu commented Sep 27, 2018

danpascu commented Sep 28, 2018

danpascu commented Oct 16, 2018

danpascu commented Dec 5, 2018

Core dump while logging (master) #1472

Core dump while logging (master) #1472

Comments

danpascu commented Sep 21, 2018 • edited by razvancrainea Loading

razvancrainea commented Sep 21, 2018

danpascu commented Sep 21, 2018

razvancrainea commented Sep 21, 2018

bogdan-iancu commented Sep 27, 2018

danpascu commented Sep 28, 2018

danpascu commented Oct 16, 2018

danpascu commented Dec 5, 2018

danpascu commented Sep 21, 2018 •

edited by razvancrainea

Loading