Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core dump while logging (master) #1472

Closed
danpascu opened this issue Sep 21, 2018 · 7 comments
Closed

Core dump while logging (master) #1472

danpascu opened this issue Sep 21, 2018 · 7 comments
Labels

Comments

@danpascu
Copy link
Member

danpascu commented Sep 21, 2018

I got some backtraces from crashes that happen while logging. This crash happened multiple times, but I do not know what triggers it. In both of the backtraces below the crash happened while executing this line in the script:

xlog("L_INFO", "[CONFIG] ----- Start routing\n");

but the crash doesn't happen on every execution of that line, it happens randomly.

Core was generated by `/usr/sbin/opensips -w /run/opensips -P opensips.pid -m 512'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb7799cf9 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7799cf9 in __kernel_vsyscall ()
#1  0xb75d3dd0 in __libc_signal_restore_set (set=0xbfb30a00) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb75d5297 in __GI_abort () at abort.c:89
#4  0xb760f38f in __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
#5  0xb7615fc7 in malloc_printerr (action=<optimized out>, str=0xb7705cc6 "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=0xb775b780 <main_arena>) at malloc.c:5049
#6  0xb76184d5 in _int_malloc (av=av@entry=0xb775b780 <main_arena>, bytes=bytes@entry=8192) at malloc.c:3765
#7  0xb761a6df in __libc_calloc (n=1, elem_size=8192) at malloc.c:3274
#8  0xb760ddb0 in __GI___open_memstream (bufloc=0xbfb30f1c, sizeloc=0xbfb30f20) at memstream.c:83
#9  0xb768af35 in __GI___vsyslog_chk (pri=134, flag=1, fmt=0x64fdf4 "%.*s", ap=0xbfb30fcc "\035") at ../misc/syslog.c:167
#10 0xb768b449 in __syslog_chk (pri=134, flag=1, fmt=0x64fdf4 "%.*s") at ../misc/syslog.c:129
#11 0x0051cd6b in syslog (__fmt=0x64fdf4 "%.*s", __pri=<optimized out>) at /usr/include/i386-linux-gnu/bits/syslog.h:31
#12 xlog_2 (msg=0x18a4970, lev=0x1842a80 "", frm=0x1842ac0 " jv\001\035") at xlog.c:230
#13 0x004e0c5a in do_action (a=0x1766aa8, msg=0x18a4970) at action.c:1638
#14 0x004e400a in run_action_list (a=0x1766aa8, msg=0x18a4970) at action.c:172
#15 0x004e3041 in do_action (a=0x1767678, msg=0x18a4970) at action.c:1130
#16 0x004e400a in run_action_list (a=0x1760ad0, msg=0x18a4970) at action.c:172
#17 0x004e430a in run_actions (msg=0x18a4970, a=0x1760ad0) at action.c:137
#18 run_top_route (a=0x1760ad0, msg=0x18a4970) at action.c:214
#19 0x004eaa33 in receive_msg (buf=<optimized out>, len=<optimized out>, rcv_info=<optimized out>, existing_context=<optimized out>, flags=<optimized out>) at receive.c:209
#20 0x00633878 in tcp_handle_req (_max_msg_chunks=<optimized out>, con=0x97e06170, req=0x76cea0 <tcp_current_req>) at net/proto_tcp/tcp_common.h:411
#21 tcp_read_req (con=0x97e06170, bytes_read=0xbfb31a88) at net/proto_tcp/proto_tcp.c:1179
#22 0x00613a8e in handle_io (fm=<optimized out>, idx=idx@entry=0, event_type=event_type@entry=1) at net/net_tcp_proc.c:241
#23 0x006160d5 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at net/../io_wait_loop.h:280
#24 tcp_worker_proc_loop () at net/net_tcp_proc.c:386
#25 0x00621993 in tcp_start_processes (chd_rank=0x749980 <chd_rank>, startup_done=0x0) at net/net_tcp.c:1887
#26 0x004c742c in main_loop () at main.c:761
#27 main (argc=<optimized out>, argv=<optimized out>) at main.c:1407
(gdb) frame 12
#12 xlog_2 (msg=0x18a4970, lev=0x1842a80 "", frm=0x1842ac0 " jv\001\035") at xlog.c:230
230	xlog.c: No such file or directory.
(gdb) print log_len
$1 = 29
(gdb) print log_buf
$2 = 0x18412d8 "[CONFIG] ----- Start routing\n"
(gdb) 

Core was generated by `/usr/sbin/opensips -w /run/opensips -P opensips.pid -m 512'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb770ccf9 in __kernel_vsyscall ()
(gdb) bt
#0  0xb770ccf9 in __kernel_vsyscall ()
#1  0xb7546dd0 in __libc_signal_restore_set (set=0xbfb81eb0) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb7548297 in __GI_abort () at abort.c:89
#4  0xb758238f in __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
#5  0xb7588fc7 in malloc_printerr (action=<optimized out>, str=0xb7678cc6 "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=0xb76ce780 <main_arena>) at malloc.c:5049
#6  0xb758b4d5 in _int_malloc (av=av@entry=0xb76ce780 <main_arena>, bytes=bytes@entry=360) at malloc.c:3765
#7  0xb758cbf5 in __GI___libc_malloc (bytes=360) at malloc.c:2928
#8  0xb7580d8b in __GI___open_memstream (bufloc=0xbfb823ac, sizeloc=0xbfb823b0) at memstream.c:76
#9  0xb75fdf35 in __GI___vsyslog_chk (pri=134, flag=1, fmt=0x65edf4 "%.*s", ap=0xbfb8245c "\035") at ../misc/syslog.c:167
#10 0xb75fe449 in __syslog_chk (pri=134, flag=1, fmt=0x65edf4 "%.*s") at ../misc/syslog.c:129
#11 0x0052bd6b in syslog (__fmt=0x65edf4 "%.*s", __pri=<optimized out>) at /usr/include/i386-linux-gnu/bits/syslog.h:31
#12 xlog_2 (msg=0x190d480, lev=0x18dba80 "", frm=0x18dbac0 " \372\177\001\035") at xlog.c:230
#13 0x004efc5a in do_action (a=0x17ffaa8, msg=0x190d480) at action.c:1638
#14 0x004f300a in run_action_list (a=0x17ffaa8, msg=0x190d480) at action.c:172
#15 0x004f2041 in do_action (a=0x1800678, msg=0x190d480) at action.c:1130
#16 0x004f300a in run_action_list (a=0x17f9ad0, msg=0x190d480) at action.c:172
#17 0x004f330a in run_actions (msg=0x190d480, a=0x17f9ad0) at action.c:137
#18 run_top_route (a=0x17f9ad0, msg=0x190d480) at action.c:214
#19 0x004f9a33 in receive_msg (buf=<optimized out>, len=<optimized out>, rcv_info=<optimized out>, existing_context=<optimized out>, flags=<optimized out>) at receive.c:209
#20 0x00642878 in tcp_handle_req (_max_msg_chunks=<optimized out>, con=0x97d793ec, req=0x77bea0 <tcp_current_req>) at net/proto_tcp/tcp_common.h:411
#21 tcp_read_req (con=0x97d793ec, bytes_read=0xbfb82f18) at net/proto_tcp/proto_tcp.c:1179
#22 0x00622a8e in handle_io (fm=<optimized out>, idx=idx@entry=0, event_type=event_type@entry=1) at net/net_tcp_proc.c:241
#23 0x006250d5 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at net/../io_wait_loop.h:280
#24 tcp_worker_proc_loop () at net/net_tcp_proc.c:386
#25 0x00630993 in tcp_start_processes (chd_rank=0x758980 <chd_rank>, startup_done=0x0) at net/net_tcp.c:1887
#26 0x004d642c in main_loop () at main.c:761
#27 main (argc=<optimized out>, argv=<optimized out>) at main.c:1407
(gdb) frame 12
#12 xlog_2 (msg=0x190d480, lev=0x18dba80 "", frm=0x18dbac0 " \372\177\001\035") at xlog.c:230
230	xlog.c: No such file or directory.
(gdb) print log_len
$1 = 29
(gdb) print log_buf
$2 = 0x18da2d8 "[CONFIG] ----- Start routing\n"
(gdb) 
@danpascu danpascu added the bug label Sep 21, 2018
@razvancrainea
Copy link
Member

Hi, Dan!

Can you send us the output of opensips -V?

Best regards,
Răzvan

@danpascu
Copy link
Member Author

I run the latest master branch from git:

version: opensips 3.0.0-dev (i386/linux)
flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: unknown
main.c compiled on with gcc 6.3.0

@razvancrainea
Copy link
Member

Hi, Dan!

I see that you don't have PKG_MALLOC defined, nor USE_SHM_MEM, which means you are using the system malloc() for pkg memory. So in case there's a memory corruption, it is caught by the system allocator, which actually happens (see the "corrupted size vs. prev_size" error in frame #5).

In order to track this down, I would suggest:

  1. define PKG_MALLOC and use QM_MALLOC to check for any over/underflows
  2. run in log_level=4, which will dump all malloc memory operations - perhaps we can gather some information to understand how that pointer/zone became corrupted.

Best regards,
Răzvan

@bogdan-iancu
Copy link
Member

@danpascu , any luck in using QM_MALLOC with DBG_MALLOC to see if the memory fault is detected and reported?

@danpascu
Copy link
Member Author

I won't be able to test this for the next 2-3 weeks. I'll get back when I have some more info.

@danpascu
Copy link
Member Author

QM_MALLOC with DBG_MALLOC didn't detect any memory fault.

In the meanwhile I didn't get this anymore. I don't know if it's because of the various recent commits since I first had the problem, or it's just luck. I'll keep an eye on it and let you know when I have more info.

@danpascu
Copy link
Member Author

danpascu commented Dec 5, 2018

I have not seen this in a while, so I'll assume some of the memory fixes that were committed since the bug was reported might have fixed it. I'll reopen if I run into this again.

@danpascu danpascu closed this as completed Dec 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants