upgrade (new install) from 1.11.6 -> 3.0.3 (x86_64/linux) CRASH after long run. #2294

Closed
cc3283 opened this issue Oct 30, 2020 · 10 comments

@cc3283

cc3283 commented Oct 30, 2020

Hello. After moving traffic to a new install of 3.0.1, OpenSIPS started crashing at seemingly random times after about a week of traffic load. I found a bug related to auto_scaling, so I disabled auto_scaling and upgraded to 3.0.3. It is still crashing, but it appears I am able to reproduce the crash with non-production traffic at a fairly low rate. I have uploaded the crash file. We have run both 1.11.6-notls and 1.11.1-tls for years without issue, with the same traffic sources and traffic type.

version: opensips 3.0.3 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: cf2c490
main.c compiled on 05:17:45 Oct 7 2020 with gcc 5.4.0
core.opensips.sig11.22486.gz

@cc3283 cc3283 changed the title opensips 3.0.3 (x86_64/linux) CRASH upgrade from 1.11.6 opensips 3.0.3 (x86_64/linux) CRASH Oct 30, 2020
@cc3283 cc3283 changed the title upgrade from 1.11.6 opensips 3.0.3 (x86_64/linux) CRASH upgrade (new install) from 1.11.6 -> 3.0.3 (x86_64/linux) CRASH after long run. Oct 30, 2020
@cc3283
Author

cc3283 commented Oct 30, 2020

Another thing worth noting is that memory usage and CPU appear stable. What seems to expedite the crash is a large number of transaction timeouts: forcing timeouts on the network side helps to reproduce the issue. I hope the crash dump helps.

@cc3283
Author

cc3283 commented Nov 11, 2020

Updated crash file and gdb dump:

gdb /usr/local/opensips3.x/sbin/opensips core.opensips.sig11.32688.2
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/opensips3.x/sbin/opensips...done.
[New LWP 32688]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/opensips3.x/sbin/opensips -E -w /var/crash -n 16 -P /var/run/opensip'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f7a1e371cd0 in _IO_vfprintf_internal (s=s@entry=0xdc9040, format=<optimized out>,
    format@entry=0x68a200 "CRITICAL:core:%s: freeing already freed %s pointer (%p), first free: %s: %s(%ld) - aborting!\n",
    ap=0x7ffde240f558) at vfprintf.c:1632
1632 vfprintf.c: No such file or directory.
(gdb) where
#0 0x00007f7a1e371cd0 in _IO_vfprintf_internal (s=s@entry=0xdc9040, format=<optimized out>,
    format@entry=0x68a200 "CRITICAL:core:%s: freeing already freed %s pointer (%p), first free: %s: %s(%ld) - aborting!\n",
    ap=0x7ffde240f558) at vfprintf.c:1632
#1 0x00007f7a1e439f46 in ___vfprintf_chk (fp=fp@entry=0xdc9040, flag=flag@entry=1,
    format=format@entry=0x68a200 "CRITICAL:core:%s: freeing already freed %s pointer (%p), first free: %s: %s(%ld) - aborting!\n",
    ap=ap@entry=0x7ffde240f558) at vfprintf_chk.c:33
#2 0x00007f7a1e4237f2 in __GI___vsyslog_chk (pri=<optimized out>, flag=1,
    fmt=0x68a200 "CRITICAL:core:%s: freeing already freed %s pointer (%p), first free: %s: %s(%ld) - aborting!\n",
    ap=0x7ffde240f558) at ../misc/syslog.c:222
#3 0x00007f7a1e423ca2 in __syslog_chk (pri=pri@entry=130, flag=flag@entry=1,
    fmt=fmt@entry=0x68a200 "CRITICAL:core:%s: freeing already freed %s pointer (%p), first free: %s: %s(%ld) - aborting!\n")
    at ../misc/syslog.c:129
#4 0x00000000005517f9 in syslog (
    __fmt=0x68a200 "CRITICAL:core:%s: freeing already freed %s pointer (%p), first free: %s: %s(%ld) - aborting!\n", __pri=130)
    at /usr/include/x86_64-linux-gnu/bits/syslog.h:31
#5 fm_free (fm=<optimized out>, p=<optimized out>, file=<optimized out>, func=<optimized out>, line=<optimized out>)
    at mem/f_malloc_dyn.h:231
#6 0x00007f789e6d59fc in _shm_free_bulk (file=0x7f789e7358ae "h_table.c", function=<optimized out>, line=147,
    ptr=<optimized out>) at ../../mem/shm_mem.h:487
#7 free_cell (dead_cell=dead_cell@entry=0x7f78a3afa600) at h_table.c:147
#8 0x00007f789e725523 in delete_cell (p_cell=p_cell@entry=0x7f78a3afa600, unlock=unlock@entry=1) at timer.c:239
#9 0x00007f789e726591 in wait_handler (wait_tl=0x7f78a3afa680) at timer.c:458
#10 timer_routine (ticks=<optimized out>, set=<optimized out>) at timer.c:1091
#11 0x00000000004cbbb5 in handle_timer_job () at timer.c:864
#12 0x000000000061cedf in handle_io (idx=<optimized out>, event_type=<optimized out>, fm=<optimized out>) at net/net_udp.c:276
#13 io_wait_loop_epoll (repeat=0, t=1, h=<optimized out>) at net/../io_wait_loop.h:284
#14 0x000000000062162d in udp_start_processes (chd_rank=chd_rank@entry=0xa2c6b0 <chd_rank>, startup_done=startup_done@entry=0x0)
    at net/net_udp.c:496
#15 0x0000000000420296 in main_loop () at main.c:800
#16 main (argc=<optimized out>, argv=<optimized out>) at main.c:1479
(gdb)
core.opensips.sig11.32688.gz

@stale

stale bot commented Nov 26, 2020

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@stale stale bot added the stale label Nov 26, 2020
@bogdan-iancu
Member

@cc3283, thank you for your report, and sorry for the delay here.

Your backtrace shows memory corruption, specifically a double free. Do you see something like "freeing already freed 0xnnnnnnn pointer" in the logs?

Also, try to grab the latest version from Git; I see your checkout is a bit old.
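
For readers unfamiliar with this class of bug, the kind of check that emits a "freeing already freed ... first free: ..." message can be sketched with a toy allocator wrapper. This is not OpenSIPS code: `dbg_hdr`, `dbg_alloc`, `dbg_free_at` and `dbg_free` are hypothetical names invented for illustration, and the sketch assumes a single process and deliberately never returns freed blocks to the system so that the bookkeeping header stays readable.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header placed in front of every allocation. */
typedef struct dbg_hdr {
    int freed;              /* 0 = live, 1 = already freed */
    const char *free_file;  /* location of the first free */
    const char *free_func;
    long free_line;
} dbg_hdr;

/* Allocate `size` bytes preceded by a bookkeeping header. */
static void *dbg_alloc(size_t size)
{
    dbg_hdr *h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->freed = 0;
    h->free_file = NULL;
    h->free_func = NULL;
    h->free_line = 0;
    return h + 1;           /* user memory starts right after the header */
}

/*
 * Mark a block as freed. Returns 0 on the first free and -1 on a double
 * free. The block is intentionally not handed back to malloc, so the
 * header can still be inspected on a later (erroneous) free.
 */
static int dbg_free_at(void *p, const char *file, const char *func, long line)
{
    dbg_hdr *h;

    if (p == NULL)
        return 0;
    h = (dbg_hdr *)p - 1;
    if (h->freed) {
        fprintf(stderr,
                "freeing already freed pointer (%p), first free: %s: %s(%ld)\n",
                p, h->free_file, h->free_func, h->free_line);
        return -1;
    }
    h->freed = 1;
    h->free_file = file;
    h->free_func = func;
    h->free_line = line;
    return 0;
}

/* Convenience macro that records the call site of the free. */
#define dbg_free(p) dbg_free_at((p), __FILE__, __func__, __LINE__)
```

Calling `dbg_free(p)` twice on the same pointer returns 0 the first time and -1 the second time, and the second call reports where the first free happened. In a report like this one, that "first free" location is the useful part of the log line, because it names the other code path that released the same memory.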

@stale stale bot removed the stale label Jan 5, 2021
@bogdan-iancu bogdan-iancu self-assigned this Jan 5, 2021
@cc3283
Author

cc3283 commented Jan 7, 2021 via email

@vasilevalex
Contributor

Looks similar to #2362.

@bogdan-iancu
Member

@cc3283, let's see if upgrading to the latest 3.0 also does the job here.

@cc3283
Author

cc3283 commented Jan 8, 2021 via email

@cc3283
Author

cc3283 commented Feb 2, 2021 via email

@stale

stale bot commented Jul 21, 2021

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@stale stale bot added the stale label Jul 21, 2021