Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH] free_contacts and _unref_dlg segfault #2184

Closed
Fernandojdk opened this issue Jul 24, 2020 · 3 comments
Closed

[CRASH] free_contacts and _unref_dlg segfault #2184

Fernandojdk opened this issue Jul 24, 2020 · 3 comments
Assignees
Milestone

Comments

@Fernandojdk
Copy link

Fernandojdk commented Jul 24, 2020

OpenSIPS version you are running

version: opensips 3.1.0 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
main.c compiled on 16:09:51 Jul 22 2020 with gcc 4.8.5

Installed via CentOS rpms - Latest version

Crash Core Dump
https://pastebin.com/YtDVEmUE (core for error in free_contacts)
https://pastebin.com/g1f8zAs2 (core generated together with error on _unref_dlg)

Describe the traffic that generated the bug
Its a production server.
The crash happens randomly several times a day.
It only happens when it has high traffic and opensips load-all reaches >= 90%.
More specifically, the crash occurs when reaches 200 CPS (calls per second) or more
Whenever the crash happens it is in the same function: free_contacts
Every time the crash occurs, two coredump files of same size are generated.
One core dump have segfault on tm free_contacts function and the second core dump have segfault on dialogs _unref_dlg function.
I posted a link of bt full for each core dump.

Opensips have a lot of free SHM and a lot of free PKG Mem.

To Reproduce

It is not possible to reproduce because it only happens with high traffic.

Relevant System Logs

Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:handle_sigs: child process 26137 exited by a signal 11
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:handle_sigs: core was generated
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:handle_sigs: terminating due to SIGCHLD
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26134]: INFO:core:sig_usr: signal 15 received
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26132]: INFO:core:sig_usr: signal 15 received
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26131]: INFO:core:sig_usr: signal 15 received
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26133]: INFO:core:sig_usr: signal 15 received
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26159]: WARNING:core:handle_timer_job: utimer job <tm-utimer> has a 280000 us delay in execution
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 4(26134) [timer] terminated, still waiting for 41 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26144]: WARNING:core:handle_timer_job: utimer job <tm-utimer> has a 280000 us delay in execution
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 2(26132) [MI FIFO] terminated, still waiting for 40 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 1(26131) [HTTPD 127.0.0.1:8080] terminated, still waiting for 39 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 3(26133) [time_keeper] terminated, still waiting for 38 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 12(26142) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 37 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 26(26156) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 36 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 16(26146) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 35 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 13(26143) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 34 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 36(26166) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 33 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 6(26136) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 32 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 27(26157) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 31 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 28(26158) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 30 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 10(26140) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 29 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 17(26147) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 28 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 9(26139) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 27 more
Jul 24 15:33:54 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 32(26162) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 26 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 34(26164) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 25 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 18(26148) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 24 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 40(26170) [SIP receiver udp:177.21.244.136:5080] terminated, still waiting for 23 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26149]: ERROR:dialog:fix_route_dialog: Script error - validate function before having a dialog
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26149]: ERROR:topology_hiding:topo_dlg_onroute: Failed to fix the SIP request according to dialog info
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 30(26160) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 22 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 39(26169) [SIP receiver udp:177.21.244.136:5080] terminated, still waiting for 21 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 25(26155) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 20 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26149]: ERROR:dialog:fix_route_dialog: Script error - validate function before having a dialog
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26149]: ERROR:topology_hiding:topo_dlg_onroute: Failed to fix the SIP request according to dialog info
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 35(26165) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 19 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 38(26168) [SIP receiver udp:177.21.244.136:5080] terminated, still waiting for 18 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 41(26171) [SIP receiver udp:177.21.244.136:5080] terminated, still waiting for 17 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 31(26161) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 16 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 37(26167) [SIP receiver udp:177.21.244.136:5080] terminated, still waiting for 15 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 20(26150) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 14 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 15(26145) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 13 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 11(26141) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 12 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 43(26173) [Timer handler] terminated, still waiting for 11 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 23(26153) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 10 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 19(26149) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 9 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 8(26138) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 8 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 42(26172) [SIP receiver udp:177.21.244.136:5080] terminated, still waiting for 7 more
Jul 24 15:33:55 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 29(26159) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 6 more
Jul 24 15:33:56 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 14(26144) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 5 more
Jul 24 15:33:56 sipproxy /usr/sbin/opensips[26129]: INFO:core:shutdown_opensips: process 5(26135) [SIP receiver udp:177.21.244.136:5060] terminated, still waiting for 4 more
Jul 24 15:33:59 sipproxy /usr/sbin/opensips[26129]: INFO:core:cleanup: cleanup
Jul 24 15:33:59 sipproxy /usr/sbin/opensips[26129]: CRITICAL:core:sig_usr: segfault in attendant (starter) process!

OS/environment information

  • Operating System: CentOS 7
  • OpenSIPS installation: opensips-yum-releases
  • other relevant information:

Additional context

I use dialog + topology_hiding (force_dialog = 1)
In branch_route I use uac_replace_from and uac_replace_to
I use nathelper
I'm using rtpengine for some calls.
In branch_route i do nat_uac_test(1) and fix_nated_contact() before calling rtpngine_offer
For some calls, i'm using SST
Just proto_udp is loaded
Opensips udp_workers is 32

@Fernandojdk
Copy link
Author

If helps, this is output of dmesg:

[101724.308214] traps: opensips[8325] general protection ip:7f2b11444079 sp:7ffc9205d150 error:0 in libc-2.17.so[7f2b113f7000+1c3000]
[101735.792254] opensips[8294]: segfault at 8 ip 00007f2abb8d4802 sp 00007ffc9205e7e0 error 4 in dialog.so[7f2abb8a5000+7d000]
[108871.033118] opensips[25832]: segfault at 6a ip 00000000005362b8 sp 00007ffd6653b2d0 error 4 in opensips[400000+1fc000]
[109361.785576] opensips[12659]: segfault at 60 ip 00000000005362b8 sp 00007ffce7a77910 error 4 in opensips[400000+1fc000]
[109376.395498] opensips[12632]: segfault at 8 ip 00007f3337834802 sp 00007ffce7a77c60 error 4 in dialog.so[7f3337805000+7d000]
[113014.864189] opensips[16441]: segfault at 8 ip 00007f0af8f94802 sp 00007ffda2f6fbd0 error 4 in dialog.so[7f0af8f65000+7d000]
[113168.666018] opensips[7798]: segfault at 2d ip 00007f3f2d6c3079 sp 00007ffe07d171c0 error 4 in libc-2.17.so[7f3f2d676000+1c3000]
[113181.951784] opensips[7776]: segfault at 8 ip 00007f3ed7b53802 sp 00007ffe07d17d30 error 4 in dialog.so[7f3ed7b24000+7d000]
[114289.520314] traps: opensips[9156] general protection ip:5362b8 sp:7ffeadf63630 error:0 in opensips[400000+1fc000]
[114304.493978] opensips[9114]: segfault at 8 ip 00007f3654bff802 sp 00007ffeadf63980 error 4 in dialog.so[7f3654bd0000+7d000]
[114562.636829] traps: opensips[16966] general protection ip:5362b8 sp:7ffe2a4e62f0 error:0 in opensips[400000+1fc000]
[114577.568572] opensips[16955]: segfault at 8 ip 00007f214d6a0802 sp 00007ffe2a4e6640 error 4 in dialog.so[7f214d671000+7d000]
[115892.560088] opensips[18215]: segfault at 8 ip 00007f08e5a8d802 sp 00007ffee035f7d0 error 4 in dialog.so[7f08e5a5e000+7d000]
[116393.926393] traps: opensips[26137] general protection ip:5362b8 sp:7ffebafd10f0 error:0 in opensips[400000+1fc000]
[116412.988123] opensips[26129]: segfault at 8 ip 00007f773acde802 sp 00007ffebafd1440 error 4 in dialog.so[7f773acaf000+7d000]
[117852.971750] traps: opensips[28657] general protection ip:5362b8 sp:7fff6e4df780 error:0 in opensips[400000+1fc000]
[117869.847650] opensips[28627]: segfault at 8 ip 00007f696198e802 sp 00007fff6e4dfad0 error 4 in dialog.so[7f696195f000+7d000]
[118512.482171] traps: opensips[5905] general protection ip:4e1e78 sp:7ffef66711e0 error:0 in opensips[400000+1fc000]
[118532.311949] opensips[5884]: segfault at 8 ip 00007fa2cf982802 sp 00007ffef66724e0 error 4 in dialog.so[7fa2cf953000+7d000]

@Fernandojdk
Copy link
Author

Fernandojdk commented Jul 30, 2020

@bogdan-iancu Any instruction to deal with this issue? This is happening a lot at day, on different production environment and different hardware. I tried to look at source code to see who is trying to free a contact_t on the same time but with no success. If you give some direction what to look, i'll try to fix this and make a PR.

Thanks in advance.

@bogdan-iancu bogdan-iancu self-assigned this Jul 31, 2020
@bogdan-iancu bogdan-iancu added this to the 3.1.1 milestone Jul 31, 2020
@bogdan-iancu
Copy link
Member

This is similar to #2095 and I'm chasing a possible race between a 200 OK and Cancel, race that may lead to a corruption of the SIP request cloned into shm.

I will close this one as duplicate and continue on 2095 as older - please monitor that one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants