[CRASH] free_contacts and _unref_dlg segfault #2184
Comments
If it helps, this is the output of dmesg:
@bogdan-iancu Any instructions for dealing with this issue? It happens many times a day, on different production environments and different hardware. I looked through the source code to find what is trying to free a contact_t at the same time, but with no success. If you can give me some direction on where to look, I'll try to fix this and make a PR. Thanks in advance.
This is similar to #2095, and I'm chasing a possible race between a 200 OK and a CANCEL, a race that may lead to corruption of the SIP request cloned into shm. I will close this one as a duplicate and continue on #2095, as it is the older one. Please monitor that one.
OpenSIPS version you are running
Installed via CentOS RPMs (latest version)
Crash Core Dump
https://pastebin.com/YtDVEmUE (core for error in free_contacts)
https://pastebin.com/g1f8zAs2 (core generated together with error on _unref_dlg)
Describe the traffic that generated the bug
It's a production server.
The crash happens randomly, several times a day.
It only happens under high traffic, when the OpenSIPS load-all value reaches >= 90%.
More specifically, the crash occurs at 200 CPS (calls per second) or more.
Whenever the crash happens, it is in the same function: free_contacts.
Every time the crash occurs, two core dump files of the same size are generated.
One core dump has a segfault in the tm free_contacts function, and the second has a segfault in the dialog module's _unref_dlg function.
I posted a link to the "bt full" output for each core dump.
OpenSIPS has plenty of free SHM and PKG memory.
To Reproduce
It is not possible to reproduce because it only happens with high traffic.
Relevant System Logs
OS/environment information
Additional context
I use dialog + topology_hiding (force_dialog = 1)
In branch_route I use uac_replace_from and uac_replace_to
I use nathelper
I'm using rtpengine for some calls.
In branch_route I call nat_uac_test(1) and fix_nated_contact() before calling rtpengine_offer
For some calls, I'm using SST
Only proto_udp is loaded
OpenSIPS udp_workers is set to 32
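For anyone trying to build a comparable test setup, the points above could look roughly like the following opensips.cfg fragment. This is only a sketch under assumptions, not the reporter's actual script: the route name, the replacement URIs, and the if-wrapping of nat_uac_test(1) are hypothetical, and sockets, module dependencies, and the rtpengine and SST parameters are left out.

```
# Sketch only: names and URIs below are placeholders, not taken from the report.
udp_workers=32

loadmodule "proto_udp.so"
loadmodule "tm.so"
loadmodule "dialog.so"
loadmodule "topology_hiding.so"
loadmodule "uac.so"
loadmodule "nathelper.so"
loadmodule "rtpengine.so"
loadmodule "sst.so"

modparam("topology_hiding", "force_dialog", 1)

branch_route[outbound] {    # hypothetical route name
    # placeholder URIs (assumption)
    uac_replace_from("", "sip:caller@example.invalid");
    uac_replace_to("", "sip:callee@example.invalid");

    # assumed pairing of the NAT test with the contact fix
    if (nat_uac_test(1))
        fix_nated_contact();

    # the report says rtpengine is used for some calls only;
    # the condition selecting those calls is not given
    rtpengine_offer();
}
```

The fragment only mirrors the module mix and call order described in the list; it makes no claim about which of these pieces is involved in the crash.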