[CRASH] Opensips 2.4.8 Segfault #2214
Please install the
Hi, below is the result of the "bt full" command:
Thank you
For more information, this problem is also present in 2.4.6 and 2.4.7.
Just in case: the following crash dump (from this morning) seems different from yesterday's (previous message).
Syslog:
Aug 19 11:12:30 Opensips kernel: [54876.070244] opensips[16248]: segfault at 0 ip 000055cb3437adb3 sp 00007ffff7832970 error 6 in opensips[55cb342cb000+162000]
bt full result:
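The kernel line above can be read before opening the core. The fault address is 0, and error code 6 on x86 sets the write and user-mode bits with the page-present bit clear, i.e. a user-space write through a NULL pointer. The instruction pointer minus the mapping base gives an offset that can be resolved against a binary with debug symbols, e.g. `addr2line -f -e /usr/sbin/opensips 0xafdb3`. The sketch below is a hypothetical helper (`decode_segfault` is not part of OpenSIPS). It assumes the x86 page-fault error-code layout and that the reported mapping starts at the binary's load base, which may not hold for PIE builds that place code in a separate segment (there, the segment's file offset has to be added).

```python
import re

# x86 page-fault error-code bits, printed in hex by the Linux kernel
# in "segfault at ... error N" messages.
PF_PROT = 0x1   # set: protection violation on a present page; clear: page not present
PF_WRITE = 0x2  # set: the faulting access was a write; clear: a read
PF_USER = 0x4   # set: the fault happened in user mode

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Split a kernel segfault line into fault address, access type and
    the instruction pointer's offset inside the reported mapping."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "offset": offset,
        "write": bool(err & PF_WRITE),
        "user_mode": bool(err & PF_USER),
        "page_present": bool(err & PF_PROT),
    }


line = ("opensips[16248]: segfault at 0 ip 000055cb3437adb3 sp 00007ffff7832970 "
        "error 6 in opensips[55cb342cb000+162000]")
info = decode_segfault(line)
print(info["object"], hex(info["offset"]), info["fault_addr"], info["write"], info["user_mode"])
# opensips 0xafdb3 0 True True
```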
Maybe I'm totally wrong, but the segfault seems to appear only on BYE requests. In my BYE handling I call loose_route(). Could this function cause a segfault (invalid memory access)? I will run more tests and give feedback as soon as possible. Anthony
For information, each crash happens on a BYE request. In my logs, I have this:
Below is the part of my config file where the crash happens. Any idea?
Hi, Anthony! Please run OpenSIPS with memory debugging; this will help us gather more information on the crash. You can follow this tutorial to compile OpenSIPS with
Best regards,
Hi Răzvan, I compiled OpenSIPS from source with QM_MALLOC and DBG_MALLOC. I will send you the logs when OpenSIPS crashes. Thanks again for your help.
@Anthony-76: LWP 16234 is the wrong process; that's the parent process. You really want the info from the actual child (LWP 16248) that died. If you're writing core files, make sure you have
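Picking the right process is easier if the crashed child's PID is taken from syslog first. In the report's own log excerpt, `segfault in process pid: 17102, id: 13` and `child process 17102 exited by a signal 11` name the child that died, while 17089 appears to be the main process shutting everything down afterwards. A minimal sketch, assuming the message wording seen in that 2.4.8 excerpt (other versions may phrase it differently); `crashed_children` is a hypothetical helper, not an OpenSIPS tool.

```python
import re

# Patterns copied from the OpenSIPS 2.4.8 syslog excerpt in this report;
# other versions may word these messages differently.
CRASH_PATTERNS = [
    re.compile(r"segfault in process pid: (?P<pid>\d+), id: (?P<child_id>\d+)"),
    re.compile(r"child process (?P<pid>\d+) exited by a signal (?P<signal>\d+)"),
]


def crashed_children(log_lines):
    """Map each crashed PID to the details (child id, signal) the log reports."""
    crashed = {}
    for line in log_lines:
        for pattern in CRASH_PATTERNS:
            m = pattern.search(line)
            if m:
                details = crashed.setdefault(int(m["pid"]), {})
                for key, value in m.groupdict().items():
                    if key != "pid":
                        details[key] = int(value)
    return crashed


log = [
    "Aug 18 09:26:23 Opensips /usr/sbin/opensips[17102]: CRITICAL:core:sig_usr: segfault in process pid: 17102, id: 13",
    "Aug 18 09:26:24 Opensips /usr/sbin/opensips[17089]: INFO:core:handle_sigs: child process 17102 exited by a signal 11",
    "Aug 18 09:26:24 Opensips /usr/sbin/opensips[17092]: INFO:core:sig_usr: signal 15 received",
]
print(crashed_children(log))  # {17102: {'child_id': 13, 'signal': 11}}
```

The core file written for that PID is the one to load, e.g. `gdb /usr/sbin/opensips <core file>` followed by `bt full`.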
Hi Răzvan, is this information enough for you to debug this issue?
At shutdown:
At run time:
If you want more information, let me know.
Best regards
Hi, I had a new crash this morning:
Memory logs at crash:
Core dump:
As with every crash, it happened on a BYE request. If you want any tests/details, let me know.
Best regards
Can you tell us what sort of operations you are doing on BYE messages? I am particularly interested in operations that add/remove certain headers, and the order of those operations.
Below is the BYE part of opensips.cfg:
.... route[raccroche] {
exit;
If you want the whole opensips.cfg, let me know how I can send you the full configuration file.
Best regards
Coming late to this one, and I only have about 30 minutes of runtime to work with: we upgraded a couple of heavily used proxies to release 2.4.8 last night and had to roll back because they were core dumping every 10 to 20 minutes. The process stacks leading up to the crashes varied, from timer threads and response building to one that appeared to happen while processing the initial INVITE. What they all had in common was where they crashed. This snippet is from a dispatcher OPTIONS ping being generated, but fm_malloc followed by fm_remove_free was consistently where the crashes occurred.
Core was generated by `/usr/local/opensips/sbin/opensips -P /usr/local/opensips/var/run/opensips/opens'.
From which version did you upgrade, and to which version did you revert? As for it crashing in malloc/free: that's likely because the memory got corrupted earlier. The memory management relies heavily on pointers, and if they are messed up (pointing to the wrong thing), you'll get a crash. What @razvancrainea is after (correct me if I'm wrong) is unusual functions being called, or functions being called in an unusual order, since the corruption most likely derives from those. So yes, more of your cfg helps (I'm sure you can sanitize it). And if you have a working version, we can check which code changed between that version and 2.4.8, reducing the search scope. Building a non-optimized OpenSIPS also helps (I don't know off-hand which Make flags, but you want gcc to end up with
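The point that a crash inside malloc/free usually means the heap was corrupted earlier can be shown with a toy allocator. This is a hypothetical Python model, not OpenSIPS' f_malloc or q_malloc: a fast allocator that trusts its own free list accepts a double free without complaint, and the damage only shows up later, when two unrelated callers are handed the same block. A checked variant that tracks live blocks fails at the faulty free instead, which is roughly why a memory-debugging build makes such bugs easier to locate.

```python
class ToyAllocator:
    """Fixed-size block allocator with a LIFO free list and no safety checks.

    Purely illustrative: it is NOT how OpenSIPS' allocators work; it only
    shows why a bug can surface far away from the code that caused it.
    """

    def __init__(self, nblocks: int) -> None:
        self.free_list = list(range(nblocks))  # indexes of free blocks

    def malloc(self) -> int:
        if not self.free_list:
            raise MemoryError("out of blocks")
        return self.free_list.pop()

    def free(self, block: int) -> None:
        # No check that `block` is currently allocated, so a double free
        # silently puts the same block on the free list twice.
        self.free_list.append(block)


class CheckedAllocator(ToyAllocator):
    """Same allocator, but it remembers live blocks and fails at the
    faulty free instead of corrupting its bookkeeping."""

    def __init__(self, nblocks: int) -> None:
        super().__init__(nblocks)
        self.live = set()

    def malloc(self) -> int:
        block = super().malloc()
        self.live.add(block)
        return block

    def free(self, block: int) -> None:
        if block not in self.live:
            raise RuntimeError(f"free of block {block} that is not allocated")
        self.live.remove(block)
        super().free(block)


alloc = ToyAllocator(4)
a = alloc.malloc()
alloc.free(a)
alloc.free(a)       # the real bug: a double free, yet nothing fails here
b = alloc.malloc()  # later, possibly in unrelated code ...
c = alloc.malloc()  # ... two owners now share one block
print(a, b, c, b == c)  # 3 3 3 True
```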
We upgraded from opensips-2.4.6-12112019.tar.gz (nightly build from December 11, 2019) and reverted to the same build. I'll provide more information if I am able to reproduce this in our lab, but for some reason that seems to be difficult. It was crashing so often that I had to remove core files while investigating just to keep from running out of disk space; in the lab, not a sign of trouble. Anyway, the one thing that might be even a little unusual in our call flow is that we make rest_client HTTP requests on a lot of calls.
We've been able to reproduce the crashes in the lab now and are starting to gather information. The first one was odd in that it happened in a thread where I was not expecting anything to be happening.
[root@bw7.lab1 opensips-2.4.8]# opensipsctl ps
So the crash happened while processing an internally generated timeout in PID 15163. When internally generated messages need to be created and sent, is that task handed off to whichever thread is most idle, or something like that? I have a core file I'm going to look at in a moment, but here is what I got from the logs so far:
2020-09-10T19:52:05.993414+00:00 bw7 /usr/local/opensips/sbin/opensips[15163]: shm_malloc (322), called from mem/shm_mem.c: sh_realloc(190)
[root@bw7.lab1 opensips-2.4.8]# grep 0x7fdc76c43ac8 /home/rrevels/opensips.log
In the log it looks like one thread releases some shared memory and then the above-mentioned thread attempts to release that same section of memory. Here is where it gets freed, followed by the first couple of lines of the thread that crashes:
2020-09-10T19:52:05.974110+00:00 bw7 /usr/local/opensips/sbin/opensips[15153]: pkg_malloc(72), returns address 0x7fdcb5c657c8 frag. 0x7fdcb5c65798 (size=72) on 1 -th hit
And here is the line that caused the core dump:
#0  0x000000000050a3eb in free_contacts (_c=0x7fdcb5c657c8) at parser/contact/contact.c:296
(gdb) p *_c
So, yeah, it wasn't happy with that pointer.
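The `grep` on a single address above can be turned into a per-address timeline that also flags two frees with no allocation in between. A minimal sketch, assuming the ISO-timestamp syslog layout shown in these excerpts. The free-message wording in the example is hypothetical, since no free line survives in the thread, and `address_history`/`double_free_candidates` are illustrative names only.

```python
import re
from dataclasses import dataclass

# Assumed line layout, taken from the log excerpts in this thread:
#   <ISO-8601 timestamp> <host> <path>/opensips[<pid>]: <operation>(...) ...
LINE_RE = re.compile(r"^(?P<ts>\S+) \S+ \S*opensips\[(?P<pid>\d+)\]:\s*(?P<op>\w+)")


@dataclass
class MemEvent:
    ts: str
    pid: int
    op: str


def address_history(lines, address):
    """Return the memory-debug events that mention `address`, in log order."""
    # Match the address as a whole token so 0x...7c8 does not match 0x...7c80.
    addr_re = re.compile(re.escape(address.lower()) + r"(?![0-9a-f])")
    events = []
    for line in lines:
        if addr_re.search(line.lower()):
            m = LINE_RE.match(line)
            if m:
                events.append(MemEvent(m["ts"], int(m["pid"]), m["op"]))
    return events


def double_free_candidates(events):
    """Return frees that follow an earlier free of the same address with no
    allocation in between, i.e. the pattern of a double free."""
    suspicious, freed = [], False
    for ev in events:
        if "free" in ev.op:
            if freed:
                suspicious.append(ev)
            freed = True
        elif "malloc" in ev.op:
            freed = False
    return suspicious


log = [
    "2020-09-10T19:52:05.974110+00:00 bw7 /usr/local/opensips/sbin/opensips[15153]: "
    "pkg_malloc(72), returns address 0x7fdcb5c657c8 frag. 0x7fdcb5c65798 (size=72) on 1 -th hit",
    # The two free lines below are hypothetical; real DBG_MALLOC free messages may differ.
    "2020-09-10T19:52:05.980000+00:00 bw7 /usr/local/opensips/sbin/opensips[15153]: pkg_free(0x7fdcb5c657c8)",
    "2020-09-10T19:52:05.990000+00:00 bw7 /usr/local/opensips/sbin/opensips[15163]: pkg_free(0x7fdcb5c657c8)",
]
history = address_history(log, "0x7fdcb5c657c8")
for ev in history:
    print(ev.ts, ev.pid, ev.op)
print([ev.pid for ev in double_free_candidates(history)])  # [15163]
```

Because each OpenSIPS process has its own private (pkg) heap, the same pkg address appearing under two PIDs can be a coincidence, or, as suspected in this thread, a pointer created in one process being used in another; the timeline only narrows down what to look at.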
It occurred to me that the call flow might be of interest for this particular one. A re-INVITE happened, and then a BYE came in before the re-INVITE could be processed end-to-end, so OpenSIPS had to do some clean-up for it. 192.168.102.99 is the OpenSIPS proxy.
10257 2020-09-10 19:50:04.773434525 192.168.102.9 -> 192.168.102.99 SIP/SDP 963 Request: INVITE sip:+12082447755@192.168.102.99:5060
Good that you're getting somewhere. I did browse some differences between 1e891b5 (2.4.8) and a5f9815 (2019-12-11), but there are quite a few changes to search through. (And we don't know which modules you're using, so we cannot filter on that.) You mentioned rest_client. There is a change there with respect to a new
@rrevels-bw thanks for the detailed info, but is there any chance you could post the full backtrace? The contact does indeed seem to be corrupted, most likely because it was parsed by a different process; what I am trying to see is who exactly is parsing the contact, and why.
@razvancrainea sure thing. This one is fairly small, so I'll just paste it here. Let me know if you also want the log file with the memory debug output in it and I'll put it on pastebin or something.
@wdoekes we are not getting 100s (100 Continue) in this particular call flow, but we have had to deal with them in the past, so we account for them in the config by manually adding an Expect header. However, if this is now handled in the module, I will try changing the parameter and see what happens. Thank you.
I should also note that one reason we are updating from our current 2.4.6 is that it intermittently (once every 2 to 3 weeks) core dumps in next_branches after a negative response. We have never been able to reproduce that one in the lab, so we tried 2.4.7, got more frequent crashes, and jumped to 2.4.8. At least we can reproduce this problem in the lab. Here are the modules we load:
loadmodule "signaling.so"
@rrevels-bw that
So, I removed these 3 lines from the BYE handling, and I have had no crashes for a week. Before, I had 1 or 2 crashes per day. These lines:
t_on_branch("2");
I've pulled and compiled the latest build. I'll let you know what happens. |
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details. |
Hi,
OpenSIPS 2.4.8 installed on Debian 10 via apt (running on Proxmox, 12 CPUs and 8 GB of RAM)
Crash Core Dump
There are 2 dump files for the crash:
https://drive.google.com/drive/folders/12xd9HcNenaZhqHGSrr2q6ZkSIK4zXg_A?usp=sharing
The result of bt full is:
There are about 50 concurrent calls. The crash is random and I can't reproduce it.
OpenSIPS crashes 1 to 2 times per day.
Relevant System Logs
Aug 18 09:26:23 Opensips /usr/sbin/opensips[17102]: DELETE DANS bdd 3 (id,account) VALUES(7fb4fde52282a7b416934b407a08a641@xxxxx,sip:0xxxxxxx@xxxxxx)
Aug 18 09:26:23 Opensips /usr/sbin/opensips[17102]: CRITICAL:core:sig_usr: segfault in process pid: 17102, id: 13
Aug 18 09:26:23 Opensips /usr/sbin/opensips[17103]: ERROR:nathelper:fix_nated_sdp_f: Unable to get bodies from message
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17125]: CRITICAL:core:handle_worker: dead child 13 (EOF received), pid 17102
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17089]: INFO:core:handle_sigs: child process 17102 exited by a signal 11
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17089]: INFO:core:handle_sigs: core was generated
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17089]: INFO:core:handle_sigs: terminating due to SIGCHLD
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17092]: INFO:core:sig_usr: signal 15 received
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17090]: INFO:core:sig_usr: signal 15 received
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17093]: INFO:core:sig_usr: signal 15 received
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips /usr/sbin/opensips[17091]: INFO:core:sig_usr: signal 15 received
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Aug 18 09:26:24 Opensips media-dispatcher[976]: DEBUG [main] Connection to OpenSIPS lost: Connection was closed cleanly.
Can you help me with this crash, please?
Thank you in advance
Best regards
Anthony