Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opensips crash on CANCEL on unanswerd call. (2.1-rc2) #484

Closed
avfedorov opened this issue Apr 29, 2015 · 12 comments
Closed

Opensips crash on CANCEL on unanswerd call. (2.1-rc2) #484

avfedorov opened this issue Apr 29, 2015 · 12 comments
Assignees
Labels
Milestone

Comments

@avfedorov
Copy link

(gdb) bt
#0  0x00007ffff765d625 in raise () from /lib64/libc.so.6
#1  0x00007ffff765ee05 in abort () from /lib64/libc.so.6
#2  0x00007ffff328f7a2 in shm_free (faked_req=<value optimized out>, t=<value optimized out>) at ../../evi/../mem/shm_mem.h:403
#3  free_faked_req (faked_req=<value optimized out>, t=<value optimized out>) at t_msgbuilder.h:260
#4  0x00007ffff3290b8f in run_failure_handlers (Trans=0x7fffef9d7328, new_code=<value optimized out>, branch=<value optimized out>, should_store=0x7fffffffe0c8, should_relay=0x7fffffffe0cc, cancel_bitmap=<value optimized out>,
    reply=0x7ffff749e2e0) at t_reply.c:586
#5  t_should_relay_response (Trans=0x7fffef9d7328, new_code=<value optimized out>, branch=<value optimized out>, should_store=0x7fffffffe0c8, should_relay=0x7fffffffe0cc, cancel_bitmap=<value optimized out>, reply=0x7ffff749e2e0)
    at t_reply.c:911
#6  0x00007ffff3291dc3 in relay_reply (t=0x7fffef9d7328, p_msg=0x7ffff749e2e0, branch=0, msg_status=487, cancel_bitmap=0x7fffffffe1a8) at t_reply.c:1125
#7  0x00007ffff329335f in reply_received (p_msg=0x7ffff749e2e0) at t_reply.c:1505
#8  0x0000000000432e22 in forward_reply (msg=0x7ffff749e2e0) at forward.c:516
#9  0x0000000000486c4d in receive_msg (
    buf=0x839660 "SIP/2.0 487 Request Terminated\r\nVia: SIP/2.0/UDP 192.168.254.93:5060;received=192.168.254.93;rport=5060;branch=z9hG4bKc9e3.2c674375.0\r\nVia: SIP/2.0/UDP 192.168.254.92:5060;rport=5060;received=192.168."...,
    len=<value optimized out>, rcv_info=<value optimized out>) at receive.c:243
#10 0x00000000005776bc in udp_read_req (si=<value optimized out>, bytes_read=<value optimized out>) at net/proto_udp/proto_udp.c:189
#11 0x000000000056c98c in handle_io (si=<value optimized out>) at net/net_udp.c:253
#12 io_wait_loop_epoll (si=<value optimized out>) at net/../io_wait_loop.h:186
#13 udp_rcv_loop (si=<value optimized out>) at net/net_udp.c:301
#14 0x000000000056da2e in udp_start_nofork () at net/net_udp.c:367
#15 0x000000000043bb7e in main_loop (argc=<value optimized out>, argv=<value optimized out>) at main.c:665
#16 main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1248
@avfedorov
Copy link
Author

In sip_msg_cloner() if updatable=0 reply_lump is placed inside the same chunk as new_msg,
if updatable>0 reply_lump is allocated.
But later in free_faked_req() code does not distinguish how reply_lump has been set up.
We hit into call shm_free(faked_req->reply_lump) with faked_req->reply_lump pointing inside whole faked_req.

@vladpaiu
Copy link
Member

Hello,

Can you privately send me the script you are using and describe how you are able to reproduce this ? ( email at vladpaiu@opensips.org if possible )

Best Regards,
Vlad

@bogdan-iancu bogdan-iancu added this to the 2.1 milestone May 4, 2015
@bogdan-iancu bogdan-iancu self-assigned this May 4, 2015
@bogdan-iancu
Copy link
Member

@avfedorov , any chances to grab your script for reproducing this crash ?

@avfedorov
Copy link
Author

I send all to Vlad.

@vladpaiu
Copy link
Member

vladpaiu commented May 7, 2015

Hello,

The shm_free checks whether the pointer is inside the SHM memory, and in your case it is entirely out of the memory block, so it looks like a memory corruption. Furthermore, sip_msg_cloner is called with updatable=0 only for replies, but in your case the crash is related to a request.

The source of the crash appears to be a memory corruption - follow the steps at http://www.opensips.org/Documentation/TroubleShooting-OutOfMem in order to enable memory debugging ( see the make menuconfig section ) and reproduce your scenario - OpenSIPS should crash and generate a new core file. Please post the new backtrace after doing this.

Best Regards,
Vlad

@avfedorov
Copy link
Author

I add some debug to t_msgbuilder.h:free_faked_req

inline static void free_faked_req(struct sip_msg *faked_req, struct cell *t)
{
        LM_DBG("free_faked_req 1 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->new_uri.s) {
                pkg_free(faked_req->new_uri.s);
                faked_req->new_uri.s = NULL;
        }
        LM_DBG("free_faked_req 2 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->dst_uri.s) {
                pkg_free(faked_req->dst_uri.s);
                faked_req->dst_uri.s = NULL;
        }
        LM_DBG("free_faked_req 3 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->path_vec.s) {
                pkg_free(faked_req->path_vec.s);
                faked_req->path_vec.s = NULL;
        }
        LM_DBG("free_faked_req 4 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->set_global_address.s) {
                pkg_free(faked_req->set_global_address.s);
                faked_req->set_global_address.s = NULL;
        }
        LM_DBG("free_faked_req 5 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->set_global_port.s) {
                pkg_free(faked_req->set_global_port.s);
                faked_req->set_global_port.s = NULL;
        }
        LM_DBG("free_faked_req 6 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);

        /* SDP in not cloned into SHM, so if we have one, it means the SDP
         * was parsed in the fake environment, so we have to free it */
        if (faked_req->sdp)
                free_sdp(&(faked_req->sdp));
        LM_DBG("free_faked_req 7 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);

        if (faked_req->multi) {
                free_multi_body(faked_req->multi);
                faked_req->multi = NULL;
        }
        LM_DBG("free_faked_req 8 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);

        if (faked_req->msg_cb) {
                msg_callback_process(faked_req, MSG_DESTROY, NULL);
        }
        LM_DBG("free_faked_req 9 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);

        /* free all types of lump that were added in failure handlers */
        del_notflaged_lumps( &(faked_req->add_rm), LUMPFLAG_SHMEM );
        LM_DBG("free_faked_req 10 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        del_notflaged_lumps( &(faked_req->body_lumps), LUMPFLAG_SHMEM );
        LM_DBG("free_faked_req 11 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        del_nonshm_lump_rpl( &(faked_req->reply_lump) );
        LM_DBG("free_faked_req 12 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->add_rm && faked_req->add_rm != t->uas.request->add_rm)
                shm_free(faked_req->add_rm);
        LM_DBG("free_faked_req 13 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->body_lumps && faked_req->body_lumps != t->uas.request->body_lumps)
                shm_free(faked_req->body_lumps);
        LM_DBG("free_faked_req 14 faked_req->req_lump=%p, t->uas.request->reply_lump=%p\n",faked_req->reply_lump,t->uas.request->reply_lump);
        if (faked_req->reply_lump && faked_req->reply_lump != t->uas.request->reply_lump)
                shm_free(faked_req->reply_lump);

        clean_msg_clone( faked_req, t->uas.request, t->uas.end_request);
}

And I get in log this:

May  7 14:05:46 [29045] DBG:tm:free_faked_req: free_faked_req 1 faked_req->req_lump=(nil), t->uas.request->reply_lump=(nil)
params(0x7f359e3a8010, 0x7f359e47da48), called from t_msgbuilder.h: free_faked_req(217)
freeing frag. 0x7f359e47da18 alloc'ed from t_msgbuilder.h: fake_req(148)
May  7 14:05:46 [29045] DBG:tm:free_faked_req: free_faked_req 2 faked_req->req_lump=(nil), t->uas.request->reply_lump=(nil)
May  7 14:05:46 [29045] DBG:tm:free_faked_req: free_faked_req 3 faked_req->req_lump=(nil), t->uas.request->reply_lump=(nil)
params(0x7f359e3a8010, 0x7f35969a77d0), called from t_msgbuilder.h: free_faked_req(227)
May  7 14:05:46 [29045] CRITICAL:core:qm_free: bad pointer 0x7f35969a77d0 (out of memory block!) - aborting
Aborted (core dumped)

#0  0x00007f359e5db625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 mysql-libs-5.1.73-3.el6_5.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64 openssl-1.0.1e-30.el6_6.7.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00007f359e5db625 in raise () from /lib64/libc.so.6
#1  0x00007f359e5dce05 in abort () from /lib64/libc.so.6
#2  0x00000000004e875b in qm_free (qm=<value optimized out>, p=0x7f35969a77d0, file=0x7f359a222304 "t_msgbuilder.h", func=<value optimized out>, line=<value optimized out>) at mem/q_malloc.c:459
#3  0x00007f359a215524 in free_faked_req (faked_req=0x7f359a4345c0, t=0x7f35969a41a8) at t_msgbuilder.h:227
#4  0x00007f359a21a08c in run_failure_handlers (Trans=0x7f35969a41a8, new_code=<value optimized out>, branch=<value optimized out>, should_store=0x7fff227582f8, should_relay=0x7fff227582fc, cancel_bitmap=<value optimized out>,
    reply=0x7f359e47ec18) at t_reply.c:587
#5  t_should_relay_response (Trans=0x7f35969a41a8, new_code=<value optimized out>, branch=<value optimized out>, should_store=0x7fff227582f8, should_relay=0x7fff227582fc, cancel_bitmap=<value optimized out>, reply=0x7f359e47ec18)
    at t_reply.c:912
#6  0x00007f359a21b15c in relay_reply (t=0x7f35969a41a8, p_msg=0x7f359e47ec18, branch=0, msg_status=487, cancel_bitmap=0x7fff227583d8) at t_reply.c:1126
#7  0x00007f359a21c6c0 in reply_received (p_msg=0x7f359e47ec18) at t_reply.c:1506
#8  0x0000000000430adc in forward_reply (msg=0x7f359e47ec18) at forward.c:516
#9  0x000000000047f646 in receive_msg (
    buf=0x82e680 "SIP/2.0 487 Request Terminated\r\nVia: SIP/2.0/UDP 192.168.254.93:5060;received=192.168.254.93;rport=5060;branch=z9hG4bK161b.ba138443.0\r\nVia: SIP/2.0/UDP 192.168.254.92:5060;rport=5060;received=192.168."...,
    len=<value optimized out>, rcv_info=<value optimized out>) at receive.c:243
#10 0x000000000056c74a in udp_read_req (si=<value optimized out>, bytes_read=<value optimized out>) at net/proto_udp/proto_udp.c:189
#11 0x000000000056222c in handle_io (si=<value optimized out>) at net/net_udp.c:253
#12 io_wait_loop_epoll (si=<value optimized out>) at net/../io_wait_loop.h:186
#13 udp_rcv_loop (si=<value optimized out>) at net/net_udp.c:301
#14 0x00000000005632ce in udp_start_nofork () at net/net_udp.c:367
#15 0x00000000004397ea in main_loop (argc=<value optimized out>, argv=<value optimized out>) at main.c:665
#16 main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1248

Now it is crashed on pkg_free(faked_req->path_vec.s)

Vlad, I send you log with memory debug.

@bogdan-iancu bogdan-iancu assigned vladpaiu and unassigned bogdan-iancu May 8, 2015
@petekelly
Copy link

I am also seeing this problem in a certain scenario, the output from opensips in logs is:

Jun 30 15:54:52 lab-pett-opensips2 scp1a[13568]: CRITICAL:tm:free_faked_req:

pkg_free() on shm ptr 0x7fd12f2aeb30 - aborting!

It seems you have hit a programming bug.

@petekelly
Copy link

Managed to get a core dump for this:

Core was generated by `/usr/local/opensips/sbin/opensips -P /var/run/opensips/opensips.pid -m 64 -M 64'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f86d7b2a107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f86d7b2a107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f86d7b2b4e8 in __GI_abort () at abort.c:89
#2  0x00007f86cfd35fde in free_faked_req (faked_req=faked_req@entry=0x7f86cffa8bc0 <faked_req>, t=t@entry=0x7f86c608f120) at t_msgbuilder.h:216
#3  0x00007f86cfd3d094 in run_failure_handlers (t=0x7f86c608f120) at t_reply.c:586
#4  t_should_relay_response (reply=0x7f86d3bdf430, cancel_bitmap=<optimized out>, should_relay=<synthetic pointer>, should_store=<synthetic pointer>,
    branch=<optimized out>, new_code=487, Trans=0x7f86c608f120) at t_reply.c:911
#5  relay_reply (t=0x7f86c608f120, p_msg=0x7f86d3bdf430, branch=<optimized out>, msg_status=487, cancel_bitmap=<optimized out>) at t_reply.c:1125
#6  0x00007f86cfd4101a in reply_received (p_msg=0x7f86d3bdf430) at t_reply.c:1505
#7  0x000000000047d615 in forward_reply (msg=msg@entry=0x7f86d3bdf430) at forward.c:516
#8  0x000000000045e37d in receive_msg (
    buf=0x860540 <buf> "SIP/2.0 487 Request Terminated\r\nVia: SIP/2.0/UDP 192.168.4.213:5060;rport=5060;received=192.168.4.213;branch=z9hG4bKa26b.8f7eed45.0\r\nVia: SIP/2.0/UDP 192.168.4.214:5060;rport=5060;received=192.168.4.2"..., len=<optimized out>, rcv_info=rcv_info@entry=0x7ffc8c72a8e0) at receive.c:243
#9  0x00000000005a13f2 in udp_read_req (si=<optimized out>, bytes_read=<optimized out>) at net/proto_udp/proto_udp.c:190
#10 0x000000000058b9de in handle_io (fm=<optimized out>, fm=<optimized out>, fm=<optimized out>, idx=<optimized out>, event_type=2) at net/net_udp.c:259
#11 io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at net/../io_wait_loop.h:190
#12 udp_rcv_loop (si=si@entry=0x7f86d3afd3f8) at net/net_udp.c:307
#13 0x000000000058d5bc in udp_start_processes (chd_rank=chd_rank@entry=0x84d22c <chd_rank>, startup_done=startup_done@entry=0x0) at net/net_udp.c:447
#14 0x000000000041aa23 in main_loop () at main.c:722
#15 main (argc=<optimized out>, argv=<optimized out>) at main.c:1253
(gdb)

@vladpaiu
Copy link
Member

vladpaiu commented Jul 1, 2015

Hello,

Thanks for the backtrace, Pete. Will coordinate with you today to login on the box where the crash occurs and further debug the core file.

Best Regards,
Vlad

@petekelly
Copy link

Some further info @vladpaiu - this seems very connected to failure route. If there is a failure route defined (even one that does only a simple xlog) then opensips will crash on failure response.

I feel sure it must be related to script somehow too as I have other 2.1 scripts which exhibit no crashing behaviour

@bogdan-iancu bogdan-iancu assigned bogdan-iancu and unassigned vladpaiu Aug 5, 2015
@bogdan-iancu
Copy link
Member

Hi @avfedorov , @petekelly - I'm taking over this issue.
If any of you could describe how to reproduce this crash (like what kind of call needs to be canceled, etc), it will be great to sort it out asap.

Thanks and regards,
Bogdan

bogdan-iancu added a commit that referenced this issue Aug 18, 2015
The path_vec (array holding the PATH string) was not properly pushed into the faked requests before calling the failure route. This was generating a mixture of shm/pkg memory when the faked request had to be freed.
This crash was trigger only if PATH support was used and requests were looked up with PATH header.

Reported by @avfedorov and @petekelly.
Closes #484

Many thanks to @avfedorov for his details information and to @petekelly for support in investigating and testing.
@bogdan-iancu
Copy link
Member

@avfedorov and @petekelly many thanks for the information and support you provided in regards to this bug.
It was tricky to reproduce and fix as this crash was trigger only if PATH support was used and requests were looked up with PATH header.

This fix will be part of the tomorrow minor release OpenSIPS 2.1.1 .

Thanks and regards,
Bogdan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants