msg_translator.c:1539 -- segfault due to invalid offset #155

Closed
shimaore opened this Issue Jan 4, 2014 · 12 comments

Comments

Projects
None yet
3 participants
Contributor

shimaore commented Jan 4, 2014

(Using git HEAD) OpenSIPS crashes and reports segfault in libc.so due to invalid offset:

/* copy the rest of the message */
memcpy(new_buf+offset, buf+s_offset, len-s_offset);

with len = 614 and s_offset = 642.

Contributor

shimaore commented Jan 4, 2014

Another one with len = 617 anf s_offset = 662. Happens on BYE messages here. Trying to analyze with gdb.

Contributor

shimaore commented Jan 4, 2014

(gdb) up
#1  0x000000000043e344 in build_req_buf_from_sip_req (
    msg=msg@entry=0x7f6107e03660, 
    returned_len=returned_len@entry=0x7fff7e5ac434, 
    send_sock=send_sock@entry=0x7f6107dec340, proto=<optimized out>, 
    flags=flags@entry=1) at msg_translator.c:1539
1539    in msg_translator.c

(gdb) print len
$36 = 617
(gdb) print s_offset
$37 = 662
(gdb) print offset
$38 = 582

So s_offset is 45 bytes past len. I checked and len properly points to the last char (\0) in msg->buf.

Looking right before line 1539:

(gdb) print msg->first_line.u.request.uri.s-buf
$7 = 4

OK, that's a BYE, so 4.

(gdb) print msg->first_line.u.request.uri.len
$14 = 46

At that point s_offset should be 50, correct?

(gdb) print buf+50
$18 = 0x78e4b2 <buf.7798+50> " SIP/2.0\r\nFrom: <sip:(removed)

Hmmm, not sure if that's what's expected.

Now, looking into process_lumps:

(gdb) print msg->body_lumps
$1 = (struct lump *) 0x0

OK, can skip that one I assume? For add_rm:

(gdb) print *(msg->add_rm)
$2 = {type = HDR_VIA_T, op = LUMP_NOP, u = {offset = 286, subst = 286, 
    cond = 286, value = 0x11e <Address 0x11e out of bounds>}, len = 0, 
  before = 0x7f6107e04a30, after = 0x0, next = 0x7f6107e04530, 
  flags = LUMPFLAG_BRANCH}
(gdb) print *(msg->add_rm->before)
$3 = {type = HDR_VIA_T, op = LUMP_ADD, u = {offset = 132139392, 
    subst = 132139392, cond = 132139392, 
    value = 0x7f6107e04980 "Via: SIP/2.0/UDP (removed):5060;branch=z9hG4bKe234.19887526.0\r\n"},
  len = 66, before = 0x0, after = 0x0, 
  next = 0x0, flags = LUMPFLAG_BRANCH}

(gdb) print *(msg->add_rm->next)
$30 = {type = HDR_OTHER_T, op = LUMP_DEL, u = {offset = 530, 
    subst = 530, cond = 530, 
    value = 0x212 <Address 0x212 out of bounds>}, len = 66, 
  before = 0x0, after = 0x0, next = 0x7f6107e04700, 
  flags = LUMPFLAG_NONE}

(gdb) print *(msg->add_rm->next->next)
$31 = {type = HDR_ROUTE_T, op = LUMP_DEL, u = {offset = 530, 
    subst = 530, cond = 530, 
    value = 0x212 <Address 0x212 out of bounds>}, len = 66, 
  before = 0x0, after = 0x0, next = 0x0, flags = LUMPFLAG_NONE}
Contributor

shimaore commented Jan 4, 2014

Got many of them now, in all of them s_offset-len is always 45. Happens with BYE and ACK message processing.

Owner

bogdan-iancu commented Jan 6, 2014

Hi, many thanks on the detailed report. The easier way for us to troubleshoot this is by reproducing it. So, is is possible to get the simplest OpenSIPS script and the SIP message that may cause the crash ?
Thanks, Bogdan

bogdan-iancu was assigned Jan 6, 2014

Owner

bogdan-iancu commented Jan 15, 2014

@shimaore , any update on this ? can you help in reproducing the crash ?

Thanks, Bogdan

Contributor

shimaore commented Jan 17, 2014

@bogdan-iancu I need to have our testbed be put back together. Those traces were collected during an attempt at migrating a production machine & the script is a few hundred lines long so it's pretty hard to say without live testing.

Owner

bogdan-iancu commented Feb 5, 2014

@shimaore any progress with this ?

Contributor

shimaore commented Feb 5, 2014

@bogdan-iancu none yet, I haven't been able to reproduce in the lab (using sipp, and a replica of the production database); even trying with broken headers etc didn't lead to a crash. We're going to do another upgrade tomorrow morning on one of the production systems to try to capture data again (esp. carefully capturing with a sniffer) and see if we can replicate the issue.

Contributor

shimaore commented Feb 6, 2014

Got a first crash after 32 minutes (last time it only took a few seconds), syslog reports:

Feb  6 04:42:18 cs2 kernel: [ 5489.499019] opensips[28672]: segfault at 7c2000 ip 00007f26ca592b0f sp 00007fff0f704e28 error 4 in libc-2.17.so[7f26ca506000+1a3000]
Feb  6 04:42:18 cs2 opensips-complete[28753]: CRITICAL:core:receive_fd: EOF on 56
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 60!=72 for hash   1
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 3283!=3290 for hash   2
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 2957!=2960 for hash   3
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 5!=9 for hash   5
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 8!=57 for hash   6
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 1376!=1383 for hash   7
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 1141!=1142 for hash   8
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 1071!=1076 for hash   9
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 1!=3 for hash  84
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 1!=3 for hash  88
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 8!=9 for hash  89
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 1!=4 for hash  91
Feb  6 04:42:18 cs2 opensips-complete[28625]: CRITICAL:core:fm_status: different free frag. count: 3!=4 for hash  92
Feb  6 04:42:18 cs2 kernel: [ 5489.549215] opensips[28625] general protection ip:4ada50 sp:7fff0f707830 error:0 in opensips[400000+158000]

Obviously this might be a different issue than the one I originally reported, can't tell for now. I enabled coredumps, hopefully will have more data shortly.

Contributor

shimaore commented Feb 15, 2014

The coredump is similar to the ones we got last time (same line in the code, 45 bytes offset). I tried to reproduce the issue by replaying the packets (using a custom sipp script) for the Call-ID that crashed, with the same opensips script, but I can't get OpenSIPS to crash that way. Also can't crash if I play that same scenario at high call setup volume.

So the issue isn't related to a specific call's handling but must come from some buildup over time; during these last "live" tests I was able to run OpenSIPS with light traffic (early morning hours) for an hour or so before the first crash happened; then a second time 45 minutes without crash; once traffic started picking up the crashes were more pronounced (every 5 minutes or so) and we backed out.

Member

liviuchircu commented Apr 15, 2014

Hello,

A likely fix for this issue can be found in commit f9d9654.

Best regards,
Liviu

Member

liviuchircu commented May 7, 2014

Backported to 1.11 and 1.10.

liviuchircu closed this May 7, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment