Crash in pua_dialoginfo #70

Closed
nikbyte opened this issue Sep 6, 2013 · 3 comments
@nikbyte

nikbyte commented Sep 6, 2013

pua_dialoginfo crashes very quickly in dialog_publish.c at line 77 (1.9 branch).
I think all other branches are affected too, because the code (and the line number) is the same.
#0 0x00007f9b2463f122 in memcpy () from /lib64/libc.so.6

No symbol table info available.
#1 0x00007f9b19f089c9 in memcpy (state=0x7f9b19f0d7d3 "terminated",

entity=0x7fffa95f1900, peer=0x7fffa95f1690, callid=0x7f9b11d58b88,
initiator=1, localtag=0x0, remotetag=0x0) at /usr/include/bits/string3.h:52

No locals.
#2 build_dialoginfo (state=0x7f9b19f0d7d3 "terminated",

entity=0x7fffa95f1900, peer=0x7fffa95f1690, callid=0x7f9b11d58b88,
initiator=1, localtag=0x0, remotetag=0x0) at dialog_publish.c:77
    doc = 0x0
    root_node = 0x0
    dialog_node = 0x0
    state_node = 0x0
    remote_node = 0x0
    local_node = 0x0
    tag_node = 0x0
    id_node = 0x0
    body = 0x0
    buf = "ʿp$\260\375\377\377\377\377\377\377\377\377\377\377\003\000\000\000\000\000\000\000\350\320%\037\233\177\000\000`\372\236\033\233\177\000\000`\372\236\033\233\177\000\000`\372\236\033\233\177\000\000\230\037_\251\377\177\000\000\234\037_\251\377\177\000\000\340\060A\000\000\000\000\000\200\065\224$\000\000\000\000\300\321%\037\233\177\000\000\230\322%\037\233\177\000\000\000\000\000\000\000\000\000\000`\372\236\033\233\177\000\000\215\064A\000\000\000\000\000\000\000\000\000\260\375\377\377\000\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\200N\224$\233\177\000\000\000\000\000\000\000\000\000\000\003", '\000' <repeats 15 times>, "\004\000\000\000\000\000\000\000\240\311D\002", '\000' <repeats 12 times>, "P\023_\251\377\177\000\000\247\320g$\233\177\000\000\002\000\000\000\000\000\000\000n^I\000\000\000\000\000\000\000\000\000\377\177\000\000$\000\000\000\000\000\000\000\300\227B\037\233\177\000\000\003", '\000' <repeats 11 times>...
    __FUNCTION__ = "build_dialoginfo"
or
#0 0x00007f41cec45122 in memcpy () from /lib64/libc.so.6

No symbol table info available.
#1 0x00007f41c450e9c9 in memcpy (state=0x7f41c45137de "early",

entity=0x7fffa2243690, peer=0x7fffa2243420, callid=0x7f41bbf576f8,
initiator=1, localtag=0x0, remotetag=0x0) at /usr/include/bits/string3.h:52

No locals.
#2 build_dialoginfo (state=0x7f41c45137de "early", entity=0x7fffa2243690,

peer=0x7fffa2243420, callid=0x7f41bbf576f8, initiator=1, localtag=0x0,
remotetag=0x0) at dialog_publish.c:77
    doc = 0x0
    root_node = 0x0
    dialog_node = 0x0
    state_node = 0x0
    remote_node = 0x0
    local_node = 0x0
    tag_node = 0x0
    id_node = 0x0
    body = 0x0
    buf = "\200(\377Ű\375\377\377\377\377\377\377\377\377\377\377", '\000' <repeats 16 times>, " \000\000\000\000\000\000\000\315\033\000\000\001\000\000\000\002\000\000\000\377\177", '\000' <repeats 18 times>"\304, \324\307\316A\177\000\000\200\225\364\316", '\000' <repeats 28 times>, "H\005\000\000\377\177\000\000\221\032\321\316A\177\000\000\220\060$\242\377\177\000\000\000\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\200\256\364\316A\177", '\000' <repeats 34 times>"\240, Yt\002", '\000' <repeats 12 times>"\340, \060$\242\377\177\000\000\247\060\310\316A\177\000\000\002\000\000\000\000\000\000\000n^I\000\000\000\000\000a.137810$\000\000\000\000\000\000\000\230n\210\311A\177\000\000\003", '\000' <repeats 11 times>, "\004\000\000\000\000\000\000\000A\177\000\000\350\061$\242\377\177\000\000\000\061$\242\377\177\000\000\000\000\000\000A"...
    __FUNCTION__ = "build_dialoginfo"

because

(gdb) p entity->uri
$5 = {s = 0x0, len = 36}

or

(gdb) p entity->uri
$1 = {s = 0x0, len = 39}
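The gdb output above shows a counted string whose pointer is NULL while its length is still non-zero, and the backtrace ends in memcpy inside build_dialoginfo. A defensive check before the copy would turn the crash into an error return. The following is only a minimal sketch under assumptions: the `str` type is redeclared here so the example compiles on its own, and `copy_str_checked` is a hypothetical helper, not a function from the module.

```c
#include <stddef.h>
#include <string.h>

/* Counted string in the style of the OpenSIPS `str` type, redeclared
 * here so that the sketch is self-contained. */
typedef struct {
    char *s;
    int len;
} str;

/* Hypothetical helper (not part of pua_dialoginfo): copy src into buf
 * as a NUL-terminated string, but only if src looks sane and fits.
 * Returns the number of bytes copied, or -1 on any problem. */
static int copy_str_checked(char *buf, size_t buf_size, const str *src)
{
    /* Reject the state seen in the core file: s == NULL with len > 0. */
    if (src == NULL || src->s == NULL || src->len <= 0)
        return -1;

    /* Leave room for the terminating NUL byte. */
    if ((size_t)src->len >= buf_size)
        return -1;

    memcpy(buf, src->s, (size_t)src->len);
    buf[src->len] = '\0';
    return src->len;
}
```

A guard like this only hides the symptom. It does not explain why the pointer became NULL in the first place.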

ghost assigned bogdan-iancu Sep 6, 2013
@nikbyte

nikbyte commented Jan 30, 2014

Additional information. To reproduce the problem, it is enough to enable the pua and pua_dialoginfo modules and call dialoginfo_set("A") or dialoginfo_set("B") on all of our calls. We can reproduce it only on live traffic.

At a random moment, about once every 1-3 hours, a struct dlg_cell loses its pointer to the callid. In other words, callid.s == NULL. All other data is still there. More rarely, the same problem occurs with the name.s value in struct dlg_val. It happens in random places in the code! I inserted checkpoints in other functions to print the callid, and the callid disappears at a random moment within the same dialog. Usually it disappears right after the dialog is created, while processing the initial INVITE or the first or second reply to that INVITE (200, 486, 180, 100). One moment the callid is there and the next it is gone, although the functions of the current process do not touch the callid. It is being overwritten by another process.

My other idea was to check the timer routines in the pua module. The cleanup function was not called, but the dbupdate function had usually finished about 1 second before the callid was corrupted. After that we get crashes in the dialog or pua_dialoginfo modules.

If we disable the pua and pua_dialoginfo modules, opensips is stable and no variables are lost. The version is the latest 1.10 git (a36e379, Jan 15). I cannot find what overwrites the pointers in the dlg_cell and dlg_val structures.
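For illustration, a checkpoint of the kind described above could look roughly like this. It is only a sketch under assumptions: `checkpoint_str` and `str_is_consistent` are hypothetical names, the `str` type is redeclared so the example compiles on its own, and the real checkpoints presumably used the OpenSIPS logging macros rather than stderr.

```c
#include <stdio.h>

/* Counted string in the style of the OpenSIPS `str` type, redeclared
 * here so that the sketch is self-contained. */
typedef struct {
    char *s;
    int len;
} str;

/* A str is treated as consistent when its length is not negative and a
 * NULL pointer only ever goes together with a zero length. */
static int str_is_consistent(const str *v)
{
    return v != NULL && v->len >= 0 && (v->s != NULL || v->len == 0);
}

/* Hypothetical checkpoint: returns 1 if the field looks fine, otherwise
 * logs where the corruption was noticed and returns 0. */
static int checkpoint_str(const char *where, const char *field, const str *v)
{
    if (str_is_consistent(v))
        return 1;

    fprintf(stderr, "CHECKPOINT %s: %s is corrupted (s=%p len=%d)\n",
            where, field,
            v ? (void *)v->s : NULL,
            v ? v->len : -1);
    return 0;
}
```

Calling such a check at the entry and exit of each function that handles the dialog narrows down the time window in which the field changes. It cannot show which process did the write.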

@bogdan-iancu

@nikbyte, do you still have the core file (available for inspection)? Or can you reproduce the crash and get a new core? I need some more info from the core file to validate a theory of mine on how this crash happens.

Thanks, Bogdan

@bogdan-iancu

This bug was troubleshot extensively with @nikbyte, but there was no final conclusion on it. The "callid" field is overwritten during an overflow in the "dlg_cell" structure, but we did not manage to find the actual source (the code responsible for the overflow).
According to @nikbyte's tests, the problem seems to be fixed in the 1.11 code. Also, following his upgrade (from 1.10 to 1.11), we cannot troubleshoot this crash anymore. Moreover, 1.10 is reaching the end of its lifetime, so we decided not to pursue this bug any further and to close the ticket.
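One generic way to hunt for an overwrite like this is to place a guard word next to the field that gets corrupted and verify it at checkpoints. The sketch below only illustrates the technique. The `guarded_cell` struct is a hypothetical stand-in and NOT the real `dlg_cell` layout, and nothing here was part of the actual investigation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_MAGIC 0xDEADBEEFu

/* Counted string in the style of the OpenSIPS `str` type, redeclared
 * here so that the sketch is self-contained. */
typedef struct {
    char *s;
    int len;
} str;

/* Hypothetical stand-in for a shared structure; not the real dlg_cell. */
struct guarded_cell {
    char     payload[8]; /* a field that a buggy writer might overrun */
    uint32_t guard;      /* canary between the payload and callid */
    str      callid;     /* the field that was seen being overwritten */
};

/* Initialise the cell and arm the canary. */
static void guarded_cell_init(struct guarded_cell *c, char *callid, int len)
{
    memset(c, 0, sizeof *c);
    c->guard = GUARD_MAGIC;
    c->callid.s = callid;
    c->callid.len = len;
}

/* Returns 1 while the canary is untouched, 0 once something wrote over it. */
static int guarded_cell_intact(const struct guarded_cell *c)
{
    return c->guard == GUARD_MAGIC;
}
```

If the canary is found damaged while callid is still valid, the writer is running forward from the preceding field. If callid breaks while the canary stays intact, the write probably comes from somewhere else.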
