OpenSIPS 1.8.4 can not be started when there are dialogs in the dialogs table #197

Closed
dsandras opened this issue Apr 10, 2014 · 16 comments

@dsandras
Contributor

This did not happen with 1.8.3.

Reproducing the bug is easy:

  1. Start a call
  2. killall -9 opensips
  3. Start opensips again: it never starts

@vladpaiu
Member

Hello,

I've just tried to replicate this - all went fine for me.
We'll need some more information in order to replicate this:

  • What is your dialog db_mode?
  • Please send us (via gist or pastebin) the OpenSIPS error log from when it fails to start.

Best Regards,
Vlad

@dsandras
Contributor Author

I'll have a look right now for the logs.

But:
modparam("dialog", "db_mode", 1) # REALTIME

@dsandras
Contributor Author

You can find the log here:
http://ekiga.net/misc/callserver.log

There is not much of interest in it. I'm surprised you cannot reproduce it, because it is 100% reproducible here.

@dsandras
Contributor Author

Vlad,

Could you reproduce it ?
It is 100% repeatable here. Make a call (we have 2 legs, so it means 2 calls), then kill opensips, and restart it.

Something different between your test and mine is that we are using profiles (set_dlg_profile).

That might be the culprit. I seem to remember a few commits on opensips dealing with profiles loading on restart (but I'm not sure).

@vladpaiu
Member

Hello,

Still cannot reproduce it ( tested with multiple legs and profiles ). Are you using OpenSIPS 1.8 from git, or from the 1.8.4 tar on the website ?

Would it be possible to give me access to your server so I can do further debugging there ?
If yes, please provide the access info at vladpaiu@opensips.org

Best Regards,
Vlad

@dsandras
Contributor Author

Hi,

Would OpenVPN access be OK?

@dsandras
Contributor Author

Btw, it is 1.8.4 from the tarball. We have a few patches (all available as pull requests), but they only touch the presence modules.

@vladpaiu
Member

Hello,

After further talks on IRC, the crash is related to using dialog profiles with cachedb_* persistence.
Bug confirmed, working on a fix.

Best Regards,
Vlad

@vladpaiu vladpaiu added the bug label Apr 16, 2014
@vladpaiu vladpaiu added this to the 1.8 milestone Apr 16, 2014
@vladpaiu vladpaiu self-assigned this Apr 16, 2014
@dsandras
Contributor Author

As discussed on IRC, and for the record, the crash is the following:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f1fb645ed41 in db_mysql_raw_query (_h=0x0, _s=0x7f1fb55356c0, _r=0x7fff6994c760) at dbase.c:1004
1004    dbase.c: No such file or directory.
(gdb) bt
#0  0x00007f1fb645ed41 in db_mysql_raw_query (_h=0x0, _s=0x7f1fb55356c0, _r=0x7fff6994c760) at dbase.c:1004
#1  0x00007f1fb53331e0 in dbcache_add (con=0x7f1fb66f20a0, attr=0x7f1fb1d37140, val=1, expires=1397723480, new_val=0x0) at cachedb_db.c:327
#2  0x00007f1fb1b19db5 in link_dlg_profile (linker=0x7f1fa096a188, dlg=0x7f1fa09692a8) at dlg_profile.c:692
#3  0x00007f1fb1b1a194 in set_dlg_profile (msg=0x0, value=0x7fff6994c890, profile=0x7f1fa09470f8) at dlg_profile.c:751
#4  0x00007f1fb1afbcd9 in read_dialog_profiles (b=0x16b2aa6 "calls/s#1,751|", l=14, dlg=0x7f1fa09692a8, double_check=0) at dlg_db_handler.c:444
#5  0x00007f1fb1afca86 in load_dialog_info_from_db (dlg_hash_size=4096) at dlg_db_handler.c:581
#6  0x00007f1fb1afabf6 in init_dlg_db (db_url=0x7f1fb1d36ef0, dlg_hash_size=4096, db_update_period=60) at dlg_db_handler.c:192
#7  0x00007f1fb1af6d2f in mod_init () at dialog.c:765
#8  0x0000000000475986 in init_mod (m=0x7f1fb6682d20) at sr_module.c:458
#9  0x00000000004758c2 in init_mod (m=0x7f1fb66830c8) at sr_module.c:453
#10 0x00000000004758c2 in init_mod (m=0x7f1fb6683198) at sr_module.c:453
#11 0x00000000004758c2 in init_mod (m=0x7f1fb6683268) at sr_module.c:453
#12 0x00000000004758c2 in init_mod (m=0x7f1fb6683338) at sr_module.c:453
#13 0x00000000004758c2 in init_mod (m=0x7f1fb6683408) at sr_module.c:453
#14 0x00000000004758c2 in init_mod (m=0x7f1fb66834d8) at sr_module.c:453
#15 0x00000000004758c2 in init_mod (m=0x7f1fb66835a8) at sr_module.c:453
#16 0x0000000000475cb0 in init_modules () at sr_module.c:498
#17 0x000000000042e522 in main (argc=11, argv=0x7fff6994ce68) at main.c:1503

dsandras added a commit to dsandras/opensips that referenced this issue Apr 16, 2014
This is due to the dialog module mod_init method using the cachedb
module methods before child_init has been called.

In that case, the SQL handle is still set to NULL.

Fixes issue OpenSIPS#197.
@dsandras
Contributor Author

Vlad,

I have pushed a pull request for master.

@dsandras
Contributor Author

Hi Vlad,

Do you have a patch to test ?

Indeed, my "workaround" is unsafe.

Thank you,


@vladpaiu
Member

vladpaiu commented May 5, 2014

Hello,

I have fixed the bug in head, 1.11 and 1.10 - see commit cdf725f

For 1.11 and head, I also fixed the DB schema to make keyname a primary key: 705bba1

Since the cachedb_sql module was only added upstream in OpenSIPS 1.9, the fix is not in the 1.8 branch; you'll have to patch your 1.8 sources with the above commits.

Best Regards,
Vlad

@vladpaiu vladpaiu closed this as completed May 5, 2014
@dsandras
Contributor Author

dsandras commented May 5, 2014

Hi,

What is weird is that you commented out instructions that were not present in the original cachedb_sql implementation we provided to you.

In other words, the 1.8 backtrace I sent you corresponds to a crash in code that does not contain these two lines:

cdb_dbf.close(cdb_db_handle);
cdb_db_handle = 0;

Are you sure the correct bug is fixed ?

@dsandras
Contributor Author

dsandras commented May 5, 2014

What I do not understand either is that Bogdan closed my pull request, which moved cdb_db_handle = cdb_dbf.init(&db_url) from child_init to mod_init, saying it was not correct because it would use a global connection. Your fix does the same thing, except that it does not remove the initialisation from child_init.

Does that mean we start the module with a global connection, then move to a "per process" connection as soon as possible? Is that intended?

@vladpaiu
Copy link
Member

vladpaiu commented May 5, 2014

Hello,

In my tests, the crash no longer occurs with the provided commit.

The underlying issue is that, indeed, the cachedb_sql module does not properly implement the cachedb interface: it manages its own global connection to the back-end instead of letting the interface do it. This is a side effect of the module having been implemented following the cachedb_local template, where there is no actual connection.

For a full example of the way the interface was meant to work, please take a look at cachedb_redis. The init function from the cachedb interface should actually create the connection to the back-end, instead of the connection being created in the mod_init function of the back-end module. Later on, that connection can be retrieved when running a command by accessing the cachedb_con->data pointer.

Best Regards,
Vlad

@dsandras
Contributor Author

dsandras commented May 5, 2014

Hi,

Understood. I hope it is a safe fix :)
It seems to work here too...
