-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Segfault in free_contacts() under load in 2.4.7 #2095
Comments
I would really appreciate a response to this bug report. If more details are required to troubleshoot this I will provide whatever I can. I am also happy to work on a fix for this myself but I'd like to hear back first to know whether perhaps this has already been fixed since there was a similar report to mine. |
It's worth noting that since reverting back to 2.3 (2.3.6 to be exact) we have not encountered this segfault again. |
hey @attermann -- we're pretty busy these days, getting the 3.1 release ready. Expect this issue to get more attention once we hit the "feature freeze" point, within less than a couple of weeks. Meanwhile, if you can reproduce this crash in a lab, I highly recommend you use the
Cheers! |
Also have this problem and can't reproduce it intentionally.
Updated to 2.4.7 and have these crashes.
|
@attermann do you use dialog, rtpengine, rest_client modules? I've briefly checked what changed and seen some contact related changes in dialog module. It's just an assumption and I have no proofs. I've rebuilt opensips with q_malloc and memdbg and sometimes it notifies about overwritten memory chunks. |
@vitalikvoip Yes I do use the dialog module but only on a small percentage of total calls. I do have one core that crashed in dialog though. Backtrace below for reference.
|
@vitalikvoip and @attermann : I did some stress tests on Bottom line: please provide either your Cheers! |
@liviuchircu Unfortunately I haven't been able to reproduce outside of production so I'm going to have to send my whole config. I trimmed a lot of commented sections and anonymized so hopefully I didn't break it. Note that most of the "external" connections are TCP, and "internal" are UDP. opensips.cfg:
local.cfg
|
I have a feeling that I'm really close to figure it out. Have logs and coredump when contact_body_t allocated in PKG mem in one worker and then freed in another what crashed it. It has crashed because I have mem debug and q_malloc detected broken memory block. Without memory debug I suppose it may either crash or corrupt PKG memory. To make it clear want to emphasize that in my case crash happens only when UAC sends reINVITE, received 100 Trying, and sends CANCEL immediately. While this "CANCEL" travels across the network, opensips has enough time to forward reINVITE and than it receives CANCEL from UAC and 200 OK for reINVITE from UAS a few moments later.
|
The patch attached fixed it for me. At least the problem I was able to reproduce with SIPP.
|
Unfortunately, it looks like that was not the only crash related to dialog module in 2.4.7 Got a new crash:
|
@vitalikvoip Curious did your patch address the crash in free_contacts() during transaction clean-up, or only the dialog-related crash? |
My patch addresses crash I had in dialog module when had described call flow with CANCEL from caller and 200 OK from callee. |
It's been working without any issues for 5 days. @attermann if your opensips still crashes and you see in coredumps it happens around free_contacts() I think it worth trying my patch. |
I suspect that there is a TMCB_RESPONSE_OUT (type 128) TM callback (registered by other module) that is parsing the Contact hdr of the request that was cloned in SHM memory. |
After applying the patch I've posted above I had no more crashes. So maybe in my case it was only due to CANCEL handling since CANCEL is not end-2-end request and dlg update functions dealt with the cloned sip msg. |
You suggest that the concurrent access+parsing+free over the request stored in shmem happens between (1) the 200 OK reply and (2) the CANCEL ? And your patched eliminated this concurrency by not doing any parsing for the 200 OK ? |
The problem was not in concurrency. Back then I couldn't understand why it crashes and the described case was the only one I knew to reproduce the crash. |
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days. |
Cause I have a lot of environment running the latest 3.1 from rpm packages, I can take some coredump info to provide more info if needed. Let me know if this will help. |
I have it running almost a month without any issues while had at least one crash day before applied the patch. |
I’ll try your patch to see if it will fix for me too. |
You've mentioned you are using opensips 3.1. I've tested my patch only with 2.4.7 and don't if this part is still the same in 3.1. Anyhow give it a chance and I can take a look at your coredumps closely when have a spare moment. |
I’m able to locate the block of code on 3.1 where i need to apply your patch. Today I’ll try to see if it fix for me on the latest 3.1. |
@Fernandojdk , did you have a chance to check it with 3.1? |
Hello @vitalikvoip I applied your patch on opensips 3.1, but the bug continued when we are under high load. For 3.1 this patch did not work. |
Hey guys, @razvancrainea came up with the following patch for
LE: I have edited the above patch, do not use the version included in the email! :) |
Hello guys, I've changed the code posted by @liviuchircu and ported to 3.1 to see if it will fix this issue. On the next week i can tell if this fix works. |
This avoids concurrent access to the contact header's parsed structure in request messages saved in shm - addresses ticket #2095
We've tested this in several environments and crashes stopped to happened, hence I believe this issue is fixed. If you're still facing this problem, please re-open the ticket with further info. |
Hello guys. Just for feedback. I already adapted the above fix to work with 3.1 and until now we have no crash anymore. So i think this issue is fixed. Thanks all 🥇 |
Hi @liviuchircu. I was able to apply your patch above to 2.4.7 and will be testing it in production over the next week. I also tried applying it to 2.4.8 but it failed. Is this patch already in 2.4.8? |
Yes it is, @razvancrainea pushed the same fix in df68e3b |
* xsnprintf.c: fix excessive expression (cherry picked from commit 99d35618a22daa7bed3d426cdb9db7eb87d507df) (cherry picked from commit d3ba9c451e07ebc91f9036f7f2468a6ea1ccf1e2) * msilo: fix excessive check (cherry picked from commit 71b6088d888af32609b7dad87f0e250580c6c7ff) (cherry picked from commit 222a80153abed8fd731a623aa371861ea4f4a006) * osp: fix use of uninitialized pointer (cherry picked from commit fc5b59ea3eeeabac4e0a28302b5d0b0d1b5f507d) (cherry picked from commit 451fe196826ead159e0ddda34719b1f9948c0bad) * b2b_logic: fix checking potentially uninitialized pointer (cherry picked from commit fefd32149a97cdd2ad1b1ea0f19a4eeaa0809375) (cherry picked from commit 081b654024ea627f479c9137414b108db85004d9) * pua: fix bitmask check (cherry picked from commit cd7dec8af8f71fa1024b69fb66e8f02ef3d43278) * proto_help: Fix broken doc build (cherry picked from commit b9a6292e9de47c02a9e5ae1dcc540c888f90be8a) * Revert "auth: fix error check" This reverts commit b7772a087e4bf5c57fa4655d9fd53e074ca4247e. (cherry picked from commit d8ec4f3b05f2c4b21fe3363b3e146c1811ccf870) (cherry picked from commit 72a3f08a96149ebd7a18d8338d86fe587d4ab669) * auth: return -1 on error in auth_get_ha1() (cherry picked from commit 96fcb5f9fd95796e11ec00a8246426d6dfe11a58) (cherry picked from commit 9578c7fef7f54000b9e38c5b248302649a739d06) * parser/content: handle \r for Accept folding lines When receiving a heeader with folding lines, we also need to skip '\r' when parsing the new lines. This way we can accept headers such as: Accept: application/sdp, application/isup, multipart/mixed, application/dtmf, application/dtmf-relay Thanks go to 46Labs for reporting this (cherry picked from commit f7efd44c9e796e941452d2c173bac6aec4342bb3) * tls_mgm: enforce at least TLSv1 when SSLv2.3 is used Starting with Debian 10 (buster), the OpenSSL library is compiled with TLSv1.2 as the minimum TLS version. Therefore, when specifying SSLv2.3, basically only TLSv1.2 and TLSv1.3 can be used. This commit sets the minimum version to TLSv1, since SSLv2.3 will soon be deprecated. Thanks go to @telematico for reporting this issue Close #1986 * Rebuild documentation * acc: Improve oom handling Do not populate bogus str values (e.g. {NULL, 3}) when out of SHM and setting $acc_extra. (cherry picked from commit 86ef10af2cbaad475dd92d41be64873af40e716e) * Fix param order in goes_to_gw_1 without partitions Credits go to @zhouboyong * Rebuild documentation * osp: actually fix potential use of uninitialized pointer If OSPPTransactionNew() failed, callids[0] is still uninitialized. (cherry picked from commit c8a99ad3d1dea71ddaa73ed4cac42f4c56b49a38) (cherry picked from commit 8a6d458ec72243c667b39ac1f97cfe7e7e98d1ec) * exec: fix potential use of uninitialized pointer (cherry picked from commit 9da92ced576d18e5487cf5dfa584ee638c170520) (cherry picked from commit a4da730c13d52b494f834fdd5c7ff5d61a17937f) * exec: fix NULL check after malloc hf->next_other should be checked instead. (cherry picked from commit 8d7e2f310d7f43535dea1dff6b6889df02d95c02) (cherry picked from commit bcd723680b506f6ac0a6ec095e3920777bcb6b65) * rtpengine: check NULL before assignment (cherry picked from commit f33f1d839102a24342a91f15a6902c05ee90ea45) (cherry picked from commit b428f119fb179d00f47d1917362b33484ebaf4b1) * db_virtual: check NULL before log (cherry picked from commit bbbe695ea8d64efe8a6152ed69aca88896d7a3dc) (cherry picked from commit 39a3a06fd2486848d7e22736182602a741d42f5d) * Rebuild documentation * Fix compilation warnings with -DUSE_FUTEX It seems that whenever "time.h" is included without first enabling _GNU_SOURCE, including "unistd.h" afterwards will leave the "syscall" function undeclared, regardless whether _GNU_SOURCE is defined or not. ../../parser/../mem/../futex_lock.h: In function ‘get_lock’: ../../parser/../mem/../futex_lock.h:78:31: warning: implicit declaration of function ‘syscall’; did you mean ‘sysconf’? [-Wimplicit-function-declaration] #define futex_wait(lock, val) syscall(SYS_futex, lock, FUTEX_WAIT, val, 0, 0, 0) ^ ../../parser/../mem/../futex_lock.h:268:4: note: in expansion of macro ‘futex_wait’ futex_wait(lock, 2); ^~~~~~~~~~ (cherry picked from commit fa4e4fa70085db9da4dc72f856e0648341b0f5a6) * Fix un-released transaction upon t_wait_for_new_branches + cancel. If t_wait_for_new_branches() is used and no real signaling branch is added, the caller canceling must not be ignored upon branch selection. The injected 408 timeout on the phony branch must be considered, leading the transaction termination. Thanks to Chris Maciejewski for reporting and providing troubleshooting info. Also thanks to Razvan Crainea ( @razvancrainea ) for investigating and reproducing the problem. (cherry picked from commit 7221b984d555cb68fd07e1bfb502607446345d42) * Improve canceling of transactions doing only t_wait_for_new_branches(). If a transaction has no real signalling branch, but only doing t_wait_for_new_branches(), upon caller cancel inject a 487 reply on the phony/waiting branch in order to terminate the transaction on the spot. The provious behavior was to wait (even after the cancel) for the phony/waiting branch timeout (fr_inv_timeout) and to generate an 408 timeout. Now, there is no useless waiting and a proper 487 is generated. (cherry picked from commit 8af80a40bb9877b91380da670353a68c53f0a36b) * Merge pull request #2074 from besser82/topic/besser82/json_c_014_so_5 Add support for upcoming json-c 0.14.0. (cherry picked from commit fe24824e042486575daa01e20df2c8380b9e0b07) * Merge branch 'master' of https://github.com/robdyck/opensips into robdyck-master (cherry picked from commit 4558e7af66240a2e359a1ffbeee114bad3b4f71d) * Removed extra space (cherry picked from commit b47d7a617cd8384f2929655f497f3b52c554028d) * [exec] improve docs on exec async + launch Related to #2049 (cherry picked from commit bb7afdc2dc4363f24d8e28e00ab337c7c174d1e5) * proto2a(): Do not segfault on PROTO_NONE (0) (cherry picked from commit 54064b4614d42164b49105b43f9a5dc1ce308a92) * xml: allow access to nodes with undefined namespace prefixes Related to #1988 (cherry picked from commit 17b44ea1f2270be8e59b893ab70b3aac8d5a3826) * Rebuild documentation * Merge pull request #2037 from l2dy/tracer tracer: fix null pointer dereference (cherry picked from commit d59715ec3614c5c9d5afb40aa17704d86fca6021) * rest_client: Disable "Expect: 100-continue" by default Although the "100 Continue" flow sometimes saves bandwidth when the server replies with an error response, it also has some drawbacks: * extra roundtrips, adding latency and complicating the async handling * the client body upload and server body download must be performed within the same curl_multi_perform() loop, during the async resume callback, limiting the amount of parallelism that can be obtained when compared to the simple flow From now on, the "100 Continue" flow for POST/PUT requests with bodies larger than 1024 bytes is disabled by default. Developers may enable it using a new, backwards-compatible, "enable_expect_100" module parameter. Fixes #2081 * async statement: Improve error log when tm is missing (cherry picked from commit afb721ed164206855130202cf5b0ff4c83bf2394) * Rebuild documentation * fraud_detection: Avoid random str_strcmp() error logs * fraud_detection: Fix bad test in e1767c80 * HP_MALLOC: Dramatically reduce memory fragmentation For HP_MALLOC to offer optimal parallel allocation performance, it needs the memory to be fragmented: the more available chunks, the better its performance. However, there needs to be some coalescing as well, otherwise variable-length allocations past the 10K range will quickly deplete (fragment) even SHM pools of 10+ GB. This patch adds the following heuristics to HP_MALLOC and HP_MALLOC_DBG: * shm_malloc(): only split a fragment if the resulting chunk is at least MIN_SPLIT_SIZE bytes (default 256 for PKG, 4096 for SHM) * shm_free(): try to coalesce the next fragment (enough to fix all fragmentation issues) * HP_MALLOC: Fix computation of PKG stats * HP_MALLOC: Add "re-scanning" logic Given that HP_MALLOC has fine-grained locking, if the big fragment shifts down from hash bucket N to bucket N-1 due to another process performing the allocation, we may actually "lose" it during our own scan, since bucket N-1 was empty and we're now waiting for bucket N to unlock. When it does unlock, it will also be empty. As a solution: retry the scan up to N times, as long as it's feasible! * HP_MALLOC: Aggregate SHM statistic updates It is better to update the stats with a diff of the attached/detached/split/merged fragment changes. Updating the stats in a "step by step" manner has the side-effect of creating temporarily bogus values. (i.e. negative "used" and/or "real_used") * Core: Add an HP_MALLOC stress testing suite Fixes #2089 * drouting: Fix segfault during dr API reloads (fraud_detection) (cherry picked from commit 362f8ffa49f2a1070ac16f5021a81871f568f4af) * Rebuild documentation * Add missing typedef keyword Without this keyword these struct definitions are considered by GCC as a variable definition. This was catched because there is a more strict check about these variable definitions since GCC 10: * https://gcc.gnu.org/gcc-10/porting_to.html#common Signed-off-by: Peter Lemenkov <lemenkov@gmail.com> (cherry picked from commit 242903711399ccb4eafeac614129424ffddb0e1b) * Fix building with gcc 10 GCC 10 started to use -no-common by default. Here is an official explanation: * https://gcc.gnu.org/gcc-10/porting_to.html#common We have to mark all the declarations of global variables with extern keyword and put the definitions into a proper file. Signed-off-by: Peter Lemenkov <lemenkov@gmail.com> (cherry picked from commit 7b59ee14e95ea3500dd8c3368b081404b9731c9b) * rtpengine: Remove unused struct fields These two fields were copied directly from rtpproxy module but never used by this module. Signed-off-by: Peter Lemenkov <lemenkov@gmail.com> (cherry picked from commit fd8262cee78a67e1dcca7f487f919239d0f13f33) * make: Prepare for upcoming gcc 10 (use -fno-common) Per https://gcc.gnu.org/gcc-10/porting_to.html#common, -fno-common will become a default in gcc 10, so variables defined within .h files will no longer be allowed! Thanks to Peter Lemenkov for the suggestion! (cherry picked from commit b864b2915551fc25509a54eaf30df0cedbe1643f) * HP_MALLOC tests: Fix unused function warning * Fix more "-fno-common" gcc warnings * proto_hep: Fix HEPv3 IPv6 destination contents (send DST not SRC addr) (cherry picked from commit 7667483bbfb43017ccdfc8929ffe91a37c10c71f) * dispatcher: Fix missing "tmb" definition (cherry picked from commit afbf2871a144466a16534db8c86d7d0377077e17) Fixes #2110 (cherry picked from commit c9e61269f360de6e91fae54f8b6a92964e5f00a3) * Rebuild documentation * Fix compilation with non-HP_MALLOC and -DUNIT_TESTS * signal_pkg_status(): Fix un-initialized pt entry detection otherwise, OpenSIPS may do a kill(-1, SIGUSR2), which terminates all processes of the current user... (sometimes this includes the entire X session :-) ) * Rebuild documentation * Rebuild documentation * Skip the disabled GWs when using route_to_gw() Closes #2072 (cherry picked from commit d0d05907099add7666e31820b4b9ecd2d4c85e7e) (cherry picked from commit a43ee8d78f8596e7f6fbf2d9fb154a879042c660) * EXpire the WAIT type subscriptions also (produced by async wait_for_event()) Credits go to @liviuchircu Closes #2054 (cherry picked from commit fe9bfd9c9e17d9ec069b9b98010a151800d30d83) * RPM spec: don't build cachedb_mongodb for centos-8 because of removed libzstd, fix perl build deps * db_perlvdb: Fix incorrect integer downcast Credits to Aaron Meriwether for spotting the bug and suggesting the fix Fixes #1952 (cherry picked from commit 3e83412d50b0d8dd5b8a0f7498fb8b66e96ef353) * Rebuild documentation * Module dependencies: Improve debug log text Suggested by Mark Farmer (cherry picked from commit 128c545677bfa7a8bdfd27b4d4ac399937df7bc2) * Fix crash in tracing xlogs from routes without SIP msg (cherry picked from commit b0a63d96ca16a26b3b4a623484ff2359d5533811) * Fixed the HEP type for the logs Credits go to @bibace Closes #2137 (cherry picked from commit 7b36e441a6314fbd54745d8084aa5cfe88c341d7) * Fix the src/dest info for the HEP "log" packages This fixes the way the logs are displayed in the call flow of Homer (cherry picked from commit 98c39776965527a4d91c25755239369029a5e023) * Fixed 3 x htons(), reducing to one :) (cherry picked from commit 1856155a6c2093efae7448fce80371e97eedd42a) * tls_mgm: fix a cleanup crash when failing to create a SSL_CTX (cherry picked from commit c6ac01258b61608b06ca400815e3b89302724107) * cachedb_mongodb: fix crashes when loading module along with tls_mgm Loading the cachedb_mongodb module causes the openssl library to be initialized in the pre-daemon process, by the mongoc library's constuctor, before tls_mgm gets the chance to do its own initialization in the attendant process. This commit overwrites the openssl functions used by the mongoc library in order to actually skip the initialization and leave it to tls_mgm. Related to #2091 (cherry picked from commit a3e87277f275d2b26a3f1472e78a07baea9de85c) * cachedb_mongodb: complete fix in commit 4dfa0ba for openssl < 1.1.0 (cherry picked from commit 9f454463e42f2148990bd5f307eeff0acecb18bb) * Rebuild documentation * net/tcp: mark connections as being added in reactor This way we can "remember" to remove them from the reactor when the connection times out. Full credits go to @vitalikvoip for reporting, troubleshooting and fixing this problem. Close #2134 (cherry picked from commit 3e862f0e34ab9fa7065e6132935ab7c429c730bb) * proto_wss: proper cleanup wss connection in case of tls error When TLS domain is not found, or the SSL context cannot be created, make sure the connection does not keep an hanging `proto_data` that will be later double-freed by the cleanup routine. Credits go to Jonathan Hulme (@digipigeon) for reporting this. Close #2123 (cherry picked from commit b150486a6944f684b238a62dc2d2dc9e307dd007) * dialog: clarify usage scope of $DLG_dir Reported by @imdrpn in #2132 (cherry picked from commit 896be267dba83b2e45d4b246a37c2de1dde5d6d1) * rest_client: fix crash when loading module along with tls_mgm Closes #2115 (cherry picked from commit f4227fa77a22e3439718fb9fe4b8077ea6fc659a) * Rebuild documentation * rest_client: complete fix in commit c4b84f7 Related to #2115 (cherry picked from commit 31104219475b94b864c126f42f1e5abba8e41148) * The main process (attendant) is no longer running the child_init() The main proc must stay free of any db connections, so it can later fork and child_init() other procs (due auto_scaling). For the modules which were keeping a DB conn open in main proc for later usage, during module destroy, the new approach is to open the conn, on the spot, in the destroy handler. Related to #2003 (cherry picked from commit 3a8f6f1375dc3e3dc8a4a786ee642d4a5fe61069) * siprec: send BYE for sessions not established When a SIPREC session was started on a 183 reply, but the session was not stopped, a BYE to the SIPREC server should be sent to close the session. Thanks go to Krunal Patel for reporting this and offering the fix. Close #2136 (cherry picked from commit b4cd132731f8e2fc0a949ea52b22952879b345ca) * acc: fix no name tags in extra lists Before this commit, extra_fields that were having spaces after the last tag (such as: `last; ` would generate an extra field with no tag. This commit fixes this case. (cherry picked from commit a69e6ada6f57989e7a3af940e75d3b99ca8bced3) * Fix wrong type of lumps. As the eoh (Enf Of Header) is determined during the on-demand parsing, the add_lump() may wrongly lable a lump as "header" or "body" (as the add_lump() may be used before the parsing reached the EOH). So, better identify all headers asap, so the add_lump() will properly work all the time. This will not add any overhead, as in 99.9999% of the case, due the functions used in cfg, anyhow you end up parsing the whole message. Fixes #2144 This is a 100% safe fix, there are 0 chances for side effects. * dialog API: Clarify documentation for DLGCB_FAILED The DLGCB_FAILED callback is subject to the "200 OK" vs. "408 timeout" SIP race condition during dialog establishment and developers should use it with extra caution :) (cherry picked from commit 1f2c8c4854f9494ff7b712be3dc1eba87cba57f7) * Delete expired subscription only with active tags In clustering setup, the DB rutine handling subsscription should delete from DB and run the del handler only if the subscription has an active sharing tag. Fixes #1848 (cherry picked from commit 929ab4d43c0ad1878d60d3ef27f2a71f905b80f0) * Fix test of recognizing SCTP as UDP-based protocol ...and allow per-socket/listener definition of number of workers/processes Closes #2125 (cherry picked from commit 749685c7ab6e7baa6562f9803bc52f2fc89ea82b) * Proper detection/skipping of EOH in multipart. Do not assume a len of 2 (not even checking the content). OpenSIPS accepts (for hdrs) a single \n or \r as EOH (cherry picked from commit 3ec9d1b25430715f2b800c455c8518104a243f49) * Fix wrong inheriting of Contact-Length hdr. When moving from multi part to single part, do not inherite the Ct-len hdr of the part. Fixes #1990 (cherry picked from commit 7dd1151341b8229cd30e335b246e56938551f6bd) * Fix various typos, phrases & small issues (cherry picked from commit 86f6e0c3038025bfc53243a27cb0fe76869c324e) (cherry picked from commit 55ebe7b72f5619b4dbf1ebe86585d1599ca53295) * TLS: Refresh expired default certificates ... and also extend their lifetime from 1 year -> 10 years * rr: Fix ordering issues with add_rr_param() and create_dialog() Commits bd1719dcf and c263182ee4d introduced regressions for record_route() and record_route_preset(), respectively, which caused add_rr_param() to not work anymore in the following sequence of function calls: create_dialog() with a "dlg_match_mode" != DID_NONE record_route() or record_route_preset() add_rr_param() # would not do anything, simply acting as a NOP The reason is that the "fake" lump chain which was meant to hold all add_rr_param() operations prior to an eventual record_route() call would no longer be masked after the record_route() and would continue to accumulate data from subsequent add_rr_param() calls. Since it was meant to be a "fake" chain from the beginning (maybe the user doesn't call record_route() at all!), the data would simply be lost. Many thanks to John Quick for the accurate report and instructions on how to reproduce the issue! * Fixes #2138 - mid_reg_save() mid_registrar_save() not forwarding REGISTER in aor throttling mode after unregister when in write-back mode. (cherry picked from commit 6ee1664559e65fefb2f9cc5f59772b4997711ce0) * Rebuild documentation * mid_registrar: Be compatible with gcc 4.x ... it seems named initializers were not supported back then. This commit also makes the initialization more human-friendly by using the "10" value (UL_EXPIRED_TIME) instead of the opaque "0" value -- the end result is the same. Thanks to Nick Altmann and Răzvan Crainea for their help! (cherry picked from commit 84bea49d7e4392a23526790d135fbff816081ec3) * rtpengine: use proper buffer iovec count when calling writev to send commands to rtpengine daemon - this went undetected for a long time probably because the last buffer had a zero iov_len ... (cherry picked from commit b1bf482df43320cae37207bf7975215785e825ab) * rtpengine: improve error logs: print # iovec buffers along errno/strerror on writev failure (cherry picked from commit bd099afa394a8033f6d988f355ff0923592c8c0b) * rtpengine: fix writev error 22:Invalid argument - this fix is based on a similar fix implemented in the rtpproxy module - if the # of iovec buffes is to high, concatenate last buffers and reduce the # of buffers (cherry picked from commit eda6ec3b6260db52a8420d3a9f0cc2f82273af0f) Conflicts: modules/rtpengine/rtpengine.c * rtpproxy: fix writev error 22:Invalid argument - align code with rtpengine implementation for systems with no IOV_MAX defined (cherry picked from commit 565e25ba9b967bb5442d62f2ef5791c51787f078) * rtpproxy: improve error logs: print # iovec buffers along errno/strerror on writev failure (cherry picked from commit f4ba69bc6708e94faeefe8d008f48c0a80d66eb8) * cachedb_cassandra: fix unsupported protocol version errors This commit prevents the Cassandra driver with version >= 2.15 from throwing errors while trying to negociate the appropriate protocol version. Closes #2145 (cherry picked from commit d60830f49c9c0ac6c161b56cddbc2572c04d3c66) * Fixed uac_auth() on sequential requests in dialogs were uac_auth() was already used The fix solves the conflict between the changes over the cseq value coming from (1) dialog module, trying to increment the cseq to compensate the cseq change during INVITE auth and (2) the uac module incrementing the cseq again because of auth'ing a sequential request. The solution was to first indetify such case in uac_auth() and to use a larger delete lump (when replacing the cseq) in uac_auth(). Such a larger/wider delete lump will invalidate the lump added by the dialog module. This is a follow up on #1969 Credits to Ben Newlin for reporting and assisting with the testing * Fixed potentially un-init dlg variable Thanks to Trevis for its report :) * Bump version to 2.4.8 * cluster sync logging: Fix incomplete "if" condition Reported by William Jin * tls_mgm docs: Fix example Reported by JP Hindin * clusterer: Fix some startup corner-cases (crashes) The first extra check fixes a crash in 'db_mode = 0', if only 'neighbor_node_info' is given, so cl->current_node will be NULL. The second extra check prevents: (gdb) bt full 0 check_seed_flag (cl_list=<optimized out>) at node_info.c:307 1 load_db_info (dr_dbf=dr_dbf@entry=0x7f2e65996300 <dr_dbf>, db_hdl=<optimized out>, db_table=db_table@entry=0x7f2e65993370 <db_table>, cl_list=<optimized out>) at node_info.c:485 2 mod_init () at clusterer_mod.c:408 Fixes #2086 (cherry picked from commit 0b609f03165ee80f525795f278128162f6418817) * Rebuild documentation * rtpengine: Remove unused defines Signed-off-by: Peter Lemenkov <lemenkov@gmail.com> (cherry picked from commit 463e815cfa1835f1ef97f9b8f16a943fce0b8762) * Revert "rtpengine: Remove unused defines" This reverts commit 27e758dfc0fd6ae40b2157c3535928b721c4b26f. (backport was not needed on 2.4) * db_mysql: fix crashes when loading module along with tls_mgm (cherry picked from commit 9459f5d3acb5efa9a4a0e5c8d60c18733062bc04) * Rebuild documentation * HP_MALLOC: Fix bad merge conflict solve It seems the 25eec9f5f backport was not properly done, as the re-scanning logic ended up _inside_ the main loop, rather than _outside_ it. Also improve the default unit test settings, so the HP_MALLOC stress tests can be more easily launched. Credits to Ryan Bullock for the report Fixes #2168 * pua: Fix some list management bugs; Improve code * do not use strncmp() to compare "str" structs, since it may lead to false positives (e.g. strncmp("foobar", "foo", 3) == 0). Use str_match() or str_strcmp() instead. * do not read memory after pkg_free() (severity: low, since it's PKG memory, so reading "invalid" heap memory works anyway...) * simplify list_pop() and list_free() code (cherry picked from commit 5df8b636a15a799b5f1ab7a58ebe1c208ce49d2d) * xml: properly handle oom when extending $xml result buffer Fixes Coverity CID #200068 (cherry picked from commit 3112b5942636ad281783a5ba0ec13992e0c78693) * db_text: Fix Coverity warning (bad 'switch' fallthrough) Thanks to Răzvan Crainea for the help! (cherry picked from commit 6b56ca5fa1163bc290a4c01f554e07cae860fa8c) (cherry picked from commit 8b9307f2ec54a0f1551599254d1afbc6e0f7799b) * mid_registrar: Fix uninitialized variable Although the severity is low, this code path could be triggered by a bogus downstream registrar which returns a 200 OK without the Contact advertised by the mid-registrar in its REGISTER request. Fixes CID #200051 (cherry picked from commit a4684d54f1d32088936a345afa36cb66bbf53727) (cherry picked from commit df197a285b7b6d3c751dce42a65f69a7d5abe68b) * usrloc: Fix unitialized variable CID #199971 (cherry picked from commit 5f03b8556bf241336795ba8ca517c982d732dd4a) (cherry picked from commit 96428bcf3e5f849486862f6f4a481674e1d6fed3) * freeswitch: Fix a rare memleak CID #164007 (cherry picked from commit 2a9824cdc955eb783e08f88906b70d337dbdebe9) (cherry picked from commit 2aa619c00666e85631cd687f243d05585ac1478f) * HP_MALLOC status logging: Fix incorrect bitwise shift CID #200101 (cherry picked from commit fcdd224a5c8638a28e3be10e4028c27aa404a637) * statistics macros: Avoid unsigned integer comparisons against 0 CID #199900, #199913, #199923, #199934, #199941, #199964, #199998, #200032, #200049, #200074, #200096. (cherry picked from commit 39fc05ec0ff2122dd79ec5582ac020c999c11c30) (cherry picked from commit b497af00652cac0421f9e316bec4d5d12bb9ede3) * mid_registrar: Remove unnecessary check CID #199954 (cherry picked from commit 7a638403ee648ba629fd1f6f4a17bdea091da16d) (cherry picked from commit c34e15660de83681a16c84ec61b96a85ec460e5d) * usrloc: Remove unnecessary NULL check CID #200086 (cherry picked from commit f305fa88f9341f37eb95bf88980e974723b57e94) (cherry picked from commit 42b71cf855efa610c1c408caea7c3e54c039aa3f) * mid_registrar: Add paranoid safety check This extra check doesn't fix anything right now, but at least it makes Coverity happy. CID #200020 (cherry picked from commit 2e60b50228a80c1e9629e43670a8cbcc5f71839c) (cherry picked from commit 85464ef2a96f9a3b4665f8ddbeea3292149eac79) * sanitize mpath length Fix Coverity #199897 (cherry picked from commit 587316afc6032142c96077e3efac8029d7fa8355) (cherry picked from commit 4f3ed1239338e0c647bb387dcf5acdb51a6f3162) * mpath: Fix possible buffer underrun Also attempt to mask the false positive in CID #200070. CID #211377 (cherry picked from commit 7f244f17a54833c6bb3f9ea58aa4afd855774ce1) (cherry picked from commit 0108f2e7cba7e596b8f2ab20e0e176cbc9720744) * drouting: Always NULL-terminate the GW buffer While this doesn't immediately fix any bug, it should make Coverity happy. CID #200062 (cherry picked from commit df2ce6d826587f1130c5928b0815bd10ed2957f1) (cherry picked from commit 2a8008ac6280059b9ba5548e7d856db7e0953ffb) * exec: Always NULL-terminate the read buffer CID #200052 (cherry picked from commit 0230f9c9032bd01661c6c7c197965528507e3f2c) (cherry picked from commit 5ff4861e836cd97889c831c5f3533a985fb19d5e) * mi_datagram: Minor improvements * always NULL-terminate the read buffer (CID #200029). Doesn't fix any known bug, just a good practice * avoid redundant memset(65535) on each read * improve startup error log (cherry picked from commit 0eee2119991270bd23a6279e6b1a825586f2d227) (cherry picked from commit a2ae1349a19b2fc79d1893bef96d7f01434020cf) * db_text: Complete 65b18091 (cherry picked from commit 673cf7f9eb3e5bd3a8da805483885e049bddc873) (cherry picked from commit 11205ebef3ec4f207f33f01afe209518a0ed4453) * dialog: don't allow RE-INVITE and OPTIONS pinging at the same time Enabling both RE-INVITE and OPTIONS pinging for the same endpoint causes issues with the CSEQ value in: * ACK (incremented value compared to the corresponding INVITE) * INVITE and OPTIONS (decreasing values between the requests), in case races occur between computing the new CSEQ values and sending the messages. (cherry picked from commit 182ceb0ca6126251ca9e872844925c3bf60b1c90) * Rebuild documentation * [registrar] Proper init of delete_nh_he variable The logic may end up to free_hostent(delete_nh_he) without actually using the delete_nh_he, leading to freeing so random pointers. Reported by coverity CID 199942 (cherry picked from commit 3b20c9db07ea88bd84609676f2657011588b8b41) * [core] proper checking on return code for getsockname() Reported by coverity CID 200064 (cherry picked from commit c489f462d236fddb883e64a6cde7b04929a76191) * [pua_dialoginfo] Fix bad test on resolving flag name get_flag_id_by_name() may also return other negative codes than -1 Reported by coverity CID 207914 (cherry picked from commit f3530c10adc1d2a5b01340a9823f84db61c98e72) * [drouting] removed useless code Makes no sense to select the table at db_con init time, as various queries (from various modules) may set and use different tables. Related to coverity report CID 58405 (cherry picked from commit ab7d4430941263d20acd1f45bbbbf09c80a5040f) * [nathelper] Fix bad test on resolving flag name get_flag_id_by_name() may also return other negative codes than -1 Reported by coverity CID 200026, 200005, 199906 (cherry picked from commit 249b80741ddbb0952c7f737af2a7c2473f05eacf) * [nathelper] Proper testing upon extracting hash value Be sure it is valid hexa and it it within the correct range. The lack of these tests may open the gate for some nasty exploits. Reported by coverity CID 199958 (cherry picked from commit b1d796344aa674e25fc4408bda5d336bce2bc605) * [load_balancer] Spelling over log messages (cherry picked from commit 3b6d0de16d68815b571cebca6bf4c9124f59f06b) * evi: annotate evi_param_set_int to fix Coverity Fixes several Coverity alarms * db_text: prevent buffer overflow Fixes Coverity CID #200093 * qos: fix getting SDP * parse_to: add fallthrough comments/indications for coverity CID #200094 * tm: cleanup replicated cancel msg Fixes Coverity CID #200031 * db_text: fix no-op code fixes Coverity CID #200004 * fix coverity detected false positives CID #40557, #211391 * proto_ws: make sure we have destination to connect to Fixes Coverity CID #58340 * proto_hep: remove useless NULL check Fixes Coverity CID #163991 * proto_wss: add fall through indication Fixes Coverity CID #58373 * proto_tls: make sure we have destination to connect to Fixes Coverity CID #58361 * db_text: complete coverity CID #200093 * db: merge db_is_neq_type() in db_ut Fix coverity CID #40717, #40718, #40719, #40720, #40721, #40722 * pua_xmpp: release memory in case of error Fixes Coverity CID #199967 * rtpengine: fix offer vs answer detection for replies When late negociation is not used, rtpengine_manage() should send an answer command, not an offer one. (cherry picked from commit d0641318ff238b01506e18bd4aa7422fb3cfeab7) * rtpengine: fix late negociation detection of rtpengine_manage() When a reply is received, do not look into the request if it has body, since that could lead to headers parsing in pkg - that might leak. Thanks go to @bogdan-iancu for spotting the issue (cherry picked from commit 06f7297065f079ebd4201055079a02e69dde2cc5) * tm: use msg_flags to indicate message is replicated (cherry picked from commit 804c9a0492940fe779e3dd1d4b4fe9f8cba57cf4) * event_flatstore: prevent NULL deref in case of bogus param Fixes Coverity CID #207918 (cherry picked from commit 596c572da62169d453fface97d13ec557d200561) * tls_mgm: set the TLS session id context with the proper length Fixes Coverity CID #207904 (cherry picked from commit 4a18ce89c89b4f9831ff748bfa75e2aa5db1b303) * ut.h: Fix str_casematch() implementation Since str's have a known length, we should not stop the iteration at '\0'. (cherry picked from commit 7d0df27a40d9bd0207637ec08e5deb662c45bbf6) (cherry picked from commit 4b29d8dc30065c2d92f686e870f9bfc771dd4954) * Rebuild documentation * userblacklist: Be compatible with db_cachedb over MongoDB Issue reported by Ryan Embgrets (cherry picked from commit f9fd982beec461cf15bd3001df744066e42af625) * fraud_detection: Add the possibility to disable stats A (0) value for any of the fraud detection thresholds now means: "ignore this threshold", instead of: "raise an event on each call for this threshold?!". Although (0) values could, in theory, have been useful for event generation testing purposes before this patch, this is a small price to pay in order to gain the ability to disable a stat. Developers can now add 1 more CPS and still keep testing with a new (1) value :) Although the DB schema changes slightly, due to the added "DEFAULT 0", it is fully backwards-compatible with the previous one, except fraud_detection row INSERTs will now be more quiet, generating no more warnings from the backend (e.g. "Column does not have a default value..." -> but still defaults to 0, thanks to non-strict mode, leading to strange warnings from the module). Fixes #890 * Supply a processing context to error route. When the error is triggered outside the context of another route, be sure it also have a processing context Closes #2187 * Rebuild documentation * rtpproxy: ignore unknown payload type media streams Thanks go to @wangduanduan for reporting and providing troubleshoot info * presence: reset delete prepare statement when delete not performed * fraud_detection: Improve accuracy for 'concurrent calls' Avoid using the DLGCB_FAILED callback, since it has no protection against the "408 Request Timeout / 200 OK" race condition, and simply use DLGCB_DESTROY in order to decrement the concurrent calls counter. This fixes a bug where a "408 Timeout / 200 OK" call would cause a concurrent calls value of 4294967295, due to an extra decrement. Fixes #2079 (cherry picked from commit 2404b96c87982387071cea63f5fc42d9169d8798) * fraud_detection: Fix misleading 'show_fraud_stats' docs * the 'fraud_profile' MI command parameter is both invalid and not at all required by the current code * document the "stale statistics" limitation of this function (cherry picked from commit 4dba112130651ca6013c7ebc931fc744b3776774) * drouting: Do not start with NULL DB connections Force all DB connection objects to be created right from the start, otherwise strange, hard to diagnose errors may _occasionally_ pop up during runtime: ERROR:core:db_use_table: invalid parameter value (nil), 0x7ff31329a108 ERROR:drouting:dr_state_flusher: cannot select table "dr_gateways" ... or: INFO:drouting:dr_reload_cmd: dr_reload MI command received! CRITICAL:core:db_table_version: invalid parameter value ERROR:core:db_check_table_version: querying version for table dr_gateways CRITICAL:drouting:dr_reload_data_head: failed to load routing info CRITICAL:drouting:dr_reload_cmd: Failed to load data head * tls_mgm: fix crashes with db_text and null string values Closes #2192 (cherry picked from commit 3d26747cc309d7f5c461e3be49fb30db78bc3591) * Rebuild documentation * fraud_detection: Fix a bug causing negative CC/CPM Following a reset of the statistics for a (user, dialed_number) pair, skip any pending subtraction operations scheduled during a previous interval (e.g. a dialog which started during fraud detection interval A and ended in interval B should not decrement CC anymore if it has already been reset to 0 in the meantime during a more recent check_fraud() call). Many thanks to Inderjeet Sharma (@inderjeetsharma) for reporting and helping track down this bug. Fixes #2079 (cherry picked from commit 4f414366ee0fcc52fa2d18c59be0c5d753047934) * rtpengine: handle PRACK for answer Thanks go to @nikbyte for the report. Close #2208 (cherry picked from commit 1f9b433fd68889c3e692e249d54ea8dbb2d576d3) * hep_udp listener: Fix "use_children" regression Commit 899f65c8f4 introduced a regression where "use_children" would be ignored for the HEP UDP protocol, possibly leading to extra forked workers after an OpenSIPS upgrade, failed restarts due to exceeded MySQL max connections, etc. * Rework previous commit Although the protocols listed in the @proto array "seem" to be initialized, they are still missing some essential data, which is only later provided by trans_load(), during the startup phase. * tls_mgm: Avoid creating a shared DB conn at mod_init() Many thanks to Adrien Martin (@adrien-martin) for the immense help in reporting, troubleshooting and helping close this one out. Fixes #2161 (cherry picked from commit 2c008cd9225dc9a60df2fbf2e37d032ce6f27db0) * Rebuild documentation * dialog: parse contact in separate structure This avoids concurrent access to the contact header's parsed structure in request messages saved in shm - addresses ticket #2095 * rtpengine: fix order of specified interface flags When both 'external' and 'internal' flags were specified, 'internal' was always assigned as iniface, and 'external' as outiface. This commit makes sure that the order they are specified in is respected. (cherry picked from commit 68f8f9d5c7f43af0a6fb914883ad700a93048835) * rtpengine: Fix docs for `extra_id_pv` Suggested by Nick Altmann (cherry picked from commit 968063d9f3d652e94c2895d9dee46b7843c875b5) * Fix command in documentation for clusterer_list_cap (cherry picked from commit 7a0f5991de5387b2496b1229f949353b568ad10c) * Rebuild documentation * [permissions] Fixed mem leak on pattern field Closes #2197 (cherry picked from commit d6f3424d1fe71de94a545a36d0e14fbd9ecd3b3f) * Fix next_hop crash (seen in nathelper nh_timer) due to reading reused mem Problem: - get_domain_db_ucontacts (through get_domain_ucontacts) was handing out (next_hop) pointers to memory that was unused. This resulted in a crash when this memory was reused before the invalid pointer was reused. Relevant issues and commits: - #1652 [OpenSIPS crashes since of child that serves rtpproxy] - e162f5f10 [fix 1652: usrloc: make next_hop point within the shared buffer] - #1710 [nathelper next_hop off by one for usrloc path] - 0300eb1d5 [fix 1710 / revert 1652: usrloc: fix next hop compute for ...] That is: e162f5f10 fixes this exact problem in get_domain_db_ucontacts, in get_domain_mem_ucontacts and in get_domain_cdb_ucontacts (cdb_pack_ping_data). But in 0300eb1d5 it is reverted for only get_domain_db_ucontacts. This fix: - Rewrites the fix for get_domain_db_ucontacts and get_domain_cdb_ucontacts, making it less fragile/bug-prone. - Adds comments about fragility to get_domain_mem_ucontacts - Fixed unaligned memcpy that might affect non-intel CPUs: `((struct proxy_l *)cp)->name.s = next_hop_host` Bug reported and fix tested by Jasper Hafkenscheid @hafkensite (VoIPGRID). (cherry picked from commit 5a6b3abe41a2eaed961a530675f6441a692e8640) * usrloc: Update doxygen comment about packing get_all_ucontacts (cherry picked from commit fc1c3ec32fa285e90cea457096f557e1088b1031) * Accept single \r as EOH in multipart This completes commit 3ec9d1b25430715f2b800c455c8518104a243f49. (cherry picked from commit f5d1b55aa213e448f2716751227234e073277855) * load_balancer: Fix AB/BA deadlock in lb_start() Assuming we have resources A, B, before this patch we would have the following lock grab order on the corresponding resource locks: (see lb_data.c +743): lb_start(): A, B lb_next(): B, A This patch fixes this type of deadlock by inverting the resource AVP elements, such that both functions grab the array locks identically: lb_start(): A, B lb_next(): A, B * List new / missing contributors (cherry picked from commit ef7a6801c483ece3ea8ab07275fcec7aa051a3f8) * rtpengine: fix single-bit field type (cherry picked from commit 9c4f6c1c7f7ac88e19ff78b5169c33b17b1d8d38) * Rebuild documentation * nathelper: Do not crash on bad ping reply Credits to Alexey Vasilyev for the report! Fixes #2242 * dialog: fix/reformat some logs (cherry picked from commit 2efaeee24c69841e9776e360960bc8d644081ff9) * cJSON lib: fix double free in case of string parsing error Fixes #1977 (cherry picked from commit a53f3bd689a01c495809214823d07bfc3cab512a) * Rebuild documentation * [core] fix building path hdr for IPv6 IPv6 addresses in path received parameter are not enclosed in brackes Credits go to @hafkensite / Jasper Hafkenscheid Closes #2247 * [core] fix limited support for transport vals in PATH The transport of the received parameter was limited to TPC/TLS/SCTP (no WS or WSS) * [core] missing file from e1347ebe4 * [core] fix missing check on DNS threshold For the DNS A record lookups the DNS threshold was not properly checked and reported. Reported by @bcnewlin / Ben Newlin Many thanks to Ben for helpin to troubleshoot this issue. * [core] convert space identation to tabs * [core] fixed bogus paste from 2736d6 * sqlite - fix upsert in db_insert_update (cherry picked from commit 6f71f3f065fa78a65ee021931207aded5903c909) * Rebuild documentation * o Don't assume bash lives in /bin; o fix arguments ordering while calling head to work correctly on non-GNU systems. (cherry picked from commit b0612c10fe610a76499e9d8170ad392f63d4396f) * sipcapture: fix overlapping function name the build_dummy_msg() function being used without the static identifier causes confusion for some compilers. An example is the AL2 compiler, which instead of using the module's implementation, it uses the core's one, hence resulting in broken behavior. Credit goes to @ghmj2417 (GitHub) for reporting this in ticket #2255 Close #2255 (cherry picked from commit 7ba72d434aee3db949c351a46970122f41a7c50a) * Rebuild documentation * usrloc is_contact_registered(): Add useful debug logs (cherry picked from commit 64c3724faa3b90918bda4c95ee716d487a3f0f1f) * emergency: fix memory leak on error (cherry picked from commit 18f9f7873330b7e5d7c1e31828062d80a6268402) * emergency: Remove bogus curl_global_cleanup() at runtime Suggested by Walter Doekes (@wdoekes) (cherry picked from commit f91c13af1fa9ee55071bd99b249902360c8def65) * Extend STR_NULL and str_init() to set a type explicitly. This allows those macros to be used elsewhere in the function body, not only in variable declaration(s), i.e.: void foobar(int baz) { str localstr; if (random() == 42) { localstr = str_init(I_STR LUCKY_STR FIN_STR); } else { localstr = str_init(I_STR UNLUCKY_STR FIN_STR); } some_api(&localstr); } (cherry picked from commit 36e0281b7309e2be39e478c96575cadf9a35260b) (cherry picked from commit 62b1b5320fc4e9b7e9d71ae8f069e7ea839f0b60) * Improve auth module examples: o Use $var(xyz) consistently. $avp's take shared resources and aren't really justified in this use case; o fix www_challenge / proxy_challenge second argument to match reality. (cherry picked from commit d9574358d1670391c2b1cbe1d7ccbe37eb4defca) (cherry picked from commit 008b5154a6007ad6c223fc4dad38bdaf8e0a2077) * auth: Fix doc regression in d9574358d :) Although $var() variables are awesome, some module params (e.g. the "rpid_avp") really mandate an AVP and won't settle on anything else. (cherry picked from commit 8ce9eb22e1ead5b0afeff367b01d0b510650cdc5) * Rebuild documentation * packaging: fix typo in emergency module description Thank you @SB-JohnK for reporting it! Close #2263 (cherry picked from commit 25e55f10c7d690230e3e1060958d8b5f0621b540) * Fix Memory leak in Freeswitch ESL module. (cherry picked from commit 294e8c489c22fce345450e0441dfbc5a79205d94) * Rebuild documentation * event_rabbitmq: cleanup connection after creating the socket Reported by @ycxwoo in ticket #2230 (cherry picked from commit 87e17276a79cc01ddef9334f0bf21256a6c21606) * topo_hiding: always allocate space for DID in user When contact could not be parsed, the new Contact's prefix was not allocating bytes in buffer for storing for the dialog ID in the user part. However, the DID was appended to the prefix without any error checks, resulting in a memory corruption. This commit always allocates spaces for storing the Dialog ID, even if the contact could not be parsed. This fixes all uses of the topology_hiding() function with the `D` flag. Thanks a lot Sergei Lavrov (@ccppprogrammer) for reporting this in ticket #2285 and for offering information for figure it out. Close #2285 (cherry picked from commit 2ea8f86813bfbf87b4b0ab61c9f3e852ce777f4c) * Rebuild documentation * registrars: Rebuild lib/reg on each build This fixes an issue where registrar would log lines resembling: INFO:mid_registrar:... due to the lib/reg/*.o files having been built beforehand with the mid_registrar source code and simply linked into the .so afterwards. This issue is fixed in 3.2+ using a different mechanism (separate build of the lib/reg/reg.a library). Reported by Vlad Paiu (cherry picked from commit 74a180d2a88546b0ae14a6a91c669ddb63b29f61) * acc: Remove redundant code (cherry picked from commit fcbe480f2eeb9b0933bb5a3fb1d15379fb79c493) * Code quality: Use str_match() instead of !str_strcmp() Reasons behind this change: - better performance: str_strcmp() includes lots of redundant checks and performs worse than the simplistic str_match() - safer code: although by making this change we increase the chance of a segfault (if either "a" or "b" strings are NULL), the crash will immediately indicate the coding bug and will be easy to troubleshoot. By the same argument, nobody complains that memset() or strcmp() include zero NULL checks: they don't have to! - more meaningful code: logically speaking, in most cases, the programmer simply intends to _match_ two strings, rather than to order them lexicographically... (cherry picked from commit 765a84b293db8ec3da960c0ae61f725f3519bfd9) (cherry picked from commit d004a6c5e35b7dac783b52dd95fa11d6ad2d13c2) * permissions: Integrate with 'db_default_url' * DB partitions now inherit their DB URL (if not given explicitly) from 'db_url' -> 'db_default_url' * the above is similar for the 'table_name' property (inherit from the 'address_table' modparam) * rewrite the 'partitions' parser: no more coding style limitations * backwards-compatible: the 'db_url' is still mandatory if you are not specifying a partition definition! This is due to the fact that the module has a non-DB based usage mode Part of a series of patches for #2117 (cherry picked from commit bb57cc1c375b9325138154eb415cebc2fa971ec0) (cherry picked from commit e663ee6a905f64c1d26954fa896bbc1461001163) (cherry picked from commit 8cb6048949cc0440d87b205bbaed88fa16d1fc93) * dialplan: Integrate with 'db_default_url' * DB partitions now inherit their DB URL (if not given explicitly) from 'db_url' -> 'db_default_url' * the above is similar for the 'table_name' property (inherit from the 'address_table' modparam) * rewrite the 'partitions' parser: no more coding style limitations * backwards-compatible: at least one partition must be specified (even through simply defining a global 'db_default_url') before the module can start Part of a series of patches for #2117 (cherry picked from commit 2e8164368b31da9292f08623eac6934ef2057e22) (cherry picked from commit f061ad31d39538de59da05e8a98018cc42c4c336) (cherry picked from commit 000df2ba61e91142ce3c8f09562e5867519a94b5) * dialplan docs: Update partition, db_url and table_name (cherry picked from commit 6443ad5829c15f8ae7dad8bb9f0e4f418702b1ad) (cherry picked from commit 5185859a5776b6cb6ed49938eeba80c2017c5087) (cherry picked from commit dbd3e9eace04e0b54a16bfff4b9c78d9a5c51732) * permissions docs: Update partition, db_url and address_table (cherry picked from commit 180674b833ef9e82cb3f97105add2ada6843ddc1) (cherry picked from commit 67c06628eae20cca4fc3e5509435b775a425d17e) (cherry picked from commit 8ad9847f0b010b5344c1d9fdf4b4f65f347efae8) * dispatcher: Fix a bug with the 'default' partition This commit allows the "default" partition to be populated just like the other ones: modparam("dispatcher", "partition", "default: db_url = mysql://opensips:opensipsrw@127.0.0.1/opensips_3_2; table_name = v1_dispatcher") (cherry picked from commit cc2f4a8c49402f99502a58940d2fbb51daed6e47) (cherry picked from commit 419f0119400229ac54df2b8ce1943d88c31c4b5a) (cherry picked from commit 1f14390a4b7147a3ba7434b5dc19c6aa98184a44) * dispatcher: Add useful debug log (cherry picked from commit 198687e17d13a06b0868676252292643c6d549c4) (cherry picked from commit 654f1e9ba552ab96e1b3b31b3199f5464b41bd9d) (cherry picked from commit 5c154e2cec728014993dd980309225b47e6af514) * dispatcher: Integrate with 'db_default_url' * both 'db_url' and a partition DB URL will inherit their starting value from the 'db_default_url' global * add the option to point the default partition to a known one, in order to avoid unwanted loading of the default table. Example: modparam("dispatcher", "partition", "default: trunks") modparam("dispatcher", "partition", "trunks: db_url = ...") (cherry picked from commit c7d668bad5b2b41a9c9065e1539914f0629ccf64) (cherry picked from commit ba7a4e33a6e2141aa7300f4e5db268c39f5e9520) (cherry picked from commit c0b2a0ae4117e395e1c4724b7b91270d83a0d269) * dispatcher docs: Update partition, db_url and table_name (cherry picked from commit 89a618be1f005d1bc80d528bc41de73fff24533a) (cherry picked from commit dd0245e98087e07fb2182ae4e21f06ce034c9b4b) (cherry picked from commit 02df9b8fdb2bdb2521481b9d1b466466689c38e1) * dialplan: Allow re-pointing the default partition This is useful in situations where script writers do not intend to cache the default 'dialplan' table, rather only the named partitions. Example syntax: db_default_url = "mysql://opensips:opensipsrw@127.0.0.1/opensips" ... modparam("dialplan", "partition", " pstn: table_name = dialplan_pstn") modparam("dialplan", "partition", "default: pstn") (cherry picked from commit efee3438773a83c1a05edc8b05ba45f53a83c5a6) (cherry picked from commit 4abd2617f970b2505534ee8dfd68fdd11ddcc716) (cherry picked from commit 03bf0ae1b0ce1931377ae131235413bbc5f01df1) * dialplan: Fix startup with a single, non-default partition * dispatcher: prevent possibl uninitialized access (cherry picked from commit 58804282fe53b25a2d18f8bed35b8f6fc8f8a1a8) * b2b entities/logic: Inherit `db_default_url` if possible If there is still no DB URL, simply start without DB support (just as before) (cherry picked from commit ff7b5849751d11537d989afe2b17ef35ebd2a133) (cherry picked from commit a54424be0c0ff2cd42adf6a0d748ac2a866dc319) (cherry picked from commit 0684e98376a841b6b7351b23e23de1d9a804ba41) * SIP digest auth: Improve handling for multiple digest challenges Before this patch, OpenSIPS would always work with the digest credentials of the 1st WWW/Proxy-Authenticate header field. While RFC 3261 does not define the behavior with multiple WWW/Proxy-Authenticate headers, in § 22.3 it is stated that: Note that if an authentication scheme that does not support realms is used in the Proxy-Authorization header field, a proxy server MUST attempt to parse all Proxy-Authorization header field values to determine whether one of them has what the proxy server considers to be valid credentials. ... so a proxy must _walk_ through unacceptable headers until it finds one with "valid" credentials. In the context of the upcoming RFC 8760, this would also mean: finding an auth header field with an MD5 digest algorithm, which is the only one currently supported. TL;DR: this patch improves the "uac", "uac_registrant" and "b2b_entities" modules so they correctly process 3 WWW-Authenticate headers with the following algorithm preference: algorithm=SHA-512-256 algorithm=SHA-256 algorithm=MD5 ... and correctly build an MD5-based response for the 3rd header field. Issue discovered during OpenSIPIt 2020 (cherry picked from commit 27d5862fb834e2c71f2fefb89a27430e30814647) SIP digest auth: Complete 27d5862fb8 The b2b_entities was not patched well, as some weird switch fallthrough-on-error behavior was introduced. (cherry picked from commit 5942d53f7aeec8eadcafd394fd1ee51ede2e5ab1) (cherry picked from commit 3274fc2b076bc8d2395ef1b68244a43c66463106) * Merge pull request #2253 from sippy/master_2020 Fix a rather obscure copy-n-paste bug in the digest parser (cherry picked from commit baddd32e2fd6cb43dd8b55f8a9f494fbd506d403) (cherry picked from commit 74558d8957f57b73427d50dd7b7908f989537a12) * Use body->len as the only authoritative way to check if the provided body is empty or not. (cherry picked from commit b0997b1fb56d6f518a26b9fe32e45cb16b9a0284) (cherry picked from commit bd5a1d027b238892d798e5c3fc0d0633387192fe) * dialog: update contact & SDPs for replicated update * Rebuild documentation * b2b_entities: fix restoring of DB persisted server entities Server entities restored from DB would not use the original To tag due to regenerating the timestamp part of the entity key. (cherry picked from commit dd237c001791be0b3b071d07b36057221f6ed3f6) * mid_registrar_save(): Fix shm memleak with no t_relay() Since mid_registrar_save() is stateful by definition, it needs the SIP transaction to properly function, so always forcing its creation if the REGISTER is about to be forwarded is not that significant of a change, if any at all. Thus, this patch fixes a SHM memleak on the following type of logic: mid_registrar_save(); # script terminates without having created the transaction, # so mid_registrar's "prepped" data does not get freed on T deletion exit; (cherry picked from commit b14a7eaff60b719c8b80af4f8bfdffe2bc7e995b) (cherry picked from commit e2141a3b66b4ffe2e01fb9ed27412c8532b09fb8) * tm: Fix transaction leakage with global onreply_route Given that the opensips.cfg execution order for replies is: 1) onreply_route (global) 2) reply_received(), i.e. "tm" module scope 2.1) onreply_route (branch) 2.2) onreply_route (transaction) ... this patch fixes some transaction leaks where if the script developer matches the current transaction within global onreply_route, e.g. with a random statement such as xlog("$T_reply_code\n"), it would cause an extra ref after a transaction matching operation would be performed yet again (??) within reply_received(), as the "tm" scope begins executing. The fix is to simply avoid transaction lookups when evaluating tm variables within the non-transactional, global SIP reply route. Credits to Bogdan-Andrei Iancu for suggesting this solution (cherry picked from commit 3a25c0b28f9bffc5e6942a7a3f5484ef903e28f6) * Rebuild documentation * Bump version to 2.4.9 * Update ChangeLog for 2.4.9 * APT repository: ubuntu 20 * Update Changelog * Work around for an issue on buster that the database query (pa_dbf.update) with a string NULL value won't match with any record. * Set up a special condition when 'sharing_tag' is always set to NULL in the active_watcher table. * Hotfix/openssl double free fix (#75) * proto_tls/wss: fix crashes when dumping the openssl error stack This commit serializes the execution of openssl's connect/accept/read/write operations in order to prevent adding/removing entries concurrently to/from the openssl error stack. If the performance penalty is deemed too high, the NO_SSL_GLOBAL_LOCK compilation flag can be used to disable this behavior and retain the risk of crashes. Reported in #2362 Credits to Alexey Vasilyev for helping troubleshoot this. (cherry picked from commit b6b7520024561c3cc65d272e038992adc0825fc7) (cherry picked from commit 00090791c7101d5c7e4499df16d54eba91c7cfda) * proto_tls/wss: complete fix in commit b6b7520 (cherry picked from commit dac973404b43b148f91d7fae66417d864f3c047d) (cherry picked from commit 5897b2294fb9cba16fb6c9590ccee5a3c4c1351c) Co-authored-by: Vlad Patrascu <vladp@opensips.org> * presence: fix memory leak when receiving an unmatched replicated publish (#76) (cherry picked from commit d1d898878555abe48cd3584ab1f3a8fe6f84ced7) Co-authored-by: Razvan Crainea <razvan@opensips.org> * transfromation/regex: remove unnecessary regex limitation (#77) Thanks go to Calvin Ellison for reporting this in the mailing list Completes 6e1afe64 (cherry picked from commit a8f8aabc245ca86bd40a9eddf9befcd74a087be1) Co-authored-by: Razvan Crainea <razvan@opensips.org> * fix record route leak in rls (#78) * Fix memory leak in sqlite module. (#72) * initial changes for sqlite memory leak * fix indentation and single pointer * using generic free function but freeing values separately * fix comment to address warning * Revert "fix comment to address warning" This reverts commit ed53f9ae8b09c30e9de203190023a61c8bc13408. * Revert "using generic free function but freeing values separately" This reverts commit 399c111f4186adab33cedd20ad47f197978b60c6. * Revert "fix indentation and single pointer" This reverts commit 55c6c56a96b3631a6a34485ecad368b47044e747. * Revert "initial changes for sqlite memory leak" This reverts commit f161ac86301fb65306542faa2554325685275525. * fix sqlite memory leak for row values * db_sqlite: drop pkg_free() NULL check Thanks go to Liviu Chircu for better readability suggestions Co-authored-by: Razvan Crainea <razvan@opensips.org> Co-authored-by: Zero King <l2dy@icloud.com> Co-authored-by: Liviu Chircu <liviu@opensips.org> Co-authored-by: Razvan Crainea <razvan@opensips.org> Co-authored-by: OpenSIPS <github@opensips.org> Co-authored-by: Bogdan-Andrei Iancu <bogdan@opensips.org> Co-authored-by: Vlad Patrascu <vladp@opensips.org> Co-authored-by: Peter Lemenkov <lemenkov@gmail.com> Co-authored-by: Walter Doekes <walter+github@wjd.nu> Co-authored-by: Nick Altmann <nick.altmann@gmail.com> Co-authored-by: Aron Podrigal <aronp@guaranteedplus.com> Co-authored-by: Ovidiu Sas <osas@voipembedded.com> Co-authored-by: Fabian Gast <fabian.gast@nfon.com> Co-authored-by: Eric Green <eric.green@onsip.com> Co-authored-by: Maksym Sobolyev <sobomax@sippysoft.com> Co-authored-by: rance <rance@ytel.com> Co-authored-by: aniketkuwarvonage <86266545+aniketkuwarvonage@users.noreply.github.com>
OpenSIPS version you are running
Describe the bug
Opensips crashes randomly under load. Sometime multiple times a day, and other times a day or more between crashes.
Some backtraces show slight variations in the call stack but always eventually through run_trans_callbacks_locked() and segfaults in free_contacts ().
To Reproduce
Unable to reproduce. Happens randomly in production under load.
Expected behavior
No segfault!
Relevant System Logs
OS/environment information
Additional context
Seems to be the same issue as reported mid-March in OpenSIPS-Users "Re: segfault at the opensips 2.4.5 (Liviu Chircu) (Johan De Clercq)".
The text was updated successfully, but these errors were encountered: