[BUG] wolfssl memory leak #2604

Closed · vasilevalex opened this issue Aug 18, 2021 · 23 comments

@vasilevalex (Contributor)

OpenSIPS version you are running

version: opensips 3.2.0 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: unknown
main.c compiled on 08:38:33 Aug  4 2021 with gcc 8

Describe the bug
We have an OpenSIPS server that processes RTCP reports from more than 10,000 phones. The reports arrive in SIP PUBLISH over TLS, sometimes with spikes of ~15,000 connections per minute, but in general it holds over 500 established TLS connections (some are closed, new ones are opened). After updating to OpenSIPS 3.2.0 with wolfSSL, shared memory consumption started growing constantly.
[graph: wolf_leak]

I ran OpenSIPS with -a F_MALLOC_DBG and took several snapshots with mi mem_shm_dump. The largest amount of allocated shared memory is 564303792 : 808816 x [wolfssl.c: oss_malloc, line 132], and the significant growth is only on this line. The reports are attached.
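
For context on that dump line: the tls_wolfssl module appears to route all wolfSSL allocations into OpenSIPS shared memory through allocator hooks (presumably registered with wolfSSL_SetAllocators()), which is why every allocation is attributed to the single wolfssl.c: oss_malloc entry. A minimal sketch of such a hook, with names modelled on the dump output rather than the actual module code:

```c
/* Sketch only: backing wolfSSL's allocations with OpenSIPS shared memory.
 * Hook names follow the dump output (oss_malloc); the real tls_wolfssl
 * sources may differ. */
#include <wolfssl/options.h>
#include <wolfssl/wolfcrypt/memory.h>

#include "../../mem/shm_mem.h"   /* OpenSIPS shm_malloc/shm_free/shm_realloc */

static void *oss_malloc(size_t size)
{
	return shm_malloc(size);
}

static void oss_free(void *ptr)
{
	if (ptr)
		shm_free(ptr);
}

static void *oss_realloc(void *ptr, size_t size)
{
	return shm_realloc(ptr, size);
}

/* call once at module init, before any other wolfSSL API is used */
static int init_wolfssl_shm_allocators(void)
{
	return wolfSSL_SetAllocators(oss_malloc, oss_free, oss_realloc);
}
```

Because every wolfSSL object ends up in shared memory this way, anything wolfSSL itself never releases (caches, error-queue entries) shows up as shm growth in mem_shm_dump rather than as ordinary process heap growth.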

Then I changed only one line in the config, loadmodule "tls_wolfssl.so", to loadmodule "tls_openssl.so". This switchover point is also visible at the end of the graph. With OpenSSL, shared memory consumption does not grow.

To Reproduce

  1. Start OpenSIPS 3.2 with wolfssl
  2. Start TLS traffic
  3. Check shared memory consumption

Expected behavior
Shared memory should not leak.

OS/environment information

  • Operating System: CentOS Linux release 8.4.2105
  • OpenSIPS installation: git
  • other relevant information:

Additional context
Memory dumps during one day:
shmem_dump01.txt
shmem_dump02.txt
shmem_dump03.txt

rvlad-patrascu self-assigned this Aug 20, 2021
rvlad-patrascu added a commit that referenced this issue Oct 8, 2021
Do not use the wolfSSL ECC Fixed Point cache as it is not freed until
library cleanup. Also, clear the error queue after each call to
wolfSSL_read().

Fixes #2604

(cherry picked from commit d5d069d)
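
For reference, the error-queue half of that fix can be illustrated as below. This is a sketch, not the actual OpenSIPS code, and oss_tls_read is a hypothetical wrapper name: in builds that keep an OpenSSL-style error queue, wolfSSL allocates the queue nodes through the registered allocator (shared memory here), so the queue is drained after every read to keep entries from accumulating. The ECC Fixed Point cache half of the fix is a build/usage choice (simply not using the fixed-point cache) rather than per-call code.

```c
/* Sketch: clear wolfSSL's error queue after each read so queued error
 * nodes (allocated via the registered shm allocator) do not pile up.
 * oss_tls_read is a hypothetical wrapper, not the module's real API. */
#include <wolfssl/options.h>
#include <wolfssl/ssl.h>

static int oss_tls_read(WOLFSSL *ssl, void *buf, int len)
{
	int n = wolfSSL_read(ssl, buf, len);

	if (n <= 0) {
		int err = wolfSSL_get_error(ssl, n);
		if (err != WOLFSSL_ERROR_WANT_READ && err != WOLFSSL_ERROR_WANT_WRITE) {
			/* a real error: report/handle it before the queue is cleared */
		}
	}

	/* drop any queued error entries so they are not left behind in shm */
	wolfSSL_ERR_clear_error();

	return n;
}
```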
@vasilevalex (Contributor, Author)

Hi @rvlad-patrascu
It is now much better, but not perfect: memory grows slowly, but it still grows.
For comparison, data from the same server with OpenSSL (3 weeks):
[graph: openssl_mem]
And after applying the patch, with wolfSSL (2 weeks):
[graph: wolf_leak02]
I also migrated a SIP proxy with about 10,000 phones to OpenSIPS 3.2 with wolfSSL. The phones keep their TLS connections alive, so connections are created and removed less often and memory leaks more slowly, but the growth is still noticeable over a 2-3 week range.

@rvlad-patrascu (Member)

Hi @vasilevalex,

Can you please try the following patch?
wolfssl_mem_leak.patch.txt

@vasilevalex (Contributor, Author)

Hi @rvlad-patrascu, I've built it. I will write here when I have some stats.

@vasilevalex (Contributor, Author)

I'm afraid the patch did not help.
[graph: wolf_leak04]
The patch is applied to the 3.2 branch on top of commit e60d6e1.

@rvlad-patrascu (Member)

Hi @vasilevalex

Unfortunately, I cannot identify why the leak is still happening without some additional information from the wolfSSL library. I've prepared a patch that enables memory debugging for wolfSSL, so we can see where the allocations happen in the wolfSSL code when taking an shm memory dump.

Can you apply the following patch, rebuild wolfSSL and get a couple of mem dumps?
wolfssl_debug_mem.patch.txt
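
For readers following along, here is a sketch of what such a debug hook could look like, assuming wolfSSL is rebuilt with WOLFSSL_DEBUG_MEMORY so the allocator callbacks also receive the calling wolfSSL function and line. The names and the logging below are illustrative only; the actual wolfssl_debug_mem.patch.txt may take a different approach (e.g. feeding the call site to the shm debug allocator instead of just logging it).

```c
/* Sketch, assuming a wolfSSL build with WOLFSSL_DEBUG_MEMORY: the allocator
 * callbacks then carry (func, line) of the wolfSSL call site, which can be
 * logged so a later mem_shm_dump can be correlated with wolfSSL internals.
 * All names here are illustrative. */
#include <wolfssl/options.h>
#include <wolfssl/wolfcrypt/memory.h>

#include "../../dprint.h"        /* OpenSIPS logging (LM_DBG) */
#include "../../mem/shm_mem.h"

static void *oss_malloc_dbg(size_t size, const char *func, unsigned int line)
{
	void *p = shm_malloc(size);

	LM_DBG("wolfSSL alloc %zu bytes from %s:%u -> %p\n", size, func, line, p);
	return p;
}

static void oss_free_dbg(void *ptr, const char *func, unsigned int line)
{
	LM_DBG("wolfSSL free %p from %s:%u\n", ptr, func, line);
	if (ptr)
		shm_free(ptr);
}

static void *oss_realloc_dbg(void *ptr, size_t size, const char *func,
		unsigned int line)
{
	void *p = shm_realloc(ptr, size);

	LM_DBG("wolfSSL realloc %p -> %p (%zu bytes) from %s:%u\n",
			ptr, p, size, func, line);
	return p;
}

static int init_wolfssl_dbg_allocators(void)
{
	/* with WOLFSSL_DEBUG_MEMORY the callback typedefs include (func, line) */
	return wolfSSL_SetAllocators(oss_malloc_dbg, oss_free_dbg, oss_realloc_dbg);
}
```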

@vasilevalex (Contributor, Author) commented Aug 16, 2022

Hi @rvlad-patrascu
Sorry for the delay, it's summer. I built a version with the latest patch. The dumps are attached: the first is from right after OpenSIPS started, and the last is after 5 days of running.
wolf_shmem00.txt
wolf_shmem01.txt
wolf_shmem02.txt

@rvlad-patrascu (Member)

Hi @vasilevalex,

I still cannot determine why the leak happens based on the above dumps, but they do give some hints. I've come up with a temporary "fix" that should help narrow this down further, if it turns out to prevent the leak.

So can you please apply the patch attached here (along with wolfssl_debug_mem.patch.txt, in case the leak persists) and give it another go?

wolfssl_mem_leak_tmp_fix.patch.txt

@vasilevalex (Contributor, Author)

Hi @rvlad-patrascu ,

I built a version with both patches. It behaves strangely: after starting, OpenSIPS processes several TLS connections but then silently stops accepting any TLS traffic. There are no logs and no errors; UDP is still processed, but it is impossible to connect over TLS anymore. I made a trap file with opensips-cli; if you need it, I can send it by email. So I had to roll back.

@rvlad-patrascu (Member)

Hi @vasilevalex

Please send the trap file to vladp@opensips.org. Also, did you see any opensips processes with 100% CPU load?

@rvlad-patrascu (Member)

OK, my mistake here: there was indeed an error in the previous patch that would deadlock the TCP main process. It should be fixed now; see the updated patch below.
wolfssl_mem_leak_tmp_fix_v2.patch.txt

github-actions bot commented Nov 5, 2022

Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.

github-actions bot closed this as completed Nov 5, 2022
@vasilevalex (Contributor, Author)

It's not fixed yet, please reopen.

@rvlad-patrascu (Member)

Hi @vasilevalex,

By not fixed yet, do you mean the deadlock introduced with wolfssl_mem_leak_tmp_fix.patch.txt or the actual memleak?

@vasilevalex (Contributor, Author)

Hi @rvlad-patrascu ,
The deadlock was introduced by wolfssl_mem_leak_tmp_fix.patch.txt; it is not related to the original memleak. And wolfssl_mem_leak_tmp_fix_v2.patch.txt did not fix the deadlock.
