Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

b2b_logic can't handle more than 300-400 CPS and has memory corruption #2425

Closed
nikbyte opened this issue Feb 28, 2021 · 10 comments
Closed

b2b_logic can't handle more than 300-400 CPS and has memory corruption #2425

nikbyte opened this issue Feb 28, 2021 · 10 comments
Assignees
Labels
Milestone

Comments

@nikbyte
Copy link
Member

nikbyte commented Feb 28, 2021

Hi team,

I found that if start load tests with common calls, starting from 300-400 CPS b2b_logic starts to have memory corruptions.
It's easy reproduceable in two cases:

  1. Enable write-through db writing (db_mode=1) and you'll get corrupted dlg structures on b2bl_db_update. If enable db_mode=2 even with 1 second interval this doesn't reproduce.

  2. From time to time call b2b_cback() in b2b_entities module crashes opensips because b2b_cback pointer points to some released memory.

Usually it occurs on BYE or final ACK when there is a large amount of requests and especially when there are retransmits.

@nikbyte nikbyte added the bug label Feb 28, 2021
@nikbyte nikbyte added this to the 3.2 milestone Feb 28, 2021
@rvlad-patrascu
Copy link
Member

Hi @nikbyte ,

Did you test this with the "top_hiding" scenario ? If not, can you also post the script logic that triggers this?

@rvlad-patrascu rvlad-patrascu self-assigned this Feb 28, 2021
@nikbyte
Copy link
Member Author

nikbyte commented Feb 28, 2021

It seems, I found a reason. I'll post pool request soon.

@nikbyte
Copy link
Member Author

nikbyte commented Feb 28, 2021

I'm testing with non-top hiding, but scenario is pretty simple: create server, create client, init b2b.

@nikbyte
Copy link
Member Author

nikbyte commented Feb 28, 2021

@rvlad-patrascu this helps a lot: #2426
Also I have patch which helps, but I'm not sure it's proper:

diff --git a/modules/b2b_entities/dlg.c b/modules/b2b_entities/dlg.c
index f4f1370e8..d2281a6a5 100644
--- a/modules/b2b_entities/dlg.c
+++ b/modules/b2b_entities/dlg.c
@@ -1070,7 +1070,7 @@ logic_notify:

 				if(dlg->uas_tran && dlg->uas_tran!=T_UNDEFINED)
 				{
-					if(dlg->uas_tran->uas.request)
+					if(method_value != METHOD_BYE && dlg->uas_tran->uas.request)
 					/* there is another transaction for which no reply
 					 * was sent out */
 					{

Without this patch there is dlg->uas_tran exists, but dlg->uas_tran.uas points to released memory.
Maybe you know how to fix this properly.

With these two patches it's not crashing for me.

@bogdan-iancu
Copy link
Member

@nick, could you explain a bit your fix here, please ?

@nikbyte
Copy link
Member Author

nikbyte commented Apr 6, 2021

@bogdan-iancu I'm not sure that this fix is proper, it may be more workaround. But.

Initial condition is if(dlg->uas_tran->uas.request) .
Under load it's crashing because of dlg->uas_tran.uas points to released memory, however dlg->uas_tran structure is still exists. It seems, that BYE transaction is already released when we come to this part of code.

Adding condition method_value != METHOD_BYE resolves this issue, but I'm not sure if this is a proper solution or I'm trying just to hide another issue.

@rvlad-patrascu
Copy link
Member

@nikbyte Do you by any chance still have any backtraces for these crashes ?

@nikbyte
Copy link
Member Author

nikbyte commented Jul 14, 2021

Unfortunately not possible for the moment. :(

@nikbyte
Copy link
Member Author

nikbyte commented Jul 14, 2021

From another side, you can try sipp with high cps, I think you'll see these crashes.

rvlad-patrascu added a commit that referenced this issue Jul 15, 2021
rvlad-patrascu added a commit that referenced this issue Jul 15, 2021
Credits to @nikbyte for the fix

Related to #2425
Closes #2426

(cherry picked from commit aba8ae5)
rvlad-patrascu added a commit that referenced this issue Jul 15, 2021
Credits to @nikbyte for the fix

Related to #2425
Closes #2426

(cherry picked from commit aba8ae5)
@rvlad-patrascu
Copy link
Member

@nikbyte I could not reproduce these crashes. And since you are saying that the issue is now solved with your own patches and you cannot provide any other info, I think we can close this ticket for now.

Feel free to re-open this when you will be using the official 3.2 code. If the crashes persist at that point, it will confirm that the fix in aba8ae5 is not enough and you should also be able to provide backtraces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants