Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can Srf client know about the loss of TCP/WS connection from SIP client to Drachtio server #138

Closed
jyotigroomefy opened this issue Dec 25, 2022 · 26 comments

Comments

@jyotigroomefy
Copy link

We have an application that uses drachtio srf to connect to the drachtio server.
We have situation when the TCP/WS connection of SIP client to Drachtio server is broken but it appears that there is no way drachtio server communicate that connection loss to the drachtio srf client. After this we have dialogs with disconnected sip client alive the in srf client . Now when we send sip message using these dialogs then drachtio server crashes with segmentation fault. If there is way to get destroy events with all the dialogs associated with the disconnected TCP/WS connections, it would be very helpful for application using drachtio srf.

@davehorton
Copy link
Collaborator

what version of drachtio server are you running? If not the latest, could you try with 0.8.20 and let me know if the server still crashes in this instance? If it does, I would appreciate seeing the drachtio server logs with sofia-loglevel=4

@jyotigroomefy
Copy link
Author

Thanks for your prompt response Dave.

I have collected the logs. I have attached here.
I am using Drachtio version 0.8.20

filtered4.txt
filtered2.txt

@davehorton
Copy link
Collaborator

Thanks this log is useful. Reviewing it now. How easy is this for you to recreate?

@jyotigroomefy
Copy link
Author

filtered5.txt
Here is one more log file for above crashes.

@davehorton
Copy link
Collaborator

Thanks. Again -- is this easy for you to recreate? If so, I would like to ask your help in testing a fix

@davehorton
Copy link
Collaborator

Would it be possible for you to test with v0.8.21-rc1 and to forward the logs again when you recreate? I have added some logging that I hope will help me troubleshoot this further

@davehorton
Copy link
Collaborator

Also, would it possible to get more history leading up to the crash? I would like to be able to see the wss connection from when it is first made, and the logs so far all start after the client has established the wss socket connection. In any case, please just provide as much log history as possible leading up to the seg fault.

@jyotigroomefy
Copy link
Author

we are testing the latest one. we will share logs if we found seg fault issue.

@davehorton
Copy link
Collaborator

ok great.

@jyotigroomefy
Copy link
Author

Hi we have tested latest one v0.8.21-rc1. We have found same crashes. Sharing logs here.

filtered1.txt

@davehorton
Copy link
Collaborator

davehorton commented Jan 13, 2023

I looked over the logs but I need more history. I have to be able to track the entire history of when this specific client connected to drachtio. This log covers the period 2023-01-12 17:03:09 - 2023-01-12 17:05:15. Thats only a few minutes. Have you got the logs just preceeding this time peroid? If so please provide.

In this log, I can see the issue has to do with reference counting, but the reference counts for this connection are already off by one at the start of the log, so the issue I need to investigate and fix happened at an earlier time, The specific client in this case was connecting from wss/192.168.1.125:21268 and I need to go back to the log where he first connected

I really want to fix this bug but I need your help to preserve and provide all of the logs for a much longer period of time,

@davehorton
Copy link
Collaborator

by the way we can leave the issue open here, but it is more properly a drachtio server bug

@davehorton
Copy link
Collaborator

any chance you've been able to make this happen again, and get more log history @jyotigroomefy ?

@jyotigroomefy
Copy link
Author

Yes. i am able to take full log history. Please find in the attachment.

drachtioCompleteFiltered.txt

@davehorton
Copy link
Collaborator

Excellent thanks, will have a look

@davehorton
Copy link
Collaborator

davehorton commented Jan 24, 2023

Your code is doing one thing wrong which needs to be corrected. You are responding to an incoming REGISTER with a 202 Accepted. This is not correct, a success response to a REGISTER is a 200 OK, not a 202. (A 202 is the success response for a SUBSCRIBE).

2023-01-21T06:46:55.28341336Z stdout F [0] SIP/2.0 202 Accepted
2023-01-21T06:46:55.283418339Z stdout F [0] Via: SIP/2.0/WSS consort.org;branch=z9hG4bK1356925;received=192.168.1.125;rport=22661
2023-01-21T06:46:55.283423296Z stdout F [0] From: <sip:mcx.cons-857.cons@consort.org>;tag=9f2pefdtd3
2023-01-21T06:46:55.283428524Z stdout F [0] To: <sip:mcx.cons-857.cons@consort.org>;tag=ea4vQS96QNZ7D
2023-01-21T06:46:55.2834335Z stdout F [0] Call-ID: rljqtset8cartjdbv5f6r1
2023-01-21T06:46:55.283438734Z stdout F [0] CSeq: 7942 REGISTER
2023-01-21T06:46:55.283453612Z stdout F [0] Contact: <sip:mcx.cons-857.cons@consort.org>;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mcptt";+g.3gpp.mcptt
2023-01-21T06:46:55.283460235Z stdout F [0] Expires: 0
2023-01-21T06:46:55.283465531Z stdout F [0] Content-Length: 0
2023-01-21T06:46:55.283470693Z stdout F [0]  
2023-01-21T06:46:55.283475785Z stdout F [0] 2023-01-21 12:16:55.275963 tport.c:2345 tport_set_secondary_timer() tport(0x563333420000): reset timer

Could you please change your code to respond 200 OK when you want to respond with a success result to a REGISTER. Also please upgrade to drachtio-server version v0.8.21-rc3. After making both changes please retest and let me know the results.

v0.8.21-rc3 has a fix that may resolve this issue (but it also requires that you properly respond 200 instead of 202 to successful register requests)

@jyotigroomefy
Copy link
Author

thank you :) I will change response 200 OK. I will use v0.8.21-rc3 and let you know the results.

@jyotigroomefy
Copy link
Author

Hi
we have changed response to 200 OK and used latest version in our application. We have tested it. now frequently crashes is not happening but got same crash yesterday. i have attached log file here.

We have gone through your code and noticed one thing in "sip-dialog-controller.cpp" file.
'pData" pointer variable is redeclared same as argument variable. i have attached code snapshot here.
PHOTO-2023-01-27-11-52-09 2
PHOTO-2023-01-27-11-52-09

drachtio_log.txt

@davehorton
Copy link
Collaborator

Thanks I will take a look at your log.

As far as that statement defining a local var named pData, that is not a problem. It is not re-declaring an existing variable it is creating a new variable of the same name so in this scope that variable is what is used when we refer to pData

@davehorton
Copy link
Collaborator

you are still incorrectly sending 202 responses to REGISTER requests. Could you fix that, and provide a new log if you still experience crashes?

2023-01-25T13:57:19.295093506Z stdout F [0] 2023-01-25 19:27:19.294716 Client::read_handler read: c6d15884-7839-4f24-9883-67c92a303060|sip|73aa8742-80d4-4848-98a7-ce183f85b68f|
2023-01-25T13:57:19.295110268Z stdout F [0] SIP/2.0 202 Accepted
2023-01-25T13:57:19.295119837Z stdout F [0] Call-ID: idc38nsv5vb7096e9v5pv6
2023-01-25T13:57:19.295128731Z stdout F [0] cseq: 904 REGISTER
2023-01-25T13:57:19.295145116Z stdout F [0] from: <sip:mcx.cons-487.cons@consort.org>;tag=i175v71a1f
2023-01-25T13:57:19.295154496Z stdout F [0] to: <sip:mcx.cons-487.cons@consort.org>
2023-01-25T13:57:19.295163473Z stdout F [0] Content-Length: 0
2023-01-25T13:57:19.295172128Z stdout F [0] contact: <sip:mcx.cons-487.cons@consort.org>;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mcptt";+g.3gpp.mcptt
2023-01-25T13:57:19.295181279Z stdout F [0] expires: 0
2023-01-25T13:57:19.2951902Z stdout F [0] 
2023-01-25T13:57:19.295198971Z stdout F [0] 

@davehorton
Copy link
Collaborator

can you give me an updated status on this? Are you having any issues after the change above?

@jyotigroomefy
Copy link
Author

I am really sorry for sending logs without changing 202 status code. We have now fixed 202 status code responses to REGISTER requests and good thing we have not found any issue for last 3 days. we are continue doing testing. I will update here if we found any issue. Thank you for helping us :)

@davehorton
Copy link
Collaborator

no worries at all. Please let me know how it goes, whether you experience any crashes or hangs, and also how memory utilization looks over time

@davehorton
Copy link
Collaborator

@jyotigroomefy can you give me an update on where this stands? Is your issue fixed or is it still occurring?

@davehorton
Copy link
Collaborator

closed do to lack of response; presumed fixed.

@jyotigroomefy
Copy link
Author

@davehorton it's working fine. Our issue is fixed. It hasn't occur again. Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants