No response after authenticated allocation request - Server broken until restart #894
Comments
I have the same issue.
I found that the authentication-related threads deadlock while acquiring an SQLite connection. The SQLite version installed in my environment is quite old: sqlite-3.7.17-8.el7_7.1.x86_64
@korayvt You are referencing 15 different issues that are unrelated to this one.
Done
Do you see anything in the server logs?
After seeing the SQLite lock in the core dump, I compiled coturn without SQLite support and haven't had the problem since.
Closing it. Please open a new issue if you see the problem again. Thank you!
@eakraly Can you please reopen this? I was able to reproduce this problem with coturn tag 4.5.1.3. It affected one of my production deployments in December, and I've been researching the underlying issue ever since. I only managed to "catch it in the wild" today.

The issue appears to be a mutex synchronization problem in which two threads are both waiting on a mutex. This isn't explicitly visible in what I can share, because I had to redact some of the stack variable contents for confidentiality reasons, but the username being authenticated is the same in both stuck threads. This may be relevant, but I can't say for certain.

In a packet capture, the turnserver responded to an ALLOCATE with the standard error + nonce, but then ignored the ALLOCATE + nonce packets. Each ALLOCATE + nonce appears to be associated with an increase in the memory consumed by the process (it appeared to be 896 bytes each time, but I only measured this a couple of times, so it's not necessarily always this size). I haven't studied how the auth code works yet, but one wild speculation is that it queues the auth processing while waiting for in-progress sqlite queries to finish before starting new ones.

We've tried to reproduce this with only one success: of approximately 1,000 new instances launched so far, only one has had the problem. It appears to occur only on startup and then affects the process for the rest of its execution. No log statements are emitted when this bug is triggered.

Moving forward: I am going to compile turnserver with all of the DB engines disabled, since my organization does not use them. The version of sqlite3 currently compiled in is 3.7.17, which is the version shipped with Amazon Linux 2. This problem may be a bug in sqlite3 that has since been fixed. It's hard to say, and I'm not going to dig further since we're removing all of the compile-time optional functionality entirely.
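Because nothing is logged while the auth threads are stuck, one low-cost diagnostic is to acquire a suspect lock with a deadline and log when that deadline passes. The C sketch below is not coturn's code: `lock_with_deadline` and its parameters are hypothetical names, and it only makes a stall visible; it does not fix the underlying deadlock. It polls `pthread_mutex_trylock` so that it also builds on platforms without `pthread_mutex_timedlock` (such as macOS); on Linux, `pthread_mutex_timedlock` with an absolute `CLOCK_REALTIME` deadline is the more direct alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical diagnostic helper (not part of coturn): try to acquire `m`,
 * but give up after `timeout_ms` milliseconds and log a warning instead of
 * blocking forever. Returns 0 when the lock was acquired, ETIMEDOUT when the
 * deadline passed, or any other error reported by pthread_mutex_trylock. */
static int lock_with_deadline(pthread_mutex_t *m, long timeout_ms,
                              const char *what)
{
    const struct timespec pause = {0, 1000000L}; /* 1 ms between attempts */
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        int rc = pthread_mutex_trylock(m);
        if (rc != EBUSY)
            return rc; /* 0 = acquired; anything else is a real error */

        /* Lock is held by someone (possibly this thread): check the clock. */
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (long)(now.tv_sec - start.tv_sec) * 1000L +
                          (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_ms) {
            /* Surface the stall in the logs instead of hanging silently. */
            fprintf(stderr,
                    "lock '%s' still busy after %ld ms: possible deadlock\n",
                    what, elapsed_ms);
            return ETIMEDOUT;
        }
        nanosleep(&pause, NULL);
    }
}
```

A call site would replace a plain `pthread_mutex_lock(&db_mutex)` with something like `if (lock_with_deadline(&db_mutex, 5000, "userdb") != 0) { /* reject or retry the request */ }`, where `db_mutex` stands for whichever lock the stuck threads are actually waiting on (a hypothetical name here). Polling every millisecond costs a little CPU while waiting, which is acceptable for a diagnostic but not for a hot path.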
Looking at the auth code here: coturn/src/apps/relay/userdb.c, line 302 (at commit 87602ea)
I'm a bit confused about why any database queries are performed at all when, in my situation, the secrets are provided in the config file. I think an improvement here would be to either:
Problem
Our TURN server stops working at random. After the initial request without credentials and the 401 response, the authenticated request gets no answer. From then on this happens for all sessions until the process is restarted, and it has occurred on multiple servers.
The logs only show when the clients finally give up and close the socket.
Steps to reproduce