First attempt at starting an audio call fails sometimes #2130
We do ship source maps for JavaScript, which should result in Chrome decoding those errors, but I have seen cases where it doesn't work. From a little googling, it seems like uncaught exceptions thrown inside of promises may simply not trigger source map decoding, which is unfortunate. There is still enough information that we can start digging. I'm reasonably certain that exception is getting thrown from here:
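One way to make these failures easier to trace, regardless of whether Chrome applies the source map to the rejection's own stack, is to catch rejections close to their origin and report them with a stack captured at the call site. This is only a sketch of that pattern; `withErrorReport` and the `report` callback are illustrative names, not Jolly Roger's actual API.

```typescript
// Sketch: wrap an async operation so a rejection is reported together with
// a stack captured at the call site, rather than surfacing as a bare
// "Uncaught (in promise)" error with an unhelpful stack.
async function withErrorReport<T>(
  label: string,
  op: () => Promise<T>,
  report: (msg: string, err: unknown) => void, // placeholder error sink (console, Bugsnag, ...)
): Promise<T | undefined> {
  // Capture a stack trace here, before entering the promise chain.
  const callSite = new Error(`${label} call site`).stack;
  try {
    return await op();
  } catch (err) {
    report(`${label} failed; called from: ${callSite}`, err);
    return undefined;
  }
}
```

The call-site stack is captured eagerly, so even if the rejection's own stack is minified or truncated, the report still points at which caller kicked off the failing operation.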
If you want to check my math: FWIW, I don't seem to be able to reproduce this on our instance of JR. I'm skeptical that NLB is somehow protecting us here, but I'm not entirely sure what would be happening instead. Have you verified that coturn is behaving correctly? I left some notes on #1970 about how I test our coturn server.
Looks to be working per those steps, at least when I have calls that work. FWIW there are two. In any case, I'm also skeptical that the lack of NLB would matter. And the issue only ever seems to come up on the first attempt to connect after starting a new instance (and I don't think it's 100% consistent there, though I've definitely seen it a few times in that scenario). The only other oddity I've noticed is that sometimes, even when it works, I briefly see that yellow warning bar, though it connects fast enough that I can't even read the warning on it.
I am also seeing rare page-load failures in the web UI, where some initial load fails with an ERR_INCOMPLETE_CHUNKED_ENCODING error in the console. I wonder if something is just off on the networking side.
Just reproduced again on my first attempt of the night (so it doesn't seem like the issue was haproxy 2.9.0 vs. 2.9.7). FWIW, Bugsnag agrees with you on the impacted line of code. On success I typically see a line in haproxy's logs indicating a TURN connection; in the failure case I don't.
I don't know if this is relevant, but looking at the coturn logs, it seems like every successful session starts with a 401 error before things become successful, e.g.:
I'm not positive, but I suspect that's normal: at least with HTTP, it's standard to try requesting without auth and only provide auth when the server demands it, so it makes sense to me that browser behavior around TURN would be similar. Is it easy for you to change https://github.com/deathandmayhem/jolly-roger/blob/main/imports/client/hooks/useCallState.ts#L31 to also pass
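For reference, the initial 401 is indeed how TURN's long-term credential mechanism works: the client's first Allocate request is unauthenticated, the server rejects it with a 401 carrying a realm and nonce, and the client retries with credentials, so one 401 per session in coturn's logs is expected rather than an error. The sketch below only illustrates the shape of a TURN entry in an `iceServers` list; the hostname and credentials are placeholders, not this deployment's real values.

```typescript
// Placeholder values throughout; this only illustrates the config shape.
const iceServers = [
  {
    urls: ['turn:turn.example.com:3478?transport=udp'],
    username: 'placeholder-user',
    credential: 'placeholder-secret',
  },
];
// The browser's ICE agent sends an unauthenticated Allocate first; the TURN
// server answers 401 with a realm and nonce, and the agent retries with a
// MESSAGE-INTEGRITY attribute computed from these credentials. The single
// 401 seen at the start of each successful coturn session is that handshake.
```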
Captured the following logs. (FWIW the change you described didn't seem to do anything with console log output or Bugsnag reporting; I had to change imports/logger.ts's level to have any effect. Something unexpected with the logging library?)
There was a new release of mediasoup-client a few days ago but it doesn't seem to have fixed this. I can consistently reproduce the
In this case, I see an additional failure. It doesn't seem to be fatal here, as the call does connect, though it is fatal when I see it in production. But I wonder if fixing it in this case would also fix it there.
Perhaps what is happening is that there are two calls to stop, and the second one is rejecting the first? In that case they would look like they come from the same place but would in fact be from two different operations. That would line up with this reproducing when a retry occurs.
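If the two-racing-stops hypothesis is right, one common mitigation is to make stop idempotent by memoizing the in-flight promise, so a second caller shares the first call's outcome instead of racing it. A minimal sketch of that pattern; `CallTransport` and `doStop` are illustrative names, not Jolly Roger's actual code.

```typescript
// Sketch: an idempotent stop(). The first call starts teardown; any later
// call returns the same promise, so a second stop can neither interrupt
// nor reject the first one.
class CallTransport {
  private stopPromise: Promise<void> | null = null;

  private async doStop(): Promise<void> {
    // ... tear down the underlying transport here ...
  }

  stop(): Promise<void> {
    if (this.stopPromise === null) {
      this.stopPromise = this.doStop();
    }
    return this.stopPromise;
  }
}
```

With this shape, a retry path that calls stop() while an earlier stop() is still in flight simply awaits the same teardown rather than producing a second, conflicting operation.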
I see in the debug logs that there are two instances of:
On successful calls this only happens once.
From JS console (but probably not helpful since the code is optimized; is it possible to deobfuscate this given the Docker image?):
Doesn't seem to happen on subsequent attempts - I've only seen it the first time I try launching a call after the server starts. It says it'll connect automatically, but it seems to be stuck permanently.
This could be related to my pending single-instance changes - I'll need to try to reproduce with the standard network + NLB setup to make sure that isn't it.