
First attempt at starting an audio call fails sometimes #2130

Open
jpd236 opened this issue May 5, 2024 · 9 comments


jpd236 commented May 5, 2024

From the JS console (probably not helpful since the code is optimized; is it possible to deobfuscate this given the Docker image?):

Uncaught (in promise) AwaitQueueStoppedError: AwaitQueue stopped
    at e.stop (index.js:1:1712)
    at C.close (Transport.js:1:3075)
    at P.debug.direction (76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:226:301467)
    at cB (76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:70:101972)
    at cM (76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:70:103837)
    at f8 (76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:70:123356)
    at 76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:70:121804
    at eI (76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:78:1571)
    at MessagePort.eV (76d9292fb3a47a03a0eae3d3561909c11809ddfd.js?meteor_js_resource=true:78:2154)

Doesn't seem to happen on subsequent attempts - I've only seen it the first time I've tried launching a call after starting the server. It says it'll connect automatically, but it seems to be stuck permanently.

This could be related to my pending single instance changes - I'll need to try to reproduce with the standard network + NLB setup to make sure that isn't it.


ebroder commented May 7, 2024

We do ship source maps for JavaScript, which should result in Chrome decoding those errors, but I have seen cases where it doesn't work. Googling around a little bit, it seems like uncaught exceptions thrown inside promises might just not trigger source map decoding, which is pretty unfortunate.

There is still enough information that we can start digging. I'm reasonably certain that exception is getting thrown from here:

if (!newTransport.closed) newTransport.close();

(If you want to check my math, AwaitQueue is an abstraction from the mediasoup people that queues promises so they don't run in parallel. Within mediasoup-client, it's only used by the Transport class.) What I don't understand is why that's throwing from within the Transport.close method, since as far as I can tell, AwaitQueue.stop causes other enqueued promises to reject with that exception.
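
For reference, here's roughly how that library behaves in isolation - a minimal sketch, not jolly-roger code, assuming the awaitqueue npm package (exact method names and semantics may vary between versions):

import { AwaitQueue } from "awaitqueue";

const queue = new AwaitQueue();

// push() returns a promise that settles when the queued task does.
const pending = queue.push(
  () => new Promise<void>((resolve) => setTimeout(resolve, 10_000)),
  "slow task"
);

// stop() rejects outstanding task promises with AwaitQueueStoppedError. The
// error object is constructed inside stop() itself, so if nothing handles
// `pending`, the resulting "Uncaught (in promise)" stack points at stop() and
// its caller (Transport.close here) rather than at whoever pushed the task.
queue.stop();

pending.catch((err) => console.log(err.name)); // "AwaitQueueStoppedError"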

FWIW, I don't seem to be able to reproduce this on our instance of JR. I'm skeptical that NLB is somehow protecting us here, but I'm not entirely sure what would be happening instead. Have you verified that coturn is behaving correctly? I left some notes on #1970 about how I test our coturn server.


jpd236 commented May 7, 2024

Looks to be working per those steps, at least when I have calls that work:

image

FWIW there are two turnConfig events in the WS logs - one for direction "recv" and one for direction "send". Not sure if it matters which I use.

In any case I'm also skeptical that the lack of NLB would matter. And the issue only ever seems to come up on the first attempt to connect after starting a new instance (and I don't think it's 100% consistent there, though I definitely saw it a few times in that scenario).

Only other oddity I've noticed is that sometimes, even when it works, I see that yellow warning bar very briefly, though it does connect fast enough that I can't even read the warning on it.


jpd236 commented May 7, 2024

I am also seeing rare page load failures in the web UI where some initial load fails with an ERR_INCOMPLETE_CHUNKED_ENCODING error in the console. I wonder if something is just up on the networking side.


jpd236 commented May 8, 2024

Just reproduced again on my first attempt of the night (so it doesn't seem like haproxy 2.9.0 vs. 2.9.7 was the issue). FWIW Bugsnag agrees with you on the impacted line of code.

On success I typically see a line in haproxy's logs indicating a TURN connection. In the failure case I don't see that.


jpd236 commented May 8, 2024

I don't know if this is relevant, but looking at the coturn logs, it seems like every successful session starts with a 401 error before succeeding, e.g.:

154269: (15): INFO: IPv4. tcp or tls connected to: 127.0.0.1:55170
154269: (15): INFO: session 000000000000000015: realm <ange.management> user <>: incoming packet message processed, error 401: Unauthorized
154269: (15): INFO: IPv4. Local relay addr: 127.0.0.1:41748
154269: (15): INFO: session 000000000000000015: new, realm=<ange.management>, username=<snip>, lifetime=600
154269: (15): INFO: session 000000000000000015: realm <ange.management> user <snip>: incoming packet ALLOCATE processed, success
154271: (15): INFO: session 000000000000000015: refreshed, realm=<ange.management>, username=<snip>, lifetime=0
154271: (15): INFO: session 000000000000000015: realm <ange.management> user <snip>: incoming packet REFRESH processed, success
154271: (15): INFO: session 000000000000000015: TCP socket closed remotely 127.0.0.1:55170
154271: (15): INFO: session 000000000000000015: usage: realm=<ange.management>, username=<snip>, rp=3, rb=276, sp=3, sb=280
154271: (15): INFO: session 000000000000000015: peer usage: realm=<ange.management>, username=<snip>, rp=0, rb=0, sp=0, sb=0
154271: (15): INFO: session 000000000000000015: closed (2nd stage), user <snip> realm <ange.management> origin <>, local 127.0.0.1:3478, remote 127.0.0.1:55170, reason: TCP connection closed by client (callback)
154271: (15): INFO: session 000000000000000015: delete: realm=<ange.management>, username=<snip>


ebroder commented May 8, 2024

I'm not positive but I suspect that's normal - at least with HTTP, it's standard to try requesting without auth and only provide auth when the server demands it, so it makes sense to me that browser behavior around TURN would be similar.

Is it easy for you to change https://github.com/deathandmayhem/jolly-roger/blob/main/imports/client/hooks/useCallState.ts#L31 to also pass level: "debug"? That should generate a bunch of extra log entries that might tell us more, at least about how things are getting sequenced.


jpd236 commented May 8, 2024

Captured the following logs. (FWIW the change you described didn't seem to do anything with console log output or Bugsnag reporting; I had to change imports/logger.ts's level to have any effect. Something unexpected with the logging library?)

configureLogger.ts:52 [useCallState] Clearing Mediasoup device
configureLogger.ts:52 [useCallState] clearing transport {direction: 'send'}
configureLogger.ts:52 [useCallState] clearing transport {direction: 'recv'}
configureLogger.ts:52 [useCallState] producerTracks {tracks: Array(0)}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-peers', selfPeer: undefined, otherPeers: Array(0)}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-router', router: undefined}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-device', device: undefined}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'send', transport: undefined}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'recv', transport: undefined}
configureLogger.ts:52 [useCallState] producerTracks {tracks: Array(0)}
configureLogger.ts:52 [useCallState] dispatch {type: 'request-capture'}
configureLogger.ts:52 [useCallState] dispatch {type: 'join-call', audioState: {…}}
configureLogger.ts:52 [useCallState] producerTracks {tracks: Array(1)}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-peers', selfPeer: {…}, otherPeers: Array(0)}
configureLogger.ts:52 [useCallState] server set initial peer state {initialPeerState: 'active'}
configureLogger.ts:52 [useCallState] Fetching mediasoup-client code
configureLogger.ts:52 [useCallState] dispatch {type: 'set-router', router: {…}}
configureLogger.ts:52 [useCallState] Creating new Mediasoup device
configureLogger.ts:52 [useCallState] dispatch {type: 'set-device', device: e}
configureLogger.ts:52 [useCallState] subscribe mediasoup:transports {peerId: 'tFkshY49XT2qTnyqn', rtpCaps: '{"codecs":[{"mimeType":"audio/opus","kind":"audio"…preferredEncrypt":false,"direction":"sendrecv"}]}'}
configureLogger.ts:52 [useCallState] clearing transport {direction: 'send'}
configureLogger.ts:52 [useCallState] clearing transport {direction: 'recv'}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'send', transport: undefined}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'recv', transport: undefined}
configureLogger.ts:52 [useCallState] Creating new Mediasoup transport {transportId: 'e299e359-02ab-42e8-82c5-d3bdf8efcf24', direction: 'send'}
configureLogger.ts:52 [useCallState] setting transport {direction: 'send', newTransport: C}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'send', transport: C}
configureLogger.ts:52 [useCallState] producerTracks {tracks: Array(1)}
configureLogger.ts:52 [useCallState] Creating Mediasoup producer {track: 'c0e7e06a-c82d-487d-9be9-5bcac449ee3c'}
configureLogger.ts:52 [useCallState] Creating new Mediasoup transport {transportId: 'e299e359-02ab-42e8-82c5-d3bdf8efcf24', direction: 'send'}
configureLogger.ts:52 [useCallState] setting transport {direction: 'send', newTransport: C}
configureLogger.ts:52 [useCallState] Creating new Mediasoup transport {transportId: '1c02de70-f42c-4328-a1c2-dd35d2f7b372', direction: 'recv'}
configureLogger.ts:52 [useCallState] setting transport {direction: 'recv', newTransport: C}
index.js:1 Uncaught (in promise) AwaitQueueStoppedError: AwaitQueue stopped
    at e.stop (index.js:1:1712)
    at C.close (Transport.js:1:3075)
    at P.debug.direction (useCallState.ts:421:48)
    at cB (is-plain-object.mjs:1:47)
    at cM (is-plain-object.mjs:1:47)
    at f8 (is-plain-object.mjs:1:47)
    at is-plain-object.mjs:1:47
    at eI (is-plain-object.mjs:1:47)
    at MessagePort.eV (is-plain-object.mjs:1:47)
stop @ index.js:1
close @ Transport.js:1
P.debug.direction @ useCallState.ts:421
cB @ is-plain-object.mjs:1
cM @ is-plain-object.mjs:1
f8 @ is-plain-object.mjs:1
(anonymous) @ is-plain-object.mjs:1
eI @ is-plain-object.mjs:1
eV @ is-plain-object.mjs:1
await in eV (async)
(anonymous) @ useCallState.ts:890
(anonymous) @ useCallState.ts:851
cR @ is-plain-object.mjs:1
f8 @ is-plain-object.mjs:1
(anonymous) @ is-plain-object.mjs:1
eI @ is-plain-object.mjs:1
eV @ is-plain-object.mjs:1
Show 13 more frames
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'send', transport: C}
configureLogger.ts:52 [useCallState] dispatch {type: 'set-transport', direction: 'recv', transport: C}
configureLogger.ts:52 [useCallState] producerTracks {tracks: Array(1)}


jpd236 commented May 12, 2024

There was a new release of mediasoup-client a few days ago but it doesn't seem to have fixed this.

I can consistently reproduce the AwaitQueueStoppedError locally - but not the permanent failure state - as follows:

  • Open a puzzle page
  • Set a breakpoint in useCallState line 610 (dispatch({ type: "set-device", device: newDevice });)
  • Join audio call - this should trip the breakpoint
  • Wait ~60 seconds
  • Resume execution
  • The breakpoint should trip a second time - resume it immediately

In this case, I see an additional failure, Meteor method call failed: Mediasoup.Transports.methods.connect. I suspect that the 60 second wait causes something to time out, which causes a second try to occur. If I skip the 60 second wait, the breakpoint doesn't trigger a second time, and there are no issues.

The failure doesn't seem to be fatal in this case as the call does connect, though it is fatal when I am seeing it in production. But I wonder if fixing it in this case would also fix it there.

> What I don't understand is why that's throwing from within the Transport.close method, since as far as I can tell, AwaitQueue.stop causes other enqueued promises to reject with that exception.

Perhaps what is happening is that there are two calls to stop, and the second one is rejecting the first? So they look like they come from the same place but are in fact from two different operations. That would maybe line up with this reproducing when a retry occurs.
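
If that is what's happening, the shape of it would be something like the sketch below - purely illustrative, not jolly-roger code (MockTransport and the timings are made up), showing a retry closing the first transport while one of its queued operations is still outstanding:

import { AwaitQueue } from "awaitqueue";

// Hypothetical stand-in for mediasoup-client's Transport: operations run
// through an AwaitQueue, and close() stops that queue.
class MockTransport {
  closed = false;
  private queue = new AwaitQueue();

  produce(): Promise<void> {
    // Simulate a produce() that is still in flight when the retry arrives.
    return this.queue.push(
      () => new Promise<void>((resolve) => setTimeout(resolve, 5_000)),
      "produce"
    );
  }

  close(): void {
    this.closed = true;
    this.queue.stop(); // rejects any outstanding queued work
  }
}

// First setup attempt: create a transport and kick off produce(), but nothing
// ever awaits or catches that promise.
const firstTransport = new MockTransport();
void firstTransport.produce();

// The timeout/retry path then tears the old transport down before creating a
// replacement, like the `if (!newTransport.closed) newTransport.close();` line
// quoted earlier:
if (!firstTransport.closed) firstTransport.close();
// => "Uncaught (in promise) AwaitQueueStoppedError", with close()'s caller in
//    the stack even though the rejected promise belongs to the earlier produce().

// ...and then a replacement transport gets created for the retry:
const secondTransport = new MockTransport();
console.log("retry transport created, closed =", secondTransport.closed);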


jpd236 commented May 12, 2024

I see in the debug logs that there are two instances of:

configureLogger.ts:52 [useCallState] Creating new Mediasoup transport {transportId: 'e299e359-02ab-42e8-82c5-d3bdf8efcf24', direction: 'send'}

On successful calls this only happens once.
