[2.5 rc4] Users hitting 403 (tricky to reproduce) #15070
Comments
Thanks for reporting this issue @schrd. |
We've also experienced this with some users in meetings, on both desktop and Android. I don't know where it's coming from either.
I haven't changed any specific configuration, though, apart from experimenting a bit with audio/video bitrates. |
This issue happened to me again!
I hope it helps. |
Thanks for sharing this @mokazemi. It looks like the client may have just lost its connection to the server (after trying to reconnect). Do you know if any other users in the session experienced the same issue? If not, the problem is more likely with your internet connection than with the server. |
We've released 2.5.4 and we're still tracking this issue. We're starting to look at ValidateAuthToken and the reconnection procedure. Our theory is that there may be a reconnection issue that, when triggered, floods meteor with events. That causes a CPU spike which, similar to FreeSWITCH using all the memory, causes clients to disconnect with a 403 error. Of course, the challenge right now is to replicate this reconnection issue. @schrd We'd be interested to know if you're able to force this to happen in that release under testing load. |
@ffdixon we are facing this same issue; it's reproducible under stress testing. |
I can reproduce this consistently. I've seen the Still investigating. |
This is reproducible on 2.6.0-alpha.2, in addition to 2.5.4. |
In reproducing the issue on 2.5, does it matter if https://groups.google.com/g/bigbluebutton-setup/c/Qefm8dduv5Y/m/SYThs_6uAQAJ |
Hi. If you are tracking websocket connections by userid and the parameter allowDuplicateExtUserid is set to true, that could be a problem. In our experience, the best way to handle websocket connections reliably is to use a STOMP broker (RabbitMQ). With this approach you can solve websocket connection problems (reconnect, heartbeat, etc.), and it also helps when accessing BBB through a load balancer. Our setup is based on a Spring Cloud app with multiple nginx instances and a gateway, and this was the only option for us. |
@ffdixon, |
@ffdixon After upgrading to 2.5.6, I am no longer able to reproduce this issue. Previously, on version 2.5.4, I was able to join using Firefox multi-containers and then trigger reconnections by turning the connection on and off repeatedly. |
That's very positive feedback -- thanks for sharing! |
Is it possible that it was resolved by #15723? It seems relevant to me. |
Check on your server: that setting is currently false by default. If you enable it, it will reduce the load when users fall back to long polling. Long polling isn't very efficient, and too many users doing it could cause disconnects for others, which is why we introduced this setting. |
Hi Brent, I'm curious about your tests with the latest 2.5.8 regarding 403 disconnects. |
@ffdixon, an hour-long test with ten clients and a modest level of broken TCP sessions (all sessions broken once every ten seconds) yielded no client disconnects of any kind. I'd still like to collect some more data on this issue, though. |
Thanks Brent! Keep pushing the boundaries, but very positive indeed. |
So it looks like the max_participant counter is what is rejecting the join in the browser console above. I wonder if user reconnects somehow keep adding to the meeting count. A good test could be to set max_participant to 3 for a meeting and play around with browser refreshes and new users joining. Let me try to replicate this in 2.6.1. |
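To make the hypothesis above concrete, here is a toy model of how a participant counter could reject a legitimate join if each reconnect were counted as a new participant. This is only an illustration: `ToyMeeting`, `dedupe_by_user` and `simulate` are hypothetical names, and nothing here is taken from BigBlueButton's actual join logic.

```python
class ToyMeeting:
    """Hypothetical model of a participant limit; NOT BigBlueButton code."""

    def __init__(self, max_participants, dedupe_by_user):
        self.max_participants = max_participants
        self.dedupe_by_user = dedupe_by_user
        self.sessions = []  # one entry per accepted join

    def occupied(self):
        # With deduplication, a user holds one slot no matter how often they rejoin.
        if self.dedupe_by_user:
            return len(set(self.sessions))
        return len(self.sessions)

    def join(self, user_id):
        if self.dedupe_by_user and user_id in self.sessions:
            return True  # a reconnect reuses the existing slot
        if self.occupied() >= self.max_participants:
            return False  # rejected; a real client might surface this as a 403
        self.sessions.append(user_id)
        return True


def simulate(dedupe_by_user):
    """One user joins and refreshes twice, then a second user tries to join."""
    meeting = ToyMeeting(max_participants=3, dedupe_by_user=dedupe_by_user)
    for _ in range(3):  # initial join plus two browser refreshes
        meeting.join("alice")
    return meeting.join("bob")


if __name__ == "__main__":
    print("bob accepted when reconnects are counted as new joins:", simulate(False))
    print("bob accepted when reconnects reuse the slot:", simulate(True))
```

If the real counter behaved like the first case, a meeting limited to 3 would turn a second user away after one participant refreshed twice, which is the pattern the proposed test is meant to surface.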
See this for a 2.6.4 example. I can replicate it with maxParticipants set to 2. |
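For anyone who wants to set up a similar test, here is a minimal sketch of building a `create` API URL with `maxParticipants` set to 2. It assumes the server uses the default SHA-1 API checksum (the call name, then the query string, then the shared secret); the host name, secret and meeting ID below are placeholders, and `build_create_url` is a hypothetical helper rather than part of any BigBlueButton tooling.

```python
import hashlib
from urllib.parse import urlencode


def build_create_url(base_url, secret, meeting_id, max_participants):
    """Return a signed BigBlueButton `create` URL.

    Assumes the default SHA-1 checksum scheme; a server configured for a
    different hash algorithm would need the hash function swapped.
    """
    params = {
        "meetingID": meeting_id,
        "name": meeting_id,
        "maxParticipants": max_participants,
    }
    query = urlencode(params)
    # Checksum input is: API call name + query string + shared secret.
    checksum = hashlib.sha1(("create" + query + secret).encode("utf-8")).hexdigest()
    return f"{base_url}/api/create?{query}&checksum={checksum}"


if __name__ == "__main__":
    # Placeholder values; substitute your own server URL and shared secret.
    print(
        build_create_url(
            base_url="https://bbb.example.com/bigbluebutton",
            secret="replace-with-your-shared-secret",
            meeting_id="max-participants-test",
            max_participants=2,
        )
    )
```

With the meeting created this way, joining two users and then refreshing one of the browsers should show whether the refresh alone is enough to trigger the rejection.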
Describe the bug
In a load test with bots, users got kicked out of the meeting with a 403 "you have been removed from the meeting" message. There was an unintended configuration problem on the server which resulted in all listen-only participants being connected to FreeSWITCH instead of mediasoup. FreeSWITCH then consumed all available CPU on the server; top showed 0.x% idle CPU. Not all bots were able to connect, and a few of the humans in the meeting were kicked out.
To Reproduce
I don't know how to reproduce this. It happened only once in several tests.
BBB version:
BBB 2.5 rc4
Desktop (please complete the following information):
Additional context
I don't know if this behaviour of kicking out users in an overload situation is intended. If it is, then there should be a different message. If you don't consider this a serious bug, I'm fine with that. I just didn't want to ignore our observation.