
[2.5 rc4] Users hitting 403 (tricky to reproduce) #15070

Closed
schrd opened this issue May 23, 2022 · 22 comments


@schrd
Collaborator

schrd commented May 23, 2022

Describe the bug
In a load test with bots, users got kicked out of the meeting with a 403 "you have been removed from the meeting" message. An unintended configuration problem on the server caused all listen-only participants to be connected to FreeSWITCH instead of mediasoup. FreeSWITCH then consumed all available CPU on the server (top showed 0.x% idle CPU). Not all bots were able to connect, and a few of the humans in the meeting were kicked out.

To Reproduce
I don't know how to reproduce this. It happened only once in several tests.

BBB version:
BBB 2.5 rc4

Desktop (please complete the following information):

  • OS: Linux
  • Browser: Firefox
  • Version 91 ESR

Additional context

I don't know whether kicking out users in an overload situation is intended behaviour. If it is, there should be a different message. If you don't consider this a serious bug, I'm fine with that; I just didn't want to ignore our observation.

@ffdixon
Member

ffdixon commented May 23, 2022

Thanks for reporting this issue @schrd.

@antobinary added this to the Release 2.5 milestone on May 31, 2022
@antobinary changed the title from "[2.5 rc4]" to "[2.5 rc4] Users hitting 403 (tricky to reproduce)" on May 31, 2022
@mokazemi

mokazemi commented Jun 16, 2022

We've also experienced this for some users in our meetings (on both desktop and Android), and I also don't know where it's coming from: users get the 403 "you have been removed from the meeting" message, even when the user is a moderator.

> There was an unintended configuration problem on the server which resulted in all listen only participants being connected to freeswitch instead of mediasoup.
> ...

But I haven't changed any specific configuration, apart from some experimenting with audio/video bitrates.
The next time I experience the issue, I'll check the browser console logs to see whether they contain any useful information.

@mokazemi

This issue happened to me again!
I noticed two things:

  • My internet connection was very poor, and the client went into the reconnecting state a few times; then suddenly I saw the 403 error
  • I saw this error in the console logs:

Uncaught (in promise) Call to chatMessageBeforeJoinCounter failed because Meteor is not connected

[Screenshot of the console logs, taken 2022-06-25]

I hope it helps.

@ffdixon
Member

ffdixon commented Jun 25, 2022

Thanks for sharing this @mokazemi. It looks like the client may simply have lost its connection to the server (after trying to reconnect). Do you know whether any other users in the session experienced the same issue (i.e., whether the problem lies with your internet connection rather than with the server)?

@ffdixon
Member

ffdixon commented Jul 23, 2022

We've released 2.5.4 and we're still tracking this issue.

We're starting to look at ValidateAuthToken and the reconnection procedure. Our theory is that there may be a reconnection issue that, when triggered, floods Meteor with events; this causes a CPU spike and, much like FreeSWITCH consuming all the CPU, causes clients to disconnect with a 403 error.

Of course, the challenge right now is to replicate this reconnection issue. @schrd, we'd be interested to know whether you're able to force this to happen in that release under test load.

@MBM1607
Copy link
Contributor

MBM1607 commented Jul 25, 2022

@ffdixon we are facing the same issue; it's reproducible under stress testing.

@MBM1607
Copy link
Contributor

MBM1607 commented Jul 25, 2022

We were able to reproduce the issue by joining with 8+ users and then cutting the network connection to force reconnection attempts.

Errors:

Uncaught TypeError: a.getSubscription(...) is null

Uncaught (in promise) Call to chatMessageBeforeJoinCounter failed because Meteor is not connected
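The reproduction described in this comment (several clients joined, then the connection cut repeatedly to force reconnects) can be scripted. The following is a minimal sketch, not the procedure the reporters used: it assumes a Linux machine hosting all the browser clients, iproute2's `ip` command, root privileges, and a hypothetical interface name `eth0`. The function names and timing defaults are likewise hypothetical. By default it only returns the commands it would run.

```python
import subprocess
import time


def flap_commands(interface: str, cycles: int) -> list:
    """Build the iproute2 commands for `cycles` down/up cycles of `interface`."""
    commands = []
    for _ in range(cycles):
        commands.append(["ip", "link", "set", "dev", interface, "down"])
        commands.append(["ip", "link", "set", "dev", interface, "up"])
    return commands


def flap(interface: str = "eth0", cycles: int = 5, down_s: float = 10.0,
         up_s: float = 30.0, dry_run: bool = True) -> list:
    """Repeatedly drop and restore the link; with dry_run=True nothing is executed."""
    commands = flap_commands(interface, cycles)
    if dry_run:
        return commands
    for command in commands:
        subprocess.run(command, check=True)
        # Stay down long enough for clients to enter the reconnecting state,
        # then stay up long enough for them to rejoin before the next drop.
        time.sleep(down_s if command[-1] == "down" else up_s)
    return commands


if __name__ == "__main__":
    # Dry run: print what would be executed for one cycle.
    for command in flap(dry_run=True, cycles=1):
        print(" ".join(command))
```

Calling `flap(dry_run=False)` as root actually toggles the link. Avoid flapping the interface you are connected through over SSH, or you will lose access to the test machine.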


@BrentBaccala
Copy link
Contributor

I can reproduce this consistently.

I've seen the "Call to chatMessageBeforeJoinCounter" error, but it seems to appear after the 403 error, so I think it's incidental to the issue.

Still investigating.

@BrentBaccala
Copy link
Contributor

This is reproducible on 2.6.0-alpha.2, in addition to 2.5.4.

@ffdixon
Member

ffdixon commented Aug 9, 2022

In reproducing the issue on 2.5, does it matter if allowDuplicateExtUserid is set to true or false? See

https://groups.google.com/g/bigbluebutton-setup/c/Qefm8dduv5Y/m/SYThs_6uAQAJ

@blueiceprj

Hi. If you're tracking websocket connections by user ID and the allowDuplicateExtUserid parameter is set to true, that could be a problem. The best way to handle websocket connections reliably is to use a STOMP broker (RabbitMQ). This approach solves websocket connection problems (reconnects, heartbeats, etc.) and also helps you run BBB behind a load balancer. Our setup is based on a Spring Cloud app with multiple nginx instances and a gateway. It was the only option for us.

@BrentBaccala
Copy link
Contributor

@ffdixon, allowDuplicateExtUserid doesn't seem to make a difference.

@MBM1607
Copy link
Contributor

MBM1607 commented Oct 13, 2022

@ffdixon After upgrading to 2.5.6, I am no longer able to reproduce this issue. Previously, on version 2.5.4, I was able to join using Firefox multi-account containers and then trigger reconnections by turning the connection on and off repeatedly.

@ffdixon
Copy link
Member

ffdixon commented Oct 13, 2022

> @ffdixon After upgrading to 2.5.6, I am no longer able to reproduce this issue. Previously on version 2.5.4, I was able to join using Firefox multi-containers and then trigger reconnections by turning the connection on and off repeatedly.

That's very positive feedback -- thanks for sharing!

@MBM1607
Contributor

MBM1607 commented Oct 13, 2022

Is it possible that it was resolved by #15723? It seems relevant to me.

@ffdixon
Member

ffdixon commented Oct 13, 2022

Check your server; that setting is currently false by default:

https://github.com/bigbluebutton/bigbluebutton/blob/v2.5.x-release/bigbluebutton-html5/private/config/settings.yml#L210

But if you enable it, it will reduce the load when users fall back to long polling (which isn't very efficient and too many users doing long polling could cause disconnects for others, which is why we introduced this setting).

@ffdixon
Member

ffdixon commented Oct 27, 2022

Hi Brent, I'm curious about your tests with the latest 2.5.8 regarding 403 disconnects.

@BrentBaccala
Contributor

@ffdixon, an hour-long test with ten clients and a modest level of broken TCP sessions (all sessions broken once every ten seconds) yielded no client disconnects of any kind.

I'd still like to collect some more data on this issue, though.
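Brent's test breaks every TCP session on a fixed schedule. One way to approximate that, sketched here under stated assumptions and not the harness used in this thread, is a TCP relay placed between the browsers and the BBB server that closes every live session once per period. It assumes the clients can be pointed at the relay's address; because it copies raw bytes, TLS passes through unmodified. The class name `BreakingRelay` and its methods are hypothetical.

```python
import asyncio


class BreakingRelay:
    """Hypothetical relay: forwards TCP traffic upstream and severs every session periodically."""

    def __init__(self, upstream_host: str, upstream_port: int,
                 break_every_s: float = 10.0) -> None:
        self.upstream_host = upstream_host
        self.upstream_port = upstream_port
        self.break_every_s = break_every_s
        self.writers = set()  # writers for both sides of every live session
        self._breaker_task = None

    async def _pipe(self, reader: asyncio.StreamReader,
                    writer: asyncio.StreamWriter) -> None:
        # Copy bytes in one direction until EOF or a broken connection.
        try:
            while data := await reader.read(65536):
                writer.write(data)
                await writer.drain()
        except ConnectionError:
            pass
        finally:
            writer.close()

    async def _handle(self, client_r: asyncio.StreamReader,
                      client_w: asyncio.StreamWriter) -> None:
        try:
            up_r, up_w = await asyncio.open_connection(self.upstream_host,
                                                       self.upstream_port)
        except OSError:
            client_w.close()
            return
        session = {client_w, up_w}
        self.writers |= session
        try:
            # Closing either socket ends both pipes, so the session tears down fully.
            await asyncio.gather(self._pipe(client_r, up_w),
                                 self._pipe(up_r, client_w))
        finally:
            self.writers -= session

    async def _breaker(self) -> None:
        # Every period, close both halves of every live session at once.
        while True:
            await asyncio.sleep(self.break_every_s)
            for writer in list(self.writers):
                writer.close()

    async def start(self, listen_host: str = "127.0.0.1",
                    listen_port: int = 0) -> asyncio.AbstractServer:
        server = await asyncio.start_server(self._handle, listen_host, listen_port)
        self._breaker_task = asyncio.create_task(self._breaker())
        return server

    def stop_breaking(self) -> None:
        if self._breaker_task is not None:
            self._breaker_task.cancel()
```

With `break_every_s=10.0` this matches the "all sessions broken once every ten seconds" cadence described above. Each break closes both the client-side and upstream-side sockets, so the browser sees its connection drop and has to reconnect.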

@ffdixon
Member

ffdixon commented Oct 28, 2022

> @ffdixon, an hour-long test with ten clients and a modest level of broken TCP sessions (all sessions broken once every ten seconds) yielded no client disconnects of any kind.

Thanks Brent! Keep pushing the boundaries, but very positive indeed.

@antobinary modified the milestones: Release 2.5, Release 2.7 on Jan 31, 2023
@Davka
Contributor

Davka commented Apr 14, 2023

I'm not sure if it's the same bug, but we have a similar problem in 2.6.1.
The meeting size was limited to 40 participants, and after a certain time even moderators/presenters were removed from the meeting with the 403 message. Re-entry is not possible. Even afterwards, when there are only 4 people in the meeting, for example, every new participant gets the 403 message.

The browser console shows the following error:
[Screenshot, 2023-04-14 at 12:06:48]

I haven't found anything in the logs yet.

@antobinary

@hostbbb
Contributor

hostbbb commented Apr 14, 2023

So it looks like the maxParticipants counter is rejecting users, judging by the browser console output above. I wonder if user reconnects keep adding to the meeting count somehow. A good test would be to set maxParticipants to 3 for a meeting and play around with browser refreshes and new users joining. Let me try to replicate this in 2.6.1.
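The hypothesis above (reconnects keep adding to the meeting count) can be illustrated with a toy model. The classes below are hypothetical and are not BigBlueButton code: one counts every successful join, so a reconnect consumes a fresh slot; the other counts each user ID once. With a limit of 3, a single reconnect is enough for the join-counting model to reject a newcomer.

```python
class JoinCountingMeeting:
    """Hypothetical: counts every successful join, including rejoins after a reconnect."""

    def __init__(self, max_participants: int) -> None:
        self.max_participants = max_participants
        self.joins = 0

    def join(self, user_id: str) -> int:
        # A reconnect is treated as a brand-new join, so the counter never drops.
        if self.joins >= self.max_participants:
            return 403
        self.joins += 1
        return 200


class DistinctUserMeeting:
    """Hypothetical: counts each user ID once, however often it reconnects."""

    def __init__(self, max_participants: int) -> None:
        self.max_participants = max_participants
        self.user_ids = set()

    def join(self, user_id: str) -> int:
        # A known user rejoining does not take another slot.
        if user_id not in self.user_ids and len(self.user_ids) >= self.max_participants:
            return 403
        self.user_ids.add(user_id)
        return 200


def simulate(meeting) -> list:
    # alice and bob join, alice drops and reconnects, then carol arrives.
    return [meeting.join(user) for user in ["alice", "bob", "alice", "carol"]]


print(simulate(JoinCountingMeeting(3)))  # [200, 200, 200, 403]
print(simulate(DistinctUserMeeting(3)))  # [200, 200, 200, 200]
```

With a limit of 2, the join-counting model rejects alice herself when she reconnects (`[200, 200, 403, 403]`), which would be consistent with the reports above of users who cannot re-enter after being removed.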

@hostbbb
Contributor

hostbbb commented Apr 24, 2023

#17699

See this for a 2.6.4 example. I can replicate it with maxParticipants set to 2.


10 participants