Split html5-server in multiple processes to larger meetings #10349
Changes to be committed:
new file: akka-bbb-apps/src/main/scala/org/bigbluebutton/core/apps/externalvideo/ExternalVideoApp2x.scala
new file: akka-bbb-apps/src/main/scala/org/bigbluebutton/core/apps/externalvideo/StartExternalVideoPubMsgHdlr.scala
new file: akka-bbb-apps/src/main/scala/org/bigbluebutton/core/apps/externalvideo/StopExternalVideoPubMsgHdlr.scala
new file: akka-bbb-apps/src/main/scala/org/bigbluebutton/core/apps/externalvideo/UpdateExternalVideoPubMsgHdlr.scala
modified: akka-bbb-apps/src/main/scala/org/bigbluebutton/core/pubsub/senders/ReceivedJsonMsgHandlerActor.scala
modified: akka-bbb-apps/src/main/scala/org/bigbluebutton/core/running/MeetingActor.scala
modified: akka-bbb-apps/src/main/scala/org/bigbluebutton/core2/FromAkkaAppsMsgSenderActor.scala
new file: bbb-common-message/src/main/scala/org/bigbluebutton/common2/msgs/ExternalVideoMsgs.scala
new file: bigbluebutton-html5/imports/api/external-videos/server/eventHandlers.js
new file: bigbluebutton-html5/imports/api/external-videos/server/handlers/startExternalVideo.js
new file: bigbluebutton-html5/imports/api/external-videos/server/handlers/stopExternalVideo.js
new file: bigbluebutton-html5/imports/api/external-videos/server/handlers/updateExternalVideo.js
modified: bigbluebutton-html5/imports/api/external-videos/server/index.js
modified: bigbluebutton-html5/imports/api/external-videos/server/methods.js
modified: bigbluebutton-html5/imports/api/external-videos/server/methods/emitExternalVideoEvent.js
modified: bigbluebutton-html5/imports/api/external-videos/server/methods/startWatchingExternalVideo.js
modified: bigbluebutton-html5/imports/api/external-videos/server/methods/stopWatchingExternalVideo.js
new file: bigbluebutton-html5/imports/api/external-videos/server/streamer.js
modified: bigbluebutton-html5/imports/api/meetings/server/handlers/meetingDestruction.js
modified: bigbluebutton-html5/imports/api/meetings/server/modifiers/addMeeting.js
modified: bigbluebutton-html5/imports/api/meetings/server/modifiers/meetingHasEnded.js
modified: bigbluebutton-html5/imports/api/users/server/handlers/validateAuthToken.js
modified: bigbluebutton-html5/imports/api/users/server/store/bannedUsers.js
modified: bigbluebutton-html5/imports/startup/server/index.js
modified: bigbluebutton-html5/imports/startup/server/redis.js
modified: bigbluebutton-html5/imports/ui/components/external-video-player/service.js
modified: bigbluebutton-html5/private/config/settings.yml
Hi @amguirado73! Thank you for your contribution! Could you please confirm that you have filled out a CLA? https://docs.bigbluebutton.org/support/faq.html#why-do-i-need-to-sign-a-contributor-license-agreement-to-contribute-source-code |
Hello,
I just signed it and sent it by email. If there is something that is not
clear in the PR, please let me know to clarify it.
Thank you.
Regards.
|
Contributor agreement received -- thanks! |
@amguirado73 thanks a lot for introducing this new PR also during email conversation. We've tested your PR in release build
We've configured
Our target was to hold 600 users on each server, and we're happy to find it working. I hope the BBB core team will have a look at this and merge or further improve it. Thanks again to @amguirado73 for all your help while running the test. |
Thanks @jibon57 for sharing your experience! This work (or a variation of it) will be merged into BigBlueButton 2.3-dev. |
@jibon57 - thank you very much for this valuable live test and sharing the results. One question: When you say "Both server was able to hold 800+ users ..." do you mean both servers together had a total of 800+ users or each server managed to service 800+ ... a total of 1600+ between the two servers? Thank you again |
Hi guys, I have 4 questions
Regards, |
And thank you again ! |
Hi,
I can answer questions 1 and 2:
#1 User sessions are balanced by nginx using the source IP address. A
meeting can be distributed across several processes.
#2 This has not been tested. You can start as many processes as you
want. Obviously, an optimal number of processes still has to be found.
A possible sizing rule is 100-150 users per process, similar to the
current limit.
Regards.
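The sizing rule above can be turned into a quick estimate. This is only a sketch: the 125 users per process is an assumed midpoint of the 100-150 range, and the function name is made up for illustration.

```javascript
// Rule of thumb from this thread: 100-150 users per html5-server process.
// USERS_PER_PROCESS is an assumed midpoint, not a measured limit.
const USERS_PER_PROCESS = 125;

// Hypothetical helper: estimate how many processes a server needs.
function frontendProcessesFor(expectedUsers, usersPerProcess = USERS_PER_PROCESS) {
  if (!Number.isInteger(expectedUsers) || expectedUsers < 0) {
    throw new RangeError('expectedUsers must be a non-negative integer');
  }
  // Keep at least one process, even for an idle server.
  return Math.max(1, Math.ceil(expectedUsers / usersPerProcess));
}

console.log(frontendProcessesFor(600)); // 600 / 125 = 4.8, rounded up to 5
```

The real optimum depends on hardware and on what users are doing in the meeting, so treat the result as a starting point for load tests.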
On 21/09/2020 at 8:56, Aurélien GUERSON wrote:
…
Hi guys,
You are on the right track ;)
I have 4 questions
1. With your 5 HTML servers, is one meeting spread across all 5, or
does a meeting always stay on 1 of the 5 servers?
2. Could you raise the limit of 5 servers to 10 or more?
3. When you said "you decreased the bitrate to 50kbit/s", where did
you do this?
4. Maybe I missed something in the docs: when you said "the pagination
was on where Moderator: 10 & Attendee: 5", I had activated
pagination in 2.2.25. Is this not automatic? Do you have to push a
button during a meeting, or activate an option on the meeting?
Regards,
Aurélien.
|
@amguirado73 @ffdixon Without this PR and without the 3 KMS PR, I managed to host 450 people with 2 cams in one meeting (although by the end the chat stopped responding and the meeting froze) with this server (OpenVZ 7 installed, with containers running Ubuntu 16.04 LTS). I have already updated to 2.2.25 to use the 3 KMS PR, and I hope that with this PR I can finally host more than 500 users in one session. |
I managed to accept 450 persons with 2.2.23 |
Thanks @amguirado73. I am just confused about:
The motivation was to get more users; if we are still at the same limit, then what is the point? If I am using Scalelite, does this not add any value? |
Hi,
Although the initial idea of the patch was to have more users per
meeting (and it still is), Jibon's tests show that the patch also allows
more users on the same server.
100-150 users per process is a basic rule of thumb for deciding how many
html5-servers should be started to avoid problems. This is a complement
to Scalelite. It lets you grow inside a server.
As for whether we are still at the same limit for a single meeting, I do
not have enough resources to test with that number of users (I can
generate only 100 bots). I would appreciate any kind of help with it.
Regards
On 21/09/2020 at 9:54, Mohamad Abras wrote:
…
Thanks @amguirado73 <https://github.com/amguirado73>
I am just find this confusing
Currently, there is a limitation whereby a meeting can have
between 100-200 users depending on the restrictions applied to
users. I would like get more users per meeting.
A possible sizing rule is 100-150 users per process, similarly to
the current limit.
The motivation was to get more, if we still in the same limit then
what is point?. If I am using Scalelite then this not adding any value?
|
How do you generate 100 bots? Do you have a script? I want to load my server with as many bot users as I can, and I could give you feedback afterwards. |
Just yesterday I was using: |
Hi,
could you tell us what type of server you have?
Thank you very much.
Regards
On 21/09/2020 at 14:34, Mohamad Abras wrote:
…
About if we still in the same limit for a meeting, I have not
enough resources to do tests for this number of users (I can
generate only 100 bots). I would appreciate any kind of help about it.
Just yesterday I was using:
https://github.com/mconf/bigbluebot
I was able to generate 250 bots with it. Five server, each 50 bot.
|
I am still trying to get bigbluebot working. I opened an issue about using it |
@jibon57 - Many thanks for the answers. Thanks to everyone else for sharing the info. |
Hello guys, I was checking Now edit
Nginx config:
Now start process |
In my case, I am still trying to find enough resources to run the 500+ bots. |
Hi @jibon57. Your servers have a lot of RAM. If you use PM2 to run a cluster, you will use a lot of RAM too, and I think this is what is happening on your servers. The same goes for your CPU: you are using a lot of it. |
Thanks @GhaziTriki. Actually, the PR is from @amguirado73. In my case I didn't need a capacity of more than 100 users per room, which is why I didn't test that. But @amguirado73 did test with over 320 users. So far this solution is working fine for my case. |
@amguirado73 How much effort would it be to create a similar PR for 2.2.x? I am interested in testing it. |
|
Will this be included in one of the 2.2.xx releases or only in 2.3? I have only read that someone managed to test this in 2.2.23 and 2.2.25, but I am sure this would help a lot of users who are facing problems right now getting enough users on their servers and meetings with the limited resources available. I am currently on 2.2.29 (dev) |
We have been running this patch, adapted for 2.2.28, in production on 12 servers for a week, with 2 frontend processes each. It works very well so far. We decided to give this patch a try because the Scalelite balancing strategy is so stupid. We have a very unequal distribution of conference sizes. Two weeks ago Scalelite decided to place two conferences with 150 participants each onto a server which was already loaded with 100 users. Of course the inevitable kicking of users happened. In a load test with 100 desktop computers running 5 bigbluebots each, this patch performed very well: we got 475 bots joined into 4 conferences. However, today one of the nodejs frontend processes crashed with
I assume that this is a nodejs bug and is not related to the patch and I believe I saw this message some time ago as well without the patch. Memory was not short on the system. Existing users on the server were immediately balanced to the other frontend process by nginx. I doubt that anyone noticed that error. However when the process was restarted by systemd and nginx balanced connections to it, I discovered the following messages in the log:
I could reproduce this on our test server by restarting one frontend process. The cause is that cursor positions and annotations in the presentations are distributed between the users by the meteor-streamer library. The meteor-streamer subscriptions are started by addMeeting events. The freshly started process, however, never received those events for the meetings that were already running. Distribution of annotations and cursor positions worked for fresh conferences. I would like to contribute a patch that fixes this behaviour, but I do not understand why this meteor-streamer mechanism works, or why the replication of annotation state between the two meteor processes works at all. To me it looks like meteor-streamer should broadcast the messages between all users in a conference, but only within the same nodejs process. Can anybody give me a hint where to look and how to improve the patch so that it continues to work if meteor is restarted? My current idea is to query MongoDB at server startup for meetings and register those meetings for streaming of cursor and annotations. |
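The idea in the last sentence could look roughly like the sketch below. It does not use Meteor or meteor-streamer: the `streamers` registry, `registerStreamer` and `restoreStreamers` are hypothetical stand-ins, and the list of running meetings is assumed to come from a query on the meetings collection at startup.

```javascript
// Hypothetical per-process registry of streamer channels, keyed by meeting id.
const streamers = new Map();

// Stand-in for what the addMeeting handler would do for a new meeting.
function registerStreamer(meetingId) {
  if (streamers.has(meetingId)) return false; // already registered
  streamers.set(meetingId, {
    cursor: `cursor-${meetingId}`,
    annotations: `annotations-${meetingId}`,
  });
  return true;
}

// Called once at process start. `runningMeetings` is assumed to be the result
// of a database query for meetings that were created before this process started.
function restoreStreamers(runningMeetings) {
  let restored = 0;
  for (const { meetingId } of runningMeetings) {
    if (registerStreamer(meetingId)) restored += 1;
  }
  return restored;
}
```

Making the registration idempotent means a later addMeeting event for the same meeting would not create a second set of channels.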
This pull request has conflicts ☹ |
With 2.3 taking its time, is there any chance to maybe get this into one of the coming 2.2 releases? |
There is work underway to get a variation of this approach into the next build of BigBlueButton 2.3-dev. |
So, not planned for 2.2? Guess I will have to wait for the 2.3 release then. |
It's easier for us to implement and test this in 2.3-dev. We're weaning off of 2.2 soon as we want to focus our efforts on the next release and get the product onto Ubuntu 18.04 asap. |
@ffdixon Can we migrate from 2.2 to 2.3, or is that not possible? (I'm worried about my previously recorded sessions) |
We would recommend setting up 2.3-dev on a new 18.04 server, leaving your 16.04 server untouched. The recommendation would be to then copy over the recordings from 2.2 onto the 2.3-dev server. This way, nothing changes on the 2.2 server and you can test the 2.3-dev server independently. |
We have been running this patch for 7 weeks, with more than 32k meeting hours and more than 200k participant hours. A meeting with 10 users for one hour is counted as 1 meeting hour and 10 participant hours. We have ported this to 2.2.30 in the meantime. In total we have had 3 nodejs crashes across all of our servers since then, so I assume that this is safe. At least I have never had to struggle with overloaded nodejs processes since then. It also reduces the latency of actions such as mute/unmute on servers with more than 200 concurrent users. Compared to #11008, this approach is superior in my opinion:
I set up the nginx load balancer to use the number of connections to decide which meteor process to use. We currently run it with 2 workers. |
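Balancing by number of connections means each new user goes to the worker that currently has the fewest open connections (nginx provides a `least_conn` method for this; whether that exact directive is used here is not stated). A minimal sketch of the selection rule, with made-up worker objects:

```javascript
// Pick the worker with the fewest open connections.
// Ties go to the worker listed first.
function pickLeastConnected(workers) {
  if (workers.length === 0) throw new Error('no workers configured');
  return workers.reduce((best, w) => (w.connections < best.connections ? w : best));
}

// Hypothetical snapshot of two frontend workers.
const workers = [
  { port: 3001, connections: 120 },
  { port: 3002, connections: 95 },
];
console.log(pickLeastConnected(workers).port); // 3002
```

Unlike hashing on the client address, this evens out load when many users share one source IP.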
Do you maybe have documentation for applying the patch to 2.2.30? Also: Does applying the patch require repackaging BBB, and are there any changes in 2.2.30 which require adjustments to the patch? |
You need to rebuild bbb-html5 and bbb-apps-akka packages. In addition a new systemd unit
The patched source tree is here: https://gitlab.hrz.tu-chemnitz.de/bigbluebutton/bigbluebutton/-/tree/pr-10349-2.2.30 The systemd unit is included in https://gitlab.hrz.tu-chemnitz.de/bigbluebutton/bigbluebutton-packaging/-/blob/master/bbb-html5/bbb-html5-frontend@.service |
I am looking at the nginx load balancing strategies trying to understand more about the way frontend f1 is preferred over frontend f2 when a new user is joining. I understand that the |
We are running it with |
Before I always tried with I also tried with |
2.3-alpha7 was just released and it included #11317 which is based on this work of @amguirado73 and @jfsiebel https://github.com/bigbluebutton/bigbluebutton/releases/tag/v2.3-alpha-7 Please give it a try if you have the opportunity to put some load on a 2.3-alpha7 server (not in production) |
What does this PR do?
This PR allows splitting html5-server into multiple processes. It is inspired by PR #8788 but takes a different approach. The main idea is to create multiple html5-server processes to bypass the current limitation imposed by the fact that node executes practically in a single thread. In this way, you can get more users per meeting.
Motivation
Currently, there is a limitation whereby a meeting can have between 100-200 users depending on the restrictions applied to users. I would like to get more users per meeting.
More
Using an environment variable METEOR_ROLE = [backend | frontend] multiple processes can be started. There should only be one backend process that will handle all events related to the mongoDB. The frontend processes, which only listen to the frontend-redis-channel redis channel, will only process certain messages. It is possible to start as many frontend processes as desired. NGINX is used to balance users' meteor connections.
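As a rough illustration of the role split, a process could pick its Redis channels from METEOR_ROLE like this. Only the two role values and the name frontend-redis-channel come from the description above; the backend channel name and both function names are hypothetical placeholders.

```javascript
// 'frontend-redis-channel' is named in the PR text.
// BACKEND_CHANNELS is a hypothetical placeholder, not the real channel list.
const FRONTEND_CHANNELS = ['frontend-redis-channel'];
const BACKEND_CHANNELS = ['hypothetical-backend-channel'];

// Decide which Redis channels a process subscribes to, based on its role.
function channelsForRole(role) {
  switch (role) {
    case 'backend':
      return BACKEND_CHANNELS;
    case 'frontend':
      return FRONTEND_CHANNELS;
    default:
      throw new Error(`Unknown METEOR_ROLE: ${role}`);
  }
}

// Only the single backend process handles events that touch MongoDB.
function handlesMongoEvents(role) {
  return role === 'backend';
}

// Typical use at startup would be: channelsForRole(process.env.METEOR_ROLE)
```

Failing fast on an unknown role avoids starting a process that silently listens to nothing.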
Examples of use in file /usr/share/meteor/bundle/systemd_start.sh
#For 2 process FRONTEND and 1 BACKEND
/usr/bin/npx concurrently -n 'backend,frontend1,frontend2' "env METEOR_ROLE=backend PORT=3000 /usr/share/$NODE_VERSION/bin/node main.js" "env METEOR_ROLE=frontend PORT=3001 /usr/share/$NODE_VERSION/bin/node main.js" "env METEOR_ROLE=frontend PORT=3002 /usr/share/$NODE_VERSION/bin/node main.js"
#For 3 process FRONTEND and 1 BACKEND
/usr/bin/npx concurrently -n 'backend,frontend1,frontend2,frontend3' "env METEOR_ROLE=backend PORT=3000 /usr/share/$NODE_VERSION/bin/node main.js" "env METEOR_ROLE=frontend PORT=3001 /usr/share/$NODE_VERSION/bin/node main.js" "env METEOR_ROLE=frontend PORT=3002 /usr/share/$NODE_VERSION/bin/node main.js" "env METEOR_ROLE=frontend PORT=3003 /usr/share/$NODE_VERSION/bin/node main.js"
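The two commands follow the same pattern: one backend on port 3000, then frontend N on port 3000 + N. A small helper (hypothetical, not part of the PR) can generate the line for any number of frontends:

```javascript
// Build the `concurrently` command line for 1 backend and N frontend processes.
// The ports and naming follow the two examples above; the helper is illustrative.
function buildStartCommand(frontendCount, nodeBin = '/usr/share/$NODE_VERSION/bin/node') {
  if (!Number.isInteger(frontendCount) || frontendCount < 1) {
    throw new RangeError('frontendCount must be a positive integer');
  }
  const names = ['backend'];
  const commands = [`"env METEOR_ROLE=backend PORT=3000 ${nodeBin} main.js"`];
  for (let i = 1; i <= frontendCount; i += 1) {
    names.push(`frontend${i}`);
    commands.push(`"env METEOR_ROLE=frontend PORT=${3000 + i} ${nodeBin} main.js"`);
  }
  return `/usr/bin/npx concurrently -n '${names.join(',')}' ${commands.join(' ')}`;
}

console.log(buildStartCommand(2));
```

Each added frontend port also has to be added to the nginx pool, otherwise the new process receives no connections.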
The configuration to apply in NGINX is:
File: /etc/nginx/nginx.conf
File /etc/bigbluebutton/nginx/bbb-html5.nginx
location /html5client {
  proxy_pass http://poolhtml5servers;
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "Upgrade";
}
The hash $remote_addr directive is used to ensure that each user always goes to the same html5-server process.
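The effect of hashing on the client address can be sketched as follows. This is not nginx's actual hash function; FNV-1a is used here only because it is short, and the port list is an assumption based on the examples above.

```javascript
// 32-bit FNV-1a hash of a string. Chosen for brevity; nginx uses its own hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Map a client address to one process. The same address always maps to
// the same port as long as the list of ports does not change.
function frontendPortFor(remoteAddr, ports) {
  return ports[fnv1a(remoteAddr) % ports.length];
}

const ports = [3001, 3002]; // assumed frontend ports
console.log(frontendPortFor('203.0.113.7', ports) === frontendPortFor('203.0.113.7', ports)); // true
```

One consequence is that all users behind the same NAT address land on the same process, which can make the distribution uneven.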
As a side effect, a problem was observed with external videos when the presenter starts, stops or repositions the video. To fix this, the external videos code has been modified so that these events generate messages on Redis channels and can therefore be received by all existing frontend processes. I have reused PR #7484 to do this.
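The reason a shared channel fixes this can be shown with a toy model. `FakeRedisChannel` is an in-memory stand-in for a Redis channel, and the message shapes (`type`, `url`, `time`) are made up for illustration; they are not the real message format.

```javascript
// In-memory stand-in for a Redis channel: every subscriber gets every message.
class FakeRedisChannel {
  constructor() {
    this.subscribers = [];
  }

  subscribe(handler) {
    this.subscribers.push(handler);
  }

  publish(message) {
    this.subscribers.forEach((handler) => handler(message));
  }
}

// Each frontend process keeps its own view of the external video state
// and updates it from the messages it receives on the channel.
function startFrontend(name, channel) {
  const state = { name, url: null, playing: false, time: 0 };
  channel.subscribe((msg) => {
    if (msg.type === 'start') Object.assign(state, { url: msg.url, playing: true, time: 0 });
    if (msg.type === 'update') state.time = msg.time;
    if (msg.type === 'stop') Object.assign(state, { url: null, playing: false, time: 0 });
  });
  return state;
}

const channel = new FakeRedisChannel();
const frontend1 = startFrontend('frontend1', channel);
const frontend2 = startFrontend('frontend2', channel);
channel.publish({ type: 'start', url: 'https://example.com/video' });
console.log(frontend1.playing, frontend2.playing); // true true
```

If the event were only handled inside the process the presenter is connected to, users on the other processes would never see the change.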
Also, the bannedusers part has been modified so that mongoDB is used instead of a Set local to each process.
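The difference between the two approaches can be sketched like this. Both classes are illustrative: a plain `Map` stands in for the MongoDB collection, and the method names are assumptions rather than the actual bannedUsers.js API.

```javascript
// Before: each process keeps its own Set, so a ban recorded in one process
// is invisible to the others.
class LocalBannedUsers {
  constructor() {
    this.banned = new Set();
  }

  add(meetingId, userId) {
    this.banned.add(`${meetingId}:${userId}`);
  }

  has(meetingId, userId) {
    return this.banned.has(`${meetingId}:${userId}`);
  }
}

// After: every process reads and writes the same store.
// `store` is a Map standing in for a shared MongoDB collection.
class SharedBannedUsers {
  constructor(store) {
    this.store = store;
  }

  add(meetingId, userId) {
    this.store.set(`${meetingId}:${userId}`, true);
  }

  has(meetingId, userId) {
    return this.store.has(`${meetingId}:${userId}`);
  }
}

const sharedStore = new Map();
const processA = new SharedBannedUsers(sharedStore);
const processB = new SharedBannedUsers(sharedStore);
processA.add('meeting-1', 'user-9');
console.log(processB.has('meeting-1', 'user-9')); // true
```

With the local Set, a banned user whose reconnect is routed to a different process would pass the check.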