bbb-web to loadbalance bbb-html5 instances #11008

antobinary · 2020-12-11T21:52:47Z

bbb-web load-balances all bbb-html5 instances: based on CPU load, it picks the instance with the lowest CPU usage and assigns it to handle the new meeting.

$ ps -u meteor -o pcpu,cmd= | grep node-
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=3
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=1
 7.6 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=4
 7.8 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=2

Ensure each new meeting is handled by the process with the most capacity to do so. There is no need to count meetings or users; just use CPU load.
It is no longer necessary to pass a meta parameter on meeting create to use the parallel nodejs processes.
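
The selection described above can be sketched in a few lines. This is only an illustration, assuming the `ps` output format shown earlier; the function names `parse_ps_line` and `pick_lowest_cpu` are hypothetical and are not taken from bbb-web.

```python
from typing import Tuple

# Sample output copied from the `ps -u meteor -o pcpu,cmd=` listing above.
SAMPLE_PS_OUTPUT = """\
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=3
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=1
 7.6 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=4
 7.8 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=2
"""


def parse_ps_line(line: str) -> Tuple[float, int]:
    """Turn one ps line into (cpu_percent, instance_id). Hypothetical helper."""
    fields = line.split()
    cpu_percent = float(fields[0])
    # The instance id is carried by the INFO_INSTANCE_ID=<n> argument.
    instance_arg = next(f for f in fields if f.startswith("INFO_INSTANCE_ID="))
    return cpu_percent, int(instance_arg.split("=", 1)[1])


def pick_lowest_cpu(ps_output: str) -> int:
    """Return the instance id with the lowest reported CPU percentage.

    Ties are broken in favour of the lower instance id, because tuples
    compare element by element.
    """
    readings = [
        parse_ps_line(line)
        for line in ps_output.splitlines()
        if "INFO_INSTANCE_ID=" in line
    ]
    _, instance_id = min(readings)
    return instance_id


print(pick_lowest_cpu(SAMPLE_PS_OUTPUT))  # instance 4 reports 7.6, the lowest value
```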

akka-apps sends redis events to a new dedicated channel for each of the bbb-html5 instances (so the redis events are filtered by meeting: each process only hears what it needs to handle, which prevents parallel processes from handling the same events)
to-html5-redis-channel1 - where 1 is the instanceId

No need to "open" each redis message in every process to see whether it is to be handled by that process.
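
As a rough sketch of the per-instance channel idea: the publisher looks up which instance owns a meeting and derives the channel name from the instance id. Only the `to-html5-redis-channel` prefix comes from the description above; the `MeetingRouter` class and its methods are hypothetical, and no real redis client is involved.

```python
from typing import Dict

CHANNEL_PREFIX = "to-html5-redis-channel"  # prefix as given in the description


def channel_for_instance(instance_id: int) -> str:
    """Build the channel name for one bbb-html5 instance."""
    return f"{CHANNEL_PREFIX}{instance_id}"


class MeetingRouter:
    """Hypothetical publisher-side lookup: meeting id -> owning instance."""

    def __init__(self) -> None:
        self._instance_by_meeting: Dict[str, int] = {}

    def assign(self, meeting_id: str, instance_id: int) -> None:
        # Recorded once, when the meeting is created.
        self._instance_by_meeting[meeting_id] = instance_id

    def channel_for_meeting(self, meeting_id: str) -> str:
        # Raises KeyError for meetings that were never assigned.
        return channel_for_instance(self._instance_by_meeting[meeting_id])


router = MeetingRouter()
router.assign("meeting-a", 1)
router.assign("meeting-b", 3)
print(router.channel_for_meeting("meeting-a"))  # to-html5-redis-channel1
print(router.channel_for_meeting("meeting-b"))  # to-html5-redis-channel3
```

Because each instance subscribes only to its own channel, it never receives events for meetings it does not own and has nothing to filter.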

Removed the bigbluebutton.properties settings for picking HTML5 over Flash (reverting "Provide option to join via html5 over flash client" #4754); the HTML5 client is now considered the default.

TODO:

Add the INFO_INSTANCE_ID= to systemd_start.sh

- PORT=$PORT /usr/share/$NODE_VERSION/bin/node main.js
+ PORT=$PORT /usr/share/$NODE_VERSION/bin/node main.js INFO_INSTANCE_ID=$INSTANCE_ID

Go through the akka-apps messages and ensure the 21 messages needed by bbb-web are routed to it and the 82 needed by the html5 client are routed to it (thus reducing the noise on the from-akka-apps* channels)

Related to #10933 #10868 #10860 #10969

defnull · 2020-12-14T09:08:15Z

Using current load per node process is not a very good metric and might actually be worse than just plain simple round-robin. I see several problems here:

The current load of a (single-threaded) node process is either 0% or 100%. You always need to measure over a certain amount of time to get an average. Thus, the ps tool does not actually return current CPU usage, but "the percentage of time spent running during the entire lifetime of a process". Node processes are long-lived. A large meeting maxing out a node process would only raise its ps reported CPU usage very slowly. A node process that is currently idling will still report high usage caused by a large meeting in the past. The reading is useless for load-balancing.
Waiting for MongoDB does not count towards CPU usage, but is an important factor for meteor applications.
Load-balancing based on a single metric with no random factor and no instant penalty after each request is very vulnerable to a variant of the "thundering herd" problem. The current 'best' backend will receive all new requests until its load actually rises, which, as we learned above, may take some time. This is really bad in scenarios where lots of meetings start at the same time (schools, universities).
Meetings usually start slowly. Once the users actually join and load increases, they cannot be moved. You cannot know in advance how big a meeting will be, or which features are going to be used. Load-balancing meetings, instead of requests, is a gamble. Two large meetings may end up on the same backend and there is no way to prevent that.
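
A small worked example may make the first point concrete. It assumes, as described above, that the `ps` reading is CPU time divided by the process lifetime; the two instances and their histories are invented purely for illustration.

```python
def lifetime_pcpu(cpu_seconds: float, elapsed_seconds: float) -> float:
    """CPU time as a percentage of the whole process lifetime."""
    return 100.0 * cpu_seconds / elapsed_seconds


DAY = 24 * 60 * 60  # seconds

# Instance A (invented): about 1% busy for a day, then saturated for the
# last 10 minutes by a large meeting.
reading_a = lifetime_pcpu(cpu_seconds=0.01 * DAY + 600, elapsed_seconds=DAY + 600)

# Instance B (invented): ran a 4-hour meeting at full load earlier, idle now.
reading_b = lifetime_pcpu(cpu_seconds=4 * 3600, elapsed_seconds=DAY)

print(f"A: {reading_a:.2f}%  B: {reading_b:.2f}%")  # roughly 1.68% vs 16.67%

# A "lowest reading wins" rule would send the next meeting to A,
# even though A is the instance that is saturated right now.
chosen = "A" if reading_a < reading_b else "B"
print(chosen)
```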

We currently have the same discussion for scalelite, and no real solution yet. It sounds intuitive to allocate new meetings to the server/process with the lowest load, but if you take into account the "thundering herd" problem and that you cannot predict the future of a meeting, it is not that easy anymore. It is not possible to make an informed decision before you have the required data. Trying to do so might actually be worse than a naive round-robin approach.

antobinary · 2020-12-15T01:57:16Z

@defnull Thank you for your comment! I have set the load balancing to use a round-robin approach, at least for now.
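
For comparison, a minimal sketch of round-robin allocation next to a "lowest reading" rule working from stale numbers. The class and function names are hypothetical, and this is not the code merged in this pull request; a real allocator inside a multi-threaded service would also need the counter to be thread-safe.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterable


class RoundRobinAllocator:
    """Hand out instance ids in a fixed rotation, ignoring load entirely."""

    def __init__(self, instance_ids: Iterable[int]) -> None:
        self._rotation = cycle(list(instance_ids))

    def next_instance(self) -> int:
        return next(self._rotation)


def lowest_reading(readings: Dict[int, float]) -> int:
    """Pick the instance with the smallest (possibly stale) CPU reading."""
    return min(readings, key=readings.get)


# Readings taken from the ps listing in the description; they do not change
# during a burst of near-simultaneous meeting creations.
stale_readings = {1: 7.7, 2: 7.8, 3: 7.7, 4: 7.6}

allocator = RoundRobinAllocator([1, 2, 3, 4])
burst_round_robin = Counter(allocator.next_instance() for _ in range(8))
burst_lowest = Counter(lowest_reading(stale_readings) for _ in range(8))

print(dict(burst_round_robin))  # two meetings on each of the four instances
print(dict(burst_lowest))       # all eight meetings on instance 4
```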

antobinary · 2020-12-15T22:07:32Z

I will finish the redis message routing to the to-html5-redis-channel{N} channels in another pull request. I already have an if-else condition that ensures only one nodejs instance handles a meeting's messages.
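
The interim if-else guard is not shown in this thread. Purely as an assumption about its shape, a consumer-side check could look like the sketch below, where the message layout (`meetingId` key), the ownership map, and both function names are hypothetical.

```python
from typing import Dict, List


def should_handle(meeting_id: str, my_instance_id: int,
                  instance_by_meeting: Dict[str, int]) -> bool:
    """True only for the instance that owns the meeting (assumed rule)."""
    return instance_by_meeting.get(meeting_id) == my_instance_id


def handle_messages(messages: List[dict], my_instance_id: int,
                    instance_by_meeting: Dict[str, int]) -> List[dict]:
    """Return the messages this instance would process; skip the rest."""
    handled = []
    for message in messages:
        if should_handle(message["meetingId"], my_instance_id, instance_by_meeting):
            handled.append(message)
        else:
            # Another instance owns this meeting, so this one ignores it.
            continue
    return handled


owners = {"meeting-a": 1, "meeting-b": 2}
inbox = [
    {"meetingId": "meeting-a", "name": "event-1"},
    {"meetingId": "meeting-b", "name": "event-2"},
    {"meetingId": "meeting-a", "name": "event-3"},
]
print([m["name"] for m in handle_messages(inbox, 1, owners)])  # event-1, event-3
print([m["name"] for m in handle_messages(inbox, 2, owners)])  # event-2
```

With a guard like this every instance still receives and inspects every message, which is the overhead the dedicated per-instance channels are meant to remove.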

antobinary added 2 commits December 9, 2020 19:11

remove obsolete attendeesJoinViaHTML5Client moderatorsJoinViaHTML5Client

418fdb1

Loadbalance bbb-html5 in bbb-web based on CPU

0be8773

antobinary requested a review from pedrobmarin December 11, 2020 21:53

This was referenced Dec 11, 2020

Support for parallel bbb-html5 nodejs processes. Part 1 #10860

Merged

Performance Issues after 2.2.28: NodeJS concurrent users limit #10739

Closed

antobinary added this to the Release 2.3 milestone Dec 11, 2020

antobinary added 2 commits December 11, 2020 23:46

cleanup

09ca742

dispatch whiteboard events to html5 redis channel only

9239d51

Set bbb-html5 loadbalancing to be round robin

8b65f9e

remove obsolete callback from user remove

99ff801

antobinary marked this pull request as ready for review December 15, 2020 22:08

antobinary merged commit 65f2d5b into bigbluebutton:develop Dec 15, 2020

antobinary mentioned this pull request Dec 16, 2020

Change bigbluebutton.properties client url param to defaultHTML5ClientUrl #11028

Merged

schrd mentioned this pull request Jan 4, 2021

Split html5-server in multiple processes to larger meetings #10349

Merged

antobinary mentioned this pull request Jan 5, 2021

Handle guestWait url for multiple nodejs instanceIds #11104

Merged

This was referenced Apr 29, 2024

Clean up backend vs frontend + loadbalancing for bbb-html5 package #20106

Open

refactor/build: drop html5InstanceId and simplify bbb-html5 frontend/backend #20132

Draft
