bbb-web to loadbalance bbb-html5 instances #11008

Merged
merged 6 commits into from Dec 15, 2020

antobinary
Member

@antobinary antobinary commented Dec 11, 2020

  1. bbb-web load-balances across all bbb-html5 instances: it picks the instance with the lowest CPU load and assigns the new meeting to it.
$ ps -u meteor -o pcpu,cmd= | grep node-
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=3
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=1
 7.6 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=4
 7.8 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=2

This ensures that a new meeting is handled by the process with the most capacity to do so. There is no need to count meetings or users; CPU load alone is used.
A meta parameter on meeting create is no longer needed to make use of the parallel nodejs processes.
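To make the selection concrete, here is a minimal Python sketch of "pick the lowest CPU" applied to the `ps` output above. It is an illustration only, not the bbb-web code from this PR; the function names and the regular expression are assumptions made for the example.

```python
import re
from typing import Dict

# Sample output copied from the `ps` command shown above.
PS_OUTPUT = """\
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=3
 7.7 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=1
 7.6 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=4
 7.8 /usr/share/node-v12.16.1-linux-x64/bin/node main.js INFO_INSTANCE_ID=2
"""

# Hypothetical pattern: leading %CPU value, trailing INFO_INSTANCE_ID=<n>.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+.*\bINFO_INSTANCE_ID=(\d+)\s*$")


def parse_cpu_by_instance(ps_output: str) -> Dict[int, float]:
    """Map instance id -> %CPU as reported by ps; lines that do not match are skipped."""
    cpu_by_instance: Dict[int, float] = {}
    for line in ps_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            cpu_by_instance[int(match.group(2))] = float(match.group(1))
    return cpu_by_instance


def pick_lowest_cpu(cpu_by_instance: Dict[int, float]) -> int:
    """Return the instance id with the lowest reported %CPU (ties go to the lower id)."""
    if not cpu_by_instance:
        raise ValueError("no bbb-html5 instances found")
    return min(cpu_by_instance, key=lambda i: (cpu_by_instance[i], i))


if __name__ == "__main__":
    print(pick_lowest_cpu(parse_cpu_by_instance(PS_OUTPUT)))
```

With the sample output above, the instance reporting 7.6 (INFO_INSTANCE_ID=4) is the one selected.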

  2. akka-apps sends redis events to a new dedicated channel for each bbb-html5 instance, so the redis events are filtered by meeting: each process hears only what it needs to handle, and parallel processes do not handle the same events.
    to-html5-redis-channel1 - where 1 is the instanceId

No need to "open" each redis message in every process to see whether it is to be handled by that process.
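As a rough sketch of the routing idea, the snippet below derives the channel name from the instance id and looks up the channel for a meeting. Only the `to-html5-redis-channel` prefix comes from this PR; the `MeetingRouter` class, its methods, and the meeting ids are hypothetical and do not reflect the akka-apps implementation.

```python
from typing import Dict

CHANNEL_PREFIX = "to-html5-redis-channel"  # prefix taken from the PR description


def channel_for_instance(instance_id: int) -> str:
    """Build the per-instance channel name, e.g. instance 1 -> to-html5-redis-channel1."""
    if instance_id < 1:
        raise ValueError("instance ids are assumed to start at 1")
    return f"{CHANNEL_PREFIX}{instance_id}"


class MeetingRouter:
    """Hypothetical helper: remembers which instance was assigned to each meeting."""

    def __init__(self) -> None:
        self._instance_by_meeting: Dict[str, int] = {}

    def assign(self, meeting_id: str, instance_id: int) -> None:
        self._instance_by_meeting[meeting_id] = instance_id

    def channel_for_meeting(self, meeting_id: str) -> str:
        # KeyError here means the meeting was never assigned to an instance.
        return channel_for_instance(self._instance_by_meeting[meeting_id])


if __name__ == "__main__":
    router = MeetingRouter()
    router.assign("meeting-a", 1)
    router.assign("meeting-b", 3)
    print(router.channel_for_meeting("meeting-a"))
    print(router.channel_for_meeting("meeting-b"))
```

Because the publisher chooses the channel, a subscriber on channel 3 never receives events for a meeting assigned to instance 1.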

  3. Removed the bigbluebutton.properties setting for picking HTML5 over Flash (reverting "Provide option to join via html5 over flash client" #4754); the HTML5 client is now considered the default.

TODO:

  • Add the INFO_INSTANCE_ID= to systemd_start.sh
- PORT=$PORT /usr/share/$NODE_VERSION/bin/node main.js
+ PORT=$PORT /usr/share/$NODE_VERSION/bin/node main.js INFO_INSTANCE_ID=$INSTANCE_ID
  • Go through the akka-apps messages and ensure that the 21 messages needed by bbb-web are routed to it and the 82 needed by the html5 client are routed to it (thus reducing the noise on the from-akka-apps* channels)

Related to #10933 #10868 #10860 #10969

@defnull
Contributor

defnull commented Dec 14, 2020

Using current load per node process is not a very good metric and might actually be worse than just plain simple round-robin. I see several problems here:

  • The current load of a (single-threaded) node process is either 0% or 100%. You always need to measure over a certain amount of time to get an average. Thus, the ps tool does not actually return current CPU usage, but "the percentage of time spent running during the entire lifetime of a process". Node processes are long-lived. A large meeting maxing out a node process would only raise its ps reported CPU usage very slowly. A node process that is currently idling will still report high usage caused by a large meeting in the past. The reading is useless for load-balancing.
  • Waiting for MongoDB does not count towards CPU usage, but is an important factor for meteor applications.
  • Load-balancing based on a single metric with no random factor and no instant penalty after each request is very vulnerable to a variant of the "thundering herd" problem. The current 'best' backend will receive all new requests until its load actually rises, which, as we learned above, may take some time. This is really bad in scenarios where lots of meetings start at the same time (schools, universities).
  • Meetings usually start slowly. Once the users actually join and load increases, they cannot be moved. You cannot know in advance how big a meeting will be, or which features are going to be used. Load-balancing meetings, instead of requests, is a gamble. Two large meetings may end up on the same backend and there is no way to prevent that.
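To put numbers on the first point, the sketch below computes the lifetime-average figure that `ps -o pcpu` reports (CPU time divided by elapsed time) for a made-up process. All figures are hypothetical: a node process that has idled at 5% for a day and then runs flat out for ten minutes.

```python
def lifetime_cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """CPU time divided by wall-clock time, as a percentage (the ratio ps reports as pcpu)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


# Hypothetical scenario, chosen only to illustrate the effect.
DAY = 24 * 60 * 60          # process has been alive for one day
BURST = 10 * 60             # then a large meeting keeps it busy for 10 minutes
cpu_before = 0.05 * DAY     # 5% average usage during that day

before = lifetime_cpu_percent(cpu_before, DAY)
after = lifetime_cpu_percent(cpu_before + BURST, DAY + BURST)
recent = lifetime_cpu_percent(BURST, BURST)  # usage measured over the burst window only

if __name__ == "__main__":
    print(f"pcpu before burst: {before:.2f}%")
    print(f"pcpu after 10 min at full load: {after:.2f}%")
    print(f"usage over the last 10 min: {recent:.2f}%")
```

In this scenario the lifetime figure moves from 5% to under 6% while the process is actually saturated, which is why a windowed measurement would be needed before the number could say anything about current load.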

We currently have the same discussion for scalelite, and no real solution yet. It sounds intuitive to allocate new meetings to the server/process with the lowest load, but if you take into account the "thundering herd" problem and that you cannot predict the future of a meeting, it is not that easy anymore. It is not possible to make an informed decision before you have the required data. Trying to do so might actually be worse than a naive round-robin approach.
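The "thundering herd" effect can be reproduced with a toy simulation. This is a sketch under strong assumptions: the reported load does not change during the burst, and the "instant penalty" is an arbitrary constant added to the local copy of the load after each assignment. Neither reflects how bbb-web or scalelite actually behave.

```python
from collections import Counter
from typing import Dict


def assign_burst(reported_load: Dict[int, float], n_meetings: int,
                 penalty_per_meeting: float = 0.0) -> Counter:
    """Assign n_meetings to the backend with the lowest load, one at a time.

    The reported load is treated as stale for the whole burst. With a non-zero
    penalty, the local estimate is bumped after every assignment.
    """
    load = dict(reported_load)  # local copy, never refreshed from the backends
    counts: Counter = Counter()
    for _ in range(n_meetings):
        best = min(load, key=lambda i: (load[i], i))
        counts[best] += 1
        load[best] += penalty_per_meeting
    return counts


# Loads taken from the ps output in the PR description.
REPORTED = {1: 7.7, 2: 7.8, 3: 7.7, 4: 7.6}

if __name__ == "__main__":
    print("no penalty:  ", dict(assign_burst(REPORTED, 20)))
    print("with penalty:", dict(assign_burst(REPORTED, 20, penalty_per_meeting=1.0)))
```

Without a penalty, every meeting in the burst lands on instance 4, because nothing changes its reported load in time. With the penalty the burst spreads across all four instances, but the result then depends mostly on the penalty rather than on the measurement.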

@antobinary
Member Author

@defnull Thank you for your comment! I have switched the load balancing to a round-robin approach, at least for now.
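For reference, round-robin selection can be as small as the sketch below. The class name and the lock are assumptions for illustration (the lock guards against concurrent create requests); this is not the code merged in this PR.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinPicker:
    """Hand out instance ids in a fixed rotating order."""

    def __init__(self, instance_ids: Iterable[int]) -> None:
        ids = list(instance_ids)
        if not ids:
            raise ValueError("at least one instance id is required")
        self._cycle = itertools.cycle(ids)
        self._lock = threading.Lock()

    def next_instance(self) -> int:
        # The lock keeps two concurrent callers from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    picker = RoundRobinPicker([1, 2, 3, 4])
    print([picker.next_instance() for _ in range(6)])
```

Each new meeting simply takes the next id in the rotation, so a burst of meetings is spread evenly regardless of what any load metric reports.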

@antobinary
Member Author

I will finish routing the redis messages to the to-html5-redis-channel{N} channels in another pull request. I already have an if-else condition that ensures only one nodejs instance handles a meeting's messages.
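A guard of that kind might look roughly like the sketch below. The message shape, the `meetingId` field name, and the lookup table are all hypothetical; the actual condition lives in the bbb-html5 code and is not shown in this conversation.

```python
from typing import Callable, Dict


def handle_if_mine(message: dict, my_instance_id: int,
                   instance_by_meeting: Dict[str, int],
                   handler: Callable[[dict], None]) -> bool:
    """Run handler only when this instance owns the message's meeting.

    Returns True if the message was handled, False if it was skipped.
    """
    meeting_id = message.get("meetingId")  # hypothetical field name
    if instance_by_meeting.get(meeting_id) == my_instance_id:
        handler(message)
        return True
    else:
        # Another instance owns this meeting, or the meeting is unknown.
        return False


if __name__ == "__main__":
    owners = {"meeting-a": 1, "meeting-b": 2}
    handled = []
    for msg in ({"meetingId": "meeting-a"}, {"meetingId": "meeting-b"}):
        handle_if_mine(msg, my_instance_id=1, instance_by_meeting=owners,
                       handler=handled.append)
    print(handled)
```

A filter like this still requires every process to receive and inspect every message, which is the overhead the per-instance channels are meant to remove.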

2 participants