
Workerpool rewrite #22

Merged · merged 2 commits into master from workerpool-rewrite on Jun 9, 2020

Conversation

j1elo (Member) commented on Jun 9, 2020

Rewrite the WorkerPool class, and fix CPU bottlenecks that have been found with Callgrind profiling. See commit messages for detailed description of all changes.

What is the current behavior you want to change?

The WorkerPool class would exhaust all available threads by locking itself into a cycle of detecting saturation and spawning ever more threads.

Also, the CPU stayed at 100% for long spans of time due to bottlenecks in the disconnection implementation.

What is the new behavior provided by this change?

  • The WorkerPool class has been rewritten to use a Boost Asio based thread pool.
  • CPU bottlenecks have been identified with the help of the Callgrind CPU profiler, and fixed.

How has this been tested?

A custom Node.js app based on the kurento-hello-world tutorial, in which a one-to-many scenario of 8000 WebRtcEndpoint connections (1:8000) was set up upon starting the demo. Then, when the user clicked Stop, all endpoints were released at the same time. This was enough to cause the 100% CPU lock for more than 10 minutes on the dev machine.

We also got feedback reporting better behavior after applying this patch. See the conversation and graphs in issue Kurento/bugtracker#462.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature / enhancement (non-breaking change which improves the project)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • My change requires a change to the documentation
  • My change requires a change in other repository

Checklist

  • I have read the Contribution Guidelines
  • I have added an explanation of what the changes do and why they should be included
  • I have written new tests for the changes, as applicable, and have successfully run them locally

Fixes Kurento/bugtracker#462
Fixes Kurento/bugtracker#468
Possibly related: Kurento/bugtracker#409

j1elo added 2 commits May 22, 2020
The old implementation was using 2 thread pools: the main one, used to
schedule and run tasks in secondary threads; and a helper one, used to
schedule a health check routine.

This design wasn't good, because EACH SINGLE CALL to WorkerPool::post()
implied TWO new tasks being scheduled:
1. The task itself.
2. A timed task to measure execution time.

If the execution time was longer than 3 seconds, the task was considered
"locked" and somehow assumed to never recover, so a new thread would be
spawned.

Now picture this: the server is under heavy load and tasks start taking
too long to finish. The existing implementation would start to consider
all tasks locked, and would spawn threads for all of them, which soon
enough would again take too long, thus causing an explosion of thread
spawns.

The fact that two tasks were scheduled per post() only exacerbated the
issue.

The new code is much simpler. We realize that, for now, in case of CPU
exhaustion there is not much we can do: tasks will take longer, and
growing the number of threads won't necessarily help.

The code is based on the simple but effective ideas shown in several
sources:

* Asio C++ library recipes: A thread pool for executing arbitrary tasks
  - http://think-async.com/Asio/Recipes
  - https://stackoverflow.com/questions/19500404/how-to-create-a-thread-pool-using-boost-in-c/19500405#19500405

* Boost Asio timer tutorial
  - https://www.boost.org/doc/libs/1_72_0/doc/html/boost_asio/tutorial.html

Versions of Boost >= 1.66.0 have a 'thread_pool' class:
https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/thread_pool.html
which might have been interesting for this, but right now we have to
support Boost 1.58.0 (Ubuntu 16.04 Xenial) and Boost 1.65.1 (Ubuntu
18.04 Bionic).
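
For reference, the thread-pool recipe linked above boils down to a single io_service kept alive by a work guard, with a fixed set of threads blocked in run() that execute whatever gets post()ed. Below is a minimal sketch of that recipe, assuming Boost 1.58+; the class and member names (SimplePool, etc.) are illustrative, not the actual WorkerPool API in kms-core:

```cpp
// Minimal sketch of the Asio thread-pool recipe (illustrative names,
// not the kms-core WorkerPool implementation).
#include <boost/asio/io_service.hpp>
#include <boost/thread/thread.hpp>
#include <cstddef>
#include <memory>
#include <utility>

class SimplePool {
public:
  explicit SimplePool(std::size_t num_threads)
      : work_(new boost::asio::io_service::work(io_)) {
    for (std::size_t i = 0; i < num_threads; ++i) {
      // Each thread blocks in run() and executes posted handlers.
      threads_.create_thread([this]() { io_.run(); });
    }
  }

  ~SimplePool() {
    work_.reset();       // Let run() return once the queue drains.
    threads_.join_all();
  }

  template <typename F>
  void post(F &&task) {
    // Exactly one handler per post(); no companion watchdog task.
    io_.post(std::forward<F>(task));
  }

private:
  boost::asio::io_service io_;
  std::unique_ptr<boost::asio::io_service::work> work_;
  boost::thread_group threads_;
};
```

The key difference with the old design is that post() schedules only the task itself; there is no second timed task per call, and no logic that spawns extra threads when a task takes too long.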
MediaElementImpl::disconnect(), called from
MediaElementImpl::disconnectAll(), had to perform multiple searches in
order to find the desired GstPad, only to then grab its name and pass it
to the C plugin, which in turn had to search again for that name in the
list of element pads.

Iterating over all the connected pads multiple times, first on the C++
side and later on the C side, is very inefficient; in our tests with
8000 connected endpoints, the CPU was wasted on mere lookups trying to
find a GstPad by its name. Depending on the server, this could take up
to 10 minutes with the CPU at 100%!

Instead, let's try this: upon creation we already have a pointer to the
GstPad. Why store its name and later have to search for it multiple
times? It is probably more efficient to just store the pointer and use
it directly whenever the element wants to disconnect (see the private
class ElementConnectionDataInternal).

Fixes Kurento/bugtracker#462
Fixes Kurento/bugtracker#468
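
For orientation, here is a minimal sketch of that idea, assuming the pads are ref'd when the connection is created; ConnectionData and connection_data_disconnect are hypothetical stand-ins used for illustration, not the actual ElementConnectionDataInternal code:

```cpp
// Illustrative sketch: keep the GstPad pointers obtained at connection
// time, so disconnecting needs no name-based pad lookups.
#include <gst/gst.h>

struct ConnectionData {
  GstPad *src_pad;   // Ref'd when the connection was created.
  GstPad *sink_pad;  // Ref'd when the connection was created.
};

static void
connection_data_disconnect (ConnectionData *conn)
{
  // Direct unlink using the stored pointers, instead of iterating the
  // element's pad list to find a pad by its name.
  gst_pad_unlink (conn->src_pad, conn->sink_pad);
  gst_object_unref (conn->src_pad);
  gst_object_unref (conn->sink_pad);
}
```

With thousands of connections, replacing repeated by-name searches with a stored pointer turns each disconnect into constant-time work.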
j1elo self-assigned this on Jun 9, 2020
j1elo (Member, Author) commented on Jun 9, 2020

Testing shows much better behavior when disconnecting thousands of consumer WebRtcEndpoints from a single producer.

This is a graph of CPU usage (%) during a load test. The test went up to 12 participants. The moment the test ends (4:24:30), all 12 participants leave at the same time and the thread lock bug is triggered, with the CPU staying at 100%:

[Graph: CPU usage stuck at 100% after the test ends at 4:24:30]

With the fix proposed in https://github.com/Kurento/kms-core/tree/workerpool-rewrite, the test ends (4:44:30) and CPU comes back to idle levels:
[Graph: CPU usage returning to idle after the test ends at 4:44:30]

j1elo (Member, Author) commented on Jun 9, 2020

Paulo Lanzarin also ran some tests: Kurento/bugtracker#462 (comment)

Scenario: 30 participants with video, full-star topology, on an 8 vCPU Xeon E5-2630 box with 8 GB RAM.
Join times are spaced ~5 s apart; leave times are scheduled in batches of 5 participants every minute (a rather friendly scenario). The lock could be reproduced with current master (see graph Before), but couldn't be reproduced with workerpool-rewrite (see graph After).

Before

[Graph: 30x30, dynamic profiles, CPU usage (before)]

After

[Graph: 30x30, dynamic profiles, CPU usage (after)]
