Load Testing on Docker

Henrique Rosa edited this page Apr 13, 2016 · 3 revisions

I've been conducting load tests to compare the performance of the RestComm binary against the RestComm-Docker image. For this purpose, I developed a RestComm application that performs a Gather and then Says the digit pressed by the caller. The call duration is 6 seconds.

I set up an EC2 m3.2xlarge instance with the specifications shown below. I chose this instance type mainly for its higher core count. More cores greatly improve RestComm performance, since the Media Server allocates a number of threads proportional to the number of available cores.

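| Model      | vCPU | Mem (GiB) | SSD Storage (GB) |
|------------|------|-----------|------------------|
| m3.medium  | 1    | 3.75      | 1 x 4            |
| m3.large   | 2    | 7.5       | 1 x 32           |
| m3.xlarge  | 4    | 15        | 2 x 40           |
| m3.2xlarge | 8    | 30        | 2 x 80           |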

Before running the tests, I configured RestComm as follows:

  • Set the logging threshold to ERROR (both JBoss and AKKA).
  • Configure the Media Server's resource pools ($MS_HOME/deploy/server-beans.xml) to accommodate load peaks by increasing the initial size of each pool (endpoints, players, connections, etc.).
  • Increase the MGCP timeout in restcomm.xml to 1000 ms (from 500 ms).
  • Configure the JVM for both RestComm and the Media Server as follows:

JAVA_OPTS="-Xms8g -Xmx8g -Xmn512m -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycle=100 -XX:CMSIncrementalDutyCycleMin=100 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:MaxPermSize=512m"
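As a rough sanity check on these settings, the sketch below adds up the maximum heap and PermGen that the two JVMs (RestComm and the Media Server) may reserve and compares the total with the 30 GiB of the m3.2xlarge instance. The helper names `size_gib` and `reserved_gib` are hypothetical, and the estimate deliberately ignores thread stacks and other native memory, so treat it as a lower bound rather than an exact figure.

```python
import re

# JVM options exactly as configured for both RestComm and the Media Server.
JAVA_OPTS = ("-Xms8g -Xmx8g -Xmn512m -XX:+CMSIncrementalPacing "
             "-XX:CMSIncrementalDutyCycle=100 -XX:CMSIncrementalDutyCycleMin=100 "
             "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:MaxPermSize=512m")

# JVM size suffixes are binary multiples (k = 1024 bytes, and so on).
UNIT_TO_GIB = {"k": 1 / (1024 * 1024), "m": 1 / 1024, "g": 1.0}


def size_gib(opts, flag):
    """Return the size assigned to `flag` (e.g. '-Xmx') in GiB."""
    match = re.search(re.escape(flag) + r"=?(\d+)([kmg])", opts, re.IGNORECASE)
    if match is None:
        raise ValueError(f"{flag} not found in options")
    return int(match.group(1)) * UNIT_TO_GIB[match.group(2).lower()]


def reserved_gib(opts, jvm_count):
    """Maximum heap plus PermGen for `jvm_count` JVMs sharing the same options."""
    per_jvm = size_gib(opts, "-Xmx") + size_gib(opts, "-XX:MaxPermSize")
    return per_jvm * jvm_count


INSTANCE_GIB = 30  # m3.2xlarge memory, from the instance table
total = reserved_gib(JAVA_OPTS, jvm_count=2)  # RestComm + Media Server
print(f"{total:.1f} GiB of {INSTANCE_GIB} GiB reserved by the two JVMs")
```

Under these assumptions, the two processes together stay well below the memory available on the instance.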

The binary load tests produced the following results:

| Concurrent Calls | Call Rate (cps) | Total Calls | Successful | Failed | Comments |
|------------------|-----------------|-------------|------------|--------|----------|
| 86               | 10              | 20000       | 20000      | 0      |          |
| 150              | 50              | 50000       | 50000      | 0      |          |
| 167              | 20              | 50000       | 50000      | 0      |          |
| 200              | 30              | 50000       | 50000      | 0      |          |
| 260              | 30              | 50000       | 49954      | 46     | 46 timeouts, because SIPp did not receive BYE; RestComm logs show a SIP Servlet exception when sending BYE; no errors on the Media Server side |
| 300 (peak 355)   | 30              | 100000      | 66744      | 18678  | Test aborted after 85422 calls; MGCP timeouts started after 50k calls; from then on results got worse and a CPU leak soon rendered the MS unresponsive |
| 250              | 50              | 44020       | 31824      | 12196  | MGCP timeouts; no errors in the MS log; MS CPU leak (prevented the test from continuing) |
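To put the failure counts in proportion, the following sketch computes the failure rate of each run from the Successful and Failed columns of the results table. The `failure_rate` helper is a hypothetical name, and the rate is taken over completed calls (successful plus failed) rather than the planned total, which is an assumption that matters for the aborted run.

```python
# (concurrent calls, call rate in cps, successful, failed), copied from the results table.
RUNS = [
    ("86", 10, 20000, 0),
    ("150", 50, 50000, 0),
    ("167", 20, 50000, 0),
    ("200", 30, 50000, 0),
    ("260", 30, 49954, 46),
    ("300 (peak 355)", 30, 66744, 18678),
    ("250", 50, 31824, 12196),
]


def failure_rate(successful, failed):
    """Fraction of completed calls that failed."""
    completed = successful + failed
    return failed / completed if completed else 0.0


for concurrent, cps, ok, failed in RUNS:
    print(f"{concurrent:>15} calls @ {cps:>2} cps: "
          f"{failure_rate(ok, failed):.3%} failed")
```

Computed this way, the 260-call run fails in under 0.1% of calls, while the two runs that hit MGCP timeouts fail in more than a fifth of them.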

The best result I could obtain was 260 concurrent calls at a call rate of 30 calls per second. Although 46 calls failed, these failures happened because RestComm was unable to send a BYE back to SIPp; in practice the calls were established successfully and only failed to hang up gracefully. Higher loads started to generate MGCP timeouts from the Media Server, which led to a CPU leak that quickly degraded the test results and ultimately left the Media Server unresponsive.

Finally, I ran the same round of tests against the RestComm-Docker image, using a configuration similar to the one described above. One detail worth mentioning is that Docker was running in host mode (--net=host) instead of the default bridge mode, because bridge mode has well-known performance issues. The highest load that resulted in a clean run was 60 concurrent calls at a call rate of 15 calls per second, which is a considerable performance hit.
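The size of that hit can be expressed as a ratio against the binary results. The sketch below is only back-of-the-envelope arithmetic on the figures reported above. It shows two baselines because the choice is a judgment call: the best binary run overall (260 concurrent calls, with 46 failed BYEs) and the best binary run with zero failures (200 concurrent calls). The `relative` helper is a hypothetical name.

```python
# Best runs reported above: (concurrent calls, call rate in cps).
BINARY_BEST = (260, 30)    # best binary run overall (46 failed BYEs out of 50000)
BINARY_CLEAN = (200, 30)   # best binary run with zero failures
DOCKER_CLEAN = (60, 15)    # best clean Docker run, host networking (--net=host)


def relative(docker, binary):
    """Docker figure as a fraction of the corresponding binary figure."""
    return docker / binary


for label, baseline in (("best", BINARY_BEST), ("clean", BINARY_CLEAN)):
    calls = relative(DOCKER_CLEAN[0], baseline[0])
    rate = relative(DOCKER_CLEAN[1], baseline[1])
    print(f"vs {label} binary run: {calls:.0%} of the concurrent calls, "
          f"{rate:.0%} of the call rate")
```

Against either baseline, the Docker image sustained less than a third of the concurrent calls and half the call rate.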

Conclusions

My conclusion so far is that Docker imposes a performance penalty on the RestComm-Docker image. It might help to investigate well-known issues and bottlenecks inherent to Docker and the workarounds currently adopted by the community. For example, running the Docker container in host mode helps obtain better performance and reduces memory consumption compared to bridge mode. It may also be worth investigating Docker performance with the OverlayFS filesystem, as described here.

On the other hand, improving the responsiveness of the Media Server’s MGCP stack should reduce the number of timeouts, which in turn should help prevent the CPU leak. That would translate into better test results and might even allow higher call rates. Media Server issues to keep an eye on: #109, #58, #60, #92.