2.16 Load Testing Synopsis

Purpose

The purpose of these tests was to determine the approximate maximum load a single instance of ohmage could withstand while continuing to perform acceptably in specific scenarios. Acceptable performance was generally defined as sub-second response times, with certain "heavy" operations allowed up to two seconds. The heavy operations were uploading and downloading survey response and observer-stream data.

The tests were performed in two parts. The first part was designed to test the observer-stream data, and the second part was designed to test the survey response data.

Hardware

The instance of ohmage under test was a virtual machine running on a single, powerful host. The host was running several other virtual machines, but it was under-provisioned, and the load on those other virtual machines should never have been significant enough to affect the tests.

  • (2) 8-core Intel Sandy Bridge E5-2690 processors with a maximum frequency of 2.9 GHz
  • 256 GB DDR3 RAM operating at 1600 MHz
  • (4) 2 TB ES.2 hard drives operating at 7200 RPM in a RAID10 configuration
  • (2) 1 Gbps network cards, bonded

Software

One point of the test was to see how everything would perform when located on a single machine; distributing the workload would have made the set of tests too complex for us to perform. The software used was:

  • nginx 1.4.1
  • Tomcat 7.0.42
  • OpenJDK 7u25
  • MySQL 5.5

Observer-Stream Data

This test was designed to determine the limits of ohmage when reading from a large dataset. The dataset had approximately 50 million rows, which equaled about 65 GB of data in the database plus an index size of approximately 10 GB, for a total of 75 GB of data in a single database on a single virtual machine running alongside the web application that processed and returned it.

Virtual Machines

Eight virtual machines were used for testing to see where performance began to break down. They were generated by combining the following values (all eight combinations are enumerated in the sketch after this list):

  • 4, 8, 16, or 32 GB of RAM
  • 2 or 4 cores
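
Since the configurations were generated combinatorially, they can be enumerated mechanically; a trivial Python sketch:

```python
from itertools import product

# The eight test configurations: every combination of the RAM sizes and
# core counts listed above (4 RAM sizes x 2 core counts = 8 machines).
RAM_GB = [4, 8, 16, 32]
CORES = [2, 4]

vm_configs = [{"ram_gb": ram, "cores": cores}
              for ram, cores in product(RAM_GB, CORES)]

for config in vm_configs:
    print(config)
```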

Tests

Tests were performed on three subsets of the data. The subsets each had unique properties.

  • Very sparse and small.
  • Moderately frequent and very large.
  • Very frequent and moderately sized.

The latter two were of most interest, but it was also deemed interesting to see how they compared with the first.

Revelations

During the course of testing, a few things were realized.

  • The majority of the time was spent in the database.
  • The two machines with 4 GB of RAM took unacceptably long in all cases (on occasion, 3 to 4 times longer than their counterparts).
  • Unsurprisingly, for these serial tests, the number of cores was irrelevant.
  • The number of data points was much more of a factor than the size of the data points. Therefore, the third type of data being queried was the most time-consuming.
  • The query uses an IN clause, and there is a documented issue with MySQL versions 5.5 and earlier in which an IN clause, under certain conditions (ours being one of them), causes queries to take a very long time.

Results

The tests were initially taking much longer than expected. With an uncached query and uncached data, requests took approximately 10 seconds; once the data was cached, they took closer to 2 seconds.

After investigating the code that generates the query and observing the usage of the system, it was determined that a specific set of columns was being used in each query. Therefore, we crafted a multi-column index covering those columns: "user_id", "observer_stream_link_id", "time_adjusted", and "time". With this index, uncached queries ranged from tens of milliseconds to a few seconds depending on the dataset being queried; once cached, nearly all queries returned in tens or low hundreds of milliseconds.
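
As a sketch of the fix: the exact production query and table are not reproduced in this write-up, so the SELECT and the table and index names below are only an assumed shape (note the IN clause called out above); the column list is the one named in the preceding paragraph.

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(host="localhost", user="ohmage",
                               password="...", database="ohmage")
cursor = conn.cursor()

# The multi-column index over the columns observed in every query. Only
# the column list comes from the analysis above; the index and table
# names are illustrative.
cursor.execute("""
    CREATE INDEX observer_stream_data_read_idx
        ON observer_stream_data
        (user_id, observer_stream_link_id, time_adjusted, time)
""")

# Assumed shape of the read query the index was built to cover,
# including the IN clause noted in the revelations above.
cursor.execute("""
    SELECT data
      FROM observer_stream_data
     WHERE user_id = %s
       AND observer_stream_link_id IN (%s, %s)
       AND time_adjusted BETWEEN %s AND %s
     ORDER BY time
""", (42, 1, 2, 1356998400000, 1372636800000))
rows = cursor.fetchall()
```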

Survey Response Data

The tests for survey response data followed a specific scenario and had two phases: upload, then download.

Virtual Machine

The virtual machine was provisioned with the maximum hardware anyone would sanely give a single machine before investigating distributing the workload.

  • Access to all the CPU power (shared with other virtual machines)
  • 64 GB RAM
  • 320 GB of hard disk space
    • Due to the magnitude of the image data, an NFSv4-backed network share was used to read and write the image data only.

nginx was used to close connections, report timeouts, and handle SSL handshakes, because it performed these tasks faster and more reliably than Tomcat. nginx also served the static pages requested as part of the login flow.

Tomcat was used as the servlet container for the ohmage web application, and MySQL was its datastore. Binary data (e.g., images) was not stored in the database; instead, it was stored on the NFSv4-backed network share.

Scenario

The scenario was based on the requirements of a closely related project called Mobilize. Mobilize is a program designed to help educate high-school students on how to use technology and how to analyze data. The Mobilize team created a curriculum designed to have classes of students use ohmage as a data collection platform.

The scenario began with 40 teachers, with 80 more teachers added each year. Each teacher would teach 2 new classes each year, and each class would have 30 students. The students would have 180 days to upload up to 3 survey responses per day, with a probability of 0.75 that they would actually submit a given survey response. Each survey had 10 prompts, of which one was a photo prompt. Each photo is 98,831 bytes and has two modified images of 3,175 bytes and 2,632 bytes, for a total of (98,831 + 3,175 + 2,632) 104,638 bytes.

As an example, the second year would have (40 + 80) 120 teachers. This would result in (120 * 2) 240 new classes. There would already be (40 * 2) 80 classes from the previous year, so there would now be a total of (240 + 80) 320 classes. The year would begin with the (80 * 30) 2,400 students from the previous year plus the (240 * 30) 7,200 new students for this year, for a total of (2,400 + 7,200) 9,600 students. Each student would submit approximately (180 * 3 * 0.75) 405 survey responses, so the new students would contribute (7,200 * 405) 2,916,000 new survey responses, which, combined with the (2,400 * 405) 972,000 survey responses from the previous year, gives a running total of (2,916,000 + 972,000) 3,888,000 survey responses. This also means there would be (3,888,000 * 10) 38,880,000 prompt responses, of which 3,888,000 would be photo prompt responses with accompanying photos. This led to a total of (3,888,000 * 104,638) 406,832,544,000 bytes, or 378.89 GB, for the images alone at the end of the second year of the simulation.
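
The same bookkeeping, as a short Python check of the numbers above:

```python
# Checks the year-two arithmetic above.
teachers_y1, teachers_added = 40, 80
classes_per_teacher, students_per_class = 2, 30
days, surveys_per_day, p_submit = 180, 3, 0.75
bytes_per_photo = 98_831 + 3_175 + 2_632         # 104,638 bytes per photo set

teachers_y2 = teachers_y1 + teachers_added                       # 120 teachers
new_classes_y2 = teachers_y2 * classes_per_teacher               # 240 new classes
total_classes_y2 = new_classes_y2 + teachers_y1 * classes_per_teacher  # 320
new_students_y2 = new_classes_y2 * students_per_class            # 7,200 students
per_student = int(days * surveys_per_day * p_submit)             # 405 responses

responses_y1 = (teachers_y1 * classes_per_teacher
                * students_per_class * per_student)              # 972,000
responses_total = responses_y1 + new_students_y2 * per_student   # 3,888,000

image_bytes = responses_total * bytes_per_photo
print(image_bytes)          # 406,832,544,000 bytes
print(image_bytes / 2**30)  # ~378.89 GB
```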

The actual numbers for the 3-year test were:

| Field | Value |
| --- | --- |
| Teachers | 360 |
| Classes | 720 |
| Students | 21,600 |
| Survey Responses | 8,722,312 |
| Prompt Responses | 87,223,120 |
| Images | 8,722,312 |
| Survey Response Data | 3,762.328 MB |
| Prompt Response Data | 28,049.797 MB |
| Image Data | 850 GB |

Upload

The uploads were performed using a custom script running on a separate local machine. MySQL was given 32 GB of RAM, and Tomcat was given 28 GB. There was a maximum of 100 concurrent connections between MySQL and Tomcat. Tomcat was configured with 200 concurrent processing threads and 100 queue threads. nginx was configured with a 60-second timeout, meaning it would close the connection to the client while Tomcat might still continue processing the now-dead connection; in those cases the data was successfully stored, but the client never heard about it.

When the script spawned a new thread, it would pick a random user and upload their data.
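
The custom script itself is not part of this write-up; below is a minimal sketch of its behavior under stated assumptions: the endpoint path, the parameter names, and the pre-generated user pool are all illustrative.

```python
import random
import threading
import time

import requests  # assumes the requests package is installed

BASE_URL = "https://ohmage.example.org"
# Hypothetical pre-generated user pool; the real test drew from the
# scenario's student accounts.
USERS = [("student%05d" % i, "password") for i in range(21_600)]

results = []                       # (success, elapsed_ms) per call
results_lock = threading.Lock()

def upload_one():
    """Pick a random user and upload one batch of their survey data."""
    username, password = random.choice(USERS)
    started = time.time()
    try:
        response = requests.post(
            BASE_URL + "/app/survey/upload",  # assumed API path
            data={
                "user": username,
                "password": password,
                # ...plus the campaign URN and the survey payload itself
            },
            timeout=60,  # mirror nginx's 60 second timeout
        )
        success = response.status_code == 200
    except requests.RequestException:
        success = False
    elapsed_ms = (time.time() - started) * 1000
    with results_lock:
        results.append((success, elapsed_ms))

# One thread per concurrent client; the tests used 500, 1,000, and 2,000.
threads = [threading.Thread(target=upload_one) for _ in range(500)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

times = [ms for ok, ms in results if ok]
print("successful calls: %.2f%%" % (100.0 * len(times) / len(results)))
print("mean response time: %.0f ms" % (sum(times) / len(times)))
```

The observed results were: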

| Concurrent Threads | Percentage of Successful Calls | Response Time (ms) |
| --- | --- | --- |
| 500 | 99.93 | 2,104 |
| 1,000 | 84.81 | 4,318 |
| 2,000 | 85.16 | 10,224 |

Download

To test download, a flow from the website was used. The most demanding common flow was to load the home page, log in, click the "Responses" tab, choose a campaign, and view the responses. A JMeter script was created and run on a separate local machine, making the following calls in this order (yes, there are duplicates; a sketch of the flow follows the list):

  • Login Page Request
  • WhoAmI API Request
  • Config API Request
  • Authentication Token API Request
  • Home Page Request
  • WhoAmI API Request
  • Config API Request
  • User Information API Request
  • Campaign API Request
    • List of visible campaigns.
  • Class API Request
    • All users in all visible classes.
  • Campaign API Request
    • List of visible campaigns.
  • Survey Response API Request
    • 1 request for each campaign.
  • Campaign API Request
    • To get the campaign definition.
  • Survey Response API Request
    • To get the list of users that have uploaded survey responses for the selected campaign.
  • Campaign API Request
    • To get the campaign definition.
  • Survey Response API Request
    • To get the survey responses.
  • Survey Response API Request
    • To get the list of users that have uploaded survey responses for the selected campaign.
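
A compressed sketch of the same flow with a plain HTTP client. Every path below is an assumption standing in for the corresponding ohmage endpoint, and the real calls carried authentication and request parameters; the actual test used a JMeter script.

```python
import requests  # assumes the requests package is installed

# Replays the website flow above, one call per list item. All paths are
# assumed stand-ins for the real ohmage API endpoints.
BASE_URL = "https://ohmage.example.org"
CALLS = [
    "/login.html",                # Login Page Request
    "/app/whoami",                # WhoAmI API Request
    "/app/config/read",           # Config API Request
    "/app/user/auth_token",       # Authentication Token API Request
    "/index.html",                # Home Page Request
    "/app/whoami",                # WhoAmI API Request (repeated)
    "/app/config/read",           # Config API Request (repeated)
    "/app/user_info/read",        # User Information API Request
    "/app/campaign/read",         # Campaign API Request (visible campaigns)
    "/app/class/read",            # Class API Request (users in classes)
    "/app/campaign/read",         # Campaign API Request (visible campaigns)
    "/app/survey_response/read",  # Survey Response API Request (per campaign)
    "/app/campaign/read",         # Campaign API Request (definition)
    "/app/survey_response/read",  # Survey Response API Request (user list)
    "/app/campaign/read",         # Campaign API Request (definition)
    "/app/survey_response/read",  # Survey Response API Request (responses)
    "/app/survey_response/read",  # Survey Response API Request (user list)
]

session = requests.Session()
timings = []
for path in CALLS:
    # Real calls carry an auth token and request parameters; this sketch
    # only captures the shape and order of the flow.
    response = session.post(BASE_URL + path)
    timings.append((path, response.elapsed.total_seconds() * 1000))

for path, ms in timings:
    print("%-28s %8.1f ms" % (path, ms))
```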

The most demanding and important API request was for the survey responses. We quickly noticed that the first survey response API request was for the list of users who had uploaded survey responses. This meant that by the time the request for the actual survey response data was made, it performed very well, most likely because MySQL had already cached the data. By comparison, the request for the list of users was very slow.

We used a combination of dropping an unnecessary column, running OPTIMIZE TABLE, and rewriting the query, which gave the following results:

| Number of Concurrent Threads | Average Response Time (ms) | 90% Line (ms) |
| --- | --- | --- |
| 100 | 112 | 116 |
| 200 | 778 | 2,415 |
| 300 | 651 | 1,557 |
| 400 | 608 | 1,000 |
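
For concreteness, the cleanup was of this general shape; the write-up does not name the dropped column or show the rewritten query, so the table and column identifiers below are hypothetical placeholders.

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(host="localhost", user="ohmage",
                               password="...", database="ohmage")
cursor = conn.cursor()

# Drop the unnecessary column. "unused_column" is a hypothetical
# placeholder; the write-up does not name the actual column.
cursor.execute("ALTER TABLE survey_response DROP COLUMN unused_column")

# Rebuild the table and refresh its index statistics after the change.
cursor.execute("OPTIMIZE TABLE survey_response")
cursor.fetchall()  # OPTIMIZE TABLE returns a status row; consume it
```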