
Load Testing Plan

Pierre Bastianelli edited this page Jun 3, 2021 · 3 revisions

This document describes the load-testing procedure run on June 1st

Objectives

  • Verify that the system can handle the high load expected as the application submission deadline approaches
  • Collect performance data on queries and mutations
  • Assess which operations can be optimized

Tools

  • k6, an open-source load-testing toolkit
  • Node.js to orchestrate the k6 runs
  • an ad-hoc Helm chart to deploy load-testing data and run a load-testing job on the OpenShift platform

Testing plan

We split the load testing into two plans:

Queries: load test

  • Admin queries and Reporter queries were split into two separate load-testing scenarios, run sequentially.
  • Each VU (Virtual User) cycles through all the queries as fast as it can. k6 lets us specify how many VUs
    run for how long, and interpolates between the points we give.
// k6 options used for the queries load test
// (in k6, stage definitions live in the exported `options` object)
export const options = {
  stages: [
    {duration: '2m', target: 100},
    {duration: '2m', target: 100},
    {duration: '1m', target: 200},
    {duration: '1m', target: 200},
    {duration: '1m', target: 100},
    {duration: '2m', target: 100},
    {duration: '2m', target: 0}
  ]
};
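
k6 interpolates the VU count linearly between stage targets. As a sanity check on the plan above, the interpolation can be reproduced in plain Node (this helper is ours, not a k6 API):

```javascript
// Compute the linearly interpolated VU target at a given elapsed time
// for a k6-style stage list. Durations are in seconds here for simplicity.
function vusAt(stages, startVus, elapsed) {
  let t = 0;
  let prev = startVus;
  for (const { duration, target } of stages) {
    if (elapsed <= t + duration) {
      const frac = (elapsed - t) / duration;
      return Math.round(prev + (target - prev) * frac);
    }
    t += duration;
    prev = target;
  }
  return prev; // past the last stage
}

// The queries plan above, with durations converted to seconds:
const stages = [
  { duration: 120, target: 100 },
  { duration: 120, target: 100 },
  { duration: 60, target: 200 },
  { duration: 60, target: 200 },
  { duration: 60, target: 100 },
  { duration: 120, target: 100 },
  { duration: 120, target: 0 },
];

console.log(vusAt(stages, 0, 60));  // halfway through the first ramp-up → 50
console.log(vusAt(stages, 0, 270)); // halfway through the ramp to 200 → 150
```

So the plan ramps to 100 VUs, holds, spikes to 200 for the middle stretch, and ramps back down to 0 over roughly eleven minutes.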

Mutations: spike test

We identified the mutations that put the heaviest load on the system:

  • createApplicationMutation, which is called once per facility
  • updateFormResultMutation, which is called repeatedly while applicants fill out the form

Since the system allows only one application per facility, this required setting up a large number
of facilities ahead of the testing.
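
Because each facility accepts exactly one application, the pre-seeded facilities can be partitioned across VUs so that no two VUs compete for the same facility. A sketch in plain Node (in the real k6 script the VU and iteration numbers would come from k6's execution context; the mapping itself is our assumption about how the test data was partitioned):

```javascript
// Map (vu, iteration) to a unique facility index so 100 VUs × 10
// iterations cover exactly 1000 pre-seeded facilities with no overlap.
function facilityIndex(vu, iteration, iterationsPerVu) {
  // vu is 1-based (as in k6), iteration is 0-based
  return (vu - 1) * iterationsPerVu + iteration;
}

// VU 1 gets facilities 0..9, VU 2 gets 10..19, ..., VU 100 gets 990..999.
console.log(facilityIndex(1, 0, 10));   // 0
console.log(facilityIndex(100, 9, 10)); // 999
```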

// We have 1000 facilities.
// This scenario will start a spike of 100 VUs,
// each creating 10 applications and updating a form result
export const options = {
  scenarios: {
    mutations_spike: {
      vus: 100,
      iterations: 10,
      executor: 'per-vu-iterations'
    }
  }
};
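
The VU code for this scenario would POST the two mutations to the GraphQL endpoint. The schema isn't reproduced on this page, so the payload shapes below (operation fields, variable names) are illustrative assumptions only:

```javascript
// Illustrative GraphQL payload builders for the two heavy mutations.
// Field and variable names are assumptions, not the real schema.
function createApplicationPayload(facilityId) {
  return JSON.stringify({
    query: `mutation createApplicationMutation($input: CreateApplicationInput!) {
      createApplication(input: $input) { application { id } }
    }`,
    variables: { input: { facilityId } },
  });
}

function updateFormResultPayload(formResultId, formResult) {
  return JSON.stringify({
    query: `mutation updateFormResultMutation($input: UpdateFormResultInput!) {
      updateFormResult(input: $input) { formResult { id } }
    }`,
    variables: { input: { id: formResultId, formResultPatch: { formResult } } },
  });
}
```

In the k6 script itself, these bodies would be sent with k6's http.post against the GraphQL endpoint, with each VU targeting its own slice of the pre-seeded facilities.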

Artifacts

All artifacts used for the load testing can be found here:

Data Collected

We collected different metrics from the server and the database:

  • pg_stat_statements data
    psql -c "\copy (select * from pg_stat_statements) to stdout csv header" > stats.csv
  • PostgreSQL logs from Patroni
    data from /home/postgres/pgdata/pgroot/pg_log/postgresql-*.csv
  • k6 log output, as JSON, stored on an ad-hoc PVC
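
The exported stats.csv can then be post-processed to rank statements by total execution time. A minimal sketch (it assumes the pg_stat_statements columns query and total_time; PostgreSQL 13+ renames the latter to total_exec_time):

```javascript
// Rank pg_stat_statements rows by total_time. Naive CSV parsing:
// assumes no embedded commas/quotes, which is fine for a sketch.
function topQueries(csv, n) {
  const [header, ...rows] = csv.trim().split('\n');
  const cols = header.split(',');
  const qi = cols.indexOf('query');
  const ti = cols.indexOf('total_time');
  return rows
    .map((r) => r.split(','))
    .map((f) => ({ query: f[qi], totalTime: Number(f[ti]) }))
    .sort((a, b) => b.totalTime - a.totalTime)
    .slice(0, n);
}

// Synthetic sample in the same shape as the export above:
const sample = `query,calls,total_time
select * from application,120,834.2
select * from facility,4000,90.1
update form_result,9000,15023.7`;

console.log(topQueries(sample, 2).map((q) => q.query));
// → [ 'update form_result', 'select * from application' ]
```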

Data Monitored

  • Application health on an ad-hoc Sysdig dashboard
  • Pod health in the OCP 4 console
  • Infrastructure health, monitored by Platform Services

Analysis

...coming up...