Skip to content

A load tester focused on lifecycle issues and long-running tests

License

Notifications You must be signed in to change notification settings

BuoyantIO/slow_cooker

Repository files navigation

CircleCI

slow_cooker

A load tester for tenderizing your servers.

Most load testers work by sending as much traffic as possible to a backend. We wanted a different approach, we wanted to be able to test a service with a predictable load and concurrency level for a long period of time. Instead of getting a report at the end, we wanted periodic reports of qps and latency.

Running it

go build; ./slow_cooker <url>

or:

go run main.go <url>

Testing

go test ./...

Flags

Flag Default Description
-qps 1 QPS to send to backends per request thread.
-concurrency 1 Number of goroutines to run, each at the specified QPS level. Measure total QPS as qps * concurrency.
-iterations 0 Number of iterations for the experiment. Exits gracefully after iterations * interval (default 0, meaning infinite).
-compress <unset> If set, ask for compressed responses.
-data <none> Include the specified body data in requests. If the data starts with a '@' the remaining value will be treated as a file path to read the body data from, or if the data value is '@-', the body data will be read from stdin.
-hashSampleRate 0.0 Sampe Rate for checking request body's hash. Interval in the range of [0.0, 1.0]
-hashValue <none> fnv-1a hash value to check the request body against
-header <none> Adds additional headers to each request. Can be specified multiple times. Format is key: value.
-host <none> Overrides the default host header value that's set on each request.
-interval 10s How often to report stats to stdout.
-latencyUnit ms latency units [ms
-method GET Determines which HTTP method to use when making the request.
-metric-addr <none> Address to use when serving the Prometheus /metrics endpoint. No metrics are served if unset. Format is host:port or :port.
-noLatencySummary <unset> If set, don't print the latency histogram report at the end.
-noreuse <unset> If set, do not reuse connections. Default is to reuse connections.
-reportLatenciesCSV <none> Filename to write CSV latency values. Format of CSV is millisecond buckets with number of requests in each bucket.
-timeout 10s Individual request timeout.
-totalRequests <none> Exit after sending this many requests.
-help <unset> If set, print all available flags and exit.

Using a URL file

If the <url> argument begins with @ the argument will be treated as a file path to read a newline separated list of URLs to send requests to, or if the value is @-, the url list will be read from stdin.

Example url file contents:

http://localhost:4140/foo
http://localhost:4140/bar
http://localhost:4140/baz

Reading url list from a file:

$ slow_cooker -qps 100 @urllist

Using a hypothetical url generation script to pipe url list to slow cooker via stdin:

$ url_generator | slow_cooker -qps 100 @-

The urls in the list file will be processed sequentially.

Using multiple Host headers

If you want to send multiple Host headers to a backend, pass a comma separated list to the host flag. Each request will be selected randomly from the list.

For more complex distributions, you can run multiple slow_cooker processes:

$ slow_cooker -host web_a,web_b -qps 200 http://localhost:4140

$ slow_cooker -host web_b -qps 100 http://localhost:4140

This example will send 300 qps total to http://localhost:4140/ with 100 qps sent with Host: web_a and 200 qps sent with Host: web_b

TLS use

Pass in an https url and it'll use TLS automatically.

Warning We do not verify the certificate, we use InsecureSkipVerify: true

Example usage

$ ./slow_cooker -qps 100 -concurrency 10 http://slow_server

2016-05-16T20:45:05Z 0   7102/0/0 10000 71% 10s 0 [ 12  26  37  91 ] 91
2016-05-16T20:45:16Z 1   7120/0/0 10000 71% 10s 1 [ 11  27  37  53 ] 53
2016-05-16T20:45:26Z 2   7158/0/0 10000 71% 10s 0 [ 11  27  37  74 ] 74
2016-05-16T20:45:36Z 3   7169/0/0 10000 71% 10s 1 [ 11  27  36  52 ] 52
2016-05-16T20:45:46Z 4   7273/0/0 10000 72% 10s 0 [ 11  27  36  58 ] 58
2016-05-16T20:45:56Z 5   7087/0/0 10000 70% 10s 1 [ 11  28  37  61 ] 61
2016-05-16T20:46:07Z 6   7231/0/0 10000 72% 10s 0 [ 11  26  35  71 ] 71
2016-05-16T20:46:17Z 7   7257/0/0 10000 72% 10s 0 [ 11  27  36  57 ] 57
2016-05-16T20:46:27Z 8   7205/0/0 10000 72% 10s 0 [ 11  27  36  64 ] 64
2016-05-16T20:46:37Z 9   7256/0/0 10000 72% 10s 0 [ 11  27  36  62 ] 62
2016-05-16T20:46:47Z 10  7164/0/0 10000 71% 10s 0 [ 11  27  38  74 ] 74
2016-05-16T20:46:58Z 11  7232/0/0 10000 72% 10s 0 [ 11  26  35  63 ] 63

In this example, we see that the server is too slow to keep up with our requested load. that slowness is noted via the throughput percentage.

Docker usage

Run

docker run -it buoyantio/slow_cooker -qps 100 -concurrency 10 http://$(docker-machine ip default):4140

Build your own

docker build -t buoyantio/slow_cooker -f Dockerfile .

Log format

We use vertical alignment in the output to help find anomalies and spot slowdowns. If you're running multi-hour tests, bumping up the reporting interval to 60 seconds (60s or 1m) is recommended.

$timestamp $good/$bad/$failed $trafficGoal $percentGoal $interval $min [$p50 $p95 $p99 $p999] $max $bhash

bad means a status code in the 500 range. failed means a connection failure. percentGoal is calculated as the total number of good and bad requests as a percentage of trafficGoal.

bhash is the number of failed hashes of body content. A value greater than 0 indicates a real problem.

Tips and tricks

keep a logfile

Use tee to keep a logfile of slow_cooker results and cut to find bad or failed requests.

./slow_cooker_linux_amd64 -qps 5 -concurrency 20 -interval 10s http://localhost:4140 | tee slow_cooker.log

use cut to look at specific fields from your tee'd logfile

cat slow_cook.log |cut -d ' ' -f 3 | cut -d '/' -f 2 |sort -rn |uniq -c

will show all bad (status code >= 500) requests.

cat slow_cook.log |cut -d ' ' -f 3 | cut -d '/' -f 3 |sort -rn |uniq -c

will show all failed (connection refused, dropped, etc) requests.

dig into the full latency report

With the -reportLatenciesCSV flag, you can thoroughly inspect your service's latency instead of relying on pre-computed statistical summaries. We chose CSV to allow for easy integration with statistical environments like R and standard spreadsheet tools like Excel.

use the latency CSV output to see system performance changes

Use -totalRequests and -reportLatenciesCSV to see how your system latency grows as a function of traffic levels.

use -concurrency to improve throughput

If you're not hitting the throughput numbers you expect, try increasing -concurrency so your requests are issued over more goroutines. Each goroutine issues requests serially, waiting for a response before issuing the next request.

If you have scripts that process slow_cooker logs, feel free to add them to this project!