Curated list of resources on testing distributed systems

List of resources on testing distributed systems curated by Andrey Satarin (@asatarin).

Contents

Overview of testing approaches

Research Papers

Technologies for Testing Distributed Systems by Colin Scott

Colin Scott shares his viewpoint from academia on testing distributed systems, specifically regression testing for correctness and performance bugs.

Testing in a Distributed World by Ines Sombra (RICON 2014)

Great overview of techniques for testing distributed systems. Video is available on Archive. Additional materials can be found in this GitHub repo.

Resilience In Complex Adaptive Systems

These materials are not directly related to testing distributed systems, but they greatly contribute to general understanding of such systems.

Jepsen

State-of-the-art approach to testing stateful distributed systems.

Some notable Jepsen analyses:

Jepsen is used by CockroachDB, VoltDB, Cassandra, ScyllaDB and others.

Formal Methods

Companies using TLA+ to verify correctness of algorithms:

Lineage-driven Fault Injection

Netflix adopted lineage-driven fault injection techniques for testing microservices.

Chaos Engineering

Netflix pioneered the discipline of chaos engineering.

Fuzzing

There are two flavors of fuzzing. First, randomized concurrency testing, where the ordering of messages is fuzzed:

And input fuzzing, where message contents or user inputs are fuzzed:
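To make the two flavors concrete, here is a minimal sketch in Python. The toy register, the `op:int` wire format, and every function name are hypothetical, invented for illustration only; none of them come from the tools listed in this section.

```python
import random

def apply_in_order(messages):
    """Toy replica (hypothetical): applies ('set', v) and ('add', v) to a register."""
    value = 0
    for op, arg in messages:
        if op == "set":
            value = arg
        elif op == "add":
            value += arg
    return value

def fuzz_message_order(messages, trials=100, seed=0):
    """Randomized concurrency testing: deliver the same messages in
    shuffled orders and collect every distinct final state."""
    rng = random.Random(seed)  # seeded so a failing order can be replayed
    outcomes = set()
    for _ in range(trials):
        delivery = list(messages)
        rng.shuffle(delivery)
        outcomes.add(apply_in_order(delivery))
    return outcomes

def parse_message(raw: bytes):
    """Toy wire format (hypothetical): b'<op>:<int>'; ValueError if malformed."""
    op, sep, arg = raw.partition(b":")
    if not sep or op not in (b"set", b"add"):
        raise ValueError("malformed message")
    return op.decode(), int(arg)

def fuzz_inputs(trials=1000, seed=0):
    """Input fuzzing: feed random bytes to the parser. A clean ValueError
    is an expected rejection; any other exception is recorded as a crash."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        raw = bytes(rng.randrange(256) for _ in range(rng.randrange(12)))
        try:
            parse_message(raw)
        except ValueError:
            pass
        except Exception as exc:
            crashes.append((raw, exc))
    return crashes

if __name__ == "__main__":
    # More than one outcome means the final state depends on delivery order.
    print(fuzz_message_order([("set", 5), ("add", 1)]))
    print(len(fuzz_inputs()), "unexpected crashes")
```

Seeding the random generator is the main design choice here: a schedule or input that exposes a bug can be reproduced by rerunning with the same seed. Real tools explore far larger state spaces and usually guide the search instead of sampling uniformly.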

Game Days

Performance and Benchmarking

See also benchmarking tools.

Test Case Reduction

Misc

Specific approaches in different distributed systems

Amazon Web Services

See also formal methods section.

Netflix

Automated failure injection (see also Lineage-driven Fault Injection):

Random/manual failure injection testing:

See also Chaos Engineering.

Twitter

Datastax (Cassandra)

ScyllaDB

They published a series of blog posts on testing ScyllaDB:

VoltDB

Series of posts on testing at VoltDB:

Additional resources:

MemSQL

CockroachLabs (CockroachDB)

PingCap (TiDB)

See also formal methods section.

MongoDB

See also formal methods section.

Cloudera

FoundationDB

Wallaroo Labs

There is also a talk from Sean T. Allen on testing a stream processing system at Wallaroo Labs (formerly Sendence).

Google

Microsoft

See also formal methods section.

Dropbox

Atomix Copycat

Onyx

LinkedIn

Druid.io

Salesforce

SQLite

SQLite is not a distributed system by any stretch of the imagination, but it provides a good example of comprehensive testing of a database implementation.

InfluxDB

Shopify

Confluent (Kafka)

See also formal methods section.

Elastic (Elasticsearch)

YugaByte DB

FaunaDB

Hazelcast

Basho (Riak)

CoreOS (etcd)

Tools

Network Simulation

QuickCheck

Benchmarking

Linkbench

YCSB
