
STESTS Specification

Ashley486 edited this page Feb 26, 2020 · 2 revisions

Introduction

Launching a blockchain network within a hostile permission-less setting is non-trivial. The elevated risk profile of such an undertaking must be matched by a concomitant level of sophistication in respect of system testing. At the core of such testing will be a test platform whose underlying strategic objective is to progressively constrain network behaviour to that of a successful production setting, i.e. main net.

A variety of test networks will be required during the progression of the system towards main net. These range from recyclable networks operated internally, to disposable & non-disposable networks operated (to a large degree) by the wider community. Such networks will be instantiated partly as a natural response to an SDLC underpinned by CI/CD, and partly as a response to ideas and/or concerns arising from the developer, validator & stakeholder community.

Broadly speaking, the test platform will simulate real-world behaviour by assuming the roles of: node operators (i.e. validators); applications (i.e. DApps); or adversaries (i.e. hackers). The simulation process will be predicated upon the execution of test agents, i.e. scripts, that perform actions ranging from provisioning a node, to deploying smart contracts, to spamming the network …etc. Such test agents can be broadly categorised as those that simulate network behaviour and those that simulate DApp behaviour.

This document aims to lay the foundations for a CasperLabs system test platform. After an assessment of the various considerations & motivations for building such a platform, the document outlines what a network & DApp simulator may look like. Orthogonal requirements such as resourcing, skills & development tooling are touched upon accordingly.

Considerations

  1. Scheduling: a useful distinction may be made between test networks that support ad-hoc manual interventions (e.g. by a development team), and networks designed solely for automated system testing. Whilst the former serve to tighten development feedback loops, the latter (typically) serve to formalise a release process in a CI/CD sense. In either case, system testing is predicated upon a set of test agents designed to simulate various aspects of network infrastructure & load. The test platform will support the manual and automated scheduling of such agents.
  2. Duration: many data points that provide useful feedback to the development team require long-running simulations. Such data points support the analysis of various problems:
    • state bloat
    • consensus decoherence
    • byzantine resilience
    • functional regressions
    • scalability degradation
    • protocol upgrades
    • proof of stake (POS) game theory edge-cases
    • node disk, CPU, memory stress
  3. Snapshots: if pathological network behaviour is observed at time point X, where X is relatively significant, one will typically wish to suspend the network, apply software updates, & restart the network at a point in time prior to X. This feature requires the ability to roll back to periodic snapshots of the (increasingly voluminous) network state.

  4. Instrumentation: each & every interaction of the test platform with a test network will be logged in a structured fashion. Such logs should support the ability to replay an interaction sequence. This replay feature plays well in tandem with the aforementioned snapshot feature.

  5. Resourcing: internal test networks will be provisioned within a cloud setting, e.g. AWS. The dynamic cost model of such a setting introduces resource constraints in respect of the number of nodes, network traffic ...etc. Whilst the test platform must respect imposed budgetary limits, management teams must be prepared to increase budgetary allocations as a main net release approaches.

  6. Metadata: whilst effective instrumentation requires structured logging, it also requires structured metadata. Each network, and each node within a network, will be associated with metadata. Such metadata is primarily static in nature (e.g. network name) but also real-time (e.g. current network state). It serves several purposes, the most important of which is the correlation of simulator interactions with observed metrics & event logs.

  7. Metrics Collation: in order to analyse & monitor network behavioural dynamics, one is obliged to collate various sets of node metrics in a timely & dynamic fashion. Such metrics may be collated from the machine layer (i.e. CPU, disk, network, memory), from the application layer (i.e. processes associated with the node software itself), from the network layer (i.e. packet flows) … etc. The logistics of collation will follow both pull and push models, and involve a pipeline of formatting and aggregation.

  8. Behavioural Analysis: the test platform must streamline manual & automated anomaly detection. Manual detection is built upon logs, dashboards & visualisations. Automated detection applies pre-defined & configurable tolerances to collated metrics.

  9. Notifications: various parties will require timely notification of both normal & pathological network/node behaviour. Such notification channels should be configurable and include email, SMS, Slack ... etc. The test platform will dispatch notifications in response to various network events.

  10. Tooling: the test platform eco-system will be built upon the back of various tooling. Whilst most tooling will be ‘out of the box’ (e.g. Grafana), there may be a requirement for custom tooling. The development of custom tooling obviously depends upon cost/benefit considerations.

  11. Emergency Response: applying the shift-left philosophy to DevOps entails simulating an array of ‘emergency’ network level incidents. This serves to explore, define & improve response procedures & protocols. Such incidents may include: a sustained DDoS attack; a hacked validator; a ‘run’ on the underlying network utility token … etc.
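
By way of illustration, the automated detection described under Behavioural Analysis could be sketched as a set of configurable tolerances applied to a collated metrics sample. This is a minimal sketch only: the metric names, the limits and the shape of the sample are assumptions made for the example, not part of the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tolerance:
    """Inclusive bounds for one metric (names and limits are hypothetical)."""
    metric: str
    minimum: float
    maximum: float


def detect_anomalies(sample: Dict[str, float], tolerances: List[Tolerance]) -> List[str]:
    """Return one message per metric that is missing or out of bounds."""
    anomalies = []
    for tol in tolerances:
        value = sample.get(tol.metric)
        if value is None:
            anomalies.append(f"{tol.metric}: no value collated")
        elif not tol.minimum <= value <= tol.maximum:
            anomalies.append(f"{tol.metric}: {value} outside [{tol.minimum}, {tol.maximum}]")
    return anomalies


# Hypothetical tolerances and a hypothetical sample collated from one node.
TOLERANCES = [
    Tolerance("cpu_percent", 0.0, 85.0),
    Tolerance("disk_used_percent", 0.0, 90.0),
    Tolerance("peer_count", 3.0, 1000.0),
]
sample = {"cpu_percent": 97.5, "disk_used_percent": 41.0}
report = detect_anomalies(sample, TOLERANCES)
```

Treating a missing metric as an anomaly is a design choice of the sketch; a real pipeline might instead distinguish collation failures from tolerance breaches.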

Motivation

Hitherto CasperLabs system testing has targeted: discrete functions (i.e. unit tests), client driven interactions (i.e. functional tests), & the dynamics of small ad-hoc networks (i.e. integration tests).

As mentioned above, various classes of problems only manifest when running simulations over extended periods of time. In response the SRE team has begun to spin up long running test (LRT) environments. These environments have proved to be an extremely useful feedback mechanism. They have played an important role in the evolution of the software. However (by design) they are somewhat limited in the sense that the network composition is static and the network load is simplistic & non-externalised.

The mechanics & basic infrastructure of the existing LRT networks can act as a stepping stone towards establishing the foundations necessary to scale system testing. This means increasing the dynamicity of the test platform’s network simulator and the sophistication of the test platform’s DApp simulator.

Network Simulation

Introduction

Simply stated, a network is a set of machines, i.e. nodes, upon which operators, i.e. validators, are running CasperLabs software. All nodes participate in the peer-to-peer network & maintain blockchain state; however, only so-called validators may author new messages & participate in consensus.

To become a validator node, one must bond risk capital; such capital is denominated in the network’s utility token (i.e. CLX). A validator is rewarded for exhibiting behaviour that conforms with the network's consensus protocol; non-conformance results in a penalty.

This reward / penalty scheme forms one pillar of the crypto-economics that ultimately secures the network. Other pillars are governance protocols & ambient market conditions (i.e. liquidity). Game theory inspired simulations are used to probe the resilience of the network protocol’s crypto-economic assumptions.

Networks are heterogeneous in nature:

  • nodes may be running different versions of the underlying protocol;

  • nodes will be running on machines provisioned at varying levels of sophistication;

  • nodes will be running on geographically dispersed infrastructure;

  • nodes may be malicious - i.e. byzantine nodes;

Network composition is dynamic, i.e. node participation responds to events such as:

  • standard DevOps maintenance tasks;
  • validator bonding account recycling;
  • market-conditions (e.g. a centralised exchange black-swan event);
  • network splits due to unforeseen bugs;
  • protocol updates;
  • governance decision enforcement;

In view of the above it will be observed that the test platform will by necessity be predicated upon a set of low-level primitives from which ‘realistic’ simulations may be composed. By way of illustration one may posit 2 primitives:

  • NODE-PROVISION-ADD - provisions a new network node;
  • NODE-BONDING-APPLY - supports the entry of a validator to the network;

Leveraging simply these 2 primitives one can observe the outlines of a simulator sophisticated enough to test a heterogeneous, dynamic & game-theoretical network landscape.

Simulation Primitives

One metric of the success of a blockchain network is its ability to exhibit liveness & finality over the course of real-world events such as protocol upgrades, a changing node-set …etc. To simulate such real-world behaviour at the network level, the test platform will provide a set of primitives from which ‘realistic’ simulations may be modelled. Such primitives are test agents, i.e. scripts that are executed in response to signals dispatched either from manual or automated interactions with the test platform. In respect of the underlying tooling, most of these agents will either invoke a cloud service provider API via Terraform, or invoke the CasperLabs client software via an Ansible role. See section on tooling for further information.

Node Level Primitives

  • NODE-BOND-INCREMENT
    • deposit CLX tokens in network’s POS contract
  • NODE-BOND-DECREMENT
    • withdraw CLX tokens from network’s POS contract
  • NODE-PROVISION-ADD
    • extend a network’s active node-set
  • NODE-PROVISION-SUSPEND
    • remove a node from a network’s active node-set
  • NODE-PROVISION-REMOVE
    • remove a node from a network’s node-set
  • NODE-PROVISION-DESTROY
    • totally de-provision a node & remove it from a network’s node-set
  • NODE-SOFTWARE-INSTALL
    • install a specific version of the node software
    • results in either a fresh install, an upgrade or a downgrade
  • NODE-SOFTWARE-CONFIGURE
    • configure installed software
  • NODE-THROTTLE-CPU
    • reallocate CPU to underlying VM
  • NODE-THROTTLE-DISK
    • reallocate DISK to underlying VM
  • NODE-THROTTLE-MEMORY
    • reallocate RAM to underlying VM
  • NODE-THROTTLE-BANDWIDTH-IN
    • reallocate inward bound bandwidth
  • NODE-THROTTLE-BANDWIDTH-OUT
    • reallocate outward bound bandwidth

Network Level Primitives

  • NETWORK-INSTANTIATE
    • spin-up a new network (based upon a configuration file)
  • NETWORK-SUSPEND
    • suspend an entire network
  • NETWORK-REACTIVATE
    • reactivate a suspended network
  • NETWORK-DESTROY
    • destroy all nodes within a network
  • NETWORK-BOND-REBALANCE
    • rebalance the node-set’s relative bonding weights
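
The idea of composing simulations from primitives can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Primitive` enum covers only a subset of the primitives listed above, the keyword arguments (`config`, `node`, `version`, `amount`) are hypothetical, and the agent is a dry-run stand-in that describes each call rather than invoking Terraform or Ansible.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Primitive(Enum):
    """A subset of the node & network level primitives listed above."""
    NETWORK_INSTANTIATE = "NETWORK-INSTANTIATE"
    NODE_PROVISION_ADD = "NODE-PROVISION-ADD"
    NODE_SOFTWARE_INSTALL = "NODE-SOFTWARE-INSTALL"
    NODE_BOND_INCREMENT = "NODE-BOND-INCREMENT"


# A step pairs a primitive with its (hypothetical) keyword arguments.
Step = Tuple[Primitive, Dict[str, object]]


def dry_run_agent(primitive: Primitive, **kwargs: object) -> str:
    """Stand-in agent: describes the call instead of touching real infrastructure."""
    args = ", ".join(f"{key}={value}" for key, value in sorted(kwargs.items()))
    return f"{primitive.value}({args})"


def run_simulation(steps: List[Step], agent: Callable[..., str] = dry_run_agent) -> List[str]:
    """Execute steps in order and return an ordered record of what ran."""
    return [agent(primitive, **kwargs) for primitive, kwargs in steps]


# Hypothetical simulation: spin up a network, then add & bond one extra node.
simulation: List[Step] = [
    (Primitive.NETWORK_INSTANTIATE, {"config": "net.toml"}),
    (Primitive.NODE_PROVISION_ADD, {"node": "node-6"}),
    (Primitive.NODE_SOFTWARE_INSTALL, {"node": "node-6", "version": "0.12.0"}),
    (Primitive.NODE_BOND_INCREMENT, {"node": "node-6", "amount": 1000}),
]
trace = run_simulation(simulation)
```

Keeping the agent pluggable means the same step list could be replayed against a dry-run agent, a local network or a cloud network; the ordered trace is also the kind of record a replay feature would need.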

Network Properties

Status

Network status is by definition real-time in nature and predicated upon a global view of the current state of each node, thus derivation depends upon aggregation. Note that the current network status constrains the set of possible subsequent states - this property can perhaps be leveraged to detect anomalies.

  • NULL
    • Network is either uninstantiated or has previously been destroyed
    • Subsequent states → GENESIS
  • GENESIS
    • Count of genesis nodes in state INITIALIZING = 0
    • Subsequent states → INITIALIZING | HEALTHY | DOWN
  • INITIALIZING
    • Count of nodes in state INITIALIZING = 0
    • Subsequent states → HEALTHY | DOWN
  • HEALTHY
    • Network is up & operating within expected parameters
    • Subsequent states → DISTRESSED | DE-INITIALIZING
  • DISTRESSED
    • Network is up but operating outside of expected parameters
    • Subsequent states → HEALTHY | DE-INITIALIZING
  • DOWN
    • Count of nodes in state HEALTHY | DISTRESSED = 0
    • Cause may be fault or DevOps
    • Subsequent states → INITIALIZING | NULL
  • DE-INITIALIZING
    • Network is either failing or being taken down
    • Subsequent states → DOWN
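
The observation that the current status constrains the subsequent states can be sketched as a transition table plus a check over an observed status history. The table is transcribed from the list above; treating an unchanged status as non-anomalous is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Set

# Permitted transitions, transcribed from the status list above.
TRANSITIONS: Dict[str, Set[str]] = {
    "NULL": {"GENESIS"},
    "GENESIS": {"INITIALIZING", "HEALTHY", "DOWN"},
    "INITIALIZING": {"HEALTHY", "DOWN"},
    "HEALTHY": {"DISTRESSED", "DE-INITIALIZING"},
    "DISTRESSED": {"HEALTHY", "DE-INITIALIZING"},
    "DOWN": {"INITIALIZING", "NULL"},
    "DE-INITIALIZING": {"DOWN"},
}


def is_valid_transition(current: str, observed: str) -> bool:
    """True if `observed` is a permitted successor of `current`."""
    return observed in TRANSITIONS[current]


def first_anomaly(history: List[str]) -> Optional[int]:
    """Return the index of the first illegal status change, or None if all are legal.

    Assumption: a status that repeats between two observations is not an anomaly.
    """
    for index in range(1, len(history)):
        previous, current = history[index - 1], history[index]
        if current != previous and not is_valid_transition(previous, current):
            return index
    return None


# DISTRESSED -> DOWN skips DE-INITIALIZING, so index 4 is flagged.
observed = ["NULL", "GENESIS", "HEALTHY", "DISTRESSED", "DOWN"]
anomaly_index = first_anomaly(observed)
```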

Operators

Networks will be operated either by CasperLabs, invited 3rd parties or any 3rd party. A distinction is made between invited and any 3rd parties as typically early-adopter validators will play a role in Proof-Of-Concept networks.

  • Local
    • Nodes operated by developer team and/or testers
  • Internal
    • Nodes operated solely by CasperLabs
  • Hybrid
    • Nodes operated by CasperLabs + 3rd parties
  • External
    • Nodes operated solely by 3rd parties

Lifetimes

The lifetime of a network will vary according to resource constraints, operators & strategic purpose.

  • Singleton
    • network is considered as disposable & time-boxed
    • lifetime = finite
  • Repeat
    • network is recycled once its immediate purpose is served
    • lifetime = repeatable
  • Semi-Persistent
    • network protocol upgrades ensure its longevity beyond immediate purposes
    • lifetime = indefinite
  • Persistent
    • part of the fabric of internet infrastructure
    • lifetime = indefinite

Topologies

The composition of a network's node-set may be referred to as its topology. One may thus categorise networks in terms of node-set size and/or dynamicity, byzantine setting, node deployment types …etc.

  • TOP-00
    • node-set
      • min 1, max 5
      • static, homogeneous, benign
    • remarks
      • is hackable by developer team(s)
      • can perform sanity testing
  • TOP-01
    • node-set
      • min 5, max 5
      • static, homogeneous, benign
    • remarks
      • is hackable by developer team(s)
      • can perform sanity testing
      • can isolate DApp behaviour from network behaviour
  • TOP-02
    • node-set
      • min 3, max 9
      • dynamic, homogeneous, benign
    • remarks
      • can begin to explore game theory scenarios
      • can simulate wider array of network behaviours
      • can simulate wider array of DApp behaviours
  • TOP-03
    • node-set
      • min 3, max 25
      • dynamic, heterogeneous, hostile
    • remarks
      • can perform full eco-system testing
      • can simulate incident response analysis
  • TOP-04
    • node-set
      • min 50, max 1000
      • dynamic, heterogeneous, hostile
    • remarks
      • live setting
      • requires incident response protocols

Types

  • DEV-LOC
    • local | repeat | TOP-00
    • remarks
      • Low-level anarchic network
      • Permits development team(s) to hack as they see fit
  • INT-DEV
    • internal | repeat | TOP-01
    • remarks
      • Low-level anarchic network
      • Permits development team(s) to hack as they see fit
      • Emergency response = N/A
  • INT-SYS
    • internal | repeat | TOP-02
    • remarks
      • Formal test network
      • Automated & sophisticated testing
      • Full array of network behaviours
      • Full array of DApp behaviours
      • Emergency response = N/A
  • INT-STG
    • hybrid | repeat | TOP-03
    • remarks
      • Public release staging network
      • Automated testing plus structured/unstructured manual testing
      • Emergency response = < 48 hrs
  • EXT-POC
    • hybrid | singleton | TOP-03
    • remarks
      • Proof of concept
      • Designed to build community confidence
      • Permits testing of new features, wider eco-system, resilience to stress … etc.
      • Emergency response = < 24 hrs
  • EXT-CC
    • hybrid | semi-persistent | TOP-03
    • remarks
      • Candidate Chain
      • Emergency response = ASAP
  • EXT-MAIN
    • external | persistent | TOP-04
    • remarks
      • Main Chain
      • Emergency response = immediate
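
Since each network type is an operator | lifetime | topology triple, the catalogue above lends itself to being held as data. The sketch below transcribes the topologies & types listed above; the class names and the `validate_node_count` helper are hypothetical and serve only to show how a requested node-set size might be checked against a type.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Topology:
    min_nodes: int
    max_nodes: int
    traits: Tuple[str, str, str]  # dynamicity, composition, setting


@dataclass(frozen=True)
class NetworkType:
    operator: str
    lifetime: str
    topology: str


TOPOLOGIES: Dict[str, Topology] = {
    "TOP-00": Topology(1, 5, ("static", "homogeneous", "benign")),
    "TOP-01": Topology(5, 5, ("static", "homogeneous", "benign")),
    "TOP-02": Topology(3, 9, ("dynamic", "homogeneous", "benign")),
    "TOP-03": Topology(3, 25, ("dynamic", "heterogeneous", "hostile")),
    "TOP-04": Topology(50, 1000, ("dynamic", "heterogeneous", "hostile")),
}

NETWORK_TYPES: Dict[str, NetworkType] = {
    "DEV-LOC": NetworkType("local", "repeat", "TOP-00"),
    "INT-DEV": NetworkType("internal", "repeat", "TOP-01"),
    "INT-SYS": NetworkType("internal", "repeat", "TOP-02"),
    "INT-STG": NetworkType("hybrid", "repeat", "TOP-03"),
    "EXT-POC": NetworkType("hybrid", "singleton", "TOP-03"),
    "EXT-CC": NetworkType("hybrid", "semi-persistent", "TOP-03"),
    "EXT-MAIN": NetworkType("external", "persistent", "TOP-04"),
}


def validate_node_count(network_type: str, node_count: int) -> bool:
    """True if `node_count` fits the node-set bounds of the type's topology."""
    topology = TOPOLOGIES[NETWORK_TYPES[network_type].topology]
    return topology.min_nodes <= node_count <= topology.max_nodes
```

For example, `validate_node_count("INT-DEV", 5)` holds whereas `validate_node_count("INT-DEV", 4)` does not, since TOP-01 fixes the node-set at exactly 5.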

DApp Simulation

Introduction

Blockchain networks do not exist within a vacuum - they serve the needs of so-called decentralised applications (DApps). Such DApps typically target market verticals such as finance, gaming, identity, supply chain …etc. In terms of user adoption strategy, some DApps target mass adoption whilst others focus upon a relatively small but sophisticated user base.

Once the ideation phase is complete, the lifecycle of a DApp proper begins with the development of a suite of so-called smart contracts. DApp contract suites exhibit varying degrees of magnitude, sophistication & complexity. They may range from super-simple single contracts to an eco-system of complex contracts.

Such contracts require deep testing & external auditing prior to deployment within a production setting. Arguably the simplicity & sophistication of the associated tooling chain is more important than the feature set of the underlying blockchain to which they are deployed.

The CasperLabs test platform must be able to simulate the behaviour profiles of an array of DApps. Long running orchestration contracts common to supply chain systems are very different in nature to so-called ERC-20 compliant token auctions.

The test platform leverages the notion of workload generators to simulate DApp behaviours. As usual we begin by establishing a set of primitive low-level generators that attempt to probe specific aspects of the blockchain’s feature set. We then establish a set of custom workload generators that emulate examples of successful smart contract systems observable in the real world.

Standard Workload Generators

WG-001 :: +ve :: Ambient Noise

Generator places reasonable load in a monotonic fashion upon each node within a node-set. Such load should be non-conflictual from a consensus perspective and span multiple blocks. Thus finality should be observable in a simplified setting.

WG-002 :: -ve :: Conflictor

Generator dispatches deploys designed to cause write transforms with differing values under the same key. Nodes should handle this scenario accordingly.

WG-003 :: -ve :: Disconnector

Generator attempts to partition a node from the network without un-bonding. Theoretically this should result in a bond slashing event; however, until slashing is specified & implemented, after a 5 minute delay the node will simply be reconnected to the network.

WG-004 :: +ve :: Stake Changer

Generator simply modifies a validator's bonded stake by ±1000.

WG-005 :: -ve :: Equivocator

Generator simulates an equivocating node thus forcing network to prove correctness. This generator requires the simulation of a byzantine node, possibly via an interceptor.

WG-006 :: -ve :: Max Bytes

Generator deploys a contract with the permitted maximum number of bytes. A node should tolerate this but it may induce a (temporary) lag effect on network performance.

WG-007 :: -ve :: Deep GraphQL query

Nodes (optionally) support GraphQL endpoints that can be stressed by various means. For example one can ask for the maximum supported depth for a GraphQL query - this may cause the node to crash.

WG-008 :: -ve :: Obsolete Software Version

Generator bonds a validator node running an older version of the node software. The network should tolerate this; however, currently it de-stabilises the network.

WG-009 :: -ve :: Obsolete Protocol Version

Generator bonds a validator node running an older major version of the protocol. Will require simulation of protocol upgrades.
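
To make the notion of a workload generator concrete, WG-001 & WG-002 might be sketched as Python generators that yield deploy descriptors. The descriptor shape (`node`, `key`, `value`), the round-robin spread across nodes and the function names are assumptions of this sketch; dispatching the deploys via the CasperLabs client is deliberately left out.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, List

Deploy = Dict[str, object]  # hypothetical deploy descriptor


def ambient_noise(nodes: List[str], count: int) -> Iterator[Deploy]:
    """WG-001 sketch: spread `count` deploys round-robin, each writing its own key.

    Distinct keys keep the load non-conflictual from a consensus perspective.
    """
    for index, node in enumerate(islice(cycle(nodes), count)):
        yield {"node": node, "key": f"noise-{index}", "value": index}


def conflictor(nodes: List[str], key: str) -> Iterator[Deploy]:
    """WG-002 sketch: every node writes a different value under the same key."""
    for index, node in enumerate(nodes):
        yield {"node": node, "key": key, "value": index}


nodes = ["node-1", "node-2", "node-3"]
noise = list(ambient_noise(nodes, 7))
conflicts = list(conflictor(nodes, "shared-key"))
```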

Custom Workload Generators

WG-100 :: ERC-20 Token Sale

Notes

  • Tokenisation of investment capital allocation, i.e. a token sale, is a strong use case for smart contract systems. This assertion only holds if the underlying network utility token is deemed sufficiently liquid.
  • Whilst various forms of token sale exist, the ERC-20 standard has emerged as a standardised smart contract interface in support of token sales. A more recent token standard, designed to be backward compatible with ERC-20, is known as ERC-777.
  • Such token sales require a user to submit a bid to a time-sensitive ERC-20 compliant smart contract. Such bids are typically denominated in the host network’s utility token, e.g. CLX, and result in that amount of CLX being locked in the token sale contract.
  • Once the token sale contract's active time period elapses, the token smart contract distributes tokens according to the sale dynamics (e.g. fixed price).

Pre-Conditions

  • Large set of wallets pre-funded with CLX
  • Deployed standards compliant token sale contract(s) for the ABC token

Execution Sequence

  • Generator submits in-time bids to token sale smart contract
  • Generator submits out-of-time bids to token sale smart contract

Post-Conditions

  • CLX & ABC balances in the wallets of auction participants are queried & compared to expected totals
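
The expected totals referred to in the post-condition could be derived along the following lines. This is a sketch under loudly stated assumptions: a fixed price sale, integer amounts, no gas or fees, a half-open sale window, and out-of-time or unaffordable bids being rejected with balances left unchanged. None of these details are fixed by the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Bid:
    wallet: str
    clx: int        # amount of CLX bid
    timestamp: int  # submission time, arbitrary units


def expected_balances(
    funding: Dict[str, int],
    bids: List[Bid],
    sale_start: int,
    sale_end: int,
    abc_per_clx: int,
) -> Dict[str, Tuple[int, int]]:
    """Return the expected (CLX, ABC) balance per wallet after the sale.

    Assumptions: the window is [sale_start, sale_end); rejected bids change nothing.
    """
    balances = {wallet: [clx, 0] for wallet, clx in funding.items()}
    for bid in bids:
        in_time = sale_start <= bid.timestamp < sale_end
        if in_time and bid.clx <= balances[bid.wallet][0]:
            balances[bid.wallet][0] -= bid.clx                # CLX locked in the sale contract
            balances[bid.wallet][1] += bid.clx * abc_per_clx  # ABC distributed at fixed price
    return {wallet: (clx, abc) for wallet, (clx, abc) in balances.items()}


funding = {"w1": 100, "w2": 100}
bids = [Bid("w1", 40, 10), Bid("w2", 30, 99), Bid("w2", 50, 100)]  # last bid is out-of-time
expected = expected_balances(funding, bids, sale_start=0, sale_end=100, abc_per_clx=2)
```

The generator would then query the actual CLX & ABC balances and compare them to `expected`.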

WG-101 :: Tic Tac Toe

Notes

  • Gaming is a strong use case for smart contract systems as in-game assets lend themselves well to tokenisation.
  • Many games can be viewed as instances of state-machines; the popular tic-tac-toe game is one such example. Each game thus has a participant set and a lifetime of its own.
  • Game DApp operators deploy a set of smart contracts to a blockchain network, and then post transactions to the contracts from the game user-interface. One important requirement is the ability for the user-interface to hook into event streams emanating from the contracts.
  • Games may deploy their own token, e.g. ABC, that the end user is obliged to purchase prior to playing the game.

Pre-Conditions

  • Large set of wallets pre-funded with CLX
  • Deployed tic-tac-toe game contract(s)

Execution Sequence

  • Test platform performs a sequence of parameterised simulations:
    • game count = 1, player count = 2
    • game count = 10, player count = 5
    • game count = 100, player count = 20
    • game count = 1000, player count = 100
    • game count = 10000, player count = 1000

Post-Conditions

  • Network has exhibited correct behaviour
  • The final result of each game can be queried and hence verified
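
The state-machine view of tic-tac-toe, and the verification of a final result, can be sketched off-chain as a reference model against which contract state is compared. The move encoding (cell indices 0-8, X moving first) and the function name are assumptions of this sketch.

```python
from typing import List, Optional

# The eight winning lines of a 3x3 board, as cell indices.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def play(moves: List[int]) -> Optional[str]:
    """Replay moves and return 'X', 'O', 'DRAW', or None if the game is unfinished.

    Replay stops at the first winning move; any later moves are not examined.
    """
    board: List[Optional[str]] = [None] * 9
    for turn, cell in enumerate(moves):
        if board[cell] is not None:
            raise ValueError(f"cell {cell} is already occupied")
        player = "XO"[turn % 2]
        board[cell] = player
        if any(all(board[i] == player for i in line) for line in WIN_LINES):
            return player
    return "DRAW" if all(cell is not None for cell in board) else None


x_wins = play([0, 3, 1, 4, 2])             # X completes the top row
draw = play([0, 1, 2, 4, 3, 5, 7, 6, 8])   # board fills with no winning line
unfinished = play([4])
```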

Scenario Driven Tests

TODO: write introduction & bring together rest of Akosh’s Break Me Gently notes.

SCENARIO-001: Can the node be restarted?

  • Pre-condition:
    • a node-set >= 5
    • a sizeable block DAG
  • Actions:
    • NODE-PROVISION-SHUT-DOWN
    • NODE-PROVISION-RESTART
  • Post-condition:
    • pre-state transitions are detected;
    • deploys are accepted;
    • blocks are created;
    • created blocks are propagated.
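
A scenario of this shape (pre-conditions, actions, post-conditions) might be captured as data and driven by a small runner, sketched below. The network view keys (`node_count`, `nodes_up`, `dag_size` and the boolean flags), the DAG size threshold of 1000 and the fake executor are all hypothetical; a real runner would dispatch the actions to test agents and query the network for the post-conditions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

NetworkView = Dict[str, Any]  # hypothetical snapshot of network metadata
Check = Callable[[NetworkView], bool]


@dataclass
class Scenario:
    name: str
    pre_conditions: List[Check]
    actions: List[str]
    post_conditions: List[Check]


def run_scenario(scenario: Scenario, network: NetworkView,
                 execute: Callable[[str, NetworkView], None]) -> bool:
    """Check pre-conditions, run actions in order, then evaluate post-conditions."""
    if not all(check(network) for check in scenario.pre_conditions):
        raise RuntimeError(f"{scenario.name}: pre-conditions not met")
    for action in scenario.actions:
        execute(action, network)
    return all(check(network) for check in scenario.post_conditions)


SCENARIO_001 = Scenario(
    name="SCENARIO-001",
    pre_conditions=[
        lambda net: net["node_count"] >= 5,
        lambda net: net["dag_size"] >= 1000,  # 'sizeable' threshold is an assumption
    ],
    actions=["NODE-PROVISION-SHUT-DOWN", "NODE-PROVISION-RESTART"],
    post_conditions=[
        lambda net: net["nodes_up"] == net["node_count"],
        lambda net: net["deploys_accepted"] and net["blocks_created"],
        lambda net: net["blocks_propagated"],
    ],
)

# Fake executor used only to exercise the runner.
executed: List[str] = []


def fake_execute(action: str, view: NetworkView) -> None:
    executed.append(action)
    view["nodes_up"] += -1 if action.endswith("SHUT-DOWN") else 1


network = {"node_count": 5, "nodes_up": 5, "dag_size": 2000,
           "deploys_accepted": True, "blocks_created": True, "blocks_propagated": True}
passed = run_scenario(SCENARIO_001, network, fake_execute)
```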