Specification Testing framework prototype for Dafny Specifications.
FSVT brings together three different spec-testing techniques; Automatic Sanity Checking(ASC), automatic specification mutation testing, and a Methodology for writing Spec Testing Proofs (STPs). This repo contains the framework for the automated aspects of FSVT, mutation testing and the ASC.
- FSVT-dafny-grpc-server
- openjdk-13-jdk
- gcc-8
- g++-8
- bazel-4.0.0
- dotnet-6
The prototype is built off of Dafny v3.8.1
NOTE: All dependencies will be installed when running
./setup/configureFSVT.sh- detailed below
The recommended environment to run FSVT is using CloudLab:
FSVT is designed to use one root node and n distributed nodes for verification performance. The root node is responsible for generating the specification mutations and verification conditions, the n separate nodes are used to parallelize checking verification conditions. The n additional nodes are not necessary to run FSVT, they are used to help increase the performance and reduce experiment runtime.
In CloudLab, a default profile is available titled "IronSpecConfigNodes" and is configured to use 21, c8220 nodes - this can be modified in the experiment instantiation to change the configuration. A copy of this profile can be found in ./setup/cloudlabProfile.txt
After creating a CloudLab experiment, make note of all the unique numerical names of the nodes from the Node column in the experiment page List View. For example, the IDs of Nodes "clnode008" and "clnode059" are 008 and 059 respectively.
NOTE: FSVT should work on other CloudLab hardware, but if not using nodes from the Clemson cluster, some of the setup scripts will break. This can be fixed by replacing
.clemson.cloudlab.uswith the appropriate extension.
To check basic functionality, it is sufficient to follow the following "Setup" steps with a single CloudLab node. After setup, to test to make sure that everything is installed and configured correctly, run ./runSimpleExperiments.sh to run the mutation framework on a simple Max Spec found in: ./specs/max/maxSpec.dfy and the Automatic Sanity Checker on ./specs/sort/sortMethod.dfy
After executing this command, see the output in a file ./experimentOutput/MaxSpecCorrect/maxSpecCorrect_output.txt and the tail of the output will look something like this (the mutation IDs may be slightly different, and the mutation at the root may be a different equivalent mutations):
...
...
--- END Mutation Classifications --
TOTAL Elapsed Time is 15749 ms
root = (0):(a > b ==> c == a) && (b + 1 > a ==> c == b)
7 :: 8 :: 42 :: 43 :: 44 :: 48 :: 50 :: 52 :: 60 :: 62 :: 63
---
Total Alive Mutations = 1
and the output in ./experimentOutput/sortASC/sortASC_output.txt with a high serverity flag being raised -- FLAG(HIGH) -- : NONE of Ensures depend on Any input parameters
Compile and run FSVT on a local computer without multi-node distributed setup.
Advantages: Local development and testing without CloudLab dependency.
Disadvantages: Cannot simulate real multi-node environments; potential environment inconsistencies with CloudLab.
-
Install Dependencies
Ensure local environment has:git,make,g++,Bazel 4.0.0,.NET SDK,python3,pip, etc. -
Clone Repositories
git clone https://github.com/Mulatijiang/FSVT git clone https://github.com/Mulatijiang/FSVT-dafny-grpc-server
-
Run Setup Scripts
cd FSVT ./setup/node_prep.sh ./setup/install_dotnet_ubuntu_20.04.sh # For Ubuntu 20.04
-
Modify Username in Configuration
UpdateDafnyVerifier.csto use your actual username (e.g.,murat):sed -i "s/username/murat/" FSVT/Source/Dafny/DafnyVerifier.cs -
Build FSVT and Z3
make exe make z3-ubuntu # For Ubuntu -
Build gRPC Server
cd FSVT-dafny-grpc-server bazel-4.0.0 build --cxxopt="-g" --cxxopt="--std=c++17" //src:server
-
Configure gRPC Server Address
EnsureipPorts.txtcontains:echo "127.0.0.1:50051" > ipPorts.txt
-
Start gRPC Server (run in a separate terminal foreground):
cd FSVT-dafny-grpc-server ./bazel-bin/src/server -v -d ../FSVT/Binaries/Dafny -
Test Simple Examples
Run the script for basic tests:./runSimpleExperiments.sh
-
Verify Output
Check results in theexperimentOutputdirectory:cat experimentOutput/MaxSpecCorrect/maxSpecCorrect_output.txt cat experimentOutput/sortASC/sortASC_output.txt
-
Configure SSH Keys Locally
./setup/configureSSHKeys.sh Mulati 153 155
-
Connect to CloudLab Node
ssh Mulati@clnode153.clemson.cloudlab.us
-
Clone FSVT Repository
git clone https://github.com/Mulatijiang/FSVT cd FSVT -
Run Environment Configuration
./setup/configureFSVT.sh Mulati 153 155
-
Start/Stop Dafny Servers
./setup/startDafnyServers.sh Mulati 153 155 # Start servers./setup/checkDafnyPID.sh Mulati 153 155 # Check server status./setup/stopDafnyServers.sh Mulati 153 155 # Stop servers -
Run Simple Experiments
./runSimpleExperiments.sh
-
View Experiment Results
cat experimentOutput/MaxSpecCorrect/maxSpecCorrect_output.txt cat ./experimentOutput/sortASC/sortASC_output.txt
-
Start gRPC Server (as in local testing):
cd FSVT-dafny-grpc-server ./bazel-bin/src/server -v -d ../FSVT/Binaries/Dafny -
Run Mutation Test Command
cd ~/Desktop/graduate/FSVT mkdir -p experimentOutput/MaxSpecCorrect/outputLogs ./Binaries/Dafny /compile:0 /timeLimit:1520 /trace /arith:5 /noCheating:1 /mutationTarget:maxExample.maxSpec /proofName:maxTest.maxT /proofLocation:"$(pwd)/specs/max/lemmaMaxTestCorrect.dfy" /serverIpPortList:ipPorts.txt $(pwd)/specs/max/maxSpec.dfy &> experimentOutput/MaxSpecCorrect/maxSpecCorrect_output.txt
- Run ASC Command (No gRPC Server Needed)
cd ~/Desktop/graduate/FSVT ./cleanExperimentOutput.sh mkdir -p experimentOutput/sortASC ./Binaries/Dafny /compile:0 /timeLimit:1520 /trace /arith:5 /noCheating:1 /mutationTarget:sort.SortSpec /proofName:sort.merge_sort /mutationRootName:sort.SortSpec /proofLocation:"$(pwd)/specs/sort/sortMethod.dfy" /serverIpPortList:ipPorts.txt /checkInputAndOutputSpecified $(pwd)/specs/sort/sortMethod.dfy &> experimentOutput/sortASC/sortASC_output.txt
The FSVT prototype is designed to facilitate testing Dafny specifications. Examples of specifications can be found in the ./specs.
This directory consists of 14 specifications, 6 in-house specifications, and 8 specifications from open-source systems.
The 6 in-house specifications, max, sort, binary search, key-value state machine, token-wre, simpleAuction-wre, and two open source specs, div, and NthHarmonic are included in whole. div and NthHarmonic are transcribed to dafny from "Exploring automatic specification repair in dafny programs". Token-wre and simpleAuction-wre were taken from "Deductive verification of smart contracts with dafny" with small modifications to artificially introduce bugs into the corresponding specs.
In the following table is the list of all specs. For the open-source specs, the link corresponds to the initial commit that was used when testing the spec with FSVT.
| Spec Name | Source |
|---|---|
| Max | ./specs |
| Sort | ./specs |
| Binary Search | ./specs |
| Key-Value Store SM | ./specs |
| token-wre | ./specs |
| simpleAuction-wre | ./specs |
| div | ./specs |
| NthHarmonic | ./specs |
| QBFT | external |
| Distributed Validator | external |
| daisy-nfsd | external |
| TrueSat | external |
| Eth2.0 | external |
| AWS ESDK | external |
To run all automatic experiments (Both Mutation Testing and Automatic Sanity Checking experiments) described in the OSDI paper, run: ./runAllExperiments.sh
It is recommended to run this command in a detached shell session since it will take some time to complete. For example:
screen -S runExperiments
./runAllExperiments.sh
[Ctrl-a][Ctrl-d] #This detaches the screen "runExperiments"
#To reattach
screen -R runExperiments
NOTE: Running
./runExperiments.shwill automatically clone all external specs into the./specdirectory to the originally tested commit hash. Additionally, small patches will be applied to external specs to accommodate for small syntactical changes between Dafny versions.
Output and logs for each experiment will be found in ./experimentOutput split into different sub-directories for each spec and each experiment. The tail of each produced file's name ending in ..._output.txt for each experiment will show the final results.
After ./runExperiments.sh finishes, check ./experimentOutput/finalResults.txt for recreation of Tables 3 and 4 from the paper based on the results of running ./runExperiments.sh
NOTE: The original experimental setup consisted of 1 root node and 20 nodes running the dafny-GRPC servers to parallelize checking verification conditions. Using a different configuration will impact the total time to run each experiment. The time difference is approximately linear based on the number of nodes; for example, running with 21 nodes an experiment taking approximately 160 min will be 3x as long when running the same experiment with 7 nodes. Having too few nodes may cause verification timeouts that result from overloading the various nodes, to run the full experiments it is recommended to have at least 7 nodes. There is a minimum bound for the time to run each experiment, as even if all verification tasks can be parallelized, larger systems like qbft and DVT still will take 1-2 hours for a full end-to-end proof.
NOTE: Some results in appearing in
./experimentOutput/finalResults.txtmay differ slightly from Table 4 in the included pdf.
- The time taken for each experiment will vary based on the number of nodes, the CPU load at each node, and the black-box nature of z3, so there may be some variance in timing.
- The number of alive mutations for the
Divspec will appear higher (8 vs 3). This is due to a limitation in the final step of the mutation framework that is incompatible with this type of spec in classifying alive mutations into a DAG. The result of reducing the 8 alive mutations to 3 is calculated manually with the assistance of Dafny. Details on how to achieve this are included in comments in./specs/div/div.dfy.- The number of generated mutations for QBFT NetworkInit, QBFT AdversaryInit, and DVT AdversaryNext should be 44, 35, and 110 respectively. These are updated results due to an improvement in avoiding producing duplicate mutations and will be adjusted in the camera-ready version. The orignal numbers are from older experiments, and are typos. The difference in mutations is due to duplicate or logically equivalent mutations so this difference does not affect the final number of alive mutations or even the total set of mutations tested, just accounts for duplicated mutations.
- The number of alive mutations for QBFT AdversaryNext should be 4. This is the result of an increased timeout allowance for the mutation classification step - the same information is included in the 4 alive mutations as in the previous 7, so this improvement over the original values in table 4 as it further minimizes the manual effort to inspect more mutations. As a consequence of this same timeout allowance increase, there is expected to be one more alive mutation for DVT AdversaryNext. This alive mutation gives the same hint as the others.
To run either the mutation testing experiments or just the ASC experiments, utilize either ./runMutationTestingExperiments.sh or ./runASCExperiments.sh. If running either of these, make sure to also run ./cleanExperimentOutput.sh beforehand.
The general command to run mutation testing follows this form:
./Binaries/Dafny [Standard Dafny Arguments] [FSVT Arguemnts] [full-path-to-the-file-containing-mutation-target] &> output.txt
For example to test the specification for a Max method (/specs/max/maxSpec.dfy), run the following command from the root node:
./Binaries/Dafny /compile:0 /timeLimit:1520 /trace /arith:5 /noCheating:1 /mutationTarget:maxExample.maxSpec /proofName:maxTest.maxT /proofLocation:"$(pwd)/specs/max/lemmaMaxTestCorrect.dfy" /holeEvalServerIpPortList:ipPorts.txt $(pwd)/specs/max/maxSpec.dfy &> output.txt
After executing this command, see the output in a file named output.txt and the tail of the output will look like this:
...
...
--- END Mutation Classifications --
TOTAL Elapsed Time is 15749 ms
root = (0):(a > b ==> c == a) && (b + 1 > a ==> c == b)
7 :: 8 :: 42 :: 43 :: 44 :: 48 :: 50 :: 52 :: 60 :: 62 :: 63
---
Total Alive Mutations = 1
This output indicates that there is a single alive mutation(with ID 0) to investigate - (a > b ==> c == a) && (b + 1 > a ==> c == b)
Full experiment logs can be found in ./outputLogs which will contain the logs for each mutation and the intermediary files generated for the different mutation testing passes. Any .txt file contains the output from the dafny verifier for each mutation. There are many .dfy files generated to test the different mutation passes (i.e. weaker, vac, full, classification)
The general command to run mutation testing follows this form:
./Binaries/Dafny [Standard Dafny Arguments] [FSVT Arguemnts] /checkInputAndOutputSpecified [full path to the file containing mutation target] &> output.txt
./Binaries/Dafny /compile:0 /timeLimit:1520 /trace /arith:5 /noCheating:1 /holeEval:testWrapper.SATSolver.test /proofName:testWrapper.SATSolver.start /mutationRootName:testWrapper.SATSolver.test /holeEvalRunOnce /holeEvalDepth:2 /proofLocation:"$(pwd)/specs/truesat/truesat_src/solver/solver.dfy" /holeEvalServerIpPortList:ipPortOneNode.txt /checkInputAndOutputSpecified $(pwd)/specs/truesat/truesat_src/solver/solver.dfy &> output.txt
A brief description and some examples are included in ./STPS
STPs are just Dafny proofs, so running STPS requires no additional instrumentation other than just running Dafny.
| Argument | Description |
|---|---|
| /mutationTarget:[name] | Full Name of predicate for mutation target. In Dafny, this includes the module name followed by the predicate name. For example, if the predicate's name is spec located in module Example, the full name is: Example.spec |
| /proofName:[name] | Name of lemma for full end-to-end proof. Also follows Module.Name format |
| /proofLocation:[name] | Full path to .dfy file containing lemma targeted for /proofName |
| /serverIpPortList:[name] | Name of .txt file containing the IP addresses and ports for other node end points. If configured using ./configureFSVT.sh a file named ipPorts.txt will be created with the correct formatting. An example is included: ipPortsExample.txt |
| /inPlaceMutation | This flag indicates that the mutation target is for a method with pre/post conditions rather than a predicate. In this case /proofName is the name of the method to mutate the spec and the proof to check. |
| /mutationRootName:[name] | If the mutation target is different from the high-level safety property of the system, make sure to specify the high-level safety property with this flag. |
| /checkInputAndOutputSpecified | Argument to indicate the use of the ASC. When using this flag, /mutationTarget and /mutationRootName should be set to a predicate in the same file, the contents of the predicate are irrelevant, but these flags are still needed to run. /proofName should be set to the method that is being tested with the ASC |
| /isRequires | Tests for mutations that are stronger than the original, rather than weaker |