Predicting SMT Solver Performance for Software Verification
Andrew Healy, Rosemary Monahan, James, F. Power
Principles of Programming Research Group, Dept. of Computer Science, Maynooth University, Ireland
F-IDE 2016 support data
This repository contains data measuring 8 SMT solvers' performance on the Why3 examples dataset. We record the result returned by Alt-Ergo (versions 0.95.2 and 1.01), CVC3, CVC4, veriT, Yices, and Z3 (versions 4.3.2 and 4.4.1). We also measure the time taken by the solver to return the result.
Folder containing latex source files and images for the paper itself
This folder contains a subfolder for each file in the examples repository. Each folder contains:
<name>.mlwthe WhyML file sent to Why3
<name>.jsona JSON dictionary containing timings and results for various timeout values
stats.jsonthe syntacic features statically extracted from
<name>.mlw(used as independent variables for prediction)
split/folder containing the resultant goals after applying the Why3 transformation
split_goal_wpto each file
.mlw. file. Created by
Python interface to the Benchexec measurement framework. See
LICENCE_benchexec.txt for licence.
A collection of short, commonly-used constants and functions used by many of the other Python scripts.
Python script to collect data from the JSON files. Results printed for Table 1 and saved to
fig1_data.csv to be read in by
Make the first figure (stacked barcharts - 60 second timeout). Uses
Collect data from the JSON files and combine it with the syntax metrics. Save the data as
Use the entire dataset to plot the cumulative time taken for Valid/Invalid/Unknown answers to be returned. Renders
paper folder and prints values for the 99th percentile.
Disjoint partitions of
whygoal_stats.csv for testing (25%) and training/validation (75%) respectively
Perform KFold cross-validation on the training set to compare a number of regressor implementations from Sci-kit Learn. Renders
compare_regressors.pdf which is the full version of Table 2 in the paper.
Find values for the 'Random' strategy (either train or test) by averaging values for all possible rankings. Is slow because it has 8! rankings to get through.
Outputs several data files used in the Evaluation section:
forest.json: a JSON representation of the trained random forest - suitable for use when compiling the OCaml binary
data_for_second_barchart.csv: results for each prover and strategy for the test goals
data_for_second_linegraph.csv: how long each strategy took to return a Valid/Invalid answer for the test set
feature_importances.txt: These relevance metrics are computed by Sci-kit Learn's Random Forest implementation: they describe the proportion of decisions based on each input variable across all decision trees in Where4's random forest.
barcharts2.pdf to the
paper folder. Similar to
make_fig1.py but reads from
data_for_second_barchart.csv and includes theoretical strategies and Where4 results (result of choosing the first solver in each ranking).
The cumulative time taken for the three theoretical strategies and Where
to find an answer to the goals in the test dataset. Uses data stored in
data_for_second_linegraph.csv - particularly important for the time-consuming 'Random' calculations. Renders
line_graph_eval_provers.pdf to the
paper folder. Also prints the average times File/Theory/Goal times used in Table 3.
Parameterise Where4's performance by using a threshold, reading data from
These plots show the effect of the threshold on the time taken for a response (top) and number of goals which can be proved (bottom).
An example of how the Benchexec framework is used to measure the CPU time consumed by each SMT solver.
An application of the Why3 transformation
split_goal_wp applied to every
.mlw in order to count the number of simplified goals which could be created by this tactic.