Predicting SMT Solver Performance for Software Verification

Andrew Healy, Rosemary Monahan, James F. Power

Principles of Programming Research Group, Dept. of Computer Science, Maynooth University, Ireland

F-IDE 2016 support data

This repository contains data measuring the performance of eight SMT solvers on the Why3 examples dataset. For each solver we record the result returned and the time taken to return it. The solvers are Alt-Ergo (versions 0.95.2 and 1.01), CVC3, CVC4, veriT, Yices, and Z3 (versions 4.3.2 and 4.4.1).

Python libraries we use: pandas, NumPy, scikit-learn, Matplotlib. All Python files can be run on the command line in the usual way, e.g. `python <filename.py>`.

`paper/`

Folder containing the LaTeX source files and images for the paper itself.

`data/`

This folder contains a subfolder for each file in the examples repository. Each subfolder contains:

- `<name>.mlw`: the WhyML file sent to Why3
- `<name>.json`: a JSON dictionary containing timings and results for various timeout values
- `stats.json`: the syntactic features statically extracted from `<name>.mlw` (used as independent variables for prediction)
- `split/`: folder containing the goals that result from applying the Why3 transformation `split_goal_wp` to the `.mlw` file. Created by `split_goal.py`
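
As a rough illustration of how the per-file folders might be consumed, the sketch below walks `data/` and gathers each `stats.json` into one dictionary keyed by folder name. It assumes only that `stats.json` holds valid JSON; the helper name `load_stats` is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def load_stats(data_dir):
    """Map each subfolder name of data_dir to the parsed contents of its stats.json."""
    stats = {}
    for folder in sorted(Path(data_dir).iterdir()):
        stats_file = folder / "stats.json"
        # Skip stray files and folders that have no extracted features.
        if folder.is_dir() and stats_file.is_file():
            with stats_file.open() as handle:
                stats[folder.name] = json.load(handle)
    return stats
```

Calling `load_stats("data")` from the repository root would then give one entry per example file that has a `stats.json`.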

`benchexec/`

Python interface to the BenchExec measurement framework. See LICENCE_benchexec.txt for the licence.

`common.py`

A collection of short, commonly-used constants and functions used by many of the other Python scripts.

`collect_data_fig1_table1.py`

Python script to collect data from the JSON files. Prints the results for Table 1 and saves data to fig1_data.csv, which is read by make_fig1.py.

`make_fig1.py`

Makes the first figure (stacked bar charts, 60-second timeout). Uses fig1_data.csv. Renders barcharts.pdf to the paper folder.

`create_stats_df.py`

Collects data from the JSON files and combines it with the syntax metrics. Saves the data as whygoal_stats.csv.

`make_fig3.py`

Uses the entire dataset to plot the cumulative time taken for Valid/Invalid/Unknown answers to be returned. Renders line_graph.pdf to the paper folder and prints values for the 99th percentile.
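
One plausible reading of "cumulative time" and "99th percentile" can be sketched with the standard library alone. The function names below are hypothetical, and the nearest-rank percentile definition is an assumption; make_fig3.py may compute these values differently.

```python
import math
from itertools import accumulate


def cumulative_times(times):
    """Return running totals of the sorted times, one entry per answer returned."""
    return list(accumulate(sorted(times)))


def percentile_nearest_rank(times, pct):
    """Smallest time t such that at least pct percent of the times are <= t."""
    ordered = sorted(times)
    # Nearest-rank method: take the value at position ceil(pct * n / 100).
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Sorting first means the running totals grow as slowly as possible, which is the usual shape for this kind of cumulative plot.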

`whygoal_test.csv`, `whygoal_valid_test.csv`

Disjoint partitions of whygoal_stats.csv for testing (25%) and training/validation (75%), respectively.
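
A 75%/25% disjoint partition of this kind can be sketched as follows. This is only an illustration: the helper `split_rows`, the fixed seed, and the use of a plain shuffle are assumptions, and the README does not say how the actual partition was drawn.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and return (train_valid, test) lists."""
    shuffled = list(rows)
    # A seeded generator makes the partition repeatable across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Every row lands in exactly one of the two lists, so the partitions are disjoint by construction.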

`compare_regressors.py`

Performs K-fold cross-validation on the training set to compare a number of regressor implementations from scikit-learn. Renders compare_regressors.pdf, which is the full version of Table 2 in the paper.

`permute_rankings.py`

Finds values for the 'Random' strategy (on either the training or the test set) by averaging values over all possible rankings. This is slow because all 8! = 40,320 rankings of the eight solvers must be evaluated.
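
The averaging over all rankings can be illustrated with `itertools.permutations`. The cost model here is an assumption made for the sketch (solvers run in ranking order, and time accumulates until one returns Valid or Invalid); the function names are hypothetical, and permute_rankings.py may score rankings differently.

```python
from itertools import permutations

# Answers that end the search for a goal (assumed for this sketch).
ANSWERS = {"Valid", "Invalid"}


def time_for_ranking(ranking, results):
    """Run solvers in order, summing time until one returns Valid or Invalid."""
    total = 0.0
    for solver in ranking:
        answer, seconds = results[solver]
        total += seconds
        if answer in ANSWERS:
            break
    return total


def random_strategy_time(results):
    """Average time_for_ranking over every possible ordering of the solvers."""
    rankings = list(permutations(results))
    return sum(time_for_ranking(r, results) for r in rankings) / len(rankings)
```

With `results` mapping each solver name to an `(answer, seconds)` pair, eight solvers give 40,320 orderings per goal, which is why the full computation takes a long time.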

`output_eval_files.py`

Outputs several data files used in the Evaluation section:

- forest.json: a JSON representation of the trained random forest, suitable for use when compiling the OCaml binary
- data_for_second_barchart.csv: results for each prover and strategy for the test goals
- data_for_second_linegraph.csv: how long each strategy took to return a Valid/Invalid answer for the test set
- feature_importances.txt: relevance metrics computed by scikit-learn's random forest implementation. They describe the proportion of decisions based on each input variable across all decision trees in Where4's random forest.

`barchart2.py`

Renders barcharts2.pdf to the paper folder. Similar to make_fig1.py, but reads from data_for_second_barchart.csv and includes the theoretical strategies and the Where4 results (the result of choosing the first solver in each ranking).

`plot_second_linegraph.py`

Plots the cumulative time taken for the three theoretical strategies and Where4 to find an answer to the goals in the test dataset. Uses the data stored in data_for_second_linegraph.csv, which is particularly important for the time-consuming 'Random' calculations. Renders line_graph_eval_provers.pdf to the paper folder. Also prints the average File/Theory/Goal times used in Table 3.

`thresholds.py`

Parameterises Where4's performance by a threshold, reading data from data_for_second_linegraph.csv. Renders thresholds.pdf to the paper folder. The plots show the effect of the threshold on the time taken for a response (top) and on the number of goals that can be proved (bottom).
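
This README does not define the threshold. As a sketch under a stated assumption, suppose it is a cut-off on each solver's predicted cost, so that solvers predicted to be too expensive are never called. The function `apply_threshold` and the cost values in the usage note are hypothetical.

```python
def apply_threshold(predicted_costs, threshold):
    """Return solvers ordered by predicted cost, dropping any above threshold."""
    # Lower predicted cost means the solver is tried earlier.
    ranked = sorted(predicted_costs.items(), key=lambda item: item[1])
    return [solver for solver, cost in ranked if cost <= threshold]
```

For example, `apply_threshold({"Z3": 2.0, "CVC4": 0.5, "Yices": 7.0}, 3.0)` keeps CVC4 and Z3, in that order. Under this reading, a lower threshold shortens the response time but can leave some provable goals unproved, which is the trade-off the two plots describe.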

`test_time.py`

An example of how the BenchExec framework is used to measure the CPU time consumed by each SMT solver.

`split_goal.py`

Applies the Why3 transformation split_goal_wp to every .mlw file in order to count the number of simplified goals that this transformation can create.
