This repository presents a new symbolic execution tool based on machine learning and the experiment data collected during the evaluation of the tool.
During program traversing, symbolic execution collects path conditions and feeds them to a constraint solver to obtain feasible solutions. However, complex path conditions, like nonlinear constraints, which widely appear in programs, are hard to be handled efficiently by the existing solvers.
In this repository, we adapt the classical symbolic execution framework with a theoretically grounded optimization technique based on machine learning. Our framework, MLBSE, encodes not only the simple linear path conditions, but also nonlinear arithmetic operations, and even black-box function calls of library methods, into symbolic path conditions. The feasibility problems of the path conditions are then transformed into optimization problems, which are handled by the underlying optimization solver through machine learning guided sampling and validation in the path conditions.
We implement MLBSE on the basis of Symbolic Path Finder (SPF) into a fully automatic Java symbolic execution engine. Users can feed their code to MLBSE directly, which is very convenient to use. To evaluate its performance, 19 real case programs are used as the benchmarks for MLBSE to generate test cases, which involve a total number of 360 methods that are full of nonlinear operations, float point arithmetic and even native method calls. Experiment results show that the coverage achieved by MLBSE is 1.5 to 2.9 times of the state-of-the-art tools with much better efficiency.
- The
Data/
directory contains the test cases generated by respective tools and the coverage reports generated by Jacoco. - The
Tool/
directory contains the executable program MLBSE, the target programs and the evaluation scripts.
Running MLBSE requires an environment with a Java Development Kit (JDK) supporting (1.8 recommended) and bash, ant for the evaluation scripts.
-
Tell JPF where MLBSE is located by creating the
~/.jpf/site.properties
file with the following lines:jpf.home = ${user.home}/PROJECT_DIR/Tool/MLB jpf-core = ${jpf.home}/jpf-core extensions= ${jpf-core} jpf-symbc = ${jpf.home}/jpf-symbc extensions+=, ${jpf-symbc} jpf-nhandler = ${jpf.home}/jpf-nhandler extensions+=, ${jpf-nhandler}
Note : PROJECT_DIR should be replaced by the actual root path to this repository.
-
For using SPF on a target program, a
*.jpf
configuration file needs to be given and the relevant information can be found in SPF documentation.MLBSE provides the configuration parameters as follows:
-
symbolic.mlpm = POSITIVE_INT (default:3000)
The sample size as the budget of the generation. Note that the running time is almost linear to the sample size which means a larger sample size may lead to a smaller estimated false negative rate but more time than the smaller ones . -
symbolic.complex = 0 / 1 / 2 (default:2)
The difficulty level of the problem under rough investigation. There are three preset levels: easy(0), where the function value changes slowly, e.g., x +1 = 0; modest(1), where the function value changes fast, e.g., sin(2|x|) = 0; and hard(2), where the function changes very fast, e.g., |e^x+3*e^3| = 0. -
symbolic.strategy = classical / ECS-Guided (default:classical)
The search strategy of symbolic execution. MLBSE backtracks when the path condition is infeasible or unsolved under the classical search strategy, while it continues to traverse deeper along the path condition under the ECS-Guided search strategy which possiblely leads to more time but higher coverages on the benchmarks .
In the
Tool/MLB/Test/experiment
directory, running thegenerate-test-cases.sh
script can automatically generate test cases and coverage reports for programs insrc/programs
with corresponding configuration files set insrc/drivers
. -
-
Different experiment scripts in the
Tool/MLB/Test/experiment
directory will help to reproduce the experiment data in the evaluation.- The
run-experiments-ex.sh
script runs the experiment to evaluate the performance of MLBSE for 19 real programs with a total number of 360 methods insrc/ex-programs
- The
run-experiments-samples.sh
script runs the experiment to evaluate what influence the sample size parameter has on the performance of MLBSE. - The
run-experiments-ECS-Guided.sh
script and therun-experiment-classical.sh
script run the experiments to evaluate the influence of different search strategies in symbolic execution on the performance. - The
run-experiments-black-box.sh
script and therun-experiments-white-box.sh
script run the experiments to evaluate the performance of MLBSE for programs containing library method calls.
- The