DeepPerf

Many software systems provide users with a set of configuration options, and different configurations may lead to different runtime performance of the system. It is necessary to understand the performance of a system under a certain configuration before the system is actually configured and deployed. This helps users make rational configuration decisions and reduces performance testing cost. As the number of possible configurations can be exponential in the number of options, it is difficult to exhaustively deploy and measure system performance under all possible configurations. Recently, several learning methods have been proposed to build a performance prediction model from performance data collected on a small sample of configurations, and then use the model to predict system performance under a new configuration. DeepPerf is an end-to-end deep learning based solution that can train a software performance prediction model from a limited number of samples and predict the performance value of a software system under a new configuration. DeepPerf consists of two main stages:

  • Stage 1: Tune the hyperparameters of the neural network.
  • Stage 2: Use the hyperparameters obtained in Stage 1 to train the neural network on the samples and predict the performance value of the software system under a new configuration (a conceptual sketch of this two-stage flow is given below).
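
The following is a minimal, self-contained sketch of the two-stage idea on toy data. It uses scikit-learn's GridSearchCV and MLPRegressor as stand-ins for DeepPerf's own hyperparameter search and TensorFlow sparse network, so it is a conceptual illustration only, not DeepPerf's actual implementation.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPRegressor

    # Toy data: each row is a configuration of 9 binary options; the target is
    # a synthetic performance value (not real measurement data).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(40, 9)).astype(float)
    y = X @ np.arange(1.0, 10.0) + rng.normal(0, 0.1, size=40)

    # Stage 1: tune hyperparameters (network size, regularization strength).
    search = GridSearchCV(
        MLPRegressor(max_iter=5000, random_state=0),
        {"hidden_layer_sizes": [(16,), (32,), (32, 32)], "alpha": [1e-4, 1e-2]},
        cv=3,
    )
    search.fit(X, y)

    # Stage 2: the best model (refit on all samples) predicts a new configuration.
    new_config = rng.integers(0, 2, size=(1, 9)).astype(float)
    print(search.best_params_, search.best_estimator_.predict(new_config))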

Citing DeepPerf

If you find our code useful, please cite our paper:

@inproceedings{Ha2019DeepPerf,
  author    = {Huong Ha and
               Hongyu Zhang},
  title     = {DeepPerf: performance prediction for configurable software with deep
               sparse neural network},
  booktitle = {Proceedings of the 41st International Conference on Software Engineering,
               {ICSE} 2019, Montreal, QC, Canada, May 25-31, 2019},
  pages     = {1095--1106},
  publisher = {{IEEE} / {ACM}},
  year      = {2019}
}

Prerequisites

  • Python 3.6.x
  • TensorFlow (tested with TensorFlow 1.10.0 and 1.8.0)

Installation

DeepPerf can be executed directly from source code:

  1. Download and install Python 3.6.x.

  2. Install TensorFlow

    $ pip install tensorflow==1.10.0

  3. Clone DeepPerf

    $ git clone https://github.com/DeepPerf/DeepPerf.git

Data

DeepPerf has been evaluated on 11 real-world configurable software systems:

  • Apache
  • LLVM
  • x264
  • BDBC
  • BDBJ
  • SQL
  • Dune
  • hipacc
  • hsmgp
  • javagc
  • sac

Six of these systems have only binary configuration options, while the other five have both binary and numeric configuration options. The data is stored in the DeepPerf/Data directory. These software systems were measured and published online by the SPLConqueror team; more information about these systems and how they were measured can be found on the SPLConqueror project website.
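
As a rough sketch, assuming each data file is a CSV with one configuration per row and the measured performance value in the last column (the file name below is illustrative, not guaranteed by the repository), a system's measurements could be loaded like this:

    import numpy as np

    # Assumed layout: configuration options in the leading columns, measured
    # performance in the last column; the file name is a placeholder.
    data = np.genfromtxt("Data/Apache_AllNumeric.csv", delimiter=",", skip_header=1)
    X = data[:, :-1]   # configuration options
    y = data[:, -1]    # measured performance values
    print(X.shape, y.shape)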

Usage

To run DeepPerf, users need to specify the name of the software system they wish to evaluate and then run the script AutoDeepPerf.py. There are 11 software systems that users can evaluate: Apache, LLVM, x264, BDBC, BDBJ, SQL, Dune, hipacc, hsmgp, javagc, sac. The script will then evaluate DeepPerf on the chosen software system with the same experimental setup presented in our paper. Specifically, for binary software systems, DeepPerf will run with five different sample sizes: n, 2n, 3n, 4n, 5n, with n being the number of options, and 30 experiments for each sample size. For binary-numeric software systems, DeepPerf will run with the sample sizes specified in Table IV of our paper, and 30 experiments for each sample size. For example, if users want to evaluate DeepPerf on the system LLVM, the command to run is:

$ python AutoDeepPerf.py LLVM

After finishing each sample size, the script outputs a .csv file showing the mean prediction error and the margin (95% confidence interval) of that sample size over the 30 experiments. These results should be the same as, or similar to, the results reported in Tables III and IV of our paper.
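
For readers who want to reproduce this summary step, the margin of a 95% confidence interval over 30 runs can be computed with a t-distribution as sketched below; the per-run errors here are placeholders and the script's exact procedure may differ.

    import numpy as np
    from scipy import stats

    errors = np.random.uniform(5.0, 15.0, size=30)   # placeholder per-run MREs
    mean = errors.mean()
    # Margin of the 95% confidence interval of the mean (t-distribution).
    margin = stats.sem(errors) * stats.t.ppf(0.975, df=len(errors) - 1)
    print(f"Mean = {mean:.2f}, Margin = {margin:.2f}")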

Alternatively, users can customize the sample size and/or the number of experiments per sample size using the optional arguments -ss and -ne. For example, to set the sample size to 20 and the number of experiments to 10, the corresponding command is:

$ python AutoDeepPerf.py LLVM -ss 20 -ne 10

If either option is omitted, the default setting is used for it. The default number of experiments is 30. The default sample sizes are: (a) the five sample sizes n, 2n, 3n, 4n, 5n, with n being the number of configuration options, when the evaluated system is a binary system, or (b) the four sample sizes specified in Table IV of our paper when the evaluated system is a binary-numeric system.

NOTE: The time cost of tuning hyperparameters and training the final neural network for each experiment ranges from 2 to 20 minutes, depending on the software system, the sample size, and the user's CPU. Typically, the time cost is smaller when the software system has fewer configuration options or when the sample size is small. Therefore, please be aware that for each sample size, evaluating 30 experiments can take from 1 hour to 10 hours.

Experimental Results

To evaluate the prediction accuracy, we use the mean relative error (MRE), which is computed as

    MRE = (100 / |V|) * Σ_{c in V} |predicted_c - actual_c| / actual_c

where V is the testing dataset, predicted_c is the predicted performance value of configuration c generated using the model, and actual_c is the actual performance value of configuration c. In the two tables below, Mean is the mean of the MREs obtained in 30 experiments and Margin is the margin of the 95% confidence interval of those MREs. The results were obtained by evaluating DeepPerf on a Windows 7 computer with an Intel Xeon E5-1650 CPU (3.2 GHz) and 16 GB RAM.
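
For reference, the same computation in a few lines of Python, with made-up numbers:

    import numpy as np

    actual = np.array([100.0, 250.0, 80.0])     # actual performance of each c in V
    predicted = np.array([110.0, 240.0, 84.0])  # model predictions for the same c
    mre = np.mean(np.abs(predicted - actual) / actual) * 100
    print(f"MRE = {mre:.2f}%")                  # -> MRE = 6.33%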

Prediction accuracy for software systems with binary options

Subject System   Sample Size   DECART            DeepPerf
                               Mean     Margin   Mean     Margin
Apache           n             NA       NA       17.87    1.85
                 2n            15.83    2.89     10.24    1.15
                 3n            11.03    1.46     8.25     0.75
                 4n            9.49     1.00     6.97     0.39
                 5n            7.84     0.28     6.29     0.44
x264             n             17.71    3.87     10.43    2.28
                 2n            9.31     1.30     3.61     0.54
                 3n            6.37     0.83     2.13     0.31
                 4n            4.26     0.47     1.49     0.38
                 5n            2.94     0.52     0.87     0.11
BDBJ             n             10.04    4.67     7.25     4.21
                 2n            2.23     0.16     2.07     0.32
                 3n            2.03     0.16     1.73     0.12
                 4n            1.72     0.09     1.67     0.12
                 5n            1.67     0.09     1.61     0.09
LLVM             n             6.00     0.34     5.09     0.80
                 2n            4.66     0.47     3.87     0.48
                 3n            3.96     0.39     2.54     0.15
                 4n            3.54     0.42     2.27     0.16
                 5n            2.84     0.33     1.99     0.15
BDBC             n             151.0    90.70    133.6    54.33
                 2n            43.8     26.72    16.77    2.25
                 3n            31.9     22.73    13.1     3.39
                 4n            6.93     1.39     6.95     1.11
                 5n            5.02     1.69     5.82     1.33
SQL              n             4.87     0.22     5.04     0.32
                 2n            4.67     0.17     4.63     0.13
                 3n            4.36     0.09     4.48     0.08
                 4n            4.21     0.10     4.40     0.14
                 5n            4.11     0.08     4.27     0.13

Prediction accuracy for software systems with binary-numeric options

Subject System   Sample Size   SPLConqueror                  DeepPerf
                               Sampling Heuristic   Mean     Sampling Heuristic   Mean    Margin
Dune             49            OW, RD               20.1     RD                   15.73   0.90
                 78            PW, RD               22.1     RD                   13.67   0.82
                 240           OW, PBD(49, 7)       10.6     RD                   8.19    0.34
                 375           OW, PBD(125, 5)      18.8     RD                   7.20    0.17
hipacc           261           OW, RD               14.2     RD                   9.39    0.37
                 528           OW, PBD(125, 5)      13.8     RD                   6.38    0.44
                 736           OW, PBD(49, 7)       13.9     RD                   5.06    0.35
                 1281          PW, RD               13.9     RD                   3.75    0.26
hsmgp            77            OW, RD               4.5      RD                   6.76    0.87
                 173           PW, RD               2.8      RD                   3.60    0.20
                 384           OW, PBD(49, 7)       2.2      RD                   2.53    0.13
                 480           OW, PBD(125, 5)      1.7      RD                   2.24    0.11
javagc           423           OW, PBD(49, 7)       37.4     RD                   24.76   2.42
                 534           OW, RD               31.3     RD                   23.27   4.00
                 855           OW, PBD(125, 5)      21.9     RD                   21.83   7.07
                 2571          OW, PBD(49, 7)       28.2     RD                   17.32   7.89
sac              2060          OW, RD               21.1     RD                   15.83   1.25
                 2295          OW, PBD(125, 5)      20.3     RD                   17.95   5.63
                 2499          OW, PBD(49, 7)       16.0     RD                   17.13   2.22
                 3261          PW, RD               30.7     RD                   15.40   2.05
