This repository is a companion page for the following paper: “DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks” (submitted). The approach is implemented in Python on Google Colab.
This repository is a replication package of DeepGD, a black-box (BB) test selection approach for deep neural networks (DNNs) that uses a customized multi-objective genetic search to guide the selection of test inputs with high fault-revealing power. It relies on diversity and uncertainty scores and only requires access to the complete test dataset and the prediction probabilities of the DNN model. DeepGD also uses a clustering-based technique to estimate faults in DNN models. An empirical evaluation on five DNN models, four datasets, and nine baselines shows that DeepGD provides better guidance for selecting inputs with high fault-revealing power and improves model performance through retraining.
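Because DeepGD only needs prediction probabilities, its uncertainty objective can be computed in a purely black-box way. As an illustrative sketch (not the package's exact implementation), the snippet below scores uncertainty with a Gini-style impurity, as popularized by the DeepGini baseline; the function name and toy probabilities are hypothetical.

```python
import numpy as np

def gini_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Gini impurity per input: 1 - sum_c p_c^2.

    Higher values mean the model is less certain about the input.
    `probs` has shape (n_inputs, n_classes) with rows summing to 1.
    """
    return 1.0 - np.sum(probs ** 2, axis=1)

# Toy example: a confident prediction vs. a near-uniform one
probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> low Gini
    [0.34, 0.33, 0.33],   # uncertain -> high Gini
])
scores = gini_uncertainty(probs)
```

Inputs with high Gini scores are the ones the model is least sure about, which makes them natural candidates for the uncertainty side of the search.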
Our main contributions are:
1- Proposing a BB test selection approach
2- Customizing the multi-objective search and validating it through an ablation study
3- Validating DeepGD by approximating faults in DNNs
4- Comparing DeepGD with existing test selection metrics in terms of fault detection ability, execution time, diversity, and retraining improvement
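At the core of the customized multi-objective search (contribution 2) is Pareto dominance over the uncertainty and diversity objectives. The package itself relies on pymoo's genetic search; the numpy-only sketch below illustrates only the dominance filtering step, with a hypothetical function name and toy candidate scores (both objectives maximized).

```python
import numpy as np

def pareto_front(objectives: np.ndarray) -> np.ndarray:
    """Return indices of non-dominated rows; every objective is maximized.

    Row i is dominated if some row j is >= on all objectives
    and strictly > on at least one.
    """
    n = objectives.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(objectives[j] >= objectives[i]) \
                      and np.any(objectives[j] > objectives[i]):
                keep[i] = False
                break
    return np.flatnonzero(keep)

# Toy candidates scored on (uncertainty, diversity)
cands = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
    [0.4, 0.4],   # dominated by [0.5, 0.5]
])
front = pareto_front(cands)
```

A genetic search such as NSGA-II repeatedly applies this kind of dominance ranking (plus crowding-distance tie-breaking) while evolving candidate test subsets.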
- Baseline_results folder contains the subsets selected by each method across the different datasets and models.
- Fault_clusters folder contains the saved DNN faults for six combinations of models and datasets.
- Retraining folder contains the models retrained with the subsets selected by each method.
- LSA_DSA folder contains parts of [4] used for computing the LSA and DSA scores.
- ATS-master_final folder contains the code for applying ATS from [3].
You first need to install these libraries:

```
!pip install umap-learn
!pip install tslearn
!pip install hdbscan
!pip install pymoo
```
The code was developed and tested based on the following environment:
- Python 3.8
- Keras 2.7.0
- TensorFlow 2.7.1
- PyTorch 1.10.0
- torchvision 0.11.1
- Matplotlib
- scikit-learn
Here is documentation on how to use this replication package.
To run the code, upload the repository to Google Drive, open it in Google Colab, set your local path in the first line, and run the code.
This is the root directory of the repository. The directory is structured as follows:
Replication-package
.
|
|--- Final DeepGD        Customized multi-objective search-based test selection method.
|
|--- BaseLine methods    All the test selection baselines used in the paper.
|
|--- Retraining          Retraining experiments.
|
|--- Generating data     Our implementation for generating test inputs by applying realistic transformations.
Our experimental evaluation answers the research questions below.
RQ1: Do we find more faults than existing test selection approaches with the same testing budget?
RQ2: Do we guide the retraining of DNN models more effectively with our selected inputs than with baselines?
RQ3: Can DeepGD select more diverse test input sets than other test selection approaches?
RQ4: How do DeepGD and baseline approaches compare in terms of computation time?
We would like to emphasize that using mispredictions as the sole criterion for evaluating test selection approaches can be misleading and may not accurately reflect their fault detection capabilities. This is primarily because redundant mispredicted test inputs may all stem from the same fault in the DNN model. The specific numbers of mispredicted inputs are therefore not included in our paper; instead, they are provided in this replication package, along with other supplementary material.
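To make the distinction between mispredictions and faults concrete: a deliberately simplified stand-in for fault estimation groups mispredicted inputs with similar feature vectors and counts the groups, so many redundant mispredictions may map to one fault. The package's actual pipeline uses feature extraction with UMAP and HDBSCAN clustering; the single-linkage union-find below is only an illustrative toy with a hypothetical function name and threshold.

```python
import math

def count_fault_clusters(features, threshold=1.0):
    """Group mispredicted inputs whose feature vectors lie within
    `threshold` of each other (single linkage via union-find) and
    count the resulting groups as an estimate of distinct faults."""
    n = len(features)
    parent = list(range(n))

    def find(x):
        # Find the root representative, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(features[i], features[j]) <= threshold:
                union(i, j)

    return len({find(i) for i in range(n)})

# Four mispredicted inputs forming two tight groups far apart:
# they suggest two underlying faults, not four.
feats = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
```

With a tight threshold the four mispredictions collapse into two estimated faults; a very loose threshold merges everything into one, which is why clustering granularity matters for this kind of evaluation.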
- To speed up execution, you can use GPU-based TensorFlow by changing the Colab runtime type.
1- DeepGini
2- Diversity
3- ATS
5- Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering