VulRepair Replication Package

VulRepair

A T5-based Automated Software Vulnerability Repair

Auto-Repair Real-World Software Vulnerabilities

VulRepair Performance on Top-25 Most Dangerous CWEs in 2021

Rank	CWE Type	Name	%PP	Proportion
1	CWE-787	Out-of-bounds Write	30%	16/53
2	CWE-79	Cross-site Scripting	0	0/1
3	CWE-125	Out-of-bounds Read	32%	54/170
4	CWE-20	Improper Input Validation	45%	68/152
5	CWE-78	OS Command Injection	33%	1/3
6	CWE-89	SQL Injection	20%	1/5
7	CWE-416	Use After Free	53%	29/55
8	CWE-22	Path Traversal	25%	2/8
9	CWE-352	Cross-Site Request Forgery	0	0/2
10	CWE-434	Dangerous File Type	-	-
11	CWE-306	Missing Authentication for Critical Function	-	-
12	CWE-190	Integer Overflow or Wraparound	53%	31/59
13	CWE-502	Deserialization of Untrusted Data	-	-
14	CWE-287	Improper Authentication	50%	3/6
15	CWE-476	NULL Pointer Dereference	66%	46/70
16	CWE-798	Use of Hard-coded Credentials	-	-
17	CWE-119	Improper Restriction of Operations	37%	141/386
18	CWE-862	Missing Authorization	0	0/2
19	CWE-276	Incorrect Default Permissions	-	-
20	CWE-200	Exposure of Sensitive Information	61%	39/64
21	CWE-522	Insufficiently Protected Credentials	0	0/4
22	CWE-732	Incorrect Permission Assignment	50%	1/2
23	CWE-611	Improper Restriction of XML Reference	0	0/3
24	CWE-918	Server-Side Request Forgery (SSRF)	0	0/1
25	CWE-77	Command Injection	100%	2/2
		TOTAL	41%	434/1048

Top-10 Most Accurately Repaired CWE Types of VulRepair

Rank	CWE Type	Name	%PP	Proportion
1	CWE-755	Improper Handling of Exceptional Conditions	100%	1/1
2	CWE-706	Use of Incorrectly-Resolved Name or Reference	100%	1/1
3	CWE-326	Inadequate Encryption Strength	100%	2/2
4	CWE-667	Improper Locking	100%	1/1
5	CWE-369	Divide By Zero	100%	5/5
6	CWE-77	Command Injection	100%	2/2
7	CWE-388	Error Handling	100%	1/1
8	CWE-436	Interpretation Conflict	100%	1/1
9	CWE-191	Integer Underflow	100%	2/2
10	CWE-285	Improper Access Control	75%	6/8
		TOTAL	92%	22/24

VulRepair Performance on Top-10 Majority CWE Types in Testing Data

Rank	CWE Type	Name	%PP	Proportion
1	CWE-119	Improper Restriction of Operations	37%	141/386
2	CWE-125	Out-of-bounds Read	32%	54/170
3	CWE-20	Improper Input Validation	45%	68/152
4	CWE-264	Permissions, Privileges, and Access Controls	51%	36/71
5	CWE-476	NULL Pointer Dereference	66%	46/70
6	CWE-200	Exposure of Sensitive Information	61%	39/64
7	CWE-399	Resource Management Errors	62%	37/60
8	CWE-190	Integer Overflow or Wraparound	53%	31/59
9	CWE-416	Use After Free	53%	29/55
10	CWE-362	Race Condition	43%	23/54
		TOTAL	44%	504/1141

The raw predictions of VulRepair can be accessed here

[FSE 2022 Technical track] [Paper #152] [7 mins talk]
VulRepair: A T5-based Automated Software Vulnerability Repair
To appear in ESEC/FSE 2022 (14-18 November, 2022).

How to replicate

About the Environment Setup

First of all, clone this repository to your local machine and access the main dir via the following command:

git clone https://github.com/awsm-research/VulRepair.git
cd VulRepair

Then, install the python dependencies via the following command:

pip install transformers
pip install torch
pip install numpy
pip install tqdm
pip install pandas
pip install tokenizers
pip install datasets
pip install gdown
pip install tensorboard
pip install scikit-learn

Alternatively, we provide requirements.txt with version of packages specified to ensure the reproducibility, you may install via the following commands:

pip install -r requirements.txt

If having an issue with the gdown package, try the following commands:

git clone https://github.com/wkentaro/gdown.git
cd gdown
pip install .
cd ..

We highly recommend you check out this installation guide for the "torch" library so you can install the appropriate version on your device.
To utilize GPU (optional), you also need to install the CUDA library, you may want to check out this installation guide.
Python 3.9.7 is recommended, which has been fully tested without issues.

About the Datasets

All of the dataset has the same number of columns (i.e., 7 cols), we focus on the following 2 columns to conduct our experiments:

source (str): The localized vulnerable function written in C (preprocessed by Chen et al.)
target (str): The repair ground-truth (preprocessed by Chen et al.)

source	target
...	...

Descriptive statistics of our experimental dataset

	1st Qt.	Median	3rd Qt.	Avg.
Function Length	138	280	593	586
Patch Length	12	24	48	55
Cyclomatic Complexity of Functions	3	8	19	23

Note.

This dataset is originally provided by Bhandari et al. and Fan et al., and it is further preprocessed by Chen et al.

For more information, please kindly refer to this repository.
We process cyclomatic complexity (CC) using Joern tool

Dataset with labelled CC. can be found here

About the Models

Model Naming Convention

Model Name	Model Specification	Related to RQ
M1 (VulRepair)	BPE Tokenizer + Pre-training (PL/NL) + T5	RQ1, RQ2, RQ3, RQ4
M2 (CodeBERT)	BPE Tokenizer + Pre-training (PL/NL) + BERT	RQ1, RQ2, RQ3
M3	BPE Tokenizer + No Pre-training + T5	RQ2, RQ4
M4	BPE Tokenizer + Pre-training (NL) + T5	RQ2
M5	BPE Tokenizer + No Pre-training + BERT	RQ2
M6	BPE Tokenizer + Pre-training (NL) + BERT	RQ2
M7	Word-level Tokenizer + Pre-training (PL/NL) + T5	RQ3, RQ4
M8	BPE Tokenizer + Vanilla XFMR	RQ3
M9	Word-level Tokenizer + Pre-training (PL/NL) + BERT	RQ3
M10	Word-level Tokenizer + No Pre-training + T5	RQ4

How to access the models

We host our VulRepair on the Model Hub provided by Huggingface Transformers which can be access here.
All other models can be downloaded from this public Google Cloud Space.

About VulRepair Replication

To reproduce the results of our VulRepair (M1 model), run the following commands (Inference only):

cd M1_VulRepair_PL-NL
python vulrepair_main.py \
    --output_dir=./saved_models \
    --model_name=model.bin \
    --tokenizer_name=MickyMike/VulRepair \
    --model_name_or_path=MickyMike/VulRepair \
    --do_test \
    --encoder_block_size 512 \
    --decoder_block_size 256 \
    --num_beams=50 \
    --eval_batch_size 1

Note. please adjust the "num_beams" parameters accordingly to obtain the results we present in the discussion section. (i.e., num_beams= 1, 2, 3, 4, 5, 10)

To retrain the VulRepair model from scratch, run the following commands (Training + Inference):

# training
cd M1_VulRepair_PL-NL
python vulrepair_main.py \
    --model_name=model.bin \
    --output_dir=./saved_models \
    --tokenizer_name=Salesforce/codet5-base \
    --model_name_or_path=Salesforce/codet5-base \
    --do_train \
    --epochs 75 \
    --encoder_block_size 512 \
    --decoder_block_size 256 \
    --train_batch_size 4 \
    --eval_batch_size 4 \
    --learning_rate 2e-5 \
    --max_grad_norm 1.0 \
    --evaluate_during_training \
    --seed 123456  2>&1 | tee train.log

# Inference
python vulrepair_main.py \
    --output_dir=./saved_models \
    --model_name=model.bin \
    --tokenizer_name=Salesforce/codet5-base \
    --model_name_or_path=Salesforce/codet5-base \
    --do_test \
    --encoder_block_size 512 \
    --decoder_block_size 256 \
    --num_beams=50 \
    --eval_batch_size 1

About the Experiment Replication

We recommend to use GPU with 8 GB up memory for training since T5 and BERT architecture is very computing intensive.

Note. If the specified batch size is not suitable for your device, please modify --eval_batch_size and --train_batch_size to fit your GPU memory.

How to replicate RQ1

You need to replicate M1(VulRepair) and M2(CodeBERT) to replicate the results of RQ1:

Click here for the instruction of replicating M1(VulRepair)
Click here for the instruction of replicating M2(CodeBERT)

How to replicate RQ2

You need to replicate M1(VulRepair), M2(CodeBERT), M3, M4, M5, M6 to replicate the results of RQ2:

Click here for the instruction of replicating M1(VulRepair)
Click here for the instruction of replicating M2(CodeBERT)
Click here for the instruction of replicating M3
Click here for the instruction of replicating M4
Click here for the instruction of replicating M5
Click here for the instruction of replicating M6

How to replicate RQ3

You need to replicate M1(VulRepair), M2(CodeBERT), M7, M8, M9 to replicate the results of RQ2:

Click here for the instruction of replicating M1(VulRepair)
Click here for the instruction of replicating M2(CodeBERT)
Click here for the instruction of replicating M7
Click here for the instruction of replicating M8
Click here for the instruction of replicating M9

How to replicate RQ4

You need to replicate M1(VulRepair), M3, M7, M10 to replicate the results of RQ2:

Click here for the instruction of replicating M1(VulRepair)
Click here for the instruction of replicating M3
Click here for the instruction of replicating M7
Click here for the instruction of replicating M10

Appendix

Results of RQ1

Methods	% Perfect Prediction
VulRepair	44%
CodeBERT	31%
VRepair	21%

Results of RQ2

T5	% Perfect Prediction
PL/NL (VulRepair)	44%
No Pre-training	30%
NL	6%
BERT	% Perfect Prediction
PL/NL (CodeBERT)	31%
No Pre-training	29%
NL	1%

Results of RQ3

VulRepair	% Perfect Prediction
Subword Tokenizer	44%
Word-level Tokenizer	35%
VRepair	% Perfect Prediction
Subword Tokenizer	34%
Word-level Tokenizer	23%
CodeBERT	% Perfect Prediction
Subword Tokenizer	31%
Word-level Tokenizer	17%

Results of RQ4

VulRepair	% Perfect Prediction
Pre-train + BPE + T5	44%
Pre-train + Word-level + T5	35%
No Pre-train + BPE + T5	30%
No Pre-train + Word-level + T5	1%

Acknowledgements

Special thanks to authors of VRepair (Chen et al.)
Special thanks to authors of CodeT5 (Wang et al.)
Special thanks to dataset providers of CVEFixes (Bhandari et al.) and Big-Vul (Fan et al.)

License

MIT License

Citation

@inproceedings{fu2022vulrepair,
  title={VulRepair: A T5-based Automated Software Vulnerability Repair},
  author={Fu, Michael and Tantithamthavorn, Chakkrit and Le, Trung and Nguyen, Van and Dinh, Phung},
  journal={To appear in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.github/workflows		.github/workflows
M10_T5_no_pretrain_word_level		M10_T5_no_pretrain_word_level
M1_VulRepair_PL-NL		M1_VulRepair_PL-NL
M2_CodeBERT_PL-NL		M2_CodeBERT_PL-NL
M3_T5_no_pretrain_subword		M3_T5_no_pretrain_subword
M4_T5_base_NL		M4_T5_base_NL
M5_BERT_no_pretrain_subword		M5_BERT_no_pretrain_subword
M6_BERT_base_NL		M6_BERT_base_NL
M7_CodeT5_word_level		M7_CodeT5_word_level
M8_VRepair_subword		M8_VRepair_subword
M9_CodeBERT_word_level		M9_CodeBERT_word_level
data		data
INSTALL.md		INSTALL.md
LICENSE		LICENSE
README.md		README.md
REQUIREMENTS.pdf		REQUIREMENTS.pdf
STATUS.pdf		STATUS.pdf
VulRepair-slide.pdf		VulRepair-slide.pdf
artifact_description.pdf		artifact_description.pdf
requirements.txt		requirements.txt
vulrepair.gif		vulrepair.gif
vulrepair_paper.pdf		vulrepair_paper.pdf

License

awsm-research/VulRepair

Folders and files

Latest commit

History

Repository files navigation

VulRepair Replication Package

Auto-Repair Real-World Software Vulnerabilities

VulRepair Performance on Top-25 Most Dangerous CWEs in 2021

Top-10 Most Accurately Repaired CWE Types of VulRepair

VulRepair Performance on Top-10 Majority CWE Types in Testing Data

The raw predictions of VulRepair can be accessed here

[FSE 2022 Technical track] [Paper #152] [7 mins talk] VulRepair: A T5-based Automated Software Vulnerability Repair To appear in ESEC/FSE 2022 (14-18 November, 2022).

Table of contents

How to replicate

About the Environment Setup

About the Datasets

Descriptive statistics of our experimental dataset

About the Models

Model Naming Convention

How to access the models

About VulRepair Replication

About the Experiment Replication

How to replicate RQ1

How to replicate RQ2

How to replicate RQ3

How to replicate RQ4

Appendix

Results of RQ1

Results of RQ2

Results of RQ3

Results of RQ4

Acknowledgements

License

Citation

About

Resources

License

Stars

Watchers

Forks

Languages

[FSE 2022 Technical track] [Paper #152] [7 mins talk]
VulRepair: A T5-based Automated Software Vulnerability Repair
To appear in ESEC/FSE 2022 (14-18 November, 2022).