You can find the paper here: https://arxiv.org/abs/2103.11626
Note: If you are facing Git LFS bandwidth issues, you can download the dataset from Zenodo instead: https://zenodo.org/record/6802730.
The `data` folder contains multiple folders and files:

- `repetition`: MSR datasets WITH <buggy code, fixed code> duplicate pairs
- `unique`: MSR datasets WITHOUT <buggy code, fixed code> duplicate pairs
- `sstubs(Large|Small).json`: the dataset in JSON format
- `sstubs(Large|Small)-(train|test|val).json`: the dataset splits in JSON format
- `split/(large|small)`: the dataset in text format (the format CodeBERT works with)
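The JSON files above hold the dataset as a plain JSON array, so a download can be sanity-checked from the shell. A minimal sketch, using a hypothetical sample file whose field names are assumptions for illustration, not taken from the dataset:

```shell
# Minimal sketch: a sstubs-style JSON file is a plain JSON array, so it can
# be inspected with Python from the shell. The sample file and its field
# names below are assumptions for illustration only.
cat > /tmp/sstubs_sample.json <<'EOF'
[{"bugType": "CHANGE_IDENTIFIER", "sourceBeforeFix": "int i = 0;", "sourceAfterFix": "int i = 1;"}]
EOF
python3 -c "import json; print(len(json.load(open('/tmp/sstubs_sample.json'))))"
```

The last command prints the number of <buggy code, fixed code> records in the file (here, 1).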
- Clone the repository

```shell
git lfs install
git clone https://github.com/EhsanMashhadi/MSR2021-ProgramRepair.git
```
- Download the CodeBERT model

```shell
cd MSR2021-ProgramRepair
git clone https://huggingface.co/microsoft/codebert-base
```
- Use the downloaded model's directory path as the `pretrained_model` variable in the script files
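For example, if CodeBERT was cloned into the repository root as above, the variable might be set like this (the exact path is an assumption; use the directory you cloned into):

```shell
# Sketch: point the training/evaluation scripts at the local CodeBERT clone.
# The path below is an example placeholder, not a real location.
pretrained_model="/path/to/MSR2021-ProgramRepair/codebert-base"
echo "$pretrained_model"
```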
- Install dependencies

```shell
pip install torch==1.4.0
pip install transformers==2.5.0
```
- Train the model with the MSR data

```shell
bash ./scripts/codebert/train.sh
```

- Evaluate the model

```shell
bash ./scripts/codebert/test.sh
```
- Install OpenNMT-py

```shell
pip install OpenNMT-py==2.2.0
```
- If you face conflicts between the PyTorch and CUDA versions, you can follow this link
- Preprocess the MSR data

```shell
bash ./scripts/simple-lstm/build_vocab.sh
```

- Train the model

```shell
bash ./scripts/simple-lstm/train.sh
```

- Evaluate the model

```shell
bash ./scripts/simple-lstm/test.sh
```
(This is the original version used to run the simple LSTM experiments in the paper.)
- Install the legacy OpenNMT-py

```shell
pip install OpenNMT-py==1.2.0
```

- Preprocess the MSR data

```shell
bash ./scripts/simple-lstm/legacy/preprocess.sh
```

- Train the model

```shell
bash ./scripts/simple-lstm/legacy/train.sh
```

- Evaluate the model

```shell
bash ./scripts/simple-lstm/legacy/test.sh
```
- You can change the `size` and `type` variables in the script files to run different experiments (`large` | `small`, `unique` | `repetition`).
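Switching an experiment amounts to editing two assignments in a script. A sketch with one possible combination (variable names from the note above; the chosen values are just an example):

```shell
# Sketch: one possible combination of the experiment variables.
size=small   # or: large
type=unique  # or: repetition
echo "running the ${size}/${type} experiment"
```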
- Check the CUDA and PyTorch compatibility.
- Assign the correct values for `CUDA_VISIBLE_DEVICES`, `gpu_rank`, and `world_size` based on your GPU numbers in all scripts.
- Run on CPU by removing the `gpu_rank` and `world_size` options in all scripts.
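For instance, a machine with two GPUs might be configured as follows (the specific values are assumptions; adjust them to your hardware):

```shell
# Sketch: example GPU configuration for a machine with two GPUs.
export CUDA_VISIBLE_DEVICES=0,1   # GPUs visible to the process
world_size=2                      # total number of GPUs used
gpu_rank=0                        # rank of this process's GPU
echo "GPUs=$CUDA_VISIBLE_DEVICES world_size=$world_size rank=$gpu_rank"
```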