
Group 2: PLBART with AST

This project implements AST (abstract syntax tree) support for the PLBART model. Our model is fine-tuned on the CodeXGLUE dataset. The original PLBART code can be found here, and the original paper is Unified Pre-training for Program Understanding and Generation. We used WSL2 for the project.


Docker Setup

There is the option to use a Docker image to run the project. All dependencies are already installed in this image, so you can skip the local setup. The rest of the steps for pre-processing, fine-tuning, and evaluation are the same as detailed below. An NVIDIA GPU is required to use CUDA.

Install Docker and run the following command to pull and enter the image.

docker run -it --gpus all jmoreirakanaley/plbart-ast:final
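If the GPU is not visible inside the container, CUDA will not be usable. As a quick sanity check (assuming the NVIDIA Container Toolkit is installed on the host), you can run the following inside the container; it should list the GPU:

nvidia-smi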

Activate the conda environment for running the experiments.

conda activate plbart

Navigate to the project directory.

cd ~/plbart-ast

Local Setup

We can set up a conda environment in order to run the experiments; the first step is to install the dependencies. We assume Anaconda is already installed. The requirements (listed in requirements.txt) can be installed by running the following script:

bash install_env.sh
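After the script finishes, activate the environment before running any of the steps below (the environment name plbart matches the one used in the Docker setup above):

conda activate plbart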

Pre-processing Step

If you only wish to preprocess the dataset, follow these steps.

Step 1. Build parser

cd scripts/code_to_code/translation/parser
bash build.sh
cd ..

Step 2. Prepare the data

bash prepare.sh src_lang tgt_lang
cd ../../..
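Here src_lang and tgt_lang are placeholders for the source and target language identifiers. For example, to prepare the data for Java to C# translation (the same identifiers used in the fine-tuning example below), the prepare command becomes:

bash prepare.sh java cs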

Fine-tuning

We fine-tune and evaluate PLBART with AST on the code-to-code downstream task from CodeXGLUE.

Type         | Task             | Language(s) | Data   | Scripts | Checkpoints | Results
Code to Code | Code translation | Java, C#    | [LINK] | [LINK]  | [LINK]      | [LINK]

Step 1. Download original PLBART base model

cd pretrain
bash download.sh
cd ..

Step 2. Build parser for CodeBLEU evaluation (skip if already done)

cd scripts/code_to_code/translation/parser
bash build.sh
cd ../../../..

Step 3. Prepare the data, train and evaluate PLBART

cd scripts/code_to_code/translation
bash prepare.sh src_lang tgt_lang
bash run.sh GPU_IDS src_lang tgt_lang model_size
cd ../../..

Here is an example of fine-tuning from Java to C# on GPU 0 with the base model:

bash run.sh 0 java cs base
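The reverse direction uses the same script; assuming the same language identifiers, fine-tuning from C# to Java (for which fine-tuned checkpoints are also provided below) would be:

bash run.sh 0 cs java base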

Note: We fine-tuned our model on a single NVIDIA Quadro P1000 (4 GB) GPU, which took roughly 8 hours.


Evaluation

If you only wish to evaluate the model against the CodeXGLUE benchmark, follow these steps.

Step 1. Download PLBART AST fine-tuned checkpoints

To download the fine-tuned models for Java to C# and C# to Java, run:

cd scripts/code_to_code/translation
bash download.sh

Step 2. Evaluate against CodeXGLUE

Run the evaluate.sh script:

bash evaluate.sh GPU_IDS src_lang tgt_lang
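For example, to evaluate the Java to C# model on GPU 0 (mirroring the fine-tuning example above):

bash evaluate.sh 0 java cs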
