C3PO: A Lightweight Copying Mechanism for Translating Pseudocode to Code

Source Code for the C3PO paper published at AACL-IJCNLP 2022 (Student Research Workshop) [Paper]

Stages of the Pipeline

Copy Phase
Generate Phase
Combine Phase
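The three phases are only named here, so the following is a toy sketch of one plausible reading, not the repository's implementation. It assumes the copy phase masks selected pseudocode tokens with [CPY], the generate phase produces code that still contains [CPY] placeholders, and the combine phase fills the placeholders back in. The function names, the `should_copy` rule (a stand-in for the decision tree) and the hard-coded `toy_outputs` table (a stand-in for the seq2seq model) are all hypothetical.

```python
CPY = "[CPY]"


def copy_phase(pseudo_tokens, should_copy):
    """Mask the tokens flagged for copying and remember them in order."""
    masked, copied = [], []
    for token in pseudo_tokens:
        if should_copy(token):
            masked.append(CPY)
            copied.append(token)
        else:
            masked.append(token)
    return masked, copied


def generate_phase(masked_tokens):
    """Hypothetical stand-in for the trained model: a hard-coded toy translation."""
    toy_outputs = {
        ("set", CPY, "to", CPY): [CPY, "=", CPY, ";"],
    }
    return toy_outputs[tuple(masked_tokens)]


def combine_phase(generated_tokens, copied):
    """Fill each [CPY] placeholder with the next copied token, left to right."""
    remaining = iter(copied)
    return [next(remaining) if tok == CPY else tok for tok in generated_tokens]


pseudo = ["set", "x", "to", "5"]
# Assumption: a simple membership test replaces the decision tree here.
masked, copied = copy_phase(pseudo, should_copy=lambda t: t in {"x", "5"})
code = combine_phase(generate_phase(masked), copied)
print(" ".join(code))  # x = 5 ;
```

Filling placeholders strictly left to right only works when the generated code uses the copied tokens in the same order as the pseudocode, which is one reason a numbered variant of the tags can be useful.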

Getting Started

Install conda or miniconda
Create a conda environment

conda env create -f c3po.yml
conda activate c3po

Download the SPOC dataset as a zip and unzip it into the data/ folder

Data Preprocessing

The preprocessing code, which uses a decision tree to mask tokens in the SPOC dataset with [CPY] tags, is included under cpy_preprocess/.
The preprocessed datasets are released in the data/ folder.
There are 2 versions of both the train and eval datasets:

Plain masking, where tokens are masked with [CPY] tokens: data/CPY_dataset.pkl (train set) and data/CPY_dataset_eval.pkl (eval set)
Numbered masking, where the [CPY] tokens are numbered [CPY_1] ... [CPY_n]: data/CPY_dataset_numbered.pkl (train set) and data/CPY_dataset_numbered_eval.pkl (eval set)
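To illustrate the difference between the two versions, here is a minimal sketch that turns a plainly masked token sequence into a numbered one. It assumes the tags are numbered by position, starting at 1; the actual scheme used in cpy_preprocess/ may differ (for example in how repeated tokens are handled). `number_cpy_tags` is a hypothetical helper, not a function from this repository.

```python
def number_cpy_tags(tokens, tag="[CPY]"):
    """Replace each plain [CPY] tag with [CPY_1], [CPY_2], ... in order of appearance."""
    numbered, count = [], 0
    for token in tokens:
        if token == tag:
            count += 1
            numbered.append(f"[CPY_{count}]")
        else:
            numbered.append(token)
    return numbered


plain = ["set", "[CPY]", "to", "[CPY]"]
numbered = number_cpy_tags(plain)
print(numbered)  # ['set', '[CPY_1]', 'to', '[CPY_2]']
```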

Models

Experiments were carried out with the following 2 architectures, which are included in the models/ folder:

Vanilla Seq2Seq (models/vanilla_seq2seq.py)
Attention Seq2Seq (models/attention_seq2seq.py)
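As a rough illustration of how the two architectures differ, the sketch below contrasts the context a decoder sees in each case. It is written in plain Python rather than a deep learning framework and assumes dot-product scoring purely for illustration; the attention variant actually implemented in models/attention_seq2seq.py is not stated here. Both function names are hypothetical.

```python
import math


def vanilla_context(encoder_states):
    """Vanilla seq2seq: the decoder is conditioned only on the final encoder state."""
    return encoder_states[-1]


def attention_context(encoder_states, decoder_state):
    """Attention: a softmax-weighted sum over all encoder states.

    Assumption: scores are plain dot products between each encoder state
    and the current decoder state.
    """
    scores = [sum(e * d for e, d in zip(state, decoder_state))
              for state in encoder_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(decoder_state)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights


encoder_states = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention_context(encoder_states, decoder_state=[0.0, 2.0])
print(vanilla_context(encoder_states), context, weights)
```

With the decoder state above, the second encoder state receives the larger weight, so the attention context leans towards it instead of being fixed to the last state.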

Training Seq2Seq models

The training script is train_scripts/seq2seq_training.py. It can train both architectures. Usage:

python seq2seq_training.py
--attention # Boolean flag if attention seq2seq model to be trained (Default Vanilla Seq2Seq)
--non-copy # Boolean flag if non-copy version to be trained (no masking of tokens) (Default copy version)

The hyperparameters used are set as defaults for each model.
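For readers unfamiliar with boolean flags, the two options above behave like `store_true` switches: omitting a flag selects the default (Vanilla Seq2Seq, copy version). The sketch below shows how such flags could be declared with `argparse`; it is an assumption about the interface, not the parser from seq2seq_training.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the two documented flags."""
    parser = argparse.ArgumentParser(
        description="Train a seq2seq model (illustrative sketch)")
    parser.add_argument(
        "--attention", action="store_true",
        help="train the attention seq2seq model (default: vanilla seq2seq)")
    parser.add_argument(
        "--non-copy", action="store_true",
        help="train without [CPY] masking (default: copy version)")
    return parser


# argparse exposes --non-copy as the attribute non_copy.
args = build_parser().parse_args(["--attention"])
print(args.attention, args.non_copy)  # True False
```

Under this reading, `python seq2seq_training.py --attention` would train the attention model with [CPY] masking, and adding `--non-copy` would train it without masking.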

Inference and Results

Inference scripts are included in inference/scripts/. The predictions are released as .pkl files under inference/predictions/ for the following versions:

Attention Seq2Seq with C3PO (CPY masking)
Attention Seq2Seq w/o C3PO (No CPY masking)
Vanilla Seq2Seq with C3PO (CPY masking)
Vanilla Seq2Seq w/o C3PO (No CPY masking)

The results (BLEU scores) are computed using these prediction files in inference/results.ipynb
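For reference, here is a minimal pure-Python sketch of corpus-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) with one reference per prediction. It is not the scoring code from inference/results.ipynb, which may use a library implementation, different tokenization or smoothing; the structure of the prediction .pkl files is also not assumed here, so the example works on plain token lists.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with a single reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Counter intersection keeps the minimum count, i.e. clipped matches.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = [["int", "x", "=", "5", ";"]]
print(corpus_bleu(reference, reference))  # 1.0
```

Scores are in the range 0 to 1 here; multiply by 100 to compare with results reported on a 0 to 100 scale.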

Citation

If you use this work in your research, please cite:

@inproceedings{veerendranath2022c3po,
    title={C3PO: A Lightweight Copying Mechanism for Translating Pseudocode to Code},
    author={Veerendranath, Vishruth and Masti, Vibha and Anagani, Prajwal and Hr, Mamatha},
    booktitle={Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop},
    pages={47--53},
    year={2022}
}
