
QiushiSun/TransCoder


TransCoder


Code for LREC-COLING 2024 Submission

Environment & Preparation

conda create --name transcoder python=3.7
conda activate transcoder
pip install -r requirements.txt
cd TransCoder/evaluator/CodeBLEU/parser
bash build.sh
cd ../../../
mkdir -p build  # create the target directory if it does not exist yet
cp evaluator/CodeBLEU/parser/my-languages.so build/
# make sure git-lfs is installed first, e.g. 'apt-get install git-lfs'
bash get_models.sh
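If you want to confirm that the setup steps above left the expected artifacts behind, a small preflight check can help. This is a minimal sketch, not part of the repository: the function names `have_cmd`, `parser_lib_ready` and `check_setup` are hypothetical, and it only checks that `git-lfs` is on `PATH` and that `build/my-languages.so` exists and is non-empty.

```shell
# Hypothetical preflight check (not part of the repository).

# Succeed if a command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeed if <repo_root>/build/my-languages.so exists and is non-empty.
parser_lib_ready() {
  local root="${1:-.}"
  [ -s "$root/build/my-languages.so" ]
}

# Report every missing prerequisite, then return 1 if any was missing.
check_setup() {
  local root="${1:-.}" ok=0
  if ! have_cmd git-lfs; then
    echo "missing: git-lfs (needed before running get_models.sh)" >&2
    ok=1
  fi
  if ! parser_lib_ready "$root"; then
    echo "missing: $root/build/my-languages.so (copy it from evaluator/CodeBLEU/parser/)" >&2
    ok=1
  fi
  return "$ok"
}
```

After sourcing the snippet, run `check_setup` from the directory where you ran `bash get_models.sh`. It reports all problems at once rather than stopping at the first one.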

Preparing data

The dataset comes from CodeXGLUE.

mkdir data
cd data
pip install gdown
# quote the URL: an unquoted '&' would send gdown to the background and cut off the id
gdown "https://drive.google.com/uc?export=download&id=1BBeHFlKoyanbxaqFJ6RRWlqpiokhDhY7"
unzip data.zip
rm data.zip

Preparing local paths

Set WORKDIR and HUGGINGFACE_LOCALS in run.sh and run_few_shot.sh to your local paths.
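Editing these variables by hand works fine. If you prefer to script it, the sketch below rewrites a `NAME=value` line in place. It is a hypothetical helper, not part of the repository, and it assumes that each variable is assigned at the start of its own line (for example `WORKDIR="/some/path"`). That layout of run.sh and run_few_shot.sh is an assumption, so check the scripts first.

```shell
# Hypothetical helper (not part of the repository): point a `NAME=value`
# assignment in a shell script at a new value.
# Assumes the variable is assigned at the start of its own line, e.g.
#   WORKDIR="/path/to/TransCoder"
set_path_var() {
  local file="$1" name="$2" value="$3"
  if ! grep -q "^${name}=" "$file"; then
    echo "warning: no '${name}=' line found in ${file}" >&2
    return 1
  fi
  # '|' is the sed delimiter so slashes in paths need no escaping; the original
  # file is kept as "${file}.bak". Values containing '|' or '&' are not handled.
  sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "$file"
}
```

For example, `set_path_var run.sh WORKDIR "$PWD"` followed by the same call on run_few_shot.sh, and then the same pair of calls for HUGGINGFACE_LOCALS with your local model directory. The helper uses `-i.bak` because GNU and BSD sed both accept that form.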

Finetune

export MODEL_NAME=
export TASK=
export SUB_TASK=
# fine-tune on a single task
bash run.sh $MODEL_NAME $TASK $SUB_TASK
# run in the few-shot setting
bash run_few_shot.sh $MODEL_NAME $TASK $SUB_TASK

MODEL_NAME can be any one of ["roberta", "codebert", "graphcodebert", "unixcoder", "t5", "codet5", "codet5+", "bart", "plbart"].

TASK can be any one of ['summarize', 'translate', 'refine', 'generate', 'defect', 'clone'].
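A typo in MODEL_NAME or TASK may only surface partway through a run. The sketch below checks both arguments against the lists above before anything starts. It is a hypothetical check, not part of the repository, and the function names are illustrative only.

```shell
# Hypothetical argument check (not part of the repository) mirroring the
# MODEL_NAME and TASK lists above. Update the lists if the scripts change.
is_valid_model() {
  case "$1" in
    roberta|codebert|graphcodebert|unixcoder|t5|codet5|codet5+|bart|plbart) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_task() {
  case "$1" in
    summarize|translate|refine|generate|defect|clone) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a message and fail on the first unknown argument.
check_finetune_args() {
  is_valid_model "$1" || { echo "unknown MODEL_NAME: '$1'" >&2; return 1; }
  is_valid_task "$2" || { echo "unknown TASK: '$2'" >&2; return 1; }
}
```

For example, `check_finetune_args "$MODEL_NAME" "$TASK" && bash run.sh $MODEL_NAME $TASK $SUB_TASK`. The `+` in `codet5+` is matched literally by `case`.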

Category | Dataset | Task | Sub_task (LANG) | Type | Category | Description
--- | --- | --- | --- | --- | --- | ---
C2C | BCB | clone | [] (java) | bi-directional | encoder | code clone detection in Java data
C2C | Devign | defect | [] (c) | bi-directional | encoder | code defect detection in C/C++ data
C2C | CodeTrans | translate | ['java-cs', 'cs-java'] | end2end | en2de | code-to-code translation between Java and C#
C2C | Bugs2Fix | refine (repair) | ['small', 'medium'] (java) | end2end | en2de | code refinement on code repair data with small/medium functions
C2T | CodeSN | summarize | ['java', 'python', 'javascript', 'php', 'ruby', 'go'] | end2end | en2de | code summarization on CodeSearchNet data with six PLs
T2C | CONCODE | generate (concode) | [] (java) | end2end | en2de | text-to-code generation on Concode data

C2C = code-to-code, C2T = code-to-text, T2C = text-to-code.
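To cover every sub-task of one task, the commands above can be wrapped in a loop. This is a minimal sketch, not part of the repository. `subtasks_for` and `run_all_subtasks` are hypothetical names, and the sub-task lists are copied from the table above. For tasks with an empty list, the script is called with two arguments, which is what `bash run.sh $MODEL_NAME $TASK $SUB_TASK` expands to when SUB_TASK is empty.

```shell
# Hypothetical helper (not part of the repository): print the SUB_TASK values
# listed in the table above for a given TASK, one per line. Tasks whose list is
# empty ([]) print nothing.
subtasks_for() {
  case "$1" in
    summarize) printf '%s\n' java python javascript php ruby go ;;
    translate) printf '%s\n' java-cs cs-java ;;
    refine)    printf '%s\n' small medium ;;
    generate|defect|clone) ;;
    *) echo "unknown TASK: $1" >&2; return 1 ;;
  esac
}

# Run one script (run.sh by default, or run_few_shot.sh) for every sub-task,
# stopping at the first failure. Without sub-tasks it runs once with two arguments.
run_all_subtasks() {
  local model="$1" task="$2" script="${3:-run.sh}" subs sub
  subs=$(subtasks_for "$task") || return 1
  if [ -z "$subs" ]; then
    bash "$script" "$model" "$task"
    return
  fi
  for sub in $subs; do
    bash "$script" "$model" "$task" "$sub" || return 1
  done
}
```

For example, `run_all_subtasks "$MODEL_NAME" summarize` fine-tunes on all six languages in turn. `run_all_subtasks "$MODEL_NAME" translate run_few_shot.sh` does the same for both translation directions in the few-shot setting.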

Run TransCoder

export MODEL_NAME=
export TASK=
bash run_transcoder.sh $MODEL_NAME $TASK 

TASK can be any one of ['cls2translate','translate2cls','cls2summarize','summarize2cls','translate2summarize','summarize2translate','cross2java','cross2php','cross2ruby','cross2python','cross2go','cross2javascript']
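As with fine-tuning, a quick check of TASK can catch typos before run_transcoder.sh starts. This is a hypothetical sketch, not part of the repository. `TRANSCODER_TASKS`, `is_transcoder_task` and `check_transcoder_task` are illustrative names, and the task list is copied from the line above.

```shell
# Hypothetical check (not part of the repository) that TASK is one of the
# values listed above for run_transcoder.sh.
TRANSCODER_TASKS="cls2translate translate2cls cls2summarize summarize2cls
translate2summarize summarize2translate cross2java cross2php cross2ruby
cross2python cross2go cross2javascript"

# Succeed if $1 exactly matches one of the listed tasks.
is_transcoder_task() {
  local t
  for t in $TRANSCODER_TASKS; do
    [ "$t" = "$1" ] && return 0
  done
  return 1
}

# Fail with the list of accepted values if $1 is not a listed task.
check_transcoder_task() {
  if ! is_transcoder_task "$1"; then
    echo "unknown TASK: '$1'; expected one of: $(echo $TRANSCODER_TASKS)" >&2
    return 1
  fi
}
```

For example, `check_transcoder_task "$TASK" && bash run_transcoder.sh $MODEL_NAME $TASK`. Matching is exact, so a partial name such as `cross2` is rejected.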

Acknowledgement

The code is adapted from nchen909/TransCoder.

Citation

Please consider citing us if you find this repository useful.👇

@misc{sun2023transcoder,
      title         = {TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills}, 
      author        = {Qiushi Sun and Nuo Chen and Jianing Wang and Xiang Li and Ming Gao},
      year          = {2023},
      eprint        = {2306.07285},
      archivePrefix = {arXiv},
      primaryClass  = {cs.SE}
}
