This repository contains the official implementation of our accepted paper C3P (Code-Comment Pre-Training).
How to use?
Our model has been uploaded to the Hugging Face Hub. You can load it directly with the following lines:
from models.model import C3PModel, Config
config = Config('eot')
c3p = C3PModel(config=config)
You can also load each encoder separately with the following lines:
from transformers import AutoModel
code_encoder = AutoModel.from_pretrained("TerryPei/CL")
comment_encoder = AutoModel.from_pretrained("TerryPei/NL")
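For example, the two encoders can be used to embed a code snippet and its comment separately and compare them. This is a sketch, not the paper's evaluation pipeline: the tokenizer names are assumed to match the model repos, and CLS-token pooling is an assumption about how the checkpoints were trained.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the released encoders and their tokenizers
# (tokenizer repos assumed to match the model repos).
code_tok = AutoTokenizer.from_pretrained("TerryPei/CL")
code_encoder = AutoModel.from_pretrained("TerryPei/CL")
comment_tok = AutoTokenizer.from_pretrained("TerryPei/NL")
comment_encoder = AutoModel.from_pretrained("TerryPei/NL")

def embed(text, tokenizer, model):
    # max_length=320 matches the training flag listed below.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=320)
    with torch.no_grad():
        outputs = model(**inputs)
    # Assumed pooling: take the first (CLS) token's hidden state.
    return outputs.last_hidden_state[:, 0]

code_vec = embed("def add(a, b): return a + b", code_tok, code_encoder)
comment_vec = embed("Return the sum of two numbers.", comment_tok, comment_encoder)
sim = torch.cosine_similarity(code_vec, comment_vec)
```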
How to reproduce?
- Deploy our files on Linux. Download the datasets and move them to the
- Replace the paths with your own file paths.
- Unzip the dataset file to ./content.
- Run each cell of code, and modify the hyperparameters in the Arguments class. If you run the bash file script/run.sh instead, modify the hyperparameters there:
--train_data_file=dataset/<data_name>
--output_dir=results/logs/<logs_name>.log
--do_train
--do_test
--cpkt_path=results/cpkt/<cpkt_name>
--train_batch_size=1024
--eval_batch_size=1024
--learning_rate=2e-5
--max_grad_norm=1.0
--num_train_epochs=500
--seed=42
--max_length=320
--log_interval=10
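A minimal sketch of an argument parser matching the flags above (the names mirror the list; the repo's actual Arguments class may define them differently):

```python
import argparse

def build_parser():
    # Mirrors the flags accepted by script/run.sh (sketch only).
    p = argparse.ArgumentParser(description="C3P training/evaluation")
    p.add_argument("--train_data_file", type=str)
    p.add_argument("--output_dir", type=str)
    p.add_argument("--do_train", action="store_true")
    p.add_argument("--do_test", action="store_true")
    p.add_argument("--cpkt_path", type=str)
    p.add_argument("--train_batch_size", type=int, default=1024)
    p.add_argument("--eval_batch_size", type=int, default=1024)
    p.add_argument("--learning_rate", type=float, default=2e-5)
    p.add_argument("--max_grad_norm", type=float, default=1.0)
    p.add_argument("--num_train_epochs", type=int, default=500)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--max_length", type=int, default=320)
    p.add_argument("--log_interval", type=int, default=10)
    return p

# Example invocation with a placeholder dataset path.
args = build_parser().parse_args([
    "--train_data_file=dataset/example",
    "--do_train",
    "--learning_rate=2e-5",
])
```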
Different hyperparameter settings produce the different experimental results reported in our paper. Run the experiments with:
cd script
bash run.sh GPU_IDS