SparseCoder

This repository provides the code for reproducing the experiments in the paper "SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization".

SparseCoder employs a sliding window mechanism for self-attention to model short-term dependencies and leverages the structure of code to capture long-term dependencies among source code identifiers.
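To make the sliding-window idea concrete, the toy shell function below prints a local-attention mask in which each token may attend only to neighbours within a one-sided window w. It is an illustration under that single assumption, not SparseCoder's implementation: the function name `sliding_window_mask` is hypothetical, and the identifier-aware long-range attention is deliberately left out.

```shell
#!/usr/bin/env bash
# Toy sliding-window (local) attention mask, for illustration only.
# Entry (i, j) is 1 if query position i may attend to key position j,
# i.e. |i - j| <= w, and 0 otherwise. Long-range, identifier-aware
# attention is deliberately NOT modeled here.
sliding_window_mask() {
  local n=$1   # sequence length in tokens
  local w=$2   # one-sided window size
  local i j d row
  for ((i = 0; i < n; i++)); do
    row=""
    for ((j = 0; j < n; j++)); do
      d=$((i - j))
      if ((d < 0)); then d=$((-d)); fi   # absolute distance between positions
      if ((d <= w)); then row+="1 "; else row+="0 "; fi
    done
    echo "${row% }"   # drop the trailing space
  done
}

sliding_window_mask 5 1
# prints:
# 1 1 0 0 0
# 1 1 1 0 0
# 0 1 1 1 0
# 0 0 1 1 1
# 0 0 0 1 1
```

Each row has at most 2w + 1 ones, so the attention cost grows linearly with sequence length instead of quadratically, which is the usual motivation for sliding-window attention on long inputs such as whole source files.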

Dependency

- pip install torch
- pip install transformers
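After installing the two packages, a quick check such as the sketch below confirms that both import in the interpreter you will use for run.py. The helper `check_python_modules` and the `PYTHON` override variable are hypothetical; the fine-tuning script itself calls `python`, so set `PYTHON=python` if that is your interpreter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named Python module can be imported.
# Uses $PYTHON if set (hypothetical override), otherwise python3.
# Returns 0 if every module imports, 1 if any is missing.
check_python_modules() {
  local py="${PYTHON:-python3}" mod status=0
  for mod in "$@"; do
    if "$py" -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok:      ${mod}"
    else
      echo "missing: ${mod}"
      status=1
    fi
  done
  return "$status"
}

check_python_modules torch transformers || echo "install the missing packages with pip first"
```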

Data Download

Our file-level code summarization dataset, FILE-CS, is released on Hugging Face. You can download the FILE-CS dataset at this
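The fine-tuning script reads train.pkl and dev.pkl from ../../dataset/${lang}/ (with lang=FCS), so after downloading FILE-CS the splits need to sit in that layout. The sketch below uses a hypothetical helper, `check_dataset_layout`, and assumes test.pkl lives in the same folder as the other two splits; adjust the root path to wherever you run run.py from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that <root>/<lang>/{train,dev,test}.pkl all exist.
# lang defaults to FCS, matching the fine-tuning settings.
# Returns 0 if every split is present, 1 otherwise.
check_dataset_layout() {
  local root=$1 lang=${2:-FCS} split status=0
  for split in train dev test; do
    if [ -f "${root}/${lang}/${split}.pkl" ]; then
      echo "found:   ${root}/${lang}/${split}.pkl"
    else
      echo "missing: ${root}/${lang}/${split}.pkl"
      status=1
    fi
  done
  return "$status"
}

check_dataset_layout ../../dataset FCS || echo "move the downloaded splits into place before fine-tuning"
```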

Fine-Tuning

Here we provide the settings used to fine-tune SparseCoder for file-level code summarization; the corresponding results are reported in the paper.

lang=FCS
lr=2e-5
batch_size=16
beam_size=5
source_length=2048
target_length=128
output_dir=../saved_models/SparseCoder
train_file=../../dataset/${lang}/train.pkl
dev_file=../../dataset/${lang}/dev.pkl
epochs=10
pretrained_model=longformer-unixcoder  # to use RoBERTa instead, set pretrained_model=roberta-base
mkdir -p $output_dir

# Train and evaluate on the dev set
python run.py \
--do_train \
--do_eval \
--model_name_or_path $pretrained_model \
--train_filename $train_file \
--dev_filename $dev_file \
--output_dir $output_dir \
--max_source_length $source_length \
--max_target_length $target_length \
--beam_size $beam_size \
--train_batch_size $batch_size \
--eval_batch_size $batch_size \
--learning_rate $lr \
--num_train_epochs $epochs 2>&1| tee $output_dir/train.log

# Test with the best checkpoint saved during training
reload_model=$output_dir/checkpoint-best-score/model.bin
test_file=../../dataset/${lang}/test.pkl
python run.py \
--do_test \
--load_model_path $reload_model \
--model_name_or_path $pretrained_model \
--test_filename $test_file \
--output_dir $output_dir \
--max_source_length $source_length \
--max_target_length $target_length \
--beam_size $beam_size \
--train_batch_size $batch_size \
--eval_batch_size $batch_size \
--learning_rate $lr \
--num_train_epochs $epochs 2>&1| tee $output_dir/test.log
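The test step loads $output_dir/checkpoint-best-score/model.bin, which it expects the training step to have written. A small guard such as the hypothetical `require_checkpoint` below, placed before the test command, fails early with a clear message instead of launching run.py without a checkpoint. It is only a sketch: it checks that the file exists and is non-empty, not that it is a valid model.

```shell
#!/usr/bin/env bash
# Hypothetical guard: print the checkpoint path if it exists and is non-empty,
# otherwise report the problem on stderr and return 1.
require_checkpoint() {
  local ckpt="$1/checkpoint-best-score/model.bin"
  if [ ! -s "$ckpt" ]; then
    echo "no checkpoint at ${ckpt}; run the training step first" >&2
    return 1
  fi
  echo "$ckpt"
}

# Usage before the test command:
#   reload_model=$(require_checkpoint "$output_dir") || exit 1
```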

To reproduce the other models evaluated in the paper, run the run.sh script in the corresponding model's folder.

Reference

If you use this code, SparseCoder, or our FILE-CS dataset, please consider citing our paper.
