GitHub - HongliangLiang/pcan: Path context augmented statement and network for learning programs

Introduction

Code for paper:

Path Context Augmented Statement and Network for Learning Programs

Da Xiao, Dengji Hang, Lu Ai, Shengping Li, Hongliang Liang

Empirical Software Engineering, 2022

Environment

dgl==0.5.3  
python==3.8  
torch==1.7.1+cu110
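
The pinned versions above can be compared against the local installation before training. The following is a minimal sketch, not part of the repository: `find_mismatches` and `PINNED` are hypothetical names, and it assumes the installed distributions report their versions under the names `dgl` and `torch`.

```python
import sys
from importlib import metadata

# Versions copied from the Environment list; the dict name is hypothetical.
PINNED = {"dgl": "0.5.3", "torch": "1.7.1+cu110"}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    print("python", ".".join(map(str, sys.version_info[:2])))
    print("mismatched pins:", find_mismatches(PINNED))
```

An empty dictionary means every pinned package was found at the listed version.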

BCB Data

Training data: data/bcb/pairs.pkl (when training starts, it is randomly split by proportion into a training set and a development set)

About 100 million samples are selected randomly from the raw BCB data set.
Testing data: data/bcb/pairs.pkl
Pcan tree-structure data (including the training, development, and test sets) after pre-processing: data/bcb/processed.pkl
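
The random split into training and development sets can be pictured with the sketch below. It is an illustration only, not the code in this repository: the function name, the `dev_fraction` of 0.1, and the fixed seed are all assumptions, and the actual proportion used by `bcb_run.py` is not stated here.

```python
import random


def split_train_dev(pairs, dev_fraction=0.1, seed=0):
    """Shuffle a copy of `pairs` and split it into (train, dev) lists.

    dev_fraction and seed are hypothetical defaults chosen for illustration.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Example with placeholder items standing in for loaded pairs.
train, dev = split_train_dev(range(100))
print(len(train), len(dev))
```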
Testing data construction method:

Because more positive examples can be found in the raw BCB data, the data for each type is constructed from the positive examples, under the condition that positive examples account for 14% of the total data.

For example, if type 1 has 15 positive samples and 2 negative samples, the total number of samples required is 15 / 0.14 = 107 (rounded down), so the number of negative samples that need to be constructed is 107 - 15 - 2 = 90. We take 90 random samples that are labeled negative but have no type assigned, and mark their type as 1.
Types 1 to 4 (T1, T2, ST3, MT3) are all constructed with this method.
For T4, the 16,889 positive examples that can be found are fewer than 124750 * 0.14, as described in the paper, so they are kept unchanged. The negative examples are then completed based on these 16,889 positives.
Because there is not enough positive data, the total number of samples is 122283 instead of 124750.
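
The arithmetic of this construction can be written as a short sketch. The function name is hypothetical, and rounding the total down is an assumption; it is the rounding that reproduces the negative test counts in the table (for instance, 15 positives give 107 - 15 = 92 negatives).

```python
import math

POSITIVE_SHARE = 0.14  # positives should make up 14% of each type's data


def negatives_to_construct(num_pos, num_existing_neg, positive_share=POSITIVE_SHARE):
    """Number of extra negative samples needed for one type.

    total = floor(num_pos / positive_share); the positives and the negatives
    that already exist are subtracted from that total.
    """
    total = math.floor(num_pos / positive_share)
    needed = total - num_pos - num_existing_neg
    return max(needed, 0)  # never ask for a negative count


# 15 positives and 2 existing negatives: 107 - 15 - 2
print(negatives_to_construct(15, 2))
```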

The final distribution of the pre-processed BCB data is as follows:

Type	T1	T2	ST3	MT3	T4	total
ratio	0.0026	0.00054	0.0020	0.0091	0.9858
all +	442	93	342	1563	169450	171890
all -	0	2	22	391	262050	262465
test +	15	4	32	180	16889	17120
test -	92	24	196	1105	103746	105163
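
The figures in the table can be cross-checked with a few lines of Python. This is only a consistency check on the numbers printed above, with hypothetical variable names: the first data row corresponds to each type's share of the "all +" counts, and the last two rows (test positives and test negatives) sum to 122283 samples, of which about 14% are positive.

```python
# Counts copied from the table above.
all_pos = {"T1": 442, "T2": 93, "ST3": 342, "MT3": 1563, "T4": 169450}
test_pos = {"T1": 15, "T2": 4, "ST3": 32, "MT3": 180, "T4": 16889}
test_neg = {"T1": 92, "T2": 24, "ST3": 196, "MT3": 1105, "T4": 103746}

# Share of each clone type among all positive pairs.
total_pos = sum(all_pos.values())
ratios = {name: count / total_pos for name, count in all_pos.items()}

# Size of the test set and the fraction of positives in it.
test_total = sum(test_pos.values()) + sum(test_neg.values())
positive_share = sum(test_pos.values()) / test_total

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.5f}")
print("test total:", test_total)
print(f"positive share: {positive_share:.4f}")
```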

Running

Training

python bcb_run.py --dataset_name bcb --epochs 1500 --nhead 4 --batch_size 64 --dropout_rate 0.2 --mode train

Model file: outputs/models/.*pt

Evaluation

python bcb_run.py --dataset_name bcb --type 1 --batch_size 1 --mode eval --model_path ./outputs/models/model.pt

Evaluation result: outputs/eval_result.txt

Where:

dataset_name: name of the data set

type: the positive example type to evaluate (from 1 to 5; 1 is the easiest and 5 is the most difficult)

model_path: model file path
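
To evaluate every type in turn, the evaluation command above can be driven from a small script. This is a sketch under assumptions, not part of the repository: the helper names are hypothetical, only flags shown in the command above are used, and whether the result file is overwritten between runs is not stated here, so it may need to be copied after each run.

```python
import subprocess


def build_eval_command(pos_type, model_path="./outputs/models/model.pt"):
    """Build the evaluation command shown above for one positive example type."""
    return [
        "python", "bcb_run.py",
        "--dataset_name", "bcb",
        "--type", str(pos_type),
        "--batch_size", "1",
        "--mode", "eval",
        "--model_path", model_path,
    ]


def evaluate_all_types(run=subprocess.run):
    """Run the evaluation once per type, 1 through 5, stopping on the first failure."""
    for pos_type in range(1, 6):
        run(build_eval_command(pos_type), check=True)
```

Calling `evaluate_all_types()` from the repository root runs the five evaluations one after another.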
