Generate Source Code Change Patterns from Review History

How to Use

0 Cloning this repository

git clone https://github.com/Ikuyadeu/review_pattern_gen.git
cd review_pattern_gen
pip3 install antlr4-python3-runtime prefixspan PyGithub unidiff numpy
git clone https://github.com/Ikuyadeu/CodeTokenizer.git

1 Preparing config file

Create an empty config file:

touch config

and edit the config file as shown below

(If your target Python repository name is tensorflow/models)

[GitHub]
id = YourGitHubId
password = YourGitHubPassword
token = (**optional, needed only to collect from a private repo)YourGitHubToken
[Target]
owner = Your Target GitHub Repository Owner (e.g. tensorflow)
repo = Your Target GitHub Repository (e.g. models)
lang = Your Target Language (e.g. Python)
[Rule]
threshold = Rule threshold (e.g. 2, meaning every repeated change becomes a rule)
frequent_or_topk = Method to use threshold (frequent or topk)

(**A GitHub token can be generated at https://github.com/settings/tokens)
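
Before running the scripts, it can help to confirm that the config file parses cleanly. The following is a minimal sketch, assuming the file is INI-style as shown above and readable by Python's standard `configparser`. The names `SAMPLE` and `load_settings` are hypothetical and are not part of this repository.

```python
import configparser

# Sample content mirroring the layout above (all values are placeholders).
SAMPLE = """
[GitHub]
id = YourGitHubId
password = YourGitHubPassword
token = YourGitHubToken
[Target]
owner = tensorflow
repo = models
lang = Python
[Rule]
threshold = 2
frequent_or_topk = frequent
"""


def load_settings(text):
    """Parse INI-style config text and return the key values as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    method = parser["Rule"]["frequent_or_topk"]
    if method not in ("frequent", "topk"):
        raise ValueError("frequent_or_topk must be 'frequent' or 'topk'")
    return {
        "owner": parser["Target"]["owner"],
        "repo": parser["Target"]["repo"],
        "lang": parser["Target"]["lang"],
        # getint raises ValueError if the threshold is not an integer.
        "threshold": parser.getint("Rule", "threshold"),
        "method": method,
    }


settings = load_settings(SAMPLE)
print(settings["owner"], settings["repo"], settings["threshold"])
```

To check a real file instead of the sample string, `parser.read("config")` could replace `parser.read_string(text)`.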

2 Collecting training data set

python3 collect_pulls.py
python3 collect_changes_clone.py

Output:

  • Pull List (data/pulls/{owner}_{repo}.csv)
  • Change List (data/changes/{owner}_{repo}_python.json)
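
Once both scripts finish, the output files can be located and checked quickly. This is a rough sketch only: the helper names are hypothetical, lowercasing the language name is an assumption inferred from the `_python.json` suffix above, and treating the first CSV row as a header is also an assumption.

```python
import csv


def pulls_path(owner, repo):
    """Build the pull-list path following the pattern listed above."""
    return f"data/pulls/{owner}_{repo}.csv"


def changes_path(owner, repo, lang):
    """Build the change-list path (lowercasing lang is an assumption)."""
    return f"data/changes/{owner}_{repo}_{lang.lower()}.json"


def count_csv_rows(path):
    """Return (first_row, number_of_remaining_rows) for a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows:
        return [], 0
    # Assumption: the first row is a header line.
    return rows[0], len(rows) - 1


print(pulls_path("tensorflow", "models"))
print(changes_path("tensorflow", "models", "Python"))
```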

3 Generating frequently appearing patterns

This process takes a long time.

python3 merge_changes.py
python3 generate_rules.py

Output:

  • Pattern (data/rules/{owner}_{repo}_python.json)
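
To illustrate how the `threshold` and `frequent_or_topk` settings might interact, here is a simplified sketch. It is not the repository's implementation: the actual scripts presumably mine token sequences (the `prefixspan` package is installed in step 0), whereas this standard-library version only counts exact repeats. The sample changes and the function `select_patterns` are hypothetical.

```python
from collections import Counter

# Hypothetical changes, each a (before, after) pair of token tuples.
changes = [
    (("xrange",), ("range",)),
    (("xrange",), ("range",)),
    (("print", "x"), ("print", "(", "x", ")")),
    (("print", "x"), ("print", "(", "x", ")")),
    (("print", "x"), ("print", "(", "x", ")")),
    (("foo",), ("bar",)),
]


def select_patterns(changes, threshold, method):
    """Pick patterns by minimum count ("frequent") or by rank ("topk")."""
    counts = Counter(changes)
    ranked = counts.most_common()  # sorted by count, highest first
    if method == "frequent":
        # Keep every pattern seen at least `threshold` times.
        return [(p, n) for p, n in ranked if n >= threshold]
    if method == "topk":
        # Keep only the `threshold` most frequent patterns.
        return ranked[:threshold]
    raise ValueError("method must be 'frequent' or 'topk'")


for pattern, count in select_patterns(changes, 2, "frequent"):
    print(count, pattern)
```

Under this reading, `frequent` with a threshold of 2 keeps every change that occurs at least twice, while `topk` with the same value keeps only the two most common changes regardless of their counts.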

4 Evaluating detected patterns

python3 evaluate_rules.py

Output:

  • Evaluated Pattern (data/rules/{owner}_{repo}_python_evaluated.json)
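
Steps 2 to 4 can also be chained from a single script. The sketch below runs the five commands listed above in order and stops at the first failure. The names `STEPS` and `run_pipeline` are hypothetical, and the sketch assumes it is started from the repository root with the config file already in place.

```python
import subprocess
import sys

# Script order taken from steps 2 to 4 above.
STEPS = [
    "collect_pulls.py",
    "collect_changes_clone.py",
    "merge_changes.py",
    "generate_rules.py",
    "evaluate_rules.py",
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    for script in steps:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code.
        runner([sys.executable, script], check=True)


# Call run_pipeline() from the repository root once the config is ready.
```

Passing `runner` as a parameter keeps the function easy to dry-run, for example with a stand-in that only records the commands.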

Sample

This repository includes part of tensorflow/models' review data in each directory. Note that this sample is smaller than the full data set.

Thanks

I would like to thank the Support Center for Advanced Telecommunications (SCAT) Technology Research, Foundation. This system was supported by JSPS KAKENHI Grant Numbers JP18H03222, JP17H00731, JP15H02683, and JP18KT0013.

This repository also uses another repository: https://github.com/Ikuyadeu/CodeTokenizer