relink git commits to jira issues

These scripts relink git commits to Jira issues for the migration from SVN to Git and GitLab. See also the related blog post for more information!

requirements

  • dulwich
  • swig
  • gpgme
  • matplotlib
  • python-gpg: pip3 install --user gpg

To get them quickly and test that everything works:

guix environment -l guix.scm
for i in *.py; do python3 $i --test; done

usage

Retrieve commits and issue-IDs from Git repo

./retrieve_commits_and_issues.py [--with-files] [--output TODO_FILE.todo] [--previous OLD_TODO_FILE.todo] PATH_TO_GIT_REPO ...

Commit-issue pairs already present in the OLD_TODO_FILE are not added to the TODO_FILE.
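
Under the hood this is a walk over the commit history. A minimal sketch of the core idea, assuming dulwich (from the requirements) and a standard Jira key pattern like PROJ-123; the script's real regex and output layout may differ:

import re
import datetime
from dulwich.repo import Repo

ISSUE_RE = re.compile(rb"[A-Z][A-Z0-9]+-[0-9]+")  # assumed Jira key pattern

def commits_and_issues(repo_path):
    repo = Repo(repo_path)
    for entry in repo.get_walker():  # walks the history, newest commits first
        commit = entry.commit
        isodate = datetime.datetime.fromtimestamp(
            commit.commit_time, datetime.timezone.utc).isoformat()
        message = commit.message.decode("utf-8", errors="replace").replace("\n", "---")
        for issue in ISSUE_RE.findall(commit.message):
            # one entry per (commit, issue) pair, matching the todo file format
            yield commit.id.decode("ascii"), issue.decode("ascii"), isodate, message

if __name__ == "__main__":
    for fields in commits_and_issues("."):
        print(*fields)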

Store repository info

./retrieve_repository_info.py [--output INFO_FILE.repoinfo] PATH_TO_GIT_REPO
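
The .repoinfo file tells the linker which repository the commits come from. A hedged sketch of what such a JSON file might hold; the actual fields written by retrieve_repository_info.py are an assumption here:

import json
import sys
from dulwich.repo import Repo

def repository_info(repo_path):
    repo = Repo(repo_path)
    config = repo.get_config()
    # the remote URL is what allows building links to commits later on
    url = config.get((b"remote", b"origin"), b"url").decode("utf-8")
    return {"path": repo_path, "origin": url}

if __name__ == "__main__":
    json.dump(repository_info(sys.argv[1]), sys.stdout, indent=2)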

Link commits to Jira

./link_commits_to_issues.py [--create-the-links] [--jira-api-server URL] [--netrc-gpg-path jira-netrc.gpg | --jira-user USER --jira-password PASSWORD] --repo-info-file FILE.repoinfo FILE.todo
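
With --create-the-links this talks to Jira's REST API. A hedged sketch of the kind of request involved, using Jira's standard remote-link endpoint (POST /rest/api/2/issue/KEY/remotelink); the script's actual payload and URL construction are assumptions:

import base64
import json
import urllib.request

def link_commit(jira_url, user, password, issue, commit_url, title):
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{jira_url}/rest/api/2/issue/{issue}/remotelink",
        data=json.dumps({"object": {"url": commit_url, "title": title}}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + auth},
        method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status  # 201 on successful creation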

credentials via netrc

Prepare the encrypted netrc file:

echo machine jira.HOST.TLD login USER password PASSWORD | gpg2 -er MY_EMAIL@HOST.TLD > jira-netrc.gpg
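
On the Python side the credentials can be recovered with the python-gpg bindings from the requirements. A minimal sketch, assuming the single-line netrc format produced above:

import gpg

def jira_credentials(path="jira-netrc.gpg"):
    with gpg.Context() as ctx, open(path, "rb") as f:
        plaintext, _result, _verify = ctx.decrypt(f)
    # "machine HOST login USER password PASSWORD" -> pairwise dict
    tokens = plaintext.decode("utf-8").split()
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    return fields["machine"], fields["login"], fields["password"]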

todo file format

<commit> <issue> <isodate> <message with linebreaks replaced by "---">

There can be multiple entries per commit: one per issue referenced.

The entries are ordered by commit time: newest commits first (they are the most important ones to get right).
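
Since the message field may itself contain spaces, a parser should split on the first three spaces only. A small sketch:

def parse_todo_line(line):
    # split on the first three spaces; the message itself may contain spaces
    commit, issue, isodate, message = line.rstrip("\n").split(" ", 3)
    return commit, issue, isodate, message.split("---")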

History analysis

Files affected per issue

./retrieve_commits_and_issues.py --with-files --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --count-files-per-issue | sort > files-affected-by-time-with-issue.dat
./plot.py files-affected-by-time-with-issue.dat
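
In essence, --count-files-per-issue counts the distinct files touched per issue. A hedged sketch of that aggregation; the log format assumed here (one "<isodate> <issue> <file>" triple per line) is an illustration, not the script's real format:

import collections
import sys

files_per_issue = collections.defaultdict(set)
first_seen = {}
with open(sys.argv[1]) as log:
    for line in log:
        isodate, issue, filename = line.split()[:3]
        files_per_issue[issue].add(filename)
        first_seen.setdefault(issue, isodate)  # log is newest-first

for issue, files in files_per_issue.items():
    print(first_seen[issue], issue, len(files))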

Only bugs

# ...
# get all jira bugs:
# ./find_all_bugs.py --jira-api-server https://jira.HOST.TLD > all-bugs.log
# stats
./retrieve_commits_and_issues.py --with-files --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --count-files-per-issue  -i all-bugs.log | sort > files-affected-by-time-with-issue-only-bugs.dat
./plot.py files-affected-by-time-with-issue-only-bugs.dat

Aggregated file size of changed files per issue

./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --sum-filesizes-per-issue | sort > sum-filesize-by-time-with-issue.dat
./plot.py sum-filesize-by-time-with-issue.dat

Create nodelists and edgelists

./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py --file-connections issues-and-files.log --debug --output-edgelist all-issues-edgelist-max300.csv --output-nodelist all-issues-nodelist-max300.csv

Analyze the CSVs with graph software like Gephi.

Subselect a graph for a specific module

The example uses MODULE_FOO as the module name; expect a runtime of a few hours on a codebase of around one million lines.

This needs ripgrep in addition to the other dependencies.

./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py --file-connections issues-and-files.log --debug --output-edgelist all-issues-edgelist-max300.csv --output-nodelist all-issues-nodelist-max300.csv
# keep only the nodes whose name mentions MODULE_FOO
grep MODULE_FOO all-issues-nodelist-max300.csv > all-issues-nodelist-max300-foo.csv
# extract their node IDs (first space-separated column)
cut -d " " -f 1 all-issues-nodelist-max300-foo.csv > foo-nodeids-raw.txt
# keep only the edges that mention a MODULE_FOO node at all
time grep -wf foo-nodeids-raw.txt all-issues-edgelist-max300.csv | tee all-issues-edgelist-max300-with-foo.csv
# anchor the IDs at the line start (source column) and between spaces (target column)
sed "s/^/^/" foo-nodeids-raw.txt > foo-nodeids-first.txt
sed "s/^/ /" foo-nodeids-raw.txt | sed "s/$/ /" > foo-nodeids-second.txt
# narrow down to edges whose target is a MODULE_FOO node ...
time rg -f foo-nodeids-second.txt all-issues-edgelist-max300-with-foo.csv | tee all-issues-edgelist-max300-to-foo.csv
# ... and whose source is one as well, i.e. the subgraph within MODULE_FOO
time rg -f foo-nodeids-first.txt all-issues-edgelist-max300-to-foo.csv | tee all-issues-edgelist-max300-from-foo.csv

Now import all-issues-nodelist-max300-foo.csv and all-issues-edgelist-max300-from-foo.csv into Gephi.

Select a specific timespan

Just take the log file written by retrieve_commits_and_issues.py and select the lines for the timespan you want. It is ordered by time, newest issues first.
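
For example, a small line filter in the style of the other scripts, assuming the ISO date is the first whitespace-separated field of each log line (lexicographic comparison works for ISO dates); the timespan below is hypothetical:

import sys

start, end = "2018-01-01", "2018-12-31"  # hypothetical timespan
for line in sys.stdin:
    isodate = line.split(None, 1)[0]
    if start <= isodate[:10] <= end:
        sys.stdout.write(line)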
