Project by Matt Eland (@IntegerMan)
This project generates datasets for data visualization from Git repositories that are cloned to your machine.
- Clone your repository locally using `git clone` or a Git tool such as GitKraken or GitHub Desktop.
- Install the required packages:
  - Pandas
  - PyDriller
- Open `Gather.ipynb`.
- Set `repository_path` equal to the local file path of your Git repository. You do not need to specify `.git`, just the local folder. For example: `repository_path = 'C:\\dev\\VisualizingCode'`
- Optionally set `repository_branch` if you only want to analyze the main branch (this is recommended for performance and clarity of results).
- Run all cells in `Gather.ipynb`. This will generate:
  - `Commits.csv`, containing all Git commits
  - `FileCommits.csv`, which breaks down commits at a one-row-per-file-per-commit level
  - `FileSizes.csv`, containing file statistics for all source files in the current version of your project
  - `MergedFileData.csv`, which joins together `FileCommits.csv` and `FileSizes.csv`
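
Before running the notebook, it can help to confirm that `repository_path` really points at a Git working folder. The following is a minimal sketch and is not part of this project; `looks_like_git_repo` is a hypothetical helper name, and the check assumes only that a cloned repository contains a `.git` entry at its root.

```python
from pathlib import Path


def looks_like_git_repo(repository_path: str) -> bool:
    """Return True if the folder exists and contains a .git entry.

    Hypothetical helper for illustration; not part of Gather.ipynb.
    """
    root = Path(repository_path)
    # .git is a directory in a normal clone and a file in worktrees or
    # submodules, so test for existence rather than for a directory.
    return root.is_dir() and (root / ".git").exists()
```

For example, `looks_like_git_repo(r'C:\dev\VisualizingCode')` should return `True` if that folder is a local clone.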
The data should now be ready to import into Tableau, Power BI, or another tool. You can also analyze the data in Python or another programming language.
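
As a starting point for analysis in Python, a generated file can be loaded with Pandas and inspected. This is only a sketch: `summarize_csv` is a hypothetical helper, and it deliberately makes no assumption about the column names, which depend on what the notebook writes.

```python
import pandas as pd


def summarize_csv(csv_path: str) -> dict:
    """Load one generated CSV and report its row count and column names.

    Hypothetical helper for illustration; not part of Gather.ipynb.
    """
    df = pd.read_csv(csv_path)
    return {"rows": len(df), "columns": list(df.columns)}
```

Calling `summarize_csv('MergedFileData.csv')` from the folder that holds the generated files shows which columns are available before you build charts or further queries.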