# GitStractor Jupyter Notebook Data Visualization

This notebook serves as a set of examples for visualizing codebases using [GitStractor](https://github.com/integerman/gitstractor) and Jupyter Notebooks.

Contact [Matt Eland](https://MattEland.dev) with questions.

## Requirements

This notebook currently requires:

- CSV files generated by [GitStractor](https://github.com/integerman/gitstractor)
- Jupyter Notebook with a Python 3 kernel (tested with Python 3.8.8)
- The following Python libraries:
  - pandas
  - plotly (for `plotly.express` and `plotly.graph_objects`)

In [14]:
# Load Dependencies

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

## Data Loading

In [9]:
import os

# Directory containing the GitStractor CSV files; update this for your machine
data_dir = 'C:\\tools\\gitstractor'

# Default GitStractor file names; these shouldn't need to be customized
author_file = os.path.join(data_dir, 'Authors.csv')
commits_file = os.path.join(data_dir, 'Commits.csv')
file_commits_file = os.path.join(data_dir, 'FileCommits.csv')
files_file = os.path.join(data_dir, 'Files.csv')
final_structure_file = os.path.join(data_dir, 'FinalStructure.csv')
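If `data_dir` points at the wrong folder, `pd.read_csv` only fails when it reaches the first missing file. The sketch below checks all files up front and assumes the default file names listed above. `find_missing_files` and `GITSTRACTOR_FILES` are hypothetical names introduced here and are not part of GitStractor.

```python
from pathlib import Path

# Hypothetical list of the default GitStractor output files used in this notebook
GITSTRACTOR_FILES = [
    'Authors.csv',
    'Commits.csv',
    'FileCommits.csv',
    'Files.csv',
    'FinalStructure.csv',
]


def find_missing_files(data_dir):
    """Return the expected GitStractor CSV files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in GITSTRACTOR_FILES if not (folder / name).is_file()]
```

In the notebook, you can run `find_missing_files(data_dir)` before the loading cells. An empty list means all five files were found.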

### Load Authors

In [10]:
df_authors = pd.read_csv(author_file)

df_authors.head(5)

|  | Name | Email | NumCommits | TotalBytes |
|---|---|---|---|---|
| 0 | Matt Eland | Matt.Eland@GMail.com | 229 | 5033233 |
| 1 | repo-visualizer | repo-visualizer@users.noreply.github.com | 42 | 2332366 |
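The author table already holds per-author commit counts. The sketch below turns them into each author's share of all commits, assuming the `Name` and `NumCommits` columns shown above. `author_commit_share` is a hypothetical helper name.

```python
import pandas as pd


def author_commit_share(df_authors):
    """Return Name, NumCommits, and each author's fraction of all commits, largest first."""
    result = df_authors[['Name', 'NumCommits']].copy()
    # Divide by the total so the shares add up to 1.0
    result['CommitShare'] = result['NumCommits'] / result['NumCommits'].sum()
    return result.sort_values('NumCommits', ascending=False).reset_index(drop=True)
```

With plotly imported as in the dependencies cell, `px.pie(author_commit_share(df_authors), names='Name', values='NumCommits')` would chart the same split.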


### Load Commits

In [11]:
df_commits = pd.read_csv(commits_file)

df_commits.head(5)

|  | CommitHash | AuthorEmail | AuthorDateUTC | CommitterEmail | CommitterDate | Message | NumFiles | AddedFiles | DeletedFiles | TotalFiles | TotalBytes | FileNames |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0df430136e5ca0af02175d3acc40935e572528a5 | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Initial commit | 3 | 3 | 0 | 3 | 7175 | dfcfd5 .gitignore @ 0df430 (Added), 144780 LIC... |
| 1 | aec4a2c1f9283bc6da50e264a3ca8743b1f05410 | Matt.Eland@GMail.com | 7/4/2022 3:07:48 AM | Matt.Eland@GMail.com | 7/4/2022 3:07:48 AM | Project structure setup | 11 | 9 | 0 | 12 | 579000 | f0bb9e .gitignore @ aec4a2 (Modified), e34634 ... |
| 2 | 85f8e28806fa77c22764300502889d9af4a4740d | Matt.Eland@GMail.com | 7/4/2022 3:52:20 AM | Matt.Eland@GMail.com | 7/4/2022 3:52:20 AM | Initial setup | 12 | 9 | 0 | 20 | 7922 | ec60a2 MattEland.WhereDoggo/MattEland.WhereDog... |
| 3 | 5b407858d926bd50620376f0d7ebf871b392e87d | Matt.Eland@GMail.com | 7/4/2022 4:03:53 AM | Matt.Eland@GMail.com | 7/4/2022 4:03:53 AM | Now tracking game knowledge as events | 6 | 2 | 0 | 22 | 3716 | bca261 MattEland.WhereDoggo/MattEland.WhereDog... |
| 4 | c8886dca6f7bbb999670b9dd89dd03dd530cecac | Matt.Eland@GMail.com | 7/4/2022 4:32:35 AM | Matt.Eland@GMail.com | 7/4/2022 4:32:35 AM | Doggo night phase | 11 | 3 | 0 | 25 | 8770 | 06c73b MattEland.WhereDoggo/MattEland.WhereDog... |
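The commit dates are stored as text. The sketch below parses `AuthorDateUTC` and counts commits per day. It assumes the dates use the US month/day/year format seen in the sample (`7/4/2022 2:56:36 AM`). If your GitStractor output uses a different format, change `GITSTRACTOR_DATE_FORMAT`, which is a hypothetical constant, as is the `commits_per_day` helper.

```python
import pandas as pd

# Assumption: month/day/year with a 12-hour clock, as in the sample output above
GITSTRACTOR_DATE_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def commits_per_day(df_commits):
    """Count commits per calendar day of AuthorDateUTC, oldest day first."""
    dates = pd.to_datetime(df_commits['AuthorDateUTC'], format=GITSTRACTOR_DATE_FORMAT)
    counts = dates.dt.date.value_counts().sort_index()
    # Name the index and values so the result has 'Date' and 'Commits' columns
    return counts.rename_axis('Date').reset_index(name='Commits')
```

The result can be passed straight to a chart, for example `px.bar(commits_per_day(df_commits), x='Date', y='Commits')`.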


### Load File Commits

In [12]:
df_file_commits = pd.read_csv(file_commits_file)

df_file_commits.head()

|  | FilePath | FileHash | CommitHash | AuthorEmail | AuthorDateUTC | CommitterEmail | CommitterDate | Message | Bytes | Lines |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | .gitignore | dfcfd56f444f9ae40e1082c07fe254cc547136cf | 0df430136e5ca0af02175d3acc40935e572528a5 | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Initial commit | 6002 | 350 |
| 1 | LICENSE | 1447802c73221ff4f9a42fe06f922d70a1c1c108 | 0df430136e5ca0af02175d3acc40935e572528a5 | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Initial commit | 1067 | 21 |
| 2 | README.md | fa40df806092ac52d73705a6e53e941c1686b328 | 0df430136e5ca0af02175d3acc40935e572528a5 | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Matt.Eland@GMail.com | 7/4/2022 2:56:36 AM | Initial commit | 106 | 2 |
| 3 | .gitignore | f0bb9e4ba2142c160fc712ab167ffed69820d129 | aec4a2c1f9283bc6da50e264a3ca8743b1f05410 | Matt.Eland@GMail.com | 7/4/2022 3:07:48 AM | Matt.Eland@GMail.com | 7/4/2022 3:07:48 AM | Project structure setup | 6036 | 352 |
| 4 | MattEland.WhereDoggo/MattEland.WhereDoggo.Core... | e34634acd363adb59e62845b3168ff76f79a6a92 | aec4a2c1f9283bc6da50e264a3ca8743b1f05410 | Matt.Eland@GMail.com | 7/4/2022 3:07:48 AM | Matt.Eland@GMail.com | 7/4/2022 3:07:48 AM | Project structure setup | 628 | 21 |
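Each row of `FileCommits.csv` appears to record one file in one commit. Under that assumption, counting distinct commits per `FilePath` gives a rough measure of which files change most often. `most_changed_files` is a hypothetical helper.

```python
import pandas as pd


def most_changed_files(df_file_commits, top_n=10):
    """Return the top_n file paths touched by the most distinct commits."""
    counts = df_file_commits.groupby('FilePath')['CommitHash'].nunique()
    top = counts.sort_values(ascending=False).head(top_n)
    # Result columns: FilePath, Commits
    return top.rename('Commits').reset_index()
```

Because the grouping is by path, a file that was renamed is counted separately under each of its paths.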


### Load File Structure

In [13]:
df_structure = pd.read_csv(final_structure_file)

df_structure.head(5)

|  | CommitHash | FileHash | Filename | Extension | FilePath | State | Lines | Bytes | CreatedDateUTC | Path1 | Path2 | Path3 | Path4 | Path5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5646a44aa369f67be1ff7f261c4d85511aece73a | 3a02aa0a8c9f1d0b74b6ab88a28388261064ba49 | build.yml | .yml | .github/workflows/build.yml | Final | 49 | 1965 | 12/1/2022 7:28:40 PM | .github | workflows |  |  |  |
| 1 | 5646a44aa369f67be1ff7f261c4d85511aece73a | 76c64c7d8236de7fb1edff9967f5d4afe2618e45 | codeql-analysis.yml | .yml | .github/workflows/codeql-analysis.yml | Final | 72 | 2727 | 12/1/2022 7:28:40 PM | .github | workflows |  |  |  |
| 2 | 5646a44aa369f67be1ff7f261c4d85511aece73a | c9112b4f26f5ef36b38a2364bdd21edd6098b9f3 | create-diagram.yml | .yml | .github/workflows/create-diagram.yml | Final | 16 | 343 | 12/1/2022 7:28:40 PM | .github | workflows |  |  |  |
| 3 | 5646a44aa369f67be1ff7f261c4d85511aece73a | 1d9b645bbe2bf4c1ae117911534431038bb23a49 | .gitignore | .gitignore | .gitignore | Final | 356 | 6125 | 12/1/2022 7:28:40 PM |  |  |  |  |  |
| 4 | 5646a44aa369f67be1ff7f261c4d85511aece73a | 568164e0c18ed80f5fffc7296d258f67e242951a | AIDesign.md | .md | AIDesign.md | Final | 39 | 953 | 12/1/2022 7:28:40 PM |  |  |  |  |  |
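`FinalStructure.csv` appears to list each file in the final snapshot along with its `Extension` and `Lines`. That makes a per-file-type summary a single group-by. The sketch below assumes those column names. `lines_by_extension` is a hypothetical helper.

```python
import pandas as pd


def lines_by_extension(df_structure):
    """Total the Lines column per file extension, largest first."""
    # dropna=False keeps files with a missing Extension as their own group
    totals = df_structure.groupby('Extension', dropna=False)['Lines'].sum()
    return totals.sort_values(ascending=False).reset_index()
```

A bar chart such as `px.bar(lines_by_extension(df_structure), x='Extension', y='Lines')` then shows which file types dominate the codebase.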


## Data Visualization

In [None]:
import plotly.express as px
import plotly.graph_objects as go
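A treemap is one natural way to view the final structure. `px.treemap` takes a `path` list of hierarchy columns, but plotly rejects a missing level that has non-missing children. Top-level files such as `.gitignore` have empty `Path1` through `Path5` values ahead of their file name, so those columns would likely be rejected as a path.

The sketch below instead splits `FilePath` on `/` into one column per level, so missing levels only ever trail a path. `split_path_levels` and the `Level0`, `Level1`, ... column names are hypothetical.

```python
import pandas as pd


def split_path_levels(df_structure, path_column='FilePath'):
    """Add one column per folder level of path_column; also return those column names.

    Shorter paths are padded with missing values in their trailing levels,
    so a missing level is never followed by a non-missing one.
    """
    parts = df_structure[path_column].str.split('/', expand=True)
    level_columns = [f'Level{i}' for i in range(parts.shape[1])]
    parts.columns = level_columns
    return pd.concat([df_structure, parts], axis=1), level_columns
```

To draw the treemap, call `df_levels, level_columns = split_path_levels(df_structure)` and then `px.treemap(df_levels, path=level_columns, values='Lines')`. This should show folders as nested rectangles sized by line count.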