# Mining Software Repositories: OpenStack Nova Project

### Goal

The goal of this tool and analysis is to capture insights from the commits of a project repository, in this case the OpenStack Nova repository. These insights help in understanding the project and provide guidance to contributors and maintainers.

### Objectives

The following questions will be answered:
* Which module is the most actively modified?
* How many commits occurred during the studied period?
* How much churn occurred during the studied period? Churn is defined as the sum of added and removed lines by all commits.
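
The three questions above can be illustrated on a toy table before touching the real data. This is only a sketch: the values are made up, the column names are assumed to mirror the per-file fields of the collected data, and "module" is taken here to mean a single file, which is one possible reading.

```python
import pandas as pd

# Toy per-file change records (hypothetical values, not real Nova data).
toy = pd.DataFrame({
    "commit_sha": ["a1", "a1", "b2"],
    "filename": ["nova/compute/api.py", "nova/exception.py", "nova/compute/api.py"],
    "additions": [3, 9, 2],
    "deletions": [3, 16, 0],
})

# Number of commits: each commit appears once per touched file, so count unique SHAs.
n_commits = toy["commit_sha"].nunique()

# Churn: sum of added and removed lines over all commits.
churn = int((toy["additions"] + toy["deletions"]).sum())

# Most actively modified file: the one touched by the most distinct commits.
most_modified = toy.groupby("filename")["commit_sha"].nunique().idxmax()

print(n_commits, churn, most_modified)
```

Counting distinct commits per file, rather than rows, avoids over-counting if the same file appeared twice for one commit.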

**NB**: This workflow covers the pre-processing, analysis, and generation of insights from the collected data. It assumes that the automated data collection, performed by the script in the same folder as this notebook, has already completed. The collected data is loaded here before the remaining steps of the workflow execute.

### Required imports:

In [1]:
# Built-in libraries
import json

# The normal data science ecosystem libraries
# pandas for data wrangling
import pandas as pd

# Plotting modules and libraries required
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

### Required settings:

In [2]:
# Settings:
# 1. Command needed to make plots appear in the Jupyter Notebook
%matplotlib inline

# 2. Enlarge the default figure size in the Jupyter Notebook
plt.rcParams['figure.figsize'] = (12, 10)

# 3. Use the 'ggplot' style for a clean, professional-looking theme
plt.style.use('ggplot')

# 4. Optional: make plots zoomable (requires `import mpld3` first)
# mpld3.enable_notebook()

### Other utility functions for data manipulation

In [3]:
# Utility data manipulation functions

### 1. Loading the data

In [4]:
# Open and load json file
with open('data.json') as file:
    data = json.load(file)
    print("data loaded successfully")

data loaded successfully


In [5]:
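# Flatten the nested "files" list into one row per changed file,
# carrying the commit-level fields along as extra columns
df = pd.json_normalize(data, "files", ["commit_node_id", "commit_sha", "commit_html_url", "commit_date"])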
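
To make the flattening step concrete, here is a minimal sketch of how `pd.json_normalize` treats its `record_path` and `meta` arguments. The `sample` records are hypothetical and only assume that each commit carries a nested `files` list, as the call above implies.

```python
import pandas as pd

# Hypothetical commit records shaped like the assumed contents of data.json.
sample = [
    {
        "commit_sha": "abc123",
        "commit_date": "2022-02-10T19:43:54Z",
        "files": [
            {"filename": "nova/exception.py", "additions": 9, "deletions": 16},
            {"filename": "nova/compute/api.py", "additions": 3, "deletions": 3},
        ],
    },
    {
        "commit_sha": "def456",
        "commit_date": "2022-02-09T21:32:35Z",
        "files": [
            {"filename": "doc/source/admin/networking.rst", "additions": 2, "deletions": 2},
        ],
    },
]

# record_path names the nested list to explode into rows;
# meta lists the parent-level fields to repeat on every row.
flat = pd.json_normalize(sample, record_path="files", meta=["commit_sha", "commit_date"])
print(flat)
```

Each entry of `files` becomes one row, and the commit-level fields are repeated on every row belonging to that commit, which is why the same `commit_sha` appears on several consecutive rows of the real DataFrame.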

In [9]:
df.head(10)

Unnamed: 0,sha,filename,status,additions,deletions,changes,blob_url,raw_url,contents_url,patch,previous_filename,commit_node_id,commit_sha,commit_html_url,commit_date
0,b3f461cca42b3bc413767649e7284db3c7332f42,nova/api/openstack/compute/deferred_delete.py,modified,1,1,2,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -40,7 +40,7 @@ def _restore(self, req, id, ...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
1,59b9c384df60d670f526f91ffb10fa09d12ab7ba,nova/api/openstack/compute/migrate_server.py,modified,1,1,2,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -57,7 +57,7 @@ def _migrate(self, req, id, ...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
2,e92becb582ad1b9ac5f044f7e4b312cf76806cc0,nova/api/openstack/compute/server_metadata.py,modified,1,1,2,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -114,7 +114,7 @@ def _update_instance_metad...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
3,68d4ef0eeb0d71e78ff64174b477f9558bd4810f,nova/api/openstack/compute/servers.py,modified,3,4,7,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -797,8 +797,7 @@ def create(self, req, body...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
4,a35889b9131c7c3ffe0dc06eef88ede96827c83c,nova/compute/api.py,modified,3,3,6,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -400,7 +400,7 @@ def _record_action_start(s...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
5,27405aea671abc234dc159a0033b62359007615b,nova/exception.py,modified,9,16,25,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -998,10 +998,6 @@ class QuotaClassExists(No...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
6,d1bb6babb779b2053d7ea801cbd2beea7a0705dc,nova/tests/unit/api/openstack/compute/test_api.py,modified,1,1,2,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -143,7 +143,7 @@ def fail(req):\n ...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
7,d8f443843f3778942292c2b3a81c73bb163e3171,nova/tests/unit/compute/test_compute.py,modified,1,1,2,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -8859,7 +8859,7 @@ def test_create_instance...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
8,48910cf75cb862f122c6a53e26bcf9875f605395,nova/tests/unit/test_quota.py,modified,11,11,22,https://github.com/openstack/nova/blob/232f827...,https://github.com/openstack/nova/raw/232f8275...,https://api.github.com/repos/openstack/nova/co...,"@@ -109,15 +109,15 @@ def test_too_many_instan...",,C_kwDOAAwOD9oAKDIzMmY4Mjc1ZWMwMDc2N2QxZjEwMGNh...,232f8275ec00767d1f100cacae4823e6f77e04ef,https://github.com/openstack/nova/commit/232f8...,2022-02-10T19:43:54Z
9,61f23f9804cc81ec27db8616863af6eb1c283487,doc/source/admin/networking.rst,modified,2,2,4,https://github.com/openstack/nova/blob/db15cb9...,https://github.com/openstack/nova/raw/db15cb95...,https://api.github.com/repos/openstack/nova/co...,"@@ -206,10 +206,10 @@ virtio-net Multiqueue\n ...",,C_kwDOAAwOD9oAKGRiMTVjYjk1MTNkYmYxMmUxNzNhNjZh...,db15cb9513dbf12e173a66ab87e9638dcc08f4f0,https://github.com/openstack/nova/commit/db15c...,2022-02-09T21:32:35Z
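
Two columns in this preview usually need a little preparation before analysis: `commit_date` is an ISO 8601 string, and `filename` is a full path rather than a module. The sketch below shows one possible preparation on a few rows copied from the preview. Treating the containing directory as the "module" is an assumption made for illustration, not something the notebook defines.

```python
import posixpath
import pandas as pd

# A few rows taken from the preview above (filename and commit_date only).
rows = pd.DataFrame({
    "filename": [
        "nova/compute/api.py",
        "nova/exception.py",
        "doc/source/admin/networking.rst",
    ],
    "commit_date": [
        "2022-02-10T19:43:54Z",
        "2022-02-10T19:43:54Z",
        "2022-02-09T21:32:35Z",
    ],
})

# Parse the ISO 8601 strings into timezone-aware timestamps.
rows["commit_date"] = pd.to_datetime(rows["commit_date"], utc=True)

# Assumption: a file's "module" is its containing directory.
rows["module"] = rows["filename"].map(posixpath.dirname)

# The studied period is bounded by the earliest and latest commit dates.
period_start = rows["commit_date"].min()
period_end = rows["commit_date"].max()
print(period_start, period_end)
```

With dates parsed, the studied period can be read directly from the data instead of being hard-coded.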
