# MGL869 - Lab

*MGL869 ETS Montreal - Production engineering*

## Abstract

## Authors
- **Léo FORNOFF**
- **William PHAN**
- **Yannis OUAKRIM**

---

## Part 1 : Data collection

In [1]:
from Jira import jira_download
from pandas import Index
from numpy import ndarray


### 1.1 - Download Jira data
`jira_download()` downloads the Jira data only if it is not already present in the data folder, and returns it as a dataframe.

The query filter is defined in `config.ini`.

In [4]:
jira_dataframe = jira_download()

Data already downloaded
Filter = 'project=HIVE AND issuetype=Bug AND status in (Resolved, Closed) AND affectedVersion>= 2.0.0'
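The download-if-missing behaviour described above can be sketched as follows. This is a minimal illustration, not the actual body of `jira_download`: the helper name `load_or_download`, the CSV cache format, and the `fetch` callable are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_download(cache_file: Path, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the cached CSV if it exists; otherwise fetch, cache, and return the data."""
    if cache_file.exists():
        print("Data already downloaded")
        return pd.read_csv(cache_file)
    # `fetch` is a hypothetical stand-in for the real Jira query
    frame = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(cache_file, index=False)
    return frame
```

With this shape, the first call pays for the download and every later call only reads the local file.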


### 1.2 - Clean Jira data using pandas
The previous step downloaded all the data from Jira. We now clean it with pandas:
we keep only a subset of the columns and combine the multi-valued version columns into single columns.

In [3]:
# Columns to keep after cleaning
keep: list[str] = ['Issue key', 'Status', 'Resolution', 'Created', 'Fix Versions Combined', 'Affects Versions Combined']

In [4]:
# Combine all 'Affects Version/s' columns into a single comma-separated column
affects_version_columns: list[str] = [col for col in jira_dataframe.columns if col.startswith('Affects Version/s')]
jira_dataframe['Affects Versions Combined'] = jira_dataframe[affects_version_columns].apply(
    lambda x: ', '.join(x.dropna().astype(str)), axis=1
)

In [5]:
# Combine all 'Fix Version/s' columns into a single comma-separated column
fix_version_columns: list[str] = [col for col in jira_dataframe.columns if col.startswith('Fix Version/s')]

jira_dataframe['Fix Versions Combined'] = jira_dataframe[fix_version_columns].apply(
    lambda x: ', '.join(x.dropna().astype(str)), axis=1
)
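To make the effect of the two combining cells concrete, here is a small sketch on toy data. The rows are invented for illustration, and the `.1` suffix assumes the export was read with pandas, which renames repeated column headers that way.

```python
import pandas as pd

# Toy rows (hypothetical values): the second issue affects only one version
toy = pd.DataFrame({
    "Issue key": ["HIVE-1", "HIVE-2"],
    "Affects Version/s": ["2.3.0", "3.0.0"],
    "Affects Version/s.1": ["2.3.1", None],
})

version_columns = [col for col in toy.columns if col.startswith("Affects Version/s")]

# Join the non-missing versions of each row into one comma-separated string
toy["Affects Versions Combined"] = toy[version_columns].apply(
    lambda row: ", ".join(row.dropna().astype(str)), axis=1
)
print(toy["Affects Versions Combined"].tolist())
```

Missing values are dropped before joining, so an issue with a single affected version yields that version alone, without a trailing separator.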

In [6]:
# Identify columns whose names contain the string 'Issue key'
issue_key_columns: Index = jira_dataframe.columns[jira_dataframe.columns.str.contains('Issue key')]
# Extract the values from these columns as a NumPy array
issue_key_values: ndarray = jira_dataframe[issue_key_columns].values
# Flatten the array to create a one-dimensional list of all 'Issue key' values
flattened_issue_keys: ndarray = issue_key_values.flatten()
# Convert the list into a set to remove duplicates
ids: set = set(flattened_issue_keys)
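The cells above define `keep` and build the set of issue keys, but the column selection itself is not shown. A minimal sketch of both steps on toy data, assuming the selection is a plain column subset (the rows and the shortened `keep` list are hypothetical):

```python
import pandas as pd

# Shortened, hypothetical version of the `keep` list
keep = ["Issue key", "Status", "Affects Versions Combined"]

toy = pd.DataFrame({
    "Issue key": ["HIVE-1", "HIVE-2", "HIVE-1"],
    "Status": ["Closed", "Resolved", "Closed"],
    "Summary": ["a", "b", "c"],
    "Affects Versions Combined": ["2.3.0", "3.0.0", "2.3.0"],
})

# Keep only the selected columns
cleaned = toy[keep]

# Collect the distinct issue keys, as in the cell above
key_columns = cleaned.columns[cleaned.columns.str.contains("Issue key")]
ids = set(cleaned[key_columns].to_numpy().flatten())
print(sorted(ids))
```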

---


## Part 2 : Repository analysis


In [1]:
from Hive import git_download, commit_analysis
from git import Repo, Tag
from pandas import DataFrame
from configparser import ConfigParser
from re import compile

### 2.1 - Clone repository

In [6]:
repo: Repo = git_download()

data\hive_data\hiveRepo False
Pulling the repository: https://github.com/apache/hive.git


In [9]:
all_couples = commit_analysis(ids)

20524 couples found.
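The body of `commit_analysis` is not shown here. One plausible way to obtain (issue key, file, commit) couples is to look for Jira keys in commit messages; the sketch below assumes exactly that, and the function name `find_couples`, the `HIVE-<number>` pattern, and the toy commits are all hypothetical.

```python
import re

# Assumption: commit messages mention Jira keys in the form "HIVE-<number>"
ISSUE_KEY_PATTERN = re.compile(r"HIVE-\d+")


def find_couples(commits, known_ids):
    """Return (issue key, file, commit) triples for commits that mention a known key.

    `commits` is an iterable of (sha, message, changed_files) tuples.
    """
    couples = []
    for sha, message, files in commits:
        mentioned = set(ISSUE_KEY_PATTERN.findall(message)) & set(known_ids)
        for key in sorted(mentioned):
            for file_path in files:
                couples.append((key, file_path, sha))
    return couples


toy_commits = [
    ("abc123", "HIVE-1: fix null check", ["ql/Driver.java", "README.md"]),
    ("def456", "HIVE-99: unrelated change", ["pom.xml"]),
]
print(find_couples(toy_commits, {"HIVE-1"}))
```

The triples come out in the same order as the `Issue key`, `File`, `Commit` columns used in the next section.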


### 2.2 - Filter data

In [10]:
commit_dataframe: DataFrame = DataFrame(all_couples, columns=["Issue key", "File", "Commit"])

In [11]:
# Read the file extensions to keep from config.ini and strip surrounding whitespace
config: ConfigParser = ConfigParser()
config.read("config.ini")
languages: list[str] = [lang.strip() for lang in config["GENERAL"]["Languages"].split(",")]
# Keep only the files whose name ends with one of the configured extensions
commit_dataframe: DataFrame = commit_dataframe[commit_dataframe['File'].str.endswith(tuple(languages))]
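The extension filter can be tried on toy data as follows. The `Languages` value and the rows are hypothetical, and the sketch assumes a pandas version whose `str.endswith` accepts a tuple of suffixes.

```python
from configparser import ConfigParser

import pandas as pd

# Hypothetical config content; the real values live in config.ini
config = ConfigParser()
config.read_string("[GENERAL]\nLanguages = .java, .cpp\n")
languages = [lang.strip() for lang in config["GENERAL"]["Languages"].split(",")]

toy = pd.DataFrame({
    "Issue key": ["HIVE-1", "HIVE-1", "HIVE-2"],
    "File": ["ql/Driver.java", "README.md", "udf/hash.cpp"],
    "Commit": ["abc123", "abc123", "def456"],
})

# Rows whose file name ends with any configured suffix are kept
filtered = toy[toy["File"].str.endswith(tuple(languages))]
print(filtered["File"].tolist())
```

Stripping whitespace matters here: without it, the second suffix would be `" .cpp"` and would never match a file name.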

### 2.3 - Extract and filter versions from git

In [12]:
# Compile the release-tag patterns listed in config.ini
releases_regex: list = [compile(regex.strip()) for regex in config["GIT"]["ReleasesRegex"].split(",")]
# Map every tag name to the commit it points to
tags: list[Tag] = repo.tags
versions: dict = {tag.name: tag.commit for tag in tags}

In [13]:
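filtered_versions: dict = {}
for version in versions:
    # Keep only the tags that match one of the release patterns
    if any(regex.match(version) for regex in releases_regex):
        # Take the segment that follows the first hyphen as the version number
        version_numbers = version.split("-")[1]
        filtered_versions[version_numbers] = versions[version]

# Sort the releases from the most recent tagged commit to the oldest
filtered_versions = dict(sorted(filtered_versions.items(),
                                key=lambda item: item[1].committed_datetime,
                                reverse=True))
filtered_versions, len(filtered_versions)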

({'4.0.1': <git.Commit "3af4517eb8cfd9407ad34ed78a0b48b57dfaa264">,
  '2.3.10': <git.Commit "5160d3af392248255f68e41e1e0557eae4d95273">,
  '4.0.0': <git.Commit "183f8cb41d3dbed961ffd27999876468ff06690c">,
  '3.1.3': <git.Commit "4df4d75bf1e16fe0af75aad0b4179c34c07fc975">,
  '2.3.9': <git.Commit "92dd0159f440ca7863be3232f3a683a510a62b9d">,
  '2.3.8': <git.Commit "f1e87137034e4ecbe39a859d4ef44319800016d7">,
  '2.3.7': <git.Commit "cb213d88304034393d68cc31a95be24f5aac62b6">,
  '3.1.2': <git.Commit "8190d2be7b7165effa62bd21b7d60ef81fb0e4af">,
  '2.3.6': <git.Commit "2c2fdd524e8783f6e1f3ef15281cc2d5ed08728f">,
  '2.3.5': <git.Commit "76595628ae13b95162e77bba365fe4d2c60b3f29">,
  '2.3.4': <git.Commit "56acdd2120b9ce6790185c679223b8b5e884aaf2">,
  '3.1.1': <git.Commit "f4e0529634b6231a0072295da48af466cf2f10b7">,
  '3.1.0': <git.Commit "bcc7df95824831a8d2f1524e4048dfc23ab98c19">,
  '3.0.0': <git.Commit "ce61711a5fa54ab34fc74d86d521ecaeea6b072a">,
  '2.3.3': <git.Commit "3f7dde31aed44b5440563d3
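The filtering and ordering logic can be reproduced without a repository. In the sketch below, the tag names, the dates, and the release pattern are all hypothetical; plain `datetime` values stand in for each commit's `committed_datetime`.

```python
import re
from datetime import datetime

# Hypothetical tag names mapped to hypothetical commit dates
tag_dates = {
    "rel/release-2.3.9": datetime(2000, 1, 1),
    "rel/release-4.0.0": datetime(2000, 1, 2),
    "rel/release-2.3.10": datetime(2000, 1, 3),
    "rel/release-4.0.0-rc0": datetime(2000, 1, 4),
    "storage-release-2.7.0": datetime(2000, 1, 5),
}

# Hypothetical pattern: final releases only, no release candidates
release_patterns = [re.compile(r"^rel/release-\d+\.\d+\.\d+$")]

# Keep matching tags and key them by the segment after the first hyphen
filtered = {
    name.split("-")[1]: date
    for name, date in tag_dates.items()
    if any(pattern.match(name) for pattern in release_patterns)
}

# Most recent commit first
ordered = dict(sorted(filtered.items(), key=lambda item: item[1], reverse=True))
print(list(ordered))
```

Because the order follows commit dates rather than version numbers, a maintenance release such as 2.3.10 can appear before 4.0.0, which matches the ordering seen in the output above.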

## Part 3 : Understand analysis

In [2]:
from Understand.commands import und_create_command, und_analyze_command, und_metrics_command, und_purge_command
from Understand.metrics import metrics
from os import path

### 3.1 - Create the Understand project


In [15]:
hive_git_directory: str = config["GIT"]["HiveGitDirectory"]
data_directory: str = config["GENERAL"]["DataDirectory"]
understand_project_name: str = config["UNDERSTAND"]["UnderstandProjectName"]

understand_project_path: str = path.join(data_directory, hive_git_directory, understand_project_name)

if not path.exists(understand_project_path):
    und_create_command()

In [16]:
und_purge_command()

Running command : 
     und purge -db data\hive_data\hive.und
Database purged.
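The `und_*_command` helpers are imported from the project's `Understand.commands` module and their bodies are not shown. As a rough sketch of how such a wrapper might assemble and run a command line like the one logged above, assuming a `subprocess`-based implementation (the names `build_und_command` and `run_und` are hypothetical):

```python
import subprocess


def build_und_command(action: str, database: str, *extra: str) -> list[str]:
    """Assemble an `und <action> -db <database> [extra...]` argument list."""
    return ["und", action, "-db", database, *extra]


def run_und(command: list[str]) -> subprocess.CompletedProcess:
    """Log the command, then run it; raises CalledProcessError on a non-zero exit."""
    print("Running command :\n    ", " ".join(command))
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Building the purge command from the log above (not executed here)
purge = build_und_command("purge", r"data\hive_data\hive.und")
print(purge)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with paths that contain spaces.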



In [None]:
metrics(filtered_versions)

Creating repo data\temp_repositories\4.0.1 from C:\Users\moshi\Documents\projects\Informatique\ETS\MGL869\MGL869-Lab-Hive\data\hive_data\hiveRepo
Creating the directory: data\temp_repositories\4.0.1
Creating repo data\temp_repositories\2.3.10 from C:\Users\moshi\Documents\projects\Informatique\ETS\MGL869\MGL869-Lab-Hive\data\hive_data\hiveRepo
Creating the directory: data\temp_repositories\2.3.10
Running command : 
     und create -db data\temp_repositories\2.3.10\2.3.10.und -languages Java c++

Analyzing commit 5160d3af392248255f68e41e1e0557eae4d95273
Running command : 
     und add C:\Users\moshi\Documents\projects\Informatique\ETS\MGL869\MGL869-Lab-Hive\data\temp_repositories\2.3.10 -db data\temp_repositories\2.3.10\2.3.10.und
Files added: 5118

Running command : 
     und analyze -db data\temp_repositories\2.3.10\2.3.10.und -quiet
Running command : 
     und create -db data\temp_repositories\4.0.1\4.0.1.und -languages Java c++

Analyzing commit 3af4517eb8cfd9407ad34ed78a0b48b57dfaa