# Moth - Můra
MUNI Omniscient Tutor Helper - Masaryk University Repository Analyzer

This tool was created as part of the thesis *"Measuring Software Development Contributions using Git"* at Masaryk University.
The goal of this tool is to analyze students' git repositories and give teachers useful information about their work.

The implementation was originally written in Python 3.9 and uses the following libraries, without which the tool would not be possible:
- [levenshtein](https://pypi.org/project/python-Levenshtein/) - for fuzzy matching in syntactic analysis
- [GitPython](https://gitpython.readthedocs.io/en/stable/) - for git operations
- [python-gitlab](https://python-gitlab.readthedocs.io/en/stable/) - for interfacing with GitLab
- [PyGithub](https://pygithub.readthedocs.io/en/latest/) - for interfacing with GitHub
- [matplotlib](https://matplotlib.org/) - for plotting various graphs
- [notebook](https://jupyter.org/) - for the front-end you are currently using

Below are the necessary imports for the tool to work.

In [None]:
import fs_access as file_system
import lib
import mura
import configuration
import semantic_analysis

from uni_chars import *  # shortcut for unicode characters used throughout the tool
from history_analyzer import CommitRange

from IPython.display import display, HTML

display(HTML("<style>.container { width:100% !important; }</style>"))  # wide screen support

# macros for automagically reloading the modules when they are changed
%load_ext autoreload
%autoreload 2

# Configuration

The following code block is a shortcut for opening the configuration folder. The configuration folder contains all the configuration files and rules.
Uncomment the line to open the folder.

In [None]:
# configuration.open_configuration_folder()


Configuration is split into multiple files, each grouping settings for a specific part of the tool.

## Configuration files
- `configuration_data/configuration.txt` - contains general configuration of the tool
- `lang-syntax/*` - contains weight definitions for general syntax of a language
- `lang-semantics/*` - contains weight definitions for semantic constructs
- `remote-repo-weigths/weights.txt` - contains weight definitions for remote repository objects

## Rules file

- `configuration_data/rules.txt` - contains rules for the tool

Once the configuration is set, run the code block below to load the configurations into the tool.

In [None]:
config = configuration.validate()

# All properties of the config can be edited here. I recommend an editor with code completion support.

Semantic analyzers for each extension are executed only if the extension is present in the repository. Therefore, it is not required to have all the prerequisites installed if the project does not include those extensions.

# Contributors

Contributors often do not have a synchronized git configuration across all their development devices, which can prevent the tool from identifying them correctly. The tool attempts to match contributors by their name and email. If that is not enough, an explicit name-to-name mapping can be provided in the `contributor_map` variable.

In [None]:
contributor_map = \
    [
        # ('Jiří Šťastný', 'Jiri Stastny'),
        ('Matěj Gorgol', 'Matej Gorgol')
    ]

config.contributor_map = contributor_map

# Anonymous mode replaces the names of the contributors with "Anonymous #n" in the output; set to False to show real names
config.anonymous_mode = True
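
The sketch below illustrates why an explicit mapping can still be needed after name and email matching. It is not the tool's actual matching logic: `normalize_name`, `same_contributor`, the email addresses, and the alias `jstastny` are hypothetical and exist only for this example. It assumes two identities belong together when their emails match, when their names match after stripping diacritics, or when an explicit pair says so.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Strip diacritics, case-fold, and collapse whitespace (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def same_contributor(a, b, explicit_map=()):
    """Decide whether two (name, email) identities are the same person.

    Assumed rules: equal email, equal normalized name, or an explicit pair.
    """
    name_a, email_a = a
    name_b, email_b = b
    if email_a.casefold() == email_b.casefold():
        return True
    if normalize_name(name_a) == normalize_name(name_b):
        return True
    return any({name_a, name_b} == {x, y} for x, y in explicit_map)


# Diacritics alone can be resolved automatically ...
laptop = ("Matěj Gorgol", "laptop@example.com")    # hypothetical emails
desktop = ("Matej Gorgol", "desktop@example.com")
print(same_contributor(laptop, desktop))

# ... but an unrelated alias needs an explicit mapping entry.
full = ("Jiří Šťastný", "full@example.com")
alias = ("jstastny", "alias@example.com")          # hypothetical alias
print(same_contributor(full, alias))
print(same_contributor(full, alias, explicit_map=[("Jiří Šťastný", "jstastny")]))
```

Automatic normalization covers spelling variants of the same name, while the mapping handles cases where no rule could reasonably infer that two names belong together.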

# Repository

Put the path to the repository you want to analyze into the `repository_path` variable and run the code block below.

In [None]:
repository_path = r"C:\MUNI\last\Java\M1\airport-manager"

repository = file_system.validate_repository(repository_path, config)

# Commit range

The commit range is defined by the `start` and `end` variables. The variables can be either a commit hash or a tag/branch name.
Additionally, the `end` variable can be set to `ROOT` and `start` to `HEAD` to analyze the repository from the beginning to the current state.

In [None]:
start = "HEAD"
end = "ROOT"

commit_range = CommitRange(start, end, verbose=True)

# Analysis

The analysis is time-consuming, and its duration grows with the size of the repository. For a single project from the PA165 course at Milestone 1 (120 commits), the analysis took about 35 seconds on an Intel i7-12700H CPU.

In [None]:
tracked_files = lib.get_tracked_files()

history_analysis_result = commit_range.analyze()

semantic_analysis_grouped = []
for group in tracked_files:
    grouped_semantic_weight = semantic_analysis.compute_semantic_weight_grouped(group)
    semantic_analysis_grouped.append(grouped_semantic_weight)

# Results

The analysis is finished. The tool provides multiple outputs to help the teacher assess the students' work. Each output is produced by a separate code block.

## Contributors

In [None]:
contributors = mura.display_contributor_info(commit_range, config)

## Commits

In [None]:
commit_distribution, insertions_deletions = mura.commit_info(commit_range, repository, contributors)

In [None]:
mura.insertions_deletions_info(insertions_deletions)

## Commit graph

Displays a graph of the commits in the repository. The range of the x-axis is computed from the starting commit date and the ending commit date.
To display only a section of the graph, the list can be sliced. This is generally useful when there are commits at the boundaries. Taking a section in the middle does not make much sense.

In [None]:
commits = [commit for commit in commit_range]

commits = commits[1:]  # remove first commit
# commits = commits[:-10]  # remove last 10 commits

mura.plot_commits(commits, contributors, repository, force_x_axis_labels=False)

## File statistics

The first part of the output combines the statistics of all file changes in the repository. Each change is marked with one of the following letters:

- A: Added
- D: Deleted
- M: Modified
- R: Renamed

In [None]:
flagged_files = mura.file_statistics_info(commit_range, contributors)

## Percentages and ownership

In [33]:
percentage, ownership = mura.percentage_info(history_analysis_result, contributors, config)

📊 Percentage of tracked files:

	ANON: Anonymous #6 <contributor6@email.cz>: 30.91%
	ANON: Anonymous #5 <contributor5@email.cz>: 2.88%
	ANON: Anonymous #2 <contributor2@email.cz>: 18.89%
	ANON: Anonymous #4 <contributor4@email.cz>: 23.42%
	ANON: Anonymous #3 <contributor3@email.cz>: 14.83%
	ANON: Anonymous #1 <contributor1@email.cz>: 9.07%
Files owned by 👨‍💻 Anonymous #6
	.gitignore (0.29411764705882354)
	.mvn\wrapper\maven-wrapper.properties (1)
	mvnw (0.30914826498422715)
	mvnw.cmd (0.31216931216931215)
	steward-module\.mvn\wrapper\maven-wrapper.properties (1)
	steward-module\mvnw (1)
	steward-module\mvnw.cmd (1)
	steward-module\pom.xml (1)
	steward-module\src\main\java\cz\muni\fi\pa165\airportmanager\stewardmodule\StewardModuleApplication.java (1)
	steward-module\src\main\java\cz\muni\fi\pa165\airportmanager\stewardmodule\data\model\Steward.java (1)
	domain-module\.gitignore (1)
	domain-module\.mvn\wrapper\maven-wrapper.properties (1)
	domain-module\mvnw (1)
	domain-module\mvnw.cmd 

## Ownership as a directory tree

In [34]:
mura.display_dir_tree(config, percentage, repository)

📁 Dir Tree with ownership:

├── .gitignore 👨‍💻 [ANON: Anonymous #6 <contributor6@email.cz>: 29%, ANON: Anonymous #5 <contributor5@email.cz>: 71%]
├── .mvn
│   └── wrapper
│       └── maven-wrapper.properties 👨‍💻 [ANON: Anonymous #6 <contributor6@email.cz>: 100%]
├── README.md 👨‍💻 [ANON: Anonymous #2 <contributor2@email.cz>: 100%]
├── mvnw 👨‍💻 [ANON: Anonymous #5 <contributor5@email.cz>: 69%, ANON: Anonymous #6 <contributor6@email.cz>: 31%]
├── mvnw.cmd 👨‍💻 [ANON: Anonymous #5 <contributor5@email.cz>: 69%, ANON: Anonymous #6 <contributor6@email.cz>: 31%]
├── pom.xml 👨‍💻 [ANON: Anonymous #2 <contributor2@email.cz>: 28%, ANON: Anonymous #6 <contributor6@email.cz>: 17%, ANON: Anonymous #4 <contributor4@email.cz>: 26%, ANON: Anonymous #3 <contributor3@email.cz>: 13%, ANON: Anonymous #1 <contributor1@email.cz>: 4%, ANON: Anonymous #5 <contributor5@email.cz>: 12%]
├── steward-module 👨‍💻 [ANON: Anonymous #6 <contributor6@email.cz>: 98%, ANON: Anonymous #2 <contributor2@email.cz>: 1%, ANON: Ano

## Rules

In [35]:
rule_violation_weight_multipliers = mura.rule_info(config, ownership)

📜 Rules: 

All contributors must have at least 1 file/s matching: `.*Controller.*\.java` in a directory matching: `*`

🚫 Violated Rules: 

❌ Contributor ANON: Anonymous #5 <contributor5@email.cz> did not fulfill the following requirements:
	All contributors must have at least 1 file/s matching: `.*Controller.*\.java` in a directory matching: `*`
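
A rule like the one above can be checked with a few lines of standard-library Python. This is a minimal sketch, not the tool's implementation: the function `violators`, the shape of `owned_files`, and the example paths are assumptions. It also assumes the file pattern is a regular expression that must match the whole file name, and that the directory pattern is a glob.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def violators(owned_files, file_pattern, dir_glob="*", minimum=1):
    """Return contributors owning fewer than `minimum` matching files.

    owned_files: dict mapping a contributor to a list of POSIX-style paths
    (a hypothetical structure used only for this sketch).
    """
    regex = re.compile(file_pattern)
    failed = []
    for contributor, paths in owned_files.items():
        matches = 0
        for path in map(PurePosixPath, paths):
            # The file name must match the regex, its directory the glob.
            if regex.fullmatch(path.name) and fnmatchcase(str(path.parent), dir_glob):
                matches += 1
        if matches < minimum:
            failed.append(contributor)
    return failed


# Hypothetical ownership data for two contributors.
owned = {
    "Anonymous #5": [".gitignore", "mvnw"],
    "Anonymous #6": ["steward-module/src/main/java/StewardController.java"],
}
print(violators(owned, r".*Controller.*\.java"))
```

With this sample data only `Anonymous #5` fails the rule, since none of that contributor's files has a name matching the pattern.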


## Syntax

In [36]:
syntactic_weights = mura.syntax_info()

📖 Syntax:

ℹ️ TODO


## Semantics

In [37]:
semantic_weights = mura.semantic_info(tracked_files, ownership, semantic_analysis_grouped)

📚 Semantics:

📦 Group: C:\MUNI\last\Java\M1\airport-manager
Total files: 6
🏆 Total weight: 0.0
📦 Group: C:\MUNI\last\Java\M1\airport-manager\.mvn\wrapper
Total files: 1
🏆 Total weight: 0.0
📦 Group: C:\MUNI\last\Java\M1\airport-manager\airlines-module
Total files: 4
🏆 Total weight: 0.0
📦 Group: C:\MUNI\last\Java\M1\airport-manager\airlines-module\.mvn\wrapper
Total files: 1
🏆 Total weight: 0.0
📦 Group: C:\MUNI\last\Java\M1\airport-manager\airlines-module\src\main\java\cz\muni\fi\pa165\airportmanager\airlinesmodule
Total files: 1
File: AirlinesModuleApplication.java: Owner: Anonymous #3
Contents: Classes: 1 Functions: 1 Properties: 0 Fields: 0 Comments: 0 
🏆 Semantic file weight: 58.0
🏆 Total weight: 58.0
📦 Group: C:\MUNI\last\Java\M1\airport-manager\airlines-module\src\main\java\cz\muni\fi\pa165\airportmanager\airlinesmodule\data\model
Total files: 1
File: Airline.java: Owner: Anonymous #3
Contents: Classes: 1 Functions: 8 Properties: 0 Fields: 2 Comments: 1 
🏆 Semantic file weight: 73.

## Remote repository

In [39]:
repo_management_weights = mura.remote_info(commit_range, repository, config, contributors)

🌐 Remote repository management:

Project: Airport Manager
📋 Total issues: 10
🔄 Total pull requests: 16
👨‍💻 Total contributors: 6
📋 Issue: Artifacts creation only for develop - by Anonymous #6

Description: - [ ] Check if the artefact issue really is due to the number of artefacts or just due to the size of our artefact.
- [ ] Find a good solution for artefacts only in master/ develop.
State: opened
ℹ️ Issue was not closed during the period or did not exist at all.
🏆 Weight 0.0 - Beneficiaries: Anonymous #6
📋 Issue: make repo update & delete throw exception - by Anonymous #3

Description: updateEntity & deleteEntity in DomainRepository should throw exception when trying to update/delete nonexisting/null object

update the tests as well!
State: opened
ℹ️ Issue was not closed during the period or did not exist at all.
🏆 Weight 0.0 - Beneficiaries: Anonymous #3
📋 Issue: Use mocks in tests - by Anonymous #3

Description: Write better tests using mocks
State: opened
Assignee: Anonymous #3
ℹ️

## Summary

In [40]:
mura.summary_info(syntactic_weights, semantic_weights, repo_management_weights, rule_violation_weight_multipliers)



🏆 Total weight per contributor for 📖 Syntax:


🏆 Total weight per contributor for 📚 Semantics:
 -> Anonymous #3: 956.5
 -> Anonymous #6: 2845.5
 -> Anonymous #1: 921.25
 -> Anonymous #4: 1481.0
 -> Anonymous #2: 1455.0


🏆 Total weight per contributor for 🌐 Remote repository management:
 -> Anonymous #6: 418.0
 -> Anonymous #3: 208.0
 -> Anonymous #4: 40.0
 -> Anonymous #2: 120.0
 -> Anonymous #1: 60.0


🏆⚠️ Weight multiplier for unfulfilled 📜 Rules:
 -> Anonymous #5: 0.9


🏆 Total weight per contributor:
1️⃣ -> Anonymous #6: 3263.5
2️⃣ -> Anonymous #2: 1575.0
3️⃣ -> Anonymous #4: 1521.0
4️⃣ -> Anonymous #3: 1164.5
5️⃣ -> Anonymous #1: 981.25
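
The totals above are consistent with a per-contributor sum of the category weights. The sketch below reproduces them from the numbers printed in the summary. The function `total_weights` is hypothetical, and applying the rule multiplier to the summed weight is an assumption: the output above cannot confirm it, because `Anonymous #5` has no weights in any category.

```python
from collections import defaultdict


def total_weights(categories, multipliers=None):
    """Sum per-contributor weights across categories, then scale each sum
    by the contributor's rule multiplier (assumed default: 1.0)."""
    multipliers = multipliers or {}
    totals = defaultdict(float)
    for weights in categories:
        for contributor, weight in weights.items():
            totals[contributor] += weight
    return {c: t * multipliers.get(c, 1.0) for c, t in totals.items()}


# Values copied from the summary output above.
semantics = {
    "Anonymous #3": 956.5, "Anonymous #6": 2845.5, "Anonymous #1": 921.25,
    "Anonymous #4": 1481.0, "Anonymous #2": 1455.0,
}
remote = {
    "Anonymous #6": 418.0, "Anonymous #3": 208.0, "Anonymous #4": 40.0,
    "Anonymous #2": 120.0, "Anonymous #1": 60.0,
}
rule_multipliers = {"Anonymous #5": 0.9}

totals = total_weights([semantics, remote], rule_multipliers)
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for place, (contributor, weight) in enumerate(ranking, start=1):
    print(f"{place}. {contributor}: {weight}")
```

The syntax category is left out because its output is empty in this run.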
