# Fundamentals of Software Systems - SE Part I Assignment

By Andy Wiemeyer and Lucius Bachmann

### Setup tools
* check out (clone) the repo
* initialize the PyDriller `Git` utility
* The `Repository` object can be recreated with other `since`/`to` parameters to analyze different time periods.
  Only the last year is used here so that the setup is fast.

In [None]:
from datetime import datetime
from pydriller import Repository, Git
from os import path, mkdir

repo_remote_path = 'https://github.com/mastodon/mastodon.git'
repo_path = 'mastodon'
# PyDriller clones into <clone_repo_to>/<repository name>, i.e. mastodon/mastodon
repo_checkout_path = f'{repo_path}/{repo_path}'
filepath = 'app'

since = datetime.fromisoformat('2021-11-08')
to = datetime.fromisoformat('2022-11-08')

# clone_repo_to must be an existing directory
if not path.exists(repo_path):
    mkdir(repo_path)

repo = Repository(repo_remote_path, clone_repo_to=repo_path, since=since, to=to, filepath=filepath)
# PyDriller clones lazily: starting the traversal clones the repo if necessary
for commit in repo.traverse_commits():
    break
git = Git(repo_checkout_path)

### Check out the repo at tag v3.5.3

In [None]:
# look up the commit the release tag points to and check it out
tag_commit = git.get_commit_from_tag('v3.5.3')
git.checkout(tag_commit.hash)

## 1 Complexity Hotspots

1. You must consider only the app folder from the Mastodon repository
(i.e., https://github.com/mastodon/mastodon).

-> Already covered: `filepath = 'app'` restricts both the commit traversal and the entity list to the app folder.

2. Decide on the granularity of your analysis of software entities (e.g., source code
files); describe why you selected this specific granularity.

For this analysis, the granularity of source code files is used, because files are an easy unit to measure.
The Mastodon repository contains a Ruby on Rails application with some JavaScript for the frontend.
Both languages allow defining multiple classes in one file, so measuring smaller units (classes or methods)
would require a programming-language-specific analysis.

3. Create a list of all these entities, as they appear in the latest stable release of
Mastodon (i.e., tag v3.5.3).

In [None]:
import glob

# every file (not directory) below app/; note that glob's * skips hidden dotfiles
entities = [child for child in glob.glob(f'{repo_checkout_path}/{filepath}/**/*', recursive=True) if path.isfile(child)]

print(entities[0:10])

In [None]:
extensions = {path.splitext(entity)[1] for entity in entities}
print(extensions)
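Because the chosen granularity is source code files, the entity list may still contain non-code files (templates, stylesheets, images and so on, depending on the extensions printed). A minimal sketch for narrowing it down, assuming that only Ruby and JavaScript/TypeScript files count as source code; the `SOURCE_EXTENSIONS` set and the `source_entities` helper are hypothetical and should be adjusted to the extensions actually found:

```python
from os import path

# Assumption: only Ruby and JavaScript/TypeScript files count as source code.
SOURCE_EXTENSIONS = {'.rb', '.js', '.jsx', '.ts', '.tsx'}


def source_entities(entities, extensions=SOURCE_EXTENSIONS):
    """Return the sorted entities whose extension is in `extensions` (case-insensitive)."""
    return sorted(
        entity for entity in entities
        if path.splitext(entity)[1].lower() in extensions
    )


# Usage with hypothetical paths:
print(source_entities(['app/models/a.rb', 'app/views/a/show.html.haml', 'app/javascript/b.js']))
# → ['app/javascript/b.js', 'app/models/a.rb']
```

In the notebook this would be called as `source_entities(entities)`.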

4. Decide on the type of complexity you want to measure for your software entities
and explain why you selected this type.

To decide which metric is a good indicator of complexity, we inspect the metrics that the lizard library reports for one sample file.

In [None]:
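import lizard

filename = f'{repo_checkout_path}/{filepath}/workers/scheduler/indexing_scheduler.rb'
# use a context manager so the file is closed after reading
with open(filename, mode='r', encoding='utf-8') as file:
    analysis = lizard.analyze_file.analyze_source_code(filename, file.read())
print(f'of file {filename}')
print(f'nr of functions: {len(analysis.function_list)}')
# the file-level CCN is the sum of the cyclomatic complexities of its functions
print(f'cyclomatic complexity: {analysis.CCN}')
# nloc counts lines of code without comments
print(f'lines of code: {analysis.nloc}')
print(f'token_count: {analysis.token_count}')
try:
    print(f'deepest nesting level: {analysis.ND}')
except AttributeError:
    # nesting depth may only be available with lizard's nesting-depth (ND) extension
    print('deepest nesting level: not available')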

To keep the analysis simple, the number of lines of code (lizard's `nloc`) is used as the complexity metric.

5. Decide on a timeframe on which you want to base your analysis and explain the
rationale of your choice.

6. For each entity in the system, measure its complexity and the number of changes
(in the given timeframe). Merge these two pieces of information together to create
a candidate list of problematic hotspots in the app part of Mastodon.
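One simple way to perform this merge is sketched below, under stated assumptions: each commit is reduced to a `(date, modified_paths)` pair, a file counts at most once per commit, and a file's hotspot score is its change count multiplied by its LOC. The helpers `count_changes` and `rank_hotspots` and the score formula are hypothetical choices, not a fixed definition of a hotspot.

```python
from collections import Counter
from datetime import datetime


def count_changes(commits, since, to):
    """Count how many commits dated in [since, to) modified each file.

    `commits` is an iterable of (commit_date, modified_paths) pairs.
    """
    changes = Counter()
    for commit_date, modified_paths in commits:
        if since <= commit_date < to:
            # set(): a file counts once per commit, even if listed twice
            changes.update(set(modified_paths))
    return changes


def rank_hotspots(loc, changes):
    """Merge LOC and change counts, highest changes * LOC first.

    Returns (entity, loc, changes) tuples; files without changes score 0.
    """
    rows = [(entity, lines, changes.get(entity, 0)) for entity, lines in loc.items()]
    return sorted(rows, key=lambda row: row[1] * row[2], reverse=True)


# Usage with made-up data (hypothetical paths and numbers):
records = [
    (datetime(2022, 1, 10), ['app/models/a.rb', 'app/models/b.rb']),
    (datetime(2022, 3, 2), ['app/models/a.rb']),
    (datetime(2020, 5, 1), ['app/models/b.rb']),  # outside the timeframe
]
changes = count_changes(records, datetime(2021, 11, 8), datetime(2022, 11, 8))
print(rank_hotspots({'app/models/a.rb': 120, 'app/models/b.rb': 40}, changes))
# → [('app/models/a.rb', 120, 2), ('app/models/b.rb', 40, 1)]
```

With PyDriller, the pairs could be built from `commit.committer_date` and `[m.new_path or m.old_path for m in commit.modified_files]`. Two caveats: PyDriller dates are timezone-aware, so `since`/`to` would need a timezone (e.g. `since.replace(tzinfo=timezone.utc)`) to be comparable, and the paths are relative to the repository root, so the LOC keys must use the same form (e.g. `path.relpath(entity, repo_checkout_path)`).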

7. Visualize the hotspots with a visualization of your choice.

8. Analyze six candidate hotspots (not necessarily the top ones) through:

Candidate 1:
...complexity trend...
...manual analysis...

Candidate 2:

Candidate 3:

Candidate 4:

Candidate 5:

Candidate 6:

## 2 Temporal/Logical Coupling

1. Determine what could be cases of temporal/logical coupling and generate a list
of candidates with a set of coupled entities.
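A common heuristic treats files that are modified in the same commit as logically coupled. The sketch below, assuming each commit is given as the collection of files it modified, counts shared commits per file pair and computes a degree of coupling as the shared commits divided by the average number of revisions of both files (in percent). The helper `coupling_candidates` and its thresholds `min_shared`, `min_degree` and `max_files` (which skips very large commits such as mass renames) are hypothetical defaults to be tuned.

```python
from collections import Counter
from itertools import combinations


def coupling_candidates(change_sets, min_shared=2, min_degree=50.0, max_files=30):
    """Return (file_a, file_b, shared_commits, degree) tuples, strongest first.

    change_sets: one collection of modified file paths per commit.
    degree = shared commits / average revisions of both files * 100.
    """
    revisions = Counter()
    shared = Counter()
    for files in change_sets:
        unique = sorted(set(files))
        if len(unique) > max_files:
            continue  # skip huge commits that would couple everything
        revisions.update(unique)
        # every unordered pair of files modified together in this commit
        shared.update(combinations(unique, 2))

    candidates = []
    for (a, b), n_shared in shared.items():
        degree = 100 * n_shared / ((revisions[a] + revisions[b]) / 2)
        if n_shared >= min_shared and degree >= min_degree:
            candidates.append((a, b, n_shared, round(degree, 1)))
    return sorted(candidates, key=lambda c: (c[3], c[2]), reverse=True)


# Usage with made-up commits:
print(coupling_candidates([['x.rb', 'y.rb'], ['x.rb', 'y.rb', 'z.rb'], ['z.rb'], ['y.rb', 'x.rb']]))
# → [('x.rb', 'y.rb', 3, 100.0)]
```

Each returned pair is a candidate set; pairs sharing a file can be merged into larger sets before the manual analysis.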

2. Visualize these candidate sets of couple entities with a visualization of your
choice.

3. For three set candidates in the list:
* analyze and explain why these entities are coupled;
* describe how important it would be to fix them, and any ideas for their improvement.

Candidate set 1:

Candidate set 2:

Candidate set 3:

## 3 Defective Hotspots

1. Decide on how you want to detect entities that had defects in the past (e.g.,
commit message analysis vs. issue tracking system analysis) and motivate your
choice.
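If commit message analysis is chosen, it only needs the Git history that PyDriller already provides. A minimal keyword-based sketch; the keyword list in `FIX_PATTERN` and the helper `is_defect_fix` are assumptions, and a sample of matched and unmatched messages should be checked by hand to estimate false positives and negatives:

```python
import re

# Assumption: these keywords usually mark defect-fixing commits. The list is a
# heuristic; validate it by reading a sample of matched and unmatched messages.
FIX_PATTERN = re.compile(
    r'\b(fix(es|ed|ing)?|bugs?|defects?|regressions?|crash(es|ed)?)\b',
    re.IGNORECASE,
)


def is_defect_fix(message):
    """Return True if the commit message looks like a bug fix."""
    return FIX_PATTERN.search(message) is not None


for message in ['Fix crash when loading the timeline', 'Add prefix option to settings']:
    print(message, '->', is_defect_fix(message))
```

The word boundaries (`\b`) keep words such as "prefix" from matching.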

2. Determine defective hotspots among the entities in the timeframe that you previously
selected (i.e., consider only defects in the selected timeframe). What
conclusions can you draw from this?
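Given a way to classify commits, defective hotspots can be ranked by the number of defect-fixing commits that touched each file. A sketch assuming the commits are already restricted to the timeframe and given as `(message, modified_paths)` pairs; `defect_counts` is a hypothetical helper and `is_fix` can be any classifier, such as a keyword match on the message:

```python
from collections import Counter


def defect_counts(commits, is_fix):
    """Count defect-fixing commits per file.

    commits: (message, modified_paths) pairs, already limited to the timeframe.
    is_fix: predicate that decides whether a commit message marks a defect fix.
    """
    counts = Counter()
    for message, modified_paths in commits:
        if is_fix(message):
            counts.update(set(modified_paths))  # once per commit and file
    return counts


# Usage with made-up commits and a trivial classifier:
records = [
    ('Fix crash on empty timeline', ['app/models/a.rb', 'app/models/b.rb']),
    ('Add new setting', ['app/models/a.rb']),
    ('fix typo in error message', ['app/models/a.rb']),
]
print(defect_counts(records, lambda m: m.lower().startswith('fix')).most_common(3))
# → [('app/models/a.rb', 2), ('app/models/b.rb', 1)]
```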

3. Determine complexity hotspots at the beginning of your timeframe, then correlate
them with the defects they have presented throughout the entire timeframe.
Is there a correlation? Why do you think this is the case?
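Since LOC and defect counts are often skewed and only their ordering matters here, a rank correlation such as Spearman's rho is a reasonable choice (an assumption of this sketch, not a requirement of the assignment). The dependency-free sketch below, with hypothetical helpers `average_ranks` and `spearman`, computes it as the Pearson correlation of average ranks; if SciPy is available, `scipy.stats.spearmanr` returns the same coefficient together with a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length >= 2')
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError('correlation undefined for constant input')
    return cov / sqrt(var_x * var_y)


# Usage with made-up numbers: LOC at the start of the timeframe vs. defect counts
print(round(spearman([120, 40, 300, 80], [3, 0, 5, 1]), 3))
# → 1.0
```

In the notebook, the inputs would be e.g. `[loc[f] for f in files]` and `[defects.get(f, 0) for f in files]`, where `loc`, `defects` and `files` are hypothetical names for the LOC at the start of the timeframe, the defect counts and the entity list.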

4. What conclusions can you draw from the relationship between defective hotspots
and complexity hotspots in Mastodon? And on these two metrics in general?
