# Complexity Hotspots

1. You must consider only the app folder from the Mastodon repository
(i.e., https://github.com/mastodon/mastodon).

In [2]:
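import pickle

import pydriller

# Point PyDriller at the repository URL. Repository is lazy: nothing is
# cloned until traverse_commits() is called.
repo = pydriller.Repository('https://github.com/mastodon/mastodon')

# Save the Repository object as a pickle file. This stores only its
# configuration, not the commit history.
with open('repo.pkl', 'wb') as f:
    pickle.dump(repo, f)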

In [58]:
# open the pickle file
with open('repo.pkl', 'rb') as f:
    repo_mastodon = pickle.load(f)

# get the commits
all_commits = list(repo_mastodon.traverse_commits())

2. Decide on the granularity of your analysis of software entities (e.g., source code
files); describe why you selected this specific granularity.

In [36]:
# run inside \mastodon\app
git ls-files | awk -F . '{print $NF}' | sort | uniq -c | sort -n -r | awk '{print $2,$1}' | head -10
rb 895
js 437
haml 214
json 178
png 37
erb 33
scss 32
svg 24
woff2 5
woff 5
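
The same per-extension count can be reproduced in Python, which keeps the whole analysis in one notebook. This is a minimal sketch, assuming the file paths are already available as a list (for example, the lines printed by `git ls-files`); the helper `count_extensions` and the sample paths are hypothetical and not taken from the repository.

```python
from collections import Counter


def count_extensions(paths):
    """Count files per extension, mirroring `awk -F . '{print $NF}'`."""
    # Like awk's last field, a name without a dot counts as its own "extension".
    return Counter(path.rsplit('.', 1)[-1] for path in paths)


# Hypothetical sample paths, used only to illustrate the helper.
sample_paths = [
    'models/account.rb',
    'models/status.rb',
    'javascript/app.js',
    'views/home/index.html.haml',
]
extension_counts = count_extensions(sample_paths)

# Print the ten most common extensions, most frequent first.
for ext, count in extension_counts.most_common(10):
    print(ext, count)
```

A dominant extension in such a count is one possible argument for choosing that file type as the unit of analysis.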

3. Create a list of all these entities, as they appear in the latest stable release of Mastodon (i.e., tag v3.5.3). 

In [None]:
# run inside \mastodon\app with tag v3.5.3 checked out (git ls-files lists the checked-out index)
# list just the names of all files with the .rb extension using git
git ls-files '*.rb' | awk -F / '{print $NF}' > rb_files.txt

In [56]:
# read the txt file into a list, dropping the trailing newline on each line
with open('rb_files.txt', 'r') as f:
    rb_files = [line.strip() for line in f]
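
Because `git ls-files` reports whatever is currently checked out, the list can also be read directly from the tag with `git ls-tree`, which does not depend on the working tree. The following is a sketch under assumptions: `list_files_at_tag` and `filter_by_suffix` are hypothetical helper names, and the function expects `repo_dir` to be the root of a local Mastodon clone.

```python
import subprocess


def filter_by_suffix(ls_tree_output, suffix='.rb'):
    """Keep the paths from `git ls-tree --name-only` output that end in suffix."""
    return [line for line in ls_tree_output.splitlines() if line.endswith(suffix)]


def list_files_at_tag(repo_dir, tag='v3.5.3', folder='app', suffix='.rb'):
    """List files under `folder` as recorded in `tag`, independent of the checkout."""
    result = subprocess.run(
        ['git', 'ls-tree', '-r', '--name-only', tag, folder],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return filter_by_suffix(result.stdout, suffix)


# Example call (assumes a local clone at this hypothetical location):
# rb_files = list_files_at_tag('path/to/mastodon')
```

Unlike the shell command above, this keeps the full path of each file, so same-named files in different directories remain distinguishable.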

In [3]:
latest_release = pydriller.Repository('C:\\Users\\szymo\\Desktop\\SOSy_repos\\mastodon',
                                        only_modifications_with_file_types=[".rb"],
                                        to_tag="v3.5.3")
latest_release_commits = list(latest_release.traverse_commits())
print("Number of commits: ", len(latest_release_commits))

Number of commits:  4167


In [21]:
files = {}
for commit in latest_release_commits:
    for m in commit.modified_files:
        if m.new_path is not None:
            # file exists after this commit -> remember its latest path
            # (keyed by bare filename, so same-named files in different
            # directories overwrite each other)
            files[m.filename] = m.new_path
        else:
            # file was deleted -> mark that filename as deleted
            files[m.filename] = 'deleted'

# keep only files that are not deleted
files_filtered = {k: v for k, v in files.items() if v != 'deleted'}

# keep only the files that are in app/ (accept both Windows and POSIX separators)
files_entities = {k: v for k, v in files_filtered.items()
                  if v.replace('\\', '/').startswith('app/')}

# get the paths of the remaining files
files_entities_names = list(files_entities.values())
print("Number of files: ", len(files_entities_names))
print("Number of unique files: ", len(set(files_entities_names)))

Number of files:  1469
Number of unique files:  1469
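
Tracking files by full path rather than by bare filename avoids conflating same-named files and makes renames explicit. The sketch below is an illustrative variant, not the method used in the cell above: `files_at_end` is a hypothetical helper that replays `(old_path, new_path)` pairs in commit order, which with PyDriller would correspond to `(m.old_path, m.new_path)` for each modified file. The sample data is made up.

```python
def files_at_end(changes, prefix='app/'):
    """Replay (old_path, new_path) pairs in commit order; return surviving paths."""
    alive = set()
    for old_path, new_path in changes:
        # Normalise Windows separators so the prefix check is portable.
        old = old_path.replace('\\', '/') if old_path else None
        new = new_path.replace('\\', '/') if new_path else None
        if old is not None and old != new:
            alive.discard(old)  # deleted, or renamed away from the old path
        if new is not None:
            alive.add(new)      # added, modified, or renamed to the new path
    return sorted(p for p in alive if p.startswith(prefix))


# Hypothetical change history, oldest first.
sample = [
    (None, 'app/models/user.rb'),                     # added
    (None, 'app/helpers/user.rb'),                    # same filename, other directory
    ('app/models/user.rb', 'app/models/account.rb'),  # renamed
    (None, 'lib/tasks/user.rb'),                      # outside app/
    ('app/helpers/user.rb', None),                    # deleted
]
print(files_at_end(sample))  # ['app/models/account.rb']
```

With this variant the resulting count may differ from the one printed above, since files that share a name are no longer merged into a single entry.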


In [None]:
# Analyze change frequencies (generate a bar chart: number of changes per file)
import matplotlib.pyplot as plt
import numpy as np

# count the number of commits that touched each file (keyed by filename)
changes = {}
for commit in all_commits:
    for mod in commit.modified_files:
        changes[mod.filename] = changes.get(mod.filename, 0) + 1
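
The counting step can be written more compactly with `collections.Counter`, and the most frequently changed files can then be passed to a bar chart. This is a minimal sketch with hypothetical helper names (`count_changes`, `plot_top`) and made-up sample data; it assumes each commit is represented as a list of full paths, which with PyDriller could be built from `mod.new_path or mod.old_path`.

```python
from collections import Counter


def count_changes(commits_paths):
    """Count how many commits touched each path.

    `commits_paths` is an iterable with one list of paths per commit.
    """
    counts = Counter()
    for paths in commits_paths:
        counts.update(set(paths))  # count a file at most once per commit
    return counts


def plot_top(counts, n=20):
    """Bar chart of the n most frequently changed files (requires matplotlib)."""
    import matplotlib.pyplot as plt
    names, values = zip(*counts.most_common(n))
    plt.bar(names, values)
    plt.xticks(rotation=90)
    plt.ylabel('number of changes')
    plt.tight_layout()
    plt.show()


# Hypothetical sample: three commits and the paths each one touched.
sample_commits = [
    ['app/models/status.rb', 'app/models/account.rb'],
    ['app/models/status.rb'],
    ['app/models/status.rb', 'app/lib/feed_manager.rb'],
]
change_counts = count_changes(sample_commits)
print(change_counts.most_common(1))  # [('app/models/status.rb', 3)]
```

Restricting the chart to the top `n` files keeps the x-axis readable when there are more than a thousand entities.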