# Quick code analysis

To get going with this notebook, I suggest using VS Code with the Python extension.

Create a virtual environment in this folder:

```
python -m venv .venv
```

Select the venv's Python interpreter from the Command Palette (Ctrl/Cmd + Shift + P) with `> Python: Select Interpreter`. Refresh the interpreter list or reload the window if the venv does not show up.

Open a terminal, activate the venv if that did not happen automatically (`source .venv/bin/activate` on Linux/macOS, `.venv\Scripts\activate` on Windows), and install pandas:

```
pip install pandas
```

Ensure `git` can be found on your `PATH`.

Change the variables below to point to your repo of interest, and give this notebook a spin (VS Code may offer to install a Jupyter kernel).

In [37]:
file_to_analyse = "gaphor.csv"
after = "two months ago"
top = 10


In [38]:
import pandas

# TODO: do what's done from the extractor script here

df = pandas.read_csv(file_to_analyse, names=["date", "commit", "added", "removed", "filename"])
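The extractor script that produces the CSV is not shown in this notebook. As a rough sketch only, a file with the same five columns could be built from `git log --numstat`. The marker string, the log format, and the function names `parse_numstat` and `extract` are assumptions made for illustration, not the actual extractor.

```python
import csv
import subprocess

# Hypothetical marker that prefixes each commit header line in the log output.
MARKER = "@@"


def parse_numstat(log_text):
    """Turn `git log --numstat` output into (date, commit, added, removed, filename) rows."""
    rows = []
    date = commit = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            # Header line, e.g. "@@2023-01-31 1a2b3c4"
            date, commit = line[len(MARKER):].split(" ", 1)
        elif line.strip():
            # Stat line: "<added>\t<removed>\t<filename>"; binary files report "-"
            added, removed, filename = line.split("\t", 2)
            rows.append((date, commit, added, removed, filename))
    return rows


def extract(repo_dir, after, csv_path):
    """Run git in repo_dir and write the parsed rows to csv_path (no header row)."""
    log_text = subprocess.run(
        ["git", "log", "--numstat", "--no-merges", f"--after={after}",
         "--date=short", f"--format={MARKER}%ad %h"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(parse_numstat(log_text))
```

With this sketch, `extract("path/to/repo", after, file_to_analyse)` would feed the `after` variable to git's `--after` option. Renamed files show up in the `old => new` notation that git prints, so they may need extra handling.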

## Churn

Churn is simply how many times a file has changed in the history of a project. The more often a file has changed, the higher its churn.

In [39]:
churn = df.groupby(['filename']).size().reset_index(name='counts').sort_values("counts", ascending=False)
churn[:top]

| | filename | counts |
|---|---|---|
| 1458 | poetry.lock | 210 |
| 1459 | pyproject.toml | 195 |
| 22 | .github/workflows/build.yml | 151 |
| 1000 | gaphor/ui/diagrampage.py | 101 |
| 1186 | gaphor/ui/mainwindow.py | 92 |
| 1190 | gaphor/ui/namespace.py | 92 |
| 677 | gaphor/core/modeling/diagram.py | 87 |
| 294 | docs/requirements.txt | 79 |
| 820 | gaphor/diagram/presentation.py | 77 |
| 49 | README.md | 76 |


## Change coupling

Change coupling tells us which files have a tendency to change together.

In [40]:
import itertools
from collections import Counter

from IPython.display import HTML

combinations = Counter()  # (file_a, file_b) -> number of commits touching both
commits = Counter()       # filename -> number of commits touching it

for _, group in df.groupby("commit"):
    # Sort and de-duplicate so a pair always appears in the same order
    files = sorted(set(group["filename"]))
    commits.update(files)
    combinations.update(itertools.combinations(files, 2))

# Rank pairs by the sum of both coupling ratios, strongest first
change_coupling = sorted(
    ((n / commits[a] + n / commits[b], n, a, b) for (a, b), n in combinations.items()),
    reverse=True,
)[:top]

rows = (
    f"<tr><td>{n}</td><td>{file_a}</td><td>{commits[file_a]}</td><td>{int(n / commits[file_a] * 100)}</td></tr>"
    f"<tr><td></td><td>{file_b}</td><td>{commits[file_b]}</td><td>{int(n / commits[file_b] * 100)}</td></tr>"
    for _, n, file_a, file_b in change_coupling
)

HTML(f'<table><tr><th></th><th>Coupled Entities</th><th>Commits</th><th>% coupling</th></tr>{"".join(rows)}</table>')

| | Coupled Entities | Commits | % coupling |
|---|---|---|---|
| 13 | po/ru.po | 13 | 100 |
| | po/sv.po | 13 | 100 |
| 13 | po/ca.po | 13 | 100 |
| | po/sv.po | 13 | 100 |
| 13 | po/ca.po | 13 | 100 |
| | po/ru.po | 13 | 100 |
| 9 | gaphor/ui/icons/hicolor/scalable/actions/gaphor-extend-symbolic.svg | 9 | 100 |
| | gaphor/ui/icons/hicolor/scalable/actions/gaphor-include-symbolic.svg | 9 | 100 |
| 8 | gaphor/ui/icons/hicolor/scalable/actions/gaphor-trace-symbolic.svg | 8 | 100 |
| | gaphor/ui/icons/hicolor/scalable/actions/gaphor-verify-symbolic.svg | 8 | 100 |
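To make the percentages in the table above concrete, here is a small worked example on made-up data. The `history` dictionary and the helper `coupling_percent` are hypothetical. The calculation mirrors the cell above: the number of commits two files share, divided by the number of commits of one of them.

```python
from collections import Counter
from itertools import combinations

# Toy history (made-up data): commit id -> files touched in that commit
history = {
    "c1": ["a.py", "b.py"],
    "c2": ["a.py", "b.py", "c.py"],
    "c3": ["a.py"],
    "c4": ["b.py", "c.py"],
}

commits_per_file = Counter()  # filename -> number of commits touching it
shared = Counter()            # (file_a, file_b) -> number of commits touching both

for files in history.values():
    files = sorted(set(files))
    commits_per_file.update(files)
    shared.update(combinations(files, 2))


def coupling_percent(file_a, file_b):
    """Share of file_a's commits that also touched file_b, as a whole percentage."""
    n = shared[tuple(sorted((file_a, file_b)))]
    return int(n / commits_per_file[file_a] * 100)


print(coupling_percent("a.py", "b.py"))  # 2 of a.py's 3 commits also touch b.py
print(coupling_percent("c.py", "b.py"))  # both of c.py's commits also touch b.py
```

The measure is asymmetric: every commit to `c.py` also touches `b.py`, but only two of the three commits to `b.py` touch `c.py`. This is why the table lists a separate percentage for each file of a pair.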
