# Code Complexity Summary

This notebook generates code complexity summaries and writes them to a Google Sheet.

For each repository, one tab is added.

Before running the code, make sure every repository is on an up-to-date master branch and contains no untracked Python files.
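
The untracked-file part of this pre-run check could be scripted. The sketch below is not part of the notebook's package: `untracked_python_files` and `check_repo` are hypothetical helpers that assume `git` is on the `PATH`. It only looks for untracked `.py` files and does not verify that the branch is up to date.

```python
import subprocess
from pathlib import Path


def untracked_python_files(porcelain_output: str) -> list[str]:
    """Return untracked .py paths found in `git status --porcelain` output."""
    untracked = []
    for line in porcelain_output.splitlines():
        # Untracked entries are prefixed with "?? " in the porcelain format.
        if line.startswith("?? "):
            # Paths with special characters may be quoted; drop the quotes.
            path = line[3:].strip().strip('"')
            if path.endswith(".py"):
                untracked.append(path)
    return untracked


def check_repo(repo: Path) -> list[str]:
    """Run git inside `repo` (hypothetical helper) and list untracked Python files."""
    result = subprocess.run(
        # --untracked-files=all lists individual files instead of whole directories
        ["git", "-C", str(repo), "status", "--porcelain", "--untracked-files=all"],
        capture_output=True,
        text=True,
        check=True,
    )
    return untracked_python_files(result.stdout)


# Parsing a hand-written sample, so no repository is needed to try it out
sample = "?? scratch.py\n M code/parse_code.py\n?? notes.txt\n"
print(untracked_python_files(sample))
```

With the `repos` list defined further down, something like `[check_repo(r) for r in repos]` would show which repositories still need cleaning up.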

For more on code complexities and the measures used in this notebook, check these [slides]().

In [2]:
%load_ext autoreload
%autoreload 2

from pathlib import Path

root = Path("/Users/corrie/Documents/private_projects")
repos = [
    root / "code-complexity",
    #...
]

In [3]:
import pandas as pd

import gspread

from code.gsheet_utils import apply_formatting, return_data_to_write
from code.complexity_metrics import get_repo_complexities

import warnings
warnings.simplefilter(action="ignore", category=Warning)

from plotnine import *

In [4]:
creds_dict = {}  # your creds dict
scopes = ["https://www.googleapis.com/auth/drive"]
gc = gspread.service_account_from_dict(creds_dict, scopes)


## Run for a single repo

If you only want to get the results for a single repo (or folder), you can run the following command:

In [5]:
repo = 'code-complexity'

df = get_repo_complexities(repos[0])

df.head(3)

Unnamed: 0,repo,file,function_name,func_lineno,func_length,cognitive_complexity,sum_expression_complexity,max_expression_complexity,num_arguments,num_returns,num_module_expressions,module_complexity,extract_date
3,code-complexity,/code/parse_code.py,iterate_over_expressions,33,31,12,18.0,6.0,1,0,4,0.0,2023-04-18
1,code-complexity,/code/parse_code.py,get_all_python_files,13,10,8,18.0,3.5,2,1,4,0.0,2023-04-18
6,code-complexity,/code/gsheet_utils.py,return_data_to_write,45,8,2,9.5,2.5,3,1,14,7.5,2023-04-18
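
To give a feel for where some of these columns could come from, here is a minimal sketch that derives `func_lineno`, `func_length`, `num_arguments`, and `num_returns` with the standard-library `ast` module. It is an illustration under assumptions, not the implementation behind `get_repo_complexities`: `basic_function_metrics` is a hypothetical name, the exact counting rules of the real package may differ, and the complexity columns are not covered.

```python
import ast


def basic_function_metrics(source: str) -> list[dict]:
    """Collect simple per-function metrics from Python source (illustrative only)."""
    tree = ast.parse(source)
    rows = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Assumption: count named parameters only, ignoring *args and **kwargs
            num_arguments = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            # Counts every return statement inside the function, nested functions included
            num_returns = sum(isinstance(child, ast.Return) for child in ast.walk(node))
            rows.append({
                "function_name": node.name,
                "func_lineno": node.lineno,
                "func_length": node.end_lineno - node.lineno + 1,
                "num_arguments": num_arguments,
                "num_returns": num_returns,
            })
    return rows


sample = '''
def add(a, b):
    if a is None:
        return b
    return a + b
'''

for row in basic_function_metrics(sample):
    print(row)
```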


## Save to Google Sheet

In [6]:
url = "https://docs.google.com/spreadsheets/d/{sheet_id}/"  # replace {sheet_id} with the ID of your sheet
sheet = gc.open_by_url(url)

In [None]:
# Titles of the tabs that already exist in the sheet
tabs = [wksh.title for wksh in sheet.worksheets()]
all_repos = []
for repo in repos:
    new_df = get_repo_complexities(repo)

    # `repo` is a Path while tab titles are strings, so compare on the folder name
    if repo.name not in tabs:
        wksht = sheet.add_worksheet(title=repo.name, rows=1000, cols=26, index=0)
        df = new_df
    else:
        df = return_data_to_write(sheet, repo.name, new_df)
        wksht = sheet.worksheet(repo.name)
    apply_formatting(wksht, df)

    all_repos.append(df)
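
The tab lookup in this loop hinges on comparing like with like: worksheet titles are plain strings, while `repos` holds `Path` objects, so the match has to go through `repo.name`. The sketch below isolates that decision in a small function that runs without a Google Sheet. `split_by_existing_tab` is a hypothetical helper for illustration, not part of `gsheet_utils`.

```python
from pathlib import Path


def split_by_existing_tab(
    repos: list[Path], tab_titles: list[str]
) -> tuple[list[Path], list[Path]]:
    """Split repos into those needing a new tab and those that already have one."""
    existing = set(tab_titles)
    needs_tab, has_tab = [], []
    for repo in repos:
        # Match on the folder name, since tab titles are strings
        if repo.name in existing:
            has_tab.append(repo)
        else:
            needs_tab.append(repo)
    return needs_tab, has_tab


# Made-up paths and titles, purely for illustration
example_repos = [Path("/tmp/projects/repo-a"), Path("/tmp/projects/repo-b")]
needs_tab, has_tab = split_by_existing_tab(example_repos, ["repo-b", "Sheet1"])
print("new tabs for:", [r.name for r in needs_tab])
print("append to:", [r.name for r in has_tab])
```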

## Summary Statistics

In [8]:
df_all = pd.concat(all_repos, ignore_index=True).query('repo != "nan" & repo.notna()')

# Treat the string 'nan' as missing, then carry the last valid date forward
df_all['extract_date'] = df_all['extract_date'].mask(df_all['extract_date'] == 'nan').ffill()

In [9]:
plot_df = (df_all
           .groupby(['repo', 'extract_date'])
           .cognitive_complexity.agg(['mean', 'max', 'median'])
           .reset_index()
          )

plot_df['extract_date'] = pd.to_datetime(plot_df['extract_date'])


In [15]:
(ggplot(plot_df, aes(x='extract_date', y='mean', color='repo'))
 + geom_line(show_legend=False, size=1.5)
 + geom_point(show_legend=False, size=2)
 + scale_x_date(date_labels='%b %Y', breaks=plot_df.extract_date.unique())
 + scale_color_brewer(type='qual', palette='Set2')
 + labs(x='', y='Mean Cognitive Complexity', title='Code Complexity of our Repos over Time')
 + theme_minimal()
 + theme(figure_size=(8,15),
        legend_position='bottom')
).draw()

In [14]:
(ggplot(plot_df, aes(x='extract_date', y='max', color='repo'))
 + geom_line(show_legend=False)
 + geom_point(show_legend=False)
 + scale_x_date(date_labels='%d %b %Y', breaks=plot_df['extract_date'].unique())
 + scale_color_brewer(type='qual', palette='Paired')
 + labs(x='', y='Max Cognitive Complexity', title='Max Cognitive Complexity of our Repos over Time')
 + theme_minimal()
 + theme(figure_size=(10,6),
        legend_position='bottom')
).draw()