## Prerequisites
- CodeMaat (either installed on the PATH or available as a JAR file)
- Java (required by CodeMaat)
- cloc
- Python 3.9 or higher
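
Before running the notebook, it can help to confirm that the external tools are reachable. The following is a minimal sketch, assuming the executables are named `java`, `cloc`, and `maat` on your system. The `maat` name in particular is an assumption, since CodeMaat is often run directly from its JAR. The helper `find_missing_tools` is hypothetical and not part of this project.

```python
import shutil


def find_missing_tools(tool_names):
    """Return the names in tool_names that cannot be found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# The executable names below are assumptions; adjust them to your setup.
missing = find_missing_tools(["java", "cloc", "maat"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
```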

In [1]:
import os
import subprocess
import shlex
import sys
import json
import pandas as pd
from io import StringIO
from IPython.display import clear_output, display, Javascript, HTML

from ipynb.fs.full.cs_filepaths import FilePaths
from ipynb.fs.full.cs_entities import ProjectForAnalysis, SystemBoundaries, AuthorColor
from ipynb.fs.full.cs_d3graphing import EnclosureDiagram, MainDevEnclosureDiagram
from ipynb.fs.full.cs_commands import (
    GitLogCommand, MaatCommand, ClocCommand, MergeComplexityAndFrequency, 
    FileComplexityCommand, FileComplexityTrendCommand,
    CreateEnclosureDiagramJson, CreateMainDevEnclosureDiagramJson
)

pd.set_option('display.max_colwidth', None)

## Analysis Types

The primary classes that represent each type of analysis.

<!-- - [HotspotAnalysis](#HotspotAnalysis)
- [FileComplexityStaticAnalysis](#FileComplexityStaticAnalysis)
- [FileComplexityTrendAnalysis](#FileComplexityTrendAnalysis) -->

In [2]:
class Analysis:
    def __init__(self, project_for_analysis):
        self.project_for_analysis = project_for_analysis
        self.file_paths = FilePaths()
        
    def _generate_log_file(self):
        # Generate log file
        log_command = GitLogCommand(
            self.project_for_analysis.git_file,
            before=self.project_for_analysis.before,
            after=self.project_for_analysis.after
        )
        log_command.execute().write_out_to_file()
        
    def df(self):
        raise NotImplementedError("Subclasses must implement df()")

### HotspotAnalysis

Examines a project's files by extracting their revision frequency (as a proxy for effort expended on the module) and line count (as a proxy for complexity) in order to detect hotspots in your codebase. Either view the data in a dataframe or visualize the project with an enclosure diagram.

- `module`: The file in question
- `revisions`: The number of revisions that module has undergone in the analysis timespan
- `code`: The number of lines of code in the module (as a proxy for complexity)
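
To make the relationship between the two columns concrete, here is a minimal sketch of how revision counts and line counts could be joined and ranked. It is not the implementation behind `MergeComplexityAndFrequency`; the function name `rank_hotspots`, the sort order, and the sample numbers are assumptions made up for illustration.

```python
def rank_hotspots(revisions, lines_of_code):
    """Join per-module revision counts with line counts and sort the result.

    Modules missing from either input are skipped. The ordering (revisions
    first, then lines of code, both descending) is an assumption.
    """
    rows = [
        {"module": module, "revisions": revs, "code": lines_of_code[module]}
        for module, revs in revisions.items()
        if module in lines_of_code
    ]
    return sorted(rows, key=lambda r: (r["revisions"], r["code"]), reverse=True)


# Illustrative values only
revisions = {"src/app.py": 42, "src/util.py": 7, "docs/readme.md": 12}
lines_of_code = {"src/app.py": 900, "src/util.py": 120}

for row in rank_hotspots(revisions, lines_of_code):
    print(row)
```

A module that changes often and is also large ends up at the top of the list, which is the kind of file the enclosure diagram is meant to highlight.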

Example
```python
code_maat = ProjectForAnalysis("/home/brombaut/work/code-maat")
hs_analysis = HotspotAnalysis(code_maat)
hs_analysis.analyze()
df = hs_analysis.df()
# Show enclosure diagram
hs_analysis.enclosure_diagram()
```

In [3]:
import pandas as pd

class HotspotAnalysis(Analysis):
    def __init__(self, project_for_analysis):
        super().__init__(project_for_analysis)
        
    def analyze(self):
        self._generate_log_file()
        # Analyze Change Frequencies
        self.maat_command = (
            MaatCommand(f"-l {self.file_paths.log_file} -c git -a revisions")
                .execute()
                .write_out_to_file()
        )
        # Count Lines of Code
        self.cloc_command = (
            ClocCommand(f"{self.project_for_analysis.dir_to_analyze} --by-file --csv --quiet")
                .execute()
                .write_out_to_file()
        )
        # Merge Complexity and Effort for Data View
        self.merge_command = (
            MergeComplexityAndFrequency(
                self.file_paths.maat_output_csv, self.file_paths.cloc_lines_csv
            ).execute()
        )
        # Create Enclosure JSON
        self.create_enclosure_json_command = CreateEnclosureDiagramJson(
            self.file_paths.maat_output_csv, self.file_paths.cloc_lines_csv)
        self.create_enclosure_json_command.execute()
        return self
        
    def enclosure_diagram(self):
        enclosure_diagram_json = json.loads(self.create_enclosure_json_command.out_as_str())
        enc_diagram = EnclosureDiagram(enclosure_diagram_json)
        enc_diagram.show()
        return self
    
    def df(self):
        result = pd.read_csv(StringIO(self.merge_command.out_as_str()))
        return result

### Complexity Analysis (Whitespace Analysis)

The idea of indentation as a proxy for complexity is backed by research (see [Reading Beside the Lines: Indentation as a Proxy for Complexity Metric](https://www.semanticscholar.org/paper/Reading-Beside-the-Lines%3A-Indentation-as-a-Proxy-Hindle-Godfrey/ce39dfa1f8b0b234da54c6f4b696e28057fc2b20)). It's a simple metric, yet it correlates with more elaborate metrics, such as McCabe cyclomatic complexity and Halstead complexity measures.

We can either perform this analysis on a single snapshot in time of a file (`StaticFileComplexityAnalysis`), or (perhaps more useful) we can perform this analysis on a file across time to visualize the complexity trend of the file (`TrendFileComplexityAnalysis`).

We are essentially calculating the logical indentation of each line in a file. Four spaces or one tab count as one logical indentation. Empty and blank lines are ignored.

- The `total` column is the accumulated complexity. It’s useful to compare different revisions or modules against each other.
- The `mean` column tells us the mean complexity of the module.
- The standard deviation `sd` column tells us how much the complexity varies within the module. A low number indicates that most lines have a complexity close to the mean.
- The `max` column shows the maximum complexity value in the module. A large maximum indentation value means there is deep indentation somewhere, which usually indicates nested conditions. We can expect islands of complexity.
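
The columns above can be sketched in a few lines of Python. This is an illustration of the idea, not the script that `FileComplexityCommand` runs; the function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption.

```python
import statistics


def logical_indentation(line, spaces_per_level=4):
    """Count logical indentation levels: one per tab or per four spaces."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    return leading.count("\t") + leading.count(" ") // spaces_per_level


def complexity_stats(source):
    """Summarize indentation-based complexity, ignoring blank lines."""
    levels = [logical_indentation(l) for l in source.splitlines() if l.strip()]
    if not levels:
        return {"n": 0, "total": 0, "mean": 0.0, "sd": 0.0, "max": 0}
    return {
        "n": len(levels),
        "total": sum(levels),
        "mean": statistics.mean(levels),
        "sd": statistics.pstdev(levels),  # assumption: population sd
        "max": max(levels),
    }


sample = "def f(x):\n    if x:\n        return 1\n\n    return 0\n"
print(complexity_stats(sample))
```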

> [REVIEW THIS] Note that in order to specify the timespan for analysis, you must provide a `before` and/or `after` value when creating the `ProjectForAnalysis`. These values determine which commits are used when calculating the complexity trend.
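
How `first_commit_in_timespan()` and `last_commit_in_timespan()` resolve those values is not shown in this notebook. As a rough sketch of the idea, the commits that touch a file within a timespan could be listed with `git log`. The helper name `git_log_args` is hypothetical, and the `YYYY-MM-DD` date format is an assumption about what the project passes in.

```python
def git_log_args(file_name, before=None, after=None):
    """Build a git command that lists commits touching file_name, oldest first."""
    args = ["git", "log", "--reverse", "--pretty=format:%h"]
    if after:
        args.append(f"--after={after}")
    if before:
        args.append(f"--before={before}")
    # "--" separates revision options from the path being inspected
    args += ["--", file_name]
    return args


print(git_log_args("src/bookshelf/BookCard.vue", after="2021-01-01"))
```

With such a command, the first and last lines of the output would correspond to the first and last commits in the timespan.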

Example

```python
benrombautca = ProjectForAnalysis("/home/brombaut/work/benrombautca")

# Static analysis
static_file_complexity = StaticFileComplexityAnalysis(benrombautca, "src/bookshelf/BookCard.vue")
static_file_complexity.analyze()
df = static_file_complexity.df()

# Trend analysis
file_complexity_trend = TrendFileComplexityAnalysis(benrombautca, "src/bookshelf/BookCard.vue")
file_complexity_trend.analyze()
df = file_complexity_trend.df()
# Show trend lines
file_complexity_trend.total_trend_line_plot()
file_complexity_trend.mean_trend_line_plot()
file_complexity_trend.sd_trend_line_plot()

```

In [4]:
class FileComplexityAnalysis(Analysis):
    def __init__(self, project_for_analysis, file_name_for_analysis):
        super().__init__(project_for_analysis)
        project_for_analysis.throw_if_file_does_not_exist(file_name_for_analysis)
        self.file_name_for_analysis = file_name_for_analysis

        
class StaticFileComplexityAnalysis(FileComplexityAnalysis):        
    def analyze(self):
        comp_analysis_command = FileComplexityCommand(
            f"{self.project_for_analysis.dir_to_analyze}/{self.file_name_for_analysis}"
        ).execute()
        self.csv_str = comp_analysis_command.out_as_str()
        return self
        
    def df(self):
        result = pd.read_csv(StringIO(self.csv_str))
        return result

    
class TrendFileComplexityAnalysis(FileComplexityAnalysis):
    def analyze(self):
        self.comp_analysis_command = FileComplexityTrendCommand(
            self.project_for_analysis.dir_to_analyze,
            self.file_name_for_analysis,
            self.project_for_analysis.first_commit_in_timespan(),
            self.project_for_analysis.last_commit_in_timespan(),
        ).execute()
        return self
    
    def df(self):
        result = pd.read_csv(StringIO(self.comp_analysis_command.out_as_str()))
        return result
    
    def line_plot(self, y):
        self.df().plot.line(
            y=y,
            title=self.file_name_for_analysis,
            ylabel=f"Complexity ({y})",
            xlabel="Revision",
        )
    
    def total_trend_line_plot(self):
        self.line_plot("total")
        return self
    
    def mean_trend_line_plot(self):
        self.line_plot("mean")
        return self
    
    def sd_trend_line_plot(self):
        self.line_plot("sd")
        return self

## Coupling Analysis

Generates the following fields:

1. entity: This is the name of one of the involved modules. Code Maat always calculates pairs.
2. coupled: This is the coupled counterpart to the entity.
3. degree: The degree specifies the percent of shared commits. The higher the number, the stronger the coupling.
4. average-revs: Finally, we get a weighted number of total revisions for the involved modules. The idea here is that we can filter out modules with too few revisions to avoid bias.
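
The four fields can be illustrated with a small, self-contained sketch. It assumes that the degree is the number of shared commits divided by the average revision count of the pair, expressed as a percentage. Code Maat's exact rounding and its minimum-revision thresholds are not reproduced here, and the function name `coupling` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def coupling(commits):
    """Compute pairwise change coupling from a list of commits.

    Each commit is a set of module names that changed together.
    """
    revisions = Counter()
    shared = Counter()
    for modules in commits:
        revisions.update(modules)
        # Sort so that each pair is always counted under the same key
        shared.update(combinations(sorted(modules), 2))

    rows = []
    for (entity, coupled), shared_commits in shared.items():
        average_revs = (revisions[entity] + revisions[coupled]) / 2
        rows.append({
            "entity": entity,
            "coupled": coupled,
            "degree": 100 * shared_commits / average_revs,  # assumed formula
            "average-revs": average_revs,
        })
    return sorted(rows, key=lambda r: r["degree"], reverse=True)


# Illustrative commit history only
commits = [{"a.py", "b.py"}, {"a.py", "b.py"}, {"a.py"}, {"b.py", "c.py"}]
for row in coupling(commits):
    print(row)
```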

### SystemCouplingAnalysis

Example
```python
boundaries_dict = {
    "Code": ["src/code_maat"],
    "Analysis Test": ["test/code_maat/analysis"],
    "Dataset Test": ["test/code_maat/dataset"],
    "End to end Test": ["test/code_maat/end_to_end"],
    "Parsers Test": ["test/code_maat/parsers"],
}
boundaries = SystemBoundaries(boundaries_dict)
craft = ProjectForAnalysis("/home/brombaut/work/code-maat", system_boundaries=boundaries)
coupling_data = SystemCouplingAnalysis(craft).analyze().df()
```
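
Code Maat reads architectural boundaries from a text file passed with `-g`, with one `path => name` mapping per line. How `SystemBoundaries` writes that file is not shown here, so the following is only a sketch of the transformation; the function name `to_group_spec` is hypothetical.

```python
def to_group_spec(boundaries):
    """Turn a {name: [paths]} dict into 'path => name' lines."""
    lines = []
    for name, paths in boundaries.items():
        for path in paths:
            lines.append(f"{path} => {name}")
    return "\n".join(lines)


print(to_group_spec({
    "Code": ["src/code_maat"],
    "Analysis Test": ["test/code_maat/analysis"],
}))
```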

In [5]:
# TODO: Should this handle architectural boundaries?
class SystemCouplingAnalysis(Analysis):
    # NOTE: temporal_period can only be None or 1 (limitation of codemaat)
    def __init__(self, project_for_analysis, temporal_period=None):
        super().__init__(project_for_analysis)
        self.temporal_period = temporal_period
        
    def analyze(self):
        self._generate_log_file()
        # Analyze Coupling
        cmd_str = f"-l {self.file_paths.log_file} -c git -a coupling"
        if self.project_for_analysis.has_system_boundaries():
            cmd_str += f" -g {self.project_for_analysis.system_boundaries_file()}"
        if self.temporal_period:
            cmd_str += f" --temporal-period {self.temporal_period}"
        self.maat_command = MaatCommand(cmd_str).execute()
        return self
    
    def df(self):
        result = pd.read_csv(StringIO(self.maat_command.out_as_str()))
        return result

### FileCouplingAnalysis

Example

```python
benrombautca = ProjectForAnalysis("/home/brombaut/work/benrombautca")
file_coupling_analysis = FileCouplingAnalysis(benrombautca, "src/bookshelf/BookshelfSection.vue")
file_coupling_analysis.analyze()
file_coupling_analysis.enclosure_diagram()
```

In [6]:
class FileCouplingAnalysis(Analysis):
    def __init__(self, project_for_analysis, file_name_for_analysis):
        super().__init__(project_for_analysis)
        project_for_analysis.throw_if_file_does_not_exist(file_name_for_analysis)
        self.file_name_for_analysis = file_name_for_analysis
        
    def analyze(self):
        self._generate_log_file()
        # Analyze Coupling
        self.maat_command = (
            MaatCommand(f"-l {self.file_paths.log_file} -c git -a coupling")
                .execute()
                .write_out_to_file()
        )
        # Count Lines of Code
        self.cloc_command = (
            ClocCommand(f"{self.project_for_analysis.dir_to_analyze} --by-file --csv --quiet")
                .execute()
                .write_out_to_file()
        )
        # Filter only lines with specific file
        df = pd.read_csv(self.file_paths.maat_output_csv)
        self.filtered_df = df.loc[
            # regex=False so characters such as "." in the file name match literally
            (df['entity'].str.contains(self.file_name_for_analysis, regex=False)) |
            (df['coupled'].str.contains(self.file_name_for_analysis, regex=False))
        ]
        if len(self.filtered_df) == 0:
            raise Exception(f"No coupling data detected for file={self.file_name_for_analysis}")
        self.filtered_df.to_csv(self.file_paths.maat_output_csv, index=False)
        # Create Enclosure JSON
        self.create_enclosure_json_command = CreateEnclosureDiagramJson(
            self.file_paths.maat_output_csv,
            self.file_paths.cloc_lines_csv,
            weight_column=2
        ).execute()
        return self
        
    def enclosure_diagram(self):            
        enclosure_diagram_json = json.loads(self.create_enclosure_json_command.out_as_str())
        enc_diagram = EnclosureDiagram(enclosure_diagram_json)
        enc_diagram.show()
        return self
    
    def df(self):
        return self.filtered_df

## Authors Analysis

### ParallelWorkAnalysis

Example:

```python
craft = ProjectForAnalysis("/home/brombaut/work/code-maat")
an = ParallelWorkAnalysis(craft)
an.analyze()
an.df()
an.enclosure_diagram()
```

In [7]:
class ParallelWorkAnalysis(Analysis):
    def __init__(self, project_for_analysis):
        super().__init__(project_for_analysis)
        
    def analyze(self):
        self._generate_log_file()
        # Analyze Change Frequencies
        self.maat_command = (
            MaatCommand(f"-l {self.file_paths.log_file} -c git -a authors")
                .execute()
                .write_out_to_file()
        )
        # Count Lines of Code
        self.cloc_command = (
            ClocCommand(f"{self.project_for_analysis.dir_to_analyze} --by-file --csv --quiet")
                .execute()
                .write_out_to_file()
        )
        # Create Enclosure JSON
        self.create_enclosure_json_command = CreateEnclosureDiagramJson(
            self.file_paths.maat_output_csv,
            self.file_paths.cloc_lines_csv,
            weight_column=1
        ).execute()
        return self
        
    def enclosure_diagram(self):
        enclosure_diagram_json = json.loads(self.create_enclosure_json_command.out_as_str())
        enc_diagram = EnclosureDiagram(enclosure_diagram_json)
        enc_diagram.show()
        return self
    
    def df(self):
        result = pd.read_csv(StringIO(self.maat_command.out_as_str()))
        return result

### MainDeveloperAnalysis

Example:

```python
code_maat = ProjectForAnalysis("/home/brombaut/work/code-maat")
an = MainDeveloperAnalysis(code_maat)
an.analyze()
an.df()
```

In [8]:
class MainDeveloperAnalysis(Analysis):
    def __init__(self, project_for_analysis):
        super().__init__(project_for_analysis)
        
    def analyze(self):
        self._generate_log_file()
        # Analyze Change Frequencies
        self.maat_command = (
            MaatCommand(f"-l {self.file_paths.log_file} -c git -a main-dev")
                .execute()
        )
        # Count Lines of Code
        self.cloc_command = (
            ClocCommand(f"{self.project_for_analysis.dir_to_analyze} --by-file --csv --quiet")
                .execute()
                .write_out_to_file()
        )
        return self

    def df(self):
        result = pd.read_csv(StringIO(self.maat_command.out_as_str()))
        return result
    
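    def enclosure_diagram(self, author_colors=None):
        if author_colors is None:
            print("No author colours provided...automatically creating")
            # Fall back to one generated colour per main developer
            author_colors = AuthorColor.from_authors_list(self.df()['main-dev'].unique())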
        # Rewrite out to file
        self.maat_command.write_out_to_file()
        self.cloc_command.write_out_to_file()
        author_colors.write_to_file()
        # Create main dev enclosure diagram json
        self.create_main_dev_enclosure_json_command = (
            CreateMainDevEnclosureDiagramJson(
                self.cloc_command.out_file(),
                self.maat_command.out_file(),
                author_colors.out_file()
            ).execute()
        )
        enclosure_diagram_json = json.loads(self.create_main_dev_enclosure_json_command.out_as_str())
        self.data = enclosure_diagram_json
        enc_diagram = MainDevEnclosureDiagram(enclosure_diagram_json)
        enc_diagram.show()

In [9]:
# author_color_dict = {
#     "Adam Petersen": "green",
# }
# author_colours = AuthorColor(author_color_dict)
code_maat = ProjectForAnalysis("/home/brombaut/work/code-maat")
an = MainDeveloperAnalysis(code_maat)
an.analyze()

In [13]:
author_colors = AuthorColor.from_authors_list(an.df()['main-dev'].unique())

In [14]:
author_colors.author_color_dict

{'Adam Petersen': '#B567E5'}

In [15]:
an.enclosure_diagram(author_colors)

<IPython.core.display.Javascript object>

### EntityOwnershipAnalysis

Example:

```python
code_maat = ProjectForAnalysis("/home/brombaut/work/code-maat")
an = EntityOwnershipAnalysis(code_maat)
an.analyze()
an.df()
```

In [None]:
class EntityOwnershipAnalysis(Analysis):
    def __init__(self, project_for_analysis):
        super().__init__(project_for_analysis)
        
    def analyze(self):
        self._generate_log_file()
        # Analyze Change Frequencies
        self.maat_command = (
            MaatCommand(f"-l {self.file_paths.log_file} -c git -a entity-ownership")
                .execute()
        )
        return self
        
    def df(self):
        result = pd.read_csv(StringIO(self.maat_command.out_as_str()))
        return result

In [None]:
# TODO: Fractal figures?