# Test Workflow Data Aggregation

-----------------------------------------------------------------
This example illustrates the aggregation of workflow (GitHub Actions) data using the `github2pandas` repository. All workflow runs are read, filtered, and summarized by success or failure.

In [None]:
from github2pandas.workflows.aggregation import AggWorkflow as AggWF
from github2pandas.utility import Utility
from pathlib import Path

## Basic Usage

The most important input parameter is a `Repository` object from the PyGithub package.

In [None]:
git_repo_name = "Extract_Git_Activities"
git_repo_owner = "TUBAF-IFI-DiPiT"
    
default_data_folder = Path("data", git_repo_name)

import os

# Read the GitHub token from the environment (for example, loaded from a .env file).
github_token = os.environ['TOKEN']
# If the token is not available in the environment, set it here instead:
# github_token = "yourToken"

repo = Utility.get_repo(git_repo_name, github_token)
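
If the `TOKEN` variable is missing, `os.environ['TOKEN']` fails with a bare `KeyError`. The following is a minimal sketch of a more descriptive lookup. The helper name `read_github_token` is hypothetical and is not part of `github2pandas`.

```python
import os


def read_github_token(env_var="TOKEN"):
    """Return the token stored in `env_var` (hypothetical helper, not a library call)."""
    # Treat a missing or whitespace-only variable as "not configured".
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a GitHub token first."
        )
    return token
```

With this helper, the assignment in the cell above could be written as `github_token = read_github_token()`.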

The code snippet generates a raw data set based on the repository information. The pandas DataFrame includes the author's information, the timestamp, and the overall result of each workflow run.

In [None]:
AggWF.generate_workflow_pandas_tables(repo=repo, data_root_dir=default_data_folder)
pd_workflow = AggWF.get_raw_workflow(data_root_dir=default_data_folder)

In [None]:
pd_workflow.head(5)

In [None]:
pd_workflow.commit_author.unique()

The list of authors shows that the same person can appear under several name variants. Let's use the processing tools to merge these duplicates.

## Application of processing methods

In [None]:
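# Pairs of (alias, canonical name) that refer to the same person.
duplicate_names = [('SebastianZug', 'Sebastian Zug')]

def replace_duplicates(pd_table, column_name, duplicates):
    """Replace each alias in `column_name` with its canonical name."""
    for alias, canonical in duplicates:
        # Assign the result back instead of calling replace(..., inplace=True)
        # on a column selection, which newer pandas versions warn about.
        pd_table[column_name] = pd_table[column_name].replace(alias, canonical)
    return pd_table


pd_workflow_filtered = (
    AggWF.get_raw_workflow(data_root_dir=default_data_folder)
    .pipe(replace_duplicates, "commit_author", duplicate_names)
)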
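
The name normalization can also be tried without a GitHub token on a small hand-made table. This is only a sketch: the rows are invented sample data, and the dictionary form of `Series.replace` is used here as a compact alternative to looping over name pairs.

```python
import pandas as pd

# Hypothetical sample rows; the real table comes from AggWF.get_raw_workflow.
sample = pd.DataFrame({
    "commit_author": ["SebastianZug", "Sebastian Zug", "Another Author"],
    "conclusion": ["success", "failure", "success"],
})

# Map every alias to the canonical spelling of the author's name.
alias_to_name = {"SebastianZug": "Sebastian Zug"}
sample["commit_author"] = sample["commit_author"].replace(alias_to_name)

# After the replacement only two distinct authors remain.
unique_authors = sorted(sample["commit_author"].unique())
print(unique_authors)
```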

Which author has the most successful workflow runs?

In [None]:
pd_workflow_filtered.groupby(['commit_author', 'conclusion'])['workflow_run_id'].count().unstack()
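
To see what the grouping produces, the same `groupby` and `unstack` chain can be applied to a few invented rows. The data below is hypothetical; only the column names are taken from the workflow table. Using `fill_value=0` and `idxmax` is one possible way to answer the question directly, not necessarily how the notebook authors would do it.

```python
import pandas as pd

# Hypothetical workflow runs that reuse the column names of the real table.
runs = pd.DataFrame({
    "workflow_run_id": [1, 2, 3, 4, 5],
    "commit_author": ["Author A", "Author A", "Author B", "Author B", "Author B"],
    "conclusion": ["success", "failure", "success", "success", "failure"],
})

# Count runs per author and conclusion, then pivot the conclusions into columns.
counts = (
    runs.groupby(["commit_author", "conclusion"])["workflow_run_id"]
    .count()
    .unstack(fill_value=0)
)

# The author with the highest value in the "success" column.
top_author = counts["success"].idxmax()
print(counts)
print(top_author)
```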

## Get workflow run logs
What happens during a workflow run? Let's take a closer look at the log files of a specific Actions run.

In [None]:
AggWF.download_workflow_log_files(repo=repo,
                                  github_token=github_token,
                                  workflow=repo.get_workflow_run(642018321),
                                  data_root_dir=default_data_folder)