# Test Workflow Data Aggregation

-----------------------------------------------------------------
This example illustrates how to aggregate GitHub Actions workflow data with the `github2pandas` package. All workflow runs of a repository are retrieved, filtered, and summarized by success or failure.

In [None]:
from pathlib import Path
from github2pandas.workflows.aggregation import generate_workflow_history,\
                                                get_workflow_pandas_table,\
                                                request_log_files

from github2pandas.utility import replace_dublicates,\
                                  apply_python_date_format

## Basic Usage

The following code snippet generates a raw data set. The resulting pandas DataFrame contains the commit author's details, a timestamp, and the overall result (conclusion) of each workflow run.

In [None]:
import os

git_repo_name = "Extract_Git_Activities"
git_repo_owner = "TUBAF-IFI-DiPiT"

default_data_folder = Path("data", git_repo_name)

# The GitHub token is read from the TOKEN environment variable (e.g. set via a .env file).
github_token = os.environ['TOKEN']
# If the token is not available as an environment variable, set it here instead:
# github_token = "yourToken"

generate_workflow_history(repo_name=git_repo_name,
                          github_token=github_token,
                          data_dir=default_data_folder)

pd_workflow = get_workflow_pandas_table(data_dir=default_data_folder)
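The cell above reads the token with `os.environ['TOKEN']`, which stops with a bare `KeyError` if the variable is missing. The helper below is a minimal sketch of a friendlier alternative. The function name `get_github_token` is hypothetical and not part of `github2pandas`; loading a `.env` file assumes the optional `python-dotenv` package is installed and is silently skipped otherwise.

```python
import os


def get_github_token(var_name: str = "TOKEN") -> str:
    """Return a GitHub token from the environment or fail with a clear message.

    Hypothetical helper (not part of github2pandas). Assumes the token is
    exported as an environment variable; the notebook uses TOKEN.
    """
    try:
        # Optional: load variables from a local .env file if python-dotenv exists.
        # load_dotenv() does not override variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "export it or add it to a .env file."
        )
    return token
```

In the notebook, `github_token = get_github_token()` could then replace the direct `os.environ` lookup.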

In [None]:
pd_workflow.head(5)

In [None]:
pd_workflow.commit_author.unique()

The output shows that some authors appear under more than one spelling (for example, `SebastianZug` and `Sebastian Zug`). The processing utilities can merge these duplicates.

## Application of processing methods

In [None]:
# Each tuple lists two spellings of the same author name
duplicate_names = [('SebastianZug', 'Sebastian Zug')]

pd_workflow_filtered = (
    get_workflow_pandas_table(data_dir=default_data_folder)
    .pipe(replace_dublicates, "commit_author", duplicate_names)
)
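To make the effect of the duplicate replacement concrete, here is a plain pandas sketch of the same idea. It is not the implementation of `replace_dublicates`: the helper `merge_author_aliases` is hypothetical, it assumes each tuple is ordered as `(alias, canonical_name)`, and the three-row table is made-up toy data.

```python
import pandas as pd


def merge_author_aliases(df: pd.DataFrame, column: str,
                         aliases: list[tuple[str, str]]) -> pd.DataFrame:
    """Return a copy of df where each alias in `column` is replaced by its canonical name.

    Hypothetical helper; assumes each tuple is (alias, canonical_name).
    """
    mapping = dict(aliases)
    out = df.copy()  # leave the input table untouched
    out[column] = out[column].replace(mapping)
    return out


# Toy data for illustration only
runs = pd.DataFrame({
    "commit_author": ["SebastianZug", "Sebastian Zug", "Sebastian Zug"],
    "conclusion": ["success", "failure", "success"],
    "workflow_run_id": [1, 2, 3],
})
merged = merge_author_aliases(runs, "commit_author", [("SebastianZug", "Sebastian Zug")])
print(merged["commit_author"].unique())
```

Returning a copy keeps the raw table available for comparison, which mirrors the notebook's separate `pd_workflow` and `pd_workflow_filtered` tables.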

Which commit author has the most successful workflow runs? Counting the runs per author and conclusion gives a first overview.

In [None]:
pd_workflow_filtered.groupby(['commit_author', 'conclusion'])['workflow_run_id'].count().unstack()
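The table above lists all counts, but the question asks for a single name. The sketch below picks the author with the highest number of successful runs. The helper `top_author_by_success` and the author names in the toy table are hypothetical, and it assumes the `conclusion` column uses GitHub's value `"success"` for passing runs.

```python
import pandas as pd


def top_author_by_success(df: pd.DataFrame) -> tuple[str, int]:
    """Return (author, count) for the author with the most 'success' runs.

    Hypothetical helper; ties are resolved by taking the first author in sort order.
    """
    counts = (
        df.groupby(["commit_author", "conclusion"])["workflow_run_id"]
        .count()
        .unstack(fill_value=0)  # authors as rows, conclusions as columns
    )
    if "success" not in counts.columns:
        raise ValueError("no successful runs in the table")
    author = counts["success"].idxmax()
    return author, int(counts.loc[author, "success"])


# Toy data for illustration only
toy_runs = pd.DataFrame({
    "commit_author": ["alice", "alice", "bob", "bob", "alice"],
    "conclusion": ["success", "success", "success", "failure", "failure"],
    "workflow_run_id": [1, 2, 3, 4, 5],
})
demo_author, demo_count = top_author_by_success(toy_runs)
print(demo_author, demo_count)
```

Applied to the notebook's data, the call would be `top_author_by_success(pd_workflow_filtered)`.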

## Get workflow run logs
What happens during a workflow run? The log files give a closer look.

In [None]:
# Download the log files for the given workflow ID into the current directory
request_log_files(owner=git_repo_owner,
                  repo_name=git_repo_name,
                  github_token=github_token,
                  workflow_id="617685085",
                  folder=".")
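To dig into the downloaded logs, a short scan for error lines is often enough, since GitHub Actions marks error annotations in raw logs with `##[error]`. The sketch below assumes the logs end up as plain `.txt` files below the target folder. GitHub's REST API delivers run logs as a zip archive, so extract it first if `request_log_files` leaves it packed. The helper name `find_error_lines` and the sample log content are hypothetical.

```python
import tempfile
from pathlib import Path


def find_error_lines(log_dir: str) -> list[tuple[str, str]]:
    """Collect (file name, line) pairs for lines tagged '##[error]' in *.txt logs.

    Hypothetical helper; assumes plain-text log files somewhere below log_dir.
    """
    hits = []
    for path in sorted(Path(log_dir).rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if "##[error]" in line:
                    hits.append((path.name, line.rstrip("\n")))
    return hits


# Demo with a made-up log file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1_build.txt").write_text(
        "2021-01-01T00:00:00Z Run pytest\n"
        "2021-01-01T00:00:05Z ##[error]Process completed with exit code 1.\n",
        encoding="utf-8",
    )
    demo_hits = find_error_lines(tmp)
print(demo_hits)
```

For the logs downloaded above, the call would be `find_error_lines(".")`.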