# Test Workflow Data Aggregation

-----------------------------------------------------------------
This example illustrates how to aggregate workflow (GitHub Actions) data using `github2pandas`. All workflow runs are read out, filtered and displayed in terms of success or failure.

In [1]:
from github2pandas.aggregation.workflows import AggWorkflow as AggWF
from github2pandas.aggregation.utility import Utility
from pathlib import Path

## Basic Usage

The most important input parameter is a `Repository` object from the PyGithub package.

In [2]:
git_repo_name = "Extract_Git_Activities"
git_repo_owner = "TUBAF-IFI-DiPiT"
    
default_data_folder = Path("data", git_repo_name)

import os
github_token = os.environ['TOKEN']
# If your GitHub token is not set in .env, it is necessary to provide it here.
# github_token = "yourToken"

repo = Utility.get_repo(git_repo_owner, git_repo_name, github_token)
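
Reading the token with `os.environ['TOKEN']` raises a bare `KeyError` when the variable is missing. The following is a minimal sketch of a more explicit lookup. The helper name `read_github_token` is hypothetical and not part of `github2pandas`; the variable name `TOKEN` is taken from the cell above.

```python
import os


def read_github_token(env_var: str = "TOKEN") -> str:
    """Return the token stored in env_var, or raise a descriptive error.

    Hypothetical helper for illustration; not part of github2pandas.
    """
    # Treat a missing variable and an empty/whitespace-only value the same way.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set; "
            "export it or assign github_token manually."
        )
    return token
```

With this helper, `github_token = read_github_token()` would replace the direct dictionary access.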

The code snippet generates a raw data set based on the repository information. The pandas DataFrame includes the author's information, the timestamp and the overall result of each workflow run.

In [3]:
AggWF.generate_workflow_pandas_tables(repo=repo, data_root_dir=default_data_folder)
pd_workflow = AggWF.get_raw_workflow(data_root_dir=default_data_folder)

In [4]:
pd_workflow.head(5)

| | workflow_id | workflow_name | workflow_run_id | commit_message | commit_author | commit_sha | commit_branch | state | conclusion | author |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6245620 | RunTests | 663932655 | Merge pull request #22 from TUBAF-IFI-DiPiT/en... | Maximilian Karl | 894acd353e91413045311549e31cf53a795cca54 | main | completed | failure | d17678d7-409f-4704-b0a4-39925b1b169e |
| 1 | 6245620 | RunTests | 656975885 | Update ER_diagram.drawio | Maximilian Karl | e3d2bd94fa1e01ffd346bcd940f50e389b9446f8 | main | completed | failure | 6d1b7901-9635-4705-a7f5-050593c619dc |
| 2 | 6245620 | RunTests | 656758944 | Merge pull request #27 from TUBAF-IFI-DiPiT/fe... | Maximilian Karl | 5cd09a720e9b4cbac15e7fa2904286019f5752e7 | main | completed | failure | d17678d7-409f-4704-b0a4-39925b1b169e |
| 3 | 6245620 | RunTests | 654047739 | Delete empty.txt | Sebastian Zug | cc50429fdeb840ec8ad6bcba43fd78d915eafcc2 | main | completed | success | d17678d7-409f-4704-b0a4-39925b1b169e |
| 4 | 6245620 | RunTests | 654047367 | Delete diagram.xml | Sebastian Zug | 357be6f674cda7989175641fb7cbbb3b1e39dee5 | main | completed | success | d17678d7-409f-4704-b0a4-39925b1b169e |


In [5]:
pd_workflow.commit_author.unique()

array(['Maximilian Karl', 'Sebastian Zug', 'SebastianZug'], dtype=object)

The output shows that the same author appears under two spellings ('Sebastian Zug' and 'SebastianZug'). Let's use the processing tools to merge these duplicates.
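
Spotting such variants by eye works for three names but not for a large repository. The following is a minimal sketch, assuming that variants differ only in whitespace or capitalization (it will not catch nicknames or typos). `find_name_variants` is a hypothetical helper, not part of `github2pandas`, and the sample values are copied from the output above.

```python
import pandas as pd

# Sample values copied from the unique() output above.
authors = pd.Series(["Maximilian Karl", "Sebastian Zug", "SebastianZug"])


def find_name_variants(names: pd.Series) -> dict:
    """Group distinct spellings that match after removing whitespace and case.

    Hypothetical helper for illustration; not part of github2pandas.
    """
    unique = pd.Series(names.dropna().unique())
    # Normalization key: strip all whitespace, then case-fold.
    keys = unique.str.replace(r"\s+", "", regex=True).str.casefold()
    groups = unique.groupby(keys).agg(list)
    # Keep only keys that are shared by more than one spelling.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


variants = find_name_variants(authors)
print(variants)
```

Each resulting group can then be turned into an `(old, new)` pair once a canonical spelling has been chosen by hand.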

## Application of processing methods

In [6]:
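duplicate_names = [('SebastianZug', 'Sebastian Zug')]

def replace_duplicates(pd_table, column_name, duplicates):
    # Each entry maps a variant spelling (old) to its canonical form (new).
    for old_name, new_name in duplicates:
        # Assign the result back to the column; an in-place replace on a
        # selected column may act on a copy and leave pd_table unchanged.
        pd_table[column_name] = pd_table[column_name].replace(old_name, new_name)
    return pd_table


pd_workflow_filtered = (
    AggWF.get_raw_workflow(data_root_dir=default_data_folder)
    .pipe(replace_duplicates, "commit_author", duplicate_names)
)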

Which author has the most successful workflow runs?

In [7]:
pd_workflow_filtered.groupby(['commit_author', 'conclusion'])['workflow_run_id'].count().unstack()

| commit_author | failure | success |
|---|---|---|
| Maximilian Karl | 12 | 3 |
| Sebastian Zug | 11 | 21 |
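
Absolute counts favor whoever commits most often, so a success rate per author can be easier to compare. The following sketch rebuilds the table above by hand (the counts are copied from the output) and divides the successful runs by the total number of runs. With real data, `unstack()` may produce `NaN` for missing combinations, which would need a `fillna(0)` first.

```python
import pandas as pd

# Counts copied from the table above.
counts = pd.DataFrame(
    {"failure": [12, 11], "success": [3, 21]},
    index=pd.Index(["Maximilian Karl", "Sebastian Zug"], name="commit_author"),
)

# Share of successful runs among all runs of each author.
success_rate = counts["success"] / counts.sum(axis=1)
print(success_rate.round(3))
```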


## Get workflow run logs

What happens during a workflow run? Let's take a closer look at the log files of a specific Actions run.

In [8]:
AggWF.download_workflow_log_files(repo=repo,
                                  github_token=github_token,
                                  workflow=repo.get_workflow_run(642018321),
                                  data_root_dir=default_data_folder)

https://api.github.com/repos/TUBAF-IFI-DiPiT/Extract_Git_Activities/actions/runs/642018321/logs


11
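
The URL printed above has the shape of the GitHub REST endpoint for downloading the logs of a workflow run, which answers with a redirect to a zip archive. How `github2pandas` issues the request internally is not shown in this example, so the following is only a sketch of how such a request could be prepared. `build_logs_request` is a hypothetical helper, and the header values are an assumption based on the usual GitHub REST conventions.

```python
def build_logs_request(owner: str, repo_name: str, run_id: int, token: str):
    """Return the URL and headers for a workflow run log download.

    Hypothetical helper for illustration; not part of github2pandas.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo_name}"
        f"/actions/runs/{run_id}/logs"
    )
    headers = {
        # Assumed header values; check the GitHub REST documentation.
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    return url, headers


url, headers = build_logs_request(
    "TUBAF-IFI-DiPiT", "Extract_Git_Activities", 642018321, "yourToken"
)
print(url)
```

The returned pair could be passed to any HTTP client that follows redirects; the response body is then the zip archive containing the log files.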