# Test Workflow Data Aggregation

-----------------------------------------------------------------
This example illustrates the aggregation of workflow (GitHub Actions) data using the `github2pandas` repository. All workflows are retrieved, filtered, and displayed in terms of success or failure.

In [None]:
from github2pandas.workflows import Workflows
from github2pandas.utility import Utility
from pathlib import Path
import pandas as pd
import os

## Basic Usage

The most important input parameter is a `Repository` object from the PyGithub package.

In [None]:
git_repo_name = "github2pandas"
git_repo_owner = "TUBAF-IFI-DiPiT"
    
default_data_folder = Path("data", git_repo_name)

github_token = os.environ['GITHUB_API_TOKEN']
# If your GitHub token is not provided via .env, it is necessary to set it here.
# github_token = "yourToken"

repo = Utility.get_repo(git_repo_owner, git_repo_name, github_token, default_data_folder)

The following code snippet generates a raw data set based on the repository information. The pandas DataFrame includes the author's information, the timestamp, and the overall result of each workflow run.

In [None]:
Workflows.generate_workflow_pandas_tables(repo=repo, data_root_dir=default_data_folder)

In [None]:
pd_workflow = Workflows.get_workflows(data_root_dir=default_data_folder)
pd_workflow.head(5)

In [None]:
pd_run = Workflows.get_workflows(data_root_dir=default_data_folder, filename=Workflows.WORKFLOWS_RUNS)
pd_run.head(5)
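
The introduction mentions viewing workflows in terms of success or failure. The following is a minimal sketch of such a summary on a small hand-made DataFrame rather than on real data. The column names `workflow_id` and `conclusion` are assumptions made for illustration and may differ from the columns that `pd_run` actually contains.

```python
import pandas as pd

# Hypothetical stand-in for pd_run; the column names are assumptions.
toy_runs = pd.DataFrame(
    {
        "workflow_id": [1, 1, 1, 2, 2],
        "conclusion": ["success", "failure", "success", "success", "success"],
    }
)

# Count runs per workflow and outcome; missing combinations become 0.
outcome_counts = (
    toy_runs.groupby(["workflow_id", "conclusion"])
    .size()
    .unstack(fill_value=0)
)

# Share of successful runs per workflow.
success_rate = outcome_counts["success"] / outcome_counts.sum(axis=1)

print(outcome_counts)
print(success_rate)
```

With real data, the same grouping could be applied to `pd_run` after checking its actual column names, for example with `pd_run.columns`.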

## Get workflow run logs
What happens during a workflow run? Let's take a closer look at the log files of a specific Actions run.

In [None]:
Workflows.download_workflow_log_files(repo=repo,
                                  github_token=github_token,
                                  workflow_run_id=1322994624,
                                  data_root_dir=default_data_folder)

The workflow logs are now stored in the project's data folder.
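
To see which files a download produced, the data folder can be walked with `pathlib`. The sketch below runs on a temporary folder with a made-up layout, because the folder structure that `github2pandas` uses for log files is not described here. The helper `list_files` is a hypothetical name introduced for this example.

```python
from pathlib import Path
import tempfile


def list_files(root: Path) -> list[str]:
    """Return all files below root as sorted, POSIX-style relative paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Demonstration on a temporary folder; the layout is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "logs" / "run_1").mkdir(parents=True)
    (root / "logs" / "run_1" / "build.txt").write_text("step 1 ok\n")
    (root / "logs" / "run_1" / "test.txt").write_text("step 2 ok\n")
    found = list_files(root)

print(found)
```

Calling `list_files(default_data_folder)` in the notebook would list every file below the data folder, including the downloaded logs.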

## Check who prepared the workflows

To answer this question, we have to merge the version (commit) data with the workflow information.

1. Prepare the commit, edit, and workflow DataFrames
2. Extract the commits that touch the workflow folder `.github/workflows/` from the edits
3. Identify the authors who integrated workflows

In [None]:
from github2pandas.version import Version
Version.clone_repository(repo=repo, data_root_dir=default_data_folder, github_token=github_token)
Version.no_of_proceses = 8
Version.generate_version_pandas_tables(repo=repo, data_root_dir=default_data_folder)

pd_edits = Version.get_version(data_root_dir=default_data_folder, filename=Version.VERSION_EDITS)
pd_commits = Version.get_version(data_root_dir=default_data_folder)

In [None]:
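# Match the folder name literally; with the default regex=True, "." would match any character.
relevant_commits = pd_edits[pd_edits["new_path"].str.contains(".github/workflows/", regex=False, na=False)][['commit_sha', 'filename']]
# Reassign instead of using inplace=True to avoid modifying a slice of pd_edits.
relevant_commits = relevant_commits.drop_duplicates()
relevant_commits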

In [None]:
pd.merge(relevant_commits, pd_commits[['author', 'commit_sha', 'commited_at']],
         how="left", on="commit_sha")
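
As a self-contained illustration of steps 2 and 3, the sketch below repeats the filter and merge on toy data and then counts workflow-related commits per author. The DataFrames `toy_edits` and `toy_commits` are invented stand-ins for `pd_edits` and `pd_commits`, reduced to the columns needed here. The per-author count is an additional step that is not part of the cells above.

```python
import pandas as pd

# Hypothetical stand-ins for pd_edits and pd_commits.
toy_edits = pd.DataFrame(
    {
        "commit_sha": ["a1", "a1", "b2", "c3"],
        "new_path": [
            ".github/workflows/ci.yml",
            "README.md",
            ".github/workflows/ci.yml",
            "src/main.py",
        ],
    }
)
toy_commits = pd.DataFrame(
    {
        "commit_sha": ["a1", "b2", "c3"],
        "author": ["alice", "bob", "alice"],
    }
)

# Step 2: keep edits below the workflow folder (literal match, no regex).
touches_workflow = toy_edits["new_path"].str.contains(
    ".github/workflows/", regex=False, na=False
)
workflow_commits = toy_edits.loc[touches_workflow, ["commit_sha"]].drop_duplicates()

# Step 3: attach the author and count workflow-related commits per author.
with_authors = workflow_commits.merge(toy_commits, on="commit_sha", how="left")
commits_per_author = with_authors["author"].value_counts()

print(commits_per_author)
```

Commit `c3` only touches `src/main.py`, so it is dropped by the filter and does not contribute to the count for its author.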