# OSPO Project Health Data

You can compare the charts you generate in Tableau with the [charts on Sharepoint](https://onevmw.sharepoint.com/teams/OSCS/Shared%20Documents/Forms/AllItems.aspx?viewid=25ac1b00%2Dc58d%2D4f8b%2Da9a5%2Df5ddadc1d350&id=%2Fteams%2FOSCS%2FShared%20Documents%2F2021%2D02) to check for accuracy.

# Tableau Functions

**These functions generate data for Tableau by calling the common functions that I use to gather and interpret the data.**

In [1]:
# Sustains and Keeps Up with Contributions Function

def sustain_prs_by_repo_tableau(repo_id, repo_name, org_name, start_date, end_date, engine):

    import pandas as pd
    from common_functions import sustain_prs_by_repo_data

    error_num, error_text, pr_sustainDF, title, title_color, interpretation, risk, risk_num = sustain_prs_by_repo_data(repo_id, repo_name, org_name, start_date, end_date, engine)

    # error_num == 0: OK to generate the graph. error_num == -1: there aren't enough PRs to generate the graph.
    # pr_sustainDF: the dataframe containing the data you need to graph
    # title: the title for the top of the chart
    # title_color: the color in which the title text should be displayed
    # interpretation: text that should appear at the bottom of the graph
    # risk and risk_num: you can ignore these variables

    if error_num == 0:
        # Add code here to generate what you need for Tableau. 

        # print the dataframe so that you know what you have
        print('Sustain Dataframe\n', pr_sustainDF, '\n\n')
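
One possible way to fill in the placeholder above is to write the dataframe to a CSV file, which Tableau can open as a text-file data source. The sketch below is only an illustration: the helper name `save_for_tableau`, the `tableau_data` output directory, and the `example-org` organization name are hypothetical, and the two sample rows are copied from the kapp output printed later in this notebook.

```python
from pathlib import Path

import pandas as pd


def save_for_tableau(df, chart, org_name, repo_name, out_dir="tableau_data"):
    """Write one chart's dataframe to <out_dir>/<org>_<repo>_<chart>.csv (hypothetical helper)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    csv_file = out_path / f"{org_name}_{repo_name}_{chart}.csv"
    # index=False keeps the pandas row index out of the file
    df.to_csv(csv_file, index=False)
    return csv_file


# Two rows copied from the kapp 'Sustain Dataframe' output in this notebook
sample = pd.DataFrame({
    "yearmonth": ["2020-12", "2021-01"],
    "repo_name": ["kapp", "kapp"],
    "repo_id": [30468, 30468],
    "closed_total": [3.0, 5.0],
    "all_total": [5.0, 5.0],
    "diff": [2.0, 0.0],
    "diff_per": [0.4, 0.0],
})

# 'example-org' is a placeholder, not the real organization name
csv_file = save_for_tableau(sample, "sustain", "example-org", "kapp")
print(csv_file)
```

Inside `sustain_prs_by_repo_tableau`, the same call could replace the `print` once `error_num == 0`, passing `pr_sustainDF`, `org_name`, and `repo_name`.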

In [2]:
# Contributor Risk Function

def contributor_risk_tableau(repo_id, repo_name, org_name, start_date, end_date, engine):

    from common_functions import contributor_risk_data

    error_num, error_text, names, percents, commits, bar_colors, title, title_color, interpretation, risk, num_people = contributor_risk_data(repo_id, repo_name, org_name, start_date, end_date, engine)

    # names, percents, commits, bar_colors are parallel lists with one entry per contributor.
    # The order of the lists matters: percents[i], commits[i], bar_colors[i] contain the data for names[i].

    if error_num == 0:
        # Add code here to generate what you need for Tableau. 

        # print the data so that you know what you have
        print('Names', names)
        print('Percents', percents)
        print('Commits', commits)
        print('Bar_colors', bar_colors)
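
Because the four lists line up by position, it may be easier to hand Tableau a single table with one row per contributor. The sketch below shows one way to do that; the helper name `contributor_lists_to_df` and the column names are hypothetical, and the sample values are taken from the captive-web-view output printed later in this notebook.

```python
import pandas as pd


def contributor_lists_to_df(names, percents, commits, bar_colors):
    """Combine the parallel contributor lists into one dataframe (hypothetical helper)."""
    lengths = {len(names), len(percents), len(commits), len(bar_colors)}
    if len(lengths) != 1:
        # The lists only make sense when entry i of each list describes the same person
        raise ValueError("contributor lists must all have the same length")
    return pd.DataFrame({
        "name": names,
        "percent": percents,
        "commits": commits,
        "bar_color": bar_colors,
    })


# Values copied from the captive-web-view output in this notebook
contributorsDF = contributor_lists_to_df(
    ["Jim Hawkins", "Neil Broadbent"],
    [0.9444444444444444, 0.05555555555555555],
    [85, 5],
    ["red", "lightblue"],
)
print(contributorsDF)
```

Keeping the length check in the helper means a mismatch between the lists fails loudly instead of silently misaligning contributors and their commit counts.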

In [3]:
# Timely Responses Function

def response_time_tableau(repo_id, repo_name, org_name, start_date, end_date, engine):

    import pandas as pd
    from common_functions import response_time_data
    
    error_num, error_text, pr_responseDF, title, title_color, interpretation, risk, risk_num = response_time_data(repo_id, repo_name, org_name, start_date, end_date, engine)

    # pr_responseDF['total_prs'] is the black line in my chart representing 'Total'.
    # pr_responseDF['in_guidelines'] is the green line representing 'Response < 2 business days'

    if error_num == 0:
        # Add code here to generate what you need for Tableau. 

        # print the dataframe so that you know what you have
        print('Response Time Dataframe\n', pr_responseDF, '\n\n')
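
If Tableau should also show how close the green line is to the black one, a derived share column can be added before export. This is a sketch under assumptions: only the `total_prs` and `in_guidelines` column names come from the comments above, while the helper name `add_in_guidelines_share`, the `yearmonth` column, the new `in_guidelines_share` column, and all sample numbers are hypothetical.

```python
import pandas as pd


def add_in_guidelines_share(pr_responseDF):
    """Return a copy with the share of PRs answered within guidelines (hypothetical helper)."""
    out = pr_responseDF.copy()
    # Replace zero totals with NaN so months without PRs give NaN instead of a division error
    total = out["total_prs"].where(out["total_prs"] > 0)
    out["in_guidelines_share"] = out["in_guidelines"] / total
    return out


# Hypothetical sample data, not taken from a real repository
sample = pd.DataFrame({
    "yearmonth": ["2021-01", "2021-02", "2021-03"],
    "total_prs": [10, 0, 4],
    "in_guidelines": [8, 0, 4],
})

shareDF = add_in_guidelines_share(sample)
print(shareDF)
```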

In [4]:
# Actively Maintained - Regular Releases Function

def activity_release_tableau(repo_name, org_name, start_date, end_date, repo_api):

    ### IMPORTANT: You need to have a GitHub API key stored in a file named gh_key in the same directory as this notebook

    import pandas as pd
    from common_functions import activity_release_data

    error_num, error_text, releasesDF, start_dt, end_dt, title, title_color, interpretation, risk, risk_num = activity_release_data(repo_name, org_name, start_date, end_date, repo_api)

    # releasesDF['date'] is what should be plotted with x's or similar at the appropriate date on the x axis

    if error_num == 0:
        # Add code here to generate what you need for Tableau. 

        # print the dataframe so that you know what you have
        print('Releases Dataframe\n', releasesDF, '\n\n')
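
Before plotting release markers, it can help to normalize `releasesDF['date']` and keep only the dates inside the reporting window. The sketch below assumes the column holds ISO-style timestamp strings and that the window bounds are plain date strings; neither assumption is confirmed by this notebook. The helper name `release_dates_for_plot`, the `release_date` column, and the sample dates are hypothetical.

```python
import pandas as pd


def release_dates_for_plot(releasesDF, start_date, end_date):
    """Return sorted release dates that fall inside the window (hypothetical helper)."""
    # errors="coerce" turns unparseable values into NaT, which the comparison below drops
    dates = pd.to_datetime(releasesDF["date"], errors="coerce", utc=True)
    start = pd.Timestamp(start_date, tz="UTC")
    end = pd.Timestamp(end_date, tz="UTC")
    in_window = dates[(dates >= start) & (dates <= end)].sort_values()
    return pd.DataFrame({"release_date": in_window.dt.date}).reset_index(drop=True)


# Hypothetical sample data, not taken from a real repository
sample = pd.DataFrame({
    "date": [
        "2020-06-01T12:00:00Z",
        "2020-03-15T10:00:00Z",
        "2019-12-31T08:00:00Z",
        "not a date",
    ]
})

plotDF = release_dates_for_plot(sample, "2020-02-01", "2021-02-01")
print(plotDF)
```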

# Main Script

**The script below will generate all of the data using the functions above in addition to some common functions**

In [5]:
from common_functions import augur_db_connect, get_dates, get_commits_by_repo
from common_functions import repo_api_call, fork_archive

six_months = 180  # Roughly six months; used to select repos by recent commit count
year = 365   # Default to one year of data

engine = augur_db_connect()

start_date, end_date = get_dates(year)
six_start_date, six_end_date = get_dates(six_months)

commit_threshold = 60  # Minimum commits in the six-month window (earlier note: 90, but use 1500 for testing)

repo_list_commits = get_commits_by_repo(six_start_date, six_end_date, engine)

top = repo_list_commits.loc[repo_list_commits['count'] > commit_threshold]

# Testing - Delete this line later
i = 0

for index, repo in top.iterrows():

    repo_id = repo['repo_id']
    repo_name = repo['repo_name']
    repo_path = repo['repo_path']
    # repo_path looks like 'github.com/<org>/': drop the 11-character host prefix and the trailing slash
    org_name = repo_path[11:-1]

    print('Processing:', org_name, repo_name, repo_path, repo_id, repo['count'])

    try:
        repo_api = repo_api_call(repo_name, org_name)
    except Exception:
        # Reset repo_api so a failed call never reuses the previous repo's result
        repo_api = None
        print('Cannot process API calls for:', org_name, repo_name, repo_path, repo_id)

    is_fork, is_archived = fork_archive(repo_name, org_name, engine)

    # Only gather data from repos that aren't forks or archived
    if is_fork == False and is_archived == False:
        sustain_prs_by_repo_tableau(repo_id, repo_name, org_name, start_date, end_date, engine)
        contributor_risk_tableau(repo_id, repo_name, org_name, start_date, end_date, engine)
        response_time_tableau(repo_id, repo_name, org_name, start_date, end_date, engine)
        # The release data depends on the GitHub API result, so skip it if the call failed
        if repo_api is not None:
            activity_release_tableau(repo_name, org_name, start_date, end_date, repo_api)

    # Testing - Delete these lines later
    if i > 2:
        break
    i += 1

Processing: vmware captive-web-view github.com/vmware/ 28028 61
Names ['Jim Hawkins', 'Neil Broadbent']
Percents [0.9444444444444444, 0.05555555555555555]
Commits [85, 5]
Bar_colors ['red', 'lightblue']
Processing: vmware-tanzu velero-plugin-for-vsphere github.com/vmware-tanzu/ 28046 62
Sustain Dataframe
    yearmonth                  repo_name  repo_id  closed_total  all_total  \
0    2020-02  velero-plugin-for-vsphere    28046             8          8   
1    2020-03  velero-plugin-for-vsphere    28046            36         36   
2    2020-04  velero-plugin-for-vsphere    28046            16         16   
3    2020-05  velero-plugin-for-vsphere    28046             8          8   
4    2020-06  velero-plugin-for-vsphere    28046            27         27   
5    2020-07  velero-plugin-for-vsphere    28046            36         36   
6    2020-08  velero-plugin-for-vsphere    28046            23         23   
7    2020-09  velero-plugin-for-vsphere    28046            18         18   


Sustain Dataframe
    yearmonth repo_name  repo_id  closed_total  all_total  diff  diff_per
0    2020-02      kapp    30468           1.0        1.0   0.0       0.0
1    2020-03      kapp    30468           3.0        3.0   0.0       0.0
2    2020-04      kapp    30468           5.0        5.0   0.0       0.0
3    2020-05      kapp    30468           1.0        1.0   0.0       0.0
4    2020-06      kapp    30468           3.0        3.0   0.0       0.0
5    2020-07      kapp    30468           2.0        2.0   0.0       0.0
6    2020-08      kapp    30468           4.0        4.0   0.0       0.0
7    2020-09      kapp    30468           2.0        2.0   0.0       0.0
8    2020-10      kapp    30468           0.0        0.0   0.0       NaN
9    2020-11      kapp    30468           2.0        2.0   0.0       0.0
10   2020-12      kapp    30468           3.0        5.0   2.0       0.4
11   2021-01      kapp    30468           5.0        5.0   0.0       0.0
12   2021-02      kapp    30468 