# Project Git Metrics for Landscape Analysis

Project git metrics for software landscape analysis related to the Cytomining ecosystem.

## Setup

Set an environment variable named `LANDSCAPE_ANALYSIS_GH_TOKEN` to a [GitHub access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens). The notebook uses this token to authenticate its GitHub API requests. For example: `export LANDSCAPE_ANALYSIS_GH_TOKEN=token_here`
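
It can help to confirm that the variable is set before running the cells. The following is a minimal sketch of such a check, assuming only the standard library; the helper name `read_token` is hypothetical and is not part of this notebook.

```python
import os


def read_token(name: str = "LANDSCAPE_ANALYSIS_GH_TOKEN") -> str:
    """Return the token stored in the environment variable `name`.

    Hypothetical helper for illustration: raises if the variable
    is unset or contains only whitespace.
    """
    token = os.environ.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {name} environment variable to a GitHub access token."
        )
    return token
```

Calling `read_token()` at the top of the notebook would stop execution with a readable message instead of failing later during authentication.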

In [1]:
import os
from datetime import datetime

import pandas as pd
import pytz
from box import Box
from github import Auth, Github

# set github authorization and client
# (indexing os.environ raises a KeyError if the token variable is not set)
github_client = Github(
    auth=Auth.Token(os.environ["LANDSCAPE_ANALYSIS_GH_TOKEN"]), per_page=100
)
# get the current datetime
tz = pytz.timezone("UTC")
current_datetime = datetime.now(tz)

In [2]:
# gather projects data
projects = Box.from_yaml(filename="data/projects.yaml").projects

# check the number of projects
print("number of projects: ", len(projects))
print("project names: ", [project["name"] for project in projects])

number of projects:  3
project names:  ['pycytominer', 'cyosnake', 'cytotable']


In [3]:
# show the keys available for the projects
projects[0].keys()

dict_keys(['name', 'tags', 'homepage_url', 'repo_url'])
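
The `repo_url` key holds a full GitHub URL, while a repository lookup by name needs the `owner/name` form. As a rough sketch, the conversion could be made tolerant of trailing slashes and a `.git` suffix as shown below. The helper name `repo_full_name` and the example URL are hypothetical and are not taken from `data/projects.yaml`.

```python
from urllib.parse import urlparse


def repo_full_name(repo_url: str) -> str:
    """Return the 'owner/name' part of a GitHub repository URL.

    Hypothetical helper for illustration: raises ValueError if the
    URL path has fewer than two segments.
    """
    # drop the leading and trailing slashes from the URL path
    path = urlparse(repo_url).path.strip("/")
    # drop an optional ".git" suffix
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, name = path.split("/")[:2]
    return f"{owner}/{name}"


# placeholder URL, used only to demonstrate the helper
print(repo_full_name("https://github.com/example-org/example-repo/"))
# prints: example-org/example-repo
```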

In [4]:
df_projects = pd.DataFrame(
    # create a list of repo data records for a dataframe
    [
        {
            "Project Name": repo.name,
            "GitHub Stars": repo.stargazers_count,
            "GitHub Forks": repo.forks_count,
            "GitHub Watchers": repo.subscribers_count,
            # note: GitHub's issues endpoint also counts open pull requests
            "GitHub Open Issues": repo.get_issues(state="open").totalCount,
            "GitHub Contributors": repo.get_contributors().totalCount,
            "GitHub License Type": repo.get_license().license.spdx_id,
            "Date Created": repo.created_at.replace(tzinfo=pytz.UTC),
            "Date Most Recent Commit": repo.get_commits()[0].commit.author.date.replace(
                tzinfo=pytz.UTC
            ),
            "Duration Created to Most Recent Commit": "",
            "Duration Most Recent Commit to Now": "",
            "Repository Size (KB)": repo.size,
        }
        # make a request for github repo data with pygithub
        for repo in [
            github_client.get_repo(project.repo_url.replace("https://github.com/", ""))
            for project in projects
        ]
    ]
)

# calculate time deltas
df_projects["Duration Created to Most Recent Commit"] = (
    df_projects["Date Most Recent Commit"] - df_projects["Date Created"]
)
df_projects["Duration Most Recent Commit to Now"] = (
    current_datetime - df_projects["Date Most Recent Commit"]
)

# show the result
df_projects

| | Project Name | GitHub Stars | GitHub Forks | GitHub Watchers | GitHub Open Issues | GitHub Contributors | GitHub License Type | Date Created | Date Most Recent Commit | Duration Created to Most Recent Commit | Duration Most Recent Commit to Now | Repository Size (KB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pycytominer | 52 | 32 | 6 | 85 | 22 | BSD-3-Clause | 2019-07-03 18:22:51+00:00 | 2023-10-03 17:40:10+00:00 | 1552 days 23:17:19 | 7 days 05:05:55.063346 | 720941 |
| 1 | CytoSnake | 3 | 3 | 0 | 35 | 3 | CC-BY-4.0 | 2022-02-15 18:02:45+00:00 | 2023-09-01 23:09:07+00:00 | 563 days 05:06:22 | 38 days 23:36:58.063346 | 780 |
| 2 | CytoTable | 3 | 4 | 4 | 42 | 4 | BSD-3-Clause | 2022-09-08 15:46:25+00:00 | 2023-10-06 14:01:20+00:00 | 392 days 22:14:55 | 4 days 08:44:45.063346 | 6817 |
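
The duration columns are plain datetime subtractions, so a row can be checked by hand. The sketch below uses only the standard library and the pycytominer timestamps from the output above. It is an independent check, not the notebook's pandas code.

```python
from datetime import datetime, timezone

# timestamps copied from the pycytominer row of the output above
created = datetime(2019, 7, 3, 18, 22, 51, tzinfo=timezone.utc)
last_commit = datetime(2023, 10, 3, 17, 40, 10, tzinfo=timezone.utc)

# subtracting two timezone-aware datetimes yields a timedelta
duration = last_commit - created
print(duration)  # 1552 days, 23:17:19

# the same duration expressed as fractional days
print(round(duration.total_seconds() / 86400, 2))  # 1552.97
```

Both datetimes must be timezone-aware (or both naive) for the subtraction to work, which is why the notebook attaches UTC to each timestamp before computing the deltas.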
