GitHub Traffic Analytics

Track GitHub repository traffic (views and clones) over time. GitHub only exposes the last 14 days; this tool captures data each day, appends it to a central CSV in your storage account, and preserves the history so you can analyze older traffic for free.

What It Does

  • Fetches views and clones data for all your public repositories
  • Stores historical data (GitHub only keeps 14 days)
  • First run: Backfills 14 days of historical data (T-15 through T-2)
  • Subsequent runs: Captures data from two days ago (T-2) each day, giving GitHub time to finalize the metrics (they are often updated with a delay)
  • Output: CSV with date columns showing views(clones) per repository. Sample output here.
  • Visualize metrics in a Power BI report, with filters to drill into a single repo or any selection
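The daily T-2 capture described above can be sketched as follows. The endpoint is GitHub's real repository traffic API; the function names and the selection logic are illustrative (the actual tool is a PowerShell script), shown here in Python for brevity:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

def fetch_views(owner, repo, token):
    """Call GitHub's traffic API, which returns up to the last 14 days of daily view counts."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/traffic/views",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Each entry looks like {"timestamp": "...", "count": n, "uniques": m}
        return json.load(resp)["views"]

def pick_t_minus_2(daily_views, today=None):
    """Select the count for two days ago (T-2), the day the tool records."""
    today = today or datetime.now(timezone.utc).date()
    target = (today - timedelta(days=2)).isoformat()
    for entry in daily_views:
        if entry["timestamp"].startswith(target):
            return entry["count"]
    return 0  # GitHub omits days with no recorded traffic
```

The same pattern applies to the clones endpoint (`/traffic/clones`).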

PBI report

Usage Options

Get the code

# Clone and enter the repo
git clone https://github.com/claestom/github-traffic-analytics.git
cd github-traffic-analytics

Option 1: Azure Functions (automated)

Deploy to Azure for fully automated daily collection at 7 AM CET. On the first run, when no CSV is detected, the function backfills 14 days of history. Subsequent runs collect data from two days prior (T-2) so it is captured before GitHub purges it.
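The windowing rule just described (backfill T-15 through T-2 on the first run, then a single T-2 day afterwards) can be expressed compactly; this is an illustrative Python sketch, not the function's actual code:

```python
from datetime import date, timedelta

def collection_window(today, first_run):
    """Dates to record: T-15..T-2 on the first run (backfill), just T-2 afterwards."""
    start = 15 if first_run else 2
    # range(start, 1, -1) counts down from T-start to T-2, oldest first
    return [today - timedelta(days=d) for d in range(start, 1, -1)]
```

Stopping at T-2 rather than T-1 is what gives GitHub time to finalize the previous day's metrics before they are written to the CSV.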

Prerequisites:

  • Azure subscription
  • Azure CLI installed
  • PowerShell 7.4+

Deploy:

# 1. Login to Azure
az login

Tip: copy-paste the script into a text editor, fill in the values, then run it as a whole from inside the github-traffic-analytics folder.

# 2. Set variables
$rg = "<rg-name>"
$location = "<region of deployment>"

# 3. Create resource group
az group create --name $rg --location $location

# 4. Deploy infrastructure (storage, function app, identity)
az deployment group create `
  --resource-group $rg `
  --template-file infra/main.bicep `
  --parameters infra/main.bicepparam

# 5. Get function app name from outputs
$funcApp = az functionapp list -g $rg --query "[0].name" -o tsv

# 6. Set GitHub credentials in the Azure Functions config
az functionapp config appsettings set -g $rg -n $funcApp `
  --settings GITHUB_TOKEN="ghp_new_token" GITHUB_USERNAME="your_username"

# 7. Publish function code
cd azure-function
$env:FUNCTIONS_WORKER_RUNTIME = "powershell"
$env:FUNCTIONS_WORKER_RUNTIME_VERSION = "7.4"
func azure functionapp publish $funcApp --nozip --powershell 

What Gets Created:

  • Storage Account (stores CSV in metrics container)
  • Function App (PowerShell 7.4, Consumption plan)
  • User-Assigned Managed Identity (secure storage access)
  • Application Insights (monitoring)

(Optional) Once deployed, test the Azure Function:

testfunction

Explained here as well.
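A timer-triggered function can also be invoked on demand through the standard Azure Functions admin endpoint (`POST https://<app>.azurewebsites.net/admin/functions/<name>` with the app's master key). A small sketch of building that request; the function name `TrafficCollector` is hypothetical, so substitute the name used in `azure-function/`:

```python
def admin_invoke_request(func_app, function_name, master_key):
    """Build the URL and headers for manually triggering a timer function.

    Send as POST with body {} to run the function immediately.
    """
    url = f"https://{func_app}.azurewebsites.net/admin/functions/{function_name}"
    headers = {"x-functions-key": master_key, "Content-Type": "application/json"}
    return url, headers
```

The master key can be fetched with `az functionapp keys list -g <rg> -n <app> --query masterKey -o tsv`.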

Architecture:

architecture

Option 2: Local Script

Run manually or via scheduled task on your local machine.

Setup:

# 1. Create .env file in root directory
GITHUB_TOKEN=ghp_your_token_here
GITHUB_USERNAME=your_username

# 2. Run the script
cd src
.\github-traffic-metrics.ps1

Output: outputs/github-traffic-metrics.csv
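Based on the "views(clones) per repository" description above, the CSV might look like this (illustrative values, not real output):

```
Repository,2024-05-16,2024-05-17,2024-05-18
github-traffic-analytics,12(3),8(1),15(4)
other-repo,5(0),2(1),7(2)
```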

Linux users (requires the Microsoft package repository to be registered first):

sudo apt-get install -y powershell
pwsh ./src/github-traffic-metrics.ps1

Schedule (optional):

  • Windows: Task Scheduler
  • Linux/Mac: cron job
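On Linux/Mac, a crontab entry for a daily 07:00 run could look like this (the clone path is an example, adjust to yours):

```
# m h dom mon dow  command
0 7 * * * pwsh /home/me/github-traffic-analytics/src/github-traffic-metrics.ps1
```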

Power BI Usage (SAS + .pbit)

Use the template in powerbi/ to visualize the CSV with your own storage.

Generate a blob SAS (how to do it)


  • Scope: blob-level SAS, permissions: r (read only), set an expiry date; rotate with a brief overlap.

  • Desktop setup:

    • Open the .pbit → enter the SAS URL when prompted.
    • Data source credentials: choose Anonymous (the SAS is in the URL).
    • Refresh to apply all existing transform steps.
  • Publish to Service:

    • Dataset → Settings → Data source credentials → set to Anonymous for the blob domain.
    • Enable Scheduled refresh (ensure the SAS expiry is sufficient).

Common Use Cases

  • Overall insights across repositories: use the line chart with Date on the X-axis and Views/Clones as values; leave the Repository slicer empty to see totals.
  • Single repository focus: select one repo in the Repository slicer; visuals filter to that repo’s traffic only.
  • Specific time period: use the Date slicer (Between) to limit visuals to a range (e.g., last 7 days).
  • Example: analyze activity after a release: pick the repo in the slicer and set the Date slicer to the release week to measure impact.

Examples

Overall insights across repositories

PBI report

Traffic for two repos for a specific time period

selection

Analyze traffic after new release (use of time range and repo filter)

featurerelease

Cost Estimate

  • Azure Functions (Consumption): ~30s daily run (up to 2 minutes on first run with backfill) stays in the free grant → ~$0.
  • Storage (Blob, LRS, Hot): Few MB CSV + tiny egress → <$0.10/month.
  • Power BI: Desktop free; sharing in Service needs Pro/PPU (only if you publish).

At low volume the stack is effectively free; costs grow with higher frequency, larger blobs, or publishing to Power BI Service.

More information about pricing: Azure services and Power BI.

Roadmap

Planned enhancements to deepen traffic insights:

  • Unique Views & Clones: Track distinct visitors and unique IPs alongside total metrics to measure repeat engagement.
  • Referring Websites: Capture traffic sources (Twitter, Reddit, blogs, search engines) to measure campaign impact.
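Both enhancements map directly onto data GitHub already exposes: the views/clones payloads include a `uniques` field alongside `count`, and referrer data comes from the real `/traffic/popular/referrers` endpoint. A sketch in Python (function names illustrative):

```python
import json
import urllib.request

def fetch_referrers(owner, repo, token):
    """Top referring sites over the last 14 days, from GitHub's traffic API."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/traffic/popular/referrers",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Each entry looks like {"referrer": "...", "count": n, "uniques": m}
        return json.load(resp)

def top_referrers(referrers, limit=3):
    """Rank by total count and keep the top N as (site, count) pairs for the CSV."""
    ranked = sorted(referrers, key=lambda r: r["count"], reverse=True)
    return [(r["referrer"], r["count"]) for r in ranked[:limit]]
```

Like views and clones, referrer counts cover a rolling 14-day window, so they would need the same daily-append treatment to build history.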

The CSV-based approach ensures backward compatibility while unlocking richer insights.

Contributions

Contributions are always appreciated! Open an issue to report bugs or suggest features, or submit a PR to improve the solution.

About

Capture and visualize repository traffic beyond GitHub’s 14‑day window.
