# GitHub Repository Analyzer - Interactive Demo

This notebook demonstrates how to use the GitHub Repository Analyzer interactively in Jupyter.

## Setup

Before running this notebook:
1. Install dependencies: `pip install -r requirements.txt`
2. Add your GitHub token to a `.env` file as `GITHUB_TOKEN=<your token>`
3. Install Jupyter: `pip install jupyter ipywidgets`
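
Step 2 expects the token under the name `GITHUB_TOKEN`, which is what the notebook later reads with `os.getenv`. A minimal, stdlib-only sketch for checking that a `.env` file defines it before starting Jupyter. The helper name `env_file_has_key` is hypothetical, and this simplified parser only handles plain `KEY=value` lines, not everything python-dotenv supports:

```python
from pathlib import Path


def env_file_has_key(path, key="GITHUB_TOKEN"):
    """Return True if `path` has a non-empty `key=value` line (simplified parsing)."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export "
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, value = line.partition("=")
        # Require a separator, the right name, and a non-empty (unquoted) value
        if sep and name.strip() == key and value.strip().strip("'\""):
            return True
    return False
```

Calling `env_file_has_key(".env")` from the project root should return `True` once the token line is in place.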

In [1]:
# Import required libraries
import os
import sys
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dotenv import load_dotenv

# Add parent directory to the path
sys.path.append('..')

# Import the analyzer
from github_repository_analyzer import (
    GitHubRepositoryAnalyzer, analyze_github_trends, VisualizationEngine
)
# Set up plotting
%matplotlib inline
plt.style.use('seaborn-v0_8')
sns.set_palette('viridis')

Install them with: pip install scikit-learn


In [2]:
# Load environment variables and initialize analyzer
load_dotenv()
github_token = os.getenv('GITHUB_TOKEN')

if not github_token:
    print("Please set your GITHUB_TOKEN in a .env file")
else:
    print("GitHub token loaded successfully")
    analyzer = GitHubRepositoryAnalyzer(token=github_token)
    print("Analyzer initialized")

GitHub token loaded successfully
Analyzer initialized


## Example 1: Basic Repository Search and Analysis

In [9]:
import pandas as pd
# Build query
query = ['language:python']
# Execute search
print("Searching for Python ML repositories...")
repositories = analyzer.search_repositories(
    query=query,
    max_repos=150
)
# Process results
print(f"Found {len(repositories)} repositories")
if repositories:
    top_repos = sorted(repositories,
                      key=lambda x: x['stars'],
                      reverse=True)[:10]

    df_top = pd.DataFrame([{
        'Repository': repo['full_name'],
        'Stars': repo['stars'],
        'Forks': repo['forks'],
        'Language': repo.get('language', 'Unknown'),
        'Last Updated': repo['updated_at']
    } for repo in top_repos])

    print("\nTop 10 Repositories:")
    display(df_top)


Searching for Python ML repositories...
Searching for repositories with query: ['language:python']
Total repositories collected: 450
Found 150 repositories

Top 10 Repositories:


| | Repository | Stars | Forks | Language | Last Updated |
| --- | --- | --- | --- | --- | --- |
| 0 | EbookFoundation/free-programming-books | 358732 | 63489 | Python | 2025-06-04T10:25:03Z |
| 1 | public-apis/public-apis | 348964 | 36688 | Python | 2025-06-04T10:43:16Z |
| 2 | donnemartin/system-design-primer | 303795 | 50274 | Python | 2025-06-04T10:40:47Z |
| 3 | vinta/awesome-python | 245422 | 25748 | Python | 2025-06-04T10:12:54Z |
| 4 | TheAlgorithms/Python | 201064 | 46836 | Python | 2025-06-04T10:06:21Z |
| 5 | Significant-Gravitas/AutoGPT | 175858 | 45759 | Python | 2025-06-04T10:32:40Z |
| 6 | AUTOMATIC1111/stable-diffusion-webui | 153157 | 28503 | Python | 2025-06-04T10:39:23Z |
| 7 | huggingface/transformers | 145162 | 29214 | Python | 2025-06-04T10:25:15Z |
| 8 | ytdl-org/youtube-dl | 135944 | 10354 | Python | 2025-06-04T09:21:17Z |
| 9 | 521xueweihan/HelloGitHub | 115637 | 10322 | Python | 2025-06-04T10:25:57Z |
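
The `Stars` and `Forks` columns can be combined into a simple engagement measure. A minimal sketch, using three rows copied from the table above. The forks-to-stars ratio is an illustrative metric chosen here, not something the analyzer is known to compute:

```python
# Rows copied from the output table above: (repository, stars, forks)
rows = [
    ("EbookFoundation/free-programming-books", 358732, 63489),
    ("TheAlgorithms/Python", 201064, 46836),
    ("ytdl-org/youtube-dl", 135944, 10354),
]


def forks_per_star(stars, forks):
    """Forks divided by stars; 0.0 when a repository has no stars."""
    return forks / stars if stars else 0.0


# Rank repositories from most to least forked relative to their star count
ranked = sorted(rows, key=lambda r: forks_per_star(r[1], r[2]), reverse=True)
for name, stars, forks in ranked:
    print(f"{name}: {forks_per_star(stars, forks):.3f}")
```

A higher ratio may hint that people copy and modify a repository rather than just bookmark it, though that reading is only a rough heuristic.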


In [7]:
# Perform comprehensive analysis
print("Performing analysis...")

analysis_results = analyzer.analyze_all(
    perform_clustering=True,
    predict_trends=True
)

print("Analysis completed!")

# Display summary statistics
summary = analysis_results.get('summary', {})
print("\nSummary Statistics:")
for key, value in summary.items():
    label = key.replace('_', ' ').title()
    # Use thousands separators for numeric values only
    formatted = f"{value:,}" if isinstance(value, (int, float)) else value
    print(f"  • {label}: {formatted}")

Performing analysis...
Starting comprehensive analysis...
Analyzing programming languages...
Analyzing repository topics...
Performing topic clustering...
Generating trend predictions...
Analysis complete!
Analysis completed!

Summary Statistics:
  • Total Repositories: 300
  • Total Stars: 18,958,476
  • Total Forks: 3,204,314
  • Total Watchers: 18,958,476
  • Avg Stars Per Repo: 63,194.9
  • Avg Forks Per Repo: 10,681.0
  • Unique Languages: 1
  • Unique Topics: 573
  • Oldest Repository: 2010-02-16
  • Newest Repository: 2025-03-06
  • Language Diversity Ratio: 0.003
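
The derived figures in this summary can be checked by hand from the totals. The sketch below reproduces the two averages and, assuming the language diversity ratio is unique languages divided by total repositories (an assumption, since the notebook does not show the formula), the ratio as well:

```python
# Totals copied from the summary output above
total_repos = 300
total_stars = 18_958_476
total_forks = 3_204_314
unique_languages = 1

avg_stars = total_stars / total_repos
avg_forks = total_forks / total_repos
# Assumed definition: unique languages / total repositories
diversity_ratio = unique_languages / total_repos

print(f"Avg stars per repo: {avg_stars:,.1f}")              # 63,194.9
print(f"Avg forks per repo: {avg_forks:,.1f}")              # 10,681.0
print(f"Language diversity ratio: {diversity_ratio:.3f}")   # 0.003
```

Note that the summary counts 300 repositories while the search cell reported 150 found. The execution counters (`In [9]` for the search, `In [7]` for the analysis) suggest the cells were run out of order, so the analysis probably saw data from an earlier run.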


In [15]:
print(f"Repositories collected: {len(analyzer.repositories_data)}")
analyzer = analyze_github_trends(
    queries=query,
    token=github_token,
    time_window='last 6 months',
    max_repos_per_query=150,
)


Repositories collected: 450

Processing query: language:python
Searching for repositories with query: language:python
Time window: last 6 months
Total repositories collected: 0


ValueError: No repository data available. Run search_repositories() first.
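
The `ValueError` follows from the search returning zero repositories for this time window. One way to guard against that is to try progressively wider windows until a search returns something. A minimal sketch, where `search_fn` is a hypothetical stand-in for whatever performs the search (for example a small wrapper around `analyzer.search_repositories`), and the window strings are the ones used in Example 5:

```python
def first_non_empty(search_fn, windows):
    """Call search_fn(window) for each window in order; return the first non-empty result.

    Returns (window, results), or (None, []) if every window comes back empty.
    """
    for window in windows:
        results = search_fn(window)
        if results:
            return window, results
    return None, []


# Stand-in search function for demonstration only; it does not call GitHub.
def fake_search(window):
    canned = {"last 12 months": [{"full_name": "example/repo", "stars": 1}]}
    return canned.get(window, [])


window, repos = first_non_empty(fake_search, ["last 6 months", "last 12 months"])
print(window, len(repos))
```

Checking for an empty result before calling the analysis step keeps the failure message close to its cause instead of surfacing later as an exception.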

## Example 2: Language Analysis and Visualization

In [13]:
# Build a DataFrame from the language analysis results
languages = analysis_results.get('languages', [])
df_languages = pd.DataFrame(languages)

# Display top languages
print("Programming Languages Analysis:")
display(df_languages.head(10)[['language', 'count_stars', 'sum_stars', 'mean_stars', 'popularity_score']])

# Use the visualizer's function for plotting
from github_repository_analyzer.visualizer import VisualizationEngine

visualizer = VisualizationEngine(repositories_data=analyzer.repositories_data)
fig = visualizer.create_language_popularity_chart(df_languages, top_n=10)
fig.show()

Programming Languages Analysis:


| | language | count_stars | sum_stars | mean_stars | popularity_score |
| --- | --- | --- | --- | --- | --- |
| 0 | NaT | 300 | 18958476 | 63194.92 | 100.0 |
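
The `language` column shows `NaT` even though the search was restricted to Python, which suggests the language label was lost or coerced somewhere before display. A minimal sketch of dropping records with missing or placeholder language values before building the DataFrame. The set of placeholder strings and the helper name `has_valid_language` are assumptions, and the sample records are illustrative rather than taken from the run:

```python
import math

# Placeholder strings treated as "missing" (an assumption for this sketch)
MISSING_LABELS = {"", "nat", "nan", "none", "null", "unknown"}


def has_valid_language(record):
    """True if record['language'] is a real, non-placeholder value."""
    value = record.get("language")
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    # str() also covers objects such as pandas.NaT, whose string form is "NaT"
    return str(value).strip().lower() not in MISSING_LABELS


# Illustrative records only
records = [
    {"language": "Python", "count_stars": 300},
    {"language": None, "count_stars": 12},
    {"language": "NaT", "count_stars": 7},
]
clean = [r for r in records if has_valid_language(r)]
print(clean)
```

Filtering like this hides the symptom only; if every row comes back as `NaT`, the upstream step that produces the language labels is the place to look.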


## Example 3: Topic Analysis

In [14]:
# Analyze repository topics
from github_repository_analyzer.visualizer import VisualizationEngine

topics = analysis_results.get('topics', [])

if topics:
    df_topics = pd.DataFrame(topics)
    print("Repository Topics Analysis:")
    display(df_topics.head(15))

    visualizer = VisualizationEngine(repositories_data=analyzer.repositories_data)
    fig = visualizer.create_topic_distribution_chart(df_topics, top_n=20)
    fig.show()

Repository Topics Analysis:


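| | topic | count | percentage |
| --- | --- | --- | --- |
| 0 | python | 128 | 42.67 |
| 1 | deep-learning | 56 | 18.67 |
| 2 | machine-learning | 46 | 15.33 |
| 3 | ai | 36 | 12.0 |
| 4 | pytorch | 34 | 11.33 |
| 5 | llm | 30 | 10.0 |
| 6 | hacktoberfest | 26 | 8.67 |
| 7 | chatgpt | 22 | 7.33 |
| 8 | data-science | 20 | 6.67 |
| 9 | framework | 18 | 6.0 |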
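
Each percentage is the share of repositories carrying that topic, so topics overlap and the column does not sum to 100. With 300 repositories in the analysis, `python` at 128 gives 128 / 300 = 42.67%. A minimal sketch of this kind of counting with `collections.Counter`, on a small made-up sample. The `topics` key and the helper name `topic_distribution` are assumptions about the data shape, not the analyzer's confirmed internals:

```python
from collections import Counter


def topic_distribution(repos):
    """Count how many repositories list each topic, plus the share of repositories."""
    counts = Counter()
    for repo in repos:
        # set() so a topic repeated within one repository is counted once
        counts.update(set(repo.get("topics", [])))
    total = len(repos)
    return [
        {"topic": topic, "count": n, "percentage": round(100 * n / total, 2)}
        for topic, n in counts.most_common()
    ]


# Illustrative sample only, not data from the notebook run
sample = [
    {"topics": ["python", "ai"]},
    {"topics": ["python"]},
    {"topics": ["llm", "python"]},
    {"topics": []},
]
distribution = topic_distribution(sample)
print(distribution)

# First row of the table above: 128 of 300 repositories
print(round(100 * 128 / 300, 2))  # 42.67
```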


## Example 4: Trend Predictions

In [None]:
# Analyze trend predictions
predictions = analysis_results.get('predictions', {})

if predictions:
    print("Trend Predictions:")

    # Create DataFrame for predictions (optional, for tabular display)
    pred_data = []
    for language, pred in predictions.items():
        pred_data.append({
            'Language': language,
            'Current Score': pred['current_score'],
            'Growth Rate (%)': pred['growth_rate'],
            'Trend Direction': pred['trend_direction'],
            'Confidence (%)': pred['confidence']
        })

    df_predictions = pd.DataFrame(pred_data)
    df_predictions = df_predictions.sort_values('Growth Rate (%)', ascending=False)
    display(df_predictions)

    # Use the built-in visualization function
    from github_repository_analyzer.visualizer import VisualizationEngine
    visualizer = VisualizationEngine(repositories_data=analyzer.repositories_data)
    fig = visualizer.create_trend_prediction_chart(predictions)
    fig.show()

## Example 5: Interactive Time Window Analysis

In [None]:
# Create an interactive widget for time window selection
import ipywidgets as widgets
from IPython.display import display, clear_output

# Time window options
time_windows = [
    "last 12 months",
    "last 6 months", 
    "last month",
    "this month",
    "last two weeks",
    "last week",
    "today"
]

# Create widgets
query_widget = widgets.Text(
    value="language:javascript topic:react",
    description="Query:",
    style={'description_width': 'initial'}
)

time_widget = widgets.Dropdown(
    options=time_windows,
    value="last 6 months",
    description="Time Window:",
    style={'description_width': 'initial'}
)

max_repos_widget = widgets.IntSlider(
    value=100,
    min=50,
    max=500,
    step=50,
    description="Max Repos:",
    style={'description_width': 'initial'}
)

button = widgets.Button(
    description="Analyze",
    button_style='primary'
)

output = widgets.Output()

def analyze_button_click(b):
    with output:
        clear_output(wait=True)
        print(f"Analyzing: {query_widget.value}")
        print(f"Time window: {time_widget.value}")
        print(f"Max repositories: {max_repos_widget.value}")
        
        try:
            # Clear previous data
            analyzer.repositories_data = []
            
            # Search repositories
            repos = analyzer.search_repositories(
                query=query_widget.value,
                max_repos=max_repos_widget.value,
                time_window=time_widget.value
            )
            
            if repos:
                print(f"Found {len(repos)} repositories")
                
                # Quick analysis
                analysis = analyzer.analyze_all(
                    perform_clustering=False,
                    predict_trends=False
                )
                
                summary = analysis.get('summary', {})
                print(f"\nQuick Stats:")
                print(f"  • Total Stars: {summary.get('total_stars', 0):,}")
                print(f"  • Average Stars: {summary.get('total_stars', 0) / len(repos):.1f}")
                print(f"  • Unique Languages: {summary.get('unique_languages', 0)}")
                
                # Show top repositories
                top_5 = sorted(repos, key=lambda x: x['stars'], reverse=True)[:5]
                print(f"\nTop 5 Repositories:")
                for i, repo in enumerate(top_5, 1):
                    print(f"  {i}. {repo['full_name']} - {repo['stars']:,} stars")
            else:
                print("No repositories found")
                
        except Exception as e:
            print(f"Error: {e}")

button.on_click(analyze_button_click)

# Display widgets
print("🎛 Interactive Repository Analysis")
display(query_widget, time_widget, max_repos_widget, button, output)
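
The time window strings above have to be turned into a date filter at some point. How the analyzer does this is not shown here. The sketch below is one possible mapping, with approximate day counts as assumptions, that produces a GitHub search qualifier of the form `created:>=YYYY-MM-DD`:

```python
from datetime import date, timedelta

# Approximate lengths in days; these values are assumptions for the sketch
WINDOW_DAYS = {
    "last 12 months": 365,
    "last 6 months": 182,
    "last month": 30,
    "last two weeks": 14,
    "last week": 7,
    "today": 0,
}


def window_to_cutoff(window, today=None):
    """Return the earliest date covered by `window`."""
    today = today or date.today()
    if window == "this month":
        return today.replace(day=1)
    try:
        return today - timedelta(days=WINDOW_DAYS[window])
    except KeyError:
        raise ValueError(f"Unknown time window: {window!r}") from None


def with_created_filter(query, window, today=None):
    """Append a `created:>=` qualifier to a GitHub search query string."""
    cutoff = window_to_cutoff(window, today)
    return f"{query} created:>={cutoff.isoformat()}"


print(with_created_filter("language:javascript topic:react", "last week",
                          today=date(2025, 6, 4)))
# language:javascript topic:react created:>=2025-05-28
```

Filtering on `pushed:` instead of `created:` would select recently active repositories rather than recently created ones; which of the two the analyzer uses is not stated in this notebook.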

## Example 6: Export Results

In [None]:
# Generate comprehensive insights report
insights = analyzer.generate_insights_report()

print("Insights Report Generated:")
print("\nTop Findings:")
for i, finding in enumerate(insights.get('top_findings', []), 1):
    print(f"  {i}. {finding}")

print("\nRecommendations:")
for i, rec in enumerate(insights.get('recommendations', []), 1):
    print(f"  {i}. {rec}")

# Save all data
print("\nSaving analysis data...")
saved_files = analyzer.save_all_data(filename_prefix="notebook_analysis")

print("Files saved:")
for file_type, filename in saved_files.items():
    print(f"  • {file_type}: {filename}")

## Next Steps

- Try different search queries and time windows
- Explore the saved CSV and JSON files
- Modify the analysis parameters for your specific research needs
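
For exploring the saved files, a minimal stdlib sketch that loads a file based on its extension. The paths would come from the `saved_files` mapping returned by `save_all_data`. The helper name `load_saved` is hypothetical, and the sketch assumes plain UTF-8 CSV and JSON files:

```python
import csv
import json
from pathlib import Path


def load_saved(path):
    """Load a .json file as Python objects or a .csv file as a list of row dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
    if suffix == ".csv":
        # newline="" lets the csv module handle line endings itself
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

In the notebook this could be applied as `{kind: load_saved(name) for kind, name in saved_files.items()}`, provided every saved file is CSV or JSON. Values read from CSV arrive as strings and need converting before any arithmetic.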

Happy analyzing!