# GitHub Repository Analyzer - Interactive Demo

This notebook demonstrates how to use the GitHub Repository Analyzer interactively in Jupyter.

## Setup

Before running this notebook:
1. Install dependencies: `pip install -r requirements.txt`
2. Add your GitHub token to a `.env` file as `GITHUB_TOKEN=<your token>`
3. Install Jupyter: `pip install jupyter ipywidgets`
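
The `.env` file from step 2 is plain text with one `KEY=VALUE` pair per line. The sketch below is a simplified, standard-library stand-in for what `load_dotenv()` does, included only to illustrate the file format. `parse_env_text` is a hypothetical helper, not part of python-dotenv, and the token value is a placeholder; real `.env` parsing handles more cases (for example `export` prefixes and escaped characters).

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines (hypothetical helper, not python-dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


example_env = """
# .env (keep this file out of version control)
GITHUB_TOKEN="ghp_exampleexampleexample"
"""

settings = parse_env_text(example_env)
print(sorted(settings))  # print the variable names only, never the token
```

In the notebook itself, `load_dotenv()` followed by `os.getenv('GITHUB_TOKEN')` remains the intended way to read the token.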

In [2]:
# Import required libraries
import os
import sys
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dotenv import load_dotenv

import main

# Add parent directory to the path
sys.path.append('..')

# Import the analyzer
from github_repository_analyzer import (
    GitHubRepositoryAnalyzer, analyze_github_trends, VisualizationEngine
)
# Set up plotting
%matplotlib inline
plt.style.use('seaborn-v0_8')
sns.set_palette('viridis')

Install them with: pip install scikit-learn


In [3]:
# Load environment variables and initialize analyzer
load_dotenv()
github_token = os.getenv('GITHUB_TOKEN')

if not github_token:
    print("Please set your GITHUB_TOKEN in a .env file")
else:
    print("GitHub token loaded successfully")
    analyzer = GitHubRepositoryAnalyzer(token=github_token)
    print("Analyzer initialized")

GitHub token loaded successfully
Analyzer initialized
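
How `GitHubRepositoryAnalyzer` uses the token internally is not shown in this notebook. As a rough sketch, the GitHub REST API accepts a personal access token in the `Authorization` header, so a client could build its request headers as below. `build_github_headers` and `mask_token` are hypothetical helpers written for this example, not functions of the analyzer.

```python
def build_github_headers(token=None):
    """Return REST API request headers (hypothetical helper for illustration)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    # Unauthenticated requests are allowed but have much lower rate limits
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def mask_token(token, visible=4):
    """Hide all but the first few characters so the token is safe to log."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


headers = build_github_headers("ghp_abcdef")
print(sorted(headers))
print(mask_token("ghp_abcdef"))
```

Masking the token before printing keeps it out of saved notebook outputs.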


## Example 1: Basic Repository Search and Analysis

In [6]:
import pandas as pd

# Build the query: Python, R, and JavaScript repositories
query = ['language:python language:r language:javascript']

# Execute the search
print("Searching for Python, R, and JavaScript repositories...")
repositories = analyzer.search_repositories(
    query=query,
    max_repos=150
)

# Process results
print(f"Found {len(repositories)} repositories")
if repositories:
    top_repos = sorted(repositories,
                       key=lambda x: x['stars'],
                       reverse=True)[:10]

    df_top = pd.DataFrame([{
        'Repository': repo['full_name'],
        'Stars': repo['stars'],
        'Forks': repo['forks'],
        'Language': repo.get('language', 'Unknown'),
        'Last Updated': repo['updated_at']
    } for repo in top_repos])

    print("\nTop 10 Repositories:")
    display(df_top)


Searching for Python, R, and JavaScript repositories...
Searching for repositories with query: ['language:python language:r language:javascript']
Total repositories collected: 450
Found 150 repositories

Top 10 Repositories:


| | Repository | Stars | Forks | Language | Last Updated |
|---|---|---|---|---|---|
| 0 | EbookFoundation/free-programming-books | 358764 | 63490 | Python | 2025-06-04T19:42:11Z |
| 1 | public-apis/public-apis | 349023 | 36695 | Python | 2025-06-04T19:40:42Z |
| 2 | donnemartin/system-design-primer | 303930 | 50296 | Python | 2025-06-04T19:40:38Z |
| 3 | vinta/awesome-python | 245456 | 25751 | Python | 2025-06-04T19:00:11Z |
| 4 | facebook/react | 235984 | 48667 | JavaScript | 2025-06-04T19:44:33Z |
| 5 | TheAlgorithms/Python | 201080 | 46841 | Python | 2025-06-04T19:25:15Z |
| 6 | trekhleb/javascript-algorithms | 191411 | 30690 | JavaScript | 2025-06-04T19:42:06Z |
| 7 | Significant-Gravitas/AutoGPT | 175877 | 45761 | Python | 2025-06-04T19:18:29Z |
| 8 | AUTOMATIC1111/stable-diffusion-webui | 153164 | 28505 | Python | 2025-06-04T19:39:09Z |
| 9 | airbnb/javascript | 146818 | 26760 | JavaScript | 2025-06-04T14:59:48Z |
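
The notebook does not show how `search_repositories` talks to GitHub. As a sketch of the general approach, the GitHub REST search endpoint returns at most 100 results per page, so collecting `max_repos=150` takes two paged requests. `build_search_urls` is a hypothetical helper that only builds the URLs and makes no network calls.

```python
import math
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_urls(query, max_repos, per_page=100):
    """Return the paged search URLs needed to collect up to max_repos results."""
    pages = math.ceil(max_repos / per_page)
    urls = []
    for page in range(1, pages + 1):
        params = {
            "q": query,
            "sort": "stars",   # most-starred repositories first
            "order": "desc",
            "per_page": per_page,
            "page": page,
        }
        urls.append(f"{SEARCH_URL}?{urlencode(params)}")
    return urls


urls = build_search_urls("language:python", max_repos=150)
for url in urls:
    print(url)
```

Each URL would then be fetched with the authentication headers, and the `items` arrays of the responses concatenated and trimmed to `max_repos`.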


In [8]:
# Print the repository count
print(f"Repositories collected: {len(analyzer.repositories_data)}")

# Only perform analysis if repositories were found
if analyzer.repositories_data:
    # Perform the complete analysis
    analysis_results = analyzer.analyze_all()
    analyzer.create_visualizations()

    # Print summary
    analyzer.print_summary()
else:
    print("No repositories found. Check your query and token.")

Repositories collected: 450
Starting comprehensive analysis...
Analyzing programming languages...
Analyzing repository topics...
Performing topic clustering...
Generating trend predictions...
Analysis complete!
Creating visualizations...
Visualizations complete!

Executive Summary
Total Repositories Analyzed: 450
Total Stars: 33.1M
Total Forks: 5.8M
Average Stars Per Repo: 73.6K
Unique Languages: 2
Unique Topics: 742

Top Findings:
1. NaT is the most popular language with 334 repositories
2. 'python' is the most common topic appearing in 34.2% of repositories
3. Repository clustering revealed 10 distinct technology clusters

Recommendations:
1. Consider learning NaT - it's the most active language in the ecosystem
2. Focus on trending topics: python, javascript, deep-learning, machine-learning, hacktoberfest
3. Major technology cluster centers around: data, python, algorithms
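
The executive summary abbreviates large totals (33.1M stars, 73.6K average). A minimal sketch of that kind of formatting is shown below; `format_compact` is a hypothetical helper, and the analyzer's own formatting code may differ.

```python
def format_compact(value):
    """Format a number with a K or M suffix and one decimal place."""
    for threshold, suffix in ((1_000_000, "M"), (1_000, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    # Small values are printed unchanged
    return f"{value:,}"


print(format_compact(33_110_710))
print(format_compact(5_838_647))
print(format_compact(73_579.4))
print(format_compact(450))
```

Applied to the totals reported in this notebook, the helper reproduces the abbreviated figures in the summary.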


In [9]:
# Perform comprehensive analysis
print("Performing analysis...")

analysis_results = analyzer.analyze_all(
    perform_clustering=True,
    predict_trends=True
)

print("Analysis completed!")

# Display summary statistics
summary = analysis_results.get('summary', {})
print("\nSummary Statistics:")
for key, value in summary.items():
    label = key.replace('_', ' ').title()
    # Use thousands separators for numeric values only
    formatted = f"{value:,}" if isinstance(value, (int, float)) else str(value)
    print(f"  • {label}: {formatted}")

Performing analysis...
Starting comprehensive analysis...
Analyzing programming languages...
Analyzing repository topics...
Performing topic clustering...
Generating trend predictions...
Analysis complete!
Analysis completed!

Summary Statistics:
  • Total Repositories: 450
  • Total Stars: 33,110,710
  • Total Forks: 5,838,647
  • Total Watchers: 33,110,710
  • Avg Stars Per Repo: 73,579.4
  • Avg Forks Per Repo: 12,974.8
  • Unique Languages: 2
  • Unique Topics: 742
  • Oldest Repository: 2009-04-03
  • Newest Repository: 2025-03-06
  • Language Diversity Ratio: 0.004
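
The derived statistics above can be checked by hand from the totals. The sketch below assumes the language diversity ratio is unique languages divided by total repositories, which is consistent with the reported 0.004 but is not confirmed by the analyzer's source. Note also that total watchers equals total stars here, which is what one would expect if the value comes from the REST API's `watchers_count` field, since that field mirrors the star count.

```python
# Totals copied from the summary output above
total_repositories = 450
total_stars = 33_110_710
total_forks = 5_838_647
unique_languages = 2

avg_stars = total_stars / total_repositories
avg_forks = total_forks / total_repositories
# Assumption: diversity ratio = unique languages / total repositories
diversity_ratio = unique_languages / total_repositories

print(f"Avg stars per repo: {avg_stars:,.1f}")
print(f"Avg forks per repo: {avg_forks:,.1f}")
print(f"Language diversity ratio: {diversity_ratio:.3f}")
```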


## Example 2: Language Analysis and Visualization

In [10]:
# Load the per-language statistics into a DataFrame
languages = analysis_results.get('languages', [])
df_languages = pd.DataFrame(languages)

# Display top languages
print("Programming Languages Analysis:")
display(df_languages.head(10)[['language', 'count_stars', 'sum_stars', 'mean_stars', 'popularity_score']])

# Use the visualizer's function for plotting
from github_repository_analyzer.visualizer import VisualizationEngine

visualizer = VisualizationEngine(repositories_data=analyzer.repositories_data)
fig = visualizer.create_language_popularity_chart(df_languages, top_n=10)
fig.show()

Programming Languages Analysis:


| | language | count_stars | sum_stars | mean_stars | popularity_score |
|---|---|---|---|---|---|
| 0 | NaT | 334 | 24232996 | 72553.88 | 98.96 |
| 1 | NaT | 116 | 8877714 | 76532.02 | 48.7 |
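
The `count_stars`, `sum_stars`, and `mean_stars` columns are consistent with grouping repositories by language and aggregating their star counts (for example, 24,232,996 / 334 ≈ 72,553.88). The sketch below shows that aggregation with the standard library on made-up sample data. `language_stats` is a hypothetical helper, the `popularity_score` formula is not shown in the notebook and is omitted, and mapping a missing language to `"Unknown"` is an assumption that keeps null-like values out of the group labels.

```python
from collections import defaultdict


def language_stats(repositories):
    """Group repositories by language and aggregate their star counts."""
    stars_by_language = defaultdict(list)
    for repo in repositories:
        # Assumption: repositories without a language are labeled "Unknown"
        language = repo.get("language") or "Unknown"
        stars_by_language[language].append(repo["stars"])

    rows = [
        {
            "language": language,
            "count_stars": len(stars),
            "sum_stars": sum(stars),
            "mean_stars": round(sum(stars) / len(stars), 2),
        }
        for language, stars in stars_by_language.items()
    ]
    # Languages with the most total stars first
    return sorted(rows, key=lambda row: row["sum_stars"], reverse=True)


# Made-up sample data for illustration only
sample = [
    {"full_name": "example/a", "language": "Python", "stars": 300},
    {"full_name": "example/b", "language": "Python", "stars": 100},
    {"full_name": "example/c", "language": "JavaScript", "stars": 250},
    {"full_name": "example/d", "language": None, "stars": 50},
]
for row in language_stats(sample):
    print(row)
```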


## Example 3: Topic Analysis

In [11]:
# Analyze repository topics
from github_repository_analyzer.visualizer import VisualizationEngine

topics = analysis_results.get('topics', [])

if topics:
    df_topics = pd.DataFrame(topics)
    print("Repository Topics Analysis:")
    display(df_topics.head(15))

    visualizer = VisualizationEngine(repositories_data=analyzer.repositories_data)
    fig = visualizer.create_topic_distribution_chart(df_topics, top_n=20)
    fig.show()

Repository Topics Analysis:


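| | topic | count | percentage |
|---|---|---|---|
| 0 | python | 154 | 34.22 |
| 1 | javascript | 65 | 14.44 |
| 2 | deep-learning | 60 | 13.33 |
| 3 | machine-learning | 53 | 11.78 |
| 4 | hacktoberfest | 41 | 9.11 |
| 5 | ai | 40 | 8.89 |
| 6 | pytorch | 39 | 8.67 |
| 7 | llm | 37 | 8.22 |
| 8 | chatgpt | 33 | 7.33 |
| 9 | openai | 28 | 6.22 |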
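
The `percentage` column matches the share of repositories carrying each topic (154 / 450 ≈ 34.22%). A minimal sketch of that calculation follows, using made-up sample data. `topic_distribution` is a hypothetical helper, and the `topics` key is an assumed field name for each repository's topic list.

```python
from collections import Counter


def topic_distribution(repositories):
    """Count how many repositories carry each topic and compute the share."""
    total = len(repositories)
    counts = Counter(
        topic
        for repo in repositories
        # set() ensures a topic is counted once per repository
        for topic in set(repo.get("topics") or [])
    )
    # Most frequent first; ties are broken alphabetically for stable output
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"topic": topic, "count": count, "percentage": round(100 * count / total, 2)}
        for topic, count in ranked
    ]


# Made-up sample data for illustration only
sample = [
    {"full_name": "example/a", "topics": ["python", "ai"]},
    {"full_name": "example/b", "topics": ["python", "ai"]},
    {"full_name": "example/c", "topics": ["python", "javascript"]},
    {"full_name": "example/d", "topics": []},
]
for row in topic_distribution(sample):
    print(row)
```

Because a repository can have several topics, the percentages are not expected to sum to 100.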


## Example 4: Trend Predictions

In [12]:
# Analyze trend predictions
predictions = analysis_results.get('predictions', {})

if predictions:
    print("Trend Predictions:")

    # Create DataFrame for predictions (optional, for tabular display)
    pred_data = []
    for language, pred in predictions.items():
        pred_data.append({
            'Language': language,
            'Current Score': pred['current_score'],
            'Growth Rate (%)': pred['growth_rate'],
            'Trend Direction': pred['trend_direction'],
            'Confidence (%)': pred['confidence']
        })

    df_predictions = pd.DataFrame(pred_data)
    df_predictions = df_predictions.sort_values('Growth Rate (%)', ascending=False)
    display(df_predictions)

    # Use the inbuilt visualization function
    from github_repository_analyzer.visualizer import VisualizationEngine
    visualizer = VisualizationEngine(repositories_data=analyzer.repositories_data)
    fig = visualizer.create_trend_prediction_chart(predictions)
    fig.show()

Trend Predictions:


| | Language | Current Score | Growth Rate (%) | Trend Direction | Confidence (%) |
|---|---|---|---|---|---|
| 0 | NaT | 48.7 | -0.24 | down | 99.51796 |
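
The notebook does not show how the analyzer computes `growth_rate` or `trend_direction`. Purely as an illustration of one common technique, the sketch below fits a least-squares line to a short series of scores and reports the slope as a percentage of the first score. The function names, the input series, and the definition of growth rate are all assumptions made for this example and should not be read as the analyzer's method.

```python
def linear_trend(scores):
    """Least-squares slope of scores against their index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def describe_trend(scores):
    """Summarize a score series as a growth rate and a direction label."""
    slope = linear_trend(scores)
    # Assumption: growth rate = slope per period, relative to the first score
    growth_rate = 100 * slope / scores[0] if scores[0] else 0.0
    direction = "up" if slope > 0 else "down" if slope < 0 else "flat"
    return {"growth_rate": round(growth_rate, 2), "trend_direction": direction}


# Made-up score history for illustration only
print(describe_trend([50.0, 49.5, 49.0, 48.5]))
```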


## Example 5: Interactive Time Window Analysis

In [13]:
# Create an interactive widget for time window selection
import ipywidgets as widgets
from IPython.display import display, clear_output

# Time window options
time_windows = [
    "last 12 months",
    "last 6 months", 
    "last month",
    "this month",
    "last two weeks",
    "last week",
    "today"
]

# Create widgets
query_widget = widgets.Text(
    value="language:javascript topic:react",
    description="Query:",
    style={'description_width': 'initial'}
)

time_widget = widgets.Dropdown(
    options=time_windows,
    value="last 6 months",
    description="Time Window:",
    style={'description_width': 'initial'}
)

max_repos_widget = widgets.IntSlider(
    value=100,
    min=50,
    max=500,
    step=50,
    description="Max Repos:",
    style={'description_width': 'initial'}
)

button = widgets.Button(
    description="Analyze",
    button_style='primary'
)

output = widgets.Output()

def analyze_button_click(b):
    with output:
        clear_output(wait=True)
        print(f"Analyzing: {query_widget.value}")
        print(f"Time window: {time_widget.value}")
        print(f"Max repositories: {max_repos_widget.value}")
        
        try:
            # Clear previous data
            analyzer.repositories_data = []
            
            # Search repositories
            repos = analyzer.search_repositories(
                query=query_widget.value,
                max_repos=max_repos_widget.value,
                time_window=time_widget.value
            )
            
            if repos:
                print(f"Found {len(repos)} repositories")
                
                # Quick analysis
                analysis = analyzer.analyze_all(
                    perform_clustering=False,
                    predict_trends=False
                )
                
                summary = analysis.get('summary', {})
                print(f"\nQuick Stats:")
                print(f"  • Total Stars: {summary.get('total_stars', 0):,}")
                print(f"  • Average Stars: {summary.get('total_stars', 0) / len(repos):.1f}")
                print(f"  • Unique Languages: {summary.get('unique_languages', 0)}")
                
                # Show top repositories
                top_5 = sorted(repos, key=lambda x: x['stars'], reverse=True)[:5]
                print(f"\nTop 5 Repositories:")
                for i, repo in enumerate(top_5, 1):
                    print(f"  {i}. {repo['full_name']} - {repo['stars']:,} stars")
            else:
                print("No repositories found")
                
        except Exception as e:
            print(f"Error: {e}")

button.on_click(analyze_button_click)

# Display widgets
print("🎛 Interactive Repository Analysis")
display(query_widget, time_widget, max_repos_widget, button, output)

🎛 Interactive Repository Analysis


Text(value='language:javascript topic:react', description='Query:', style=TextStyle(description_width='initial…

Dropdown(description='Time Window:', index=1, options=('last 12 months', 'last 6 months', 'last month', 'this …

IntSlider(value=100, description='Max Repos:', max=500, min=50, step=50, style=SliderStyle(description_width='…

Button(button_style='primary', description='Analyze', style=ButtonStyle())

Output()
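
How the analyzer interprets the `time_window` strings is not shown here. One plausible reading is that each window becomes a date cutoff appended to the search query as a `created:>=YYYY-MM-DD` qualifier, which GitHub search supports. The sketch below takes that approach. The helper names, the approximate day counts, and the choice of `created` rather than `pushed` are assumptions.

```python
from datetime import date, timedelta

# Approximate day counts; calendar-accurate month arithmetic is out of scope
WINDOW_DAYS = {
    "last 12 months": 365,
    "last 6 months": 182,
    "last month": 30,
    "last two weeks": 14,
    "last week": 7,
    "today": 0,
}


def window_start(time_window, today=None):
    """Return the first date covered by the named time window."""
    today = today or date.today()
    if time_window == "this month":
        return today.replace(day=1)
    if time_window not in WINDOW_DAYS:
        raise ValueError(f"unknown time window: {time_window!r}")
    return today - timedelta(days=WINDOW_DAYS[time_window])


def with_time_window(query, time_window, today=None):
    """Append a creation-date qualifier to a search query."""
    start = window_start(time_window, today)
    return f"{query} created:>={start.isoformat()}"


print(with_time_window("language:javascript topic:react", "last week",
                       today=date(2025, 6, 4)))
```

Passing `today` explicitly makes the result reproducible; in interactive use it can be left out to default to the current date.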

## Example 6: Export Results

In [14]:
# Generate comprehensive insights report
insights = analyzer.generate_insights_report()

print("Insights Report Generated:")
print("\nTop Findings:")
for i, finding in enumerate(insights.get('top_findings', []), 1):
    print(f"  {i}. {finding}")

print("\nRecommendations:")
for i, rec in enumerate(insights.get('recommendations', []), 1):
    print(f"  {i}. {rec}")

# Save all data
print("\nSaving analysis data...")
saved_files = analyzer.save_all_data(filename_prefix="notebook_analysis")

print("Files saved:")
for file_type, filename in saved_files.items():
    print(f"  • {file_type}: {filename}")

Insights Report Generated:

Top Findings:

Recommendations:

Saving analysis data...


AttributeError: type object 'FileManager' has no attribute 'save_to_json'
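
The traceback above shows `save_all_data` failing inside the library, so no files were written in this run. Until that is fixed, the collected data can be exported with the standard library. The sketch below assumes `analyzer.repositories_data` is a list of dictionaries, as the earlier cells suggest; `export_repositories` is a hypothetical helper and the sample record is made up.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_repositories(repositories, prefix, directory="."):
    """Write a list of repository dicts to <prefix>.json and <prefix>.csv."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / f"{prefix}.json"
    csv_path = out_dir / f"{prefix}.csv"

    # default=str covers values such as datetimes that json cannot encode
    json_path.write_text(
        json.dumps(repositories, indent=2, default=str), encoding="utf-8"
    )

    # Use the union of all keys, in first-seen order, as the CSV header
    fieldnames = list(dict.fromkeys(key for repo in repositories for key in repo))
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(repositories)

    return {"json": str(json_path), "csv": str(csv_path)}


# Made-up sample record, written to a temporary directory for illustration
sample = [{"full_name": "example/a", "stars": 3, "topics": ["python"]}]
with tempfile.TemporaryDirectory() as tmp:
    saved = export_repositories(sample, "notebook_analysis", directory=tmp)
    print(sorted(saved))
```

In the notebook, calling `export_repositories(analyzer.repositories_data, "notebook_analysis")` would write both files to the working directory.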

## Next Steps

- Try different search queries and time windows
- Explore the saved CSV and JSON files (the export cell in Example 6 must first run without the `AttributeError`)
- Modify the analysis parameters for your specific research needs

Happy analyzing!