# GitHub Repository Analyzer for Azure Solutions

This notebook analyzes GitHub repositories containing Azure solutions and provides insights about their capabilities, deployment methods, and differentiation from other solutions.

## Setup

First, let's set up the environment and load our dependencies:

In [1]:
# Import necessary libraries
import os
import json
import pandas as pd
from dotenv import load_dotenv
from IPython.display import display, Markdown, HTML

# Load environment variables from .env file
load_dotenv()

# Import our helper modules
from utils.github_analyzer import GitHubRepoAnalyzer
from utils.openai_helper import AzureOpenAIAnalyzer

## Check Azure OpenAI Configuration

Let's make sure our Azure OpenAI configuration is properly set up:

In [2]:
# Check if Azure OpenAI credentials are set up
required_vars = {
    "AZURE_OPENAI_ENDPOINT": os.getenv("AZURE_OPENAI_ENDPOINT"),
    "AZURE_OPENAI_KEY": os.getenv("AZURE_OPENAI_KEY"),
    "AZURE_OPENAI_DEPLOYMENT": os.getenv("AZURE_OPENAI_DEPLOYMENT")
}

missing_vars = [var for var, value in required_vars.items() if not value]

if missing_vars:
    print(f"⚠️ Missing environment variables: {', '.join(missing_vars)}")
    print("Please create a .env file with the required Azure OpenAI credentials.")
else:
    print("✅ Azure OpenAI credentials configured successfully!")
    
# Check GitHub token (optional)
if os.getenv("GITHUB_TOKEN"):
    print("✅ GitHub token found - higher rate limits will be available")
else:
    print("ℹ️ No GitHub token found - using anonymous access (rate limited)")

✅ Azure OpenAI credentials configured successfully!
✅ GitHub token found - higher rate limits will be available
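
The check above only reports whether `GITHUB_TOKEN` is set; how the token is used lives inside `utils.github_analyzer`, which is not shown here. As a rough sketch, assuming the helper talks to the GitHub REST API directly, an authenticated request would carry headers along these lines. The function name `build_github_headers` and the placeholder token are hypothetical and not part of the notebook's helper modules.

```python
import os

def build_github_headers(token=None):
    """Return HTTP headers for a GitHub REST API request (hypothetical helper)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Sending a token is what lifts the request out of the anonymous rate limit
        headers["Authorization"] = f"Bearer {token}"
    return headers

# Anonymous access: no Authorization header is sent
anonymous_headers = build_github_headers()

# Authenticated access: falls back to a placeholder token for illustration only
token_headers = build_github_headers(os.getenv("GITHUB_TOKEN") or "example-token")
print(sorted(anonymous_headers), sorted(token_headers))
```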


## GitHub Repository Analysis

Now let's define two functions: one to analyze a GitHub repository and extract the required information, and one to format the results for display:

In [3]:
def analyze_github_repository(repo_url, verbose=False):
    """Analyze a GitHub repository; return the analysis dict, or None on failure"""
    print(f"Analyzing repository: {repo_url}")
    
    # Step 1: Initialize analyzers
    github_analyzer = GitHubRepoAnalyzer()
    openai_analyzer = AzureOpenAIAnalyzer()
    
    # Step 2: Extract repository information
    print("Extracting repository information...")
    try:
        repo_data = github_analyzer.extract_repo_info(repo_url)
        
        if verbose:
            print("Repository information: \n", json.dumps(repo_data, indent=2))

    except Exception as e:
        print(f"Error extracting repository data: {str(e)}")
        return None
    
    # Step 3: Analyze with Azure OpenAI
    print("Analyzing repository with Azure OpenAI...")
    try:
        analysis_results = openai_analyzer.analyze_repo(repo_data)
    except Exception as e:
        print(f"Error during OpenAI analysis: {str(e)}")
        return None
    
    # Step 4: Return the raw results (format_analysis_results formats them for display)
    print("Analysis complete!")
    return analysis_results

def format_analysis_results(repo_url, results):
    """Format analysis results into a structured dictionary"""
    def answer(key, default='Unknown'):
        # Each field is a dict with 'answer', 'explanation', and 'confidence' keys
        return results.get(key, {}).get('answer', default)

    def answer_with_confidence(key):
        # The analyzer output reports confidence under 'confidence', not 'confidence_level'
        confidence = results.get(key, {}).get('confidence', 'low')
        return f"{answer(key)} ({confidence})"

    formatted = {
        "Repository URL": repo_url,
        "RBAC Enabled?": answer_with_confidence('rbac_enabled'),
        "Azure Gov Ready?": answer_with_confidence('azure_gov_ready'),
        "Azure Secret Ready": answer_with_confidence('azure_secret_ready'),
        "Chat History?": answer_with_confidence('chat_history'),
        "# of Azure Services?": answer_with_confidence('azure_services_count'),
        "Arch Diagram": answer_with_confidence('architecture_diagram_present'),
        "Differentiators from AskSage": answer('differentiators_from_asksage'),
        "Differentiators from NIPRGPT": answer('differentiators_from_niprgpt'),
        "Differentiators from CamoGPT": answer('differentiators_from_camogpt'),
        "Deployment Method": answer_with_confidence('deployment_method'),
        "Costs Estimate": answer_with_confidence('costs_estimate'),
        "Notes": answer('notes', '')
    }

    return formatted
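
Each entry in the analyzer's results is a dictionary; in the batch output later in this notebook the keys are `answer`, `explanation`, and `confidence`. The sketch below shows how a single field can be rendered as `answer (confidence)` while tolerating either `confidence` or `confidence_level` as the key name. `summarize_field` is a hypothetical helper, and the sample data copies one field from that output.

```python
def summarize_field(results, key, default="Unknown"):
    """Render one analysis field as 'answer (confidence)' (hypothetical helper)."""
    entry = results.get(key) or {}
    answer = entry.get("answer", default)
    # Accept either key name; fall back to 'low' when neither is present
    confidence = entry.get("confidence", entry.get("confidence_level", "low"))
    return f"{answer} ({confidence})"

sample_results = {
    "chat_history": {
        "answer": "Yes",
        "explanation": "The solution uses Azure Cosmos DB to store chat history.",
        "confidence": "high",
    },
}

print(summarize_field(sample_results, "chat_history"))  # field present
print(summarize_field(sample_results, "rbac_enabled"))  # field missing, defaults apply
```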

## Analyze a Single Repository

Let's analyze a single GitHub repository and display the results:

In [4]:
# Enter the GitHub repository URL to analyze
# repo_url = input("Enter the GitHub repository URL to analyze: ")
repo_url = "https://github.com/microsoft/azurechat" # Example URL for testing

In [5]:
# # Analyze the repository
# analysis_result = analyze_github_repository(repo_url)

# # Display results as a DataFrame for better visibility
# if analysis_result:
#     # Format only after confirming the analysis succeeded (it returns None on failure)
#     analysis_result_formatted = format_analysis_results(repo_url, analysis_result)
#     # Convert to a one-column DataFrame indexed by field name
#     df = pd.Series(analysis_result_formatted, name="Value").to_frame()
#     display(HTML(df.to_html()))
    
#     # Also save as CSV
#     repo_name = repo_url.rstrip('/').split('/')[-1]
#     filename = f"data/{repo_name}_analysis.csv"
#     df.to_csv(filename)
#     print(f"Analysis saved to {filename}")
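
The notebook derives the repository name with `rstrip('/').split('/')[-1]`, which works for plain repository URLs but returns the last path segment for deeper links such as a `/tree/main/docs` URL. A slightly more defensive sketch, using only the standard library, is shown below. `parse_repo_url` is a hypothetical helper, not part of `utils.github_analyzer`.

```python
from urllib.parse import urlparse

def parse_repo_url(repo_url):
    """Split a GitHub URL into (owner, repo); raise ValueError if it does not fit."""
    parsed = urlparse(repo_url.strip())
    # Keep only non-empty path segments, so trailing slashes are harmless
    parts = [part for part in parsed.path.split("/") if part]
    if parsed.netloc.lower() not in ("github.com", "www.github.com") or len(parts) < 2:
        raise ValueError(f"Not a GitHub repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # drop the clone-URL suffix
    return owner, repo

print(parse_repo_url("https://github.com/microsoft/azurechat"))
print(parse_repo_url("https://github.com/microsoft/azurechat/tree/main/docs"))
```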

## Batch Analysis

Now let's create functionality to analyze multiple repositories at once and compare them:

In [6]:
def analyze_multiple_repositories(repo_urls):
    """Analyze multiple GitHub repositories and return a combined DataFrame"""
    results = []
    
    for url in repo_urls:
        print(f"\n{'='*50}\nAnalyzing: {url}\n{'='*50}")
        result = analyze_github_repository(url)
        if result:
            # Extract repository name
            repo_name = url.rstrip('/').split('/')[-1]
            result_with_name = {"Repository Name": repo_name, **result}
            results.append(result_with_name)
    
    # Convert results to DataFrame
    if results:
        # Create DataFrame and set Repository Name as index
        df = pd.DataFrame(results)
        df.set_index("Repository Name", inplace=True)
        
        # Save combined results to CSV (create the output directory if it is missing)
        os.makedirs("data", exist_ok=True)
        df.to_csv("data/repository_comparison.csv")
        print(f"\nAnalysis of {len(results)} repositories saved to data/repository_comparison.csv")
        
        return df
    else:
        print("No valid analysis results")
        return None
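
The batch function above skips repositories whose analysis returns `None`, so a failed URL simply does not appear in the DataFrame. One way to make failures visible is to collect them alongside the successes, as in the sketch below. `run_batch` and `fake_analyze` are hypothetical names; in the notebook, `analyze_github_repository` would be passed in place of the stand-in.

```python
def run_batch(repo_urls, analyze):
    """Run `analyze` on each URL; return (results_by_url, failed_urls)."""
    results, failed = {}, []
    for url in repo_urls:
        try:
            result = analyze(url)
        except Exception as exc:  # keep going if one repository raises
            print(f"Failed to analyze {url}: {exc}")
            result = None
        if result:
            results[url] = result
        else:
            failed.append(url)
    return results, failed

# Stand-in analyzer for illustration only: succeeds for one URL, fails for the other
def fake_analyze(url):
    return {"notes": {"answer": "ok"}} if url.endswith("azurechat") else None

results, failed = run_batch(
    ["https://github.com/microsoft/azurechat", "https://github.com/microsoft/simplechat"],
    fake_analyze,
)
print(f"{len(results)} succeeded, {len(failed)} failed: {failed}")
```

Returning the failed URLs lets the caller retry them or report them next to the comparison table.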

In [10]:
# Example usage for batch analysis
# Uncomment and modify the list of URLs as needed

repo_urls = [
"https://github.com/microsoft/azurechat",
# "https://github.com/microsoft/PubSec-Info-Assistant",
# "https://github.com/Azure-Samples/azure-search-openai-demo",
# "https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator",
# "https://github.com/microsoft/simplechat",
# "https://github.com/Azure-Samples/azure-ai-vercel-rag-starter",
# "https://github.com/Azure-Samples/azure-search-openai-javascript",
# "https://github.com/Azure-Samples/serverless-chat-langchainjs",
# "https://github.com/usri/azuregov-search-knowledge-mining"
]
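comparison_df = analyze_multiple_repositories(repo_urls)
# analyze_multiple_repositories returns None when no repository could be analyzed
if comparison_df is not None:
    display(HTML(comparison_df.to_html()))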


Analyzing: https://github.com/microsoft/azurechat
Analyzing repository: https://github.com/microsoft/azurechat
Extracting repository information...
Analyzing repository with Azure OpenAI...
Analysis complete!

Analysis of 1 repositories saved to data/repository_comparison.csv


Repository Name: azurechat

| Field | Answer | Explanation | Confidence |
| --- | --- | --- | --- |
| rbac_enabled | No | The solution does not explicitly mention RBAC within the application for uploading data or other operations. | high |
| azure_gov_ready | No | The solution does not mention specific readiness for Azure Government Cloud. | high |
| azure_secret_ready | No | The solution does not mention specific readiness for Azure Government SECRET or TOP SECRET Cloud. | high |
| chat_history | Yes | The solution uses Azure Cosmos DB to store chat history. | high |
| azure_services_count | 11 | The solution uses Azure OpenAI, Azure Cosmos DB, Azure App Service, Azure AI Document Intelligence, Azure AI Search, Azure AI Speech, Azure Key Vault, Azure Blob Storage, Azure Monitor, Azure Developer CLI, and Azure Private Endpoints. | high |
| architecture_diagram_present | Yes | The solution includes a high-level architecture diagram in the documentation. | high |
| differentiators_from_asksage | It is not deployed in Azure Government Cloud; It enables the use of organization data; It does not offer tool usage | AzureChat is not specifically mentioned as being deployed in Azure Government Cloud, it allows the use of organizational data, and does not offer tool usage. | high |
| differentiators_from_niprgpt | It is not Government Owned; It enables agents; It offers external APIs; It offers tool usage | AzureChat is not government-owned, enables agents, offers external APIs, and offers tool usage. | high |
| differentiators_from_camogpt | It is not Government Owned; It enables agents; It enables the use of organization data; It offers tool usage | AzureChat is not government-owned, enables agents, enables the use of organizational data, and offers tool usage. | high |
| differentiators_from_aiflow | It is not Government Owned; It is not deployed in Azure Government Cloud | AzureChat is not government-owned and is not specifically mentioned as being deployed in Azure Government Cloud. | high |
| deployment_method | Azure Developer CLI and GitHub Actions | The solution can be deployed using Azure Developer CLI or GitHub Actions. | high |
| costs_estimate | Variable | The cost depends on the Azure services used and their usage. The documentation provides a link to the Azure pricing calculator for an estimate. | high |
| notes | The solution provides a comprehensive set of features for deploying a private chat tenant in Azure, including managed identity-based security, support for private endpoints, and ESLZ compliant deployment. | The solution is well-documented and includes various deployment options and security features. | high |


In [11]:
# Interactively input multiple repositories
def input_multiple_repos():
    print("Enter GitHub repository URLs (one per line). Enter a blank line when finished:")
    urls = []
    while True:
        url = input()
        if not url:
            break
        urls.append(url)
    return urls

# Uncomment to use
# repo_urls = input_multiple_repos()
# if repo_urls:
#     comparison_df = analyze_multiple_repositories(repo_urls)
#     if comparison_df is not None:
#         display(HTML(comparison_df.to_html()))
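
`input_multiple_repos` stops at the first empty line but otherwise accepts whatever is typed, so a whitespace-only line, a duplicate, or a non-GitHub URL would be passed on to the analyzer. The sketch below shows one possible clean-up step to run on the returned list. `clean_repo_urls` is a hypothetical helper, and the `https://github.com/` prefix check is an assumption about which URLs the analyzer accepts.

```python
def clean_repo_urls(lines):
    """Strip whitespace, drop blanks and non-GitHub entries, remove duplicates in order."""
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip().rstrip("/")
        if not url.startswith("https://github.com/"):
            continue  # also skips empty and whitespace-only lines
        key = url.lower()  # compare case-insensitively to catch repeated entries
        if key not in seen:
            seen.add(key)
            cleaned.append(url)
    return cleaned

raw_lines = [
    "https://github.com/microsoft/azurechat",
    "  https://github.com/microsoft/azurechat/  ",
    "   ",
    "https://example.com/not-a-repo",
    "https://github.com/microsoft/simplechat",
]
print(clean_repo_urls(raw_lines))
```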

In [15]:
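# Extract the "answer" value from each dict cell and convert "Yes"/"No" to True/False
def extract_answer(cell):
    answer = cell.get('answer') if isinstance(cell, dict) else cell
    # Only strings are looked up, because list answers are unhashable
    return {"Yes": True, "No": False}.get(answer, answer) if isinstance(answer, str) else answer

# Column-wise Series.map avoids the deprecated DataFrame.applymap
answers_df = comparison_df.apply(lambda col: col.map(extract_answer))

# Save the answers DataFrame to a CSV file, keeping the Repository Name index
answers_df.to_csv("data/formatted_repo_comparison.csv")

# Display the resulting dataframe (must be the last expression in the cell)
answers_df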
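
Some answers are lists (the differentiator fields), and a list written to CSV ends up as its Python representation. The following sketch, which uses only the standard library and sample values copied from the output above, flattens one repository's results into a single row and joins list answers with `; ` before writing. `flatten_answers` is a hypothetical helper, and the in-memory buffer stands in for a file under `data/`.

```python
import csv
import io

def flatten_answers(repo_name, results):
    """Build one flat CSV row: the repo name plus the 'answer' of each field."""
    row = {"Repository Name": repo_name}
    for key, entry in results.items():
        answer = entry.get("answer") if isinstance(entry, dict) else entry
        if isinstance(answer, list):
            answer = "; ".join(str(item) for item in answer)
        row[key] = answer
    return row

sample = {
    "chat_history": {"answer": "Yes", "confidence": "high"},
    "azure_services_count": {"answer": 11, "confidence": "high"},
    "differentiators_from_aiflow": {
        "answer": ["It is not Government Owned", "It is not deployed in Azure Government Cloud"],
        "confidence": "high",
    },
}
row = flatten_answers("azurechat", sample)

buffer = io.StringIO()  # in-memory stand-in for an output file
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```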



## Conclusion

This notebook demonstrates how to analyze GitHub repositories containing Azure solutions and extract specific information about their features, capabilities, and deployment methods. The analysis is powered by Azure OpenAI and provides insights into important aspects like RBAC support, Azure Government compatibility, architecture diagrams, deployment methods, and cost estimates.

To use this notebook:
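1. Ensure your Azure OpenAI credentials (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, `AZURE_OPENAI_DEPLOYMENT`) are set up in a `.env` file; `GITHUB_TOKEN` is optional but raises GitHub rate limits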
2. Enter a GitHub repository URL to analyze
3. Review the analysis results
4. Optionally use the batch analysis feature to compare multiple repositories

The analysis results can help you make informed decisions about which Azure solution best meets your requirements.