# 📊 Benchmarking Analysis: Performance & Cost Evaluation

This notebook provides a comprehensive analysis of KubeSentiment's benchmarking framework, including performance metrics, cost analysis, and infrastructure comparisons.

## 🎯 Learning Objectives

By the end of this notebook, you will:
1. Understand the benchmarking framework architecture
2. Analyze performance metrics across different instance types
3. Perform cost-benefit analysis for infrastructure choices
4. Compare CPU vs GPU performance characteristics
5. Generate performance reports and visualizations
6. Understand scaling patterns and optimization opportunities

## 📦 Setup and Dependencies

First, let's install the required dependencies and set up our environment.

In [None]:
# Install required packages into the active kernel's environment
# Note: This cell might take a few minutes to run
%pip install -r ../requirements.txt

### ✅ Version Check
Let's check the versions of the installed libraries to ensure our environment is reproducible.
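For a more targeted check than the full package list, a minimal sketch like the following reports only the libraries this notebook relies on. The package list is an assumption based on the imports used later (pandas, numpy, matplotlib, seaborn, PyYAML), and `report_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package list, taken from the imports used later in this notebook
PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "PyYAML"]

def report_versions(packages):
    """Return {package: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions

for name, ver in report_versions(PACKAGES).items():
    print(f"{name:<12} {ver}")
```

Pinning the reported versions in `requirements.txt` is what makes the environment reproducible; the listing itself only records it.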

In [None]:
# List installed packages and versions to record the environment
%pip list

## 🏗️ Benchmarking Framework Overview

### Architecture

The benchmarking framework consists of:

```
Benchmarking Framework
├── Load Testing (locust/load-test.py)
├── Resource Monitoring (resource-monitor.py)
├── Cost Calculator (cost-calculator.py)
├── Report Generator (report-generator.py)
└── Configurations (configs/)
```

### Key Components

1. **Load Testing**: Simulates user traffic with configurable concurrency
2. **Resource Monitoring**: Tracks CPU, memory, and network usage
3. **Cost Analysis**: Calculates cost per prediction and total cost of ownership (TCO)
4. **Report Generation**: Creates HTML/PDF reports with visualizations
5. **Infrastructure Configs**: Pre-defined instance types and pricing
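To make the first component concrete, here is a minimal, self-contained sketch of load testing with configurable concurrency. It does not use Locust or the framework's `load-test.py`; `fake_predict` is a hypothetical stand-in for a request to the prediction endpoint, and `run_load_test` is an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_predict(text):
    """Hypothetical stand-in for one request to the prediction endpoint."""
    time.sleep(0.005)  # simulate roughly 5 ms of service time
    return {"label": "POSITIVE", "text": text}

def timed_call(text):
    """Run one request and return its latency in milliseconds."""
    start = time.perf_counter()
    fake_predict(text)
    return (time.perf_counter() - start) * 1000

def run_load_test(concurrency, total_requests):
    """Send total_requests using `concurrency` worker threads."""
    payloads = [f"sample text {i}" for i in range(total_requests)]
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, payloads))
    elapsed = time.perf_counter() - wall_start
    return {
        "concurrency": concurrency,
        "requests": len(latencies),
        "requests_per_second": len(latencies) / elapsed,
        "avg_latency_ms": sum(latencies) / len(latencies),
    }

for users in (1, 5, 10):
    result = run_load_test(concurrency=users, total_requests=50)
    print(f"{users:>2} users: {result['requests_per_second']:.0f} RPS, "
          f"{result['avg_latency_ms']:.1f} ms avg latency")
```

Because the stub only sleeps, throughput here grows almost linearly with the number of workers; a real model server would saturate once CPU or GPU becomes the bottleneck.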

In [None]:
# Setup and imports
import os
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('default')
sns.set_palette("husl")

# Define paths
BENCHMARKING_DIR = Path("../../benchmarking")
CONFIGS_DIR = BENCHMARKING_DIR / "configs"
SCRIPTS_DIR = BENCHMARKING_DIR / "scripts"

print("✅ Libraries imported successfully!")
print(f"📁 Benchmarking directory: {BENCHMARKING_DIR.absolute()}")
print(f"⚙️ Configs directory: {CONFIGS_DIR.absolute()}")
print(f"🔧 Scripts directory: {SCRIPTS_DIR.absolute()}")

## 📋 Loading Benchmark Configurations

Let's examine the benchmarking configurations and instance types.
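The summary cell below reads nested keys from the main config with chained `.get()` calls. As a rough guide, this sketch shows the dictionary shape those lookups assume. The key names come from the lookups in this notebook; all values are illustrative placeholders rather than the project's real settings, and `nested_get` is a hypothetical helper.

```python
# Illustrative shape of the parsed benchmark-config.yaml (values are placeholders)
example_config = {
    "benchmark": {"duration": "60s", "concurrent_users": 10, "ramp_up_time": "10s"},
    "instances": {"cpu": ["t3.medium", "c5.large"], "gpu": ["g4dn.xlarge"]},
    "costs": {"t3.medium": 0.0416, "c5.large": 0.096, "g4dn.xlarge": 0.526},
}

def nested_get(config, *keys, default="N/A"):
    """Walk nested dicts, returning `default` if any key is missing."""
    current = config
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

print(nested_get(example_config, "benchmark", "duration"))   # present key
print(nested_get(example_config, "benchmark", "timeout"))    # missing key falls back to the default
print(nested_get({}, "instances", "cpu", default=[]))        # empty config falls back to []
```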

In [None]:
# Load benchmark configurations
def load_benchmark_configs():
    """Load all benchmark configuration files."""
    # Import PyYAML once up front so every loader below can use it
    try:
        import yaml
    except ImportError:
        print("⚠️ PyYAML not available, skipping YAML configs")
        return {'main': {}, 'cpu_instances': {}, 'gpu_instances': {}}

    config_files = {
        'main': ("benchmark-config.yaml", "Main benchmark"),
        'cpu_instances': ("cpu-instances.yaml", "CPU instances"),
        'gpu_instances': ("gpu-instances.yaml", "GPU instances"),
    }

    configs = {}
    for key, (filename, label) in config_files.items():
        config_path = CONFIGS_DIR / filename
        if not config_path.exists():
            print(f"❌ {label} config not found")
            configs[key] = {}
            continue
        try:
            with open(config_path, 'r') as f:
                # safe_load returns None for an empty file, so fall back to {}
                configs[key] = yaml.safe_load(f) or {}
            print(f"✅ {label} config loaded")
        except (OSError, yaml.YAMLError) as exc:
            print(f"⚠️ Could not load {label} config: {exc}")
            configs[key] = {}

    return configs

# Load configurations
configs = load_benchmark_configs()

print("\n📊 Benchmark Configuration Summary:")
print("=" * 50)

# Main config
if configs.get('main'):
    main_config = configs['main']
    print("🎯 Benchmark Settings:")
    print(f"   Duration: {main_config.get('benchmark', {}).get('duration', 'N/A')}")
    print(f"   Concurrent Users: {main_config.get('benchmark', {}).get('concurrent_users', 'N/A')}")
    print(f"   Ramp-up Time: {main_config.get('benchmark', {}).get('ramp_up_time', 'N/A')}")
    
    # Instance types
    cpu_instances = main_config.get('instances', {}).get('cpu', [])
    gpu_instances = main_config.get('instances', {}).get('gpu', [])
    
    print(f"\n🖥️ CPU Instances ({len(cpu_instances)}):")
    for instance in cpu_instances:
        print(f"   • {instance}")
    
    print(f"\n🎮 GPU Instances ({len(gpu_instances)}):")
    for instance in gpu_instances:
        print(f"   • {instance}")
    
    # Costs
    costs = main_config.get('costs', {})
    print(f"\n💰 Cost Configuration ({len(costs)} instances):")
    for instance, cost in costs.items():
        print(f"   • {instance}: ${cost}/hour")
else:
    print("❌ No configuration data available")

# Sample instance data for demonstration
sample_instances = {
    "cpu": [
        {"type": "t3.medium", "vcpu": 2, "memory": "4GB", "cost_per_hour": 0.0416},
        {"type": "c5.large", "vcpu": 2, "memory": "4GB", "cost_per_hour": 0.096},
        {"type": "c5.xlarge", "vcpu": 4, "memory": "8GB", "cost_per_hour": 0.192}
    ],
    "gpu": [
        {"type": "p3.2xlarge", "vcpu": 8, "memory": "61GB", "gpu": "V100", "cost_per_hour": 3.06},
        {"type": "g4dn.xlarge", "vcpu": 4, "memory": "16GB", "gpu": "T4", "cost_per_hour": 0.526}
    ]
}

print("\n📋 Sample Instance Specifications:")
for category, instances in sample_instances.items():
    print(f"\n{category.upper()} Instances:")
    for instance in instances:
        specs = f"{instance['type']}: {instance['vcpu']} vCPU, {instance['memory']} RAM"
        if 'gpu' in instance:
            specs += f", {instance['gpu']} GPU"
        specs += f" - ${instance['cost_per_hour']}/hr"
        print(f"   • {specs}")

## 📈 Sample Benchmark Data Analysis

Let's create and analyze sample benchmark data to demonstrate the analysis capabilities.
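The generator below models saturation with a simple hyperbolic factor, `1 / (1 + 0.1 * (concurrency - 1))`: throughput is multiplied by it and latency is divided by it. The noise-free sketch here, using the `t3.medium` base values from the cell below, shows how that factor behaves on its own. `degradation_factor` and `ideal_metrics` are illustrative names.

```python
def degradation_factor(concurrency, slope=0.1):
    """Fraction of base throughput retained at a given concurrency."""
    return 1 / (1 + (concurrency - 1) * slope)

def ideal_metrics(base_rps, base_latency_ms, concurrency):
    """Noise-free throughput and latency under the degradation model."""
    factor = degradation_factor(concurrency)
    return base_rps * factor, base_latency_ms / factor

for concurrency in [1, 5, 10, 20, 50, 100]:
    rps, latency = ideal_metrics(base_rps=50, base_latency_ms=80, concurrency=concurrency)
    print(f"concurrency={concurrency:>3}: {rps:6.2f} RPS, {latency:7.1f} ms")
```

At 11 concurrent users the factor is 0.5, so modeled throughput halves and latency doubles. This is a property of the synthetic model, not a measurement of KubeSentiment.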

In [None]:
# Generate sample benchmark data
def generate_sample_benchmark_data():
    """Generate sample benchmark data for analysis."""
    
    # Instance types
    instances = [
        "t3.medium", "c5.large", "c5.xlarge", "c5.2xlarge",
        "p3.2xlarge", "g4dn.xlarge", "g4dn.2xlarge"
    ]
    
    # User concurrency levels
    concurrency_levels = [1, 5, 10, 20, 50, 100]
    
    data = []
    np.random.seed(42)  # For reproducible results
    
    # Base performance characteristics by instance type:
    # (base RPS, base latency in ms, cost in $/hour)
    instance_profiles = {
        "t3.medium": (50, 80, 0.0416),
        "c5.large": (120, 45, 0.096),
        "c5.xlarge": (250, 30, 0.192),
        "c5.2xlarge": (500, 25, 0.384),
        "p3.2xlarge": (800, 15, 3.06),
        "g4dn.xlarge": (400, 20, 0.526),
        "g4dn.2xlarge": (750, 18, 0.752),
    }

    for instance in instances:
        base_rps, base_latency, cost_per_hour = instance_profiles[instance]
        
        for concurrency in concurrency_levels:
            # Performance degrades with higher concurrency
            degradation_factor = 1 / (1 + (concurrency - 1) * 0.1)
            
            # Add some randomness
            rps_noise = np.random.normal(0, base_rps * 0.1)
            latency_noise = np.random.normal(0, base_latency * 0.2)
            
            actual_rps = max(1, (base_rps * degradation_factor) + rps_noise)
            actual_latency = max(10, (base_latency / degradation_factor) + latency_noise)
            
            # Success rate (degrades at high concurrency)
            success_rate = 0.98 - max(0, concurrency - 10) * 0.005
            
            # CPU and memory usage
            cpu_usage = min(95, 20 + (concurrency * 2) + np.random.normal(0, 5))
            memory_usage = min(90, 30 + (concurrency * 1.5) + np.random.normal(0, 3))
            
            data.append({
                "instance_type": instance,
                "concurrency": concurrency,
                "requests_per_second": round(actual_rps, 2),
                "avg_latency_ms": round(actual_latency, 2),
                "p95_latency_ms": round(actual_latency * 1.5, 2),
                "p99_latency_ms": round(actual_latency * 2.0, 2),
                "success_rate": round(success_rate, 3),
                "cpu_usage_percent": round(cpu_usage, 2),
                "memory_usage_percent": round(memory_usage, 2),
                "cost_per_hour": cost_per_hour,
                # Hourly cost / predictions per hour (RPS * 3600), scaled to 1,000 predictions
                "cost_per_1000_predictions": round(cost_per_hour / (actual_rps * 3600) * 1000, 6)
            })
    
    return pd.DataFrame(data)

# Generate sample data
benchmark_df = generate_sample_benchmark_data()

print("📊 Sample Benchmark Data Generated:")
print("=" * 50)
print(f"📋 Total data points: {len(benchmark_df)}")
print(f"🖥️ Instance types: {benchmark_df['instance_type'].nunique()}")
print(f"👥 Concurrency levels: {benchmark_df['concurrency'].nunique()}")

print("\n🔍 Data Preview:")
display(benchmark_df.head(10))

print("\n📈 Summary Statistics:")
display(benchmark_df.describe())

## 📊 Performance Analysis

Let's analyze the performance characteristics across different instance types and concurrency levels.
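The sample data approximates p95 and p99 latency as fixed multiples of the average (1.5x and 2x). With raw per-request latencies from a real run, the percentiles can be computed directly. A minimal sketch using only the standard library; the latency samples are synthetic and `latency_percentiles` is a hypothetical helper.

```python
import random
import statistics

def latency_percentiles(latencies_ms):
    """Return average, p95 and p99 latency from raw per-request samples."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "avg_latency_ms": statistics.fmean(latencies_ms),
        "p95_latency_ms": cuts[94],
        "p99_latency_ms": cuts[98],
    }

random.seed(42)
# Synthetic, right-skewed latencies (log-normal), for illustration only
samples = [random.lognormvariate(3.4, 0.4) for _ in range(5000)]
for name, value in latency_percentiles(samples).items():
    print(f"{name}: {value:.1f}")
```

For skewed latency distributions the ratio of p99 to the average is rarely a constant, which is why measured percentiles are preferable to fixed multipliers when raw samples are available.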

In [None]:
# Performance analysis visualizations
fig, axes = plt.subplots(3, 3, figsize=(18, 15))
fig.suptitle('KubeSentiment Benchmark Analysis', fontsize=16, fontweight='bold')

# 1. Throughput by instance type
throughput_data = benchmark_df.groupby('instance_type')['requests_per_second'].mean().sort_values(ascending=True)
axes[0, 0].barh(range(len(throughput_data)), throughput_data.values)
axes[0, 0].set_yticks(range(len(throughput_data)))
axes[0, 0].set_yticklabels(throughput_data.index)
axes[0, 0].set_xlabel('Requests per Second')
axes[0, 0].set_title('Average Throughput by Instance Type')
axes[0, 0].grid(True, alpha=0.3)

# 2. Latency by instance type
latency_data = benchmark_df.groupby('instance_type')['avg_latency_ms'].mean().sort_values()
axes[0, 1].bar(range(len(latency_data)), latency_data.values)
axes[0, 1].set_xticks(range(len(latency_data)))
axes[0, 1].set_xticklabels(latency_data.index, rotation=45, ha='right')
axes[0, 1].set_ylabel('Average Latency (ms)')
axes[0, 1].set_title('Average Latency by Instance Type')
axes[0, 1].grid(True, alpha=0.3)

# 3. Cost per 1000 predictions
cost_data = benchmark_df.groupby('instance_type')['cost_per_1000_predictions'].mean().sort_values()
colors = ['green' if x < 0.01 else 'orange' if x < 0.05 else 'red' for x in cost_data.values]
bars = axes[0, 2].bar(range(len(cost_data)), cost_data.values, color=colors)
axes[0, 2].set_xticks(range(len(cost_data)))
axes[0, 2].set_xticklabels(cost_data.index, rotation=45, ha='right')
axes[0, 2].set_ylabel('Cost ($/1000 predictions)')
axes[0, 2].set_title('Cost Efficiency by Instance Type')
axes[0, 2].set_yscale('log')
axes[0, 2].grid(True, alpha=0.3)

# 4. Throughput vs Concurrency (selected instances)
selected_instances = ['t3.medium', 'c5.xlarge', 'p3.2xlarge', 'g4dn.xlarge']
for instance in selected_instances:
    instance_data = benchmark_df[benchmark_df['instance_type'] == instance]
    axes[1, 0].plot(instance_data['concurrency'], instance_data['requests_per_second'], 
                   marker='o', label=instance, linewidth=2)
axes[1, 0].set_xlabel('Concurrent Users')
axes[1, 0].set_ylabel('Requests per Second')
axes[1, 0].set_title('Throughput Scaling by Instance Type')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 5. Latency vs Concurrency
for instance in selected_instances:
    instance_data = benchmark_df[benchmark_df['instance_type'] == instance]
    axes[1, 1].plot(instance_data['concurrency'], instance_data['avg_latency_ms'], 
                   marker='s', label=instance, linewidth=2)
axes[1, 1].set_xlabel('Concurrent Users')
axes[1, 1].set_ylabel('Average Latency (ms)')
axes[1, 1].set_title('Latency Scaling by Instance Type')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

# 6. Resource utilization
cpu_data = benchmark_df.groupby('instance_type')['cpu_usage_percent'].mean()
memory_data = benchmark_df.groupby('instance_type')['memory_usage_percent'].mean()

x = np.arange(len(cpu_data))
width = 0.35

axes[1, 2].bar(x - width/2, cpu_data.values, width, label='CPU', alpha=0.8)
axes[1, 2].bar(x + width/2, memory_data.values, width, label='Memory', alpha=0.8)
axes[1, 2].set_xticks(x)
axes[1, 2].set_xticklabels(cpu_data.index, rotation=45, ha='right')
axes[1, 2].set_ylabel('Usage (%)')
axes[1, 2].set_title('Average Resource Utilization')
axes[1, 2].legend()
axes[1, 2].grid(True, alpha=0.3)

# 7. Success rate distribution
success_data = benchmark_df.groupby('instance_type')['success_rate'].mean().sort_values(ascending=True)
axes[2, 0].barh(range(len(success_data)), success_data.values)
axes[2, 0].set_yticks(range(len(success_data)))
axes[2, 0].set_yticklabels(success_data.index)
axes[2, 0].set_xlabel('Success Rate')
axes[2, 0].set_title('Success Rate by Instance Type')
axes[2, 0].axvline(0.95, color='red', linestyle='--', alpha=0.7, label='95% Target')
axes[2, 0].legend()
axes[2, 0].grid(True, alpha=0.3)

# 8. Performance per dollar
perf_per_dollar = benchmark_df.groupby('instance_type').agg({
    'requests_per_second': 'mean',
    'cost_per_hour': 'first'
})
perf_per_dollar['performance_per_dollar'] = perf_per_dollar['requests_per_second'] / perf_per_dollar['cost_per_hour']
perf_per_dollar = perf_per_dollar.sort_values('performance_per_dollar', ascending=True)

axes[2, 1].barh(range(len(perf_per_dollar)), perf_per_dollar['performance_per_dollar'].values)
axes[2, 1].set_yticks(range(len(perf_per_dollar)))
axes[2, 1].set_yticklabels(perf_per_dollar.index)
axes[2, 1].set_xlabel('Requests per Second per Dollar')
axes[2, 1].set_title('Performance per Dollar (Efficiency)')
axes[2, 1].grid(True, alpha=0.3)

# 9. Cost breakdown
cost_breakdown = benchmark_df.groupby('instance_type')['cost_per_hour'].first().sort_values()
axes[2, 2].pie(cost_breakdown.values, labels=cost_breakdown.index, autopct='%1.1f%%')
axes[2, 2].set_title('Cost Distribution by Instance Type')

plt.tight_layout()
plt.show()

# Performance summary
print("📊 Performance Analysis Summary:")
print("=" * 50)

# Best performers
best_throughput = benchmark_df.loc[benchmark_df['requests_per_second'].idxmax()]
best_latency = benchmark_df.loc[benchmark_df['avg_latency_ms'].idxmin()]
best_efficiency = perf_per_dollar.loc[perf_per_dollar['performance_per_dollar'].idxmax()]

print(f"🚀 Best Throughput: {best_throughput['instance_type']} - {best_throughput['requests_per_second']} RPS")
print(f"⚡ Best Latency: {best_latency['instance_type']} - {best_latency['avg_latency_ms']}ms")
print(f"💰 Best Efficiency: {best_efficiency.name} - {best_efficiency['performance_per_dollar']:.1f} RPS/$")
print(f"📈 Average Success Rate: {benchmark_df['success_rate'].mean():.1%}")
print(f"💸 Average Cost/1000 Predictions: ${benchmark_df['cost_per_1000_predictions'].mean():.4f}")

# Instance type categorization
cpu_instances = benchmark_df[~benchmark_df['instance_type'].str.contains('p3|g4dn')]
gpu_instances = benchmark_df[benchmark_df['instance_type'].str.contains('p3|g4dn')]

print(f"\n🖥️ CPU Instances Average: {cpu_instances['requests_per_second'].mean():.1f} RPS, ${cpu_instances['cost_per_1000_predictions'].mean():.4f}/1000pred")
print(f"🎮 GPU Instances Average: {gpu_instances['requests_per_second'].mean():.1f} RPS, ${gpu_instances['cost_per_1000_predictions'].mean():.4f}/1000pred")

# Scaling analysis: average relative change in mean throughput between consecutive concurrency levels
scaling_efficiency = benchmark_df.groupby('concurrency')['requests_per_second'].mean()
print(f"\n📊 Scaling Efficiency: {scaling_efficiency.pct_change().mean():.1%} avg throughput change per concurrency step")

## 💰 Cost-Benefit Analysis

Let's perform a detailed cost-benefit analysis to help choose the optimal infrastructure.
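The analysis below sizes each option from its sustained throughput: monthly capacity is RPS x 3600 x hours per month, and the instance count is the monthly demand divided by that capacity, rounded up. A minimal worked sketch of that arithmetic, assuming a single instance type running 24/7 at its average throughput. `size_deployment` is an illustrative name, and the inputs are the `t3.medium` base values from the sample generator.

```python
import math

def size_deployment(rps, cost_per_hour, daily_predictions, days_per_month=30):
    """Estimate instance count and monthly cost for a steady workload."""
    hours_per_month = 24 * days_per_month
    monthly_predictions = daily_predictions * days_per_month
    # Predictions one instance can serve in a month at sustained throughput
    monthly_capacity = rps * 3600 * hours_per_month
    instances_needed = math.ceil(monthly_predictions / monthly_capacity)
    total_monthly_cost = cost_per_hour * hours_per_month * instances_needed
    return {
        "instances_needed": instances_needed,
        "total_monthly_cost": total_monthly_cost,
        "cost_per_prediction": total_monthly_cost / monthly_predictions,
        "capacity_utilization_pct": 100 * monthly_predictions / (monthly_capacity * instances_needed),
    }

plan = size_deployment(rps=50, cost_per_hour=0.0416, daily_predictions=100_000)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Because the instance count is rounded up, a lightly loaded deployment still pays for a whole instance: here one `t3.medium` costs about $29.95 per month while running at roughly 2% utilization. Unlike the cell below, this sketch measures utilization against the combined capacity of all instances, so it never exceeds 100%.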

In [None]:
# Cost-benefit analysis
def perform_cost_benefit_analysis(benchmark_df, daily_predictions=100000, days_per_month=30):
    """Perform comprehensive cost-benefit analysis."""
    
    # Monthly predictions
    monthly_predictions = daily_predictions * days_per_month
    
    # Calculate costs for each instance
    analysis_df = benchmark_df.groupby('instance_type').agg({
        'requests_per_second': 'mean',
        'avg_latency_ms': 'mean',
        'cost_per_hour': 'first',
        'cost_per_1000_predictions': 'mean',
        'success_rate': 'mean',
        'cpu_usage_percent': 'mean',
        'memory_usage_percent': 'mean'
    }).reset_index()
    
    # Calculate monthly costs and capacity
    hours_per_month = 24 * days_per_month
    analysis_df['monthly_cost'] = analysis_df['cost_per_hour'] * hours_per_month
    analysis_df['monthly_capacity'] = analysis_df['requests_per_second'] * 3600 * hours_per_month
    analysis_df['capacity_utilization'] = (monthly_predictions / analysis_df['monthly_capacity']) * 100
    
    # Calculate number of instances needed
    analysis_df['instances_needed'] = np.ceil(monthly_predictions / analysis_df['monthly_capacity'])
    analysis_df['total_monthly_cost'] = analysis_df['monthly_cost'] * analysis_df['instances_needed']
    analysis_df['cost_per_prediction'] = analysis_df['total_monthly_cost'] / monthly_predictions
    
    # Efficiency metrics
    analysis_df['performance_efficiency'] = analysis_df['requests_per_second'] / analysis_df['cost_per_hour']
    analysis_df['resource_efficiency'] = (analysis_df['cpu_usage_percent'] + analysis_df['memory_usage_percent']) / 2
    
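    # Overall score (weighted sum of raw terms). Note: the terms are not normalized,
    # so the nominal weights below do not reflect each term's actual influence;
    # 1 / cost_per_prediction is orders of magnitude larger than success_rate.
    analysis_df['overall_score'] = (
        (1 / analysis_df['cost_per_prediction']) * 0.4 +  # nominal 40% weight on cost efficiency
        analysis_df['performance_efficiency'] * 0.3 +     # nominal 30% weight on performance
        analysis_df['success_rate'] * 0.3                  # nominal 30% weight on reliability
    )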
    
    return analysis_df.sort_values('overall_score', ascending=False)

# Perform analysis for different scenarios
scenarios = [
    {"name": "Small Scale", "daily_predictions": 10000, "description": "10K predictions/day (startup)"},
    {"name": "Medium Scale", "daily_predictions": 100000, "description": "100K predictions/day (growing company)"},
    {"name": "Large Scale", "daily_predictions": 1000000, "description": "1M predictions/day (enterprise)"}
]

cost_analysis_results = {}

for scenario in scenarios:
    print(f"💰 Analyzing {scenario['name']} Scenario: {scenario['description']}")
    print("-" * 60)
    
    analysis = perform_cost_benefit_analysis(benchmark_df, 
                                           daily_predictions=scenario['daily_predictions'])
    cost_analysis_results[scenario['name']] = analysis
    
    # Show top 3 recommendations
    top_3 = analysis.head(3)[['instance_type', 'instances_needed', 'total_monthly_cost', 
                              'cost_per_prediction', 'capacity_utilization', 'overall_score']]
    
    print(f"🏆 Top 3 Recommendations for {scenario['daily_predictions']:,} daily predictions:")
    for rank, (_, row) in enumerate(top_3.iterrows(), start=1):
        print(f"  {rank}. {row['instance_type']}")
        print(f"     • Instances needed: {int(row['instances_needed'])}")
        print(f"     • Monthly cost: ${row['total_monthly_cost']:,.2f}")
        print(f"     • Cost per prediction: ${row['cost_per_prediction']:.6f}")
        print(f"     • Capacity utilization: {row['capacity_utilization']:.1f}%")
        print(f"     • Overall score: {row['overall_score']:.2f}")
        print()

# Comparative analysis visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Cost-Benefit Analysis: Different Scale Scenarios', fontsize=16, fontweight='bold')

scenario_colors = ['blue', 'green', 'red']

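for idx, (scenario_name, analysis) in enumerate(cost_analysis_results.items()):
    color = scenario_colors[idx]
    # Use a fixed instance order so grouped bars and tick labels line up across scenarios
    analysis = analysis.sort_values('instance_type')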
    
    # Cost per prediction
    axes[0, 0].bar(np.arange(len(analysis)) + idx*0.25, analysis['cost_per_prediction'], 
                   width=0.25, label=scenario_name, color=color, alpha=0.7)
    
    # Total monthly cost
    axes[0, 1].bar(np.arange(len(analysis)) + idx*0.25, analysis['total_monthly_cost'], 
                   width=0.25, label=scenario_name, color=color, alpha=0.7)
    
    # Instances needed
    axes[0, 2].bar(np.arange(len(analysis)) + idx*0.25, analysis['instances_needed'], 
                   width=0.25, label=scenario_name, color=color, alpha=0.7)
    
    # Capacity utilization
    axes[1, 0].scatter(analysis['capacity_utilization'], analysis['cost_per_prediction'], 
                       label=scenario_name, color=color, s=100, alpha=0.7)
    
    # Performance efficiency
    axes[1, 1].bar(np.arange(len(analysis)) + idx*0.25, analysis['performance_efficiency'], 
                   width=0.25, label=scenario_name, color=color, alpha=0.7)
    
    # Overall score
    axes[1, 2].bar(np.arange(len(analysis)) + idx*0.25, analysis['overall_score'], 
                   width=0.25, label=scenario_name, color=color, alpha=0.7)

# Set labels and formatting
axes[0, 0].set_title('Cost per Prediction by Instance')
axes[0, 0].set_ylabel('Cost ($)')
axes[0, 0].set_yscale('log')
axes[0, 0].set_xticks(np.arange(len(analysis)))
axes[0, 0].set_xticklabels(analysis['instance_type'], rotation=45, ha='right')

axes[0, 1].set_title('Total Monthly Cost')
axes[0, 1].set_ylabel('Cost ($)')
axes[0, 1].set_xticks(np.arange(len(analysis)))
axes[0, 1].set_xticklabels(analysis['instance_type'], rotation=45, ha='right')

axes[0, 2].set_title('Instances Needed')
axes[0, 2].set_ylabel('Number of Instances')
axes[0, 2].set_xticks(np.arange(len(analysis)))
axes[0, 2].set_xticklabels(analysis['instance_type'], rotation=45, ha='right')

axes[1, 0].set_title('Capacity Utilization vs Cost')
axes[1, 0].set_xlabel('Capacity Utilization (%)')
axes[1, 0].set_ylabel('Cost per Prediction ($)')
axes[1, 0].set_yscale('log')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

axes[1, 1].set_title('Performance Efficiency')
axes[1, 1].set_ylabel('RPS per Dollar')
axes[1, 1].set_xticks(np.arange(len(analysis)))
axes[1, 1].set_xticklabels(analysis['instance_type'], rotation=45, ha='right')

axes[1, 2].set_title('Overall Recommendation Score')
axes[1, 2].set_ylabel('Score')
axes[1, 2].set_xticks(np.arange(len(analysis)))
axes[1, 2].set_xticklabels(analysis['instance_type'], rotation=45, ha='right')

# Add legends
for ax in [axes[0, 0], axes[0, 1], axes[0, 2], axes[1, 1], axes[1, 2]]:
    ax.legend()

plt.tight_layout()
plt.show()

# Print key insights
print("💡 Cost-Benefit Analysis Insights:")
print("=" * 50)

# Best overall performers
for scenario_name, analysis in cost_analysis_results.items():
    best = analysis.iloc[0]
    print(f"🏆 {scenario_name}: Best choice is {best['instance_type']}")
    print(f"   • Cost per prediction: ${best['cost_per_prediction']:.6f}")
    print(f"   • Instances needed: {int(best['instances_needed'])}")
    print(f"   • Monthly cost: ${best['total_monthly_cost']:,.2f}")
    print(f"   • Capacity utilization: {best['capacity_utilization']:.1f}%")
    print()

# GPU vs CPU comparison
medium_scale = cost_analysis_results['Medium Scale']
gpu_instances = medium_scale[medium_scale['instance_type'].str.contains('p3|g4dn')]
cpu_instances = medium_scale[~medium_scale['instance_type'].str.contains('p3|g4dn')]

if len(gpu_instances) > 0 and len(cpu_instances) > 0:
    print("🎮 GPU vs CPU Comparison (Medium Scale):")
    print(f"   • Best GPU: {gpu_instances.iloc[0]['instance_type']} (${gpu_instances.iloc[0]['cost_per_prediction']:.6f}/pred)")
    print(f"   • Best CPU: {cpu_instances.iloc[0]['instance_type']} (${cpu_instances.iloc[0]['cost_per_prediction']:.6f}/pred)")
    
    gpu_cpu_ratio = gpu_instances.iloc[0]['cost_per_prediction'] / cpu_instances.iloc[0]['cost_per_prediction']
    print(f"   • GPU cost ratio vs CPU: {gpu_cpu_ratio:.2f}x")
    
    if gpu_cpu_ratio < 1:
        print("   ✅ GPUs are more cost-effective for this workload")
    else:
        print("   ✅ CPUs are more cost-effective for this workload")

## 🧪 Automated Testing

We can add `pytest`-style test functions directly to the notebook and run them in-process against the helpers defined above.

In [None]:
# Define simple pytest-style tests as plain functions
def test_data_generation():
    # Test that the sample data can be generated successfully
    df = generate_sample_benchmark_data()
    assert isinstance(df, pd.DataFrame), "Should be a pandas DataFrame"
    assert not df.empty, "DataFrame should not be empty"

def test_cost_analysis():
    # Test that the cost analysis runs without errors
    df = generate_sample_benchmark_data()
    analysis = perform_cost_benefit_analysis(df)
    assert isinstance(analysis, pd.DataFrame), "Should be a pandas DataFrame"

# Run the tests in the notebook kernel
for test in (test_data_generation, test_cost_analysis):
    test()
    print(f"✅ {test.__name__} passed")

## 🎯 Recommendations Engine

Let's create an intelligent recommendation system that suggests optimal infrastructure based on requirements.
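The engine below ranks candidates with a weighted sum of cost efficiency, performance per dollar, and reliability. Those raw terms sit on very different scales, so one common option (not what the cell below does) is to min-max normalize each term before weighting. A minimal sketch under that assumption; the candidate numbers are made up for illustration and the helper names are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; if all values are equal, return 1.0 for each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def normalized_scores(candidates, weights=(0.4, 0.3, 0.3)):
    """Weighted score over normalized cost efficiency, RPS per dollar and success rate."""
    cost_eff = min_max([1 / c["cost_per_1000_predictions"] for c in candidates])
    perf = min_max([c["requests_per_second"] / c["cost_per_hour"] for c in candidates])
    reliability = min_max([c["success_rate"] for c in candidates])
    w_cost, w_perf, w_rel = weights
    return {
        c["instance_type"]: w_cost * ce + w_perf * p + w_rel * r
        for c, ce, p, r in zip(candidates, cost_eff, perf, reliability)
    }

# Made-up candidate metrics, for illustration only
candidates = [
    {"instance_type": "cpu-small", "requests_per_second": 40, "cost_per_hour": 0.05,
     "cost_per_1000_predictions": 0.00035, "success_rate": 0.97},
    {"instance_type": "gpu-large", "requests_per_second": 600, "cost_per_hour": 3.00,
     "cost_per_1000_predictions": 0.00139, "success_rate": 0.98},
]
for name, score in sorted(normalized_scores(candidates).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

With normalized terms, every score falls between 0 and 1 and the 40/30/30 weights describe each factor's real share of the ranking. Min-max scaling is sensitive to outliers and to the candidate set, so scores are only comparable within one run.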

In [None]:
# Intelligent recommendation engine
def recommend_instances(requirements):
    """
    Recommend optimal instances based on requirements.
    
    Args:
        requirements: Dict with keys:
            - daily_predictions: int
            - max_latency_ms: float
            - max_cost_per_prediction: float
            - reliability_target: float (0-1)
            - prefer_gpu: bool
    """
    
    # Filter instances based on requirements
    filtered_df = benchmark_df.copy()
    
    # Filter by latency requirement
    if requirements.get('max_latency_ms'):
        filtered_df = filtered_df[filtered_df['avg_latency_ms'] <= requirements['max_latency_ms']]
    
    # Filter by cost requirement
    if requirements.get('max_cost_per_prediction'):
        filtered_df = filtered_df[filtered_df['cost_per_1000_predictions'] <= requirements['max_cost_per_prediction'] * 1000]
    
    # Filter by reliability requirement
    if requirements.get('reliability_target'):
        filtered_df = filtered_df[filtered_df['success_rate'] >= requirements['reliability_target']]
    
    # Filter by GPU preference
    if requirements.get('prefer_gpu') is True:
        filtered_df = filtered_df[filtered_df['instance_type'].str.contains('p3|g4dn')]
    elif requirements.get('prefer_gpu') is False:
        filtered_df = filtered_df[~filtered_df['instance_type'].str.contains('p3|g4dn')]
    
    if len(filtered_df) == 0:
        return {"error": "No instances meet the specified requirements", "requirements": requirements}
    
    # Score remaining instances
    scored_df = filtered_df.groupby('instance_type').agg({
        'requests_per_second': 'mean',
        'avg_latency_ms': 'mean',
        'cost_per_1000_predictions': 'mean',
        'success_rate': 'mean',
        'cost_per_hour': 'first'
    }).reset_index()
    
    # Calculate capacity and cost for requirements
    daily_predictions = requirements.get('daily_predictions', 100000)
    monthly_predictions = daily_predictions * 30
    hours_per_month = 24 * 30
    
    scored_df['monthly_capacity'] = scored_df['requests_per_second'] * 3600 * hours_per_month
    scored_df['instances_needed'] = np.ceil(monthly_predictions / scored_df['monthly_capacity'])
    scored_df['total_monthly_cost'] = scored_df['cost_per_hour'] * hours_per_month * scored_df['instances_needed']
    scored_df['capacity_utilization'] = (monthly_predictions / scored_df['monthly_capacity']) * 100
    
    # Calculate recommendation score
    scored_df['recommendation_score'] = (
        (1 / scored_df['cost_per_1000_predictions']) * 0.4 +  # Cost efficiency
        (scored_df['requests_per_second'] / scored_df['cost_per_hour']) * 0.3 +  # Performance per dollar
        scored_df['success_rate'] * 0.3  # Reliability
    )
    
    # Sort by recommendation score
    recommendations = scored_df.sort_values('recommendation_score', ascending=False)
    
    # Return top recommendations
    top_recommendations = []
    for _, row in recommendations.head(3).iterrows():
        top_recommendations.append({
            "instance_type": row['instance_type'],
            "instances_needed": int(row['instances_needed']),
            "total_monthly_cost": round(row['total_monthly_cost'], 2),
            "cost_per_prediction": round(row['total_monthly_cost'] / monthly_predictions, 6),
            "capacity_utilization": round(row['capacity_utilization'], 1),
            "avg_latency_ms": round(row['avg_latency_ms'], 1),
            "success_rate": round(row['success_rate'], 3),
            "recommendation_score": round(row['recommendation_score'], 2)
        })
    
    return {
        "requirements": requirements,
        "recommendations": top_recommendations,
        "total_candidates": len(recommendations)
    }

# Test the recommendation engine with different scenarios
recommendation_scenarios = [
    {
        "name": "Budget-Conscious Startup",
        "requirements": {
            "daily_predictions": 50000,
            "max_latency_ms": 100,
            "max_cost_per_prediction": 0.0001,
            "reliability_target": 0.95,
            "prefer_gpu": False
        }
    },
    {
        "name": "Performance-Focused Enterprise",
        "requirements": {
            "daily_predictions": 500000,
            "max_latency_ms": 50,
            "max_cost_per_prediction": 0.001,
            "reliability_target": 0.99,
            "prefer_gpu": True
        }
    },
    {
        "name": "Balanced Medium Business",
        "requirements": {
            "daily_predictions": 200000,
            "max_latency_ms": 75,
            "max_cost_per_prediction": 0.0005,
            "reliability_target": 0.97,
            "prefer_gpu": None  # No preference
        }
    }
]

print("🎯 Intelligent Infrastructure Recommendations:")
print("=" * 60)

for scenario in recommendation_scenarios:
    print(f"\n🏢 Scenario: {scenario['name']}")
    print(f"📊 Requirements: {scenario['requirements']['daily_predictions']:,} daily predictions")
    print(f"   • Max latency: {scenario['requirements']['max_latency_ms']}ms")
    print(f"   • Max cost/prediction: ${scenario['requirements']['max_cost_per_prediction']:.6f}")
    print(f"   • Reliability target: {scenario['requirements']['reliability_target']:.1%}")
    
    if scenario['requirements'].get('prefer_gpu') is True:
        print("   • GPU preferred: Yes")
    elif scenario['requirements'].get('prefer_gpu') is False:
        print("   • GPU preferred: No")
    else:
        print("   • GPU preferred: No preference")
    
    recommendation = recommend_instances(scenario['requirements'])
    
    if "error" in recommendation:
        print(f"❌ {recommendation['error']}")
        continue
    
    print(f"\n🏆 Top Recommendations ({recommendation['total_candidates']} candidates considered):")
    
    for i, rec in enumerate(recommendation['recommendations'][:2], 1):
        print(f"\n   {i}. {rec['instance_type']}")
        print(f"      • Instances needed: {rec['instances_needed']}")
        print(f"      • Monthly cost: ${rec['total_monthly_cost']:,.2f}")
        print(f"      • Cost per prediction: ${rec['cost_per_prediction']:.6f}")
        print(f"      • Capacity utilization: {rec['capacity_utilization']:.1f}%")
        print(f"      • Average latency: {rec['avg_latency_ms']:.1f}ms")
        print(f"      • Success rate: {rec['success_rate']:.1%}")
        print(f"      • Recommendation score: {rec['recommendation_score']:.2f}")
    
    print("-" * 60)

# Interactive recommendation tool
def interactive_recommendation():
    """Interactive recommendation tool."""
    print("\n🎮 Interactive Recommendation Tool")
    print("-" * 40)
    
    try:
        daily_preds = int(input("Daily predictions (e.g., 100000): ") or "100000")
        max_latency = float(input("Max latency in ms (e.g., 100): ") or "100")
        max_cost = float(input("Max cost per prediction (e.g., 0.0001): ") or "0.0001")
        reliability = float(input("Reliability target (0-1, e.g., 0.95): ") or "0.95")
        
        gpu_pref_input = input("Prefer GPU? (yes/no/auto): ").lower().strip()
        if gpu_pref_input == "yes":
            prefer_gpu = True
        elif gpu_pref_input == "no":
            prefer_gpu = False
        else:
            prefer_gpu = None
        
        requirements = {
            "daily_predictions": daily_preds,
            "max_latency_ms": max_latency,
            "max_cost_per_prediction": max_cost,
            "reliability_target": reliability,
            "prefer_gpu": prefer_gpu
        }
        
        result = recommend_instances(requirements)
        
        if "error" in result:
            print(f"❌ {result['error']}")
            return
        
        print("\n🏆 Recommendation for your requirements:")
        best = result['recommendations'][0]
        print(f"   Instance: {best['instance_type']}")
        print(f"   Instances needed: {best['instances_needed']}")
        print(f"   Monthly cost: ${best['total_monthly_cost']:,.2f}")
        print(f"   Cost per prediction: ${best['cost_per_prediction']:.6f}")
        
    except KeyboardInterrupt:
        print("\n👋 Exiting interactive mode")
    except Exception as e:
        print(f"\n❌ Error: {e}")
        print("💡 Try using the predefined scenarios above")

# Uncomment to run interactive mode
# interactive_recommendation()

print("\n💡 To use the interactive recommendation tool, uncomment the last line above!")
print("\n🎉 Benchmarking Analysis Complete!")
print("\n📋 Key Takeaways (based on the synthetic sample data above):")
print("   • Use the recommendation engine to find optimal infrastructure")
print("   • Consider both performance and cost when choosing instances")
print("   • GPU instances excel at high-throughput scenarios")
print("   • CPU instances are often more cost-effective for moderate loads")
print("   • Monitor capacity utilization to optimize resource usage")
print("\n🚀 Next: Explore monitoring metrics and alerting in the next notebook!")