# Optimizer and Pruning Profiling Analysis

This notebook analyzes and visualizes the results from profiling runs of both the optimization-integrated model and the pure pruning benchmark. It helps compare different optimization levels and pruning approaches to understand their impact on performance and efficiency.

## Features

- Load and compare profiling results from multiple runs
- Visualize performance across different pruning levels
- Compare original vs. optimized model implementations
- Create interactive charts for key metrics
- Analyze component-level performance breakdown

In [None]:
# Import necessary libraries
import os
import json
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from IPython.display import display, HTML, Markdown

# Set plotting style
sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (12, 8)

## Load Profiling Results

First, let's load the profiling results from both the pure pruning benchmark and the optimization profiling runs.

In [None]:
# Function to load profiling results
def load_profiling_results(directory="profiling_results"):
    """Load profiling results from the given directory."""
    results = {}
    
    # Check if the directory exists
    if not os.path.exists(directory):
        print(f"Directory {directory} not found")
        return results
    
    # Load full model profiling results
    full_model_path = os.path.join(directory, "full_model", "full_model_profiling.json")
    if os.path.exists(full_model_path):
        with open(full_model_path, "r") as f:
            results["full_model"] = json.load(f)
            print(f"Loaded full model profiling results")
    
    # Load other profiling results
    result_files = glob.glob(os.path.join(directory, "*.json"))
    for file_path in result_files:
        try:
            with open(file_path, "r") as f:
                file_name = os.path.basename(file_path)
                results[file_name] = json.load(f)
                print(f"Loaded {file_name}")
        except Exception as e:
            print(f"Error loading {file_path}: {e}")
    
    return results

# Function to load pure pruning benchmark results
def load_pruning_results(directory="pure_pruning_results"):
    """Load pure pruning benchmark results from the given directory."""
    results = {}
    
    # Check if the directory exists
    if not os.path.exists(directory):
        print(f"Directory {directory} not found")
        return results
    
    # Find subdirectories that contain benchmark results
    subdirs = [d for d in os.listdir(directory) if os.path.isdir(os.path.join(directory, d))]
    
    for subdir in subdirs:
        subdir_path = os.path.join(directory, subdir)
        
        # Load benchmark config
        config_path = os.path.join(subdir_path, "config.json")
        if os.path.exists(config_path):
            with open(config_path, "r") as f:
                config = json.load(f)
                
            # Load metrics
            metrics_path = os.path.join(subdir_path, "metrics")
            if os.path.exists(metrics_path):
                # glob returns files in arbitrary order, so sort to make the choice deterministic
                metrics_files = sorted(glob.glob(os.path.join(metrics_path, "*.json")))
                if metrics_files:
                    with open(metrics_files[0], "r") as f:
                        metrics = json.load(f)
                        
                    results[subdir] = {
                        "config": config,
                        "metrics": metrics
                    }
                    print(f"Loaded benchmark results from {subdir}")
    
    return results

# Load both types of results
profiling_results = load_profiling_results()
pruning_results = load_pruning_results()
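
The loaders above expect a particular directory layout. As a minimal sketch, the snippet below writes a tiny synthetic fixture in that layout to a temporary directory. The file names `full_model_profiling.json` and `config.json` and the JSON keys are taken from the loader and analysis code in this notebook; the run directory name `run_001`, the metrics file name `metrics.json`, the helper `write_sample_results`, and every value are made-up placeholders, not real measurements.

```python
import json
import os
import tempfile


def write_sample_results(root):
    """Write a tiny synthetic fixture in the layout the loaders read (hypothetical helper)."""
    # profiling_results/full_model/full_model_profiling.json
    full_dir = os.path.join(root, "profiling_results", "full_model")
    os.makedirs(full_dir, exist_ok=True)
    full_model = {
        "args": {"model_name": "example-model", "device": "cpu"},  # placeholder values
        "pruning_comparison": {
            "original": {"0": {"tokens_per_second": 10.0}, "50": {"tokens_per_second": 12.0}},
            "optimized": {"0": {"tokens_per_second": 11.0}, "50": {"tokens_per_second": 18.0}},
        },
    }
    with open(os.path.join(full_dir, "full_model_profiling.json"), "w") as f:
        json.dump(full_model, f)

    # pure_pruning_results/<run>/config.json and pure_pruning_results/<run>/metrics/*.json
    run_dir = os.path.join(root, "pure_pruning_results", "run_001")  # assumed run name
    os.makedirs(os.path.join(run_dir, "metrics"), exist_ok=True)
    with open(os.path.join(run_dir, "config.json"), "w") as f:
        json.dump({"pruning_strategy": "gradual", "pruning_method": "entropy"}, f)
    metrics = {
        "epochs": [0, 1],
        "inference_latency": {"0": 100.0, "1": 80.0},  # ms/token, placeholder values
        "active_heads_percentage": {"0": 100.0, "1": 50.0},
    }
    with open(os.path.join(run_dir, "metrics", "metrics.json"), "w") as f:
        json.dump(metrics, f)
    return root


sample_root = write_sample_results(tempfile.mkdtemp())
```

Passing `os.path.join(sample_root, "profiling_results")` to `load_profiling_results` and `os.path.join(sample_root, "pure_pruning_results")` to `load_pruning_results` should then exercise both loaders without any real profiling run.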

## Analyze Full Model Profiling Results

Let's analyze the full model profiling results to understand the performance characteristics of different optimization levels and pruning strategies.

In [None]:
def analyze_full_model_profiling(results):
    """Analyze full model profiling results."""
    if "full_model" not in results:
        print("Full model profiling results not found")
        return
    
    full_model = results["full_model"]
    
    # Display basic information
    if "args" in full_model:
        args = full_model["args"]
        display(HTML(f"<h3>Model Information</h3>"))
        display(HTML(f"<p><b>Model:</b> {args.get('model_name', 'N/A')}</p>"))
        display(HTML(f"<p><b>Device:</b> {args.get('device', 'N/A')}</p>"))
        display(HTML(f"<p><b>Optimization Level:</b> {args.get('optimization_level', 'N/A')}</p>"))
        display(HTML(f"<p><b>Pruning Levels:</b> {args.get('pruning_levels', 'N/A')}</p>"))
    
    # Model loading analysis
    if "model_loading" in full_model:
        loading = full_model["model_loading"]
        display(HTML(f"<h3>Model Loading Comparison</h3>"))
        
        # Create comparison table
        data = {
            "Model Type": ["Baseline", "Original", "Optimized"],
            "Load Time (s)": [
                loading["baseline_model"]["load_time"],
                loading["original_model"]["load_time"],
                loading["optimized_model"]["load_time"]
            ],
            "Parameters": [
                loading["baseline_model"]["parameter_count"],
                loading["original_model"]["parameter_count"],
                loading["optimized_model"]["parameter_count"]
            ],
            "Memory (MB)": [
                loading["baseline_model"]["memory_usage"] / (1024**2),
                loading["original_model"]["memory_usage"] / (1024**2),
                loading["optimized_model"]["memory_usage"] / (1024**2)
            ]
        }
        
        df = pd.DataFrame(data)
        display(df)
        
        # Plot loading time comparison
        plt.figure(figsize=(10, 6))
        plt.bar(data["Model Type"], data["Load Time (s)"], color=["lightgray", "dodgerblue", "green"])
        plt.title("Model Loading Time Comparison")
        plt.ylabel("Time (seconds)")
        plt.grid(axis="y", alpha=0.3)
        
        # Add value labels
        for i, v in enumerate(data["Load Time (s)"]):
            plt.text(i, v + 0.1, f"{v:.2f}s", ha="center")
        
        plt.show()
    
    # Pruning comparison analysis
    if "pruning_comparison" in full_model:
        display(HTML(f"<h3>Pruning Performance Comparison</h3>"))
        
        pruning_data = full_model["pruning_comparison"]
        pruning_levels = sorted([int(level) for level in pruning_data["original"].keys()])
        
        # Create DataFrame
        data = {
            "Pruning Level": [],
            "Original TPS": [],
            "Optimized TPS": [],
            "Speedup": []
        }
        
        for level in pruning_levels:
            level_str = str(level)
            original_tps = pruning_data["original"][level_str]["tokens_per_second"]
            optimized_tps = pruning_data["optimized"][level_str]["tokens_per_second"]
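            # Guard against a zero baseline; NaN leaves a gap instead of raising
            speedup = optimized_tps / original_tps if original_tps > 0 else float("nan")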
            
            data["Pruning Level"].append(f"{level}%")
            data["Original TPS"].append(original_tps)
            data["Optimized TPS"].append(optimized_tps)
            data["Speedup"].append(speedup)
        
        df = pd.DataFrame(data)
        display(df)
        
        # Plot tokens per second comparison
        plt.figure(figsize=(12, 6))
        plt.plot(pruning_levels, [pruning_data["original"][str(level)]["tokens_per_second"] for level in pruning_levels], 
                 'o-', label="Original", color="dodgerblue", linewidth=2)
        plt.plot(pruning_levels, [pruning_data["optimized"][str(level)]["tokens_per_second"] for level in pruning_levels], 
                 'o-', label="Optimized", color="green", linewidth=2)
        
        plt.title("Generation Speed vs. Pruning Level")
        plt.xlabel("Pruning Level (%)")
        plt.ylabel("Tokens per Second")
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xticks(pruning_levels)
        plt.show()
        
        # Plot speedup factors
        plt.figure(figsize=(12, 6))
        bars = plt.bar(pruning_levels, data["Speedup"], color="coral")
        plt.axhline(y=1.0, color='k', linestyle='--', alpha=0.3)
        
        plt.title("Speedup Factor by Pruning Level")
        plt.xlabel("Pruning Level (%)")
        plt.ylabel("Speedup Factor (Optimized / Original)")
        plt.grid(axis="y", alpha=0.3)
        plt.xticks(pruning_levels)
        
        # Add value labels
        for i, bar in enumerate(bars):
            height = bar.get_height()
            plt.text(bar.get_x() + bar.get_width()/2., height + 0.05, f"{height:.2f}x", ha="center")
        
        plt.show()
    
    # Component breakdown analysis
    if "component_breakdown" in full_model:
        display(HTML(f"<h3>Component-Level Performance Analysis</h3>"))
        
        breakdown = full_model["component_breakdown"]
        
        # Check if data is available
        if "original" in breakdown and "optimized" in breakdown:
            # Get components with percentage data
            if "percentages" in breakdown["original"] and "percentages" in breakdown["optimized"]:
                orig_pct = breakdown["original"]["percentages"]
                opt_pct = breakdown["optimized"]["percentages"]
                
                # Get common components
                components = list(set(orig_pct.keys()) & set(opt_pct.keys()))
                
                # Create DataFrame
                data = {
                    "Component": [c.replace("_", " ").title() for c in components],
                    "Original %": [orig_pct.get(c, 0) for c in components],
                    "Optimized %": [opt_pct.get(c, 0) for c in components],
                    "Change": [opt_pct.get(c, 0) - orig_pct.get(c, 0) for c in components]
                }
                
                df = pd.DataFrame(data)
                display(df)
                
                # Plot component time distribution
                plt.figure(figsize=(12, 6))
                x = np.arange(len(components))
                width = 0.35
                
                plt.bar(x - width/2, [orig_pct.get(c, 0) for c in components], width, 
                        label="Original", color="dodgerblue")
                plt.bar(x + width/2, [opt_pct.get(c, 0) for c in components], width,
                        label="Optimized", color="green")
                
                plt.title("Component Time Distribution")
                plt.xlabel("Component")
                plt.ylabel("Time (%)")
                plt.xticks(x, [c.replace("_", " ").title() for c in components], rotation=45, ha="right")
                plt.legend()
                plt.grid(axis="y", alpha=0.3)
                plt.tight_layout()
                plt.show()
    
    # Integration optimization tests
    if "integration_tests" in full_model:
        display(HTML(f"<h3>Integration Optimization Results</h3>"))
        
        integration_data = full_model["integration_tests"]
        configs = list(integration_data.keys())
        
        # Get a representative pruning level
        if configs and integration_data[configs[0]]:
            level = next(iter(integration_data[configs[0]].keys()))
            
            # Create DataFrame
            data = {
                "Configuration": [c.replace("_", " ").title() for c in configs],
                "Tokens/sec": [integration_data[c][level]["tokens_per_second"] for c in configs]
            }
            
            # Add speedup relative to original
            if "original" in configs:
                original_tps = integration_data["original"][level]["tokens_per_second"]
                data["Speedup vs. Original"] = [integration_data[c][level]["tokens_per_second"] / original_tps for c in configs]
            
            df = pd.DataFrame(data)
            display(df)
            
            # Plot comparison bar chart
            plt.figure(figsize=(12, 6))
            bars = plt.bar(data["Configuration"], data["Tokens/sec"], 
                          color=["dodgerblue" if c == "Original" else "green" for c in data["Configuration"]])
            
            plt.title(f"Integration Optimization Comparison (at {level}% pruning)")
            plt.xlabel("Configuration")
            plt.ylabel("Tokens per Second")
            plt.grid(axis="y", alpha=0.3)
            plt.xticks(rotation=45, ha="right")
            
            # Add value labels
            for i, bar in enumerate(bars):
                height = bar.get_height()
                plt.text(bar.get_x() + bar.get_width()/2., height + 0.5, f"{height:.1f}", ha="center")
            
            plt.tight_layout()
            plt.show()

# Run the full model analysis if results are available
if profiling_results and "full_model" in profiling_results:
    analyze_full_model_profiling(profiling_results)
else:
    print("Full model profiling results not available")
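
The speedup column in the pruning comparison is the optimized throughput divided by the original throughput at the same pruning level. The sketch below isolates that calculation without plotting. It assumes the `pruning_comparison` structure read above (`original` and `optimized` dictionaries keyed by pruning level as strings); the function name `speedup_by_level` and the sample numbers are illustrative only.

```python
def speedup_by_level(pruning_comparison):
    """Return (level, original_tps, optimized_tps, speedup) tuples sorted by level."""
    original = pruning_comparison["original"]
    optimized = pruning_comparison["optimized"]
    rows = []
    # Only compare levels that were measured for both implementations
    for level in sorted(set(original) & set(optimized), key=int):
        orig_tps = original[level]["tokens_per_second"]
        opt_tps = optimized[level]["tokens_per_second"]
        # Speedup is undefined without a positive baseline
        speedup = opt_tps / orig_tps if orig_tps > 0 else None
        rows.append((int(level), orig_tps, opt_tps, speedup))
    return rows


# Made-up numbers, for illustration only
example_comparison = {
    "original": {"50": {"tokens_per_second": 10.0}, "0": {"tokens_per_second": 8.0}, "70": {"tokens_per_second": 0.0}},
    "optimized": {"50": {"tokens_per_second": 20.0}, "0": {"tokens_per_second": 8.0}, "70": {"tokens_per_second": 5.0}},
}
speedup_rows = speedup_by_level(example_comparison)
```

Sorting with `key=int` matters because the levels are stored as strings, and a plain string sort would place `"100"` before `"30"`.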

## Analyze Pure Pruning Benchmark Results

Let's analyze the results from the pure pruning benchmark to understand the efficiency benefits of pruning in isolation from other optimizations.

In [None]:
def analyze_pure_pruning_results(results):
    """Analyze pure pruning benchmark results."""
    if not results:
        print("No pure pruning benchmark results found")
        return
    
    # Select the first benchmark result for detailed analysis
    benchmark_id = next(iter(results))
    benchmark = results[benchmark_id]
    
    # Display basic configuration
    config = benchmark["config"]
    display(HTML(f"<h3>Benchmark Configuration</h3>"))
    display(HTML(f"<p><b>Model:</b> {config.get('model_name', 'N/A')}</p>"))
    display(HTML(f"<p><b>Pruning Strategy:</b> {config.get('pruning_strategy', 'N/A')}</p>"))
    display(HTML(f"<p><b>Pruning Method:</b> {config.get('pruning_method', 'N/A')}</p>"))
    display(HTML(f"<p><b>Target Sparsity:</b> {config.get('target_sparsity', 'N/A')}</p>"))
    display(HTML(f"<p><b>Epochs:</b> {config.get('epochs', 'N/A')}</p>"))
    
    # Analyze metrics over time
    metrics = benchmark["metrics"]
    display(HTML(f"<h3>Metrics Over Time</h3>"))
    
    if "epochs" in metrics:
        epochs = sorted([int(e) for e in metrics["epochs"]])
        
        # Create plots for key metrics
        key_metrics = [
            ("perplexity", "Perplexity", "Lower is better"),
            ("active_heads_percentage", "Active Heads (%)", "Lower means more pruning"),
            ("inference_latency", "Inference Latency (ms/token)", "Lower is better"),
            ("lexical_diversity", "Lexical Diversity", "Higher is better"),
            ("repetition_score", "Repetition Score", "Lower is better")
        ]
        
        plt.figure(figsize=(15, 12))
        
        for i, (metric_key, metric_name, description) in enumerate(key_metrics):
            if metric_key in metrics:
                # Pair each value with its own epoch so a missing epoch does not shift the curve
                points = [(e, metrics[metric_key].get(str(e), None)) for e in epochs]
                points = [(e, v) for e, v in points if v is not None]
                
                if points:
                    plt.subplot(3, 2, i+1)
                    plt.plot([e for e, _ in points], [v for _, v in points], 'o-', color=f"C{i}", linewidth=2)
                    plt.title(f"{metric_name} Over Time ({description})")
                    plt.xlabel("Epoch")
                    plt.ylabel(metric_name)
                    plt.grid(True, alpha=0.3)
                    
                    # Add phase markers
                    # Compare against None so that a phase starting at epoch 0 is still marked
                    if config.get('pruning_start_epoch') is not None:
                        plt.axvline(x=int(config['pruning_start_epoch']), color='r', linestyle='--', alpha=0.5,
                                   label="Start Pruning")
                    if config.get('pruning_end_epoch') is not None:
                        plt.axvline(x=int(config['pruning_end_epoch']), color='g', linestyle='--', alpha=0.5,
                                  label="Start Fine-tuning")
                    
                    if i == 0:  # Only add legend to the first plot
                        plt.legend()
        
        plt.tight_layout()
        plt.show()
        
        # Create table of final metrics
        if epochs:
            last_epoch = str(max(epochs))
            final_metrics = {}
            
            for metric_key, metric_name, _ in key_metrics:
                if metric_key in metrics and last_epoch in metrics[metric_key]:
                    final_metrics[metric_name] = metrics[metric_key][last_epoch]
            
            display(HTML(f"<h4>Final Metrics (Epoch {last_epoch})</h4>"))
            df = pd.DataFrame(list(final_metrics.items()), columns=["Metric", "Value"])
            display(df)
    
    # If multiple benchmarks are available, compare them
    if len(results) > 1:
        display(HTML(f"<h3>Comparison Across Benchmarks</h3>"))
        
        # Create comparison data
        comparison_data = {
            "Benchmark": [],
            "Strategy": [],
            "Method": [],
            "Final Perplexity": [],
            "Final Latency": [],
            "Active Heads %": []
        }
        
        for bench_id, bench_data in results.items():
            config = bench_data["config"]
            metrics = bench_data["metrics"]
            
            # Get final epoch
            if "epochs" in metrics:
                epochs = sorted([int(e) for e in metrics["epochs"]])
                if epochs:
                    last_epoch = str(max(epochs))
                    
                    # Extract final metrics
                    final_perplexity = metrics.get("perplexity", {}).get(last_epoch, None)
                    final_latency = metrics.get("inference_latency", {}).get(last_epoch, None)
                    active_heads = metrics.get("active_heads_percentage", {}).get(last_epoch, None)
                    
                    # Add to comparison data
                    comparison_data["Benchmark"].append(bench_id)
                    comparison_data["Strategy"].append(config.get("pruning_strategy", "N/A"))
                    comparison_data["Method"].append(config.get("pruning_method", "N/A"))
                    comparison_data["Final Perplexity"].append(final_perplexity)
                    comparison_data["Final Latency"].append(final_latency)
                    comparison_data["Active Heads %"].append(active_heads)
        
        # Create DataFrame and display
        if comparison_data["Benchmark"]:
            df = pd.DataFrame(comparison_data)
            display(df)
            
            # Plot comparison
            if len(df) >= 2:
                plt.figure(figsize=(15, 6))
                
                # Plot perplexity comparison
                plt.subplot(1, 2, 1)
                bars = plt.bar(df["Strategy"] + " (" + df["Method"] + ")", df["Final Perplexity"])
                plt.title("Final Perplexity by Approach")
                plt.xlabel("Pruning Approach")
                plt.ylabel("Perplexity (lower is better)")
                plt.grid(axis="y", alpha=0.3)
                plt.xticks(rotation=45, ha="right")
                
                # Add value labels
                for i, bar in enumerate(bars):
                    height = bar.get_height()
                    plt.text(bar.get_x() + bar.get_width()/2., height + 0.1, f"{height:.2f}", ha="center")
                
                # Plot latency comparison
                plt.subplot(1, 2, 2)
                bars = plt.bar(df["Strategy"] + " (" + df["Method"] + ")", df["Final Latency"])
                plt.title("Final Inference Latency by Approach")
                plt.xlabel("Pruning Approach")
                plt.ylabel("Latency (ms/token, lower is better)")
                plt.grid(axis="y", alpha=0.3)
                plt.xticks(rotation=45, ha="right")
                
                # Add value labels
                for i, bar in enumerate(bars):
                    height = bar.get_height()
                    plt.text(bar.get_x() + bar.get_width()/2., height + 0.1, f"{height:.2f}", ha="center")
                
                plt.tight_layout()
                plt.show()

# Run the pure pruning analysis if results are available
if pruning_results:
    analyze_pure_pruning_results(pruning_results)
else:
    print("Pure pruning benchmark results not available")
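
When a metric was not recorded for every epoch, each value has to stay paired with the epoch it belongs to, otherwise the plotted curve drifts left. The following is a minimal sketch of that pairing. It assumes the metrics layout used above (an `epochs` list plus one dictionary per metric keyed by the epoch as a string); the helper name `metric_series` and the sample values are hypothetical.

```python
def metric_series(metrics, metric_key):
    """Return (epochs, values) for one metric, skipping epochs with no recorded value."""
    epochs = sorted(int(e) for e in metrics.get("epochs", []))
    per_epoch = metrics.get(metric_key, {})
    # Keep (epoch, value) together so gaps never misalign the x and y lists
    pairs = [(e, per_epoch.get(str(e))) for e in epochs]
    pairs = [(e, v) for e, v in pairs if v is not None]
    return [e for e, _ in pairs], [v for _, v in pairs]


# Made-up metrics in which epoch 1 has no perplexity value
sample_metrics = {"epochs": [0, 1, 2], "perplexity": {"0": 30.0, "2": 25.0}}
sample_epochs, sample_values = metric_series(sample_metrics, "perplexity")
```

The two returned lists always have the same length, so they can be passed directly to `plt.plot`.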

## Integration Comparison

Let's compare the results from both pure pruning and optimization-integrated profiling to understand how they work together.

In [None]:
def compare_pruning_and_optimization(profiling_results, pruning_results):
    """Compare pruning and optimization results."""
    if not profiling_results or not pruning_results:
        print("Both profiling and pruning results are required for comparison")
        return
    
    display(HTML(f"<h3>Comparison Between Pure Pruning and Optimized Pruning</h3>"))
    
    # Check if we have the necessary data
    if "full_model" not in profiling_results or "pruning_comparison" not in profiling_results["full_model"]:
        print("Missing optimization profiling data for comparison")
        return
    
    # Get data from profiling results
    opt_pruning = profiling_results["full_model"]["pruning_comparison"]
    pruning_levels = sorted([int(level) for level in opt_pruning["original"].keys()])
    
    # Get data from pure pruning results
    pure_benchmark_id = next(iter(pruning_results))
    pure_metrics = pruning_results[pure_benchmark_id]["metrics"]
    
    # Create comparison data
    comparison_data = {
        "Pruning Level": [],
        "Original TPS": [],
        "Optimized TPS": [],
        "Pure Pruning TPS": [],
        "Opt vs Orig": [],
        "Pure vs Orig": []
    }
    
    # Get last epoch for pure pruning
    if "epochs" in pure_metrics:
        epochs = sorted([int(e) for e in pure_metrics["epochs"]])
        if epochs:
            last_epoch = str(max(epochs))
            
            # Find the closest pruning level in pure pruning results
            if "active_heads_percentage" in pure_metrics and last_epoch in pure_metrics["active_heads_percentage"]:
                pure_active = pure_metrics["active_heads_percentage"][last_epoch]
                pure_pruning_level = 100 - pure_active
                pure_latency = None
                pure_tps = None  # Stays None when no usable latency was recorded
                if "inference_latency" in pure_metrics and last_epoch in pure_metrics["inference_latency"]:
                    pure_latency = pure_metrics["inference_latency"][last_epoch]
                    # Convert ms/token to tokens/sec; a non-positive latency is treated as missing
                    pure_tps = 1000 / pure_latency if pure_latency > 0 else None
                
                # Add to comparison data
                for level in pruning_levels:
                    level_str = str(level)
                    original_tps = opt_pruning["original"][level_str]["tokens_per_second"]
                    optimized_tps = opt_pruning["optimized"][level_str]["tokens_per_second"]
                    
                    comparison_data["Pruning Level"].append(f"{level}%")
                    comparison_data["Original TPS"].append(original_tps)
                    comparison_data["Optimized TPS"].append(optimized_tps)
                    
                    # Attach the pure pruning result only to levels within 20 percentage points
                    # of the pruning level that run actually reached
                    if pure_tps is not None and abs(level - pure_pruning_level) < 20:
                        comparison_data["Pure Pruning TPS"].append(pure_tps)
                    else:
                        comparison_data["Pure Pruning TPS"].append(None)
                    
                    # Calculate ratios
                    comparison_data["Opt vs Orig"].append(optimized_tps / original_tps if original_tps > 0 else None)
                    if pure_tps is not None and abs(level - pure_pruning_level) < 20 and original_tps > 0:
                        comparison_data["Pure vs Orig"].append(pure_tps / original_tps)
                    else:
                        comparison_data["Pure vs Orig"].append(None)
    
    # Create DataFrame and display
    if comparison_data["Pruning Level"]:
        df = pd.DataFrame(comparison_data)
        display(df)
        
        # Plot comparison
        plt.figure(figsize=(12, 8))
        
        # Filter to only rows with pure pruning data
        valid_rows = [i for i, val in enumerate(comparison_data["Pure Pruning TPS"]) if val is not None]
        
        if valid_rows:
            # Extract valid data
            valid_levels = [pruning_levels[i] for i in valid_rows]
            valid_orig = [comparison_data["Original TPS"][i] for i in valid_rows]
            valid_opt = [comparison_data["Optimized TPS"][i] for i in valid_rows]
            valid_pure = [comparison_data["Pure Pruning TPS"][i] for i in valid_rows]
            
            # Plot tokens per second comparison
            plt.subplot(2, 1, 1)
            plt.plot(valid_levels, valid_orig, 'o-', label="Original", color="dodgerblue", linewidth=2)
            plt.plot(valid_levels, valid_opt, 'o-', label="Optimized", color="green", linewidth=2)
            plt.plot(valid_levels, valid_pure, 'o-', label="Pure Pruning", color="orange", linewidth=2)
            
            plt.title("Generation Speed Comparison")
            plt.xlabel("Pruning Level (%)")
            plt.ylabel("Tokens per Second")
            plt.grid(True, alpha=0.3)
            plt.legend()
            plt.xticks(valid_levels)
            
            # Plot speedup ratios
            plt.subplot(2, 1, 2)
            valid_opt_ratio = [comparison_data["Opt vs Orig"][i] for i in valid_rows]
            valid_pure_ratio = [comparison_data["Pure vs Orig"][i] for i in valid_rows]
            
            plt.plot(valid_levels, valid_opt_ratio, 'o-', label="Optimized / Original", color="green", linewidth=2)
            plt.plot(valid_levels, valid_pure_ratio, 'o-', label="Pure / Original", color="orange", linewidth=2)
            plt.axhline(y=1.0, color='k', linestyle='--', alpha=0.3)
            
            plt.title("Speedup Ratio Comparison")
            plt.xlabel("Pruning Level (%)")
            plt.ylabel("Speedup Ratio (>1 is better)")
            plt.grid(True, alpha=0.3)
            plt.legend()
            plt.xticks(valid_levels)
            
            plt.tight_layout()
            plt.show()
            
            # Calculate and display the best overall approach, ignoring undefined ratios
            opt_ratios = [r for r in valid_opt_ratio if r is not None]
            pure_ratios = [r for r in valid_pure_ratio if r is not None]
            best_opt = max(opt_ratios) if opt_ratios else 0
            best_pure = max(pure_ratios) if pure_ratios else 0
            
            display(HTML(f"<h4>Performance Summary</h4>"))
            display(HTML(f"<p>Best speedup with optimized pruning: <b>{best_opt:.2f}x</b></p>"))
            display(HTML(f"<p>Best speedup with pure pruning: <b>{best_pure:.2f}x</b></p>"))
            
            # Skip the relative comparison when either ratio is zero to avoid dividing by zero
            if best_opt > 0 and best_pure > 0:
                if best_opt > best_pure:
                    improvement = (best_opt / best_pure - 1) * 100
                    display(HTML(f"<p>Combined optimization provides <b>{improvement:.1f}%</b> better performance than pruning alone</p>"))
                else:
                    improvement = (best_pure / best_opt - 1) * 100
                    display(HTML(f"<p>Pure pruning provides <b>{improvement:.1f}%</b> better performance than combined optimization in this case</p>"))

# Run the comparison if both sets of results are available
if profiling_results and pruning_results:
    compare_pruning_and_optimization(profiling_results, pruning_results)
else:
    print("Both profiling and pruning results are required for comparison")
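
The comparison above rests on two small conversions: latency in ms/token becomes throughput in tokens/sec, and the share of active heads becomes a pruning level that is matched against the profiled levels. Here is a sketch of both steps in isolation. The 20-point tolerance mirrors the threshold used above; the function names are hypothetical.

```python
def latency_ms_to_tps(latency_ms):
    """Convert latency in ms/token to tokens/sec; return None when it is missing or non-positive."""
    if latency_ms is None or latency_ms <= 0:
        return None
    return 1000.0 / latency_ms


def matches_level(level, active_heads_pct, tolerance=20):
    """Check whether a profiled pruning level is close to the level a pure pruning run reached."""
    reached_level = 100 - active_heads_pct  # pruned share is the complement of active heads
    return abs(level - reached_level) < tolerance


example_tps = latency_ms_to_tps(50.0)        # 50 ms/token
example_match_near = matches_level(50, 55)   # run reached 45% pruning
example_match_far = matches_level(0, 50)     # run reached 50% pruning
```

Returning `None` rather than `0` for an unusable latency keeps such runs out of the ratio plots instead of showing them as a zero speedup.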

## Conclusion and Recommendations

Based on the analysis above, we can provide recommendations for the most efficient configuration of pruning and optimization.

In [None]:
def generate_recommendations(profiling_results, pruning_results):
    """Generate recommendations based on the analysis."""
    display(HTML(f"<h3>Conclusions and Recommendations</h3>"))
    
    recommendations = []
    
    # Analyze pruning level recommendations
    if profiling_results and "full_model" in profiling_results and "pruning_comparison" in profiling_results["full_model"]:
        # Get the pruning data
        pruning_data = profiling_results["full_model"]["pruning_comparison"]
        pruning_levels = sorted([int(level) for level in pruning_data["original"].keys()])
        
        # Find the best pruning level for speed
        best_level = None
        best_speedup = 0
        for level in pruning_levels:
            level_str = str(level)
            if level_str in pruning_data["optimized"] and level_str in pruning_data["original"]:
                original_tps = pruning_data["original"][level_str]["tokens_per_second"]
                if original_tps <= 0:
                    continue  # Speedup is undefined without a positive baseline
                speedup = pruning_data["optimized"][level_str]["tokens_per_second"] / original_tps
                if speedup > best_speedup:
                    best_speedup = speedup
                    best_level = level
        
        # Only recommend a level when at least one valid speedup was measured
        if best_level is not None:
            recommendations.append(f"**Optimal Pruning Level**: {best_level}% pruning provides the best speed improvement over the original implementation ({best_speedup:.2f}x).")
    
    # Analyze pruning strategy recommendations
    if pruning_results and len(pruning_results) > 1:
        # Compare strategies
        strategy_performance = {}
        
        for bench_id, bench_data in pruning_results.items():
            config = bench_data["config"]
            metrics = bench_data["metrics"]
            
            # Get final epoch
            if "epochs" in metrics:
                epochs = sorted([int(e) for e in metrics["epochs"]])
                if epochs:
                    last_epoch = str(max(epochs))
                    
                    # Extract strategy and method
                    strategy = config.get("pruning_strategy")
                    method = config.get("pruning_method")
                    
                    if strategy and method:
                        key = f"{strategy}_{method}"
                        
                        # Extract metrics
                        if "inference_latency" in metrics and last_epoch in metrics["inference_latency"]:
                            latency = metrics["inference_latency"][last_epoch]
                            strategy_performance[key] = {
                                "strategy": strategy,
                                "method": method,
                                "latency": latency,
                                "tps": 1000 / latency if latency > 0 else 0
                            }
                            
                            if "perplexity" in metrics and last_epoch in metrics["perplexity"]:
                                strategy_performance[key]["perplexity"] = metrics["perplexity"][last_epoch]
        
        # Find best strategy for speed
        if strategy_performance:
            best_strategy = max(strategy_performance.items(), key=lambda x: x[1]["tps"])
            recommendations.append(f"**Best Pruning Strategy**: {best_strategy[1]['strategy']} pruning with {best_strategy[1]['method']} method offers the highest performance ({best_strategy[1]['tps']:.2f} tokens/sec).")
    
    # Analyze optimization recommendations
    if profiling_results and "full_model" in profiling_results and "integration_tests" in profiling_results["full_model"]:
        integration_data = profiling_results["full_model"]["integration_tests"]
        configs = list(integration_data.keys())
        
        if configs and integration_data[configs[0]]:
            level = next(iter(integration_data[configs[0]].keys()))
            
            # Find the best configuration
            best_config = ""
            best_tps = 0
            for config in configs:
                if config != "original":
                    tps = integration_data[config][level]["tokens_per_second"]
                    if tps > best_tps:
                        best_tps = tps
                        best_config = config
            
            if best_config:
                recommendations.append(f"**Optimal Integration Configuration**: The '{best_config.replace('_', ' ').title()}' configuration offers the best performance ({best_tps:.2f} tokens/sec).")
    
    # Generate final recommendations
    if recommendations:
        for rec in recommendations:
            display(Markdown(rec))
        
        # Overall recommendation
        display(Markdown("\n**Overall Recommendation**:"))
        display(Markdown("Based on the profiling and benchmark results, we recommend combining optimization with strategic pruning to achieve the best performance. The data shows that pruning efficiency depends greatly on the pruning level, strategy, and method used."))
        
        # Display specific recommendation based on what we found
        if "full_model" in profiling_results and "pruning_comparison" in profiling_results["full_model"]:
            # Get specific recommendations from the data
            best_level_str = next((r for r in recommendations if "**Optimal Pruning Level**" in r), "")
            best_level_match = best_level_str.split("%")[0].split(" ")[-1] if best_level_str else "30-50"
            
            best_strategy_str = next((r for r in recommendations if "**Best Pruning Strategy**" in r), "")
            if best_strategy_str:
                if "gradual" in best_strategy_str.lower():
                    strategy_rec = "gradual pruning during training"
                elif "one_shot" in best_strategy_str.lower():
                    strategy_rec = "one-shot pruning followed by fine-tuning"
                else:
                    strategy_rec = "iterative pruning and fine-tuning cycles"
                
                best_method_match = "entropy-based" if "entropy" in best_strategy_str.lower() else \
                                    "magnitude-based" if "magnitude" in best_strategy_str.lower() else "random"
                
                display(Markdown(f"Based on the speed measurements above, use **{best_level_match}% pruning** with **{strategy_rec}** using the **{best_method_match}** selection method."))
            else:
                # No strategy comparison was available, so recommend only the pruning level
                display(Markdown(f"Based on the speed measurements above, use **{best_level_match}% pruning**. Run at least two pure pruning benchmarks to get a strategy and method recommendation."))
    else:
        display(Markdown("Not enough data to generate specific recommendations. Run more benchmarks with different configurations to get detailed recommendations."))

# Generate recommendations if results are available
if profiling_results or pruning_results:
    generate_recommendations(profiling_results, pruning_results)
else:
    print("No profiling or pruning results available for generating recommendations")

## Next Steps

Based on the analysis, here are some suggested next steps for further improving model efficiency:

1. **Experiment with Hybrid Approaches**: Try combining the best aspects of both optimization-integrated and pure pruning approaches.

2. **Test on Larger Models**: Run benchmarks on larger models to see if the efficiency gains scale with model size.

3. **Explore Dynamic Pruning**: Implement dynamic pruning that adapts to input complexity during inference.

4. **Optimize Component Bottlenecks**: Focus optimization efforts on the components that take the most time in the profiling results.

5. **Add Quantization**: Consider adding post-training quantization to further reduce memory usage and increase speed.

6. **Real-world Task Evaluation**: Test the pruned and optimized models on real-world tasks to ensure practical benefits are maintained.

7. **Hardware-specific Optimization**: Adapt the optimization strategies to target specific hardware platforms (CPU, GPU, TPU, etc.).
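
For step 4, a simple starting point is to rank components by their share of total time. The sketch below assumes a `percentages` dictionary shaped like the one in `component_breakdown` above (component name mapped to percent of time); the helper name `rank_bottlenecks`, the component names, and the numbers are made up for illustration.

```python
def rank_bottlenecks(percentages, top_n=3):
    """Return the top_n (component, percent) pairs, largest share of time first."""
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical breakdown, not measured data
sample_percentages = {"attention": 45.0, "mlp": 35.0, "layer_norm": 5.0, "embedding": 15.0}
top_components = rank_bottlenecks(sample_percentages, top_n=2)
```

Running the same ranking on the `original` and `optimized` breakdowns side by side shows whether an optimization removed a bottleneck or only moved it to another component.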