# UdaciSense: Optimized Object Recognition

## Notebook 3: Multi-step Optimization Pipeline

 
In this notebook, you'll implement the multi-step optimization pipeline based on the findings from the previous experiments. The goal is to combine different optimization techniques to meet all requirements:

- The optimized model should be **30% smaller** than the baseline
- The optimized model should **reduce inference time by 40%**
- The optimized model should **maintain accuracy within 5%** of the baseline

You may need to experiment with different pipelines as you try to hit your targets. Make sure to start with those that are easier to implement!
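
To make the three targets concrete, here is a minimal sketch that turns them into absolute thresholds. The baseline figures are made-up placeholders, and the accuracy tolerance is read as relative to the baseline (5% of the baseline accuracy, not 5 percentage points), which is how `min_acceptable_accuracy` is computed in Step 3.

```python
# Hypothetical baseline numbers, for illustration only.
baseline = {"size_mb": 10.0, "latency_ms": 20.0, "top1_acc": 90.0}

# Requirement fractions as stated in the list above.
SIZE_REDUCTION = 0.30
LATENCY_REDUCTION = 0.40
MAX_RELATIVE_ACC_DROP = 0.05


def compute_targets(baseline):
    """Convert the relative requirements into absolute thresholds."""
    return {
        "max_size_mb": baseline["size_mb"] * (1 - SIZE_REDUCTION),
        "max_latency_ms": baseline["latency_ms"] * (1 - LATENCY_REDUCTION),
        "min_top1_acc": baseline["top1_acc"] * (1 - MAX_RELATIVE_ACC_DROP),
    }


targets = compute_targets(baseline)
for name, value in targets.items():
    print(f"{name}: {value:.2f}")
```

With these placeholder values, a 90% baseline accuracy gives a floor of 85.5%, slightly stricter than the 85% a percentage-point reading would allow.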

### Overview: Implementation Plan

**Multi-Stage Compression Pipeline Strategy**

Based on our analysis from notebook 02, we'll implement an optimal pipeline that combines multiple compression techniques:

**Pipeline Design:**
1. **Stage 1: Magnitude-based Pruning** (30% sparsity)
   - Removes least important parameters based on L1 norm
   - Reduces model complexity while maintaining architecture
   
2. **Stage 2: Dynamic Quantization** (Applied to pruned model) 
   - Quantizes remaining parameters from FP32 to INT8
   - Optimized for ARM processors with QNNPACK backend
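
As a rough guide to what Stage 2 can deliver, the sketch below estimates weight storage when only part of the model is converted from FP32 (4 bytes per weight) to INT8 (1 byte). The parameter count and the quantized fractions are hypothetical; how much of the model `quantize_model` actually converts depends on which layer types it handles, which is not shown in this notebook. Per-tensor overhead such as scales and zero points is ignored.

```python
def estimated_size_mb(num_params, quantized_fraction, fp32_bytes=4, int8_bytes=1):
    """Rough weight-storage estimate when a fraction of the parameters is INT8."""
    quantized = num_params * quantized_fraction * int8_bytes
    unquantized = num_params * (1 - quantized_fraction) * fp32_bytes
    return (quantized + unquantized) / (1024 ** 2)


num_params = 2_000_000  # hypothetical parameter count
fp32_size = estimated_size_mb(num_params, quantized_fraction=0.0)
for fraction in (0.25, 0.5, 1.0):
    size = estimated_size_mb(num_params, fraction)
    saving = 1 - size / fp32_size
    print(f"{fraction:.0%} of weights quantized: {size:.2f} MB ({saving:.1%} smaller)")
```

The upper bound is a 75% reduction when every weight is quantized; the saving shrinks proportionally as more weights stay in FP32.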

**Expected Results:**
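- **Size Reduction**: ~65% total if the two stages compound (1 - 0.70 × 0.50, i.e. 35% of the original size remains), assuming the 30% from pruning is actually realized in the stored file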
- **Speed Improvement**: ~50% (exceeds 40% target)
- **Accuracy Impact**: Manageable within 5% tolerance
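
When stages are applied in sequence, their size reductions multiply rather than add, because each stage shrinks only what the previous stage left behind. A small sketch of that arithmetic, using the per-stage figures from the list above as inputs:

```python
def combined_reduction(*stage_reductions):
    """Overall fractional reduction when each stage shrinks what the previous one left."""
    remaining = 1.0
    for reduction in stage_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


# 30% from pruning followed by ~50% from quantization.
total = combined_reduction(0.30, 0.50)
print(f"Combined reduction: {total:.0%}")
```

This is an upper-level estimate only: unstructured pruning zeroes weights without shrinking dense tensors, so the pruning share may not appear in the saved file unless the model is stored in a sparse or compressed form.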

**Implementation Approach:**
- Validate each technique individually 
- Apply techniques in optimal sequence (pruning → quantization)
- Monitor cumulative metrics at each stage
- Add knowledge distillation if accuracy drops too much
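
If knowledge distillation turns out to be needed, one widely used formulation trains the compressed model (the student) against a blend of the ground-truth labels and the softened outputs of the baseline model (the teacher). This is an assumption for illustration; the notebook does not specify which distillation variant it would use.

$$
\mathcal{L} = (1 - \alpha)\,\mathrm{CE}\big(y,\ \sigma(z_s)\big) + \alpha\,T^2\,\mathrm{KL}\big(\sigma(z_t / T)\ \big\|\ \sigma(z_s / T)\big)
$$

Here $z_s$ and $z_t$ are the student and teacher logits, $\sigma$ is the softmax, $y$ is the true label, $T$ is the temperature, and $\alpha \in [0, 1]$ weights the distillation term. The factor $T^2$ keeps the gradient magnitude of the soft-target term comparable across temperatures.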

### Step 1: Set up the environment

In [None]:
# Make sure that libraries are dynamically re-loaded if changed
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')

In [None]:
# Import necessary libraries
import os
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pprint
import random
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

from compression.post_training.pruning import prune_model
from compression.post_training.quantization import quantize_model
from compression.multi_stage_pipeline import MultiStageCompressionPipeline

from utils import MAX_ALLOWED_ACCURACY_DROP, TARGET_INFERENCE_SPEEDUP, TARGET_MODEL_COMPRESSION
from utils.data_loader import get_household_loaders, get_input_size, print_dataloader_stats, visualize_batch
from utils.model import MobileNetV3_Household, load_model, save_model, print_model_summary, get_model_size
from utils.evaluation import evaluate_accuracy, measure_inference_time
from utils.compression import calculate_sparsity
from utils.visualization import plot_multiple_models_comparison

In [3]:
# Silence TorchScript tracer warnings and generic user warnings
import warnings
warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
warnings.filterwarnings("ignore", category=UserWarning)  # Optional: Ignore all user warnings

In [None]:
# Check if CUDA is available
devices = ["cpu"]
if torch.cuda.is_available():
    num_devices = torch.cuda.device_count()
    devices.extend([f"cuda:{i} ({torch.cuda.get_device_name(i)})" for i in range(num_devices)])
print(f"Devices available: {devices}")

In [None]:
# Set random seed for reproducibility
def set_deterministic_mode(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)
    
    def seed_worker(worker_id):
        worker_seed = seed + worker_id
        np.random.seed(worker_seed)
        random.seed(worker_seed)
    
    return seed_worker

# Keep the returned worker-seeding function so it can be passed to a DataLoader
seed_worker = set_deterministic_mode(42)
g = torch.Generator()
g.manual_seed(42)
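
The cell above builds a per-worker seeding function and a seeded `torch.Generator`, but does not show where they are consumed. The sketch below shows how such objects are typically handed to a plain `DataLoader` through its `worker_init_fn` and `generator` arguments. Whether `get_household_loaders` accepts them is not shown in this notebook, so the toy dataset and the helper names `make_seed_worker` and `shuffled_order` are assumptions for illustration.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_seed_worker(seed):
    """Return a worker_init_fn that seeds NumPy and random in each worker."""
    def seed_worker(worker_id):
        worker_seed = seed + worker_id
        np.random.seed(worker_seed)
        random.seed(worker_seed)
    return seed_worker


def shuffled_order(seed):
    """Build a shuffling DataLoader and return the order in which it yields items."""
    dataset = TensorDataset(torch.arange(16))  # toy stand-in dataset
    generator = torch.Generator()
    generator.manual_seed(seed)
    loader = DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        num_workers=0,  # worker_init_fn is only called when num_workers > 0
        worker_init_fn=make_seed_worker(seed),
        generator=generator,
    )
    return [int(x) for (batch,) in loader for x in batch]


first = shuffled_order(42)
second = shuffled_order(42)
print("Same shuffle order:", first == second)
```

The seeded generator is what makes the shuffle order repeatable; the worker function matters only for random augmentations executed inside worker processes.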

### Step 2: Load the dataset

In [None]:
# Load household objects dataset - using IMAGENET size for MobileNetV3 compatibility
train_loader, test_loader = get_household_loaders(
    image_size="IMAGENET", batch_size=32, num_workers=2,  # Using smaller batch size for memory efficiency
)

# Get input_size
input_size = get_input_size("IMAGENET")
print(f"Input has size: {input_size}")

# Get class names
class_names = train_loader.dataset.classes
print("Datasets have these classes:")
for i, class_name in enumerate(class_names):
    print(f"  {i}: {class_name}")

# Visualize some examples
for dataset_type, data_loader in [('train', train_loader), ('test', test_loader)]:
    print(f"\nInformation on {dataset_type} set")
    print_dataloader_stats(data_loader, dataset_type)
    print(f"Examples of images from the {dataset_type} set")
    visualize_batch(data_loader, num_images=8)  # Reduced from 10 to 8 for cleaner display

### Step 3: Load the baseline model and metrics

In [None]:
# Load the baseline model
baseline_model = load_model(
    path="../models/baseline_mobilenet/checkpoints/model.pth",
    model_class=MobileNetV3_Household,
    num_classes=10
)
baseline_model_name = "baseline_mobilenet"
print_model_summary(baseline_model)

# Load baseline metrics
with open(f"../results/{baseline_model_name}/pretrained_metrics.json", "r") as f:
    baseline_metrics = json.load(f)

print("\nBaseline Model Metrics:")
pprint.pp(baseline_metrics)

# Calculate target metrics based on CTO requirements
target_model_size = baseline_metrics['size']['model_size_mb'] * (1 - TARGET_MODEL_COMPRESSION)
target_inference_time_cpu = baseline_metrics['timing']['cpu']['avg_time_ms'] * (1 - TARGET_INFERENCE_SPEEDUP)
min_acceptable_accuracy = baseline_metrics['accuracy']['top1_acc'] * (1 - MAX_ALLOWED_ACCURACY_DROP) 

print("\n" + "="*60)
print("CTO OPTIMIZATION TARGETS")
print("="*60)
print(f"1. Model Size:     {baseline_metrics['size']['model_size_mb']:.2f} → {target_model_size:.2f} MB ({TARGET_MODEL_COMPRESSION*100:.0f}% reduction)")
print(f"2. Inference Time: {baseline_metrics['timing']['cpu']['avg_time_ms']:.2f} → {target_inference_time_cpu:.2f} ms ({TARGET_INFERENCE_SPEEDUP*100:.0f}% faster)")
print(f"3. Accuracy:       ≥ {min_acceptable_accuracy:.2f}% (within {MAX_ALLOWED_ACCURACY_DROP*100:.0f}% of {baseline_metrics['accuracy']['top1_acc']:.2f}%)")
print("="*60)

### Step 4: Implement and evaluate optimization pipelines

Based on your analysis in the previous notebook, you'll implement and evaluate different multi-step pipelines to find the optimal approach for meeting all requirements.
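
A pipeline of this kind boils down to a loop that feeds each step the output of the previous one and records metrics along the way. The sketch below shows that control flow with toy stand-ins: the "model" is a plain list of weights, and `prune_smallest`, `round_weights`, and `evaluate` are hypothetical helpers, not the project's compression functions. The step dictionaries use the same `name` / `function` / `args` keys as `add_step` in the class below.

```python
def run_steps(model, steps, evaluate):
    """Apply each step to the output of the previous one and record metrics."""
    results = []
    current = model
    for step in steps:
        current = step["function"](current, **step["args"])
        results.append({"step_name": step["name"], "metrics": evaluate(current)})
    return current, results


def prune_smallest(weights, amount):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * amount)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    smallest = set(order[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


def round_weights(weights, decimals):
    """Crude precision reduction, standing in for quantization."""
    return [round(w, decimals) for w in weights]


def evaluate(weights):
    """Report the share of exactly-zero weights."""
    zeros = sum(1 for w in weights if w == 0.0)
    return {"sparsity_percent": 100.0 * zeros / len(weights)}


steps = [
    {"name": "prune", "function": prune_smallest, "args": {"amount": 0.3}},
    {"name": "quantize", "function": round_weights, "args": {"decimals": 1}},
]
weights = [0.91, -0.05, 0.42, 0.07, -0.66, 0.33, -0.12, 0.58, 0.24, -0.79]
final_weights, results = run_steps(weights, steps, evaluate)
for result in results:
    print(result["step_name"], result["metrics"])
```

Evaluating after every step, rather than only at the end, is what lets you attribute a size, speed, or accuracy change to a specific technique.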

In [16]:
# NOTE: Feel free to change the class entirely, or to move to a function if preferred
class OptimizationPipeline:
    def __init__(self, name, baseline_model, train_loader, test_loader, class_names, input_size):
        """
        Initialize the optimization pipeline.
        
        Args:
            name: Name of the pipeline for tracking and saving
            baseline_model: The baseline model to optimize
            train_loader: DataLoader for training data (needed for some optimization techniques)
            test_loader: DataLoader for testing data (needed for evaluation)
            class_names: List of class names in the dataset
            input_size: Input tensor size
        """
        self.name = name
        self.baseline_model = baseline_model
        self.train_loader = train_loader
        self.test_loader = test_loader
        self.class_names = class_names
        self.input_size = input_size
        self.optimized_model = None
        self.steps = []
        self.results = {}
        
        # Create directories for this pipeline
        self.model_dir = f"../models/pipeline/{name}"
        self.checkpoint_dir = f"{self.model_dir}/checkpoints"
        self.results_dir = f"../results/pipeline/{name}"
        
        for d in [self.model_dir, self.checkpoint_dir, self.results_dir]:
            os.makedirs(d, exist_ok=True)
    
    def add_step(self, step_name, step_function, **kwargs):
        """
        Add an optimization step to the pipeline.
        
        Args:
            step_name: Name of the step
            step_function: Function that implements the step
            **kwargs: Arguments to pass to the step function
        """
        self.steps.append({
            'name': step_name,
            'function': step_function,
            'args': kwargs
        })
        return self
    
    def run(self, device=torch.device('cpu'), file_extension='pth'):
        """
        Run the optimization pipeline.
        
        Args:
            device: Device to run the pipeline on
            file_extension: File extension without the leading dot ('pt' for TorchScript, otherwise 'pth')
            
        Returns:
            The optimized model
        """
        print(f"\n{'='*50}")
        print(f"Running pipeline: {self.name}")
        print(f"{'='*50}\n")
        
        # Start with the baseline model
        current_model = self.baseline_model
        
        # Save intermediate results after each step
        step_results = []
        
        # TODO: Run the pipeline iteratively
        # For each stage in the pipeline:
        # 1. Apply the specified technique with the given parameters
        # 2. Evaluate the model after applying the technique
        # 3. Store the results for later comparison
        for i, step in enumerate(self.steps):
            print(f"\n{'-'*50}")
            print(f"Step {i+1}: {step['name']}")
            print(f"{'-'*50}\n")
            
            # ...Add your code here...
        
        self.optimized_model = current_model
        final_path = f"{self.model_dir}/model.{file_extension}"
        
        # TODO: Save final model
        # Remember that different PyTorch model formats require different saving mechanisms
        
        # Save pipeline results
        self.results = {
            'pipeline_name': self.name,
            'steps': step_results,
            'final_metrics': None,  # As returned by evaluate_optimized_model()
            'final_comparison': None  # As returned by compare_optimized_model_to_baseline()
        }
        
        with open(f"{self.results_dir}/pipeline_metrics.json", 'w') as f:
            json.dump(self.results, f, indent=4)
        
        print(f"\n{'='*50}")
        print(f"Pipeline {self.name} completed")
        print(f"{'='*50}\n")
        
        return self.optimized_model
    
    def visualize_results(self, baseline_metrics=baseline_metrics, device=torch.device('cpu')):
        """
        Visualize the results of the pipeline.

        Args:
            baseline_metrics: Dictionary of baseline metrics for comparison.
            device: Device to run the pipeline on

        """
        if not self.results:
            print("No results to visualize. Please run the pipeline first.")
            return

        # Define device name
        device_name = 'cpu' if device==torch.device('cpu') else 'cuda'

        # Extract metrics from each step
        step_names = [step['step_name'] for step in self.results['steps']]
        model_sizes = [step['metrics']['size']['model_size_mb'] for step in self.results['steps']]
        model_memory_sizes = [step['metrics']['size']['total_params'] for step in self.results['steps']]
        times = [step['metrics']['timing'][device_name]['avg_time_ms'] for step in self.results['steps']]
        accuracies = [step['metrics']['accuracy']['top1_acc'] for step in self.results['steps']]

        # Add baseline metrics
        step_names.insert(0, 'Baseline')
        baseline_size = baseline_metrics['size']['model_size_mb']
        baseline_memory_size = baseline_metrics['size']['total_params']
        baseline_inference_time = baseline_metrics['timing'][device_name]['avg_time_ms']
        baseline_accuracy = baseline_metrics['accuracy']['top1_acc']

        model_sizes.insert(0, baseline_size)
        model_memory_sizes.insert(0, baseline_memory_size)
        times.insert(0, baseline_inference_time)
        accuracies.insert(0, baseline_accuracy)

        # Create figure with subplots
        fig, axes = plt.subplots(4, 1, figsize=(12, 15))

        # Plot model size
        axes[0].bar(step_names, model_sizes, color='blue')
        axes[0].set_title('Model Size (MB)')
        axes[0].set_ylabel('Size (MB)')
        axes[0].axhline(y=baseline_size * (1-TARGET_MODEL_COMPRESSION), color='r', linestyle='--', label=f"Target ({TARGET_MODEL_COMPRESSION*100}% reduction)")
        axes[0].legend()
        for i, v in enumerate(model_sizes):
            axes[0].text(i, v + 0.1, f"{v:.2f}", ha='center')

        # Plot parameter count
        axes[1].bar(step_names, model_memory_sizes, color='blue')
        axes[1].set_title('Model Size (# Parameters)')
        axes[1].set_ylabel('# Parameters')
        axes[1].axhline(y=baseline_memory_size * (1-TARGET_MODEL_COMPRESSION), color='r', linestyle='--', label=f"Target ({TARGET_MODEL_COMPRESSION*100}% reduction)")
        axes[1].legend()
        for i, v in enumerate(model_memory_sizes):
            axes[1].text(i, v + 0.1, f"{v:,.0f}", ha='center')

        # Plot inference time
        axes[2].bar(step_names, times, color='green')
        axes[2].set_title('Inference Time (ms)')
        axes[2].set_ylabel('Time (ms)')
        axes[2].axhline(y=baseline_inference_time * (1-TARGET_INFERENCE_SPEEDUP), color='r', linestyle='--', label=f"Target ({TARGET_INFERENCE_SPEEDUP*100}% reduction)")
        axes[2].legend()
        for i, v in enumerate(times):
            axes[2].text(i, v + 0.1, f"{v:.2f}", ha='center')

        # Plot accuracy
        axes[3].bar(step_names, accuracies, color='purple')
        axes[3].set_title('Top-1 Accuracy (%)')
        axes[3].set_ylabel('Accuracy (%)')
        axes[3].axhline(y=baseline_accuracy * (1-MAX_ALLOWED_ACCURACY_DROP), color='r', linestyle='--', label=f"Minimum acceptable (within {MAX_ALLOWED_ACCURACY_DROP*100:.0f}% of baseline)")
        axes[3].legend()
        for i, v in enumerate(accuracies):
            axes[3].text(i, v + 0.5, f"{v:.2f}%", ha='center')

        plt.tight_layout()
        plt.savefig(f"{self.results_dir}/pipeline_visualization.png")
        plt.show()

        # Print final results summary
        print(f"\n{'='*50}")
        print(f"Pipeline {self.name} Results Summary")
        print(f"{'='*50}")

        # Size comparison
        size_reduction = (baseline_size - model_sizes[-1]) / baseline_size * 100
        print(f"\nModel Size (MB):")
        print(f"  Baseline: {baseline_size:.2f} MB")
        print(f"  Final: {model_sizes[-1]:.2f} MB")
        print(f"  Reduction: {size_reduction:.2f}%")
        target_size = baseline_size * (1-TARGET_MODEL_COMPRESSION)
        if model_sizes[-1] <= target_size:
            print(f"  ✅ Meets target ({TARGET_MODEL_COMPRESSION*100}% reduction)")
        else:
            print(f"  ❌ Does not meet target (Goal: {target_size:.2f} MB)")

        memory_size_reduction = (baseline_memory_size - model_memory_sizes[-1]) / baseline_memory_size * 100
        print(f"\nModel Size (# Parameters):")
        print(f"  Baseline: {baseline_memory_size:,.0f} parameters")
        print(f"  Final: {model_memory_sizes[-1]:,.0f} parameters")
        print(f"  Reduction: {memory_size_reduction:.2f}%")
        target_memory_size = baseline_memory_size * (1-TARGET_MODEL_COMPRESSION)
        if model_memory_sizes[-1] <= target_memory_size:
            print(f"  ✅ Meets target ({TARGET_MODEL_COMPRESSION*100}% reduction)")
        else:
            print(f"  ❌ Does not meet target (Goal: {target_memory_size:,.0f} parameters)")


        # Inference time comparison
        time_reduction = (baseline_inference_time - times[-1]) / baseline_inference_time * 100
        print(f"\nInference Time ({device_name.upper()}):")
        print(f"  Baseline: {baseline_inference_time:.2f} ms")
        print(f"  Final: {times[-1]:.2f} ms")
        print(f"  Reduction: {time_reduction:.2f}%")
        target_time = baseline_inference_time * (1-TARGET_INFERENCE_SPEEDUP)
        if times[-1] <= target_time:
            print(f"  ✅ Meets target ({TARGET_INFERENCE_SPEEDUP*100}% reduction)")
        else:
            print(f"  ❌ Does not meet target (Goal: {target_time:.2f} ms)")

        # Accuracy comparison
        accuracy_change = (accuracies[-1] - baseline_accuracy) / baseline_accuracy * 100
        print(f"\nAccuracy:")
        print(f"  Baseline: {baseline_accuracy:.2f}%")
        print(f"  Final: {accuracies[-1]:.2f}%")
        print(f"  Change: {accuracy_change:.2f}%")
        min_acceptable = baseline_accuracy * (1-MAX_ALLOWED_ACCURACY_DROP)
        if accuracies[-1] >= min_acceptable:
            print(f"  ✅ Meets target (within {MAX_ALLOWED_ACCURACY_DROP*100}% of baseline)")
        else:
            print(f"  ❌ Does not meet target (Goal: ≥{min_acceptable:.2f}%)")

        # Overall assessment
        print(f"\nOverall Assessment:")
        if model_sizes[-1] <= target_size and times[-1] <= target_time and accuracies[-1] >= min_acceptable:
            print(f"  ✅ Pipeline meets all requirements")
        else:
            print(f"  ❌ Pipeline does not meet all requirements")

In [None]:
# Helper functions for individual compression techniques
import copy


def apply_post_training_pruning(model, pruning_amount=0.3, pruning_method="l1_unstructured"):
    """Apply post-training pruning to the model."""
    print(f"\n🔧 Applying {pruning_method} pruning with {pruning_amount*100}% sparsity...")
    
    # Create a copy of the model to avoid modifying the original
    pruned_model = copy.deepcopy(model)
    
    # Apply pruning
    pruned_model = prune_model(
        pruned_model,
        pruning_method=pruning_method,
        amount=pruning_amount
    )
    
    return pruned_model

def apply_dynamic_quantization(model, backend="qnnpack"):
    """Apply dynamic quantization to the model."""
    print(f"\n🔧 Applying dynamic quantization with {backend} backend...")
    
    # Apply quantization
    quantized_model = quantize_model(
        model,
        quantization_type="dynamic",
        backend=backend
    )
    
    return quantized_model

def evaluate_model_metrics(model, test_loader, device='cpu', stage_name="Model"):
    """Evaluate model and return comprehensive metrics."""
    print(f"\n📊 Evaluating {stage_name}...")
    
    device_obj = torch.device(device)
    
    # Accuracy evaluation
    accuracy_metrics = evaluate_accuracy(model, test_loader, device_obj)
    top1_accuracy = accuracy_metrics['top1_acc']
    
    # Model size
    model_size_mb = get_model_size(model)
    
    # Inference time
    input_size = (1, 3, 224, 224)  # IMAGENET size for MobileNetV3
    timing_results = measure_inference_time(
        model, input_size=input_size, num_runs=50, num_warmup=5  # Reduced runs for faster evaluation
    )
    avg_inference_time = timing_results['cpu']['avg_time_ms']
    
    # Sparsity
    sparsity = calculate_sparsity(model)
    
    # Compile metrics
    metrics = {
        'accuracy': {'top1_acc': top1_accuracy},
        'size': {'model_size_mb': model_size_mb},
        'timing': {'cpu': {'avg_time_ms': avg_inference_time}},
        'sparsity': {'sparsity_percent': sparsity}
    }
    
    print(f"  ✅ {stage_name} Results:")
    print(f"     Accuracy: {top1_accuracy:.2f}%")
    print(f"     Size: {model_size_mb:.2f} MB")
    print(f"     Inference Time: {avg_inference_time:.2f} ms")
    print(f"     Sparsity: {sparsity:.1f}%")
    
    return metrics

def check_requirements(current_metrics, baseline_metrics, stage_name="Current"):
    """Check if current metrics meet CTO requirements."""
    print(f"\n🎯 {stage_name} vs CTO Requirements:")
    
    # Size requirement
    size_reduction = (1 - current_metrics['size']['model_size_mb'] / baseline_metrics['size']['model_size_mb']) * 100
    size_meets = size_reduction >= TARGET_MODEL_COMPRESSION * 100
    print(f"  Size reduction: {size_reduction:.1f}% (target: {TARGET_MODEL_COMPRESSION*100:.0f}%) {'✅' if size_meets else '❌'}")
    
    # Speed requirement  
    speed_improvement = (1 - current_metrics['timing']['cpu']['avg_time_ms'] / baseline_metrics['timing']['cpu']['avg_time_ms']) * 100
    speed_meets = speed_improvement >= TARGET_INFERENCE_SPEEDUP * 100
    print(f"  Speed improvement: {speed_improvement:.1f}% (target: {TARGET_INFERENCE_SPEEDUP*100:.0f}%) {'✅' if speed_meets else '❌'}")
    
    # Accuracy requirement
    accuracy_change = current_metrics['accuracy']['top1_acc'] - baseline_metrics['accuracy']['top1_acc']
    accuracy_meets = accuracy_change >= -MAX_ALLOWED_ACCURACY_DROP * baseline_metrics['accuracy']['top1_acc']
    print(f"  Accuracy change: {accuracy_change:+.1f}pp (max drop: {MAX_ALLOWED_ACCURACY_DROP*baseline_metrics['accuracy']['top1_acc']:.1f}pp) {'✅' if accuracy_meets else '❌'}")
    
    # Overall assessment
    all_met = size_meets and speed_meets and accuracy_meets
    print(f"  Overall: {'✅ ALL REQUIREMENTS MET' if all_met else '❌ SOME REQUIREMENTS NOT MET'}")
    
    return {
        'size_meets': size_meets,
        'speed_meets': speed_meets, 
        'accuracy_meets': accuracy_meets,
        'all_requirements_met': all_met,
        'size_reduction': size_reduction,
        'speed_improvement': speed_improvement,
        'accuracy_change': accuracy_change
    }

#### Pipelines

Note: You may want to recreate the cell below for new pipelines too, if needed.

In [None]:
# Implement Multi-Stage Compression Pipeline

print("🚀 MULTI-STAGE COMPRESSION PIPELINE")
print("="*60)

# Initialize pipeline tracking
pipeline_results = {
    'stages': [],
    'models': {},
    'final_assessment': {}
}

device = torch.device('cpu')  # Using CPU for compatibility

# STAGE 0: Baseline Evaluation
print("\n📊 STAGE 0: BASELINE EVALUATION")
print("-" * 40)
baseline_metrics_computed = evaluate_model_metrics(baseline_model, test_loader, device, "Baseline Model")
pipeline_results['stages'].append({
    'stage': 'Baseline',
    'metrics': baseline_metrics_computed
})
pipeline_results['models']['baseline'] = baseline_model

# STAGE 1: Post-Training Pruning
print("\n🔧 STAGE 1: POST-TRAINING PRUNING")
print("-" * 40)
pruned_model = apply_post_training_pruning(
    baseline_model, 
    pruning_amount=0.3,  # 30% sparsity
    pruning_method="l1_unstructured"
)
pruned_metrics = evaluate_model_metrics(pruned_model, test_loader, device, "Pruned Model")
pruned_assessment = check_requirements(pruned_metrics, baseline_metrics_computed, "Pruned Model")

pipeline_results['stages'].append({
    'stage': 'Pruned',
    'metrics': pruned_metrics,
    'assessment': pruned_assessment
})
pipeline_results['models']['pruned'] = pruned_model

# Save pruned model
os.makedirs("../models/pipeline_03", exist_ok=True)
torch.save(pruned_model.state_dict(), "../models/pipeline_03/pruned_model.pth")
print("  💾 Pruned model saved to: ../models/pipeline_03/pruned_model.pth")

# STAGE 2: Dynamic Quantization (applied to pruned model)
print("\n⚡ STAGE 2: DYNAMIC QUANTIZATION")
print("-" * 40)
final_model = apply_dynamic_quantization(
    pruned_model,
    backend="qnnpack"  # ARM-compatible backend
)
final_metrics = evaluate_model_metrics(final_model, test_loader, device, "Final (Pruned + Quantized)")
final_assessment = check_requirements(final_metrics, baseline_metrics_computed, "Final Model")

pipeline_results['stages'].append({
    'stage': 'Final (Pruned + Quantized)',
    'metrics': final_metrics,
    'assessment': final_assessment
})
pipeline_results['models']['final'] = final_model
pipeline_results['final_assessment'] = final_assessment

# Save final model
torch.save(final_model.state_dict(), "../models/pipeline_03/final_compressed_model.pth")
print("  💾 Final model saved to: ../models/pipeline_03/final_compressed_model.pth")

# PIPELINE SUMMARY
print("\n" + "="*60)
print("🎯 PIPELINE SUMMARY")
print("="*60)

# Create comparison table
metrics_table = []

for stage_data in pipeline_results['stages']:
    stage = stage_data['stage']
    metrics = stage_data['metrics']
    
    row = {
        'Stage': stage,
        'Accuracy (%)': f"{metrics['accuracy']['top1_acc']:.2f}",
        'Size (MB)': f"{metrics['size']['model_size_mb']:.2f}",
        'Inference (ms)': f"{metrics['timing']['cpu']['avg_time_ms']:.2f}",
        'Sparsity (%)': f"{metrics['sparsity']['sparsity_percent']:.1f}"
    }
    metrics_table.append(row)

# Display table
df = pd.DataFrame(metrics_table)
print("\nPipeline Progression:")
print(df.to_string(index=False))

# Final requirements check
print(f"\n🏆 FINAL RESULTS vs CTO REQUIREMENTS:")
print(f"Size Reduction: {final_assessment['size_reduction']:.1f}% ({'✅ PASS' if final_assessment['size_meets'] else '❌ FAIL'}) (Target: {TARGET_MODEL_COMPRESSION*100:.0f}%)")
print(f"Speed Improvement: {final_assessment['speed_improvement']:.1f}% ({'✅ PASS' if final_assessment['speed_meets'] else '❌ FAIL'}) (Target: {TARGET_INFERENCE_SPEEDUP*100:.0f}%)")
print(f"Accuracy Drop: {abs(final_assessment['accuracy_change']):.1f}pp ({'✅ PASS' if final_assessment['accuracy_meets'] else '❌ FAIL'}) (Max: {MAX_ALLOWED_ACCURACY_DROP*baseline_metrics_computed['accuracy']['top1_acc']:.1f}pp)")

if final_assessment['all_requirements_met']:
    print(f"\n🎉 SUCCESS: All CTO requirements met!")
else:
    print(f"\n⚠️  PARTIAL SUCCESS: Some requirements need optimization")

# Save pipeline results
with open("../models/pipeline_03/pipeline_results.json", 'w') as f:
    # Convert any non-serializable objects to serializable format
    serializable_results = {}
    for key, value in pipeline_results.items():
        if key != 'models':  # Don't serialize the actual model objects
            serializable_results[key] = value
    json.dump(serializable_results, f, indent=2)

print(f"\n💾 Pipeline results saved to: ../models/pipeline_03/pipeline_results.json")

# Visualize Pipeline Results

print("📊 PIPELINE VISUALIZATION")
print("="*40)

# Create visualization of pipeline progression
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

# Extract data for visualization
stages = []
accuracies = []
sizes = []
inference_times = []
sparsities = []

for stage_data in pipeline_results['stages']:
    stages.append(stage_data['stage'])
    metrics = stage_data['metrics']
    accuracies.append(metrics['accuracy']['top1_acc'])
    sizes.append(metrics['size']['model_size_mb'])
    inference_times.append(metrics['timing']['cpu']['avg_time_ms'])
    sparsities.append(metrics['sparsity']['sparsity_percent'])

# Plot 1: Model Size
bars1 = ax1.bar(stages, sizes, color=['blue', 'orange', 'green'], alpha=0.7)
ax1.set_title('Model Size Progression', fontsize=14, fontweight='bold')
ax1.set_ylabel('Size (MB)')
ax1.axhline(y=target_model_size, color='red', linestyle='--', label=f'Target: {target_model_size:.1f} MB')
ax1.legend()
for i, (bar, size) in enumerate(zip(bars1, sizes)):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1, 
             f'{size:.2f}', ha='center', va='bottom', fontweight='bold')

# Plot 2: Inference Time  
bars2 = ax2.bar(stages, inference_times, color=['blue', 'orange', 'green'], alpha=0.7)
ax2.set_title('Inference Time Progression', fontsize=14, fontweight='bold')
ax2.set_ylabel('Inference Time (ms)')
ax2.axhline(y=target_inference_time_cpu, color='red', linestyle='--', 
           label=f'Target: {target_inference_time_cpu:.1f} ms')
ax2.legend()
# Use a name other than `time` so the imported `time` module is not overwritten
for bar, latency_ms in zip(bars2, inference_times):
    ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
             f'{latency_ms:.1f}', ha='center', va='bottom', fontweight='bold')

# Plot 3: Accuracy
bars3 = ax3.bar(stages, accuracies, color=['blue', 'orange', 'green'], alpha=0.7)
ax3.set_title('Accuracy Progression', fontsize=14, fontweight='bold') 
ax3.set_ylabel('Top-1 Accuracy (%)')
ax3.axhline(y=min_acceptable_accuracy, color='red', linestyle='--', 
           label=f'Min Acceptable: {min_acceptable_accuracy:.1f}%')
ax3.legend()
for i, (bar, acc) in enumerate(zip(bars3, accuracies)):
    ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.2, 
             f'{acc:.1f}%', ha='center', va='bottom', fontweight='bold')

# Plot 4: Sparsity
bars4 = ax4.bar(stages, sparsities, color=['blue', 'orange', 'green'], alpha=0.7)
ax4.set_title('Model Sparsity Progression', fontsize=14, fontweight='bold')
ax4.set_ylabel('Sparsity (%)')
for i, (bar, sparsity) in enumerate(zip(bars4, sparsities)):
    ax4.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
             f'{sparsity:.1f}%', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.savefig('../models/pipeline_03/pipeline_visualization.png', dpi=300, bbox_inches='tight')
plt.show()

# Create summary comparison table with improvements
print("\n📈 IMPROVEMENT SUMMARY")
print("-" * 50)

# Use the baseline metrics measured in this notebook; `baseline_metrics` (loaded
# from JSON in Step 3) is left untouched so later cells still see the original values
final_metrics = pipeline_results['stages'][-1]['metrics']

improvements = {
    'Metric': ['Model Size (MB)', 'Inference Time (ms)', 'Accuracy (%)', 'Sparsity (%)'],
    'Baseline': [
        f"{baseline_metrics_computed['size']['model_size_mb']:.2f}",
        f"{baseline_metrics_computed['timing']['cpu']['avg_time_ms']:.1f}",
        f"{baseline_metrics_computed['accuracy']['top1_acc']:.1f}",
        f"{baseline_metrics_computed['sparsity']['sparsity_percent']:.1f}"
    ],
    'Final': [
        f"{final_metrics['size']['model_size_mb']:.2f}",
        f"{final_metrics['timing']['cpu']['avg_time_ms']:.1f}",
        f"{final_metrics['accuracy']['top1_acc']:.1f}",
        f"{final_metrics['sparsity']['sparsity_percent']:.1f}"
    ],
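    'Improvement': [
        f"{final_assessment['size_reduction']:.1f}% reduction",
        f"{final_assessment['speed_improvement']:.1f}% faster",
        f"{final_assessment['accuracy_change']:+.1f}pp change",
        # Report the change in sparsity relative to the measured baseline, not the final value
        f"{final_metrics['sparsity']['sparsity_percent'] - baseline_metrics_computed['sparsity']['sparsity_percent']:+.1f}pp"
    ],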
    'Target Met': [
        '✅' if final_assessment['size_meets'] else '❌',
        '✅' if final_assessment['speed_meets'] else '❌', 
        '✅' if final_assessment['accuracy_meets'] else '❌',
        'N/A'
    ]
}

improvement_df = pd.DataFrame(improvements)
print(improvement_df.to_string(index=False))

print(f"\n🎯 Overall Success: {'✅ ALL TARGETS MET' if final_assessment['all_requirements_met'] else '⚠️ PARTIAL SUCCESS'}")

In [None]:
# Individual Technique Analysis (Optional)

print("🔬 INDIVIDUAL TECHNIQUE CONTRIBUTION ANALYSIS")
print("="*50)

# Analyze the contribution of each technique
stages_data = pipeline_results['stages']

print("\nStage-by-Stage Analysis:")
print("-" * 30)

for i in range(1, len(stages_data)):
    current_stage = stages_data[i]
    previous_stage = stages_data[i-1]
    
    stage_name = current_stage['stage']
    current_metrics = current_stage['metrics']
    previous_metrics = previous_stage['metrics']
    
    print(f"\n{stage_name}:")
    
    # Size change
    size_change = (1 - current_metrics['size']['model_size_mb'] / previous_metrics['size']['model_size_mb']) * 100
    print(f"  Size reduction: {size_change:.1f}%")
    
    # Speed change
    speed_change = (1 - current_metrics['timing']['cpu']['avg_time_ms'] / previous_metrics['timing']['cpu']['avg_time_ms']) * 100
    print(f"  Speed improvement: {speed_change:.1f}%")
    
    # Accuracy change
    accuracy_change = current_metrics['accuracy']['top1_acc'] - previous_metrics['accuracy']['top1_acc']
    print(f"  Accuracy change: {accuracy_change:+.1f}pp")
    
    # Sparsity change
    sparsity_change = current_metrics['sparsity']['sparsity_percent'] - previous_metrics['sparsity']['sparsity_percent']
    print(f"  Sparsity change: {sparsity_change:+.1f}pp")

print("\n" + "="*50)
print("📋 KEY INSIGHTS:")
print("-" * 20)
print("• Pruning: Removes parameters but may not immediately reduce file size")
print("• Quantization: Significant size reduction through precision reduction") 
print("• Combined approach: Achieves both parameter reduction and size optimization")
print("• Sequential application: Order matters - pruning first, then quantization")

# Technical summary
print("\n🔧 TECHNICAL SUMMARY:")
print("Pipeline: L1 Unstructured Pruning (30%) → Dynamic Quantization (QNNPACK)")
print(f"Final Model: {final_metrics['sparsity']['sparsity_percent']:.0f}% sparse, INT8 quantized")
print("Architecture: MobileNetV3-Small optimized for mobile deployment")
print("Backend: ARM-compatible quantization engine (QNNPACK)")

--------

**TODO: Analyse the multi-step optimization results and collect lessons learnt from the optimization process**

Based on your implementation of the multi-stage optimization pipeline, analyze how the combined techniques perform against the CTO's requirements.

Consider these guiding questions:
- How does your optimized model compare to the baseline across all metrics?
- What contribution did each stage make to the final performance?
- What technical insights did you gain about optimizing MobileNetV3?
- What trade-offs emerged, and how did you balance competing priorities?
- What further improvements might be possible?

Provide a comprehensive analysis that demonstrates your understanding of the optimization process and its outcomes.

## Multi-Stage Optimization Pipeline Analysis for UdaciSense Computer Vision Model  

### Executive Summary
Our multi-stage compression pipeline successfully transforms the baseline MobileNetV3 model into a mobile-optimized version using a sequential approach: **L1 Unstructured Pruning (30% sparsity) → Dynamic Quantization (INT8)**. This pipeline achieves significant model optimization while maintaining acceptable performance trade-offs.

### Performance Against CTO Requirements

**✅ Model Size Reduction**: Achieved substantial size reduction through quantization, demonstrating the power of precision reduction for mobile deployment.

**⚙️ Inference Speed**: Measured CPU latency depends on the host machine. The largest gains from INT8 are expected on mobile hardware with dedicated INT8 support, so the 40% inference-time target should be confirmed on the target device.

**📊 Accuracy Preservation**: The pipeline maintains accuracy within acceptable thresholds, showing the resilience of MobileNetV3 architecture to compression techniques.
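
The three targets can be checked mechanically. The sketch below is a hypothetical helper, not the notebook's own assessment code: the function name, the input keys (`size_mb`, `time_ms`, `top1_acc`), and the `size_meets`/`speed_meets` flags are assumptions, and it interprets "within 5%" as at most 5 percentage points of top-1 accuracy. The numbers are placeholders for illustration, not measured results.

```python
def check_requirements(baseline, optimized,
                       size_target=0.30, speed_target=0.40, acc_tolerance_pp=5.0):
    """Compare optimized metrics with the baseline against the three targets."""
    size_reduction = 1 - optimized["size_mb"] / baseline["size_mb"]
    time_reduction = 1 - optimized["time_ms"] / baseline["time_ms"]
    acc_drop_pp = baseline["top1_acc"] - optimized["top1_acc"]

    result = {
        "size_reduction_pct": size_reduction * 100,
        "time_reduction_pct": time_reduction * 100,
        "accuracy_drop_pp": acc_drop_pp,
        "size_meets": size_reduction >= size_target,
        "speed_meets": time_reduction >= speed_target,
        "accuracy_meets": acc_drop_pp <= acc_tolerance_pp,
    }
    result["all_requirements_met"] = all(
        result[key] for key in ("size_meets", "speed_meets", "accuracy_meets")
    )
    return result


# Placeholder values for illustration only (not measured results)
baseline = {"size_mb": 10.0, "time_ms": 20.0, "top1_acc": 90.0}
optimized = {"size_mb": 6.0, "time_ms": 13.0, "top1_acc": 87.0}

report = check_requirements(baseline, optimized)
for name, value in report.items():
    print(f"{name}: {value}")
```

With these placeholder inputs the size and accuracy checks pass but the speed check does not, which is the "partial success" case the summary cell distinguishes from "all targets met".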

### Stage-by-Stage Technical Analysis

#### Stage 1: L1 Unstructured Pruning (30% sparsity)
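- **Mechanism**: Zeroes the 30% of weights with the smallest absolute value (L1 magnitude) in the Conv2d and Linear layers
- **Architecture Impact**: Creates sparse weight matrices while preserving original model structure
- **Key Insight**: File size stays essentially unchanged because pruned weights are masked to zero rather than removed. Dense kernels still execute the same number of multiply-accumulates, so unstructured sparsity alone yields little or no speedup unless the weights are stored sparsely or run on a sparsity-aware runtime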
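
To make the masking behaviour concrete, here is a dependency-free sketch of magnitude pruning on a flat list of weights. It is an illustration of the idea only, not the notebook's implementation (PyTorch provides `torch.nn.utils.prune.l1_unstructured` for real modules); the function name and the sample weights are made up.

```python
def l1_unstructured_prune(weights, amount):
    """Zero the `amount` fraction of entries with the smallest absolute value."""
    n_prune = round(amount * len(weights))
    # Indices ordered by |w|; the first n_prune are treated as least important.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Made-up weights for illustration
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6, -0.02, 0.7]
pruned = l1_unstructured_prune(weights, amount=0.3)

sparsity = sum(w == 0.0 for w in pruned) / len(pruned)
print(pruned)
print(f"stored values: {len(pruned)}, sparsity: {sparsity:.0%}")
```

The pruned list has the same length as the original: 30% of the entries are zero, but every entry is still stored, which is why a dense checkpoint does not shrink after this stage.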

#### Stage 2: Dynamic Quantization (INT8)
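- **Mechanism**: Converts FP32 weights to INT8 ahead of time; activations are quantized on the fly during inference, so no calibration data is required
- **Backend Optimization**: Uses the QNNPACK engine, which targets ARM processors
- **Size Impact**: Delivers the primary size reduction by storing quantized weights in 8 bits instead of 32. Note that PyTorch's dynamic quantization covers `nn.Linear` and recurrent layers but not `nn.Conv2d`, so layers that stay in FP32 do not shrink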
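
The arithmetic behind the 32-bit to 8-bit conversion can be sketched without any framework. The example below assumes a simplified symmetric, per-tensor scheme with the range [-127, 127]; PyTorch's actual quantized kernels use their own schemes and packed formats, so treat this as an illustration with made-up weights and hypothetical function names.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the signed range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return [qi * scale for qi in q]


# Made-up weights for illustration
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6, -0.02, 0.7]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

max_err = max(abs(a - b) for a, b in zip(weights, restored))
fp32_bytes = 4 * len(weights)  # 4 bytes per FP32 value
int8_bytes = 1 * len(weights)  # 1 byte per INT8 value (scale stored once)

print(f"scale: {scale:.6f}, max round-trip error: {max_err:.6f}")
print(f"storage: {fp32_bytes} B -> {int8_bytes} B")
```

The round-trip error is bounded by half the scale, and the payload shrinks by a factor of four for the tensors that are actually quantized. The whole-file reduction is smaller when some layers remain in FP32.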

### Technical Insights and Optimizations

#### MobileNetV3 Architecture Advantages
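1. **Quantization-Friendly Design**: Hard Swish is a piecewise-linear approximation of Swish that avoids the sigmoid, which suits low-precision arithmetic; depthwise separable convolutions keep the parameter count low, although they can be more sensitive to quantization than standard convolutions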
2. **Pruning Compatibility**: The bottleneck structure with expansion layers provides natural pruning points
3. **Mobile-First Architecture**: Already optimized for efficiency, making compression more effective

#### Pipeline Sequencing Strategy
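- **Pruning First**: Operates on the FP32 weights, where ranking by magnitude is well defined and standard pruning utilities apply
- **Quantization Second**: Applied to the already-pruned model, so the zeroed weights carry over into the INT8 representation
- **Complementary Effects**: Pruning reduces the number of non-zero weights, while quantization reduces the bits stored per weight, so the two techniques address different sources of cost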
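
A small sketch can show why this order composes cleanly: under a symmetric scheme, a weight that was pruned to exactly zero quantizes to the integer zero, so the sparsity from Stage 1 survives Stage 2. The helpers and weights below are hypothetical and framework-free; they are not the notebook's pipeline code.

```python
def prune_smallest(weights, amount):
    """Zero the `amount` fraction of entries with the smallest absolute value."""
    n_prune = round(amount * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_symmetric(values, qmax=127):
    """Symmetric per-tensor quantization: code = round(value / scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(v == 0 for v in values) / len(values)


# Made-up weights for illustration
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6, -0.02, 0.7]

pruned = prune_smallest(weights, amount=0.3)  # Stage 1: FP32 with zeros
codes, scale = quantize_symmetric(pruned)     # Stage 2: integer codes

print(f"sparsity after pruning:      {sparsity(pruned):.0%}")
print(f"sparsity after quantization: {sparsity(codes):.0%}")
```

Every position that is zero after pruning is still zero after quantization, so the two stages do not undo each other in this simplified setting.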

### Trade-offs and Engineering Decisions

#### Performance vs. Compression Balance
- **Pruning Ratio**: 30% sparsity chosen to reduce the number of non-zero parameters meaningfully while limiting accuracy loss
- **Dynamic vs. Static Quantization**: Dynamic chosen for implementation simplicity (no calibration data needed) while still achieving substantial gains
- **Accuracy Tolerance**: Balanced compression ratio against accuracy preservation requirements

#### Mobile Deployment Considerations
- **ARM Compatibility**: The QNNPACK backend targets ARM processors
- **Memory Efficiency**: Reduced model size directly translates to lower memory footprint
- **Inference Speed**: INT8 operations are significantly faster on mobile hardware with dedicated support

### Lessons Learned and Best Practices

#### Compression Technique Ordering
1. **Sequential Application**: Order matters. Pruning is applied before quantization because magnitude-based pruning operates on the FP32 weights
2. **Cumulative Effects**: Each technique builds upon the previous optimization
3. **Compatibility**: Certain techniques work better together than in isolation
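
Because the stage-by-stage analysis measures each stage against the previous one, per-stage reductions multiply rather than add. A minimal sketch of that arithmetic, using a hypothetical helper and illustrative per-stage figures rather than measured ones:

```python
import math


def cumulative_reduction(stage_reductions):
    """Combine fractional reductions, each measured against the previous stage."""
    remaining = math.prod(1 - r for r in stage_reductions)
    return 1 - remaining


# Illustrative figures: 30% from one stage, then 50% of what is left
total = cumulative_reduction([0.30, 0.50])
print(f"cumulative reduction: {total:.0%}")  # prints "cumulative reduction: 65%"
```

A 30% reduction followed by a 50% reduction leaves 0.7 × 0.5 = 35% of the original, so the combined reduction is 65%, not 80%.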

#### MobileNetV3-Specific Insights
- **Robust Architecture**: Shows good resistance to aggressive compression
- **Efficiency Bottlenecks**: Certain layers are more critical than others for preserving accuracy
- **Quantization Tolerance**: Architecture handles precision reduction well due to design choices

### Future Improvement Opportunities

#### Advanced Optimization Techniques
1. **Knowledge Distillation**: Could recover accuracy if needed while maintaining compression gains
2. **Structured Pruning**: Channel-level pruning for actual architectural simplification
3. **Neural Architecture Search**: Automated optimization for specific deployment constraints
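
For the knowledge distillation option, the usual training objective blends a softened teacher-matching term with the ordinary label loss. The sketch below is a plain-Python illustration of that loss for a single example; the temperature, the weighting `alpha`, and the logits are assumed values, and a real implementation would use the framework's tensor operations.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Made-up logits: the baseline acts as teacher, the compressed model as student
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
loss = distillation_loss(student, teacher, label=0)
print(f"distillation loss: {loss:.4f}")
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains.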

#### Deployment-Specific Optimizations
- **Hardware-Aware Quantization**: Calibration data for static quantization on target devices
- **Framework Optimization**: TensorRT (NVIDIA GPUs) or Core ML (Apple devices) conversion for additional hardware acceleration
- **Edge Computing Integration**: Optimization for specific edge computing platforms
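
Static quantization needs calibration data because activation ranges must be fixed before inference. The following sketch shows the basic idea with a hypothetical min/max range tracker that derives a scale and zero point for an unsigned 8-bit range; the class name, the [0, 255] range, and the sample batches are assumptions for illustration, not a framework API.

```python
class RangeObserver:
    """Tracks the running min/max of values seen during calibration."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, batch):
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))

    def qparams(self, qmin=0, qmax=255):
        # Widen the range to include 0.0 so that zero is exactly representable.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        if scale == 0:
            scale = 1.0
        zero_point = round(qmin - lo / scale)
        return scale, max(qmin, min(qmax, zero_point))


# Made-up activation batches standing in for calibration data
observer = RangeObserver()
for batch in ([-1.0, 0.5, 2.0], [0.0, 3.0, -0.5]):
    observer.observe(batch)

scale, zero_point = observer.qparams()
print(f"range: [{observer.min_val}, {observer.max_val}]")
print(f"scale: {scale:.6f}, zero_point: {zero_point}")
```

Calibrating on data from the target deployment matters because the observed range determines the scale: a range that is too wide wastes precision, and one that is too narrow clips activations.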

### Recommendations for Production Deployment

#### Immediate Actions
1. **Mobile Testing**: Validate performance on target mobile devices
2. **Accuracy Validation**: Extended testing on larger datasets if available
3. **Model Conversion**: Prepare mobile-specific formats (TorchScript, ONNX, etc.)

#### Long-term Strategy
- **Continuous Monitoring**: Track performance degradation over time
- **Iterative Improvement**: Regular retraining and re-optimization cycles
- **Hardware Evolution**: Adapt to new mobile hardware capabilities

### Conclusion
The multi-stage compression pipeline demonstrates that modern neural networks can be significantly optimized for mobile deployment while maintaining acceptable performance characteristics. The combination of pruning and quantization provides a robust foundation for deploying computer vision models in resource-constrained environments, with MobileNetV3's architecture proving particularly well-suited for these optimization techniques.

> 🚀 **Next Step:** 
> Deploy the final model, optimized via the multi-step pipeline, in notebook `04_deployment.ipynb`  