# 🚀 ML Model Optimization Suite

This notebook optimizes and benchmarks four types of ML models:
- **LLMs** (Large Language Models)
- **Stable Diffusion** (Image Generation)
- **Text Encoders** (Embeddings & NLP)
- **Audio Models** (Sound-to-Vector)

Each step runs an optimization script tuned to the detected hardware, then loads the JSON results it writes to `/workspace/results/` and reports the key performance figures.

## 🔍 Step 1: System Hardware Analysis

First, collect your hardware specifications; they guide the choice of optimization strategy in the later steps.
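As a rough illustration of how hardware specifications could drive strategy selection, the sketch below picks a precision from the reported VRAM. The function name `pick_precision` and the 8 GB and 16 GB thresholds are hypothetical, chosen only for this example; the field names follow the `system_specs.json` layout that this notebook reads.

```python
def pick_precision(specs):
    """Pick a precision label from hardware specs (illustrative thresholds only)."""
    gpus = specs.get("gpu", [])
    if not gpus:
        return "int8"  # assumption: CPU-only machines favor the smallest weights
    # Size the choice to the smallest GPU so the model fits on every device
    vram_gb = min(gpu["total_memory_gb"] for gpu in gpus)
    if vram_gb < 8:    # hypothetical threshold
        return "int8"
    if vram_gb < 16:   # hypothetical threshold
        return "fp16"
    return "fp32"


# Made-up specs in the same shape as system_specs.json
example_specs = {
    "cpu": {"cores_logical": 16},
    "memory": {"total_gb": 64},
    "gpu": [{"name": "Example GPU", "total_memory_gb": 12}],
}
print(pick_precision(example_specs))
```

The actual scripts may weigh other factors, such as compute capability or system RAM; this only shows the general shape of a spec-driven decision.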

In [None]:
import sys
sys.path.append('/workspace/scripts')
sys.path.append('/workspace/configs')

import json
import torch
from pathlib import Path

# Run system analysis
print("🔍 Analyzing system hardware...")
!python /workspace/scripts/system_info.py

# Load and display key specs
with open('/workspace/system_specs.json', 'r') as f:
    specs = json.load(f)

print("\n📊 Key Hardware Specifications:")
print(f"CPU Cores: {specs['cpu']['cores_logical']}")
print(f"RAM: {specs['memory']['total_gb']} GB")
print(f"GPU Count: {len(specs['gpu'])}")
if specs['gpu']:
    for i, gpu in enumerate(specs['gpu']):
        print(f"GPU {i}: {gpu['name']} ({gpu['total_memory_gb']} GB VRAM)")

## 🤖 Step 2: LLM Optimization

Optimize Large Language Models with quantization and LoRA techniques.
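To make the two techniques concrete, here is a minimal pure-Python sketch of symmetric INT8 quantization and of the parameter count that LoRA trains. It is not the code inside `optimize_llm.py`; the helper names and the 768 by 768 layer size are assumptions for illustration, and production libraries typically use finer-grained scales (per channel or per block) than the single per-tensor scale shown here.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


weights = [0.5, -1.27, 0.003, 1.0]  # toy weights
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Rounding to the nearest code bounds the error by half a quantization step
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error <= scale / 2)

full = 768 * 768  # illustrative layer size (assumption)
lora = lora_trainable_params(768, 768, rank=8)
print(f"LoRA trains {lora} of {full} weights ({100 * lora / full:.2f}%)")
```

Quantization shrinks the stored weights, while LoRA limits how many parameters a fine-tuning run has to update; the two are complementary.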

In [None]:
print("🤖 Starting LLM Optimization...")
print("Using INT8 quantization and LoRA for efficient fine-tuning")

# Run LLM optimization
!python /workspace/scripts/optimize_llm.py --model microsoft/DialoGPT-small --quantization int8 --use-lora

# Load and display results
results_file = Path('/workspace/results/llm_optimization_results_int8.json')
if results_file.exists():
    with open(results_file, 'r') as f:
        llm_results = json.load(f)
    
    print("\n📊 LLM Optimization Results:")
    if 'benchmark' in llm_results:
        benchmark = llm_results['benchmark']
        print(f"Tokens per second: {benchmark.get('tokens_per_second', 'N/A')}")
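        # Apply numeric formatting only when the value exists; ':.3f' on the 'N/A' string raises ValueError
        avg, peak = benchmark.get('avg_time_per_run'), benchmark.get('peak_memory_gb')
        print(f"Average time per run: {f'{avg:.3f}s' if avg is not None else 'N/A'}")
        print(f"Peak memory usage: {f'{peak:.2f} GB' if peak is not None else 'N/A'}")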
else:
    print("⚠️ LLM optimization results not found")

## 🎨 Step 3: Stable Diffusion Optimization

Optimize Stable Diffusion for faster image generation with memory efficiency.
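The memory benefit of lower precision follows from simple arithmetic: each weight takes 4 bytes in FP32 and 2 bytes in FP16. The sketch below estimates weight storage only. The parameter count is a hypothetical round number, not a measurement of the model used here, and the helper name `weight_memory_gb` is an assumption for the example.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, precision):
    """Memory needed for the weights alone, in GiB (activations are extra)."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


num_params = 1_000_000_000  # hypothetical parameter count, not a measured value
for precision in ("fp32", "fp16"):
    print(f"{precision}: {weight_memory_gb(num_params, precision):.2f} GiB")
```

The peak memory reported by the benchmark will be higher than this estimate, because it also includes activations and intermediate buffers.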

In [None]:
print("🎨 Starting Stable Diffusion Optimization...")
print("Using FP16 precision with memory-efficient attention")

# Run Stable Diffusion optimization
!python /workspace/scripts/optimize_stable_diffusion.py --model runwayml/stable-diffusion-v1-5 --precision fp16

# Load and display results
results_file = Path('/workspace/results/sd_optimization_results_fp16.json')
if results_file.exists():
    with open(results_file, 'r') as f:
        sd_results = json.load(f)
    
    print("\n📊 Stable Diffusion Optimization Results:")
    if 'single_image_benchmark' in sd_results:
        benchmark = sd_results['single_image_benchmark']
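        # Format only when present; ':.2f' on the 'N/A' fallback string would raise ValueError
        avg, peak = benchmark.get('avg_time_per_image'), benchmark.get('peak_memory_gb')
        print(f"Average time per image: {f'{avg:.2f}s' if avg is not None else 'N/A'}")
        print(f"Peak memory usage: {f'{peak:.2f} GB' if peak is not None else 'N/A'}")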
        print(f"Images generated: {benchmark.get('images_generated', 'N/A')}")
    
    if 'batch_benchmark' in sd_results:
        batch = sd_results['batch_benchmark']
        tput = batch.get('throughput_images_per_second')
        print(f"Batch throughput: {f'{tput:.2f} images/sec' if tput is not None else 'N/A'}")
else:
    print("⚠️ Stable Diffusion optimization results not found")

## 📝 Step 4: Text Encoder Optimization

Optimize text encoders for efficient embedding generation and similarity search.
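For readers unfamiliar with similarity search over embeddings, the following is a minimal sketch using cosine similarity on toy vectors. It is an illustration in plain Python, not the implementation in `optimize_text_encoder.py`; the function names and the three-dimensional vectors are assumptions (real sentence embeddings have hundreds of dimensions and would normally be handled with a vectorized library).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, corpus, k=2):
    """Return (index, cosine similarity) for the k most similar corpus vectors."""
    q = normalize(query)
    scores = []
    for i, emb in enumerate(corpus):
        e = normalize(emb)
        scores.append((i, sum(a * b for a, b in zip(q, e))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


corpus = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]  # toy embeddings
results = top_k([1.0, 0.1, 0.0], corpus, k=2)
print(results)
```

Normalizing the corpus once and caching it, rather than on every query as above, is the usual first optimization when the corpus is fixed.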

In [None]:
print("📝 Starting Text Encoder Optimization...")
print("Optimizing sentence transformers for embedding generation")

# Run text encoder optimization
!python /workspace/scripts/optimize_text_encoder.py --model sentence-transformers/all-MiniLM-L6-v2

# Load and display results
results_file = Path('/workspace/results/text_encoder_optimization_results_sentence_transformer.json')
if results_file.exists():
    with open(results_file, 'r') as f:
        text_results = json.load(f)
    
    print("\n📊 Text Encoder Optimization Results:")
    if 'encoding_benchmark' in text_results:
        # Show best performing batch size (skip an empty benchmark, where max() would raise)
        benchmark = text_results['encoding_benchmark']
        if benchmark:
            best_batch = max(benchmark.items(), key=lambda x: x[1]['texts_per_second'])
            batch_name, batch_results = best_batch
            print(f"Best batch size: {batch_name}")
            print(f"Texts per second: {batch_results['texts_per_second']:.2f}")
            print(f"Peak memory: {batch_results['peak_memory_gb']:.2f} GB")
    
    if 'similarity_benchmark' in text_results:
        sim = text_results['similarity_benchmark']
        qps = sim.get('queries_per_second')
        print(f"Similarity search: {f'{qps:.2f} queries/sec' if qps is not None else 'N/A'}")
else:
    print("⚠️ Text encoder optimization results not found")

## 🔊 Step 5: Audio Model Optimization

Optimize audio models for sound-to-vector conversion with preprocessing enhancements.
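As an illustration of what audio preprocessing can involve, the sketch below normalizes a waveform to zero mean and unit variance, then pads or trims it to a fixed length so clips can be batched. Whether `optimize_sound_to_vec.py` performs exactly these steps is an assumption; the function names and the toy waveform are hypothetical.

```python
import math

SAMPLE_RATE = 16_000  # wav2vec2-base-960h expects 16 kHz audio


def normalize_waveform(samples, eps=1e-7):
    """Scale a waveform to zero mean and unit variance."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return [(s - mean) / math.sqrt(var + eps) for s in samples]


def pad_or_trim(samples, target_len):
    """Zero-pad or truncate so every clip in a batch has the same length."""
    return samples[:target_len] + [0.0] * max(0, target_len - len(samples))


clip = [0.1, 0.3, -0.2, 0.4]  # toy waveform, far shorter than real audio
prepared = pad_or_trim(normalize_waveform(clip), target_len=6)
print(len(prepared), prepared)
```

For real clips, `target_len` would be a duration multiplied by `SAMPLE_RATE`, for example `5 * SAMPLE_RATE` for five seconds.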

In [None]:
print("🔊 Starting Audio Model Optimization...")
print("Optimizing Wav2Vec2 for audio embedding generation")

# Run audio model optimization
!python /workspace/scripts/optimize_sound_to_vec.py --model facebook/wav2vec2-base-960h --enable-preprocessing

# Load and display results
results_file = Path('/workspace/results/audio_optimization_results_fp32.json')
if results_file.exists():
    with open(results_file, 'r') as f:
        audio_results = json.load(f)
    
    print("\n📊 Audio Model Optimization Results:")
    if 'encoding_benchmark' in audio_results:
        # Show best performing batch size
        benchmark = audio_results['encoding_benchmark']
        if benchmark:
            best_batch = max(benchmark.items(), key=lambda x: x[1]['audio_clips_per_second'])
            batch_name, batch_results = best_batch
            print(f"Best batch size: {batch_name}")
            print(f"Audio clips per second: {batch_results['audio_clips_per_second']:.2f}")
            print(f"Peak memory: {batch_results['peak_memory_gb']:.2f} GB")
    
    if 'preprocessing_optimization' in audio_results:
        preproc = audio_results['preprocessing_optimization']
        if preproc:  # Guard against an empty dict, where max() would raise ValueError
            best_preproc = max(preproc.items(), key=lambda x: x[1]['samples_per_second'])
            preproc_name, preproc_results = best_preproc
            print(f"Best preprocessing: {preproc_name}")
            print(f"Samples per second: {preproc_results['samples_per_second']:.2f}")
else:
    print("⚠️ Audio optimization results not found")

## 📊 Step 6: Comprehensive Performance Analysis

Generate and display a comprehensive performance report with optimization recommendations.
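How `master_optimizer.py` derives its recommendations is not shown in this notebook. As one plausible shape for such logic, the sketch below flags models whose peak memory approaches the available GPU memory. The function name `recommend`, the 90% headroom threshold, and the example numbers are all hypothetical; only the `memory_usage_gb` key mirrors the report fields this notebook prints.

```python
def recommend(metrics_by_model, gpu_memory_gb, headroom=0.9):
    """Flag models whose peak memory is close to the available GPU memory."""
    recommendations = []
    for model_type, metrics in metrics_by_model.items():
        peak = metrics.get("memory_usage_gb")
        if peak is None:
            recommendations.append(f"{model_type}: no memory data, rerun the benchmark")
        elif peak > headroom * gpu_memory_gb:
            recommendations.append(
                f"{model_type}: peak memory {peak:.1f} GB is near the "
                f"{gpu_memory_gb} GB limit, consider lower precision"
            )
    return recommendations


example_metrics = {  # made-up numbers for illustration
    "llm": {"memory_usage_gb": 2.0},
    "stable_diffusion": {"memory_usage_gb": 11.5},
}
for rec in recommend(example_metrics, gpu_memory_gb=12):
    print(rec)
```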

In [None]:
print("📊 Generating Comprehensive Performance Report...")

# Run master optimizer to generate comprehensive report
!python /workspace/scripts/master_optimizer.py

# Load comprehensive report
report_file = Path('/workspace/results/comprehensive_optimization_report.json')
if report_file.exists():
    with open(report_file, 'r') as f:
        report = json.load(f)
    
    print("\n🎯 COMPREHENSIVE PERFORMANCE REPORT")
    print("="*50)
    
    # System Summary
    if 'system_summary' in report:
        sys_summary = report['system_summary']
        print(f"\n🖥️ System Configuration:")
        print(f"  CPU Cores: {sys_summary.get('cpu_cores', 'Unknown')}")
        print(f"  Memory: {sys_summary.get('memory_gb', 'Unknown')} GB")
        print(f"  GPU Count: {sys_summary.get('gpu_count', 'Unknown')}")
        print(f"  Total GPU Memory: {sys_summary.get('gpu_memory_gb', 'Unknown')} GB")
    
    # Performance Metrics
    if 'performance_metrics' in report:
        print(f"\n⚡ Performance Metrics:")
        for model_type, metrics in report['performance_metrics'].items():
            print(f"  {model_type.replace('_', ' ').title()}:")
            print(f"    Processing Speed: {metrics.get('processing_speed', 'N/A')}")
            print(f"    Memory Usage: {metrics.get('memory_usage_gb', 'N/A')} GB")
            print(f"    Avg Processing Time: {metrics.get('avg_processing_time', 'N/A')}s")
    
    # Recommendations
    if 'recommendations' in report and report['recommendations']:
        print(f"\n💡 Optimization Recommendations:")
        for i, rec in enumerate(report['recommendations'], 1):
            print(f"  {i}. {rec}")
    else:
        print(f"\n✅ Your system is well-optimized for the tested models!")
    
    print(f"\n📁 Detailed results saved to: /workspace/results/")
    print(f"📊 Full report: comprehensive_optimization_report.json")
else:
    print("⚠️ Comprehensive report not found")

## 🔍 Step 7: View Generated Images (Stable Diffusion)

Display images generated during the Stable Diffusion optimization.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image

# Look for generated images (sorted so the display order is deterministic)
results_dir = Path('/workspace/results')
image_files = sorted(results_dir.glob('generated_image_*.png'))

if image_files:
    shown = image_files[:4]  # Display at most four images
    print(f"🎨 Displaying {len(shown)} of {len(image_files)} generated images:")

    # One subplot per displayed image; squeeze=False always returns a 2D array of axes
    fig, axes = plt.subplots(1, len(shown), figsize=(16, 4), squeeze=False)

    for i, (ax, img_file) in enumerate(zip(axes[0], shown)):
        ax.imshow(Image.open(img_file))
        ax.axis('off')
        ax.set_title(f'Image {i+1}')
    
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No generated images found. Run Stable Diffusion optimization first.")

## 📈 Step 8: Performance Visualization

Create visualizations of the optimization results.

In [None]:
import matplotlib.pyplot as plt

# Collect performance data from all optimizations
performance_data = {}
memory_data = {}

# Try to load all result files
result_files = {
    'LLM (INT8)': '/workspace/results/llm_optimization_results_int8.json',
    'Stable Diffusion (FP16)': '/workspace/results/sd_optimization_results_fp16.json',
    'Text Encoder': '/workspace/results/text_encoder_optimization_results_sentence_transformer.json',
    'Audio Model': '/workspace/results/audio_optimization_results_fp32.json'
}

for model_name, file_path in result_files.items():
    if Path(file_path).exists():
        with open(file_path, 'r') as f:
            data = json.load(f)
        
        # Extract performance metrics
        if model_name.startswith('LLM') and 'benchmark' in data:
            performance_data[model_name] = data['benchmark'].get('tokens_per_second', 0)
            memory_data[model_name] = data['benchmark'].get('peak_memory_gb', 0)
        elif model_name.startswith('Stable Diffusion') and 'single_image_benchmark' in data:
            # Convert to images per second; a missing timing yields 0 rather than a made-up rate
            avg_time = data['single_image_benchmark'].get('avg_time_per_image', 0)
            performance_data[model_name] = 1 / avg_time if avg_time > 0 else 0
            memory_data[model_name] = data['single_image_benchmark'].get('peak_memory_gb', 0)
        elif model_name.startswith('Text Encoder') and 'encoding_benchmark' in data:
            # Get best batch performance
            benchmark = data['encoding_benchmark']
            if benchmark:
                # Select the best batch once and read both metrics from it
                best_batch = max(benchmark.values(), key=lambda b: b['texts_per_second'])
                performance_data[model_name] = best_batch['texts_per_second']
                memory_data[model_name] = best_batch.get('peak_memory_gb', 0)
        elif model_name.startswith('Audio') and 'encoding_benchmark' in data:
            # Get best batch performance
            benchmark = data['encoding_benchmark']
            if benchmark:
                # Select the best batch once and read both metrics from it
                best_batch = max(benchmark.values(), key=lambda b: b['audio_clips_per_second'])
                performance_data[model_name] = best_batch['audio_clips_per_second']
                memory_data[model_name] = best_batch.get('peak_memory_gb', 0)

# Create visualizations
if performance_data:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Performance chart
    models = list(performance_data.keys())
    performance = list(performance_data.values())
    
    bars1 = ax1.bar(models, performance, color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4'])
    ax1.set_title('Model Throughput (unit differs per model)', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Tokens, images, texts, or clips per second')
    ax1.tick_params(axis='x', rotation=45)
    
    # Add value labels on bars
    for bar, value in zip(bars1, performance):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(performance)*0.01,
                f'{value:.1f}', ha='center', va='bottom', fontweight='bold')
    
    # Memory usage chart
    memory = [memory_data.get(model, 0) for model in models]
    bars2 = ax2.bar(models, memory, color=['#FFB6C1', '#B6E5D8', '#B6D4F1', '#D4F1C9'])
    ax2.set_title('Peak Memory Usage (GB)', fontsize=14, fontweight='bold')
    ax2.set_ylabel('Memory (GB)')
    ax2.tick_params(axis='x', rotation=45)
    
    # Add value labels on bars
    for bar, value in zip(bars2, memory):
        if value > 0:
            ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(memory)*0.01,
                    f'{value:.1f}', ha='center', va='bottom', fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    print("📊 Performance Summary:")
    for model, perf in performance_data.items():
        mem = memory_data.get(model, 0)
        print(f"  {model}: {perf:.1f} items/sec, {mem:.1f} GB memory")
else:
    print("⚠️ No performance data available. Run optimizations first.")

## 🎯 Summary and Next Steps

This notebook demonstrated comprehensive ML model optimization with hardware-specific tuning. Here's what we accomplished:

### ✅ Completed Optimizations
1. **System Analysis**: Hardware specifications and capabilities
2. **LLM Optimization**: INT8 quantization and LoRA for efficient fine-tuning
3. **Stable Diffusion**: Memory-efficient image generation
4. **Text Encoders**: Batch-size tuning and similarity search benchmarking
5. **Audio Models**: Sound processing and feature extraction

### 📊 Key Metrics Tracked
- **Throughput**: Items processed per second
- **Memory Usage**: Peak GPU memory consumption
- **Latency**: Average processing time per item
- **Hardware Utilization**: GPU and memory efficiency
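
Throughput and latency can both be derived from the same timing loop. The sketch below is a minimal illustration with a stand-in workload; the function names `benchmark` and `fake_model` are hypothetical, and the result keys are chosen for this example rather than taken from the optimization scripts.

```python
import time


def benchmark(fn, items_per_run, runs=5, warmup=1):
    """Time repeated calls to fn and derive latency and throughput."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    avg_time = elapsed / runs
    return {
        "avg_time_per_run": avg_time,
        "items_per_second": items_per_run / avg_time if avg_time > 0 else float("inf"),
    }


def fake_model():
    """Stand-in workload, not a real model."""
    return sum(i * i for i in range(10_000))


stats = benchmark(fake_model, items_per_run=32)
print(f"{stats['avg_time_per_run']:.6f}s per run, {stats['items_per_second']:.0f} items/sec")
```

On a GPU, timing usually also needs `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously, and peak GPU memory can be read with `torch.cuda.max_memory_allocated()`.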

### 🚀 Next Steps
1. **Production Deployment**: Export optimized models for serving
2. **Custom Models**: Adapt scripts for your specific models
3. **Advanced Optimization**: TensorRT, ONNX export, multi-GPU
4. **Monitoring**: Set up continuous performance monitoring

### 📁 Generated Files
- **Results**: `/workspace/results/` - All optimization results
- **Models**: `/workspace/models/` - Optimized model exports
- **Images**: Generated Stable Diffusion samples
- **Reports**: Comprehensive performance analysis

The Docker environment detects the host's hardware specifications and benchmarks each model against them, so the results reflect the machine you ran them on.