# Performance Analysis Deep Dive

Welcome to the **Amorsize Performance Analysis** notebook! 🚀

In this tutorial, you'll learn how to:
- 🔍 **Identify performance bottlenecks** using diagnostic profiling
- 📊 **Visualize overhead breakdown** and understand where time is spent
- 📈 **Monitor execution** in real-time using hooks
- ⚡ **Optimize based on insights** from bottleneck analysis
- 🎯 **Compare different strategies** and measure their impact

This notebook goes beyond basic optimization to help you **understand why** certain configurations work better and **how to diagnose** performance issues in your own workloads.

---

## Prerequisites

Make sure you have completed `01_getting_started.ipynb` first. This notebook assumes familiarity with:
- Basic `optimize()` usage
- Understanding speedup and overhead concepts
- Reading optimization results

---

## Setup

First, let's import everything we need and set up visualization:

In [None]:
# Enable inline plotting
%matplotlib inline

import time
import numpy as np
import matplotlib.pyplot as plt
from typing import List, Dict, Any

# Import Amorsize
from amorsize import (
    optimize,
    execute,
    analyze_bottlenecks,
    format_bottleneck_report,
    HookManager,
    HookEvent,
    create_progress_hook,
    create_timing_hook,
    create_throughput_hook,
)

# Configure matplotlib for better looking plots
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

print("✓ All imports successful!")
print("Ready to analyze performance! 🚀")

---

## Part 1: Understanding Diagnostic Profiling

Diagnostic profiling gives you **transparency** into the optimizer's decision-making process. It answers questions like:
- Why did it choose `n_jobs=4` instead of `8`?
- What's limiting my speedup?
- Is spawn overhead, IPC, or something else the bottleneck?
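To build intuition for why fewer workers can win, here is a deliberately simplified cost model. It is a hypothetical sketch, not Amorsize's actual formula. It assumes that each worker adds a fixed spawn cost, that every item pays a fixed IPC (pickling) cost, and that compute time divides evenly across workers.

```python
def estimated_parallel_time(n_items, item_time, n_jobs, spawn_cost, ipc_cost):
    """Wall-clock estimate under a simplified (hypothetical) cost model.

    - Spawn cost grows linearly with the number of workers.
    - Every item pays a fixed serialization (IPC) cost.
    - Compute time is divided evenly across workers.
    """
    spawn = spawn_cost * n_jobs
    ipc = ipc_cost * n_items
    compute = n_items * item_time / n_jobs
    return spawn + ipc + compute


def best_n_jobs(n_items, item_time, spawn_cost, ipc_cost, max_workers):
    """Return (n_jobs, speedup) that minimizes the estimated wall-clock time."""
    serial_time = n_items * item_time
    best = (1, 1.0)  # serial execution is the baseline
    for n_jobs in range(2, max_workers + 1):
        parallel = estimated_parallel_time(n_items, item_time, n_jobs, spawn_cost, ipc_cost)
        speedup = serial_time / parallel
        if speedup > best[1]:
            best = (n_jobs, speedup)
    return best


n_jobs, speedup = best_n_jobs(200, 0.002, spawn_cost=0.05, ipc_cost=0.00001, max_workers=8)
print(f"Model picks n_jobs={n_jobs} ({speedup:.2f}x)")
```

Consider 200 items at 2 ms each, a 50 ms spawn cost per worker, and 0.01 ms of IPC per item. Under those inputs the model picks 3 workers (about 1.4x). At 8 workers, the spawn cost alone (0.4 s) equals the entire serial runtime.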

### 1.1 Basic Diagnostic Profile

In [None]:
def cpu_bound_task(n):
    """A typical CPU-intensive computation."""
    result = 0
    for i in range(5000):
        result += n ** 2 + i ** 0.5
    return result

# Optimize with diagnostic profiling enabled
data = range(200)
result = optimize(
    cpu_bound_task,
    data,
    sample_size=5,
    profile=True,  # Enable diagnostic profiling
    verbose=False
)

print(f"Recommendation: n_jobs={result.n_jobs}, chunksize={result.chunksize}")
print(f"Expected speedup: {result.estimated_speedup:.2f}x")
print(f"\nReason: {result.reason}")

# Show the detailed diagnostic report
print("\n" + "="*80)
print("DETAILED DIAGNOSTIC PROFILE")
print("="*80)
print(result.explain())

### 1.2 Accessing Profile Data Programmatically

The diagnostic profile contains **all the metrics** used in optimization:
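The speedup metrics printed below can be related to each other through two standard formulas. The first is Amdahl's law, S(n) = 1 / (s + (1 - s) / n), where s is the serial fraction of the work. The second is parallel efficiency, E = S / n. This notebook does not confirm that Amorsize computes `theoretical_max_speedup` and `speedup_efficiency` exactly this way. Treat the sketch as a reference for interpreting the numbers.

```python
def amdahl_speedup(serial_fraction: float, n_workers: int) -> float:
    """Amdahl's law: upper bound on speedup when part of the work is serial."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)


def parallel_efficiency(speedup: float, n_workers: int) -> float:
    """Fraction of ideal linear speedup actually achieved (1.0 == perfect scaling)."""
    return speedup / n_workers


s = amdahl_speedup(0.1, 4)
print(f"Max speedup: {s:.2f}x, efficiency: {parallel_efficiency(s, 4) * 100:.1f}%")
```

With a 10% serial fraction, 4 workers can reach at most about 3.08x, which corresponds to roughly 77% efficiency.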

In [None]:
if result.profile:
    p = result.profile
    
    print("📊 Sampling Results:")
    print(f"  • Execution time per item: {p.avg_execution_time*1000:.3f} ms")
    print(f"  • Result pickle per item:  {p.avg_pickle_time*1000:.3f} ms")
    print(f"  • Sample count:            {p.sample_count}")
    print(f"  • Workload type:           {p.workload_type}")
    
    print("\n⚙️  System Information:")
    print(f"  • Physical cores:          {p.physical_cores}")
    print(f"  • Logical cores:           {p.logical_cores}")
    print(f"  • Spawn cost per worker:   {p.spawn_cost*1000:.1f} ms")
    print(f"  • Available memory:        {p.available_memory / (1024**3):.1f} GB")
    
    print("\n🎯 Optimization Decisions:")
    print(f"  • Max workers (CPU):       {p.max_workers_cpu}")
    print(f"  • Max workers (memory):    {p.max_workers_memory}")
    print(f"  • Chosen workers:          {result.n_jobs}")
    print(f"  • Optimal chunksize:       {p.optimal_chunksize}")
    
    print("\n📈 Performance Metrics:")
    print(f"  • Theoretical max speedup: {p.theoretical_max_speedup:.2f}x")
    print(f"  • Estimated speedup:       {p.estimated_speedup:.2f}x")
    print(f"  • Parallel efficiency:     {p.speedup_efficiency*100:.1f}%")

---

## Part 2: Bottleneck Analysis

**Bottleneck analysis** identifies what's preventing better performance. Let's explore different types of bottlenecks:
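Conceptually, bottleneck analysis compares overhead sources against each other and against useful compute. The sketch below is a hypothetical illustration. The function name, the component names, and the severity definition are assumptions, not Amorsize's implementation. It converts absolute overhead times into percentages and reports the largest one as the primary bottleneck. Severity is defined as overhead's share of total wall-clock time.

```python
def summarize_overhead(compute_s, spawn_s, ipc_s, chunking_s):
    """Return (primary_bottleneck, severity, breakdown_percent).

    - breakdown_percent: each overhead source as a percentage of total overhead
    - severity: total overhead as a fraction of total time (compute + overhead)
    """
    overheads = {"spawn": spawn_s, "ipc": ipc_s, "chunking": chunking_s}
    total_overhead = sum(overheads.values())
    if total_overhead == 0:
        return "none", 0.0, {name: 0.0 for name in overheads}
    breakdown = {name: 100.0 * t / total_overhead for name, t in overheads.items()}
    primary = max(overheads, key=overheads.get)  # largest overhead source
    severity = total_overhead / (compute_s + total_overhead)
    return primary, severity, breakdown


primary, severity, breakdown = summarize_overhead(0.01, 0.2, 0.02, 0.01)
print(f"Primary: {primary}, severity: {severity * 100:.1f}%")
```

For example, `summarize_overhead(0.01, 0.2, 0.02, 0.01)` reports spawn as the primary bottleneck. Its severity is about 96%, because almost all of the wall-clock time is overhead rather than compute.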

### 2.1 Spawn Overhead Bottleneck

In [None]:
def fast_task(x):
    """Very fast task - spawn overhead will dominate."""
    return x * 2

data = range(1000)
result = optimize(fast_task, data, sample_size=10, profile=True, verbose=False)

print(f"Result: n_jobs={result.n_jobs}, speedup={result.estimated_speedup:.2f}x")
print("\nBottleneck Analysis:")

# The bottleneck analysis is attached to the diagnostic profile
ba = result.profile.bottleneck_analysis if result.profile else None
if ba:
    print(f"  • Primary bottleneck: {ba.primary_bottleneck.value}")
    print(f"  • Severity: {ba.bottleneck_severity*100:.1f}%")
    print(f"  • Efficiency score: {ba.efficiency_score*100:.1f}%")
    
    if ba.recommendations:
        print("\n💡 Recommendations:")
        for rec in ba.recommendations:
            print(f"  {rec}")

### 2.2 IPC/Serialization Overhead
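IPC cost comes from two sources. Task arguments are pickled in the parent process, and results are pickled in the workers, with unpickling on each receiving side. Either way, the cost grows with payload size. The generic sketch below measures this with Python's standard `pickle` module. It is independent of how Amorsize samples `avg_pickle_time` internally.

```python
import pickle
import time


def measure_pickle(obj, repeats=1000):
    """Return (size_in_bytes, average_seconds_per_dumps) for one object."""
    payload = pickle.dumps(obj)
    start = time.perf_counter()
    for _ in range(repeats):
        pickle.dumps(obj)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed / repeats


small_size, small_time = measure_pickle(42)
large_size, large_time = measure_pickle(list(range(1000)))

print(f"int:       {small_size:6d} bytes, {small_time * 1e6:.2f} µs per dumps")
print(f"1000 ints: {large_size:6d} bytes, {large_time * 1e6:.2f} µs per dumps")
```

The exact timings depend on your machine. However, a 1000-element list pickles to hundreds of times more bytes than a single int, and that difference is what drives per-item IPC overhead.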

In [None]:
def data_heavy_task(data_item):
    """Task whose 1000-element return value is expensive to serialize."""
    # Returning a large list (instead of a single number) makes result pickling significant
    return [x + data_item for x in range(1000)]

data = range(100)
result = optimize(data_heavy_task, data, sample_size=5, profile=True, verbose=False)

print(f"Result: n_jobs={result.n_jobs}, speedup={result.estimated_speedup:.2f}x")

if result.profile:
    p = result.profile
    print(f"\n📦 Data Serialization:")
    print(f"  • Data pickle time:   {p.avg_data_pickle_time*1000:.3f} ms")
    print(f"  • Result pickle time: {p.avg_pickle_time*1000:.3f} ms")
    print(f"  • Data size:          {p.data_size_bytes} bytes")
    print(f"  • Return size:        {p.return_size_bytes} bytes")

### 2.3 Visualizing Overhead Breakdown

In [None]:
def create_overhead_breakdown_chart(profile):
    """Create a pie chart showing overhead breakdown."""
    if not profile:
        print("No profile data available")
        return
    
    breakdown = profile.get_overhead_breakdown()
    
    # Prepare data
    labels = ['Spawn', 'IPC/Serialization', 'Chunking']
    sizes = [breakdown['spawn'], breakdown['ipc'], breakdown['chunking']]
    colors = ['#ff9999', '#66b3ff', '#99ff99']
    explode = (0.1, 0, 0)  # Explode spawn overhead slice
    
    # Create pie chart
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.pie(sizes, explode=explode, labels=labels, colors=colors,
           autopct='%1.1f%%', shadow=True, startangle=90)
    ax.axis('equal')
    ax.set_title('Parallelization Overhead Breakdown', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    print("\n📊 Overhead Breakdown:")
    for label, size in zip(labels, sizes):
        print(f"  • {label:20s}: {size:5.1f}%")

# Visualize for our CPU-bound example
data = range(200)
result = optimize(cpu_bound_task, data, sample_size=5, profile=True, verbose=False)
create_overhead_breakdown_chart(result.profile)

---

## Part 3: Real-Time Monitoring with Hooks

Hooks let you **monitor execution** in real-time, collect metrics, and integrate with external monitoring systems.
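At its core, a hook system is an event registry. Callbacks are stored per event and invoked with a context object when that event fires. The sketch below is a hypothetical, minimal version of that pattern. `MiniHookManager`, `Event`, `Context`, and `run_with_hooks` are illustrative names, not Amorsize classes. It shows how callbacks registered for `PRE_EXECUTE` and `POST_EXECUTE` receive a shared context. One design choice is also an assumption: a failing hook is logged and ignored instead of aborting the run.

```python
import time
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class Event(Enum):
    PRE_EXECUTE = auto()
    POST_EXECUTE = auto()


@dataclass
class Context:
    total_items: int
    timestamp: float = field(default_factory=time.time)
    elapsed_time: float = 0.0


class MiniHookManager:
    """Stores callbacks per event and calls them in registration order."""

    def __init__(self) -> None:
        self._hooks: Dict[Event, List[Callable[[Context], None]]] = {e: [] for e in Event}

    def register(self, event: Event, callback: Callable[[Context], None]) -> None:
        self._hooks[event].append(callback)

    def trigger(self, event: Event, ctx: Context) -> None:
        for callback in self._hooks[event]:
            try:
                callback(ctx)
            except Exception as exc:  # assumption: a failing hook must not abort the run
                print(f"hook error ignored: {exc!r}")


def run_with_hooks(func, items, hooks: MiniHookManager):
    """Serial stand-in for an executor that fires hooks around the work."""
    items = list(items)
    ctx = Context(total_items=len(items))
    hooks.trigger(Event.PRE_EXECUTE, ctx)
    start = time.perf_counter()
    results = [func(x) for x in items]
    ctx.elapsed_time = time.perf_counter() - start
    ctx.timestamp = time.time()
    hooks.trigger(Event.POST_EXECUTE, ctx)
    return results


events = []
demo_hooks = MiniHookManager()
demo_hooks.register(Event.PRE_EXECUTE, lambda ctx: events.append(("start", ctx.total_items)))
demo_hooks.register(Event.POST_EXECUTE, lambda ctx: events.append(("done", ctx.elapsed_time)))
squares = run_with_hooks(lambda x: x * x, range(5), demo_hooks)
print(squares, events)
```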

### 3.1 Basic Progress Monitoring

In [None]:
def process_item(x):
    """Simple processing function."""
    time.sleep(0.01)  # Simulate work
    return x ** 2

# Create hook manager
hooks = HookManager()

# Track progress
progress_data = []

def track_progress(percent, completed, total):
    progress_data.append((time.time(), percent, completed))
    if completed % 20 == 0 or completed == total:
        print(f"Progress: {percent:5.1f}% ({completed:3d}/{total:3d} items)")

# Register progress hook
progress_hook = create_progress_hook(track_progress, min_interval=0.0)
hooks.register(HookEvent.POST_EXECUTE, progress_hook)

# Execute with monitoring
data = range(100)
print("Starting execution with real-time monitoring...\n")
results = execute(process_item, data, hooks=hooks, verbose=False)
print(f"\n✓ Completed! Processed {len(results)} items")

### 3.2 Collecting Performance Metrics

In [None]:
# Create metrics collector
metrics = {
    'start_time': None,
    'end_time': None,
    'n_jobs': 0,
    'chunksize': 0,
    'total_items': 0,
    'elapsed_time': 0,
    'throughput': 0
}

hooks = HookManager()

def collect_start_metrics(ctx):
    metrics['start_time'] = ctx.timestamp
    metrics['n_jobs'] = ctx.n_jobs
    metrics['chunksize'] = ctx.chunksize
    metrics['total_items'] = ctx.total_items
    print(f"Started: n_jobs={ctx.n_jobs}, chunksize={ctx.chunksize}, items={ctx.total_items}")

def collect_end_metrics(ctx):
    metrics['end_time'] = ctx.timestamp
    metrics['elapsed_time'] = ctx.elapsed_time
    metrics['throughput'] = ctx.throughput_items_per_sec
    print(f"Completed in {ctx.elapsed_time:.2f}s")
    print(f"Throughput: {ctx.throughput_items_per_sec:.1f} items/sec")

hooks.register(HookEvent.PRE_EXECUTE, collect_start_metrics)
hooks.register(HookEvent.POST_EXECUTE, collect_end_metrics)

# Execute with metrics collection
data = range(150)
results = execute(cpu_bound_task, data, hooks=hooks, verbose=False)

print("\n📊 Collected Metrics:")
for key, value in metrics.items():
    if key in ('start_time', 'end_time'):
        continue  # Skip raw Unix timestamps
    print(f"  • {key:15s}: {value}")

### 3.3 Throughput Visualization

In [None]:
# Compare throughput for different worker counts
throughput_results = []

for n_jobs in [1, 2, 4, 8]:
    hooks = HookManager()
    throughput_value = [0]
    
    def capture_throughput(rate):
        throughput_value[0] = rate
    
    hooks.register(HookEvent.POST_EXECUTE, create_throughput_hook(capture_throughput))
    
    data = range(200)
    results = execute(cpu_bound_task, data, n_jobs=n_jobs, hooks=hooks, verbose=False)
    
    throughput_results.append((n_jobs, throughput_value[0]))
    print(f"n_jobs={n_jobs}: {throughput_value[0]:.1f} items/sec")

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
n_jobs_list = [r[0] for r in throughput_results]
throughput_list = [r[1] for r in throughput_results]

ax.bar(n_jobs_list, throughput_list, color='skyblue', edgecolor='navy', alpha=0.7)
ax.set_xlabel('Number of Workers (n_jobs)', fontsize=12)
ax.set_ylabel('Throughput (items/sec)', fontsize=12)
ax.set_title('Throughput vs Worker Count', fontsize=14, fontweight='bold')
ax.set_xticks(n_jobs_list)  # One tick per tested worker count
ax.grid(axis='y', alpha=0.3)

# Label each bar with its throughput
for n, t in throughput_results:
    ax.text(n, t + max(throughput_list)*0.02, f'{t:.1f}',
            ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.show()

---

## Part 4: Comparative Performance Analysis

Let's compare different scenarios to understand what makes parallelization effective:
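Before running the experiments, it helps to estimate where parallelism starts to pay off. Take a hypothetical simplified model with a fixed spawn cost per worker, a fixed IPC cost per item, and compute split evenly across workers. Under that model, parallel execution beats serial when N·t > n·c_spawn + N·c_ipc + N·t/n. Solving for the per-item time t gives the break-even point t* = (n·c_spawn + N·c_ipc) / (N·(1 - 1/n)).

```python
def break_even_item_time(n_items, n_jobs, spawn_cost, ipc_cost):
    """Minimum per-item compute time (seconds) for n_jobs workers to beat serial.

    Based on the simplified model: serial = N*t, parallel = n*spawn + N*ipc + N*t/n.
    """
    if n_jobs < 2:
        raise ValueError("break-even is only defined for n_jobs >= 2")
    return (n_jobs * spawn_cost + n_items * ipc_cost) / (n_items * (1 - 1 / n_jobs))


for n_items in (100, 1000, 10000):
    t = break_even_item_time(n_items, n_jobs=4, spawn_cost=0.05, ipc_cost=0.00001)
    print(f"{n_items:6d} items -> tasks must take > {t * 1000:.3f} ms each")
```

Under these assumed costs, the threshold falls from about 2.7 ms per item at 100 items to about 0.04 ms at 10,000 items. Sections 4.1 and 4.2 explore both factors, task duration and workload size, empirically.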

### 4.1 Impact of Task Duration

In [None]:
def variable_duration_task(n, duration_ms):
    """Busy-wait CPU task that runs for roughly `duration_ms` milliseconds."""
    deadline = time.perf_counter() + duration_ms / 1000
    result = 0
    while time.perf_counter() < deadline:
        result += n ** 2
    return result

from functools import partial

# Test different task durations
durations = [0.1, 0.5, 1.0, 5.0, 10.0]  # milliseconds
speedup_results = []

print("Testing different task durations...\n")
for duration in durations:
    data = range(100)
    # The standard pickle module cannot serialize a lambda; a partial of a named function can
    task = partial(variable_duration_task, duration_ms=duration)
    result = optimize(
        task,
        data,
        sample_size=5,
        profile=True,
        verbose=False
    )
    speedup_results.append((duration, result.estimated_speedup, result.n_jobs))
    print(f"Duration: {duration:5.1f}ms → Speedup: {result.estimated_speedup:5.2f}x (n_jobs={result.n_jobs})")

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Speedup vs Duration
ax1.plot([r[0] for r in speedup_results], [r[1] for r in speedup_results], 
         'o-', linewidth=2, markersize=8, color='green')
ax1.set_xlabel('Task Duration (ms)', fontsize=12)
ax1.set_ylabel('Speedup', fontsize=12)
ax1.set_title('Speedup vs Task Duration', fontsize=14, fontweight='bold')
ax1.grid(alpha=0.3)

# Optimal n_jobs vs Duration
ax2.plot([r[0] for r in speedup_results], [r[2] for r in speedup_results], 
         's-', linewidth=2, markersize=8, color='blue')
ax2.set_xlabel('Task Duration (ms)', fontsize=12)
ax2.set_ylabel('Optimal n_jobs', fontsize=12)
ax2.set_title('Optimal Workers vs Task Duration', fontsize=14, fontweight='bold')
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Insight: Longer tasks benefit more from parallelization!")
print("   Overhead becomes less significant relative to computation time.")

### 4.2 Workload Size Impact
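Workload size also shapes the chunksize recommendation printed below. Here is a hypothetical illustration. The heuristic, its 0.2 s target, and the four-chunks-per-worker floor are assumptions, not Amorsize's rules. It picks a chunksize so that each chunk runs long enough to amortize per-chunk IPC, while still leaving enough chunks for load balancing.

```python
def suggest_chunksize(n_items, item_time, n_jobs, target_chunk_s=0.2, min_chunks_per_worker=4):
    """Hypothetical heuristic: long chunks amortize IPC, many chunks balance load."""
    # Items needed for one chunk to last about target_chunk_s seconds
    if item_time > 0:
        by_duration = max(1, round(target_chunk_s / item_time))
    else:
        by_duration = n_items
    # Cap so every worker gets at least min_chunks_per_worker chunks
    by_balance = max(1, n_items // (n_jobs * min_chunks_per_worker))
    return max(1, min(by_duration, by_balance))


for n_items in (1000, 100000):
    print(n_items, suggest_chunksize(n_items, item_time=0.001, n_jobs=4))
```

For 1,000 items of 1 ms on 4 workers, this yields 62 because the load-balancing cap wins. For 100,000 items it yields 200 because the duration target wins.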

In [None]:
# Test different workload sizes
workload_sizes = [50, 100, 200, 500, 1000]
size_results = []

print("Testing different workload sizes...\n")
for size in workload_sizes:
    data = range(size)
    result = optimize(
        cpu_bound_task,
        data,
        sample_size=5,
        profile=True,
        verbose=False
    )
    size_results.append((size, result.estimated_speedup, result.chunksize))
    print(f"Size: {size:4d} → Speedup: {result.estimated_speedup:5.2f}x (chunksize={result.chunksize})")

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot([r[0] for r in size_results], [r[1] for r in size_results], 
        'o-', linewidth=2, markersize=8, color='purple', label='Speedup')
ax.set_xlabel('Workload Size (items)', fontsize=12)
ax.set_ylabel('Speedup', fontsize=12)
ax.set_title('Speedup vs Workload Size', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)
ax.legend()

plt.tight_layout()
plt.show()

print("\n💡 Insight: Larger workloads allow for better parallelization!")
print("   More items = better amortization of spawn and setup costs.")

---

## Part 5 (Advanced): Custom Monitoring Dashboard

Let's build a **monitoring dashboard** that records configuration and performance metrics at the start and end of each execution:

In [None]:
class PerformanceDashboard:
    """Real-time performance monitoring dashboard."""
    
    def __init__(self):
        self.metrics = {
            'execution_id': int(time.time()),
            'status': 'initialized',
            'n_jobs': 0,
            'chunksize': 0,
            'total_items': 0,
            'start_time': 0,
            'end_time': 0,
            'duration': 0,
            'throughput': 0,
            'items_completed': 0
        }
        self.hooks = HookManager()
        self._setup_hooks()
    
    def _setup_hooks(self):
        """Configure monitoring hooks."""
        self.hooks.register(HookEvent.PRE_EXECUTE, self._on_start)
        self.hooks.register(HookEvent.POST_EXECUTE, self._on_complete)
    
    def _on_start(self, ctx):
        """Called when execution starts."""
        self.metrics['status'] = 'running'
        self.metrics['n_jobs'] = ctx.n_jobs
        self.metrics['chunksize'] = ctx.chunksize
        self.metrics['total_items'] = ctx.total_items
        self.metrics['start_time'] = ctx.timestamp
    
    def _on_complete(self, ctx):
        """Called when execution completes."""
        self.metrics['status'] = 'completed'
        self.metrics['end_time'] = ctx.timestamp
        self.metrics['duration'] = ctx.elapsed_time
        self.metrics['throughput'] = ctx.throughput_items_per_sec
        self.metrics['items_completed'] = ctx.items_completed
    
    def get_hooks(self):
        """Get the configured hooks."""
        return self.hooks
    
    def display(self):
        """Display the dashboard."""
        print("=" * 80)
        print(f"Performance Dashboard - Execution #{self.metrics['execution_id']}")
        print("=" * 80)
        print(f"Status:           {self.metrics['status'].upper()}")
        print(f"Configuration:    n_jobs={self.metrics['n_jobs']}, chunksize={self.metrics['chunksize']}")
        print(f"Workload:         {self.metrics['total_items']} items")
        if self.metrics['status'] == 'completed':
            print(f"Duration:         {self.metrics['duration']:.2f}s")
            print(f"Throughput:       {self.metrics['throughput']:.1f} items/sec")
            print(f"Items Completed:  {self.metrics['items_completed']}")
        print("=" * 80)

# Create and use dashboard
dashboard = PerformanceDashboard()

print("Starting monitored execution...\n")
data = range(150)
results = execute(cpu_bound_task, data, hooks=dashboard.get_hooks(), verbose=False)

print("\n")
dashboard.display()

---

## Part 6: Practical Optimization Workflow

Let's put everything together in a **real-world optimization workflow**:

### 6.1 Complete Analysis Pipeline

In [None]:
def analyze_and_optimize(func, data, sample_size=5):
    """Complete analysis and optimization workflow."""
    
    print("="*80)
    print("STEP 1: Optimization with Diagnostic Profiling")
    print("="*80)
    
    # Optimize with profiling
    result = optimize(func, data, sample_size=sample_size, profile=True, verbose=False)
    
    print(f"\nRecommendation: n_jobs={result.n_jobs}, chunksize={result.chunksize}")
    print(f"Expected speedup: {result.estimated_speedup:.2f}x")
    print(f"Reason: {result.reason}")
    
    if result.warnings:
        print("\n⚠️  Warnings:")
        for warning in result.warnings:
            print(f"  • {warning}")
    
    # Bottleneck analysis
    if result.profile and result.profile.bottleneck_analysis:
        print("\n" + "="*80)
        print("STEP 2: Bottleneck Analysis")
        print("="*80)
        
        ba = result.profile.bottleneck_analysis
        print(f"\nPrimary bottleneck: {ba.primary_bottleneck.value}")
        print(f"Severity: {ba.bottleneck_severity*100:.1f}%")
        print(f"Efficiency: {ba.efficiency_score*100:.1f}%")
        
        if ba.recommendations:
            print("\n💡 Recommendations:")
            for rec in ba.recommendations:
                for line in rec.split('\n'):
                    print(f"  {line}")
    
    # Overhead breakdown visualization
    if result.profile:
        print("\n" + "="*80)
        print("STEP 3: Overhead Breakdown")
        print("="*80 + "\n")
        create_overhead_breakdown_chart(result.profile)
    
    # Monitored execution
    print("\n" + "="*80)
    print("STEP 4: Monitored Execution")
    print("="*80 + "\n")
    
    dashboard = PerformanceDashboard()
    results = execute(func, data, hooks=dashboard.get_hooks(), verbose=False)
    
    print("\n")
    dashboard.display()
    
    return result, results

# Run complete analysis
data = range(200)
result, results = analyze_and_optimize(cpu_bound_task, data, sample_size=5)

---

## Key Takeaways 🎯

1. **Diagnostic Profiling** gives you complete transparency into optimization decisions
2. **Bottleneck Analysis** identifies what's limiting your performance
3. **Overhead Breakdown** shows where parallelization time is spent
4. **Hooks** enable real-time monitoring and integration with external systems
5. **Task Duration** is critical - longer tasks benefit more from parallelization
6. **Workload Size** matters - larger workloads amortize fixed spawn and setup costs better

### When to Use These Tools:

- **Diagnostic Profiling**: Always use in development to understand optimization
- **Bottleneck Analysis**: When speedup is lower than expected
- **Overhead Visualization**: To communicate performance characteristics
- **Hooks**: For production monitoring and integration

---

## Next Steps 📚

Now that you understand performance analysis, explore:

1. **Use Case Guides** for domain-specific patterns:
   - [Web Services](../../docs/USE_CASE_WEB_SERVICES.md)
   - [Data Processing](../../docs/USE_CASE_DATA_PROCESSING.md)
   - [ML Pipelines](../../docs/USE_CASE_ML_PIPELINES.md)

2. **Advanced Topics**:
   - [Performance Optimization](../../docs/PERFORMANCE_OPTIMIZATION.md)
   - [Best Practices](../../docs/BEST_PRACTICES.md)
   - [Troubleshooting](../../docs/TROUBLESHOOTING.md)

3. **More Examples**:
   - Check out `examples/` directory for specific scenarios
   - Try adapting these patterns to your own workloads

---

## Happy Optimizing! 🚀

Remember: **Measure, don't guess!** Use these tools to understand your workload and make informed optimization decisions.