# Getting Started with Amorsize

**Interactive Tutorial: Learn multiprocessing optimization in 10 minutes!**

This notebook demonstrates how Amorsize automatically finds optimal `n_jobs` and `chunksize` parameters for Python multiprocessing, preventing "negative scaling" where parallelism makes code slower.

## What You'll Learn
1. Why blindly using `n_jobs=-1` can hurt performance
2. How Amorsize analyzes and optimizes your workloads
3. Hands-on examples with real performance comparisons
4. Interactive parameter tuning playground

## Prerequisites
```bash
pip install git+https://github.com/CampbellTrevor/Amorsize.git
pip install matplotlib  # For visualizations
```

---
## Part 1: The Problem with Blind Parallelization

Let's see what happens when we blindly parallelize without optimization.

In [None]:
import time
from multiprocessing import Pool
import os

def cpu_intensive_function(x):
    """A CPU-bound function; roughly 1 ms per item, depending on the machine."""
    result = 0
    for i in range(10000):
        result += x ** 2
    return result

# Test data
data = list(range(100))

# Serial execution (baseline)
start = time.time()
serial_results = [cpu_intensive_function(x) for x in data]
serial_time = time.time() - start

print(f"Serial execution time: {serial_time:.3f}s")
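
A single wall-clock measurement is sensitive to whatever else the machine is doing. As a rough sketch, the helper below repeats a call and keeps the fastest run, using `time.perf_counter()` from the standard library. `best_of` is a hypothetical name introduced here for illustration; it is not part of Amorsize.

```python
import time

def best_of(fn, repeats=5):
    """Call fn() `repeats` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result

elapsed, total = best_of(lambda: sum(range(100_000)))
print(f"Best of 5: {elapsed * 1000:.3f} ms")
```

In the notebook, `best_of(lambda: [cpu_intensive_function(x) for x in data])` would give a steadier serial baseline than one timed run.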

In [None]:
# Blind parallelization (might be slower!)
start = time.time()
with Pool(processes=os.cpu_count()) as pool:
    parallel_results = pool.map(cpu_intensive_function, data, chunksize=1)
blind_parallel_time = time.time() - start

print(f"Blind parallel time: {blind_parallel_time:.3f}s")
print(f"Speedup: {serial_time / blind_parallel_time:.2f}x")

if blind_parallel_time > serial_time:
    print("⚠️ NEGATIVE SCALING! Parallelism made it SLOWER!")
else:
    print("✅ Got some speedup, but is it optimal?")

**Why does this happen?**

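- Process startup overhead (highest with the `spawn` start method, the default on Windows and macOS)
- Data serialization (pickle) overhead for arguments and results
- Inter-process communication overhead
- A small chunksize means more tasks to dispatch, so more overhead per item
- Too many workers compete for cores and thrash the CPU cache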
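
Two of these costs are easy to estimate with the standard library alone. The sketch below times a pickle round trip for one object and counts how many tasks `Pool.map` dispatches for a given chunksize. Both helper names are hypothetical and only illustrate the overheads listed above.

```python
import math
import pickle
import time

def pickle_roundtrip_seconds(obj, repeats=1000):
    """Average time to serialize and deserialize `obj` once."""
    start = time.perf_counter()
    for _ in range(repeats):
        pickle.loads(pickle.dumps(obj))
    return (time.perf_counter() - start) / repeats

def task_count(n_items, chunksize):
    """Number of chunks Pool.map submits for n_items at this chunksize."""
    return math.ceil(n_items / chunksize)

print(f"Round trip for one int: {pickle_roundtrip_seconds(42) * 1e6:.2f} µs")
print(task_count(100, 1))   # 100 tasks, each paying dispatch overhead
print(task_count(100, 25))  # 4 tasks
```

With `chunksize=1`, the 100-item example pays the per-task cost 100 times, which is why small chunks hurt fast functions the most.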

---
## Part 2: The Amorsize Solution

Now let's use Amorsize to automatically find optimal parameters.

In [None]:
from amorsize import optimize

# Analyze and get optimal parameters
result = optimize(
    func=cpu_intensive_function,
    data=data,
    verbose=True,
    sample_size=10  # Quick analysis with 10 samples
)

print(f"\n📊 Optimization Results:")
print(f"   Recommended n_jobs: {result.n_jobs}")
print(f"   Recommended chunksize: {result.chunksize}")
print(f"   Estimated speedup: {result.estimated_speedup:.2f}x")
print(f"   Parallel beneficial: {result.n_jobs > 1}")

### What did Amorsize analyze?

Amorsize performed a comprehensive analysis:
- ✅ Measured function execution time with dry runs
- ✅ Detected physical CPU cores (excluding hyperthreaded logical cores)
- ✅ Measured OS overhead (fork vs spawn)
- ✅ Calculated optimal chunksize for ~200ms target duration
- ✅ Applied Amdahl's Law for speedup estimation
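
The last two steps can be sketched with textbook formulas. This is an illustration under stated assumptions, not Amorsize's actual implementation: the function names are hypothetical, the chunksize rule simply divides the target duration by the measured per-item time, and the speedup uses the standard form of Amdahl's Law.

```python
def chunksize_for_target(avg_item_seconds, target_chunk_seconds=0.2):
    """Items per chunk so that one chunk takes about target_chunk_seconds."""
    if avg_item_seconds <= 0:
        raise ValueError("avg_item_seconds must be positive")
    return max(1, round(target_chunk_seconds / avg_item_seconds))

def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)

print(chunksize_for_target(0.001))       # 200 items per chunk for a 1 ms function
print(f"{amdahl_speedup(0.9, 4):.2f}x")  # 3.08x
```

Even with 90% of the work parallelizable, four workers give about 3x rather than 4x, so the estimated speedup is always below the worker count.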

In [None]:
# Execute with optimized parameters
from amorsize import execute

start = time.time()
optimized_results = execute(cpu_intensive_function, data, verbose=False)
optimized_time = time.time() - start

print(f"\n⚡ Performance Comparison:")
print(f"   Serial time:         {serial_time:.3f}s")
print(f"   Blind parallel:      {blind_parallel_time:.3f}s ({serial_time/blind_parallel_time:.2f}x)")
print(f"   Amorsize optimized:  {optimized_time:.3f}s ({serial_time/optimized_time:.2f}x)")
print(f"\n🎯 Amorsize vs. blind parallel: {blind_parallel_time/optimized_time:.2f}x (above 1.00x means Amorsize was faster)")
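
If you prefer to manage the pool yourself, the recommended `n_jobs` and `chunksize` can be passed to `multiprocessing.Pool` directly. The sketch below assumes only those two values; `run_with_recommendation` is a hypothetical helper, and it falls back to a plain loop when one worker or fewer is advised.

```python
from multiprocessing import Pool

def run_with_recommendation(func, items, n_jobs, chunksize):
    """Apply func to items, using a Pool only when more than one worker is advised."""
    if n_jobs <= 1:
        # Serial path: no process startup or pickling cost
        return [func(item) for item in items]
    with Pool(processes=n_jobs) as pool:
        return pool.map(func, items, chunksize=chunksize)

# In this notebook, after `result = optimize(...)`:
# results = run_with_recommendation(cpu_intensive_function, data,
#                                   result.n_jobs, result.chunksize)
print(run_with_recommendation(abs, [-3, 2, -1], n_jobs=1, chunksize=1))  # [3, 2, 1]
```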

---
## Part 3: Visualizing the Optimization

Let's visualize how different configurations perform.

In [None]:
import matplotlib.pyplot as plt

# Performance comparison bar chart
configs = ['Serial', 'Blind\nParallel', 'Amorsize\nOptimized']
times = [serial_time, blind_parallel_time, optimized_time]
speedups = [1.0, serial_time/blind_parallel_time, serial_time/optimized_time]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Execution time comparison
bars1 = ax1.bar(configs, times, color=['gray', 'orange', 'green'])
ax1.set_ylabel('Execution Time (seconds)', fontsize=12)
ax1.set_title('Execution Time Comparison', fontsize=14, fontweight='bold')
ax1.grid(axis='y', alpha=0.3)

# Add value labels on bars
for bar, time_val in zip(bars1, times):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
            f'{time_val:.3f}s', ha='center', va='bottom', fontweight='bold')

# Speedup comparison
bars2 = ax2.bar(configs, speedups, color=['gray', 'orange', 'green'])
ax2.set_ylabel('Speedup vs Serial', fontsize=12)
ax2.set_title('Speedup Comparison', fontsize=14, fontweight='bold')
ax2.axhline(y=1.0, color='r', linestyle='--', label='Serial baseline')
ax2.grid(axis='y', alpha=0.3)
ax2.legend()

# Add value labels on bars
for bar, speedup in zip(bars2, speedups):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
            f'{speedup:.2f}x', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\n🎯 Key Insight: Amorsize ran at {blind_parallel_time/optimized_time:.2f}x the speed of blind parallelization.")

---
## Part 4: Diagnostic Insights

Amorsize provides detailed diagnostic information about why it made its recommendations.

In [None]:
# Get detailed diagnostic profile
result = optimize(
    func=cpu_intensive_function,
    data=data,
    verbose=False,
    profile=True  # Enable detailed profiling
)

profile = result.profile

print("🔍 Diagnostic Profile:")
print(f"\n📦 Workload Characteristics:")
print(f"   Total items:              {profile.total_items}")
print(f"   Avg execution time:       {profile.avg_execution_time*1000:.2f}ms per item")
print(f"   Workload type:            {profile.workload_type}")
print(f"   Coefficient of variation: {profile.coefficient_of_variation:.3f}")

print(f"\n🖥️  System Information:")
print(f"   Physical cores:           {profile.physical_cores}")
print(f"   Logical cores:            {profile.logical_cores}")
print(f"   Start method:             {profile.multiprocessing_start_method}")
print(f"   Spawn cost:               {profile.spawn_cost*1000:.1f}ms per worker")

print(f"\n🎯 Optimization Decisions:")
print(f"   Max workers (CPU):        {profile.max_workers_cpu}")
print(f"   Max workers (Memory):     {profile.max_workers_memory}")
print(f"   Optimal chunksize:        {profile.optimal_chunksize}")
print(f"   Target chunk duration:    {profile.target_chunk_duration*1000:.0f}ms")
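
The coefficient of variation in the profile describes how uneven the per-item timings are: the standard deviation divided by the mean. A minimal sketch follows, assuming the population standard deviation; whether Amorsize uses the population or sample form is not stated here, and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(timings):
    """Standard deviation of the timings divided by their mean."""
    mean = statistics.fmean(timings)
    if mean == 0:
        return 0.0
    return statistics.pstdev(timings) / mean

print(coefficient_of_variation([1.0, 1.0, 1.0]))  # 0.0 (perfectly uniform)
print(coefficient_of_variation([1.0, 3.0]))       # 0.5
```

A value near zero means every item costs about the same, so large fixed chunks balance well; a high value suggests smaller chunks to avoid one slow chunk holding up the pool.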

---
## Part 5: Interactive Parameter Exploration

Let's explore how different worker counts affect performance.

In [None]:
# Test different worker counts
worker_counts = [1, 2, 4, 8, 16]
execution_times = []
speedups = []

print("Testing different worker counts...\n")

for n_workers in worker_counts:
    if n_workers > (os.cpu_count() or 1):
        continue  # Skip counts above the number of logical CPUs
    
    start = time.time()
    with Pool(processes=n_workers) as pool:
        # Use Amorsize's recommended chunksize
        results = pool.map(cpu_intensive_function, data, 
                          chunksize=result.chunksize)
    elapsed = time.time() - start
    
    execution_times.append(elapsed)
    speedup = serial_time / elapsed
    speedups.append(speedup)
    
    marker = "⭐" if n_workers == result.n_jobs else "  "
    print(f"{marker} n_jobs={n_workers:2d}: {elapsed:.3f}s (speedup: {speedup:.2f}x)")

print(f"\n⭐ = Amorsize recommendation")

In [None]:
# Visualize scaling curve
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Execution time vs workers
ax1.plot(worker_counts[:len(execution_times)], execution_times, 'o-', linewidth=2, markersize=8)
ax1.axvline(x=result.n_jobs, color='g', linestyle='--', linewidth=2, label=f'Amorsize: {result.n_jobs} workers')
ax1.set_xlabel('Number of Workers', fontsize=12)
ax1.set_ylabel('Execution Time (seconds)', fontsize=12)
ax1.set_title('Execution Time vs Worker Count', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.legend()

# Speedup vs workers
ax2.plot(worker_counts[:len(speedups)], speedups, 'o-', linewidth=2, markersize=8, color='green')
ax2.plot(worker_counts[:len(speedups)], worker_counts[:len(speedups)], '--', alpha=0.5, label='Linear speedup (ideal)')
ax2.axvline(x=result.n_jobs, color='g', linestyle='--', linewidth=2, label=f'Amorsize: {result.n_jobs} workers')
ax2.set_xlabel('Number of Workers', fontsize=12)
ax2.set_ylabel('Speedup vs Serial', fontsize=12)
ax2.set_title('Speedup vs Worker Count', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.legend()

plt.tight_layout()
plt.show()

print(f"\n📊 Observation: Adding more workers doesn't always help!")
print(f"   Amorsize found the sweet spot at {result.n_jobs} workers.")
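
One way to see why the curve flattens or turns upward is a deliberately simple cost model: each worker adds a fixed startup cost, while the work itself is divided evenly. This is an assumption for illustration only, not the model Amorsize uses, and the numbers below are made up.

```python
def modeled_time(serial_seconds, spawn_cost_seconds, n_workers):
    """Startup cost grows with workers; compute time shrinks with them."""
    return n_workers * spawn_cost_seconds + serial_seconds / n_workers

def best_worker_count(serial_seconds, spawn_cost_seconds, max_workers):
    """Worker count in 1..max_workers with the lowest modeled time."""
    return min(
        range(1, max_workers + 1),
        key=lambda n: modeled_time(serial_seconds, spawn_cost_seconds, n),
    )

# Hypothetical workload: 1.6 s of serial work, 100 ms startup per worker
print(best_worker_count(1.6, 0.1, 16))  # 4
```

Under this model the optimum sits near the square root of `serial_seconds / spawn_cost_seconds`, so short jobs favor few workers no matter how many cores are available.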

---
## Part 6: Real-World Example - Data Processing

Let's apply Amorsize to a practical data processing scenario.

In [None]:
import random

def process_transaction(transaction):
    """Process a financial transaction with validation and calculations"""
    user_id, amount, category = transaction
    
    # Simulate validation
    if amount < 0:
        return {'error': 'negative amount'}
    
    # Calculate derived values
    tax = amount * 0.15
    fee = amount * 0.02
    total = amount + tax + fee
    
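    # Simulate some computation. Use a deterministic code: the built-in hash()
    # of a str is salted per process, so spawned workers could disagree.
    category_code = sum(category.encode()) % 1000
    risk_score = (amount * category_code) % 100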
    
    return {
        'user_id': user_id,
        'amount': amount,
        'tax': tax,
        'fee': fee,
        'total': total,
        'category_code': category_code,
        'risk_score': risk_score
    }

# Generate sample transactions
categories = ['food', 'transport', 'entertainment', 'utilities', 'healthcare']
transactions = [
    (i, random.uniform(10, 1000), random.choice(categories))
    for i in range(5000)
]

print(f"Generated {len(transactions)} transactions")
print(f"Sample: {transactions[0]}")

In [None]:
# Process with Amorsize
from amorsize import execute

start = time.time()
processed = execute(process_transaction, transactions, verbose=True)
amorsize_time = time.time() - start

print(f"\n✅ Processed {len(processed)} transactions in {amorsize_time:.3f}s")
print(f"\nSample result:")
print(processed[0])

In [None]:
# Compare with serial execution
start = time.time()
serial_processed = [process_transaction(t) for t in transactions]
serial_proc_time = time.time() - start

print(f"\n📊 Performance Comparison:")
print(f"   Serial:    {serial_proc_time:.3f}s")
print(f"   Amorsize:  {amorsize_time:.3f}s")
print(f"   Speedup:   {serial_proc_time/amorsize_time:.2f}x")

---
## Part 7: Key Takeaways

### What We Learned

1. **Blind Parallelization is Risky**
   - Using `n_jobs=-1` and `chunksize=1` can make code slower
   - Overhead from process spawning, serialization, and communication

2. **Amorsize Provides Intelligence**
   - Analyzes function execution time and characteristics
   - Considers system resources (CPU, memory)
   - Calculates optimal parameters using Amdahl's Law

3. **Simple API**
   - One-line execution: `execute(func, data)`
   - Two-step workflow: `optimize()` then `Pool.map()`
   - Detailed diagnostics available

4. **Real Performance Gains**
   - Speedups of 5-8x are typical for CPU-bound workloads on multi-core machines; actual gains depend on core count and per-item cost
   - Avoids negative scaling scenarios
   - Works with various workload types

### Next Steps

- 📖 Read [Use Case Guides](https://github.com/CampbellTrevor/Amorsize/tree/main/docs) for your domain:
  - Web Services (Django, Flask, FastAPI)
  - Data Processing (Pandas, CSV, databases)
  - ML Pipelines (PyTorch, TensorFlow, feature engineering)

- 🔧 Explore advanced features:
  - Checkpoint/Resume for long-running jobs
  - Dead Letter Queue for handling failures
  - Circuit Breaker for cascade failure prevention
  - Monitoring hooks for production observability

- 📊 Try other notebooks:
  - `02_performance_analysis.ipynb` - Deep dive into bottleneck analysis
  - `03_parameter_tuning.ipynb` - Advanced parameter optimization
  - `04_monitoring.ipynb` - Real-time monitoring and metrics

---
## Appendix: Troubleshooting Common Issues

### Issue 1: Function Not Picklable

**Problem:** Lambdas and nested functions can't be pickled, and multiprocessing pickles the function to send it to worker processes.

```python
# ❌ Won't work
result = execute(lambda x: x**2, data)

# ✅ Works
def square(x):
    return x**2
result = execute(square, data)
```
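
You can check for this before starting a pool by attempting the pickle yourself. The helper below is a hypothetical sketch built on the standard `pickle` module; it is not an Amorsize function.

```python
import pickle

def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

print(is_picklable(len))               # True: importable by name
print(is_picklable(lambda x: x ** 2))  # False: a lambda has no importable name
```

Functions are pickled by reference to their module and name, which is why a module-level `def` works and a lambda does not.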

### Issue 2: No Speedup Benefit

**Problem:** The function is so fast that parallelization overhead dominates.

```python
# For a trivial function, Amorsize will recommend n_jobs=1 (serial)
def square(x):
    return x**2

result = optimize(square, data, verbose=True)
# Output: "Parallelization not beneficial. Use serial execution."
```

### Issue 3: Memory Errors

**Problem:** Large return objects exhaust memory (OOM).

```python
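# Use batch processing. `func`, `data`, and `save_to_disk` are placeholders
# for your own function, input, and persistence code.
from amorsize import process_in_batches

for batch_results in process_in_batches(func, data, max_memory_mb=1000):
    # Handle each batch immediately so results don't accumulate in memory
    save_to_disk(batch_results)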
```
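
The underlying idea, independent of any library, is to hold only one batch of inputs and results in memory at a time. A minimal standard-library sketch follows; `iter_batches` is a hypothetical helper and does not reflect how `process_in_batches` is implemented.

```python
from itertools import islice

def iter_batches(items, batch_size):
    """Yield successive lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

for batch in iter_batches(range(5), 2):
    print(batch)  # [0, 1], then [2, 3], then [4]
```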

### Issue 4: Windows/macOS Slower Than Expected

**Problem:** The `spawn` start method (the default on Windows and macOS) has higher overhead than `fork`.

```python
# Amorsize automatically accounts for this:
# it measures spawn cost and adjusts its recommendations
result = optimize(func, data, verbose=True, profile=True)
print(result.profile.spawn_cost)  # seconds per worker; needs profile=True
```
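
To see which start method your own interpreter uses, the standard library reports it directly. This sketch uses only `multiprocessing` and makes no assumptions about Amorsize.

```python
import multiprocessing as mp

# Note: calling get_start_method() fixes the default start method
# for this process if none has been set yet.
method = mp.get_start_method()
print(f"Start method in use: {method}")
print(f"Available on this platform: {mp.get_all_start_methods()}")
```

If the output is `spawn`, expect a higher per-worker startup cost and a correspondingly lower recommended worker count for short jobs.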

---
## 🎉 Congratulations!

You've completed the Getting Started tutorial!

You now know how to:
- ✅ Use Amorsize to automatically optimize multiprocessing parameters
- ✅ Avoid negative scaling where parallelism hurts performance
- ✅ Visualize and understand optimization decisions
- ✅ Apply Amorsize to real-world data processing scenarios

Happy optimizing! 🚀