# Lab 5: Hyperparameter Optimization with SageMaker

## Overview
Learn how to automatically find the best hyperparameters for medical image segmentation using SageMaker Automatic Model Tuning. This lab uses Bayesian optimization to efficiently search the hyperparameter space.

## Learning Objectives
- Understand hyperparameter tuning strategies
- Configure SageMaker HyperparameterTuner
- Define search spaces and objective metrics
- Analyze tuning job results
- Select and deploy the best model

## Prerequisites
- Completed Lab 1 (Single GPU Training)
- Understanding of hyperparameters
- Budget for multiple training jobs

**Estimated Time:** 90-120 minutes  
**Estimated Cost:** $15-25 (20 training jobs)

## Hyperparameter Tuning Strategies

### Grid Search
```
lr = [1e-3, 1e-4, 1e-5]
batch_size = [2, 4, 8]
Total jobs: 3 × 3 = 9
```
✓ Exhaustive  
✗ Expensive  
✗ Doesn't scale  
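As a minimal sketch of why grid search stops scaling, the grid above can be enumerated with `itertools.product`. The variable names are illustrative and not part of the lab code.

```python
from itertools import product

# Candidate values taken from the grid above.
learning_rates = [1e-3, 1e-4, 1e-5]
batch_sizes = [2, 4, 8]

# Every (lr, batch_size) pair becomes one training job.
grid = [
    {"lr": lr, "batch_size": bs}
    for lr, bs in product(learning_rates, batch_sizes)
]

print(f"Total jobs: {len(grid)}")
for config in grid[:3]:
    print(config)
```

Adding a third hyperparameter with three values would triple the job count again, which is why the cost grows so quickly.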

### Random Search
```
lr ~ Uniform(1e-5, 1e-3)
batch_size ~ Choice([2, 4, 8])
Total jobs: User defined
```
✓ Better coverage  
✓ Scalable  
✗ No learning  
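A rough sketch of random search over the same space, using only the standard library. The helper `sample_config` and the budget of five jobs are assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_config():
    """Draw one configuration independently of all earlier draws."""
    return {
        "lr": rng.uniform(1e-5, 1e-3),
        "batch_size": rng.choice([2, 4, 8]),
    }

num_jobs = 5  # user-defined budget (illustrative)
configs = [sample_config() for _ in range(num_jobs)]

for config in configs:
    print(config)
```

Each draw ignores the results of earlier jobs, which is the "no learning" drawback listed above.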

### Bayesian Optimization (SageMaker)
```
Uses previous results to guide search
Balances exploration vs exploitation
```
✓ Most efficient  
✓ Learns from results  
✓ Fewer jobs needed  

**This Lab Uses:** Bayesian Optimization

## Step 1: Setup Environment

In [None]:
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    CategoricalParameter,
    IntegerParameter
)
from sagemaker import get_execution_role
import boto3

sagemaker_session = sagemaker.Session(boto3.Session(region_name='us-east-1'))
role = get_execution_role()
region = sagemaker_session.boto_region_name
bucket = sagemaker_session.default_bucket()

print(f"Region: {region}")
print(f"Bucket: {bucket}")

## Step 2: Configure Data Paths

In [None]:
bucket = 'your-s3-bucket-name'  # Replace with your bucket name, or delete this line to keep the default bucket from Step 1
data_path = f's3://{bucket}/segmentation_data/'
output_path = f's3://{bucket}/segmentation_data/output'

print(f"Training data: {data_path}")
print(f"Output path: {output_path}")

## Step 3: Define Base Estimator

Create the base training job configuration.

In [None]:
# Static hyperparameters (not tuned)
static_hyperparameters = {
    "model_name": "SegResNet",
    "epochs": 10,  # Shorter for faster tuning
    "val_interval": 2
}

estimator = PyTorch(
    entry_point="train_simple.py",
    source_dir="../code/training",
    role=role,
    instance_count=1,
    instance_type="ml.g5.xlarge",
    framework_version="2.1.0",
    py_version="py310",
    hyperparameters=static_hyperparameters,
    output_path=output_path,
    base_job_name="hpo-medical-seg",
    enable_sagemaker_metrics=True,  # Publish metrics as SageMaker time series; HPO reads the metric_definitions below
    metric_definitions=[
        {"Name": "val_dice", "Regex": "Validation Dice: ([0-9\\.]+)"},
        {"Name": "train_loss", "Regex": "Training loss: ([0-9\\.]+)"}
    ],
    sagemaker_session=sagemaker_session,
)

print("✓ Base estimator configured")
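The metric definitions only work if the training script prints lines that the regexes match. The sketch below applies the same patterns locally with `re`. The sample log lines are hypothetical, since the actual output format of `train_simple.py` is not shown here.

```python
import re

# Same patterns as the metric_definitions above.
metric_definitions = [
    {"Name": "val_dice", "Regex": "Validation Dice: ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "Training loss: ([0-9\\.]+)"},
]

# Hypothetical log lines; the real format depends on train_simple.py.
log_lines = [
    "Epoch 2/10 Training loss: 0.4312",
    "Epoch 2/10 Validation Dice: 0.7815",
]

def extract_metrics(lines, definitions):
    """Return {metric_name: [values]} for every regex match in the lines."""
    found = {d["Name"]: [] for d in definitions}
    for line in lines:
        for d in definitions:
            match = re.search(d["Regex"], line)
            if match:
                found[d["Name"]].append(float(match.group(1)))
    return found

metrics = extract_metrics(log_lines, metric_definitions)
print(metrics)
```

Running a check like this against a real log excerpt before launching 20 jobs can catch a regex that never matches, which would leave the tuner without an objective value.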

## Step 4: Define Hyperparameter Search Space

Specify which hyperparameters to tune and their ranges.

In [None]:
hyperparameter_ranges = {
    # Learning rate: log scale between 1e-5 and 1e-3
    "lr": ContinuousParameter(1e-5, 1e-3, scaling_type="Logarithmic"),
    
    # Batch size: discrete choices
    "batch_size": CategoricalParameter([2, 4, 8]),
}

print("📊 Hyperparameter Search Space:")
print("  Learning Rate: 1e-5 to 1e-3 (log scale)")
print("  Batch Size: [2, 4, 8]")
print("\nSearch space: a continuous lr range × 3 batch sizes (infinitely many combinations)")
print("Bayesian optimization will sample it efficiently")
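To see why the learning rate uses logarithmic scaling, the short calculation below compares how much of the search effort would land below `1e-4` under linear and logarithmic scaling. It is plain arithmetic on the range above, not a SageMaker API call, and the threshold is chosen only for illustration.

```python
import math

low, high = 1e-5, 1e-3
threshold = 1e-4  # one decade above the lower bound

# Linear scaling: probability mass is proportional to interval width.
linear_fraction = (threshold - low) / (high - low)

# Logarithmic scaling: mass is proportional to width in log space.
log_fraction = (math.log10(threshold) - math.log10(low)) / (
    math.log10(high) - math.log10(low)
)

print(f"Linear scaling, share of samples below 1e-4: {linear_fraction:.1%}")
print(f"Log scaling, share of samples below 1e-4:    {log_fraction:.1%}")
```

With linear scaling roughly 9% of samples fall in the lowest decade, while logarithmic scaling gives each decade an equal 50% share.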

## Step 5: Configure Tuning Job

Set up the hyperparameter tuner with objective metric and strategy.

In [None]:
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="val_dice",
    objective_type="Maximize",  # Higher Dice score is better
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=estimator.metric_definitions,
    max_jobs=20,  # Total training jobs to run
    max_parallel_jobs=2,  # Run 2 jobs simultaneously
    strategy="Bayesian",  # Bayesian optimization
    early_stopping_type="Auto"  # Stop poor performers early
)

print("✓ Hyperparameter Tuner configured")
print(f"  Objective: Maximize val_dice")
print(f"  Strategy: Bayesian Optimization")
print(f"  Max Jobs: 20")
print(f"  Parallel Jobs: 2")
print(f"  Early Stopping: Enabled")
print(f"\nEstimated Duration: 90-120 minutes")
print(f"Estimated Cost: ~$20 (20 jobs × $1/job)")
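As a back-of-the-envelope check on the duration estimate, the sketch below treats the jobs as running in lockstep waves of `max_parallel_jobs`. That is a simplifying assumption, since real jobs finish at different times and early stopping shortens some of them.

```python
import math

max_jobs = 20
max_parallel_jobs = 2

# Jobs run at most max_parallel_jobs at a time, so 20 jobs need 10 waves.
waves = math.ceil(max_jobs / max_parallel_jobs)

# Per-job time budget implied by the 90-120 minute estimate above.
low_minutes, high_minutes = 90, 120
per_job_budget = (low_minutes / waves, high_minutes / waves)

print(f"Sequential waves: {waves}")
print(f"Implied minutes per job: {per_job_budget[0]:.0f}-{per_job_budget[1]:.0f}")
```

Raising `max_parallel_jobs` shortens the wall-clock time but gives the Bayesian strategy fewer completed results to learn from before each new job starts.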

## Step 6: Launch Tuning Job

Start the hyperparameter optimization process.

In [None]:
tuning_job_name = f"medhpo-{sagemaker.utils.sagemaker_timestamp()}"

tuner.fit(
    inputs={"training": data_path},
    job_name=tuning_job_name,
    wait=False  # Don't block, we'll monitor separately
)

print(f"\n🚀 Tuning job launched: {tuning_job_name}")
print(f"\nMonitor progress:")
print(f"  Console: https://console.aws.amazon.com/sagemaker/home?region={region}#/hyper-tuning-jobs/{tuning_job_name}")

## Step 7: Monitor Tuning Progress

Check the status and intermediate results.

In [None]:
import time
from IPython.display import clear_output

sm_client = boto3.client('sagemaker', region_name=region)

def get_tuning_status():
    response = sm_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )
    return response

# Monitor for 5 minutes
for i in range(10):
    clear_output(wait=True)
    
    status = get_tuning_status()
    
    print(f"⏱️  Tuning Job Status: {status['HyperParameterTuningJobStatus']}")
    print(f"\n📊 Job Counts:")
    print(f"  Completed: {status['TrainingJobStatusCounters']['Completed']}")
    print(f"  In Progress: {status['TrainingJobStatusCounters']['InProgress']}")
    print(f"  Stopped: {status['TrainingJobStatusCounters']['Stopped']}")
    print(f"  Failed: {status['TrainingJobStatusCounters'].get('NonRetryableError', 0)}")
    
    if 'BestTrainingJob' in status:
        best = status['BestTrainingJob']
        print(f"\n🏆 Current Best:")
        print(f"  Job: {best['TrainingJobName']}")
        print(f"  Dice Score: {best['FinalHyperParameterTuningJobObjectiveMetric']['Value']:.4f}")
    
    if status['HyperParameterTuningJobStatus'] in ['Completed', 'Failed', 'Stopped']:
        break
    
    time.sleep(30)

print("\n✓ Monitoring complete. Run next cell to wait for completion.")
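The response parsing in the loop can be factored into a small helper that is easy to test offline. The sketch below uses the same response keys as the cell above. The helper name `summarize_tuning_status` and the sample response values are made up for illustration.

```python
def summarize_tuning_status(response):
    """Flatten the fields of interest from a describe-tuning-job response."""
    counters = response["TrainingJobStatusCounters"]
    summary = {
        "status": response["HyperParameterTuningJobStatus"],
        "completed": counters.get("Completed", 0),
        "in_progress": counters.get("InProgress", 0),
        "stopped": counters.get("Stopped", 0),
        "failed": counters.get("NonRetryableError", 0),
        "best_job": None,
        "best_value": None,
    }
    # BestTrainingJob is absent until at least one job reports the objective.
    best = response.get("BestTrainingJob")
    if best:
        summary["best_job"] = best["TrainingJobName"]
        metric = best.get("FinalHyperParameterTuningJobObjectiveMetric", {})
        summary["best_value"] = metric.get("Value")
    return summary

# Hypothetical response with made-up values, shaped like the fields used above.
sample_response = {
    "HyperParameterTuningJobStatus": "InProgress",
    "TrainingJobStatusCounters": {"Completed": 4, "InProgress": 2, "Stopped": 1},
    "BestTrainingJob": {
        "TrainingJobName": "example-job-003",
        "FinalHyperParameterTuningJobObjectiveMetric": {
            "MetricName": "val_dice",
            "Value": 0.81,
        },
    },
}

print(summarize_tuning_status(sample_response))
```

In the notebook, the helper would be called as `summarize_tuning_status(get_tuning_status())`.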

## Step 8: Wait for Completion (Optional)

Block until all jobs finish.

In [None]:
# This will block until tuning completes
tuner.wait()

print("✓ Hyperparameter tuning completed!")

## Step 9: Analyze Results

Retrieve and analyze all training jobs.

In [None]:
import pandas as pd

# Get tuning job analytics
tuning_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name, sagemaker_session=sagemaker_session)
df = tuning_analytics.dataframe()

# Sort by objective metric
df = df.sort_values('FinalObjectiveValue', ascending=False)

print("📊 Top 10 Training Jobs:")
print(df[['TrainingJobName', 'FinalObjectiveValue', 'lr', 'batch_size', 'TrainingJobStatus']].head(10).to_string(index=False))

# Best hyperparameters
best_job = df.iloc[0]
# Categorical values can come back as quoted strings (e.g. '"4"'), so strip the quotes
best_batch_size = int(float(str(best_job['batch_size']).strip('"')))
print(f"\n🏆 Best Hyperparameters:")
print(f"  Dice Score: {best_job['FinalObjectiveValue']:.4f}")
print(f"  Learning Rate: {best_job['lr']:.6f}")
print(f"  Batch Size: {best_batch_size}")
print(f"  Job Name: {best_job['TrainingJobName']}")

## Step 10: Visualize Results

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Learning Rate vs Dice Score
completed = df[df['TrainingJobStatus'] == 'Completed']
axes[0].scatter(completed['lr'], completed['FinalObjectiveValue'], alpha=0.6, s=100)
axes[0].set_xscale('log')
axes[0].set_xlabel('Learning Rate', fontsize=12)
axes[0].set_ylabel('Dice Score', fontsize=12)
axes[0].set_title('Learning Rate vs Performance', fontsize=14)
axes[0].grid(True, alpha=0.3)

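# Plot 2: Batch Size vs Dice Score
# Categorical values can come back as quoted strings (e.g. '"4"'), so normalize them first
batch_labels = completed['batch_size'].astype(str).str.strip('"')
batch_means = completed.groupby(batch_labels)['FinalObjectiveValue'].mean()
batch_means = batch_means.sort_index(key=lambda idx: idx.astype(float))
axes[1].bar(batch_means.index, batch_means.values, alpha=0.7, color='steelblue')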
axes[1].set_xlabel('Batch Size', fontsize=12)
axes[1].set_ylabel('Average Dice Score', fontsize=12)
axes[1].set_title('Batch Size vs Performance', fontsize=14)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('hpo_results.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Visualization saved: hpo_results.png")

## Step 11: Convergence Analysis

In [None]:
# Plot best score over time
df_sorted = df.sort_values('TrainingStartTime')
df_sorted['BestSoFar'] = df_sorted['FinalObjectiveValue'].cummax()

plt.figure(figsize=(10, 6))
plt.plot(range(len(df_sorted)), df_sorted['BestSoFar'], marker='o', linewidth=2, markersize=6)
plt.xlabel('Training Job Number', fontsize=12)
plt.ylabel('Best Dice Score', fontsize=12)
plt.title('Hyperparameter Optimization Convergence', fontsize=14)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('hpo_convergence.png', dpi=150, bbox_inches='tight')
plt.show()

# Calculate improvement (skip jobs that never reported the objective metric)
best_curve = df_sorted['BestSoFar'].dropna()
initial_best = best_curve.iloc[0]
final_best = best_curve.iloc[-1]
improvement = ((final_best - initial_best) / initial_best) * 100 if initial_best else 0.0

print(f"\n📈 Optimization Progress:")
print(f"  Initial Best: {initial_best:.4f}")
print(f"  Final Best: {final_best:.4f}")
print(f"  Improvement: {improvement:.2f}%")
print(f"\n✓ Convergence plot saved: hpo_convergence.png")
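The best-so-far curve is a running maximum over the jobs in start order. A dependency-free sketch of the same calculation, using made-up Dice scores, looks like this:

```python
from itertools import accumulate

# Hypothetical Dice scores in job start order (made-up values).
scores = [0.71, 0.68, 0.75, 0.74, 0.80, 0.79]

# Running maximum: the best score seen up to and including each job.
best_so_far = list(accumulate(scores, max))

def percent_improvement(initial, final):
    """Relative gain of the final best over the first job, in percent."""
    if initial == 0:
        return None  # undefined when the first score is zero
    return (final - initial) / initial * 100

improvement_pct = percent_improvement(best_so_far[0], best_so_far[-1])

print(best_so_far)
print(f"Improvement: {improvement_pct:.2f}%")
```

A curve that flattens early suggests the remaining budget is adding little, while one that is still climbing at the last job suggests a warm-start follow-up may be worthwhile.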

## Step 12: Download Best Model

In [None]:
# Get best training job details
best_training_job = sm_client.describe_training_job(
    TrainingJobName=best_job['TrainingJobName']
)

model_artifacts = best_training_job['ModelArtifacts']['S3ModelArtifacts']

print(f"Best Model Artifacts: {model_artifacts}")

# Download
!aws s3 cp {model_artifacts} ./best_hpo_model.tar.gz
!tar -xzf best_hpo_model.tar.gz

print("\n✓ Best model downloaded")
!ls -lh *.pth

## Step 13: Cost Analysis

In [None]:
# Calculate total cost
total_time = df['TrainingElapsedTimeSeconds'].sum() / 3600  # hours
cost_per_hour = 1.41  # ml.g5.xlarge rate used in this lab; check current pricing for your region
total_cost = total_time * cost_per_hour

completed_jobs = len(df[df['TrainingJobStatus'] == 'Completed'])
stopped_jobs = len(df[df['TrainingJobStatus'] == 'Stopped'])

print("💰 Cost Analysis:")
print(f"  Total Training Time: {total_time:.2f} hours")
print(f"  Instance Cost: ${cost_per_hour}/hour")
print(f"  Total Cost: ${total_cost:.2f}")
print(f"\n📊 Job Statistics:")
print(f"  Completed: {completed_jobs}")
print(f"  Early Stopped: {stopped_jobs}")
# Rough assumption: each early-stopped job saves about $0.50
print(f"  Savings from Early Stopping: ~${stopped_jobs * 0.5:.2f}")
# Guard against division by zero when the first job was already the best
if improvement > 0:
    print(f"\n💡 Cost per 1% improvement: ${total_cost / improvement:.2f}")
else:
    print("\n💡 Cost per 1% improvement: n/a (no improvement over the first job)")
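The same arithmetic can be wrapped in small functions so it can be reused for other instance types. The durations below are hypothetical, and the hourly rate is the figure used in this lab rather than a quoted price.

```python
def tuning_cost(elapsed_seconds, cost_per_hour):
    """Total cost of a tuning job from per-job billed seconds."""
    total_hours = sum(elapsed_seconds) / 3600
    return total_hours * cost_per_hour

def cost_per_point(total_cost, improvement_pct):
    """Cost per percentage point of improvement, or None if there was none."""
    if improvement_pct <= 0:
        return None
    return total_cost / improvement_pct

# Hypothetical durations in seconds: four full-length jobs and two stopped early.
elapsed = [720, 700, 710, 690, 300, 480]

total = tuning_cost(elapsed, cost_per_hour=1.41)
print(f"Total cost: ${total:.2f}")
print(f"Cost per 1% improvement: {cost_per_point(total, 5.0)}")
print(f"With no improvement: {cost_per_point(total, 0.0)}")
```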

## Understanding Bayesian Optimization

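### How It Works

The job ranges below are illustrative; the optimizer shifts gradually from exploration to exploitation rather than switching at fixed job counts.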

1. **Initial Exploration** (Jobs 1-5)
   - Random sampling across search space
   - Build initial surrogate model

2. **Exploitation** (Jobs 6-15)
   - Sample near best-performing regions
   - Refine hyperparameter estimates

3. **Final Refinement** (Jobs 16-20)
   - Fine-tune around optimal values
   - Confirm best configuration

### Acquisition Functions

SageMaker uses **Expected Improvement (EI)**:
```
EI(x) = E[max(f(x) - f(x_best), 0)]
```

Balances:
- **Exploitation**: Sample where we expect high performance
- **Exploration**: Sample where we're uncertain
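If the surrogate model's prediction at a point is Gaussian with mean `mu` and standard deviation `sigma`, the EI formula above has a well-known closed form. The sketch below implements that textbook expression for a maximization problem. It illustrates the formula and is not SageMaker's internal code, and the candidate values are made up.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for maximization under a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0:
        # No uncertainty: improvement is deterministic.
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mu - f_best) * cdf + sigma * pdf

f_best = 0.80  # best Dice score observed so far (hypothetical)

# Candidate A: slightly better predicted mean, low uncertainty (exploitation).
ei_a = expected_improvement(mu=0.81, sigma=0.01, f_best=f_best)
# Candidate B: worse predicted mean, high uncertainty (exploration).
ei_b = expected_improvement(mu=0.78, sigma=0.05, f_best=f_best)

print(f"EI(A) = {ei_a:.4f}")
print(f"EI(B) = {ei_b:.4f}")
```

In this example the uncertain candidate B scores slightly higher than A, which shows how EI can favor exploring a region the model knows little about.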

### Early Stopping

Automatically stops jobs that are unlikely to beat the current best:
- Saves ~30-40% of training time
- Based on learning curves
- No manual intervention needed
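One common way to make a learning-curve decision like this is a median rule: stop a job if its current objective value is worse than the median of earlier jobs' running averages at the same epoch. The sketch below is a simplified illustration under that assumption, with made-up curves. It should not be read as SageMaker's exact implementation.

```python
from statistics import median

def should_stop(current_value, epoch, previous_curves, maximize=True):
    """Median rule: compare the current value with the median of earlier
    jobs' running averages up to the same epoch (0-based)."""
    running_averages = [
        sum(curve[: epoch + 1]) / (epoch + 1)
        for curve in previous_curves
        if len(curve) > epoch
    ]
    if not running_averages:
        return False  # nothing to compare against yet
    threshold = median(running_averages)
    if maximize:
        return current_value < threshold
    return current_value > threshold

# Hypothetical validation Dice curves from three earlier jobs.
previous_curves = [
    [0.50, 0.60, 0.70],
    [0.40, 0.55, 0.65],
    [0.30, 0.35, 0.40],
]

print(should_stop(0.45, epoch=2, previous_curves=previous_curves))
print(should_stop(0.60, epoch=2, previous_curves=previous_curves))
```

The first call reports a job that trails the median and would be stopped; the second is ahead of it and keeps running.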

## Advanced Tuning Strategies

### 1. Multi-Objective Optimization
```python
# HyperparameterTuner takes a single objective metric, so tune on val_dice
objective_metric_name="val_dice"
# and compare a secondary metric (e.g. training time) when analyzing the results
```
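One way to bring a secondary metric into the analysis is to keep only the jobs that are not beaten on both accuracy and speed (the Pareto front). The sketch below uses made-up job names, scores, and durations. In practice the values could come from the `FinalObjectiveValue` and `TrainingElapsedTimeSeconds` columns used in Steps 9 and 13.

```python
# Hypothetical finished jobs: (name, val_dice, training_seconds).
jobs = [
    ("job-a", 0.85, 900),
    ("job-b", 0.84, 500),
    ("job-c", 0.80, 700),
    ("job-d", 0.78, 300),
]

def pareto_front(candidates):
    """Keep jobs that no other job beats on both Dice (higher) and time (lower)."""
    front = []
    for name, dice, secs in candidates:
        dominated = any(
            (d >= dice and s <= secs) and (d > dice or s < secs)
            for _, d, s in candidates
        )
        if not dominated:
            front.append(name)
    return front

print(pareto_front(jobs))
```

Here `job-c` drops out because `job-b` is both more accurate and faster, while the remaining jobs each represent a different accuracy/speed trade-off.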

### 2. Warm Start Tuning
```python
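from sagemaker.tuner import WarmStartConfig, WarmStartTypes

# Continue from a previous tuning job (same data and algorithm)
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="val_dice",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=estimator.metric_definitions,
    max_jobs=10,  # illustrative budget for the follow-up run
    warm_start_config=WarmStartConfig(
        warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM,
        parents={previous_tuning_job_name},  # set of parent job names, e.g. tuning_job_name from Step 6
    ),
)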
```

### 3. Transfer Learning
```python
# Use results from similar task
warm_start_type=WarmStartTypes.TRANSFER_LEARNING
```

### 4. Expanded Search Space
```python
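hyperparameter_ranges = {
    "lr": ContinuousParameter(1e-5, 1e-3, scaling_type="Logarithmic"),
    "batch_size": CategoricalParameter([2, 4, 8]),
    # Remove "model_name" from the static hyperparameters when tuning it
    "model_name": CategoricalParameter(["SegResNet", "UNet"]),
    "weight_decay": ContinuousParameter(1e-6, 1e-3, scaling_type="Logarithmic"),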
}
```

## Key Takeaways

✓ **Bayesian Optimization Benefits:**
- 3-5x more efficient than random search
- Learns from previous results
- Automatic early stopping saves 30-40% cost
- Typically finds near-optimal in 20-30 jobs

✓ **Best Practices:**
- Start with 2-3 hyperparameters
- Use log scale for learning rates
- Run 20-30 jobs minimum
- Enable early stopping
- Monitor convergence plots

✓ **When to Use HPO:**
- New dataset or domain
- Production model optimization
- When 1-2% improvement matters
- Budget allows multiple training runs

✓ **When NOT to Use HPO:**
- Prototyping phase
- Well-known hyperparameters
- Limited budget
- Quick experiments

## Typical Results

| Metric | Manual Tuning | HPO (20 jobs) |
|--------|---------------|---------------|
| Best Dice | 0.82 | 0.85 |
| Time Spent | 2-3 days | 2 hours |
| Cost | $10-15 | $20 |
| Reproducibility | Low | High |

## Next Steps

- Deploy best model to SageMaker Endpoint
- Run warm-start tuning with expanded search space
- Try multi-objective optimization
- Automate HPO in ML pipeline
- Document optimal hyperparameters for team

## Additional Resources

- [SageMaker HPO Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html)
- [Bayesian Optimization Paper](https://arxiv.org/abs/1807.02811)
- [HPO Best Practices](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-considerations.html)