# Module 09: Hyperparameter Tuning for Deep Learning

**Difficulty**: ⭐⭐⭐ (Advanced)

**Estimated Time**: 60-75 minutes

**Prerequisites**: 
- [Module 05: Feed-Forward Neural Networks with Keras](05_feedforward_neural_networks_keras.ipynb)
- [Module 06: Optimizers](06_optimizers_sgd_adam_rmsprop.ipynb)
- [Module 07: Regularization Techniques](07_regularization_techniques.ipynb)
- [Module 08: Loss Functions and Metrics](08_loss_functions_and_metrics.ipynb)

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand the importance of different hyperparameters and their impact on model performance
2. Implement grid search and random search for hyperparameter optimization
3. Use Keras Tuner for automated hyperparameter tuning
4. Apply learning rate finder techniques to identify optimal learning rates
5. Design effective hyperparameter search strategies for neural networks
6. Track and compare experiments systematically

## 1. Setup and Imports

We'll import all necessary libraries for hyperparameter tuning experiments.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Deep learning libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import fashion_mnist

# Scikit-learn utilities
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import StandardScaler

# For reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Plotting configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

## 2. Load and Prepare Data

We'll use Fashion-MNIST for our experiments. This dataset is large enough to show performance differences but small enough for quick iterations.

In [None]:
# Load Fashion-MNIST dataset
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

# Normalize pixel values to [0, 1] range
X_train_full = X_train_full.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Flatten images for fully connected network
X_train_full = X_train_full.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)

# Create validation split (20% of training data)
validation_split = int(0.8 * len(X_train_full))
X_train = X_train_full[:validation_split]
y_train = y_train_full[:validation_split]
X_val = X_train_full[validation_split:]
y_val = y_train_full[validation_split:]

print(f"Training set shape: {X_train.shape}")
print(f"Validation set shape: {X_val.shape}")
print(f"Test set shape: {X_test.shape}")
print(f"Number of classes: {len(np.unique(y_train))}")

## 3. Hyperparameter Importance Ranking

Not all hyperparameters are equally important. The ranking below is a rule of thumb based on empirical research and practice; the exact order can shift with the model and dataset:

### Critical Hyperparameters (High Impact)
1. **Learning Rate**: Most important single hyperparameter
2. **Network Architecture**: Number of layers and units per layer
3. **Batch Size**: Affects training speed and generalization

### Important Hyperparameters (Medium Impact)
4. **Optimizer Type**: Adam vs SGD vs RMSprop
5. **Regularization Strength**: L2 penalty, dropout rate
6. **Activation Functions**: ReLU vs LeakyReLU vs others

### Secondary Hyperparameters (Lower Impact)
7. **Learning Rate Schedule**: Step decay, exponential decay
8. **Batch Normalization**: Position and parameters
9. **Initialization Method**: He vs Xavier initialization

**Best Practice**: Start tuning from top to bottom. Don't waste time on minor hyperparameters if you haven't optimized the critical ones!

## 4. Manual Search Strategy

Before using automated tools, let's understand manual hyperparameter search.

In [None]:
def create_model(n_hidden=1, n_neurons=30, learning_rate=0.001, dropout_rate=0.0):
    """
    Create a configurable neural network model.
    
    Args:
        n_hidden: Number of hidden layers
        n_neurons: Number of neurons per hidden layer
        learning_rate: Learning rate for optimizer
        dropout_rate: Dropout rate for regularization
    
    Returns:
        Compiled Keras model
    """
    model = keras.Sequential()
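    # keras.Input(shape=...) works across Keras versions; newer releases
    # no longer accept the input_shape argument on InputLayer
    model.add(keras.Input(shape=(784,)))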
    
    # Add hidden layers
    for _ in range(n_hidden):
        model.add(layers.Dense(n_neurons, activation='relu'))
        if dropout_rate > 0:
            model.add(layers.Dropout(dropout_rate))
    
    # Output layer
    model.add(layers.Dense(10, activation='softmax'))
    
    # Compile model
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    return model

# Test the model creation function
test_model = create_model(n_hidden=2, n_neurons=64, learning_rate=0.001)
print("Model created successfully!")
test_model.summary()

## 5. Grid Search

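Grid search trains one model for every combination of the listed values. It is exhaustive, but the number of runs is the product of the number of values per hyperparameter, so it grows quickly as hyperparameters are added.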

**When to use**: Small search spaces, critical hyperparameters only.
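To make the cost concrete, the sketch below counts combinations using only the standard library. The `grid_size` helper and the larger `large_grid` are illustrative assumptions; only `small_grid` matches the grid used in this section.

```python
from itertools import product

def grid_size(param_grid):
    """Number of combinations in a grid: the product of the list lengths."""
    size = 1
    for values in param_grid.values():
        size *= len(values)
    return size

# The grid used in this section: 2 * 2 * 2 training runs.
small_grid = {
    'n_hidden': [1, 2],
    'n_neurons': [32, 64],
    'learning_rate': [0.01, 0.001],
}

# A hypothetical larger grid: 3 * 4 * 5 * 4 training runs.
large_grid = {
    'n_hidden': [1, 2, 3],
    'n_neurons': [32, 64, 128, 256],
    'learning_rate': [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    'dropout_rate': [0.0, 0.1, 0.3, 0.5],
}

# Enumerating the combinations gives the same count as the formula.
small_combos = list(product(*small_grid.values()))

print(f"Small grid: {grid_size(small_grid)} runs ({len(small_combos)} enumerated)")
print(f"Large grid: {grid_size(large_grid)} runs")
```

Adding one hyperparameter and a few extra values per list turns a handful of runs into hundreds, which is why grid search is best kept to small spaces.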

In [None]:
# Define hyperparameter grid (keep it small for speed)
param_grid = {
    'n_hidden': [1, 2],
    'n_neurons': [32, 64],
    'learning_rate': [0.01, 0.001]
}

# Generate all combinations
grid = ParameterGrid(param_grid)
print(f"Total combinations to test: {len(grid)}")

# Store results
grid_search_results = []

# Test each combination
for idx, params in enumerate(grid):
    print(f"\nTesting combination {idx + 1}/{len(grid)}: {params}")
    
    # Create and train model
    model = create_model(**params)
    
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=5,  # Short training for demo
        batch_size=128,
        verbose=0
    )
    
    # Record best validation accuracy
    best_val_acc = max(history.history['val_accuracy'])
    
    result = params.copy()
    result['val_accuracy'] = best_val_acc
    grid_search_results.append(result)
    
    print(f"  Best validation accuracy: {best_val_acc:.4f}")

# Convert to DataFrame for analysis
grid_results_df = pd.DataFrame(grid_search_results)
grid_results_df = grid_results_df.sort_values('val_accuracy', ascending=False)

print("\n" + "="*60)
print("Grid Search Results (sorted by validation accuracy):")
print("="*60)
print(grid_results_df.to_string(index=False))

## 6. Random Search

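Random search samples random combinations from hyperparameter distributions. Bergstra & Bengio (2012) found that, for the same number of trials, it often finds configurations as good as or better than those found by grid search.

**Why Random Search Often Works Better**: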
- Explores more diverse hyperparameter values
- More efficient when some hyperparameters don't matter much
- Can set a time budget instead of exhaustive search
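The first two points can be illustrated without training anything. The sketch below gives both strategies the same budget of 9 trials and counts how many distinct learning rates each one tries. The value lists, the budget, and the seed are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for the illustration
budget = 9  # same number of training runs for both strategies

# Grid search: 3 learning rates x 3 dropout rates = 9 trials.
grid_lrs = [1e-4, 1e-3, 1e-2]
grid_dropouts = [0.0, 0.25, 0.5]
grid_trials = [(lr, d) for lr in grid_lrs for d in grid_dropouts]

# Random search: every trial draws a fresh value for each hyperparameter.
# The learning rate is sampled log-uniformly so each decade is equally likely.
random_trials = [
    (10 ** rng.uniform(-4, -2), rng.uniform(0.0, 0.5))
    for _ in range(budget)
]

distinct_grid_lrs = len({lr for lr, _ in grid_trials})
distinct_random_lrs = len({lr for lr, _ in random_trials})

print(f"Grid:   {len(grid_trials)} trials, {distinct_grid_lrs} distinct learning rates")
print(f"Random: {len(random_trials)} trials, {distinct_random_lrs} distinct learning rates")
```

If the learning rate matters far more than dropout, the grid effectively spends its 9 runs on 3 learning rates, while random search probes 9 of them.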

In [None]:
# Define hyperparameter distributions
n_iterations = 10  # Number of random combinations to try

random_search_results = []

for idx in range(n_iterations):
    # Sample random hyperparameters
    params = {
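        # np.random.choice returns NumPy integers; cast to plain Python ints
        # so the values print cleanly and are safe to pass to Keras layers
        'n_hidden': int(np.random.choice([1, 2, 3])),
        'n_neurons': int(np.random.choice([32, 64, 128])),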
        'learning_rate': 10 ** np.random.uniform(-4, -2),  # Log-uniform sampling
        'dropout_rate': np.random.uniform(0.0, 0.5)
    }
    
    print(f"\nIteration {idx + 1}/{n_iterations}:")
    print(f"  Parameters: {params}")
    
    # Create and train model
    model = create_model(**params)
    
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=5,
        batch_size=128,
        verbose=0
    )
    
    best_val_acc = max(history.history['val_accuracy'])
    
    result = params.copy()
    result['val_accuracy'] = best_val_acc
    random_search_results.append(result)
    
    print(f"  Validation accuracy: {best_val_acc:.4f}")

# Convert to DataFrame
random_results_df = pd.DataFrame(random_search_results)
random_results_df = random_results_df.sort_values('val_accuracy', ascending=False)

print("\n" + "="*60)
print("Random Search Results (top 5):")
print("="*60)
print(random_results_df.head().to_string(index=False))

## 7. Keras Tuner for Automated Hyperparameter Tuning

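Keras Tuner provides several search algorithms, including random search, Hyperband, and Bayesian optimization. The example below uses Bayesian optimization, which uses the results of earlier trials to decide which configuration to try next.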

In [None]:
# Install Keras Tuner if needed (uncomment if not installed)
# !pip install keras-tuner -q

try:
    import keras_tuner as kt
    print(f"Keras Tuner version: {kt.__version__}")
except ImportError:
    print("Keras Tuner not installed. Skipping this section.")
    print("To install: pip install keras-tuner")

In [None]:
# Define model builder for Keras Tuner
def build_tuner_model(hp):
    """
    Model builder with hyperparameter search space.
    
    Args:
        hp: HyperParameters object from Keras Tuner
    
    Returns:
        Compiled Keras model
    """
    model = keras.Sequential()
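    # keras.Input(shape=...) works across Keras versions; newer releases
    # no longer accept the input_shape argument on InputLayer
    model.add(keras.Input(shape=(784,)))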
    
    # Tune number of hidden layers (1-3)
    for i in range(hp.Int('n_hidden', 1, 3)):
        # Tune number of neurons per layer (32-128, step of 32)
        model.add(layers.Dense(
            units=hp.Int(f'units_{i}', min_value=32, max_value=128, step=32),
            activation='relu'
        ))
        
        # Tune dropout rate (0.0-0.5); the name 'dropout' is reused on every
        # iteration, so all hidden layers share a single tuned rate
        model.add(layers.Dropout(
            rate=hp.Float('dropout', min_value=0.0, max_value=0.5, step=0.1)
        ))
    
    model.add(layers.Dense(10, activation='softmax'))
    
    # Tune learning rate (log scale)
    learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
    
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    return model

print("Model builder function defined.")

In [None]:
# Note: This cell demonstrates Keras Tuner usage but is commented out
# to avoid long training times. Uncomment to run actual hyperparameter search.

# Initialize Bayesian Optimization tuner
# tuner = kt.BayesianOptimization(
#     build_tuner_model,
#     objective='val_accuracy',
#     max_trials=10,  # Number of configurations to try
#     executions_per_trial=1,  # Train each config once
#     directory='tuner_results',
#     project_name='fashion_mnist_tuning'
# )

# Print search space summary
# tuner.search_space_summary()

# Run the search
# tuner.search(
#     X_train, y_train,
#     validation_data=(X_val, y_val),
#     epochs=10,
#     batch_size=128,
#     callbacks=[keras.callbacks.EarlyStopping(patience=3)]
# )

# Get best hyperparameters
# best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
# print("Best hyperparameters:")
# print(f"  Hidden layers: {best_hps.get('n_hidden')}")
# print(f"  Neurons (layer 0): {best_hps.get('units_0')}")
# print(f"  Dropout rate: {best_hps.get('dropout')}")
# print(f"  Learning rate: {best_hps.get('learning_rate')}")

print("Keras Tuner example code defined (commented out for speed).")
print("Uncomment to run Bayesian Optimization search.")

## 8. Learning Rate Finder

The Learning Rate Finder technique helps identify good learning rate ranges by training with exponentially increasing learning rates.

**How it works**:
1. Start with very small learning rate (e.g., 1e-7)
2. Train for a few batches, exponentially increasing LR
3. Plot loss vs learning rate
4. Choose LR where loss decreases fastest (steepest slope)
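The steps above can be sketched without training a model. In the snippet below the loss curve is synthetic (an assumption made up for the illustration, not real training output); only the geometric schedule and the steepest-slope rule reflect the technique described here.

```python
import numpy as np

min_lr, max_lr, steps = 1e-7, 1.0, 100

# Steps 1-2: a geometric schedule. Multiplying by lr_mult after every batch
# takes the learning rate from min_lr to max_lr in `steps` updates.
lr_mult = (max_lr / min_lr) ** (1 / steps)
learning_rates = min_lr * lr_mult ** np.arange(steps + 1)

# Step 3: a made-up loss curve: flat while the learning rate is too small,
# falling around 1e-3, and blowing up as the learning rate approaches 1.
log_lr = np.log10(learning_rates)
losses = 2.3 - 1.5 / (1 + np.exp(-3 * (log_lr + 3))) + np.exp(4 * (log_lr + 0.5))

# Step 4: the steepest descent is the most negative slope of
# loss with respect to log10(learning rate).
slopes = np.gradient(losses, log_lr)
suggested_lr = learning_rates[np.argmin(slopes)]

print(f"Multiplier per batch: {lr_mult:.4f}")
print(f"Final learning rate:  {learning_rates[-1]:.2e}")
print(f"Suggested LR:         {suggested_lr:.2e}")
```

Taking the slope against the logarithm of the learning rate matches how the curve is usually plotted, with a log-scaled x-axis.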

In [None]:
class LearningRateFinder(keras.callbacks.Callback):
    """
    Callback to find optimal learning rate range.
    Exponentially increases learning rate and records loss.
    """
    
    def __init__(self, min_lr=1e-7, max_lr=10, steps=100):
        super().__init__()
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.steps = steps
        self.learning_rates = []
        self.losses = []
        self.best_loss = float('inf')
        
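    def on_train_begin(self, logs=None):
        # Calculate multiplication factor
        self.lr_mult = (self.max_lr / self.min_lr) ** (1 / self.steps)
        self.current_lr = self.min_lr
        # Assign to `learning_rate` directly; `optimizer.lr` and
        # `keras.backend.set_value` may be unavailable in newer Keras releases
        self.model.optimizer.learning_rate.assign(self.current_lr)
        
    def on_batch_end(self, batch, logs=None):
        # Record current learning rate and loss
        # (Keras reports the running average of the loss over the epoch so far)
        loss = (logs or {}).get('loss')
        if loss is None:
            return
        self.learning_rates.append(self.current_lr)
        self.losses.append(loss)
        
        # Stop if loss is exploding or is no longer a finite number
        if not np.isfinite(loss) or loss > 4 * self.best_loss:
            self.model.stop_training = True
            return
        
        # Update best loss
        if loss < self.best_loss:
            self.best_loss = loss
        
        # Increase learning rate, stopping once the sweep passes max_lr
        self.current_lr *= self.lr_mult
        if self.current_lr > self.max_lr:
            self.model.stop_training = True
            return
        self.model.optimizer.learning_rate.assign(self.current_lr)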

# Create model for LR finding
lr_model = create_model(n_hidden=2, n_neurons=64, learning_rate=1e-7)

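# Create LR finder callback
# One epoch over 10,000 samples with batch_size=128 is ceil(10000 / 128) = 79
# batches, so use 79 steps; with steps=100 the sweep would end well below max_lr
lr_finder = LearningRateFinder(min_lr=1e-7, max_lr=1, steps=79)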

# Run LR finder
print("Running Learning Rate Finder...")
lr_model.fit(
    X_train[:10000], y_train[:10000],  # Use subset for speed
    batch_size=128,
    epochs=1,
    callbacks=[lr_finder],
    verbose=0
)

print(f"Tested {len(lr_finder.learning_rates)} learning rates.")

In [None]:
# Plot learning rate vs loss
plt.figure(figsize=(10, 6))
plt.plot(lr_finder.learning_rates, lr_finder.losses)
plt.xscale('log')
plt.xlabel('Learning Rate (log scale)', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Learning Rate Finder', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)

# Find learning rate with steepest descent
# (largest negative gradient)
gradients = np.gradient(lr_finder.losses)
steepest_idx = np.argmin(gradients)
optimal_lr = lr_finder.learning_rates[steepest_idx]

plt.axvline(optimal_lr, color='red', linestyle='--', linewidth=2, 
            label=f'Suggested LR: {optimal_lr:.2e}')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()

print(f"\nSuggested optimal learning rate: {optimal_lr:.2e}")
print(f"Recommended range: {optimal_lr/10:.2e} to {optimal_lr:.2e}")

## 9. Experiment Tracking Best Practices

When tuning hyperparameters, systematic experiment tracking is crucial.

In [None]:
class ExperimentTracker:
    """
    Simple experiment tracker for hyperparameter tuning.
    Records hyperparameters, metrics, and training history.
    """
    
    def __init__(self):
        self.experiments = []
    
    def log_experiment(self, name, hyperparameters, metrics, notes=""):
        """
        Log an experiment with its results.
        
        Args:
            name: Experiment name/ID
            hyperparameters: Dict of hyperparameters
            metrics: Dict of evaluation metrics
            notes: Optional notes about the experiment
        """
        experiment = {
            'name': name,
            'hyperparameters': hyperparameters,
            'metrics': metrics,
            'notes': notes,
            'timestamp': pd.Timestamp.now()
        }
        self.experiments.append(experiment)
    
    def get_best_experiment(self, metric='val_accuracy', mode='max'):
        """
        Get the best experiment based on a metric.
        """
        if not self.experiments:
            return None
        
        if mode == 'max':
            best = max(self.experiments, key=lambda x: x['metrics'].get(metric, -float('inf')))
        else:
            best = min(self.experiments, key=lambda x: x['metrics'].get(metric, float('inf')))
        
        return best
    
    def to_dataframe(self):
        """
        Convert experiments to DataFrame for easy analysis.
        """
        if not self.experiments:
            return pd.DataFrame()
        
        rows = []
        for exp in self.experiments:
            row = {'name': exp['name']}
            row.update(exp['hyperparameters'])
            row.update(exp['metrics'])
            rows.append(row)
        
        return pd.DataFrame(rows)

# Example usage
tracker = ExperimentTracker()

# Log some experiments
tracker.log_experiment(
    name='exp_001',
    hyperparameters={'n_hidden': 2, 'n_neurons': 64, 'learning_rate': 0.001},
    metrics={'val_accuracy': 0.87, 'val_loss': 0.42}
)

tracker.log_experiment(
    name='exp_002',
    hyperparameters={'n_hidden': 3, 'n_neurons': 128, 'learning_rate': 0.0001},
    metrics={'val_accuracy': 0.89, 'val_loss': 0.38}
)

# View all experiments
print("All Experiments:")
print(tracker.to_dataframe())

# Get best experiment
best = tracker.get_best_experiment()
print(f"\nBest Experiment: {best['name']}")
print(f"Hyperparameters: {best['hyperparameters']}")
print(f"Metrics: {best['metrics']}")

## 10. Hyperparameter Tuning Strategy Guide

### Recommended Tuning Process:

**Step 1: Find Good Learning Rate (Most Critical)**
- Use Learning Rate Finder
- Test suggested LR with simple model
- Typical range: 1e-4 to 1e-2

**Step 2: Tune Architecture**
- Start simple: 1-2 hidden layers
- Gradually increase if underfitting
- Use random search for layer count and neuron count

**Step 3: Tune Batch Size**
- Larger batches (128-512): More stable gradient estimates, usually faster on GPU
- Smaller batches (16-64): Noisier updates that often generalize better, but slower per epoch
- Balance based on dataset size and hardware

**Step 4: Add Regularization**
- Only if overfitting occurs
- Try dropout: 0.2-0.5
- Try L2: 0.001-0.01

**Step 5: Fine-tune Optimizer**
- Adam works well for most cases
- SGD+momentum for very large datasets
- RMSprop for RNNs

### Common Pitfalls to Avoid:

1. **Tuning too many hyperparameters at once**
   - Solution: Tune one at a time or in small groups

2. **Not using validation set properly**
   - Solution: Keep test set completely separate until final evaluation

3. **Training for too few epochs during search**
   - Solution: Use early stopping, ensure convergence

4. **Ignoring training time**
   - Solution: Consider efficiency vs performance trade-off

5. **Overfitting to validation set**
   - Solution: Use cross-validation or hold out final test set
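Pitfall 3 recommends early stopping. The sketch below shows the patience rule on a plain list of validation losses. The `stopping_epoch` helper and the loss values are hypothetical and only illustrate the idea; in a real search you would pass `keras.callbacks.EarlyStopping(patience=...)` to `fit`, as in the Keras Tuner example in section 7.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the last epoch is returned.
    """
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses: improvement stalls after epoch 4.
val_losses = [0.60, 0.48, 0.43, 0.41, 0.42, 0.41, 0.44, 0.45, 0.47]
print(f"Stop after epoch {stopping_epoch(val_losses, patience=3)}")
```

With a rule like this, each configuration in a search can be given a generous epoch limit: good configurations train until they stop improving, and poor ones are cut short.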

## 11. Exercise 1: Compare Grid vs Random Search

**Task**: Implement and compare grid search and random search with the same computational budget.

**Requirements**:
1. Define a hyperparameter space with at least 3 hyperparameters
2. Run grid search with 8 combinations
3. Run random search with 8 iterations
4. Compare the best results from each method
5. Visualize the results

In [None]:
# YOUR CODE HERE
# Hint: Use the techniques from sections 5 and 6
# Store results in dictionaries and compare

pass  # Replace with your implementation

## 12. Exercise 2: Learning Rate Schedule Comparison

**Task**: Compare different learning rate schedules and their impact on training.

**Requirements**:
1. Implement constant, step decay, and exponential decay schedules
2. Train the same model with each schedule
3. Plot training curves for comparison
4. Analyze which schedule works best and why

In [None]:
# YOUR CODE HERE
# Hint: Use keras.optimizers.schedules or callbacks.LearningRateScheduler
# Train for at least 15 epochs to see differences

pass  # Replace with your implementation

## 13. Exercise 3: Build an Experiment Tracker Dashboard

**Task**: Extend the ExperimentTracker class to create a visualization dashboard.

**Requirements**:
1. Add a method to plot experiment comparison (bar charts or scatter plots)
2. Add a method to show hyperparameter importance (which HP varies most in top experiments?)
3. Add a method to save/load experiments to/from CSV
4. Run 10+ experiments and use your dashboard to find the best hyperparameters

In [None]:
# YOUR CODE HERE
# Hint: Extend the ExperimentTracker class from section 9
# Use matplotlib/seaborn for visualizations
# Consider using pandas for data manipulation

pass  # Replace with your implementation

## 14. Summary

### Key Concepts Covered:

1. **Hyperparameter Importance Ranking**
   - Learning rate is the most critical hyperparameter
   - Focus on high-impact hyperparameters first

2. **Search Strategies**
   - Grid search: Exhaustive but slow
   - Random search: Often more efficient
   - Bayesian optimization: Uses earlier trials to choose the next configuration

3. **Learning Rate Finder**
   - Quickly identifies good LR ranges
   - Exponentially increases LR during short training run

4. **Keras Tuner**
   - Automates hyperparameter search
   - Supports multiple search algorithms

5. **Experiment Tracking**
   - Systematic logging is essential
   - Track hyperparameters, metrics, and notes

### Best Practices:

- Start with learning rate tuning
- Use random search for initial exploration
- Apply Bayesian optimization for refinement
- Always use a separate validation set
- Track all experiments systematically
- Consider training time in your budget

### What's Next?

- [Module 10: Transfer Learning Concepts](10_transfer_learning_concepts.ipynb)
- Advanced topics: Neural Architecture Search (NAS), Multi-objective optimization

### Additional Resources:

1. "Random Search for Hyper-Parameter Optimization" (Bergstra & Bengio, 2012)
2. Keras Tuner documentation: https://keras.io/keras_tuner/
3. "Cyclical Learning Rates for Training Neural Networks" (Smith, 2017)
4. Fast.ai's learning rate finder approach