<div style="text-align:center;font-size:22pt; font-weight:bold;color:white;border:solid black 1.5pt;background-color:#1e7263;">
    TensorBoard Callback Overview
</div>

In [1]:
# ======================================================================= #
# Course: Deep Learning Complete Course (CS-501)
# Author: Dr. Saad Laouadi
# Institution: Quant Coding Versity Academy
#
# ==========================================================
# Lesson: Understanding the TensorBoard callback
#         Synthetic Data Example
# ==========================================================
# ## Learning Objectives
# This example will enable you to:
# 1. Understand the TensorBoard callback
# 2. Set up the environment for using TensorBoard
# =======================================================================
#          Copyright © Dr. Saad Laouadi 2025
# =======================================================================

In [None]:
# ============================================================================ #
#                         Environment Path Configuration                       #
# ============================================================================ #
#
# Purpose:
#   Configure the system PATH to use Python executables from the active virtual 
#   environment instead of global installations.
#
# Usage:
#   1. First verify if configuration is needed by running: !which python
#   2. If the output shows the global Python installation rather than your 
#      virtual environment, execute this configuration block
#
# Note:
#   This configuration is typically only needed for JupyterLab Desktop or 
#   similar standalone installations. Web-based JupyterLab or properly 
#   configured environments should not require this adjustment.
# ============================================================================ #

import os
import sys

env_path = os.path.dirname(sys.executable)
# os.pathsep is ":" on macOS/Linux and ";" on Windows
os.environ['PATH'] = f"{env_path}{os.pathsep}{os.environ['PATH']}"
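
The check described in the usage notes can also be done from Python rather than with `!which python`. The snippet below is a minimal sketch that assumes the executable of interest is named `python`; the variable names are illustrative and not part of the original notebook.

```python
import os
import shutil
import sys

# Directory holding the interpreter that runs this kernel
env_bin = os.path.dirname(sys.executable)

# First `python` executable found on PATH (None if there is none)
python_on_path = shutil.which("python")

# The PATH update is needed when PATH resolves to a different directory
needs_config = (
    python_on_path is None
    or os.path.dirname(os.path.abspath(python_on_path)) != env_bin
)

print(f"kernel interpreter: {sys.executable}")
print(f"python on PATH    : {python_on_path}")
print(f"needs PATH update : {needs_config}")
```

If `needs_config` is `True`, running the configuration block above should make the two paths agree for the rest of the session.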

In [1]:
# ==================================================== #
#        Load Required Libraries
# ==================================================== #

import os
import shutil
from datetime import datetime
import io


# Disable Metal API Validation
os.environ["METAL_DEVICE_WRAPPER_TYPE"] = "0"  

# import tensorflow
import tensorflow as tf

print("="*72)

%reload_ext watermark
%watermark -a "Dr. Saad Laouadi" -u -d -m

print("="*72)
print("Imported Packages and Their Versions:")
print("="*72)

%watermark -iv
print("="*72)

# Global Config
RANDOM_STATE = 101

Author: Dr. Saad Laouadi

Last updated: 2024-12-31

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 24.1.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Imported Packages and Their Versions:
sklearn   : 1.5.1
keras     : 3.6.0
matplotlib: 3.9.2
tensorflow: 2.16.2
seaborn   : 0.13.2
numpy     : 1.26.4
pandas    : 2.2.2



# Hyperparameter Tuning in Deep Learning: A Comprehensive Guide

## Introduction

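Hyperparameters are the settings fixed before training starts, such as the learning rate or the number of layers, and tuning them often decides how well a deep learning model performs. This guide starts with grid search and random search, then moves to automated methods: Bayesian optimization with Optuna, Population Based Training, and Hyperband.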

## Key Hyperparameters

### 1. Model Architecture
- Number of layers
- Units per layer
- Activation functions
- Layer types (Dense, CNN, RNN, etc.)

### 2. Training Parameters
- Learning rate
- Batch size
- Number of epochs
- Optimizer choice

### 3. Regularization Parameters
- Dropout rate
- L1/L2 regularization
- Early stopping patience
- Learning rate decay
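
One way to keep these three groups organized is to collect them in a single configuration object. The sketch below is an illustration only: the class name `HyperParams`, its fields, and the default values are assumptions, not part of any library.

```python
from dataclasses import asdict, dataclass


@dataclass
class HyperParams:
    # 1. Model architecture
    hidden_units: tuple = (64, 32)   # one entry per hidden layer
    activation: str = "relu"
    # 2. Training parameters
    learning_rate: float = 1e-3
    batch_size: int = 32
    epochs: int = 50
    optimizer: str = "adam"
    # 3. Regularization parameters
    dropout_rate: float = 0.2
    l2: float = 0.0
    early_stopping_patience: int = 10

    def __post_init__(self):
        # Reject values that can never be valid
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout_rate must be in [0, 1)")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


baseline = HyperParams()
candidate = HyperParams(learning_rate=1e-2, hidden_units=(128, 64))
print(asdict(candidate))
```

Each search method in the rest of this guide can be read as a different way of proposing new `candidate` objects.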

## Basic Implementation

### 1. Grid Search

```python
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
# tensorflow.keras.wrappers.scikit_learn was removed from recent TensorFlow
# releases; the SciKeras package (pip install scikeras) provides the replacement.
from scikeras.wrappers import KerasClassifier

def create_model(learning_rate=0.001, hidden_units=(64, 32), dropout_rate=0.2):
    # No Input layer: Keras infers the input shape on the first call to fit
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_units[0], activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(hidden_units[1], activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

# Define parameter grid.
# The `model__` prefix routes a value to create_model; batch_size and epochs
# are arguments of the wrapper itself.
param_grid = {
    'model__learning_rate': [0.001, 0.01, 0.1],
    'model__hidden_units': [(32, 16), (64, 32), (128, 64)],
    'model__dropout_rate': [0.1, 0.2, 0.3],
    'batch_size': [32, 64, 128],
    'epochs': [50, 100]
}

# Create model wrapper
model = KerasClassifier(model=create_model, verbose=0)

# Perform grid search (X_train and y_train are assumed to be defined earlier)
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=3,
    n_jobs=-1,
    verbose=2
)

grid_result = grid_search.fit(X_train, y_train)
```
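
Before launching a grid search it helps to count how many models it will train. The sketch below uses the grid values from the example above with plain key names and only the standard library.

```python
from itertools import product

# Grid values from the example above
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [(32, 16), (64, 32), (128, 64)],
    "dropout_rate": [0.1, 0.2, 0.3],
    "batch_size": [32, 64, 128],
    "epochs": [50, 100],
}
cv_folds = 3

# Grid search evaluates the Cartesian product of all value lists
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_fits = len(combinations) * cv_folds

print(f"{len(combinations)} combinations x {cv_folds} folds = {n_fits} model fits")
# prints "162 combinations x 3 folds = 486 model fits"
```

Because the count is a product over all parameters, adding one more parameter with three values would triple it, which is the main reason to prefer random or adaptive search for larger spaces.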

### 2. Random Search

```python
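from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform, randint, uniform
from scikeras.wrappers import KerasClassifier  # pip install scikeras

# Wrap the create_model function defined in the grid search example
model = KerasClassifier(model=create_model, verbose=0)

# Define parameter distributions.
# scipy's uniform(loc, scale) samples from [loc, loc + scale];
# randint(low, high) samples integers from [low, high - 1].
param_dist = {
    # Log scale, so small and large learning rates are equally likely
    'model__learning_rate': loguniform(1e-4, 1e-1),
    'model__hidden_units': [(32, 16), (64, 32), (128, 64), (256, 128)],
    'model__dropout_rate': uniform(0.1, 0.4),   # range [0.1, 0.5]
    'batch_size': randint(16, 256),
    'epochs': randint(50, 200)
}

# Perform random search (X_train and y_train are assumed to be defined earlier)
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=20,
    cv=3,
    n_jobs=-1,
    verbose=2
)

random_result = random_search.fit(X_train, y_train)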
```
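
The choice of distribution matters most for the learning rate. The sketch below compares a uniform and a log-uniform draw over the range 1e-4 to 1e-1 using only the standard library; the seed and sample size are arbitrary.

```python
import math
import random

rng = random.Random(101)
low, high = 1e-4, 1e-1


def sample_uniform():
    return rng.uniform(low, high)


def sample_log_uniform():
    # Sample the exponent uniformly, then map back to the original scale
    return math.exp(rng.uniform(math.log(low), math.log(high)))


n = 10_000
uniform_draws = [sample_uniform() for _ in range(n)]
log_draws = [sample_log_uniform() for _ in range(n)]

# Share of draws that fall below 1e-3
share_uniform = sum(x < 1e-3 for x in uniform_draws) / n
share_log = sum(x < 1e-3 for x in log_draws) / n

print(f"uniform: {share_uniform:.1%} of draws below 1e-3")
print(f"log-uniform: {share_log:.1%} of draws below 1e-3")
```

Under uniform sampling, values below 1e-3 make up roughly 1% of the range, so a 20-iteration search will seldom try them. Log-uniform sampling gives each decade about the same share.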

## Advanced Implementation

### 1. Bayesian Optimization with Optuna

```python
import optuna
import tensorflow as tf

def objective(trial):
    # Define hyperparameters to optimize
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int('n_layers', 1, 4)
    hidden_units = []
    for i in range(n_layers):
        hidden_units.append(
            trial.suggest_int(f'hidden_units_l{i}', 16, 256, log=True)
        )
    dropout_rate = trial.suggest_float('dropout_rate', 0.1, 0.5)
    batch_size = trial.suggest_int('batch_size', 16, 256, log=True)
    
    # Build model (X_train and y_train are assumed to be NumPy arrays defined earlier)
    input_dim = X_train.shape[1]
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(input_dim,)))
    
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation='relu'))
        model.add(tf.keras.layers.Dropout(dropout_rate))
    
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    
    # Train model
    history = model.fit(
        X_train, y_train,
        batch_size=batch_size,
        epochs=100,
        validation_split=0.2,
        callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)],
        verbose=0
    )
    
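    # Early stopping halts up to 10 epochs after the best one, so the last
    # epoch is usually not the best; report the best validation accuracy instead
    return max(history.history['val_accuracy'])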

# Create study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
```

### 2. Population Based Training (PBT)

```python
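import copy

import numpy as np
import tensorflow as tf

class PBTOptimizer:
    def __init__(self, population_size=10, exploit_threshold=0.2):
        self.population_size = population_size
        self.exploit_threshold = exploit_threshold
        self.population = []

    def create_model(self, hyperparams):
        # Two-hidden-layer binary classifier, as in the grid search example
        units_1, units_2 = hyperparams['hidden_units']
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(units_1, activation='relu'),
            tf.keras.layers.Dropout(hyperparams['dropout_rate']),
            tf.keras.layers.Dense(units_2, activation='relu'),
            tf.keras.layers.Dropout(hyperparams['dropout_rate']),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=hyperparams['learning_rate']),
            loss='binary_crossentropy',
            metrics=['accuracy']
        )
        return model

    def initialize_population(self):
        for _ in range(self.population_size):
            hyperparams = {
                'learning_rate': np.random.uniform(1e-4, 1e-2),
                'hidden_units': [
                    np.random.randint(32, 256),
                    np.random.randint(16, 128)
                ],
                'dropout_rate': np.random.uniform(0.1, 0.5)
            }
            self.population.append({
                'hyperparams': hyperparams,
                'model': self.create_model(hyperparams),
                'score': 0   # to be filled in by a training/evaluation step
            })
    
    def exploit_and_explore(self):
        # Sort population by score, best first
        self.population.sort(key=lambda x: x['score'], reverse=True)
        
        # Replace the bottom `exploit_threshold` fraction with perturbed
        # copies of members drawn from the top fraction
        cutoff = max(1, int(self.population_size * self.exploit_threshold))
        for i in range(self.population_size - cutoff, self.population_size):
            # Copy hyperparameters from a top performer (deep copy, so the
            # nested hidden_units list is not shared with the donor)
            donor_idx = np.random.randint(0, cutoff)
            new_hyperparams = copy.deepcopy(self.population[donor_idx]['hyperparams'])
            
            # Perturb hyperparameters
            new_hyperparams['learning_rate'] *= np.random.uniform(0.8, 1.2)
            new_hyperparams['dropout_rate'] = float(np.clip(
                new_hyperparams['dropout_rate'] * np.random.uniform(0.8, 1.2),
                0.1, 0.5
            ))
            
            # Simplification: the model is rebuilt from scratch here, whereas
            # full PBT also copies the donor's trained weights
            self.population[i] = {
                'hyperparams': new_hyperparams,
                'model': self.create_model(new_hyperparams),
                'score': 0
            }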
```
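
The exploit-and-explore cycle is easier to follow on a toy problem with no neural network. The sketch below is an assumption-laden illustration: each member has a single "weight" `theta`, the loss is `theta**2`, and the only hyperparameter is the learning rate. Unlike the class above, it copies the donor's weight along with its hyperparameter.

```python
import random

rng = random.Random(101)


def train_step(member):
    # One gradient step on the toy loss f(theta) = theta**2
    member["theta"] -= member["lr"] * 2 * member["theta"]


def score(member):
    return -member["theta"] ** 2  # higher is better


# Ten members, same starting weight, learning rates drawn on a log scale
population = [{"theta": 5.0, "lr": 10 ** rng.uniform(-4, -1)} for _ in range(10)]
initial_best = max(score(m) for m in population)
cutoff = 2  # top and bottom 20% of a population of 10

for step in range(1, 101):
    for member in population:
        train_step(member)

    if step % 10 == 0:
        population.sort(key=score, reverse=True)
        for i in range(len(population) - cutoff, len(population)):
            donor = population[rng.randrange(cutoff)]
            population[i] = {
                "theta": donor["theta"],                    # exploit: copy weights
                "lr": donor["lr"] * rng.choice([0.8, 1.2]),  # explore: perturb
            }

best = max(population, key=score)
print(f"best lr: {best['lr']:.4f}, loss: {best['theta'] ** 2:.6f}")
```

Members that start with a very small learning rate barely move on their own; after a few exploit steps they inherit both the progress and the learning rate of the leaders.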

### 3. Hyperband Implementation

```python
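import numpy as np

class Hyperband:
    def __init__(self, max_iter=81, eta=3):
        self.max_iter = max_iter   # maximum iterations (e.g. epochs) per configuration
        self.eta = eta             # keep the top 1/eta configurations each round
        # The small epsilon guards against results such as 3.9999999
        self.s_max = int(np.log(max_iter) / np.log(eta) + 1e-9)
        self.B = (self.s_max + 1) * max_iter   # budget per bracket
        
    def run_optimization(self, get_params_function, train_function):
        best_loss, best_config = float('inf'), None

        for s in reversed(range(self.s_max + 1)):
            # Initial number of configurations and iterations for this bracket
            n = int(np.ceil(int(self.B / self.max_iter / (s + 1)) * self.eta ** s))
            r = self.max_iter / self.eta ** s
            
            # Generate configurations
            configs = [get_params_function() for _ in range(n)]
            
            # Successive halving within the bracket
            for i in range(s + 1):
                n_i = n / self.eta ** i
                r_i = r * self.eta ** i
                
                # Run each configuration for r_i iterations
                val_losses = [train_function(config, r_i) for config in configs]

                # Remember the best configuration seen so far
                best_idx = int(np.argmin(val_losses))
                if val_losses[best_idx] < best_loss:
                    best_loss, best_config = val_losses[best_idx], configs[best_idx]
                
                # Keep top 1/eta configurations
                indices = np.argsort(val_losses)
                n_survivors = int(n_i / self.eta)
                configs = [configs[j] for j in indices[:n_survivors]]

        return best_config, best_loss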
```
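
The bracket arithmetic is the least obvious part of Hyperband, so it is worth printing the schedule for the default settings. The sketch below recomputes the same quantities with integer arithmetic; the function name `hyperband_schedule` is an assumption, and the integer division for `r` is exact only when `max_iter` is a power of `eta`.

```python
def hyperband_schedule(max_iter=81, eta=3):
    # Largest s with eta**s <= max_iter, computed without floating point
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    schedule = []
    for s in range(s_max, -1, -1):
        n = ((s_max + 1) // (s + 1)) * eta ** s   # initial number of configurations
        r = max_iter // eta ** s                   # initial iterations per configuration
        # Round i trains n // eta**i configurations for r * eta**i iterations each
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        schedule.append((s, rounds))
    return schedule


for s, rounds in hyperband_schedule():
    print(f"bracket s={s}: " + " -> ".join(f"{n_i} configs x {r_i} iters" for n_i, r_i in rounds))
```

The first bracket starts 81 configurations on a single iteration each, while the last bracket trains only 5 configurations but gives each the full 81 iterations. This spread is how Hyperband hedges between aggressive and conservative early stopping.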

## Advanced Techniques

### 1. Automated Hyperparameter Tuning Pipeline

```python
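import optuna
import tensorflow as tf

class AutoHyperTuner:
    def __init__(self, strategy='optuna', n_trials=100):
        self.strategy = strategy
        self.n_trials = n_trials
        self.best_params = None
        self.best_score = None
        
    def optimize(self, X_train, y_train, X_val, y_val):
        if self.strategy == 'optuna':
            study = optuna.create_study(direction='maximize')
            study.optimize(
                lambda trial: self._objective(trial, X_train, y_train, X_val, y_val),
                n_trials=self.n_trials
            )
            self.best_params = study.best_params
            self.best_score = study.best_value
            
        elif self.strategy == 'pbt':
            # Assumes PBTOptimizer exposes optimize() returning (best_params, best_score);
            # the PBTOptimizer class shown earlier does not define that method.
            pbt = PBTOptimizer(population_size=10)
            self.best_params, self.best_score = pbt.optimize(
                X_train, y_train, X_val, y_val
            )
        else:
            raise ValueError(f"Unknown strategy: {self.strategy!r}")

    def _get_trial_params(self, trial):
        # Illustrative search space; adjust the ranges to your problem
        return {
            'learning_rate': trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True),
            'hidden_units': trial.suggest_int('hidden_units', 16, 256, log=True),
            'dropout_rate': trial.suggest_float('dropout_rate', 0.1, 0.5),
            'batch_size': trial.suggest_int('batch_size', 16, 256, log=True),
            'epochs': trial.suggest_int('epochs', 10, 100)
        }

    def _build_model(self, params):
        # A single hidden layer keeps the example short
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(params['hidden_units'], activation='relu'),
            tf.keras.layers.Dropout(params['dropout_rate']),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=params['learning_rate']),
            loss='binary_crossentropy',
            metrics=['accuracy']
        )
        return model
    
    def _objective(self, trial, X_train, y_train, X_val, y_val):
        params = self._get_trial_params(trial)
        model = self._build_model(params)
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=params['epochs'],
            batch_size=params['batch_size'],
            verbose=0
        )
        return history.history['val_accuracy'][-1]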
```

### 2. Multi-Objective Optimization

```python
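import optuna
import tensorflow as tf

def build_model(params):
    # n_layers hidden layers of 64 units each (the width is chosen for illustration)
    model = tf.keras.Sequential()
    for _ in range(params['n_layers']):
        model.add(tf.keras.layers.Dense(64, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=params['learning_rate']),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

def multi_objective_optimization(trial):
    params = {
        'learning_rate': trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True),
        'n_layers': trial.suggest_int('n_layers', 1, 4),
        'batch_size': trial.suggest_int('batch_size', 16, 256, log=True)
    }
    
    # X_train and y_train are assumed to be defined earlier
    model = build_model(params)
    history = model.fit(
        X_train, y_train,
        validation_split=0.2,
        epochs=100,
        batch_size=params['batch_size'],
        verbose=0
    )
    
    # Return one value per objective; the directions are set on the study
    return (
        history.history['val_accuracy'][-1],   # Maximize accuracy
        history.history['val_loss'][-1],       # Minimize loss
        float(model.count_params())            # Minimize model size
    )

study = optuna.create_study(directions=['maximize', 'minimize', 'minimize'])
study.optimize(multi_objective_optimization, n_trials=100)

# A multi-objective study has no single best trial; inspect the Pareto front
pareto_trials = study.best_trials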
```
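
With several objectives there is usually no single winner, only a set of trials that no other trial beats on every objective at once (the Pareto front). The sketch below shows that filtering step in plain Python; the trial values are made up for illustration and the helper names are assumptions.

```python
def dominates(a, b, directions):
    """True if a is at least as good as b on every objective and better on one."""
    at_least_as_good = True
    strictly_better = False
    for x, y, direction in zip(a, b, directions):
        if direction == "minimize":
            x, y = -x, -y   # turn minimization into maximization
        if x < y:
            at_least_as_good = False
        elif x > y:
            strictly_better = True
    return at_least_as_good and strictly_better


def pareto_front(points, directions):
    # Keep every point that no other point dominates
    return [
        p for p in points
        if not any(dominates(q, p, directions) for q in points)
    ]


# (val_accuracy, val_loss, n_parameters) for four hypothetical trials
trials = [
    (0.90, 0.30, 50_000),
    (0.92, 0.28, 200_000),
    (0.89, 0.35, 60_000),   # worse than the first trial on all three
    (0.85, 0.40, 10_000),
]
directions = ["maximize", "minimize", "minimize"]

front = pareto_front(trials, directions)
print(front)
```

The third trial drops out because the first one is more accurate, has lower loss, and is smaller. The other three each win on at least one objective, so choosing among them is a judgment about the trade-off rather than something the optimizer can decide.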

## Best Practices

### 1. Parameter Space Definition

```python
def define_parameter_space():
    return {
        'continuous': {
            'learning_rate': (1e-5, 1e-1, 'log'),
            'dropout_rate': (0.1, 0.5, 'uniform'),
            'weight_decay': (1e-6, 1e-3, 'log')
        },
        'discrete': {
            'batch_size': ([16, 32, 64, 128, 256], 'choice'),
            'hidden_units': ([32, 64, 128, 256, 512], 'choice')
        },
        'categorical': {
            'optimizer': ['adam', 'sgd', 'rmsprop'],
            'activation': ['relu', 'elu', 'selu']
        }
    }
```
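
A declarative space like this needs a small interpreter that turns it into concrete configurations. The sketch below draws one random configuration from a dictionary with the same layout; `sample_config` is a hypothetical helper, and the space is repeated inline so the snippet runs on its own.

```python
import math
import random

# Same layout as the dictionary returned by define_parameter_space()
space = {
    "continuous": {
        "learning_rate": (1e-5, 1e-1, "log"),
        "dropout_rate": (0.1, 0.5, "uniform"),
        "weight_decay": (1e-6, 1e-3, "log"),
    },
    "discrete": {
        "batch_size": ([16, 32, 64, 128, 256], "choice"),
        "hidden_units": ([32, 64, 128, 256, 512], "choice"),
    },
    "categorical": {
        "optimizer": ["adam", "sgd", "rmsprop"],
        "activation": ["relu", "elu", "selu"],
    },
}


def sample_config(space, rng):
    config = {}
    for name, (low, high, scale) in space["continuous"].items():
        if scale == "log":
            # Uniform in the exponent, so every decade is equally likely
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.uniform(low, high)
    for name, (choices, _kind) in space["discrete"].items():
        config[name] = rng.choice(choices)
    for name, choices in space["categorical"].items():
        config[name] = rng.choice(choices)
    return config


rng = random.Random(101)
config = sample_config(space, rng)
print(config)
```

Keeping the scale (`'log'` or `'uniform'`) next to each range means the same definition can drive random search here or be translated into the suggest calls of a library such as Optuna.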

### 2. Cross-Validation Strategy

```python
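import pandas as pd
from sklearn.model_selection import KFold, ParameterGrid

def cross_validated_tuning(model_fn, param_grid, X, y, n_splits=5):
    # X and y are assumed to be NumPy arrays; a fixed random_state keeps
    # the folds identical across runs
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=101)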
    results = []
    
    for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
        X_train_fold = X[train_idx]
        y_train_fold = y[train_idx]
        X_val_fold = X[val_idx]
        y_val_fold = y[val_idx]
        
        for params in ParameterGrid(param_grid):
            model = model_fn(**params)
            history = model.fit(
                X_train_fold, y_train_fold,
                validation_data=(X_val_fold, y_val_fold),
                verbose=0
            )
            results.append({
                'fold': fold,
                'params': params,
                'score': history.history['val_accuracy'][-1]
            })
    
    return pd.DataFrame(results)
```
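
The function returns one row per fold and parameter set, so the rows still have to be averaged across folds before configurations can be compared. The sketch below does this with the standard library on hypothetical records shaped like the ones collected above; the scores are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records in the same shape as the `results` list above
results = [
    {"fold": 0, "params": {"learning_rate": 0.01}, "score": 0.80},
    {"fold": 1, "params": {"learning_rate": 0.01}, "score": 0.84},
    {"fold": 2, "params": {"learning_rate": 0.01}, "score": 0.82},
    {"fold": 0, "params": {"learning_rate": 0.001}, "score": 0.78},
    {"fold": 1, "params": {"learning_rate": 0.001}, "score": 0.79},
    {"fold": 2, "params": {"learning_rate": 0.001}, "score": 0.80},
]


def summarize(results):
    scores = defaultdict(list)
    for record in results:
        # Dicts are not hashable, so use a sorted tuple of items as the key
        key = tuple(sorted(record["params"].items()))
        scores[key].append(record["score"])

    summary = [
        {
            "params": dict(key),
            "mean": mean(values),
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
        for key, values in scores.items()
    ]
    # Best mean score first
    return sorted(summary, key=lambda row: row["mean"], reverse=True)


summary = summarize(results)
for row in summary:
    print(row["params"], f"mean={row['mean']:.3f}", f"std={row['std']:.3f}")
```

Reporting the standard deviation alongside the mean shows whether the gap between two configurations is larger than the fold-to-fold noise.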

## Resource Management

### 1. Memory Efficient Tuning

```python
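import gc

import tensorflow as tf

class MemoryEfficientTuner:
    def __init__(self):
        self.current_model = None
        
    def evaluate_params(self, params):
        # Clear previous model: drop the reference, reset Keras global
        # state, then force a garbage-collection pass
        if self.current_model is not None:
            del self.current_model
            tf.keras.backend.clear_session()
            gc.collect()
        
        # Build and evaluate new model
        self.current_model = self.build_model(params)
        return self.train_and_evaluate()

    # The two hooks below are placeholders to be implemented in a subclass
    def build_model(self, params):
        raise NotImplementedError("Return a compiled model for `params`")

    def train_and_evaluate(self):
        raise NotImplementedError("Train self.current_model and return its score")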
```

### 2. Distributed Hyperparameter Tuning

```python
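import concurrent.futures

class DistributedTuner:
    # generate_params, evaluate_single_config and aggregate_results are assumed
    # to be implemented elsewhere. Arguments and return values must be picklable
    # because each configuration is evaluated in a separate process.
    def __init__(self, n_workers=4):
        self.n_workers = n_workers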
        
    def parallel_optimize(self, param_space):
        with concurrent.futures.ProcessPoolExecutor(max_workers=self.n_workers) as executor:
            futures = []
            for params in self.generate_params(param_space):
                futures.append(
                    executor.submit(self.evaluate_single_config, params)
                )
            
            results = []
            for future in concurrent.futures.as_completed(futures):
                results.append(future.result())
                
        return self.aggregate_results(results)
```
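
The submit/collect pattern can be tried without any model training. The sketch below is a simplified stand-in: it swaps in `ThreadPoolExecutor` so that it runs inside a notebook cell without pickling concerns, and `evaluate_single_config` returns a made-up score instead of training a model. For real CPU- or GPU-bound training, the process-based executor shown above is the closer fit.

```python
import concurrent.futures
import itertools


def generate_params(param_space):
    # Yield every combination of the listed values
    keys = list(param_space)
    for values in itertools.product(*param_space.values()):
        yield dict(zip(keys, values))


def evaluate_single_config(params):
    # Stand-in for training: an invented score that peaks at lr=0.01, batch_size=64
    score = (
        1.0
        - abs(params["learning_rate"] - 0.01) * 5
        - abs(params["batch_size"] - 64) / 1000
    )
    return params, score


param_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(evaluate_single_config, params)
        for params in generate_params(param_space)
    ]
    # as_completed yields futures in completion order, not submission order
    results = [future.result() for future in concurrent.futures.as_completed(futures)]

best_params, best_score = max(results, key=lambda item: item[1])
print(best_params, round(best_score, 3))
```

Because results arrive in completion order, each one carries its own parameters, so the aggregation step does not depend on the order of submission.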

## Conclusion

Key points for effective hyperparameter tuning:

1. Choose a search strategy that fits your computational budget
2. Define meaningful parameter spaces, using a log scale for parameters such as the learning rate
3. Use cross-validation for robust evaluation
4. Consider multi-objective optimization when accuracy must be traded against costs such as model size
5. Release memory between trials and run trials in parallel where resources allow
6. Monitor and analyze results carefully

For more advanced techniques and implementations, refer to the documentation of specialized libraries like Optuna, Ray Tune, and Keras Tuner.

In [1]:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

2025-01-09 12:18:29.890 python[22420:29462828] Metal API Validation Enabled


ModuleNotFoundError: No module named 'tensorflow.keras.wrappers'