# Introduction to Deep Learning Hyperparameter Tuning

Deep learning models are highly sensitive to their hyperparameters, which strongly influence model performance. Hyperparameter tuning is the process of selecting a good set of hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is set before training begins rather than learned from the data.

Hyperparameters can broadly be classified into two categories:

- **Model hyperparameters**, which influence model selection, such as the number and width of hidden layers in a neural network.
- **Algorithm hyperparameters**, which influence the speed and quality of the learning algorithm, such as the learning rate or batch size.

The goal of hyperparameter tuning is to find the combination of hyperparameters that yields the best model performance, often measured through validation data.


### Importance of Hyperparameter Tuning

Hyperparameter tuning is critical in deep learning for several reasons:

1. **Improves Model Performance:** Proper tuning can lead to significant improvements in model accuracy.
2. **Controls Overfitting:** By tuning regularization parameters, we can control model complexity and mitigate overfitting.
3. **Efficiency:** Optimal hyperparameters can make training faster and more efficient, saving time and computational resources.

### Key Hyperparameters

- **Learning Rate**: Controls the size of the step by which the model's weights are adjusted at each update.
- **Batch Size**: The number of samples processed before the model's weights are updated.
- **Number of Epochs**: The number of times the model iterates through the entire training dataset.
- **Optimizer**: The algorithm used to update the model's weights (e.g., Stochastic Gradient Descent, Adam).
- **Network Architecture**: Number of layers, neurons per layer, activation functions.
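
Batch size and number of epochs together determine how many weight updates a training run performs. The following is a minimal sketch of that arithmetic, assuming a training set of 60,000 samples (the size of the MNIST training split) and that the final partial batch is kept; the function name `updates_per_run` is illustrative only.

```python
import math


def updates_per_run(num_samples: int, batch_size: int, epochs: int) -> int:
    """Return the number of optimizer steps, keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


# Assumed sizes: 60,000 training samples, 5 epochs.
small_batches = updates_per_run(60_000, batch_size=64, epochs=5)
large_batches = updates_per_run(60_000, batch_size=128, epochs=5)
print(small_batches, large_batches)
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason batch size and learning rate are often tuned together.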

### Techniques for Hyperparameter Tuning

Several techniques exist for hyperparameter tuning, each with its own advantages and trade-offs:

1. **Grid Search:** Exhaustively tries every combination of hyperparameters specified in a grid.
2. **Random Search:** Randomly selects combinations of hyperparameters to try.
3. **Bayesian Optimization:** Uses a probabilistic model to guide the search for the best hyperparameters.
4. **Gradient-based Optimization:** Adjusts hyperparameters using gradient information when available.
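
To make the contrast with grid search concrete, here is a minimal random-search sketch using only the standard library. The search ranges, the helper names (`sample_config`, `random_search`), and the `toy_objective` are assumptions made for illustration; in practice `evaluate` would train a model and return its validation accuracy.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration; the ranges here are hypothetical."""
    # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
    log_lr = rng.uniform(math.log10(1e-4), math.log10(1e-1))
    return {
        "lr": 10 ** log_lr,
        "num_layers": rng.choice([1, 2, 3, 4]),
        "hidden_size": rng.choice([32, 64, 128, 256]),
    }


def random_search(evaluate, n_trials: int, seed: int = 0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score


def toy_objective(config: dict) -> float:
    """Stand-in for validation accuracy (not a real model): peaks at lr = 1e-3."""
    return -abs(math.log10(config["lr"]) + 3)


best_config, best_score = random_search(toy_objective, n_trials=20)
print(best_config, best_score)
```

Unlike a grid, the number of trials is chosen independently of the number of hyperparameters, so the budget does not grow multiplicatively as dimensions are added.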

# Hyperparameter Tuning Example with PyTorch

This example demonstrates how to perform hyperparameter tuning for a simple feedforward neural network (FNN) on the MNIST dataset. We use PyTorch for model construction and the tuning process. The goal is to find the combination of hyperparameters that yields the highest accuracy on the validation set.

## Hyperparameters Tuned

- **Number of Layers (`num_layers`):** Determines the depth of the neural network. Deeper networks can model more complex patterns, but are also more computationally expensive and more prone to overfitting.
- **Number of Neurons in Each Layer (`hidden_size`):** Controls the width of the layers. More neurons can capture more information but also make the network more complex and prone to overfitting.
- **Learning Rate (`lr`):** Sets how large each parameter update is. Too high a rate can make training unstable, while too low a rate makes it slow.
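
Depth and width jointly set the number of trainable parameters. The sketch below counts weights and biases for a fully connected network, assuming every hidden layer has the same width, 784 inputs (28 × 28 pixels), and 10 outputs; the helper names are illustrative and not part of PyTorch.

```python
def hidden_sizes_for(num_layers: int, hidden_size: int) -> list:
    """Expand (depth, width) into a list with one width per hidden layer."""
    return [hidden_size] * num_layers


def count_parameters(input_size: int, hidden_sizes: list, output_size: int) -> int:
    """Count weights plus biases of a stack of fully connected layers."""
    sizes = [input_size, *hidden_sizes, output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


small = count_parameters(784, hidden_sizes_for(2, 64), 10)
large = count_parameters(784, hidden_sizes_for(3, 128), 10)
print(small, large)  # 55050 134794
```

Most of the parameters sit in the first layer, because it maps the 784 inputs to the hidden width.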

In [35]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from sklearn.model_selection import ParameterGrid
from tqdm import tqdm

### Loading and Preprocessing the Data
The MNIST dataset is a collection of handwritten digits, which we'll use for a classification task.

In [36]:
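# Define the preprocessing first, since the datasets below depend on it.
# Normalization uses the commonly quoted MNIST mean and standard deviation.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load the MNIST dataset; the train=False split serves as the validation set here
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)

input_size = 28 * 28  # each 28x28 image is flattened into a vector
output_size = 10      # digit classes 0-9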
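
The `transforms.Normalize((0.1307,), (0.3081,))` step subtracts the mean and divides by the standard deviation for each pixel, after `ToTensor()` has scaled pixel values to the range [0, 1]. A minimal plain-Python sketch of that per-pixel arithmetic (the `normalize` helper is illustrative, not a torchvision function):

```python
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalize(pixel: float) -> float:
    """Apply (pixel - mean) / std to a single value in [0, 1]."""
    return (pixel - MNIST_MEAN) / MNIST_STD


# Black (0.0) and white (1.0) pixels after normalization.
print(round(normalize(0.0), 4), round(normalize(1.0), 4))  # -0.4242 2.8215
```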


### Defining the Model
We define a class that constructs a neural network model. This will be used to create different models with various hyperparameters.

In [48]:
   
class FNN(nn.Module):
    """Feedforward network with one Linear + ReLU block per entry in hidden_sizes."""
    def __init__(self, input_size, hidden_sizes, output_size):
        super().__init__()
        layers = []
        in_features = input_size
        for hidden in hidden_sizes:
            layers.append(nn.Linear(in_features, hidden))
            layers.append(nn.ReLU())
            in_features = hidden  # the next layer takes this layer's output
        self.layers = nn.Sequential(*layers)
        # in_features now holds the width of the last hidden layer
        self.output = nn.Linear(in_features, output_size)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the input
        x = self.layers(x)
        x = self.output(x)
        return x


### Tuning the Model
First, we define the grid of hyperparameters to be tuned: learning rate, number of layers, and number of neurons per layer. We use `ParameterGrid` to iterate over every combination in the grid. For each combination, the loop creates an instance of the `FNN` model with the specified hyperparameters, trains it on the training dataset, evaluates it on the validation dataset, and keeps track of the best accuracy seen so far.
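
To see what iterating over a grid means, the sketch below enumerates the same three-parameter grid with `itertools.product` from the standard library. It is an illustration of the idea rather than scikit-learn's implementation; the `iter_grid` helper and its sorted-key ordering are assumptions of this sketch.

```python
from itertools import product

params_grid = {
    "lr": [0.001, 0.01],
    "num_layers": [2, 3],
    "hidden_size": [64, 128],
}


def iter_grid(grid: dict):
    """Yield one dict per combination of values, iterating keys in sorted order."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


combinations = list(iter_grid(params_grid))
print(len(combinations))  # 8
print(combinations[0])    # {'hidden_size': 64, 'lr': 0.001, 'num_layers': 2}
```

With two values for each of three hyperparameters the grid has 2 × 2 × 2 = 8 combinations, so each additional value or hyperparameter multiplies the training cost.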

In [49]:
best_accuracy = 0.0
best_hyperparams = None

# Define hyperparameters and their search space
params_grid = {
    'lr': [0.001, 0.01],
    'num_layers': [2, 3],
    'hidden_size': [64, 128]
}

# Define a function to train and evaluate the model
def train_model(model, train_loader, val_loader, criterion, optimizer, epochs):
    for epoch in range(epochs):
        model.train()
        for data, target in train_loader:
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        
        model.eval()
        val_loss = 0.0
        correct = 0
        with torch.no_grad():
            for data, target in val_loader:
                output = model(data)
                val_loss += criterion(output, target).item()
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        
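        # Note: criterion returns a per-batch mean, so dividing the summed batch
        # means by the number of samples yields a value scaled down by roughly the
        # batch size. Divide by len(val_loader) instead for the mean loss per batch.
        val_loss /= len(val_loader.dataset)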
        accuracy = correct / len(val_loader.dataset)
        
        print(f"Epoch {epoch+1}/{epochs}, Validation Loss: {val_loss:.4f}, Validation Accuracy: {accuracy:.4f}")

# Hyperparameter tuning
for params in tqdm(list(ParameterGrid(params_grid))):
    model = FNN(input_size, [params['hidden_size']] * params['num_layers'], output_size)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=params['lr'])
    
    train_model(model, train_loader, val_loader, criterion, optimizer, epochs=5)
    
    # Evaluate on validation set
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    
    accuracy = correct / len(val_loader.dataset)
    
    # Update best hyperparameters if better accuracy is achieved
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_hyperparams = params



  0%|          | 0/8 [00:00<?, ?it/s]

Epoch 1/5, Validation Loss: 0.0026, Validation Accuracy: 0.9502
Epoch 2/5, Validation Loss: 0.0020, Validation Accuracy: 0.9610
Epoch 3/5, Validation Loss: 0.0016, Validation Accuracy: 0.9666
Epoch 4/5, Validation Loss: 0.0015, Validation Accuracy: 0.9697
Epoch 5/5, Validation Loss: 0.0014, Validation Accuracy: 0.9732


 12%|█▎        | 1/8 [00:17<02:01, 17.36s/it]

Epoch 1/5, Validation Loss: 0.0023, Validation Accuracy: 0.9532
Epoch 2/5, Validation Loss: 0.0021, Validation Accuracy: 0.9609
Epoch 3/5, Validation Loss: 0.0018, Validation Accuracy: 0.9637
Epoch 4/5, Validation Loss: 0.0016, Validation Accuracy: 0.9680
Epoch 5/5, Validation Loss: 0.0016, Validation Accuracy: 0.9708


 25%|██▌       | 2/8 [00:35<01:45, 17.61s/it]

Epoch 1/5, Validation Loss: 0.0027, Validation Accuracy: 0.9487
Epoch 2/5, Validation Loss: 0.0036, Validation Accuracy: 0.9457
Epoch 3/5, Validation Loss: 0.0029, Validation Accuracy: 0.9521
Epoch 4/5, Validation Loss: 0.0035, Validation Accuracy: 0.9488
Epoch 5/5, Validation Loss: 0.0025, Validation Accuracy: 0.9609


 38%|███▊      | 3/8 [00:54<01:31, 18.28s/it]

Epoch 1/5, Validation Loss: 0.0035, Validation Accuracy: 0.9399
Epoch 2/5, Validation Loss: 0.0034, Validation Accuracy: 0.9373
Epoch 3/5, Validation Loss: 0.0033, Validation Accuracy: 0.9445
Epoch 4/5, Validation Loss: 0.0028, Validation Accuracy: 0.9547
Epoch 5/5, Validation Loss: 0.0025, Validation Accuracy: 0.9608


 50%|█████     | 4/8 [01:12<01:13, 18.32s/it]

Epoch 1/5, Validation Loss: 0.0021, Validation Accuracy: 0.9576
Epoch 2/5, Validation Loss: 0.0016, Validation Accuracy: 0.9684
Epoch 3/5, Validation Loss: 0.0014, Validation Accuracy: 0.9716
Epoch 4/5, Validation Loss: 0.0014, Validation Accuracy: 0.9732
Epoch 5/5, Validation Loss: 0.0012, Validation Accuracy: 0.9781


 62%|██████▎   | 5/8 [01:31<00:55, 18.38s/it]

Epoch 1/5, Validation Loss: 0.0019, Validation Accuracy: 0.9613
Epoch 2/5, Validation Loss: 0.0015, Validation Accuracy: 0.9714
Epoch 3/5, Validation Loss: 0.0014, Validation Accuracy: 0.9727
Epoch 4/5, Validation Loss: 0.0014, Validation Accuracy: 0.9726
Epoch 5/5, Validation Loss: 0.0012, Validation Accuracy: 0.9759


 75%|███████▌  | 6/8 [01:50<00:37, 18.76s/it]

Epoch 1/5, Validation Loss: 0.0037, Validation Accuracy: 0.9346
Epoch 2/5, Validation Loss: 0.0028, Validation Accuracy: 0.9522
Epoch 3/5, Validation Loss: 0.0031, Validation Accuracy: 0.9506
Epoch 4/5, Validation Loss: 0.0030, Validation Accuracy: 0.9486
Epoch 5/5, Validation Loss: 0.0031, Validation Accuracy: 0.9556


 88%|████████▊ | 7/8 [02:08<00:18, 18.52s/it]

Epoch 1/5, Validation Loss: 0.0037, Validation Accuracy: 0.9440
Epoch 2/5, Validation Loss: 0.0034, Validation Accuracy: 0.9490
Epoch 3/5, Validation Loss: 0.0029, Validation Accuracy: 0.9521
Epoch 4/5, Validation Loss: 0.0030, Validation Accuracy: 0.9522
Epoch 5/5, Validation Loss: 0.0027, Validation Accuracy: 0.9598


100%|██████████| 8/8 [02:28<00:00, 18.58s/it]


### Reviewing the Results
After evaluating all combinations, we select the combination of hyperparameters that achieved the highest accuracy on the validation set.

In [39]:
print("Best hyperparameters:")
print(best_hyperparams)


Best hyperparameters:
{'hidden_size': 128, 'lr': 0.001, 'num_layers': 3}


The result is the best configuration found by the grid search over the PyTorch model. Each key-value pair in the dictionary describes one aspect of the model's configuration:

- **`'num_layers': 3`**: Indicates that the best configuration uses 3 hidden layers (not counting the output layer). This is the deeper of the two depths in the grid (2 and 3).

- **`'hidden_size': 128`**: Specifies that each hidden layer in the best configuration has 128 neurons, the wider of the two options in the grid (64 and 128). This suggests that the extra width is beneficial for the model's performance on the task.

- **`'lr': 0.001`**: The learning rate for the Adam optimizer in the best configuration. This is the smaller of the two values tried (0.001 and 0.01) and is a common choice for Adam.

In summary, the best configuration found has three hidden layers of 128 units each and uses the Adam optimizer with a learning rate of 0.001. Because the grid covers only eight combinations, each trained once for five epochs, this is the best point in a small search space rather than a global optimum. Hyperparameter tuning requires careful experimentation and validation to ensure that the selected configuration generalizes well to unseen data and achieves good performance for the task at hand.
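
The example above selects hyperparameters on MNIST's `train=False` split, so the reported accuracy is not an independent estimate of performance on unseen data. One common remedy is to carve a validation subset out of the training data and keep the test split untouched until the end. The following is a minimal sketch of such an index split using only the standard library; the `split_indices` helper and the 50,000/10,000 sizes are assumptions, and the resulting index lists could, for instance, be passed to `torch.utils.data.Subset`.

```python
import random


def split_indices(num_samples: int, n_val: int, seed: int = 0):
    """Shuffle indices reproducibly and return (train_indices, val_indices)."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_val:], indices[:n_val]


# Assumed sizes: 60,000 training images, 10,000 of them held out for validation.
train_idx, val_idx = split_indices(60_000, n_val=10_000)
print(len(train_idx), len(val_idx))  # 50000 10000
```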