# **Hyperparameter Tuning in Deep Learning**

## **Objective**
Improve the performance of a Convolutional Neural Network (CNN) by finding the best combination of hyperparameters. This involves systematic adjustments to various parameters that control how the model learns.

---

## **Theory**

### **What are Hyperparameters?**
- Hyperparameters are settings chosen before training that control the learning process; unlike the model's weights, they are not learned from the data.
- Examples include:
  - Learning rates
  - Batch sizes
  - Number of filters
  - Activation functions

---

### **Why Tune Hyperparameters?**
- Proper tuning can:
  - Significantly improve model performance.
  - Prevent overfitting or underfitting.
  - Reduce training time.

---

### **Techniques for Hyperparameter Tuning**

#### **Grid Search**
- Tests all possible combinations of specified values for hyperparameters.
- **Example**: Try different combinations of learning rates `[0.001, 0.01, 0.1]` and batch sizes `[16, 32, 64]`.
- **Strength**: Exhaustive; it finds the best combination among the values listed in the grid, though not necessarily the best setting overall.
- **Weakness**: Time-consuming for a large parameter space.
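
The grid from the example above can be enumerated with the standard library. This is a minimal sketch: `evaluate` is a hypothetical stand-in for "train the model and return validation accuracy", and its scoring formula is invented only so the loop has something to rank.

```python
from itertools import product

# Candidate values taken from the example above.
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]


def evaluate(learning_rate, batch_size):
    """Hypothetical placeholder for a real training run.

    The formula is made up for illustration; replace it with code that
    trains the model and returns its validation accuracy.
    """
    return 1.0 - abs(learning_rate - 0.01) - abs(batch_size - 32) / 1000


# Grid search: score every combination (3 x 3 = 9 runs).
grid = list(product(learning_rates, batch_sizes))
scores = {combo: evaluate(*combo) for combo in grid}
best_combo = max(scores, key=scores.get)

print(f"Evaluated {len(grid)} combinations")
print(f"Best (learning_rate, batch_size): {best_combo}")
```

Adding a third hyperparameter with three candidate values would raise the count from 9 to 27 runs, which is why the cost grows quickly with the size of the grid.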

#### **Random Search**
- Samples a fixed number of random combinations from the parameter space.
- **Strength**: Usually cheaper than grid search, because the number of trials is fixed in advance, and often yields results almost as good.
- **Weakness**: No guarantee of finding the best combination.
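
Random search can be sketched in a similar way. The search ranges, the trial budget, and the `evaluate` scoring function below are all assumptions made for illustration; in practice `evaluate` would train the model and return a validation metric.

```python
import random

# Hypothetical search space: the values are chosen only for illustration.
batch_sizes = [16, 32, 64, 128]


def sample_params(rng):
    """Draw one random configuration from the search space."""
    return {
        # Sample the exponent uniformly so the range 1e-4 to 1e-1
        # is covered evenly on a log scale.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice(batch_sizes),
    }


def evaluate(params):
    """Hypothetical placeholder for 'train the model, return validation accuracy'."""
    return 1.0 - abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 32) / 1000


rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 10           # the budget is set up front, independent of the space size

trials = [sample_params(rng) for _ in range(n_trials)]
best = max(trials, key=evaluate)
print(f"Best of {n_trials} random trials: {best}")
```

Unlike the grid, the learning rate here is drawn from a continuous range, so the search is not limited to a handful of preselected values.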

#### **Manual Tuning**
- Adjusts hyperparameters based on domain expertise or observations.
- **Strength**: Practical for small-scale projects or when intuition plays a role.
- **Weakness**: Time-intensive and might miss optimal settings.

---

## **Practical Implementation**

### **1. Hyperparameters to Tune**
- **Learning Rate**: Controls the step size during optimization.
  - **Too small** → Slow convergence.
  - **Too large** → Risk of divergence or overshooting the minima.
  
- **Batch Size**: Number of samples processed before the model updates weights.
  - **Larger batches** → Stable gradients, more memory required.
  - **Smaller batches** → More frequent weight updates per epoch, but noisier gradients.

- **Activation Functions**: Affect non-linear transformations in layers. Common options include:
  - ReLU
  - Sigmoid
  - Tanh
  - Leaky ReLU
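
To make the learning-rate trade-off described above concrete, here is a toy sketch that runs plain gradient descent on the one-dimensional function f(w) = w². It is not a CNN; the function, the starting point, and the three learning rates are assumptions picked to show slow convergence, fast convergence, and divergence.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w


# The minimum is at w = 0, so |w| measures how far we still are from it.
for lr in (0.01, 0.4, 1.1):
    final_w = gradient_descent(lr)
    print(f"learning_rate={lr}: |w| after 20 steps = {abs(final_w):.3g}")
```

With the smallest rate, `w` has barely moved toward zero after 20 steps; with the middle rate it is essentially at the minimum; with the largest rate each step overshoots and `|w|` grows instead of shrinking.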

---

### **2. Steps for Practical Tuning in CNNs**

#### **Prepare the Dataset**
- Use a dataset like **CIFAR-10** or **Cats vs. Dogs** for image classification.

#### **Baseline Model**
- Train a CNN with default hyperparameter values to establish baseline accuracy.

#### **Tuning Approach**
- Use tools like scikit-learn's **GridSearchCV** (Keras models need a scikit-learn-compatible wrapper to work with it) or the **Optuna** library for automated tuning.
- Perform manual adjustments starting with learning rates and batch sizes.

#### **Evaluation**
- Use metrics such as:
  - Accuracy
  - Precision
  - Recall
  - F1-score
- Track validation loss/accuracy trends to detect overfitting or underfitting.
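
The metrics listed above can be computed directly from predicted and true labels. The sketch below covers the binary case (for example, Cats vs. Dogs); the function name `binary_metrics` and the label lists are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration (1 = dog, 0 = cat).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For a multi-class dataset such as CIFAR-10, precision, recall, and F1 are usually computed per class and then averaged.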


---

## **Example Code Snippet**

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import ParameterGrid
import numpy as np

In [None]:
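def build_model(learning_rate, activation):
    """Build and compile a small CNN for 32x32 RGB images and 10 classes."""
    model = Sequential([
        # One convolution + pooling stage extracts local image features.
        Conv2D(32, (3, 3), activation=activation, input_shape=(32, 32, 3)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation=activation),
        # Softmax output: one probability per class.
        Dense(10, activation='softmax')
    ])
    optimizer = Adam(learning_rate=learning_rate)
    # categorical_crossentropy expects one-hot encoded labels.
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model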

In [None]:
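from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Assumption: CIFAR-10, which matches the (32, 32, 3) input and 10 output classes above.
(X_train, y_train), _ = cifar10.load_data()
X_train = X_train.astype('float32') / 255.0  # scale pixels to [0, 1]
y_train = to_categorical(y_train, 10)        # one-hot labels for categorical_crossentropy

# Hold out the last 5,000 training images for validation.
X_val, y_val = X_train[-5000:], y_train[-5000:]
X_train, y_train = X_train[:-5000], y_train[:-5000]

param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'activation': ['relu', 'tanh']
}

best_acc = 0
best_params = None
for params in ParameterGrid(param_grid):
    # Train a fresh model for each combination and keep the best validation accuracy.
    model = build_model(**params)
    history = model.fit(X_train, y_train, epochs=5, validation_data=(X_val, y_val), verbose=0)
    acc = max(history.history['val_accuracy'])
    if acc > best_acc:
        best_acc = acc
        best_params = params

print(f"Best Parameters: {best_params} (validation accuracy: {best_acc:.4f})")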

---

## **Key Takeaways**
- Tuning hyperparameters like learning rate and batch size can significantly impact model performance.
- Automated techniques (grid search, random search) streamline the process for larger parameter spaces.
- Always evaluate the tuned model on unseen data to confirm improved generalization.

This process is essential to maximize your CNN’s potential while balancing computational efficiency.