**Plan**

**1. Importance of hyperparameters**

**2. Techniques for hyperparameter tuning**

**3. Grid search and random search with Keras**





# **Importance of hyperparameters**

Hyperparameters are crucial in defining the performance and behavior of machine learning models, including those built with Keras. Unlike model parameters, which are learned from the data during training, hyperparameters are set before training begins and play a significant role in shaping the model's learning process and performance.

**<h2>Key Hyperparameters in Keras</h2>**

1. **Learning Rate**:
   - **Description**: Controls the size of the steps the optimizer takes during gradient descent.
   - **Importance**: A learning rate that's too high can cause the model to converge too quickly to a suboptimal solution or even diverge. A learning rate that's too low can result in slow convergence.

2. **Number of Epochs**:
   - **Description**: The number of times the entire training dataset passes through the model.
   - **Importance**: Too few epochs may lead to underfitting, while too many epochs can cause overfitting. Proper tuning is required to balance training time and model performance.

3. **Batch Size**:
   - **Description**: The number of samples processed before the model is updated.
   - **Importance**: A small batch size gives a noisy estimate of the gradient, which can help the optimizer escape poor local minima but makes each individual update less reliable. A large batch size gives a more accurate gradient estimate and uses hardware more efficiently, but it needs more memory per step, performs fewer weight updates per epoch, and can generalize worse if it is too large.

4. **Number of Layers and Units per Layer**:
   - **Description**: The architecture of the neural network, including the number and type of layers and the number of units in each layer.
   - **Importance**: Too few layers or units can lead to underfitting, while too many can lead to overfitting and increased computational cost.

5. **Activation Functions**:
   - **Description**: Functions applied to the output of each layer, such as ReLU, sigmoid, or tanh.
   - **Importance**: The choice of activation function affects the learning dynamics and the ability of the model to capture non-linear relationships.

6. **Optimizer**:
   - **Description**: Algorithm used to update the model’s weights, such as Adam, SGD, or RMSprop.
   - **Importance**: Different optimizers have different strategies for adjusting the learning rate and handling gradients. The choice of optimizer can significantly affect the convergence speed and final performance.

7. **Dropout Rate**:
   - **Description**: Fraction of input units to drop during training to prevent overfitting.
   - **Importance**: Higher dropout rates regularize more strongly and reduce overfitting, but a rate that is too high can cause underfitting.

8. **Regularization Parameters**:
   - **Description**: Techniques like L1/L2 regularization that add a penalty to the loss function to prevent overfitting.
   - **Importance**: Regularization helps in reducing model complexity and improving generalization.

9. **Learning Rate Schedulers**:
   - **Description**: Methods to adjust the learning rate during training, such as step decay or exponential decay.
   - **Importance**: Adjusting the learning rate during training can improve convergence and speed up training, for example by taking large steps early and smaller steps as the model approaches a minimum.
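To make the learning-rate and scheduler entries concrete, here is a small pure-Python sketch (an illustration, not Keras code) that runs plain gradient descent on the toy loss f(w) = w², whose gradient is 2w. The function names `gradient_descent` and `step_decay`, and all numeric settings, are hypothetical choices for this example.

```python
def gradient_descent(lr, steps=50, w0=5.0, schedule=None):
    """Minimize the toy loss f(w) = w**2 (gradient 2*w) with plain gradient descent.

    `schedule`, if given, maps (base_lr, step) to the learning rate used at that step.
    """
    w = w0
    for step in range(steps):
        rate = schedule(lr, step) if schedule else lr
        w -= rate * 2 * w  # update rule: w <- w - rate * gradient
    return w


def step_decay(base_lr, step, drop=0.5, every=10):
    """Hypothetical step-decay schedule: halve the learning rate every `every` steps."""
    return base_lr * drop ** (step // every)


for lr in (1.1, 0.1, 0.001):
    print(f"lr={lr}: final w = {gradient_descent(lr):.3g}")
print(f"lr=1.1 with step decay: final w = {gradient_descent(1.1, schedule=step_decay):.3g}")
```

Each step multiplies w by (1 - 2·lr). With lr=1.1 that factor is -1.2, so |w| grows every step and training diverges; with lr=0.001 it is 0.998, so after 50 steps w is still about 4.5 (slow convergence); with lr=0.1 it is 0.8 and w ends up close to 0. Starting from the diverging rate of 1.1, the step-decay schedule shrinks the rate quickly enough that w still converges.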

**<h2>Why Hyperparameters Matter</h2>**

1. **Model Performance**: Properly tuned hyperparameters can drastically improve model accuracy, reduce loss, and increase robustness. Poorly chosen hyperparameters can lead to underfitting or overfitting.

2. **Training Efficiency**: Effective hyperparameter settings can lead to faster convergence, reducing the computational resources and time required to train the model.

3. **Generalization**: Well-chosen hyperparameters help the model generalize better to unseen data, improving its performance on test datasets.

4. **Avoiding Overfitting**: Hyperparameters such as the dropout rate and regularization strength help prevent overfitting, where the model performs well on training data but poorly on unseen data.

**<h2>Strategies for Hyperparameter Tuning</h2>**

1. **Grid Search**: Exhaustively searches over a specified hyperparameter grid. It is computationally expensive but straightforward.

2. **Random Search**: Samples hyperparameters randomly from a specified range. It can be more efficient than grid search and often finds good hyperparameters more quickly.

3. **Bayesian Optimization**: Uses probabilistic models to find the best hyperparameters based on past results. It is more sophisticated and can be more efficient than grid or random search.

4. **Hyperparameter Tuning Libraries**: Libraries like Keras Tuner, Optuna, and Ray Tune offer advanced tools for hyperparameter optimization.
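One reason random search is often more efficient than grid search: with the same number of trials, a grid tries only a few distinct values of each hyperparameter, while random sampling tries a new value in every trial. The pure-Python sketch below counts the distinct learning rates tried by a 3×3 grid versus 9 random trials; the two hyperparameters (learning rate and dropout) and their ranges are illustrative assumptions.

```python
import random
from itertools import product

budget = 9  # number of training runs we can afford

# Grid search: 3 learning rates x 3 dropout rates = 9 trials
grid_lrs = [1e-4, 1e-3, 1e-2]
grid_dropouts = [0.0, 0.25, 0.5]
grid_trials = list(product(grid_lrs, grid_dropouts))

# Random search: 9 independent samples, learning rate drawn log-uniformly in [1e-4, 1e-2]
rng = random.Random(0)
random_trials = [
    (10 ** rng.uniform(-4, -2), rng.uniform(0.0, 0.5)) for _ in range(budget)
]


def distinct_values(trials, index):
    """Number of different values tried for the hyperparameter at position `index`."""
    return len({trial[index] for trial in trials})


print("grid:  ", distinct_values(grid_trials, 0), "distinct learning rates")
print("random:", distinct_values(random_trials, 0), "distinct learning rates")
```

If the learning rate matters much more than dropout, the grid spends six of its nine runs re-testing the same three learning rates, whereas random search explores nine different ones for the same cost.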

### Example: Tuning Hyperparameters with Keras Tuner

Here's an example of using Keras Tuner to optimize hyperparameters:

```python
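import keras
import keras_tuner as kt
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.applications import VGG16

def build_model(hp):
    model = Sequential()
    # Pretrained VGG16 convolutional base (32x32 is the smallest input size it accepts)
    model.add(VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3)))
    model.add(Flatten())
    # Tune the width of the dense layer: 32, 64, ..., 512 units
    model.add(Dense(units=hp.Int('dense_units', min_value=32, max_value=512, step=32), activation='relu'))
    model.add(Dense(10, activation='softmax'))  # 10 classes, matching the CIFAR-10 data used below

    model.compile(
        # Optimizers come from Keras, not Keras Tuner; sample the learning rate on a log scale
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Float('learning_rate', min_value=1e-4, max_value=1e-1, sampling='log')
        ),
        loss='categorical_crossentropy',  # expects one-hot labels
        metrics=['accuracy']
    )
    return model

# Initialize the tuner: 5 random configurations, each trained 3 times to reduce result variance
tuner = kt.RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='intro_to_kt'
)

# Search for the best hyperparameters, holding out 20% of the training data for validation
# so the test set stays untouched for the final evaluation
tuner.search(x_train, y_train, epochs=10, validation_split=0.2)

# Retrieve the best model
best_model = tuner.get_best_models(num_models=1)[0]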
```

**<h2>Summary</h2>**

Hyperparameters largely determine how well a model performs, how efficiently it trains, and how well it generalizes to new data. Proper tuning can significantly improve both model performance and training efficiency. Techniques like grid search, random search, and Bayesian optimization help find a good set of hyperparameters for your model.

# **Techniques for hyperparameter tuning**

Hyperparameter tuning is a crucial step in optimizing the performance of machine learning models. In Keras, several techniques can be used to find the best hyperparameters for your model. Here are the most common methods:

**<h2>1. Grid Search</h2>**

Grid Search involves specifying a grid of hyperparameter values and exhaustively evaluating all possible combinations. While it can be computationally expensive, it is straightforward and ensures that all specified values are considered.

**Steps:**
1. Define a parameter grid.
2. Train and evaluate models for each combination in the grid.
3. Select the combination that performs best based on a validation metric.
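The three steps map directly onto a loop over `itertools.product`. This framework-free sketch uses a hypothetical `validation_score` function as a stand-in for "train a model and return its validation accuracy"; in practice that call would build, fit, and evaluate a Keras model.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in `param_grid`; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # Step 2: train and evaluate one model per combination
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        # Step 3: keep the combination with the best validation metric
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for training + validation; it peaks at adam, batch_size=32, epochs=20
def validation_score(params):
    score = 0.9 - abs(params["batch_size"] - 32) / 1000 - abs(params["epochs"] - 20) / 1000
    return score + (0.01 if params["optimizer"] == "adam" else 0.0)


# Step 1: define the grid (same values as the Keras example below)
param_grid = {"optimizer": ["adam", "rmsprop"], "batch_size": [16, 32], "epochs": [10, 20]}
best_params, best_score = grid_search(param_grid, validation_score)
print(best_params, round(best_score, 3))
```

The number of evaluations is the product of the list lengths (here 2 × 2 × 2 = 8), which is why grid search becomes expensive as hyperparameters are added.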

**Example:**
```python
from keras.models import Sequential
from keras.layers import Dense, Input
from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier

# Assumes x_train is a 2-D array of shape (n_samples, n_features)
# and y_train holds integer class labels 0-9
n_features = x_train.shape[1]

def create_model():
    # Return an uncompiled model: SciKeras compiles it with the optimizer,
    # loss and metrics given to KerasClassifier, so the grid can vary `optimizer`
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    model.add(Dense(units=64, activation='relu'))
    model.add(Dense(units=10, activation='softmax'))
    return model

model = KerasClassifier(
    model=create_model,
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],
    epochs=10,
    batch_size=32,
    verbose=0,
)

param_grid = {
    'optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],
    'epochs': [10, 20]
}

grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', cv=3)
grid_result = grid.fit(x_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
```

**<h2>2. Random Search</h2>**

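Random Search samples a fixed number of hyperparameter combinations from predefined ranges or distributions. For the same number of trials it is often more efficient than Grid Search, especially in large parameter spaces where only a few hyperparameters strongly affect performance.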

**Steps:**
1. Define the parameter space with ranges or distributions.
2. Randomly sample combinations and evaluate them.
3. Select the best-performing combination.
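The steps above can be sketched in plain Python, with each hyperparameter described by a sampling function instead of a fixed list. The search space, including the log-uniform learning-rate range, and the `validation_score` function are hypothetical stand-ins for "train a model and return its validation accuracy".

```python
import math
import random


def random_search(space, score_fn, n_iter=10, seed=0):
    """Sample `n_iter` configurations from `space`; return (best_params, best_score, history)."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        # Step 2: draw one value per hyperparameter from its sampler, then evaluate
        params = {name: sample(rng) for name, sample in space.items()}
        history.append((params, score_fn(params)))
    # Step 3: keep the best-scoring configuration
    best_params, best_score = max(history, key=lambda item: item[1])
    return best_params, best_score, history


# Step 1: each hyperparameter gets a sampler (a choice from a list, or a continuous range)
space = {
    "optimizer": lambda rng: rng.choice(["adam", "rmsprop"]),
    "dropout_rate": lambda rng: rng.uniform(0.0, 0.5),
    # Log-uniform between 1e-4 and 1e-1, so every decade is equally likely
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),
}


# Hypothetical stand-in for training + validation; it prefers lr near 1e-3 and dropout near 0.2
def validation_score(params):
    return 0.9 - abs(math.log10(params["learning_rate"]) + 3) / 10 - abs(params["dropout_rate"] - 0.2)


best_params, best_score, history = random_search(space, validation_score, n_iter=20)
print(best_params, round(best_score, 3))
```

Sampling the learning rate on a log scale is a common design choice, because useful values span several orders of magnitude; a plain uniform range of [1e-4, 1e-1] would put about 90% of the samples above 1e-2.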

**Example:**
```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input
from sklearn.model_selection import RandomizedSearchCV
from scikeras.wrappers import KerasClassifier
from scipy.stats import uniform

# Assumes x_train is a 2-D array of shape (n_samples, n_features)
# and y_train holds integer class labels 0-9
n_features = x_train.shape[1]

def create_model(dropout_rate=0.0):
    # Uncompiled model; SciKeras compiles it with the wrapper's optimizer, loss and metrics
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    model.add(Dense(units=64, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(units=10, activation='softmax'))
    return model

model = KerasClassifier(
    model=create_model,
    model__dropout_rate=0.0,  # the model__ prefix routes this to create_model(dropout_rate=...)
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],
    epochs=10,
    batch_size=32,
    verbose=0,
)

param_dist = {
    'optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],
    'epochs': [10, 20],
    'model__dropout_rate': uniform(0, 0.5)  # continuous distribution over [0, 0.5]
}

random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, scoring='accuracy', cv=3)
random_search_result = random_search.fit(x_train, y_train)

print(f"Best: {random_search_result.best_score_} using {random_search_result.best_params_}")
```

**<h2>3. Bayesian Optimization</h2>**

Bayesian Optimization uses probabilistic models to explore the hyperparameter space more efficiently. It builds a model of the function to optimize and uses it to choose the most promising hyperparameters to evaluate next.

**Libraries**:
- **Keras Tuner**: A library integrated with Keras that supports Bayesian optimization.
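- **Optuna**: A general-purpose hyperparameter optimization framework whose default sampler, TPE (Tree-structured Parzen Estimator), is a Bayesian optimization method.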
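To show what "a probabilistic model of the function to optimize" means in practice, here is a deliberately small pure-Python sketch of one common formulation: a Gaussian-process surrogate with an RBF kernel plus the expected-improvement acquisition function, tuning a single hyperparameter rescaled to [0, 1]. It is not the code used by Keras Tuner or Optuna. The kernel length scale (0.2), the noise term, the 101-point candidate grid, the three-point initial design, and the toy objective are all illustrative assumptions.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel: similarity between two hyperparameter values."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def cholesky(m):
    """Lower-triangular L with L @ L.T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / low[j][j]
    return low


def forward(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def backward(low, b):
    """Solve low.T @ x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs, ys, x, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x, given observations."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(K)
    alpha = backward(low, forward(low, ys))  # K^-1 y
    k_star = [rbf(x, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(low, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected amount by which a point with this posterior beats `best` (maximization)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayes_opt(objective, n_iter=10):
    """Maximize `objective` over [0, 1] with a GP surrogate and expected improvement."""
    candidates = [i / 100 for i in range(101)]
    xs = [0.0, 0.5, 1.0]  # initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = max(ys)
        # Evaluate next the untried candidate with the highest expected improvement
        _, x_next = max(
            (expected_improvement(*gp_posterior(xs, ys, c), best), c)
            for c in candidates if c not in xs
        )
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)  # toy objective peaking at x = 0.3
print(x_best, y_best)
```

Unlike grid or random search, each new trial depends on all previous results: the surrogate's posterior mean favors regions that already scored well, and its uncertainty favors regions that have not been explored yet.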

**Example with Keras Tuner:**
```python
import keras
import keras_tuner as kt
from keras.models import Sequential
from keras.layers import Dense, Input

# Assumes x_train has shape (n_samples, n_features) and y_train is one-hot encoded
n_features = x_train.shape[1]

def build_model(hp):
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    model.add(Dense(units=hp.Int('units', min_value=32, max_value=512, step=32), activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Float('learning_rate', min_value=1e-4, max_value=1e-1, sampling='log')
        ),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

tuner = kt.BayesianOptimization(
    build_model,
    objective='val_accuracy',
    max_trials=10,
    directory='my_dir',
    project_name='bayesian_optimization'  # separate project so results of other tuners are not reloaded
)

# Validate on a held-out 20% of the training data instead of the test set
tuner.search(x_train, y_train, epochs=10, validation_split=0.2)

best_model = tuner.get_best_models(num_models=1)[0]
```

**<h2>4. Hyperband</h2>**

Hyperband is an adaptive resource allocation method that combines random search with early stopping. It allocates resources to the most promising configurations and terminates less promising ones early.
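Hyperband's core building block is successive halving: train many configurations on a small budget, keep roughly the best 1/`factor` of them, and give the survivors `factor` times more epochs. The sketch below implements only that inner loop in plain Python (full Hyperband runs several such brackets with different starting budgets); `toy_score` is a hypothetical stand-in for the validation accuracy reached after a given number of epochs.

```python
import math


def successive_halving(configs, evaluate, min_epochs=1, factor=3):
    """Return (best_config, total_epochs_spent), keeping the top 1/factor each round."""
    survivors, epochs, total_epochs = list(configs), min_epochs, 0
    while len(survivors) > 1:
        # Train every survivor for `epochs` epochs and rank by validation score
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, epochs), reverse=True)
        total_epochs += epochs * len(survivors)
        # Stop the weaker configurations early; always keep at least one
        survivors = ranked[: max(1, len(survivors) // factor)]
        epochs *= factor
    return survivors[0], total_epochs


def toy_score(cfg, epochs):
    # Hypothetical: the best learning rate is 1e-2, and more epochs help every config a little
    return -abs(math.log10(cfg["learning_rate"]) + 2) + 0.01 * epochs


configs = [{"learning_rate": 10 ** -e} for e in (1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)]
best, spent = successive_halving(configs, toy_score)
print(best, spent)
```

Here nine configurations get 1 epoch each, the top three get 3 epochs each, and the search stops with one survivor: 18 epochs in total, compared with 27 to train all nine configurations for 3 epochs.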

**Example with Keras Tuner:**
```python
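import keras
import keras_tuner as kt
from keras.models import Sequential
from keras.layers import Dense, Input

# Assumes x_train has shape (n_samples, n_features) and y_train is one-hot encoded
n_features = x_train.shape[1]

def build_model(hp):
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    model.add(Dense(units=hp.Int('units', min_value=32, max_value=512, step=32), activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Float('learning_rate', min_value=1e-4, max_value=1e-1, sampling='log')
        ),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

tuner = kt.Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=10,  # maximum number of epochs any single model is trained for
    factor=3,       # reduction factor for the number of models and epochs per bracket
    directory='my_dir',
    project_name='hyperband'
)

# No `epochs` argument: Hyperband decides how long each configuration trains
tuner.search(x_train, y_train, validation_split=0.2)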
```

**<h2>5. Manual Tuning</h2>**

Manual tuning involves manually experimenting with different hyperparameter values based on intuition and previous experience. This method is less systematic but can be useful for quick adjustments or when you have domain expertise.

**<h2>Summary</h2>**

Hyperparameter tuning is essential for optimizing machine learning models. Techniques like Grid Search, Random Search, Bayesian Optimization, and Hyperband offer various approaches to systematically explore and optimize hyperparameters. Each method has its strengths and is suited for different scenarios, from exhaustive search to efficient exploration of large hyperparameter spaces. Using libraries like Keras Tuner and Optuna can simplify and automate the tuning process.

# **Grid search and random search with Keras**

**<h2>Grid Search</h2>**

```python
import numpy as np
from keras.datasets import cifar10
from keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Preprocess the data: scale pixels to [0, 1] and one-hot encode the labels
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```

```python
!pip install scikeras
```

```python
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from scikeras.wrappers import KerasClassifier  # use KerasRegressor for regression tasks

def create_model():
    # Return an uncompiled model: SciKeras compiles it with the optimizer,
    # loss and metrics given to KerasClassifier, which lets the grid vary `optimizer`
    model = Sequential()
    model.add(Input(shape=(32, 32, 3)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    return model

model = KerasClassifier(
    model=create_model,
    loss='sparse_categorical_crossentropy',  # works with integer class labels
    optimizer='adam',
    metrics=['accuracy'],
    epochs=10,
    batch_size=32,
    verbose=0,
)
```

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],
    'epochs': [10, 20]
}

# KerasClassifier expects class labels, so undo the one-hot encoding
y_train_labels = y_train.argmax(axis=1)

grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', cv=3)
grid_result = grid.fit(x_train, y_train_labels)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
```
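Before launching the search it is worth estimating its cost. This plain-Python sketch counts the model fits implied by the `param_grid` and `cv=3` above; it assumes `GridSearchCV`'s default `refit=True`, which retrains the best configuration once on the full training set after cross-validation.

```python
import math
from itertools import product

param_grid = {
    'optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],
    'epochs': [10, 20]
}
cv = 3

n_configs = math.prod(len(values) for values in param_grid.values())  # 2 * 2 * 2
n_fits = n_configs * cv + 1  # one fit per configuration and fold, plus the final refit

# Epochs trained during cross-validation alone (the final refit is not included)
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
cv_epochs = cv * sum(combo['epochs'] for combo in combos)

print(f"{n_configs} configurations, {n_fits} model fits, {cv_epochs} epochs during CV")
```

Each cross-validation fit trains on about two thirds of the 50,000 CIFAR-10 training images, so this small grid already amounts to 360 epochs of training before the refit.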

**<h2>Random Search</h2>**

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Input
from scikeras.wrappers import KerasClassifier

def create_model(dropout_rate=0.0):
    # Uncompiled model; SciKeras compiles it with the wrapper's optimizer, loss and metrics
    model = Sequential()
    model.add(Input(shape=(32, 32, 3)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(dropout_rate))  # rate tuned by the random search below
    model.add(Dense(10, activation='softmax'))
    return model

model = KerasClassifier(
    model=create_model,
    model__dropout_rate=0.0,  # the model__ prefix routes this to create_model(dropout_rate=...)
    loss='sparse_categorical_crossentropy',  # works with integer class labels
    optimizer='adam',
    metrics=['accuracy'],
    epochs=10,
    batch_size=32,
    verbose=0,
)
```

```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

param_dist = {
    'optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],
    'epochs': [10, 20],
    'model__dropout_rate': uniform(0, 0.5)  # continuous distribution over [0, 0.5]
}

# KerasClassifier expects class labels, so undo the one-hot encoding
y_train_labels = y_train.argmax(axis=1)

random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, scoring='accuracy', cv=3)
random_search_result = random_search.fit(x_train, y_train_labels)

print(f"Best: {random_search_result.best_score_} using {random_search_result.best_params_}")
```