# **Creating a CNN Model and Optimizing It with Keras Tuner**

# **Hyperparameters in a CNN**

Hyperparameters in a Convolutional Neural Network (CNN) are settings chosen before training that define the model's architecture and control how it is trained. Unlike weights and biases, they are not learned from the data; the data scientist or machine learning practitioner sets them before training begins. Proper tuning of hyperparameters can significantly affect your CNN's performance. Common hyperparameters in CNNs include:

Learning Rate: This is one of the most crucial hyperparameters. It determines the step size at which the optimizer adjusts the model's weights during training. Too high a learning rate can cause the optimization process to diverge, while too low a learning rate can lead to slow convergence. It needs to be tuned carefully.
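To make the effect concrete, here is a toy sketch (plain Python, not part of the notebook) that runs gradient descent on the one-dimensional loss f(w) = w², whose minimum is at w = 0. Each update is w ← w - lr·2w = (1 - 2·lr)·w, so a very small rate barely moves, a moderate rate converges quickly, and a rate above 1 overshoots further on every step and diverges. The specific rates are illustrative choices.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize the toy loss f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w   # step against the gradient, scaled by the learning rate
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With lr=0.001 the parameter is still close to its starting value, with lr=0.1 it is near 0, and with lr=1.1 its magnitude has grown far beyond the starting point.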

Number of Convolutional Layers: The architecture of your CNN includes decisions about how many convolutional layers to use. Deeper networks may capture more complex features but could lead to overfitting, while shallower networks might not capture enough features.

Number of Filters/Kernels: Each convolutional layer consists of multiple filters (also known as kernels) that scan the input data to detect different features. The number of filters in each layer affects the network's capacity to learn complex patterns.

Filter Size: The size of the filters determines the spatial extent of the features they can detect. Common filter sizes are 3x3, 5x5, and 7x7. Smaller filter sizes generally capture finer details, while larger filter sizes capture more global features.
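The effect of the number of filters and the filter size on a layer's output shape and parameter count can be worked out by hand. The sketch below (plain Python, not from the notebook) uses the standard formulas for a square 2-D convolution: with `'valid'` padding (the Keras default) the output side is floor((n - k) / stride) + 1, with `'same'` padding it is ceil(n / stride), and each filter has k·k·in_channels weights plus one bias. The 32-filter example is an illustrative choice.

```python
def conv2d_output_size(n, k, stride=1, padding="valid"):
    """Spatial output size of a square 2-D convolution over an n x n input."""
    if padding == "same":
        return -(-n // stride)       # ceil(n / stride): padding keeps the size, only the stride shrinks it
    return (n - k) // stride + 1     # 'valid': no padding, the kernel must fit entirely inside the input

def conv2d_params(k, in_channels, filters):
    """Trainable parameters of a Conv2D layer: k*k*in_channels weights + 1 bias, per filter."""
    return (k * k * in_channels + 1) * filters

# A first layer with 32 filters on 28x28x1 Fashion MNIST images
for k in (3, 5):
    size = conv2d_output_size(28, k)
    print(f"{k}x{k} kernel: output {size}x{size}x32, params {conv2d_params(k, 1, 32)}")
```

A 3x3 kernel gives a 26x26 output with 320 parameters; a 5x5 kernel gives 24x24 with 832 parameters. The parameter count grows with the square of the filter size and linearly with the number of filters.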

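Pooling Type and Size: Pooling layers downsample the spatial dimensions of the feature maps, which reduces computation and makes the learned features less sensitive to small shifts in the input. Common pooling operations are max pooling (keep the largest value in each window) and average pooling (keep the mean of each window). The pooling size sets the window and therefore how much downsampling is applied; for example, 2x2 pooling halves the height and width.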
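Max and average pooling can be sketched in a few lines of NumPy. This example (not part of the notebook) assumes non-overlapping windows, i.e. a stride equal to the pool size, which is the Keras default for `MaxPooling2D`, and, like `'valid'` padding, it drops rows or columns that do not fill a whole window.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping 2-D pooling (stride == size) on a single-channel 2-D array."""
    h, w = x.shape[0] // size, x.shape[1] // size
    # Group pixels into (h, size, w, size) blocks; leftover rows/cols are dropped
    windows = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))   # largest value in each window
    return windows.mean(axis=(1, 3))      # average value in each window

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]], dtype=float)

print(pool2d(x, mode="max"))
print(pool2d(x, mode="avg"))
```

On this 4x4 input, 2x2 max pooling returns [[4, 2], [2, 8]] and average pooling returns [[2.5, 1.0], [1.25, 6.5]]: each output keeps one summary value per window, halving both spatial dimensions.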

Activation Functions: Activation functions introduce non-linearity into the network. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh. ReLU is often preferred due to its simplicity and effectiveness.
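For reference, here are the activation functions mentioned above, plus the softmax used in this notebook's output layer, written as small NumPy functions. This is an illustrative sketch, not Keras's own implementation.

```python
import numpy as np

def relu(z):
    # max(0, z), element-wise: negative inputs become 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any input into (-1, 1), centred on 0
    return np.tanh(z)

def softmax(z):
    # subtract the max for numerical stability; outputs are positive and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
for fn in (relu, sigmoid, tanh, softmax):
    print(fn.__name__, np.round(fn(z), 4))
```

ReLU simply zeroes negative values, which makes it cheap to compute; softmax turns a vector of scores into probabilities, which is why it is used for the 10-class output layer.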

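Dropout Rate: Dropout is a regularization technique that, during training only, randomly sets a fraction of a layer's input units (the dropout rate) to zero at each update. It helps prevent overfitting by discouraging neurons from relying too heavily on one another. At inference time, dropout is turned off.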
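A minimal NumPy sketch of dropout (not from the notebook). Like Keras's `Dropout` layer, it uses "inverted" dropout: units that survive are scaled by 1 / (1 - rate) during training so the expected activation stays the same, and the function passes inputs through unchanged when `training=False`. The seeded generator is only there to make the example reproducible.

```python
import numpy as np

def dropout(x, rate, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units, scale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x                                   # inference: inputs pass through unchanged
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate             # each unit is kept with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

x = np.ones(100_000)
y = dropout(x, rate=0.5, rng=np.random.default_rng(0))
print("fraction zeroed:", np.mean(y == 0))         # close to 0.5
print("mean activation:", y.mean())               # close to 1.0, the same as the input
```

The rescaling is what allows the layer to be switched off at inference time without changing the scale of the next layer's inputs.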

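Batch Size: The number of training examples used to compute each gradient update. Larger batches give smoother gradient estimates and use the GPU more efficiently (fewer updates per epoch) but need more memory; smaller batches produce noisier updates, which can act as a mild regularizer.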

Number of Epochs: An epoch is one complete iteration through the entire training dataset. Too few epochs might not allow the model to learn properly, while too many could lead to overfitting.
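Batch size and number of epochs together determine how many weight updates training performs. A rough sketch (plain Python, not part of the notebook), assuming the 54,000 training images left after holding out 10% of Fashion MNIST's 60,000 training images for validation; 32 is also the default batch size of Keras's `model.fit`.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of gradient updates in one epoch (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

# Assumption: 60,000 training images minus a 10% validation split
num_train = 60_000 - 6_000

for batch_size in (32, 128):
    steps = steps_per_epoch(num_train, batch_size)
    print(f"batch_size={batch_size}: {steps} steps per epoch, {steps * 10} steps in 10 epochs")
```

With a batch size of 32 this gives 1,688 updates per epoch; quadrupling the batch size to 128 cuts that to 422, so each epoch makes far fewer (but less noisy) updates.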

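Optimizer Choice: Optimization algorithms such as SGD (Stochastic Gradient Descent), Adam, and RMSProp update the weights in different ways. Plain SGD applies a single global learning rate (optionally with momentum), whereas Adam and RMSProp adapt the effective step size for each parameter using running averages of recent gradients.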
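The difference between plain SGD and Adam's per-parameter adaptivity shows up even with a single parameter. This sketch (plain Python, not from the notebook) implements the textbook update rules. The defaults beta1=0.9, beta2=0.999 and eps=1e-7 match Keras's `Adam` defaults, while lr=0.1 and the two toy losses are illustrative assumptions; library implementations can differ in small details, such as exactly where epsilon enters the update.

```python
import math

def sgd(grad_fn, w, lr=0.1, steps=1):
    """Plain gradient descent: w <- w - lr * grad."""
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

def adam(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1):
    """Textbook Adam update on a single scalar parameter."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

def steep_grad(w):
    return 2000 * w    # gradient of the toy loss 1000 * w**2

def gentle_grad(w):
    return 2 * w       # gradient of the toy loss w**2

for name, grad in (("steep", steep_grad), ("gentle", gentle_grad)):
    print(f"{name}: SGD -> {sgd(grad, 1.0):.4f}, Adam -> {adam(grad, 1.0):.4f}")
```

After one step from w = 1, SGD jumps to w = -199 on the steep loss but only to 0.8 on the gentle one, whereas Adam's first step is about lr = 0.1 in both cases, because it divides the gradient by a running estimate of its own magnitude.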

Weight Initialization: The initial values of the weights can affect the convergence of the optimization process. Proper weight initialization strategies can help the model train faster and more effectively.
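Keras's `Dense` and `Conv2D` layers default to Glorot (Xavier) uniform initialization. The NumPy sketch below (not from the notebook) shows that rule: weights are drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), where for a convolution kernel of shape (kh, kw, in_channels, filters), fan_in = kh·kw·in_channels and fan_out = kh·kw·filters.

```python
import numpy as np

def glorot_uniform(shape, rng=None):
    """Glorot/Xavier uniform init. Returns (weights, limit) with weights ~ U(-limit, limit)."""
    rng = np.random.default_rng() if rng is None else rng
    if len(shape) == 2:                        # Dense kernel: (inputs, units)
        fan_in, fan_out = shape
    else:                                      # Conv2D kernel: (kh, kw, in_channels, filters)
        receptive = int(np.prod(shape[:-2]))   # kh * kw
        fan_in, fan_out = receptive * shape[-2], receptive * shape[-1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape), limit

# A 3x3 Conv2D kernel mapping 1 input channel to 32 filters
w, limit = glorot_uniform((3, 3, 1, 32), rng=np.random.default_rng(0))
print(w.shape, round(float(limit), 4))
```

Scaling the range by fan-in and fan-out keeps the variance of activations and gradients roughly similar from layer to layer, which is what helps training start smoothly.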

Learning Rate Schedule: Adjusting the learning rate during training can help balance rapid progress in the beginning with finer adjustments as the model converges.
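A common schedule is exponential decay. The sketch below re-implements, in plain Python, the formula that Keras documents for `keras.optimizers.schedules.ExponentialDecay`: initial_learning_rate * decay_rate ** (step / decay_steps), with an optional staircase mode that decays in discrete jumps. The default values here are illustrative, not recommendations.

```python
def exponential_decay(step, initial_lr=1e-3, decay_rate=0.9, decay_steps=1000, staircase=False):
    """initial_lr * decay_rate ** (step / decay_steps); staircase=True uses integer division."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

for step in (0, 500, 1000, 1500, 2000):
    print(step, exponential_decay(step), exponential_decay(step, staircase=True))
```

In Keras, such a schedule object can be passed directly as the optimizer's learning rate, for example `keras.optimizers.Adam(learning_rate=keras.optimizers.schedules.ExponentialDecay(1e-3, decay_steps=1000, decay_rate=0.9))`.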

Tuning these hyperparameters often involves a combination of trial and error, intuition, and sometimes automated techniques like grid search, random search, or Bayesian optimization. It's important to note that finding the best hyperparameters is a part of the model development process and requires experimentation and understanding of the specific problem and dataset you're working with.
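To illustrate the difference between grid search and random search, here is a toy sketch in plain Python. The search space and the `toy_score` function are hypothetical stand-ins for "train a model and measure validation accuracy"; a real tuner such as Keras Tuner's `RandomSearch` performs the training for each trial itself.

```python
import itertools
import random

def grid_search(space, score_fn):
    """Evaluate every combination of values; return (best_score, best_params)."""
    keys = list(space)
    best = None
    for values in itertools.product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(space, score_fn, max_trials, seed=0):
    """Evaluate `max_trials` randomly sampled combinations; return (best_score, best_params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_trials):
        params = {k: rng.choice(v) for k, v in space.items()}
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Hypothetical search space and score (higher is better; peaks at filters=64, kernel=3, lr=1e-3)
space = {"filters": [32, 64, 128], "kernel": [3, 5], "lr": [1e-2, 1e-3]}

def toy_score(p):
    return -abs(p["filters"] - 64) / 64 - (p["kernel"] != 3) - (p["lr"] != 1e-3)

print("grid:  ", grid_search(space, toy_score))        # tries all 12 combinations
print("random:", random_search(space, toy_score, max_trials=5))
```

Grid search is exhaustive but its cost multiplies with every hyperparameter added; random search caps the cost at `max_trials` evaluations, at the price of possibly missing the best combination.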

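**Dataset:** Fashion MNIST, loaded from `keras.datasets`: 70,000 grayscale 28x28 images of clothing items in 10 classes (60,000 for training and 10,000 for testing).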

**Check that the runtime is set to GPU**, not CPU; training also runs on a CPU, but much more slowly.

In [1]:
!pip install tensorflow



In [2]:
!pip install keras-tuner

# Keras Tuner searches for good hyperparameter values (here: filter counts, kernel sizes, dense units, and learning rate)

Collecting keras-tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl.metadata (5.4 kB)
Collecting kt-legacy (from keras-tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl.metadata (221 bytes)
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.1/129.1 kB 5.6 MB/s eta 0:00:00
Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.4.7 kt-legacy-1.0.5


* Inside the `Sequential` model, we add two convolutional (`Conv2D`) layers, followed by a `Flatten` layer and two `Dense` layers.

* The first hyperparameter is `filters` (the number of filters):
  - With Keras Tuner, we can search over different values for the number of filters: `hp.Int('conv_1_filter', min_value=32, max_value=128, step=16)` defines the candidate values 32, 48, ..., 128.
  - `kernel_size` is the filter size; `hp.Choice` lets the tuner pick one of the preferred values, 3 or 5.

* The same is done for the second convolutional layer, again with `min_value=32` and `max_value=128`.

* Next, we flatten the feature maps, and for the hidden `Dense` layer we tune the number of units between a minimum of 32 and a maximum of 128.

* Finally, we add the output `Dense` layer with 10 units (one per clothing class) and softmax activation, so the model outputs a probability for each class.

* For compilation, we use the Adam optimizer, with `hp.Choice` restricting the learning rate to the listed values (1e-2 or 1e-3).
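As a sanity check on the size of this search space, the following sketch (plain Python, not part of the original notebook) lists the values each hyperparameter in `build_model` can take and counts the combinations. It assumes, as the Keras Tuner documentation states, that the `max_value` of `hp.Int` is inclusive, and it uses the notebook's `max_trials=5`.

```python
import math

# Candidate values for each hyperparameter defined in build_model
search_space = {
    "conv_1_filter": list(range(32, 128 + 1, 16)),   # hp.Int(..., 32, 128, step=16)
    "conv_1_kernel": [3, 5],                          # hp.Choice
    "conv_2_filter": list(range(32, 128 + 1, 16)),
    "conv_2_kernel": [3, 5],
    "dense_1_units": list(range(32, 128 + 1, 16)),
    "learning_rate": [1e-2, 1e-3],
}

total = math.prod(len(values) for values in search_space.values())
max_trials = 5
print("filter values:", search_space["conv_1_filter"])
print(f"{total} combinations; {max_trials} trials cover {max_trials / total:.2%} of them")
```

The space contains 2,744 combinations, so a 5-trial random search samples only a tiny fraction of it; raising `max_trials` trades compute time for a better chance of finding a strong configuration.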

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator   #ImageDataGenerator is for applying data augmentation.
from tensorflow.keras.callbacks import EarlyStopping
from keras_tuner import RandomSearch                                  # RandomSearch from Keras Tuner is used for hyperparameter tuning (the old "kerastuner" import name is deprecated).
from sklearn.model_selection import train_test_split

In [None]:
# Load the Fashion MNIST dataset
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()


In [None]:
# Add a channel dimension (28x28 -> 28x28x1) and scale pixel values from 0-255 to 0-1
train_images = train_images.reshape(len(train_images), 28, 28, 1).astype('float32') / 255.0
test_images = test_images.reshape(len(test_images), 28, 28, 1).astype('float32') / 255.0
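The reshaping and scaling above can be checked on a small stand-in array without downloading the dataset. In this sketch (not part of the notebook), `fake_images` is a hypothetical batch of random `uint8` images with the same shape and dtype that `load_data()` returns; it also shows `[..., np.newaxis]` as an equivalent way to add the channel axis.

```python
import numpy as np

# Hypothetical stand-in for load_data() output: 5 random uint8 images of 28x28 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Same transformation as the notebook: add a channel axis, convert to float32, scale to [0, 1]
x = fake_images.reshape(len(fake_images), 28, 28, 1).astype("float32") / 255.0

# Equivalent alternative: insert the channel axis with np.newaxis
x_alt = fake_images[..., np.newaxis].astype("float32") / 255.0

print(x.shape, x.dtype, x.min(), x.max())
```

`Conv2D` expects a trailing channel axis, which grayscale images lack; scaling to [0, 1] keeps input magnitudes small, which generally makes optimization better behaved.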

In [None]:
# Split training data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(train_images, train_labels, test_size=0.1, random_state=42)

In [None]:
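# Data augmentation: random rotations, shifts, shears, zooms, and horizontal flips, applied on the fly to each training batch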
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

In [None]:
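# Build the model: Keras Tuner calls this function with an `hp` object that samples the hyperparameters for each trial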
def build_model(hp):
    model = keras.Sequential([
        keras.layers.Conv2D(
            filters=hp.Int('conv_1_filter', min_value=32, max_value=128, step=16),
            kernel_size=hp.Choice('conv_1_kernel', values=[3, 5]),
            activation='relu',
            input_shape=(28, 28, 1)
        ),
        keras.layers.Conv2D(
            filters=hp.Int('conv_2_filter', min_value=32, max_value=128, step=16),
            kernel_size=hp.Choice('conv_2_kernel', values=[3, 5]),
            activation='relu'
        ),
        keras.layers.Flatten(),
        keras.layers.Dense(
            units=hp.Int('dense_1_units', min_value=32, max_value=128, step=16),
            activation='relu'
        ),
        keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', values=[1e-2, 1e-3])),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


In [None]:

# Hyperparameter tuning using Random Search
tuner_search = RandomSearch(build_model, objective='val_accuracy',
                            max_trials=5, directory='output',
                            project_name="Mnist_Fashion")

In [None]:
# Run the search: each trial trains a candidate model for 5 epochs and scores it by val_accuracy
tuner_search.search(X_train, y_train, epochs=5, validation_data=(X_val, y_val))

In [None]:
# Retrieve the best model
model = tuner_search.get_best_models(num_models=1)[0]
model.summary()

In [None]:
# Implement early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
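`EarlyStopping(monitor='val_loss', patience=3)` stops training once the validation loss has failed to improve for 3 consecutive epochs. The sketch below is a simplified plain-Python re-implementation of that patience rule, not Keras's actual code, and the `val_losses` values are made up purely for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Simplified patience rule on a loss (lower is better).

    Returns (stop_index, best_index); stop_index is None if training runs to the end.
    """
    best = float("inf")
    best_index = None
    wait = 0                                  # consecutive epochs without improvement
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:           # improvement: remember it and reset the counter
            best, best_index, wait = loss, i, 0
        else:
            wait += 1
            if wait >= patience:              # patience exhausted: stop after this epoch
                return i, best_index
    return None, best_index

# Hypothetical validation losses, for illustration only
val_losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.44, 0.39, 0.38]
print(early_stopping_epoch(val_losses, patience=3))
```

Here the loss last improves at index 2, and training stops at index 5 after three epochs without improvement, so the later improvement at index 6 is never seen. By default (`restore_best_weights=False`), Keras keeps the weights from the last epoch trained rather than from the best one; passing `restore_best_weights=True` to `EarlyStopping` restores the best weights instead.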

In [None]:
# Continue training the best model from the search (already trained for 5 epochs) on augmented data, for up to 10 more epochs
r = model.fit(datagen.flow(X_train, y_train, batch_size=32),
               epochs=10,  # Increased number of epochs
               validation_data=(X_val, y_val),
               callbacks=[early_stopping])

In [None]:
# Plot training and validation accuracy and loss
plt.figure(figsize=(12, 4))

# Accuracy
plt.subplot(1, 2, 1)
plt.plot(r.history['accuracy'], label='Train Accuracy')
plt.plot(r.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()


# Loss
plt.subplot(1, 2, 2)
plt.plot(r.history['loss'], label='Train Loss')
plt.plot(r.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()


Accuracy Graph:

Training accuracy should rise over time as the model learns from the data.
Validation accuracy should ideally rise as well, which indicates that the model generalizes to unseen data.
If training accuracy keeps increasing while validation accuracy plateaus or decreases, the model is likely overfitting.
Because the training metrics here are computed on augmented (harder) images, validation accuracy can even sit above training accuracy; that alone is not a sign of a problem.

Loss Graph:

The training loss should decrease over epochs, indicating that the model is learning effectively.
The validation loss should also decrease. If it starts to increase after a point while training loss continues to decrease, it suggests that the model is starting to overfit to the training data.

# **Conclusion**
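This notebook builds a convolutional neural network to classify images from the Fashion MNIST dataset, combining hyperparameter tuning with Keras Tuner, data augmentation, and early stopping to improve performance and generalization. The accuracy and loss curves show how the model behaves on the training and validation sets and can guide further tuning and adjustments. Note that the held-out test set (`test_images`, `test_labels`) is loaded but never used above; evaluating the final model on it, for example with `model.evaluate(test_images, test_labels)`, gives the final, unbiased estimate of performance, since the validation set was already used for tuning and early stopping.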