#**Problem 1**#



**Design a convolutional neural network in Keras of exactly 5 convolutional layers.** Use the MNIST dataset for evaluation. Do not use any pooling layers but keep the stride at 2 for each convolutional layer. You must try three designs as detailed below and provide your observations on the performance of each:
1. A regular CNN where the number of filters in each layer increases as the depth of the
network grows i.e., the Lth layer will have more filters than the (L-1)th layer.
2. An inverted CNN where the number of filters in each layer decreases as the depth of the
network grows i.e., the Lth layer will have fewer filters than the (L-1)th layer.
3. An hour-glass shaped CNN where the number of filters will increase till the Lth layer and
reduce afterwards.

Your goal is to design these networks and optimize them to their best performance by choosing
the right hyperparameters for each network, such as the learning rate, batch size and the choice
of optimizer (‘SGD’, ‘adam’, ‘RMSProp’). You must provide a detailed report of what values you
tried for each hyperparameters, your observations on why the network performed well (or not)
and the final accuracy for each network on the MNIST dataset.
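
As a rough planning aid for the three designs, the sketch below traces the feature-map size through five stride-2 convolutions and counts the convolutional parameters. It assumes 3x3 kernels with "same" padding, and the filter counts in `DESIGNS` are illustrative values chosen for this example, not values from the assignment.

```python
import math

# Hypothetical filter schedules for the three designs (illustrative values only).
DESIGNS = {
    "regular": [16, 32, 64, 128, 256],
    "inverted": [256, 128, 64, 32, 16],
    "hourglass": [16, 64, 256, 64, 16],
}

def same_padding_output(size, stride=2):
    """Spatial size after a convolution with padding='same'."""
    return math.ceil(size / stride)

def describe(filters, size=28, channels=1, kernel=3):
    """Return (spatial size after each layer, total conv parameters)."""
    sizes, params = [], 0
    for f in filters:
        params += kernel * kernel * channels * f + f  # weights + biases
        size = same_padding_output(size)
        sizes.append(size)
        channels = f
    return sizes, params

for name, filters in DESIGNS.items():
    sizes, params = describe(filters)
    print(f"{name:9s} filters={filters} sizes={sizes} conv_params={params}")
```

Under these assumptions a 28x28 input shrinks to 1x1 after the fifth layer, so the number of filters in the last layer is also the number of features passed to the classifier.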

In [None]:
import numpy as np  # needed for np.argmax on the predictions
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import LearningRateScheduler

train_data = pd.read_csv("data/train.csv")
test_data = pd.read_csv("data/test.csv")

Y_train = train_data["label"] #defining labels as Y_train
X_train = train_data.drop(labels = ["label"],axis = 1) #defining the images as X_train

g = plt.imshow(X_train.iloc[100].values.reshape(28, 28), cmap="gray") #displaying one sample image (row 100) from the dataset

X_train = X_train / 255.0
X_train = X_train.astype('float32')
X_test = test_data / 255.0
X_test = X_test.astype('float32')
X_train = X_train.values.reshape(X_train.shape[0],28,28,1)
X_test = X_test.values.reshape(X_test.shape[0],28,28,1) #use the test set's own row count

Y_train = to_categorical(Y_train, num_classes = 10)

datagen = ImageDataGenerator(
        rotation_range=10,
        zoom_range = 0.1,
        width_shift_range=0.1,
        height_shift_range=0.1)

datagen.fit(X_train)

X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size = 0.1)

model = Sequential()

model.add(Conv2D(32, kernel_size=5,input_shape=(28, 28, 1), activation = 'relu'))
model.add(Conv2D(32, kernel_size=5, activation = 'relu'))
model.add(MaxPool2D(2,2))
model.add(BatchNormalization())
model.add(Dropout(0.4))

model.add(Conv2D(64, kernel_size=3,activation = 'relu'))
model.add(Conv2D(64, kernel_size=3,activation = 'relu'))
model.add(MaxPool2D(2,2))
model.add(BatchNormalization())
model.add(Dropout(0.4))

model.add(Conv2D(128, kernel_size=3, activation = 'relu'))
model.add(BatchNormalization())

model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(0.4))
model.add(Dense(128, activation = "relu"))
model.add(Dropout(0.4))
model.add(Dense(10, activation = "softmax"))

optimizer=Adam(learning_rate=0.001) #`lr` is the deprecated name of this argument
model.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy"])

#model summary
model.summary()

#fit_generator is deprecated; model.fit accepts the generator directly
model_try = model.fit(datagen.flow(X_train,Y_train, batch_size=32),
                      epochs = 30, validation_data = (X_val,Y_val),
                      verbose = 1, steps_per_epoch=300)


#Using the same number of epochs and the same batch size, I was able to get a test accuracy of a whopping 99.571%!

predictions = model.predict(X_test)
predictions = np.argmax(predictions,axis = 1)
predictions = pd.Series(predictions, name="Label")
submit = pd.concat([pd.Series(range(1,28001),name = "ImageId"),predictions],axis = 1)
submit.to_csv("result.csv",index=False)



**This CNN takes as input tensors of shape (image_height, image_width, image_channels). In this case, I configure the CNN to process inputs of size (28, 28, 1). I do this by passing the argument input_shape=(28, 28, 1) to the first layer.**

**The Conv2D layers perform the convolution operation, which extracts features from the input images by sliding a convolution filter over the input to produce a feature map. Here I choose a filter (kernel) size of 5 x 5 for the first group of the model and 3 x 3 for the second and the third group.**

**The MaxPool2D layers perform the max-pooling operation, which reduces the spatial size of each feature map; this helps shorten training time and reduces the number of parameters in the later layers. Here I choose a pooling window of size 2 x 2 for both pooling layers.**
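
To see how the kernel and pooling sizes interact, the following sketch traces the spatial size of a 28 x 28 input through the convolution and pooling layers of the model above. It assumes the Keras defaults used in that code (stride 1 and "valid" padding for Conv2D, stride 2 for the 2 x 2 pooling); the helper names are made up for this example.

```python
def conv_valid(size, kernel, stride=1):
    """Output size of a convolution with padding='valid'."""
    return (size - kernel) // stride + 1

def pool_valid(size, window=2, stride=2):
    """Output size of max pooling with padding='valid'."""
    return (size - window) // stride + 1

# (operation, kernel or window size), in the order used by the model above
operations = [("conv", 5), ("conv", 5), ("pool", 2),
              ("conv", 3), ("conv", 3), ("pool", 2),
              ("conv", 3)]

size = 28
trace = [size]
for op, k in operations:
    size = conv_valid(size, k) if op == "conv" else pool_valid(size, k)
    trace.append(size)

print(trace)  # [28, 24, 20, 10, 8, 6, 3, 1]
```

The last convolution therefore outputs a 1 x 1 map with 128 channels, so Flatten hands 128 features to the first Dense layer.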

**I use BatchNormalization layers to normalize the activations of the preceding layer over each mini-batch and then rescale and shift them with learned parameters. Batch Normalization reduces the amount by which the hidden unit values shift around (internal covariate shift). It also allows each layer of the network to learn a little more independently of the other layers.**
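
The normalization step can be sketched in plain Python for a single feature. This is a simplified illustration of the training-time computation only: `gamma` and `beta` stand in for the learned scale and shift, the small `eps` guards against division by zero, and the moving averages a real layer keeps for inference are left out.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [2.0, 4.0, 6.0, 8.0]   # one feature, batch of four samples
normalized = batch_norm(activations)
print(normalized)                    # roughly zero mean and unit variance
```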

**To combat overfitting, I use Dropout layers, a powerful regularization technique. Dropout forces the model to learn multiple independent representations of the same data by randomly disabling neurons during training. Here, each Dropout layer randomly disables 40% of its inputs.**
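
A minimal sketch of what a dropout layer with rate 0.4 does to a list of activations is shown below. It uses the common "inverted dropout" formulation, where the surviving values are scaled up during training so that nothing has to change at inference time; the function and its `rng` argument are hypothetical helpers for this example.

```python
import random

def dropout(values, rate=0.4, training=True, rng=random):
    """Zero each value with probability `rate`; rescale the survivors."""
    if not training:
        return list(values)          # dropout is a no-op at inference time
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)               # fixed seed so the run is repeatable
dropped = dropout([1.0] * 1000, rate=0.4, rng=rng)
print(sum(1 for v in dropped if v == 0.0) / len(dropped))  # close to 0.4
```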

**My model uses 5 Conv2D layers, 2 MaxPool2D layers, 3 BatchNormalization layers and 4 Dropout layers.**

**I have done a 10-way classification as there are 10 output labels in the dataset. The softmax activation in the last layer converts the 10 outputs into probabilities that sum to 1. Each class is assigned a probability and the class with the maximum probability is the model’s output for the input.**
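
For illustration, the softmax and the final "pick the most probable class" step can be written in a few lines. The `logits` values are made up for this example and do not come from the trained model.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last layer for one image (10 classes)
logits = [0.1, 2.5, 0.3, -1.0, 0.0, 0.2, -0.5, 4.0, 0.7, 1.1]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(probs[predicted], 3))
```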

**In all the other layers, I have used the “relu” activation function because it is cheap to compute and helps avoid vanishing gradients, which speeds up the training process.**

**I have used categorical_crossentropy as the loss function and Adam as the optimizer for this model.**
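
As a small worked example of the loss, categorical cross-entropy for a one-hot label reduces to the negative log of the probability assigned to the true class. The clipping constant `eps` is an assumption added here to avoid `log(0)`, and the probabilities are invented for the example.

```python
import math

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

label = [0, 1, 0]                 # true class is index 1
confident = [0.1, 0.8, 0.1]       # mostly right
unsure = [0.4, 0.3, 0.3]          # true class gets only 0.3

loss = categorical_crossentropy(label, confident)
print(round(loss, 4))                                      # 0.2231
print(round(categorical_crossentropy(label, unsure), 4))   # 1.204
```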

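**The optimizer is responsible for updating the weights of the neurons. Backpropagation calculates the derivative of the loss function with respect to each weight, and the optimizer subtracts a scaled version of it from the weight; the scale is set by the learning rate, and Adam additionally adapts the step for each weight using running averages of past gradients. That is how a neural network learns.**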
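
The weight update can be sketched for a single weight. The function below follows the standard published Adam update rule with its usual default constants; it is a simplified illustration, not the Keras implementation, and the toy loss `(w - 3)**2` is made up for the example.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single weight; t is the 1-based step number."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy loss f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t)
    print(t, round(w, 6))
```

Note that the first step moves the weight by roughly the learning rate regardless of the gradient's magnitude, which is one reason Adam is less sensitive to gradient scale than plain SGD.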





#**Problem 2**#


Implement the LeNet Convolutional Neural Network using Keras. It is a seven-layer network with
three convolutional layers, two max-pooling layers and 2 dense layers. The structure is shown
below:
Layer 1: convolution layer with 6 convolution kernels of 5x5 with stride 1

Layer 2: max-pooling layer with 2x2 kernels with stride 2

Layer 3: convolution layer with 16 convolution kernels of 5x5 with stride 1

Layer 4: max-pooling layer with 2x2 kernels with stride 2

Layer 5: convolution layer with 120 convolution kernels of 5x5

Layer 6: dense layer with 84 neurons

Layer 7: output layer

Use the ‘Adam’ optimizer to train your network on the CIFAR-10 dataset for a fixed set of 25
epochs. You can use the built-in functions to load the data. Each image is a 32x32x3 matrix and you
will have 50,000 images for training and 10,000 for test. There are 10 classes in the dataset each
representing an object in the image.
Perform the following analysis and answer each question briefly (3-5 sentences). Use plots and figures as necessary.

1. What is the effect of learning rate on the training process? Which performed best?
**To analyze the effect of the learning rate, try several values such as 0.01, 0.001, and 0.0001. In general, a high learning rate makes the model converge fast but may overshoot the optimal solution. A low learning rate, on the other hand, can result in sluggish convergence or getting stuck in inferior solutions. To find the best learning rate, compare the validation accuracy for each value.**
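
The trade-off can be illustrated on a toy problem without training a network. The sketch below runs plain gradient descent on `f(w) = w**2`; the three learning rates are chosen only to show the slow, fast, and divergent regimes on this toy function and are not recommendations for LeNet.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final |w|."""
    for _ in range(steps):
        w -= lr * 2 * w          # the gradient of w**2 is 2w
    return abs(w)

for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: |w| after 20 steps = {gradient_descent(lr):.4f}")
```

With the smallest rate the weight has barely moved after 20 steps, the middle rate gets close to the minimum at 0, and the largest rate overshoots further on every step and diverges.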

2. What is the effect of batch size on the training process? Which performed best?
**To analyze the effect of batch size, try different values such as 32, 64, and 128. Smaller batch sizes give noisier gradient estimates but more weight updates per epoch, whereas bigger batch sizes reduce the noise and run faster per epoch but may need more epochs to converge. To establish the ideal batch size, compare the validation accuracy and training time for each batch size.**
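
One concrete consequence of the batch size is the number of weight updates the network receives within the fixed 25 epochs. The short calculation below assumes the standard CIFAR-10 training split of 50,000 images; the helper name is made up for this example.

```python
import math

TRAIN_SAMPLES = 50_000   # size of the CIFAR-10 training split
EPOCHS = 25

def updates_per_epoch(batch_size, n_samples=TRAIN_SAMPLES):
    """Number of mini-batches (weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

for batch_size in (32, 64, 128):
    steps = updates_per_epoch(batch_size)
    print(f"batch_size={batch_size:3d}: {steps} updates per epoch, {steps * EPOCHS} in total")
```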

3. Try different hyperparameters to obtain the best accuracy on the test set. What is your best performance and what were the hyperparameters?
**Experiment with different combinations of learning rate, batch size, number of epochs, and model architecture to determine the ideal hyperparameters for the LeNet model. To explore the hyperparameter space, you can use grid search or random search approaches. Maintain a record of the validation accuracy for each combination and choose the one with the highest accuracy.**
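
A grid search over these settings can be organized as in the sketch below. The `evaluate` callback is expected to train a model with the given configuration and return its validation accuracy; `fake_evaluate` is only a placeholder so that the example runs on its own, and its scores are not real measurements.

```python
import itertools

def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_score, best_config)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best

grid = {"learning_rate": [0.01, 0.001, 0.0001], "batch_size": [32, 64, 128]}

def fake_evaluate(config):
    # Placeholder for "train LeNet with this config, return validation accuracy".
    # The formula is arbitrary and exists only to make the sketch runnable.
    return -abs(config["learning_rate"] - 0.001) - abs(config["batch_size"] - 64) / 1000

best_score, best_config = grid_search(fake_evaluate, grid)
print(best_config)
```

In a real run, the final test accuracy should be reported only for the configuration selected on validation data, so that the test set stays untouched during the search.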

4. Implement an equivalent feed forward network for the same task with each hidden layer containing the same number of neurons as the number of filters in each convolution layer. Use the ‘Adam’ optimizer to train your network on the CIFAR-10 dataset for a fixed set of 25 epochs. Compare its performance with your LeNet implementation based on the
following questions:
a. What is its performance?
b. How many parameters are there in this network compared to the LeNet implementation?


**To create an equivalent dense feed-forward network, use the same number of neurons in each hidden layer as the number of filters in each convolutional layer. For the LeNet model this means hidden layers of 6, 16, and 120 neurons, respectively, placed after flattening the input and before the output layer. Use the same optimizer, learning rate, batch size, and number of epochs as for the LeNet model to train it. Compare the two models on training and validation accuracy, and also on training time and number of parameters.**
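
The parameter counts of the two networks can be worked out by hand. The sketch below assumes the LeNet layers listed above with "valid" padding (so the third convolution outputs a 1 x 1 x 120 map) and a dense network with hidden layers of 6, 16, and 120 neurons on the flattened 32 x 32 x 3 input; keeping the 84-neuron layer in the dense network as well would be an equally reasonable reading and would change the second number.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# LeNet: three 5x5 convolutions, then Dense(84) and Dense(10)
lenet = (conv_params(5, 3, 6) + conv_params(5, 6, 16) + conv_params(5, 16, 120)
         + dense_params(120, 84) + dense_params(84, 10))

# Dense network: flattened image -> 6 -> 16 -> 120 -> 10
sizes = [32 * 32 * 3, 6, 16, 120, 10]
mlp = sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))

print(lenet, mlp)  # 62006 21800
```

Most of the dense network's parameters sit in its first layer (3072 x 6 weights), which squeezes the whole image through only 6 neurons.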

**Finally, plot the training and validation accuracy/loss curves using the history object to visualize the training process.**

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# Load and preprocess the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Define the LeNet model
model = keras.Sequential()
model.add(layers.Conv2D(6, (5, 5), activation="relu", input_shape=(32, 32, 3), strides=1))
model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(layers.Conv2D(16, (5, 5), activation="relu", strides=1))
model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(layers.Conv2D(120, (5, 5), activation="relu", strides=1))
model.add(layers.Flatten())
model.add(layers.Dense(84, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))

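# Define the training parameters
learning_rate = 0.001
batch_size = 128
epochs = 25

# Compile the model, passing the learning rate to Adam explicitly
# (previously learning_rate was defined but never used)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate), loss="sparse_categorical_crossentropy", metrics=["accuracy"])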

# Train the model
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)

# Print the test accuracy
print("Test accuracy:", test_acc)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Test accuracy: 0.6104000210762024
