## Assignment Questions

### Understanding Pooling and Padding in CNNs

__Q: Describe the purpose and benefits of pooling in CNN.__

__A:__ Pooling in Convolutional Neural Networks (CNNs) downsamples the spatial dimensions of feature maps, which reduces computation and memory in the layers that follow. It keeps a summary of each local region (for example its maximum or mean) while discarding exact positions. This gives the network some invariance to small translations of the input and, by shrinking the number of activations passed to later layers, can help reduce overfitting.

__Q: Explain the difference between Max pooling and Average pooling.__

__A:__ Max pooling and Average pooling are common pooling methods. Max pooling retains the maximum value within a pooling window, emphasizing the most prominent feature. Average pooling calculates the average value in the window, providing a smoother representation of features. Max pooling tends to preserve sharper features, while Average pooling can be less sensitive to small variations.
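
The difference is easy to see on a small array. The following is a minimal NumPy sketch, assuming non-overlapping 2x2 windows; the helper `pool2d` and the sample values are illustrative and not part of the assignment.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D array whose sides are divisible by `size`."""
    h, w = x.shape
    # Reshape so that axes 1 and 3 index the positions inside each size x size window.
    windows = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 9, 4],
    [1, 1, 3, 8],
], dtype=float)

max_pooled = pool2d(feature_map, mode="max")   # strongest response per window
avg_pooled = pool2d(feature_map, mode="avg")   # mean response per window
print(max_pooled)
print(avg_pooled)
```

In the bottom-right window (9, 4, 3, 8), max pooling keeps the single strong activation 9, while average pooling blends it with its neighbours to give 6.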

__Q: Discuss the concept of padding in CNN and its significance.__

__A:__ Padding adds extra pixels (usually zeros) around the border of the input before a convolution is applied. Without padding, every convolution shrinks the feature map, and pixels near the edges contribute to fewer output positions than pixels in the centre. Padding can preserve the spatial dimensions, improves the network's ability to capture features at the borders, and allows many convolutional layers to be stacked without the feature maps shrinking away.
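
To make the border effect concrete, the sketch below zero-pads a small array with `np.pad` and counts how many stride-1 kernel windows touch each original pixel. The helper `coverage` is a hypothetical name introduced for illustration.

```python
import numpy as np

image = np.arange(1, 10, dtype=float).reshape(3, 3)
# Add a one-pixel border of zeros on every side: 3x3 becomes 5x5.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)

def coverage(height, width, kernel, pad):
    """Count how many stride-1 kernel windows include each original pixel."""
    counts = np.zeros((height + 2 * pad, width + 2 * pad), dtype=int)
    out_h = height + 2 * pad - kernel + 1
    out_w = width + 2 * pad - kernel + 1
    for i in range(out_h):
        for j in range(out_w):
            counts[i:i + kernel, j:j + kernel] += 1
    # Crop back to the positions of the original (unpadded) pixels.
    return counts[pad:pad + height, pad:pad + width]

no_pad = coverage(5, 5, kernel=3, pad=0)
with_pad = coverage(5, 5, kernel=3, pad=1)
print(no_pad[0, 0], no_pad[2, 2])      # corner pixel vs centre pixel, no padding
print(with_pad[0, 0], with_pad[2, 2])  # same pixels with one pixel of padding
```

For a 5x5 input and a 3x3 kernel, the corner pixel is seen by one window without padding and by four windows with one pixel of padding, while the centre pixel is seen by nine windows in both cases.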

__Q: Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.__

__A:__ Zero-padding adds zeros around the input before convolution. With stride 1 and a padding of (k - 1) / 2 pixels per side for an odd kernel size k (often called "same" padding), the output has the same spatial size as the input. Valid-padding adds no padding, so the kernel is applied only at positions where it fits entirely inside the input, and an n x n input shrinks to (n - k + 1) x (n - k + 1) at stride 1. Zero-padding is useful when maintaining the spatial dimensions is crucial, while valid-padding reduces the feature map size and gives border pixels less influence on the output.
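
The output size for both cases follows from the standard formula floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding per side, and s the stride. A small sketch, with `conv_output_size` as an illustrative helper name:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# 28x28 input with a 5x5 kernel at stride 1.
valid_size = conv_output_size(28, kernel=5)             # no padding
same_size = conv_output_size(28, kernel=5, padding=2)   # p = (k - 1) / 2
print(valid_size, same_size)
```

With valid-padding the 28x28 input shrinks to 24x24; with two pixels of zero-padding per side it stays at 28x28.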

### Exploring LeNet

__Q: Provide a concise overview of the LeNet-5 architecture.__

__A:__ LeNet-5 is an early convolutional neural network designed for handwritten digit recognition. It takes a 32x32 grayscale image, alternates convolutional and subsampling (pooling) layers (C1, S2, C3, S4), and finishes with fully connected layers of 120 and 84 units and a 10-way output layer. The original output layer used radial basis function units; modern reimplementations typically use softmax instead.

__Q: Describe the key components of LeNet-5 and their respective purposes.__

__A:__ LeNet-5's key components include convolutional layers for feature extraction, subsampling (pooling) layers for downsampling, and fully connected layers for classification. The architecture's gradual reduction in spatial dimensions through convolutions and pooling helps capture hierarchical features.
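
The gradual reduction in spatial size can be traced layer by layer. The sketch below assumes unpadded 5x5 convolutions and 2x2 subsampling with stride 2, and compares the classic 32x32 input with an unpadded 28x28 MNIST image; `trace_spatial_sizes` is an illustrative helper.

```python
def trace_spatial_sizes(input_size, layers):
    """Return (name, side length) after each (name, kernel, stride) layer, no padding."""
    sizes = []
    size = input_size
    for name, kernel, stride in layers:
        size = (size - kernel) // stride + 1
        sizes.append((name, size))
    return sizes

lenet_layers = [
    ("C1 conv 5x5", 5, 1),
    ("S2 pool 2x2", 2, 2),
    ("C3 conv 5x5", 5, 1),
    ("S4 pool 2x2", 2, 2),
]

classic = trace_spatial_sizes(32, lenet_layers)    # original 32x32 input
mnist_raw = trace_spatial_sizes(28, lenet_layers)  # unpadded 28x28 MNIST input
for name, size in classic:
    print(f"{name}: {size}x{size}")
```

With a 32x32 input the 16 feature maps after S4 are 5x5 (400 values in total); with an unpadded 28x28 input they are 4x4 (256 values).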

__Q: Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.__

__A:__ Advantages include simplicity, a small parameter count, and effectiveness on small images such as handwritten digits, where it was one of the earliest successful CNNs. Its limitations stem from the same simplicity: it is shallow and narrow, which makes it less effective on complex tasks and large datasets than modern architectures.

__Q: Implement LeNet-5 using a deep learning framework of your choice and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.__

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32')
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32')

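# Build a LeNet-5-style model. This variant uses ReLU and max pooling on
# 28x28 inputs; the original used tanh-like units, average-style subsampling
# and 32x32 inputs.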
model = Sequential([
    Conv2D(6, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(16, kernel_size=(5, 5), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(120, activation='relu'),
    Dense(84, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy: 0.9875
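
A single accuracy figure does not show which digits are confused with which. One way to get more insight is a confusion matrix and per-class accuracy. The sketch below uses small hypothetical label arrays purely for illustration; they are not results from the run above, and the helper names are assumptions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly when the same (true, pred) pair repeats.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    totals = matrix.sum(axis=1)
    return np.diag(matrix) / np.maximum(totals, 1)

# Hypothetical labels for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = per_class_accuracy(cm)
print(cm)
print(acc)
```

For the trained model, the predictions could be obtained with `model.predict(x_test).argmax(axis=1)` and passed in together with `y_test` and `num_classes=10`.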


### Analyzing AlexNet

__Q: Present an overview of the AlexNet architecture.__

__A:__ AlexNet is a deep convolutional neural network that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It consists of five convolutional layers, some of them followed by max pooling layers, and three fully connected layers ending in a 1000-way softmax. Its depth and width, made practical by GPU training, let it outperform the previous state of the art by a wide margin.

__Q: Explain the architectural innovations introduced in AlexNet that contributed to its remarkable performance.__

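__A:__ AlexNet scaled the convolution-plus-pooling design already used in LeNet-5 to a much deeper and wider network trained on GPUs. It used ReLU activations, which train faster than saturating activations such as tanh and mitigate the vanishing gradient problem. It also used overlapping max pooling and local response normalization, and it relied on data augmentation and dropout to reduce overfitting.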
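
Two of these ideas are simple enough to sketch in NumPy. The example below compares the gradient of ReLU with that of tanh for a large input, and implements "inverted" dropout, the variant that rescales surviving units during training (the original AlexNet paper instead scaled outputs at test time). The function names are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise.
    return (x > 0).astype(float)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def inverted_dropout(x, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

x = np.array([-2.0, 0.5, 5.0])
print(relu_grad(x))   # stays at 1 for every positive input
print(tanh_grad(x))   # shrinks towards 0 as |x| grows

rng = np.random.default_rng(0)
activations = np.ones(8)
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(dropped)        # each entry is either 0 or 1 / (1 - rate)
```

Because the ReLU gradient does not shrink for large positive inputs, gradients passing through many layers are less likely to vanish than with tanh.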

__Q: Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.__

__A:__ Convolutional layers extract features hierarchically, pooling layers downsample spatial dimensions, and fully connected layers provide classification based on high-level features. This layered architecture enables AlexNet to capture intricate features and patterns.
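
The effect of the first layers on feature map size and parameter count can be checked by hand. The sketch below uses the layer settings of the Keras model in this section (224x224x3 input, 11x11 stride-4 'valid' convolution, 3x3 stride-2 pooling, 5x5 'same' convolution); the helper names are assumptions for illustration.

```python
def conv_out(n, kernel, stride, padding):
    """Spatial output size along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight tensor plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters

size = 224
size = conv_out(size, kernel=11, stride=4, padding=0)   # conv1, 'valid'
conv1_size, conv1_params = size, conv_params(11, 3, 96)

size = conv_out(size, kernel=3, stride=2, padding=0)    # 3x3 max pool, stride 2
pool1_size = size

size = conv_out(size, kernel=5, stride=1, padding=2)    # conv2, 'same'
conv2_size, conv2_params = size, conv_params(5, 96, 256)

print(conv1_size, conv1_params)
print(pool1_size)
print(conv2_size, conv2_params)
```

These values (54, 26, 26 and 34944, 614656 parameters) agree with the `model.summary()` output shown for the first two convolutional layers.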

In [5]:
# Import everything from tensorflow.keras so that layers and models
# come from the same Keras implementation.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Activation, Dropout, Flatten,
                                     Conv2D, MaxPooling2D, BatchNormalization)
from tensorflow.keras.utils import to_categorical

# Get data: Oxford 17 Category Flower dataset via tflearn
import tflearn.datasets.oxflower17 as oxflower17

x, y = oxflower17.load_data()

x_train = x.astype('float32') / 255.0
y_train = to_categorical(y, num_classes=17)

Downloading Oxford 17 category Flower Dataset, Please wait...


100.0% 60276736 / 60270631


Succesfully downloaded 17flowers.tgz 60270631 bytes.
File Extracted
Starting to parse images...
Parsing Done!


In [6]:
print(x_train.shape)
print(y_train.shape)

(1360, 224, 224, 3)
(1360, 17)


In [7]:
# Create a sequential model
model = Sequential()

# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224,224,3), kernel_size=(11,11), strides=(4,4), padding='valid'))
model.add(Activation('relu'))

# Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())

# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), padding='same'))
model.add(Activation('relu'))

# Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())



# 3rd Convolutional Layer
# (the original AlexNet pads its 3x3 convolutions; 'valid' here shrinks 12x12 to 10x10)
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())


# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))


# Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())


# Passing it to a dense layer
model.add(Flatten())

# 1st Dense Layer
model.add(Dense(4096))  # input size is inferred from the Flatten layer
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

# Output Layer
model.add(Dense(17))
model.add(Activation('softmax'))

model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_17 (Conv2D)          (None, 54, 54, 96)        34944     
                                                                 
 activation (Activation)     (None, 54, 54, 96)        0         
                                                                 
 max_pooling2d_11 (MaxPoolin  (None, 26, 26, 96)       0         
 g2D)                                                            
                                                                 
 batch_normalization (BatchN  (None, 26, 26, 96)       384       
 ormalization)                                                   
                                                                 
 conv2d_18 (Conv2D)          (None, 26, 26, 256)       614656    
                                                

In [8]:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [9]:
# Train
model.fit(x_train, y_train, batch_size=64, epochs=5, verbose=1, validation_split=0.2, shuffle=True)

Train on 1088 samples, validate on 272 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x23d5138d750>