# Assignment 06 Solutions

#### 1.	What are the advantages of a CNN over a fully connected DNN for image classification?

<code>No require human supervision required.
Automatic feature extraction.
Highly accurate at image recognition & classification.
Weight sharing.
Minimizes computation.

#### 2.	Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels.
What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?

<code>Lowest Layer: 100 feature maps x (3x3x3 parameters) = 27,000 parameters.
Middle Layer: 200 feature maps x (3x3x100 parameters) = 1,800,000 parameters.
Top Layer: 400 feature maps x (3x3x200 parameters) = 7,200,000 parameters.
Total Parameters = 27,000 + 1,800,000 + 7,200,000
Total Parameters = 9,027,000 parameters
RAM for Single Prediction = 9,027,000 parameters x 4 bytes/parameter x 1 instance
RAM for Single Prediction = 36,108,000 bytes or approximately 36.11 megabytes (MB).    

#### 3.	If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

<code>Data Augmentation and Compression, reduce batch size, Gradient Checkpointing, Model Simplification, Use Mixed Precision Training

#### 4.	Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

<code>Taking the max from the kernel sized patch of an input feature map without having to learn weights for that layer.

#### 5.	When would you want to add a local response normalization layer?

<code>This is useful when we are dealing with ReLU neurons. ReLU neurons have unbounded activations, and we need local response normalization (LRN) to normalize them. To do this, we need to identify high frequency features. By applying LRN, the neurons becomes more sensitive than their neighbors.

#### 6.	Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?

<code>AlexNet (2012) vs. LeNet-5 (1998): 
Deeper Architecture: AlexNet is significantly deeper than LeNet-5, with more convolutional and fully connected layers.
Rectified Linear Units (ReLU): AlexNet used rectified linear units as activation functions.
Data Augmentation: AlexNet employed extensive data augmentation techniques to reduce overfitting.
Local Response Normalization (LRN): AlexNet used LRN, a form of normalization, to improve generalization.
Dropout: AlexNet introduced dropout as a regularization technique to prevent overfitting.
Parallelization: AlexNet was designed to be run on two GPUs, which allowed it to train faster.
GoogleNet: Inception Module(capture feature in various level), Global Average Pooling
RESNET: Skip Connections, Batch normalization, Residual Learning
SENET: Squeeze-and-Excitation Block(Channel wise feature map), Channel Attention    
Xception: Depthwise Separable Convolution, Increase Deapth

#### 7.	What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?

<code>A Fully Convolutional Network (FCN) is a type of neural network architecture designed for tasks that involve dense pixel-wise predictions, such as image segmentation and object detection. 
Convert dense layer into CNN Layer
    1: Remove Fully Connected (Dense) Layers
    2: Replace with 1x1 Convolution
    3: Global Average Pooling (Optional)
    4: Set Output size

#### 8.	What is the main technical difficulty of semantic segmentation?

<code> High Spatial Resolution, Class Imbalance, real time interface, Large-Scale Data Annotation, Robust, Per-Pixel Labeling, Smantic confusion, etc.

#### 9.	Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

In [4]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

train_images = train_images.astype("float32") / 255
test_images = test_images.astype("float32") / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

In [5]:
model = models.Sequential()

# Convolutional layers
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

# Fully connected layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))  # 10 output classes

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x22e13bd70a0>

In [6]:
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_accuracy * 100:.2f}%")

Test accuracy: 98.92%


#### 10.	Use transfer learning for large image classification, going through these steps:
- a.	Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).
- b.	Split it into a training set, a validation set, and a test set.
- c.	Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.
- d.	Fine-tune a pretrained model on this dataset.

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [2]:
from sklearn.model_selection import train_test_split

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

In [3]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation (optional)
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    rescale=1.0/255.0  # Normalize pixel values to [0, 1]
)

# Prepare data generators
batch_size = 32
train_generator = datagen.flow(x_train, y_train, batch_size=batch_size)
val_generator = datagen.flow(x_val, y_val, batch_size=batch_size)
test_generator = datagen.flow(x_test, y_test, batch_size=batch_size)

In [4]:
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

# Load the pre-trained MobileNetV2 model without the top (classification) layer
base_model = MobileNetV2(input_shape=(96, 96, 3), include_top=False, weights='imagenet')

# Add custom classification layers on top of the base model
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))  # 10 classes in CIFAR-10

# Freeze the layers of the base model
base_model.trainable = False

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
epochs = 10
history = model.fit(train_generator, validation_data=val_generator, epochs=epochs)

# Unfreeze some layers for fine-tuning (optional)
base_model.trainable = True

# Recompile the model with a lower learning rate
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Fine-tune the model
fine_tune_epochs = 10
history_fine_tune = model.fit(train_generator, validation_data=val_generator, epochs=fine_tune_epochs)

# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(test_generator)
print(f"Test accuracy: {test_accuracy * 100:.2f}%")

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_96_no_top.h5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 11.11%
