# Assignment_06

#### Q1. What are the advantages of a CNN over a fully connected DNN for image classification?

**Ans.** Convolutional Neural Networks (CNNs) have several advantages over fully connected Deep Neural Networks (DNNs) for image classification. Here are some of the main advantages:

1. Parameter efficiency: CNNs use shared weights and biases for the convolutional layers, which greatly reduces the number of parameters compared to fully connected DNNs. This makes CNNs more efficient for processing large images and reduces the risk of overfitting.

2. Translation invariance: Because the same filters slide over the whole image, a pattern learned in one location can be detected anywhere else. Strictly speaking, convolutional layers are translation equivariant (shifting the input shifts the feature maps by the same amount), and pooling layers add some approximate translation invariance. This makes CNNs well suited to image classification, where the object of interest can appear anywhere in the image.

3. Hierarchical feature learning: CNNs stack several layers of feature extraction to learn hierarchical representations of images. This lets the network build high-level concepts, such as object categories, out of low-level features, such as edges and corners. In contrast, a fully connected DNN receives the image as a flat vector of pixels, so it has no built-in notion of which pixels are neighbors and must learn all spatial relationships from data.

4. Robustness to small input variations: Pooling layers make the network less sensitive to small shifts and distortions in the input. CNNs are not inherently invariant to larger changes such as rotation, scale, or lighting; robustness to those usually comes from data augmentation and from the diversity of the training data.

Overall, these advantages have made CNNs the standard approach for image classification. With a comparable architecture (same number of hidden layers and similar structure), a CNN typically outperforms a fully connected network, including on binary image classification, while using far fewer parameters, thanks to its shared-weights architecture and its translation-equivariance properties.
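To make the parameter-efficiency point concrete, here is a small sketch that counts the parameters of one dense layer and one 3 × 3 convolutional layer applied to the same input. The 200 × 300 RGB input and the 100 units/filters are assumptions chosen to match Q2; biases are included in both counts.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same filter is reused at every position of the image
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Assumed example: a 200 x 300 RGB image feeding 100 units or 100 filters
height, width, channels = 200, 300, 3
dense = dense_params(height * width * channels, 100)
conv = conv_params(3, 3, channels, 100)

print(f"dense layer:    {dense:,} parameters")
print(f"3x3 conv layer: {conv:,} parameters")
```

The convolutional count does not depend on the image size at all, whereas the dense count grows with every extra pixel.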

#### Q2. Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels. What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?

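**Ans.**

**Parameters:** each filter has (kernel height × kernel width × input channels) weights plus one bias.

* First convolutional layer: 3 × 3 × 3 + 1 = 28 parameters per filter, 100 filters: 28 × 100 = 2,800
* Second convolutional layer: 3 × 3 × 100 + 1 = 901 parameters per filter, 200 filters: 901 × 200 = 180,200
* Third convolutional layer: 3 × 3 × 200 + 1 = 1,801 parameters per filter, 400 filters: 1,801 × 400 = 720,400
* Total: 2,800 + 180,200 + 720,400 = **903,400 parameters**

**Memory:** a 32-bit float takes 4 bytes. With a stride of 2 and "same" padding, each layer's output height and width are those of its input divided by 2, rounded up.

* First layer output: 100 × 150 = 15,000 values per feature map, × 100 maps = 1,500,000 values
* Second layer output: 50 × 75 = 3,750 values per feature map, × 200 maps = 750,000 values
* Third layer output: 25 × 38 = 950 values per feature map, × 400 maps = 380,000 values

Below, 1 MB = 1024 × 1024 bytes.

* All layer outputs: (1,500,000 + 750,000 + 380,000) × 4 / 1024 / 1024 ≈ 10.03 MB
* Parameters: 903,400 × 4 / 1024 / 1024 ≈ 3.45 MB
* Keeping every layer output plus the parameters: 10.03 + 3.45 ≈ 13.48 MB

**Single prediction, minimum:** a layer's output can be released as soon as the next layer has been computed, so at most two consecutive layers must be held in memory at once. The largest pair is the first two layers: (1,500,000 + 750,000) × 4 bytes = 9,000,000 bytes ≈ 8.58 MB, plus 3.45 MB of parameters ≈ **12.03 MB**.

**Training on a mini-batch of 50 images:** backpropagation needs every layer's output from the forward pass, for all 50 instances.

* Layer outputs: 50 × 10.03 MB ≈ 501.6 MB
* Input images: 50 × 200 × 300 × 3 × 4 bytes = 36,000,000 bytes ≈ 34.33 MB
* Parameters: ≈ 3.45 MB
* Total: ≈ **539.4 MB**, as a bare minimum (gradients, optimizer state, and framework overhead come on top)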
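The arithmetic above can be double-checked with a short script. It is a sketch that assumes the same conventions as the answer: "same" padding (output size = ceil(input / stride)), 4-byte floats, 1 MB = 1024 × 1024 bytes, and no allowance for gradients or framework overhead.

```python
import math

BYTES_PER_FLOAT32 = 4
MIB = 1024 * 1024


def same_padding_output(size: int, stride: int) -> int:
    # With "same" padding the output size is ceil(input / stride)
    return math.ceil(size / stride)


def conv_stack_stats(height, width, in_channels, filters, kernel=3, stride=2):
    """Return (parameter count, list of per-layer output sizes in values)."""
    n_params = 0
    layer_outputs = []
    for n_filters in filters:
        n_params += (kernel * kernel * in_channels + 1) * n_filters
        height = same_padding_output(height, stride)
        width = same_padding_output(width, stride)
        layer_outputs.append(height * width * n_filters)
        in_channels = n_filters
    return n_params, layer_outputs


params, outputs = conv_stack_stats(200, 300, 3, [100, 200, 400])
param_bytes = params * BYTES_PER_FLOAT32
all_outputs_bytes = sum(outputs) * BYTES_PER_FLOAT32
# Peak if each layer's output is freed once the next layer has been computed
peak_pair_bytes = max(a + b for a, b in zip(outputs, outputs[1:])) * BYTES_PER_FLOAT32
input_batch_bytes = 50 * 200 * 300 * 3 * BYTES_PER_FLOAT32
training_bytes = 50 * all_outputs_bytes + input_batch_bytes + param_bytes

print(params, outputs)
print(f"all outputs + params:          {(all_outputs_bytes + param_bytes) / MIB:.2f} MB")
print(f"two largest layers + params:   {(peak_pair_bytes + param_bytes) / MIB:.2f} MB")
print(f"training on a batch of 50:     {training_bytes / MIB:.2f} MB")
```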

#### Q3. If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

**Ans.** 
1. Decrease batch size: One of the most effective ways to reduce memory usage is to decrease the batch size. This will reduce the number of samples being processed simultaneously, which in turn reduces the amount of memory required.

2. Reduce input size: If the input images are very large, downsampling them to a smaller size can also reduce the memory requirements. However, this may come at the cost of reduced performance or accuracy.

3. Use mixed precision training: Performing most computations and storing activations in 16-bit floating point numbers, while keeping a 32-bit master copy of the weights, roughly halves activation memory, usually with little or no loss of accuracy.

4. Reduce model complexity: Another way to reduce memory usage is to reduce the complexity of the model, for example by removing one or more layers, using fewer filters or neurons, or using a smaller kernel size.

5. Use gradient checkpointing: Gradient checkpointing is a technique that trades off computation time for memory usage. Instead of storing all the intermediate activations during the forward pass for use during the backward pass, only a subset of the activations are stored. This reduces memory usage, but increases computation time during the backward pass.

These are just some of the techniques that can be used to reduce memory usage during CNN training. The best approach will depend on the specific model, dataset, and hardware being used, and may require experimentation to find the optimal solution.

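Some other options are:
1. Reduce dimensionality using a larger stride in one or more layers.
2. Go beyond mixed precision and store everything, including the weights, in 16-bit floats (this saves more memory but is more prone to numerical problems).
3. Distribute the CNN across multiple devices.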
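The effect of several of these tips on activation memory can be estimated with a rough back-of-the-envelope sketch. It reuses the three-layer CNN from Q2 as an assumed example, counts only the layer outputs kept for backpropagation, and ignores parameters, gradients, optimizer state, and framework overhead, so real usage will be higher.

```python
import math


def activation_bytes(batch_size, height, width, filters, stride=2, bytes_per_value=4):
    """Rough lower bound on the memory taken by conv-layer outputs during training.

    Assumes "same" padding (output size = ceil(input / stride)) and counts only
    layer outputs; parameters, gradients and optimizer state are ignored.
    """
    total_values = 0
    for n_filters in filters:
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        total_values += height * width * n_filters
    return batch_size * total_values * bytes_per_value


FILTERS = [100, 200, 400]  # the CNN from Q2, used as an assumed example

baseline = activation_bytes(50, 200, 300, FILTERS)
smaller_batch = activation_bytes(10, 200, 300, FILTERS)                      # tip 1
smaller_input = activation_bytes(50, 100, 150, FILTERS)                      # tip 2
half_precision = activation_bytes(50, 200, 300, FILTERS, bytes_per_value=2)  # tip 3
fewer_filters = activation_bytes(50, 200, 300, [50, 100, 200])               # tip 4
larger_stride = activation_bytes(50, 200, 300, FILTERS, stride=3)            # larger stride

for name, n_bytes in [("baseline", baseline), ("batch of 10", smaller_batch),
                      ("100x150 input", smaller_input), ("16-bit values", half_precision),
                      ("half the filters", fewer_filters), ("stride 3", larger_stride)]:
    print(f"{name:>16}: {n_bytes / 2**20:7.1f} MB")
```

Activation memory scales linearly with batch size, bytes per value, and number of filters, while input size and stride act on every layer's spatial dimensions.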

#### Q4. Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

**Ans.** Max pooling layers and convolutional layers with the same stride both reduce the spatial resolution of the feature maps. However, there are some reasons why one might want to use max pooling instead of convolution:

Non-linearity: Max pooling is itself a non-linear operation (the maximum of a window is not a linear function of its inputs). A convolutional layer computes a linear transformation of its input, so it only becomes non-linear through the activation function that follows it.

Translation invariance: Max pooling is more translation invariant than convolution with stride. This means that the features learned by the network are less sensitive to small shifts in the input image, which can improve the robustness of the network to small variations in the data.

Computational efficiency: Max pooling has no learnable parameters and only computes a maximum over each window, whereas a convolutional layer with the same stride has weights to store and train and performs a multiply-add for every weight at every output position.

Reduce overfitting: Because it adds no parameters, max pooling reduces the spatial resolution of the feature maps and introduces some spatial invariance without increasing the model's capacity, which can help reduce overfitting.

Overall, max pooling is a cheap way to reduce the spatial resolution of feature maps while adding non-linearity and some translation invariance. However, the choice between max pooling and convolution with the same stride will depend on the specific requirements of the model and the data.
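The sketch below uses NumPy and a hypothetical helper, `max_pool_2x2`, to show three of these properties on toy arrays: the pooling layer has no weights, its output ignores a shift that stays within a pooling window, and it is not linear.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D array whose height and width are even."""
    h, w = x.shape
    # Split the array into non-overlapping 2x2 blocks and keep each block's maximum
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


x = np.array([[1, 3, 0, 2],
              [4, 2, 1, 0],
              [0, 1, 5, 6],
              [2, 0, 7, 1]], dtype=float)
pooled = max_pool_2x2(x)  # → [[4, 2], [2, 7]]; no weights involved

# Shift invariance within a window: moving a single activation from column 0
# to column 1 keeps it in the same 2x2 window, so the pooled output is unchanged
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[0, 1] = 1.0
same_output = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))  # True

# Non-linearity: pooling a sum is not the sum of the pooled inputs
c = np.zeros((4, 4))
c[1, 1] = 1.0
is_additive = np.array_equal(max_pool_2x2(a + c), max_pool_2x2(a) + max_pool_2x2(c))  # False
```

A shift of two columns would move the activation into the next window, so the invariance only holds for shifts smaller than the pooling window.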

#### Q5. When would you want to add a local response normalization layer?

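**Ans.** A local response normalization (LRN) layer makes the most strongly activated neurons inhibit neurons at the same location in neighboring feature maps. This competition encourages different feature maps to specialize, which can improve generalization. Concretely, each activation a_i (in feature map i) is divided by a term that grows with the sum of squares of the activations at the same position in the n adjacent feature maps (including map i itself): b_i = a_i / (k + α · Σ_j a_j²)^β. AlexNet used n = 5, k = 2, α = 10⁻⁴, and β = 0.75.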
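Here is a minimal NumPy sketch of the formula above, assuming channels-last float arrays and AlexNet's hyperparameters as defaults (a `depth_radius` of 2 gives a window of n = 5 maps). `local_response_norm` is a hypothetical helper written for illustration; TensorFlow provides the same computation as `tf.nn.local_response_normalization`, with `depth_radius`, `bias`, `alpha`, and `beta` arguments but different default values.

```python
import numpy as np


def local_response_norm(a, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75):
    """LRN across channels for a float array of shape (height, width, channels).

    b[..., i] = a[..., i] / (bias + alpha * sum_j a[..., j]**2) ** beta,
    where j runs over channels i - depth_radius .. i + depth_radius (clipped to valid range).
    """
    n_channels = a.shape[-1]
    squared = a ** 2
    out = np.empty_like(a)
    for i in range(n_channels):
        lo = max(0, i - depth_radius)
        hi = min(n_channels, i + depth_radius + 1)
        sq_sum = squared[..., lo:hi].sum(axis=-1)
        out[..., i] = a[..., i] / (bias + alpha * sq_sum) ** beta
    return out


rng = np.random.default_rng(0)
activations = rng.random((4, 4, 8))  # hypothetical feature maps (height, width, channels)
normalized = local_response_norm(activations)
print(normalized.shape)  # (4, 4, 8)

# Inhibition: a large activation in a neighbouring channel shrinks channel 0's output
quiet = np.array([[[1.0, 0.0, 0.0]]])
loud = np.array([[[1.0, 50.0, 0.0]]])
print(local_response_norm(quiet)[0, 0, 0] > local_response_norm(loud)[0, 0, 0])  # True
```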

Here are some scenarios in which one may want to add a local response normalization layer to a CNN:

Large-scale image classification: LRN was introduced with AlexNet, trained on ImageNet, where the authors reported that it reduced the top-1 and top-5 error rates by 1.4% and 1.2%, respectively.

Lower convolutional layers: AlexNet and GoogLeNet apply LRN after early convolutional layers, where many feature maps can respond to the same local stimulus. LRN lets the strongest responses suppress their neighbors, so that feature maps specialize in different patterns.

Networks with multiple convolutional layers: When stacking several convolutional layers, the competition among feature maps introduced by LRN can encourage them to learn more diverse features, which may help the network generalize.

Overall, LRN can be worth trying when you want to promote competition among feature maps. However, later work found its benefits to be small: the VGG authors reported that LRN did not improve performance on ILSVRC while increasing memory use and computation time, and modern architectures generally use batch normalization instead. The decision to add an LRN layer therefore depends on the specific model and data, and should be validated experimentally.

#### Q6. Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?

**Ans.** 
AlexNet:

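* AlexNet was one of the first deep convolutional neural networks to succeed on large-scale image classification; it won the ILSVRC 2012 challenge by a wide margin.
* It was much larger and deeper than LeNet-5, with 5 convolutional layers and 3 fully connected layers (about 60 million parameters), and it stacked convolutional layers directly on top of each other instead of placing a pooling layer after each one.
* AlexNet used the Rectified Linear Unit (ReLU) activation function instead of the saturating tanh activation used in LeNet-5, which sped up training.
* It used data augmentation techniques such as random cropping and horizontal flipping to artificially enlarge the training set and reduce overfitting.
* AlexNet used dropout regularization in its fully connected layers to further reduce overfitting.
* It applied local response normalization after some of its convolutional layers.
* The network was split across two GPUs that trained it in parallel, which made training such a large model feasible at the time.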

GoogLeNet:

* GoogLeNet introduced Inception modules, which apply several filter sizes (1x1, 3x3, 5x5) and a pooling path in parallel and concatenate their outputs, letting the network capture patterns at multiple scales within a single layer.
* It was 22 layers deep and won the ILSVRC 2014 classification challenge with a top-5 error rate of about 6.7%.
* GoogLeNet used global average pooling instead of a stack of large fully connected layers at the top, which greatly reduced the number of parameters (its authors report about 12 times fewer than AlexNet).
* It used 1x1 convolutional layers as bottlenecks to reduce the number of feature maps (the depth) before applying the more computationally expensive 3x3 and 5x5 convolutions.
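To see why the 1x1 bottleneck helps, the following sketch counts the weights of a 5 × 5 convolution applied directly versus after a 1 × 1 reduction. The channel counts (256 in, 32 after the reduction, 64 out) are illustrative assumptions, not GoogLeNet's exact configuration, and biases are ignored.

```python
def conv_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    # Weights of a kernel x kernel convolution (biases ignored for simplicity)
    return kernel * kernel * in_channels * out_channels


# Direct 5x5 convolution from 256 to 64 feature maps
direct = conv_weights(5, 256, 64)

# 1x1 bottleneck down to 32 feature maps, then the 5x5 convolution
bottleneck = conv_weights(1, 256, 32) + conv_weights(5, 32, 64)

print(direct, bottleneck, round(direct / bottleneck, 1))
```

Because the number of multiply-adds per output position is proportional to the number of weights, the bottleneck reduces computation by the same factor (roughly 7× here).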

ResNet:

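* ResNet introduced residual blocks, in which a skip connection adds a block's input to its output, so the block only has to learn a residual. This made it possible to train very deep networks while largely avoiding vanishing gradients and the accuracy degradation seen in very deep plain networks.
* Its ImageNet versions had up to 152 layers, and it won the ILSVRC 2015 classification challenge with a top-5 error rate of about 3.6%.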
* ResNet used skip connections to connect earlier layers directly to later layers, which helped to propagate gradients more effectively during training.
* It also used batch normalization to accelerate training and improve generalization.

SENet:

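* SENet introduced Squeeze-and-Excitation (SE) blocks, which let the network adaptively recalibrate the importance of each feature map (channel) based on the global context of the image.
* It won the ILSVRC 2017 classification challenge with a top-5 error rate of about 2.25%.
* Inside each SE block, global average pooling "squeezes" every feature map into a single number; two small dense layers (a bottleneck with ReLU, then a sigmoid output) turn these numbers into one scale between 0 and 1 per feature map, and the feature maps are multiplied by these scales ("excitation").
* SE blocks are added to existing architectures, such as Inception modules or residual units (SE-Inception, SE-ResNet), so SENet keeps the residual connections of the networks it extends, which help propagate gradients during training.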
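A minimal NumPy sketch of the squeeze-and-excitation computation for a single image, assuming channels-last feature maps, randomly initialized weights (in a real network they are learned), and the reduction ratio r = 16 used in the SENet paper. `se_block` is a hypothetical helper, not a library function.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze-and-excitation on an array of shape (height, width, channels)."""
    # Squeeze: global average pooling gives one descriptor per channel
    squeezed = feature_maps.mean(axis=(0, 1))          # (channels,)
    # Excitation: bottleneck dense layer with ReLU, then dense layer with sigmoid
    hidden = np.maximum(0.0, squeezed @ w1 + b1)       # (channels // r,)
    scales = sigmoid(hidden @ w2 + b2)                 # (channels,), each in (0, 1)
    # Recalibration: multiply every feature map by its own scale
    return feature_maps * scales, scales


rng = np.random.default_rng(0)
channels, reduction = 64, 16  # assumed sizes; r = 16 as in the SENet paper
x = rng.standard_normal((8, 8, channels))
w1 = rng.standard_normal((channels, channels // reduction)) * 0.1
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels // reduction, channels)) * 0.1
b2 = np.zeros(channels)

y, scales = se_block(x, w1, b1, w2, b2)
print(y.shape, scales.min(), scales.max())
```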

Xception:

* Xception (short for "Extreme Inception") replaced Inception modules entirely with depthwise separable convolutions, which separate spatial filtering (one filter per input channel) from cross-channel mixing (1x1 convolutions).
* It has roughly the same number of parameters as Inception-v3 but outperformed it on ImageNet, because it uses those parameters more efficiently.
* Xception used residual connections and batch normalization to accelerate training and improve generalization.
* It also used global average pooling instead of fully connected layers to reduce the number of parameters in the network.
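The parameter savings of a depthwise separable convolution (available in Keras as `SeparableConv2D`) can be checked with a quick count. The 3 × 3 kernel and 256 input/output channels are assumptions for illustration; biases are ignored.

```python
def regular_conv_params(k: int, c_in: int, c_out: int) -> int:
    # Every output channel has a k x k filter spanning all input channels
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


regular = regular_conv_params(3, 256, 256)
separable = depthwise_separable_params(3, 256, 256)
print(regular, separable, round(regular / separable, 1))
```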

These are just a few of the innovations introduced in each of these models. There have been many other contributions to the field of deep learning for computer vision, each building upon the successes of the previous models.

#### Q7. What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?

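**Ans.** A fully convolutional network (FCN) is a network composed only of convolutional (and pooling) layers, with no dense layers. Because it never flattens its input, it can process images of any size, and it outputs a spatial map of predictions instead of a single vector.

To convert a dense layer into a convolutional layer, replace it with a convolutional layer that has as many filters as the dense layer has units, a kernel size equal to the size of the feature maps it receives, and "valid" padding. For example, a dense layer with 10 units sitting on top of 7 × 7 × 4 feature maps becomes a convolutional layer with 10 filters of size 7 × 7 and valid padding; it outputs 1 × 1 × 10 values and can reuse the dense layer's weights, reshaped. Any dense layers after that one become 1 × 1 convolutions. For a pretrained network such as VGG16, this means first removing the last four layers (`Flatten` and the three dense layers), for example by popping them off the model one at a time, and then adding their equivalent convolutional layers.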
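The following NumPy sketch checks this equivalence numerically for the 7 × 7 × 4 example, assuming channels-last arrays and row-major flattening (the order NumPy's `reshape` uses). `conv2d_valid` is a hypothetical, deliberately naive helper.

```python
import numpy as np


def conv2d_valid(x, kernel, bias):
    """'Valid' cross-correlation, which deep learning frameworks call convolution.

    x: (H, W, C_in), kernel: (kh, kw, C_in, C_out), bias: (C_out,)
    """
    kh, kw, _, c_out = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out


rng = np.random.default_rng(42)
h, w, c, units = 7, 7, 4, 10  # assumed: a 10-unit dense layer on a 7x7x4 feature map
feature_map = rng.standard_normal((h, w, c))
dense_w = rng.standard_normal((h * w * c, units))
dense_b = rng.standard_normal(units)

dense_out = feature_map.reshape(-1) @ dense_w + dense_b

# Same weights, reshaped into `units` filters of size 7x7x4, with "valid" padding
conv_kernel = dense_w.reshape(h, w, c, units)
conv_out = conv2d_valid(feature_map, conv_kernel, dense_b)  # shape (1, 1, 10)
print(np.allclose(conv_out[0, 0], dense_out))  # True

# A larger input now yields a spatial map of predictions instead of an error
bigger = rng.standard_normal((9, 9, c))
print(conv2d_valid(bigger, conv_kernel, dense_b).shape)  # (3, 3, 10)
```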

#### Q8. What is the main technical difficulty of semantic segmentation?

**Ans.** In semantic segmentation, every pixel is classified according to the class of the object it belongs to (road, car, pedestrian, and so on). Unlike classification (one label per image) or classification with localization (a label plus a bounding box), the output is a segmentation map with the same height and width as the input image, in which each pixel holds a class label (0, 1, 2, … n). It can therefore be thought of as classification at the pixel level.

The main technical difficulty is that a regular CNN progressively loses spatial resolution as the image flows through it, because of its pooling layers and its layers with strides greater than 1. Near the top, the network may know that there is a person somewhere in the lower left of the image, but it cannot say precisely which pixels belong to that person. Segmentation models must therefore recover the lost resolution, for example by upsampling the coarse predictions (with transposed convolutions) and adding skip connections from lower, higher-resolution layers, as fully convolutional networks do.
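To illustrate the resolution problem, this sketch takes a hypothetical 7 × 7 map of class scores (what a CNN with a total stride of 32 would produce for a 224 × 224 image) and naively upsamples it back to full size. The random scores stand in for real network outputs, and the three classes are assumed; real segmentation models use learned upsampling (such as transposed convolutions) and skip connections instead of `np.repeat`.

```python
import numpy as np

IMG_SIZE = 224
N_CLASSES = 3      # assumed, e.g. background / person / car
TOTAL_STRIDE = 32  # assumed: five stride-2 stages, as in many classification CNNs

rng = np.random.default_rng(0)
coarse_size = IMG_SIZE // TOTAL_STRIDE  # 7
# Stand-in for the class scores a CNN would output at its lowest resolution
coarse_scores = rng.standard_normal((coarse_size, coarse_size, N_CLASSES))
coarse_labels = coarse_scores.argmax(axis=-1)  # (7, 7) class map

# Naive nearest-neighbour upsampling back to the input resolution
mask = np.repeat(np.repeat(coarse_labels, TOTAL_STRIDE, axis=0), TOTAL_STRIDE, axis=1)
print(mask.shape)  # (224, 224)

# Every 32x32 block shares a single label, so object boundaries can only fall
# on a 32-pixel grid: this is the resolution loss segmentation models must undo
```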


#### Q9. Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

**Ans.** 


```python
import tensorflow as tf
from tensorflow.keras import layers

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the pixel values to the range [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add a channel dimension to the images
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

# Define the model architecture
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```


```python
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)
```

```
313/313 - 2s - loss: 0.0233 - accuracy: 0.9929 - 2s/epoch - 8ms/step

Test accuracy: 0.992900013923645
```


#### Q10. Use transfer learning for large image classification, going through these steps:

* **a.** Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).
* **b.** Split it into a training set, a validation set, and a test set.
* **c.** Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.
* **d.** Fine-tune a pretrained model on this dataset.

**Ans.** 


```
!pip install tensorflow-datasets
```



```python
# a. Load a labelled dataset (cats_vs_dogs from TensorFlow Datasets) and train a quick baseline

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

(train_ds, val_ds, test_ds), info = tfds.load('cats_vs_dogs',
                                             split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
                                             with_info=True,
                                             as_supervised=True)

IMG_SIZE = 224
BATCH_SIZE = 32

def preprocess_image(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # InceptionV3 expects pixel values scaled to [-1, 1], not [0, 1]
    image = preprocess_input(tf.cast(image, tf.float32))
    return image, label

def augment_image(image, label):
    # Augment the training set only; validation and test images stay untouched
    image = tf.image.random_flip_left_right(image)
    return image, label

# Shuffle the raw examples before decoding them into large float tensors
train_ds = train_ds.shuffle(1000).map(preprocess_image).map(augment_image).batch(BATCH_SIZE)
val_ds = val_ds.map(preprocess_image).batch(BATCH_SIZE)
test_ds = test_ds.map(preprocess_image).batch(BATCH_SIZE)

base_model = InceptionV3(include_top=False, weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))

# Freeze the pretrained layers so that only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base_model.output)
x = Dense(2, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=x)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = model.fit(train_ds, epochs=5, validation_data=val_ds)

test_loss, test_acc = model.evaluate(test_ds)
print('Test accuracy:', test_acc)
```

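Running this cell produced the following error, which usually indicates mismatched `tensorflow` and `keras` package versions in the environment rather than a bug in the cell itself:

```
AssertionError: Duplicate registrations for type 'experimentalOptimizer'
```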

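```python
# b. Split the data: 80% training, 10% validation, 10% test
# (cats_vs_dogs only ships a 'train' split, so it is sliced by percentage)
import tensorflow_datasets as tfds

(train_ds, val_ds, test_ds), info = tfds.load('cats_vs_dogs',
                                             split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
                                             with_info=True,
                                             as_supervised=True)
```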


```python
# c. Build the input pipeline: preprocessing for every split, augmentation for training only

import tensorflow as tf
import tensorflow_datasets as tfds

# Load the dataset
(train_ds, val_ds, test_ds), info = tfds.load('cats_vs_dogs',
                                             split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
                                             with_info=True,
                                             as_supervised=True)

# Define preprocessing functions
IMG_SIZE = 224
NUM_CLASSES = 2

def preprocess_image(image, label):
    # Resize the image to the input size of the model
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # Convert the pixel values to the range [0, 1]
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

def augment_image(image, label):
    # Randomly flip the image horizontally
    image = tf.image.random_flip_left_right(image)
    # Randomly adjust the brightness of the image
    image = tf.image.random_brightness(image, max_delta=0.1)
    # Keep pixel values in [0, 1] after the brightness shift
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label

# Create the training dataset
train_ds = train_ds.shuffle(10000)
train_ds = train_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.map(augment_image, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.batch(batch_size=32)
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)

# Create the validation dataset
val_ds = val_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.batch(batch_size=32)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)

# Create the test dataset
test_ds = test_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
test_ds = test_ds.batch(batch_size=32)
test_ds = test_ds.prefetch(tf.data.AUTOTUNE)
```

```python
# d

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.applications import MobileNetV2

# Load the dataset
(train_ds, val_ds, test_ds), info = tfds.load('cats_vs_dogs',
                                             split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
                                             with_info=True,
                                             as_supervised=True)

# Define preprocessing functions
IMG_SIZE = 224
NUM_CLASSES = 2

def preprocess_image(image, label):
    # Resize the image to the input size of the model
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # Convert the pixel values to the range [-1, 1]
    image = tf.cast(image, tf.float32) / 127.5 - 1.0
    return image, label

# Apply preprocessing to the datasets
train_ds = train_ds.map(preprocess_image).shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.map(preprocess_image).batch(32).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.map(preprocess_image).batch(32)

# Load the pre-trained MobileNetV2 model
base_model = MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False, weights='imagenet')

# Freeze the base model's layers
for layer in base_model.layers:
    layer.trainable = False

# Add a new classification layer on top of the base model
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
output_layer = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(global_average_layer)

# Create the fine-tuned model
model = tf.keras.models.Model(inputs=base_model.input, outputs=output_layer)

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model on the training set
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
```

```python
# Evaluate the model on the test set
loss, accuracy = model.evaluate(test_ds)
print(f'Test loss: {loss}, Test accuracy: {accuracy}')
```
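The MobileNetV2 cell above trains only the new classification head while the whole pretrained base stays frozen, which is feature extraction. A common second stage is actual fine-tuning: unfreeze the top layers of the base model and continue training with a much lower learning rate. The sketch below shows both stages on random stand-in data so that it runs on its own. `weights=None`, the 96 × 96 input size, unfreezing the last 30 layers, and the learning rate of 1e-5 are assumptions; in practice you would use `weights='imagenet'` and the `train_ds`/`val_ds` pipelines above. It also calls the base model with `training=False`, following the pattern in the Keras transfer-learning guide, so that BatchNormalization layers stay in inference mode after unfreezing.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 96  # small size so the sketch runs quickly; the cells above use 224
NUM_CLASSES = 2

# weights=None avoids a download here; use weights="imagenet" for real transfer learning
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False, weights=None)
base_model.trainable = False  # stage 1: train only the new head

inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
# training=False keeps BatchNormalization layers in inference mode,
# even after the base model is unfrozen in stage 2
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Hypothetical stand-in data: random images in [-1, 1] and random labels
images = np.random.uniform(-1.0, 1.0, size=(8, IMG_SIZE, IMG_SIZE, 3)).astype("float32")
labels = np.random.randint(0, NUM_CLASSES, size=(8,))

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(images, labels, epochs=1, verbose=0)

# Stage 2: unfreeze only the top of the base model
base_model.trainable = True
n_frozen = len(base_model.layers) - 30  # assumption: fine-tune the last 30 layers
for layer in base_model.layers[:n_frozen]:
    layer.trainable = False

# Recompile (required after changing `trainable`) with a much lower learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(images, labels, epochs=1, verbose=0)
```

The low learning rate in stage 2 limits how far the pretrained weights move, which reduces the risk of destroying the features learned on ImageNet.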