# Assignment_6

## 1. What are the advantages of a CNN over a fully connected DNN for image classification?

These are the main advantages of a CNN over a fully connected
DNN for image classification:

Because consecutive layers are only partially connected
and because it heavily reuses its weights, a CNN has
many fewer parameters than a fully connected DNN,
which makes it much faster to train, reduces the risk of
overfitting, and requires much less training data.

When a CNN has learned a kernel that can detect a
particular feature, it can detect that feature anywhere in
the image. In contrast, when a DNN learns a feature in
one location, it can detect it only in that particular
location. Since images typically have very repetitive
features, CNNs are able to generalize much better than
DNNs for image processing tasks such as classification,
using fewer training examples.

Finally, a DNN has no prior knowledge of how pixels are
organized; it does not know that nearby pixels are close.
A CNN’s architecture embeds this prior knowledge.
Lower layers typically identify features in small areas of
the images, while higher layers combine the lower-level
features into larger features. This works well with most
natural images, giving CNNs a decisive head start
compared to DNNs.
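To make the first advantage concrete, here is a rough sketch comparing the parameter count of one fully connected layer with one convolutional layer. The sizes (a 200 × 300 RGB input, 100 units or feature maps, 3 × 3 kernels) are assumptions chosen only for illustration.

```python
# Hypothetical sizes, chosen only for illustration.
height, width, channels = 200, 300, 3
n_units = 100  # number of dense neurons, or of convolutional feature maps

# Fully connected layer: every neuron sees every input value, plus one bias each.
dense_params = (height * width * channels + 1) * n_units

# Convolutional layer: each feature map reuses one 3x3 kernel at every position.
kernel_size = 3
conv_params = (kernel_size * kernel_size * channels + 1) * n_units

print(dense_params)  # 18000100
print(conv_params)   # 2800
```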

## 2. Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels. What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?


Let’s compute how many parameters the CNN has. Since its first
convolutional layer has 3 × 3 kernels, and the input has three
channels (red, green, and blue), each feature map has 3 × 3 × 3
weights, plus a bias term. That’s 28 parameters per feature map.
Since this first convolutional layer has 100 feature maps, it has a
total of 2,800 parameters. The second convolutional layer has 3 ×
3 kernels and its input is the set of 100 feature maps of the
previous layer, so each feature map has 3 × 3 × 100 = 900
weights, plus a bias term. Since it has 200 feature maps, this layer
has 901 × 200 = 180,200 parameters. Finally, the third and last
convolutional layer also has 3 × 3 kernels, and its input is the set
of 200 feature maps of the previous layers, so each feature map
has 3 × 3 × 200 = 1,800 weights, plus a bias term. Since it has 400
feature maps, this layer has a total of 1,801 × 400 = 720,400
parameters. All in all, the CNN has 2,800 + 180,200 + 720,400 =
903,400 parameters.
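The arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper name `conv_params` is made up for this example.

```python
def conv_params(kernel_size, in_channels, n_filters):
    """Parameters of one conv layer: kernel weights plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters

# Input channels followed by the number of feature maps of each layer.
channels = [3, 100, 200, 400]

per_layer = [conv_params(3, c_in, c_out)
             for c_in, c_out in zip(channels, channels[1:])]
total_params = sum(per_layer)

print(per_layer)     # [2800, 180200, 720400]
print(total_params)  # 903400
```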

Now let’s compute how much RAM this neural network will
require (at least) when making a prediction for a single instance.
First let’s compute the feature map size for each layer. Since we
are using a stride of 2 and "same" padding, the horizontal and
vertical dimensions of the feature maps are divided by 2 at each
layer (rounding up if necessary). So, as the input channels are 200
× 300 pixels, the first layer’s feature maps are 100 × 150, the
second layer’s feature maps are 50 × 75, and the third layer’s
feature maps are 25 × 38. Since 32 bits is 4 bytes and the first
convolutional layer has 100 feature maps, this first layer takes up
4 × 100 × 150 × 100 = 6 million bytes (6 MB). The second layer
takes up 4 × 50 × 75 × 200 = 3 million bytes (3 MB). Finally, the
third layer takes up 4 × 25 × 38 × 400 = 1,520,000 bytes (about
1.5 MB). However, once a layer has been computed, the memory
occupied by the previous layer can be released, so if everything is
well optimized, only 6 + 3 = 9 million bytes (9 MB) of RAM will
be required (when the second layer has just been computed, but
the memory occupied by the first layer has not been released yet).
But wait, you also need to add the memory occupied by the CNN’s
parameters! We computed earlier that it has 903,400 parameters,
each using up 4 bytes, so this adds 3,613,600 bytes (about 3.6
MB). The total RAM required is therefore (at least) 12,613,600
bytes (about 12.6 MB).
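The same estimate can be reproduced in code. The sketch below assumes, as the text does, that only two consecutive layers need to be held in memory at once, and it ignores the input image and any framework overhead.

```python
import math

BYTES_PER_FLOAT = 4  # 32-bit floats
height, width = 200, 300
n_maps = [100, 200, 400]

# With a stride of 2 and "same" padding, each dimension is halved (rounding up).
layer_bytes = []
for n in n_maps:
    height, width = math.ceil(height / 2), math.ceil(width / 2)
    layer_bytes.append(BYTES_PER_FLOAT * height * width * n)

# Peak activation memory: the largest pair of consecutive layers alive together.
peak_activations = max(a + b for a, b in zip(layer_bytes, layer_bytes[1:]))
param_bytes = 903_400 * BYTES_PER_FLOAT
inference_bytes = peak_activations + param_bytes

print(layer_bytes)      # [6000000, 3000000, 1520000]
print(inference_bytes)  # 12613600
```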

Lastly, let’s compute the minimum amount of RAM required
when training the CNN on a mini-batch of 50 images. During
training TensorFlow uses backpropagation, which requires
keeping all values computed during the forward pass until the
reverse pass begins. So we must compute the total RAM required
by all layers for a single instance and multiply that by 50. At this
point, let’s start counting in megabytes rather than bytes. We
computed before that the three layers require respectively 6, 3,
and 1.5 MB for each instance. That’s a total of 10.5 MB per
instance, so for 50 instances the total RAM required is 525 MB.
Add to that the RAM required by the input images, which is 50 ×
4 × 200 × 300 × 3 = 36 million bytes (36 MB), plus the RAM
required for the model parameters, which is about 3.6 MB
(computed earlier), plus some RAM for the gradients (we will
neglect this since it can be released gradually as backpropagation
goes down the layers during the reverse pass). We are up to a total
of roughly 525 + 36 + 3.6 = 564.6 MB, and that’s really an
optimistic bare minimum.
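As a sanity check, the training estimate can be recomputed without rounding. Note that the exact figure comes out about 1 MB higher than 564.6 MB, because the text rounds the third layer down to 1.5 MB. Gradients are neglected here, as in the text.

```python
BYTES_PER_FLOAT = 4  # 32-bit floats
batch_size = 50

# Feature map memory per instance, taken from the single-instance computation.
layer_bytes = [4 * 100 * 150 * 100, 4 * 50 * 75 * 200, 4 * 25 * 38 * 400]

# Backpropagation keeps every layer's output for every instance in the batch.
activation_bytes = batch_size * sum(layer_bytes)
input_bytes = batch_size * BYTES_PER_FLOAT * 200 * 300 * 3
param_bytes = 903_400 * BYTES_PER_FLOAT

training_bytes = activation_bytes + input_bytes + param_bytes
print(training_bytes)  # 565613600
```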

## 3. If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

If your GPU runs out of memory while training a CNN, here are
five things you could try to solve the problem (other than
purchasing a GPU with more RAM):

Reduce the mini-batch size.

Reduce dimensionality using a larger stride in one or
more layers.

Remove one or more layers.

Use 16-bit floats instead of 32-bit floats.

Distribute the CNN across multiple devices.
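To get a feel for how much some of these options help, here is a rough estimator applied to the CNN from question 2. It counts only the feature maps kept for backpropagation, and the function name `activation_bytes` and its defaults are assumptions made for this sketch.

```python
import math

def activation_bytes(batch_size=50, bytes_per_float=4, stride=2,
                     n_maps=(100, 200, 400), height=200, width=300):
    """Rough activation memory for a stack of "same"-padded conv layers."""
    total = 0
    for n in n_maps:
        # "same" padding: each dimension is divided by the stride, rounding up.
        height, width = math.ceil(height / stride), math.ceil(width / stride)
        total += bytes_per_float * height * width * n
    return batch_size * total

baseline = activation_bytes()
half_batch = activation_bytes(batch_size=25)        # smaller mini-batch
half_precision = activation_bytes(bytes_per_float=2)  # 16-bit floats
larger_stride = activation_bytes(stride=3)          # more aggressive downsampling
fewer_layers = activation_bytes(n_maps=(100, 200))  # top layer removed

print(baseline)      # 526000000
print(half_batch)    # 263000000
print(fewer_layers)  # 450000000
```

Every option lowers the estimate, but changing the stride or removing layers also changes the model itself, so accuracy may be affected.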


## 4. Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

A max pooling layer has no parameters at all, whereas a
convolutional layer has quite a few (see question 2).
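A minimal pure-Python sketch of 2 × 2 max pooling shows why it needs no parameters: it only takes the maximum of each window. The function name, the tiny feature map, and the 100-channel comparison are assumptions made for illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a list-of-lists map (even dimensions assumed)."""
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, len(feature_map[0]), 2)]
        for r in range(0, len(feature_map), 2)
    ]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool_2x2(fmap)
print(pooled)  # [[4, 2], [2, 8]]

# For comparison: a 3x3 conv layer with stride 2 mapping 100 feature maps
# to 100 feature maps would have this many trainable parameters.
conv_params = (3 * 3 * 100 + 1) * 100
print(conv_params)  # 90100
```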

## 5. When would you want to add a local response normalization layer?

A local response normalization layer makes the neurons that most
strongly activate inhibit neurons at the same location but in
neighboring feature maps, which encourages different feature
maps to specialize and pushes them apart, forcing them to explore
a wider range of features. It is typically used in the lower layers to
have a larger pool of low-level features that the upper layers can
build upon.
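The inhibition effect can be illustrated with a small sketch operating on the activations of neighboring feature maps at a single location. The function name and the hyperparameter values below are illustrative assumptions, not the values used by any particular network.

```python
def local_response_norm(activations, depth_radius=1, bias=1.0, alpha=1.0, beta=0.5):
    """Normalize each activation by the squared activations of nearby feature maps."""
    n = len(activations)
    normalized = []
    for i, a in enumerate(activations):
        low, high = max(0, i - depth_radius), min(n - 1, i + depth_radius)
        squared_sum = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(a / (bias + alpha * squared_sum) ** beta)
    return normalized

# One spatial location, four feature maps; map 1 fires strongly.
acts = [0.1, 5.0, 0.2, 0.1]
normed = local_response_norm(acts)

# Map 0 sits next to the strong map 1, so it is inhibited far more than
# map 3, which has the same raw activation but only weak neighbors.
print(normed[0] < normed[3])  # True
```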


## 6. Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?

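Compared to LeNet-5, AlexNet is much larger and deeper, and it
stacks convolutional layers directly on top of each other instead
of placing a pooling layer after every convolutional layer.
GoogLeNet introduced inception modules, which make it possible
to build a much deeper net than earlier CNN architectures with
fewer parameters. ResNet introduced skip connections, which make
it possible to train networks well beyond 100 layers deep. SENet
adds an SE block (a small dense network) after every inception
module or residual unit to recalibrate the relative importance of
the feature maps. Finally, Xception relies on depthwise separable
convolutional layers, which look at spatial patterns and
cross-channel patterns separately.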


## 7. What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?

Fully convolutional networks are neural networks composed
exclusively of convolutional and pooling layers. FCNs can
efficiently process images of any width and height (at least above
the minimum size). They are most useful for object detection and
semantic segmentation because they only need to look at the
image once (instead of having to run a CNN multiple times on
different parts of the image). If you have a CNN with some dense
layers on top, you can convert these dense layers to convolutional
layers to create an FCN: just replace the lowest dense layer with a
convolutional layer with a kernel size equal to the layer’s input
size, with one filter per neuron in the dense layer, and using
"valid" padding. Generally the stride should be 1, but you can
set it to a higher value if you want. The activation function should
be the same as the dense layer’s. The other dense layers should be
converted the same way, but using 1 × 1 filters. It is actually
possible to convert a trained CNN this way by appropriately
reshaping the dense layers’ weight matrices.
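The weight reshaping can be sketched in pure Python on a tiny example. The sizes and values below are made up; the point is only that a dense layer applied to the flattened input gives the same outputs as a "valid" convolution whose kernel covers the whole input, once the dense weights are regrouped by (row, column, channel).

```python
H, W, C, N = 2, 2, 1, 3  # tiny hypothetical input size and number of neurons

image = [[[1.0], [2.0]],
         [[3.0], [4.0]]]  # image[row][col][channel]
flat = [v for row in image for pixel in row for v in pixel]  # row-major flatten

# Dense layer: weights[i][n] links flattened input i to neuron n.
weights = [[0.1 * (i + 1) * (n + 1) for n in range(N)] for i in range(H * W * C)]
biases = [0.5, -0.5, 0.0]
dense_out = [sum(flat[i] * weights[i][n] for i in range(len(flat))) + biases[n]
             for n in range(N)]

# Reshape the dense weights into one H x W x C kernel per neuron:
# kernels[r][c][ch][n] is the weight of flattened index (r * W + c) * C + ch.
kernels = [[[weights[(r * W + c) * C + ch] for ch in range(C)]
            for c in range(W)] for r in range(H)]

# "valid" convolution with a kernel as large as the input: a single 1x1 output
# position holding N values, one per filter.
conv_out = [sum(image[r][c][ch] * kernels[r][c][ch][n]
                for r in range(H) for c in range(W) for ch in range(C)) + biases[n]
            for n in range(N)]

print(all(abs(d - k) < 1e-9 for d, k in zip(dense_out, conv_out)))  # True
```

On a larger input image, the same kernels would simply slide over it and produce one output vector per position.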

## 8. What is the main technical difficulty of semantic segmentation?

The main technical difficulty of semantic segmentation is the fact
that a lot of the spatial information gets lost in a CNN as the
signal flows through each layer, especially in pooling layers and
layers with a stride greater than 1. This spatial information needs
to be restored somehow to accurately predict the class of each
pixel.

## 9. Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

In [1]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Is this notebook running on Colab or Kaggle?
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# TensorFlow ≥2.0 is required
import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= "2.0"

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")
    if IS_KAGGLE:
        print("Go to Settings > Accelerator and select GPU.")

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "cnn"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

No GPU was detected. CNNs can be very slow without a GPU.


In [2]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train_full = X_train_full / 255.
X_test = X_test / 255.
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

X_train = X_train[..., np.newaxis]
X_valid = X_valid[..., np.newaxis]
X_test = X_test[..., np.newaxis]


In [3]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam",
              metrics=["accuracy"])

model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)

[0.029122035950422287, 0.9919000267982483]

## 10. Use transfer learning for large image classification, going through these steps:

a.	Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).

In [13]:
import sys
import tarfile
import urllib.request

FLOWERS_URL = "http://download.tensorflow.org/example_images/flower_photos.tgz"
FLOWERS_PATH = os.path.join("datasets", "flowers")

def fetch_flowers(url=FLOWERS_URL, path=FLOWERS_PATH):
    # Skip the download if the target directory already exists
    if os.path.exists(path):
        return
    os.makedirs(path, exist_ok=True)
    tgz_path = os.path.join(path, "flower_photos.tgz")
    # Download the archive (no progress hook: none is defined in this notebook)
    urllib.request.urlretrieve(url, tgz_path)
    # Extract the images, then delete the archive
    with tarfile.open(tgz_path) as flowers_tgz:
        flowers_tgz.extractall(path=path)
    os.remove(tgz_path)

In [14]:
fetch_flowers()

In [None]:
flowers_root_path = os.path.join(FLOWERS_PATH, "flower_photos")
flower_classes = sorted([dirname for dirname in os.listdir(flowers_root_path)
                  if os.path.isdir(os.path.join(flowers_root_path, dirname))])
flower_classes

In [None]:
from collections import defaultdict

image_paths = defaultdict(list)

for flower_class in flower_classes:
    image_dir = os.path.join(flowers_root_path, flower_class)
    for filepath in os.listdir(image_dir):
        if filepath.endswith(".jpg"):
            image_paths[flower_class].append(os.path.join(image_dir, filepath))

In [None]:
for paths in image_paths.values():
    paths.sort()    

In [None]:
import matplotlib.image as mpimg

n_examples_per_class = 2

for flower_class in flower_classes:
    print("Class:", flower_class)
    plt.figure(figsize=(10,5))
    for index, example_image_path in enumerate(image_paths[flower_class][:n_examples_per_class]):
        example_image = mpimg.imread(example_image_path)[:, :, :3]  # keep the 3 RGB channels
        plt.subplot(100 + n_examples_per_class * 10 + index + 1)
        plt.title("{}x{}".format(example_image.shape[1], example_image.shape[0]))
        plt.imshow(example_image)
        plt.axis("off")
    plt.show()

b.	Split it into a training set, a validation set, and a test set.

In [None]:
import tensorflow_datasets as tfds

dataset, info = tfds.load("tf_flowers", as_supervised=True, with_info=True)

In [None]:
info.splits

In [None]:
info.splits["train"]

In [None]:
class_names = info.features["label"].names
class_names

In [None]:
n_classes = info.features["label"].num_classes

In [None]:
dataset_size = info.splits["train"].num_examples
dataset_size

In [None]:
test_set_raw, valid_set_raw, train_set_raw = tfds.load(
    "tf_flowers",
    split=["train[:10%]", "train[10%:25%]", "train[25%:]"],
    as_supervised=True)

In [None]:
plt.figure(figsize=(12, 10))
index = 0
for image, label in train_set_raw.take(9):
    index += 1
    plt.subplot(3, 3, index)
    plt.imshow(image)
    plt.title("Class: {}".format(class_names[label]))
    plt.axis("off")

plt.show()

c.	Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.

In [None]:
def preprocess(image, label):
    resized_image = tf.image.resize(image, [224, 224])
    final_image = keras.applications.xception.preprocess_input(resized_image)
    return final_image, label

In [None]:
def central_crop(image):
    shape = tf.shape(image)
    min_dim = tf.reduce_min([shape[0], shape[1]])
    top_crop = (shape[0] - min_dim) // 4
    bottom_crop = shape[0] - top_crop
    left_crop = (shape[1] - min_dim) // 4
    right_crop = shape[1] - left_crop
    return image[top_crop:bottom_crop, left_crop:right_crop]

def random_crop(image):
    shape = tf.shape(image)
    min_dim = tf.reduce_min([shape[0], shape[1]]) * 90 // 100
    return tf.image.random_crop(image, [min_dim, min_dim, 3])

def preprocess(image, label, randomize=False):
    if randomize:
        cropped_image = random_crop(image)
        cropped_image = tf.image.random_flip_left_right(cropped_image)
    else:
        cropped_image = central_crop(image)
    resized_image = tf.image.resize(cropped_image, [224, 224])
    final_image = keras.applications.xception.preprocess_input(resized_image)
    return final_image, label

from functools import partial  # needed for partial(preprocess, randomize=True) below

batch_size = 32
train_set = train_set_raw.shuffle(1000).repeat()
train_set = train_set.map(partial(preprocess, randomize=True)).batch(batch_size).prefetch(1)
valid_set = valid_set_raw.map(preprocess).batch(batch_size).prefetch(1)
test_set = test_set_raw.map(preprocess).batch(batch_size).prefetch(1)

In [None]:
plt.figure(figsize=(12, 12))
for X_batch, y_batch in train_set.take(1):
    for index in range(9):
        plt.subplot(3, 3, index + 1)
        plt.imshow(X_batch[index] / 2 + 0.5)
        plt.title("Class: {}".format(class_names[y_batch[index]]))
        plt.axis("off")

plt.show()

In [None]:
plt.figure(figsize=(12, 12))
for X_batch, y_batch in test_set.take(1):
    for index in range(9):
        plt.subplot(3, 3, index + 1)
        plt.imshow(X_batch[index] / 2 + 0.5)
        plt.title("Class: {}".format(class_names[y_batch[index]]))
        plt.axis("off")

plt.show()

d.	Fine-tune a pretrained model on this dataset.

In [None]:
base_model = keras.applications.xception.Xception(weights="imagenet",
                                                  include_top=False)
avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
output = keras.layers.Dense(n_classes, activation="softmax")(avg)
model = keras.models.Model(inputs=base_model.input, outputs=output)

In [None]:
for index, layer in enumerate(base_model.layers):
    print(index, layer.name)

In [None]:
for layer in base_model.layers:
    layer.trainable = False

optimizer = keras.optimizers.SGD(learning_rate=0.2, momentum=0.9, decay=0.01)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_set,
                    steps_per_epoch=int(0.75 * dataset_size / batch_size),
                    validation_data=valid_set,
                    validation_steps=int(0.15 * dataset_size / batch_size),
                    epochs=5)

In [None]:
for layer in base_model.layers:
    layer.trainable = True

optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9,
                                 nesterov=True, decay=0.001)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(train_set,
                    steps_per_epoch=int(0.75 * dataset_size / batch_size),
                    validation_data=valid_set,
                    validation_steps=int(0.15 * dataset_size / batch_size),
                    epochs=40)