### 1. What are the advantages of a CNN over a fully connected DNN for image classification?


The main advantage of a CNN compared to its predecessors is that it automatically detects the important features without any human supervision. For example, given many pictures of cats and dogs, it learns distinctive features for each class by itself. A CNN is also computationally efficient, because its small kernels are shared across the whole image.

Because consecutive layers are only partially connected and because it heavily reuses its weights, a CNN has far fewer parameters than a fully connected DNN, which makes it much faster to train, reduces the risk of overfitting, and requires much less training data.
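To make the parameter savings concrete, the sketch below counts the parameters of a fully connected layer and of a convolutional layer applied to the same input. The 200 × 300 RGB input, the 100 neurons, and the 100 filters of size 3 × 3 are illustrative assumptions; the two layers produce different outputs, so this compares parameter cost only.

```python
def dense_params(n_inputs, n_neurons):
    """Fully connected layer: one weight per (input, neuron) pair plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


def conv_params(kernel_size, in_channels, n_filters):
    """Convolutional layer: each filter has a kernel_size x kernel_size kernel per input
    channel plus one bias, and these weights are reused at every position of the image."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters


# Illustrative sizes (assumptions, not taken from a specific model)
height, width, channels = 200, 300, 3
fc = dense_params(height * width * channels, 100)  # 100 neurons on the flattened image
conv = conv_params(3, channels, 100)                # 100 filters of size 3x3

print(f"fully connected: {fc:,} parameters")  # fully connected: 18,000,100 parameters
print(f"convolutional:   {conv:,} parameters")  # convolutional:   2,800 parameters
```

The dense layer's parameter count grows with the image size, while the convolutional layer's count depends only on the kernel size and the number of channels and filters.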

A deep NN is simply a neural network with many layers; it can be a CNN or a plain multilayer perceptron. A CNN, or convolutional neural network, is a neural network built from convolutional layers and pooling layers.

The biggest advantage of a CNN is its low dependence on preprocessing, which reduces the human effort needed to engineer features. CNNs are also relatively easy to understand and fast to implement, and they are among the most accurate algorithms for image recognition.

CNNs are trained to identify and extract the best features from the images for the problem at hand; that is their main strength. The last layers of a CNN are usually fully connected, because fully connected layers work well as classifiers. So the two architectures are not really competing: a CNN typically places fully connected layers on top of its convolutional feature extractor.

### 2. Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and SAME padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels. What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?


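Parameters: every feature map has one 3 × 3 kernel for each input channel, plus one bias term.

- First layer: the input has 3 channels (RGB), so each feature map has 3 × 3 × 3 + 1 = 28 parameters, and its 100 feature maps have 2,800 parameters.
- Second layer: the input is the 100 feature maps of the first layer, so each feature map has 3 × 3 × 100 + 1 = 901 parameters, and its 200 feature maps have 901 × 200 = 180,200 parameters.
- Third layer: each feature map has 3 × 3 × 200 + 1 = 1,801 parameters, and its 400 feature maps have 1,801 × 400 = 720,400 parameters.

In total, the CNN has 2,800 + 180,200 + 720,400 = 903,400 parameters.

RAM for a single prediction: with a stride of 2 and SAME padding, each layer divides the height and width by 2 (rounding up), so the feature maps are 100 × 150, then 50 × 75, then 25 × 38. With 32-bit floats (4 bytes each):

- First layer: 4 × 100 × 150 × 100 = 6,000,000 bytes (6 MB)
- Second layer: 4 × 50 × 75 × 200 = 3,000,000 bytes (3 MB)
- Third layer: 4 × 25 × 38 × 400 = 1,520,000 bytes (about 1.5 MB)

Once a layer has been computed, the memory used by the previous layer can be released, so at most two consecutive layers need to be in memory: 6 + 3 = 9 MB. The parameters add 4 × 903,400 = 3,613,600 bytes (about 3.6 MB, or 3.4 MiB), so a single prediction needs at least about 12.6 MB.

RAM for training on a mini-batch of 50 images: backpropagation needs every value computed during the forward pass, so the outputs of all three layers must be kept for each instance: (6 + 3 + 1.52) MB × 50 = 526 MB. The input batch adds 50 × 4 × 200 × 300 × 3 = 36,000,000 bytes (36 MB) and the parameters about 3.6 MB, for a total of roughly 566 MB. This is an optimistic lower bound: it ignores the gradients and any optimizer state.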
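The arithmetic above can be checked with a short script. It assumes that SAME padding divides each spatial dimension by the stride (rounding up), that 1 MB = 10^6 bytes, and that gradients and optimizer state are ignored.

```python
import math


def conv_stack_stats(height, width, in_channels, filters_per_layer, kernel=3, stride=2):
    """Return per-layer parameter counts and output sizes (in floats) for a stack of
    conv layers with SAME padding (height and width divided by the stride, rounded up)."""
    params, output_floats = [], []
    h, w, c = height, width, in_channels
    for filters in filters_per_layer:
        params.append((kernel * kernel * c + 1) * filters)  # weights plus one bias per map
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        output_floats.append(h * w * filters)
        c = filters  # this layer's feature maps are the next layer's input channels
    return params, output_floats


BYTES = 4  # 32-bit floats
params, outputs = conv_stack_stats(200, 300, 3, [100, 200, 400])
total_params = sum(params)
param_bytes = BYTES * total_params
layer_bytes = [BYTES * n for n in outputs]

# Inference: a layer's output can be freed once the next layer is computed,
# so at most two consecutive layers are in memory at the same time.
peak_pair = max(a + b for a, b in zip(layer_bytes, layer_bytes[1:]))
inference_bytes = peak_pair + param_bytes

# Training: backprop keeps every layer's output (and the input) for each instance.
batch_size = 50
input_bytes = BYTES * 200 * 300 * 3
training_bytes = batch_size * (sum(layer_bytes) + input_bytes) + param_bytes

print(params, total_params)         # [2800, 180200, 720400] 903400
print(layer_bytes)                  # [6000000, 3000000, 1520000]
print(inference_bytes / 1e6, "MB")  # 12.6136 MB
print(training_bytes / 1e6, "MB")   # 565.6136 MB
```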


### 3. What are five things you might do to fix the problem if your GPU runs out of memory while training a CNN?


1. Reduce the mini-batch size.
2. Reduce dimensionality by using a larger stride in one or more layers.
3. Remove one or more layers.
4. Use 16-bit floats instead of 32-bit floats.
5. Distribute the CNN across multiple devices.
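To get a feel for how much several of these remedies help (a smaller mini-batch, a larger stride, fewer layers, and 16-bit floats), the sketch below reuses the network from question 2 as a running example (an assumption; your own network will differ) and estimates only the memory taken by the layer outputs kept for backpropagation. For simplicity the larger stride is applied to every layer.

```python
import math


def activation_bytes(batch_size, height, width, filters_per_layer, stride=2, bytes_per_value=4):
    """Bytes needed to keep every conv layer's output for backprop, assuming SAME padding
    (each layer divides height and width by the stride, rounding up)."""
    per_instance = 0
    h, w = height, width
    for filters in filters_per_layer:
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        per_instance += h * w * filters * bytes_per_value
    return batch_size * per_instance


layers = [100, 200, 400]
scenarios = {
    "baseline (batch 50, stride 2, float32)": activation_bytes(50, 200, 300, layers),
    "smaller mini-batch (25)": activation_bytes(25, 200, 300, layers),
    "larger stride (3)": activation_bytes(50, 200, 300, layers, stride=3),
    "top layer removed": activation_bytes(50, 200, 300, layers[:2]),
    "16-bit floats": activation_bytes(50, 200, 300, layers, bytes_per_value=2),
}
for name, n_bytes in scenarios.items():
    print(f"{name}: {n_bytes / 1e6:.1f} MB")
```

Removing layers or increasing the stride changes the model itself and may cost accuracy, so they are usually tried after the cheaper options.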

### 4. Why would you use a max pooling layer instead of a convolutional layer with the same stride?


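A max pooling layer has no parameters at all, whereas a convolutional layer with the same stride has quite a few (one kernel per input channel for every output feature map, plus biases). Both downsample the feature maps by the same factor, but pooling does it without adding weights to train, which shortens training and helps combat overfitting. Pooling layers also downsample each feature map independently: they reduce the height and width but keep the depth (the number of feature maps) intact.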
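A minimal pure-Python sketch makes the difference visible: 2 × 2 max pooling is a fixed rule (take the maximum of each window) with nothing to learn, while a 3 × 3 convolutional layer needs a kernel per input channel for every output feature map. The 4 × 4 feature map and the 64-channel layer size are made-up examples.

```python
def max_pool_2d(feature_map, pool=2, stride=2):
    """Max pooling on a single 2-D feature map (list of lists), VALID padding.
    There are no weights: each output is just the maximum of one window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i + di][j + dj] for di in range(pool) for dj in range(pool))
         for j in range(0, w - pool + 1, stride)]
        for i in range(0, h - pool + 1, stride)
    ]


def conv_params(kernel_size, in_channels, out_channels):
    """Parameters of a convolutional layer: kernels plus one bias per output feature map."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


fm = [[1, 3, 2, 4],
      [5, 6, 1, 0],
      [7, 2, 9, 8],
      [0, 1, 3, 4]]
print(max_pool_2d(fm))         # [[6, 4], [7, 9]]
print(conv_params(3, 64, 64))  # 36928 parameters for a 3x3 conv, 64 -> 64 channels
```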

### 5. When would a local response normalization layer be useful?


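Local Response Normalization (LRN) was introduced in the AlexNet architecture, which used the ReLU activation function instead of the tanh and sigmoid functions that were common at the time. Because ReLU outputs are unbounded, LRN helps keep them in check. Its main purpose, however, is to encourage lateral inhibition: the neurons that activate most strongly inhibit neurons at the same location in neighboring feature maps. This pushes different feature maps to specialize and explore a wider range of features, so an LRN layer is typically useful in the lower layers, where it gives the upper layers a larger pool of low-level features to build on.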
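For the activation a_i of channel i at a given position, LRN computes b_i = a_i / (k + α · Σ_j a_j²)^β, where the sum runs over the channels within depth_radius of i. The sketch below implements that formula for the channel vector at one pixel position; the defaults are the values reported for AlexNet (k = 2, a window of 5 channels, α = 10⁻⁴, β = 0.75). TensorFlow offers the same operation as tf.nn.local_response_normalization, with parameters depth_radius, bias, alpha, and beta.

```python
def local_response_norm(channels, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75):
    """Normalize the activations of all channels at one spatial position.

    Each activation is divided by (bias + alpha * sum of squares of the activations
    within depth_radius channels on either side) ** beta, so strongly activated
    neighbors suppress it (lateral inhibition)."""
    n = len(channels)
    out = []
    for i, a_i in enumerate(channels):
        lo, hi = max(0, i - depth_radius), min(n - 1, i + depth_radius)
        sq_sum = sum(a * a for a in channels[lo:hi + 1])
        out.append(a_i / (bias + alpha * sq_sum) ** beta)
    return out


# The same activation of 1.0 is suppressed more when its neighbors are strongly active
quiet = local_response_norm([0.0, 1.0, 0.0], depth_radius=1, bias=1.0, alpha=1.0, beta=0.5)[1]
loud = local_response_norm([10.0, 1.0, 10.0], depth_radius=1, bias=1.0, alpha=1.0, beta=0.5)[1]
print(round(quiet, 4), round(loud, 4))  # 0.7071 0.0704
```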

### 6. What are the main innovations in AlexNet, compared to LeNet-5? What are the main innovations in GoogLeNet and ResNet?


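The main innovation of AlexNet compared to LeNet-5 is its sheer size: it is much larger and deeper. Its main building blocks are the same (a sequence of convolutional and pooling layers followed by a couple of fully connected layers), but AlexNet stacks convolutional layers directly on top of each other instead of placing a pooling layer after every convolutional layer. It also relied on ReLU activations, dropout, and data augmentation.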

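The main novelty in GoogLeNet is the inception module, which runs 1 × 1, 3 × 3, and 5 × 5 convolutions and a max pooling layer in parallel on the same input and concatenates their outputs along the depth dimension. The 1 × 1 convolutions act as bottleneck layers that reduce the number of channels, which lets GoogLeNet be much deeper than earlier CNNs while having about 10 times fewer parameters than AlexNet (roughly 6 million instead of 60 million).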
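The structure of an inception module can be sketched with the Keras functional API. The function and argument names are illustrative; the filter counts in the example are those reported for GoogLeNet's first inception module (often called 3a), which turns a 28 × 28 × 192 input into 256 output channels.

```python
import tensorflow as tf

layers = tf.keras.layers


def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """Four parallel branches on the same input, concatenated along the depth axis.
    The 1x1 "reduce" convolutions shrink the depth before the costlier 3x3 and 5x5 ones."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)

    b2 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b2)

    b3 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b3)

    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(b4)

    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])


inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
model = tf.keras.Model(inputs, outputs)
print(model(tf.zeros((1, 28, 28, 192))).shape)  # (1, 28, 28, 256)
```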

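The core innovation of ResNet is the identity shortcut connection (skip connection): the input of a block is added to the block's output, skipping one or more layers. Each block therefore only needs to learn the residual F(x) = H(x) - x rather than the full mapping H(x), which makes it possible to train networks with well over 100 layers.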
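A residual unit with an identity shortcut can be sketched in Keras as below. The two 3 × 3 convolutions with batch normalization follow the common ResNet pattern, but the function itself is an illustrative assumption; it also assumes the input already has `filters` channels, so the addition is shape-compatible (ResNet uses a 1 × 1 convolution on the shortcut when it is not).

```python
import tensorflow as tf

layers = tf.keras.layers


def residual_unit(x, filters):
    """Two 3x3 conv layers whose output F(x) is added to the unchanged input x."""
    shortcut = x  # identity shortcut: skips both conv layers
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])  # output = F(x) + x
    return layers.Activation("relu")(y)


inputs = tf.keras.Input(shape=(32, 32, 64))
x = inputs
for _ in range(3):  # stacking units keeps the shape unchanged
    x = residual_unit(x, 64)
model = tf.keras.Model(inputs, x)
print(model(tf.zeros((1, 32, 32, 64))).shape)  # (1, 32, 32, 64)
```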

### 7. Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

```python
# Example of loading the MNIST dataset
from tensorflow.keras.datasets import mnist
from matplotlib import pyplot as plt

# Load the dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# Summarize the loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# Plot the first few images
for i in range(9):
    # Define a subplot in a 3x3 grid
    plt.subplot(330 + 1 + i)
    # Plot the raw pixel data
    plt.imshow(trainX[i], cmap=plt.get_cmap('gray'))
# Show the figure
plt.show()
```

Output:

```
Train: X=(60000, 28, 28), y=(60000,)
Test: X=(10000, 28, 28), y=(10000,)
```
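The cell above only loads and plots the data. A small CNN for the task might look like the following sketch. The architecture (two convolutional layers, max pooling, dropout, and a dense classifier), the Nadam optimizer, and the number of epochs are assumptions chosen for illustration, not a tuned solution.

```python
import tensorflow as tf


def build_mnist_cnn():
    """A small CNN for 28x28 grayscale digits with 10 output classes."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),  # 28x28 -> 14x14
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam",
                  metrics=["accuracy"])
    return model


def train_and_evaluate(epochs=10):
    """Load MNIST, scale pixels to [0, 1], train, and return [test_loss, test_accuracy]."""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
    X_train = X_train[..., None].astype("float32") / 255.0
    X_test = X_test[..., None].astype("float32") / 255.0
    model = build_mnist_cnn()
    model.fit(X_train, y_train, epochs=epochs, validation_split=0.1)
    return model.evaluate(X_test, y_test)
```

Calling `train_and_evaluate()` downloads MNIST, trains the model, and returns the test loss and accuracy; the validation split lets you compare variants of the architecture before touching the test set.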

### 8. Classifying large images using Inception v3.

a. Download images of various animals. Load them in Python using, for example, the `matplotlib.image.imread()` or `imageio.imread()` function (`scipy.misc.imread()` has been removed from recent SciPy versions). Resize and/or crop them to 299 × 299 pixels, and make sure they have only three channels (RGB) and no transparency channel. The images used to train the Inception model were preprocessed to have values ranging from -1.0 to 1.0, so make sure yours do as well.
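The notebook's `load_images` function resizes each image directly to 299 × 299, which can distort its aspect ratio. As an alternative, a sketch that center-crops before resizing might look like this; the function name `prepare_for_inception` is hypothetical, and it relies on Pillow's `ImageOps.fit` for the crop and resize.

```python
import numpy as np
from PIL import Image, ImageOps


def prepare_for_inception(img, size=299):
    """Return a (size, size, 3) float32 array in [-1, 1] from a PIL image.

    convert("RGB") turns grayscale images into three channels and discards the alpha
    channel (transparent pixels keep their stored RGB values; they are not blended).
    ImageOps.fit crops the center to the target aspect ratio, then resizes, so nothing
    is stretched.
    """
    rgb = img.convert("RGB")
    square = ImageOps.fit(rgb, (size, size))
    pixels = np.asarray(square, dtype=np.float32) / 255.0  # [0, 1]
    return pixels * 2.0 - 1.0                              # [-1, 1]


# Example with a synthetic, semi-transparent 640x480 image
example = Image.new("RGBA", (640, 480), (255, 0, 0, 128))
x = prepare_for_inception(example)
print(x.shape, x.dtype)  # (299, 299, 3) float32
```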


```python
# Imports (tf-slim provides the Inception v3 definition used with the TF1 checkpoint)
!pip install tf-slim
import os
import numpy as np
from PIL import Image
from imageio import imread
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import tf_slim as slim
from tf_slim.nets import inception
import matplotlib.pyplot as plt
```

```python
# Paths and settings
ckpt_path = "/kaggle/input/inception_v3.ckpt"
images_path = "/kaggle/input/animals/*"
img_width = 299
img_height = 299
batch_size = 16
batch_shape = [batch_size, img_height, img_width, 3]
num_classes = 1001  # 1,000 ImageNet classes plus a "background" class at index 0
predict_output = []
class_names_path = "/kaggle/input/imagenet_class_names.txt"
with open(class_names_path) as f:
    class_names = f.readlines()
```

```python
# Create the Inception v3 model
X = tf.placeholder(tf.float32, shape=batch_shape)

with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits, end_points = inception.inception_v3(
        X, num_classes=num_classes, is_training=False
    )

predictions = end_points["Predictions"]
saver = tf.train.Saver(slim.get_model_variables())
```
The following function loads the images in RGB mode, resizes them to 299 × 299, scales them to [-1, 1], and yields them in batches for the model to evaluate.

```python
def load_images(input_dir):
    """Yield (filenames, images) batches: RGB images resized to 299x299 and scaled to [-1, 1]."""
    images = np.zeros(batch_shape)
    filenames = []
    idx = 0
    batch_size = batch_shape[0]
    files = sorted(tf.gfile.Glob(input_dir))[:20]  # classify only the first 20 files
    for filepath in files:
        with tf.gfile.Open(filepath, "rb") as f:
            # pilmode="RGB" forces three channels and drops any transparency (alpha) channel
            img = imread(f, as_gray=False, pilmode="RGB")
        # np.float was removed from NumPy, so use np.float32 explicitly
        img = np.array(Image.fromarray(img).resize((img_width, img_height))).astype(np.float32) / 255.0
        # Images for the Inception classifier are normalized to the [-1, 1] interval
        images[idx, :, :, :] = img * 2.0 - 1.0
        filenames.append(os.path.basename(filepath))
        idx += 1
        if idx == batch_size:
            yield filenames, images
            filenames = []
            images = np.zeros(batch_shape)
            idx = 0
    if idx > 0:
        yield filenames, images
```

```python
# Load the pretrained weights from the checkpoint
session_creator = tf.train.ChiefSessionCreator(
    scaffold=tf.train.Scaffold(saver=saver),
    checkpoint_filename_with_path=ckpt_path,
    master='')
```

```python
# Classify the images using the model
with tf.train.MonitoredSession(session_creator=session_creator) as sess:
    for filenames, images in load_images(images_path):
        labels = sess.run(predictions, feed_dict={X: images})
        for filename, label, image in zip(filenames, labels, images):
            predict_output.append([filename, label, image])
```

```python
# Display the top 5 predictions for each image
for filename, probs, image in predict_output:
    top5 = np.argsort(probs)[::-1][:5]  # class indices sorted by decreasing probability
    plt.imshow(((image + 1) / 2 * 255).astype(np.uint8))  # undo the [-1, 1] scaling
    plt.show()
    print("Filename:", filename)
    print("Top 5 predictions for the image above:")
    for p in top5:
        # Index 0 is the background class, so shift by one
        # (assumes the class-names file lists only the 1,000 ImageNet classes)
        print(class_names[p - 1].strip())
```

### 9. Large-scale image recognition using transfer learning.
a. Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on their location (beach, mountain, city, etc.), or use an existing dataset such as the flowers dataset or MIT's Places dataset (requires registration, and it is huge).

b. Write a preprocessing step that resizes and crops each image to 299 × 299 pixels, with some randomness for data augmentation.

c. Using the pretrained Inception v3 model, freeze all layers up to the bottleneck layer (the last layer before the output layer), and replace the output layer with one that has the appropriate number of outputs for your new classification task (e.g., the flowers dataset has five mutually exclusive classes, so the output layer must have five neurons and use the softmax activation function).

d. Split your dataset into a training set and a test set. Train the model on the training set and evaluate it on the test set.
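Steps b to d can be sketched as follows. This version swaps the TF1/slim checkpoint used in exercise 8 for the Keras `tf.keras.applications.InceptionV3` model, whose `include_top=False, pooling="avg"` output plays the role of the bottleneck layer. It uses TensorFlow 2 eager mode, so run it in a fresh session rather than after the `tf.disable_v2_behavior()` call from exercise 8. The function names, the 80% minimum crop, and the `paths_by_class` layout (class name mapped to a list of image paths) are illustrative assumptions.

```python
import random

import tensorflow as tf


# b. Random crop + horizontal flip for augmentation, then resize to 299x299
def augment_and_resize(image, size=299, min_crop_fraction=0.8):
    """`image` is assumed to be an RGB tensor of shape (height, width, 3),
    either uint8 or float in [0, 1]. Returns float32 values in [-1, 1]."""
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> [0, 1]
    shape = tf.shape(image)
    fraction = tf.random.uniform([], min_crop_fraction, 1.0)  # keep 80-100% of each side
    crop_h = tf.cast(tf.cast(shape[0], tf.float32) * fraction, tf.int32)
    crop_w = tf.cast(tf.cast(shape[1], tf.float32) * fraction, tf.int32)
    image = tf.image.random_crop(image, tf.stack([crop_h, crop_w, 3]))
    image = tf.image.random_flip_left_right(image)
    image = tf.image.resize(image, [size, size])
    return image * 2.0 - 1.0  # Inception v3 expects inputs in [-1, 1]


# c. Frozen Inception v3 up to the bottleneck, plus a new softmax output layer
def build_transfer_model(n_classes=5, weights="imagenet"):
    base = tf.keras.applications.InceptionV3(
        weights=weights, include_top=False, pooling="avg", input_shape=(299, 299, 3)
    )
    base.trainable = False  # freeze every pretrained layer
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


# d. Per-class (stratified) train/test split of image paths
def split_by_class(paths_by_class, test_ratio=0.2, seed=42):
    """`paths_by_class` maps a class name to a list of image paths (hypothetical layout).
    Returns two lists of (path, class_index) pairs; indices follow the sorted class names."""
    rng = random.Random(seed)
    train, test = [], []
    for class_index, name in enumerate(sorted(paths_by_class)):
        paths = list(paths_by_class[name])
        rng.shuffle(paths)
        n_test = int(len(paths) * test_ratio)
        test += [(p, class_index) for p in paths[:n_test]]
        train += [(p, class_index) for p in paths[n_test:]]
    return train, test
```

With `weights="imagenet"`, Keras downloads the pretrained weights on first use. A typical flow would decode each training path into an image (for example with `tf.io.read_file` and `tf.io.decode_jpeg`), apply `augment_and_resize` to the training images only, and call `model.fit`; test images would be resized and scaled without the random crop and flip. Splitting per class keeps the class balance the same in both sets.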
