### 1.	What are the advantages of a CNN over a fully connected DNN for image classification?


Fewer parameters -> faster to train
Reuse kernel -> detect feature anywhere
Architecture embeds knowledge of neighboring pixels

Because consecutive layers are only partially connected and because it heavily reuses its weights, a CNN has many fewer parameters than a fully connected DNN, which makes it much faster to train, reduces the risk of overfitting, and requires much less training data.
When a CNN has learned a kernel that can detect a particular feature, it can detect that feature anywhere on the image. In contrast, when a DNN learns a feature in one location, it can detect it only in that particular location. Since images typically have very repetitive features, CNNs are able to generalize much better than DNNs for image processing tasks such as classification, using fewer training examples.
Finally, a DNN has no prior knowledge of how pixels are organized; it does not know that nearby pixels are close. A CNN's architecture embeds this prior knowledge. Lower layers typically identify features in small areas of the images, while higher layers combine the lower-level features into larger features. This works well with most natural images, giving CNNs a decisive head start compared to DNNs.
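
To make the parameter savings concrete, the following sketch compares one fully connected layer with one convolutional layer applied to the same image. The image size (200 × 300 RGB), the 100 dense units, and the 100 filters of size 3 × 3 are illustrative assumptions, not values prescribed by the answer above.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Assumed example: a 200 x 300 RGB image
height, width, channels = 200, 300, 3

dense = dense_params(height * width * channels, n_units=100)
conv = conv_params(3, 3, channels, n_filters=100)

print(f"dense layer: {dense:,} parameters")
print(f"conv layer:  {conv:,} parameters")
```

The convolutional layer's parameter count does not depend on the image height or width, because the same kernels are reused at every position.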



### 2.	Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels.
What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?


Parameters

First convolutional layer: a 3 × 3 kernel over the 3 RGB channels, plus a bias, gives 3 * 3 * 3 + 1 = 28 parameters per feature map; with 100 output feature maps: 28 * 100 = 2,800.
Second convolutional layer: a 3 × 3 kernel over the previous layer's 100 feature maps, plus a bias, gives 3 * 3 * 100 + 1 = 901; with 200 output feature maps: 901 * 200 = 180,200.
Third convolutional layer: a 3 × 3 kernel over the previous layer's 200 feature maps, plus a bias, gives 3 * 3 * 200 + 1 = 1,801; with 400 output feature maps: 1,801 * 400 = 720,400.
Total parameters: 2,800 + 180,200 + 720,400 = 903,400.

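Memory (a 32-bit float takes 4 bytes)

First convolutional layer: one feature map is 100 * 150 = 15,000 values; total output: 15,000 * 100 = 1,500,000 values.
Second convolutional layer: one feature map is 50 * 75 = 3,750 values; total output: 3,750 * 200 = 750,000 values.
Third convolutional layer: one feature map is 25 * 38 = 950 values (75 / 2 rounds up to 38 with "same" padding); total output: 950 * 400 = 380,000 values.
Feature maps: (1,500,000 + 750,000 + 380,000) * 4 / 1024 / 1024 ≈ 10.03 MB.
Parameters: 903,400 * 4 / 1024 / 1024 ≈ 3.45 MB.
Total: 10.03 + 3.45 ≈ 13.48 MB.

This total keeps the output of every layer in memory. For a single prediction, the output of a layer can be released as soon as the next layer has been computed, so the strict minimum is the two largest consecutive layers plus the parameters: (1,500,000 + 750,000) * 4 / 1024 / 1024 ≈ 8.58 MB, plus 3.45 MB, or about 12.03 MB.
When training on a mini-batch of 50 images, every feature map must be kept for the backward pass, so the feature maps alone need about 50 * 10.03 ≈ 502 MB, plus the parameters, the input images, and the gradients.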
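
The arithmetic above can be checked with a short script. This is only a sketch: the helper name `conv_stack_stats` is made up for illustration, and the training estimate assumes that only the feature maps and the parameters are counted (input images, gradients, and optimizer state are ignored).

```python
import math


def conv_stack_stats(height, width, in_channels, filters, kernel=3, stride=2):
    """Return (total parameters, output value count per layer) for a stack
    of convolutional layers with "same" padding."""
    total_params = 0
    outputs = []
    for n_filters in filters:
        total_params += (kernel * kernel * in_channels + 1) * n_filters
        # "same" padding: output size is the input size divided by the stride, rounded up
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        outputs.append(height * width * n_filters)
        in_channels = n_filters
    return total_params, outputs


BYTES_PER_FLOAT = 4
MIB = 1024 * 1024

params, outputs = conv_stack_stats(200, 300, 3, filters=[100, 200, 400])
param_mb = params * BYTES_PER_FLOAT / MIB
all_maps_mb = sum(outputs) * BYTES_PER_FLOAT / MIB

# Inference lower bound: the two largest consecutive layers plus the parameters
two_layers = max(a + b for a, b in zip(outputs, outputs[1:]))
inference_mb = two_layers * BYTES_PER_FLOAT / MIB + param_mb

# Training on 50 images: all feature maps are kept for the backward pass
training_mb = 50 * all_maps_mb + param_mb

print(f"parameters:        {params:,} ({param_mb:.2f} MB)")
print(f"all feature maps:  {all_maps_mb:.2f} MB")
print(f"inference minimum: {inference_mb:.2f} MB")
print(f"training, batch 50: {training_mb:.1f} MB")
```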

### 3.	If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?


Reduce the mini-batch size.
Reduce dimensionality using a larger stride in one or more layers.
Remove one or more layers.
Use 16-bit floats instead of 32-bit floats.
Distribute the CNN across multiple devices.
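
As a rough illustration of how much each option helps, the sketch below estimates the memory taken by the feature maps of the three-layer network from question 2 during training. It assumes that feature maps dominate memory use and ignores parameters, gradients, and framework overhead; the function name and its defaults are hypothetical.

```python
import math


def activation_bytes(batch_size=50, bytes_per_value=4, strides=(2, 2, 2),
                     filters=(100, 200, 400), height=200, width=300):
    """Bytes needed to hold every layer's output for one mini-batch,
    assuming "same" padding in every convolutional layer."""
    total = 0
    for stride, n_filters in zip(strides, filters):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        total += batch_size * height * width * n_filters * bytes_per_value
    return total


baseline = activation_bytes()
options = {
    "batch size 50 -> 25": activation_bytes(batch_size=25),
    "float32 -> float16": activation_bytes(bytes_per_value=2),
    "first stride 2 -> 4": activation_bytes(strides=(4, 2, 2)),
    "drop the last layer": activation_bytes(strides=(2, 2), filters=(100, 200)),
}

for name, value in options.items():
    print(f"{name}: {value / baseline:.0%} of baseline")
```

Distributing the network across devices does not reduce the total, but it spreads it so that each device holds only part of it.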

### 4.	Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?


A max pooling layer has no parameters at all, whereas a convolutional layer has a lot.
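
A minimal NumPy sketch of the difference: 2 × 2 max pooling with stride 2 is a fixed operation with nothing to learn, while a convolutional layer with the same stride carries weights. The input values and the layer sizes used for the comparison (3 × 3 kernels, 100 input and 100 output feature maps) are assumptions chosen for illustration.

```python
import numpy as np


def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling with stride 2 on a (height, width) array
    whose height and width are both even."""
    h, w = feature_map.shape
    # Group pixels into non-overlapping 2 x 2 blocks, then take each block's max
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]])

pooled = max_pool_2x2(x)
print(pooled)

# Parameters of a stride-2 convolutional layer doing the same downsampling
conv_layer_params = (3 * 3 * 100 + 1) * 100
print("max pooling parameters: 0")
print(f"convolutional layer parameters: {conv_layer_params:,}")
```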

### 5.	When would you want to add a local response normalization layer?


This form of normalization makes the neurons that most strongly activate inhibit neurons at the same location but in neighboring feature maps (such competitive activation has been observed in biological neurons). This encourages different feature maps to specialize, pushing them apart and forcing them to explore a wider range of features, ultimately improving generalization.
It is typically used in the lower layers to have a larger pool of low-level features that the upper layers can build upon.
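
The inhibition effect can be seen in a small NumPy sketch. The parameter names mirror those of `tf.nn.local_response_normalization`, but this is independent illustrative code, and the hyperparameter values in the example are chosen to make the effect visible rather than taken from any published network.

```python
import numpy as np


def local_response_norm(a, depth_radius=2, bias=1.0, alpha=1e-4, beta=0.75):
    """Divide each activation by a term that grows with the squared
    activations at the same location in neighboring feature maps.
    `a` has shape (height, width, channels)."""
    squared = a ** 2
    n_channels = a.shape[-1]
    out = np.empty_like(a, dtype=float)
    for i in range(n_channels):
        lo = max(0, i - depth_radius)
        hi = min(n_channels, i + depth_radius + 1)
        scale = (bias + alpha * squared[..., lo:hi].sum(axis=-1)) ** beta
        out[..., i] = a[..., i] / scale
    return out


# One pixel, three feature maps: the middle one activates strongly
a = np.array([[[1.0, 10.0, 1.0]]])
out = local_response_norm(a, depth_radius=1, bias=1.0, alpha=1.0, beta=0.5)
print(out)
```

With these settings the weak activations are divided by roughly 10 because of their strong neighbor, whereas on their own they would only be divided by about 1.4.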

### 6.	Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?


The main innovations in AlexNet compared to LeNet-5 are (1) it is much larger and deeper, and (2) it stacks convolutional layers directly on top of each other, instead of stacking a pooling layer on top of each convolutional layer.

The main innovation in GoogLeNet is the introduction of inception modules, which make it possible to have a much deeper net than previous CNN architectures, with fewer parameters.

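ResNet's main innovation is the introduction of skip connections, which make it possible to go well beyond 100 layers. Arguably, its simplicity and consistency are also rather innovative.

SENet's main innovation is the SE (squeeze-and-excitation) block, a small network added after each inception module or residual unit that recalibrates the relative importance of the feature maps.

Finally, Xception's main innovation is the use of depthwise separable convolutional layers, which look at spatial patterns and cross-channel patterns separately.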

### 7.	What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?


A fully convolutional network first uses a CNN to extract image features, then transforms the number of channels into the number of classes via a 1×1 convolutional layer, and finally transforms the height and width of the feature maps to those of the input image via a transposed convolution. In a fully convolutional network, the transposed convolutional layer can be initialized with bilinear interpolation upsampling.

A fully convolutional network can be built by simply replacing the FC layers with their equivalent Conv layers. To convert a dense layer, use a convolutional layer whose number of filters equals the number of units in the dense layer, whose kernel size equals the size of the input feature maps, and which uses "valid" padding. In the example of VGG16, we can do so by first removing the last four layers. One way to do so is to pop layers from the model: in the model stack, each pop removes the last layer.
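
The equivalence between a dense layer and a convolutional layer can be checked numerically. The NumPy sketch below assumes 7 × 7 input feature maps with 8 channels and a dense layer of 10 units; all shapes and names are arbitrary choices for illustration. The dense weights are reshaped into 10 kernels of size 7 × 7 and applied with "valid" padding.

```python
import numpy as np


def conv2d_valid(x, kernel, bias):
    """Stride-1 convolution with "valid" padding.
    x: (height, width, channels); kernel: (kh, kw, channels, n_filters)."""
    kh, kw, _, n_filters = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, n_filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out


rng = np.random.default_rng(0)
h, w, c, units = 7, 7, 8, 10
feature_maps = rng.normal(size=(h, w, c))
dense_w = rng.normal(size=(h * w * c, units))
dense_b = rng.normal(size=units)

# Dense layer applied to the flattened feature maps
dense_out = feature_maps.reshape(-1) @ dense_w + dense_b

# The same weights viewed as `units` kernels covering the whole input
kernel = dense_w.reshape(h, w, c, units)
conv_out = conv2d_valid(feature_maps, kernel, dense_b)
print(conv_out.shape, np.allclose(conv_out[0, 0], dense_out))

# On a larger input the convolutional version still works and
# produces one prediction vector per position
larger = conv2d_valid(rng.normal(size=(9, 9, c)), kernel, dense_b)
print(larger.shape)
```

The dense layer only accepts inputs of exactly 7 × 7 × 8, while the convolutional version slides over larger inputs, which is what lets a fully convolutional network process images of any size.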

### 8.	What is the main technical difficulty of semantic segmentation?

Semantic segmentation is a technique that enables us to differentiate the objects in an image. It can be considered an image classification task at the pixel level.

Semantic segmentation also has a major problem-specific difficulty: the ambiguity of boundaries in image space, especially for thin objects such as poles, similar-looking objects such as a road and a sidewalk, and faraway objects.

### 9.	Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

The MNIST handwritten digit dataset is a standard benchmark in computer vision and deep learning.

Although the dataset is effectively solved, it can be used as the basis for learning and practicing how to develop, evaluate, and use convolutional deep learning neural networks for image classification from scratch. This includes how to develop a robust test harness for estimating the performance of the model, how to explore improvements to the model, and how to save the model and later load it to make predictions on new data.

In this tutorial, you will discover how to develop a convolutional neural network for handwritten digit classification from scratch.

After completing this tutorial, you will know:

How to develop a test harness to develop a robust evaluation of a model and establish a baseline of performance for a classification task.
How to explore extensions to a baseline model to improve learning and model capacity.
How to develop a finalized model, evaluate the performance of the final model, and use it to make predictions on new images.


In [None]:
# example of loading the mnist dataset
from tensorflow.keras.datasets import mnist
from matplotlib import pyplot as plt
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# plot first few images
for i in range(9):
	# define subplot
	plt.subplot(330 + 1 + i)
	# plot raw pixel data
	plt.imshow(trainX[i], cmap=plt.get_cmap('gray'))
# show the figure
plt.show()
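
Before the images are fed to a CNN, they usually need a channel axis, pixel values scaled to a small range, and one-hot encoded labels. The sketch below shows one common way to do this with NumPy. It uses a small random stand-in array instead of the real MNIST data so that it runs without a download, and the helper names `prepare_images` and `one_hot` are hypothetical.

```python
import numpy as np


def prepare_images(images: np.ndarray) -> np.ndarray:
    """Add a single channel axis and scale uint8 pixels to floats in [0, 1]."""
    return images.reshape(images.shape[0], 28, 28, 1).astype("float32") / 255.0


def one_hot(labels: np.ndarray, n_classes: int = 10) -> np.ndarray:
    """Turn integer class labels into one-hot rows."""
    return np.eye(n_classes, dtype="float32")[labels]


# Stand-in data with the same shape and dtype as MNIST images (assumption)
rng = np.random.default_rng(42)
fake_X = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_y = np.array([5, 0, 4, 1, 9])

X = prepare_images(fake_X)
y = one_hot(fake_y)
print(X.shape, X.dtype, y.shape)
```

The same two functions could be applied to `trainX`, `trainy`, `testX`, and `testy` from the cell above.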

### 10.	Use transfer learning for large image classification, going through these steps:

a.	Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).

b.	Split it into a training set, a validation set, and a test set.

c.	Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.

d.	Fine-tune a pretrained model on this dataset.



In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

The images are stored in one subfolder per class:
location_photo/
  beach/
  dandelion/
  mountain/
  sunflowers/
  city/

In [None]:
import pathlib
data_dir = pathlib.Path('location_photo')  # assumed location of the dataset folder
mountain = list(data_dir.glob('mountain/*'))
PIL.Image.open(str(mountain[0]))

In [None]:
PIL.Image.open(str(mountain[1]))
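
Step b asks for a training, validation, and test split. One simple way to do it, sketched below in plain Python, is to shuffle the list of file paths once with a fixed seed and slice it. The 70/15/15 proportions, the function name, and the example file names are assumptions; the split is not stratified, so in practice it would be applied to each class folder separately.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle the items reproducibly and split them into
    (train, validation, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Hypothetical file names standing in for the images of one class
paths = [f"mountain/img_{i:03d}.jpg" for i in range(100)]

train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))
```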