#### What are the advantages of a CNN over a fully connected DNN for image classification?

- Because CNN layers are only partially connected and heavily reuse their weights, a CNN has far fewer parameters than a fully connected DNN. This makes it much faster to train, reduces the risk of overfitting, and means it needs much less training data.
- CNNs exploit the fact that nearby pixels are strongly related: each neuron looks only at a small local patch, which is enough to detect useful local features in an image.
- A convolutional layer extracts the same kind of feature at every location in the image, and each successive layer combines these into more complex features, which can be used to detect more complex objects.
- Because a CNN detects the same feature anywhere in the image, it can recognize an object regardless of where it appears. A fully connected DNN, by contrast, learns a feature at one particular location and can only detect it there.
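A small sketch of the first and last points, in plain Python. It compares the parameter count of a dense layer with 100 neurons against a convolutional layer with 100 3 × 3 kernels on a 200 × 300 RGB image (sizes borrowed from the next question for illustration), then slides one shared kernel over a 1D signal to show that shifting the input only shifts the response. `conv1d_valid` is a hypothetical helper written for this example.

```python
H, W, C = 200, 300, 3
n_units = 100  # 100 neurons in the dense layer vs 100 feature maps in the conv layer

# Dense layer: every neuron is connected to every input value, plus one bias each
dense_params = H * W * C * n_units + n_units
# Conv layer: each 3x3xC kernel is shared across all positions, plus one bias per map
conv_params = (3 * 3 * C + 1) * n_units

print(dense_params)  # 18000100
print(conv_params)   # 2800


def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (cross-correlation, VALID padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [1, -1]                # a tiny edge detector
a = [0, 0, 5, 0, 0, 0, 0, 0]    # a spike near the start
b = [0, 0, 0, 0, 0, 5, 0, 0]    # the same spike, shifted right by 3
ra, rb = conv1d_valid(a, kernel), conv1d_valid(b, kernel)
print(ra)  # [0, -5, 5, 0, 0, 0, 0]
print(rb)  # [0, 0, 0, 0, -5, 5, 0]  -> same response, shifted by 3
```

Because the same weights are applied everywhere, the feature is found wherever it occurs; a dense layer would need separate weights for each position.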

#### Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and SAME padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels. What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?

```python
import math

# Number of parameters: for each conv layer,
# (kernel height * kernel width * maps in previous layer + 1 bias) * maps in current layer
n_params = 100*(3*3*3+1) + 200*(3*3*100+1) + 400*(3*3*200+1)

# With stride 2 and SAME padding, each layer's output height and width are
# ceil(input size / 2): 200x300 -> 100x150 -> 50x75 -> 25x38
# Total feature map size of a layer = height * width * number of maps
h, w = 200, 300
n_feature_maps = h*w*3  # the RGB input image itself
for n_maps in (100, 200, 400):
    h, w = math.ceil(h/2), math.ceil(w/2)
    n_feature_maps += h*w*n_maps

print('n_params: ', n_params)
print('n_feature_maps: ', n_feature_maps)

# 32-bit floats take 4 bytes each; sizes below are in MiB
params_size = n_params*4/(1024*1024)
feature_maps_size = n_feature_maps*4/(1024*1024)

print('memory for params: ', params_size)
# These totals keep every layer's output in memory, as backpropagation requires
print('memory per instance: ', params_size+feature_maps_size)
print('memory per batch: ', params_size+50*feature_maps_size)
```

```
n_params:  903400
n_feature_maps:  2810000
memory for params:  3.446197509765625
memory per instance:  14.165496826171875
memory per batch:  539.4111633300781
```
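The per-instance figure above adds up the output of every layer at once, which is what training needs. For a single prediction, a layer's output can be freed as soon as the next layer has been computed, so the minimum is roughly the parameters plus the largest pair of consecutive layers. A minimal sketch of that lower bound, assuming 32-bit floats and ignoring framework overhead and workspace buffers:

```python
import math

# Same network: 200x300 RGB input, three 3x3 conv layers, stride 2, SAME padding,
# so each output side is ceil(input side / 2)
h, w = 200, 300
layer_floats = [h * w * 3]  # the input image
for n_maps in (100, 200, 400):
    h, w = math.ceil(h / 2), math.ceil(w / 2)
    layer_floats.append(h * w * n_maps)

n_params = 100*(3*3*3+1) + 200*(3*3*100+1) + 400*(3*3*200+1)

# During inference only one layer's input and output must coexist in memory
peak_pair = max(a + b for a, b in zip(layer_floats, layer_floats[1:]))
min_inference_bytes = 4 * (n_params + peak_pair)  # 4 bytes per 32-bit float

print(layer_floats)         # [180000, 1500000, 750000, 380000]
print(peak_pair)            # 2250000
print(min_inference_bytes)  # 12613600
print(round(min_inference_bytes / 1024**2, 2))  # 12.03 (MiB)
```

The training estimate, on the other hand, is itself only a floor: it leaves out the gradients and any optimizer state, which add memory proportional to the parameter count.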


#### If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

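- Reduce the mini-batch size
- Distribute the network across several GPUs, e.g., running some kernels or layers on another device in parallel
- Add pooling layers to reduce the size of the feature maps
- Reduce the number of feature maps in some layers
- Use larger strides to reduce the feature maps' spatial dimensions
- Use 16-bit floats instead of 32-bit floats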
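To see how several of these remedies shrink memory use, here is a rough estimator for the network from the previous question. `activation_bytes` is a hypothetical helper: it counts only the input plus every layer's output for a mini-batch (assuming SAME padding, where kernel size does not affect output shape) and ignores parameters, gradients, and optimizer state. Multi-GPU placement is not modeled.

```python
import math


def activation_bytes(batch_size, bytes_per_float=4, stride=2,
                     maps=(100, 200, 400), h=200, w=300, channels=3):
    """Rough memory needed to keep the input and every conv layer's output
    for one mini-batch, assuming SAME padding (output side = ceil(side / stride))."""
    floats = h * w * channels
    for n_maps in maps:
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        floats += h * w * n_maps
    return batch_size * bytes_per_float * floats


print(activation_bytes(50))                       # 562000000 (baseline)
print(activation_bytes(25))                       # 281000000 (smaller batch)
print(activation_bytes(50, bytes_per_float=2))    # 281000000 (16-bit floats)
print(activation_bytes(50, stride=3))             # 208960000 (larger stride)
print(activation_bytes(50, maps=(50, 100, 200)))  # 299000000 (fewer feature maps)
```

Halving the batch size or the float width halves activation memory, while a larger stride shrinks it faster because it reduces both height and width at every layer.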

#### Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

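- A pooling layer has no trainable parameters.
- It is computationally much cheaper than a convolutional layer: it takes a max over a small window instead of a weighted sum over every input map.
- It doesn't change the number of feature maps, since it is applied to each map independently, whereas a convolutional layer can output any number of maps.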
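A minimal sketch of 2 × 2 max pooling with stride 2 in plain Python, applied to each feature map independently. `max_pool_2x2` is a hypothetical helper; the comparison layer (a 3 × 3 convolution mapping 100 maps to 100 maps) is an illustrative assumption.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (VALID padding) on one feature map given as a list of rows."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


maps = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[0, -1, 0, -1], [-1, 0, -1, 0], [2, 2, 2, 2], [2, 2, 2, 9]],
]
pooled = [max_pool_2x2(m) for m in maps]  # one pooled map per input map
print(pooled)  # [[[6, 8], [14, 16]], [[0, 0], [2, 9]]]

# A 3x3 conv layer with the same stride, mapping 100 maps to 100 maps, would need:
conv_params = (3 * 3 * 100 + 1) * 100
print(conv_params)  # 90100 trainable parameters, versus 0 for the pooling layer
```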

#### When would you want to add a local response normalization layer?

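- A local response normalization layer makes the most strongly activated neurons inhibit neurons at the same location in neighboring feature maps.
- This forces different feature maps to specialize in different features, pushing them apart and encouraging the network to explore a wider range of features.
- It is typically added in the lower layers (AlexNet uses it after its first two convolutional layers), giving the upper layers a richer pool of low-level features to build on.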
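A minimal sketch of the normalization at a single pixel position, using the form b_i = a_i / (bias + alpha · Σ a_j²)^beta, where the sum runs over maps j within `depth_radius` of i (the same form as TensorFlow's `tf.nn.local_response_normalization`). The helper name and the hyperparameter values are illustrative assumptions chosen to keep the arithmetic easy.

```python
def local_response_norm(a, depth_radius, bias, alpha, beta):
    """Normalize one pixel's activations a[0..n-1] across its n feature maps:
    b_i = a_i / (bias + alpha * sum of a_j**2 for |j - i| <= depth_radius) ** beta,
    with j clipped to valid map indices."""
    n = len(a)
    out = []
    for i in range(n):
        lo, hi = max(0, i - depth_radius), min(n - 1, i + depth_radius)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (bias + alpha * sq_sum) ** beta)
    return out


params = dict(depth_radius=1, bias=1.0, alpha=1.0, beta=1.0)  # illustrative values
quiet = local_response_norm([1.0, 1.0, 1.0], **params)
loud = local_response_norm([1.0, 10.0, 1.0], **params)  # map 1 fires strongly

print(round(quiet[0], 4))  # 0.3333
print(round(loud[0], 4))   # 0.0098 -> map 0 is suppressed by its strong neighbor
```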

#### Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet and ResNet?

**AlexNet above LeNet5**
- It is much larger and deeper
- It stacks convolutional layers directly on top of each other, without a pooling layer in between

**GoogLeNet**
- introduces inception modules, which let it go much deeper than previous CNNs while using fewer parameters

**ResNet**
- introduces skip connections (residual units), which made it possible to train networks with more than 100 layers
- Its architecture is simple
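Two quick illustrations, both sketches under stated assumptions. First, the parameter savings of an inception-style 1 × 1 "bottleneck" convolution placed before a 5 × 5 convolution; the layer sizes (192 input maps, 16 bottleneck maps, 32 output maps) are illustrative, and biases are ignored. Second, a residual unit that outputs f(x) + x, written with hypothetical helpers `residual_unit` and `zero_layers`: if the layers inside output zeros, as they roughly do early in training, the input still passes through unchanged.

```python
in_maps, bottleneck, out_maps = 192, 16, 32  # illustrative sizes

direct = 5 * 5 * in_maps * out_maps  # 5x5 conv applied directly (weights only)
with_1x1 = 1 * 1 * in_maps * bottleneck + 5 * 5 * bottleneck * out_maps

print(direct)    # 153600
print(with_1x1)  # 15872


def residual_unit(x, f):
    """Output f(x) + x: the layers f only have to learn the residual."""
    return [fx + xi for fx, xi in zip(f(x), x)]


def zero_layers(v):
    """Stand-in for layers whose weights are still close to zero."""
    return [0.0] * len(v)


x = [0.5, -1.0, 2.0]
print(residual_unit(x, zero_layers))  # [0.5, -1.0, 2.0]
```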
