# Coding with Convolutional Neural Networks

- **Convolutions**: Use filters to extract information from images
- **Pooling**: Can reduce and compress the images without losing the vital information that was extracted by the filters
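
To make the first bullet concrete, here is a minimal NumPy sketch of what a single 3x3 filter does to an image. The `convolve2d_valid` helper, the 6x6 test image and the hand-picked vertical-edge kernel are all illustrative assumptions. A real convolution layer would learn its filter values during training rather than use fixed ones.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the filtered result.

    Like Conv2D, this does not flip the kernel (strictly, a cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch by the kernel element-wise and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge filter, used here only for illustration.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (4, 4)
print(feature_map[0])     # [0. 3. 3. 0.]
```

The filter responds strongly only where the image changes from dark to bright, which is the sense in which a filter "extracts information". The output is also two pixels smaller than the input on each axis.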

In previous labs, to classify fashion images and handwriting, we defined a model architecture like this. We used primarily dense layers for densely connected neurons.

In [2]:
import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

Now, to use convolutions and pooling, we add the following layers in front of the dense layers of the previous architecture:
- For convolution: Conv2D
- For pooling: MaxPooling2D

The new architecture would look like this:

In [4]:
model = keras.models.Sequential([
    keras.layers.Conv2D(64, (3,3), activation='relu', 
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(2,2),
    keras.layers.Conv2D(64, (3,3), activation='relu'),
    keras.layers.MaxPooling2D(2,2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

Observations:
1. The convolution layer takes several arguments:
```
Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1))
```
- 64 is the number of filters for this layer. These filters will be randomly initialized. The best filters to match the pictures to their labels will be learned over time.
- (3,3) is the size of the filter. _(kernel matrix size)_
- (28,28,1) is the shape of each input image: 28x28 pixels with a single color channel (grayscale).
  
2. Similarly, pooling is done with the following layer:
```
MaxPooling2D(2, 2)
```
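- 2 by 2 is the size of the chunks to pool. Written this way, the first 2 is the pool size and the second is the stride, so the chunks are 2x2 and do not overlap.
- Other pooling types exist, such as average pooling (AveragePooling2D in Keras), but here we focus only on max pooling.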
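
The pooling step can be sketched in NumPy as follows. This is an illustration under assumptions, not how Keras implements it: `pool2d` is a hypothetical helper, the 4x4 feature map is made up, and leftover rows or columns are simply dropped, which matches the rounding down seen with unpadded pooling.

```python
import numpy as np

def pool2d(feature_map, size=2, reduce=np.max):
    """Pool non-overlapping size x size chunks; leftover rows/columns are dropped."""
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    # Trim so the array divides evenly, then give each chunk its own pair of axes.
    trimmed = feature_map[:out_h * size, :out_w * size]
    chunks = trimmed.reshape(out_h, size, out_w, size)
    return reduce(chunks, axis=(1, 3))

# Hypothetical 4x4 feature map.
feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]], dtype=float)

max_pooled = pool2d(feature_map, reduce=np.max)
avg_pooled = pool2d(feature_map, reduce=np.mean)
print(max_pooled.tolist())  # [[4.0, 2.0], [2.0, 8.0]]
print(avg_pooled.tolist())  # [[2.5, 1.0], [1.25, 6.5]]

# An odd-sized input loses its last row and column: 11x11 becomes 5x5.
print(pool2d(np.zeros((11, 11))).shape)  # (5, 5)
```

Max pooling keeps the strongest response in each chunk, while average pooling smooths the chunk into its mean. Both shrink each axis by the same factor.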

These two layers form a group, and such groups are stacked on top of each other.
In the example above, the output of each of the 64 filters in the first convolution layer is pooled. The second convolution layer then applies 64 new filters, each of which looks across all 64 pooled outputs at once, and its results are of course pooled again.

To get a sense of how the data changes when it goes through the network we use ```model.summary()```

In [5]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_2 (Conv2D)           (None, 26, 26, 64)        640       
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 13, 13, 64)        0         
 g2D)                                                            
                                                                 
 conv2d_3 (Conv2D)           (None, 11, 11, 64)        36928     
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 5, 5, 64)          0         
 g2D)                                                            
                                                                 
 flatten_3 (Flatten)         (None, 1600)              0         
                                                                 
 dense_6 (Dense)             (None, 128)              

Let's unpack this:
1. ```(None, 26, 26, 64)``` 
   - The input image size is 28x28, so why is the output size 26x26?
   We saw previously that a 3x3 filter is centered on a pixel and needs neighbours on every side of it. Hence, the convolution begins at the second pixel of the second row. Thus, the first row, last row, first column and last column are not included in the convolved image, and each dimension shrinks by 2.
   - 64 is the number of filters. Hence the output has 64 images of 26x26 dimensions each.
   - 640 params, because each filter learns 9 weights and 1 bias, so 10 params per filter.
2. ```(None, 13, 13, 64)```
   - Our pooling reduced the dimensionality by half on each axis, so 26x26 becomes 13x13.
   - No parameters are learned in this layer.
3. ```(None, 11, 11, 64)```
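   - The 3x3 filter reduces 13x13 to 11x11 by removing a one-pixel border, like before.
   - How are the params 36928? Each filter now spans all 64 input channels, so it learns 3 x 3 x 64 = 576 weights and 1 bias. With 64 filters, that is 577 x 64 = 36928 params.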
4. ```(None, 5, 5, 64)```
   - MaxPooling again halves the dimensions, rounding down. So we end up with 5 by 5 images.
   - At this point we have 64 images of 5x5 each. Multiplying it all out, 5 x 5 x 64, gives 1600 values.
5. ```(None, 1600)```
   - Flatten takes these 64 images of 5x5 and makes a single 1D array of 1600 values.
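
The arithmetic in the list above can be checked with a short plain-Python sketch. The helper names `conv2d_shape_and_params` and `max_pool_shape` are illustrative assumptions, not Keras functions. They assume unpadded, stride-1 convolutions and non-overlapping pooling, as in the model shown earlier.

```python
def conv2d_shape_and_params(input_shape, filters, kernel=(3, 3)):
    """Output shape and parameter count for an unpadded, stride-1 Conv2D layer."""
    height, width, channels = input_shape
    kh, kw = kernel
    out_shape = (height - kh + 1, width - kw + 1, filters)
    # Each filter has one weight per kernel cell per input channel, plus one bias.
    params = (kh * kw * channels + 1) * filters
    return out_shape, params

def max_pool_shape(input_shape, pool=(2, 2)):
    """Output shape for non-overlapping pooling; odd sizes round down."""
    height, width, channels = input_shape
    return (height // pool[0], width // pool[1], channels)

shape = (28, 28, 1)
shape, conv1_params = conv2d_shape_and_params(shape, filters=64)
print(shape, conv1_params)  # (26, 26, 64) 640

shape = max_pool_shape(shape)
print(shape)                # (13, 13, 64)

shape, conv2_params = conv2d_shape_and_params(shape, filters=64)
print(shape, conv2_params)  # (11, 11, 64) 36928

shape = max_pool_shape(shape)
print(shape)                # (5, 5, 64)

flattened = shape[0] * shape[1] * shape[2]
print(flattened)            # 1600
```

Each printed line matches the corresponding row of the summary, which is a quick way to confirm the reasoning before changing filter counts or kernel sizes.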

This set of 1600 values is then classified using a dense network, as before.