# Chapter 5: Conv Nets

Start with a simple MNIST example.

Conv nets take input tensors of shape

```python
(image_height, image_width, image_channels)
```

This does not include the batch dimension.

In [1]:
from keras import layers 
from keras import models
from keras.datasets import mnist
from keras.utils import to_categorical

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) 
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu')) 
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Flatten the 3D feature map (3, 3, 64) into a 1D vector of 576 values
model.add(layers.Flatten())
# Fully connected hidden layer
model.add(layers.Dense(64, activation='relu'))
# Classify into the 10 digit classes
model.add(layers.Dense(10, activation='softmax'))



Using TensorFlow backend.




In [2]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________
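
The parameter counts in the summary can be checked by hand. The sketch below assumes the usual counting rules: a convolution filter has one weight per kernel position and input channel plus one bias, and a dense unit has one weight per input plus one bias. The helper names `conv2d_params` and `dense_params` are made up for illustration and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """(weights per filter + 1 bias) * number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """One weight per input-output pair plus one bias per output unit."""
    return in_units * out_units + out_units


print(conv2d_params(3, 3, 1, 32))    # conv2d_1: 320
print(conv2d_params(3, 3, 32, 64))   # conv2d_2: 18496
print(conv2d_params(3, 3, 64, 64))   # conv2d_3: 36928
print(dense_params(3 * 3 * 64, 64))  # dense_1, fed by the 576-value Flatten output: 36928
```

The pooling and Flatten layers have no weights, which is why the summary lists 0 parameters for them.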

In [3]:
# Load the data

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Reshape to the (samples, height, width, channels) layout the network expects
train_images = train_images.reshape((60000, 28, 28, 1))
# Scale pixel values from [0, 255] to [0, 1]
train_images = train_images.astype('float32') / 255
# Same reshaping and scaling for the test set
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255
# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7ff89bb57278>

In [7]:
score, acc = model.evaluate(test_images, test_labels)
print('Test score:', score)
print('Test accuracy:', acc)

Test score: 0.0331606782337
Test accuracy: 0.9904


Wow! CNNs are really powerful. We got >99% accuracy (0.9904) on the test data after only 5 epochs. Holy mackerel.

## Conv Nets explained

### Convolutions

A convolution works by:

Step 1) Sliding a window (usually 3x3) over the 3D input feature map, stopping at every location, and extracting the 3D patch of surrounding features.

Step 2) Transforming each 3D patch of input features into a 1D vector by taking a tensor product between the patch and the weights of the layer (the convolution kernel).

Step 3) Reassembling all of these 1D vectors into a 3D output map.

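Note: The output map's height and width can differ from the input's. With no padding, a 3x3 window shrinks each spatial dimension by 2 (28 to 26 in the first layer of the model above), and strides shrink it further.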
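
The spatial sizes in the model summary above can be reproduced with a little arithmetic. This is a minimal sketch, assuming the two common padding modes ("valid" meaning no padding, "same" meaning pad so that a stride-1 output matches the input size); `conv_output_size` is a hypothetical helper, not a Keras function.

```python
def conv_output_size(input_size, window, stride=1, padding="valid"):
    """Output length along one spatial axis for a sliding window."""
    if padding == "same":
        return -(-input_size // stride)  # ceil(input_size / stride)
    if padding == "valid":
        return (input_size - window) // stride + 1
    raise ValueError("padding must be 'valid' or 'same'")


# Conv 3x3, pool 2x2 (stride 2), conv 3x3, pool 2x2 (stride 2), conv 3x3
size = 28
trace = [size]
for window, stride in [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]:
    size = conv_output_size(size, window, stride)
    trace.append(size)
print(trace)  # [28, 26, 13, 11, 5, 3]
```

The trace matches the height and width column of `model.summary()`: 26, 13, 11, 5, 3.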

Conv nets take a 3D tensor (height, width, channels), where the channel axis holds the pixel values: depth 1 for a grayscale image and depth 3 for RGB.

A convolution takes a 3D tensor, extracts patches from it and applies a transformation to all of those patches.
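
The three steps can be written out as plain loops. This is only an illustrative sketch in pure Python, assuming no padding, stride 1, no bias, and no activation; `conv2d_naive` is a hypothetical name, and a real layer such as Keras `Conv2D` is implemented far more efficiently.

```python
def conv2d_naive(image, kernel):
    """Valid (no padding), stride-1 convolution without bias or activation.

    image:  nested list indexed [row][col][channel]
    kernel: nested list indexed [k_row][k_col][channel][filter]
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    k_h, k_w, n_filters = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    output = []
    for row in range(height - k_h + 1):      # Step 1: slide the window
        out_row = []
        for col in range(width - k_w + 1):
            vector = [0.0] * n_filters       # Step 2: patch -> 1D vector
            for i in range(k_h):
                for j in range(k_w):
                    for c in range(channels):
                        pixel = image[row + i][col + j][c]
                        for f in range(n_filters):
                            vector[f] += pixel * kernel[i][j][c][f]
            out_row.append(vector)
        output.append(out_row)               # Step 3: reassemble a 3D map
    return output


# 4x4 single-channel image with a vertical edge between columns 1 and 2
image = [[[0.0], [0.0], [1.0], [1.0]] for _ in range(4)]
# One 3x3 filter that responds to a dark-to-bright step from left to right
kernel = [[[[-1.0]], [[0.0]], [[1.0]]] for _ in range(3)]

feature_map = conv2d_naive(image, kernel)
print(len(feature_map), len(feature_map[0]), len(feature_map[0][0]))  # 2 2 1
print(feature_map[0][0], feature_map[0][1])  # [3.0] [3.0]
```

The 4x4 input becomes a 2x2 output because the 3x3 window only fits at two positions along each axis, and the output depth equals the number of filters (here 1).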

The first convolution takes this 3D tensor as input and outputs another 3D tensor. The new tensor still has a height and a width as well as a channel axis. However, the channels no longer hold RGB or grayscale pixel values; each channel now corresponds to a filter. Filters encode discovered features (e.g. edges, circles), and each filter's 2D height × width map shows where that feature appears in the input and how strongly.

1. Filters (or features) are stored in the channel axis
2. The 2D image stored for each filter shows where a filter occurs in the image

![Image](https://s3-us-west-2.amazonaws.com/mishalaskin/dl/convnetfilter.png)


Convolutions are defined by two key parameters:

a. Size of the patches extracted from the inputs—These are typically 3 × 3 or 5 × 5. In the example, they were 3 × 3, which is a common choice.

b. Depth of the output feature map—The number of filters computed by the convolution. The example started with a depth of 32 and ended with a depth of 64.

In Keras, you implement one with

```python
Conv2D(number_of_filters, (patch_height, patch_width))
```

#### Strides

#### Padding

### MaxPooling

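MaxPooling downsamples a feature map by scanning it with a window (typically 2 × 2 with stride 2) and keeping the maximum value from each window, which halves the height and width. This is done for 2 reasons:

1. Shrinking the feature map lets later convolution filters cover a larger region of the original image, so they can learn more abstract features
2. It dramatically reduces the number of values passed to the final Dense layers, and therefore the number of parameters in the model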
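
To make the operation concrete, here is a minimal pure-Python sketch of max pooling on a single 2D channel. It assumes windows that would run past the edge are dropped; `max_pool_2d` is a hypothetical helper, not the Keras `MaxPooling2D` layer itself.

```python
def max_pool_2d(feature_map, pool=2, stride=2):
    """Max-pool a 2D list of numbers, dropping windows that overrun the edge."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[row + i][col + j]
                for i in range(pool)
                for j in range(pool)
            )
            for col in range(0, width - pool + 1, stride)
        ]
        for row in range(0, height - pool + 1, stride)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each 2 × 2 block is replaced by its largest value, so the 4 × 4 map becomes 2 × 2. In a real layer the same operation is applied to every channel independently.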

# Practice problem: classify image with single object using ConvNet
