In [None]:
import tensorflow as tf
tf.test.is_built_with_cuda()

# Increase the accuracy of Computer Vision with convolutional networks

In the previous workshop, you saw how to recognize clothes with a neural network made up of 3 layers. You were able to experiment with the impact of the model's parameters, such as the number of neurons in the hidden layer or the number of epochs, on the final accuracy.

As a reminder, the previous code is reproduced below. Run the next cell and note the accuracy displayed at the end of training.

In [None]:
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images=training_images / 255.0
test_images=test_images / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)
test_loss = model.evaluate(test_images, test_labels)

Your accuracy is probably around 89% on the training set and 87% on the validation (test) set... not too bad. But how can you improve this score? One way is to use convolutions. We won't go into too much detail here, but the idea behind convolutional neural networks is to learn to detect specific patterns in the content of an image.

If you have ever done image processing with a filter, such as those described at https://en.wikipedia.org/wiki/Kernel_(image_processing), you will already be very familiar with convolutions.

In short, you take a filter (usually 3x3 or 5x5) and slide it over the image. Each output pixel is computed from the underlying pixels, weighted by the values of this filter matrix, which lets you perform operations such as edge detection. For example, if you look at the link above, you'll see that in a 3x3 edge-detection filter the middle cell is set to 8 and all of its neighbors to -1. In this case, for each pixel, you multiply its value by 8, then subtract the value of each of its 8 neighbors. Doing this for every pixel produces a new image with enhanced edges.
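The operation described above can be sketched in NumPy. This is a minimal illustration, assuming a toy 6x6 image and no padding; `apply_filter` is a hypothetical helper, not a TensorFlow function. Like deep learning libraries, it slides the kernel without flipping it (strictly speaking a cross-correlation), which makes no difference here because this kernel is symmetric.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 3x3 edge-detection kernel described above: 8 in the center, -1 for each neighbor
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)

def apply_filter(image, kernel):
    """Slide the kernel over every patch (no padding, stride 1) and sum the element-wise products."""
    patches = sliding_window_view(image, kernel.shape)  # shape: (H-2, W-2, 3, 3)
    return np.einsum("ijkl,kl->ij", patches, kernel)

# Toy 6x6 image: a bright 2x2 square on a dark background
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

edges = apply_filter(image, kernel)
print(edges.shape)  # (4, 4)
print(edges)
```

The pixels of the bright square respond with a value of 5 and the ring of pixels around it responds negatively, while a perfectly flat region would give 0 because the kernel's values sum to zero.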

This is perfect for computer vision, because the features that define an object often occupy only part of the image, and the information we need is much smaller than the full set of pixels in the image. Convolutions then let us focus only on the features they highlight.

By adding convolution layers before your dense layers, the information provided to the dense layers is much more focused, and potentially more accurate.

**Exercise:**
Add convolution layers to the previous code and observe their impact on accuracy.<br>
You must reach at least 92% accuracy on the training data and 91% on the validation data.

**Hints**:
- You have 60,000 images of size 28\*28\*1
- Doc [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)
- Doc [MaxPooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D)

In [None]:
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images/255.0

model = tf.keras.models.Sequential([
  ### Start of code ### (≈ 4 lines of code)

  ### End of code ###
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

model.fit(training_images, training_labels, epochs=5)
test_loss = model.evaluate(test_images, test_labels)


You should achieve close to 93% accuracy on the training data and 91% on the validation data. That's a significant improvement; you're heading in the right direction!

Try training for more epochs, 20 for example, and look at the results. Although the training accuracy keeps improving, the validation accuracy tends to stall or even decrease: this is overfitting.

Review the code above, then follow step by step how the convolutional network is built:

The first step is to load the data. You will notice one change: the image arrays need to be reshaped.\
Indeed, the first convolution layer expects each image to have an explicit channel dimension (height x width x channels).\
Fashion MNIST images are grayscale, so they have a single channel; the first dimension, the number of images, is already there.\
So instead of `60000x28x28`, we have to provide a shape of `60000x28x28x1`, and likewise `10000x28x28x1` for the test images.\
Otherwise, training fails with an error because the convolution layers do not receive inputs of the shape they expect.

```py
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images = training_images.reshape(60000, 28, 28, 1)  ## adding a channel dimension to the training set
training_images = training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)  ## adding a channel dimension to the test set
test_images = test_images / 255.0
```
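The reshape can be checked on a small array. This sketch uses random stand-in data instead of the real dataset (an assumption for illustration); `reshape(-1, 28, 28, 1)` lets NumPy infer the number of images, and `np.expand_dims` or `[..., np.newaxis]` are equivalent ways to add the channel axis.

```python
import numpy as np

# Hypothetical stand-in for the Fashion MNIST arrays: 5 grayscale 28x28 images
images = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis of size 1: (N, 28, 28) -> (N, 28, 28, 1)
with_channel = images.reshape(-1, 28, 28, 1)
# Two equivalent ways to add the same axis
via_expand_dims = np.expand_dims(images, axis=-1)
via_newaxis = images[..., np.newaxis]

# Scale pixel values from [0, 255] to [0, 1]
normalized = with_channel / 255.0

print(images.shape, "->", normalized.shape)  # (5, 28, 28) -> (5, 28, 28, 1)
```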

The next step is to define your model. Instead of starting with a Flatten input layer, you now start with a convolution layer. Its parameters are:
1. The number of filters (convolutions) to learn. This number is arbitrary, but it is common to start with 32 and to use powers of 2 such as 32 or 64.
2. The size of each filter, in our case a 3x3 grid.
3. The activation function, in our case ReLU, which, as a reminder, returns x when x > 0 and 0 otherwise.
4. In the first layer only, the shape of the input data.
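To make these parameters concrete, here is a rough NumPy sketch of what a `Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1))` layer computes on one image, assuming the Keras defaults of stride 1 and no padding (`'valid'`). The random weights and the `conv2d_relu` helper are hypothetical stand-ins for what the layer learns during training. It shows why 32 filters produce 32 output channels, why a 3x3 filter shrinks 28x28 to 26x26, and where the layer's 320 parameters come from.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Hypothetical settings mirroring Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))
num_filters, kh, kw, in_channels = 32, 3, 3, 1
# Random stand-ins for the learned weights: one 3x3x1 kernel per filter, plus one bias per filter
weights = rng.normal(size=(kh, kw, in_channels, num_filters))
bias = np.zeros(num_filters)

def relu(x):
    # ReLU: returns x where x > 0, otherwise 0
    return np.maximum(x, 0)

def conv2d_relu(image, weights, bias):
    """Naive 'valid' convolution (stride 1, no padding) followed by ReLU; image has shape (H, W, C)."""
    kernel_h, kernel_w = weights.shape[:2]
    # Every kernel_h x kernel_w patch: shape (H - kernel_h + 1, W - kernel_w + 1, C, kernel_h, kernel_w)
    patches = sliding_window_view(image, (kernel_h, kernel_w), axis=(0, 1))
    # Multiply each patch by each filter, summing over the patch and the input channels
    out = np.einsum("ijckl,klcf->ijf", patches, weights) + bias
    return relu(out)

image = rng.random((28, 28, 1))  # random stand-in for one normalized Fashion MNIST image
feature_maps = conv2d_relu(image, weights, bias)
n_params = weights.size + bias.size

print(feature_maps.shape)  # (26, 26, 32)
print(n_params)            # 320
```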

You then follow with a MaxPooling layer, which compresses the image while keeping the most important features detected by the convolution. Specifying (2,2) for the MaxPooling divides the number of pixels in the image by 4. Without going into too much detail, the layer looks at each 2x2 block of pixels, keeps the largest value, and turns those 4 pixels into 1. Repeated across the whole image, this halves both the number of horizontal and the number of vertical pixels, so the image shrinks to 25% of its original size.
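A 2x2 max pooling with stride 2 can be sketched in NumPy by grouping the pixels into non-overlapping 2x2 blocks and keeping the maximum of each. `max_pool_2x2` is a hypothetical helper for illustration; like the Keras default `padding='valid'`, it drops a trailing odd row or column.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array; a trailing odd row/column is dropped."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Split into non-overlapping 2x2 blocks, then keep the largest value of each block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 1]])
pooled = max_pool_2x2(x)
print(pooled.tolist())  # [[4, 5], [6, 7]]
```

The 4x4 input becomes 2x2: each dimension is halved, so only a quarter of the pixels remain.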

You can call `model.summary()` to see the size and shape of each layer of your network; you will notice that after each MaxPooling layer, the height and width of the image are halved.
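The shapes reported by `model.summary()` can be worked out by hand. The plain-Python sketch below assumes the architecture assembled step by step below (two 3x3 Conv2D layers with 32 and 64 filters, each followed by 2x2 max pooling, then Dense(128) and Dense(10)) and the Keras defaults of stride 1 and no padding. `conv_out` and `pool_out` are hypothetical helpers; treat the result as a back-of-the-envelope check against the real summary.

```python
def conv_out(size, kernel=3):
    # 'valid' convolution, stride 1: the output shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2: floor division, remainder dropped
    return size // pool

h = w = 28
channels = 1
total_params = 0
summary = []

for filters in (32, 64):
    params = 3 * 3 * channels * filters + filters  # 3x3 weights per input channel + one bias per filter
    h, w, channels = conv_out(h), conv_out(w), filters
    summary.append(("Conv2D", (h, w, channels), params))
    h, w = pool_out(h), pool_out(w)
    summary.append(("MaxPooling2D", (h, w, channels), 0))
    total_params += params

flat = h * w * channels
summary.append(("Flatten", (flat,), 0))
for units in (128, 10):
    params = flat * units + units  # one weight per input per unit + one bias per unit
    summary.append(("Dense", (units,), params))
    total_params += params
    flat = units

for name, shape, params in summary:
    print(f"{name:<13} {str(shape):<14} {params}")
print("Total params:", total_params)
```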

```
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
```

Then add another convolution and pooling layer:
```
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
```

Now flatten the output. From this point on, the network has exactly the same structure as the non-convolutional version.

```
  tf.keras.layers.Flatten(),
```



Then add the same Dense layers as in the pre-convolution example: 128 hidden neurons, followed by 10 output neurons.


```
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
```



Now you can compile your model, use the fit method for training, and evaluate loss and accuracy using the validation set.

```
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(test_acc)
```


# Visualization of convolutions and pooling

This code lets you visualize the convolutions graphically. `print(test_labels[:30])` shows the first 30 labels of the test set, and you can see that the images at indexes 0, 23, and 28 have the same label (9): they are all ankle boots. Look at the result of the convolutions for each of these indexes, and you will start to see features they have in common emerge.

Feel free to change the index values to view different images.
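To pick other images of the same class, `np.where` returns the indices of a given label. The labels below are a hypothetical stand-in; with the real data you would use `test_labels` instead.

```python
import numpy as np

# Hypothetical labels standing in for test_labels (class 9 is "Ankle boot" in Fashion MNIST)
labels = np.array([9, 2, 1, 1, 6, 1, 4, 6, 9, 7, 9, 3])

# Indices of every image whose label is 9
boot_indices = np.where(labels == 9)[0]
print(boot_indices.tolist())  # [0, 8, 10]

# The first three matches could then be used as the image indexes in the visualization code
FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE = (int(i) for i in boot_indices[:3])
```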

In [None]:
print(test_labels[:30])

In [None]:
import matplotlib.pyplot as plt

%matplotlib inline
f, axarr = plt.subplots(3, 4)

# Indexes of the test images to compare (all three are ankle boots, label 9)
FIRST_IMAGE = 0
SECOND_IMAGE = 23
THIRD_IMAGE = 28
# Which filter (output channel) to display for each layer
CONVOLUTION_NUMBER = 1

# Build a model that returns the output of every layer, not just the final prediction
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)

for row, index in enumerate([FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE]):
  # One forward pass per image returns the activations of all layers
  activations = activation_model.predict(test_images[index].reshape(1, 28, 28, 1), verbose=0)
  # The first 4 layers are Conv2D, MaxPooling2D, Conv2D, MaxPooling2D
  for x in range(0, 4):
    axarr[row, x].imshow(activations[x][0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[row, x].grid(False)

**Exercises:**

1. Try modifying the convolutions: change the 32s to 16 or 64. What impact does this have on accuracy and/or training time?

2. Remove the final convolution. What impact does this have on accuracy or training time?

3. Try adding more convolutions. What impact does this have on accuracy or training time? Experiment.

4. Remove all convolutions except the first. What impact does this have? Experiment.

5. In the previous workshop, you implemented a callback that monitors the loss and stops training as soon as it reaches a certain value. Try implementing it here!

Head over to `2.mnist.ipynb` to try it yourself!