# Computer Vision: the Deep Learning breakthrough

Around 2012, Deep Convolutional Neural Networks started breaking records in Computer Vision competitions. This paved the way to the Deep Learning *era*.

Let's start from the basics! We will load a dataset of images.

The [`CIFAR-10`](https://www.cs.toronto.edu/~kriz/cifar.html) dataset! By the way, if you are doing deep learning you **have** to know the [MNIST dataset](https://keras.io/api/datasets/mnist/).  
However, since I have overfit on that one, we will work with the equally famous CIFAR :) 


Note: you can use [`load_img`](https://keras.io/api/preprocessing/image/#loadimg-function) and [`img_to_array`](https://keras.io/api/preprocessing/image/#imgtoarray-function) to load images from file (as PIL images) and to convert them into NumPy arrays.
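A minimal sketch of that round trip. It writes a random image to a temporary file (a stand-in for one of your own pictures, so the file name `example.png` is hypothetical) and assumes a TensorFlow version that exposes these helpers as `tf.keras.utils.load_img` / `tf.keras.utils.img_to_array`; older versions have them under `tf.keras.preprocessing.image`.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from PIL import Image

# Write a random 64x48 RGB image to a temporary file (stand-in for a real picture)
path = os.path.join(tempfile.mkdtemp(), "example.png")
rng = np.random.default_rng(0)
Image.fromarray(rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)).save(path)

# load_img returns a PIL image; target_size=(height, width) resizes it while loading
pil_img = tf.keras.utils.load_img(path, target_size=(32, 32))

# img_to_array turns it into a float32 NumPy array of shape (height, width, channels)
arr = tf.keras.utils.img_to_array(pil_img)
print(arr.shape, arr.dtype)
```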

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras as K
from tensorflow.keras.datasets import cifar10

In [None]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Let's see what we have got!

In [None]:
print(x_train.shape, y_train.shape, np.min(y_train), np.max(y_train))
print(x_train.dtype, np.min(x_train), np.max(x_train))

Ok, so we have a bunch of 32x32 images with three (RGB) channels, stored as `uint8` values in [0, 255]. Targets are integers in [0, 9] (i.e. 10 classes). 
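`load_data()` only returns integer labels (with shape `(N, 1)`, as printed above). A small lookup table, sketched here with the class names listed on the CIFAR-10 page, makes labels and predictions easier to read; `label_to_name` is a hypothetical helper, not part of Keras.

```python
import numpy as np

# CIFAR-10 class names, in label order 0..9
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def label_to_name(label):
    """Map an integer label (or a length-1 array such as y_train[i]) to its class name."""
    return CLASS_NAMES[int(np.asarray(label).ravel()[0])]

print(label_to_name(3), label_to_name(np.array([9])))  # cat truck
```

For example, `label_to_name(y_train[0])` tells you what the first training image is supposed to be.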

We don't much like these large integer values, though: neural networks usually train better on small inputs. Let's rescale them to [0, 1]!

In [None]:
from tensorflow.keras.layers import Rescaling  # in layers.experimental.preprocessing before TF 2.6
rescale = Rescaling(1./255)
rescaled_image = rescale(x_train[0])
print(np.min(rescaled_image), np.max(rescaled_image))
print("Yay!")
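`Rescaling(scale, offset)` simply computes `inputs * scale + offset`. The sketch below checks it against plain NumPy on random `uint8` pixels (a stand-in for `x_train`) and shows how the `offset` argument maps pixels to [-1, 1] instead, the range ResNet50V2 expects later in this notebook. It assumes a TF version where `Rescaling` lives in `tf.keras.layers` (2.6 or newer).

```python
import numpy as np
import tensorflow as tf

# Random uint8 "images" standing in for x_train
images = np.random.default_rng(0).integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Rescaling computes inputs * scale + offset (after casting to float32)
to_unit = tf.keras.layers.Rescaling(1.0 / 255)                # [0, 255] -> [0, 1]
to_sym = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)  # [0, 255] -> [-1, 1]

unit = np.asarray(to_unit(images))
sym = np.asarray(to_sym(images))

# Plain NumPy equivalents
unit_np = images.astype("float32") / 255
sym_np = images.astype("float32") / 127.5 - 1.0
print(unit.min(), unit.max(), sym.min(), sym.max())
```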

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.imshow(x_train[0])
plt.axis('off')

A beautiful image. What do you see?

Let's see if a neural network is better than you at this task.

## Convolutional Neural Networks: the hammer for every nail

Here comes the "monster".  
I'm not providing you a good model, just a model.  
The good model is for you to find!

GPU strongly suggested.

In [None]:
def make_model(input_shape, num_classes):
    """
    Modified from here:
    https://keras.io/examples/vision/image_classification_from_scratch/#introduction
    """

    rescaling = Rescaling(1./255)
    inputs = K.Input(shape=input_shape)
    x = rescaling(inputs)


    x = K.layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.Activation("relu")(x)

    x = K.layers.Conv2D(64, 3, padding="same")(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.Activation("relu")(x)

    x = K.layers.Conv2D(128, 3, padding="same")(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.Activation("relu")(x)
    x = K.layers.MaxPool2D(3)(x)

    x = K.layers.Conv2D(256, 3, padding="same")(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.Activation("relu")(x)
    x = K.layers.MaxPooling2D(3, strides=2, padding="same")(x)

    # keep only batch size and num channels dimension
    # pool over spatial dimensions
    x = K.layers.GlobalAveragePooling2D()(x) 
    x = K.layers.Dropout(0.5)(x)
    outputs = K.layers.Dense(num_classes, activation="softmax")(x)
    return K.Model(inputs, outputs)
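Why does `GlobalAveragePooling2D` end up pooling a 3x3 grid? A back-of-the-envelope sketch of the output size formulas along one spatial axis: with `padding="same"` the size is `ceil(n / stride)`, with `"valid"` it is `floor((n - kernel) / stride) + 1`. Note that `MaxPool2D(3)` uses strides equal to the pool size and `"valid"` padding by default. `conv_output_size` is a hypothetical helper and ignores dilation.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a Conv2D / MaxPooling2D layer along one axis."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # "valid": no padding

# Trace the spatial size through make_model() for a 32x32 input
size = 32
trace = []
for kernel, stride, padding in [
    (3, 2, "same"),   # Conv2D(32, 3, strides=2, padding="same")
    (3, 1, "same"),   # Conv2D(64, 3, padding="same")
    (3, 1, "same"),   # Conv2D(128, 3, padding="same")
    (3, 3, "valid"),  # MaxPool2D(3): strides default to pool_size, "valid" padding
    (3, 1, "same"),   # Conv2D(256, 3, padding="same")
    (3, 2, "same"),   # MaxPooling2D(3, strides=2, padding="same")
]:
    size = conv_output_size(size, kernel, stride, padding)
    trace.append(size)

print(trace)  # [16, 16, 16, 5, 5, 3]
```

So the last feature map is 3x3x256, and global average pooling turns it into a 256-dimensional vector, which you can double-check in the `plot_model` output.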

In [None]:
model = make_model(input_shape=(32, 32, 3), num_classes=10)
K.utils.plot_model(model, show_shapes=True)

In [None]:
model.compile(optimizer=K.optimizers.Adam(1e-3), loss="sparse_categorical_crossentropy", metrics=["accuracy"])

In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.25)
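`fit` returns a `History` object whose `history` dict stores one value per epoch for each metric (here `loss`, `accuracy` and, thanks to `validation_split`, `val_loss` and `val_accuracy`). Below is a hypothetical `plot_history` helper to compare training and validation curves, a quick way to spot overfitting. It assumes you keep the return value of `fit`, which the cell above discards.

```python
import matplotlib.pyplot as plt

def plot_history(history, metric="accuracy"):
    """Plot a training metric and, if present, its validation counterpart per epoch."""
    values = history.history[metric]
    val_values = history.history.get("val_" + metric)
    epochs = range(1, len(values) + 1)

    fig, ax = plt.subplots()
    ax.plot(epochs, values, marker="o", label="train")
    if val_values is not None:
        ax.plot(epochs, val_values, marker="o", label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    return fig

# Usage:
# history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.25)
# plot_history(history)
```

A validation curve that flattens or drops while the training curve keeps rising is the classic sign of overfitting.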

What about a [simpler neural network](https://keras.io/examples/vision/mnist_convnet/)? 

In [None]:
metrics = model.evaluate(x_test, y_test)
print(metrics)
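Regarding the simpler network question above: here is a sketch in the spirit of the linked Keras MNIST example (two `Conv2D` + `MaxPooling2D` stages, then `Dropout` and a softmax `Dense` layer), adapted to 32x32 RGB inputs. The exact layer sizes are an assumption and may not match the current version of that page; the leading `Rescaling` layer is my addition so that raw `x_train` can be fed, as in `make_model`. It is called `simple_model` so it does not overwrite `model`.

```python
import numpy as np
import tensorflow as tf

def make_simple_model(input_shape=(32, 32, 3), num_classes=10):
    layers = tf.keras.layers
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),                # same input scaling as make_model
        layers.Conv2D(32, 3, activation="relu"),    # 32x32 -> 30x30
        layers.MaxPooling2D(2),                     # 30x30 -> 15x15
        layers.Conv2D(64, 3, activation="relu"),    # 15x15 -> 13x13
        layers.MaxPooling2D(2),                     # 13x13 -> 6x6
        layers.Flatten(),                           # 6 * 6 * 64 = 2304 features
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

simple_model = make_simple_model()
simple_model.summary()
# simple_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# simple_model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.25)
```

With about 42k parameters it is much lighter than `make_model`, so it is a good baseline to compare against.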

**Exercise**: your turn! Design a convolutional neural network and try it out! 

You can work with MNIST, which is a simpler dataset, so training should take less time. Keep in mind, however, that its images are 28x28 with only one (grayscale) channel. Try it out :)
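A small sketch for the channel issue, assuming images shaped `(N, 28, 28)` as returned by `keras.datasets.mnist.load_data()`. `prepare_mnist` is a hypothetical helper; set `rescale=True` only if your model does not already start with a `Rescaling(1./255)` layer, as `make_model` does.

```python
import numpy as np

def prepare_mnist(x, rescale=False):
    """Add the channel axis MNIST lacks: (N, 28, 28) -> (N, 28, 28, 1)."""
    x = np.expand_dims(x, axis=-1)
    # Only rescale here if the model does not already rescale its inputs
    return x.astype("float32") / 255 if rescale else x

# Usage (downloads MNIST on the first call):
# from tensorflow.keras.datasets import mnist
# (x_train_m, y_train_m), (x_test_m, y_test_m) = mnist.load_data()
# x_train_m, x_test_m = prepare_mnist(x_train_m), prepare_mnist(x_test_m)
# model_m = make_model(input_shape=(28, 28, 1), num_classes=10)
```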

## Data Augmentation

Gathering a dataset of images is expensive! If you need more images, you can augment your dataset with transformations that do not change the label. After all, a rotated cat is still a cat, isn't it?

In [None]:
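from tensorflow.keras.layers import RandomRotation, RandomTranslation  # in layers.experimental.preprocessing before TF 2.6

# factor is a fraction of a full turn: (-0.2, 0.2) rotates by up to +/- 72 degrees
rot = RandomRotation(factor=(-0.2, 0.2))
# factors are fractions of the image height/width: shift by up to +/- 20%
tra = RandomTranslation(height_factor=(-0.2, 0.2), width_factor=(-0.2, 0.2))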

In [None]:
# The layers expect a batch: add a batch axis before augmenting, drop it before plotting.
# Cast back to uint8: imshow expects floats in [0, 1] or integers in [0, 255].
batch = np.expand_dims(x_train[0], axis=0)
toplot = [np.asarray(rot(batch)[0]).astype("uint8") for _ in range(3)] + \
 [np.asarray(tra(batch)[0]).astype("uint8") for _ in range(2)]
toplot.append(x_train[0])  # append also the original image for comparison

fig, ax = plt.subplots(nrows=2, ncols=3)
for a, img in zip(ax.flat, toplot):
  a.imshow(img)
  a.axis('off')


You can plug these layers into a Keras model as usual. If you do it this way, the layers automatically apply data augmentation only **during training** (e.g. inside `fit`) and pass the images through unchanged in `evaluate` and `predict`. That is what you want: you don't want to augment your data during evaluation.
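A minimal sketch of that idea, assuming TF 2.6 or newer (augmentation layers available in `tf.keras.layers`). The small network after the augmentation block is just a placeholder, not a recommended architecture. Calling the augmentation block with `training=False` confirms that it leaves the images untouched at inference time.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers

# Augmentation block: random transformations are applied only when training=True
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),          # up to +/- 10% of a full turn (36 degrees)
    layers.RandomTranslation(0.1, 0.1),  # up to +/- 10% of height / width
], name="augment")

# Placeholder classifier with augmentation as its first stage
inputs = tf.keras.Input(shape=(32, 32, 3))
x = augment(inputs)
x = layers.Rescaling(1.0 / 255)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
aug_model = tf.keras.Model(inputs, outputs)

images = np.random.default_rng(0).uniform(0, 255, size=(2, 32, 32, 3)).astype("float32")
same = np.asarray(augment(images, training=False))    # inference: unchanged
changed = np.asarray(augment(images, training=True))  # training: randomly transformed
print(np.allclose(same, images), np.allclose(changed, images))
```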

**Exercise**: try to use data augmentation on your model!

You can also try to use the [`ImageDataGenerator`](https://keras.io/api/preprocessing/image/#imagedatagenerator-class).
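Note that recent TensorFlow docs mark `ImageDataGenerator` as not recommended for new code and suggest `tf.data` pipelines with preprocessing layers instead. Here is a sketch of that alternative (it is not the `ImageDataGenerator` API), where `make_augmented_dataset` is a hypothetical helper:

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers
augment = tf.keras.Sequential([layers.RandomFlip("horizontal"), layers.RandomRotation(0.1)])

def make_augmented_dataset(x, y, batch_size=64):
    """Shuffle, batch and randomly augment (x, y) pairs on the fly with tf.data."""
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    ds = ds.shuffle(buffer_size=len(x)).batch(batch_size)
    # training=True makes the random transformations actually apply
    ds = ds.map(lambda xb, yb: (augment(xb, training=True), yb),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)
```

Since `fit` does not support `validation_split` when given a dataset, split off a validation set yourself, e.g. `model.fit(make_augmented_dataset(x_train[:40000], y_train[:40000]), validation_data=(x_train[40000:], y_train[40000:]), epochs=10)`.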

## Leverage pretrained networks

If a network is trained on a large collection of images, it already contains valuable knowledge which can be reused "for free".  
You can take an existing network and **fine-tune** it on your data. This way, you also need less data! 

In [None]:
from tensorflow.keras.applications import EfficientNetB0


# Keras EfficientNet models include the rescaling step: feed raw pixels in [0, 255]
model = EfficientNetB0(weights="imagenet", include_top=True)  # expects (224, 224, 3) inputs
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
help(EfficientNetB0)

In [None]:
model.summary()

In [None]:
# model.predict(x_train)  # fails: with include_top=True the model expects 224x224 inputs, not 32x32

In [None]:
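from tensorflow.keras.layers import Resizing  # in layers.experimental.preprocessing before TF 2.6

# resize with (bilinear) interpolation, which usually works better than padding
res = Resizing(height=224, width=224)
# no division by 255 here: EfficientNet rescales its inputs internally
x_train_efficient = res(x_train[:2])
print(x_train_efficient.shape)
y_train_efficient = y_train[:2]
plt.imshow(x_train_efficient[0] / 255)  # imshow expects floats in [0, 1]
plt.axis('off')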

In [None]:
pred = model.predict(x_train_efficient)
print(tf.argmax(pred, axis=1), pred.shape)

The predicted indices refer to the 1000 ImageNet classes the model was trained on, not to our 10 CIFAR-10 classes: we need a new classification head!
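Both problems (the input size and the number of classes) are visible in the model's shapes. A sketch that rebuilds the same architecture with `weights=None`, only to skip the weight download, and prints them. To map the ImageNet indices to readable labels, `tf.keras.applications.efficientnet.decode_predictions(pred, top=3)` can be used (it downloads a class-index file).

```python
import tensorflow as tf

# weights=None builds the same architecture without downloading ImageNet weights
eff_full = tf.keras.applications.EfficientNetB0(weights=None, include_top=True)
print(eff_full.input_shape)   # (None, 224, 224, 3): why predicting on 32x32 images fails
print(eff_full.output_shape)  # (None, 1000): one probability per ImageNet class
```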

In [None]:
def get_model2():
  eff = EfficientNetB0(weights="imagenet", include_top=False)
  eff.trainable = False

  model = tf.keras.Sequential()
  model.add(K.layers.Input(shape=(32, 32, 3)))
  model.add(Resizing(width=224, height=224))
  model.add(eff)
  model.add(K.layers.Flatten())
  model.add(K.layers.Dense(10, activation="softmax"))
  return model

In [None]:
model = get_model2()
# a larger learning rate is fine here: only the new Dense head is trained
model.compile(optimizer=K.optimizers.Adam(1e-2), loss="sparse_categorical_crossentropy", metrics=['accuracy'])
model.summary()
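Why is a larger learning rate acceptable here? With `eff.trainable = False`, only the new head gets updated. A sketch that checks this by counting trainable weights. It uses `weights=None` only to avoid the download, and `GlobalAveragePooling2D` instead of the `Flatten` in `get_model2` (a swapped-in choice: it yields a 1280-dimensional feature vector, so the head has 12,810 parameters instead of the 627,210 that `Flatten` produces on the 7x7x1280 output).

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers

# Frozen EfficientNet base (random weights here, "imagenet" in practice)
base = tf.keras.applications.EfficientNetB0(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False

head_model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    base,
    layers.GlobalAveragePooling2D(),  # (7, 7, 1280) -> (1280,)
    layers.Dense(10, activation="softmax"),
])

# Only the Dense kernel and bias remain trainable
n_trainable = sum(int(np.prod(tuple(w.shape))) for w in head_model.trainable_weights)
print(len(head_model.trainable_weights), len(base.trainable_weights), n_trainable)
```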

In [None]:
model.fit(x_train, y_train, epochs=2, batch_size=256, validation_split=0.2)

In [None]:
model.evaluate(x_test, y_test)

You can also unfreeze some EfficientNet layers and fine-tune them with a lower learning rate.

In [None]:
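def unfreeze_model(model, num_layers=20):
  base = next(l for l in model.layers if isinstance(l, K.Model))  # EfficientNet sub-model of get_model2()
  base.trainable = True
  for i, layer in enumerate(base.layers):
    # unfreeze only the last num_layers layers, keeping BatchNormalization frozen
    layer.trainable = i >= len(base.layers) - num_layers and not isinstance(layer, K.layers.BatchNormalization)
  # recompile with a low learning rate so the new trainable flags take effect
  model.compile(optimizer=K.optimizers.Adam(1e-5), loss="sparse_categorical_crossentropy", metrics=["accuracy"])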

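## Inspect convolutional feature maps

What does a convolutional filter respond to? The code below starts from a random image and runs gradient ascent on its pixels to maximize the mean activation of one filter in the `conv3_block4_out` layer of a pretrained ResNet50V2.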

In [None]:
model = K.applications.ResNet50V2(weights="imagenet", include_top=False)
model.trainable = False

In [None]:
img_width = 180
img_height = 180
layer_name = "conv3_block4_out"

layer = model.get_layer(name=layer_name)
feature_extractor = K.Model(inputs=model.inputs, outputs=layer.output)

def compute_loss(input_image, filter_index):
    activation = feature_extractor(input_image)
    filter_activation = activation[:, :, :, filter_index]
    return tf.reduce_mean(filter_activation)

@tf.function
def gradient_ascent_step(img, filter_index, learning_rate):
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = compute_loss(img, filter_index)
    grads = tape.gradient(loss, img)
    grads = tf.math.l2_normalize(grads)
    img += learning_rate * grads
    return loss, img

def initialize_image():
    img = tf.random.uniform((1, img_height, img_width, 3))  # (batch, height, width, channels)
    # ResNet50V2 expects inputs in the range [-1, +1].
    # Here we scale our random inputs to [-0.125, +0.125]
    return (img - 0.5) * 0.25


def visualize_filter(filter_index):
    # We run gradient ascent for 30 steps
    iterations = 30
    learning_rate = 10
    img = initialize_image()
    for iteration in range(iterations):
        loss, img = gradient_ascent_step(img, filter_index, learning_rate)

    # Decode the resulting input image
    img = deprocess_image(img[0].numpy())
    return loss, img


def deprocess_image(img):
    # Normalize array: center on 0, ensure standard deviation is 0.15
    img -= img.mean()
    img /= img.std() + 1e-5
    img *= 0.15

    # Center on 0.5 and clip to [0, 1]
    img += 0.5
    img = np.clip(img, 0, 1)

    # Convert to RGB array
    img *= 255
    img = np.clip(img, 0, 255).astype("uint8")
    return img


loss, img = visualize_filter(0)
print(loss, img.shape)
K.preprocessing.image.save_img("0.png", img)
plt.imshow(img)
plt.axis("off")

You can try different blocks and filters to see what they respond to.
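To look at actual feature maps (the activations a layer produces for a real image) rather than at synthesized inputs, here is a sketch of a hypothetical pair of helpers: the first builds a sub-model up to a given layer, just like `feature_extractor` above, and the second plots a few of its channels.

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

def get_feature_maps(model, layer_name, images):
    """Return the activations of `layer_name` for a batch of images, shape (N, H, W, C)."""
    layer = model.get_layer(name=layer_name)
    extractor = tf.keras.Model(inputs=model.inputs, outputs=layer.output)
    return np.asarray(extractor(images))

def plot_feature_maps(maps, n_channels=8):
    """Show the first n_channels feature maps of the first image in the batch."""
    fig, axes = plt.subplots(1, n_channels, figsize=(2 * n_channels, 2))
    for c, ax in enumerate(axes):
        ax.imshow(maps[0, :, :, c], cmap="viridis")
        ax.axis("off")
    return fig
```

For example, with the ResNet50V2 above and the `res` resizing layer from the EfficientNet section: `maps = get_feature_maps(model, "conv3_block4_out", K.applications.resnet_v2.preprocess_input(res(x_train[:1])))`, then `plot_feature_maps(maps)`. Here `preprocess_input` maps the pixels to [-1, 1], and resizing to 224x224 first gives larger maps than the raw 32x32 images would.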

## Not only image classification

There are other, quite important, tasks too: for example object detection (drawing boxes around objects, such as the nice squares around faces) and image segmentation (assigning each pixel to a specific class).

You can check out Keras examples: [object detection](https://keras.io/examples/vision/retinanet/#downloading-the-coco2017-dataset), [image segmentation](https://keras.io/examples/vision/oxford_pets_image_segmentation/).

If you want to try one of them out, I'd recommend the latter (segmentation), since its data preparation is simpler.