# Convolutional Neural Networks and Image Classification

Image classification is the problem of assigning a label to an image. For example, we might want to assign the label "duck" to pictures of ducks, the label "frog" to pictures of frogs, and so on.

In this lecture, we'll introduce some of the most important tools for image classification: convolutional neural networks. Major parts of this lecture are based on the "Images" tutorial [here](https://www.tensorflow.org/tutorials/images/cnn). 

In [None]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

## Getting Data

For this lecture, we'll use a subset of the [CIFAR-10 data set](https://www.cs.toronto.edu/~kriz/cifar.html). This data set can be conveniently accessed using a method from `tensorflow.keras.datasets`: 
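The loading cell itself is not shown here. A minimal sketch, assuming the same approach as the TensorFlow tutorial linked above: `datasets.cifar10.load_data()` downloads the data on first use and returns a training pair and a test pair of image and label arrays. The helper names `rescale_pixels` and `load_cifar10`, and the choice to rescale pixel values to the range [0, 1], are assumptions made for illustration.

```python
import numpy as np


def rescale_pixels(images):
    """Convert integer pixel values in [0, 255] to floats in [0, 1]."""
    return np.asarray(images, dtype="float32") / 255.0


def load_cifar10():
    """Download CIFAR-10 through Keras and return rescaled images with labels.

    The import sits inside the function so that nothing is imported or
    downloaded until the function is actually called.
    """
    from tensorflow.keras import datasets

    (train_images, train_labels), (test_images, test_labels) = \
        datasets.cifar10.load_data()
    return ((rescale_pixels(train_images), train_labels),
            (rescale_pixels(test_images), test_labels))
```

In the notebook, `(train_images, train_labels), (test_images, test_labels) = load_cifar10()` would then provide the image and label arrays.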

There are 50,000 training images and 10,000 test images. Each image has 32x32 pixels, and there are three color "channels" -- red, green, and blue. 

There are 10 classes of images, encoded as the integers 0 through 9 in the `labels` arrays.

Each class corresponds to a type of object: 

In [None]:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

Let's take a look. 
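One possible way to look at the data is a small grid of images titled with their class names. This is only a sketch: the helper name `plot_examples` and its grid-size parameters are hypothetical, and it assumes the images are arrays that `imshow` can display (for example, floats in [0, 1]).

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_examples(images, labels, class_names, n_rows=2, n_cols=5):
    """Show the first n_rows * n_cols images, titled with their class names."""
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(2 * n_cols, 2 * n_rows))
    for i, ax in enumerate(np.ravel(axes)):
        ax.imshow(images[i])
        # Labels may be stored with shape (n, 1) or (n,); ravel handles both.
        ax.set_title(class_names[int(np.ravel(labels[i])[0])])
        ax.axis("off")
    fig.tight_layout()
    return fig
```

In the notebook this could be called as `plot_examples(train_images, train_labels, class_names)`, assuming the arrays are named as in the TensorFlow tutorial.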

## Convolution

Convolution is a mathematical operation commonly used to extract *features* (meaningful properties) from images. The idea is simple: we define a *kernel* matrix containing some numbers and "slide it over" the input data. At each location, we multiply each data value by the kernel value that sits on top of it and add the products together. Here's an illustrative diagram:

![](https://d2l.ai/_images/correlation.svg)

*Image from [Dive Into Deep Learning](https://d2l.ai/chapter_convolutional-neural-networks/conv-layer.html)*

The value of 19 in the output is obtained in this example by computing $0 \times 0 + 1 \times 1 + 3 \times 2 + 4 \times 3 = 19$. 
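The same arithmetic can be checked with a few lines of NumPy. This is a minimal sketch of the sliding-window computation, not an efficient implementation; the function name `correlate2d_valid` is made up for this example, and the input and kernel values are the ones used in the diagram.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the image currently covered by the kernel.
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out


X = np.arange(9).reshape(3, 3)   # input: 0, 1, ..., 8 laid out in a 3x3 grid
K = np.array([[0, 1],
              [2, 3]])           # 2x2 kernel
result = correlate2d_valid(X, K)
print(result)                    # top-left entry is 19, as computed above
```

Because the kernel is only placed where it fits entirely inside the input, a 3x3 input and a 2x2 kernel give a 2x2 output.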

This operation might seem either abstract or trivial, but it can be used to extract useful image features. For example, let's manually define a kernel and use it to perform "edge detection" in a greyscale image. 

Observe that the convolved image (right) has darker patches corresponding to the distinct "edges" in the image, where darker colors meet lighter colors. 
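The edge-detection idea can be reproduced on a small synthetic image. This is a sketch under stated assumptions: the image (a bright square on a dark background) and the particular kernel are chosen for illustration and are not necessarily the ones used to produce the figure discussed above.

```python
import numpy as np


def correlate2d(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Synthetic greyscale image: a bright 4x4 square on a dark 10x10 background.
image = np.zeros((10, 10))
image[3:7, 3:7] = 1.0

# The kernel entries sum to zero, so regions of constant brightness map to 0
# and only changes in brightness (edges) produce nonzero values.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

edges = correlate2d(image, edge_kernel)
print(edges)
```

The output is zero both far from the square and deep inside it, and nonzero only along the square's boundary.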

## Learning Kernels

There are lots of convolutional kernels we could potentially use. How do we know which ones are meaningful? In practice, we don't. So, we treat the kernel entries as parameters and learn them from data as part of the model fitting process. This is exactly what the `Conv2D` layer allows us to do.

Let's compare a few other possibilities: 

These features may or may not be informative -- they are purely random! We can try to learn informative features by embedding these kernels in a model and optimizing. 
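To make the point about random kernels concrete, here is a small sketch of a freshly created `Conv2D` layer applied to a stand-in input. The random "image", the number of filters, and the kernel size are all assumptions chosen for illustration.

```python
import numpy as np
from tensorflow.keras import layers

# Stand-in input: one random 32x32 RGB "image" with a leading batch axis.
rng = np.random.default_rng(0)
image_batch = rng.random((1, 32, 32, 3)).astype("float32")

# Four 3x3 kernels; Keras fills them with random values when the layer
# is first called, before any training has happened.
conv = layers.Conv2D(filters=4, kernel_size=(3, 3))
features = conv(image_batch)

print(conv.kernel.shape)  # kernel height, kernel width, input channels, filters
print(features.shape)     # batch, height, width, one feature map per filter
```

The kernel and bias appear in `conv.trainable_weights`, which is what allows an optimizer to adjust them during training.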

## Building a Model

The most common approach is to alternate `Conv2D` layers with `MaxPooling2D` layers. Pooling layers act as "summaries" that reduce the size of the data at each step. After we're done doing "2D stuff" to the data, we then need to `Flatten` the data from 2D to 1D in order to pass it through the final `Dense` layers, which form the prediction.
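A model of this shape might look as follows. This is a sketch that assumes the architecture of the TensorFlow tutorial linked at the top of the lecture; the number of layers, filters, and units are that tutorial's choices and may differ from the model whose results are discussed here.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),            # 32x32 images, 3 color channels
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),                # halves height and width
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),                           # 4 x 4 x 64 values -> 1024
    layers.Dense(64, activation="relu"),
    layers.Dense(10),                           # one score (logit) per class
])

model.summary()
```

The final layer has no softmax, so the model outputs raw scores; any loss function used to train it would need to be told to expect logits.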

Let's train our model and see how it does! 
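Training could be sketched as below. The helper name `compile_and_fit`, the Adam optimizer, and the default of 10 epochs are assumptions borrowed from the TensorFlow tutorial rather than settings confirmed by this lecture; the loss assumes integer labels and a final layer that outputs logits.

```python
import tensorflow as tf


def compile_and_fit(model, train_images, train_labels,
                    val_images, val_labels, epochs=10):
    """Compile `model` for integer class labels, train it, return the History."""
    model.compile(
        optimizer="adam",
        # from_logits=True assumes the final Dense layer has no softmax.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model.fit(train_images, train_labels, epochs=epochs,
                     validation_data=(val_images, val_labels))
```

The returned object's `history` dictionary holds the per-epoch training and validation accuracy, which is where a figure like the validation accuracy quoted next would come from.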

After just a few rounds of training, our model guesses the correct image label more than 50% of the time on the unseen validation data. That is relatively impressive considering that there are 10 possibilities, so random guessing would be right only about 10% of the time.

Note: the training process can often be considerably accelerated by training on a GPU. A limited amount of free GPU power is available via Google Colab, and is illustrated [here](https://colab.research.google.com/notebooks/gpu.ipynb). 

## Extracting Predictions

Let's see how our model did on the test data: 
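Calling `model.predict(test_images)` on a Keras model returns one row of class scores per image, and the predicted label is the position of the largest score in each row. The sketch below shows that step in NumPy; the helper names are made up, and the scores are hypothetical numbers rather than real model output.

```python
import numpy as np


def predicted_labels(scores):
    """Return the index of the highest-scoring class for each row of scores."""
    return np.argmax(scores, axis=1)


def accuracy(scores, true_labels):
    """Fraction of rows whose predicted class matches the true label."""
    return float(np.mean(predicted_labels(scores) == np.ravel(true_labels)))


# Hypothetical scores for three images and three classes (not real output).
scores = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 1.5],
                   [0.0, 0.9, 0.4]])
true_labels = np.array([[0], [2], [2]])   # shape (n, 1), as in CIFAR-10

print(predicted_labels(scores))       # [0 2 1]
print(accuracy(scores, true_labels))  # 2 of 3 predictions are correct
```

Indexing `class_names` with a predicted label then gives the human-readable class.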

We'll plot these predicted labels alongside the true labels (in parentheses).

Overall, these results look fairly reasonable. There are plenty of mistakes, but the images on which the model made errors do look genuinely somewhat confusing. A more complex or powerful model could potentially do noticeably better on this data set.

## Visualizing Learned Features

It's possible to define a separate model that allows us to study the features learned by the model. The outputs of these learned features on a given image are often called *activations*. We create this model by simply declaring that its outputs are the outputs of the first convolutional layer. For this we use the `models.Model` class rather than the `models.Sequential` class; `Sequential` is more convenient but less flexible.
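The construction described above might be sketched like this. The small `model` defined here is an untrained stand-in so that the snippet runs on its own; in the notebook, `model` would be the trained network, and the variable names `first_conv` and `activation_model` are assumptions.

```python
from tensorflow.keras import layers, models

# Untrained stand-in for the trained classifier.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10),
])

# Locate the first convolutional layer rather than hard-coding its index.
first_conv = next(layer for layer in model.layers
                  if isinstance(layer, layers.Conv2D))

# Same inputs as `model`, but the output is taken from `first_conv`.
activation_model = models.Model(inputs=model.inputs,
                                outputs=first_conv.output)

print(activation_model.output_shape)
```

Because `activation_model` shares its layers with `model`, it uses whatever kernels `model` has learned. Calling `activation_model.predict` on a batch of images returns one feature map per filter for each image.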

Now we can compute the activations.

And visualize them! 

While one has to be careful about over-interpreting these activations, it looks like some of the features correspond to edge detection (like we saw above), while others correspond to highlighting different patches of colors, enabling, for example, separation of the horse's body from the background. 