# Convolutional Neural Networks

**Time**
- Teaching: 1.45 hours
- Challenges: 30 minutes

**Questions**
- "What is CIFAR10, and how do its modeling process and challenges differ from those of MNIST?"
- "How do convolutional neural networks differ from vanilla neural networks?"
- "How do we write python code to develop convolutional neural networks?"
- "What options do we have for training large models when our personal machines can't handle them?"


**Learning Objectives**
- "Understand the process of data preprocessing for deep learning."
- "Build python functions that help us process and visualize our data."
- "Take a peek at the machinery underlying convolutional neural networks."
- "Understand the hardware constraints in deep learning."

* * * * *

## Import packages

For this notebook, instead of importing only specific functions, we will import some modules that contain functions.

**Old way:**

`from keras.layers import Dense`

`model.add(Dense(...))`

**New way:**

`from keras import layers`

`model.add(layers.Dense(...))`

But why change it up? In the past I had trouble understanding how modules work, and my code would break simply because of the way the imports were written. Let's avoid that by getting comfortable with Python modules!
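To make the difference concrete, here is a small sketch that uses Python's built-in `math` module as a stand-in for `keras` (an assumption made purely for illustration, so that it runs without any deep-learning packages installed). Both styles end up calling the same function; the module style simply keeps the name of the home module attached.

```python
# Old way: import one specific function into our namespace
from math import sqrt

# New way: import the whole module and reach into it with a dot
import math

old_way = sqrt(16)
new_way = math.sqrt(16)

# Both names point to the very same function object
same_function = sqrt is math.sqrt

print(old_way, new_way, same_function)
```

The module style is a little more typing, but every call tells you where the function came from.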

In [None]:
from keras import layers
from keras import models
from keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

In [None]:
import numpy as np
import matplotlib.pyplot as plt

### CIFAR10

So we get sidetracked at work and we want to instead build a classifier for animals and vehicles!

We shop around and find an interesting image dataset called [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html).

This dataset consists of:
- 60,000 total images
- 10 classes (6,000 images per class)

Example images of each class are shown below:

![CIFAR10 classes](https://maet3608.github.io/nuts-ml/_images/cifar10.png)

### Loading the dataset

In [None]:
def load_cifar10(subset = True):
    """
    Loads a training, validation, and test set of CIFAR10 images.
    
    When subset=True:
    Returns only a subset of the CIFAR10 dataset.
    Especially important to use if you are on datahub and only have 1-2GB of memory.
    """
    if subset:
        N_TRAIN = 8000
        N_VALIDATION = 2000
        N_TEST = 2000
    else:
        N_TRAIN = 40000
        N_VALIDATION = 10000
        N_TEST = 10000
    
    (x_train_and_val, y_train_and_val), (x_test, y_test) = cifar10.load_data()
    
    x_train = x_train_and_val[:N_TRAIN,:,:]
    y_train = y_train_and_val[:N_TRAIN]
    
    x_val = x_train_and_val[N_TRAIN: N_TRAIN + N_VALIDATION,:,:]
    y_val = y_train_and_val[N_TRAIN: N_TRAIN + N_VALIDATION]
    
    x_test = x_test[:N_TEST]
    y_test = y_test[:N_TEST]
    
    return x_train, y_train, x_val, y_val, x_test, y_test
    

In [None]:
x_train, y_train, x_val, y_val, x_test, y_test = load_cifar10()
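The slicing inside `load_cifar10()` can be hard to picture, so here is a minimal sketch of the same idea on made-up data. The array names, sizes, and labels below are hypothetical stand-ins (they are not CIFAR10) so that the example runs without downloading anything.

```python
import numpy as np

# Hypothetical stand-in for the training images returned by cifar10.load_data():
# 100 tiny "images" of shape (4, 4, 3), labeled 0-99 so we can track them
fake_images = np.zeros((100, 4, 4, 3), dtype="uint8")
fake_labels = np.arange(100).reshape(100, 1)

n_train, n_validation = 80, 20

# The first n_train rows become the training set
x_train_demo = fake_images[:n_train]
y_train_demo = fake_labels[:n_train]

# The next n_validation rows become the validation set
x_val_demo = fake_images[n_train:n_train + n_validation]
y_val_demo = fake_labels[n_train:n_train + n_validation]

print(x_train_demo.shape, x_val_demo.shape)

# No label appears in both partitions, so the two sets do not overlap
overlap = np.intersect1d(y_train_demo, y_val_demo)
print(len(overlap))
```

Because the validation slice starts exactly where the training slice ends, no image can land in both partitions.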

### Input Data Due Diligence

We will borrow some of our previous functions in order to get a feel for CIFAR10.

In [None]:
def data_summary(data):
    """
    Takes a list of our data partitions and prints the shape of each.
    """
    
    for i, data_partition in enumerate(data):
        if i == 0:
            print("Training Data")
        elif i == 2:
            print()
            print("Validation Data")
        elif i == 4:
            print()
            print("Testing Data")

        print(f"Shape: {data_partition.shape}")

In [None]:
data_summary([x_train, y_train, x_val, y_val, x_test, y_test])

In [None]:
one_image = x_train[0]
one_image.shape

In [None]:
plt.imshow(one_image);

Can you tell what the class of the image above is!?

Let's show more images from CIFAR10 along with their correct classes.

In [None]:
def plot_images(x, y, random=False):
    """
    Plots 25 images from x data with titles set as y.
    Set random=True if you want random images rather than the first 25.
    """
    
    if random:
        indices = np.random.choice(range(y.shape[0]), 25, replace=False)
    
    else:
        indices = np.array(range(25))
    
    fig, axes = plt.subplots(5,5, figsize = (15,15))
    axes = axes.ravel()
    
    for ax, index in zip(axes, indices):
        ax.imshow(x[index])
        ax.set_title(f"Class: {y[index][0]}", size=15)
    
    plt.tight_layout()
    
    return plt.show()

In [None]:
plot_images(x_train, y_train)

Oh no, the class labels are just digits!

## Challenge 1: Translate Classes

Create a function `translate_class()` that returns the correct class name for each target class (truck, horse, etc.).
- Use the [keras CIFAR10 documentation](https://keras.io/api/datasets/cifar10/) as a guide to know how the classes are labeled.

In [None]:
def translate_class():
    # Your code here
    return None

In [None]:
## TODELETE SOLUTION
def translate_class(y):
    """
    Takes a class index [0-9] and returns the CIFAR10 class category.
    """
    # Create a list of categories
    categories = ["airplane", 
                 "automobile",
                 "bird",
                 "cat",
                 "deer",
                 "dog",
                 "frog",
                 "horse",
                 "ship",
                 "truck"]
    
    return categories[y]
    
    
def translate_classes_fancy(y):
    """
    Use a key-value paired dictionary to translate target class
    """
    # Create a list of categories
    categories = ["airplane", 
                 "automobile",
                 "bird",
                 "cat",
                 "deer",
                 "dog",
                 "frog",
                 "horse",
                 "ship",
                 "truck"]
    
    # Use a dictionary comprehension to attach each class number to its category
    category_dict = {key : value for key, value in zip(list(range(10)), categories)}
    
    return category_dict[y]

## Challenge 2: Plotting Image Classes

Create a new function `my_imageplotter()` that reuses the code from our `plot_images()` function and calls `translate_class()` to give our images the correct class titles.

In [None]:
def my_imageplotter():
    # your code here
    return None

In [None]:
## TO DELETE TEST SOLUTION

def my_imageplotter(x, y, random=False):
    """
    Plots 25 images from x data with titles set as y.
    Set random=True if you want random images rather than the first 25.
    """
    
    if random:
        indices = np.random.choice(range(y.shape[0]), 25, replace=False)
    
    else:
        indices = np.array(range(25))
    
    fig, axes = plt.subplots(5,5, figsize = (15,15))
    axes = axes.ravel()
    
    for ax, index in zip(axes, indices):
        # New line here
        title = translate_class(y[index][0])
        ax.imshow(x[index])
        ax.set_title(f"Class: {title}", size=15)
    
    plt.tight_layout()
    
    return plt.show()

Test your function below!

In [None]:
my_imageplotter(x_train, y_train)

Let's make sure we have balanced class distributions.

In [None]:
def plot_target_distributions(targets, titles):
    """
    Plots the distribution of target classes for each data partition.
    """
    
    fig, axes = plt.subplots(3,1, figsize = (10,10))
    
    for ax, target, title in zip(axes, targets, titles):
        ax.hist(target) 
        ax.set_title(f"{title} Class Distribution")
    
    return plt.show()

plot_target_distributions([y_train, y_val, y_test], ["Train", "Validation", "Test"])
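Histograms are good for a quick look, but exact counts make the balance check easier to verify. Here is a minimal sketch using `np.bincount`; the `demo_labels` array is a made-up example shaped like our targets (one column of integers from 0 to 9), not the real data.

```python
import numpy as np

# Hypothetical labels shaped like CIFAR10 targets: one column, values 0-9
demo_labels = np.array([[0], [3], [3], [9], [0], [3]])

# ravel() flattens the (n, 1) column into a flat vector of length n,
# and minlength=10 makes sure classes with zero images still get a slot
counts = np.bincount(demo_labels.ravel(), minlength=10)

for class_index, count in enumerate(counts):
    print(f"Class {class_index}: {count} images")
```

Passing `y_train.ravel()` in place of `demo_labels.ravel()` should give the exact counts behind the training histogram.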

The last step to prepping our data for modeling is data transformation. 

In [None]:
def x_transform_one_dim(x_data):
    """
    Transforms image data to one dimension.
    """
    
    flatten = x_data.reshape((x_data.shape[0], (x_data.shape[1] * x_data.shape[2]*x_data.shape[3])))
    scaled = flatten.astype('float32') / 255
    
    return scaled
    

In [None]:
def transform_data_onedim(x_train, y_train, x_val, y_val):
    """
    Transforms training and validation image data into a single dimension and targets to categorical.
    """
    
    x = {}
    for x_data, name in zip([x_train, x_val], ["x_train", "x_val"]):
        x_trans = x_transform_one_dim(x_data)
        x[name] = x_trans
    
    y = {}
    
    for y_data, name in zip([y_train, y_val], ["y_train", "y_val"]):
        y[name] = to_categorical(y_data)
    
    return x['x_train'], y['y_train'], x['x_val'], y['y_val']
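To see what these transformations do, here is a small sketch on made-up data. The image values and labels are hypothetical, and the one-hot step uses a plain NumPy identity matrix to mirror what `to_categorical` does for integer labels, so that the example does not need keras at all.

```python
import numpy as np

# Hypothetical mini-batch: 2 images, 32 x 32 pixels, 3 color channels
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)
demo_labels = np.array([[3], [7]])

# Flatten each image into one long row: 32 * 32 * 3 = 3072 values
flat = demo_images.reshape((demo_images.shape[0], -1))

# Scale pixel values from the range [0, 255] down to [0, 1]
scaled = flat.astype("float32") / 255

# One-hot encode: row i of the identity matrix has a single 1 in position i
one_hot = np.eye(10, dtype="float32")[demo_labels.ravel()]

print(scaled.shape)
print(one_hot)
```

Each label becomes a row of ten numbers with a single 1 marking the class, which is the format the `categorical_crossentropy` loss expects.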

For our first model, let's build a standard (vanilla) feed-forward neural network.

In [None]:
INPUT_SHAPE_ONE_DIM = x_train.shape[1]*x_train.shape[2]*x_train.shape[3]

In [None]:
vanilla_nn = models.Sequential()
vanilla_nn.add(layers.Dense(512, activation= "relu", input_shape=(INPUT_SHAPE_ONE_DIM,)))
vanilla_nn.add(layers.Dropout(0.5))
vanilla_nn.add(layers.Dense(512, activation= "relu"))
vanilla_nn.add(layers.Dropout(0.5))
vanilla_nn.add(layers.Dense(10, activation="softmax"))

vanilla_nn.compile(optimizer= "rmsprop",
                  loss = "categorical_crossentropy",
                  metrics = ["accuracy"])

To save memory, we will not store the transformed data as new objects in our global environment. Instead, we will create a function that applies the transformation internally, right before training.

In [None]:
def transform_train_model(transformation_func, model, x_train, y_train, x_val, y_val, epochs = 20, batch_size = 128):
    """
    Takes a transformation function, a compiled model, data, and optional fitting arguments.
    
    Trains the model and returns the history.
    """
    
    # Transforms the data
    x_train_trans, y_train_trans, x_val_trans, y_val_trans = transformation_func(x_train, y_train, x_val, y_val)
    
    # Trains the model
    history = model.fit(x_train_trans, 
                              y_train_trans, 
                              epochs=epochs,
                              batch_size=batch_size,
                              validation_data=(x_val_trans, y_val_trans))
 
    return history
 

In [None]:
vanilla_history = transform_train_model(transform_data_onedim,
                                        vanilla_nn,
                                        x_train, y_train, x_val, y_val)

In [None]:
def plot_epoch_accuracy(history_dict):
    """
    Plots the training and validation accuracy of a neural network.
    """
    
    acc = history_dict['accuracy']
    val_acc = history_dict['val_accuracy']
    epochs = range(1, len(acc) + 1)
    plt.plot(epochs, acc, color = 'navy', alpha = 0.8, label='Training Accuracy')
    plt.plot(epochs, val_acc, color = 'green', label='Validation Accuracy')
    plt.title('Training and validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    return plt.show()

In [None]:
def plot_wrong_predictions(transformation_func, model, x_test, y_test):
    """
    Plots the first 25 test images with their predicted and actual classes.
    Titles are green for correct predictions and red for wrong ones.
    """
    
    # Transform data
    x_test_trans = transformation_func(x_test)[:26]
    y_test_trans = to_categorical(y_test)[:26]
    
    # Format predictions and targets
    predictions = model.predict(x_test_trans)
    predicted = np.argmax(predictions, axis = 1)
    target = np.argmax(y_test_trans, axis = 1)
    
    # Set up subplots
    fig, axes = plt.subplots(5,5, figsize = (25,25))
    axes = axes.ravel()
    
    for ax, index in zip(axes, range(25)):
        ax.imshow(x_test[index], cmap=plt.cm.binary, interpolation='nearest')
        prediction_title = translate_class(predicted[index])
        target_title = translate_class(target[index])
        
        # Color title based on if prediction is correct
        if predicted[index] == target[index]:
            color = "green"
        else:
            color = "red"
            
        ax.set_title(f"Predicted {prediction_title}, Actual is {target_title}", color = color)
    
    return plt.show()

In [None]:
plot_epoch_accuracy(vanilla_history.history)

In [None]:
plot_wrong_predictions(x_transform_one_dim, vanilla_nn, x_test, y_test)

Interesting..... 

While this model architecture gave us great results when predicting the MNIST handwritten digits, it performs poorly on the CIFAR10 image classification task...

#### Challenge 3
1. Why do you think the vanilla neural network performs worse on CIFAR than it did on the MNIST image classification?
2. Why might we prefer a convolutional neural network to a vanilla neural network for this task?

### Convolutional Neural Network

Let's jump right in and build a convolutional neural network!

In [None]:
def x_transform_three_dim(x_data):
    """
    Scales image data to the range [0, 1], keeping each image in three dimensions.
    """
    
    scaled = x_data.astype('float32') / 255
    
    return scaled
    

In [None]:
def transform_data_threedim(x_train, y_train, x_val, y_val):
    """
    Scales training and validation image data (keeping three dimensions) and converts targets to categorical.
    """
    
    x = {}
    for x_data, name in zip([x_train, x_val], ["x_train", "x_val"]):
        x_trans = x_transform_three_dim(x_data)
        x[name] = x_trans
    
    y = {}
    
    for y_data, name in zip([y_train, y_val], ["y_train", "y_val"]):
        y[name] = to_categorical(y_data)
    
    return x['x_train'], y['y_train'], x['x_val'], y['y_val']

Before writing code to build a convolutional neural network, let's quickly review what makes it convolutional: the convolutional layer.

#### Convolutional layers
As reviewed in our slides, convolutional layers contain *filters* that we *stride* along our image to find matching patterns, producing a *response map*. 

![title](https://qph.fs.quoracdn.net/main-qimg-6428cf505ac1e9e1cf462e1ec8fe9a68)
- The green boxes represent the pixels in our input image.
- The yellow boxes represent the *filter*.
- The movement of the filter represents the *stride*.
- The red boxes represent our *response map*.
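The sliding operation described above can be sketched in a few lines of NumPy. This is a simplified illustration, not how keras implements it: the function name `slide_filter` and the example arrays are made up, it handles a single channel only, and a real `Conv2D` layer also sums across channels, adds a bias, and learns the filter values during training.

```python
import numpy as np

def slide_filter(image, kernel, stride=1):
    """Slides a 2D filter over a 2D image and returns the response map."""
    kernel_h, kernel_w = kernel.shape
    out_h = (image.shape[0] - kernel_h) // stride + 1
    out_w = (image.shape[1] - kernel_w) // stride + 1
    response_map = np.zeros((out_h, out_w))

    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            window = image[top:top + kernel_h, left:left + kernel_w]
            # Multiply matching positions and add everything up
            response_map[row, col] = np.sum(window * kernel)

    return response_map

# Hypothetical 5 x 5 single-channel image and 3 x 3 filter
demo_image = np.array([[1, 1, 1, 0, 0],
                       [0, 1, 1, 1, 0],
                       [0, 0, 1, 1, 1],
                       [0, 0, 1, 1, 0],
                       [0, 1, 1, 0, 0]])
demo_filter = np.array([[1, 0, 1],
                        [0, 1, 0],
                        [1, 0, 1]])

demo_response = slide_filter(demo_image, demo_filter)
print(demo_response)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

The largest values in the response map mark the windows where the image looks most like the filter.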

We initialize a convolutional neural network the same way we did a vanilla neural network.

In [None]:
convnet = models.Sequential()

We now add our first convolutional layer. 

*Note: I use the argument names here to clarify what parameters we are passing into our neural network including default values.*

In [None]:
convnet.add(layers.Conv2D(filters = 32,
                          kernel_size = (3, 3),
                          strides = (1,1),
                          activation= "relu",
                          input_shape=(32, 32, 3)))

We define our first Convolution layer using:
`.add(layers.Conv2D(...))`
- This adds a convolutional layer to our model object.

`filters = 32`
- Initializes 32 filters total.
- Each filter slides along the entire image.
- Each filter produces a response map.

`kernel_size = (3, 3)`
- Specifies the "kernel" size of the filters.
- (3 x 3) : (height x width)
    
`strides = (1,1)`
- The filter strides by one unit in both the horizontal and vertical dimension.
- (1 x 1) or (height stride x width stride)

`activation= "relu"`
- Uses the relu (Rectified Linear Unit) activation function on the output from striding our filter over the input.
- This simply causes any negative values to be 0, and other values to stay the same.

`input_shape=(32, 32, 3)`
- Specifies the shape of our input data
- 32 pixels high
- 32 pixels wide
- 3 channels deep (Red, Green, Blue)
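Given these arguments, we can work out by hand what the layer produces and how many weights it has to learn. The sketch below assumes no padding (the keras default for `Conv2D` is `padding="valid"`); the helper name `conv_output_size` is made up for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one dimension when no padding is used."""
    return (input_size - kernel_size) // stride + 1

n_filters = 32
height = conv_output_size(32, 3, stride=1)
width = conv_output_size(32, 3, stride=1)

# Each filter holds 3 * 3 weights for each of the 3 input channels, plus one bias
weights_per_filter = 3 * 3 * 3
n_parameters = (weights_per_filter + 1) * n_filters

print((height, width, n_filters))  # (30, 30, 32)
print(n_parameters)                # 896
```

The image shrinks from 32 to 30 pixels per side because a 3 x 3 filter cannot be centered on the outermost ring of pixels.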

#### Max Pooling layers
After a convolutional layer, we can downsample a *response map* using a maxpooling layer. This simply takes the maximum value inside a window of a given size as it slides along our response map, much like a filter.

Maxpooling has a dual benefit. Not only does it effectively reduce the amount of data we must process, but it also improves our focus on finding **good** matches from our filter, making our model less specific about the location where it was found (location invariance).

![title](https://nico-curti.github.io/NumPyNet/NumPyNet/images/maxpool.gif)
- The blue boxes represent our *response map*.
- The purple box represents our kernel size (2 x 2).
- The yellow box is the output of our maxpooling operation.

In [None]:
convnet.add(layers.MaxPooling2D(pool_size=(2, 2),
                               strides = None))

`.add(layers.MaxPooling2D(...))`
- Adds a maxpooling layer.

`pool_size=(2, 2)`
- Defines the "kernel" size of our maxpooling operation.
- Will return a single value from this entire window

`strides = None`
- `None` sets the stride equal to `pool_size`, so the windows never overlap: each step moves on to inputs that were not in the previous window.
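Here is a minimal NumPy sketch of maxpooling with non-overlapping windows, assuming the stride equals the pool size (which is what keras uses when `strides=None`). The function name `max_pool` and the example response map are made up for illustration.

```python
import numpy as np

def max_pool(response_map, pool_size=2):
    """Max pooling with non-overlapping windows (stride equals pool_size)."""
    out_h = response_map.shape[0] // pool_size
    out_w = response_map.shape[1] // pool_size
    pooled = np.zeros((out_h, out_w))

    for row in range(out_h):
        for col in range(out_w):
            top, left = row * pool_size, col * pool_size
            window = response_map[top:top + pool_size, left:left + pool_size]
            # Keep only the strongest response inside the window
            pooled[row, col] = window.max()

    return pooled

# Hypothetical 4 x 4 response map
demo_map = np.array([[1, 3, 2, 1],
                     [4, 6, 5, 0],
                     [7, 2, 9, 8],
                     [1, 0, 3, 4]])

pooled_map = max_pool(demo_map)
print(pooled_map)
# [[6. 5.]
#  [7. 9.]]
```

Sixteen values go in and four come out, and each surviving value is the best match found in its corner of the map.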

We add two more convolutional layers, separated by a maxpooling layer.

In [None]:
convnet.add(layers.Conv2D(64, (3, 3), activation= "relu"))
convnet.add(layers.MaxPooling2D((2, 2)))
convnet.add(layers.Conv2D(64, (3, 3), activation= "relu"))
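It helps to trace how the shape of our data changes as it passes through the layers added so far. The sketch below does the arithmetic by hand, assuming 3 x 3 filters with a stride of 1 and no padding, and 2 x 2 maxpooling with non-overlapping windows; the list `layers_so_far` is just a made-up description of the model for this example.

```python
# Each entry: (kind of layer, number of filters or None for pooling)
layers_so_far = [
    ("conv", 32),
    ("pool", None),
    ("conv", 64),
    ("pool", None),
    ("conv", 64),
]

size, channels = 32, 3
shapes = [(size, size, channels)]

for kind, n_filters in layers_so_far:
    if kind == "conv":
        # A 3 x 3 filter with stride 1 and no padding trims 2 pixels per side length
        size = size - 3 + 1
        channels = n_filters
    else:
        # 2 x 2 pooling halves each side, dropping any leftover row or column
        size = size // 2
    shapes.append((size, size, channels))

for shape in shapes:
    print(shape)
# (32, 32, 3)
# (30, 30, 32)
# (15, 15, 32)
# (13, 13, 64)
# (6, 6, 64)
# (4, 4, 64)
```

The images get smaller in height and width but deeper in channels: the network trades spatial detail for a richer set of detected patterns.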

#### Dense layers
Until this point, the convolutional and maxpooling layers have been performing feature extraction.

In order to classify our images we return to our good old dense layer, the same we used in our vanilla neural networks. 

Before we can use our dense layer, we first flatten the data passed through the convolutional and maxpooling layers (which are high dimensional) into a single dimension.

We then use a dense layer that is able to *look* at all of the features extracted by the convolutional operations, and optimize its weights to associate these features with the correct class.

Finally we have our output layer, which contains the same number of neurons as classes we are predicting. We use a softmax activation function, giving us the probability for each class in our prediction.

In [None]:
convnet.add(layers.Flatten())
convnet.add(layers.Dense(64, activation= "relu"))
convnet.add(layers.Dense(10, activation= "softmax"))
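To see what the softmax activation in the output layer does, here is a minimal NumPy sketch. The function below is a textbook version written for illustration (not the keras implementation), and the ten raw scores are made up.

```python
import numpy as np

def softmax(scores):
    """Turns a vector of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw outputs of the final layer, one score per class
demo_scores = np.array([1.0, 0.5, 0.2, 3.0, 0.1, 2.0, 0.3, 0.0, 0.4, 0.6])

probabilities = softmax(demo_scores)
predicted_class = int(np.argmax(probabilities))

print(probabilities.round(3))
print(predicted_class)
```

The class with the largest raw score also gets the largest probability, so `np.argmax` picks the same winner either way; softmax just makes the outputs easier to interpret.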

#### Compiling
We now compile our model as we have done previously.

In [None]:
convnet.compile(optimizer= "rmsprop",
               loss= "categorical_crossentropy", 
               metrics= ["accuracy"])

Below we train our model. A quick warning: this will take a while!

In [None]:
convnet_history = transform_train_model(transform_data_threedim,
                                        convnet,
                                        x_train, y_train, x_val, y_val)

In [None]:
plot_epoch_accuracy(convnet_history.history)

In [None]:
plot_wrong_predictions(x_transform_three_dim, 
                      convnet,
                      x_test,
                      y_test)

## Challenge 4: Build your own neural network

1. Build your own convolutional neural network with a custom architecture.
2. Plot its training and validation accuracy.


The neural networks we built are relatively small compared to those used in applications such as Google Photos or iNaturalist.

Let's compare the architectures of the models we built with a popular convolutional neural network called [VGG16](https://arxiv.org/pdf/1409.1556.pdf). 

### Vanilla Neural Network

In [None]:
vanilla_nn.summary()

### Convolutional Neural Network

In [None]:
convnet.summary()
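The parameter counts reported by `.summary()` can be reproduced by hand. The sketch below assumes the two architectures exactly as defined earlier in this notebook; the helper names `dense_params` and `conv_params` are made up for this example. Dropout, maxpooling, and flatten layers have no trainable weights, so they do not appear in the sums.

```python
def dense_params(n_inputs, n_units):
    """One weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """One weight per kernel position and input channel, plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

# Vanilla network: 3072 inputs -> 512 -> 512 -> 10
vanilla_total = (dense_params(32 * 32 * 3, 512)
                 + dense_params(512, 512)
                 + dense_params(512, 10))

# Convnet: three convolutional layers, then 4 * 4 * 64 flattened values -> 64 -> 10
convnet_total = (conv_params(3, 3, 3, 32)
                 + conv_params(3, 3, 32, 64)
                 + conv_params(3, 3, 64, 64)
                 + dense_params(4 * 4 * 64, 64)
                 + dense_params(64, 10))

print(f"Vanilla: {vanilla_total:,}")  # Vanilla: 1,841,162
print(f"Convnet: {convnet_total:,}")  # Convnet: 122,570
```

The convnet gets by with roughly fifteen times fewer parameters because each small filter is reused at every position in the image, while a dense layer needs a separate weight for every pixel.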

### VGG16

![title](https://alexisbcook.github.io/assets/vgg16_keras.png)

## Hardware for Deep-Learning

Our hardware severely restricts our ability to train larger models. While many aspects of hardware are critical to deep learning, we will focus on one major concept: the processing unit.

All computers have a Central Processing Unit (CPU) that performs the general-purpose operations your computer needs to do, including running all this Python code!

For deep learning, CPUs are not very efficient at running the massive number of operations required to train deep neural networks. That's where the Graphics Processing Unit (GPU) comes into the picture. 

GPUs, originally designed to process many operations at once in order to render complex graphics, have been repurposed for training neural networks! The key innovation behind this change came in 2007, when NVIDIA launched [CUDA](https://developer.nvidia.com/about-cuda), a programming interface for GPUs that allows highly parallelizable computations. Since many neural network operations are parallelizable, GPUs have become the go-to way to train large neural network models.

*Loosely based on: (Chollet 2018) Deep Learning with Python*

The code below allows you to see the available hardware (CPUs, GPUs) that can be used in tensorflow/keras.

In [None]:
from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

### Final words

So are we doomed? Can only the tech giants create deep neural networks?

Well... sorta, but we do have alternatives; we just need to be creative! Here are some options for training deep neural networks that may not be possible to train on your own machine:
1. Use other services

    - Google Colab offers cloud instances of a Jupyter Notebook-like environment with GPUs on board, even in its free tier!
    - Amazon Web Services (AWS) allows you to use their hardware in "EC2 instances" to train your models, offering a [dizzying number of options](https://aws.amazon.com/ec2/instance-types/). Note that this can be expensive and is billed by time!
    
2. Transfer learning
    - Why build your model from the ground up?
    - Transfer learning uses an existing model with all its tuned weights, and retrains just the final portion of the model for a new task.