# Convolutional Neural Networks

In the previous section, we built and trained a simple model to classify ASL images. The model learned to classify the training dataset with very high accuracy, but it did not perform nearly as well on the validation dataset. This failure to generalize to non-training data is called [overfitting](https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html). In this section, we will introduce a popular kind of model called a convolutional neural network (CNN), which is especially well suited to reading and classifying images.

## Objectives

* Prep data specifically for a CNN
* Create a more sophisticated CNN model, understanding a greater variety of model layers
* Train a CNN model and observe its performance

## Loading and Preparing the Data

The cell below applies the data preprocessing techniques we learned in the previous labs. Review it and execute it before moving on:

In [2]:
import tensorflow.keras as keras
import pandas as pd

# Loading the dataset needed
fmnist = keras.datasets.fashion_mnist

# Separate data into training and validation sets
(x_train, y_train), (x_valid, y_valid) = fmnist.load_data()

# Turn our scalar targets into binary categories
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)

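# Normalize our image data from the 0-255 range to the 0-1 range
x_train = x_train / 255
x_valid = x_valid / 255

# Add the single grayscale channel that Conv2D expects: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(-1, 28, 28, 1)
x_valid = x_valid.reshape(-1, 28, 28, 1)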
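
To make the two main preprocessing steps concrete, here is a minimal plain-Python sketch of what one-hot encoding and pixel normalization do to a single label and a few pixel values. The helper names `one_hot` and `normalize` are made up for this illustration; in the cell above, `keras.utils.to_categorical` and the division by 255 do the same job on whole arrays.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range [0, num_classes)")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def normalize(pixels):
    """Scale 8-bit pixel values (0-255) into the range 0.0-1.0."""
    return [p / 255 for p in pixels]


# Class 3 out of 10 classes becomes a vector with a single 1.0 in position 3.
encoded = one_hot(3, 10)

# Black (0), a dark gray (51), and white (255) after scaling.
scaled = normalize([0, 51, 255])

print(encoded)
print(scaled)
```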

## Creating a Convolutional Model

In [16]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Conv2D,
    MaxPool2D,
    Flatten,
    Dropout,
    BatchNormalization,
)

model = Sequential()
model.add(Conv2D(80, (3, 3), strides=1, padding="same", activation="relu",
                 input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(50, (3, 3), strides=1, padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(25, (3, 3), strides=1, padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Flatten())
model.add(Dense(units=512, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(units=num_classes, activation="softmax"))

### [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)

These are our 2D convolutional layers. Small kernels will go over the input image and detect features that are important for classification. Earlier convolutions in the model will detect simple features such as lines. Later convolutions will detect more complex features. Let's look at our first Conv2D layer:

```Python
model.add(Conv2D(80, (3, 3), strides=1, padding="same", ...))
```
80 refers to the number of filters that will be learned. (3, 3) refers to the size of those filters. `strides` is the step size that the filter takes as it passes over the image. `padding` controls whether the output image created by the filter matches the size of the input image: with `"same"`, the input is padded so that, at a stride of 1, the output keeps the same height and width.
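
The effect of stride and padding on the output size can be sketched with a small helper. This is an illustration in plain Python, assuming the output-size rules TensorFlow documents for `"same"` and `"valid"` padding; the function name `conv_output_size` is made up for this example.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Height (or width) of a convolution output along one dimension."""
    if padding == "same":
        # The input is padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


# A 28-pixel-wide image with a 3-pixel-wide kernel.
same_stride_1 = conv_output_size(28, 3, 1, "same")
valid_stride_1 = conv_output_size(28, 3, 1, "valid")
same_stride_2 = conv_output_size(28, 3, 2, "same")

print(same_stride_1, valid_stride_1, same_stride_2)
```

With `padding="same"` and a stride of 1, as in our first layer, the 28x28 input stays 28x28; only the number of channels changes to match the number of filters.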

### [BatchNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization)

Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training.
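
As a rough sketch of the idea, the function below normalizes one batch of activations to roughly zero mean and unit variance, then applies a scale (`gamma`) and a shift (`beta`), which are the values the layer learns. This is plain Python for illustration, not the Keras implementation: it ignores the moving averages the real layer keeps for inference, and the name `batch_norm` and the sample values are made up.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize a batch of activations, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # epsilon is a small constant that avoids division by zero.
    return [gamma * (v - mean) / math.sqrt(variance + epsilon) + beta
            for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

new_mean = sum(normalized) / len(normalized)
new_variance = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
print(normalized)
print(new_mean, new_variance)
```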

### [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D)


Max pooling takes an image and essentially shrinks it to a lower resolution by keeping only the largest value in each small window. This helps the model be robust to translation (objects moving side to side), and it also makes our model faster because later layers have fewer values to process.
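
To make the shrinking concrete, here is a minimal plain-Python sketch of 2x2 max pooling with a stride of 2, the same window and stride our model uses. It is an illustration only, not the Keras implementation; the function name `max_pool_2x2` and the sample values are made up, and it assumes the image has an even height and width.

```python
def max_pool_2x2(image):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            window = [image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

# The 4x4 image becomes a 2x2 image.
pooled = max_pool_2x2(image)
print(pooled)
```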

### [Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)

Dropout is a technique for preventing overfitting. During training, dropout randomly selects a subset of neurons and turns them off, so that they do not participate in forward or backward propagation in that particular pass. This helps to make sure that the network is robust and redundant, and does not rely on any one area to come up with answers.
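
The sketch below shows the idea in plain Python: each value is zeroed with probability `rate`, and the values that survive are scaled up by `1 / (1 - rate)` so the expected total stays the same, which is how Keras describes its `Dropout` layer during training. The function name `dropout`, the seed, and the sample values are made up for this illustration.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0] * 10
dropped = dropout(activations, 0.2, rng)
print(dropped)
```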

### [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten)

Flatten takes the multidimensional output of one layer and flattens it into a one-dimensional array. The output is called a feature vector and will be connected to the dense layers that perform the final classification.
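
As a rough check of what this means for our model, the sketch below flattens a small nested list and then works out the length of the feature vector. It assumes the architecture defined above: a 28x28 input, three `MaxPool2D` layers with a stride of 2 and `"same"` padding, and 25 filters in the last convolution. The helper names are made up for this illustration.

```python
import math


def flatten(nested):
    """Recursively flatten nested lists into a single flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def pooled_size(size, stride=2):
    """Output size of a pooling layer with 'same' padding along one dimension."""
    return math.ceil(size / stride)


small = flatten([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

size = 28
for _ in range(3):  # three pooling layers: 28 -> 14 -> 7 -> 4
    size = pooled_size(size)
feature_vector_length = size * size * 25

print(small)
print(feature_vector_length)
```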

### [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)

We have seen dense layers before in our earlier models. Our first dense layer (512 units) takes the feature vector as input and learns which features will contribute to a particular classification. The second dense layer (10 units, one per class) is the final classification layer that outputs our prediction.
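
The final layer uses a softmax activation, which turns its raw outputs into probabilities that sum to 1. Here is a minimal plain-Python sketch of that step; the function name `softmax` and the three sample scores are made up for this illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first keeps exp() from overflowing.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))

print(probabilities)
print(predicted_class)
```

The predicted class is simply the position of the largest probability.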

## Summarizing the Model

This may feel like a lot of information, but don't worry. It's not critical to understand everything right now in order to train convolutional models effectively. Most importantly, we know that they can help with extracting useful information from images, and can be used in classification tasks.

Here, we summarize the model we just created. Notice how it has fewer trainable parameters than the model in the previous notebook:

In [17]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 28, 28, 80)        800       
                                                                 
 batch_normalization_3 (Batc  (None, 28, 28, 80)       320       
 hNormalization)                                                 
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 14, 14, 80)       0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 14, 14, 50)        36050     
                                                                 
 dropout_2 (Dropout)         (None, 14, 14, 50)        0         
                                                                 
 batch_normalization_4 (Batc  (None, 14, 14, 50)      
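
The parameter counts visible in the summary above can be checked by hand. The sketch below assumes the usual counting rules: a convolution has one weight per kernel cell, input channel, and filter, plus one bias per filter, and batch normalization keeps four values per channel (scale, shift, moving mean, and moving variance). The helper names are made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights (kernel_h * kernel_w * in_channels * filters) plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def batch_norm_params(channels):
    """Four values per channel: scale, shift, moving mean, moving variance."""
    return 4 * channels


first_conv = conv2d_params(3, 3, 1, 80)    # 1 input channel (grayscale)
first_bn = batch_norm_params(80)
second_conv = conv2d_params(3, 3, 80, 50)  # 80 channels from the first conv

print(first_conv, first_bn, second_conv)
```

These match the `Param #` column for `conv2d_3`, `batch_normalization_3`, and `conv2d_4`.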

## Compiling the Model

We'll compile the model just like before:

In [18]:
model.compile(loss="categorical_crossentropy", metrics=["accuracy"])
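
For intuition about the loss named in `compile`, here is a minimal plain-Python sketch of categorical cross-entropy for a single example. It is an illustration, not the Keras implementation; the function name mirrors the loss string, the `epsilon` clamp is an assumption added to avoid `log(0)`, and the sample prediction is made up.

```python
import math


def categorical_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(max(p, epsilon)) for t, p in zip(y_true, y_pred))


y_true = [0.0, 1.0, 0.0]        # one-hot label: the true class is index 1
confident = [0.1, 0.7, 0.2]     # most probability on the true class
unsure = [0.4, 0.2, 0.4]        # little probability on the true class

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident, loss_unsure)
```

The loss is small when the model puts most of its probability on the correct class and grows as that probability shrinks.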

## Training the Model

Despite the very different model architecture, the training looks exactly the same. Run the cell below to train for 10 epochs and let's see if the accuracy improves:

In [20]:
model.fit(x_train, y_train, epochs=10, verbose=1, validation_data=(x_valid, y_valid))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa250202ad0>

## Discussion of Results

It looks like this model is significantly improved! The training accuracy is very high, and the validation accuracy has improved as well. This is a great result, as all we had to do was swap in a new model.

You may have noticed the validation accuracy jumping around. This is an indication that our model is still not generalizing perfectly. Fortunately, there's more that we can do. Let's talk about it in the next lecture.
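
One way to put numbers on this is to compare the training and validation accuracy recorded for each epoch. The sketch below works on a dictionary shaped like the `history` attribute of the object that `model.fit` returns, with `"accuracy"` and `"val_accuracy"` lists. The function name `summarize_history` is made up, and the numbers are invented purely for illustration; they are NOT results from this notebook.

```python
def summarize_history(history):
    """Report the best validation epoch and the final train/validation gap."""
    train_acc = history["accuracy"]
    val_acc = history["val_accuracy"]
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best_index + 1,            # epochs are counted from 1
        "best_val_accuracy": val_acc[best_index],
        "final_gap": train_acc[-1] - val_acc[-1],
    }


# Made-up values for illustration only, not output from the training run above.
example_history = {
    "accuracy": [0.80, 0.90, 0.95, 0.97],
    "val_accuracy": [0.78, 0.88, 0.86, 0.89],
}

summary = summarize_history(example_history)
print(summary)
```

To use this with a real run, you would need to keep the return value of `model.fit` and pass its `history` attribute to the function. A large or growing gap between training and validation accuracy is a sign of overfitting.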

## Summary

In this section, we used several new kinds of layers to implement a CNN, which performed better than the simpler model used in the last section. Hopefully the overall process of creating and training a model with prepared data is starting to become even more familiar.

## Clear the Memory
Before moving on, please execute the following cell to clear up the GPU memory. This is required to move on to the next notebook.

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, BatchNormalization

def build_and_train_cnn(x_train, y_train, x_valid, y_valid,
                         conv_layers=[(75, (3,3), 1, "same")],
                         pool_size=(2,2), pool_stride=2, pool_padding="same",
                         dense_units=512, dropout_rates=(0.2, 0.3), learning_rate=0.01,
                         epochs=10, batch_size=32, verbose=0):
    model = Sequential()
    
    # Add convolutional layers
    for i, (filters, kernel_size, stride, padding) in enumerate(conv_layers):
        if i == 0:
            model.add(Conv2D(filters, kernel_size, strides=stride, padding=padding, activation="relu",
                             input_shape=(28, 28, 1)))
        else:
            model.add(Conv2D(filters, kernel_size, strides=stride, padding=padding, activation="relu"))
        model.add(BatchNormalization())
        if i < len(dropout_rates):
            model.add(Dropout(dropout_rates[i]))
        model.add(MaxPool2D(pool_size, strides=pool_stride, padding=pool_padding))
    
    model.add(Flatten())
    model.add(Dense(units=dense_units, activation="relu"))
    model.add(Dropout(dropout_rates[-1]))
    model.add(Dense(units=10, activation="softmax"))
    
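    # Compile model
    # NOTE: learning_rate is only reported below; it is not passed to an optimizer here,
    # so compile() uses Keras's default optimizer with its default learning rate.
    model.compile(loss="categorical_crossentropy", metrics=["accuracy"])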
    
    # Train model
    history = model.fit(x_train, y_train, validation_data=(x_valid, y_valid), epochs=epochs, batch_size=batch_size, verbose=verbose)
    
    # Get highest validation accuracy
    val_accuracy = max(history.history['val_accuracy'])
    
    # Report hyperparameters and accuracy
    print(f"Conv Layers: {conv_layers}")
    print(f"Pooling: size={pool_size}, stride={pool_stride}, padding={pool_padding}")
    print(f"Dense Units: {dense_units}")
    print(f"Dropout Rates: {dropout_rates}")
    print(f"Learning Rate: {learning_rate}")
    print(f"Max Validation Accuracy: {val_accuracy:.4f}\n")
    
    return model, val_accuracy


############################################################################################################################################################
# TEST CODE - Allows me to iterate through a bunch of hyperparameters and run the training all at once, instead of manually changing them and taking 7 years

# List to store results
results = []

# Example loop through different hyperparameters
for epochs in [10, 25, 50]:
    for features in [65, 70, 75, 80, 85]:
        for kernel_size in [(2,2), (3,3), (4,4)]:
            model, val_acc = build_and_train_cnn(x_train, y_train, x_valid, y_valid, epochs=epochs, verbose=0, conv_layers=[(features, kernel_size, 1, "same")])
        
            # Store model summary as a string
            model_summary = []
            model.summary(print_fn=lambda x: model_summary.append(x))
            model_summary = "\n".join(model_summary)
        
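            # Append to results, recording the hyperparameters that produced this run
            results.append({
                "epochs": epochs,
                "filters": features,
                "kernel_size": kernel_size,
                "val_accuracy": val_acc,
                "model_summary": model_summary
            })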

# Convert to DataFrame
df_results = pd.DataFrame(results)

# Display results
print(df_results)

#build_and_train_cnn(x_train,y_train,x_valid,y_valid,[(75, (3,3), 1, "same")],(2,2),2,"same",512,(0.2,0.3),0.01,10,32)

Conv Layers: [(65, (2, 2), 1, 'same')]
Pooling: size=(2, 2), stride=2, padding=same
Dense Units: 512
Dropout Rates: (0.2, 0.3)
Learning Rate: 0.01
Max Validation Accuracy: 0.8816

Conv Layers: [(65, (3, 3), 1, 'same')]
Pooling: size=(2, 2), stride=2, padding=same
Dense Units: 512
Dropout Rates: (0.2, 0.3)
Learning Rate: 0.01
Max Validation Accuracy: 0.8789

Conv Layers: [(65, (4, 4), 1, 'same')]
Pooling: size=(2, 2), stride=2, padding=same
Dense Units: 512
Dropout Rates: (0.2, 0.3)
Learning Rate: 0.01
Max Validation Accuracy: 0.8738

Conv Layers: [(70, (2, 2), 1, 'same')]
Pooling: size=(2, 2), stride=2, padding=same
Dense Units: 512
Dropout Rates: (0.2, 0.3)
Learning Rate: 0.01
Max Validation Accuracy: 0.8805

