# Keras Functional Model & Reducing Overfitting: Programming Practice

COSC 410: Applied Machine Learning\
Colgate University\
*Prof. Apthorpe*

## Overview

This notebook will give you practice with the following topics:
  1. Creating and training FNNs using the Keras Functional Model
  2. Using early stopping, regularization, and dropout to reduce overfitting

We will be using the CIFAR-10 dataset. The description of the dataset is here: https://www.cs.toronto.edu/%7Ekriz/cifar.html

## Part 1. Data Import

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
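import ssl
# Workaround for SSL certificate errors when downloading the dataset.
# Note: this disables HTTPS certificate verification for default contexts in this session.
ssl._create_default_https_context = ssl._create_unverified_context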

import tensorflow as tf
from tensorflow import keras as ks

import os
import datetime

np.random.seed(0) # set random seeds so everyone gets same results
tf.random.set_seed(1)

In [2]:
# Load CIFAR-10 data
cifar10 = ks.datasets.cifar10
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Create a list with the class names
class_names = ["Airplane", "Automobile", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck"]

We should check the shape of the data and balance of the classes:

In [3]:
# Print array shapes and number of labels
print(train_images.shape)
print(test_images.shape)
print(len(train_labels))
print(len(test_labels))

# Print the class distribution in the training and test data
print(np.unique(train_labels, return_counts=True)[1])
print(np.unique(test_labels, return_counts=True)[1])

(50000, 32, 32, 3)
(10000, 32, 32, 3)
50000
10000
[5000 5000 5000 5000 5000 5000 5000 5000 5000 5000]
[1000 1000 1000 1000 1000 1000 1000 1000 1000 1000]


## Part 2. Creating a FNN using the Functional Model

The Keras **Functional API** allows you to create neural networks more complicated than possible using the `Sequential` class. The documentation for the Functional API is here: https://keras.io/guides/functional_api/.

Like the `Sequential` class, the Functional API requires that you create layer objects representing the elements of your neural network (e.g. `layers.Dense`). Although we didn't see this when using `Sequential`, these layer objects can be called like functions: you build a network by passing the output of earlier layers as the argument to successive layers.

### Part 2.1. Creating Layers and Specifying Architecture

We will create a "wide and deep" network with one "deep" path through several hidden layers and one "wide" path from the input directly to the output layer.

Networks created with the Functional API start with one (or more) `Input` layers that specify the size of the data: https://keras.io/api/layers/core_layers/input/

In [6]:
# Create an input layer
input_layer = ks.layers.Input(shape=[32,32,3])

We next create the flatten layer and the batch normalization layer, using function calls to indicate how information flows through the network.

In [7]:
# Create a Flatten layer that gets input from the Input layer
flatten_layer = ks.layers.Flatten()(input_layer)

# Create a Batch Normalization layer that gets input from the Flatten layer
norm_layer = ks.layers.BatchNormalization()(flatten_layer)

We then create the hidden layers for the "deep" path.

In [8]:
# Create three hidden layers with ReLU activation functions
h1 = ks.layers.Dense(128, activation='relu')(norm_layer)
h2 = ks.layers.Dense(128, activation='relu')(h1)
h3 = ks.layers.Dense(128, activation='relu')(h2)


Then we create the "wide" part of the network, which takes the flattened normalized input and directly concatenates it with the output of the deep part of the network:

In [9]:
# Create Concatenate layer that combines the output of h3 with flattened normalized input
concat_layer = ks.layers.Concatenate()([norm_layer, h3])

Finally, we add the single output layer that produces the class probabilities.

In [10]:
# Create output layer with softmax activation function
output_layer = ks.layers.Dense(10, activation='softmax')(concat_layer)

### Part 2.2. Creating, compiling, and training the model 

After creating the layers and specifying the architecture using function calls, we create the `Model` object. 

We need to specify the input(s) and output(s) of the model when we create the object. The rest of the architecture is already set from the function calls when the layers were created.

In [11]:
# Create a model using the layers from above
model = ks.Model(inputs=input_layer, outputs=output_layer)

Once the `Model` object is created, we can examine it using `.summary()` 

In [13]:
# Print a summary of the model
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 32, 32, 3)]  0           []                               
                                                                                                  
 flatten (Flatten)              (None, 3072)         0           ['input_1[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 3072)        12288       ['flatten[0][0]']                
 alization)                                                                                       
                                                                                                  
 dense_1 (Dense)                (None, 128)          393344      ['batch_normalization[0][0]']

It can also be helpful to plot the layer graph of more complicated networks using `plot_model`.

In [14]:
# Plot network - This requires pydot and GraphViz
ks.utils.plot_model(model, show_shapes=True, show_dtype=False, show_layer_names=True, rankdir="TB", expand_nested=False, dpi=96)

You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model/model_to_dot to work.


We also need to `.compile()` the model the same way as we would using the Sequential API. All of the possible arguments to `.compile()` that we saw in class last week are still available when using the Functional API (e.g. the `optimizer`, `loss`, and `metrics` keyword arguments).

In [15]:
# Compile the model
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

**Discussion:** Why did we choose *sparse categorical crossentropy* as our loss function? Why didn't we choose *categorical crossentropy*?
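As a starting point for this discussion, the sketch below compares the two losses on a toy example. It uses plain NumPy formulas rather than the Keras loss classes, and the probabilities and labels are made up for illustration. It suggests that the two losses compute the same quantity and differ only in the label format they expect: integer class indices (like the CIFAR-10 labels loaded above) versus one-hot vectors.

```python
import numpy as np

# Made-up predicted class probabilities for 2 examples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

# Integer labels, the format used by the CIFAR-10 label arrays
labels = np.array([0, 2])

# "Sparse" version: pick out the probability of the true class by index
sparse_ce = -np.log(probs[np.arange(len(labels)), labels])

# "Categorical" version: the same labels must first be one-hot encoded
one_hot = np.eye(3)[labels]
categorical_ce = -np.sum(one_hot * np.log(probs), axis=1)

print(sparse_ce)
print(categorical_ce)
```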

Finally, we train the model using the `.fit()` method. Again, the required and optional arguments to `.fit()` are the same as if the model were created with the Sequential API.

In [17]:
# Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=100, validation_split=0.15)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x17fac767b80>

## Part 3. Reducing Overfitting

### Part 3.1. Baseline Model

First, we'll create a simple FNN with 3 hidden layers and no overfitting prevention using the Sequential model. 

**Note:** Instead of using `.add()`, we will pass all the layers directly to the `Sequential` constructor as a list. 

In [None]:
model = ks.models.Sequential([
    ks.layers.Flatten(input_shape=[32, 32, 3]),
    ks.layers.BatchNormalization(),
    ks.layers.Dense(128, activation="relu"),
    ks.layers.Dense(64, activation="relu"),
    ks.layers.Dense(32, activation="relu"),
    ks.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, batch_size=100, epochs=10, validation_split=0.15)

### Part 3.2. Activation Functions & Initializers

Next, we'll try an ELU activation function and He weight initialization instead of the ReLU activation and Glorot initialization we have been using. These hyperparameter options are recommended by the textbook, but like any hyperparameter setting, it's worth comparing performance experimentally. Let's see if it makes a difference for the CIFAR-10 classification task.
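A minimal sketch of such a model follows, assuming the same layer sizes as the baseline. The name `elu_model` and the choice of `"he_normal"` (rather than `"he_uniform"`) are assumptions made for illustration, and the `Input` object replaces the `input_shape` argument only to keep the sketch portable across Keras versions.

```python
from tensorflow import keras as ks

# Same layer sizes as the baseline model, but the hidden layers use
# ELU activations and He-normal initialization (assumed variant)
elu_model = ks.models.Sequential([
    ks.Input(shape=(32, 32, 3)),
    ks.layers.Flatten(),
    ks.layers.BatchNormalization(),
    ks.layers.Dense(128, activation="elu", kernel_initializer="he_normal"),
    ks.layers.Dense(64, activation="elu", kernel_initializer="he_normal"),
    ks.layers.Dense(32, activation="elu", kernel_initializer="he_normal"),
    ks.layers.Dense(10, activation="softmax"),
])
elu_model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Training would use the same call as the baseline, e.g.:
# elu_model.fit(train_images, train_labels, batch_size=100, epochs=10, validation_split=0.15)
```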

Since the performance of the model is about the same, we'll stick with the original activation and initializer settings going forward.

### Part 3.3. Early Stopping

Thus far, we have been manually watching the training to see when the validation error plateaus. We can configure Keras to do this automatically using the `EarlyStopping` class (documentation here: https://keras.io/api/callbacks/early_stopping/). We specify the following:

1. We want the early stopping to be based on the validation loss `"val_loss"`
2. We want training to stop when the validation loss has not improved (`min_delta=0`) for 5 epochs (`patience=5`)
3. We want the model to be "rolled back" to the end of the epoch with the best validation performance (`restore_best_weights=True`)

The `EarlyStopping` object gets passed to the model as a callback in `.fit()`.
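The three settings above can be sketched as follows. To keep the example quick and self-contained, it trains a small hypothetical model (`demo_model`) on random stand-in data rather than CIFAR-10, so the reported numbers are not meaningful; only the callback configuration and how it is passed to `.fit()` are the point.

```python
import numpy as np
from tensorflow import keras as ks

# Early stopping configured as in the list above
early_stop = ks.callbacks.EarlyStopping(
    monitor="val_loss",         # 1. watch the validation loss
    min_delta=0,                # 2. any decrease counts as an improvement...
    patience=5,                 #    ...stop after 5 epochs without one
    restore_best_weights=True,  # 3. roll back to the best epoch's weights
)

# Random stand-in data (NOT CIFAR-10) so the sketch runs in seconds
rng = np.random.default_rng(0)
x = rng.random((256, 32, 32, 3)).astype("float32")
y = rng.integers(0, 10, size=(256, 1))

demo_model = ks.models.Sequential([
    ks.Input(shape=(32, 32, 3)),
    ks.layers.Flatten(),
    ks.layers.Dense(32, activation="relu"),
    ks.layers.Dense(10, activation="softmax"),
])
demo_model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# With early stopping, `epochs` is an upper bound; training may end sooner
history = demo_model.fit(x, y, epochs=50, batch_size=32, validation_split=0.25,
                         callbacks=[early_stop], verbose=0)
print("Epochs actually run:", len(history.history["val_loss"]))
```

With the real data, the same `callbacks=[early_stop]` argument would be added to the baseline `.fit()` call, typically with a larger `epochs` value.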

### Part 3.4. Regularization

Next, we'll try using L1 regularization via the `kernel_regularizer` keyword argument of our `Dense` layers. The documentation for all regularizer options provided by Keras is here: https://keras.io/api/layers/regularizers/
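As a sketch, L1 regularization could be added to the hidden layers of the baseline architecture like this. The name `l1_model` and the strength `1e-4` are assumptions chosen for illustration, not values from this notebook; the strength is a hyperparameter worth tuning.

```python
from tensorflow import keras as ks

# Hypothetical regularization strength; tune it like any other hyperparameter
l1_strength = 1e-4

l1_model = ks.models.Sequential([
    ks.Input(shape=(32, 32, 3)),
    ks.layers.Flatten(),
    ks.layers.BatchNormalization(),
    # Each regularized layer adds an L1 penalty on its weights to the training loss
    ks.layers.Dense(128, activation="relu",
                    kernel_regularizer=ks.regularizers.L1(l1_strength)),
    ks.layers.Dense(64, activation="relu",
                    kernel_regularizer=ks.regularizers.L1(l1_strength)),
    ks.layers.Dense(32, activation="relu",
                    kernel_regularizer=ks.regularizers.L1(l1_strength)),
    ks.layers.Dense(10, activation="softmax"),
])
l1_model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Training would use the same call as the baseline, e.g.:
# l1_model.fit(train_images, train_labels, batch_size=100, epochs=10, validation_split=0.15)
```

Swapping in `ks.regularizers.L2` or `ks.regularizers.L1L2` would follow the same pattern.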

### Part 3.5. Dropout

We apply dropout by adding `Dropout` layers that specify the dropout rate, i.e. the fraction of inputs randomly set to zero during training (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout).
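One possible arrangement is sketched below, with a `Dropout` layer after each hidden layer of the baseline architecture. The name `dropout_model` and the rate `0.2` are assumptions for illustration. The last few lines check, on a toy input, that dropout changes values only in training mode.

```python
import numpy as np
from tensorflow import keras as ks

dropout_rate = 0.2  # hypothetical rate; tune it like any other hyperparameter

dropout_model = ks.models.Sequential([
    ks.Input(shape=(32, 32, 3)),
    ks.layers.Flatten(),
    ks.layers.BatchNormalization(),
    ks.layers.Dense(128, activation="relu"),
    ks.layers.Dropout(dropout_rate),
    ks.layers.Dense(64, activation="relu"),
    ks.layers.Dropout(dropout_rate),
    ks.layers.Dense(32, activation="relu"),
    ks.layers.Dropout(dropout_rate),
    ks.layers.Dense(10, activation="softmax"),
])
dropout_model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Toy check: dropout passes inputs through unchanged at inference time.
# In training mode it zeros some inputs and scales the rest by 1 / (1 - rate).
ones = np.ones((1, 8), dtype="float32")
drop = ks.layers.Dropout(0.5)
out_infer = np.array(drop(ones, training=False))
out_train = np.array(drop(ones, training=True))
print(out_infer)
print(out_train)
```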

### Part 3.6. Training Set Augmentation

TensorFlow conveniently provides layers that perform training set augmentation on image data (https://www.tensorflow.org/guide/keras/preprocessing_layers#image_data_augmentation). We can add these directly to our model. By default, these layers are only active during *training* and deactivated during *prediction*, just as we want for training set augmentation. As always, be sure to check the documentation, because each of these layers has other hyperparameters that you can adjust in the constructor.
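A minimal sketch follows, assuming the baseline architecture with two augmentation layers placed right after the input. The name `aug_model`, the particular layers chosen (`RandomFlip`, `RandomRotation`), and their settings are illustrative assumptions. The final lines check on random stand-in images (not CIFAR-10) that an augmentation layer leaves its input unchanged outside of training.

```python
import numpy as np
from tensorflow import keras as ks

aug_model = ks.models.Sequential([
    ks.Input(shape=(32, 32, 3)),
    # Augmentation layers: applied to each batch during training only
    ks.layers.RandomFlip("horizontal"),
    ks.layers.RandomRotation(0.05),  # rotate by up to +/-5% of a full circle
    ks.layers.Flatten(),
    ks.layers.BatchNormalization(),
    ks.layers.Dense(128, activation="relu"),
    ks.layers.Dense(64, activation="relu"),
    ks.layers.Dense(32, activation="relu"),
    ks.layers.Dense(10, activation="softmax"),
])
aug_model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Check on random stand-in images: with training=False the layer is a pass-through
flip = ks.layers.RandomFlip("horizontal")
images = np.random.default_rng(0).random((4, 32, 32, 3)).astype("float32")
unchanged = np.array(flip(images, training=False))
print(np.allclose(unchanged, images))
```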