
Lambda School Data Science

*Unit 4, Sprint 2, Module 4*

---

# Neural Network Frameworks (Prepare)

## Learning Objectives
* <a href="#p1">Part 1</a>: Implement Regularization Strategies
* <a href="#p2">Part 2</a>: Deploy a Keras Model
* <a href="#p3">Part 3</a>: Write a Custom Callback Function (Optional)

Today's class will also focus heavily on callback objects. We will use various callbacks to monitor and manipulate our models based on data that our model produces at the end of an epoch.

> A callback is an object that can perform actions at various stages of training (e.g., at the start or end of an epoch, before or after a single batch, etc.). -- [Keras Documentation](https://keras.io/api/callbacks/)

# Regularization Strategies (Learn)

## Overview

Neural networks are highly parameterized models and can easily overfit the training data. The most common way to combat this problem is with regularization strategies.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/0/02/Regularization.svg/1920px-Regularization.svg.png)

There are four common regularization strategies for neural networks: early stopping, weight decay, weight constraints, and dropout. We will cover each briefly. Here's a quick summary of how to apply them:

1. Always use EarlyStopping. This strategy prevents your weights from being updated well past the point of their peak usefulness.
2. Use EarlyStopping, Weight Decay, and Dropout together, or
3. Use EarlyStopping, Weight Constraint, and Dropout together.

Weight decay and weight constraints serve a similar purpose: both limit overfitting by keeping the weight values small. Their mechanics differ slightly, though. Weight decay adds a penalty on the weight norm to the loss, while a weight constraint caps the norm directly. Because they overlap, you would not necessarily want to apply them together.

## Follow Along

### Early Stopping
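Early stopping halts training once a monitored metric, typically the validation loss, stops improving for a set number of epochs (the patience). The sketch below mimics that patience logic in plain Python rather than calling Keras; the function name `early_stopping_epoch` and the sample losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    wait = 0  # number of epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: all epochs ran


# Hypothetical validation losses: improvement stalls after epoch index 2.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.50]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch)  # 4
```

In Keras, `EarlyStopping(monitor='val_loss', patience=2, min_delta=0.0)` exposes the same knobs. With these sample losses, a patience of 4 would have let training reach the lower loss in the final epoch, which is why patience is worth tuning.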

In [None]:
%load_ext tensorboard

In [None]:
from tensorflow.keras.datasets import fashion_mnist

# load in our dataset 
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

In [None]:
import matplotlib.pyplot as plt

image_id = 500
plt.imshow(X_train[image_id]);

In [None]:
# normalize pixel values between 0 and 1
max_pixel_value = 255
X_train, X_test = X_train / max_pixel_value, X_test / max_pixel_value

### Build a Neural Network that Uses EarlyStopping

In [None]:
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.layers import ReLU
import tensorflow as tf
import os

# 1) Create 2 directories for logging files:
#    one for TensorBoard results and one for early stopping


# 2) Instantiate the callbacks 
# instantiate a tensorboard callback object


# instantiate an early stopping callback object 
# docs: https://keras.io/api/callbacks/early_stopping/


# 3) Build the model 

# instantiate Sequential class

# flatten images 

# hidden layer 1

# act func 1

# hidden layer 2

# act func 2

# hidden layer 3

# act func 3

# output layer 

# compile model 

# fit model 



# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Clear any logs from previous runs
# !rm -rf ./logs/

In [None]:
%tensorboard --logdir logs

---

### Weight Decay (a.k.a. Weight Shrinkage)

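```python
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense

# L2 penalty on the layer's weights, L1 penalty on the layer's outputs
Dense(64, input_dim=64,
      kernel_regularizer=regularizers.l2(0.01),
      activity_regularizer=regularizers.l1(0.01))
```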
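As a rough sketch of what a regularizer with a factor of 0.01 contributes: the L2 penalty is the factor times the sum of squared values, the L1 penalty is the factor times the sum of absolute values, and either one is added to the data loss. The weights and the data loss below are made-up numbers, and the helper names are hypothetical.

```python
def l2_penalty(values, factor=0.01):
    # factor * sum of squares
    return factor * sum(v ** 2 for v in values)


def l1_penalty(values, factor=0.01):
    # factor * sum of absolute values
    return factor * sum(abs(v) for v in values)


weights = [0.5, -1.0, 2.0]  # hypothetical layer weights
data_loss = 0.30            # hypothetical loss before regularization

# The optimizer minimizes the data loss plus the penalty.
total_loss = data_loss + l2_penalty(weights)
print(round(total_loss, 4))
```

Because the penalty grows with the size of the weights, the optimizer is pushed toward smaller weights unless larger ones pay for themselves in reduced data loss.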

![](https://qph.fs.quoracdn.net/main-qimg-9d0dbf8074761b541ba80543ddfc9f73.webp)

In the above image with the blue diamond and circle, remember that:

1. The X and Y axes represent possible values for model weights. In the case of this visualization, we have w1 and w2.
2. The red dot marks the point of tangency (the point of contact) between the error surface (represented by the contour map) and the region of allowed weight norms (represented by the blue shapes).
3. The red dot also tells us the weight values at the point of contact.
4. The geometry of each blue shape is determined by its distance metric.
5. The norm of the weights determines where the point of contact will occur. And the norm of the weights is determined by which $L^p$ space, i.e., which value of **p**, the norm equation comes from.

$${\displaystyle \left\|x\right\|_{p}=\left(|x_{1}|^{p}+|x_{2}|^{p}+\dotsb +|x_{n}|^{p}\right)^{1/p}.}$$  
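To make the norm equation concrete, here is a small sketch that evaluates it for a two-dimensional weight vector. The function name `lp_norm` and the example vector are chosen only for illustration.

```python
def lp_norm(x, p):
    """Compute (|x_1|^p + ... + |x_n|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


x = [3.0, -4.0]
l1 = lp_norm(x, 1)       # 7.0: sum of absolute values (diamond-shaped unit ball)
l2 = lp_norm(x, 2)       # 5.0: Euclidean length (circular unit ball)
l_big = lp_norm(x, 50)   # approaches max(|x_i|) = 4 as p grows
```

The same vector has a different length under each p, which is why the blue shapes differ and why the point of contact with the error surface moves.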

## Regularization Take-Aways

Always remember that:

1. Ridge (l2) and Lasso (l1) are 2 out of possibly infinitely many ways to regularize a model by [**using a distance metric in Lp space.**](https://en.wikipedia.org/wiki/Lp_space) 

2. Both L2 and L1 are used to help prevent overfitting. 

3. **The key difference between L1 and L2** is that L1 tends to drive the weights of a subset of features to exactly zero (i.e., **w = 0**). That subset usually encodes redundant information; statistically, this redundancy is referred to as [**multicollinearity**](https://en.wikipedia.org/wiki/Multicollinearity). In contrast, L2 shrinks the value of all feature weights but rarely all the way down to zero.

**Take Away: L1 drops features while L2 shrinks them and keeps them.**
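The take-away can be illustrated with an idealized shrinkage step for each penalty. This is a simplified sketch, not how Keras applies regularizers (Keras adds the penalty to the loss and follows its gradient); the function names, weights, and `lam` value are assumptions made for the example.

```python
def l1_shrink(w, lam):
    # Soft-thresholding: move w toward zero by lam, clipping at exactly zero.
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def l2_shrink(w, lam):
    # Proportional shrinkage: scale w toward zero without ever reaching it.
    return w / (1.0 + lam)


weights = [2.0, -0.05, 0.3, 0.01]  # hypothetical feature weights
lam = 0.1

l1_result = [l1_shrink(w, lam) for w in weights]
l2_result = [l2_shrink(w, lam) for w in weights]

print(l1_result)  # small weights become exactly 0.0
print(l2_result)  # every weight is smaller, none is zero
```

Weights smaller in magnitude than `lam` are dropped by the L1 step, while the L2 step keeps every weight and only reduces its size.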



![](https://i.stack.imgur.com/4KSgs.png)

The above image shows us the geometry of 4 specific Lp spaces. 

### Build a Neural Net Using  $L^p$ Regularization

In [None]:
from tensorflow.keras import regularizers

# build a 3 hidden layer NN using Lp regularization and using both tensorboard and early stopping callbacks 


# YOUR CODE HERE
raise NotImplementedError()

In [None]:
model.summary()

In [None]:
%tensorboard --logdir logs

----

### Weight Constraint

```python
tf.keras.constraints.MaxNorm(
    max_value=2, axis=0
)
```

![](https://qph.fs.quoracdn.net/main-qimg-9d0dbf8074761b541ba80543ddfc9f73.webp)

A weight constraint sets a maximum value for the norm of the weights. In the image, each blue shape is labeled at the bottom with its norm notation; the subscript (or superscript) indicates the $L^p$ space in which that norm is calculated.

$${\displaystyle \left\|x\right\|_{p}=\left(|x_{1}|^{p}+|x_{2}|^{p}+\dotsb +|x_{n}|^{p}\right)^{1/p}.}$$  
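A minimal sketch of the idea behind a max-norm constraint, assuming the L2 norm (p = 2) and a single hidden unit: after a weight update, the unit's incoming weight vector is rescaled if its norm exceeds the limit. The helper name `max_norm` and the example weights are hypothetical; Keras's `MaxNorm` with `axis=0` applies this idea to each column of a `Dense` kernel.

```python
import math


def max_norm(column, max_value=2.0):
    """Rescale a weight vector so its L2 norm is at most max_value."""
    norm = math.sqrt(sum(w * w for w in column))
    if norm <= max_value:
        return list(column)  # already inside the allowed ball
    scale = max_value / norm
    return [w * scale for w in column]


incoming = [3.0, 4.0]  # hypothetical incoming weights of one unit (norm 5)
constrained = max_norm(incoming, max_value=2.0)
print(constrained)  # same direction, norm reduced to about 2
```

Unlike weight decay, nothing is added to the loss here: weights inside the limit are left untouched, and weights outside it are pulled back onto the boundary.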

In [None]:
from tensorflow.keras.constraints import MaxNorm

# build a 3 hidden layer NN using MaxNorm regularization and using both tensorboard and early stopping callbacks

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
%tensorboard --logdir logs

-----
### Dropout

![](https://miro.medium.com/max/981/1*EinUlWw1n8vbcLyT0zx4gw.png)

If you are interested, feel free to read through the original publication on [**Dropout**](https://jmlr.org/papers/volume15/srivastava14a.old/srivastava14a.pdf).

**Key Takeaways:** 

1. During training, dropout will probabilistically "turn off" some neurons in the layer that dropout is applied to.
2. All neurons are used during inference (i.e., when making predictions on the test set); no dropout is applied.
3. Dropout works best when used with MaxNorm.
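The first two takeaways can be sketched in plain Python. This is an illustrative version of "inverted" dropout, in which kept activations are scaled up by 1 / (1 - rate) during training so that nothing needs rescaling at inference; the function name, the seed, and the activations are assumptions for the example.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Zero each activation with probability `rate` during training only."""
    if not training:
        return list(activations)  # inference: every neuron is used
    keep = 1.0 - rate
    # Kept activations are scaled by 1 / keep so the expected sum is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
acts = [1.0, 2.0, 3.0, 4.0]       # hypothetical layer activations

train_out = dropout(acts, rate=0.5, training=True, rng=rng)
test_out = dropout(acts, rate=0.5, training=False)
```

During training each output is either zero or the scaled activation, so a different random sub-network is trained on every batch.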

Here are some excerpts from the paper on how dropout works.

"Dropout can be interpreted as a way of regularizing a neural network by adding noise to
its hidden units." (page 2)

"Combining several models [model ensembles] is most
helpful when the individual models are different from each other and to make
neural net models different, they should either have different architectures or be trained
on different data...It prevents overfitting and
provides a way of approximately combining exponentially many different neural network
architectures efficiently." (page 2)

"Constraining the **norm of the incoming weight vector** at each hidden unit to be upper
bounded by a fixed constant c. In other words, if w represents the vector of weights incident
on any hidden unit, the neural network was optimized under the constraint **||w||₂ ≤ c**.
This constraint was imposed during optimization by **projecting w onto the surface of a ball of
radius c, whenever w went out of it**. This is also called **max-norm regularization** since it
implies that the maximum value that the norm of any weight can take is c. The constant
c is a tunable hyperparameter, which is determined using a validation set." (page 6)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
%tensorboard --logdir logs

## Challenge

You will apply regularization strategies inside your neural network today as you try to avoid overfitting it. 

---

# Deploy (Learn)

## Overview

You've built a dope image classification model, but it's just sitting in your Jupyter Notebook. What now? Well, you deploy to some downstream application. TensorFlow supports three ways of deploying its models:

- In-Browser with TensorFlow.js
- API with TensorFlow Serving (TFX) or another Framework
- On-Device with TensorFlow Lite

You are already familiar with deploying a model as an API from Unit 3, so we will deploy a model in the browser. Both methods rely on the same core idea:
  - Save your weights and architecture information.
  - Load those parameters into the application.
  - Perform inference. 
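The three steps above can be sketched without any framework. The example below uses a toy one-layer "model" stored as JSON; the dictionary layout, the `predict` helper, and the file name are all hypothetical and are not a TensorFlow format.

```python
import json
import os
import tempfile

# Hypothetical model: one dense layer with 2 inputs and 1 output.
architecture = {"layers": [{"type": "dense", "inputs": 2, "units": 1}]}
weights = {"kernel": [[0.5], [-0.25]], "bias": [0.1]}


def predict(arch, w, x):
    """Forward pass of the single dense layer: x @ kernel + bias."""
    units = arch["layers"][0]["units"]
    return [
        sum(xi * w["kernel"][i][j] for i, xi in enumerate(x)) + w["bias"][j]
        for j in range(units)
    ]


# 1) Save the weights and architecture information.
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    json.dump({"architecture": architecture, "weights": weights}, f)

# 2) Load those parameters into the "application".
with open(path) as f:
    loaded = json.load(f)

# 3) Perform inference with the loaded parameters.
x = [2.0, 4.0]
original_pred = predict(architecture, weights, x)
loaded_pred = predict(loaded["architecture"], loaded["weights"], x)
```

The loaded model reproduces the original prediction because everything needed for inference, structure and parameters, travelled in the saved file.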



## Follow Along

### Checkpoint
Save the latest weights of your model at the end of each epoch.

In [None]:
import tensorflow as tf

cpoint = tf.keras.callbacks.ModelCheckpoint("weights_best.h5",
                                            verbose=1, 
                                            save_weights_only=True)

def create_model():

    model = tf.keras.Sequential([
          Flatten(input_shape=(28,28)),
          Dense(128),
          ReLU(negative_slope=.01),
          Dense(128),
          ReLU(negative_slope=.01),
          Dense(128),
          ReLU(negative_slope=.01),
          Dense(10, activation='softmax')
        ])

    model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam', metrics=['accuracy'])

    return model

model = create_model()

model.fit(X_train, y_train, epochs=2, 
          validation_data=(X_test,y_test),
          verbose=2,
          callbacks=[cpoint])

In [None]:
model.evaluate(X_test, y_test)

In [None]:
# create a fresh compiled model, then load the checkpointed weights into it
m = create_model()
m.load_weights('./weights_best.h5')

m.summary()

In [None]:
m.evaluate(X_test, y_test)

### Save Entire Model
This method includes both the weights and the architecture.

In [None]:
# Create and train a new model instance.
model = create_model()
model.fit(X_train,y_train, epochs=5)

# Save the entire model as a SavedModel.
!mkdir -p saved_model
model.save('saved_model/my_model') 

Load a fresh model:

In [None]:
new_model = tf.keras.models.load_model('saved_model/my_model')

# Check its architecture
new_model.summary()

In [None]:
new_model.evaluate(X_test, y_test)

In [None]:
model.evaluate(X_test, y_test)

## Challenge

On the assignment, you will be expected to export your model weights and architecture.

# Custom Callbacks (Learn)

## Overview

Custom callbacks allow you to access data at any point during the training: on batch end, on epoch end, on epoch start, on batch start. Our use case today is a simple one. Let's stop training once we reach a benchmark accuracy.
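One possible sketch of such a callback is shown below. The class name `StopAtAccuracy`, the 0.90 target, and the choice to monitor the `accuracy` key (the name Keras logs when a model is compiled with `metrics=['accuracy']`) are assumptions for illustration.

```python
import tensorflow as tf


class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Stop training once the monitored metric reaches a target value."""

    def __init__(self, target=0.90, monitor="accuracy"):
        super().__init__()
        self.target = target
        self.monitor = monitor

    def on_epoch_end(self, epoch, logs=None):
        # `logs` holds the metrics Keras computed for the epoch that just ended.
        value = (logs or {}).get(self.monitor)
        if value is not None and value >= self.target:
            print(f"Epoch {epoch + 1}: {self.monitor} reached {value:.3f}, stopping.")
            self.model.stop_training = True
```

It would be passed to training like any built-in callback, for example `model.fit(X_train, y_train, epochs=20, callbacks=[StopAtAccuracy(target=0.90)])`. Monitoring `val_accuracy` instead would tie the stopping rule to held-out data, provided validation data is supplied to `fit`.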

## Follow Along

## Challenge

Experiment with improving our custom callback function. 