<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_4_early_stop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 3: Introduction to TensorFlow**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction [[Video]](https://www.youtube.com/watch?v=zYnI4iWRmpc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_1_neural_net.ipynb)
* Part 3.2: Introduction to Tensorflow and Keras [[Video]](https://www.youtube.com/watch?v=PsE73jk55cE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_2_keras.ipynb)
* Part 3.3: Saving and Loading a Keras Neural Network [[Video]](https://www.youtube.com/watch?v=-9QfbGM1qGw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_3_save_load.ipynb)
* **Part 3.4: Early Stopping in Keras to Prevent Overfitting** [[Video]](https://www.youtube.com/watch?v=m1LNunuI2fk&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_4_early_stop.ipynb)
* Part 3.5: Extracting Weights and Manual Calculation [[Video]](https://www.youtube.com/watch?v=7PWgx16kH8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_5_weights.ipynb)

# Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.

In [1]:
try:
    %tensorflow_version 2.x
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

Note: not using Google CoLab


# Part 3.4: Early Stopping in Keras to Prevent Overfitting

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize, as demonstrated in Figure 3.OVER. 

**Figure 3.OVER: Training vs Validation Error for Overfitting**
![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.

The first method uses a training set and a validation set.  The training data are used to train the neural network until the error on the validation set no longer improves.  This attempts to stop at a near-optimal training point.  This method gives accurate "out of sample" predictions only for the validation set, which is usually about 20% of the data.  The predictions for the training data will be overly optimistic, as these were the data that the neural network was trained on.  Figure 3.VAL demonstrates how a dataset is divided.

**Figure 3.VAL: Training with a Validation Set**
![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")
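
The list above also names a holdout set. As a minimal sketch of one way to build all three sets, the code below shuffles the row indices once and cuts them into disjoint blocks. The random stand-in data and the 60/20/20 proportions are assumptions chosen for illustration; they do not come from the course datasets.

```python
import numpy as np

# Hypothetical stand-in data: 150 rows with 4 features each.
rng = np.random.default_rng(42)
x = rng.normal(size=(150, 4))
y = rng.integers(0, 3, size=150)

# Shuffle the row indices once, then cut them into three disjoint blocks.
# The 60/20/20 proportions are an assumption for illustration.
idx = rng.permutation(len(x))
n_train = len(x) * 60 // 100
n_val = len(x) * 20 // 100

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
holdout_idx = idx[n_train + n_val:]

x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
x_holdout, y_holdout = x[holdout_idx], y[holdout_idx]

print(len(x_train), len(x_val), len(x_holdout))
```

In such a scheme the validation block would drive early stopping, while the holdout block is left untouched until a final evaluation.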

### Early Stopping with Classification

In [2]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?'])

# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y = dummies.values

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, 
        verbose=1, mode='auto', restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),
        callbacks=[monitor],verbose=2,epochs=1000)


Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 0s - loss: 1.1940 - val_loss: 1.1126
Epoch 2/1000
112/112 - 0s - loss: 1.0545 - val_loss: 0.9984
Epoch 3/1000
112/112 - 0s - loss: 0.9533 - val_loss: 0.9130
Epoch 4/1000
112/112 - 0s - loss: 0.8823 - val_loss: 0.8365
Epoch 5/1000
112/112 - 0s - loss: 0.8243 - val_loss: 0.7619
Epoch 6/1000
112/112 - 0s - loss: 0.7592 - val_loss: 0.7059
Epoch 7/1000
112/112 - 0s - loss: 0.7142 - val_loss: 0.6644
Epoch 8/1000
112/112 - 0s - loss: 0.6788 - val_loss: 0.6302
Epoch 9/1000
112/112 - 0s - loss: 0.6481 - val_loss: 0.5979
Epoch 10/1000
112/112 - 0s - loss: 0.6198 - val_loss: 0.5698
Epoch 11/1000
112/112 - 0s - loss: 0.5957 - val_loss: 0.5434
Epoch 12/1000
112/112 - 0s - loss: 0.5738 - val_loss: 0.5189
Epoch 13/1000
112/112 - 0s - loss: 0.5539 - val_loss: 0.4964
Epoch 14/1000
112/112 - 0s - loss: 0.5344 - val_loss: 0.4771
Epoch 15/1000
112/112 - 0s - loss: 0.5177 - val_loss: 0.4601
Epoch 16/1000
112/112 - 0s - loss: 0.5022 - val_l

<tensorflow.python.keras.callbacks.History at 0x22a9ad34708>

Several parameters are specified to the **EarlyStopping** object. 

* **monitor** The quantity to watch.  The code above uses "val_loss", the loss on the validation set.
* **min_delta** The minimum change in the monitored value that counts as an improvement.  This value should be kept small.  Setting it even smaller will not likely have a great deal of impact.
* **patience** How many epochs should the training wait for the validation error to improve before stopping?  
* **verbose** How much progress information do you want?
* **mode** This allows you to specify if the monitored value should be minimized or maximized.  Consider accuracy, where higher numbers are desired, vs log-loss/RMSE, where lower numbers are desired.  In general, always set this to "auto", which lets Keras infer the direction from the name of the monitored value.
* **restore_best_weights** This should always be set to `True`.  This restores the weights to the values they had at the epoch with the best monitored value (here, the lowest validation loss).  Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.

As you can see from above, not all of the requested epochs were used.  The neural network training stopped once the validation loss no longer improved.
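
To make the roles of **patience** and **min_delta** concrete, the stopping rule can be sketched in plain Python. This is a simplified illustration, not Keras's actual implementation; the function name `simulate_early_stopping` and the loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience=5, min_delta=1e-3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Epochs are zero-based. stop_epoch is None if the rule never triggers.
    Simplified sketch of a minimizing early-stopping rule, not Keras's code.
    """
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last real improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement of more than min_delta: remember it and reset the counter.
            best = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses, one per epoch.
losses = [1.0, 0.8, 0.7, 0.71, 0.705, 0.6995, 0.72, 0.9]
print(simulate_early_stopping(losses, patience=3, min_delta=1e-3))
```

With these values, the drop from 0.7 to 0.6995 is smaller than `min_delta`, so it does not reset the counter. Training stops at epoch 5, and epoch 2 holds the weights that `restore_best_weights` would bring back.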

In [3]:
from sklearn.metrics import accuracy_score

pred = model.predict(x_test)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y_test,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")

Accuracy: 1.0


### Early Stopping with Regression

The following code demonstrates how we can apply early stopping to a regression problem.  The technique is similar to the early stopping for classification code that we just saw.

In [4]:
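from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics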

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv", 
    na_values=['NA', '?'])

cars = df['name']

# Handle missing value
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
        patience=5, verbose=1, mode='auto',
        restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),
        callbacks=[monitor], verbose=2,epochs=1000)

Train on 298 samples, validate on 100 samples
Epoch 1/1000
298/298 - 0s - loss: 254618.1117 - val_loss: 104859.9187
Epoch 2/1000
298/298 - 0s - loss: 53735.2417 - val_loss: 10033.3467
Epoch 3/1000
298/298 - 0s - loss: 3456.0443 - val_loss: 2832.0205
Epoch 4/1000
298/298 - 0s - loss: 4912.1159 - val_loss: 5504.1926
Epoch 5/1000
298/298 - 0s - loss: 4154.7669 - val_loss: 2042.1780
Epoch 6/1000
298/298 - 0s - loss: 1411.5907 - val_loss: 1259.3724
Epoch 7/1000
298/298 - 0s - loss: 1189.8836 - val_loss: 1435.5145
Epoch 8/1000
298/298 - 0s - loss: 1207.4120 - val_loss: 1259.7002
Epoch 9/1000
298/298 - 0s - loss: 1069.7891 - val_loss: 1189.8975
Epoch 10/1000
298/298 - 0s - loss: 1068.2267 - val_loss: 1188.1633
Epoch 11/1000
298/298 - 0s - loss: 1068.9461 - val_loss: 1175.8650
Epoch 12/1000
298/298 - 0s - loss: 1044.6897 - val_loss: 1185.7492
Epoch 13/1000
298/298 - 0s - loss: 1056.0984 - val_loss: 1178.5605
Epoch 14/1000
298/298 - 0s - loss: 1041.7714 - val_loss: 1157.2365
Epoch 15/1000
298/2

Epoch 126/1000
298/298 - 0s - loss: 217.8098 - val_loss: 213.9518
Epoch 127/1000
298/298 - 0s - loss: 214.3937 - val_loss: 210.5598
Epoch 128/1000
298/298 - 0s - loss: 210.2760 - val_loss: 205.6227
Epoch 129/1000
298/298 - 0s - loss: 206.5413 - val_loss: 202.4728
Epoch 130/1000
298/298 - 0s - loss: 202.3109 - val_loss: 197.9401
Epoch 131/1000
298/298 - 0s - loss: 199.8272 - val_loss: 196.1144
Epoch 132/1000
298/298 - 0s - loss: 197.1229 - val_loss: 190.0905
Epoch 133/1000
298/298 - 0s - loss: 192.5514 - val_loss: 186.7910
Epoch 134/1000
298/298 - 0s - loss: 189.2665 - val_loss: 184.1961
Epoch 135/1000
298/298 - 0s - loss: 185.1848 - val_loss: 179.9203
Epoch 136/1000
298/298 - 0s - loss: 186.1516 - val_loss: 176.2954
Epoch 137/1000
298/298 - 0s - loss: 182.4030 - val_loss: 173.4539
Epoch 138/1000
298/298 - 0s - loss: 177.4716 - val_loss: 169.6453
Epoch 139/1000
298/298 - 0s - loss: 173.9908 - val_loss: 166.0001
Epoch 140/1000
298/298 - 0s - loss: 173.2805 - val_loss: 162.8689
Epoch 141/

Epoch 253/1000
298/298 - 0s - loss: 52.4647 - val_loss: 46.1836
Epoch 254/1000
298/298 - 0s - loss: 49.0224 - val_loss: 40.2575
Epoch 255/1000
298/298 - 0s - loss: 50.8724 - val_loss: 40.5554
Epoch 256/1000
298/298 - 0s - loss: 48.6178 - val_loss: 40.2881
Epoch 257/1000
298/298 - 0s - loss: 48.1621 - val_loss: 40.1415
Epoch 258/1000
298/298 - 0s - loss: 47.9184 - val_loss: 39.6353
Epoch 259/1000
298/298 - 0s - loss: 47.7817 - val_loss: 44.1131
Epoch 260/1000
298/298 - 0s - loss: 48.0547 - val_loss: 38.6934
Epoch 261/1000
298/298 - 0s - loss: 49.1476 - val_loss: 38.5595
Epoch 262/1000
298/298 - 0s - loss: 48.3410 - val_loss: 38.4703
Epoch 263/1000
298/298 - 0s - loss: 47.1575 - val_loss: 43.8495
Epoch 264/1000
298/298 - 0s - loss: 47.5766 - val_loss: 37.7489
Epoch 265/1000
298/298 - 0s - loss: 45.9611 - val_loss: 37.8400
Epoch 266/1000
298/298 - 0s - loss: 45.3411 - val_loss: 37.4187
Epoch 267/1000
298/298 - 0s - loss: 44.8844 - val_loss: 40.0926
Epoch 268/1000
298/298 - 0s - loss: 45.0

<tensorflow.python.keras.callbacks.History at 0x22a9acc8608>

Finally, we evaluate the error.

In [5]:
# Measure RMSE error.  RMSE is common for regression.
pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(y_test,pred))
print(f"Final score (RMSE): {score}")

Final score (RMSE): 5.291219300799398
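
For reference, RMSE is the square root of the mean squared difference between predictions and targets. The sketch below computes it by hand with NumPy; the four target and prediction values are hypothetical stand-ins for `y_test` and the output of `model.predict`. It also shows a shape pitfall: a network with a single output neuron returns predictions of shape `(n, 1)`, so they should be flattened before subtracting a target vector of shape `(n,)`.

```python
import numpy as np

# Hypothetical values standing in for y_test and model.predict(x_test).
y_true = np.array([18.0, 25.0, 31.0, 14.0])        # shape (4,)
pred = np.array([[20.0], [24.0], [28.0], [15.0]])  # shape (4, 1)

# Subtracting directly would broadcast (4, 1) against (4,) into a (4, 4) matrix,
# so flatten the predictions first.
errors = pred.flatten() - y_true
rmse = np.sqrt(np.mean(errors ** 2))
print(f"RMSE: {rmse:.4f}")
```

The result is in the same units as the target, which for the regression above would be miles per gallon.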
