# Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction [[Video]](https://www.youtube.com/watch?v=zYnI4iWRmpc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_1_neural_net.ipynb)
* Part 3.2: Introduction to Tensorflow and Keras [[Video]](https://www.youtube.com/watch?v=PsE73jk55cE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_2_keras.ipynb)
* Part 3.3: Saving and Loading a Keras Neural Network [[Video]](https://www.youtube.com/watch?v=-9QfbGM1qGw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_3_save_load.ipynb)
* **Part 3.4: Early Stopping in Keras to Prevent Overfitting** [[Video]](https://www.youtube.com/watch?v=m1LNunuI2fk&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_4_early_stop.ipynb)
* Part 3.5: Extracting Weights and Manual Calculation [[Video]](https://www.youtube.com/watch?v=7PWgx16kH8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_5_weights.ipynb)

# Part 3.4: Early Stopping in Keras to Prevent Overfitting

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize, as demonstrated in Figure 3.OVER. 

**Figure 3.OVER: Training vs Validation Error for Overfitting**
![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**
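
As a rough illustration of how these three sets relate, the sketch below shuffles row indices and slices them into training, validation, and holdout portions. The function name `three_way_split` and the 20%/10% fractions are assumptions made for this example, not values prescribed by the course. Typically the training set fits the weights, the validation set decides when to stop training, and the holdout set is reserved for a final check.

```python
import random

def three_way_split(n_samples, val_frac=0.2, holdout_frac=0.1, seed=42):
    """Shuffle row indices and slice them into three disjoint sets (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_val = int(n_samples * val_frac)
    n_holdout = int(n_samples * holdout_frac)
    val = indices[:n_val]
    holdout = indices[n_val:n_val + n_holdout]
    train = indices[n_val + n_holdout:]   # everything left over is used for training
    return train, val, holdout

train_idx, val_idx, holdout_idx = three_way_split(150)
print(len(train_idx), len(val_idx), len(holdout_idx))  # 105 30 15
```

The index lists can then be used to select rows from the feature and target arrays.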

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.

The first method uses a training set and a validation set.  The training data are used to train the neural network until the error on the validation set no longer improves, which attempts to stop at a near-optimal training point.  This method gives accurate "out of sample" predictions only for the validation set, which is usually around 20% of the data.  Predictions for the training data will be overly optimistic, because these are the data the neural network was trained on.  Figure 3.VAL demonstrates how a dataset is divided.

**Figure 3.VAL: Training with a Validation Set**
![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")


In [3]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping



### Early Stopping with Classification

In [8]:
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?'])


In [12]:

df 

|     | sepal_l | sepal_w | petal_l | petal_w | species |
| --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


In [6]:
# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y = dummies.values
y

array([[1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0,
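
The output above is cut off, but the pattern is one row per flower with a single 1 marking its species. The following pure-Python sketch mimics that encoding on four hypothetical labels. It assumes, as `pd.get_dummies` does for string columns, that the columns follow the sorted class names.

```python
# Hypothetical labels, chosen only to illustrate the encoding
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

# Column order: the distinct class names, sorted
classes = sorted(set(labels))

# One row per label, with a 1 in the column of its class and 0 elsewhere
one_hot = [[1 if label == c else 0 for c in classes] for label in labels]

print(classes)
for row in one_hot:
    print(row)
```

Each row sums to 1, which is what the softmax output layer and the categorical cross-entropy loss used later expect.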

In [13]:
# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)
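
The split sizes reported in the training logs of this section can be reproduced with a little arithmetic. This sketch assumes that `train_test_split` rounds the validation share up to a whole row and gives the remainder to training, which is consistent with the logged counts. The helper `split_sizes` is a hypothetical name used only here.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_validation) for a fractional test_size (hypothetical helper)."""
    n_test = math.ceil(n_samples * test_size)  # validation share, rounded up
    return n_samples - n_test, n_test

print(split_sizes(150, 0.25))  # (112, 38): the iris rows
print(split_sizes(398, 0.25))  # (298, 100): the auto-mpg rows
```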

In [15]:
# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')
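
To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. Each dense layer holds one weight per input-output pair plus one bias per output. The helper `dense_params` is a made-up name for this illustration. The layer sizes assume the four iris features as inputs and three species as outputs.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer (hypothetical helper)."""
    return n_inputs * n_outputs + n_outputs

# 4 input features -> 50 -> 25 -> 3 output classes
layer_sizes = [4, 50, 25, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [250, 1275, 78]
print(sum(counts))  # 1603
```

With roughly 1,600 parameters and only 112 training rows, the network has ample capacity to memorize the training data, which is why early stopping is useful here.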


In [16]:
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, 
        verbose=1, mode='auto', restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),
        callbacks=[monitor],verbose=2,epochs=1000)

Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 0s - loss: 1.8243 - val_loss: 1.4654
Epoch 2/1000
112/112 - 0s - loss: 1.4839 - val_loss: 1.2208
Epoch 3/1000
112/112 - 0s - loss: 1.1971 - val_loss: 1.0484
Epoch 4/1000
112/112 - 0s - loss: 1.0253 - val_loss: 0.9609
Epoch 5/1000
112/112 - 0s - loss: 0.9332 - val_loss: 0.9304
Epoch 6/1000
112/112 - 0s - loss: 0.9046 - val_loss: 0.9021
Epoch 7/1000
112/112 - 0s - loss: 0.8784 - val_loss: 0.8558
Epoch 8/1000
112/112 - 0s - loss: 0.8412 - val_loss: 0.8025
Epoch 9/1000
112/112 - 0s - loss: 0.8014 - val_loss: 0.7631
Epoch 10/1000
112/112 - 0s - loss: 0.7776 - val_loss: 0.7282
Epoch 11/1000
112/112 - 0s - loss: 0.7485 - val_loss: 0.6973
Epoch 12/1000
112/112 - 0s - loss: 0.7221 - val_loss: 0.6722
Epoch 13/1000
112/112 - 0s - loss: 0.7006 - val_loss: 0.6494
Epoch 14/1000
112/112 - 0s - loss: 0.6787 - val_loss: 0.6283
Epoch 15/1000
112/112 - 0s - loss: 0.6585 - val_loss: 0.6094
Epoch 16/1000
112/112 - 0s - loss: 0.6411 - val_l

Epoch 135/1000
112/112 - 0s - loss: 0.0999 - val_loss: 0.0763
Epoch 136/1000
112/112 - 0s - loss: 0.0982 - val_loss: 0.0749
Epoch 137/1000
112/112 - 0s - loss: 0.0974 - val_loss: 0.0723
Epoch 138/1000
112/112 - 0s - loss: 0.0965 - val_loss: 0.0715
Epoch 139/1000
112/112 - 0s - loss: 0.0962 - val_loss: 0.0721
Epoch 140/1000
112/112 - 0s - loss: 0.0973 - val_loss: 0.0787
Epoch 141/1000
112/112 - 0s - loss: 0.0972 - val_loss: 0.0753
Epoch 142/1000
112/112 - 0s - loss: 0.0950 - val_loss: 0.0685
Epoch 143/1000
112/112 - 0s - loss: 0.0947 - val_loss: 0.0682
Epoch 144/1000
112/112 - 0s - loss: 0.0924 - val_loss: 0.0722
Epoch 145/1000
112/112 - 0s - loss: 0.0927 - val_loss: 0.0743
Epoch 146/1000
112/112 - 0s - loss: 0.0947 - val_loss: 0.0756
Epoch 147/1000
112/112 - 0s - loss: 0.0932 - val_loss: 0.0662
Epoch 148/1000
112/112 - 0s - loss: 0.0918 - val_loss: 0.0655
Epoch 149/1000
112/112 - 0s - loss: 0.0938 - val_loss: 0.0651
Epoch 150/1000
112/112 - 0s - loss: 0.0953 - val_loss: 0.0750
Epoch 15

<tensorflow.python.keras.callbacks.History at 0x241f635e8d0>

Several parameters are passed to the **EarlyStopping** object.

* **monitor** The quantity to watch.  Here it is `val_loss`, the loss on the validation set.
* **min_delta** This value should be kept small.  It is the minimum change in error that counts as an improvement.  Setting it even smaller is unlikely to have much impact.
* **patience** How many epochs should training wait for the validation error to improve before stopping?
* **verbose** How much progress information do you want?
* **mode** This specifies whether the monitored quantity should be minimized or maximized.  Consider accuracy, where higher numbers are desired, versus log-loss/RMSE, where lower numbers are desired.  In general, leave this set to "auto", which infers the direction from the name of the monitored quantity.
* **restore_best_weights** This should always be set to True.  It restores the weights to the values they had at the epoch with the best monitored value (here, the lowest validation loss).  Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.
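
To make the interaction between **min_delta** and **patience** concrete, here is a simplified pure-Python sketch of the stopping rule. It is not the Keras implementation, only an approximation of the logic for a quantity that should be minimized. The function name and the validation losses are invented for illustration.

```python
def simulate_early_stopping(val_losses, min_delta=1e-3, patience=5):
    """Return (stopped_epoch, best_epoch), both 1-based, for a list of validation losses."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement must exceed min_delta
            best = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1                 # one more epoch without improvement
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # ran out of epochs without stopping early

# Hypothetical validation losses; epoch 6 improves by less than min_delta
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.5495, 0.57, 0.58, 0.56, 0.60, 0.40]
stopped_epoch, best_epoch = simulate_early_stopping(losses)
print(stopped_epoch, best_epoch)  # 9 4
```

Training stops at epoch 9, five epochs after the best result at epoch 4, and never reaches the lower loss at epoch 11. Restoring the best weights would return the model to its state at epoch 4 rather than epoch 9.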

As the output above shows, not all of the requested epochs were used.  Training stopped once the validation loss no longer improved.

In [17]:
from sklearn.metrics import accuracy_score

pred = model.predict(x_test)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y_test,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")

Accuracy: 1.0
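
The accuracy calculation relies on converting each row of softmax probabilities into the index of its largest value. The pure-Python sketch below walks through that step on four hypothetical predictions; `np.argmax(..., axis=1)` performs the same row-wise operation on NumPy arrays.

```python
# Hypothetical softmax outputs and one-hot targets for four samples
pred = [[0.90, 0.08, 0.02],
        [0.10, 0.70, 0.20],
        [0.20, 0.30, 0.50],
        [0.40, 0.35, 0.25]]
y_true = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1],
          [0, 1, 0]]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

predicted = [argmax(row) for row in pred]    # [0, 1, 2, 0]
expected = [argmax(row) for row in y_true]   # [0, 1, 2, 1]

# Accuracy is the fraction of rows where the two class indices agree
accuracy = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
print(f"Accuracy: {accuracy}")  # Accuracy: 0.75
```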


### Early Stopping with Regression

The following code demonstrates how we can apply early stopping to a regression problem.  The technique is similar to the early stopping for classification code that we just saw.

In [18]:
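from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split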

In [20]:
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv", 
    na_values=['NA', '?'])

cars = df['name']
cars

0      chevrolet chevelle malibu
1              buick skylark 320
2             plymouth satellite
3                  amc rebel sst
4                    ford torino
                 ...            
393              ford mustang gl
394                    vw pickup
395                dodge rampage
396                  ford ranger
397                   chevy s-10
Name: name, Length: 398, dtype: object

In [21]:
# Handle missing value
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)
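
The missing-value step above replaces each missing horsepower entry with the median of the known values. A minimal pure-Python sketch of the same idea follows; the horsepower figures are hypothetical and `None` stands in for a missing entry.

```python
from statistics import median

# Hypothetical horsepower column with one missing value
horsepower = [130.0, 165.0, None, 150.0, 140.0]

# The median is computed from the known values only
known = [v for v in horsepower if v is not None]
fill_value = median(known)  # median of 130, 140, 150, 165 is 145.0

# Replace each missing entry with that median
filled = [fill_value if v is None else v for v in horsepower]
print(filled)  # [130.0, 165.0, 145.0, 150.0, 140.0]
```

The median is less sensitive to extreme values than the mean, which makes it a common default for this kind of fill.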

In [22]:
# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')


In [23]:


monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
        patience=5, verbose=1, mode='auto',
        restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),
        callbacks=[monitor], verbose=2,epochs=1000)

Train on 298 samples, validate on 100 samples
Epoch 1/1000
298/298 - 0s - loss: 396378.5791 - val_loss: 265577.2081
Epoch 2/1000
298/298 - 0s - loss: 199454.0660 - val_loss: 118406.4763
Epoch 3/1000
298/298 - 0s - loss: 82684.2259 - val_loss: 40263.6923
Epoch 4/1000
298/298 - 0s - loss: 24337.3315 - val_loss: 8195.5712
Epoch 5/1000
298/298 - 0s - loss: 3980.0958 - val_loss: 546.4959
Epoch 6/1000
298/298 - 0s - loss: 336.5587 - val_loss: 532.1319
Epoch 7/1000
298/298 - 0s - loss: 686.0022 - val_loss: 781.8317
Epoch 8/1000
298/298 - 0s - loss: 662.1676 - val_loss: 502.0138
Epoch 9/1000
298/298 - 0s - loss: 387.4226 - val_loss: 295.0601
Epoch 10/1000
298/298 - 0s - loss: 268.4510 - val_loss: 255.3680
Epoch 11/1000
298/298 - 0s - loss: 257.5762 - val_loss: 258.0524
Epoch 12/1000
298/298 - 0s - loss: 259.9978 - val_loss: 256.3413
Epoch 13/1000
298/298 - 0s - loss: 256.6455 - val_loss: 253.8129
Epoch 14/1000
298/298 - 0s - loss: 254.3593 - val_loss: 253.8552
Epoch 15/1000
298/298 - 0s - loss

Epoch 126/1000
298/298 - 0s - loss: 162.4433 - val_loss: 148.9809
Epoch 127/1000
298/298 - 0s - loss: 160.9018 - val_loss: 149.7706
Epoch 128/1000
298/298 - 0s - loss: 159.5666 - val_loss: 148.2537
Epoch 129/1000
298/298 - 0s - loss: 159.2335 - val_loss: 146.5976
Epoch 130/1000
298/298 - 0s - loss: 158.2541 - val_loss: 146.4422
Epoch 131/1000
298/298 - 0s - loss: 157.3154 - val_loss: 145.4963
Epoch 132/1000
298/298 - 0s - loss: 156.6494 - val_loss: 144.6305
Epoch 133/1000
298/298 - 0s - loss: 155.5175 - val_loss: 143.6771
Epoch 134/1000
298/298 - 0s - loss: 155.9201 - val_loss: 144.6787
Epoch 135/1000
298/298 - 0s - loss: 154.2934 - val_loss: 141.4419
Epoch 136/1000
298/298 - 0s - loss: 153.5041 - val_loss: 141.5412
Epoch 137/1000
298/298 - 0s - loss: 153.0867 - val_loss: 141.2794
Epoch 138/1000
298/298 - 0s - loss: 151.7541 - val_loss: 139.7820
Epoch 139/1000
298/298 - 0s - loss: 152.4468 - val_loss: 138.5505
Epoch 140/1000
298/298 - 0s - loss: 149.9088 - val_loss: 139.5129
Epoch 141/

Epoch 252/1000
298/298 - 0s - loss: 91.3982 - val_loss: 78.7430
Epoch 253/1000
298/298 - 0s - loss: 89.3408 - val_loss: 77.8901
Epoch 254/1000
298/298 - 0s - loss: 88.8630 - val_loss: 79.8220
Epoch 255/1000
298/298 - 0s - loss: 88.2425 - val_loss: 77.3772
Epoch 256/1000
298/298 - 0s - loss: 88.7628 - val_loss: 77.0453
Epoch 257/1000
298/298 - 0s - loss: 88.6361 - val_loss: 77.6612
Epoch 258/1000
298/298 - 0s - loss: 87.1525 - val_loss: 75.9464
Epoch 259/1000
298/298 - 0s - loss: 88.0737 - val_loss: 78.4852
Epoch 260/1000
298/298 - 0s - loss: 87.4752 - val_loss: 75.1005
Epoch 261/1000
298/298 - 0s - loss: 86.8550 - val_loss: 75.5550
Epoch 262/1000
298/298 - 0s - loss: 85.2738 - val_loss: 75.2988
Epoch 263/1000
298/298 - 0s - loss: 84.8675 - val_loss: 74.5789
Epoch 264/1000
298/298 - 0s - loss: 85.8826 - val_loss: 75.2925
Epoch 265/1000
298/298 - 0s - loss: 84.4377 - val_loss: 74.3969
Epoch 266/1000
298/298 - 0s - loss: 87.5883 - val_loss: 74.8084
Epoch 267/1000
298/298 - 0s - loss: 83.9

Epoch 381/1000
298/298 - 0s - loss: 47.3104 - val_loss: 39.7792
Epoch 382/1000
298/298 - 0s - loss: 45.9214 - val_loss: 39.7942
Epoch 383/1000
298/298 - 0s - loss: 47.1434 - val_loss: 40.9561
Epoch 384/1000
298/298 - 0s - loss: 45.3758 - val_loss: 39.0569
Epoch 385/1000
298/298 - 0s - loss: 45.7876 - val_loss: 38.8663
Epoch 386/1000
298/298 - 0s - loss: 44.7180 - val_loss: 39.8566
Epoch 387/1000
298/298 - 0s - loss: 44.5997 - val_loss: 38.9879
Epoch 388/1000
298/298 - 0s - loss: 45.1622 - val_loss: 38.3716
Epoch 389/1000
298/298 - 0s - loss: 44.8800 - val_loss: 38.0020
Epoch 390/1000
298/298 - 0s - loss: 44.2437 - val_loss: 38.1766
Epoch 391/1000
298/298 - 0s - loss: 43.1290 - val_loss: 37.5455
Epoch 392/1000
298/298 - 0s - loss: 44.6062 - val_loss: 43.3905
Epoch 393/1000
298/298 - 0s - loss: 44.2250 - val_loss: 38.5567
Epoch 394/1000
298/298 - 0s - loss: 43.6014 - val_loss: 36.7647
Epoch 395/1000
298/298 - 0s - loss: 44.2322 - val_loss: 42.4080
Epoch 396/1000
298/298 - 0s - loss: 46.4

<tensorflow.python.keras.callbacks.History at 0x241f7dedef0>

Finally, we evaluate the error.

In [25]:
# Measure RMSE error.  RMSE is common for regression.
pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(y_test, pred))
print(f"Final score (RMSE): {score}")

Final score (RMSE): 4.909552420069041
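
RMSE is the square root of the average squared difference between predicted and actual values, so it is expressed in the same units as the target (here, miles per gallon). The sketch below computes it by hand on four hypothetical values to show what the score measures.

```python
import math

# Hypothetical actual and predicted mpg values
y_true = [18.0, 25.0, 30.0, 22.0]
y_pred = [20.0, 24.0, 27.0, 22.0]

# Mean of the squared errors: (4 + 1 + 9 + 0) / 4 = 3.5
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# The square root brings the error back to the units of the target
rmse = math.sqrt(mse)
print(f"MSE: {mse}, RMSE: {rmse:.3f}")  # MSE: 3.5, RMSE: 1.871
```

Read this way, the score of about 4.9 reported above means the predictions are typically off by roughly 5 mpg on the validation data.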
