# Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction [[Video]](https://www.youtube.com/watch?v=zYnI4iWRmpc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_1_neural_net.ipynb)
* Part 3.2: Introduction to Tensorflow and Keras [[Video]](https://www.youtube.com/watch?v=PsE73jk55cE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_2_keras.ipynb)
* Part 3.3: Saving and Loading a Keras Neural Network [[Video]](https://www.youtube.com/watch?v=-9QfbGM1qGw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_3_save_load.ipynb)
* **Part 3.4: Early Stopping in Keras to Prevent Overfitting** [[Video]](https://www.youtube.com/watch?v=m1LNunuI2fk&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_4_early_stop.ipynb)
* Part 3.5: Extracting Weights and Manual Calculation [[Video]](https://www.youtube.com/watch?v=7PWgx16kH8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_5_weights.ipynb)

# Part 3.4: Early Stopping in Keras to Prevent Overfitting

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize, as demonstrated in Figure 3.OVER. 

**Figure 3.OVER: Training vs Validation Error for Overfitting**
![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**
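All three sets can come from a single shuffle of the rows. Below is a minimal sketch that assumes NumPy and illustrative 60/20/20 proportions. The helper `three_way_split` is hypothetical and is not part of the course code.

```python
import numpy as np

def three_way_split(x, y, val_frac=0.2, holdout_frac=0.2, seed=42):
    """Hypothetical helper: shuffle once, then carve off holdout and validation rows.

    The 20%/20% fractions are illustrative assumptions, not fixed rules.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))              # shuffled row indices
    n_holdout = int(round(len(x) * holdout_frac))
    n_val = int(round(len(x) * val_frac))
    holdout_idx = idx[:n_holdout]
    val_idx = idx[n_holdout:n_holdout + n_val]
    train_idx = idx[n_holdout + n_val:]        # everything left over trains the model
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[holdout_idx], y[holdout_idx]))

# Toy data with 150 rows and 4 features (the same shape as the iris data used below)
x = np.arange(600, dtype=float).reshape(150, 4)
y = np.arange(150)
train, val, holdout = three_way_split(x, y)
print(len(train[0]), len(val[0]), len(holdout[0]))  # 90 30 30
```

The holdout rows are never seen during training or early stopping. That is why they give an unbiased final estimate.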

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.

The first method uses a training set and a validation set. The training data are used to train the neural network until performance on the validation set no longer improves. This attempts to stop training near the optimal point. With this method, only the predictions for the validation set are accurate "out of sample" predictions, and the validation set is usually around 20% of the data. Predictions for the training data will be overly optimistic, because these are the data the neural network was trained on. Figure 3.VAL demonstrates how a dataset is divided.

**Figure 3.VAL: Training with a Validation Set**
![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")
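You can reproduce the optimism of training-set error without a neural network. In this sketch, a high-degree NumPy polynomial fit stands in for an overtrained network. This is an illustrative substitution, not the course's model. The sine target, noise level, and sample sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_target(x):
    # Underlying sine curve plus Gaussian noise (sigma = 0.3, an arbitrary choice)
    return np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.shape)

x_train = np.linspace(0.0, 1.0, 10)
y_train = noisy_target(x_train)
x_val = rng.uniform(0.0, 1.0, 50)
y_val = noisy_target(x_val)

# A degree-9 polynomial has 10 coefficients, so it can pass through all
# 10 training points: it memorizes the noise instead of the sine curve.
coeffs = np.polyfit(x_train, y_train, deg=9)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
print(f"train MSE: {train_mse:.2e}, validation MSE: {val_mse:.2e}")
```

The training error is essentially zero while the validation error is not. Scoring a model on its own training data hides exactly this gap.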


```python
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
```

### Early Stopping with Classification

```python
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv",
    na_values=['NA', '?'])
```


```python
df
```

| | sepal_l | sepal_w | petal_l | petal_w | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


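```python
# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
# One-hot encode the species. dtype=int keeps 0/1 integers on newer pandas,
# whose get_dummies returns booleans by default.
dummies = pd.get_dummies(df['species'], dtype=int)
species = dummies.columns
y = dummies.values
y
```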

array([[1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0,
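Each row of `y` contains a single 1 that marks the species. `np.argmax` reverses the encoding, and the accuracy check later in this section uses it the same way to turn predictions back into class indices. The sketch below shows the round trip on a hypothetical four-row sample rather than the real iris column.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample of the species column
species_col = pd.Series(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

dummies = pd.get_dummies(species_col, dtype=int)  # one column per class, sorted by name
species = dummies.columns                          # class names in column order
y = dummies.values                                 # 2-D array of 0/1 indicators

class_idx = np.argmax(y, axis=1)   # position of the 1 in each row
recovered = species[class_idx]     # map indices back to class names
print(y)
print(list(recovered))  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa']
```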

```python
# Split into training (75%) and validation (25%) sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)
```
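A 25% split of 150 iris rows does not divide evenly. Below is a minimal sketch of the rounding, assuming scikit-learn's behavior for a float `test_size`: the test count is rounded up and the training count is rounded down. The helper `split_sizes` is hypothetical. Its results match the sample counts reported in the training logs of this section.

```python
import math

def split_sizes(n_samples, test_size):
    # Assumed to mirror scikit-learn's rule for a float test_size:
    # the test set is rounded up, the training set rounded down.
    n_test = math.ceil(test_size * n_samples)
    n_train = math.floor((1.0 - test_size) * n_samples)
    return n_train, n_test

print(split_sizes(150, 0.25))  # iris: (112, 38)
print(split_sizes(398, 0.25))  # auto-mpg: (298, 100)
```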

```python
# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1], activation='softmax')) # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')
```
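Each `Dense` layer computes `activation(x @ W + b)`. The NumPy sketch below runs the same 4-50-25-3 forward pass so you can see the shapes and the parameter count. It uses random weights, which is an assumption: a trained model's weights would come from Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 50, 25, 3]  # inputs, hidden 1, hidden 2, outputs (as in the model above)

# Random weights stand in for trained ones; each Dense layer holds a
# (fan_in, fan_out) kernel plus a fan_out bias vector.
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(x @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    return softmax(h2 @ w3 + b3)

n_params = sum(w.size + b.size for w, b in params)
probs = forward(rng.normal(size=(5, 4)))
print(n_params)     # 1603
print(probs.shape)  # (5, 3)
```

The softmax output gives one probability per species, and each row sums to 1. This is why `categorical_crossentropy` pairs with it.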


```python
# Stop when val_loss fails to improve by more than 1e-3 for 5 consecutive epochs
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5,
        verbose=1, mode='auto', restore_best_weights=True)
# Fit with the callback monitoring the validation set
model.fit(x_train, y_train, validation_data=(x_test, y_test),
        callbacks=[monitor], verbose=2, epochs=1000)
```

```
Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 0s - loss: 1.2186 - val_loss: 1.1123
Epoch 2/1000
112/112 - 0s - loss: 1.0439 - val_loss: 1.0011
Epoch 3/1000
112/112 - 0s - loss: 0.9629 - val_loss: 0.9276
Epoch 4/1000
112/112 - 0s - loss: 0.9071 - val_loss: 0.8569
Epoch 5/1000
112/112 - 0s - loss: 0.8436 - val_loss: 0.8009
Epoch 6/1000
112/112 - 0s - loss: 0.8061 - val_loss: 0.7670
Epoch 7/1000
112/112 - 0s - loss: 0.7783 - val_loss: 0.7322
Epoch 8/1000
112/112 - 0s - loss: 0.7497 - val_loss: 0.6997
Epoch 9/1000
112/112 - 0s - loss: 0.7188 - val_loss: 0.6646
Epoch 10/1000
112/112 - 0s - loss: 0.6881 - val_loss: 0.6303
Epoch 11/1000
112/112 - 0s - loss: 0.6619 - val_loss: 0.6013
Epoch 12/1000
112/112 - 0s - loss: 0.6382 - val_loss: 0.5762
Epoch 13/1000
112/112 - 0s - loss: 0.6121 - val_loss: 0.5531
Epoch 14/1000
112/112 - 0s - loss: 0.5900 - val_loss: 0.5373
Epoch 15/1000
...
Epoch 129/1000
112/112 - 0s - loss: 0.0942 - val_loss: 0.0775
Epoch 130/1000
112/112 - 0s - loss: 0.0926 - val_loss: 0.0788
Epoch 131/1000
112/112 - 0s - loss: 0.0903 - val_loss: 0.0859
Epoch 132/1000
112/112 - 0s - loss: 0.0962 - val_loss: 0.0855
Epoch 133/1000
112/112 - 0s - loss: 0.0930 - val_loss: 0.0756
Epoch 134/1000
112/112 - 0s - loss: 0.0962 - val_loss: 0.0751
Epoch 135/1000
112/112 - 0s - loss: 0.0926 - val_loss: 0.0808
Epoch 136/1000
112/112 - 0s - loss: 0.0942 - val_loss: 0.0878
Epoch 137/1000
112/112 - 0s - loss: 0.0912 - val_loss: 0.0755
Epoch 138/1000
112/112 - 0s - loss: 0.0930 - val_loss: 0.0732
Epoch 139/1000
112/112 - 0s - loss: 0.0898 - val_loss: 0.0792
Epoch 140/1000
112/112 - 0s - loss: 0.0902 - val_loss: 0.0827
Epoch 141/1000
112/112 - 0s - loss: 0.0885 - val_loss: 0.0761
Epoch 142/1000
112/112 - 0s - loss: 0.0867 - val_loss: 0.0719
Epoch 143/1000
112/112 - 0s - loss: 0.0872 - val_loss: 0.0720
Epoch 144/1000
112/112 - 0s - loss: 0.0856 - val_loss: 0.0746
...

<tensorflow.python.keras.callbacks.History at 0x29847329b38>
```

Several parameters are passed to the **EarlyStopping** object:

* **min_delta** The minimum change in the monitored value that counts as an improvement. Keep this value small; setting it even smaller is unlikely to have much impact.
* **patience** The number of epochs to wait for the monitored value to improve before training stops.
* **verbose** How much progress information to display. With `verbose=1`, Keras reports when early stopping triggers and when it restores weights.
* **mode** Whether the monitored value should be minimized (`min`) or maximized (`max`). For accuracy, higher numbers are desired; for log-loss or RMSE, lower numbers are desired. In general, set this to `auto`, which lets Keras infer the direction from the name of the monitored value.
* **restore_best_weights** This should always be set to `True`. It restores the weights from the epoch with the best value of the monitored quantity (here, the lowest validation loss). Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.
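The plain-Python sketch below shows how `min_delta`, `patience`, and `restore_best_weights` interact. It is a simplified reimplementation for `mode='min'` that we assume matches Keras's rule. An epoch counts as an improvement only if it beats the best value so far by more than `min_delta`, and training stops after `patience` epochs without one. The function `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=1e-3, patience=5):
    """Hypothetical, simplified sketch of the EarlyStopping rule for mode='min'.

    Returns (best_epoch, stop_epoch), both 1-based; stop_epoch is None if
    training would run through every epoch in val_losses.
    """
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last counted improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:       # must beat the best by more than min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # restore_best_weights=True would reload the weights from best_epoch
                return best_epoch, epoch
    return best_epoch, None

# Made-up validation-loss curve: improves, then plateaus
losses = [0.90, 0.70, 0.55, 0.50, 0.4995, 0.51, 0.52, 0.50, 0.53, 0.60]
print(early_stopping_epoch(losses))  # (4, 9)
```

In this curve, epoch 5 lowers the loss by only 0.0005, which is less than `min_delta`. It therefore does not reset the patience counter, so training stops at epoch 9 and the weights from epoch 4 are kept.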

Training did not use all 1,000 requested epochs. It stopped once the validation loss no longer improved.

```python
from sklearn.metrics import accuracy_score

pred = model.predict(x_test)
# Convert softmax probabilities and one-hot targets back to class indices
predict_classes = np.argmax(pred, axis=1)
expected_classes = np.argmax(y_test, axis=1)
correct = accuracy_score(expected_classes, predict_classes)
print(f"Accuracy: {correct}")
```

```
Accuracy: 0.9736842105263158
```
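Here `accuracy_score` is simply the fraction of validation rows whose predicted class matches the expected class. The reported value, 0.9736842105263158, equals 37/38, so one of the 38 validation flowers was misclassified. Below is a minimal NumPy sketch of the same calculation. The `accuracy` helper and the class labels are hypothetical.

```python
import numpy as np

def accuracy(expected, predicted):
    # Fraction of positions where the predicted class index equals the expected one
    expected = np.asarray(expected)
    predicted = np.asarray(predicted)
    return float(np.mean(expected == predicted))

# Hypothetical class indices for 38 validation rows with one mistake
expected_classes = np.array([0] * 13 + [1] * 13 + [2] * 12)
predict_classes = expected_classes.copy()
predict_classes[20] = 2  # one misclassified versicolor

print(accuracy(expected_classes, predict_classes))  # 0.9736842105263158
print(37 / 38)                                      # 0.9736842105263158
```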


### Early Stopping with Regression

The following code applies early stopping to a regression problem. The technique is the same as in the classification example above. The main changes are a single linear output neuron and a mean squared error loss.

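```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
```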

```python
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv",
    na_values=['NA', '?'])

cars = df['name']
cars
```

```
0      chevrolet chevelle malibu
1              buick skylark 320
2             plymouth satellite
3                  amc rebel sst
4                    ford torino
                 ...            
393              ford mustang gl
394                    vw pickup
395                dodge rampage
396                  ford ranger
397                   chevy s-10
Name: name, Length: 398, dtype: object
```

```python
# Handle missing values
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression

# Split into training (75%) and validation (25%) sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)
```
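The median fill works because `Series.median()` skips missing values by default. The small sketch below uses a hypothetical horsepower column rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical horsepower column with two missing entries
hp = pd.Series([130.0, 165.0, np.nan, 150.0, np.nan, 140.0])
median = hp.median()        # NaNs are skipped: median of 130, 140, 150, 165 is 145.0
filled = hp.fillna(median)  # every NaN is replaced by that single value
print(median, filled.isna().sum())  # 145.0 0
```

The median is less sensitive to outliers than the mean, which makes it a common default for filling a numeric column.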

```python
# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
```


```python
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3,
        patience=5, verbose=1, mode='auto',
        restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_test, y_test),
        callbacks=[monitor], verbose=2, epochs=1000)
```

```
Train on 298 samples, validate on 100 samples
Epoch 1/1000
298/298 - 0s - loss: 15.0193 - val_loss: 10.5498
Epoch 2/1000
298/298 - 0s - loss: 15.4302 - val_loss: 13.9567
Epoch 3/1000
298/298 - 0s - loss: 17.1015 - val_loss: 13.2150
Epoch 4/1000
298/298 - 0s - loss: 17.2163 - val_loss: 12.3502
Epoch 5/1000
298/298 - 0s - loss: 15.5229 - val_loss: 13.3638
Epoch 6/1000
Restoring model weights from the end of the best epoch.
298/298 - 0s - loss: 18.1975 - val_loss: 11.9359
Epoch 00006: early stopping

<tensorflow.python.keras.callbacks.History at 0x29848d8cfd0>
```
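This regression run stopped almost immediately. Replaying its logged `val_loss` values shows why. Epoch 1 had the lowest validation loss, and the next five epochs (the `patience` setting) never improved on it by more than `min_delta`. Training therefore stopped at epoch 6, and the epoch-1 weights were restored. The values below are copied from the log.

```python
# val_loss values copied from the regression training log above
val_losses = [10.5498, 13.9567, 13.2150, 12.3502, 13.3638, 11.9359]
min_delta, patience = 1e-3, 5

best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1  # 1-based
# Count the epochs after the best one that failed to beat it by more than min_delta
stale = sum(1 for v in val_losses[best_epoch:]
            if v >= val_losses[best_epoch - 1] - min_delta)
print(best_epoch, stale)  # 1 5
```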

Finally, we evaluate the error.

```python
# Measure RMSE error.  RMSE is common for regression.
pred = model.predict(x_test)
# scikit-learn expects (y_true, y_pred); MSE is symmetric, so the score is unchanged
score = np.sqrt(metrics.mean_squared_error(y_test, pred))
print(f"Final score (RMSE): {score}")
```

```
Final score (RMSE): 3.248045004502701
```
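RMSE is the square root of the mean squared difference between predictions and targets. It is reported in the target's units, which here are miles per gallon. The NumPy sketch below computes it by hand with hypothetical values. Note the flatten step: `model.predict` returns a column of shape `(n, 1)`, and subtracting a shape `(n,)` target from it would broadcast to an `(n, n)` matrix.

```python
import numpy as np

# Hypothetical targets and predictions; predict() on a Dense(1) model returns shape (n, 1)
y_test = np.array([18.0, 15.0, 36.0, 24.0])
pred = np.array([[17.0], [16.0], [33.0], [25.0]])

# Flatten first: (n, 1) - (n,) would broadcast to an (n, n) matrix
errors = pred.ravel() - y_test
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)  # sqrt(12 / 4) = sqrt(3), about 1.732
```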
