# Early Stopping in Keras to Prevent Overfitting

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize.  

![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.

The first method uses a training set and a validation set.  The training data are used to fit the neural network, and training stops once the error on the validation set no longer improves.  This attempts to stop at a near-optimal training point.  Only the predictions for the validation set, usually about 20% of the data, are accurate "out of sample" predictions.  The predictions for the training data will be overly optimistic, because the neural network was trained on those data.

![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")
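To make the three sets listed earlier concrete, the following is a minimal sketch that shuffles row indices and cuts them into training, validation, and holdout groups. It uses only the standard library; the function name `three_way_split`, the 10% holdout fraction, and the seed are assumptions chosen for illustration, not values from this notebook. In practice, two successive calls to `train_test_split` would achieve a similar result.

```python
import random

def three_way_split(n_rows, val_frac=0.2, holdout_frac=0.1, seed=42):
    """Shuffle row indices and cut them into train/validation/holdout lists.

    The fractions and the seed are illustrative assumptions.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable

    n_holdout = int(n_rows * holdout_frac)
    n_val = int(n_rows * val_frac)

    holdout_idx = indices[:n_holdout]               # never seen during training
    val_idx = indices[n_holdout:n_holdout + n_val]  # used to decide when to stop
    train_idx = indices[n_holdout + n_val:]         # used to fit the weights
    return train_idx, val_idx, holdout_idx

# 150 rows, the size of the iris dataset
train_idx, val_idx, holdout_idx = three_way_split(150)
print(len(train_idx), len(val_idx), len(holdout_idx))
```

The index lists can then be used to select rows from a NumPy array or a DataFrame. The examples that follow use only a training set and a validation set.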

### Early Stopping with Classification

In [15]:
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split

df = pd.read_csv("https://data.heatonresearch.com/data/t81-558/iris.csv")
df

Unnamed: 0,sepal_l,sepal_w,petal_l,petal_w,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


In [16]:
X = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y = dummies.values

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=667
                                                    )


In [17]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(Dense(50, input_dim=X.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output

model.compile(loss='categorical_crossentropy', optimizer='adam')

In [18]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_17 (Dense)            (None, 50)                250       
                                                                 
 dense_18 (Dense)            (None, 25)                1275      
                                                                 
 dense_19 (Dense)            (None, 3)                 78        
                                                                 
Total params: 1603 (6.26 KB)
Trainable params: 1603 (6.26 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [19]:
monitor = EarlyStopping(monitor='val_loss',
                        min_delta=1e-3,
                        patience=5,
                        verbose=1,
                        mode='auto',
                        restore_best_weights=True)

model.fit(X_train,y_train,
          validation_data=(X_test,y_test),
          callbacks=[monitor],
          verbose=2,
          epochs=1000
          )

Epoch 1/1000
4/4 - 1s - loss: 0.8706 - val_loss: 0.7900 - 1s/epoch - 298ms/step
Epoch 2/1000
4/4 - 0s - loss: 0.7671 - val_loss: 0.7384 - 58ms/epoch - 15ms/step
Epoch 3/1000
4/4 - 0s - loss: 0.7084 - val_loss: 0.6832 - 55ms/epoch - 14ms/step
Epoch 4/1000
4/4 - 0s - loss: 0.6589 - val_loss: 0.6295 - 55ms/epoch - 14ms/step
Epoch 5/1000
4/4 - 0s - loss: 0.6152 - val_loss: 0.5936 - 55ms/epoch - 14ms/step
Epoch 6/1000
4/4 - 0s - loss: 0.5866 - val_loss: 0.5792 - 39ms/epoch - 10ms/step
Epoch 7/1000
4/4 - 0s - loss: 0.5682 - val_loss: 0.5650 - 58ms/epoch - 15ms/step
Epoch 8/1000
4/4 - 0s - loss: 0.5490 - val_loss: 0.5481 - 56ms/epoch - 14ms/step
Epoch 9/1000
4/4 - 0s - loss: 0.5306 - val_loss: 0.5236 - 57ms/epoch - 14ms/step
Epoch 10/1000
4/4 - 0s - loss: 0.5160 - val_loss: 0.5020 - 39ms/epoch - 10ms/step
Epoch 11/1000
4/4 - 0s - loss: 0.5043 - val_loss: 0.4882 - 57ms/epoch - 14ms/step
Epoch 12/1000
4/4 - 0s - loss: 0.4869 - val_loss: 0.4864 - 40ms/epoch - 10ms/step
Epoch 13/1000
4/4 - 0s - l

<keras.src.callbacks.History at 0x7cc113596260>

Several parameters are passed to the **EarlyStopping** object.

* **monitor** The quantity to watch.  Here it is "val_loss", the loss on the validation set.
* **min_delta** The minimum change in the monitored value that counts as an improvement.  This value should be kept small.  Setting it even smaller is unlikely to have much impact.
* **patience** The number of epochs that training will wait for the validation error to improve before stopping.
* **verbose** How much progress information do you want?
* **mode** Whether the monitored value should be minimized or maximized.  Consider accuracy, where higher numbers are desired, versus log-loss/RMSE, where lower numbers are desired.  In general, leave this set to "auto", which infers the direction from the name of the monitored quantity.
* **restore_best_weights** This should always be set to true.  It restores the weights to the values they had at the epoch with the best monitored value (here, the lowest validation loss).  Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.

As you can see from the output above, not all of the requested epochs were used.  Training stopped once the validation loss no longer improved.
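How **patience** and **min_delta** interact can be easier to see in a small simulation. The sketch below is a simplified imitation of the stopping rule, not the Keras implementation; the function name `simulate_early_stopping` and the lists of validation values are made up for illustration.

```python
def simulate_early_stopping(values, min_delta=1e-3, patience=5, mode="min"):
    """Return (stop_epoch, best_epoch), both 1-based.

    Simplified stand-in for the patience/min_delta logic, written for
    illustration only.
    """
    best_value = None
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, value in enumerate(values, start=1):
        if best_value is None:
            improved = True
        elif mode == "min":
            # must beat the best value by more than min_delta
            improved = value < best_value - min_delta
        else:
            improved = value > best_value + min_delta

        if improved:
            best_value, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # patience exhausted: stop here
    return len(values), best_epoch  # ran through every epoch

# Hypothetical validation losses (lower is better)
val_losses = [0.80, 0.70, 0.65, 0.64, 0.66, 0.645, 0.6395, 0.65, 0.66, 0.70, 0.50]
print(simulate_early_stopping(val_losses))  # (9, 4)

# Hypothetical validation accuracies (higher is better)
val_accuracy = [0.5, 0.6, 0.7, 0.7, 0.69]
print(simulate_early_stopping(val_accuracy, patience=2, mode="max"))  # (5, 3)
```

In the first run, epoch 7 (0.6395) is slightly lower than the best value of 0.64, but not by more than `min_delta`, so it does not reset the wait counter. Training stops at epoch 9, and restoring the best weights would return the model to its state at epoch 4.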

In [8]:
from sklearn.metrics import accuracy_score

pred = model.predict(X_test)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y_test,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")

Accuracy: 0.9736842105263158
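The accuracy cell above uses `argmax` to turn both the softmax outputs and the one-hot labels back into class indices before comparing them. The following standard-library sketch shows that step on three rows; the probabilities are invented for illustration, and the helper `argmax` is a hypothetical stand-in for `np.argmax(..., axis=1)` applied row by row.

```python
def argmax(row):
    """Index of the largest entry in a list (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)

# Made-up softmax outputs, one row per flower, one column per species
predicted_probs = [
    [0.90, 0.07, 0.03],
    [0.10, 0.60, 0.30],
    [0.05, 0.50, 0.45],
]
# One-hot labels in the same column order
one_hot_labels = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

predicted = [argmax(row) for row in predicted_probs]
expected = [argmax(row) for row in one_hot_labels]

# Accuracy is the fraction of rows where the two indices agree
accuracy = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
print(predicted, expected, accuracy)
```

Here the third row is misclassified, so two of the three predictions are correct.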


### Early Stopping with Regression

In [1]:
!wget https://data.heatonresearch.com/data/t81-558/auto-mpg.csv

--2024-01-24 22:13:05--  https://data.heatonresearch.com/data/t81-558/auto-mpg.csv
Resolving data.heatonresearch.com (data.heatonresearch.com)... 108.157.162.19, 108.157.162.8, 108.157.162.56, ...
Connecting to data.heatonresearch.com (data.heatonresearch.com)|108.157.162.19|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18121 (18K) [text/csv]
Saving to: ‘auto-mpg.csv’


2024-01-24 22:13:06 (198 MB/s) - ‘auto-mpg.csv’ saved [18121/18121]



In [2]:
import pandas as pd
import numpy as np
from sklearn import metrics

df = pd.read_csv('auto-mpg.csv', na_values=['NA', '?'])
df

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82,1,ford mustang gl
394,44.0,4,97.0,52.0,2130,24.6,82,2,vw pickup
395,32.0,4,135.0,84.0,2295,11.6,82,1,dodge rampage
396,28.0,4,120.0,79.0,2625,18.6,82,1,ford ranger


In [3]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
X = df[['cylinders', 'displacement', 'horsepower', 'weight','acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled,y,
                                                    test_size=0.2,
                                                    random_state=667
                                                    )

In [12]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(Dense(25, input_dim=X.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2

model.add(Dense(1)) # Output regression

model.compile(loss='mean_squared_error', optimizer='adam') # regression

In [13]:
model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_12 (Dense)            (None, 25)                200       
                                                                 
 dense_13 (Dense)            (None, 10)                260       
                                                                 
 dense_14 (Dense)            (None, 1)                 11        
                                                                 
Total params: 471 (1.84 KB)
Trainable params: 471 (1.84 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [6]:
monitor = EarlyStopping(monitor='val_loss',
                        min_delta=1e-3,
                        patience=5,
                        verbose=1,
                        mode='auto',
                        restore_best_weights=True)

model.fit(X_train,y_train,
          validation_data=(X_test,y_test),
          callbacks=[monitor],
          verbose=2,
          epochs=1000
          )

Epoch 1/1000
10/10 - 1s - loss: 621.2048 - val_loss: 543.8117 - 1s/epoch - 110ms/step
Epoch 2/1000
10/10 - 0s - loss: 614.6876 - val_loss: 537.6483 - 66ms/epoch - 7ms/step
Epoch 3/1000
10/10 - 0s - loss: 608.1035 - val_loss: 530.6010 - 50ms/epoch - 5ms/step
Epoch 4/1000
10/10 - 0s - loss: 600.0046 - val_loss: 521.6755 - 67ms/epoch - 7ms/step
Epoch 5/1000
10/10 - 0s - loss: 589.5901 - val_loss: 510.4449 - 66ms/epoch - 7ms/step
Epoch 6/1000
10/10 - 0s - loss: 576.0862 - val_loss: 496.1087 - 67ms/epoch - 7ms/step
Epoch 7/1000
10/10 - 0s - loss: 559.1260 - val_loss: 477.8607 - 53ms/epoch - 5ms/step
Epoch 8/1000
10/10 - 0s - loss: 537.2171 - val_loss: 455.5360 - 51ms/epoch - 5ms/step
Epoch 9/1000
10/10 - 0s - loss: 511.2726 - val_loss: 429.8179 - 59ms/epoch - 6ms/step
Epoch 10/1000
10/10 - 0s - loss: 481.4557 - val_loss: 400.8863 - 53ms/epoch - 5ms/step
Epoch 11/1000
10/10 - 0s - loss: 448.0124 - val_loss: 368.9013 - 68ms/epoch - 7ms/step
Epoch 12/1000
10/10 - 0s - loss: 412.1237 - val_loss

<keras.src.callbacks.History at 0x7cc12478a3b0>

In [8]:
# Measure RMSE error.  RMSE is common for regression.
y_pred = model.predict(X_test)
score = np.sqrt(metrics.mean_squared_error(y_test, y_pred))  # arguments are (y_true, y_pred)
print(f"Final score (RMSE): {score}")

Final score (RMSE): 2.94720086303077
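For reference, RMSE is the square root of the mean of the squared differences between the actual and predicted values, so it is expressed in the same units as the target (miles per gallon here). The sketch below computes it by hand with the standard library. The actual values are the first four `mpg` entries from the table above; the predictions are invented for illustration, and the helper name `rmse` is hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [18.0, 15.0, 18.0, 16.0]     # first four mpg values in the dataset
predicted = [20.0, 14.0, 17.0, 19.0]  # made-up predictions

# Squared errors are 4, 1, 1, 9; their mean is 3.75
print(f"RMSE: {rmse(actual, predicted)}")
```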
