**Underfitting** occurs when your model cannot obtain sufficiently low loss on the training set. In this case, your model fails to learn the underlying patterns in your training data. On the other end of the spectrum is **overfitting**, where your network models the training data too well and fails to generalize to your validation data.

Therefore, our goal when training a machine learning model is to:
1.  Reduce the training loss as much as possible.
2.  Keep the gap between the training and testing loss reasonably small.

To combat overfitting, there are two general techniques:
1. Reduce the complexity of the model.
2. Apply regularization (the strongly preferred solution).
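
As a rough illustration of the second option, L2 regularization (weight decay) adds a penalty proportional to the squared weights to the training loss. The sketch below is plain Python and is not part of this notebook's model; `l2_penalty`, `regularized_loss`, `lam`, and all the numbers are hypothetical names and values chosen for the example.

```python
def l2_penalty(weights, lam):
    """Return lam * sum of squared weights (the L2 penalty term)."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=0.01):
    """Total loss = data loss + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)


# Hypothetical values: with the same data loss, larger weights give a
# larger total loss, which pushes the optimizer toward smaller weights.
small = regularized_loss(0.5, [0.1, -0.2, 0.3])
large = regularized_loss(0.5, [1.0, -2.0, 3.0])
print(small, large)
```

The penalty strength `lam` trades off fitting the training data against keeping the weights small, so it is usually tuned like any other hyperparameter.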

**Sometimes overfitting is inevitable. What we need is to control the 'generalization gap'**. At some point the training and validation loss will start to diverge, and we need to keep that gap under control. If the validation loss starts to rise, we are strongly overfitting.
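
The gap described here can be tracked per epoch as the validation loss minus the training loss. The following is a minimal sketch, assuming the two loss histories are available as plain lists; `generalization_gap` is a hypothetical helper and the loss values are invented purely for illustration.

```python
def generalization_gap(train_loss, val_loss):
    """Per-epoch gap: validation loss minus training loss."""
    return [v - t for t, v in zip(train_loss, val_loss)]


# Hypothetical loss values, invented purely for illustration.
train_loss = [1.60, 1.10, 0.80, 0.60, 0.45]
val_loss = [1.55, 1.15, 0.95, 0.90, 0.98]

gaps = generalization_gap(train_loss, val_loss)
for epoch, gap in enumerate(gaps, start=1):
    print("epoch {}: gap = {:.2f}".format(epoch, gap))
```

A gap that keeps widening while the validation loss itself turns upward is the pattern to watch for.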

In [1]:
# import the necessary packages
from ndl.callbacks import TrainingMonitor
from sklearn.preprocessing import LabelBinarizer
from ndl.nn.conv import MiniVGGNet
from keras.optimizers import SGD
from keras.datasets import cifar10
import os

Using TensorFlow backend.


In [7]:
# show information on the process ID
print("[INFO] process ID: {}".format(os.getpid()))
output = 'output'

[INFO] process ID: 14140


In [3]:
# load the training and testing data, then scale it into the
# range [0, 1]
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

[INFO] loading CIFAR-10 data...


In [4]:
# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

In [5]:
labelNames = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

In [6]:
# initialize the SGD optimizer, but without any learning rate decay
print("[INFO] compiling model...")
opt = SGD(lr=0.01, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

[INFO] compiling model...


We deliberately leave out learning rate decay so that we can demonstrate monitoring and spot overfitting as it happens.
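
For reference, the kind of schedule being left out can be sketched as follows. This assumes the time-based decay applied by the `decay` argument of the legacy Keras `SGD` optimizer, where the rate shrinks with the number of batch updates. The `time_based_lr` helper and the decay value are hypothetical and are not used anywhere in this notebook.

```python
def time_based_lr(initial_lr, decay, iteration):
    """Learning rate after `iteration` batch updates under time-based decay."""
    return initial_lr / (1.0 + decay * iteration)


initial_lr = 0.01          # same initial rate as the optimizer in this notebook
decay = initial_lr / 100   # hypothetical choice, not taken from this notebook
updates_per_epoch = 50000 // 64 + 1  # assumes 50000 training images, batch size 64

for epoch in (0, 25, 50, 100):
    lr = time_based_lr(initial_lr, decay, epoch * updates_per_epoch)
    print("epoch {:3d}: lr = {:.6f}".format(epoch, lr))
```

With `decay` set to zero the rate stays at `initial_lr` for the whole run, which is the situation in this experiment.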

In [9]:
# construct the set of callbacks
figPath = os.path.sep.join([output, "{}.png".format(os.getpid())])
jsonPath = os.path.sep.join([output, "{}.json".format(os.getpid())])
callbacks = [TrainingMonitor(figPath, jsonPath=jsonPath)]
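
`TrainingMonitor` comes from the `ndl` package, whose source is not shown here. As a rough sketch of the idea only, a monitor of this kind can accumulate the per-epoch metrics and rewrite a JSON file after every epoch. The `HistoryLogger` class below is hypothetical, uses only the standard library, and mirrors the `on_epoch_end(epoch, logs)` hook of Keras callbacks without subclassing them; it does not draw the loss plot.

```python
import json
import os
import tempfile


class HistoryLogger:
    """Hypothetical stand-in: stores per-epoch metrics and dumps them to JSON."""

    def __init__(self, json_path):
        self.json_path = json_path
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        # append each metric (e.g. "loss", "val_loss") to its running list
        for name, value in (logs or {}).items():
            self.history.setdefault(name, []).append(float(value))
        # rewrite the whole file so it is always up to date on disk
        with open(self.json_path, "w") as f:
            json.dump(self.history, f)


# Usage with made-up numbers in a temporary directory.
json_path = os.path.join(tempfile.mkdtemp(), "history.json")
logger = HistoryLogger(json_path)
logger.on_epoch_end(0, {"loss": 1.6, "val_loss": 1.5})
logger.on_epoch_end(1, {"loss": 1.1, "val_loss": 1.2})

with open(json_path) as f:
    saved = json.load(f)
print(saved)
```

Writing the file after every epoch, rather than once at the end, is what makes it possible to inspect a long run while it is still training.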

## Once you start training, take a look at your output directory and you should see the training history and loss plots being stored.

- At epoch 5 we should still be underfitting.
- At epoch 10 we see the first signs of overfitting, which is perfectly normal.
- At epoch 25 we see validation loss stagnating while training loss continues to go down.
- At epoch 50 we are clearly in trouble, with validation loss going up. We should stop the experiment here rather than waste more time, then re-assess the parameters and try again.

If we let this run all the way to epoch 100, we see the validation loss continue to rise.
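
One way to turn "stop once validation loss keeps rising" into a rule is a patience-based check on the validation loss history. The sketch below is an assumption about how such a check could look, not something this notebook runs; `should_stop`, `patience`, and the loss values are hypothetical. Keras also ships an `EarlyStopping` callback built around the same idea.

```python
def should_stop(val_losses, patience=3):
    """True once the best validation loss is at least `patience` epochs old."""
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return (len(val_losses) - 1 - best_epoch) >= patience


# Hypothetical validation losses: improving at first, then rising.
val_losses = [1.50, 1.20, 0.95, 0.90, 0.93, 0.97, 1.02]

# Replay the history one epoch at a time, as a training loop would see it.
for n in range(1, len(val_losses) + 1):
    if should_stop(val_losses[:n], patience=3):
        print("stop after epoch {}".format(n))
        break
```

A larger `patience` tolerates more noise in the validation curve at the cost of a few extra wasted epochs.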

In [None]:
# train the network
print("[INFO] training network...")
model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=64, epochs=100, callbacks=callbacks, verbose=1)

[INFO] training network...
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
