# Model Validation

How many epochs should you train for?

Use a training, validation, and test set.

1. Training - Used to fit the model's parameters
2. Validation - Tells us how well the model is generalizing while it is training
3. Test - Used to check accuracy after the model is trained

It is important to compare the training and validation losses against each other to know when to stop. In the plot below, the point to stop is around epoch 100.

![q](cnn_img/q7.png)

## Validation Set: Takeaways
We create a validation set to:

1. Measure how well a model generalizes during training.
2. Tell us when to stop training a model: when the validation loss stops decreasing, and especially when the validation loss starts increasing while the training loss is still decreasing.

We can automate this by tracking the validation loss during training and using its lowest value so far as the threshold for saving the model.
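
One common way to turn the stopping rule into code is early stopping with a patience counter. The sketch below is framework-free and only illustrates the idea; the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are hypothetical and do not come from the notebook.

```python
def early_stopping_epoch(valid_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    stop_epoch is simply the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            # New minimum: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(valid_losses), best_epoch


# Hypothetical validation losses: improving at first, then rising.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.47]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 7 4
```

With these made-up numbers, training would stop at epoch 7 and the weights worth keeping are those from epoch 4, which is why the best model should be saved as training goes along.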

In [1]:
import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler  # samples from a given list of indices in random order

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20
# fraction of the training set to hold out for validation
valid_size = 0.2

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)

In [11]:
num_train = len(train_data)
indices = list(range(num_train)) # Create a list of indices
np.random.shuffle(indices) # Shuffle
split = int(np.floor(valid_size * num_train)) # Calculate Split size
train_idx, valid_idx = indices[split:], indices[:split]

# Define Samplers
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
    num_workers=num_workers)
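
As a sanity check on the split logic above, the following framework-free sketch reproduces the shuffle-and-slice step with the standard library and confirms that the two index sets are disjoint. The helper name `split_indices` and the `seed` argument are assumptions added for illustration; the 60,000 figure is the size of the MNIST training set.

```python
import random


def split_indices(num_samples, valid_fraction, seed=0):
    """Shuffle the indices 0..num_samples-1 and split them into two lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    split = int(valid_fraction * num_samples)
    # The first `split` shuffled indices go to validation, the rest to training.
    return indices[split:], indices[:split]


train_idx, valid_idx = split_indices(60000, 0.2)
print(len(train_idx), len(valid_idx))  # 48000 12000
print(set(train_idx).isdisjoint(valid_idx))  # True
```

Because both samplers draw from the same underlying dataset, keeping the index lists disjoint is what prevents validation images from leaking into training.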

In [13]:
import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # number of hidden nodes in each layer (512)
        hidden_1 = 512
        hidden_2 = 512
        # linear layer (784 -> hidden_1)
        self.fc1 = nn.Linear(28 * 28, hidden_1)
        # linear layer (hidden_1 -> hidden_2)
        self.fc2 = nn.Linear(hidden_1, hidden_2)
        # linear layer (hidden_2 -> 10)
        self.fc3 = nn.Linear(hidden_2, 10)
        # dropout layer (p=0.2)
        # dropout helps reduce overfitting
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc2(x))
        # add dropout layer
        x = self.dropout(x)
        # add output layer
        x = self.fc3(x)
        return x

# initialize the NN
model = Net()
print(model)

# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# specify optimizer (stochastic gradient descent) and learning rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Net(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=512, bias=True)
  (fc3): Linear(in_features=512, out_features=10, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)
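
To get a feel for the size of this network, the number of trainable parameters can be worked out from the layer shapes printed above: each linear layer has `in_features * out_features` weights plus `out_features` biases, and dropout adds none. The helper name `count_linear_parameters` is hypothetical.

```python
def count_linear_parameters(sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# Layer widths taken from the printed model: 784 -> 512 -> 512 -> 10
layer_sizes = [28 * 28, 512, 512, 10]
print(count_linear_parameters(layer_sizes))  # 669706
```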


In [None]:

# number of epochs to train the model
n_epochs = 50

# initialize tracker for minimum validation loss
valid_loss_min = np.inf # set initial "min" to infinity

for epoch in range(n_epochs):
    # monitor training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    
    ###################
    # train the model #
    ###################
    model.train() # prep model for training
    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*data.size(0)
        
    ######################    
    # validate the model #
    ######################
    model.eval() # prep model for evaluation (disables dropout)
    with torch.no_grad(): # gradients are not needed for validation
        for data, target in valid_loader:
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update running validation loss
            valid_loss += loss.item()*data.size(0)
        
    # print training/validation statistics 
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch+1, 
        train_loss,
        valid_loss
        ))
    
    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(model.state_dict(), 'model.pt')
        valid_loss_min = valid_loss

Epoch: 1 	Training Loss: 0.952880 	Validation Loss: 0.381055
Validation loss decreased (inf --> 0.381055).  Saving model ...
Epoch: 2 	Training Loss: 0.358773 	Validation Loss: 0.280473
Validation loss decreased (0.381055 --> 0.280473).  Saving model ...
Epoch: 3 	Training Loss: 0.282861 	Validation Loss: 0.232941
Validation loss decreased (0.280473 --> 0.232941).  Saving model ...
Epoch: 4 	Training Loss: 0.234109 	Validation Loss: 0.195092
Validation loss decreased (0.232941 --> 0.195092).  Saving model ...
Epoch: 5 	Training Loss: 0.199125 	Validation Loss: 0.170588
Validation loss decreased (0.195092 --> 0.170588).  Saving model ...
Epoch: 6 	Training Loss: 0.175659 	Validation Loss: 0.160440
Validation loss decreased (0.170588 --> 0.160440).  Saving model ...
Epoch: 7 	Training Loss: 0.152874 	Validation Loss: 0.134890
Validation loss decreased (0.160440 --> 0.134890).  Saving model ...
Epoch: 8 	Training Loss: 0.136255 	Validation Loss: 0.121645
Validation loss decreased (0.13489

The training loop now saves the model whenever ***valid_loss <= valid_loss_min***, that is, whenever the validation loss matches or improves on the lowest value seen so far. As a result, the saved file model.pt always holds the weights with the lowest validation loss.
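
The loop above accumulates `loss.item()*data.size(0)` and divides by the number of samples rather than averaging the per-batch losses directly. Since `nn.CrossEntropyLoss` returns the mean over a batch by default, this weighting gives the true per-sample average even when the last batch is smaller. The sketch below illustrates the difference; the function name `epoch_average_loss` and the numbers are hypothetical.

```python
def epoch_average_loss(batch_mean_losses, batch_sizes):
    """Per-sample average loss from per-batch mean losses."""
    total = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical epoch: two full batches of 20 and a final partial batch of 5.
batch_losses = [0.50, 0.30, 1.00]
batch_sizes = [20, 20, 5]

weighted = epoch_average_loss(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)
print(f"{weighted:.4f} {naive:.4f}")  # 0.4667 0.6000
```

In this notebook the 48,000 training and 12,000 validation samples divide evenly by the batch size of 20, so both averages would agree; the weighted form simply stays correct if the batch size changes.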


In [None]:
# Load the saved (lowest validation loss) weights back into the model
model.load_state_dict(torch.load('model.pt'))
# This is useful for long training runs, where the final epoch is not necessarily the best one
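
As a minimal sketch of the save-and-restore round trip, the example below saves a `state_dict`, loads it into a fresh instance of the same architecture, and checks that both produce identical outputs. It uses a small `nn.Linear` as a stand-in for `Net` and an in-memory buffer instead of `model.pt`; both substitutions are assumptions made to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a trained model (the notebook's Net would work the same way).
trained = nn.Linear(4, 2)

# Save only the parameters, as the training loop does with 'model.pt'.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Restore into a new instance with the same architecture.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
print(same)  # True
```

Note that a `state_dict` holds parameters only, so the class definition must be available when loading.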