In [0]:
# ms-python.python added
import os
try:
	os.chdir(os.path.join(os.getcwd(), 'intro-to-pytorch'))
	print(os.getcwd())
except OSError:
	pass

 # Inference and Validation

 Now that you have a trained network, you can use it for making predictions. This is typically called **inference**, a term borrowed from statistics. However, neural networks have a tendency to perform *too well* on the training data and aren't able to generalize to data that hasn't been seen before. This is called **overfitting** and it impairs inference performance. To test for overfitting while training, we measure the performance on data not in the training set called the **validation** set. We avoid overfitting through regularization such as dropout while monitoring the validation performance during training. In this notebook, I'll show you how to do this in PyTorch.

 As usual, let's start by loading the dataset through torchvision. You'll learn more about torchvision and loading data in a later part. This time we'll be taking advantage of the test set which you can get by setting `train=False` here:

 ```python
 testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
 ```

 The test set contains images just like the training set. Typically you'll see 10-20% of the original dataset held out for testing and validation with the rest being used for training.
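
 As a rough illustration of the hold-out idea, the sketch below splits a range of example indices into a training part and a validation part. The helper name `split_indices`, the 20% fraction, and the seed are assumptions made for this example and are not part of the notebook. With a PyTorch dataset, index lists like these could be passed to `torch.utils.data.Subset`.

```python
import random

def split_indices(n_examples, val_fraction=0.2, seed=0):
    """Shuffle the indices 0..n_examples-1 and hold out a validation part."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_val = round(n_examples * val_fraction)      # size of the held-out part
    return indices[n_val:], indices[:n_val]       # (train, validation)

# Hypothetical dataset size, chosen only for illustration
train_idx, val_idx = split_indices(60000, val_fraction=0.2)
print(len(train_idx), len(val_idx))
```

 Shuffling before splitting matters when the dataset is stored in class order; otherwise the held-out part could contain only a few classes.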

In [4]:
from tqdm import tqdm
import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)



 Here I'll create a model like normal, using the same one from my solution for part 4.

In [0]:
from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x


 The goal of validation is to measure the model's performance on data that isn't part of the training set. How to define performance is up to the developer. Typically it is just accuracy, the percentage of examples whose class the network predicted correctly. Other options are [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall#Definition_(classification_context)) and top-5 error rate. We'll focus on accuracy here. First I'll do a forward pass with one batch from the test set.
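
 To make these metrics concrete, here is a small sketch in plain Python that computes accuracy for a handful of predictions, plus precision and recall for a single class. The function names and the toy labels are made up for this example.

```python
def accuracy(preds, labels):
    """Fraction of examples whose predicted class matches the label."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

def precision_recall(preds, labels, positive):
    """Precision and recall when `positive` is treated as the class of interest."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)  # true positives
    fp = sum(p == positive and y != positive for p, y in pairs)  # false positives
    fn = sum(p != positive and y == positive for p, y in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data, for illustration only
labels = [0, 0, 1, 1, 1, 2]
preds = [0, 1, 1, 1, 0, 2]

print(accuracy(preds, labels))
print(precision_recall(preds, labels, positive=1))
```

 Accuracy treats every mistake the same way, while precision and recall separate false alarms from misses for a given class.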

In [6]:
model = Classifier()

images, labels = next(iter(testloader))
# Get the class probabilities
ps = torch.exp(model(images))
# Make sure the shape is appropriate, we should get 10 class probabilities for 64 examples
print(ps.shape)


torch.Size([64, 10])


 With the probabilities, we can get the most likely class using the `ps.topk` method. This returns the $k$ highest values. Since we just want the most likely class, we can use `ps.topk(1)`. This returns a tuple of the top-$k$ values and the top-$k$ indices. If the highest value is the fifth element, we'll get back 4 as the index.
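
 A tiny example may help with reading the output of `topk`. The probabilities below are invented for illustration; in the first row the largest value is the fifth element, so the returned index is 4.

```python
import torch

# Hypothetical class probabilities for 2 examples over 5 classes
ps = torch.tensor([[0.05, 0.10, 0.15, 0.20, 0.50],
                   [0.60, 0.10, 0.10, 0.10, 0.10]])

# k=1 along dim=1 gives the single most likely class per row
top_p, top_class = ps.topk(1, dim=1)

print(top_p)      # highest probability in each row, shape (2, 1)
print(top_class)  # index of that probability in each row, shape (2, 1)
```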

In [7]:
top_p, top_class = ps.topk(1, dim=1)
# Look at the most likely classes for the first 10 examples
print(top_class[:10,:])


tensor([[0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0]])


 Now we can check if the predicted classes match the labels. This is simple to do by equating `top_class` and `labels`, but we have to be careful of the shapes. Here `top_class` is a 2D tensor with shape `(64, 1)` while `labels` is 1D with shape `(64,)`. To get the equality to work out the way we want, `top_class` and `labels` must have the same shape.

 If we do

 ```python
 equals = top_class == labels
 ```

 `equals` will have shape `(64, 64)`; try it yourself. Broadcasting compares the one element in each row of `top_class` with every element in `labels`, which returns 64 True/False values for each row.
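
 The shape difference can be checked with stand-in tensors. In this sketch the predictions and labels are placeholders (zeros and random integers), so only the shapes are meaningful.

```python
import torch

top_class = torch.zeros(64, 1, dtype=torch.long)  # stand-in predictions, shape (64, 1)
labels = torch.randint(0, 10, (64,))              # stand-in labels, shape (64,)

# Broadcasting (64, 1) against (64,) compares every prediction with every label
broadcast = top_class == labels

# Reshaping labels to (64, 1) first gives one comparison per example
matched = top_class == labels.view(*top_class.shape)

print(broadcast.shape)
print(matched.shape)
```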

In [0]:
equals = top_class == labels.view(*top_class.shape)


 Now we need to calculate the percentage of correct predictions. `equals` has binary values, either 0 or 1. This means that if we just sum up all the values and divide by the number of values, we get the fraction of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to `torch.mean`. If only it were that simple. If you try `torch.mean(equals)`, you'll get an error:

 ```
 RuntimeError: mean is not implemented for type torch.ByteTensor
 ```

 This happens because `equals` has type `torch.ByteTensor`, and `torch.mean` isn't implemented for tensors of that type. (In PyTorch 1.2 and later, comparisons return `torch.bool` tensors and the error message is worded differently, but the cause and the fix are the same.) So we'll need to convert `equals` to a float tensor. Note that `torch.mean` returns a scalar tensor; to get the actual value as a float we'll need to call `accuracy.item()`.
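
 Here is the conversion on a hand-written comparison result, so the expected accuracy is easy to verify by eye: three of the four entries are `True`. The values are made up for this example.

```python
import torch

# Hypothetical comparison result for 4 examples, shape (4, 1)
equals = torch.tensor([[True], [False], [True], [True]])

# Convert to float before taking the mean, then unwrap the scalar tensor
accuracy = torch.mean(equals.type(torch.FloatTensor))
print(accuracy.item())
```

 `equals.float()` is a shorter spelling of the same conversion.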

In [9]:
accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')


Accuracy: 14.0625%


 The network is untrained so it's making random guesses and we should see an accuracy around 10%. Now let's train our network and include our validation pass so we can measure how well the network is performing on the test set. Since we're not updating our parameters in the validation pass, we can speed up our code by turning off gradients using `torch.no_grad()`:

 ```python
 # turn off gradients
 with torch.no_grad():
     # validation pass here
     for images, labels in testloader:
         ...
 ```
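
 The effect of `torch.no_grad()` can be seen on a single tensor. In this sketch, `w` stands in for a model parameter; the same computation is tracked by autograd outside the context and untracked inside it.

```python
import torch

w = torch.ones(3, requires_grad=True)  # stand-in for a model parameter

tracked = (w * 2).sum()                # autograd records this operation

with torch.no_grad():
    untracked = (w * 2).sum()          # same computation, no graph is built

print(tracked.requires_grad)
print(untracked.requires_grad)
```

 Skipping the graph saves memory and time, which is why it suits the validation pass, where `backward()` is never called.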

 >**Exercise:** Implement the validation loop below and print out the total accuracy after the loop. You can largely copy and paste the code from above, but I suggest typing it in because writing it out yourself is essential for building the skill. In general you'll always learn more by typing it rather than copy-pasting. You should be able to get an accuracy above 80%.

In [0]:
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 100
steps = 0

train_losses, test_losses = [], []
for e in tqdm(range(epochs)):
    running_loss = 0
    for images, labels in trainloader:
        
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        ## Implement the validation pass and print out the validation accuracy
        with torch.no_grad():
            test_loss = 0
            accuracy = 0

            for images, labels in testloader:
                outputs = model(images)
                loss = criterion(outputs, labels)
                test_loss += loss.item()
                _, predictions = outputs.topk(1)
                equals = (labels == predictions.view(*labels.shape))
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        
        running_loss = running_loss / len(trainloader)
        test_loss = test_loss / len(testloader)
        train_losses.append(running_loss)
        test_losses.append(test_loss)
        accuracy = accuracy / len(testloader)
        print(f'Train loss: {running_loss}')
        print(f'Test loss: {test_loss}')
        print(f'Accuracy: {accuracy.item()*100}%')
print('Train losses')
print(train_losses)
print('Test losses')
print(test_losses)


  1%|          | 1/100 [00:10<16:44, 10.14s/it]

Train loss: 0.5203725083518639
Test loss: 0.44095899890752355
Accuracy: 84.7034215927124%


  2%|▏         | 2/100 [00:20<16:45, 10.26s/it]

Train loss: 0.39387606552986704
Test loss: 0.4678008576297456
Accuracy: 83.77786874771118%


  3%|▎         | 3/100 [00:31<16:41, 10.33s/it]

Train loss: 0.35275250359543603
Test loss: 0.4166838939592337
Accuracy: 85.6289803981781%


  4%|▍         | 4/100 [00:41<16:34, 10.36s/it]

Train loss: 0.3337282226252149
Test loss: 0.3618882103425682
Accuracy: 86.92277073860168%


  5%|▌         | 5/100 [00:52<16:25, 10.37s/it]

Train loss: 0.31550066578172165
Test loss: 0.3673840626409859
Accuracy: 87.32085824012756%


  6%|▌         | 6/100 [01:02<16:24, 10.48s/it]

Train loss: 0.30269738803826163
Test loss: 0.37801110421776013
Accuracy: 86.44506335258484%


  7%|▋         | 7/100 [01:13<16:17, 10.51s/it]

Train loss: 0.2934258213953804
Test loss: 0.3897553550874352
Accuracy: 86.8332028388977%


  8%|▊         | 8/100 [01:24<16:14, 10.59s/it]

Train loss: 0.2832121339036839
Test loss: 0.3839776585246347
Accuracy: 86.65406107902527%


  9%|▉         | 9/100 [01:35<16:13, 10.70s/it]

Train loss: 0.2749881675637671
Test loss: 0.37030344539482124
Accuracy: 87.43033409118652%


 10%|█         | 10/100 [01:46<16:10, 10.78s/it]

Train loss: 0.26958366647473914
Test loss: 0.3688056524012499
Accuracy: 87.53980994224548%


 11%|█         | 11/100 [01:57<16:05, 10.85s/it]

Train loss: 0.2630023638894563
Test loss: 0.3879646291113963
Accuracy: 87.2113823890686%


 12%|█▏        | 12/100 [02:08<16:02, 10.94s/it]

Train loss: 0.25602717665848196
Test loss: 0.4025779417745627
Accuracy: 86.75358295440674%


 13%|█▎        | 13/100 [02:19<15:50, 10.93s/it]

Train loss: 0.2568469913577093
Test loss: 0.3610928262210196
Accuracy: 88.19665312767029%


 14%|█▍        | 14/100 [02:30<15:44, 10.98s/it]

Train loss: 0.24353082766776274
Test loss: 0.39761077427560354
Accuracy: 87.09195852279663%


 15%|█▌        | 15/100 [02:41<15:34, 11.00s/it]

Train loss: 0.23941684394344084
Test loss: 0.3859347704869167
Accuracy: 87.2113823890686%


 16%|█▌        | 16/100 [02:52<15:25, 11.02s/it]

Train loss: 0.23606227050378506
Test loss: 0.3833254431463351
Accuracy: 88.01751732826233%


 17%|█▋        | 17/100 [03:04<15:32, 11.24s/it]

Train loss: 0.22997456357113397
Test loss: 0.3721485874455446
Accuracy: 88.25637102127075%


 18%|█▊        | 18/100 [03:15<15:24, 11.28s/it]

Train loss: 0.22491316844040016
Test loss: 0.37975125489341227
Accuracy: 87.84832954406738%


 19%|█▉        | 19/100 [03:26<15:20, 11.36s/it]

Train loss: 0.22458484714457602
Test loss: 0.3750864506052558
Accuracy: 87.93789744377136%


 20%|██        | 20/100 [03:38<15:19, 11.49s/it]

Train loss: 0.2244342175632048
Test loss: 0.45508767038014286
Accuracy: 86.82324886322021%


 21%|██        | 21/100 [03:50<15:15, 11.59s/it]

Train loss: 0.21239324597153328
Test loss: 0.4011436152704962
Accuracy: 88.40565085411072%


 22%|██▏       | 22/100 [04:02<15:06, 11.62s/it]

Train loss: 0.21292253636093791
Test loss: 0.4150004408731582
Accuracy: 87.7886176109314%


 23%|██▎       | 23/100 [04:13<14:48, 11.54s/it]

Train loss: 0.20454649160951693
Test loss: 0.3980779854260432
Accuracy: 87.98766136169434%


 24%|██▍       | 24/100 [04:24<14:30, 11.46s/it]

Train loss: 0.20611378507637013
Test loss: 0.40025516390610655
Accuracy: 87.96775341033936%


 25%|██▌       | 25/100 [04:36<14:13, 11.38s/it]

Train loss: 0.20427273334200574
Test loss: 0.38437104823103374
Accuracy: 88.18670511245728%


 26%|██▌       | 26/100 [04:47<14:02, 11.38s/it]

Train loss: 0.1950002873677816
Test loss: 0.42636196974925933
Accuracy: 88.1170392036438%


 27%|██▋       | 27/100 [04:58<13:51, 11.39s/it]

Train loss: 0.19415177069683828
Test loss: 0.41334258247712613
Accuracy: 88.27627301216125%


 28%|██▊       | 28/100 [05:10<13:39, 11.38s/it]

Train loss: 0.1953849788191222
Test loss: 0.43135311326403525
Accuracy: 87.88813948631287%


 29%|██▉       | 29/100 [05:21<13:27, 11.37s/it]

Train loss: 0.19173791758350725
Test loss: 0.40862062027689755
Accuracy: 88.30612897872925%


 30%|███       | 30/100 [05:33<13:18, 11.41s/it]

Train loss: 0.18283761731946646
Test loss: 0.41340311564457644
Accuracy: 88.23646306991577%


 31%|███       | 31/100 [05:44<13:11, 11.47s/it]

Train loss: 0.18084679798149605
Test loss: 0.46125362827709526
Accuracy: 87.89808750152588%


 32%|███▏      | 32/100 [05:56<12:58, 11.45s/it]

Train loss: 0.18285834896904446
Test loss: 0.4468681034009168
Accuracy: 88.19665312767029%


 33%|███▎      | 33/100 [06:07<12:52, 11.53s/it]

Train loss: 0.18437565935215652
Test loss: 0.41887107444037297
Accuracy: 88.14689517021179%


 34%|███▍      | 34/100 [06:19<12:42, 11.56s/it]

Train loss: 0.1719765613622058
Test loss: 0.4865987641131802
Accuracy: 87.65923380851746%


 35%|███▌      | 35/100 [06:31<12:32, 11.58s/it]

Train loss: 0.17861845399509232
Test loss: 0.4214392338588739
Accuracy: 88.65445852279663%


 36%|███▌      | 36/100 [06:42<12:20, 11.57s/it]

Train loss: 0.1756172904581912
Test loss: 0.4528754439418483
Accuracy: 88.3957028388977%


 37%|███▋      | 37/100 [06:54<12:06, 11.52s/it]

Train loss: 0.17130165602932415
Test loss: 0.4484476192267078
Accuracy: 88.23646306991577%


 38%|███▊      | 38/100 [07:05<12:01, 11.64s/it]

Train loss: 0.1660172944602523
Test loss: 0.46051583481822045
Accuracy: 87.55971193313599%


 39%|███▉      | 39/100 [07:17<11:52, 11.68s/it]

Train loss: 0.17082179412206036
Test loss: 0.47347501252487206
Accuracy: 87.76870965957642%


 40%|████      | 40/100 [07:29<11:35, 11.59s/it]

Train loss: 0.17134940218148645
Test loss: 0.48637914657592773
Accuracy: 87.96775341033936%


 41%|████      | 41/100 [07:40<11:20, 11.53s/it]

Train loss: 0.16112039015809101
Test loss: 0.49567312182514534
Accuracy: 88.41560482978821%


 42%|████▏     | 42/100 [07:52<11:08, 11.53s/it]

Train loss: 0.1583377238827696
Test loss: 0.4908205665125968
Accuracy: 88.19665312767029%


 43%|████▎     | 43/100 [08:03<10:58, 11.55s/it]

Train loss: 0.15744583245748078
Test loss: 0.49615254139254805
Accuracy: 88.61464858055115%


 44%|████▍     | 44/100 [08:15<10:45, 11.52s/it]

Train loss: 0.17411695509902766
Test loss: 0.535878571877434
Accuracy: 87.94785141944885%


 45%|████▌     | 45/100 [08:26<10:34, 11.54s/it]

Train loss: 0.16449395583958418
Test loss: 0.5202399696798841
Accuracy: 87.7786636352539%


 46%|████▌     | 46/100 [08:38<10:21, 11.51s/it]

Train loss: 0.15216445404170417
Test loss: 0.5280928681990144
Accuracy: 87.46019005775452%


 47%|████▋     | 47/100 [08:49<10:10, 11.52s/it]

Train loss: 0.15525552426784564
Test loss: 0.49928586778177575
Accuracy: 88.4255588054657%


 48%|████▊     | 48/100 [09:01<10:02, 11.59s/it]

Train loss: 0.1536533378009031
Test loss: 0.5055768040252054
Accuracy: 88.43550682067871%


 49%|████▉     | 49/100 [09:12<09:51, 11.59s/it]

Train loss: 0.15079870566265033
Test loss: 0.46722718992620516
Accuracy: 88.56489062309265%


 50%|█████     | 50/100 [09:24<09:43, 11.68s/it]

Train loss: 0.15269479489942858
Test loss: 0.5203022782684891
Accuracy: 88.4255588054657%


 51%|█████     | 51/100 [09:36<09:34, 11.73s/it]

Train loss: 0.16316682025532064
Test loss: 0.5082247133847255
Accuracy: 87.63933181762695%


 52%|█████▏    | 52/100 [09:48<09:27, 11.83s/it]

Train loss: 0.1488743448165307
Test loss: 0.535161878272986
Accuracy: 88.30612897872925%


 53%|█████▎    | 53/100 [10:01<09:23, 11.99s/it]

Train loss: 0.1466554517208386
Test loss: 0.5482978315869714
Accuracy: 88.41560482978821%


 54%|█████▍    | 54/100 [10:13<09:15, 12.07s/it]

Train loss: 0.14064681459901365
Test loss: 0.5409179653224957
Accuracy: 88.36584687232971%


 55%|█████▌    | 55/100 [10:25<09:03, 12.09s/it]

Train loss: 0.1427846611308645
Test loss: 0.5755922543300185
Accuracy: 88.0871832370758%


 56%|█████▌    | 56/100 [10:37<08:55, 12.17s/it]

Train loss: 0.14662804108248081
Test loss: 0.556413077055269
Accuracy: 88.02746534347534%


 57%|█████▋    | 57/100 [10:50<08:45, 12.23s/it]

Train loss: 0.14591958210058908
Test loss: 0.5308317955417238
Accuracy: 88.50517272949219%


 58%|█████▊    | 58/100 [11:03<08:41, 12.42s/it]

Train loss: 0.14514103750569193
Test loss: 0.5852904023638197
Accuracy: 87.23129034042358%


 59%|█████▉    | 59/100 [11:15<08:29, 12.42s/it]

Train loss: 0.14677611958863004
Test loss: 0.5770881203519311
Accuracy: 87.72889971733093%


 60%|██████    | 60/100 [11:28<08:17, 12.45s/it]

Train loss: 0.1435859569989796
Test loss: 0.5417179767588142
Accuracy: 88.35589289665222%


 61%|██████    | 61/100 [11:40<08:10, 12.57s/it]

Train loss: 0.13192245657585544
Test loss: 0.5452146997592252
Accuracy: 87.98766136169434%


 62%|██████▏   | 62/100 [11:53<07:58, 12.59s/it]

Train loss: 0.1530653854091364
Test loss: 0.5153488890285705
Accuracy: 88.20660710334778%


 63%|██████▎   | 63/100 [12:06<07:50, 12.71s/it]

Train loss: 0.15634848530799436
Test loss: 0.5613061284800623
Accuracy: 88.33598494529724%


 64%|██████▍   | 64/100 [12:19<07:42, 12.84s/it]

Train loss: 0.12796688089415287
Test loss: 0.5474398918687158
Accuracy: 88.46536874771118%


 65%|██████▌   | 65/100 [12:32<07:32, 12.93s/it]

Train loss: 0.13994592562643512
Test loss: 0.6115312457965556
Accuracy: 88.43550682067871%


 66%|██████▌   | 66/100 [12:45<07:18, 12.89s/it]

Train loss: 0.1280709843261481
Test loss: 0.5536029079252747
Accuracy: 88.03741931915283%


 67%|██████▋   | 67/100 [12:58<07:04, 12.87s/it]

Train loss: 0.12746800344039216
Test loss: 0.5788765771753469
Accuracy: 87.86823153495789%


 68%|██████▊   | 68/100 [13:11<06:53, 12.92s/it]

Train loss: 0.14037038265233007
Test loss: 0.5879683258711912
Accuracy: 88.25637102127075%


 69%|██████▉   | 69/100 [13:24<06:44, 13.04s/it]

Train loss: 0.12660531693551222
Test loss: 0.6139981775147141
Accuracy: 88.21656107902527%


 70%|███████   | 70/100 [13:38<06:37, 13.24s/it]

Train loss: 0.11843931755182077
Test loss: 0.6580906498963666
Accuracy: 88.21656107902527%


 71%|███████   | 71/100 [13:52<06:28, 13.39s/it]

Train loss: 0.1495803346259714
Test loss: 0.6260111216621794
Accuracy: 88.1369411945343%


 72%|███████▏  | 72/100 [14:06<06:20, 13.59s/it]

Train loss: 0.12544787533656715
Test loss: 0.6843499890558279
Accuracy: 88.24641704559326%


 73%|███████▎  | 73/100 [14:20<06:10, 13.72s/it]

Train loss: 0.12939042506961904
Test loss: 0.7511527046656153
Accuracy: 84.93232727050781%


 74%|███████▍  | 74/100 [14:34<06:00, 13.85s/it]

Train loss: 0.13273217166408594
Test loss: 0.7882509399976605
Accuracy: 88.19665312767029%


 75%|███████▌  | 75/100 [14:48<05:49, 14.00s/it]

Train loss: 0.13175778048657089
Test loss: 0.6635321895028375
Accuracy: 87.75875568389893%


 76%|███████▌  | 76/100 [15:03<05:42, 14.28s/it]

Train loss: 0.12462592682491408
Test loss: 0.6349730728917821
Accuracy: 88.10708522796631%


 77%|███████▋  | 77/100 [15:18<05:32, 14.44s/it]

Train loss: 0.12014725930821984
Test loss: 0.6721454174465434
Accuracy: 88.05732727050781%


 78%|███████▊  | 78/100 [15:33<05:21, 14.61s/it]

Train loss: 0.14170921784835552
Test loss: 0.6002515075597793
Accuracy: 88.61464858055115%


 79%|███████▉  | 79/100 [15:48<05:10, 14.79s/it]

Train loss: 0.11169406121521235
Test loss: 0.6725085567042326
Accuracy: 88.7340784072876%


 80%|████████  | 80/100 [16:04<05:00, 15.03s/it]

Train loss: 0.1258174158853175
Test loss: 0.6517272033509175
Accuracy: 88.59474658966064%


 81%|████████  | 81/100 [16:20<04:49, 15.26s/it]

Train loss: 0.12007049128819289
Test loss: 0.7002301174364273
Accuracy: 88.37579488754272%


 82%|████████▏ | 82/100 [16:36<04:38, 15.47s/it]

Train loss: 0.12975280545949777
Test loss: 0.665918747852942
Accuracy: 88.4454607963562%


 83%|████████▎ | 83/100 [16:51<04:24, 15.59s/it]

Train loss: 0.1243976322211214
Test loss: 0.6748635911258163
Accuracy: 88.06727528572083%


 84%|████████▍ | 84/100 [17:08<04:11, 15.72s/it]

Train loss: 0.1097935169144893
Test loss: 0.7041387763467564
Accuracy: 88.17675113677979%


 85%|████████▌ | 85/100 [17:24<03:57, 15.85s/it]

Train loss: 0.12315782985197249
Test loss: 0.7622921870202776
Accuracy: 88.06727528572083%


 86%|████████▌ | 86/100 [17:40<03:45, 16.08s/it]

Train loss: 0.13685478849201474
Test loss: 0.6883623030060416
Accuracy: 88.50517272949219%


 87%|████████▋ | 87/100 [17:57<03:29, 16.14s/it]

Train loss: 0.10523847618035034
Test loss: 0.819985852736956
Accuracy: 88.15684914588928%


 88%|████████▊ | 88/100 [18:13<03:14, 16.20s/it]

Train loss: 0.1202883484613484
Test loss: 0.7675498977398417
Accuracy: 87.30095624923706%


 89%|████████▉ | 89/100 [18:29<02:58, 16.19s/it]

Train loss: 0.11739996396430107
Test loss: 0.7899985408801942
Accuracy: 87.81847357749939%


 90%|█████████ | 90/100 [18:45<02:42, 16.23s/it]

Train loss: 0.115743674458499
Test loss: 0.6660827068956034
Accuracy: 87.5199019908905%


 91%|█████████ | 91/100 [19:02<02:26, 16.31s/it]

Train loss: 0.11049560567168995
Test loss: 0.7619710755401
Accuracy: 88.01751732826233%


 92%|█████████▏| 92/100 [19:18<02:10, 16.35s/it]

Train loss: 0.11222840438652505
Test loss: 0.7658912431282602
Accuracy: 88.59474658966064%


 93%|█████████▎| 93/100 [19:35<01:54, 16.37s/it]

Train loss: 0.1247593795122114
Test loss: 0.7342923151649488
Accuracy: 88.51512670516968%


 94%|█████████▍| 94/100 [19:51<01:38, 16.41s/it]

Train loss: 0.12249455943669893
Test loss: 0.7067912214548345
Accuracy: 88.86345624923706%
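
 One detail of the validation loop above: it averages the per-batch accuracies, so a smaller final batch gets the same weight as a full one. With a batch size of 64 the effect is usually tiny, but counting correct predictions over the whole set gives the exact figure. The sketch below uses made-up batch sizes and counts to show the difference; the function names are assumptions for this example.

```python
def mean_of_batch_accuracies(correct_counts, batch_sizes):
    """Average the accuracy of each batch, weighting all batches equally."""
    per_batch = [c / n for c, n in zip(correct_counts, batch_sizes)]
    return sum(per_batch) / len(per_batch)

def overall_accuracy(correct_counts, batch_sizes):
    """Count correct predictions over the whole set, then divide once."""
    return sum(correct_counts) / sum(batch_sizes)

# Hypothetical numbers: two full batches and one small final batch
batch_sizes = [64, 64, 16]
correct_counts = [56, 56, 8]

print(mean_of_batch_accuracies(correct_counts, batch_sizes))
print(overall_accuracy(correct_counts, batch_sizes))
```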


 ## Overfitting

 If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

 <img src='https://github.com/dinaldoap/deep-learning-v2-pytorch/blob/e5737c87f3c6fe345f8a3a24494d199084456386/intro-to-pytorch/assets/overfitting.png?raw=1' width=450px>

 The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set leading to the validation loss increasing. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive to get the lowest validation loss possible. One option is to use the version of the model with the lowest validation loss, here the one around 8-10 training epochs. This strategy is called *early-stopping*. In practice, you'd save the model frequently as you're training then later choose the model with the lowest validation loss.
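
 The selection step can be sketched in plain Python. `best_epoch` picks the epoch with the lowest validation loss, as described above. `stop_epoch` adds a patience rule, a common variant that is an assumption here and not something this notebook uses: training stops once the validation loss has failed to improve for `patience` epochs in a row. The loss values are invented for illustration.

```python
def best_epoch(val_losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]

def stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which a patience rule would stop, or None."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss          # new best: reset the counter
            since_best = 0
        else:
            since_best += 1      # no improvement this epoch
            if since_best >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch
val_losses = [0.50, 0.42, 0.38, 0.36, 0.37, 0.39, 0.41, 0.45]

print(best_epoch(val_losses))
print(stop_epoch(val_losses, patience=3))
```

 In a real training loop you would also keep a copy of the model's parameters from the best epoch so that version can be restored afterwards.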

 The most common method to reduce overfitting (outside of early-stopping) is *dropout*, where we randomly drop input units. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward using the [`nn.Dropout`](https://pytorch.org/docs/stable/nn.html#torch.nn.Dropout) module.

 ```python
 class Classifier(nn.Module):
     def __init__(self):
         super().__init__()
         self.fc1 = nn.Linear(784, 256)
         self.fc2 = nn.Linear(256, 128)
         self.fc3 = nn.Linear(128, 64)
         self.fc4 = nn.Linear(64, 10)

         # Dropout module with 0.2 drop probability
         self.dropout = nn.Dropout(p=0.2)

     def forward(self, x):
         # make sure input tensor is flattened
         x = x.view(x.shape[0], -1)

         # Now with dropout
         x = self.dropout(F.relu(self.fc1(x)))
         x = self.dropout(F.relu(self.fc2(x)))
         x = self.dropout(F.relu(self.fc3(x)))

         # output so no dropout here
         x = F.log_softmax(self.fc4(x), dim=1)

         return x
 ```

 During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So, we need to turn off dropout during validation, testing, and whenever we're using the network to make predictions. To do this, you use `model.eval()`. This sets the model to evaluation mode where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with `model.train()`. In general, the pattern for the validation loop will look like this, where you turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.

 ```python
 # turn off gradients
 with torch.no_grad():

     # set model to evaluation mode
     model.eval()

     # validation pass here
     for images, labels in testloader:
         ...

 # set model back to train mode
 model.train()
 ```
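
 To see what the two modes do, the sketch below applies a standalone `nn.Dropout` layer to a tensor of ones. In train mode, each element is either zeroed or scaled by `1 / (1 - p)` (1.25 for `p=0.2`), which keeps the expected activation unchanged. In eval mode, the input passes through untouched. The seed and tensor size are arbitrary choices for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)              # arbitrary seed, only for repeatability
dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()                   # dropout active
train_out = dropout(x)

dropout.eval()                    # dropout disabled
eval_out = dropout(x)

print(train_out.unique())         # the distinct values seen in train mode
print(torch.equal(eval_out, x))   # eval mode leaves the input unchanged
```

 Calling `model.eval()` or `model.train()` on a whole network switches every dropout layer inside it in the same way.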

 > **Exercise:** Add dropout to your model and train it on Fashion-MNIST again. See if you can get a lower validation loss or higher accuracy.

In [0]:
## Define your model with dropout added
class ClassifierDropout(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 28 * 28
        fc1_size = 256
        fc2_size = 128
        fc3_size = 64
        output_size = 10
        dropout = 0.2
        self.model = nn.Sequential(nn.Linear(input_size, fc1_size),
                                    nn.ReLU(),
                                    nn.Dropout(p=dropout),
                                    nn.Linear(fc1_size, fc2_size),
                                    nn.ReLU(),
                                    nn.Dropout(p=dropout),
                                    nn.Linear(fc2_size, fc3_size),
                                    nn.ReLU(),
                                    nn.Dropout(p=dropout),
                                    nn.Linear(fc3_size, output_size),
                                    nn.LogSoftmax(dim=1))

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        return self.model(x)


In [0]:
## Train your model with dropout, and monitor the training progress with the validation loss and accuracy
model = ClassifierDropout()
optimizer = optim.Adam(model.parameters(), lr=0.003)
criterion = nn.NLLLoss()
epochs = 100

train_losses, val_losses = [], []
for epoch in tqdm(range(epochs)):
    model.train()
    train_loss = 0
    for images, labels in trainloader:
        outputs = model.forward(images)
        loss = criterion(outputs, labels)
        train_loss += loss.item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    else:
        with torch.no_grad():
            model.eval()
            val_loss = 0
            accuracy = 0
            for images, labels in testloader:
                outputs = model.forward(images)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                _, predictions = outputs.topk(1)
                equals = (labels == predictions.view(*labels.shape))
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        
        train_loss = train_loss / len(trainloader)
        val_loss = val_loss / len(testloader)
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        accuracy = accuracy / len(testloader)
        print('Training loss: {}'.format(train_loss))
        print('Validation loss: {}'.format(val_loss))
        print('Accuracy: {}%'.format(accuracy*100))
print('Train losses')
print(train_losses)
print('Validation losses')
print(val_losses)        






  0%|          | 0/100 [00:00<?, ?it/s]

  1%|          | 1/100 [00:10<17:55, 10.87s/it]

Training loss: 0.6034170603478896
Validation loss: 0.49014643783781936
Accuracy: 82.18550872802734%


  2%|▏         | 2/100 [00:22<18:15, 11.17s/it]

Training loss: 0.48117305770484625
Validation loss: 0.44821617386902973
Accuracy: 83.25039672851562%


  3%|▎         | 3/100 [00:34<18:14, 11.28s/it]

Training loss: 0.4466402783735729
Validation loss: 0.406605645065095
Accuracy: 85.19108581542969%


  4%|▍         | 4/100 [00:45<18:09, 11.35s/it]

Training loss: 0.4307492532963946
Validation loss: 0.4340949402113629
Accuracy: 84.58399963378906%


 ## Inference

 Now that the model is trained, we can use it for inference. We've done this before, but now we need to remember to set the model in inference mode with `model.eval()`. You'll also want to turn off autograd with the `torch.no_grad()` context.

In [0]:
# Import helper module (should be in the repo)
import helper

# Test out your network!

model.eval()

dataiter = iter(testloader)
images, labels = next(dataiter)
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)

# Calculate the class probabilities (softmax) for img
with torch.no_grad():
    output = model.forward(img)

ps = torch.exp(output)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')


 ## Next Up!

 In the next part, I'll show you how to save your trained models. In general, you won't want to train a model every time you need it. Instead, you'll train once, save it, then load the model when you want to train more or use it for inference.