In [1]:
%matplotlib inline



Build the Neural Network
========================

Neural networks are composed of layers/modules that perform operations on data.
The `torch.nn` namespace provides all the building blocks you need to
build your own neural network. Every module in PyTorch subclasses `nn.Module`.
A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to
build and manage complex architectures.

In the following sections, we'll build a neural network to classify our pet image dataset.
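
To make the nesting concrete, here is a minimal sketch. ``TinyNet`` and its layer sizes are hypothetical and are not part of this tutorial's model; only ``nn.Module``, ``nn.Sequential``, and ``named_modules()`` are PyTorch APIs.

```python
import torch
from torch import nn

# Hypothetical two-level model: a Sequential block nested inside a custom module.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.body(x))

tiny = TinyNet()

# named_modules() walks the whole module tree; the root module has the empty name "".
module_names = [name for name, _ in tiny.named_modules()]
print(module_names)
```

Each entry in the printed list is itself an ``nn.Module``, which is why a whole network can be treated like a single layer.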




In [2]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training
-----------------------
We want to be able to train our model on a hardware accelerator such as a GPU,
if one is available. Let's check whether
`torch.cuda` is available; otherwise we
continue to use the CPU.



In [3]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cuda device
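
Once a device string has been chosen, tensors have to live on that same device as the model. The following is a small sketch of the two usual ways to place a tensor; the names ``x``, ``y``, and ``z`` are illustrative only.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create a tensor directly on the selected device ...
x = torch.ones(2, 3, device=device)
# ... or move an existing CPU tensor there with .to().
y = torch.zeros(2, 3).to(device)

# Operations require both operands to be on the same device.
z = x + y
print(x.device, y.device, z.device)
```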


Define the Class
-------------------------
We define our neural network by subclassing ``nn.Module``, and
initialize the neural network layers in ``__init__``. Every ``nn.Module`` subclass implements
the operations on input data in the ``forward`` method.



In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        # Equivalent layer-by-layer definition, kept for reference:
        #self.linear1 = nn.Linear(28*28, 512)
        #self.linear2 = nn.Linear(512, 512)
        #self.linear3 = nn.Linear(512, 10)
        #self.relu = nn.ReLU()
        
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        # Layer-by-layer equivalent of the Sequential call below:
        #x = self.linear1(x)
        #x = self.relu(x)
        #x = self.linear2(x)
        #x = self.relu(x)
        #logits = self.linear3(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of ``NeuralNetwork``, move it to the ``device``, and print
its structure.



In [5]:
model = NeuralNetwork().to(device)
print(model)  # prints the module structure, not the parameter values

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
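
The integer labels ``(0)`` to ``(4)`` in the printed structure are positions inside the ``nn.Sequential`` container, and they can be used to index into it. A minimal sketch, rebuilding the same stack standalone so the snippet runs on its own:

```python
from torch import nn

# Standalone copy of the tutorial's linear_relu_stack.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

first = stack[0]   # the layer printed as (0)
last = stack[-1]   # negative indices work as with lists; this is layer (4)

print(first.in_features, first.out_features)  # 784 512
print(last.out_features)                      # 10
```

On the model defined above, the same layers would be reachable as ``model.linear_relu_stack[0]`` and so on.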



Calling the model on the input returns a tensor of shape ``(1, 10)``: one row per input sample, holding the raw predicted values (logits) for each of the 10 classes.
We get the prediction probabilities by passing the logits through an instance of the ``nn.Softmax`` module.



In [8]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
logits.shape  # torch.Size([1, 10]); not displayed because it is not the cell's last expression
print(logits)

tensor([[ 0.0144,  0.0352,  0.0336, -0.0363, -0.0153, -0.0390, -0.0393,  0.0025,
          0.0629, -0.0492]], device='cuda:0', grad_fn=<AddmmBackward0>)
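
The first dimension of the output follows the batch size of the input. The sketch below uses a compact stand-in model (an assumption made for brevity, not the ``NeuralNetwork`` class above) with the same 28x28 input and 10 outputs, and runs on the CPU.

```python
import torch
from torch import nn

# Compact stand-in with the same input and output sizes as the tutorial model.
stand_in = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

batch = torch.rand(3, 28, 28)   # three random "images"
batch_logits = stand_in(batch)  # calling the module runs its forward pass

print(batch_logits.shape)  # torch.Size([3, 10])
```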


In [9]:
pred_probab = nn.Softmax(dim=1)(logits)
print(pred_probab)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

tensor([[0.1017, 0.1038, 0.1037, 0.0967, 0.0987, 0.0964, 0.0964, 0.1005, 0.1067,
         0.0954]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
Predicted class: tensor([8], device='cuda:0')
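
As a sanity check on what ``nn.Softmax(dim=1)`` does, the sketch below reuses the logit values printed above (copied by hand onto the CPU) and confirms two properties: each row of probabilities sums to 1, and softmax does not change which class has the largest score.

```python
import torch
from torch import nn

# Logits copied from the printed output above.
logits = torch.tensor([[0.0144, 0.0352, 0.0336, -0.0363, -0.0153,
                        -0.0390, -0.0393, 0.0025, 0.0629, -0.0492]])

probs = nn.Softmax(dim=1)(logits)  # normalize across the class dimension

row_sums = probs.sum(dim=1)
print(row_sums)

# Softmax is monotonic, so argmax over logits and over probabilities agree.
print(logits.argmax(1), probs.argmax(1))
```

Because of the second property, the predicted class can be read directly from the logits when the probabilities themselves are not needed.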


Model Parameters
-------------------------
Many layers inside a neural network are *parameterized*, i.e., they have associated weights
and biases that are optimized during training. Subclassing ``nn.Module`` automatically
tracks all fields defined inside your model object and makes all parameters
accessible through your model's ``parameters()`` or ``named_parameters()`` methods.
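
A common use of ``parameters()`` is counting how many values a model has to learn. The following sketch rebuilds the tutorial's layer stack standalone (so it runs on its own) and sums ``numel()`` over its parameters; the variable names are illustrative.

```python
from torch import nn

# Standalone copy of the tutorial's linear_relu_stack.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each nn.Linear contributes in_features * out_features weights plus out_features biases.
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)

print(total, trainable)  # 669706 669706
```

``nn.ReLU`` and ``nn.Flatten`` have no parameters, which is why only the three linear layers show up in ``named_parameters()``.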

In the cell below, we iterate over each parameter and print its size and a preview of its values.




In [10]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0115, -0.0044, -0.0267,  ..., -0.0035, -0.0060, -0.0343],
        [-0.0004, -0.0123, -0.0279,  ..., -0.0141, -0.0336,  0.0069]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([0.0103, 0.0273], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0300,  0.0048, -0.0123,  ..., -0.0330, -0.0265,  0.0339],
        [ 0.0227, -0.0010, -0.0190,  ...,  0.0393, -0.0112, -0.0101]],
       device='cuda:0', grad_fn=<Slic