<a href="https://colab.research.google.com/github/JunyuYan/Pytorch-Learning-Materials/blob/main/pytorch_official/Build_the_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Neural networks consist of layers/modules that perform operations on data.

Two important concepts:


> torch.nn: provides all the building blocks you need to build a neural network.


> nn.Module: every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers).

In the following, we will show how to build a neural network in PyTorch step by step.
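Before the walkthrough, the "module made of modules" idea can be sketched with a toy example. The class name `TinyNet` and its layer sizes are hypothetical and chosen only for illustration; they are not part of the network built below.

```python
import torch
from torch import nn

# Hypothetical toy module, used only to illustrate nesting.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)  # each layer is itself an nn.Module
        self.act = nn.ReLU()
        self.out = nn.Linear(8, 2)

    def forward(self, x):
        return self.out(self.act(self.hidden(x)))

tiny = TinyNet()
# named_children() lists the direct submodules assigned in __init__
child_names = [name for name, _ in tiny.named_children()]
print(child_names)
print(all(isinstance(m, nn.Module) for m in tiny.children()))
```

Assigning a module to an attribute in `__init__` is what registers it as a submodule, which is why all three names show up in `named_children()`.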


In [2]:
# Import any libraries needed
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

**Topic 1: Get device for training**

We would like to train our model on a hardware accelerator, such as a CUDA GPU or MPS, when one is available. The code below checks for CUDA first, then MPS, and falls back to the CPU if neither is available.

CUDA availability is checked through ***torch.cuda***, and MPS availability is checked through ***torch.backends.mps***.

In [3]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device
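Once a device string has been chosen, tensors (and models) are moved to it with `.to(device)`. The following is a minimal sketch of that step; the tensor names `x_cpu` and `x_dev` are made up for illustration.

```python
import torch

# Same selection logic as above: CUDA first, then MPS, then CPU.
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

x_cpu = torch.ones(2, 3)   # tensors are created on the CPU by default
x_dev = x_cpu.to(device)   # returns a tensor on the selected device
print(x_cpu.device, x_dev.device)
```

A model and its inputs need to live on the same device, which is why the later cells pass `device=device` when creating the input tensor.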


**Topic 2: Define the Class**

The neural network is defined by subclassing nn.Module. The neural network class should contain at least two methods:


> \_\_init\_\_: initializes all neural network layers.


> forward: defines operations on input data.


In [4]:
class NeuralNetwork(nn.Module):
  def __init__(self):
    super().__init__()
    self.flatten = nn.Flatten()
    self.linear_relu_stack = nn.Sequential(
        nn.Linear(28*28, 512),
        nn.ReLU(),
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Linear(512, 10),
    )

  def forward(self, x):
    x = self.flatten(x)
    logits = self.linear_relu_stack(x)
    return logits

In [5]:
# Create an instance of NeuralNetwork, and move it to the device
model = NeuralNetwork().to(device)
# Print model's structure
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


To use the model, we pass it the input data by calling it like a function. This executes the model's forward method, along with some background operations.

DO NOT CALL model.forward() DIRECTLY!
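One concrete example of those background operations is forward hooks. The sketch below, with a hypothetical `record_shape` hook on a small `nn.Linear` layer, suggests why calling the module is preferred: the hook runs for `layer(x)` but is skipped by `layer.forward(x)`.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# Hypothetical hook for illustration: records the output shape of each call.
def record_shape(module, inputs, output):
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_shape)

x = torch.rand(3, 4)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook does not fire
print(calls)      # only one entry was recorded

handle.remove()   # detach the hook when it is no longer needed
```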

In [6]:
x = torch.rand(1, 28, 28, device=device)
logits = model(x)  # logits is a 2-D tensor of shape (1, 10): dim=0 indexes the samples in the batch,
                   # dim=1 indexes the 10 raw predicted values (one per class) for each sample
pred_prob = nn.Softmax(dim=1)(logits)
y_pred = pred_prob.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([0])
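Because softmax preserves the ordering of the values along `dim=1`, the predicted class can also be read directly from the logits. The following sketch uses random stand-in logits (not the model's output) and assumes no floating-point ties.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(5, 10)        # stand-in logits: 5 samples, 10 classes
probs = nn.Softmax(dim=1)(logits)  # probabilities per sample

# The most likely class is the same before and after softmax.
same = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same)
```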


**Topic 3: Model Layers**

PyTorch provides many layers through a simple API. To see how some specific layers work, we start with a sample minibatch of 3 images of size 28×28.

In [7]:
# Define the input data
input_image = torch.rand(3, 28, 28)
print(input_image.size())

torch.Size([3, 28, 28])


***nn.Flatten()***: flattens a contiguous range of dimensions into a single dimension. By default it flattens everything except the minibatch dimension (dim=0), so each 2D 28×28 image becomes a contiguous array of 784 values.

In [8]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
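As a rough check of what this layer does, the sketch below compares `nn.Flatten()` with a plain `reshape` and shows the effect of the `start_dim` argument. The variable names are made up for illustration.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

flat = nn.Flatten()(batch)                 # default start_dim=1 keeps the batch dim
same_as_reshape = torch.equal(flat, batch.reshape(3, -1))

flat_all = nn.Flatten(start_dim=0)(batch)  # flattens the batch dim as well
print(flat.shape, flat_all.shape, same_as_reshape)
```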


***nn.Linear()***: applies a linear transformation to the input using its stored weights and biases.

In [9]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
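The transformation is y = x Wᵀ + b, where W is stored with shape (out_features, in_features). A small sketch with made-up sizes reproduces the layer's output by hand:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=3)
x = torch.rand(2, 4)

# Recompute the layer's output from its stored weight and bias.
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)
print(layer.weight.shape, layer.bias.shape, matches)
```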


***nn.ReLU()***: non-linear activations are applied after linear transformations to introduce non-linearity into the neural network. They create the complex mappings between the model's inputs and outputs, enabling the neural network to learn a wide variety of phenomena.

ReLU is just one of many activation functions; others include sigmoid, tanh, leaky ReLU, ELU, PReLU, and softplus.
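To make the comparison concrete, the sketch below applies a few of these activations to the same hand-picked values. ReLU is equivalent to clamping negative values to zero; the `negative_slope=0.1` setting is an arbitrary choice for illustration.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = nn.ReLU()(x)                           # negatives become 0
leaky_out = nn.LeakyReLU(negative_slope=0.1)(x)   # negatives are scaled by 0.1
sigmoid_out = nn.Sigmoid()(x)                     # squashes values into (0, 1)
tanh_out = nn.Tanh()(x)                           # squashes values into (-1, 1)

print(relu_out)
print(leaky_out)
print(sigmoid_out)
print(tanh_out)
```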


In [11]:
print(f"Before relu: {hidden1}")
relu1 = nn.ReLU()(hidden1)
print(f"After relu: {relu1}")

Before relu: tensor([[ 0.5179,  0.3581,  0.2407,  0.1851, -0.0105, -0.5799, -0.1621,  0.2630,
          0.5479, -0.4820,  0.0849, -0.4071,  0.3334, -0.4439,  0.0151, -0.0371,
          0.5095, -0.4157, -0.2217, -0.4470],
        [ 0.1123,  0.5256,  0.5164, -0.0685, -0.4289, -0.7043, -0.0721,  0.3884,
          0.3303, -0.5106,  0.5004, -0.0073,  0.2316, -0.0795,  0.0021, -0.0821,
          0.4403, -0.2475, -0.2706, -0.1518],
        [ 0.2741,  0.4296,  0.5632, -0.0423, -0.3454, -0.5177, -0.0436,  0.4500,
          0.2363, -0.2600,  0.3312, -0.0431,  0.1478,  0.2601,  0.0545,  0.1024,
          0.1976, -0.3729, -0.3928, -0.4924]], grad_fn=<AddmmBackward0>)
After relu: tensor([[0.5179, 0.3581, 0.2407, 0.1851, 0.0000, 0.0000, 0.0000, 0.2630, 0.5479,
         0.0000, 0.0849, 0.0000, 0.3334, 0.0000, 0.0151, 0.0000, 0.5095, 0.0000,
         0.0000, 0.0000],
        [0.1123, 0.5256, 0.5164, 0.0000, 0.0000, 0.0000, 0.0000, 0.3884, 0.3303,
         0.0000, 0.5004, 0.0000, 0.2316, 0.0000, 0.0021

***nn.Sequential()***: an ordered container of modules. Data is passed through all modules following the order defined by nn.Sequential.

In [14]:
seq_module = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3, 28, 28)
output = seq_module(input_image)
print(output.size())

torch.Size([3, 10])
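Passing data through an nn.Sequential should be equivalent to calling each contained module in turn. The sketch below rebuilds a similar container with fresh layers (so its weights differ from the cell above) and checks this by iterating over it.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Apply the modules one at a time, in the order they were registered.
manual = x
for module in seq:
    manual = module(manual)

same = torch.allclose(seq(x), manual)
first_linear = seq[1]  # containers can also be indexed by position
print(same, first_linear)
```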


***nn.Softmax()***: scales the logits to values in [0, 1] that represent the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.

In [16]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(output)  # probabilities, no longer raw logits
print(pred_probab)

tensor([[0.0918, 0.1181, 0.0705, 0.0777, 0.1167, 0.1393, 0.0629, 0.1111, 0.1191,
         0.0928],
        [0.0967, 0.1220, 0.0736, 0.0781, 0.1152, 0.1424, 0.0631, 0.1037, 0.1146,
         0.0905],
        [0.0875, 0.1259, 0.0669, 0.0764, 0.1144, 0.1537, 0.0515, 0.1055, 0.1046,
         0.1135]], grad_fn=<SoftmaxBackward0>)
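The role of dim can be checked numerically. In this sketch, which uses random stand-in logits rather than the output above, `dim=1` normalizes each row (one sample) while `dim=0` normalizes each column instead.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(3, 10)            # stand-in logits: 3 samples, 10 classes

probs = nn.Softmax(dim=1)(logits)      # normalize across the 10 classes
row_sums = probs.sum(dim=1)            # one sum per sample

col_probs = nn.Softmax(dim=0)(logits)  # normalize across the 3 samples instead
col_sums = col_probs.sum(dim=0)        # one sum per class

print(row_sums)
print(col_sums)
```

For classification, `dim=1` is the usual choice here because the class scores of a single sample lie along that dimension.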


**Topic 4: Model Parameters**

Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside the model object and makes all parameters accessible through the ***parameters()*** or ***named_parameters()*** methods.
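A common use of parameters() is counting how many values a model has to learn. The sketch below rebuilds the same layer stack as a plain nn.Sequential (an assumption made so the example is self-contained) and sums the element counts.

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork, rebuilt here so the example stands alone.
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(total, trainable)  # 669706 669706
```

The total is (784·512 + 512) + (512·512 + 512) + (512·10 + 10) = 669706; nn.Flatten and nn.ReLU contribute no parameters.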

In [17]:
print(f"Model structure: \n{model}\n")

for name, param in model.named_parameters():
  print(f"Layer: {name}|Size: {param.size()}|Value: {param[:2]}\n")

Model structure: 
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

Layer: linear_relu_stack.0.weight|Size: torch.Size([512, 784])|Value: tensor([[-0.0047, -0.0289, -0.0276,  ..., -0.0083, -0.0299, -0.0084],
        [ 0.0179, -0.0047,  0.0353,  ...,  0.0047, -0.0096,  0.0104]],
       grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias|Size: torch.Size([512])|Value: tensor([0.0064, 0.0269], grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight|Size: torch.Size([512, 512])|Value: tensor([[ 0.0111, -0.0299,  0.0100,  ...,  0.0328, -0.0416, -0.0039],
        [-0.0281,  0.0113,  0.0401,  ...,  0.0386,  0.0225,  0.0413]],
       grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias|Size: torch.Size([512])|V