## TLDR:
- Custom neural networks are defined as subclasses of `nn.Module`
- Components can be combined sequentially to form a network
- Models and their parameters can be inspected by printing the model or by iterating over `.named_parameters()`

## Get the Training Device

In [1]:
import torch
from torch import nn

# Use the best possible available device for training
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


Calling `.is_available()` on each PyTorch backend (here CUDA, then MPS) lets us pick the fastest available device for training, falling back to the CPU when no accelerator is present.
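The same check can be wrapped in a small helper, which reads more easily than a nested conditional expression. This is a minimal sketch: the function name `pick_device` is hypothetical, and the `getattr` guard assumes that some older PyTorch builds may not expose `torch.backends.mps`.

```python
import torch

def pick_device() -> str:
    """Return the name of the first available backend, falling back to CPU."""
    if torch.cuda.is_available():
        return "cuda"
    # Guard the lookup in case this PyTorch build has no MPS backend module
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()

# Allocate directly on the chosen device instead of creating on CPU and copying
x = torch.rand(2, 3, device=device)
print(f"Tensor lives on: {x.device}")
```

Passing `device=` at creation time avoids the extra host-to-device copy that `torch.rand(...).to(device)` performs.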

## Defining a Custom Neural Network

In [2]:
# We can define a neural network as a subclass of
# `nn.Module`
class MyNeuralNetwork(nn.Module):
  # `__init__` initializes the structure of the network
  # as well as the parameters and names of its components
  def __init__(self):
    super().__init__()
    self.flatten = nn.Flatten()
    self.linear_relu_stack = nn.Sequential(
      nn.Linear(28 * 28, 512),
      nn.ReLU(),
      nn.Linear(512, 512),
      nn.ReLU(),
      nn.Linear(512, 10),
    )

  # `forward` implements the operations done on the input
  # data, or, in simpler terms, what the model does
  def forward(self, x):
    x = self.flatten(x)
    logits = self.linear_relu_stack(x)
    return logits

# Initialize necessary modules
model = MyNeuralNetwork().to(device)
softmax = nn.Softmax(dim=1)

# Show scaled model output
x = torch.rand((1, 28, 28)).to(device)
probs = softmax(model(x))
y_pred = probs.argmax(1)
print(f"Predicted class: {y_pred.item()}")

Predicted class: 9


When defining a custom neural network, two things need to be defined:
- The structure of the model (its layers), in `__init__`
- What the model does with its input, in `forward`
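The two pieces listed above can be seen in isolation in a much smaller network. The class name `TinyNet` and its layer sizes are made up for illustration and are not part of the model defined earlier.

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features: int = 4, hidden: int = 8, num_classes: int = 3):
        super().__init__()
        # Structure: submodules assigned as attributes are registered automatically
        self.hidden = nn.Linear(in_features, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # Behavior: how a batch of inputs is turned into logits
        return self.out(torch.relu(self.hidden(x)))

tiny = TinyNet()
batch = torch.rand(5, 4)  # 5 samples with 4 features each

# Call the module itself rather than `tiny.forward(batch)` so that
# any registered hooks also run
logits = tiny(batch)
print(logits.shape)
```

Each of the two `Linear` layers contributes a weight and a bias, so `tiny.parameters()` yields four tensors in total.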

## Components of the Model

In [3]:
# Generate a minibatch of three 28x28 images
images = torch.rand(3, 28, 28).to(device)
print(f"Images size: {images.size()}")
print()

# The `Flatten()` module is in charge of flattening each image
# in the minibatch into a contiguous array. The default
# arguments of `Flatten()` preserve the minibatch dimension
flatten = nn.Flatten()
after_flatten = flatten(images)
print(f"Size after flatten layer: {after_flatten.size()}")
print()

# The `Linear()` module applies an affine transformation
# `y = x @ W.T + b` to its input, where `x` holds the input
# features and `W` and `b` are the layer's learnable weights
# and biases
linear = nn.Linear(in_features=28 * 28, out_features=20)
after_linear = linear(after_flatten)
print(f"Size after linear layer: {after_linear.size()}")
print()

# The `ReLU()` module applies a non-linearity to the input
# features, which provides the model a means of approximating
# the shape of the true mapping between the input and outputs
# of the model
relu = nn.ReLU()
after_relu = relu(after_linear)
print(f"Before ReLU: {after_linear}")
print(f"After ReLU: {after_relu}")
print()

# `Sequential` is an ordered container of modules, passing
# data through all its modules in the order in which they are
# defined
sequential = nn.Sequential(
  flatten,
  linear,
  relu,
)
images = torch.rand(3, 28, 28).to(device)
logits = sequential(images)
print(f"Logits: {logits}")
print()

# Since the final layer of the neural network returns
# "logits", values in range `(-infinity, infinity)`, the
# `Softmax()` module can be used to scale the logits to
# predicted probabilities for each class
softmax = nn.Softmax(dim=1)
probs = softmax(logits)
print(f"Softmax of logits: {probs}")
print()

Images size: torch.Size([3, 28, 28])

Size after flatten layer: torch.Size([3, 784])

Size after linear layer: torch.Size([3, 20])

Before ReLU: tensor([[-0.1457,  0.3225,  0.1064, -0.3286, -0.0615, -0.0990, -0.1113, -0.1257,
         -0.1581,  0.2575, -0.0024, -0.1774, -0.0598,  0.0683,  0.3196,  0.2255,
         -0.2396, -0.4253,  0.3154, -0.2214],
        [-0.3872,  0.4018, -0.0428, -0.2538, -0.2778,  0.1549,  0.0825, -0.2577,
         -0.1063,  0.6018, -0.1120, -0.1484, -0.0986, -0.0574,  0.4554,  0.2962,
         -0.0538,  0.2258,  0.3814, -0.2245],
        [ 0.0629,  0.7890, -0.0446, -0.2031, -0.0312,  0.3248, -0.0231, -0.1580,
         -0.0016, -0.2692,  0.3072, -0.0067,  0.0764, -0.1213,  0.3454, -0.1110,
          0.0645, -0.1058,  0.3771, -0.4963]], grad_fn=<AddmmBackward0>)
After ReLU: tensor([[0.0000, 0.3225, 0.1064, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.2575, 0.0000, 0.0000, 0.0000, 0.0683, 0.3196, 0.2255, 0.0000, 0.0000,
         0.3154, 0.0000],
    

Each layer represents one operation (flattening, a linear transformation, a non-linearity), and `Sequential` chains them so that the output of one layer becomes the input of the next.
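To make the individual operations concrete, the sketch below recomputes what `Linear`, `ReLU` and `Softmax` do by hand and compares the results with the built-in functions. The tolerance passed to `torch.allclose` is an arbitrary choice, made to absorb floating-point rounding differences.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(3, 784)  # a minibatch of three already-flattened images
linear = nn.Linear(in_features=784, out_features=20)

# `nn.Linear` stores its weight with shape (out_features, in_features),
# so the manual version multiplies by the transpose
manual = x @ linear.weight.T + linear.bias
same_linear = torch.allclose(linear(x), manual, atol=1e-5)

# ReLU clamps negative values to zero and leaves the rest unchanged
activated = torch.relu(manual)
same_relu = torch.equal(activated, manual.clamp(min=0))

# Softmax over dim=1 turns each row into a probability distribution
probs = torch.softmax(activated, dim=1)
row_sums = probs.sum(dim=1)

print(f"Linear matches manual computation: {same_linear}")
print(f"ReLU matches clamp(min=0): {same_relu}")
print(f"Row sums after softmax: {row_sums}")
```

Every row of `probs` sums to 1 (up to rounding), which is why `dim=1` is the right choice when the first dimension indexes the samples in the minibatch.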

## Deeper Look at the Model

In [4]:
# Print model structure
print(f"Model structure: {model}")
print()

# Print each named parameter
for name, param in model.named_parameters():
  print(f"Name: {name}, size: {param.size()}, values:\n{param[:2]}")
  print()

Model structure: MyNeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

Name: linear_relu_stack.0.weight, size: torch.Size([512, 784]), values:
tensor([[-0.0083, -0.0301, -0.0188,  ...,  0.0101,  0.0274,  0.0128],
        [ 0.0261,  0.0138, -0.0259,  ...,  0.0073, -0.0028,  0.0069]],
       grad_fn=<SliceBackward0>)

Name: linear_relu_stack.0.bias, size: torch.Size([512]), values:
tensor([-0.0233, -0.0044], grad_fn=<SliceBackward0>)

Name: linear_relu_stack.2.weight, size: torch.Size([512, 512]), values:
tensor([[ 0.0367,  0.0334,  0.0374,  ..., -0.0055,  0.0107, -0.0053],
        [-0.0197,  0.0325,  0.0245,  ...,  0.0338,  0.0339,  0.0318]],
       grad_fn=<SliceBackward0>)

Name: linear_relu_stack.2.bias, size: torch.Size

There are several ways to inspect what is inside the model:
- Printing the model (or formatting it as a string) shows its structure and components
- `.named_parameters()` yields the name and values of each learnable parameter in the model
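Two further inspection techniques that are often useful are counting parameters and listing the entries of `state_dict()`. The sketch below rebuilds the same layer stack as a bare `nn.Sequential` (so the parameter names lack the `linear_relu_stack.` prefix shown above); the variable name `stack` is an assumption for illustration.

```python
import torch
from torch import nn

# Same layer sizes as the `linear_relu_stack` defined earlier
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# `numel()` returns the number of elements in a tensor
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(f"Total parameters: {total}, trainable: {trainable}")

# `state_dict()` maps each parameter name to its tensor
for name, tensor in stack.state_dict().items():
    print(f"{name}: {tuple(tensor.shape)}")
```

The total works out to (784 × 512 + 512) + (512 × 512 + 512) + (512 × 10 + 10) = 669,706 parameters, all of them trainable by default.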