## Composing the Neural Network Model

- A Neural Network consists of layers and modules that perform operations on data.
- Every module in PyTorch is a subclass of `nn.Module`.
- A Neural Network is itself a module composed of other modules.
- This nested structure makes it easy to build and manage complex architectures.
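
To make the nesting concrete, here is a minimal sketch. The `Block` and `TinyNet` classes are hypothetical names invented for this illustration; only `nn.Module`, `nn.Linear`, `nn.ReLU`, `named_modules()`, and `modules()` come from PyTorch.

```python
from torch import nn

# Hypothetical building block: a linear layer followed by ReLU.
class Block(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))

# Hypothetical network built from two Blocks; it is itself an nn.Module.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = Block(4, 8)
        self.block2 = Block(8, 2)

    def forward(self, x):
        return self.block2(self.block1(x))

net = TinyNet()

# named_modules() walks the whole tree: the network, its blocks, and their layers.
# The root module is reported under the empty name ''.
module_names = [name for name, _ in net.named_modules()]
print(module_names)

# Every node in the tree is an nn.Module.
all_modules = all(isinstance(m, nn.Module) for m in net.modules())
print(all_modules)
```

Because each level is an `nn.Module`, the same tools (moving to a device, listing parameters, printing the structure) work on a single layer, on a block, or on the whole network.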

In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [2]:
# Selects the device used for learning: CUDA GPU if available, then Apple MPS, otherwise CPU.
device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")

print(f"Using {device} device for learning.")

Using cuda device for learning.


### Define the Class
- Define the Neural Network model as a subclass of `nn.Module` and initialize all layers in `__init__`.
- Every class that inherits from `nn.Module` implements its operations on the input data in the `forward` method.

In [3]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()

        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 18),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [4]:
# Creates an instance of the NeuralNetwork class, moves it to the device, and prints its layer structure.
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=18, bias=True)
  )
)


In [5]:
# Pass input data to the model by calling it directly. This runs the forward method along with some background operations (such as hooks).
# Note: DO NOT call model.forward() directly.
X = torch.rand(1, 28, 28, device=device)

logits = model(X)
pred_proba = nn.Softmax(dim=1)(logits)
y_pred = pred_proba.argmax(1)

print(f"Predicted Class: {y_pred}")

Predicted Class: tensor([8], device='cuda:0')
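
One way to see why the model should be called rather than `forward` is to register a forward hook, which is one of the background operations handled by the call. The sketch below is only an illustration: `record_call` and `calls` are hypothetical names, and a single `nn.Linear` layer stands in for a full model.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# A forward hook runs after forward() whenever the module is invoked by calling it.
def record_call(module, inputs, output):
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_call)
x = torch.rand(3, 4)

layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped

print(calls)      # only the first invocation was recorded
handle.remove()   # detach the hook when it is no longer needed
```

Only one entry ends up in `calls`, even though the layer computed its output twice.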


### The Layers of the Model

In [6]:
input_image = torch.rand(3, 28, 28)
print(input_image.size())

torch.Size([3, 28, 28])


#### nn.Flatten
- Initialize the `nn.Flatten` layer to convert each 2D image (28*28 in this example) into a contiguous array of pixel values (784 in this example).
- The mini-batch dimension (dim=0) is kept.

In [7]:
flatten = nn.Flatten()
flat_image = flatten(input_image)

print(flat_image.size())

torch.Size([3, 784])
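
As a rough cross-check, the same result can be obtained with the functional `torch.flatten` or with a manual reshape. The variable names below are chosen for this sketch only.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

flat_module = nn.Flatten()(batch)                   # module form; start_dim=1 by default
flat_function = torch.flatten(batch, start_dim=1)   # functional form
flat_view = batch.view(batch.size(0), -1)           # manual reshape that keeps dim 0

print(flat_module.shape)
same_as_function = torch.equal(flat_module, flat_function)
same_as_view = torch.equal(flat_module, flat_view)
print(same_as_function, same_as_view)
```

All three keep the batch dimension and merge the remaining dimensions into one of length 784.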


#### nn.Linear
- The linear layer applies a linear transformation to the input using its stored weights and biases.

In [8]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)

print(hidden1.size())

torch.Size([3, 20])
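
The transformation can be written as `y = x @ W.T + b`. The sketch below, with variable names chosen for illustration, recomputes the layer output by hand from `layer.weight` and `layer.bias` and compares it with the layer's own result.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

# weight has shape (out_features, in_features); bias has shape (out_features,)
print(layer.weight.shape, layer.bias.shape)

# Recompute the output by hand and compare within floating-point tolerance.
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-5)
print(matches)
```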


#### nn.ReLU
- Nonlinear activations create complex mappings between the inputs and outputs of the model.
- They are applied after linear transformations to introduce nonlinearity, which helps Neural Networks learn a wide variety of phenomena.
- In this example, `nn.ReLU` is used between the linear layers, but other nonlinear activation functions can also be used when building a model.
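
A short sketch of why the nonlinearity matters: two linear layers stacked without an activation collapse into a single linear map with weight `W2 @ W1` and bias `W2 @ b1 + b2`. The layer sizes and names here are arbitrary choices for the illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
# float64 keeps rounding error well below the comparison tolerance.
lin1 = nn.Linear(6, 4).double()
lin2 = nn.Linear(4, 3).double()
x = torch.rand(5, 6, dtype=torch.float64)

with torch.no_grad():
    # Two linear layers back to back, with no activation in between.
    stacked = lin2(lin1(x))

    # The equivalent single linear map.
    W = lin2.weight @ lin1.weight
    b = lin2.weight @ lin1.bias + lin2.bias
    collapsed = x @ W.T + b

    # Same layers with ReLU in between; in general this no longer matches.
    with_relu = lin2(torch.relu(lin1(x)))

collapses = torch.allclose(stacked, collapsed)
print(collapses)
print(torch.allclose(stacked, with_relu))
```

Without an activation, extra linear layers add no expressive power; inserting ReLU breaks this equivalence whenever some pre-activation value is negative.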

In [9]:
# The ReLU activation function sets any input value less than 0 to 0.
# Input values greater than or equal to 0 are left unchanged.
print(f"Before applying ReLU: {hidden1}\n\n")

hidden1 = nn.ReLU()(hidden1)
print(f"After applying ReLU: {hidden1}")

Before applying ReLU: tensor([[-0.2942,  0.2204, -0.1483,  0.5273, -0.4020,  0.3496, -0.7781, -0.4195,
         -0.6320,  0.1857, -0.6648,  0.4757,  0.1968, -0.1052,  0.1250,  0.0185,
         -0.4808, -0.0561,  0.2680, -0.2458],
        [-0.2719,  0.4207, -0.2440,  0.4605,  0.0188,  0.3676, -0.6601, -0.0324,
         -0.1287,  0.4999, -0.2973,  0.6089, -0.1022, -0.2541,  0.3667,  0.0437,
         -0.3211, -0.0402,  0.4533,  0.0366],
        [-0.6537,  0.0795,  0.3592,  0.3255,  0.1345,  0.0922, -0.7034, -0.2227,
         -0.1431,  0.4892, -0.5470,  0.5832,  0.0438, -0.1328,  0.1814,  0.3746,
         -0.4423,  0.0488,  0.1487, -0.0369]], grad_fn=<AddmmBackward0>)


After applying ReLU: tensor([[0.0000, 0.2204, 0.0000, 0.5273, 0.0000, 0.3496, 0.0000, 0.0000, 0.0000,
         0.1857, 0.0000, 0.4757, 0.1968, 0.0000, 0.1250, 0.0185, 0.0000, 0.0000,
         0.2680, 0.0000],
        [0.0000, 0.4207, 0.0000, 0.4605, 0.0188, 0.3676, 0.0000, 0.0000, 0.0000,
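         0.4999, 0.0000, 0.6089, 0.0000, 0.0000, 0.3667, 0.0437, 0.0000, 0.0000,
         0.4533, 0.0366],
        [0.0000, 0.0795, 0.3592, 0.3255, 0.1345, 0.0922, 0.0000, 0.0000, 0.0000,
         0.4892, 0.0000, 0.5832, 0.0438, 0.0000, 0.1814, 0.3746, 0.0000, 0.0488,
         0.1487, 0.0000]], grad_fn=<ReluBackward0>)
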

#### nn.Sequential
- `nn.Sequential` is an ordered container of modules.
- Data is passed through the modules in the order in which they were defined.
- With a Sequential container, you can quickly put together Neural Networks like the one in the example below.

In [10]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)

input_image = torch.rand(3, 28, 28)
logits = seq_modules(input_image)

print(logits)

tensor([[ 0.1585,  0.2785, -0.3719,  0.0063, -0.0454,  0.0622, -0.2120,  0.1918,
          0.1283, -0.1359],
        [ 0.1621,  0.2199, -0.3099,  0.0151, -0.0699,  0.0487, -0.2349,  0.1672,
          0.0778, -0.1599],
        [ 0.2185,  0.2299, -0.2581,  0.1749,  0.1395,  0.0176, -0.3048,  0.0776,
          0.0659, -0.2537]], grad_fn=<AddmmBackward0>)
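
To illustrate the ordering, the sketch below applies each module of a Sequential container by hand and compares the result with calling the container itself. The container and variable names are chosen for this example only.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Apply each module by hand, in the order it was registered.
out = x
for module in seq:
    out = module(out)

same_result = torch.equal(out, seq(x))
print(same_result)

# Modules are addressable by index, and slicing returns another nn.Sequential.
print(seq[1])
first_two = seq[:2]
print(type(first_two).__name__, len(first_two))
```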


#### nn.Softmax
- The last Linear layer of the Neural Network returns logits (raw values in the range (-∞, ∞)) that are passed to the `nn.Softmax` module.
- The *logits* are rescaled to values between 0 and 1 that represent the model's predicted probability for each class.
- The `dim` parameter specifies the dimension along which the values sum to 1.

In [11]:
# The Softmax activation function is used on the output layer of a Neural Network that performs multiclass classification.
# If your Neural Network performs binary classification, use the Sigmoid activation function (nn.Sigmoid) instead.
softmax = nn.Softmax(dim=1)
pred_proba = softmax(logits)

print(pred_proba)

tensor([[0.1144, 0.1290, 0.0673, 0.0983, 0.0933, 0.1039, 0.0790, 0.1183, 0.1110,
         0.0853],
        [0.1169, 0.1239, 0.0729, 0.1009, 0.0927, 0.1044, 0.0786, 0.1175, 0.1075,
         0.0847],
        [0.1208, 0.1222, 0.0750, 0.1157, 0.1117, 0.0989, 0.0716, 0.1050, 0.1037,
         0.0754]], grad_fn=<SoftmaxBackward0>)
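
As a small worked example, softmax can be reproduced by exponentiating the logits and normalizing each row. The logit values below are made up for illustration.

```python
import torch
from torch import nn

# Made-up logits for two samples and three classes.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

probs = nn.Softmax(dim=1)(logits)

# Same computation by hand: exponentiate, then normalize each row.
exp = torch.exp(logits)
manual = exp / exp.sum(dim=1, keepdim=True)

matches = torch.allclose(probs, manual)
row_sums = probs.sum(dim=1)          # with dim=1, each row sums to 1
print(matches, row_sums)

# Softmax preserves ordering, so the predicted class is the same either way.
pred_from_probs = probs.argmax(dim=1)
pred_from_logits = logits.argmax(dim=1)
print(pred_from_probs, pred_from_logits)
```

Because softmax does not change which entry is largest, `argmax` on the logits gives the same predicted class as `argmax` on the probabilities.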


### Parameters of Model
- Many layers inside a Neural Network are parameterized, meaning they have associated weights and biases that are optimized during learning.
- If your model inherits from `nn.Module`, all fields inside the model object are tracked automatically, and all of the model's parameters are accessible through the `parameters()` and `named_parameters()` methods.
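
A common use of `parameters()` is counting a model's parameters. The sketch below uses an `nn.Sequential` stand-in (an assumption made so the example is self-contained) with the same layer sizes as the `linear_relu_stack` defined in this section.

```python
from torch import nn

# Assumed stand-in with the same layer sizes as linear_relu_stack.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 18),
)

# named_parameters() yields (name, tensor) pairs; ReLU has no parameters, so it is skipped.
shapes = {name: tuple(p.shape) for name, p in stack.named_parameters()}
print(shapes)

# parameters() yields the tensors only, which is enough for counting.
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(total, trainable)
```

The total is (784*512 + 512) + (512*512 + 512) + (512*18 + 18) = 673,810, and all of them are trainable by default.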

In [14]:
# Iterate over all parameters of the model and print each one's size and first two entries.
print(f"Model Structure: {model}\n\n")

# Note: weights and biases are separate parameters, so the counter advances twice per Linear layer.
for layer_num, (name, param) in enumerate(model.named_parameters(), start=1):
    print(f"Layer {layer_num}: {name} | Size: {param.size()} | Values: {param[:2]}\n")

Model Structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=18, bias=True)
  )
)


Layer 1: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[-0.0336,  0.0056,  0.0234,  ...,  0.0083,  0.0025,  0.0262],
        [ 0.0246,  0.0068, -0.0349,  ...,  0.0260, -0.0076,  0.0083]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer 2: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([-0.0161,  0.0181], device='cuda:0', grad_fn=<SliceBackward0>)

Layer 3: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[-0.0188, -0.0061,  0.0142,  ..., -0.0280, -0.0404,  0.0321],
        [-0.0219,  0.0319,  0.0224,  ..., -0.0413, -0.0041, -0.0151]],
       device='cuda:0', grad_fn=<S