# Build the Neural Network
To build the neural network, we need the [PyTorch](https://pytorch.org/) framework. You can refer to the website for installation instructions and other information.
After installing PyTorch (if you are using Google Colab, you do NOT need to worry about installation), you can import ``torch`` to build your own models. Now, let's start.

In one of the lectures, we saw that a simple feedforward neural network consists of three parts:
1. input layer,
2. hidden layer, and
3. output layer.

In PyTorch, the [torch.nn](https://pytorch.org/docs/stable/nn.html) package can be used to construct neural network models.
[nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) is the base class for all neural network modules. A module contains layers and a method ``forward(input)`` that returns the <b>output</b>.
Let's have a look at the details.

In the next section, we'll build a neural network to classify images in the FashionMNIST dataset.


In [None]:
%matplotlib inline

# Load the package
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import torch.nn.functional as F
# torch.manual_seed(0)

## Get Device for Training model

You can train the model on a hardware accelerator like the GPU, if it is available.
Let's check to see if
[torch.cuda](https://pytorch.org/docs/stable/notes/cuda.html) is available, else we
continue to use the CPU.



In [None]:
# choose the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

If the output above shows 'cpu', it means that PyTorch was unable to detect GPUs. If you are running this using Google Colab, you can change it to use GPU for free. Just go to Runtime > Change Runtime Type and choose GPU.
Note that there is a limit to GPU usage and once you reach that limit, you will need to pay. However, we will not go into that limit in this section.
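
Whichever device is selected, the model and every tensor passed to it need to live on the same device; mixing a GPU model with a CPU tensor raises an error. The sketch below illustrates this with a tiny stand-in layer. The names `tiny_layer`, `x_on_device` and `x_moved` are hypothetical and used only for illustration.

```python
import torch
from torch import nn

# Same device selection as in this section.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A tiny stand-in layer (hypothetical, for illustration only), moved to the device.
tiny_layer = nn.Linear(3, 2).to(device)

# Option 1: create the input directly on the chosen device.
x_on_device = torch.rand(4, 3, device=device)

# Option 2: create the tensor on the CPU, then move it.
x_moved = torch.rand(4, 3).to(device)

# Layer and input share a device, so the forward pass works.
out = tiny_layer(x_on_device)
print(out.device, x_moved.device)
```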

## 1- Define the Neural Network

We define our neural network by subclassing ``nn.Module``, and
initialize the neural network layers in ``__init__``. Every ``nn.Module`` subclass implements
the operations on input data in the ``forward`` method.



In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        # here we define the structure
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),  # 10 here means that there are 10 classes
        )

    def forward(self, x):
        # and here we define the feed-forward step
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

Now, we create an instance of ``NeuralNetwork``, i.e. ``model``, and move it to the ``device`` (meaning 'cpu' or 'cuda'), then print its structure.



In [None]:
# create "model" as network structure
model = NeuralNetwork().to(device)
print(model)


To use the model, we feed it a random input. This executes the model's ``forward``,
along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866).
Do not call ``model.forward()`` directly!
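
One of those background operations is running any hooks registered on the module. As a minimal sketch (the hook `record_call` and the list `calls` are hypothetical names for illustration), a forward hook fires when the module is called, but is skipped when ``forward`` is invoked directly:

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []  # records the output shape each time the hook fires

def record_call(module, inputs, output):
    # Forward hooks receive the module, its inputs and its output.
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_call)
x = torch.rand(3, 4)

layer(x)                      # goes through __call__, so the hook runs
n_after_call = len(calls)

layer.forward(x)              # bypasses __call__, so the hook is skipped
n_after_forward = len(calls)

handle.remove()               # clean up the hook
print(n_after_call, n_after_forward)
```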

Calling the model on the input returns a 2-dimensional tensor:
- the first dimension (dim=0) indexes the samples in the batch, with one row of 10 raw predicted values per sample, and
- the second dimension (dim=1) indexes the individual class values within each row.

We get the prediction probabilities by passing it through an instance of the ``nn.Softmax`` module.



In [None]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)  # this generates the predictions
print(logits)
print(logits.size())  # 1 sample x 10 classes

pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

#### Gradient

Let's now zero the gradient buffers of all parameters and then run backpropagation with random gradients.
Although this may not make a lot of sense now, it will be useful later when we train the model for real.
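
The reason for zeroing is that PyTorch accumulates gradients across successive ``backward()`` calls instead of overwriting them. A minimal sketch, using a small hypothetical layer rather than the model above:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)   # small stand-in layer, for illustration only
x = torch.rand(1, 3)

# First backward pass: gradients are stored in .grad
layer(x).sum().backward()
first = layer.weight.grad.clone()

# Second backward pass without zeroing: gradients are added to the old ones
layer(x).sum().backward()
second = layer.weight.grad.clone()
accumulated = torch.allclose(second, 2 * first)

# zero_grad() resets the buffers (to zeros or to None, depending on the PyTorch version)
layer.zero_grad()
g = layer.weight.grad
cleared = g is None or bool((g == 0).all())
print(accumulated, cleared)
```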


In [None]:
# reset the gradient buffers of the model
model.zero_grad()
# the random gradient must live on the same device as the logits
logits.backward(torch.randn(1, 10, device=device))
print(logits.size())

#### Loss Function

A loss function takes the `(output, target)` pair of inputs, and computes a
value that estimates how far away the output is from the target.

There are several different
[loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions) in the nn package.
A simple loss is: ``nn.MSELoss``, which computes the mean-squared error
between the output and the target.
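
As a quick check of what ``nn.MSELoss`` computes with its default settings, the sketch below compares it against the mean of the squared differences written out by hand. The tensors are random stand-ins, not outputs of the model above.

```python
import torch
from torch import nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for the model output
target = torch.randn(1, 10)   # stand-in target of the same shape

# Built-in loss (default reduction is the mean over all elements)
loss_builtin = nn.MSELoss()(output, target)

# The same quantity computed by hand
loss_manual = ((output - target) ** 2).mean()

print(loss_builtin.item(), loss_manual.item())
```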

In [None]:
# a dummy target, for example
target = torch.randn(10, device=device)  # same device as the logits

# make it the same shape as output
target = target.view(1, -1)
print(target.size())

# define the loss function
criterion = nn.MSELoss()
loss = criterion(logits, target)

print(loss)

## 2- Each Layer Analysis

Now, we analyze each layer in the model.
To illustrate this, we will create a sample minibatch of 3 images of size 28x28 using [torch.rand](https://pytorch.org/docs/stable/generated/torch.rand.html). Then we pass it through the network step by step.

In [None]:
input = torch.rand(3,28,28)
print(input.size())

### nn.Flatten
We initialize the [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) layer to convert each 28x28 input into a contiguous array of 784 values (the minibatch dimension, at dim=0, is maintained).
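
For intuition, flattening with default settings is just a reshape that keeps dim=0 and merges the remaining dimensions. The sketch below compares ``nn.Flatten`` with two equivalent ways of writing it; the variable names are illustrative.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)   # 3 samples of size 28x28

flat_module = nn.Flatten()(batch)               # module form
flat_view = batch.view(batch.size(0), -1)        # manual reshape, keeps dim 0
flat_func = torch.flatten(batch, start_dim=1)    # functional form

print(flat_module.shape)
```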



In [None]:
flatten = nn.Flatten()
flat_input = flatten(input)
print(flat_input.size())

### nn.Linear
The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
is a module that applies a linear transformation on the input using its stored weights and biases.
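
Concretely, ``nn.Linear`` computes $y = xW^T + b$, where the weight $W$ has shape `(out_features, in_features)`. A minimal sketch that reproduces the layer output by hand (the input is a random stand-in):

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=784, out_features=20)
x = torch.rand(3, 784)   # 3 flattened samples

out_module = layer(x)                          # output of the layer
out_manual = x @ layer.weight.T + layer.bias   # the same transformation by hand

print(layer.weight.shape, layer.bias.shape, out_module.shape)
```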




In [None]:
layer1 = nn.Linear(in_features=28*28, out_features=20)  # this will convert 28*28 values into 20
hidden1 = layer1(flat_input)
print(hidden1.size())

### nn.ReLU
Non-linear activations are what create the complex mappings between the model's inputs and outputs.
They are applied after linear transformations to introduce *nonlinearity*, helping neural networks
learn a wide variety of phenomena.

In this model, we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our
linear layers, but there are [other activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) that introduce non-linearity in your model.
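
ReLU itself is simply $\max(0, x)$ applied element-wise: negative values become zero and non-negative values pass through unchanged. A small sketch on a hand-picked tensor:

```python
import torch
from torch import nn

values = torch.tensor([-1.5, 0.0, 2.0])

relu_out = nn.ReLU()(values)             # module form
clamp_out = torch.clamp(values, min=0)   # the same operation written as max(0, x)

print(relu_out)  # tensor([0., 0., 2.])
```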



In [None]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

### nn.Sequential
[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered
container of modules. The data is passed through all the modules in the same order as defined. You can use
sequential containers to put together a quick network like ``seq_modules``.
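
In other words, calling a sequential container behaves like applying each module in turn. The sketch below, with illustrative layer sizes, compares the container with an explicit loop over its modules:

```python
import torch
from torch import nn

torch.manual_seed(0)
seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Calling the container runs every module in order ...
out_seq = seq(x)

# ... which matches feeding the data through the modules one by one.
out_loop = x
for module in seq:
    out_loop = module(out_loop)

print(out_seq.shape)
```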



In [None]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input = torch.rand(3,28,28)

logits = seq_modules(input)
print(logits.shape)

### nn.Softmax
The last linear layer of the neural network returns `logits` - raw values in $(-\infty, \infty)$ - which are passed to the
[nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html#torch.nn.Softmax) module. The logits are scaled to values
in $[0, 1]$ representing the model's predicted probabilities for each class. The ``dim`` parameter indicates the dimension along
which the values must sum to 1.
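
To make the role of ``dim`` concrete, the sketch below applies softmax along dim=1 to two hand-picked rows of logits and compares the result with the formula $e^{z_i} / \sum_j e^{z_j}$ written out by hand. Each row sums to 1 on its own.

```python
import torch
from torch import nn

# Two samples with three classes each (hand-picked values for illustration)
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])

probs = nn.Softmax(dim=1)(logits)

# The same computation by hand: exponentiate, then normalize each row
exp = torch.exp(logits)
manual = exp / exp.sum(dim=1, keepdim=True)

row_sums = probs.sum(dim=1)   # one sum per sample
print(probs)
print(row_sums)
```

Equal logits, as in the second row, give equal probabilities for every class.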



In [None]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
print(pred_probab)
print(pred_probab.size())


## 3- Model Parameters

In the model, we only define the ``forward`` function in ``NeuralNetwork``; the backward pass is derived automatically by ``autograd`` in PyTorch. You do not need to worry about the details of ``autograd`` here: it handles all of the backpropagation for us.

Many layers of a model are *parameterized*, i.e. have associated weights
and biases that are optimized during training. Subclassing ``nn.Module`` automatically
tracks all fields defined inside your model object, and makes all parameters
accessible using your model's ``parameters()`` or ``named_parameters()`` methods.

In this example, we iterate over each learnable parameter, and print its size and a preview of its values.
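
A common use of ``parameters()`` is counting how many values the network has to learn. The sketch below rebuilds the same layer sizes as ``NeuralNetwork`` in a standalone ``nn.Sequential`` (the name `net` is illustrative) and sums the element counts:

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class in this section
net = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear layer contributes in*out weights plus out biases
total = sum(p.numel() for p in net.parameters())
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)

print(total)  # 669706
```

The total breaks down as 401,920 + 262,656 + 5,130 across the three linear layers; the ReLU layers have no parameters.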



In [None]:
params = list(model.parameters())
print(len(params))
print(params[0].size())

print(f"Model structure: {model}\n\n")
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

## 4- Update the Weights of the Network
After we compute the gradients by backpropagating the loss through the network, we can update the parameters with an optimizer.

In practice, the simplest update rule is the Stochastic Gradient Descent (SGD):

$$ \text{weight} = \text{weight} - \text{learning rate} \times \text{gradient} $$

Each update is one training step. Repeating it over many batches moves the parameters toward values that reduce the loss.
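
To connect the update rule to the library, the sketch below applies one plain SGD step (no momentum or weight decay) in two ways: through ``torch.optim.SGD`` and by hand. It uses a small hypothetical layer and random data, not the model above.

```python
import copy
import torch
from torch import nn

torch.manual_seed(0)
lr = 0.01

# Two identical copies of a small stand-in layer
layer_optim = nn.Linear(4, 2)
layer_manual = copy.deepcopy(layer_optim)

x = torch.rand(5, 4)
target = torch.rand(5, 2)
criterion = nn.MSELoss()

# Step 1: update via the optimizer
optimizer = torch.optim.SGD(layer_optim.parameters(), lr=lr)
optimizer.zero_grad()
criterion(layer_optim(x), target).backward()
optimizer.step()

# Step 2: the same update by hand: weight = weight - lr * gradient
criterion(layer_manual(x), target).backward()
with torch.no_grad():
    for p in layer_manual.parameters():
        p -= lr * p.grad

same = all(
    torch.allclose(a, b)
    for a, b in zip(layer_optim.parameters(), layer_manual.parameters())
)
print(same)
```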

Now, let's have a look at a simple example with [torch.optim](https://pytorch.org/docs/stable/optim.html).



In [None]:
#  import the package
import torch.optim as optim

# create your optimizer with SGD and learning rate
optimizer = optim.SGD(model.parameters(), lr=0.01)

# a dummy target, for example, on the same device as the model
target = torch.randn(10, device=device)
# make it the same shape as output
target = target.view(1, -1)   # 1x10
print(target.size())

# in the training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = model(X)  # prediction for the single sample X defined earlier (1x10)
loss = criterion(output, target)  # calculate the loss
loss.backward()  # backpropagate to compute the gradients
optimizer.step()    # update the parameters

print(loss)

--------------




## Further Reading
You can refer to the following websites for further information.

- [torch.nn API](https://pytorch.org/docs/stable/nn.html)
- [tutorials](https://pytorch.org/tutorials/)

