In [133]:
import pandas as pd 
import numpy as np 
import os
import torch
from torch import nn

# Deep Neural Networks Laboratory

## A first, shallow Neural Network

If your machine supports CUDA, you can check that the CUDA build of PyTorch was installed successfully with:

In [134]:
torch.cuda.is_available()

True
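A common follow-up to this check, sketched here as an assumption about how the result might be used, is to pick a device once and create tensors on it. The names `device` and `x_on_device` are illustrative and do not appear elsewhere in this notebook.

```python
import torch

# Fall back to the CPU when CUDA is unavailable, so the same code runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors can be created directly on the chosen device.
x_on_device = torch.rand(1, 5, device=device)
print(x_on_device.device)
```

A model can be moved to the same device with `.to(device)`.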

A training example for a neural network is composed of two tensors: the sample, which is the input of the network, and the label, which is the target its output is compared against.

Imagine we have a classification task with 3 possible labels (0,1,2) and samples of 5 features (5-dimensional samples).

We will have tensors like:

In [135]:
x = torch.rand(1,5)
y = torch.randint(3, (1,)).long()

In [136]:
x

tensor([[0.9165, 0.8504, 0.4030, 0.9713, 0.4783]])

In [137]:
y

tensor([1])
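The tensors above hold a single sample. In practice several samples are usually stacked along the first dimension; a minimal sketch, assuming an illustrative batch size of 4:

```python
import torch

batch_size = 4  # illustrative value, not used elsewhere in the notebook

x_batch = torch.rand(batch_size, 5)        # one row per sample, 5 features each
y_batch = torch.randint(3, (batch_size,))  # one integer label in {0, 1, 2} per sample

print(x_batch.shape)  # torch.Size([4, 5])
print(y_batch.shape)  # torch.Size([4])
print(y_batch.dtype)  # torch.int64
```

`torch.randint` already returns 64-bit integers by default, so the labels have the same dtype as the `.long()` call above produces.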

Let us define a shallow, very simple Neural Network.

The Neural Network will have:
1. As many input neurons as sample features (5 in this case)
2. A hidden layer
3. As many output neurons as possible labels (3 in this case)

![images/shallow_nn.png](images/shallow_nn.png) 

In [138]:
class ShallowNeuralNetwork(nn.Module):
    def __init__(self, input_size, num_hidden, output_size):
        super().__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.fc1 = nn.Linear(input_size, num_hidden)
        self.output_layer = nn.Linear(num_hidden, output_size)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.output_layer(x)
        x = self.softmax(x)
        return x
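To get a feeling for the size of this network, we can count its learnable parameters. The sketch below rebuilds the two linear layers on their own, assuming 5 inputs, 10 hidden neurons and 3 outputs, the sizes used later in this notebook.

```python
from torch import nn

fc1 = nn.Linear(5, 10)           # 5*10 weights + 10 biases = 60 parameters
output_layer = nn.Linear(10, 3)  # 10*3 weights + 3 biases = 33 parameters

n_params = sum(
    p.numel() for layer in (fc1, output_layer) for p in layer.parameters()
)
print(n_params)  # 93
```

ReLU and Softmax have no learnable parameters, so they do not contribute to the count.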

## Step by step computation

So, we start with the tensor x.

In [139]:
x

tensor([[0.9165, 0.8504, 0.4030, 0.9713, 0.4783]])

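x goes through the hidden layer. This means that we obtain the output:

$ xW^T + b $ where:
1. W is the weight matrix of the connections
2. b is the bias vector

This is the usual $ Wx + b $, written for samples stored as rows, which is how PyTorch lays them out. The values of W and b depend on the chosen initialization; nn.Linear draws them at random, so they will differ on every run.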
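With the sizes used in this notebook (5 input features, 10 hidden neurons) and PyTorch's convention of storing each sample as a row, the shapes work out as follows:

$$ x \in \mathbb{R}^{1 \times 5}, \quad W \in \mathbb{R}^{10 \times 5}, \quad b \in \mathbb{R}^{10}, \quad h = xW^{\top} + b \in \mathbb{R}^{1 \times 10} $$

Each of the 10 rows of W holds the weights of one hidden neuron, which is why nn.Linear(5, 10) stores a weight matrix of shape (10, 5).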

In [140]:
hidden_layer = nn.Linear(5,10)

print( hidden_layer.weight)

print( hidden_layer.bias )

Parameter containing:
tensor([[-0.3646,  0.2963, -0.2007,  0.3272, -0.0337],
        [-0.0173,  0.1944,  0.0164,  0.1191,  0.4206],
        [-0.3664,  0.0666, -0.4121,  0.3695,  0.3009],
        [ 0.2638, -0.2859,  0.0034, -0.1858,  0.4321],
        [-0.3570, -0.0592,  0.0775, -0.0298,  0.2510],
        [-0.2177, -0.3864, -0.2467,  0.3437,  0.2321],
        [ 0.2404,  0.3210,  0.2272, -0.3242, -0.4134],
        [-0.1686,  0.0423, -0.4178, -0.1454,  0.1543],
        [-0.4054,  0.0991, -0.1222, -0.3320,  0.1154],
        [ 0.2185, -0.2747,  0.3672,  0.1464,  0.0720]], requires_grad=True)
Parameter containing:
tensor([ 0.4298,  0.3314, -0.0127, -0.0022, -0.0920, -0.3345, -0.1043, -0.3490,
         0.4319,  0.1284], requires_grad=True)


In [141]:
w = hidden_layer.weight
b = hidden_layer.bias

output = torch.matmul(x, w.t()) + b
output

tensor([[ 0.5684,  0.8043,  0.0449,  0.0240, -0.3472, -0.5172, -0.0321, -0.7034,
         -0.1720,  0.4197]], grad_fn=<AddBackward0>)

If we apply the hidden layer, we obtain the same results:

In [142]:
hidden_layer(x)

tensor([[ 0.5684,  0.8043,  0.0449,  0.0240, -0.3472, -0.5172, -0.0321, -0.7034,
         -0.1720,  0.4197]], grad_fn=<AddmmBackward0>)
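Rather than comparing the printed numbers by eye, the equivalence can be checked programmatically. A minimal sketch, using a fresh random layer and sample; the names `manual` and `builtin` are illustrative.

```python
import torch
from torch import nn

x = torch.rand(1, 5)
hidden_layer = nn.Linear(5, 10)

manual = torch.matmul(x, hidden_layer.weight.t()) + hidden_layer.bias
builtin = hidden_layer(x)

# allclose tolerates tiny floating-point differences between the two code paths.
print(torch.allclose(manual, builtin, atol=1e-6))
```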

![images/dnn_2.jpeg](images/dnn_2.jpeg) 

In order to introduce non-linearity into our net, allowing it to learn more complex patterns, we pass the output of the fully connected (FC) layer through the ReLU activation function, which looks like this:


![images/relu.png](images/relu.png) 

In [143]:
output

tensor([[ 0.5684,  0.8043,  0.0449,  0.0240, -0.3472, -0.5172, -0.0321, -0.7034,
         -0.1720,  0.4197]], grad_fn=<AddBackward0>)

In [144]:
relu = nn.ReLU()

output = relu(output)
output

tensor([[0.5684, 0.8043, 0.0449, 0.0240, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.4197]], grad_fn=<ReluBackward0>)

In [145]:
relu(hidden_layer(x))

tensor([[0.5684, 0.8043, 0.0449, 0.0240, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.4197]], grad_fn=<ReluBackward0>)
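ReLU is simply the element-wise function $ \max(0, z) $: negative values become zero and positive values pass through unchanged. The sketch below checks this on a few of the values printed above; `by_module` and `by_formula` are illustrative names.

```python
import torch
from torch import nn

# A few of the pre-activation values shown above.
z = torch.tensor([[0.5684, 0.8043, -0.3472, -0.5172, 0.4197]])

relu = nn.ReLU()
by_module = relu(z)
by_formula = torch.clamp(z, min=0)  # element-wise max(0, z)

print(by_formula)
print(torch.equal(by_module, by_formula))  # True
```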

Now, we apply another linear transformation, from the nodes of the hidden layer to the nodes of the output layer.

In [146]:
output_layer = nn.Linear(10,3)

weights_2 = output_layer.weight 
bias_2 = output_layer.bias

output = torch.matmul(output, weights_2.t()) + bias_2
output

tensor([[-0.4243,  0.1639,  0.5530]], grad_fn=<AddBackward0>)

In [147]:
output_layer(relu(hidden_layer(x)))

tensor([[-0.4243,  0.1639,  0.5530]], grad_fn=<AddmmBackward0>)

We have obtained a tensor with the same size as the output layer, which equals the number of possible labels in our classification task.


In order to convert these values to probabilities, we apply the SoftMax activation function.

![images/softmax.png](images/softmax.png) 

In [148]:
softmax = nn.Softmax(dim=1)

output = softmax(output)
output

tensor([[0.1832, 0.3299, 0.4869]], grad_fn=<SoftmaxBackward0>)

In [149]:
softmax(output_layer(relu(hidden_layer(x))))

tensor([[0.1832, 0.3299, 0.4869]], grad_fn=<SoftmaxBackward0>)
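SoftMax exponentiates each value and divides by the sum of the exponentials, $ \mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j} $, so that every row sums to 1. A minimal sketch that reproduces the result by hand from the output-layer values printed above:

```python
import torch

# Output-layer values from above.
logits = torch.tensor([[-0.4243, 0.1639, 0.5530]])

exp = torch.exp(logits)
probs = exp / exp.sum(dim=1, keepdim=True)  # normalize each row to sum to 1

print(probs)             # approximately tensor([[0.1832, 0.3299, 0.4869]])
print(probs.sum(dim=1))  # approximately tensor([1.])
```

This direct formula is fine for illustration; the built-in `torch.softmax` gives the same values here.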

The outputs are the probabilities that the network assigns to each of the labels (0,1,2) for this sample.
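To turn these probabilities into a single predicted label, one option is to take the index of the largest one. A minimal sketch using the values printed above; `predicted_label` is an illustrative name.

```python
import torch

# Probabilities from above.
probs = torch.tensor([[0.1832, 0.3299, 0.4869]])

predicted_label = probs.argmax(dim=1)  # index of the largest probability per row
print(predicted_label)  # tensor([2])
```

The network has not been trained yet, so this prediction only reflects the random initial weights and need not match the label y.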

We manually implemented a forward pass of the network we defined at the start:

In [150]:
class ShallowNeuralNetwork(nn.Module):
    def __init__(self, input_size, num_hidden, output_size):
        super().__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.fc1 = nn.Linear(input_size, num_hidden)
        self.output_layer = nn.Linear(num_hidden, output_size)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.output_layer(x)
        x = self.softmax(x)
        return x

In [151]:
shallow_nn = ShallowNeuralNetwork(5,10,3)

The values differ from the manual computation above because shallow_nn has its own randomly initialized weights:

In [152]:
shallow_nn(x)

tensor([[0.2472, 0.4404, 0.3124]], grad_fn=<SoftmaxBackward0>)
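If reproducible values are needed, the random number generator can be seeded before the network is built. The sketch below is an illustration only: `build_net` is a hypothetical helper that uses `nn.Sequential` as a compact stand-in with the same layers as ShallowNeuralNetwork(5, 10, 3).

```python
import torch
from torch import nn

def build_net():
    # Compact stand-in with the same layers as ShallowNeuralNetwork(5, 10, 3).
    return nn.Sequential(
        nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 3), nn.Softmax(dim=1)
    )

x = torch.rand(1, 5)

torch.manual_seed(0)
net_a = build_net()
torch.manual_seed(0)
net_b = build_net()  # same seed, so the same initial weights as net_a
net_c = build_net()  # the generator has advanced, so the weights differ

print(torch.equal(net_a(x), net_b(x)))  # True
print(torch.equal(net_a(x), net_c(x)))
```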