In [10]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

## Theory

Machine Learning: traditional approaches rely on hand-crafted feature engineering.

Deep Learning: a subset of machine learning that learns features directly from raw data and typically requires a large amount of data. PyTorch provides domain libraries for different data types:
- torchvision: images
- torchaudio: audio
- torchtext: text


### Tensor definition

In [14]:
# from list
input_list = [22,46,55,6,7,81,54,6,100]
input_t = torch.tensor(input_list)

# from numpy array
input_np = np.array([23,44,56,78,44,64,98])
input_t_np = torch.from_numpy(input_np)

# random
input_t_rand = torch.rand(3,3)

# zeros
input_t_zeros = torch.zeros(2,3)

# ones
input_t_ones = torch.ones(3,6)

### Tensor attributes

In [18]:
## attributes
shape = input_t.shape
dtype = input_t.dtype  # named dtype to avoid shadowing the built-in type()
device = input_t.device

print(f'Shape: {shape}, Type: {dtype}, Device: {device}')

Shape: torch.Size([9]), Type: torch.int64, Device: cpu
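
The `dtype` attribute depends on how the tensor was created. The sketch below is an illustration with made-up values (the variable names are not from the cells above) and assumes PyTorch's default dtype has not been changed:

```python
import numpy as np
import torch

int_t = torch.tensor([1, 2, 3])                     # Python ints
float_t = torch.tensor([1.0, 2.0, 3.0])             # Python floats
np_t = torch.from_numpy(np.array([1.0, 2.0, 3.0]))  # keeps the NumPy dtype
matrix_t = torch.zeros(2, 3)

print(int_t.dtype)     # torch.int64
print(float_t.dtype)   # torch.float32
print(np_t.dtype)      # torch.float64
print(matrix_t.shape)  # torch.Size([2, 3])
```

Note that `torch.from_numpy` shares memory with the NumPy array, so the dtype is carried over rather than converted.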


### First Neural Network

Define a neural network with one linear layer, which:
- Takes an input of size n
- Applies a linear function
- Returns an output of size m

In [35]:
# First neural network: 3 input features, 2 output nodes
input_tensor = torch.tensor([[0.3471, 0.457, -0.2356]])
# initialize the linear layer
linear_layer = nn.Linear(in_features=3, out_features=2)
print(f' Weight: {linear_layer.weight}, \n Bias: {linear_layer.bias}')
# generate output
output = linear_layer(input_tensor)

 Weight: Parameter containing:
tensor([[-0.0459,  0.0295,  0.2518],
        [ 0.4441,  0.5070,  0.0743]], requires_grad=True), 
 Bias: Parameter containing:
tensor([0.5265, 0.1087], requires_grad=True)
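
The linear function applied by the layer is `x @ W.T + b`. As a minimal sketch, the manual computation can be compared with the layer output; the seed and the names `layer` and `manual` are assumptions for illustration, and the weights will differ from the ones printed above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.tensor([[0.3471, 0.457, -0.2356]])
layer = nn.Linear(in_features=3, out_features=2)

# weight has shape (out_features, in_features), hence the transpose
manual = x @ layer.weight.T + layer.bias
output = layer(x)

print(output.shape)                               # torch.Size([1, 2])
print(torch.allclose(manual, output, atol=1e-6))  # True
```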


In [38]:
# Stacking multiple layers: create network with three linear layers
model = nn.Sequential(
    nn.Linear(10,18),
    nn.Linear(18, 20),
    nn.Linear(20,5)
)
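
As a quick sanity check of the stacked network, a batch can be passed through it to confirm the output size and the number of learnable parameters. This is a sketch with a hypothetical batch of 4 random samples:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 18),
    nn.Linear(18, 20),
    nn.Linear(20, 5)
)

batch = torch.rand(4, 10)  # 4 samples with 10 features each
output = model(batch)
print(output.shape)        # torch.Size([4, 5])

# each layer holds in_features * out_features weights plus out_features biases
n_params = sum(p.numel() for p in model.parameters())
print(n_params)            # 683 = (10*18+18) + (18*20+20) + (20*5+5)
```

The output size of each layer must match the input size of the next one, otherwise the forward pass raises a shape error.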

### Activation function

An activation function adds non-linearity to the neural network, which lets the model learn more complex relationships. Without it, a stack of layers computing W @ x + b is still a linear function, so any number of layers could be collapsed into a single one.
- Sigmoid: used in binary classification. It maps any real input to an output between 0 and 1, which can be interpreted as a probability and turned into a class label with a threshold (typically 0.5). The function is $$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
- Softmax: takes a vector of arbitrary real values and normalizes it into a probability distribution, with values between 0 and 1 that sum to 1. It is commonly used for multi-class classification. The function is $$ y_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} $$
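
The claim that linear layers without an activation collapse into one can be checked numerically. The sketch below uses hypothetical layer sizes and an arbitrary seed, and builds a single layer from the identity W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(5, 4)
layer2 = nn.Linear(4, 2)
stacked = nn.Sequential(layer1, layer2)

# build one layer whose weight and bias reproduce the two stacked layers
combined = nn.Linear(5, 2)
with torch.no_grad():
    combined.weight.copy_(layer2.weight @ layer1.weight)
    combined.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)

x = torch.rand(3, 5)
same = torch.allclose(stacked(x), combined(x), atol=1e-6)
print(same)  # True
```

Placing a non-linear activation between the two layers breaks this identity, which is what gives depth its extra expressive power.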

In [39]:
model_with_sigmoid = nn.Sequential(
    nn.Linear(5,4),
    nn.Linear(4, 1),
    nn.Sigmoid() # applied element-wise; the last linear layer should output a single value
)

In [42]:
model_with_softmax = nn.Sequential(
    nn.Linear(5,4),
    nn.Linear(4, 3), # one output per class; softmax over a single value would always return 1
    nn.Softmax(dim=-1) # softmax is applied along the last dimension of the input tensor
)
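
As a check on the sigmoid and softmax formulas, both can be computed by hand and compared with PyTorch's built-in functions. This is a minimal sketch with made-up input values:

```python
import torch

x = torch.tensor([[-1.0, 0.0, 2.0]])

# sigmoid(x) = 1 / (1 + exp(-x)), applied element-wise
sigmoid_manual = 1 / (1 + torch.exp(-x))
# softmax(x)_i = exp(x_i) / sum_j exp(x_j), normalized along the last dimension
softmax_manual = torch.exp(x) / torch.exp(x).sum(dim=-1, keepdim=True)

print(torch.allclose(sigmoid_manual, torch.sigmoid(x)))          # True
print(torch.allclose(softmax_manual, torch.softmax(x, dim=-1)))  # True
print(softmax_manual.sum(dim=-1))                                # each row sums to 1
```

The built-in versions are preferable in practice because they are implemented in a numerically stable way for large inputs.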

### Forward pass
- Input data is passed forward (propagated) through the network
- Computations are performed at each layer
- The output of each layer is passed to the next layer
- The output of the final layer is the "prediction"

It is used for both training and prediction.
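
The forward pass can be traced one layer at a time. The sketch below uses a hypothetical three-module network and random input; iterating over an `nn.Sequential` yields its layers in order:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(6, 4), nn.Linear(4, 1), nn.Sigmoid())
x = torch.rand(1, 6)

# feed the output of each layer into the next one
activation = x
for layer in model:
    activation = layer(activation)
    print(layer.__class__.__name__, tuple(activation.shape))

# calling the model directly runs the same chain of computations
with torch.no_grad():
    prediction = model(x)
print(torch.allclose(activation, prediction))  # True
```

Wrapping the call in `torch.no_grad()` is a common choice at prediction time, since no gradients are needed.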

### Backward pass (Backpropagation)

It is used to update weights and biases during training, and it is the complementary step to the forward pass. The overall training procedure is:
1. Propagate data forward
2. Compare outputs to true values (ground truth)
3. Backpropagate to update weights and biases
4. Repeat until weights and biases are tuned to produce useful outputs
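
The four steps above can be sketched as a small training loop. Everything here is an assumption chosen for illustration: the random toy data, the network shape, `nn.BCELoss` as the loss, `optim.SGD` as the optimizer, the learning rate, and the number of epochs:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# hypothetical toy data: 16 samples, 8 features, binary labels
features = torch.rand(16, 8)
targets = torch.randint(0, 2, (16, 1)).float()

model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
initial_weight = model[0].weight.detach().clone()

losses = []
for epoch in range(20):                     # 4. repeat
    optimizer.zero_grad()                   # clear gradients from the previous step
    predictions = model(features)           # 1. propagate data forward
    loss = criterion(predictions, targets)  # 2. compare outputs to ground truth
    loss.backward()                         # 3. backpropagate to compute gradients
    optimizer.step()                        #    update weights and biases
    losses.append(loss.item())

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```

`loss.backward()` only computes the gradients; the parameters change when `optimizer.step()` is called.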

In [44]:
input_tensor_ex = torch.rand(1,6)

In [45]:
input_tensor_ex.shape

torch.Size([1, 6])

### Example

In [47]:
# Binary Classification
binary = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])
model_binary = nn.Sequential(
    nn.Linear(8,4),
    nn.Linear(4,3),
    nn.Linear(3,1),
    nn.Sigmoid()
)

output_binary = model_binary(binary)
print(f'Binary Classification Output {output_binary}')


# Multi Class Classification
n_classes = 4
multiclass = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])
model_multiclass = nn.Sequential(
    nn.Linear(8,4),
    nn.Linear(4,3),
    nn.Linear(3, n_classes),
    nn.Softmax(dim=-1)
)

output_multiclass = model_multiclass(multiclass)
print(f'Multi Class Classification Output {output_multiclass}')

Binary Classification Output tensor([[0.6717]], grad_fn=<SigmoidBackward0>)
Multi Class Classification Output tensor([[0.8205, 0.0070, 0.1243, 0.0482]], grad_fn=<SoftmaxBackward0>)


### Loss Function
- Gives feedback to the model during training
- Takes in the model prediction $ \hat{y}_i $ and the ground truth $ y_i $
- Outputs a float

Note that $ y_i $ is a single integer (the class label), while $ \hat{y}_i $ is a tensor (for example, the output of the softmax function) with one entry per class.  

To compare an integer to a tensor we use one-hot encoding: the ground truth is converted to a tensor with 1 at the position of its class and 0 everywhere else. This makes it possible to compute the loss between the output and the ground truth. Labels are converted to tensors with torch.nn.functional.one_hot().

Among others we have the following loss functions:
1. Cross Entropy: for classification problems. In the binary case: $ \text{Cross-Entropy Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) $
2. Mean Square Error MSE: $ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $
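
Both formulas can be evaluated by hand and compared with PyTorch's built-in losses. This is a minimal sketch with made-up predictions and labels; `nn.BCELoss` is used here as the built-in counterpart of the binary cross-entropy formula, which is an assumption about which PyTorch class best matches it:

```python
import torch
import torch.nn as nn

y_true = torch.tensor([1.0, 0.0, 1.0, 0.0])  # ground truth labels
y_pred = torch.tensor([0.9, 0.2, 0.6, 0.4])  # predicted probabilities

# MSE = mean of squared differences
mse_manual = ((y_true - y_pred) ** 2).mean()
# binary cross-entropy = -mean(y * log(p) + (1 - y) * log(1 - p))
bce_manual = -(y_true * torch.log(y_pred)
               + (1 - y_true) * torch.log(1 - y_pred)).mean()

print(torch.allclose(mse_manual, nn.MSELoss()(y_pred, y_true)))  # True
print(torch.allclose(bce_manual, nn.BCELoss()(y_pred, y_true)))  # True
```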


In [48]:
# create a tensor from a label where we want the first class to be true
import torch.nn.functional as F
F.one_hot(torch.tensor(0), num_classes=3)

tensor([1, 0, 0])
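
`F.one_hot` also accepts a whole batch of labels at once. The sketch below uses three hypothetical labels and shows that `argmax` recovers the original class indices:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])  # one integer class label per sample
one_hot_labels = F.one_hot(labels, num_classes=3)
print(one_hot_labels)
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0]])

# argmax along the class dimension undoes the encoding
recovered = one_hot_labels.argmax(dim=-1)
print(recovered)  # tensor([0, 2, 1])
```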

In [50]:
# declare the loss function
from torch.nn import CrossEntropyLoss
# CrossEntropyLoss expects raw scores (logits) and applies log-softmax internally
scores = torch.tensor([[-0.121, 0.1059]])
one_hot_target = torch.tensor([[1,0]])
criterion = CrossEntropyLoss()
criterion(scores.double(), one_hot_target.double())

tensor(0.8130, dtype=torch.float64)
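
`CrossEntropyLoss` also accepts the integer class label directly, so the one-hot step is optional. The sketch below reuses the scores from the cell above and compares three routes to the same value; passing class probabilities as the target assumes PyTorch 1.10 or later:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.tensor([[-0.121, 0.1059]])  # raw scores (logits), not probabilities
criterion = nn.CrossEntropyLoss()

loss_index = criterion(scores, torch.tensor([0]))             # class index target
loss_one_hot = criterion(scores, torch.tensor([[1.0, 0.0]]))  # one-hot (probability) target
loss_manual = -F.log_softmax(scores, dim=-1)[0, 0]            # -log(softmax) of the true class

print(loss_index, loss_one_hot, loss_manual)  # all approximately 0.8130
```

Because the loss applies log-softmax itself, a model trained with it should output raw scores rather than end with an `nn.Softmax` layer.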