## Introduction to the Multilayer Perceptron (MLP)

### Motivation: Limitations of Logistic Regression

Logistic regression (LR) is a powerful and simple tool, but it has limitations:
* It can only learn linear decision boundaries.
* It therefore cannot classify data whose classes are separated by more complex, non-linear relationships.

![imagen.png](attachment:imagen.png)


Image source: https://www.mathematik.uni-muenchen.de/~deckert/teaching/SS18/sec-steps.html

### Solution

The solution is the multilayer perceptron (MLP), which extends logistic regression by stacking layers.
This is called deep learning because each additional layer adds depth to the network, which increases the complexity of the functions the model can represent.

![image.png](attachment:image.png)

In simple terms, an MLP chains several logistic-regression-like models: the output of the first model becomes the input of the hidden-layer model, with a non-linear activation applied in between.
More than one hidden layer can be used, depending on the complexity of the problem.
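
To make the stacking idea concrete, here is a minimal pure-Python sketch that is not part of the original notebook. It uses the XOR problem, a standard example of data that no single linear boundary can separate. The grid of candidate weights and the hand-picked hidden-layer weights are illustrative assumptions, and ReLU is used as the hidden activation to match the model defined later in this notebook.

```python
import itertools

# XOR: the label is 1 when exactly one input is 1.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear_classifier(w1, w2, b):
    # A single linear unit with a threshold at zero.
    return lambda x: int(w1 * x[0] + w2 * x[1] + b > 0)

def n_correct(classify):
    # Number of XOR points that a classifier labels correctly.
    return sum(classify(x) == y for x, y in points)

# Coarse grid search over linear models (the grid values are an arbitrary choice).
grid = [v / 2 for v in range(-8, 9)]
best_linear = max(
    n_correct(linear_classifier(w1, w2, b))
    for w1, w2, b in itertools.product(grid, repeat=3)
)

def relu(z):
    return max(0.0, z)

def tiny_mlp(x):
    # Hidden layer: two units that both see x1 + x2, with different biases.
    h1 = relu(x[0] + x[1])
    h2 = relu(x[0] + x[1] - 1)
    # Output layer: a linear combination of the hidden units, then a threshold.
    score = h1 - 2 * h2
    return int(score > 0.5)

print(best_linear)          # 3: a linear model gets at most 3 of the 4 points right
print(n_correct(tiny_mlp))  # 4: the two-layer network classifies all of them
```

The hidden layer re-expresses the inputs so that the output layer, which is still linear, can separate the classes.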

![image.png](attachment:image.png)

### Note: NN connections

![image.png](attachment:image.png)

## Practice

Continuing with the MNIST classification problem, we will implement an MLP to improve on the previous model.

In [None]:
import torch
import torchvision
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import SubsetRandomSampler
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F

### Step 1. Prepare the data

In [None]:
# Same as last session.
batch_size = 100

def split_idxs(set_size, percentage):
    # Size of validation data set
    val_size = int(percentage*set_size)
    # Create a random permutation of 0 to n-1
    idxs = np.random.permutation(set_size)
    # Pick first val_size indices for validation set
    return idxs[:val_size], idxs[val_size:]  # validation set idxs, training set idxs

# download=True only fetches the files if they are not already present in ./data
dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_set = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor(), download=True)

val_idxs, train_idxs = split_idxs(len(dataset), 0.2)

train_sampler = SubsetRandomSampler(train_idxs) 
train_loader = DataLoader(dataset, batch_size, sampler=train_sampler)

val_sampler = SubsetRandomSampler(val_idxs)
val_loader = DataLoader(dataset, batch_size, sampler=val_sampler)
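
As a quick sanity check on the split logic, the following standalone sketch mirrors `split_idxs` using only the standard library. Here `random.sample` stands in for `np.random.permutation`, and the function name `split_indices` and the fixed seed are illustrative assumptions. With the 60,000 MNIST training images and a 20% validation fraction, it should yield 12,000 validation indices and 48,000 training indices with no overlap.

```python
import random

def split_indices(set_size, fraction, seed=0):
    # Shuffle 0..set_size-1, then cut the shuffled list in two.
    rng = random.Random(seed)
    idxs = rng.sample(range(set_size), set_size)
    val_size = int(fraction * set_size)
    return idxs[:val_size], idxs[val_size:]  # validation idxs, training idxs

val, train = split_indices(60_000, 0.2)
print(len(val), len(train))        # 12000 48000
print(set(val).isdisjoint(train))  # True
```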




### Define the model

In [None]:
class MNIST_MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Hidden layer
        self.linear1 = nn.Linear(input_size, hidden_size)
        # Output layer
        self.linear2 = nn.Linear(hidden_size, output_size)
        
#     def __init__(self, input_size, hidden_size, hidden_size2, output_size):
#         super().__init__()
#         # 1st Hidden layer
#         self.linear1 = nn.Linear(input_size, hidden_size)
#         # 2nd Hidden layer
#         self.linear2 = nn.Linear(hidden_size, hidden_size2)
#         # Output layer
#         self.linear3 = nn.Linear(hidden_size2, output_size)

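    def forward(self, x):
        # Flatten each 28x28 image into a vector of 784 values
        x = x.view(x.size(0), -1)
        # Hidden layer followed by the ReLU non-linearity
        out = self.linear1(x)
        out = F.relu(out)
        # Output layer: one raw score (logit) per class
        out = self.linear2(out)
        return out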

In [None]:
input_size = 28*28
num_classes = 10

model = MNIST_MLP(input_size, hidden_size=32, output_size=num_classes)
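
To get a feel for the size of this model, the number of trainable parameters can be worked out by hand: each `nn.Linear(in_features, out_features)` layer holds an `in_features × out_features` weight matrix plus `out_features` biases. The helper below (`count_mlp_params`, a hypothetical name) is a small sketch of that arithmetic in plain Python.

```python
def count_mlp_params(layer_sizes):
    # Each consecutive pair (n_in, n_out) contributes a weight matrix and a bias vector.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 784 inputs -> 32 hidden units -> 10 classes, as configured above.
print(count_mlp_params([28 * 28, 32, 10]))  # 25450
```

For comparison, a plain logistic regression on the same inputs (784 -> 10) has 7,850 parameters.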


### Helper functions

In [None]:
def loss_batch(model, loss_fn, x, y, optimizer=None, metric=None):
    # Step 2. Generate predictions
    predictions = model(x)
    # Step 3. Calculate the loss
    loss = loss_fn(predictions, y)
    
    if optimizer is not None:
        # Step 4. Compute gradients
        loss.backward()
        # Step 5. Update parameters
        optimizer.step()
        # Step 6. Reset gradients
        optimizer.zero_grad()
        
    metric_result=None
    if metric is not None:
        metric_result = metric(predictions, y)
        
    return loss.item(), len(x), metric_result


def evaluate(model, loss_fn, val_loader, metric=None):
    with torch.no_grad():
        # Pass each validation batch through the model
        results = [loss_batch(model, loss_fn, x, y, metric=metric) for x, y in val_loader]
        
        # Unzip
        losses, elements, metrics = zip(*results)
        # Total size of the data set
        total = np.sum(elements)
        # Average loss
        avg_loss = np.sum(np.multiply(losses, elements)) / total
        # Average metric (stays None when no metric function is given)
        avg_metric = None
        if metric is not None:
            avg_metric = np.sum(np.multiply(metrics, elements)) / total
        return avg_loss, total, avg_metric
    
    
def accuracy(outputs, labels):
    _, predictions = torch.max(outputs, dim=1)
    return torch.sum(predictions == labels).item() / len(predictions)


def fit(epochs, lr, model, loss_fn, train_loader, val_loader, optimizer=None, metric=None):
    if optimizer is None:
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    
    for epoch in range(epochs):
        for x, y in train_loader:
            loss,_,_ = loss_batch(model, loss_fn, x, y, optimizer)
        # Evaluation per epoch
        result = evaluate(model, loss_fn, val_loader, metric)
        val_loss, total, val_metric = result
        
        # Print eval
        if metric is None:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, epochs, val_loss))
        else:
            print('Epoch [{}/{}], Loss: {:.4f}, Accuracy: {:.4f}'.format(epoch+1, epochs, val_loss, val_metric))
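
The `evaluate` helper weights each batch's loss by its batch size before averaging. This matters whenever the last batch is smaller than the others. The following sketch uses made-up batch losses and sizes to show the difference between the weighted average and a naive mean over batches.

```python
# Hypothetical per-batch results: (mean loss of the batch, number of samples in it).
batch_results = [(0.30, 100), (0.20, 100), (0.90, 10)]

losses = [loss for loss, _ in batch_results]
sizes = [n for _, n in batch_results]

# Naive: treats the 10-sample batch as if it were as important as a full batch.
naive_mean = sum(losses) / len(losses)

# Weighted: equivalent to averaging the loss over all 210 individual samples.
weighted_mean = sum(loss * n for loss, n in batch_results) / sum(sizes)

print(round(naive_mean, 4))     # 0.4667
print(round(weighted_mean, 4))  # 0.281
```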

In [None]:
# Train the model

fit(10, 0.1, model, F.cross_entropy, train_loader, val_loader, metric=accuracy)

Epoch [1/10], Loss: 0.3540, Accuracy: 0.9030
Epoch [2/10], Loss: 0.2965, Accuracy: 0.9189
Epoch [3/10], Loss: 0.2647, Accuracy: 0.9272
Epoch [4/10], Loss: 0.2426, Accuracy: 0.9327
Epoch [5/10], Loss: 0.2224, Accuracy: 0.9393
Epoch [6/10], Loss: 0.2028, Accuracy: 0.9433
Epoch [7/10], Loss: 0.1944, Accuracy: 0.9455
Epoch [8/10], Loss: 0.1805, Accuracy: 0.9491
Epoch [9/10], Loss: 0.1735, Accuracy: 0.9515
Epoch [10/10], Loss: 0.1669, Accuracy: 0.9554
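
Note that `forward` returns raw scores (logits) without a softmax. That works because `F.cross_entropy` applies a log-softmax internally and then takes the negative log-probability of the correct class. The plain-Python sketch below reproduces that computation for a single sample. The function name and the logit values are made up for illustration.

```python
import math

def cross_entropy_from_logits(logits, target):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-softmax of the target class.
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]  # hypothetical scores for 3 classes
loss_top = cross_entropy_from_logits(logits, target=0)     # class 0 has the highest score
loss_bottom = cross_entropy_from_logits(logits, target=2)  # class 2 has the lowest score
print(loss_top < loss_bottom)  # True
```

The loss is small when the correct class already has the highest score and grows as the correct class is ranked lower.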


## Using GPU

In [None]:
# Step 1: Check if GPU is available
torch.cuda.is_available()

False

In [None]:
# Step 2. Select device

def get_default_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
    
device = get_default_device()
print(device)

cpu


In [None]:
# Step 3. Move data and model to device

def to_device(data, device):
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

# Wrapper to move data to a device
class DeviceDataLoader():
    def __init__(self, data_loader, device):
        self.dl = data_loader
        self.device = device
        
    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)
            
    def __len__(self):
        # Number of batches in the wrapped data loader
        return len(self.dl)
    
# Wrap our data loaders so that each batch is moved to the selected device
train_dl = DeviceDataLoader(train_loader, device)
val_dl = DeviceDataLoader(val_loader, device)

In [None]:
# Train using GPU

model2 = MNIST_MLP(input_size, hidden_size=32, output_size=num_classes)
# Move the new model (model2, not the earlier model) to the selected device
to_device(model2, device)

fit(12, 0.2, model2, F.cross_entropy, train_dl, val_dl, metric=accuracy)
fit(5, 0.1, model2, F.cross_entropy, train_dl, val_dl, metric=accuracy)

Epoch [1/12], Loss: 0.3371, Accuracy: 0.9001
Epoch [2/12], Loss: 0.2400, Accuracy: 0.9325
Epoch [3/12], Loss: 0.2150, Accuracy: 0.9393
Epoch [4/12], Loss: 0.1908, Accuracy: 0.9428
Epoch [5/12], Loss: 0.1814, Accuracy: 0.9487
Epoch [6/12], Loss: 0.1704, Accuracy: 0.9514
Epoch [7/12], Loss: 0.1611, Accuracy: 0.9535
Epoch [8/12], Loss: 0.1453, Accuracy: 0.9578
Epoch [9/12], Loss: 0.1470, Accuracy: 0.9577
Epoch [10/12], Loss: 0.1489, Accuracy: 0.9569
Epoch [11/12], Loss: 0.1461, Accuracy: 0.9585
Epoch [12/12], Loss: 0.1360, Accuracy: 0.9617
Epoch [1/5], Loss: 0.1315, Accuracy: 0.9608
Epoch [2/5], Loss: 0.1290, Accuracy: 0.9634
Epoch [3/5], Loss: 0.1295, Accuracy: 0.9621
Epoch [4/5], Loss: 0.1271, Accuracy: 0.9623
Epoch [5/5], Loss: 0.1278, Accuracy: 0.9623


In [None]:
# Test

# Expected accuracy: 97%
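# Wrap the test loader too, so its batches end up on the same device as model2
test_loader = DeviceDataLoader(DataLoader(test_set, batch_size), device)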
result = evaluate(model2, F.cross_entropy, test_loader, accuracy)
result

(0.11287384209805168, 10000, 0.9672)