<div>
    <img src="images/emlyon.png" style="height:60px; float:left; padding-right:10px; margin-top:5px" />
    <span>
        <h1 style="padding-bottom:5px;"> Introduction to Deep Learning </h1>
        <a href="https://masters.em-lyon.com/fr/msc-in-data-science-artificial-intelligence-strategy">[DSAIS]</a> MSc in Data Science & Artificial Intelligence Strategy <br/>
         Paris | © Saeed VARASTEH
    </span>
</div>

## Lecture 03 : Model Training: Regression

In this lecture we're going to cover a standard PyTorch training workflow, specifically:

- Getting data ready
- Building the model
- Fitting the model to data (training)
- Evaluating the model
- Making predictions

on our previous simple regression model.

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

np.random.seed(42)
torch.manual_seed(42)
device = 'cpu'

### Data

In [None]:
# Data Generation
X = np.random.rand(100, 1)
y = 2 * X + 1. + .1 * np.random.randn(100, 1)

# Shuffles the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:80]
# Uses the remaining 20 indices for test
test_idx = idx[80:]

# Generates train and test sets
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Visualize data
fig, ax = plt.subplots(1,2, figsize=(10,4))
ax[0].scatter(X_train,y_train, c="b", label="train data"); ax[0].legend();
ax[1].scatter(X_test,y_test, c="r", label="test data"); ax[1].legend();
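
Because the data is generated from y = 2x + 1 plus a small amount of Gaussian noise, a trained model should end up with a weight near 2 and a bias near 1. As a rough sanity check, the sketch below fits a line with NumPy's least-squares `np.polyfit`. It regenerates the data with the same seed so that it runs on its own, and the names `slope` and `intercept` are illustrative only.

```python
import numpy as np

# Regenerate the synthetic data exactly as above (same seed, same order of calls)
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X + 1. + .1 * np.random.randn(100, 1)

# Degree-1 least-squares fit; polyfit expects 1-D arrays
slope, intercept = np.polyfit(X.ravel(), y.ravel(), deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

These two numbers give a reference point for the values printed by `model.state_dict()` after training.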

### PyTorch Datasets

<div class="alert-success">
    In PyTorch, a dataset is represented by a regular Python class that inherits from the <b>Dataset</b> class.
</div>

You can think of it as a kind of Python list of tuples, each tuple corresponding to one data point (features, label).

The most fundamental methods it needs to implement are:

- `__init__(self)`: it takes whatever arguments are needed to build a list of tuples. This may be the name of a CSV file to load and process, two tensors (one for features, one for labels), or anything else the task requires.

- `__getitem__(self, index)`: it allows the dataset to be indexed so that it works like a list (`dataset[i]`). It must return a tuple (features, label) for the requested data point, for example the corresponding slices of pre-loaded arrays or tensors.

- `__len__(self)`: it should simply return the size of the whole dataset so that, whenever the dataset is sampled, indexing stays within its actual size.




Let’s build a simple custom dataset that takes two tensors as arguments: one for the features, one for the labels.

In [None]:
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = torch.from_numpy(data).float()
        self.labels = torch.from_numpy(labels).float()
        
    def __getitem__(self, index):
        return (self.data[index], self.labels[index])

    def __len__(self):
        return len(self.data)
    
train_dataset = MyDataset(X_train, y_train)
print(train_dataset[0])

<i>Why go through all this trouble to wrap a couple of tensors in a class?</i>

The answer is to use the __DataLoader__.
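
As an aside, for the simple case of wrapping tensors PyTorch also ships a ready-made `TensorDataset`. The sketch below uses small made-up tensors in place of `X_train` and `y_train`, and shows that it is indexed the same way as a hand-written dataset.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Made-up stand-ins for X_train / y_train (80 points, one feature)
features = torch.from_numpy(np.random.rand(80, 1)).float()
labels = 2 * features + 1

# TensorDataset indexes every tensor it is given along the first dimension
tensor_dataset = TensorDataset(features, labels)

x0, y0 = tensor_dataset[0]      # one (features, label) tuple
print(len(tensor_dataset), x0.shape, y0.shape)
```

Writing the class by hand remains useful when loading or preprocessing has to happen per item.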

### DataLoaders

Until now, we have used the __whole training data__ at every training step: it has been __batch gradient descent__ all along. This is fine for our ridiculously small dataset, but if we want to get serious about this, we must use __mini-batch gradient descent__. That means we need mini-batches, so we need to slice our dataset accordingly.

So we use PyTorch’s DataLoader class for this job.

<div class="alert-success">
 We tell the <b>DataLoader</b> which <b>dataset</b> to use (e.g. the one we just built in the previous section), the desired <b>mini-batch size</b>, and whether we’d like to shuffle it. That’s it!
</div>

Our loader will behave like an __iterator__, so we can __loop over it__ and __fetch a different mini-batch__ every time.

In [None]:
from torch.utils.data import DataLoader

train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True)
print(len(train_loader)) # How many mini batches we will get from the dataset

To retrieve a sample mini-batch, one can simply run the command below — it will return a list containing two tensors, one for the features, another one for the labels.

In [None]:
next(iter(train_loader))
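
To see what a loop over the loader receives, the sketch below iterates over a loader built on made-up data of the same size (80 points, batch size 16) and collects the shape of each mini-batch. With 80 points the batches divide evenly. When they do not, the last batch is simply smaller, unless `drop_last=True` is passed to the `DataLoader`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up data with the same size as the training set above
features = torch.rand(80, 1)
labels = 2 * features + 1
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

# Each iteration yields one (features, labels) mini-batch
batch_shapes = [(x_batch.shape, y_batch.shape) for x_batch, y_batch in loader]
print(len(loader), batch_shapes[0])

# 70 points with batch size 16 give 4 full batches plus one batch of 6
uneven = DataLoader(TensorDataset(features[:70], labels[:70]), batch_size=16)
last_sizes = [len(x_batch) for x_batch, _ in uneven]
print(last_sizes)
```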

### Model Training v1

How does this change our previous training loop? 

In [None]:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
    
    def forward(self, x):
        return self.linear(x)
  

In [None]:
model = MyModel().to(device)

print(model.state_dict())

lr = 1e-1
n_epochs = 1000

loss_fn = nn.MSELoss(reduction='mean')
optimizer = optim.SGD(model.parameters(), lr=lr)

for epoch in range(n_epochs):
    model.train() 
    
    for x_batch, y_batch in train_loader: # The mini batches loop
            
        x_batch = x_batch.to(device) # send them to device
        y_batch = y_batch.to(device)
        
        yhat = model(x_batch) # use the batches instead of the whole training data
        loss = loss_fn(yhat, y_batch)
        
        optimizer.zero_grad()
        loss.backward()    
        optimizer.step()
        
print(model.state_dict())

<div class="alert-info">
Two things are different now: not only do we have an inner loop that loads each mini-batch from our DataLoader but, more importantly, we now send <b>only one mini-batch to the device</b> at a time.
</div>

This is important, particularly for bigger datasets and when working with GPUs. Why?
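
One common reason is memory: GPU memory is usually much smaller than host RAM, so the full dataset can stay on the CPU while only the current mini-batch is copied to the device. A minimal sketch, assuming made-up data and falling back to the CPU when no GPU is available:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU when one is available, otherwise stay on the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Made-up data; the dataset itself lives in host (CPU) memory
features = torch.rand(80, 1)
labels = 2 * features + 1
loader = DataLoader(TensorDataset(features, labels), batch_size=16)

for x_batch, y_batch in loader:
    # Only these 16 rows are copied to the device at this point
    x_batch = x_batch.to(device)
    y_batch = y_batch.to(device)
    break

print(features.device, x_batch.device)
```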

### Train/Validation Split

So far, we’ve focused on the training data only. We built a dataset and a data loader for it. We could do the same for the validation/test data.

<div class="alert-warning">
Do not forget, for each subset of data (train, validation, test), we build a corresponding <b>DataLoader</b>.
</div>

The easiest way to carve a validation set out of your training dataset is PyTorch’s `random_split()` function, which takes a dataset and the lengths of the parts to produce.

In [None]:
len(train_dataset) # current

In [None]:
from torch.utils.data import random_split

train_dataset, val_dataset = random_split(train_dataset, [60, 20])

len(train_dataset), len(val_dataset) # after train/val split
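
The split is random, so it can change between runs unless the random state is fixed. The sketch below, which assumes a made-up 80-point dataset, passes a seeded `torch.Generator` to `random_split()` to make the split reproducible regardless of the global seed.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Made-up dataset with 80 points
full_dataset = TensorDataset(torch.rand(80, 1), torch.rand(80, 1))

# The same generator seed gives the same split every time
split_a = random_split(full_dataset, [60, 20], generator=torch.Generator().manual_seed(42))
split_b = random_split(full_dataset, [60, 20], generator=torch.Generator().manual_seed(42))

train_part, val_part = split_a
print(len(train_part), len(val_part))

# random_split returns Subset objects that store the selected indices
print(train_part.indices == split_b[0].indices)
```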

Finally, we create the __DataLoaders__:

In [None]:
train_loader = DataLoader(dataset=train_dataset, batch_size=16)
val_loader = DataLoader(dataset=val_dataset, batch_size=4)

Now that we have a data loader for our validation set, it makes sense to use it for evaluation.

### Model Training v2

We need to change the training loop to include the evaluation of our model, that is, computing the validation loss.

In [None]:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
    
    def forward(self, x):
        return self.linear(x)

In [None]:
model = MyModel().to(device)

print(model.state_dict())

train_losses = [] # To track the training loss
validation_losses = [] # To track the validation loss

lr = 1e-1
n_epochs = 1000

loss_fn = nn.MSELoss(reduction='mean')
optimizer = optim.SGD(model.parameters(), lr=lr)

for epoch in range(n_epochs):
    # Training Loop
    model.train() 
    train_loss = 0
    
    for x_batch, y_batch in train_loader: # The mini batches loop for train
            
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        
        yhat = model(x_batch)
        loss = loss_fn(yhat, y_batch)
        
        train_loss += loss.item()
        
        optimizer.zero_grad()
        loss.backward()    
        optimizer.step()
        
    train_loss /= len(train_loader)
    train_losses.append(train_loss) # keep track of the losses
    
    # Validation Loop
    model.eval()
    validation_loss = 0
    
    with torch.no_grad():
        for x_val, y_val in val_loader: # The mini batches loop for validation

            x_val = x_val.to(device)
            y_val = y_val.to(device)

            yhat = model(x_val)
            val_loss = loss_fn(yhat, y_val) # (prediction, target), same order as in training
            
            validation_loss += val_loss.item()
    
    validation_loss /= len(val_loader)
    validation_losses.append(validation_loss) # keep track of the losses
    
print(model.state_dict())
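
Note that dividing by `len(train_loader)` averages the per-batch means, which matches the per-sample mean only when all batches have the same size. With 60 training points and a batch size of 16, the last batch holds 12 points, so the two differ slightly. The sketch below shows a sample-weighted average instead. The made-up data and the untrained `nn.Linear` are assumptions used only to keep it self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.rand(60, 1)
labels = 2 * features + 1
loader = DataLoader(TensorDataset(features, labels), batch_size=16)

model = nn.Linear(1, 1)          # untrained stand-in model
loss_fn = nn.MSELoss(reduction='mean')

weighted_sum, n_samples = 0.0, 0
with torch.no_grad():
    for x_batch, y_batch in loader:
        batch_loss = loss_fn(model(x_batch), y_batch).item()
        weighted_sum += batch_loss * len(x_batch)   # undo the per-batch mean
        n_samples += len(x_batch)

epoch_loss = weighted_sum / n_samples
full_loss = loss_fn(model(features), labels).item()  # loss on all 60 points at once
print(epoch_loss, full_loss)
```

For loss curves like the ones in this lecture the difference is small, but it grows when the last batch is much smaller than the others.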

How have the losses changed?

In [None]:
plt.plot(train_losses,  label="train loss");
plt.plot(validation_losses,  label="val loss");
plt.legend()

That’s pretty much it, but there are two small, yet important, things to consider:

<div class="alert-warning">
    <code>torch.no_grad()</code>: even though it won’t make a difference in our simple model, it is good practice to wrap the validation inner loop in this <b>context manager</b>, which disables any gradient calculation that you may inadvertently trigger. Gradients belong in training, not in validation steps.
</div>

<div class="alert-danger">
    <code>eval()</code>: the only thing it does is set the model to evaluation mode (just as <code>train()</code> sets it to training mode), so the model can adjust its behavior for some operations, like Dropout.
</div>
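
Both points can be checked directly. The sketch below uses a small made-up model containing a `Dropout` layer (not the regression model above) to show that `eval()` switches dropout off and that outputs computed under `torch.no_grad()` carry no gradient history.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Made-up model with a Dropout layer, only to make the mode switch visible
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.rand(8, 4)

net.train()
train_out = net(x)               # dropout active: some activations zeroed, the rest rescaled

net.eval()
eval_out_1 = net(x)              # dropout disabled: the output is deterministic
eval_out_2 = net(x)

with torch.no_grad():
    no_grad_out = net(x)         # no computation graph is recorded

print(net.training, torch.equal(eval_out_1, eval_out_2))
print(train_out.requires_grad, no_grad_out.requires_grad)
```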

### Making Predictions

Finally, we can use our trained model to make predictions on test data.

In [None]:
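model.eval()
with torch.no_grad():  # no gradients are needed for inference
    y_pred = model(torch.from_numpy(X_test).float().to(device))
y_pred = y_pred.cpu().numpy()  # move to the CPU before converting to NumPy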

plt.scatter(X_test,y_test, c="b", label="actual data");
plt.scatter(X_test,y_pred, c="r", label="predicted data");
plt.legend()
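
Beyond the scatter plot, the test set can also be scored with the same loss used in training. In the minimal sketch below, the helper `evaluate` and the stand-in model and data are hypothetical and not part of the lecture code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, loss_fn, device='cpu'):
    """Return the per-sample average loss of `model` over `loader` (hypothetical helper)."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x_batch, y_batch in loader:
            x_batch, y_batch = x_batch.to(device), y_batch.to(device)
            loss = loss_fn(model(x_batch), y_batch)
            total += loss.item() * len(x_batch)
            count += len(x_batch)
    return total / count

# Stand-ins: a model set by hand to the true line y = 2x + 1, and noise-free test data
model = nn.Linear(1, 1)
with torch.no_grad():
    model.weight.fill_(2.0)
    model.bias.fill_(1.0)

X_test = torch.rand(20, 1)
test_loader = DataLoader(TensorDataset(X_test, 2 * X_test + 1), batch_size=4)
test_loss = evaluate(model, test_loader, nn.MSELoss())
print(test_loss)
```

With the lecture's trained model and noisy test data, the same call would be expected to return a small positive value on the order of the noise variance rather than zero.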

---

<div class="alert-info" style="background-color:#fff4e3; padding-bottom:22px; background-image:url(images/arrows.png); background-repeat:no-repeat; background-position: right; background-size: contain;">
    <img src="images/homework.png" style="height:60px; float:left; padding-right:10px;" />
    <span style="font-weight:bold; color:#db9425;">
        <h4 style="padding-top:25px;"> HOMEWORK 01 </h4>
    </span>
</div>

---