# Training Pipeline - Model, Loss, & Optimizer

In this tutorial, we will perform steps 3 and 4, as outlined in the previous tutorial. As a reminder, the steps are:

**Step 1**:

* Prediction: Manually
* Gradients Computation: Manually
* Loss Computation: Manually
* Parameters Updates: Manually

**Step 2**:

* Prediction: Manually
* Gradients Computation: `Autograd`
* Loss Computation: Manually
* Parameters Updates: Manually

**Step 3**:

* Prediction: Manually
* Gradients Computation: `Autograd`
* Loss Computation: `Pytorch Loss`
* Parameters Updates: `Pytorch Optimizer`

**Step 4**:

* Prediction: `Pytorch Model`
* Gradients Computation: `Autograd`
* Loss Computation: `Pytorch Loss`
* Parameters Updates: `Pytorch Optimizer`

So, we are going to replace the remaining manual computations with built-in `pytorch` functionality.

In `pytorch`, the general training pipeline looks like:

* Design the model (input size, output size, forward pass)
* Construct the loss and optimizer
* Training loop
    * Forward pass: Compute prediction
    * Backward pass: Compute the gradients
    * Update our parameters 

We will iterate this many times until it converges or reaches a certain number of iterations.
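
To make the three loop steps concrete, here is a minimal pure-Python sketch of the fully manual version (Step 1) for the model `y = w * x`. The data and hyperparameters mirror the ones used later in this tutorial; the helper names `predict`, `mse`, and `gradient` are only illustrative and do not come from `pytorch`.

```python
# Fully manual training loop for y = w * x (no PyTorch involved).
X = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # single trainable parameter
learning_rate = 0.01
n_iters = 100

def predict(x, w):
    # forward pass for a single sample
    return w * x

def mse(w):
    # mean squared error over all samples
    return sum((predict(xi, w) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

def gradient(w):
    # d(MSE)/dw = mean(2 * x * (w * x - y)), derived by hand
    return sum(2 * xi * (predict(xi, w) - yi) for xi, yi in zip(X, y)) / len(X)

for epoch in range(n_iters):
    loss_value = mse(w)            # forward pass + loss
    grad = gradient(w)             # backward pass
    w -= learning_rate * grad      # parameter update

print(f'w = {w:.3f}, f(5) = {predict(5, w):.3f}')
```

Each of the three lines inside the loop is what `pytorch` takes over in the steps that follow: the loss function, `Autograd`, and the optimizer.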

First, let's import `torch` and `torch.nn`. These give us access to the built-in loss functions and models, while the optimizers live in `torch.optim`.

In [1]:
import torch
import torch.nn as nn

Let's define the training samples and parameters.

In [2]:
#making training samples with pytorch
X = torch.tensor([1,2,3,4], dtype = torch.float32)
y = torch.tensor([2,4,6,8], dtype = torch.float32)

#initializing the weight at 0 - we need to set requires_grad = True
w = torch.tensor(0.0, dtype = torch.float32, requires_grad = True)

Let's define the model prediction/forward pass.

In [3]:
#forward pass
def forward(x):
    return w * x

In the previous video, we defined the loss by hand as shown below. Here we no longer need this function, because we will use a loss provided by `pytorch` instead.

`def loss(y, y_predicted):
    return ((y_predicted - y)**2).mean()`
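
As a quick sanity check, the sketch below compares this hand-written loss with `nn.MSELoss` on the same inputs. The prediction values are made up for illustration, and `manual_mse` is just a renamed copy of the function above.

```python
import torch
import torch.nn as nn

def manual_mse(y, y_predicted):
    # the hand-written loss from the previous step
    return ((y_predicted - y) ** 2).mean()

y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
y_predicted = torch.tensor([1, 3, 5, 9], dtype=torch.float32)  # made-up predictions

builtin_mse = nn.MSELoss()  # default reduction is 'mean'

manual_value = manual_mse(y, y_predicted)
builtin_value = builtin_mse(y_predicted, y)  # call signature is (input, target)

print(manual_value.item(), builtin_value.item())
print(torch.isclose(manual_value, builtin_value).item())
```

Both calls average the squared differences, so they should agree on any pair of tensors with matching shapes.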

Now, let's make the training pipeline.

In [8]:
#making training samples with pytorch
X = torch.tensor([1,2,3,4], dtype = torch.float32)
y = torch.tensor([2,4,6,8], dtype = torch.float32)

#initializing the weight at 0 - we need to set requires_grad = True
w = torch.tensor(0.0, dtype = torch.float32, requires_grad = True)

#learning rate
learning_rate = 0.01
n_iters = 100

#we still need to define the loss - we can use a loss that is provided by pytorch
loss = nn.MSELoss()

#we also need to define an optimizer from pytorch - we need to input the parameters to optimize as a [list] and the lr
optimizer = torch.optim.SGD([w], lr = learning_rate)

#now we do the training loop
for epoch in range(n_iters):
    
    #prediction = forward pass
    y_pred = forward(X)
    
    #loss - nn.MSELoss is callable; by convention it takes the predicted values (y_pred) first
    # and the actual values (y) second
    l = loss(y_pred, y)
    
    #gradients with respect to w = backward pass 
    l.backward() #gradient of the loss with respect to w
    
    #we do not need to manually update our weights anymore, so we can simply say:
    optimizer.step() #this does an optimization step
    
    #we still have to empty the gradient after our optimization step
    optimizer.zero_grad()
    
    #printing some information
    if epoch % 10 == 0:
        print(f'epoch {epoch + 1}: w = {w:.3f}, loss = {l:.8f}')
        
print(f' Prediction after training: f(5) = {forward(5):.3f}')

epoch 1: w = 0.300, loss = 30.00000000
epoch 11: w = 1.665, loss = 1.16278565
epoch 21: w = 1.934, loss = 0.04506890
epoch 31: w = 1.987, loss = 0.00174685
epoch 41: w = 1.997, loss = 0.00006770
epoch 51: w = 1.999, loss = 0.00000262
epoch 61: w = 2.000, loss = 0.00000010
epoch 71: w = 2.000, loss = 0.00000000
epoch 81: w = 2.000, loss = 0.00000000
epoch 91: w = 2.000, loss = 0.00000000
 Prediction after training: f(5) = 10.000
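
To see what `optimizer.step()` and `optimizer.zero_grad()` do in the loop above, the following sketch performs one update with the optimizer and the same update by hand, then shows that `backward()` accumulates gradients when they are not reset. The variable names `w_opt`, `w_man`, and `w_acc` are illustrative only, and the comparison assumes plain SGD without momentum, as used in this tutorial.

```python
import torch
import torch.nn as nn

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
learning_rate = 0.01
loss = nn.MSELoss()

# --- one update done by the optimizer ---
w_opt = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w_opt], lr=learning_rate)
loss(w_opt * X, y).backward()
optimizer.step()
optimizer.zero_grad()

# --- the same update written by hand ---
w_man = torch.tensor(0.0, requires_grad=True)
loss(w_man * X, y).backward()
with torch.no_grad():                  # the update itself must not be tracked
    w_man -= learning_rate * w_man.grad
w_man.grad.zero_()

print(w_opt.item(), w_man.item())      # the two weights agree up to rounding

# --- why gradients must be reset: backward() adds to .grad ---
w_acc = torch.tensor(0.0, requires_grad=True)
loss(w_acc * X, y).backward()
first_grad = w_acc.grad.clone()
loss(w_acc * X, y).backward()          # no reset in between
print(first_grad.item(), w_acc.grad.item())  # second value is twice the first
```

The first update moves `w` from 0 to about 0.3, which matches the first line of the training output above.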


As we can see, our prediction is good after the training. Now, let's continue with step 4 and replace our manually written forward method with a `pytorch` model.

Now, we do not need our forward pass, which we defined as:

`def forward(x):
    return w * x`
    
And we no longer need to define the weight `w` because `pytorch` knows the parameters and will initialize them. 
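
As a small illustration of what the model holds, the sketch below creates an `nn.Linear` layer and inspects its parameters. The seed is an assumption that only makes the random initialization repeatable; `manual_output` is an illustrative name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so the initialization is repeatable

model = nn.Linear(in_features=1, out_features=1)

# nn.Linear stores a weight of shape (out_features, in_features)
# and a bias of shape (out_features,)
print(model.weight.shape, model.bias.shape)

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)

# calling the model computes X @ weight.T + bias for every row of X
manual_output = X @ model.weight.T + model.bias
model_output = model(X)

print(torch.allclose(manual_output, model_output))
```

Note that the layer has a bias in addition to the weight, so the model is now `w * x + b` rather than the `w * x` we wrote by hand.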

In [30]:
#making training samples with pytorch
#X now has to be a 2D tensor, where the rows are the samples and the columns are the features - in this example
#X has 4 samples and 1 feature
X = torch.tensor([[1],[2],[3],[4]], dtype = torch.float32)
y = torch.tensor([[2],[4],[6],[8]], dtype = torch.float32)

#we will define the number of samples and features - we have four samples and one feature
n_samples, n_features = X.shape

input_size = n_features #this is 1
output_size = y.shape[1] #this is also 1 - one output value per sample

#defining the model for the forward pass
#nn.Linear needs the input and output sizes, which is why X and y were given a 2D shape above
model = nn.Linear(in_features = input_size, out_features = output_size)

#test tensor
X_test = torch.tensor([5], dtype = torch.float32)

#the model is called like a function, but the input cannot be a plain float - it must be a tensor
#the model is still untrained here, so this is the prediction before training
print(f' Prediction before training: f(5) = {model(X_test).item():.3f}') #we can call .item() to get the actual float value

#learning rate
learning_rate = 0.01
n_iters = 100

#we still need to define the loss - we can use a loss that is provided by pytorch
loss = nn.MSELoss()

#the model now owns the parameters, so we no longer have w defined and pass model.parameters() instead
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)

#now we do the training loop
for epoch in range(n_iters):

    #prediction = forward pass - changing this to the model
    y_pred = model(X)

    #loss - by convention it takes the predicted values (y_pred) first and the actual values (y) second
    l = loss(y_pred, y)

    #gradients with respect to the model parameters = backward pass
    l.backward() #gradient of the loss with respect to w and b

    #we do not need to manually update our weights anymore, so we can simply say:
    optimizer.step() #this does an optimization step

    #we still have to empty the gradient after our optimization step
    optimizer.zero_grad()

    #printing some information
    if epoch % 10 == 0:
        [w, b] = model.parameters() #this will unpack the parameters
        print(f'epoch {epoch + 1}: w = {w[0][0].item():.3f}, loss = {l:.8f}') #we can call .item() to get the actual float value

print(f' Prediction after training: f(5) = {model(X_test).item():.3f}')

 Prediction before training: f(5) = 0.495
epoch 1: w = 0.232, loss = 24.61823654
epoch 11: w = 1.375, loss = 0.87271494
epoch 21: w = 1.569, loss = 0.24463740
epoch 31: w = 1.610, loss = 0.21546268
epoch 41: w = 1.626, loss = 0.20253506
epoch 51: w = 1.637, loss = 0.19073656
epoch 61: w = 1.648, loss = 0.17963444
epoch 71: w = 1.659, loss = 0.16917877
epoch 81: w = 1.669, loss = 0.15933174
epoch 91: w = 1.679, loss = 0.15005785
 Prediction after training: f(5) = 9.356


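As we can see, the prediction is a bit off. The model now has two randomly initialized parameters (the weight and a bias), and the loss is still decreasing after 100 iterations, so training has not converged yet. We can change how the parameters are initialized, play with the number of iterations or the learning rate, and also change the optimizer.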
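
As one possible way to tighten the result, the sketch below reruns the same model with a larger learning rate and more iterations. The values `0.05` and `2000` and the seed are assumptions chosen for illustration, not settings from the original run.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: any seed works, this only makes the run repeatable

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
X_test = torch.tensor([5], dtype=torch.float32)

model = nn.Linear(in_features=1, out_features=1)
loss = nn.MSELoss()

# assumed values: larger learning rate and more iterations than in the cell above
learning_rate = 0.05
n_iters = 2000
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(n_iters):
    l = loss(model(X), y)   # forward pass + loss
    l.backward()            # backward pass
    optimizer.step()        # parameter update
    optimizer.zero_grad()   # reset gradients for the next iteration

w, b = model.parameters()
print(f'w = {w.item():.3f}, b = {b.item():.3f}, f(5) = {model(X_test).item():.3f}')
```

With these settings the weight should approach 2 and the bias should approach 0, so the prediction for 5 ends up close to 10.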

As a note, this is how the tensor `X` looks before and after it is reshaped. We also show how to get the shape of a tensor (rows and columns).

In [31]:
#old
X = torch.tensor([1,2,3,4], dtype = torch.float32)
print(X)

#new
X = torch.tensor([[1],[2],[3],[4]], dtype = torch.float32)
print(X)

#to see the shape of a tensor, you can do:
list(X.shape)

#or
[*X.shape]

#or
[*X.size()]

tensor([1., 2., 3., 4.])
tensor([[1.],
        [2.],
        [3.],
        [4.]])


[4, 1]
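
Instead of retyping the data with nested brackets, an existing 1D tensor can also be reshaped. The sketch below shows two equivalent options; the names `X_flat`, `X_2d`, and `X_2d_alt` are illustrative.

```python
import torch

X_flat = torch.tensor([1, 2, 3, 4], dtype=torch.float32)  # shape (4,)

# -1 lets PyTorch infer the number of rows; 1 is the number of feature columns
X_2d = X_flat.view(-1, 1)

# unsqueeze(1) inserts a new dimension of size 1 at position 1, with the same result
X_2d_alt = X_flat.unsqueeze(1)

n_samples, n_features = X_2d.shape  # torch.Size unpacks like a tuple
print(n_samples, n_features)
print(torch.equal(X_2d, X_2d_alt))
```

Unpacking `X_2d.shape` directly avoids the intermediate `list(...)` conversion when only the row and column counts are needed.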

In this example, we used the model:

`model = nn.Linear(in_features = input_size, out_features = output_size)`

If we need something more flexible than a single built-in layer, we can write a custom model instead. We will do so in the example below.

In [32]:
#making training samples with pytorch
#X now has to be a 2D tensor, where the rows are the samples and the columns are the features - in this example
#X has 4 samples and 1 feature
X = torch.tensor([[1],[2],[3],[4]], dtype = torch.float32)
y = torch.tensor([[2],[4],[6],[8]], dtype = torch.float32)

#we will define the number of samples and features - we have four samples and one feature
n_samples, n_features = X.shape

input_size = n_features #this is 1
output_size = y.shape[1] #this is also 1 - one output value per sample

#previously, the model for the forward pass was the built-in layer:
#model = nn.Linear(in_features = input_size, out_features = output_size)

#writing the custom linear regression model - we have to derive this from nn.Module
class LinearRegression(nn.Module):

    def __init__(self, input_dim, output_dim):
        super().__init__()
        #define layers
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

#now, making a model
model = LinearRegression(input_size, output_size)

#test tensor
X_test = torch.tensor([5], dtype = torch.float32)

#the model is called like a function, but the input cannot be a plain float - it must be a tensor
#the model is still untrained here, so this is the prediction before training
print(f' Prediction before training: f(5) = {model(X_test).item():.3f}') #we can call .item() to get the actual float value

#learning rate
learning_rate = 0.01
n_iters = 100

#we still need to define the loss - we can use a loss that is provided by pytorch
loss = nn.MSELoss()

#the model now owns the parameters, so we no longer have w defined and pass model.parameters() instead
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)

#now we do the training loop
for epoch in range(n_iters):

    #prediction = forward pass - changing this to the model
    y_pred = model(X)

    #loss - by convention it takes the predicted values (y_pred) first and the actual values (y) second
    l = loss(y_pred, y)

    #gradients with respect to the model parameters = backward pass
    l.backward() #gradient of the loss with respect to w and b

    #we do not need to manually update our weights anymore, so we can simply say:
    optimizer.step() #this does an optimization step

    #we still have to empty the gradient after our optimization step
    optimizer.zero_grad()

    #printing some information
    if epoch % 10 == 0:
        [w, b] = model.parameters() #this will unpack the parameters
        print(f'epoch {epoch + 1}: w = {w[0][0].item():.3f}, loss = {l:.8f}') #we can call .item() to get the actual float value

print(f' Prediction after training: f(5) = {model(X_test).item():.3f}')

 Prediction before training: f(5) = 2.518
epoch 1: w = 0.699, loss = 16.40454865
epoch 11: w = 1.631, loss = 0.47620538
epoch 21: w = 1.786, loss = 0.06108698
epoch 31: w = 1.815, loss = 0.04750837
epoch 41: w = 1.824, loss = 0.04448386
epoch 51: w = 1.830, loss = 0.04188794
epoch 61: w = 1.835, loss = 0.03944966
epoch 71: w = 1.840, loss = 0.03715352
epoch 81: w = 1.845, loss = 0.03499096
epoch 91: w = 1.849, loss = 0.03295427
 Prediction after training: f(5) = 9.698
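
The reason `model.parameters()` works for the custom class is that `nn.Module` registers the parameters of any layer assigned as an attribute. The sketch below lists them by name; the class is a trimmed copy of the one above and `n_params` is an illustrative name.

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # assigning a layer as an attribute registers its parameters on the module
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

model = LinearRegression(1, 1)

# each entry is named after the attribute path inside the module
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

This is why the optimizer can be built from `model.parameters()` without listing `w` and `b` by hand.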


Now we have replaced all of our manual calculations with `pytorch` calculations. However, we still have to design the model and know which loss and optimizer we have to use. We do not have to worry about the underlying algorithms anymore, though. 