# Model optimization with automatic gradient computation 
<h3 span style='color:yellow'>This tutorial will guide you through the implementation of a basic linear regression optimization using the Autograd module.</h3>

<h3 span style='color:yellow'>The tutorial walks through four scenarios, moving step by step from fully manual to fully automatic optimization.</h3>

<h3 span style='color:yellow'>Case #1 manual operations.</h3>
<ul>
<li ><span style="color:yellow">Prediction:</span> manually.</li>
<li><span style="color:yellow">Loss computation:</span> manually.</li>
<li><span style="color:yellow">Gradient computation:</span> manually.</li>
<li><span style="color:yellow">Parameter updates:</span> manually.</li>
</ul>

<h3 span style='color:yellow'>Case #2 gradient computation by Autograd.</h3>
<ul>
<li ><span style="color:yellow">Prediction:</span> manually.</li>
<li><span style="color:yellow">Loss computation:</span> manually.</li>
<li><span style="color:lightgreen">Gradient computation:</span> Autograd.</li>
<li><span style="color:yellow">Parameter updates:</span> manually.</li>
</ul>

<h3 span style='color:yellow'>Case #3 all operations except the prediction are handled by PyTorch.</h3>
<ul>
<li ><span style="color:yellow">Prediction:</span> manually.</li>
<li><span style="color:lightgreen">Loss computation:</span> PyTorch loss.</li>
<li><span style="color:lightgreen">Gradient computation:</span> Autograd.</li>
<li><span style="color:lightgreen">Parameter updates:</span> PyTorch optimizer.</li>
</ul>
<h3 span style='color:yellow'>Case #4 automatic optimization and prediction.</h3>
<ul>
<li ><span style="color:lightgreen">Prediction:</span> PyTorch model.</li>
<li><span style="color:lightgreen">Loss computation:</span> PyTorch loss.</li>
<li><span style="color:lightgreen">Gradient computation:</span> Autograd.</li>
<li><span style="color:lightgreen">Parameter updates:</span> PyTorch optimizer.</li>
</ul>

<h1>Case #1: Manual operations.</h1>

In [55]:
import numpy as np
# Linear regression models the output as a linear combination of inputs and weights.
# In this example y = 2*x, so the optimal weight for the 'x' variable is 2.
x=np.array([1,2,3,4],dtype=np.float32)
y=np.array([2,4,6,8],dtype=np.float32)

In [47]:
# Weight Initialization.
w=0.0

In [48]:
# Model prediction
def forward(x):  # forward pass
    return w*x

In [49]:
# loss
def loss(y,y_predicted): # Considering the Mean Square Error (Loss).
    return ((y_predicted-y)**2).mean()

print(f'The prediction before training: {forward(5):.3f}')

The prediction before training: 0.000
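
The starting loss can be worked out by hand as a sanity check: with w = 0 every prediction is 0, so the MSE is simply the mean of y squared. A minimal sketch of that arithmetic (the name `initial_loss` is illustrative and not used elsewhere in this tutorial):

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.float32)
y = np.array([2, 4, 6, 8], dtype=np.float32)

w = 0.0                                          # same initial weight as above
y_predicted = w * x                              # every prediction is 0
initial_loss = ((y_predicted - y) ** 2).mean()   # mean of [4, 16, 36, 64]
print(initial_loss)
```

The value is 30, which is the loss reported for epoch 1 in the training log of this case.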


In [50]:
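# Gradient
# MSE loss: J = 1/N * sum((w*x - y)**2)
# dJ/dw    = 1/N * sum(2*x*(w*x - y))
# np.dot sums over the samples and returns a scalar, so this function
# returns the summed gradient N * dJ/dw (calling .mean() on that scalar
# would change nothing). Divide by len(x) for the exact MSE gradient.
def gradient(x,y,y_predicted):
    return np.dot(2*x, y_predicted-y)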
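
One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does that for the MSE loss at w = 0. The helper names `mse`, `analytic_grad` and `numeric_grad` are assumptions introduced for illustration. It also shows that the `np.dot` form used in `gradient` above is the summed gradient, which is N times the mean form.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.float64)
y = np.array([2, 4, 6, 8], dtype=np.float64)

def mse(w):
    """Mean squared error of the model w*x."""
    return ((w * x - y) ** 2).mean()

def analytic_grad(w):
    """dJ/dw = (1/N) * sum(2 * x * (w*x - y))."""
    return np.mean(2 * x * (w * x - y))

def numeric_grad(w, eps=1e-6):
    """Central finite-difference estimate of dJ/dw."""
    return (mse(w + eps) - mse(w - eps)) / (2 * eps)

w0 = 0.0
g_analytic = analytic_grad(w0)
g_numeric = numeric_grad(w0)
g_sum = np.dot(2 * x, w0 * x - y)   # summed form: no division by N

print(g_analytic, g_numeric, g_sum)
```

The analytic and numeric values agree, while the summed form is four times larger because there are four samples. In practice this difference acts like a larger effective learning rate.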


In [51]:
# Training procedure
LR=0.005
N_ITER=20

for epoch in range(N_ITER):
    #prediction
    y_pred=forward(x)
    #loss
    l=loss(y,y_pred)
    #gradient
    dw=gradient(x,y,y_pred)
    # Weight update: go in the negative direction of the gradient.
    w-=LR*dw
    if epoch % 1==0:   # Prints at every epoch; use % 2 to print at every second epoch.
        print(f'epoch {epoch+1}: w={w:.3f}, loss={l:.3f}')
        

epoch 1: w=0.600, loss=30.000
epoch 2: w=1.020, loss=14.700
epoch 3: w=1.314, loss=7.203
epoch 4: w=1.520, loss=3.529
epoch 5: w=1.664, loss=1.729
epoch 6: w=1.765, loss=0.847
epoch 7: w=1.835, loss=0.415
epoch 8: w=1.885, loss=0.203
epoch 9: w=1.919, loss=0.100
epoch 10: w=1.944, loss=0.049
epoch 11: w=1.960, loss=0.024
epoch 12: w=1.972, loss=0.012
epoch 13: w=1.981, loss=0.006
epoch 14: w=1.986, loss=0.003
epoch 15: w=1.991, loss=0.001
epoch 16: w=1.993, loss=0.001
epoch 17: w=1.995, loss=0.000
epoch 18: w=1.997, loss=0.000
epoch 19: w=1.998, loss=0.000
epoch 20: w=1.998, loss=0.000
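
The log above can be cross-checked analytically. Because y = 2x, the update `w -= LR * np.dot(2*x, w*x - y)` simplifies to `w += LR * 2 * sum(x**2) * (2 - w)`, so the distance to the optimum shrinks by the constant factor 1 - 2 * LR * sum(x**2) = 0.7 at every step. A small sketch under that assumption (the helper `w_after` is hypothetical and only illustrates the closed form):

```python
def w_after(k, lr=0.005, w_star=2.0, sum_x2=30.0):
    """Weight after k gradient steps starting from w = 0.

    Assumes the summed-gradient update used in this case, with
    sum_x2 = 1**2 + 2**2 + 3**2 + 4**2 = 30 and optimum w_star = 2.
    """
    rate = 1.0 - lr * 2.0 * sum_x2      # contraction factor per step (0.7 here)
    return w_star * (1.0 - rate ** k)

for k in (1, 2, 3, 20):
    print(k, round(w_after(k), 3))
```

The values for steps 1, 2, 3 and 20 line up with the `w` column printed during training.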


In [53]:
# Inference
data=np.array([5,3,4])
forward(data)

array([9.99202076, 5.99521245, 7.9936166 ])

<h1>Case #2: Gradient calculation using Autograd</h1>

In [57]:
import torch
x=torch.tensor([1,2,3,4],dtype=torch.float32)
y=torch.tensor([2,4,6,8],dtype=torch.float32)
w=torch.tensor(0.0,dtype=torch.float32,requires_grad=True)

In [58]:
# Model prediction
def forward(x):  # forward pass
    return w*x

# loss
def loss(y,y_predicted): # Considering the Mean Square Error (Loss)
    return ((y_predicted-y)**2).mean()

print(f'The prediction before training: {forward(5):.3f}')

The prediction before training: 0.000


In [59]:
LR=0.01
N_ITER=20
# Same training loop as in Case #1, except that Autograd computes the gradient.
for epoch in range(N_ITER):
    # prediction
    y_pred=forward(x)
    # Loss
    l=loss(y,y_pred)
    # Gradient
    l.backward()  # dl/dw
    
    # Update rule: the weight update itself must not be tracked in the computational graph.
    with torch.no_grad():
        w-=LR*w.grad
    
    # Reset the gradient: backward() accumulates into w.grad, so zero it before the next iteration.
    w.grad.zero_()
    
    if epoch %2==0:
        print(f'epoch {epoch+1}: w={w:.3f}, loss={l:.3f}')
        
    

epoch 1: w=0.300, loss=30.000
epoch 3: w=0.772, loss=15.660
epoch 5: w=1.113, loss=8.175
epoch 7: w=1.359, loss=4.267
epoch 9: w=1.537, loss=2.228
epoch 11: w=1.665, loss=1.163
epoch 13: w=1.758, loss=0.607
epoch 15: w=1.825, loss=0.317
epoch 17: w=1.874, loss=0.165
epoch 19: w=1.909, loss=0.086
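
The call to `w.grad.zero_()` in the loop matters because `backward()` adds to `w.grad` instead of overwriting it. A minimal sketch of that accumulation, using the same data and initial weight (the variable names `first` and `second` are illustrative):

```python
import torch

x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# First backward pass: w.grad holds dl/dw.
l = ((w * x - y) ** 2).mean()
l.backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added on top.
l = ((w * x - y) ** 2).mean()
l.backward()
second = w.grad.item()

# Resetting in place brings the stored gradient back to zero.
w.grad.zero_()
print(first, second, w.grad.item())
```

Without the reset, every iteration would step along the sum of all past gradients rather than the current one.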


In [61]:
# Inference
data=torch.tensor([5,3,4])
forward(data)

tensor([9.6124, 5.7674, 7.6899], grad_fn=<MulBackward0>)
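
The `grad_fn=<MulBackward0>` in the output shows that the inference call is still being recorded by Autograd. For pure inference this bookkeeping is unnecessary, and wrapping the call in `torch.no_grad()` is one common way to avoid it. A small sketch, assuming a stand-in weight of 1.9 in place of the trained value:

```python
import torch

# Stand-in for the trained weight; the exact value is an assumption.
w = torch.tensor(1.9, dtype=torch.float32, requires_grad=True)
data = torch.tensor([5.0, 3.0, 4.0])

tracked = w * data            # recorded in the computational graph

with torch.no_grad():
    pred = w * data           # plain values, no graph is built

print(tracked.requires_grad, pred.requires_grad)
```

Calling `.detach()` on the tracked result is an alternative when the prediction has already been computed.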

<h1>Case #3: All operations, except the prediction, are performed by PyTorch.</h1>

<h3 span style='color:yellow'>Replace the manually computed loss and parameter updates with PyTorch's loss and optimizer classes.</h3>

<h3 span style='color:yellow'>PyTorch training pipeline:</h3>
<ul>
<li ><span style="color:yellow">Importing libraries.</span></li>
<li><span style="color:yellow">Design the model: specify the number of inputs and outputs, and define the forward pass with its operations and layers.</span></li>
<li><span style="color:yellow">Construct loss and optimizer.</span></li>
<li><span style="color:yellow">Implement the training loop which includes conducting a forward pass to compute the prediction, carrying out a backward pass to compute the gradients, and continually updating the weights until convergence is achieved.</span></li>
</ul>


In [62]:
import torch
import torch.nn as nn
X=torch.tensor([1,2,3,4],dtype=torch.float32)
y=torch.tensor([2,4,6,8],dtype=torch.float32)
w=torch.tensor(0.0,dtype=torch.float32,requires_grad=True)

In [68]:
# Model prediction
def forward(x):  # forward pass
    return w*x

In [69]:
# There's no need to define the loss manually; nn.MSELoss, created in the training cell, provides it.

In [70]:
# Define the loss from torch.nn and the optimizer from torch.optim.
LR=0.005
N_ITER=40
loss= nn.MSELoss()          # Callable module that computes the mean squared error
optimizer=torch.optim.SGD([w],lr=LR)

for epoch in range(N_ITER):
    # prediction
    y_pred=forward(X)
    # Loss: nn.MSELoss expects (input, target)
    l=loss(y_pred,y)
    # Gradient
    l.backward()  # dl/dw
    # Optimization step: weight update
    optimizer.step()
    # Reset the gradients before the next iteration
    optimizer.zero_grad()
    if epoch %2==0:
        print(f'epoch {epoch+1}: w={w:.3f}, loss={l:.3f}')

epoch 1: w=0.150, loss=30.000
epoch 3: w=0.417, loss=21.963
epoch 5: w=0.646, loss=16.079
epoch 7: w=0.841, loss=11.771
epoch 9: w=1.008, loss=8.618
epoch 11: w=1.152, loss=6.309
epoch 13: w=1.274, loss=4.619
epoch 15: w=1.379, loss=3.381
epoch 17: w=1.469, loss=2.475
epoch 19: w=1.545, loss=1.812
epoch 21: w=1.611, loss=1.327
epoch 23: w=1.667, loss=0.971
epoch 25: w=1.715, loss=0.711
epoch 27: w=1.756, loss=0.521
epoch 29: w=1.791, loss=0.381
epoch 31: w=1.822, loss=0.279
epoch 33: w=1.847, loss=0.204
epoch 35: w=1.869, loss=0.150
epoch 37: w=1.888, loss=0.109
epoch 39: w=1.904, loss=0.080
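
For plain SGD (no momentum or weight decay), `optimizer.step()` performs the same update as the manual rule `w -= LR * w.grad` from Case #2. The sketch below runs both side by side for five steps. The names `w_manual` and `w_opt` are introduced here for illustration only.

```python
import torch
import torch.nn as nn

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
LR = 0.005

w_manual = torch.tensor(0.0, requires_grad=True)   # updated by hand
w_opt = torch.tensor(0.0, requires_grad=True)      # updated by the optimizer
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD([w_opt], lr=LR)

for _ in range(5):
    # Manual path: explicit MSE, explicit update, explicit reset.
    l_manual = ((w_manual * X - y) ** 2).mean()
    l_manual.backward()
    with torch.no_grad():
        w_manual -= LR * w_manual.grad
    w_manual.grad.zero_()

    # PyTorch path: loss module and optimizer do the same work.
    l_opt = loss_fn(w_opt * X, y)
    l_opt.backward()
    optimizer.step()
    optimizer.zero_grad()

print(w_manual.item(), w_opt.item())
```

Both weights end at the same value, which also agrees with the `w` printed for epoch 5 in the log above.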


In [71]:
# Inference
data=torch.tensor([5,3,4])
forward(data)

tensor([9.5577, 5.7346, 7.6462], grad_fn=<MulBackward0>)

<h1>Case #4: Automatic optimization and prediction.</h1>
<h3 span style='color:yellow'>Replace the manually implemented forward method with a PyTorch model.</h3>

In [67]:
import torch
import torch.nn as nn
X=torch.tensor([1,2,3,4],dtype=torch.float32)
y=torch.tensor([2,4,6,8],dtype=torch.float32)
w=torch.tensor(0.0,dtype=torch.float32,requires_grad=True)

In [73]:
# There's no need to manually define the weight, loss, or forward method.
# model=nn.Linear() # A single linear layer, which is all a basic linear regression needs.
# nn.Linear takes the number of input and output features, so 'X' must be a 2D tensor with one row per sample and one column per feature.

X=torch.tensor([[1],[2],[3],[4]],dtype=torch.float32) #4 samples with 1 feature
y=torch.tensor([[2],[4],[6],[8]],dtype=torch.float32)
data_test=torch.tensor([[5],[3],[4]],dtype=torch.float32)

n_samples, n_features= X.shape
print(n_samples,n_features)
input_size=n_features 
output_size=y.shape[1]  # number of output features (1 here)
model=nn.Linear(input_size,output_size)
print(f'The prediction before training: {model(data_test)}')

4 1
The prediction before training: tensor([[-3.0947],
        [-2.2554],
        [-2.6750]], grad_fn=<AddmmBackward0>)
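
To make the shape requirement concrete, the sketch below inspects a fresh `nn.Linear(1, 1)` layer and reproduces its output by hand as `X @ weight.T + bias`. The seed and the name `layer` are assumptions added for illustration. The initial weights are random, so the numbers differ from the output shown above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                                 # only for repeatable output
layer = nn.Linear(in_features=1, out_features=1)

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)  # 4 samples, 1 feature

out = layer(X)                                       # shape (4, 1)
manual = X @ layer.weight.T + layer.bias             # same computation written out

print(layer.weight.shape, layer.bias.shape, out.shape)
```

The weight is stored as an (out_features, in_features) matrix and the bias as a vector of length out_features. This is why the training loop later indexes the weight as `w[0][0]`.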


In [75]:
# Define the loss from torch.nn and the optimizer from torch.optim.
LR=0.01
N_ITER=40
loss= nn.MSELoss()          # Callable module that computes the mean squared error
optimizer=torch.optim.SGD(model.parameters(),lr=LR)

for epoch in range(N_ITER):
    # prediction
    y_pred=model(X)
    # Loss: nn.MSELoss expects (input, target)
    l=loss(y_pred,y)
    # Gradient
    l.backward()  # dl/dw
    # Optimization step: weight update
    optimizer.step()
    # Reset the gradients before the next iteration
    optimizer.zero_grad()
    [w,b]=  model.parameters()  # Unpack weight and bias: 'w' has shape (1, 1) and 'b' has shape (1,).
    if epoch %2==0:
        print(f'epoch {epoch+1}: w={w[0][0].item():.3f}, loss={l:.3f}')    # w[0][0] selects the single weight; .item() returns it as a plain Python float.

epoch 1: w=2.045, loss=0.003
epoch 3: w=2.046, loss=0.003
epoch 5: w=2.046, loss=0.003
epoch 7: w=2.045, loss=0.003
epoch 9: w=2.045, loss=0.003
epoch 11: w=2.045, loss=0.003
epoch 13: w=2.045, loss=0.003
epoch 15: w=2.045, loss=0.003
epoch 17: w=2.044, loss=0.003
epoch 19: w=2.044, loss=0.003
epoch 21: w=2.044, loss=0.003
epoch 23: w=2.044, loss=0.003
epoch 25: w=2.043, loss=0.003
epoch 27: w=2.043, loss=0.003
epoch 29: w=2.043, loss=0.003
epoch 31: w=2.043, loss=0.003
epoch 33: w=2.042, loss=0.003
epoch 35: w=2.042, loss=0.003
epoch 37: w=2.042, loss=0.003
epoch 39: w=2.042, loss=0.003
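
The log above already starts near w = 2.045 with a loss of 0.003. That is what one would expect if the cell had been re-run on a model trained earlier, although this is an inference from the numbers and not something the notebook states. A sketch of one way to make runs repeatable: build the model inside a function and seed the random generator first. The helper `train_fresh` and its parameters are hypothetical.

```python
import torch
import torch.nn as nn

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

def train_fresh(seed=0, lr=0.01, n_iter=40):
    """Create a new nn.Linear model from a fixed seed and train it."""
    torch.manual_seed(seed)                 # fixes the random initial weights
    model = nn.Linear(1, 1)
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    losses = []
    for _ in range(n_iter):
        l = loss_fn(model(X), y)
        l.backward()
        optimizer.step()
        optimizer.zero_grad()
        losses.append(l.item())
    return model, losses

model_a, losses_a = train_fresh(seed=0)
model_b, losses_b = train_fresh(seed=0)
print(losses_a[0], losses_a[-1])
```

With the same seed, both runs produce the same loss curve, and each curve starts from an untrained model.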


In [76]:
print(f'The prediction after training: {model(data_test)}')


The prediction after training: tensor([[10.0857],
        [ 6.0025],
        [ 8.0441]], grad_fn=<AddmmBackward0>)


In [78]:
# The model above, nn.Linear(input_size, output_size), is a single layer. The same layer can be wrapped in a custom class that inherits from nn.Module:
class LinearRegression(nn.Module):
    def __init__(self, input_size,output_size):
        super().__init__()
        self.lin=nn.Linear(input_size,output_size)
    
    def forward(self,x):
        return self.lin(x)
    
model=LinearRegression(input_size,output_size)
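
A custom module registers the parameters of any layer assigned as an attribute, so `model.parameters()` keeps working with the optimizer unchanged. The sketch below lists the registered parameter names for a wrapper of this kind. The class is redefined here so the block is self-contained, and the names `names` and `out` are illustrative.

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    """Thin wrapper around a single nn.Linear layer."""
    def __init__(self, input_size, output_size):
        super().__init__()
        self.lin = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.lin(x)

model = LinearRegression(1, 1)

# Parameter names are prefixed with the attribute name of the sub-layer.
names = [name for name, _ in model.named_parameters()]
out = model(torch.ones(3, 1))        # calling the model runs forward()

print(names, out.shape)
```

The class form becomes useful once the model needs more than one layer or custom logic in `forward`.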

In [79]:
# Define the loss from torch.nn and the optimizer from torch.optim.
LR=0.01
N_ITER=40
loss= nn.MSELoss()          # Callable module that computes the mean squared error
optimizer=torch.optim.SGD(model.parameters(),lr=LR)

for epoch in range(N_ITER):
    # prediction
    y_pred=model(X)
    # Loss: nn.MSELoss expects (input, target)
    l=loss(y_pred,y)
    # Gradient
    l.backward()  # dl/dw
    # Optimization step: weight update
    optimizer.step()
    # Reset the gradients before the next iteration
    optimizer.zero_grad()
    [w,b]=  model.parameters()  # Unpack weight and bias: 'w' has shape (1, 1) and 'b' has shape (1,).
    if epoch %2==0:
        print(f'epoch {epoch+1}: w={w[0][0].item():.3f}, loss={l:.3f}')    # w[0][0] selects the single weight; .item() returns it as a plain Python float.

epoch 1: w=1.095, loss=3.617
epoch 3: w=1.251, loss=1.861
epoch 5: w=1.360, loss=1.014
epoch 7: w=1.437, loss=0.605
epoch 9: w=1.490, loss=0.406
epoch 11: w=1.528, loss=0.309
epoch 13: w=1.555, loss=0.261
epoch 15: w=1.575, loss=0.237
epoch 17: w=1.589, loss=0.224
epoch 19: w=1.600, loss=0.216
epoch 21: w=1.608, loss=0.211
epoch 23: w=1.614, loss=0.208
epoch 25: w=1.619, loss=0.205
epoch 27: w=1.623, loss=0.202
epoch 29: w=1.627, loss=0.199
epoch 31: w=1.630, loss=0.197
epoch 33: w=1.633, loss=0.195
epoch 35: w=1.635, loss=0.192
epoch 37: w=1.638, loss=0.190
epoch 39: w=1.640, loss=0.188


In [80]:
print(f'The prediction after training: {model(data_test)}')


The prediction after training: tensor([[9.2597],
        [5.9772],
        [7.6184]], grad_fn=<AddmmBackward0>)
