# Minimizing the Loss

Suppose the loss function is $x^2+3$, where $x$ is the neural network's single weight. You can find the value of $x$ that minimizes the loss by gradient descent: start from an arbitrary value (say $x = 2$) and repeatedly step in the direction opposite the gradient.
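
As a quick worked derivation of what one descent step does to this loss, the gradient and the update can be written out by hand. The symbols $L$ (the loss) and $\eta$ (the learning rate) are introduced here only for illustration; the code in this section uses a learning rate of 0.01.

$$
\frac{dL}{dx} = \frac{d}{dx}\left(x^2 + 3\right) = 2x, \qquad x_{\text{new}} = x - \eta \cdot 2x = (1 - 2\eta)\,x
$$

For $x = 2$ and $\eta = 0.01$, the gradient is $4$ and the next weight is $2 - 0.01 \cdot 4 = 1.96$. The loss is smallest at $x = 0$, where it equals $3$.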

In [8]:
import torch

# Let's say the weight = 2
weight = torch.nn.Parameter(torch.tensor([[2.0]]))

print("Weight = " + str(weight.item()))

Weight = 2.0


We assume the weight to be 2.

We can now perform the forward and backward passes below as many times as we want.

The forward pass involves the computation of the loss from the training data and the current parameters.

The backward pass is performed automatically by PyTorch when you call loss.backward().

PyTorch computes the gradient of the loss with respect to every parameter that requires gradients.

These gradients are added to each parameter's 'grad' attribute, which is why the code below resets it to zero before each backward pass.
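
The way gradients land in 'grad' can be checked with a minimal sketch. It assumes a stand-alone parameter named `w` (a hypothetical name, separate from the `weight` used in this section) and shows that calling backward() twice without resetting adds the second gradient to the first:

```python
import torch

# Hypothetical stand-alone parameter, not the `weight` used elsewhere in this section
w = torch.nn.Parameter(torch.tensor([[2.0]]))

# First backward pass: d(w^2 + 3)/dw = 2 * w = 4
loss = torch.mm(w, w) + 3
loss.backward()
first_grad = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the stored one
loss = torch.mm(w, w) + 3
loss.backward()
accumulated_grad = w.grad.item()

# Zeroing resets the stored gradient before the next pass
w.grad.zero_()
reset_grad = w.grad.item()

print(first_grad, accumulated_grad, reset_grad)
```

This accumulation is the reason a gradient descent loop usually clears the gradient before each backward pass.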

In [9]:
# Forward pass = computing the loss from the parameters and input data

# Reset the stored gradient (dloss/dweight) to 0, because backward() adds to it.
if weight.grad is not None:
    weight.grad.zero_()

# Toy loss function -> loss = weight^2 + 3
loss = torch.mm(weight, weight) + 3

print("Loss for weight = " + str(weight.item()) + " = " + str(loss.item()))

# Backward pass -> PyTorch does this automatically for you!

loss.backward()

gradient = weight.grad

# The gradient should be 2 * weight
print("Gradient for weight = " + str(weight.item()) + " = " + str(gradient.item()))

learning_rate = 0.01

# Gradient descent = moving down the loss curve by nudging the weight in the direction opposite the gradient.

# Now weight = weight - lr * dloss/dweight
# (done under no_grad so that the update itself is not tracked by autograd)
with torch.no_grad():
    weight -= learning_rate * gradient

print("The next iteration of the weight is " + str(weight.item()))

Loss for weight = 2.0 = 7.0
Gradient for weight = 2.0 = 4.0
The next iteration of the weight is 1.9600000381469727


You can run the above code as many times as you want. Each run moves the weight a little further down the gradient, toward the point where the loss is at its minimum.

You can also use the code below to run this loop 100 times.

In [10]:
learning_rate = 0.01

for i in range(100):

    # Reset the gradient left over from the previous backward pass
    if weight.grad is not None:
        weight.grad.zero_()

    loss = torch.mm(weight, weight) + 3

    if i % 10 == 0:
        print("The loss is now " + str(round(loss.item(), 3)))

    loss.backward()

    gradient = weight.grad

    if i % 10 == 0:
        print("\tThe gradient is now " + str(round(gradient.item(), 3)))

    # Gradient descent step, kept out of the autograd graph
    with torch.no_grad():
        weight -= learning_rate * gradient

    if i % 10 == 0:
        print("\t\tThe weight is now " + str(round(weight.item(), 3)))
    

The loss is now 6.842
	The gradient is now 3.92
		The weight is now 1.921
The loss is now 5.565
	The gradient is now 3.203
		The weight is now 1.569
The loss is now 4.712
	The gradient is now 2.617
		The weight is now 1.282
The loss is now 4.143
	The gradient is now 2.138
		The weight is now 1.048
The loss is now 3.763
	The gradient is now 1.747
		The weight is now 0.856
The loss is now 3.509
	The gradient is now 1.428
		The weight is now 0.699
The loss is now 3.34
	The gradient is now 1.166
		The weight is now 0.572
The loss is now 3.227
	The gradient is now 0.953
		The weight is now 0.467
The loss is now 3.152
	The gradient is now 0.779
		The weight is now 0.382
The loss is now 3.101
	The gradient is now 0.636
		The weight is now 0.312
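
The manual update in the loop above can also be handed to one of PyTorch's built-in optimizers. The following is a minimal sketch, not the method used in this section: it assumes torch.optim.SGD with the same learning rate of 0.01, and it starts again from a fresh weight of 2, so its numbers will not line up exactly with the output above.

```python
import torch

# Same toy setup as in this section: loss = weight^2 + 3, starting from weight = 2
weight = torch.nn.Parameter(torch.tensor([[2.0]]))

# Plain SGD applies weight = weight - lr * grad each time step() is called
optimizer = torch.optim.SGD([weight], lr=0.01)

for i in range(100):
    optimizer.zero_grad()                # clear the gradient from the previous step
    loss = torch.mm(weight, weight) + 3  # forward pass
    loss.backward()                      # compute dloss/dweight
    optimizer.step()                     # apply the gradient descent update

print(round(weight.item(), 3))
```

With plain SGD (no momentum or weight decay), each step performs the same update as the hand-written version, so the weight shrinks by the same factor per step.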


If you run the gradient descent loop for about 400 steps in total, the weight should get very close to 0, which is where the loss reaches its minimum value of 3.
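
The figure of roughly 400 steps can be sanity-checked without PyTorch. For this particular loss, every update multiplies the weight by the same factor, so the weight after any number of steps has a closed form. The sketch below assumes the starting weight of 2 and learning rate of 0.01 used in this section; the helper names `weight_after` and `steps_to_reach` are made up for illustration, and the result ignores float32 rounding.

```python
import math

learning_rate = 0.01
start_weight = 2.0

# Each step multiplies the weight by (1 - 2 * learning_rate), because
# weight - lr * (2 * weight) = (1 - 2 * lr) * weight
shrink = 1 - 2 * learning_rate

def weight_after(steps):
    """Weight after `steps` gradient descent updates, starting from start_weight."""
    return start_weight * shrink ** steps

def steps_to_reach(target):
    """Smallest number of steps after which the weight is at or below `target`."""
    return math.ceil(math.log(target / start_weight) / math.log(shrink))

w400 = weight_after(400)
print("Weight after 400 steps:", w400)
print("Loss after 400 steps:", w400 ** 2 + 3)
print("Steps needed to get the weight below 0.001:", steps_to_reach(0.001))
```

Because the weight shrinks geometrically, it approaches 0 quickly at first and then ever more slowly, without reaching 0 exactly.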