# Backpropagation Again

We run through the backpropagation procedure again.

This time, we assume that the features are $\begin{bmatrix}8 & -2\end{bmatrix}$ and the correct category is $0$, which is one-hot encoded as $\begin{bmatrix}1 & 0\end{bmatrix}$.

In this exercise, we compute the partial derivatives of the loss with respect to the weights and the bias manually, so we can learn the nuts and bolts of the backpropagation algorithm.
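The manual computation rests on a few standard identities for a linear layer followed by softmax and cross-entropy loss. As a short derivation sketch, write $x$ for the feature row vector, $W$ for the weights, $b$ for the bias, $c = xW + b$ for the logits, $p$ for the softmax of $c$, and $y$ for the one-hot target. The symbols $x$, $W$, $b$, $p$ and $y$ are notation assumed here for illustration; $c$ matches the name printed by the code.

$$\begin{aligned}
L &= -\sum_k y_k \log p_k, \qquad p_k = \frac{e^{c_k}}{\sum_j e^{c_j}} \\
\frac{\partial L}{\partial c_i} &= -\sum_k y_k \left(\delta_{ki} - p_i\right) = p_i \sum_k y_k - y_i = p_i - y_i \\
\frac{\partial L}{\partial W} &= x^\top (p - y) \\
\frac{\partial L}{\partial b} &= p - y
\end{aligned}$$

The second line uses $\partial \log p_k / \partial c_i = \delta_{ki} - p_i$ and the fact that the entries of a one-hot $y$ sum to $1$. The last two lines follow from the chain rule applied to $c = xW + b$.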

In [1]:
import torch
import torch.nn.functional as F

weights = torch.nn.Parameter(torch.Tensor([[1,2],[2,1]]))
print("Parameters (weights): "+str(weights.data))

bias = torch.nn.Parameter(torch.Tensor([[0,0]]))
print("Parameters (bias): "+str(bias.data))

features = torch.autograd.Variable(torch.Tensor([[8, -2]]))
print("Features: "+str(features.data))

target = torch.autograd.Variable(torch.LongTensor([0]))
#print("The correct class: "+str(target.data))

one_hot_target = torch.autograd.Variable(torch.Tensor([[1, 0]]))
print("One hot encoding of the correct class: "+str(one_hot_target.data))


Parameters (weights): 
 1  2
 2  1
[torch.FloatTensor of size 2x2]

Parameters (bias): 
 0  0
[torch.FloatTensor of size 1x2]

Features: 
 8 -2
[torch.FloatTensor of size 1x2]

One hot encoding of the correct class: 
 1  0
[torch.FloatTensor of size 1x2]



We assume the weights start at $\begin{bmatrix}1 & 2 \\ 2 & 1\end{bmatrix}$ and the bias at $\begin{bmatrix}0 & 0\end{bmatrix}$.

Training is assumed to proceed one data point at a time (batch size of 1).

For this iteration of training, the features are $\begin{bmatrix}8 & -2\end{bmatrix}$ and the correct category is $0$, which is one-hot encoded as $\begin{bmatrix}1 & 0\end{bmatrix}$.

In [2]:
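# Clear any gradients accumulated by an earlier backward pass
if weights.grad is not None:
    weights.grad.data.zero_()
if bias.grad is not None:
    bias.grad.data.zero_()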

# Forward pass

result = torch.mm(features, weights) + bias
print("c: "+str(result.data))

softmax_result = F.softmax(result, dim=1)
print("Softmax of c: "+str(softmax_result.data))

log_softmax_result = F.log_softmax(result, dim=1)
print("Log softmax of c: "+str(log_softmax_result.data))

loss_nll_softmax = F.nll_loss(log_softmax_result, target)
print("NLL + log_softmax loss: "+str(loss_nll_softmax.data))

loss = F.cross_entropy(result, target)
print("Cross entropy loss: "+str(loss.data))

# Backward pass

print("transpose of features: "+str(features.data.t()))

grad_c = (softmax_result.data - one_hot_target.data)
print("grad of loss wrt to c: "+str(grad_c))

grad_weights = features.data.t().mm(grad_c)
print("grad of loss wrt to weights: "+str(grad_weights))

grad_bias = grad_c
print("grad of loss wrt to bias: "+str(grad_bias))

print("\tThe manually computed gradient of the loss with respect to weights is "+str(grad_weights))

print("\tThe manually computed gradient of the loss with respect to bias is "+str(grad_bias))

# You can now update the weights and bias

learning_rate = 0.01

weights.data = weights.data - learning_rate * grad_weights

bias.data = bias.data - learning_rate * grad_bias

print("\tThe weights are now "+str(weights.data))

print("\tThe bias is now "+str(bias.data))


c: 
  4  14
[torch.FloatTensor of size 1x2]

Softmax of c: 
 0.0000  1.0000
[torch.FloatTensor of size 1x2]

Log softmax of c: 
-1.0000e+01 -4.5399e-05
[torch.FloatTensor of size 1x2]

NLL + log_softmax loss: 
 10.0000
[torch.FloatTensor of size 1]

Cross entropy loss: 
 10.0000
[torch.FloatTensor of size 1]

transpose of features: 
 8
-2
[torch.FloatTensor of size 2x1]

grad of loss wrt to c: 
-1.0000  1.0000
[torch.FloatTensor of size 1x2]

grad of loss wrt to weights: 
-7.9996  7.9996
 1.9999 -1.9999
[torch.FloatTensor of size 2x2]

grad of loss wrt to bias: 
-1.0000  1.0000
[torch.FloatTensor of size 1x2]

	The manually computed gradient of the loss with respect to weights is 
-7.9996  7.9996
 1.9999 -1.9999
[torch.FloatTensor of size 2x2]

	The manually computed gradient of the loss with respect to bias is 
-1.0000  1.0000
[torch.FloatTensor of size 1x2]

	The weights are now 
 1.0800  1.9200
 1.9800  1.0200
[torch.FloatTensor of size 2x2]

	The bias is now 
1.00000e-03 *
  9.9995 -9.9995
[torch.FloatTensor of size 1x2]

## Parameters

After this single update step with a learning rate of $0.01$, the weights are approximately

$$\begin{bmatrix}1.08 & 1.92 \\ 1.98 & 1.02\end{bmatrix}$$

and the bias is approximately

$$\begin{bmatrix}0.01 & -0.01\end{bmatrix}$$
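The update can also be reproduced with plain Python arithmetic, which makes each step of the forward and backward pass explicit. The following is a minimal sketch rather than the notebook's own code: the helper names `softmax` and `sgd_step` are hypothetical, and it assumes the same starting weights, bias, features, one-hot target, and learning rate of 0.01 used in cell In [2].

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(x, W, b, y, lr):
    """One manual forward/backward/update step for c = xW + b with cross-entropy loss."""
    n_in, n_out = len(W), len(W[0])
    # Forward pass: logits c = xW + b
    c = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    p = softmax(c)
    # Backward pass: dL/dc = p - y, dL/dW = x^T (p - y), dL/db = p - y
    grad_c = [p[j] - y[j] for j in range(n_out)]
    grad_W = [[x[i] * grad_c[j] for j in range(n_out)] for i in range(n_in)]
    # Gradient descent update
    new_W = [[W[i][j] - lr * grad_W[i][j] for j in range(n_out)] for i in range(n_in)]
    new_b = [b[j] - lr * grad_c[j] for j in range(n_out)]
    return c, new_W, new_b


c, new_W, new_b = sgd_step(
    x=[8.0, -2.0],
    W=[[1.0, 2.0], [2.0, 1.0]],
    b=[0.0, 0.0],
    y=[1.0, 0.0],
    lr=0.01,
)
print("logits:", c)
print("weights:", [[round(v, 4) for v in row] for row in new_W])
print("bias:", [round(v, 4) for v in new_b])
```

Rounded to four decimal places, the printed weights and bias agree with the matrices above.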


## Cross-Check Results

You can check the manual computation by comparing it with the gradients produced by PyTorch's automatic backward pass (autograd).

In [3]:
# Run PyTorch's automatic backward pass so its gradients can be compared with the manually computed ones

loss.backward(retain_graph=True)  # Setting retain_graph=True allows us to call loss.backward repeatedly in a local scope

grad_weights = weights.grad.data
print("grad of loss wrt to weights: "+str(grad_weights))

grad_bias = bias.grad.data
print("grad of loss wrt to bias: "+str(grad_bias))

if weights.grad is not None:
    weights.grad.data.zero_()
if bias.grad is not None:
    bias.grad.data.zero_()


grad of loss wrt to weights: 
-7.9996  7.9996
 1.9999 -1.9999
[torch.FloatTensor of size 2x2]

grad of loss wrt to bias: 
-1.0000  1.0000
[torch.FloatTensor of size 1x2]
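The cells above use `torch.autograd.Variable` and `.data`, which date from an older PyTorch release. As a hedged sketch of the same cross-check, assuming a recent PyTorch release in which tensors carry `requires_grad` directly and `F.one_hot` is available, the comparison can be made programmatic with `torch.allclose`. The names `logits`, `manual_grad_weights`, `manual_grad_bias`, `weights_match` and `bias_match` are introduced here for illustration.

```python
import torch
import torch.nn.functional as F

# Same starting values as the first cell, written with the current tensor API
weights = torch.tensor([[1.0, 2.0], [2.0, 1.0]], requires_grad=True)
bias = torch.tensor([[0.0, 0.0]], requires_grad=True)
features = torch.tensor([[8.0, -2.0]])
target = torch.tensor([0])
one_hot_target = F.one_hot(target, num_classes=2).float()

# Forward pass and automatic backward pass
logits = features @ weights + bias
loss = F.cross_entropy(logits, target)
loss.backward()

# Manual gradients, computed without recording them in the autograd graph
with torch.no_grad():
    grad_c = F.softmax(logits, dim=1) - one_hot_target
    manual_grad_weights = features.t() @ grad_c
    manual_grad_bias = grad_c

# Compare the manual gradients with the ones autograd stored in .grad
weights_match = torch.allclose(weights.grad, manual_grad_weights, atol=1e-6)
bias_match = torch.allclose(bias.grad, manual_grad_bias, atol=1e-6)
print("weights gradient matches:", weights_match)
print("bias gradient matches:", bias_match)
```

Comparing with a tolerance rather than exact equality is a deliberate choice here, since the manual and automatic paths may round floating-point intermediates differently.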

