
# AUTOMATIC DIFFERENTIATION WITH TORCH.AUTOGRAD

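When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter. To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd, which supports automatic computation of gradients for any computational graph.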

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:


In [10]:
import torch

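x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights: gradient tracking enabled at creation
b = torch.randn(3)  # bias: created without gradient tracking...
b.requires_grad_(True)  # ...and enabled afterwards, in place
z = torch.matmul(x, w) + b  # logits of the one-layer network
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)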

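This code defines the following computational graph, in which w and b are the parameters we want to optimize:

<img height="200" src="comp-graph.png" width="520"/>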

In [11]:
# A tensor produced by an operation stores a reference to its backward function in .grad_fn
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x137f7abe0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x137f7abb0>

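To make the role of grad_fn more concrete, the following sketch rebuilds the same tensors and inspects them. It relies on the is_leaf attribute and on the name() method of the backward node, neither of which appears in the text above, so treat it as an illustrative check rather than part of the original example.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

# Tensors created directly by the user are leaves and have no grad_fn.
print(w.is_leaf, w.grad_fn)

# Tensors produced by an operation record the function used in the backward pass.
print(z.is_leaf, z.grad_fn.name())
```

Gradients end up in the .grad attribute of leaf tensors such as w and b, while intermediate tensors such as z only carry the backward function that links them to the leaves.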

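To optimize the parameters of the network, we need the derivatives of the loss function with respect to those parameters, namely ∂loss/∂w and ∂loss/∂b under some fixed values of x and y.
To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad: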

In [12]:
loss.backward()
print(w.grad)
print(b.grad)


tensor([[0.3148, 0.3146, 0.0105],
        [0.3148, 0.3146, 0.0105],
        [0.3148, 0.3146, 0.0105],
        [0.3148, 0.3146, 0.0105],
        [0.3148, 0.3146, 0.0105]])
tensor([0.3148, 0.3146, 0.0105])

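The gradients returned by loss.backward() can be cross-checked by hand. For binary cross-entropy with logits averaged over N outputs, ∂loss/∂z = (sigmoid(z) − y) / N, and because z = x·w + b, it follows that ∂loss/∂b = ∂loss/∂z and ∂loss/∂w is the outer product of x with ∂loss/∂z. The sketch below assumes the same setup as above; the seed and the use of torch.outer are additions made only for this check.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, used only to make the run repeatable
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Closed-form gradient of the mean binary cross-entropy with respect to the logits.
dloss_dz = (torch.sigmoid(z.detach()) - y) / y.numel()

# z = x @ w + b, so dloss/db = dloss/dz and dloss/dw = outer(x, dloss/dz).
expected_b_grad = dloss_dz
expected_w_grad = torch.outer(x, dloss_dz)

print(torch.allclose(b.grad, expected_b_grad, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```

Since x is a vector of ones, every row of the outer product equals ∂loss/∂z, which is why all rows of w.grad printed above are identical to b.grad.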

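By default, every tensor with requires_grad=True tracks its computational history and supports gradient computation. When only the forward pass is needed, for example when applying a trained model to new data, tracking can be switched off by wrapping the computation in a torch.no_grad() block:

In [13]: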
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)


True
False

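A common place where torch.no_grad() appears is the parameter update itself: the gradients say how to adjust w and b, but the adjustment should not be recorded in the graph. The following is a minimal sketch of one gradient-descent step, not a full training loop. The learning rate, the seed, and the helper compute_loss are hypothetical and chosen only for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, used only to make the run repeatable
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
learning_rate = 0.1  # hypothetical value chosen for illustration


def compute_loss():
    # Hypothetical helper: forward pass of the one-layer network plus the loss.
    z = torch.matmul(x, w) + b
    return torch.nn.functional.binary_cross_entropy_with_logits(z, y)


loss_before = compute_loss()
loss_before.backward()

# The update must not be tracked, so it runs inside torch.no_grad().
with torch.no_grad():
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad

# Clear the stored gradients so the next backward pass starts from zero.
w.grad.zero_()
b.grad.zero_()

loss_after = compute_loss()
print(loss_before.item(), loss_after.item())
```

After the step, w and b still have requires_grad=True, so the next forward pass builds a fresh graph on the updated values.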

Gradient tracking can also be switched off for a single tensor by calling detach() on it:

In [14]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)


False

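The effect of detach() on gradients is easiest to see on a scalar. In the sketch below (an illustrative example, not taken from the original notebook), the same product is computed twice: once with both factors tracked, and once with one factor detached so that autograd treats it as a constant.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# Both factors tracked: y = x * x, so dy/dx = 2x = 6.
y_tracked = x * x
y_tracked.backward()
grad_tracked = x.grad.clone()

x.grad.zero_()

# One factor detached: it is treated as the constant 3, so dy/dx = 3.
y_partial = x * x.detach()
y_partial.backward()
grad_partial = x.grad.clone()

print(grad_tracked, grad_partial)  # tensor(6.) tensor(3.)
```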

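# Jacobian product

When the output is not a scalar, PyTorch does not build the full Jacobian matrix J. Instead, out.backward(v) computes the vector-Jacobian product vᵀ·J for a given tensor v, which must have the same shape as out. Calling backward again on the same graph requires retain_graph=True, and each call adds its result to whatever is already stored in inp.grad, so the gradient has to be zeroed explicitly to get a fresh value.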


In [16]:
inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2)

out.backward(torch.ones_like(inp), retain_graph=True)
print(f"First call\n{inp.grad}")

out.backward(torch.ones_like(inp), retain_graph=True)
print(f"\nSecond call\n{inp.grad}")

inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")


First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

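The value stored in inp.grad can be compared against an explicitly constructed Jacobian. The sketch below is an illustrative check under a few assumptions: it uses a 3×3 input to keep the Jacobian small, a helper f that is not part of the original notebook, and torch.autograd.functional.jacobian to materialize J before contracting it with v.

```python
import torch
from torch.autograd.functional import jacobian


def f(t):
    # Same element-wise function as in the cell above.
    return (t + 1).pow(2)


inp = torch.eye(3, requires_grad=True)
v = torch.ones_like(inp)

# Reverse-mode pass: accumulates the vector-Jacobian product into inp.grad.
out = f(inp)
out.backward(v)

# Explicit route: build the full Jacobian, then contract it with v.
# J has shape out.shape + inp.shape = (3, 3, 3, 3).
J = jacobian(f, inp.detach())
vjp = torch.einsum("ij,ijkl->kl", v, J)

print(torch.allclose(vjp, inp.grad))
```

The explicit Jacobian has 81 entries for a 9-element input, which hints at why backward works with the product directly instead of forming J.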

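The tensor passed to backward does not have to be all ones. In the next cell, one element of the output is overwritten in place, and inp itself (the identity matrix) is passed as v, so only the diagonal entries receive a non-zero gradient:

In [30]: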
inp = torch.eye(5, requires_grad=True)

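def fx(t):
    output = (t + 3).pow(2)
    # Overwrite one element in place with a different function of t[0][0]
    output[0][0] = t[0][0]**3 + 2 * t[0][0]**2
    return output

out = fx(inp)
print(inp.grad)    # None: backward has not been called yet
out.backward(inp)  # inp (the identity matrix) serves as the vector v
print(inp.grad)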

None
tensor([[7., 0., 0., 0., 0.],
        [0., 8., 0., 0., 0.],
        [0., 0., 8., 0., 0.],
        [0., 0., 0., 8., 0.],
        [0., 0., 0., 0., 8.]])

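The printed gradient can be reproduced by hand. Each element of the output depends only on the matching element of the input, so the result is v multiplied element-wise by the local derivative: 2·(input + 3) everywhere except at position [0][0], where the overwritten expression has derivative 3·input² + 4·input. The following sketch, which assumes the same function as above, computes that expectation directly and compares it with autograd.

```python
import torch

inp_values = torch.eye(5)  # same values as the input above
v = inp_values             # the tensor passed to backward in the cell above

# Entries computed by (input + 3).pow(2): derivative is 2 * (input + 3).
expected = v * 2 * (inp_values + 3)

# Entry [0][0] was overwritten by input**3 + 2 * input**2:
# derivative is 3 * input**2 + 4 * input, which equals 7 at input = 1.
expected[0, 0] = v[0, 0] * (3 * inp_values[0, 0] ** 2 + 4 * inp_values[0, 0])

# Recompute the gradient with autograd for comparison.
t = torch.eye(5, requires_grad=True)
out = (t + 3).pow(2)
out[0][0] = t[0][0] ** 3 + 2 * t[0][0] ** 2
out.backward(v)

print(torch.allclose(t.grad, expected))
```

The off-diagonal zeros come from v, not from the function: those entries of the output do have a non-zero local derivative, but they are multiplied by zero.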