In [1]:
%matplotlib inline
import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights; autograd tracks operations on them
b = torch.randn(3, requires_grad=True)     # bias; autograd tracks operations on it
z = torch.matmul(x, w) + b                 # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

In [2]:
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x10621b1c0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x106264cd0>


In [3]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0927, 0.0086, 0.0453],
        [0.0927, 0.0086, 0.0453],
        [0.0927, 0.0086, 0.0453],
        [0.0927, 0.0086, 0.0453],
        [0.0927, 0.0086, 0.0453]])
tensor([0.0927, 0.0086, 0.0453])


In [4]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


In [5]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False


In a forward pass, autograd does two things simultaneously:

- runs the requested operation to compute a resulting tensor, and
- maintains the operation's gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

- computes the gradients from each .grad_fn,
- accumulates them in the respective tensor's .grad attribute, and
- using the chain rule, propagates all the way to the leaf tensors.

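
The two passes described above can be observed on a very small graph. The following is a minimal sketch, not part of the original notebook; the tensor names `a`, `b`, and `c` are illustrative only. It checks which tensors are leaves, which carry a gradient function, and where the gradient ends up after the backward pass.

```python
import torch

# Leaf tensor: created directly by the user with requires_grad=True.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: each operation computes a result and records its grad_fn.
b = a * 2      # intermediate node produced by a multiplication
c = b.sum()    # scalar result; this is the root of the graph

print(a.is_leaf, a.grad_fn)  # a leaf has no grad_fn
print(b.is_leaf, b.grad_fn)  # an intermediate result has one
print(c.grad_fn)

# Backward pass: gradients flow from the root back to the leaf
# and are accumulated in a.grad.
c.backward()
print(a.grad)  # dc/da is 2 for every element
```

Only the leaf tensor `a` ends up with a populated .grad attribute; by default, intermediate results such as `b` do not retain their gradients.
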
DAGs are dynamic in PyTorch

Note that the graph is recreated from scratch: after each .backward() call, autograd starts populating a new graph. This is what allows you to use control flow statements in your model; you can change the shape, size, and operations at every iteration if needed.
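
To illustrate the point about control flow, here is a small sketch under stated assumptions: the `forward` function, its stopping threshold of 100, and the input values are hypothetical and chosen only for this example. The number of multiplications depends on the data, so each call records a graph with a different depth, and .backward() still works on whichever graph was built.

```python
import torch

def forward(x, w):
    # Hypothetical toy model: keep multiplying by w until the norm is large enough.
    out = x * w
    steps = 1
    while out.norm() < 100:
        out = out * w
        steps += 1
    return out.sum(), steps

w = torch.tensor(2.0, requires_grad=True)

results = []
for x in (torch.ones(3), torch.full((3,), 10.0)):
    w.grad = None                 # start each iteration without accumulated gradients
    loss, steps = forward(x, w)   # a fresh graph is recorded on every call
    loss.backward()
    results.append((steps, w.grad.item()))
    print(f"steps={steps}, dloss/dw={w.grad.item()}")
```

The two inputs need a different number of loop iterations, so the gradient with respect to `w` comes from a differently shaped graph each time.
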

For a vector function $\vec{y}=f(\vec{x})$, where $\vec{x}=\langle x_1,\dots,x_n\rangle$ and $\vec{y}=\langle y_1,\dots,y_m\rangle$, a gradient of $\vec{y}$ with respect to $\vec{x}$ is given by a Jacobian matrix, whose element $J_{ij}$ contains $\frac{\partial y_i}{\partial x_j}$.
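
As a concrete instance of this definition, the sketch below builds the Jacobian of a small function with `torch.autograd.functional.jacobian` and compares it with what .backward() produces. The function `f` and the vector `v` are assumptions made for illustration. The comparison relies on the fact that calling .backward(v) on a non-scalar output computes the vector-Jacobian product $v^{T} J$ rather than the full matrix.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Illustrative function from R^3 to R^2 (chosen for this sketch only).
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0])

# Full Jacobian: shape (m, n) = (2, 3), with J[i, j] = dy_i / dx_j.
J = jacobian(f, x)
print(J)

# backward(v) yields the vector-Jacobian product v^T J in x_req.grad.
x_req = x.clone().requires_grad_(True)
y = f(x_req)
v = torch.tensor([1.0, 0.5])
y.backward(v)
print(x_req.grad)
print(torch.allclose(x_req.grad, v @ J))
```

Passing a tensor of ones as `v`, as the next cell does with torch.ones_like(inp), therefore sums the rows of the Jacobian.
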

In [6]:
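inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2)
# out is not a scalar, so backward() needs a gradient argument of the same shape
out.backward(torch.ones_like(inp), retain_graph=True)
print("First call\n", inp.grad)
# A second call adds to inp.grad instead of overwriting it
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
inp.grad.zero_()  # reset the accumulated gradients in place
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)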

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])
