This notebook demonstrates:
- How to compute the derivative of a function using PyTorch's automatic differentiation (autograd) system.
- How to define both scalar and vector functions using PyTorch tensors.
- The process of validating PyTorch's computed gradients by comparing them with manually derived analytical gradients.
- Practical usage of `backward` with batched data, including how to assemble a per-sample Jacobian.


In [2]:
import torch
import matplotlib.pyplot as plt

torch.set_default_dtype(torch.float64)

### Scalar function example

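Use `backward` to differentiate a scalar function evaluated on a batch of inputs.

Because `y` holds one value per input rather than a single scalar, `backward` needs an explicit `gradient` argument. Passing `torch.ones_like(y)` yields the derivative at every point, since each `y[i]` depends only on `x[i]`.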

In [3]:
def f(x: torch.Tensor) -> torch.Tensor:
    return x**2 + torch.sin(x**2)

def df_dx(x: torch.Tensor) -> torch.Tensor:
    return 2*x + 2*x*torch.cos(x**2)

N = 1000

x = torch.linspace(0, 5, N, requires_grad=True)

y = f(x)

y.backward(gradient=torch.ones_like(y))

print("Are derivatives correct?", torch.allclose(df_dx(x), x.grad))

Are derivatives correct? True
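
To illustrate what the `gradient` argument does here, the following sketch compares `y.backward(gradient=torch.ones_like(y))` with `y.sum().backward()`. The two should agree, because summing the outputs and differentiating the sum weights every output by one. The smaller batch size and the names `x_a`, `x_b`, and `same` are illustrative choices, not part of the notebook.

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    return x**2 + torch.sin(x**2)

# Variant A: pass an explicit tensor of ones as the upstream gradient.
x_a = torch.linspace(0, 5, 50, dtype=torch.float64, requires_grad=True)
y_a = f(x_a)
y_a.backward(gradient=torch.ones_like(y_a))

# Variant B: reduce the batch to a scalar first, then call backward().
x_b = torch.linspace(0, 5, 50, dtype=torch.float64, requires_grad=True)
f(x_b).sum().backward()

same = torch.allclose(x_a.grad, x_b.grad)
print("Do both variants agree?", same)
```

Either form works for batched elementwise functions; the explicit `gradient` form generalizes more directly to the Jacobian loop, where the weights are not all ones.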


### Vector function example

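Use `backward` to compute the gradient of a scalar-valued function of a vector input, evaluated on a batch.

Notice that `gradient=torch.ones_like(y)` is needed because `y` contains one output per sample and is therefore not a scalar; `backward` can only infer the `gradient` argument for scalar outputs.
With a tensor of ones, row `i` of `x.grad` is the gradient of `f` at `x[i]`, since each output depends only on its own input row.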

In [4]:
def f(x: torch.Tensor) -> torch.Tensor:
    return x[:,0]*x[:,1] + torch.sin(x[:,0]*x[:,1]**2)

def df_dx(x: torch.Tensor) -> torch.Tensor:
    return torch.stack(
        (
            x[:,1] + x[:,1]**2*torch.cos(x[:,0]*x[:,1]**2), 
            x[:,0] + 2*x[:,0]*x[:,1]*torch.cos(x[:,0]*x[:,1]**2)
        ),
        dim=1
    )

N = 1000

x = 2 * torch.rand(N, 2) - 1
x.requires_grad = True

y = f(x)

y.backward(gradient=torch.ones_like(y))

dfdx = df_dx(x)

print("Are gradients correct?", torch.allclose(dfdx, x.grad))

Are gradients correct? True
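
More generally, the `gradient` argument supplies the vector in a vector-Jacobian product. The sketch below, which assumes the same `f` as above, uses `torch.autograd.grad` with arbitrary per-sample weights `v` (a hypothetical name) and checks that each row of the result is the plain gradient scaled by the matching weight. Using `torch.autograd.grad` instead of `backward` is a choice made here so that nothing accumulates in `x.grad`.

```python
import torch

torch.manual_seed(0)

def f(x: torch.Tensor) -> torch.Tensor:
    return x[:, 0] * x[:, 1] + torch.sin(x[:, 0] * x[:, 1] ** 2)

x = (2 * torch.rand(8, 2, dtype=torch.float64) - 1).requires_grad_()
y = f(x)

# Arbitrary per-sample weights (illustrative, not from the notebook).
v = torch.rand(8, dtype=torch.float64)

# grad_outputs plays the same role as the `gradient` argument of backward().
(weighted,) = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)
(plain,) = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))

# Each sample's gradient is scaled by its own weight.
rows_scale = torch.allclose(weighted, v[:, None] * plain)
print("Is each row scaled by its weight?", rows_scale)
```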


### Jacobian example

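Use `backward` to compute the Jacobian of a vector-valued function for every sample in a batch.

In order to compute the Jacobian, we loop over the output dimensions and:
1. reset `x.grad` to `None`, because `backward` accumulates into any existing gradient,
2. set the corresponding element of the `gradient` tensor to one and the rest to zero, which selects one row of the Jacobian,
3. set `retain_graph=True` to allow the reuse of intermediate results for the next iteration; otherwise the buffers saved in the computational graph are freed after the first `backward` call and the next call raises an error.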
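
The need to clear `x.grad` between iterations can be seen in a small sketch. The toy function `x**2` and the names `first`, `accumulated`, and `fresh` are illustrative assumptions, not part of the notebook.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x**2  # dy/dx = 2x, i.e. [2, 4]

# First backward pass: x.grad holds the derivative.
y.backward(gradient=torch.ones_like(y), retain_graph=True)
first = x.grad.clone()

# Second pass without resetting: the new gradient is added to the old one.
y.backward(gradient=torch.ones_like(y), retain_graph=True)
accumulated = x.grad.clone()

# Resetting x.grad restores the expected value.
x.grad = None
y.backward(gradient=torch.ones_like(y))
fresh = x.grad.clone()

print(first, accumulated, fresh)
```

Here `accumulated` is twice `first`, which is why the Jacobian loop clears `x.grad` before each `backward` call.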


In [6]:
def f(x: torch.Tensor) -> torch.Tensor:
    x1 = x[:, 0]
    x2 = x[:, 1]

    y1 = x1 * x2
    y2 = torch.sin(x1 + x2**2)
    y3 = x1**2 - 3 * x2

    return torch.stack((y1, y2, y3), dim=1)


def df_dx(x: torch.Tensor) -> torch.Tensor:
    x1 = x[:, 0]
    x2 = x[:, 1]

    dy1_dx1 = x2
    dy1_dx2 = x1

    dy2_dx1 = torch.cos(x1 + x2**2)
    dy2_dx2 = 2 * x2 * torch.cos(x1 + x2**2)

    dy3_dx1 = 2 * x1
    dy3_dx2 = -3 * torch.ones_like(x2)

    J = torch.stack(
        (
            torch.stack((dy1_dx1, dy1_dx2), dim=1),
            torch.stack((dy2_dx1, dy2_dx2), dim=1),
            torch.stack((dy3_dx1, dy3_dx2), dim=1),
        ),
        dim=1
    )

    return J


N = 1000

x = 2 * torch.rand(N, 2) - 1
x.requires_grad = True

y = f(x)


Js = torch.zeros(N, 3, 2)
for k in range(y.shape[1]):
    x.grad = None
    g = torch.zeros_like(y)
    g[:, k] = 1.0
    y.backward(gradient=g, retain_graph=True)
    Js[:, k, :] = x.grad.clone()


dfdx = df_dx(x)

print(dfdx.shape)
print(Js.shape)


print("Are Jacobians correct?", torch.allclose(dfdx, Js))


torch.Size([1000, 3, 2])
torch.Size([1000, 3, 2])
Are Jacobians correct? True
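
As an additional cross-check that does not rely on hand-derived formulas, the sketch below compares the loop-based Jacobian with `torch.autograd.functional.jacobian` applied to one sample at a time. The helper `f_single` is a hypothetical per-sample version of `f` introduced only for this comparison, and the batch is kept small because the per-sample loop is slow.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)

def f(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x[:, 0], x[:, 1]
    return torch.stack((x1 * x2, torch.sin(x1 + x2**2), x1**2 - 3 * x2), dim=1)

def f_single(x: torch.Tensor) -> torch.Tensor:
    # Same map as f, but for a single input point of shape (2,).
    x1, x2 = x[0], x[1]
    return torch.stack((x1 * x2, torch.sin(x1 + x2**2), x1**2 - 3 * x2))

n = 5
x = (2 * torch.rand(n, 2, dtype=torch.float64) - 1).requires_grad_()
y = f(x)

# Loop over output dimensions, one Jacobian row per pass.
Js = torch.zeros(n, 3, 2, dtype=torch.float64)
for k in range(y.shape[1]):
    g = torch.zeros_like(y)
    g[:, k] = 1.0
    (row,) = torch.autograd.grad(y, x, grad_outputs=g, retain_graph=True)
    Js[:, k, :] = row

# Reference: full (3, 2) Jacobian for each sample separately.
J_ref = torch.stack([jacobian(f_single, xi) for xi in x.detach()])

print("Do both Jacobians agree?", torch.allclose(Js, J_ref))
```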
