### Top-5 Matrix Calculus Rules ###

### Rule-1 ###

Given a function $f(x) = a^T x$, where:
- $a$ is an $n \times 1$ vector,
- $x$ is an $n \times 1$ vector,

the gradient of $f(x)$ with respect to $x$ is:

$$
\nabla_x f = a
$$
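
One way to see this is to write the inner product component-wise (the index $k$ is introduced here only for illustration):

$$
f(x) = \sum_{i=1}^{n} a_i x_i
\quad\Longrightarrow\quad
\frac{\partial f}{\partial x_k} = a_k, \qquad k = 1, \dots, n
$$

Stacking the $n$ partial derivatives into a column gives $\nabla_x f = a$.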




In [4]:
%pip install torch



In [5]:
import torch
torch.manual_seed(47)

a = torch.randn(2, 1)
x = torch.randn(2, 1, requires_grad=True)

def grad_f(x, a):
    # a.T @ x is a 1x1 tensor, so backward() needs no explicit gradient argument
    f = a.T @ x
    f.backward()
    return x.grad

expected_gradient = a
calculated_gradient = grad_f(x, a)

assert torch.allclose(expected_gradient, calculated_gradient)
print(calculated_gradient.tolist())

[[-1.4624308347702026], [0.7523223161697388]]


### Rule-2 ###

Given a function $f(x) = A x$, where:
- $A$ is an $m \times n$ matrix,
- $x$ is an $n \times 1$ vector,

the Jacobian of $f(x)$ with respect to $x$ is:

$$
\mathbf{J}_{f(x)} = A
$$
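
A short component-wise sketch of why this holds, assuming the usual convention that entry $(i, j)$ of the Jacobian is $\partial f_i / \partial x_j$ (the indices $i$ and $j$ are introduced here for illustration):

$$
f_i(x) = \sum_{j=1}^{n} A_{ij} x_j
\quad\Longrightarrow\quad
\frac{\partial f_i}{\partial x_j} = A_{ij}
$$

Collecting these entries for $i = 1, \dots, m$ and $j = 1, \dots, n$ reproduces the $m \times n$ matrix $A$.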


In [6]:
import torch
torch.manual_seed(47)

# Define A as a 2x3 matrix and x as a 3x1 vector
A = torch.randn(2, 3)
x = torch.randn(3, 1, requires_grad=True)

# f is a vector-valued function (in and out: vector)
def f(x):
    return A @ x

jacobian = (
    torch.autograd
    .functional
    .jacobian(f, x)
    .reshape(2, -1)
)

expected_jacobian = A
assert torch.allclose(jacobian, expected_jacobian)

### Rule-3 ###

Given a function $f(x) = x^T A x$, where:
- $A$ is an $n \times n$ matrix,
- $x$ is an $n \times 1$ vector,

the gradient of $f(x)$ with respect to $x$ is:

$$
\nabla_x f = A x + A^T x
$$

#### Condition on $A$:
- If $A$ is **symmetric** ($A = A^T$), the gradient simplifies to $\nabla_x f = 2 A x$.
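
The symmetric special case can be checked the same way as the general rule. The snippet below is a minimal sketch, not part of the original notebook: the names `B` and `A_sym` and the construction $(B + B^T)/2$ are illustrative assumptions used only to obtain a symmetric matrix.

```python
import torch

torch.manual_seed(47)

# Build a symmetric matrix from a random one (illustrative construction).
B = torch.randn(3, 3)
A_sym = (B + B.T) / 2
x = torch.randn(3, 1, requires_grad=True)

# f(x) = x^T A x is a 1x1 tensor, so backward() needs no explicit gradient argument.
f = x.T @ A_sym @ x
f.backward()

# For symmetric A, the general rule A x + A^T x collapses to 2 A x.
expected_gradient = 2 * A_sym @ x.detach()

assert torch.allclose(A_sym, A_sym.T)
assert torch.allclose(x.grad, expected_gradient, atol=1e-6)
```

The explicit `atol` is a cautious choice to absorb float32 rounding differences between autograd and the closed-form expression.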


In [7]:
import torch
torch.manual_seed(47)

A = torch.randn(2, 2)
x = torch.randn(2, 1, requires_grad=True)

def grad_f(A, x):
    f = x.T @ A @ x
    f.backward()
    return x.grad

# detach() keeps the closed-form expression out of the autograd graph
expected_gradient = A @ x.detach() + A.T @ x.detach()
calculated_gradient = grad_f(A, x)

assert torch.allclose(expected_gradient, calculated_gradient)
print(calculated_gradient.tolist())

[[-2.916311502456665], [0.7888590097427368]]


### Rule-4 ###

Given a function $f(x, y) = x^T A y$, where:
- $A$ is an $n \times n$ matrix,
- $x$ is an $n \times 1$ vector,
- $y$ is an $n \times 1$ vector,

the gradients of $f(x, y)$ with respect to $x$ and $y$ are:

$$
\nabla_x f = A y
$$

$$
\nabla_y f = A^T x
$$


In [8]:
import torch
torch.manual_seed(47)

A = torch.randn(2, 2)
x = torch.randn(2, 1, requires_grad=True)
y = torch.randn(2, 1, requires_grad=True)

def grad_f(A, x, y):
    f = x.T @ A @ y
    f.backward()
    return x.grad, y.grad

# detach() keeps the closed-form expressions out of the autograd graph
expected_grad_x = A @ y.detach()
expected_grad_y = A.T @ x.detach()
calculated_grad_x, calculated_grad_y = grad_f(A, x, y)

assert torch.allclose(expected_grad_x, calculated_grad_x)
assert torch.allclose(expected_grad_y, calculated_grad_y)


print("Calculated Gradient with respect to y:")
print(calculated_grad_y)

Calculated Gradient with respect to y:
tensor([[-2.9293],
        [ 1.1403]])
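
As a complementary sketch, both gradients can also be obtained in one call with `torch.autograd.grad`, which returns them directly instead of accumulating into `.grad`. The rectangular shape of `A` below is an illustrative assumption (the rule as stated above uses a square $A$); the same formulas hold whenever the product $x^T A y$ is defined.

```python
import torch

torch.manual_seed(47)

# Rectangular A (3x2) is an illustrative choice; the example above uses a square A.
A = torch.randn(3, 2)
x = torch.randn(3, 1, requires_grad=True)
y = torch.randn(2, 1, requires_grad=True)

# squeeze() turns the 1x1 result into a 0-dim scalar.
f = (x.T @ A @ y).squeeze()

# One call returns the gradients with respect to both inputs.
grad_x, grad_y = torch.autograd.grad(f, (x, y))

assert torch.allclose(grad_x, A @ y.detach(), atol=1e-6)
assert torch.allclose(grad_y, A.T @ x.detach(), atol=1e-6)
```

Because nothing is written to `x.grad` or `y.grad`, this variant avoids the gradient accumulation that repeated `backward()` calls on the same tensors would cause.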


### Rule-5 ###

Given a function $f(X) = a^T X b$, where:
- $a$ is an $n \times 1$ column vector,
- $X$ is an $n \times m$ matrix,
- $b$ is an $m \times 1$ column vector,

the gradient of $f(X)$ with respect to $X$ is:

$$
\nabla_X (a^T X b) = a b^T
$$
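
A brief component-wise sketch of the reasoning, with the gradient taken to have the same shape as $X$ (the indices $i$ and $j$ are introduced here for illustration):

$$
f(X) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i X_{ij} b_j
\quad\Longrightarrow\quad
\frac{\partial f}{\partial X_{ij}} = a_i b_j = (a b^T)_{ij}
$$

Each entry of the gradient is the product of one entry of $a$ and one entry of $b$, which is exactly the outer product $a b^T$.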


In [9]:
import torch
torch.manual_seed(47)

a = torch.randn(3, 1)
b = torch.randn(2, 1)
X = torch.randn(3, 2, requires_grad=True)

def grad_f(X, a, b):
    f = a.T @ X @ b
    f.backward()
    return X.grad

calculated_grad_X = grad_f(X, a, b)
expected_grad_X = a @ b.T
assert torch.allclose(expected_grad_X, calculated_grad_X)
print(calculated_grad_X)

tensor([[-0.8419, -0.8833],
        [ 0.4331,  0.4544],
        [-0.9886, -1.0373]])
