# Calculating the Jacobi matrix in PyTorch

In this tutorial we show how to calculate the elements of the Jacobi matrix in PyTorch. The purpose is to present an approach that can be followed whenever these elements are needed. The Jacobi matrix is of interest, for example, in a CAE (Contractive Auto Encoder). This auto encoder requires the Frobenius norm of the Jacobi matrix of the last layer of the encoder part. The motivation is that a CAE is an auto encoder that is less sensitive to small changes in the input.

Here we work through a simple example to show and check how the calculation works. First, let's define the Jacobi matrix in mathematical terms:

\begin{equation}
J_{ij} = \frac{\partial F_i(x)}{\partial x_j}
\end{equation}

Then the squared Frobenius norm:

\begin{equation}
\|J\|^2_F = \sum_{i,j}{J^2_{ij}}
\end{equation}

Here $x$ is the input vector and $F$ is the function that calculates the output $y$. In our example, $F$ is:

\begin{equation}
F(x) = \sigma\left( \underline{\underline{W}}x\right)
\end{equation}

where $\sigma$ is the sigmoid function, applied element-wise, and $\underline{\underline{W}}$ is the weight matrix.

## Calculating with PyTorch:

In [1]:
import torch
from torch.autograd import grad # computes gradients of outputs with respect to inputs

In [2]:
W = torch.tensor([[0.1, 0.3, 0.6], [1.5, 0.3, 4.1], [0.2, -0.3, -0.7]], requires_grad=False)
x = torch.tensor([2.0, 1.0, 1.0], requires_grad=True) # requires_grad=True so that gradients with respect to x can be computed
z = torch.matmul(W, x)
y = torch.sigmoid(z)

In [3]:
J = torch.zeros((3, 3))

In [4]:
# retain_graph=True is important because the same graph is reused to differentiate every element of y
# grad needs a scalar output (unless grad_outputs is given), hence the loop over the elements of y
for i in range(3):
    J[i] = grad(y[i], x, retain_graph=True)[0] # [0] because grad returns a tuple
print(J)

tensor([[ 0.0187,  0.0562,  0.1124],
        [ 0.0009,  0.0002,  0.0025],
        [ 0.0458, -0.0686, -0.1601]])
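
As an alternative to the explicit loop, recent PyTorch versions provide `torch.autograd.functional.jacobian`, which builds the whole matrix in one call. The sketch below assumes such a version is installed; the helper name `forward` and the variable `J_auto` are introduced here only for illustration.

```python
import torch
from torch.autograd.functional import jacobian

W = torch.tensor([[0.1, 0.3, 0.6], [1.5, 0.3, 4.1], [0.2, -0.3, -0.7]])
x = torch.tensor([2.0, 1.0, 1.0])

def forward(inp):
    # hypothetical helper: the same map as in the example, F(x) = sigmoid(W x)
    return torch.sigmoid(torch.matmul(W, inp))

# rows index the outputs y_i, columns index the inputs x_j
J_auto = jacobian(forward, x)
print(J_auto)
```

The function takes a callable instead of an already computed output, so `x` does not need `requires_grad=True` here.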


In [5]:
# calculating the squared Frobenius norm; torch.norm(J) returns the Frobenius norm itself, i.e. the square root of this value
F = J.pow(2).sum()
print(F)

tensor(0.0486)
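
In a CAE the squared norm is used as a term that has to be differentiated with respect to the weights, so the Jacobi matrix itself must stay part of the graph. The following is a minimal sketch of that idea, not a full CAE: it assumes `W` is a trainable weight, and the names `rows` and `penalty` are chosen here for illustration. The key change is `create_graph=True` in the call to `grad`.

```python
import torch
from torch.autograd import grad

# assumption: W is now trainable, so it needs requires_grad=True
W = torch.tensor([[0.1, 0.3, 0.6], [1.5, 0.3, 4.1], [0.2, -0.3, -0.7]], requires_grad=True)
x = torch.tensor([2.0, 1.0, 1.0], requires_grad=True)
y = torch.sigmoid(torch.matmul(W, x))

# create_graph=True records the gradient computation itself,
# so the rows of J can be differentiated again (it also retains the graph)
rows = [grad(y[i], x, create_graph=True)[0] for i in range(3)]
J = torch.stack(rows)

penalty = J.pow(2).sum()  # squared Frobenius norm of J
penalty.backward()        # fills W.grad with d(penalty)/dW
print(W.grad.shape)
```

In a training loop this penalty would typically be added, with some weighting factor, to the reconstruction loss before calling `backward()`.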


## Calculating the Jacobi matrix manually:
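
The closed form used for this check follows from the chain rule. Writing $z = \underline{\underline{W}}x$ and $y_i = \sigma(z_i)$ (the symbol $z$ matches the variable of the same name in the code), and using $\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$:

\begin{equation}
J_{ij} = \frac{\partial y_i}{\partial x_j} = \sigma'(z_i)\frac{\partial z_i}{\partial x_j} = y_i\left(1 - y_i\right)W_{ij}
\end{equation}

In matrix form this is a diagonal matrix multiplied by the weights:

\begin{equation}
J = \mathrm{diag}\left(y \odot (1 - y)\right)\underline{\underline{W}}
\end{equation}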

In [6]:
J_calc = torch.diag(y*(1-y)).matmul(W)
print(J_calc)

tensor([[ 0.0187,  0.0562,  0.1124],
        [ 0.0009,  0.0002,  0.0025],
        [ 0.0458, -0.0686, -0.1601]], grad_fn=<MmBackward>)


In [7]:
F_calc = J_calc.pow(2).sum()
print(F_calc)

tensor(0.0486, grad_fn=<SumBackward0>)


In [8]:
torch.le(torch.abs(F - F_calc), 1e-7)

tensor(1, dtype=torch.uint8)
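
The check above compares only the two norms. The matrices themselves can be compared element by element as well, for example with `torch.allclose`. The sketch below is self-contained and repeats the definitions from the earlier cells; the variable name `same` is chosen here for illustration.

```python
import torch
from torch.autograd import grad

W = torch.tensor([[0.1, 0.3, 0.6], [1.5, 0.3, 4.1], [0.2, -0.3, -0.7]])
x = torch.tensor([2.0, 1.0, 1.0], requires_grad=True)
y = torch.sigmoid(torch.matmul(W, x))

# Jacobi matrix from autograd, one row per output element
J = torch.stack([grad(y[i], x, retain_graph=True)[0] for i in range(3)])

# Jacobi matrix from the closed form; detach() drops the graph, only values are compared
J_calc = torch.diag(y * (1 - y)).matmul(W).detach()

# element-wise comparison within the default floating-point tolerances
same = torch.allclose(J, J_calc)
print(same)
```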