### Some basic functions of automatic differentiation

Start with the package imports (```second``` is the local module ```second.py```)

In [22]:
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector
from torch.autograd import grad
from second import jvp, hvp

Create a simple model

In [30]:
input_dim = 3
output_dim = 1
model = nn.Sequential(
    nn.Linear(input_dim, 5), 
    nn.Sigmoid(),
    nn.Linear(5, output_dim))
criterion = nn.MSELoss(reduction='mean')

Create a dummy data point and compute the loss

In [35]:
x = torch.randn(1, input_dim)
y = torch.randn(1,1)
output = model(x)
loss = criterion(output, y)

Compute the gradient of the loss with respect to the model parameters using ```torch.autograd.grad```

In [36]:
gradient = grad(
    loss,
    tuple(model.parameters())
)
gradient

(tensor([[-0.0283, -0.0011,  0.0244],
         [-0.1512, -0.0061,  0.1303],
         [-0.0452, -0.0018,  0.0390],
         [-0.1673, -0.0067,  0.1442],
         [ 0.2585,  0.0104, -0.2228]]),
 tensor([-0.0269, -0.1434, -0.0429, -0.1588,  0.2453]),
 tensor([[0.8887, 1.0663, 0.5952, 1.3538, 1.2464]]),
 tensor([2.6375]))

The result above contains the gradients of the loss with respect to the weights and biases of the two linear layers (4 tensors in total), in the same order as ```model.parameters()```.
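
To make the correspondence between gradient tensors and parameters explicit, the following minimal sketch rebuilds the same architecture and pairs each gradient with its parameter name. The fixed seed and the `shapes` dictionary are illustrative additions, not part of the original notebook.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
model = nn.Sequential(nn.Linear(3, 5), nn.Sigmoid(), nn.Linear(5, 1))
criterion = nn.MSELoss(reduction='mean')

x = torch.randn(1, 3)
y = torch.randn(1, 1)
loss = criterion(model(x), y)

params = tuple(model.parameters())
gradient = grad(loss, params)

# Each gradient tensor has the same shape as the parameter it belongs to.
shapes = {
    name: tuple(g.shape)
    for (name, _), g in zip(model.named_parameters(), gradient)
}
for name, shape in shapes.items():
    print(name, shape)
```

The names `0.weight`, `0.bias`, `2.weight`, and `2.bias` come from the positions of the two linear layers inside `nn.Sequential`.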

#### Jacobian-vector product (JVP)
Given a function $f: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian matrix $\nabla f$ is an $m \times n$ matrix. For a vector $\mathbf{v} \in \mathbb{R}^n$, the Jacobian-vector product $(\nabla f) \mathbf{v}$ is the product of an $m \times n$ matrix and an $n \times 1$ vector, so the result is an $m \times 1$ vector. In ```second.py```, we implement this product as follows:
```
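import torch
from torch.autograd import grad
from torch.nn.utils import parameters_to_vector

def jvp(outputs, inputs, vector, create_graph=False):
    """Jacobian-vector product via two backward passes.

    In this version, `vector` is a flattened 1-D tensor.
    """
    # Dummy cotangent(s), one per output. Their zero values do not matter
    # because the first backward pass is linear in them.
    if isinstance(outputs, tuple):
        dummy = [torch.zeros_like(o, requires_grad=True) for o in outputs]
    else:
        dummy = torch.zeros_like(outputs, requires_grad=True)

    # First pass: J^T dummy, kept differentiable with respect to dummy.
    jacobian = grad(outputs, inputs, grad_outputs=dummy, create_graph=True)
    # Second pass: differentiate J^T dummy with respect to dummy, weighted by vector, to get J vector.
    Jv = grad(parameters_to_vector(jacobian), dummy, grad_outputs=vector, create_graph=create_graph)
    return parameters_to_vector(Jv)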
```

We need this implementation in PyTorch because ```torch.autograd.grad``` only computes the vector-Jacobian product $\mathbf{v}^\top \nabla f$ (here with $\mathbf{v} \in \mathbb{R}^m$), which is the product of a $1 \times m$ row vector and an $m \times n$ matrix.
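
The difference between the two products can be checked on a small example. The sketch below uses a hypothetical map `f` from $\mathbb{R}^3$ to $\mathbb{R}^2$ (not taken from the notebook), builds the explicit Jacobian with `torch.autograd.functional.jacobian` as a reference, and compares it against a single backward pass (vector-Jacobian product) and the two-pass construction (Jacobian-vector product).

```python
import torch
from torch.autograd import grad
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical map from R^3 to R^2, chosen only for illustration.
    return torch.stack([x[0] * x[1], torch.sin(x[2]) + x[0] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = f(x)

J = jacobian(f, x.detach())        # explicit 2 x 3 Jacobian, for reference
u = torch.tensor([0.5, -1.0])      # length m = 2, used in the VJP
v = torch.tensor([1.0, 0.0, 2.0])  # length n = 3, used in the JVP

# Vector-Jacobian product: one backward pass gives u^T J.
(vjp,) = grad(out, x, grad_outputs=u, retain_graph=True)

# Jacobian-vector product: two backward passes.
dummy = torch.zeros_like(out, requires_grad=True)
(g,) = grad(out, x, grad_outputs=dummy, create_graph=True)  # g = J^T dummy
(jv,) = grad(g, dummy, grad_outputs=v)                      # yields J v

print(torch.allclose(vjp, u @ J), torch.allclose(jv, J @ v))
```

The VJP has length $n = 3$ while the JVP has length $m = 2$, which is a quick way to tell the two apart.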

In [15]:
vector = []
for p in model.parameters():
    vector += [torch.randn_like(p)]

Jv = jvp(loss, tuple(model.parameters()), vector=parameters_to_vector(vector))
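
Because the loss is a scalar, its Jacobian with respect to the flattened parameters is a $1 \times n$ row (the gradient), so the Jacobian-vector product should reduce to a single number: the dot product of the gradient with the vector. The sketch below checks this under a few assumptions: it rebuilds the model with a fixed seed, inlines the two-pass construction instead of importing `second.py`, and uses a hypothetical helper `flatten` in place of `parameters_to_vector`.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

def flatten(tensors):
    # Hypothetical helper: concatenate tensors into one 1-D tensor.
    return torch.cat([t.reshape(-1) for t in tensors])

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
model = nn.Sequential(nn.Linear(3, 5), nn.Sigmoid(), nn.Linear(5, 1))
loss = nn.MSELoss()(model(torch.randn(1, 3)), torch.randn(1, 1))
params = tuple(model.parameters())

# Random direction with one entry per parameter, flattened.
v = flatten([torch.randn_like(p) for p in params])

# Reference value: gradient of the scalar loss dotted with v.
flat_grad = flatten(grad(loss, params, retain_graph=True))
reference = torch.dot(flat_grad, v)

# Two-pass construction, inlined so the snippet stands alone.
dummy = torch.zeros_like(loss, requires_grad=True)
g = grad(loss, params, grad_outputs=dummy, create_graph=True)
(Jv,) = grad(flatten(g), dummy, grad_outputs=v)

print(Jv.item(), reference.item())
```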

In [21]:
Jv2 = grad(loss, tuple(model.parameters()), grad_outputs=torch.zeros_like(loss))

In [24]:
output = model(x)
loss = criterion(output, y)
Hv = hvp(loss, tuple(model.parameters()), vector=parameters_to_vector(vector))


In [25]:
Hv

tensor([ 1.0711,  0.3990, -0.1729, -0.4672, -0.1740,  0.0754, -0.9558, -0.3560,
         0.1543, -0.9900, -0.3688,  0.1598, -1.3511, -0.5032,  0.2181,  1.0908,
        -0.4758, -0.9734, -1.0082, -1.3759, -0.6078, -1.9473, -2.4624, -4.1942,
        -4.7859, -6.3056])
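
The implementation of `hvp` in `second.py` is not shown here. As a rough illustration of how a Hessian-vector product can be obtained with two backward passes, here is a minimal sketch; the function name `hvp_sketch` and the test function `f` are hypothetical and may differ from what `second.py` actually does. The result is compared against the explicit Hessian from `torch.autograd.functional.hessian`.

```python
import torch
from torch.autograd import grad
from torch.autograd.functional import hessian

def hvp_sketch(loss, params, vector):
    """Hessian-vector product by double backward; `vector` is flat."""
    # First pass: gradient, kept in the graph so it can be differentiated again.
    g = grad(loss, params, create_graph=True)
    flat_g = torch.cat([t.reshape(-1) for t in g])
    # Second pass: d(g . v)/d(params) = H v, since v does not depend on params.
    Hv = grad(torch.dot(flat_g, vector), params)
    return torch.cat([t.reshape(-1) for t in Hv])

def f(w):
    # Hypothetical scalar function with Hessian diag(6 w) plus an off-diagonal term.
    return (w ** 3).sum() + w[0] * w[1]

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, -1.0, 0.5])

Hv = hvp_sketch(f(w), (w,), v)
H = hessian(f, w.detach())  # explicit 3 x 3 Hessian, for reference
print(torch.allclose(Hv, H @ v))
```

Forming the full Hessian is only practical for tiny problems like this one; the double-backward version never materializes the matrix, so its output has one entry per parameter, as in the 26-element tensor above.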