In [2]:
import torch

# Automatic gradient functionality

When we humans design a neural network, we think in a forward manner: we want data to enter the network and pass through its layers, and we can usually visualise which transformations of the data can or should be applied to achieve our goal. Thus, designing and then implementing the so-called forward pass through a network is something we humans grasp naturally.

But training a neural network requires that all transformations of the forward pass, no matter how complex, are differentiated with respect to the network's weights, and all those gradients then play a role in the weight update. The more complex the transformations of the forward pass, the more complex the corresponding gradient functions. Deriving them by hand is inflexible, error-prone, and in some cases very hard or practically impossible. It is a task for a machine.

The advent of dynamic expression tree building (the tree, also called a computation graph, is recorded while the forward pass runs) with automatic gradient calculation lets us humans design only the forward pass (i.e. how the data is transformed in the neural network), while the framework's engine takes care of correctly executing the backward pass. This gave the design of neural networks unprecedented flexibility and helped give the whole Machine Learning field the boost we are experiencing nowadays.

PyTorch is one example of a framework that supports dynamic expression tree building with automatic gradient calculation.
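To make "dynamic" concrete, here is a minimal sketch (the function `g` is a hypothetical example, not from any library): because the tree is recorded while ordinary Python code runs, the forward pass can use normal control flow, and autograd differentiates only the branch that actually executed.

```python
import torch

def g(x):
    # Hypothetical function: plain Python control flow decides which
    # operations run, and autograd records only those operations.
    if x.item() > 0:
        return x**2
    return -3 * x

a = torch.tensor(2., requires_grad=True)
g(a).backward()
print(a.grad)  # tensor(4.), i.e. d(x^2)/dx = 2x at x = 2

b = torch.tensor(-1., requires_grad=True)
g(b).backward()
print(b.grad)  # tensor(-3.), i.e. d(-3x)/dx = -3
```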

So let's see how it works in practice in PyTorch. We don't need a neural network to see it.

Let's take $f=x^3+y^2$ as an example.
By hand, you can calculate:

$\frac{\partial f}{\partial x} = 3x^2, \quad \frac{\partial f}{\partial y} = 2y$

Concretely, here are some values we will keep working with:

$\frac{\partial f}{\partial x} |_{x=2} = 12$

$\frac{\partial f}{\partial x} |_{x=3} = 27$

$\frac{\partial f}{\partial y} |_{y=4} = 8$

$\frac{\partial f}{\partial y} |_{y=5} = 10$
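As a quick sanity check of the hand-calculated values, here is a sketch using central finite differences, $\frac{f(x+h) - f(x-h)}{2h}$. Note that this numerical approximation is a different technique from what PyTorch does; the helper `central_diff` and the step size `h` are illustrative choices, not part of any library.

```python
def f(x, y):
    return x**3 + y**2

def central_diff(func, x, y, wrt, h=1e-5):
    """Approximate a partial derivative of func at (x, y) numerically."""
    if wrt == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

# f is a sum of a term in x and a term in y, so the value of the other
# variable does not affect each partial derivative; 0.0 is arbitrary.
print(round(central_diff(f, 2.0, 0.0, "x"), 4))  # 12.0
print(round(central_diff(f, 3.0, 0.0, "x"), 4))  # 27.0
print(round(central_diff(f, 0.0, 4.0, "y"), 4))  # 8.0
print(round(central_diff(f, 0.0, 5.0, "y"), 4))  # 10.0
```

The numbers match only approximately and cost two function evaluations per variable, which is one reason frameworks compute exact gradients by automatic differentiation instead.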

PyTorch gives you all of this automatically: it builds the expression tree as the operations are executed. Let's see an example in Python code.
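Before computing any gradients, you can peek at the tree PyTorch records. This sketch repeats the expression used below and inspects the `grad_fn` attribute of the result and its `next_functions` links. The node class names (such as `AddBackward0`) are implementation details and may vary between PyTorch versions.

```python
import torch

x = torch.tensor([2.], requires_grad=True)
y = torch.tensor([4.], requires_grad=True)
f = x**3 + y**2

# x and y are leaves: we created them, no operation produced them.
print(x.is_leaf, x.grad_fn)        # True None
# f is the root: it remembers the last operation that produced it ...
print(type(f.grad_fn).__name__)    # AddBackward0
# ... and that node links to the nodes recorded for x**3 and y**2.
for node, _ in f.grad_fn.next_functions:
    print(type(node).__name__)     # PowBackward0 (twice)
```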

In [3]:
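x = torch.tensor([2.], requires_grad=True)  # track operations on x
y = torch.tensor([4.], requires_grad=True)  # track operations on y
f = x**3 + y**2  # the expression tree is recorded as this line runs
f.backward()     # walk the tree backwards, filling x.grad and y.grad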

To calculate gradients you need to call `backward()`. Let's see whether we get

$\frac{\partial f}{\partial x} |_{x=2} = 12$

$\frac{\partial f}{\partial y} |_{y=4} = 8$

as expected. The gradients of $f$ with respect to the variables $x$ and $y$ are stored in the `grad` attribute of those tensors (`x.grad`, `y.grad`).

In [4]:
x.grad

tensor([12.])

In [5]:
y.grad

tensor([8.])
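One detail about the `grad` attribute is worth knowing: `backward()` adds to whatever `grad` already holds rather than overwriting it. A minimal sketch:

```python
import torch

x = torch.tensor([2.], requires_grad=True)
print(x.grad)  # None: nothing has been computed yet

for _ in range(2):
    f = x**3       # rebuild the tree on every pass
    f.backward()   # adds df/dx = 12 to x.grad each time

accumulated = x.grad.clone()  # keep a copy to inspect
print(accumulated)  # tensor([24.]), i.e. 12 + 12, not 12

x.grad.zero_()  # reset in place before the next gradient computation
print(x.grad)   # tensor([0.])
```

In a training loop, `optimizer.zero_grad()` plays this resetting role before each new backward pass.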

OK, so far so good. But the weights in a neural network are not scalars; they are multidimensional entities, very often two-dimensional matrices. In PyTorch, a two-dimensional matrix is simply a tensor of order two (a 2-D tensor). What if $x$ and $y$ were tensors of order two? Let's have a look:

In [6]:
x = torch.tensor([[2., 3., 2.], [2., 3., 3.]], requires_grad=True)
y = torch.tensor([[4., 5., 5.], [4., 5., 5.]], requires_grad=True)
f = x**3 + y**2
f.sum().backward()

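Notice the last line, `f.sum().backward()`. Recall that to calculate gradients you need to call `backward()`, but without arguments it can only be called on a scalar (single-element) tensor. This matches how it is used in training, because the loss we calculate for a neural network is a scalar. Here `f` is a 2×3 tensor, so we sum its elements first. Since each element of `f` depends only on the corresponding elements of `x` and `y`, the derivative of the sum with respect to each element of `x` (or `y`) is just the derivative of the matching element of `f`. Hence this trick makes PyTorch calculate exactly the element-wise gradients we want.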
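The sum trick has an equivalent form: `backward()` also accepts a `gradient` argument holding $\frac{\partial L}{\partial f}$ for some scalar $L$. For $L = \sum f$ that is a tensor of ones shaped like `f`. This sketch shows that calling `backward()` without it fails for a non-scalar `f`, while passing ones reproduces the results below.

```python
import torch

x = torch.tensor([[2., 3., 2.], [2., 3., 3.]], requires_grad=True)
y = torch.tensor([[4., 5., 5.], [4., 5., 5.]], requires_grad=True)
f = x**3 + y**2

try:
    f.backward()  # f has 6 elements, so no implicit gradient can be created
except RuntimeError as err:
    print("RuntimeError:", err)

# Same as f.sum().backward(): d(sum)/df is a tensor of ones shaped like f.
f.backward(gradient=torch.ones_like(f))
print(x.grad)
print(y.grad)
```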

In [7]:
x.grad

tensor([[12., 27., 12.],
        [12., 27., 27.]])

In [8]:
y.grad

tensor([[ 8., 10., 10.],
        [ 8., 10., 10.]])

OK, the results are as expected (i.e. as calculated by hand earlier).
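To connect this back to neural networks, here is a sketch with a single `torch.nn.Linear` layer, whose weight is exactly such an order-two tensor. The inputs, targets and mean-squared-error loss are made-up illustrations; the point is that one scalar loss and one `backward()` call fill the weight's `grad` with a tensor of the same shape as the weight.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(in_features=3, out_features=2)  # weight shape (2, 3)

inputs = torch.randn(4, 3)   # hypothetical batch of 4 samples
targets = torch.randn(4, 2)  # hypothetical targets

loss = ((layer(inputs) - targets) ** 2).mean()  # a scalar, as backward() needs
loss.backward()

print(layer.weight.shape)       # torch.Size([2, 3])
print(layer.weight.grad.shape)  # torch.Size([2, 3])
print(layer.bias.grad.shape)    # torch.Size([2])
```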

Now, let us consider another function:

$f=xy$

$\frac{\partial f}{\partial x} = y$

$\frac{\partial f}{\partial y} = x$


In [9]:
x = torch.tensor([[1., 2., 3.], [2., 4., 6.]], requires_grad=True)
y = torch.tensor([[3., 6., 9.], [-1., 1., 2.]], requires_grad=True)
f = x*y
f.sum().backward()
x.grad

tensor([[ 3.,  6.,  9.],
        [-1.,  1.,  2.]])

Well, `x.grad` is exactly `y`, just as $\frac{\partial f}{\partial x} = y$ says it should be.
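The cell above only printed `x.grad`. Here is a sketch that checks both partial derivatives at once using `torch.autograd.grad`, which returns the gradients directly instead of storing them in the `grad` attribute.

```python
import torch

x = torch.tensor([[1., 2., 3.], [2., 4., 6.]], requires_grad=True)
y = torch.tensor([[3., 6., 9.], [-1., 1., 2.]], requires_grad=True)
f = x * y

# Gradients of the scalar f.sum() with respect to x and y, returned as a tuple.
dx, dy = torch.autograd.grad(f.sum(), (x, y))
print(torch.equal(dx, y), torch.equal(dy, x))  # True True
print(x.grad)  # None: the grad attribute is left untouched
```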

# Your task

Your task is to calculate the values of

$\frac{\partial f}{\partial x} |_{x=2.0, y=3.0}$

$\frac{\partial f}{\partial y} |_{x=1.5, y=-1.5}$

for

$f=\frac{\sin(xy)}{\sin x}$
