In [1]:
import torch
from torch import nn
import numpy as np

In [24]:
net = nn.Linear(2,1)
net.weight, net.bias

(Parameter containing:
 tensor([[-0.1265, -0.4986]], requires_grad=True),
 Parameter containing:
 tensor([-0.5391], requires_grad=True))

As stated before, the neural network calculates $y = w_1 x_1 + w_2 x_2 + b$, where $b$ is the bias.

In [25]:
x = torch.tensor([1.0, 2.0])
y = net(x)
y

tensor([-1.6627], grad_fn=<AddBackward0>)

In [26]:
# recompute y by hand: w_1 x_1 + w_2 x_2 + b
np.sum(x.detach().numpy()*net.weight.detach().numpy()) + net.bias.detach().numpy()

array([-1.6627022], dtype=float32)

The overall idea is to tune the neural network's weights in order to minimize some error function $e = f(y, z)$. The error function depends on the neural network's output $y$ and the "observation" $z$.
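To make this concrete, a common choice is the squared error $e = (y - z)^2$. It is assumed here for illustration and is not necessarily the error function used later. The following is a minimal sketch with a hypothetical observation `z = 3.0`:

```python
import torch
from torch import nn

torch.manual_seed(0)      # fixed seed so the initial weights are reproducible
net = nn.Linear(2, 1)

x = torch.tensor([1.0, 2.0])
z = torch.tensor([3.0])   # hypothetical observation, chosen for illustration

y = net(x)
e = ((y - z) ** 2).sum()  # squared error f(y, z) = (y - z)^2, reduced to a scalar

# For a single output, PyTorch's built-in mean squared error gives the same value
e_mse = nn.MSELoss()(y, z)
print(e.item(), e_mse.item())
```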

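We usually update the weights using the gradient $\partial e / \partial w_1$ (gradient descent steps against it, i.e. in the direction of $-\partial e / \partial w_1$). By the chain rule, this gradient can be expressed as follows: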

$$
\frac{\partial e}{\partial w_1} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial w_1}
$$
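In plain gradient descent the weights move against the gradient, $w_i \leftarrow w_i - \eta\,\partial e / \partial w_i$, with a learning rate $\eta$. The following is a minimal sketch of one such step. It assumes the squared error $e = (y - z)^2$, a hypothetical observation `z`, and $\eta = 0.01$, and lets autograd supply the gradient:

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Linear(2, 1)
x = torch.tensor([1.0, 2.0])
z = torch.tensor([3.0])   # hypothetical observation
lr = 0.01                 # learning rate (eta), chosen for illustration

def error():
    # squared error e = (y - z)^2 for the current weights (assumed error function)
    return ((net(x) - z) ** 2).sum()

e_before = error()
e_before.backward()       # fills net.weight.grad and net.bias.grad

with torch.no_grad():     # update parameters without recording the step in the graph
    for p in net.parameters():
        p -= lr * p.grad  # step against the gradient
        p.grad.zero_()    # gradients accumulate across backward() calls, so reset them

e_after = error()
print(e_before.item(), e_after.item())
```

For this linear model the step shrinks $y - z$ by the factor $1 - 2\eta(x_1^2 + x_2^2 + 1) = 0.88$, so the error drops to $0.88^2 \approx 0.774$ of its previous value.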

The first partial derivative depends on the error function; we will deal with it later. For now, let's calculate the gradient of $y$ with respect to the neural network's weights $w_i$. We have

$$
\frac{\partial y}{\partial w_1} = x_1, \qquad \frac{\partial y}{\partial w_2} = x_2
$$
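The result can also be checked numerically without autograd. Nudging $w_1$ by a small step $h$ should change $y$ by about $h x_1$. The following is a minimal finite-difference sketch; the step `h = 1e-3` is an arbitrary choice:

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Linear(2, 1)
x = torch.tensor([1.0, 2.0])
h = 1e-3  # small step for the finite difference (arbitrary choice)

with torch.no_grad():          # plain arithmetic; no graph needed
    y0 = net(x)
    net.weight[0, 0] += h      # w_1 -> w_1 + h
    y1 = net(x)
    net.weight[0, 0] -= h      # restore w_1

slope = ((y1 - y0) / h).item()
print(slope)  # close to x_1 = 1.0, up to float32 rounding
```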

In [27]:
y.backward()
net.weight.grad

tensor([[1., 2.]])
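Combining both factors: if the error function is the squared error $e = (y - z)^2$ (assumed here for illustration), then $\partial f / \partial y = 2(y - z)$. The chain rule therefore predicts $\partial e / \partial w_i = 2(y - z)\,x_i$ and $\partial e / \partial b = 2(y - z)$. The following minimal sketch compares that prediction with autograd, using a hypothetical observation `z`:

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Linear(2, 1)
x = torch.tensor([1.0, 2.0])
z = torch.tensor([3.0])   # hypothetical observation

y = net(x)
e = ((y - z) ** 2).sum()  # e = f(y, z) = (y - z)^2 (assumed error function)
e.backward()              # autograd applies the chain rule for us

# Chain rule by hand: de/dw_i = (df/dy) * (dy/dw_i) = 2 (y - z) * x_i
df_dy = 2 * (y.detach() - z)   # shape (1,)
manual_w = df_dy * x           # broadcasts to shape (2,)
manual_b = df_dy               # dy/db = 1

print(net.weight.grad, manual_w)
print(net.bias.grad, manual_b)
```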