# Introduction to torch.autograd
torch.autograd is PyTorch’s automatic differentiation
engine that powers neural network training.
In this section, you will get a conceptual
understanding of how autograd helps a neural network
train.

## Background
Neural networks (NNs) are a collection of nested
functions that are executed on some input data.
These functions are defined by *parameters* (consisting
of weights and biases), which in PyTorch are stored in
tensors.

Training a NN happens in two steps:<br>
**Forward Propagation**: In forward prop, the
NN makes its best guess about the correct output. It
runs the input data through each of its functions to
make this guess.<br>
**Backward Propagation**: In back prop, the NN adjusts
its parameters in proportion to the error in its
guess. It does this by traversing backwards from the
output, collecting the derivatives of the error
with respect to the parameters of the functions (*gradients*),
and optimizing the parameters using gradient descent.
For a more detailed walk-through of backprop,
check out this [video](https://www.youtube.com/watch?v=tIeHLnjs5U8).
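
The two steps can be sketched in plain Python for a toy case. This is a minimal illustration, assuming a hypothetical one-parameter "network" `prediction = w * x` with a squared-error loss; the names `w`, `x`, `target`, and `lr` are invented for this example and the derivative is written out by hand rather than computed by autograd.

```python
# Hypothetical one-parameter model: prediction = w * x.
# All names and values here are illustrative assumptions.

def forward(w, x):
    """Forward propagation: run the input through the function."""
    return w * x

def loss_fn(pred, target):
    """Squared error between the guess and the correct output."""
    return (pred - target) ** 2

def backward(w, x, target):
    """Back propagation: d(loss)/d(w), assembled with the chain rule."""
    pred = forward(w, x)
    dloss_dpred = 2 * (pred - target)  # derivative of the loss w.r.t. the guess
    dpred_dw = x                       # derivative of the guess w.r.t. w
    return dloss_dpred * dpred_dw

w, x, target, lr = 0.0, 2.0, 6.0, 0.05
losses = []
for step in range(50):
    losses.append(loss_fn(forward(w, x), target))
    grad = backward(w, x, target)
    w -= lr * grad                     # gradient descent update

print(f"learned w = {w:.4f}, final loss = {losses[-1]:.6f}")
```

Since the target is `6.0` for an input of `2.0`, the loop should drive `w` toward `3.0`. Autograd automates the `backward` function here for arbitrarily nested functions.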

## Usage in PyTorch
Let's take a look at a single training step. For this example,
we load a pretrained resnet18 model from *torchvision*. We
create a random data tensor to represent a single image
with 3 channels, and height & width of 64, and its corresponding
*label* initialized to some random values.

In [8]:
import torch, torchvision

model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)


Next, we run the input data through each of the model's
layers to make a prediction. This is the **forward
pass**.

In [None]:
prediction = model(data)  # forward pass

We use the model's prediction and the corresponding label
to calculate the error (loss). The next step is to
backpropagate this error through the network. Backward
propagation is kicked off when we call *.backward()* on the
error tensor. Autograd then calculates and stores the
gradients for each model parameter in the parameter's
*.grad* attribute.

In [3]:
loss = (prediction - labels).sum()
loss.backward()  # backward pass
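
To see the effect of *.backward()* without downloading pretrained weights, here is a small sketch that assumes a stand-in `torch.nn.Linear(4, 2)` model in place of resnet18. The model size, the seed, and the tensor shapes are illustrative choices, not part of the tutorial.

```python
import torch

torch.manual_seed(0)

# A small stand-in model (an assumption; the tutorial uses resnet18)
model = torch.nn.Linear(4, 2)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

# Before the backward pass, no gradients have been computed
print([p.grad for p in model.parameters()])  # [None, None]

prediction = model(data)            # forward pass
loss = (prediction - labels).sum()  # same form of loss as in the cell above
loss.backward()                     # backward pass

# Each parameter now holds a gradient with the same shape as itself
for name, p in model.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))
```

For this particular loss, the bias gradient is a tensor of ones, because each bias element enters the summed loss exactly once.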

Next, we load an optimizer, in this case SGD with a learning
rate of 0.01 and momentum of 0.9. We register all the
parameters of the model in the optimizer.

In [4]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2,
                        momentum=0.9)

Finally, we call *.step()* to initiate gradient descent.
The optimizer adjusts each parameter by its gradient
stored in *.grad*.

In [5]:
optim.step()  # gradient descent
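
What *.step()* does can be checked by hand in the simplest case. The sketch below assumes plain SGD with no momentum (unlike the optimizer above) so that the update reduces to `w - lr * w.grad`; the parameter `w`, its values, and the learning rate are hypothetical.

```python
import torch

# A single hypothetical parameter; values chosen only for illustration
w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()                      # w.grad is now 2 * w

lr = 0.1
optim = torch.optim.SGD([w], lr=lr)  # plain SGD, no momentum
expected = w.detach() - lr * w.grad  # manual update: w - lr * grad

optim.step()                         # gradient descent
print(torch.allclose(w.detach(), expected))
```

With momentum enabled, the optimizer also keeps a running buffer of past gradients, so later steps no longer match this one-line formula.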

At this point, you have everything you need to train your
neural network. The sections below detail the workings of
autograd; feel free to skip them.
****

## Differentiation in Autograd
Let's take a look at how *autograd* collects gradients. We
create two tensors *a* and *b* with *requires_grad=True*.
This signals to *autograd* that every operation on them
should be tracked.

In [6]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
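
One way to observe this tracking is to inspect the `requires_grad`, `is_leaf`, and `grad_fn` attributes. In the sketch below, the tensor `c` and the names `tracked` and `untracked` are hypothetical additions used only for contrast.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
c = torch.tensor([1., 1.])  # hypothetical tensor; requires_grad defaults to False

tracked = a * 2    # result of an operation on a tracked tensor
untracked = c * 2  # no input requires gradients

# a was created by the user, not by an operation, so it is a leaf with no grad_fn
print(a.is_leaf, a.grad_fn)
# The result of an operation on a carries a grad_fn that records how it was made
print(tracked.requires_grad, tracked.grad_fn is not None)
# Nothing is recorded when no input requires gradients
print(untracked.requires_grad, untracked.grad_fn)
```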

We create another tensor Q from a and b:
$Q = 3a^3-b^2$

In [7]:
Q = 3*a**3 - b**2

Let's assume a and b to be parameters of an NN, and Q to
be the error. In NN training, we want gradients of the
error w.r.t. parameters, i.e.

$$\frac{\partial Q}{\partial a} = 9a^2$$ <br>
$$\frac{\partial Q}{\partial b} = -2b$$

When we call *.backward()* on Q, autograd calculates these
gradients and stores them in the respective tensor's
*.grad* attribute.
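
As a worked check, assuming the values $a = (2, 3)$ and $b = (6, 4)$ defined earlier and applying the power rule elementwise to $Q = 3a^3 - b^2$, the gradients that autograd should store are:

$$\left.\frac{\partial Q}{\partial a}\right|_{a=(2,3)} = 9a^2 = (36,\ 81)$$ <br>
$$\left.\frac{\partial Q}{\partial b}\right|_{b=(6,4)} = -2b = (-12,\ -8)$$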

We need to explicitly pass a *gradient* argument in
*Q.backward()* because Q is a vector. *gradient* is a
tensor of the same shape as Q, and it represents the gradient
of Q w.r.t. itself, i.e.

$$\frac{dQ}{dQ} = 1$$

Equivalently, we can also aggregate Q into a
scalar and call backward implicitly, like
*Q.sum().backward()*.

In [8]:
external_grad = torch.tensor([1.,1.])
Q.backward(gradient=external_grad)

Gradients are now deposited in *a.grad* and *b.grad*.

In [9]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
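
The equivalence between the explicit and implicit forms of the backward call can be checked side by side. This is a sketch under the assumption that *a*, *b*, and Q are rebuilt from scratch for each call, so gradients from one call do not accumulate into the other; the helper `grads_via` is a hypothetical name introduced for this example.

```python
import torch

def grads_via(call_backward):
    """Rebuild a, b and Q, run the given backward call, return (a.grad, b.grad)."""
    a = torch.tensor([2., 3.], requires_grad=True)
    b = torch.tensor([6., 4.], requires_grad=True)
    Q = 3 * a**3 - b**2
    call_backward(Q)
    return a.grad, b.grad

# Explicit: pass dQ/dQ = 1 for every element of Q
explicit = grads_via(lambda Q: Q.backward(gradient=torch.ones_like(Q)))
# Implicit: reduce Q to a scalar first, then call backward with no argument
implicit = grads_via(lambda Q: Q.sum().backward())

print(torch.allclose(explicit[0], implicit[0]))
print(torch.allclose(explicit[1], implicit[1]))
```

Rebuilding the tensors matters here: calling backward twice on the same leaf tensors would add the second set of gradients to the first in *.grad*.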


## Optional Reading - Vector Calculus using autograd
