Below is a notebook borrowed from the PyTorch [official website](https://pytorch.org/tutorials/index.html).

Computation Graphs and Automatic Differentiation
================================================

The concept of a computation graph is essential to efficient deep
learning programming, because it frees you from writing the
backpropagation gradients yourself. A computation graph is simply a
specification of how your data is combined to give you the output. Since
the graph fully specifies which parameters were involved in which
operations, it contains enough information to compute derivatives. This
probably sounds vague, so let's see what is going on using the
fundamental flag ``requires_grad``.

First, think from a programmer's perspective. What is stored in a
``torch.Tensor`` object? Obviously the data and the shape, and maybe a
few other things. But when we add two ordinary tensors together, we get
an output tensor that knows only its data and shape. It has no idea that
it was the sum of two other tensors (it could have been read in from a
file, it could be the result of some other operation, etc.).

**Important:** If ``requires_grad=True``, the Tensor object keeps track of how it was
created. Let's see it in action.




In [None]:
import torch

torch.manual_seed(1)

In [None]:
# Tensor factory methods have a ``requires_grad`` flag
x = torch.tensor([1., 2., 3], requires_grad=True)

# With requires_grad=True, you can still do all the operations you previously
# could
y = torch.tensor([4., 5., 6], requires_grad=True)
t = x + y
print(t)

# BUT t knows something extra.
print(t.grad_fn)

So tensors know what created them: t knows that it wasn't read in from
a file, and that it wasn't the result of a multiplication, an
exponential, or anything else. If you keep following ``t.grad_fn``
(through its ``next_functions``), you will end up at x and y.
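A minimal sketch of following ``t.grad_fn`` back to the inputs, rebuilding the same x, y and t as above. It relies on the ``next_functions`` attribute of autograd nodes and on the ``variable`` attribute of the ``AccumulateGrad`` nodes that autograd creates for leaf tensors; node class names are internal details and could differ between PyTorch versions.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)
t = x + y

# The node that produced t
print(type(t.grad_fn).__name__)  # AddBackward0

# next_functions holds one (node, index) pair per input of the addition.
# For leaf tensors the node is an AccumulateGrad that points back at the leaf.
for node, _ in t.grad_fn.next_functions:
    print(type(node).__name__, node.variable)
```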

But how does that help us compute a gradient?




In [None]:
# Let's sum up all the entries in t
s = t.sum()
print(s)
print(s.grad_fn)

So now, what is the derivative of this sum with respect to the first
component of x? In math, we want

\begin{align}\frac{\partial s}{\partial x_0}\end{align}



Well, s knows that it was created as the sum of the entries of t, and t
knows that it was the sum x + y. So

\begin{align}s = \overbrace{x_0 + y_0}^\text{$t_0$} + \overbrace{x_1 + y_1}^\text{$t_1$} + \overbrace{x_2 + y_2}^\text{$t_2$}\end{align}

And so s contains enough information to determine that the derivative
we want is 1!
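As a sanity check that does not use autograd at all, a central finite difference approximates the same partial derivative numerically. This is only a sketch: ``eps`` and the helper ``s_of`` are illustrative choices, and float64 is used so that rounding error stays small.

```python
import torch

x = torch.tensor([1., 2., 3.], dtype=torch.float64)
y = torch.tensor([4., 5., 6.], dtype=torch.float64)

def s_of(x):
    # s = sum of the entries of x + y, the same function as above
    return (x + y).sum()

eps = 1e-6
e0 = torch.tensor([1., 0., 0.], dtype=torch.float64)  # unit vector along x_0

# Central difference: (s(x + eps*e0) - s(x - eps*e0)) / (2*eps)
approx = (s_of(x + eps * e0) - s_of(x - eps * e0)) / (2 * eps)
print(approx.item())
```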

Of course, this glosses over the challenge of how to actually compute
that derivative. The point here is that s carries along enough
information that it is possible to compute it. In reality, the
developers of PyTorch program the sum() and + operations to know how to
compute their own gradients, and autograd runs the backpropagation
algorithm over the graph. An in-depth discussion of that algorithm is
beyond the scope of this tutorial.
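One way to ask autograd for this derivative directly, without touching any ``.grad`` attribute, is ``torch.autograd.grad``, which returns the gradients of an output with respect to the given inputs as a tuple. A minimal sketch, rebuilding x, y and s as above:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)
s = (x + y).sum()

# Gradients of s with respect to x and y, returned rather than accumulated
dx, dy = torch.autograd.grad(s, (x, y))
print(dx)      # tensor([1., 1., 1.])
print(dy)      # tensor([1., 1., 1.])
print(x.grad)  # None: nothing was accumulated into x.grad
```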




Let's have PyTorch compute the gradient and check that we were right.
(Note: if you run the ``s.backward()`` block multiple times, the gradient
will keep growing. That is because PyTorch *accumulates* the gradient
into the ``.grad`` attribute, which is convenient for many models.)




In [None]:
print(x)

In [None]:
# Calling .backward() on a scalar tensor runs backprop, starting from it.
s.backward()
print(x.grad)
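``backward()`` with no arguments only works on a scalar output such as s. For a non-scalar tensor like t you have to pass a tensor of the same shape (the vector in the vector-Jacobian product); passing ones reproduces ``t.sum().backward()``. A short sketch:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)
t = x + y

try:
    t.backward()  # t has 3 elements, so the gradient can't be created implicitly
except RuntimeError as err:
    print("RuntimeError:", err)

# Weighting every element of t by 1 gives the same result as t.sum().backward()
t.backward(torch.ones_like(t))
print(x.grad)  # tensor([1., 1., 1.])
```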

Understanding what is going on in the block below is crucial for being a
successful programmer in deep learning.




In [None]:
x = torch.randn(2, 2)
y = torch.randn(2, 2)
# By default, user-created Tensors have ``requires_grad=False``
print(x.requires_grad, y.requires_grad)
z = x + y
# So you can't backprop through z
print(z.grad_fn)

# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
# flag in-place. The input flag defaults to ``True`` if not given.
x = x.requires_grad_()
y = y.requires_grad_()
# z contains enough information to compute gradients, as we saw above
z = x + y
print(z.grad_fn)
# If any input to an operation has ``requires_grad=True``, so will the output
print(z.requires_grad)

# Now z has the computation history that relates itself to x and y
# Can we just take its values, and **detach** it from its history?
new_z = z.detach()

# ... does new_z have information to backprop to x and y?
# NO!
print(new_z.grad_fn)
# And how could it? ``z.detach()`` returns a tensor that shares the same storage
# as ``z``, but with the computation history forgotten. It doesn't know anything
# about how it was computed.
# In essence, we have broken the Tensor away from its past history
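To make the shared-storage point concrete, here is a small sketch comparing ``data_ptr()`` (the address of a tensor's first element). If you need an independent copy rather than a view of the same memory, ``detach().clone()`` is the usual idiom.

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
z = x * 2
new_z = z.detach()

print(new_z.data_ptr() == z.data_ptr())    # True: same underlying memory
print(new_z.requires_grad, new_z.grad_fn)  # False None: no history

copy = z.detach().clone()
print(copy.data_ptr() == z.data_ptr())     # False: clone allocates new memory
```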

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

x = torch.ones([10], requires_grad=True)
# plt.plot(x)  # fails: a tensor that requires grad can't be converted to NumPy
plt.plot(x.detach())
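The commented-out ``plt.plot(x)`` fails at the conversion to a NumPy array, which PyTorch refuses to perform for a tensor that requires grad. A minimal reproduction without matplotlib:

```python
import torch

x = torch.ones(10, requires_grad=True)

try:
    x.numpy()  # not allowed while x requires grad
except RuntimeError as err:
    print("RuntimeError:", err)

# Detaching first gives a tensor without history, which converts fine
arr = x.detach().numpy()
print(arr.shape)  # (10,)
```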

You can also stop autograd from tracking history on tensors
with ``requires_grad=True`` by wrapping the code block in
``with torch.no_grad():``



In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
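``torch.no_grad()`` can also be applied as a function decorator, which is handy for evaluation or parameter-update helpers. A sketch, where ``square_untracked`` is a hypothetical helper name:

```python
import torch

x = torch.ones(3, requires_grad=True)

@torch.no_grad()
def square_untracked(t):
    # Everything in this function runs as if wrapped in `with torch.no_grad():`
    return t ** 2

print(square_untracked(x).requires_grad)  # False
print((x ** 2).requires_grad)             # True: tracking resumes outside
```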


### Why do we need zero_grad()?

If a tensor already has a populated ``.grad`` attribute (e.g., from a previous call to ``backward()``), subsequent calls add to the value of this attribute. So if you call ``backward()`` in a loop, you need to explicitly reset the gradient to zero on each iteration.

In [None]:
x = torch.tensor([1.], requires_grad=True)
x.grad is None

In [None]:
a = x * 2
b = x * 3
y = a + b  # == x * 5

# Because gradients accumulate, this gives the same x.grad as y.backward():
a.backward()
b.backward()
x.grad
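For comparison, calling ``y.backward()`` on a freshly created graph gives the same value, since ``y = 5x`` and therefore ``dy/dx = 5``:

```python
import torch

x = torch.tensor([1.], requires_grad=True)
a = x * 2
b = x * 3
y = a + b  # == x * 5

y.backward()
print(x.grad)  # tensor([5.])
```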

In [None]:
x = torch.tensor([1.], requires_grad=True)
print(x.grad is None)
a = x * 2
b = x * 3
a.backward()
print(x.grad)
b.backward()
print(x.grad)

In [None]:
x = torch.tensor([1.], requires_grad=True)
for i in range(2):
    y = x * 2
    y.backward()
    print(x.grad)

In [None]:
x = torch.tensor([1.], requires_grad=True)
for i in range(2):
    y = x * 2
    y.backward()
    print(x.grad)
    x.grad.zero_()
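The same pattern shows up in a hand-written gradient-descent loop: the update must run under ``torch.no_grad()`` so it is not recorded, and the gradient must be zeroed before the next ``backward()``. A minimal sketch on a made-up one-parameter problem (minimize ``(3x - 6)**2``, whose minimum is at ``x = 2``); the learning rate and step count are arbitrary choices:

```python
import torch

x = torch.tensor([0.], requires_grad=True)
lr = 0.05

for _ in range(50):
    loss = ((3 * x - 6) ** 2).sum()
    loss.backward()            # adds d(loss)/dx into x.grad
    with torch.no_grad():
        x -= lr * x.grad       # the update itself is not tracked by autograd
    x.grad.zero_()             # without this, gradients would pile up across steps

print(x.item())
```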

In [None]:
# In a training loop, optimizer.zero_grad() resets the gradients of all parameters the optimizer manages
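With ``torch.optim``, the optimizer keeps track of the parameters, and ``optimizer.zero_grad()`` clears all of their gradients in one call. A minimal sketch using ``torch.optim.SGD`` on a made-up one-parameter problem (minimize ``(3w - 6)**2``); note that in recent PyTorch versions ``zero_grad()`` sets the gradients to ``None`` by default instead of filling them with zeros.

```python
import torch

w = torch.tensor([0.], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.05)

for _ in range(50):
    optimizer.zero_grad()            # reset gradients from the previous step
    loss = ((3 * w - 6) ** 2).sum()
    loss.backward()                  # fill w.grad
    optimizer.step()                 # w <- w - lr * w.grad

print(w.item())
```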