# Deep Learning with PyTorch: A 60 Minute Blitz

Goals of this tutorial:
- Understand PyTorch’s Tensor library and neural networks at a high level.
- Train a small neural network to classify images.

## Part 2: Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the <b>autograd</b> package. Let’s first briefly visit this, and then move on to training our first neural network.

The <b>autograd</b> package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

### Tensor

<b>torch.Tensor</b> is the central class of the package. If you set its attribute <b>.requires_grad</b> to <b>True</b>, it starts to track all operations on it. When you finish your computation, you can call <b>.backward()</b> and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its <b>.grad</b> attribute.
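
The word "accumulated" matters here: each call to <b>.backward()</b> adds to <b>.grad</b> rather than overwriting it. The following is a minimal sketch of that behavior; the tensor <b>w</b> and the toy loss are illustrative names, not part of the tutorial's running example.

```python
import torch

# A user-created tensor that autograd should track.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d(sum(w * w)) / dw = 2 * w
loss = (w * w).sum()
loss.backward()
first_grad = w.grad.clone()

# Second backward pass on a freshly built graph:
# the new gradient is added to what is already stored in w.grad.
loss = (w * w).sum()
loss.backward()
second_grad = w.grad.clone()

print(first_grad)   # equals 2 * w
print(second_grad)  # equals 4 * w, i.e. two accumulated passes

# Reset the stored gradient in-place before the next pass.
w.grad.zero_()
```

Because of this, a stored gradient is usually reset between passes when only the most recent one is wanted.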

To stop a tensor from tracking history, you can call <b>.detach()</b>, which detaches it from the computation history and prevents future computation from being tracked.

To prevent tracking history (and using memory), you can also wrap the code block in <b>with torch.no_grad():</b>. This can be particularly helpful when evaluating a model because the model may have trainable parameters with <b>requires_grad=True</b>, but for which we don’t need the gradients.
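
As a rough sketch of the evaluation case, the snippet below runs a forward pass inside <b>torch.no_grad()</b>. The model (<b>nn.Linear(4, 2)</b>) and the random input batch are hypothetical placeholders chosen only for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # hypothetical toy model with trainable parameters
inputs = torch.randn(8, 4)   # hypothetical batch of 8 samples

# The parameters still require gradients...
params_require_grad = all(p.requires_grad for p in model.parameters())

# ...but nothing computed inside this block is tracked.
with torch.no_grad():
    preds = model(inputs)

print(params_require_grad)   # True
print(preds.requires_grad)   # False
print(preds.grad_fn)         # None
```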

There’s one more class that is very important to the autograd implementation: <b>Function</b>.

<b>Tensor</b> and <b>Function</b> are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a <b>.grad_fn</b> attribute that references the <b>Function</b> that created the <b>Tensor</b> (except for Tensors created by the user, whose <b>grad_fn</b> is <b>None</b>).
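
To make the graph a little more concrete, here is a small sketch that inspects <b>.grad_fn</b> on a user-created tensor and on results of operations. The tensors <b>a</b>, <b>b</b> and <b>c</b> are illustrative; <b>is_leaf</b>, <b>name()</b> and <b>next_functions</b> are used here only to peek at the graph.

```python
import torch

a = torch.ones(2, requires_grad=True)  # created by the user
b = a * 3                               # created by an operation
c = b.sum()                             # created by another operation

print(a.grad_fn)         # None
print(a.is_leaf)         # True
print(b.grad_fn.name())  # MulBackward0
print(c.grad_fn.name())  # SumBackward0

# Each Function keeps references to the Functions that produced its inputs,
# which is what links the graph together.
parent = c.grad_fn.next_functions[0][0]
print(parent.name())     # MulBackward0
```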

If you want to compute the derivatives, you can call <b>.backward()</b> on a <b>Tensor</b>. If the <b>Tensor</b> is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to <b>backward()</b>. If it has more elements, however, you need to specify a <b>gradient</b> argument that is a tensor of matching shape.
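
A minimal sketch of the non-scalar case: calling <b>backward()</b> without arguments on a tensor with several elements raises an error, while passing a <b>gradient</b> tensor of matching shape works. The names below are illustrative, and the vector of ones is just one possible choice of <b>gradient</b>.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2  # three elements, so y is not a scalar

error_message = None
try:
    y.backward()  # no gradient argument for a non-scalar tensor
except RuntimeError as err:
    error_message = str(err)
    print("backward() failed:", error_message)

# Supplying a gradient tensor with the same shape as y works.
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # every entry is 2
```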

In [1]:
import torch

Create a tensor and set <b>requires_grad=True</b> to track computation with it:

In [2]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Do a tensor operation:

In [3]:
y = x + 2
y

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

<b>y</b> was created as a result of an operation, so it has a <b>grad_fn</b>.

In [4]:
y.grad_fn

<AddBackward0 at 0x11765b6d8>

In [5]:
z = y * y * 3
out = z.mean()
print(z)
print(out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)


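<b>.requires_grad_( ... )</b> changes an existing Tensor’s <b>requires_grad</b> flag in-place. A tensor created by the user has <b>requires_grad=False</b> unless the flag is given, while calling <b>.requires_grad_()</b> with no argument sets it to <b>True</b>.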

In [6]:
a = torch.randn(2, 2)
a = ((a*3) / (a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x10e6cc160>
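
The two defaults involved are easy to mix up: a newly created tensor does not require gradients unless asked to, whereas <b>.requires_grad_()</b> called with no argument switches tracking on. A short sketch with an illustrative tensor <b>t</b>:

```python
import torch

t = torch.zeros(2, 2)
flag_at_creation = t.requires_grad   # False: default for a new tensor

t.requires_grad_()                   # no argument: the flag becomes True
flag_after_default_call = t.requires_grad

t.requires_grad_(False)              # switch tracking off again
flag_after_false = t.requires_grad

print(flag_at_creation, flag_after_default_call, flag_after_false)
```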


### Gradients

Let’s backprop now. Because <b>out</b> contains a single scalar, <b>out.backward()</b> is equivalent to <b>out.backward(torch.tensor(1.))</b>.

In [7]:
out

tensor(27., grad_fn=<MeanBackward0>)

In [8]:
out.backward()
x.grad

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
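
The value 4.5 can be checked by hand. In the short derivation below, $o$ stands for <b>out</b> and $x_i$, $z_i$ for the entries of <b>x</b> and <b>z</b>; this notation is introduced here only for illustration.

$$
o = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,(x_i + 2)^2, \qquad z_i\big|_{x_i = 1} = 27
$$

$$
\frac{\partial o}{\partial x_i} = \frac{3}{2}\,(x_i + 2), \qquad \frac{\partial o}{\partial x_i}\bigg|_{x_i = 1} = \frac{9}{2} = 4.5
$$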

Now let’s take a look at an example of a vector-Jacobian product:

In [15]:
x = torch.randn(3, requires_grad=True)
x

tensor([-0.2009,  0.5411, -0.1914], requires_grad=True)

In [16]:
y = x * 2
while y.detach().norm() < 1000:
    y = y * 2
print(y)

tensor([-411.5365, 1108.1287, -391.9906], grad_fn=<MulBackward0>)


Now in this case <b>y</b> is no longer a scalar, so <b>backward()</b> needs a <b>gradient</b> argument. <b>torch.autograd</b> does not compute the full Jacobian directly in this call, but if we just want the vector-Jacobian product, we can simply pass the vector to <b>backward</b> as an argument:

In [17]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

x.grad

tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
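
In the run above, <b>y</b> ended up as <b>x</b> scaled by 2048, so each entry of <b>x.grad</b> is the matching entry of <b>v</b> times 2048. The sketch below checks the same relationship on a small example by assembling the Jacobian one row at a time, each row being a vector-Jacobian product with a unit vector. The fixed factor 2048 and the input values are stand-ins for the data-dependent loop above.

```python
import torch

x = torch.tensor([-0.2, 0.5, -0.2], requires_grad=True)
y = x * 2048  # fixed scale standing in for the doubling loop (assumption)
v = torch.tensor([0.1, 1.0, 0.0001])

# Row i of the Jacobian is the vector-Jacobian product with the i-th unit vector.
rows = []
for i in range(y.numel()):
    unit = torch.zeros_like(y)
    unit[i] = 1.0
    # torch.autograd.grad returns the result instead of accumulating into x.grad.
    (row,) = torch.autograd.grad(y, x, grad_outputs=unit, retain_graph=True)
    rows.append(row)
J = torch.stack(rows)

# backward(v) accumulates the product of v with the Jacobian into x.grad.
y.backward(v)
matches = torch.allclose(x.grad, v @ J)
print(matches)
```

Building the Jacobian this way takes one backward pass per output element, which is why passing a single vector to <b>backward</b> is the cheaper option when only the product is needed.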

You can also stop autograd from tracking history on Tensors with <b>requires_grad=True</b> by wrapping the code block in <b>with torch.no_grad()</b>:

In [18]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


Alternatively, use <b>.detach()</b> to get a new Tensor with the same content that does not require gradients:

In [19]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)
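
One detail worth keeping in mind is that the detached tensor shares its underlying memory with the original rather than copying it. A small sketch with illustrative names:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()

print(y.requires_grad)  # False
print(y.grad_fn)        # None

# Both tensors point at the same underlying memory.
shares_memory = y.data_ptr() == x.data_ptr()
print(shares_memory)    # True

# Operations on the detached tensor are not recorded by autograd.
z = (y * 2).sum()
print(z.requires_grad)  # False
```

If an independent copy is needed, <b>x.detach().clone()</b> is one way to get it.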
