In [1]:
import torch

In [3]:
x = torch.arange(12, dtype=torch.float32)
x

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

In [4]:
# get number of elements
x.numel()

12

### Differentiation

Suppose we want to differentiate the function $y = 2\mathbf{x}^\top\mathbf{x}$ with respect to the column vector $\mathbf{x}$.
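Working the derivative out by hand first gives us a result to check autograd against. Since $\mathbf{x}^\top\mathbf{x}$ is the sum of squared entries, each partial derivative is easy to compute:

$$
y = 2\mathbf{x}^\top\mathbf{x} = 2\sum_{i} x_i^2,
\qquad
\frac{\partial y}{\partial x_i} = 4x_i,
\qquad
\nabla_{\mathbf{x}} y = 4\mathbf{x}.
$$

For the vector $\mathbf{x} = (0, 1, 2, 3)$ used in this section, we should therefore expect the gradient $(0, 4, 8, 12)$.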

In [5]:
x = torch.arange(4.)
x

tensor([0., 1., 2., 3.])

Before we calculate the gradient of y with respect to x, we need a place to store it. In general, we avoid allocating new memory every time we take a derivative, because deep learning requires computing derivatives with respect to the same parameters thousands or millions of times, and we could run out of memory. Note that the gradient of a scalar-valued function with respect to a vector x is itself vector-valued and has the same shape as x.
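As a small illustration of the shape rule, the sketch below uses a 2x3 tensor `w` and the scalar function `(w ** 2).sum()`. Both are chosen only for demonstration and are not part of the original example. It also shows that `w.grad` is `None` until a backward pass fills it in.

```python
import torch

# A 2x3 tensor; requires_grad=True asks autograd to track operations on it.
w = torch.ones(2, 3, requires_grad=True)

# No gradient buffer exists until a backward pass has populated it.
grad_before = w.grad
print(grad_before)  # None

# Any scalar-valued function of w works; this one is purely illustrative.
loss = (w ** 2).sum()
loss.backward()

# The gradient is stored in w.grad and has exactly the same shape as w.
print(w.grad.shape)  # torch.Size([2, 3])
print(w.grad)        # d/dw sum(w^2) = 2w, so every entry is 2
```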

In [7]:
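# Enable gradient tracking in place on the existing tensor
x.requires_grad_(True)
print(x)

# Alternatively, request gradient tracking when the tensor is created
x = torch.arange(4., requires_grad=True)
x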

tensor([0., 1., 2., 3.], requires_grad=True)


tensor([0., 1., 2., 3.], requires_grad=True)

In [8]:
# now calculate the function
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

We can now compute the gradient of y with respect to x by calling y’s backward method, and then read the result from x’s grad attribute.

In [9]:
y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])
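Since $\nabla_{\mathbf{x}}(2\mathbf{x}^\top\mathbf{x}) = 4\mathbf{x}$, we can check autograd’s output against the analytic gradient. This is a minimal, self-contained sketch that rebuilds the same tensors; `expected` is a name introduced here for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward()

# Analytically, the gradient of 2 * x^T x is 4x; detach() gives a plain tensor.
expected = 4 * x.detach()
print(x.grad)                             # tensor([ 0.,  4.,  8., 12.])
print(torch.allclose(x.grad, expected))   # True
```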

Now let’s calculate another function of x and take its gradient. Note that PyTorch does not automatically reset the gradient buffer when we record a new gradient; instead, the new gradient is added to the already stored gradient. This behavior comes in handy when we want to optimize the sum of multiple objective functions. To reset the gradient buffer, we can call x.grad.zero_() as follows:

In [11]:
x.grad.zero_()
y = x.sum()
y.backward()
x.grad

tensor([1., 1., 1., 1.])
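The accumulation behavior described above can be checked directly. The sketch below, assuming the two objectives from this section ($2\mathbf{x}^\top\mathbf{x}$ and the sum of x), runs backward on each one without zeroing in between. It then compares the result with a single backward pass on their sum. The variable name `accumulated` is illustrative.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# Two separate objectives; each backward() call adds its gradient into x.grad.
(2 * torch.dot(x, x)).backward()   # contributes 4x
x.sum().backward()                 # contributes a vector of ones
accumulated = x.grad.clone()

# Reset the buffer, then differentiate the summed objective in one pass.
x.grad.zero_()
(2 * torch.dot(x, x) + x.sum()).backward()

# Accumulated gradients of the parts equal the gradient of their sum: 4x + 1.
print(accumulated)                          # tensor([ 1.,  5.,  9., 13.])
print(torch.allclose(accumulated, x.grad))  # True
```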