# The PyTorch Package

In [1]:
import torch

## Tensor Manipulation

In [None]:
x = torch.arange(12.)
x

In [None]:
x.numel()

In [None]:
x.shape

Change the shape of a tensor
without altering its size or values

In [None]:
A = x.reshape(3, 4)
A
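
A minimal sketch of a related `reshape` feature: one dimension can be given as `-1`, and PyTorch infers it from the total element count. Asking for a shape with a different number of elements raises a `RuntimeError` (the exact message may vary between versions). The variable names are illustrative.

```python
import torch

x = torch.arange(12.)

# -1 asks PyTorch to infer that dimension from x.numel() and the other sizes
A = x.reshape(-1, 4)
B = x.reshape(3, -1)
print(A.shape, B.shape)   # torch.Size([3, 4]) torch.Size([3, 4])
print(torch.equal(A, B))  # True

# The element count must stay the same: 12 is not divisible by 5
try:
    x.reshape(5, -1)
except RuntimeError as err:
    print("reshape failed:", err)
```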

Supply the exact value of each element

In [6]:
A = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

`[-1]` selects the last row

In [None]:
A[-1]

`[1:3]` selects the second and third rows

In [None]:
A[1:3]
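
The same slicing rules apply along every axis. This sketch, using the matrix `A` from above, selects a column by keeping all rows with `:` and a 2 × 2 block by slicing both axes; the variable names are illustrative.

```python
import torch

A = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

last_row = A[-1]      # same as A[2]
middle = A[1:3]       # rows 1 and 2; the stop index 3 is excluded
second_col = A[:, 1]  # ':' keeps every row, then take column 1
corner = A[:2, -2:]   # first two rows, last two columns
print(last_row, middle, second_col, corner, sep="\n")
```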

Apply a function component-wise

In [None]:
torch.exp(A)
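
"Component-wise" means the function is applied to each entry independently and the shape is preserved. As a sketch, the block below checks `torch.exp` against Python's scalar `math.exp` applied entry by entry; it uses a floating-point copy of `A` (an assumption made so both computations start from the same float values).

```python
import math
import torch

A = torch.tensor([[2., 1., 4., 3.], [1., 2., 3., 4.], [4., 3., 2., 1.]])
E = torch.exp(A)  # one call, applied to all 12 entries

# The same values computed one scalar at a time with Python's math module
manual = torch.tensor([[math.exp(v) for v in row] for row in A.tolist()])
print(E.shape)                    # torch.Size([3, 4]): the shape is preserved
print(torch.allclose(E, manual))  # True
```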

Write elements by specifying indices

In [None]:
B = A.clone()
B[1, 2] = 17
B

To assign multiple elements the same value,
we apply the indexing on the left-hand side 
of the assignment operation

In [None]:
B[:2, :] = 12
B
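
The left-hand side can be any valid index expression. A short sketch: assign one value to a whole column, then to every entry selected by a boolean mask. Working on `A.clone()` keeps the original `A` unchanged.

```python
import torch

A = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
B = A.clone()  # work on a copy so A stays unchanged

B[:, -1] = 0   # every row, last column
B[B > 3] = -1  # a boolean mask on the left-hand side also works
print(B)
print(A[0])    # tensor([2, 1, 4, 3]): the original is untouched
```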

*Concatenate* multiple tensors, joining them vertically (`dim=0`) or side by side (`dim=1`)

In [None]:
torch.cat((A, B), dim=0)

In [None]:
torch.cat((A, B), dim=1)
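
Concatenation only extends an existing axis, so all other sizes must match. The sketch below prints the resulting shapes; for contrast it also calls `torch.stack`, a different function that adds a new axis instead. The tensors are illustrative stand-ins with the same 3 × 4 shapes as above.

```python
import torch

A = torch.arange(12).reshape(3, 4)
B = torch.ones(3, 4, dtype=torch.int64)

rows = torch.cat((A, B), dim=0)       # (3 + 3, 4)
cols = torch.cat((A, B), dim=1)       # (3, 4 + 4)
stacked = torch.stack((A, B), dim=0)  # new leading axis: (2, 3, 4)
print(rows.shape, cols.shape, stacked.shape)
```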

Construct a binary tensor via *logical statements*

In [None]:
A == B
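
The result of a comparison is a tensor of dtype `torch.bool`. This sketch rebuilds `A` and `B` as in the cells above and shows that such a mask can be summed (each `True` counts as 1) or reduced with `all`.

```python
import torch

A = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
B = A.clone()
B[:2, :] = 12  # the same B as in the cells above

mask = A == B
print(mask.dtype)         # torch.bool
print(mask.sum().item())  # 4: only the last row agrees
print(mask.all(dim=1))    # tensor([False, False,  True]): rows that agree everywhere
print((A > 2).int())      # comparisons with a scalar also work; .int() gives 0/1
```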

**Broadcasting**: Perform elementwise binary operations on tensors of different shapes by expanding axes of length 1 so that the shapes match

In [None]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

In [None]:
a + b
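
Broadcasting compares shapes from the last axis backwards: each pair of sizes must be equal, or one of them must be 1 (that axis is then expanded). A sketch that checks the rule with `torch.broadcast_shapes`, compares against an explicit `expand`, and shows an incompatible pair raising a `RuntimeError`:

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# (3, 1) with (1, 2): both size-1 axes are stretched, giving (3, 2)
print(torch.broadcast_shapes(a.shape, b.shape))  # torch.Size([3, 2])

# Broadcasting gives the same result as expanding both operands explicitly
explicit = a.expand(3, 2) + b.expand(3, 2)
print(torch.equal(a + b, explicit))  # True

# Trailing sizes 3 and 2 (neither equal to 1) cannot be broadcast
try:
    torch.arange(3) + torch.arange(2)
except RuntimeError as err:
    print("cannot broadcast:", err)
```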

In [17]:
import torch
A = torch.arange(6.).reshape(2, 3)
B = A.clone()  # copy A into new memory so later changes to B leave A untouched

Elementwise sum or product

In [None]:
A + B, A * B

Adding or multiplying a scalar and a tensor

In [None]:
A + 3, A * 0.5

Sum over all elements

In [None]:
A

In [None]:
A.sum()

Specify the axes 
along which the tensor should be reduced

In [None]:
A.sum(axis=0)

In [None]:
A.sum(axis=1)

In [None]:
A.sum(axis=[0, 1]) == A.sum()
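
The reduced axis is the one that disappears from the shape. As a sketch with the same 2 × 3 matrix, summing over axis 0 adds the rows together, and `mean` reduces the same way while dividing by the number of summed entries.

```python
import torch

A = torch.arange(6.).reshape(2, 3)  # [[0., 1., 2.], [3., 4., 5.]]

col_sums = A.sum(axis=0)  # collapse axis 0 (the rows): shape (3,)
row_sums = A.sum(axis=1)  # collapse axis 1 (the columns): shape (2,)
print(col_sums, row_sums)  # tensor([3., 5., 7.]) tensor([ 3., 12.])

# Reducing axis 0 is the same as adding the rows by hand
print(torch.equal(col_sums, A[0] + A[1]))  # True
# mean divides the sum by the number of reduced entries
print(torch.allclose(A.mean(axis=0), col_sums / A.shape[0]))  # True
```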

Keep the number of axes unchanged by passing `keepdim=True`

In [None]:
A.sum(axis=0).shape

In [None]:
A.sum(axis=0, keepdim=True).shape
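
One common use of `keepdim=True`, shown here as a sketch: normalize each row so it sums to 1. The row sums keep shape `(2, 1)` and broadcast across the columns; dividing by `A.sum(axis=1)` of shape `(2,)` would instead fail, because the trailing sizes 3 and 2 do not broadcast.

```python
import torch

A = torch.arange(6.).reshape(2, 3)

row_sums = A.sum(axis=1, keepdim=True)  # shape (2, 1) instead of (2,)
# row_sums is still 2-D, so it broadcasts across the 3 columns of A
P = A / row_sums
print(P)
print(P.sum(axis=1))  # each row now sums to 1 (up to rounding)
```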

The *dot product* of two vectors

In [None]:
x = torch.arange(3.)
y = torch.ones(3, dtype=torch.float32)
x, y, torch.dot(x, y), x @ y
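
The dot product is the sum of the elementwise products, $\mathbf{x}^\top\mathbf{y} = \sum_i x_i y_i$. A quick check of that identity follows; note that `torch.dot` accepts only 1-D tensors of the same dtype.

```python
import torch

x = torch.arange(3.)
y = torch.ones(3, dtype=torch.float32)

# Sum of the elementwise products: 0*1 + 1*1 + 2*1 = 3
manual = (x * y).sum()
print(manual, torch.dot(x, y))               # tensor(3.) tensor(3.)
print(torch.equal(manual, torch.dot(x, y)))  # True
```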

Matrix-vector product

In [None]:
A.shape

In [None]:
torch.matmul(A, x), A @ x
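
Entry $i$ of $\mathbf{A}\mathbf{x}$ is the dot product of row $i$ of $\mathbf{A}$ with $\mathbf{x}$. The sketch below rebuilds the product row by row (iterating over a 2-D tensor yields its rows) and compares it with `A @ x`.

```python
import torch

A = torch.arange(6.).reshape(2, 3)
x = torch.arange(3.)

# One dot product per row of A
rowwise = torch.stack([torch.dot(row, x) for row in A])
print(A @ x, rowwise)               # tensor([ 5., 14.]) tensor([ 5., 14.])
print(torch.equal(A @ x, rowwise))  # True
```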

Matrix product

In [None]:
B = torch.ones(3, 4)
torch.matmul(A, B), A @ B
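
For $\mathbf{A} \in \mathbb{R}^{2 \times 3}$ and $\mathbf{B} \in \mathbb{R}^{3 \times 4}$, entry $(i, j)$ of $\mathbf{A}\mathbf{B}$ is the dot product of row $i$ of $\mathbf{A}$ with column $j$ of $\mathbf{B}$. A sketch checking one entry, plus the error raised when the inner dimensions disagree:

```python
import torch

A = torch.arange(6.).reshape(2, 3)
B = torch.ones(3, 4)

C = A @ B  # (2, 3) @ (3, 4) -> (2, 4)
# Entry (1, 2) is row 1 of A dotted with column 2 of B
print(torch.equal(C[1, 2], torch.dot(A[1], B[:, 2])))  # True
print(C)  # B is all ones, so row i of C repeats the sum of row i of A: 3 and 12

# The inner dimensions must agree: (2, 3) @ (2, 3) is not defined
try:
    A @ A
except RuntimeError as err:
    print("shape mismatch:", err)
```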

The $\ell_2$ *norm*
$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$$

In [None]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)
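
A sketch computing the formula directly (square, sum, square root) for comparison. It also calls `torch.linalg.vector_norm`, which recent PyTorch documentation recommends over the older `torch.norm`.

```python
import torch

u = torch.tensor([3.0, -4.0])

# sqrt(3**2 + (-4)**2) = sqrt(25) = 5
manual = torch.sqrt((u ** 2).sum())
print(manual, torch.norm(u), torch.linalg.vector_norm(u))  # all tensor(5.)
```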

The $\ell_1$ norm
$$\|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|$$

In [None]:
torch.abs(u).sum()
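
The same $\ell_1$ value is available through `torch.linalg.vector_norm` by setting `ord=1`; a short sketch comparing the two:

```python
import torch

u = torch.tensor([3.0, -4.0])

l1 = torch.abs(u).sum()                          # |3| + |-4| = 7
via_linalg = torch.linalg.vector_norm(u, ord=1)  # the same quantity
print(l1, via_linalg)                            # tensor(7.) tensor(7.)
```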

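The *Frobenius norm* of a matrix, which is the $\ell_2$ norm of its entries
viewed as one long vector and is much easier to compute
than other matrix norms such as the spectral norm
$$\|\mathbf{X}\|_\textrm{F} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}$$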

In [None]:
A = torch.ones((4, 9))
torch.norm(A)
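
For the all-ones $4 \times 9$ matrix, the sum of squared entries is 36, so $\|\mathbf{X}\|_\textrm{F} = 6$. A sketch checking this three ways: from the formula, as the $\ell_2$ norm of the flattened matrix, and with `torch.linalg.matrix_norm`.

```python
import torch

A = torch.ones((4, 9))

manual = torch.sqrt((A ** 2).sum())                # sqrt(36) = 6
flattened = torch.linalg.vector_norm(A.flatten())  # l2 norm of all entries as one vector
fro = torch.linalg.matrix_norm(A, ord="fro")       # dedicated matrix-norm function
print(manual, flattened, fro)                      # all tensor(6.)
```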

## Automatic Differentiation



In [34]:
import torch

Differentiate the function
$y = 2\mathbf{x}^{\top}\mathbf{x}$
with respect to the column vector $\mathbf{x}$

In [None]:
x = torch.arange(4.0)
x

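Before we calculate the gradient
of $y$ with respect to $\mathbf{x}$,
we need a place to store it:
`requires_grad_(True)` tells PyTorch to record operations on `x`,
and `x.grad` stays `None` until a backward pass fills it in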

In [36]:
x.requires_grad_(True)
x.grad
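
Tracking can also be requested when the tensor is created, as in this sketch. Only floating-point (and complex) tensors can require gradients, so the integer version raises a `RuntimeError`.

```python
import torch

# Equivalent to creating x and then calling x.requires_grad_(True)
x = torch.arange(4.0, requires_grad=True)
print(x.requires_grad)  # True
print(x.grad)           # None: nothing has been backpropagated yet

# torch.arange(4) is an integer tensor, which cannot require gradients
try:
    torch.arange(4, requires_grad=True)
except RuntimeError as err:
    print("integer tensor:", err)
```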

We now calculate our function of `x` and assign the result to `y`

In [None]:
y = 2 * torch.dot(x, x)
y

We can now take the gradient of `y`
with respect to `x` by calling `y.backward()`;
the result is stored in `x.grad`

In [None]:
y.backward()
x.grad

We already know that the gradient of the function $y = 2\mathbf{x}^{\top}\mathbf{x}$
with respect to $\mathbf{x}$ should be $4\mathbf{x}$

In [None]:
x.grad == 4 * x
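
An autograd result can also be checked numerically. This sketch, which is a generic gradient check and not part of the original notebook, compares `x.grad` with central finite differences $(f(\mathbf{x} + \epsilon\mathbf{e}_i) - f(\mathbf{x} - \epsilon\mathbf{e}_i)) / 2\epsilon$. The helper name `f`, the `float64` dtype, and `eps = 1e-6` are assumptions chosen to keep rounding error small.

```python
import torch

def f(x):
    # hypothetical helper: y = 2 x^T x
    return 2 * torch.dot(x, x)

x = torch.arange(4.0, dtype=torch.float64, requires_grad=True)
f(x).backward()

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = torch.zeros(4, dtype=torch.float64)
with torch.no_grad():
    for i in range(4):
        e = torch.zeros(4, dtype=torch.float64)
        e[i] = eps
        numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)

print(x.grad, numeric)
print(torch.allclose(x.grad, numeric))  # True
```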

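Now let's calculate
another function of `x`
and take its gradient.
PyTorch adds new gradients to `x.grad` instead of overwriting it,
so we reset it with `x.grad.zero_()` first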

In [None]:
x.grad.zero_()
y = x.sum()
y.backward()
x.grad
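
A short sketch of why the reset matters: two backward passes without `zero_()` in between add their gradients together.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

x.sum().backward()
x.sum().backward()           # no zero_() in between
accumulated = x.grad.clone()
print(accumulated)           # tensor([2., 2., 2., 2.]): the two gradients were added

x.grad.zero_()
x.sum().backward()
print(x.grad)                # tensor([1., 1., 1., 1.])
```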

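Sum up the gradients
computed individually for each example:
`y = x * x` is a vector, so `backward()` needs a `gradient` argument,
and passing a vector of ones yields the gradient of `y.sum()`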

In [None]:
x.grad.zero_()
y = x * x
y.backward(gradient=torch.ones(len(y)))
x.grad

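Note: the `gradient` argument supplies a vector $\mathbf{v}$, and `backward` computes the vector-Jacobian product $\mathbf{v}^{\top} \partial \mathbf{y} / \partial \mathbf{x}$ instead of materializing the full Jacobian.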
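
For `y = x * x` the Jacobian is $\operatorname{diag}(2\mathbf{x})$, so the vector-Jacobian product is $\mathbf{v} \odot 2\mathbf{x}$. This sketch uses `torch.autograd.grad`, which returns the product instead of writing to `x.grad`, with an arbitrary weighting vector `v`. It also checks that `v = ones` matches differentiating `y.sum()`.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
v = torch.tensor([1.0, 0.0, 2.0, -1.0])  # arbitrary weighting vector

y = x * x  # Jacobian is diag(2x)
(vjp,) = torch.autograd.grad(y, x, grad_outputs=v)
print(vjp)  # v * 2x = tensor([ 0.,  0.,  8., -6.])

# With v = ones, the result matches differentiating y.sum()
(g_ones,) = torch.autograd.grad(x * x, x, grad_outputs=torch.ones(4))
(g_sum,) = torch.autograd.grad((x * x).sum(), x)
print(torch.equal(g_ones, g_sum))  # True
```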

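Move some calculations
outside of the recorded computational graph with `detach()`:
`u` has the same values as `y` but is treated as a constant,
so the gradient of `z = u * x` with respect to `x` is `u`,
while `y` keeps its own graph, so differentiating `y.sum()` still gives $2\mathbf{x}$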

In [None]:
x.grad.zero_()
y = x * x
u = y.detach()
z = u * x

z.sum().backward()
x.grad == u

In [None]:
x.grad.zero_()
y.sum().backward()
x.grad == 2 * x
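
A sketch of two properties of `detach()`: the detached tensor does not require gradients, and it shares storage with the original. For comparison it also uses `torch.no_grad()`, a related but different mechanism that skips graph recording for a whole block of code.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

y = x * x
u = y.detach()
print(u.requires_grad)               # False: u is cut off from the graph
print(u.data_ptr() == y.data_ptr())  # True: detach shares memory with y

with torch.no_grad():
    w = x * x                        # computed without recording a graph
print(w.requires_grad)               # False
```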

Even if
a function requires passing through a maze of Python control flow
(conditionals, loops, and arbitrary function calls),
we can still calculate the gradient of the resulting variable

In [None]:
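def f(a):
    b = a * 2
    # keep doubling until the norm reaches 1000; the loop count depends on a
    while b.norm() < 1000:
        b = b * 2
    # the branch taken depends on the sign of b
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(), requires_grad=True)  # a random scalar
d = f(a)
d.backward()

# along the path taken, f is linear in a (d = k * a), so the gradient is d / a
a.grad == d / a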
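
Because the recorded graph is whatever path the Python code actually took, the gradient can differ between inputs. This sketch repeats `f` so it runs on its own and evaluates it at two arbitrarily chosen inputs, 0.5 and -0.5. For 0.5 the loop stops at $b = 1024 = 0.5 \cdot 2^{11}$, giving a slope of 2048. For -0.5 the `else` branch multiplies by 100, giving 204800.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

grads = {}
for value in (0.5, -0.5):
    a = torch.tensor(value, requires_grad=True)
    d = f(a)
    d.backward()
    grads[value] = a.grad.item()
    # the gradient equals the slope d / a of the path that was taken
    print(value, a.grad.item(), (d / a).item())
```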