# PyTorch Basics

## Init, helpers, utils, ...

In [None]:
%matplotlib inline

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision

In [None]:
from pprint import pprint

import matplotlib.pyplot as plt
import numpy as np
from IPython.core.debugger import set_trace

# Tensors
Tensors: the atoms of machine learning.

## Tensors in numpy and pytorch

In [None]:
import numpy as np
from numpy.linalg import inv
from numpy.linalg import multi_dot as mdot

In [None]:
# numpy
np.eye(3)

In [None]:
# torch
torch.eye(3)

In [None]:
# numpy
X = np.random.random((5, 3))
X

In [None]:
# pytorch
Y = torch.rand((5, 3))
Y

In [None]:
X.shape

In [None]:
Y.shape

In [None]:
# numpy
X.T @ X

In [None]:
# torch
Y.t() @ Y

In [None]:
# numpy
inv(X.T @ X)

In [None]:
# torch
torch.inverse(Y.t() @ Y)
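
To compare the two libraries on identical data, the sketch below builds the torch tensor from the numpy array and checks that both produce the same inverse. It assumes a PyTorch version that provides `torch.linalg.inv` (the newer spelling of `torch.inverse`); the seed and the variable names are illustrative only.

```python
import numpy as np
import torch
from numpy.linalg import inv

np.random.seed(0)  # illustrative seed, only for reproducibility
X = np.random.random((5, 3))
Y = torch.from_numpy(X)  # same values as X, dtype float64

gram_np = X.T @ X    # 3x3 Gram matrix in numpy
gram_t = Y.t() @ Y   # the same matrix in torch

inv_np = inv(gram_np)
inv_t = torch.linalg.inv(gram_t)

# Both libraries should agree up to floating point error.
print(np.allclose(inv_np, inv_t.numpy()))
# Multiplying the inverse with the matrix should give the identity.
print(np.allclose(inv_np @ gram_np, np.eye(3)))
```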

## More on PyTorch Tensors

Operations are also available as methods.

In [None]:
A = torch.eye(3)
A.add(1)

In [None]:
A

Any operation that mutates a tensor in-place has a `_` suffix.

In [None]:
A.add_(1)
A
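
As a small side-by-side sketch of the two variants: `add` returns a new tensor and leaves the original untouched, while `add_` overwrites the original's storage. The variable names are illustrative.

```python
import torch

A = torch.eye(3)

# Out-of-place: the result is a new tensor, A keeps its values.
B = A.add(1)
print(torch.equal(A, torch.eye(3)))

# In-place: A itself is modified; the returned tensor uses the same storage.
C = A.add_(1)
print(C.data_ptr() == A.data_ptr())
print(torch.equal(A, B))
```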

## Indexing and broadcasting
Indexing and slicing work as expected, just like in numpy:

In [None]:
A[0, 0]

In [None]:
A[0]

In [None]:
A[0:2]

In [None]:
A[:, 1:3]
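
Broadcasting follows the numpy rules as well. The sketch below adds a row vector and a column vector to a small matrix; the tensors `M`, `row`, and `col` are made up for illustration.

```python
import torch

M = torch.arange(6.).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
row = torch.tensor([10., 20., 30.])       # shape (3,)
col = torch.tensor([[100.], [200.]])      # shape (2, 1)

# The row vector is stretched along dim 0, so it is added to every row.
print(M + row)

# The column vector is stretched along dim 1, so it is added to every column.
print(M + col)
```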

## Converting

In [None]:
A = torch.eye(3)
A

In [None]:
# torch --> numpy
B = A.numpy()
B

Note: for tensors on the CPU, the torch tensor and the numpy array share the same memory (zero-copy), so an in-place change to one is visible in the other.

In [None]:
A.add_(.5)
A

In [None]:
B

In [None]:
# numpy --> torch
torch.from_numpy(np.eye(3))
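
Two details are easy to miss in the numpy to torch direction: `torch.from_numpy` keeps the numpy dtype (`float64` for `np.eye`), and it shares memory with the array, whereas `torch.tensor` makes a copy. A minimal sketch with illustrative variable names:

```python
import numpy as np
import torch

arr = np.eye(3)                  # numpy defaults to float64
shared = torch.from_numpy(arr)   # zero-copy view of the same memory
copied = torch.tensor(arr)       # independent copy of the data

print(shared.dtype)

# Writing to the numpy array is visible in the shared tensor only.
arr[0, 0] = 5.0
print(shared[0, 0].item(), copied[0, 0].item())

# Most torch code expects float32; .float() returns a converted copy.
as_float32 = shared.float()
print(as_float32.dtype)
```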

## Much more

In [None]:
[o for o in dir(torch) if not o.startswith("_")]

In [None]:
[o for o in dir(A) if not o.startswith("_")]

# But what about the GPU?
How do I use the GPU?

If you have a GPU, make sure that a CUDA-enabled build of PyTorch is installed
(check https://pytorch.org/ for details).

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

If you have a GPU, you should get something like:
`device(type='cuda', index=0)`

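You can move data to the GPU with `.to(device)`. For tensors, `.to()` returns a new tensor rather than modifying the original, so assign the result.

In [None]:
data = torch.eye(3)
data = data.to(device)
data

Now the computation happens on the GPU (when one is available).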

In [None]:
res = data + data
res

In [None]:
res.device
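
The sketch below collects the common device patterns in one place: moving an existing tensor, creating a tensor directly on the device, and bringing a result back to the CPU for numpy. It falls back to the CPU when no GPU is present; the variable names are illustrative.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.eye(3)

# .to() returns a tensor on the target device; cpu_tensor stays on the CPU.
# On a CPU-only machine this is a no-op that returns the same tensor.
moved = cpu_tensor.to(device)

# Tensors can also be created directly on the device.
created = torch.ones(3, 3, device=device)

# Both operands live on the same device, and so does the result.
total = moved + created
print(total.device)

# numpy only works with CPU memory, so move the result back first.
back = total.cpu().numpy()
print(back)
```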

Note: before `v0.4` one had to use `.cuda()` and `.cpu()` to move tensors and models to and from the GPU.
This littered the code with many checks like:
```python
if CUDA:
    model = model.cuda()
```

# Automatic differentiation with `autograd`
Prior to `v0.4`, PyTorch used the class `Variable` to record gradients. You had to wrap `Tensor`s in `Variable`s.
`Variable`s behaved like `Tensor`s.

Since `v0.4`, a `Tensor` can record gradients directly if you tell it to do so, e.g. `torch.ones(3, requires_grad=True)`.
There is no need for `Variable` anymore.
Be aware that many older tutorials still use `Variable`.

Ref:
- https://pytorch.org/docs/stable/autograd.html
- https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

You rarely use `torch.autograd` directly.
Pretty much everything is part of `torch.Tensor` now.
Simply add `requires_grad=True` to the tensors you want to calculate the gradients for.
`nn.Module`s track the gradients of their parameters automatically.

In [None]:
from torch import autograd

In [None]:
x = torch.tensor(2.)
x

In [None]:
x = torch.tensor(2., requires_grad=True)
x

In [None]:
print(x.requires_grad)

In [None]:
print(x.grad)

In [None]:
y = x ** 2

print("Grad of x:", x.grad)

In [None]:
y = x ** 2
y.backward()

print("Grad of x:", x.grad)
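
One behaviour worth knowing at this point: `backward()` adds to `.grad` instead of overwriting it, so repeated calls accumulate. A minimal sketch (the names `first`, `second`, and `third` are illustrative):

```python
import torch

x = torch.tensor(2., requires_grad=True)

# d(x^2)/dx = 2x = 4 at x = 2
y = x ** 2
y.backward()
first = x.grad.item()

# A second backward pass adds another 4 to the stored gradient.
y = x ** 2
y.backward()
second = x.grad.item()

# Reset the gradient in-place before computing an unrelated one.
x.grad.zero_()
y = x ** 3          # d(x^3)/dx = 3x^2 = 12 at x = 2
y.backward()
third = x.grad.item()

print(first, second, third)  # 4.0 8.0 12.0
```

This is why training loops typically zero the gradients before every backward pass.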

In [None]:
# What is going to happen here?
x = torch.tensor(2.)
x.backward()

In [None]:
# Don't record the gradient
# Useful for inference

params = torch.tensor(2., requires_grad=True)

with torch.no_grad():
    y = params * params
    print(y.requires_grad)  # False: nothing was recorded
    print(y.grad_fn)        # None: y has no graph attached
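
For comparison, the following sketch runs the same computation with and without gradient recording, and shows `detach()` as a way to cut an already recorded result out of the graph. The variable names are illustrative.

```python
import torch

w = torch.tensor(2., requires_grad=True)

# Recorded: the result remembers how it was computed.
y_tracked = w * w
print(y_tracked.requires_grad, y_tracked.grad_fn is not None)

# Not recorded: inside no_grad the result carries no graph.
with torch.no_grad():
    y_untracked = w * w
print(y_untracked.requires_grad, y_untracked.grad_fn)

# detach() returns a tensor with the same value but without the graph.
y_detached = y_tracked.detach()
print(y_detached.requires_grad)
```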

`nn.Module` and `nn.Parameter` keep track of gradients for you.

In [None]:
lin = nn.Linear(2, 1, bias=True)
lin.weight

In [None]:
type(lin.weight)

In [None]:
isinstance(lin.weight, torch.FloatTensor)
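
To see what "keeping track" means in practice, this sketch lists the parameters of a linear layer and runs one backward pass through it. The input of ones and the `sum()` reduction are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 1, bias=True)

# Every parameter is an nn.Parameter, a Tensor subclass with requires_grad=True.
for name, p in lin.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)
print(isinstance(lin.weight, nn.Parameter), isinstance(lin.weight, torch.Tensor))

# Forward pass on 4 samples of ones, reduced to a scalar so backward() works.
inputs = torch.ones(4, 2)
out = lin(inputs).sum()
out.backward()

# out = sum_i (w . x_i + b), so d out/dw = sum_i x_i and d out/db = 4.
print(lin.weight.grad)
print(lin.bias.grad)
```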

# Exercise
- Do you remember the analytical (closed-form) solution for the parameters of linear regression? Implement it.