# PyTorch 🔥

Introduction to pytorch

This notebook is assembled from these sources:
- [practical-dl seminar](https://github.com/yandexdataschool/Practical_DL/blob/fall21/week02_autodiff/seminar_pytorch.ipynb)
- [hse dl-course homework](https://github.com/aosokin/dl_cshse_ami/blob/master/2021-fall/homeworks_small/shw2/DL21-fall-shw2.ipynb)
- [nyu dl course tensor tutorial](https://github.com/Atcold/pytorch-Deep-Learning/blob/master/01-tensor_tutorial.ipynb)
- [nyu dl course autograd tutorial](https://github.com/Atcold/pytorch-Deep-Learning/blob/master/03-autograd_tutorial.ipynb)
- [pytorch docs](https://pytorch.org/docs/stable/)

In [None]:
!nvidia-smi | head -n 3

In [None]:
!which python
!python -V

Installing PyTorch (easier than ever):

- Better use virtualenv ([conda](https://docs.conda.io/en/latest/miniconda.html) **is ok**)
- `pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html` (https://pytorch.org/get-started/locally/)

In [1]:
import torch



In [2]:
torch.__version__

'1.10.0'

## Basics:

Jupyter lifehacks:

In [None]:
torch.sq  # <Tab>

In [None]:
# What about all `*Tensor`s?
# Press <esc> to get out of help
torch.*Tensor?

In [None]:
torch.nn.Module()  # <Shift>+<Tab>

In [None]:
# Annotate your functions / classes!
torch.nn.Module?

In [None]:
torch.nn.Module??

### Tensor class

In [None]:
# Generate a tensor of size 2x3x4
t = torch.Tensor(2, 3, 4)
type(t)

In [None]:
# Get the size of the tensor
t.size()

In [None]:
# t.size() is a classic tuple =>
print('t size:', ' \u00D7 '.join(map(str, t.size())))

In [None]:
# prints dimensional space and sub-dimensions
print(f'point in a {t.numel()} dimensional space')
print(f'organised in {t.dim()} sub-dimensions')

In [None]:
t

In [None]:
# Mind the underscore!
# Any operation that mutates a tensor in-place is post-fixed with an _.
# For example: x.copy_(y), x.t_(), x.random_(n) will change x.
t.random_(10)

In [None]:
t

In [None]:
# This resizes the tensor permanently 
r = torch.Tensor(t)
r.resize_(3, 8)
r

In [None]:
# zero_() overwrites r, which was filled with integers, with 0's in place
r.zero_()

In [None]:
t

In [None]:
# This *is* important, sigh...
s = r.clone()

In [None]:
# In-place fill of 1's
s.fill_(1)
s

In [None]:
# Because we cloned r, even though we did an in-place operation, this doesn't affect r
r
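The cells above mix in-place operations with copies, which is easy to get wrong. The following is a minimal sketch of the difference, assuming a view created with `view` (the names `t_changed` and `t_untouched` are only illustrative): an in-place operation on a view also changes the original tensor, while an in-place operation on a clone does not.

```python
import torch

t = torch.zeros(2, 3, 4)

# A view shares storage with the original tensor
r = t.view(3, 8)
r.fill_(1)                          # in-place: t is changed as well
t_changed = bool((t == 1).all())

# A clone owns its own memory
s = t.clone()
s.zero_()                           # in-place on the copy only
t_untouched = bool((t == 1).all())

print(t_changed, t_untouched)
```

This is why cloning before an in-place operation matters whenever the original values are still needed.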

### Vectors and matrices

In [None]:
# Creates a 1D float tensor holding the values 1 to 4
v = torch.Tensor([1, 2, 3, 4])
v

In [None]:
# Print number of dimensions (1D) and size of tensor
print(f'dim: {v.dim()}, size: {v.size()[0]}')

In [None]:
w = torch.Tensor([1, 0, 2, 0])
w

In [None]:
# Element-wise multiplication
v * w

In [None]:
# Scalar product: 1*1 + 2*0 + 3*2 + 4*0
v @ w

In [None]:
# Fill in place with random integers from 0 to 9 (the upper bound 10 is exclusive)
x = torch.Tensor(5).random_(10)
x

In [None]:
print(f'first: {x[0]}, last: {x[-1]}')

In [None]:
# Extract sub-Tensor [from:to)
x[1:2 + 1]

In [None]:
# But negative steps are not supported :.(
x[::-1]
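Since slicing with a negative step is not available on tensors, one possible workaround is `torch.flip`. This is a small sketch on a fresh example tensor; the name `x_reversed` is only illustrative.

```python
import torch

x = torch.arange(5.)

# Reverse along dimension 0; torch.flip returns a new tensor (a copy, not a view)
x_reversed = torch.flip(x, dims=[0])
print(x_reversed)
```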

In [None]:
v

In [None]:
# Create a tensor with integers ranging from 1 to 5, excluding 5
v = torch.arange(1, 4 + 1)
v

In [None]:
# Square all elements in the tensor
print(v.pow(2), v)

In [None]:
# Create a 2x4 tensor
m = torch.Tensor([[2, 5, 3, 7],
                  [4, 2, 1, 9]])
m

In [None]:
m.dim()

In [None]:
print(m.size(0), m.size(1), m.size(), sep=' -- ')

In [None]:
# Returns the total number of elements, hence num-el (number of elements)
m.numel()

In [None]:
# Indexing row 0, column 2 (0-indexed)
m[0][2]

In [None]:
# Indexing row 0, column 2 (0-indexed)
m[0, 2]

In [None]:
# Indexing column 1, all rows (returns size 2)
m[:, 1]

In [None]:
# Indexing column 1, all rows (returns size 2x1)
m[:, [1]]

In [None]:
# Indexing columns 1 and 3, all rows (returns size 2x2)
m[:, [1,3]]

In [None]:
# Indexes row 0, all columns (returns 1x4)
m[[0], :]

In [None]:
# Indexes row 0, all columns (returns size 4)
m[0, :]

In [None]:
# Create tensor of numbers from 1 to 5 (excluding 5)
v = torch.arange(1., 4 + 1)
v

In [None]:
m

In [None]:
# Matrix-vector product (one scalar product per row of m)
m @ v

In [None]:
# Calculated by 1*2 + 2*5 + 3*3 + 4*7
m[[0], :] @ v

In [None]:
# Calculated by 1*4 + 2*2 + 3*1 + 4*9
m[[1], :] @ v
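To double-check the arithmetic in the comments above, here is a short sketch that rebuilds the same `m` and `v` and compares `m @ v` with an explicit row-wise sum of element-wise products. The names `by_matmul` and `by_hand` are only illustrative.

```python
import torch

m = torch.tensor([[2., 5., 3., 7.],
                  [4., 2., 1., 9.]])
v = torch.arange(1., 4 + 1)

by_matmul = m @ v                 # shape [2]
by_hand = (m * v).sum(dim=1)      # broadcast v over rows, then sum each row

print(by_matmul, by_hand)
```

Both give 49 for row 0 and 47 for row 1.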

In [None]:
# Add a random tensor of size 2x4 to m
m + torch.rand(2, 4)

In [None]:
# Subtract a random tensor of size 2x4 from m
m - torch.rand(2, 4)

In [None]:
# Multiply m element-wise by a random tensor of size 2x4
m * torch.rand(2, 4)

In [None]:
# Divide m by a random tensor of size 2x4
m / torch.rand(2, 4)

In [None]:
m.size()

In [None]:
# Transpose tensor m, which is essentially 2x4 to 4x2
m.t()

In [None]:
# Same as
m.transpose(0, 1)

### Constructors

In [None]:
# Create a tensor with values from 3 to 8 (inclusive) in steps of 1
torch.arange(3., 8 + 1)

In [None]:
# Create a tensor counting down from 5.7 towards -2.1 (exclusive) in steps of -3
torch.arange(5.7, -2.1, -3)

In [None]:
# Returns 20 equally spaced points from 3 to 8 (both inclusive), reshaped to 1x20
torch.linspace(3, 8, 20).view(1, -1)

In [None]:
# Create a tensor filled with 0's
torch.zeros(3, 5)

In [None]:
# Create a tensor filled with 1's
torch.ones(3, 2, 5)

In [None]:
# Create a tensor with the diagonal filled with 1
torch.eye(3)

### Casting

In [None]:
# Helper to get what kind of tensor types
torch.*Tensor?

In [None]:
m

In [None]:
# This is basically a 64 bit float tensor
m_double = m.double()
m_double

In [None]:
# This creates a tensor of type uint8 (unsigned 8-bit integer)
m_byte = m.byte()
m_byte

In [None]:
# Move your tensor to GPU device 0 if there is one (first GPU in the system)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
m.to(device)

In [None]:
# Converts tensor to numpy array
m_np = m.numpy()
m_np

In [None]:
# Set the element at row 0, column 0 to -1 in place
# (m_np shares memory with m, so m changes too)
m_np[0, 0] = -1
m_np

In [None]:
# Create a tensor of integers ranging from 0 to 4
import numpy as np
n_np = np.arange(5)
n = torch.from_numpy(n_np)
print(n_np, n)

In [None]:
# In-place multiplication of all elements by 2 for tensor n
# Because n is essentially n_np, not a clone, this affects n_np
n.mul_(2)
n_np
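The memory sharing above is specific to `torch.from_numpy`. As a small sketch of the contrast, `torch.tensor` copies the data instead, so later in-place changes do not leak back into the array. The names `a`, `shared` and `copied` are only illustrative.

```python
import numpy as np
import torch

a = np.arange(5)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data

shared.mul_(2)                 # in-place: a is modified through the shared memory

print(a)        # doubled
print(copied)   # still the original values
```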

### Named tensors

New addition to pytorch: https://pytorch.org/docs/stable/named_tensor.html

In [None]:
# To create a named tensor, just pass names for each dim
imgs = torch.randn(1, 2, 2, 3 , names=('N', 'C', 'H', 'W'))

In [None]:
imgs.names

In [None]:
renamed_imgs = imgs.rename(H='height', W='width')

In [None]:
renamed_imgs

In [None]:
# Names propagate
imgs.abs()

In [None]:
# Adding names to unnamed tensors
tensor = torch.randn(2, 3, 5, 7, 11)
tensor = tensor.refine_names('A', ..., 'B', 'C')
tensor.names

In [None]:
# Name matching (and how could it be useful?)
x = torch.randn(3, names=('X',))
y = torch.randn(3)
z = torch.randn(3, names=('Z',))

In [None]:
z + z

In [None]:
x + y

In [None]:
x + z

In [None]:
# Binary ops unify names 
x = torch.randn(3, 3, names=('N', None))
y = torch.randn(3, 3, names=(None, 'C'))

x * y

In [None]:
# Won't work
x = torch.randn(3, 3, names=('N', 'C'))
y = torch.randn(3, names=('N',))

x * y

In [None]:
# Also won't work
x = torch.randn(3, 3, names=('N', None))
y = torch.randn(3, names=('N',))

x * y

In [None]:
# Explicit align
img = torch.randn(5,3,28,28, names=('N','C','H','W'))
scale = torch.randn(3, names=('C',))


In [None]:
img * scale

In [None]:
# No more unsqueeze and [...,None]
img * scale.align_as(img);
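For comparison, this is roughly what the same per-channel scaling looks like without names, where the broadcasting shape has to be spelled out by hand. It is a sketch on plain (unnamed) tensors; `img_plain` and `scale_plain` are hypothetical names chosen so they do not clash with the named tensors above.

```python
import torch

img_plain = torch.randn(5, 3, 28, 28)   # N, C, H, W, but unnamed
scale_plain = torch.randn(3)            # one factor per channel

# Two equivalent ways to line the 3 channels up with dimension 1
scaled_view = img_plain * scale_plain.view(1, 3, 1, 1)
scaled_index = img_plain * scale_plain[None, :, None, None]

print(scaled_view.shape)
```

With named tensors, `align_as` derives this reshaping from the dimension names instead.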

In [None]:
# Reorder
img.align_to('H', 'W', ...).names

In [None]:
# Contract away dims
img.sum(('H', 'W'))

In [None]:
m = torch.randn(10,10, names=('A', 'B'))
v = torch.randn(10, names=('C',))

In [None]:
# multiply
m @ v

In [None]:
# Permute dims and vector multiply
m.t() @ v

In [None]:
# bmm
x = torch.randn(3, 10, 4, 5, names=('A', 'B', 'C', 'D'))
y = torch.randn(10, 5, 8, names=('B', 'E', 'F'))
z = torch.matmul(x, y)

z.names

In [None]:
z.shape

**Note:** named tensors are still in development; some operations might not be supported, and autograd support is also limited.

---

### KNN in pytorch  `[TODO]`

Let's implement knn using PyTorch `Tensor`s

In [4]:
import matplotlib.pyplot as plt
from tqdm.auto import tqdm

Let's implement a knn classifier:

* Iterate through test_features (the whole dataset probably won't fit in memory)
    * For each batch compute the l2 nearest neighbors `[batch_size, k_neighbors]`
    * Retrieve each neighbor's class `[batch_size, k_neighbors]`
    * Compute 'probabilities' with `sum_neighbors(exp(-l2/T))` (just a weighted sum over neighbors)
    * Return sorted classes `[test_size, n_classes]`
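The steps above can be traced on a toy problem small enough to check by hand. This is only a sketch under assumed data: four 2D training points in two classes and a single test point, with hypothetical names such as `train_x`, `votes` and `ranked`.

```python
import torch

train_x = torch.tensor([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
train_y = torch.tensor([0, 0, 1, 1])
test_x = torch.tensor([[0., 0.2]])
k, T, num_classes = 3, 1.0, 2

l2 = torch.cdist(test_x, train_x)                        # [1, 4] pairwise distances
dist, ids = torch.topk(l2, k=k, dim=1, largest=False)    # [1, 3] nearest neighbors
neighbors = train_y[ids]                                 # [1, 3] neighbor classes

# Each neighbor votes for its class with weight exp(-l2 / T)
votes = torch.zeros(test_x.shape[0], num_classes)
votes.scatter_add_(1, neighbors, torch.exp(-dist / T))

ranked = votes.argsort(dim=1, descending=True)           # [1, 2] classes, best first
print(ranked)
```

The test point sits next to the two class-0 points, so class 0 ends up first in `ranked`.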

In [5]:
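@torch.inference_mode()
def knn_classifier(
    train_features, train_labels, test_features, k, T=1, num_classes=7
):
    n_test, num_chunks = test_features.shape[0], 50
    # at least one sample per chunk, so the range step below is never 0
    n_per_chunk = max(1, n_test // num_chunks)

    predictions = []

    for index in range(0, n_test, n_per_chunk):
        features = test_features[index : min(index + n_per_chunk, n_test)]
        batch_size = features.shape[0]
        # pairwise l2 distances: [batch_size, n_train]
        l2 = torch.cdist(features, train_features)

        # k nearest neighbors, distances and indices: [batch_size, k]
        dist, ids = torch.topk(l2, k=k, dim=1, largest=False)
        # class of each neighbor: [batch_size, k]
        neighbors = train_labels[ids]

        # weighted votes exp(-l2 / T) summed per class: [batch_size, num_classes]
        probs = torch.zeros(
            batch_size, num_classes, dtype=dist.dtype, device=dist.device
        )
        probs.scatter_add_(1, neighbors, torch.exp(-dist / T))

        # classes sorted from most to least likely
        predictions.append(probs.argsort(dim=1, descending=True))

    return torch.cat(predictions)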


def accuracy(inputs, targets):
    return ((inputs == targets).sum() / inputs.shape[0]).item()

In [6]:
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer

X, y = fetch_covtype(return_X_y=True); y -= 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

quantile = QuantileTransformer()

X_train = quantile.fit_transform(X_train)
X_test  = quantile.transform(X_test)

# Convert to torch tensors

X_train = torch.from_numpy(X_train).float()
X_test  = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).long()
y_test  = torch.from_numpy(y_test).long()

In [7]:
# Use the first GPU if there is one, otherwise stay on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

X_train = X_train.to(device)
X_test  = X_test.to(device)
y_train = y_train.to(device)
y_test  = y_test.to(device)

In [8]:
predictions = knn_classifier(
    X_train, y_train, X_test,
    k=10,
)


In [None]:
plt.hist(y_test.cpu().numpy());

In [None]:
accuracy(predictions[:,0], y_test)

In [None]:
accuracy(torch.zeros_like(y_test), y_test)

In [None]:
accuracy(torch.ones_like(y_test), y_test)

### More:

- The full *Torch* API should be read at least once.
Hence, go [here](https://pytorch.org/docs/stable/index.html).
You'll find 100+ `Tensor` operations described there, including transposing, indexing, slicing, mathematical operations, linear algebra, random numbers, etc.
- It's *almost* numpy, but not quite (but people are working on it: https://data-apis.org/array-api/latest/purpose_and_scope.html)
- Cool library (einops): https://openreview.net/forum?id=oapKSVM2bcj
- The competition is strong! https://github.com/google/jax

---
## Autograd

Basics

In [None]:
# Create a 2x2 tensor with gradient-accumulation capabilities
x = torch.tensor([[1, 2], [3, 4]], requires_grad=True, dtype=torch.float32)
print(x)

Do an operation on the tensor:

In [None]:
# Deduct 2 from all elements
y = x - 2
print(y)

``y`` was created as a result of an operation, so it has a ``grad_fn``.



In [None]:
print(y.grad_fn)

In [None]:
# What's happening here?
print(x.grad_fn)

In [None]:
# Let's dig further...
y.grad_fn

In [None]:
y.grad_fn.next_functions

In [None]:
y.grad_fn.next_functions[0][0]

In [None]:
y.grad_fn.next_functions[0][0].variable

In [None]:
# Do more operations on y
z = y * y * 3
a = z.mean()  # average

print(z)
print(a)

In [None]:
# Let's visualise the computational graph! (thks @szagoruyko)
from torchviz import make_dot

In [None]:
make_dot?

In [None]:
make_dot(a)

### Gradients

Let's backprop now. Since `a` is a scalar, `a.backward()` is equivalent to `a.backward(torch.tensor(1.0))`.

In [None]:
# Backprop
a.backward()

Print gradients $\frac{\text{d}a}{\text{d}x}$.




In [None]:
# Compute it by hand BEFORE executing this
print(x.grad)
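For checking the hand computation: with $a = \frac{1}{4}\sum_i 3\,(x_i - 2)^2$, each entry of the gradient is $\frac{\partial a}{\partial x_i} = \frac{3}{2}(x_i - 2)$. The sketch below rebuilds the same graph from scratch and compares autograd with this formula; `by_hand` is just an illustrative name.

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = x - 2
z = y * y * 3
a = z.mean()
a.backward()

# Closed-form gradient derived above: 1.5 * (x - 2)
by_hand = 1.5 * (x.detach() - 2)

print(x.grad)
print(by_hand)
```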

You can do many crazy things with autograd!
> With Great *Flexibility* Comes Great Responsibility

In [None]:
# Dynamic graphs!
x = torch.randn(3, requires_grad=True)

y = x * 2
for i in range(10):
    y = y * 2
print(y)

In [None]:
# make_dot(y.mean())

In [None]:
# If we don't run backward on a scalar we need to specify the grad_output
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)
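What `backward(gradients)` computes for a non-scalar output is a vector-Jacobian product. In the loop above `y` ends up as $2^{11} x = 2048\,x$, so the Jacobian is $2048 \cdot I$ and `x.grad` should be `2048 * gradients`. A small sketch that verifies this, with `expected` as an illustrative name:

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2                      # after the loop: y = 2**11 * x

gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)              # x.grad = J^T @ gradients

expected = 2048 * gradients
print(x.grad)
print(expected)
```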

### Inference

In [None]:
# This variable decides the tensor's range below
n = 3

In [None]:
# Both x and w allow gradient accumulation
x = torch.arange(1., n + 1, requires_grad=True)
w = torch.ones(n, requires_grad=True)
z = w @ x
z.backward()
print(x.grad, w.grad, sep='\n')

In [None]:
# Only w allows gradient accumulation
x = torch.arange(1., n + 1)
w = torch.ones(n, requires_grad=True)
z = w @ x
z.backward()
print(x.grad, w.grad, sep='\n')

In [None]:
# Both x and w allow gradient accumulation
x = torch.arange(1., n + 1, requires_grad=True)
w = torch.ones(n, requires_grad=True)

# Non leaf node
h = w * x
h.retain_grad()

z = h.sum()
z.backward()
print(x.grad, w.grad, h.grad, sep='\n')

In [None]:
x = torch.arange(1., n + 1)
w = torch.ones(n, requires_grad=True)

# Regardless of what you do in this context, no tensor computed inside it tracks gradients
with torch.no_grad():
    z = w @ x

try:
    z.backward()  # PyTorch will throw an error here, since z has no grad accum.
except RuntimeError as e:
    print('RuntimeError!!! >:[')
    print(e)
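The error comes from the result having no graph attached. As a rough sketch, the same product can be computed normally, under `torch.no_grad()` and under `torch.inference_mode()`, and the flags on the results compared; the names `z_tracked`, `z_no_grad` and `z_inference` are only illustrative.

```python
import torch

w = torch.ones(3, requires_grad=True)
x = torch.arange(1., 4.)

z_tracked = w @ x                  # recorded in the autograd graph

with torch.no_grad():
    z_no_grad = w @ x              # not recorded

with torch.inference_mode():
    z_inference = w @ x            # not recorded, and marked as an inference tensor

print(z_tracked.requires_grad, z_no_grad.requires_grad, z_inference.requires_grad)
print(z_no_grad.is_inference(), z_inference.is_inference())
```

`inference_mode` is the stricter of the two: its outputs cannot be used in autograd later, which is fine for pure evaluation code.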

In [None]:
@torch.inference_mode()
def knn_classifier():
    # Your evaluation code
    pass

### More:
- Good blog post on backprop: https://colah.github.io/posts/2015-08-Backprop/
- Advanced, but fun: https://minitorch.github.io/
- Documentation of the automatic differentiation package is at
https://pytorch.org/docs/stable/autograd.html.

## Linear regression

In [None]:
%%capture --no-display
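# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2
from sklearn.datasets import load_boston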
from IPython.display import clear_output

X, y = load_boston(return_X_y=True)


x = (X[:, -1] - X[:, -1].mean()) / X[:, -1].std()
y = (y - y.mean()) / y.std()

plt.scatter(x, y)
plt.show()

In [None]:
# model tensors
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# data tensors
x = torch.from_numpy(x).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

for vv in [w, b, x, y]:
    print(vv.is_leaf, vv.requires_grad)

### `[TODO]`

In [None]:
for i in range(100):
    
    #compute loss
    y_pred = w * x  + b
    loss = torch.mean((y_pred - y)**2)
    
    # backprop
    loss.backward()

    # gradient descent step for weights
    # take alpha about 0.1; update under no_grad so the step itself is not tracked by autograd
    with torch.no_grad():
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad

    # zero gradients
    w.grad.zero_()
    b.grad.zero_()
    
    #the rest of code is just bells and whistles
    if (i + 1) % 5==0:
        #draw linear regression prediction vs data
        clear_output(True)
        plt.axhline(0, color='gray')
        plt.axvline(0, color='gray')
        plt.scatter(x.numpy(),y.numpy())
        plt.plot(x.numpy(),y_pred.data.numpy(),color='orange')
        plt.show()

        print("loss = ", loss.item())
        if loss.item() < 0.5:
            print("Done!")
            break

## Higher level APIs

Above we coded linear regression and basic gradient descent by hand. Once you go beyond linear regression, managing parameters and their updates manually becomes cumbersome. PyTorch also has high-level APIs with common nn building blocks, optimizers, distributed training utils and more (see the [docs](https://pytorch.org/docs/stable/) for examples).
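As a small illustration of what the optimizer API takes over, here is the same kind of linear regression with the update and gradient reset delegated to `torch.optim.SGD`. It is a sketch on synthetic data (true slope 2, true intercept 1, both assumptions made for the example), not the Boston data used above.

```python
import torch

torch.manual_seed(0)

# Synthetic data: y = 2 * x + 1 plus a little noise
x = torch.linspace(-1, 1, 100)
y = 2 * x + 1 + 0.1 * torch.randn(100)

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w, b], lr=0.1)

for _ in range(200):
    loss = torch.mean((w * x + b - y) ** 2)
    opt.zero_grad()     # replaces w.grad.zero_() / b.grad.zero_()
    loss.backward()
    opt.step()          # replaces the manual parameter update

print(w.item(), b.item(), loss.item())
```

The loop body no longer mentions individual parameters, so it stays the same when the model grows.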

In [None]:
!wget --quiet --show-progress "https://raw.githubusercontent.com/aosokin/dl_cshse_ami/master/2021-fall/homeworks_small/shw2/util.py"

In [None]:
# MNIST again
from util import load_mnist
X_train, y_train, X_val, y_val, X_test, y_test = load_mnist(flatten=True)

fig = plt.figure(figsize=[6, 6], dpi=80)
for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.title("Label: %i" % y_train[i])
    plt.imshow(X_train[i].reshape([28, 28]), cmap='gray');
fig.tight_layout()

In [None]:
import torch.nn as nn
import torch.nn.functional as F

In [None]:
nn.Module?

In [None]:
class Net(nn.Module):
    def __init__(self, hidden_size=40):
        super(Net, self).__init__()
        # here you construct weights for layers
        self.fc1 = nn.Linear(X_train.shape[1], hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, 10)
        
    def forward(self, x):
        # here you describe usage of layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

In [None]:
model = Net()

In [None]:
list(model.parameters())
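Printing raw parameters gets noisy quickly; names and shapes are often more useful. Below is a sketch using an `nn.Sequential` stand-in with the same layer sizes as `Net`, assuming 784 input features (flattened 28x28 MNIST images) and the default hidden size of 40. The name `mlp` is hypothetical.

```python
import torch.nn as nn

# Same layer sizes as Net above: 784 -> 40 -> 40 -> 10
mlp = nn.Sequential(
    nn.Linear(784, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, 10),
)

# Each Linear layer registers a weight matrix and a bias vector
for name, p in mlp.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)
```

The total is (784*40 + 40) + (40*40 + 40) + (40*10 + 10) = 33450 trainable values.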

In [None]:
from torch.utils.data import TensorDataset, DataLoader

In [None]:
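# Keep the data on the first GPU if there is one, otherwise on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
train_ds = TensorDataset(*map(lambda x: torch.from_numpy(x.copy()).to(device), [X_train, y_train]))
test_ds = TensorDataset(*map(lambda x: torch.from_numpy(x.copy()).to(device), [X_test, y_test]))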

In [None]:
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True, drop_last=True)
test_dl  = DataLoader(test_ds, batch_size=128, shuffle=False)
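To see what `batch_size` and `drop_last` do, here is a sketch on a random stand-in dataset of 1000 samples (an assumed size, chosen so it does not divide evenly by 128). The names `toy_ds`, `dl_drop` and `dl_keep` are only illustrative.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

toy_ds = TensorDataset(torch.randn(1000, 784), torch.randint(0, 10, (1000,)))

dl_drop = DataLoader(toy_ds, batch_size=128, shuffle=True, drop_last=True)
dl_keep = DataLoader(toy_ds, batch_size=128, shuffle=False)

# 1000 = 7 * 128 + 104: drop_last discards the incomplete final batch
print(len(dl_drop), len(dl_keep))

last_x, last_y = list(dl_keep)[-1]
print(last_x.shape, last_y.shape)
```

Dropping the last batch keeps every training step the same size; for evaluation all samples are kept.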

In [None]:
# Write a train function
def train(model, optimizer, batchsize=32):
    loss_log, acc_log = [], []
        
    model.train()
    for x_batch, y_batch in tqdm(train_dl, leave=False):
        optimizer.zero_grad()
        output = model(x_batch)
        loss = F.cross_entropy(output, y_batch)
        # compute gradients
        loss.backward()
        # make a step
        optimizer.step()

        pred = torch.max(output, 1).indices
        acc = (pred == y_batch).sum() / y_batch.shape[0]
        acc_log.append(acc.item())

        loss = loss.item()
        loss_log.append(loss)
    return loss_log, acc_log


# Validation function: same loop as train, but without gradients or optimizer steps
@torch.inference_mode()
def test(model):
    loss_log, acc_log = [], []
    model.eval()

    for x_batch, y_batch in tqdm(test_dl, leave=False):
        output = model(x_batch)
        loss = F.cross_entropy(output, y_batch)

        # record the loss (no gradients are computed in evaluation)
        loss = loss.item()
        loss_log.append(loss)
        
        pred = torch.max(output, 1).indices
        acc = (pred == y_batch).sum() / y_batch.shape[0]
        acc_log.append(acc.item())

    return loss_log, acc_log


def plot_history(train_history, val_history, title='loss'):
    plt.figure()

    plt.title('{} at {} epoch'.format(title, epoch))
    plt.plot(train_history, label='train', zorder=1)
    
    points = torch.tensor(val_history)
    
    plt.scatter(points[:, 0], points[:, 1], marker='+', s=180, c='orange', label='val', zorder=2)
    plt.xlabel('train steps')
    
    plt.legend(loc='best')
    plt.grid()

    plt.show()

In [None]:
from statistics import mean

train_log, train_acc_log = [],[]
val_log, val_acc_log = [],[]

model = Net().to(train_ds.tensors[0].device)  # same device as the data
opt = torch.optim.SGD(model.parameters(), lr=0.0005, momentum=0.95)

for epoch in range(10):
    train_loss, train_acc = train(model, opt)
    val_loss, val_acc = test(model)
    
    # store metrics
    train_log.extend(train_loss)
    train_acc_log.extend(train_acc)

    # x-position of the validation points: train steps completed so far
    steps = len(train_dl)
    val_log.append((steps * (epoch + 1), mean(val_loss)))
    val_acc_log.append((steps * (epoch + 1), mean(val_acc)))
    
    # plot all metrics (loss and acc for train/val)
    # <your code>
    clear_output()
    plot_history(train_log, val_log)    
    plot_history(train_acc_log, val_acc_log, title='accuracy')    

### More:
In addition to all the links above:
- https://pytorch.org/tutorials/
- https://pytorch.org/ecosystem/
- PyTorch examples, a repo that implements many cool DL models in PyTorch: https://github.com/pytorch/examples
- More on the new PyTorch data loading: https://github.com/pytorch/data