# PyTorch

Google Colaboratory link: https://drive.google.com/file/d/1D89uzH6-9Qu7DPLJglqdHTgsr-7UX0E9/view?usp=sharing

PyTorch is a spiritual successor of Torch and is developed by Facebook.

It operates on various levels of abstraction:

* Tensor - something similar to `np.array` but can be stored on the GPU
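* Variable - a part of a computational graph. Holds a tensor as its value, as well as the gradients of that tensor. Since PyTorch 0.4 `Variable` has been merged into `Tensor`, so wrapping tensors in `Variable` is no longer required.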
* Module - a neural network layer
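The three levels can be illustrated roughly as follows. This is a minimal sketch, not part of the original notebook: the names `a`, `w` and `layer` are made up for the example, and the autograd part uses a tensor created with `requires_grad=True`.

```python
import numpy as np
import torch
import torch.nn as nn

# Tensor: behaves much like np.array and converts to and from NumPy
a = torch.from_numpy(np.arange(6, dtype=np.float32).reshape(2, 3))
a_doubled = (a * 2).numpy()

# Autograd: a tensor with requires_grad=True records the operations applied to it
w = torch.ones(3, requires_grad=True)
out = (a @ w).sum()
out.backward()  # fills w.grad with d(out)/dw, here the column sums of `a`

# Module: a layer that owns its parameters
layer = nn.Linear(in_features=3, out_features=2)
y = layer(a)  # shape (2, 2)
```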

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as torch_datasets
import torchvision.transforms as T

import numpy as np

from sklearn import datasets
import timeit
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm versions
from copy import deepcopy

We'll start by creating a simple, shallow model, which we'll use to classify the Iris dataset (https://archive.ics.uci.edu/ml/datasets/iris).

## First, let's load data!

In [None]:
iris = datasets.load_iris()
iris['data']
iris['target']

In [None]:
X = Variable(torch.FloatTensor(iris['data']), requires_grad=False)
y = Variable(torch.LongTensor(iris['target']), requires_grad=False)
# We'll train on the whole dataset - don't ever do that - but for illustrating the behaviour it's good enough!
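For comparison, a hold-out split could look roughly like this. It is only a sketch: it uses random stand-in tensors with the Iris shape instead of the real data, and the 20% test fraction and all variable names are assumptions made for the example.

```python
import torch

torch.manual_seed(0)

# Stand-in data with the same shape as Iris: 150 samples, 4 features, 3 classes
X_all = torch.randn(150, 4)
y_all = torch.randint(0, 3, (150,))

# Shuffle the indices once, then cut them into two disjoint parts
perm = torch.randperm(X_all.shape[0])
n_test = int(0.2 * X_all.shape[0])  # hold out 20% (an arbitrary choice)
test_idx, train_idx = perm[:n_test], perm[n_test:]

X_train, y_train = X_all[train_idx], y_all[train_idx]
X_test, y_test = X_all[test_idx], y_all[test_idx]
```

The model would then be fitted on `X_train` only, and accuracy reported on `X_test`.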

Next come a small helper for measuring accuracy and `MyReLU`, an example of a custom autograd function - you can use such functions to define your own operations!

In [None]:
# a helper function to measure accuracy
# `preds` are predicted class indices, `y` are the true labels (both NumPy arrays)
def accuracy(preds, y):
    return (preds == y).sum() / y.shape[0]

In [None]:
class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

relu = MyReLU.apply
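One way to gain some confidence in a hand-written backward pass is to compare its gradients with those of the built-in operation. The sketch below repeats the `MyReLU` definition so that it runs on its own; the test tensors are arbitrary.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

torch.manual_seed(0)
x_custom = torch.randn(5, 4, requires_grad=True)
x_builtin = x_custom.detach().clone().requires_grad_(True)

# Same computation, once through MyReLU and once through torch.relu
MyReLU.apply(x_custom).sum().backward()
torch.relu(x_builtin).sum().backward()

grads_match = torch.allclose(x_custom.grad, x_builtin.grad)
```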


We can easily check whether a GPU is available:

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

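PyTorch works on dynamic computational graphs: the graph is built as the operations run and is constructed from scratch on every forward pass. This can be slower than a static graph of the kind used by TensorFlow 1.x, but it allows nice things such as ordinary Python loops and conditionals inside the model.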
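A small sketch of what this makes possible: the number of loop iterations below depends on the data, and autograd still differentiates through whatever was actually executed. The function `repeated_halving` is a made-up example, not a PyTorch API.

```python
import torch

def repeated_halving(x, threshold=1.0):
    """Halve x until its norm drops below threshold; count the steps."""
    steps = 0
    while x.norm() > threshold:  # data-dependent loop condition
        x = x * 0.5
        steps += 1
    return x, steps

x = torch.tensor([3.0, 4.0], requires_grad=True)  # norm 5
out, steps = repeated_halving(x)
out.sum().backward()
# The loop ran `steps` times, so each gradient entry equals 0.5 ** steps
```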

The downside is that models don't infer dimensionality that easily. It can be a pain, especially when building more complex models.

In [None]:
D_in, H, D_out = 4, 10, 3

X_t = X.to(device)
y_t = y.to(device)
w1 = torch.randn(D_in, H, requires_grad=True, device=device)
w2 = torch.randn(H, D_out, requires_grad=True, device=device)

loss_fn = nn.CrossEntropyLoss().to(device)

learning_rate = 1e-2

epochbar = tqdm(range(500))
for t in epochbar:
    
    # Forward pass
    y_pred = relu(X_t @ w1) @ w2
    
    loss = loss_fn(y_pred, y_t.long())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Gradient descent step (must not be tracked by autograd)
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

    # Predicted class = index of the largest output score
    preds = y_pred.detach().argmax(dim=1).cpu().numpy()
    acc = accuracy(preds, y_t.cpu().numpy())
    epochbar.set_description(
            f"epoch: {t} |\t" 
            f"loss: {loss.item()} |\t"
            f"accuracy: {acc}"
        )
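The manual weight update in this loop is plain gradient descent, which is what `optim.SGD` does when used without momentum. The sketch below checks this on a single step; it uses random stand-in data and a single weight matrix, both assumptions made to keep the example short.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
X_demo = torch.randn(150, 4)           # stand-in for the Iris features
y_demo = torch.randint(0, 3, (150,))   # stand-in for the Iris labels
loss_fn = nn.CrossEntropyLoss()
learning_rate = 1e-2

# Two copies of the same initial weights
w_manual = torch.randn(4, 3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
optimizer = optim.SGD([w_optim], lr=learning_rate)

# Manual step: compute gradients, update, reset gradients
loss_fn(X_demo @ w_manual, y_demo).backward()
with torch.no_grad():
    w_manual -= learning_rate * w_manual.grad
    w_manual.grad.zero_()

# The same step through the optimizer
optimizer.zero_grad()
loss_fn(X_demo @ w_optim, y_demo).backward()
optimizer.step()

same_weights = torch.allclose(w_manual, w_optim)
```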


## Let's train a network on a more serious dataset - CIFAR-10!

CIFAR-10 is a very basic dataset, ideal for toying with image recognition. It contains small (32x32) images and 10 classes. There is also a more ambitious version of this dataset, called CIFAR-100.

But first, we need to load the dataset and apply whatever preprocessing we want to, such as transforming images to tensors and normalizing the data. 

PyTorch has utilities for that as well. They can be found in https://pytorch.org/docs/stable/torchvision/index.html

How cool is that?

In [None]:
image_transforms = T.Compose([
    T.ToTensor(), # transforms PIL images to tensors (this includes transposing (H, W, C) -> (C, H, W))
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), # (x - 0.5) / 0.5: image tensors now in range [-1, 1]
    T.Lambda(lambda t: t.to(device)) # moves the tensor to the appropriate device
])

label_transforms = T.Compose([
    T.Lambda(lambda t: torch.tensor(t, device=device)) # moves the tensor to the appropriate device
])
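To see what the normalization step does to the value range, here is the same arithmetic in plain torch. This is a sketch on a random fake image; `T.Normalize(mean, std)` computes `(x - mean) / std` per channel, and the variable names are made up for the example.

```python
import torch

torch.manual_seed(0)
img = torch.rand(3, 32, 32)  # fake image, values in [0, 1)

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std     # per-channel normalization, values in [-1, 1)
restored = normalized * std + mean  # the inverse, useful before plotting
```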

In [None]:
cifar_train, cifar_test =  [
    torch_datasets.CIFAR10(
        root="/tmp",
        train=is_train, 
        download=True,
        transform=image_transforms,
        target_transform=label_transforms
    )
    for is_train in [True, False]
]
cifar_train

Is it a bird? Is it a plane?

In [None]:
cifar_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck"
]
img, label = cifar_train[100]
plt.imshow((img * 0.5 + 0.5).permute(1, 2, 0).cpu()) # undo the normalization so values are back in [0, 1]
plt.title(f"label {int(label)}: {cifar_names[int(label)]}")
plt.show()

Finally, we'll wrap the dataset in a DataLoader, which will help us sample and iterate over it.

In [None]:
loader_train = DataLoader(cifar_train, batch_size=128, shuffle=True)
loader_test = DataLoader(cifar_test, batch_size=128)
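To get a feeling for what a `DataLoader` yields, here is a sketch on a small fake dataset, so that nothing needs to be downloaded. The dataset size of 300 and all names are assumptions made for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fake dataset: 300 "images" of CIFAR-10 shape with integer labels
fake_images = torch.randn(300, 3, 32, 32)
fake_labels = torch.randint(0, 10, (300,))
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=128, shuffle=True)

# Each iteration yields one (inputs, labels) batch
batch_shapes = [(xb.shape, yb.shape) for xb, yb in fake_loader]
# 300 samples with batch_size=128 give batches of 128, 128 and 44
```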

In PyTorch you can not only use pre-implemented modules - you can also implement your own. 

The only thing you have to implement is the forward pass; the backward pass is derived automatically by autograd.

In [None]:
class Flatten(nn.Module):
    def forward(self, x):
        N, C, H, W = x.size() # read in N, C, H, W
        return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image
      
def conv_block(in_channels: int, out_channels: int):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        nn.PReLU(),
        nn.BatchNorm2d(out_channels),
        nn.Dropout(p=0.3),
    )

And this is an easy way to define a model:

Note that you could also create non-sequential connections (like Inception layers) for example by implementing your own modules.
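As an illustration of a non-sequential connection, here is a sketch of a block with a skip connection. `ResidualBlock` is a hypothetical module written for this example; it is not used by the model in this notebook.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical block: output = input + conv branch (a skip connection)."""

    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # The input bypasses the branch and is added back to its output
        return x + self.branch(x)

block = ResidualBlock(32)
out = block(torch.randn(8, 32, 16, 16))  # the shape is preserved
```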

In [None]:
model_base = nn.Sequential( 
    conv_block(3, 32),
    conv_block(32, 32),
    conv_block(32, 32),

    nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
    conv_block(32, 64),
    conv_block(64, 64),
    conv_block(64, 64),

    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
    conv_block(64, 128),
    conv_block(128, 128),
    conv_block(128, 128),

    nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1),
    conv_block(128, 256),
    conv_block(256, 256),
    conv_block(256, 256),


    Flatten(),
    nn.Linear(4096, 10),
)
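The `4096` in the final `nn.Linear` has to be worked out by hand: three stride-2 convolutions shrink 32x32 to 4x4, and 256 channels * 4 * 4 = 4096. One common way to avoid the arithmetic is to push a dummy input through the convolutional part and read off the size. The sketch below uses a simplified stand-in with only the shape-changing layers, not the full model.

```python
import torch
import torch.nn as nn

# Simplified stand-in for the feature extractor: only the layers that
# change the shape (three stride-2 convolutions, ending with 256 channels)
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),     # 32x32 -> 16x16
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),   # 16x16 -> 8x8
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),  # 8x8 -> 4x4
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 32, 32)  # one CIFAR-sized input
    n_features = features(dummy).flatten(1).shape[1]

classifier = nn.Linear(n_features, 10)
```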


Before we train the model, let's see how fast it is!

In [None]:
model_cpu = deepcopy(model_base)
model_gpu = deepcopy(model_base).to(device)
x_cpu = torch.randn(64, 3, 32, 32)
x_gpu = torch.randn(64, 3, 32, 32).to(device)

In [None]:
%timeit ans = model_cpu(x_cpu)

In [None]:
torch.cuda.synchronize() # Make sure there are no pending GPU computations
# GPU kernels run asynchronously, so wait for them inside the timed statement
%timeit ans = model_gpu(x_gpu); torch.cuda.synchronize()
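Outside of a notebook, where `%timeit` is not available, the same measurement could be sketched with `time.perf_counter`. The helper `time_forward` and the small convolution it is tried on are assumptions made for this example; the synchronization is skipped when the input lives on the CPU.

```python
import time
import torch
import torch.nn as nn

def time_forward(model, x, n_runs=10):
    """Average wall-clock seconds per forward pass (hypothetical helper)."""
    use_cuda = x.is_cuda
    with torch.no_grad():
        model(x)  # warm-up run, not timed
        if use_cuda:
            torch.cuda.synchronize()  # wait for pending GPU work
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait until all timed runs finish
    return (time.perf_counter() - start) / n_runs

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
small_model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
seconds = time_forward(small_model, torch.randn(16, 3, 32, 32, device=device))
```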

Now let's have fun with the model and train it!

In [None]:
torch.cuda.random.manual_seed(2137)
torch.random.manual_seed(2137)

model = deepcopy(model_base).to(device)

loss_fn = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=6e-4)
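Note that `nn.CrossEntropyLoss` expects raw, unnormalized scores and integer class indices, which is why the model ends with a plain `nn.Linear` and no softmax. The sketch below, on random stand-in values, reproduces the loss step by step to show what it computes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(8, 10)           # raw model outputs for 8 samples
targets = torch.randint(0, 10, (8,))  # class indices, not one-hot vectors

loss_builtin = nn.CrossEntropyLoss()(scores, targets)

# The same value computed step by step:
# log-softmax over classes, pick the true class, average the negatives
log_probs = F.log_softmax(scores, dim=1)
loss_manual = -log_probs[torch.arange(8), targets].mean()
```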

How accurate is our model?

In [None]:
for i in range(10):
    # training
    model.train()
    epochbar = tqdm(loader_train)
    accuracies = []
    losses = []
    for X, y in epochbar:
        y_pred = model(X)
        _, preds = torch.max(y_pred, 1) # predicted class = index of the largest score
        loss = loss_fn(y_pred, y)
        batch_acc = (preds == y).sum().item() / y.nelement()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        accuracies.append(batch_acc)
        losses.append(loss.item())
    print(
          f"iter: {i} | " 
          f"train_loss: {np.mean(losses)} | "
          f"train_acc: {np.mean(accuracies)} | "
      )
    
    # testing
    model.eval() # switches dropout and batch norm to inference behaviour
    epochbar = tqdm(loader_test)
    accuracies = []
    losses = []
    with torch.no_grad(): # gradients are not needed for evaluation
        for X, y in epochbar:
            y_pred = model(X)
            _, preds = torch.max(y_pred, 1)
            loss = loss_fn(y_pred, y)
            batch_acc = (preds == y).sum().item() / y.nelement()
            accuracies.append(batch_acc)
            losses.append(loss.item())
    print(
          f"iter: {i} | " 
          f"test_loss: {np.mean(losses)} | "
          f"test_acc: {np.mean(accuracies)} | "
      )
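The difference between `model.train()` and `model.eval()` is easiest to see on a single dropout layer. The sketch below is a standalone toy example, separate from the CIFAR-10 model; it also shows that `eval()` by itself does not switch off gradient tracking, which is the job of `torch.no_grad()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
out_train = dropout(x)  # about half the entries zeroed, the rest scaled by 1 / (1 - p)

dropout.eval()
out_eval = dropout(x)   # identity: dropout is disabled

# eval() does not turn off gradient tracking; torch.no_grad() does
w = torch.ones(1000, requires_grad=True)
tracked = (dropout(x) * w).requires_grad
with torch.no_grad():
    untracked = (dropout(x) * w).requires_grad
```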