# Introduction

### Installation


1. Install conda. For example in Ubuntu:
<br/>

```bash
curl -O https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh
sh Anaconda3-2018.12-Linux-x86_64.sh
```

2. Install conda packages: 
<br/>

```bash
conda install pytorch-cpu torchvision-cpu -c pytorch
```
The CPU-only packages above do not include CUDA. The GPU build of the PyTorch conda package (`pytorch`) comes bundled with CUDA and cuDNN.
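
A quick way to confirm the installation is to import the package from Python and print its version. This is only a minimal check; the printed values depend on the installed build.

```python
import torch

# Version string of the installed package (depends on your environment).
print(torch.__version__)
# False is expected for a CPU-only build.
print(torch.cuda.is_available())
```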

### Create tensor

The functions that create tensors in PyTorch are similar to those that create an ndarray in NumPy. One can also easily create a torch.Tensor from a numpy.ndarray and convert a torch.Tensor back to a numpy.ndarray.

In [None]:
import torch

In [None]:
torch.empty(2, 3)


In [None]:
torch.rand(2, 3)

In [None]:
torch.zeros(2, 3, dtype=torch.long)

In [None]:
torch.tensor([5.5, 3])

In [None]:
import numpy as np
a = np.ones((2,3))
x = torch.from_numpy(a)
x

In [None]:
x.numpy()

torch.tensor() always copies data. To avoid the copy, one can use torch.as_tensor(), which shares memory with the NumPy array when possible:

In [None]:
x = np.ones((2,3))
b = torch.tensor(x)
x[0][0] = 3
print(x)
print(b)

In [None]:
x = np.ones((2,3))
b = torch.as_tensor(x)
x[0][0] = 3
print(x)
print(b)
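
The phrase "when possible" matters: torch.as_tensor() can only share memory if no conversion is needed. The sketch below (variable names are illustrative) shows that requesting a different dtype forces a copy.

```python
import numpy as np
import torch

a = np.ones((2, 3))                               # float64 array
shared = torch.as_tensor(a)                       # same dtype: shares the buffer
copied = torch.as_tensor(a, dtype=torch.float32)  # dtype change: forces a copy

a[0, 0] = 3
# Only the tensor that shares memory sees the change.
print(shared[0, 0].item(), copied[0, 0].item())   # 3.0 1.0
```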

### What is a tensor

In [None]:
x = torch.tensor([5., 3.], requires_grad=True)
z = sum(x + x)
z.backward()

In [None]:
print (type(z), type(x))

Each tensor has many attributes. Some of them are:
* The data of the tensor, which is itself a tensor

In [None]:
x.data

* Parameter which shows if the tensor needs to compute gradients

In [None]:
x.requires_grad

* The gradient of the tensor, which has the same size as tensor.data, or None if no gradient has been computed yet

In [None]:
x.grad

* The function of the computational graph that computes gradients during the backward pass

In [None]:
z.grad_fn

* Parameter which shows if the tensor is a leaf of the computational graph. The tensor is a leaf if it is created by one of the following methods: 

    * direct initialization of tensors
    * any operations on tensors with requires_grad=False
    * .detach() function

In [None]:
z.is_leaf, x.is_leaf
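
The three cases listed above can be checked directly. The following is a small sketch with illustrative variable names:

```python
import torch

a = torch.tensor([1., 2.], requires_grad=True)  # direct initialization: leaf
b = torch.tensor([1., 2.])                      # requires_grad=False
c = b * 2        # operation on a tensor with requires_grad=False: leaf
d = a * 2        # result of an operation that tracks gradients: not a leaf
e = d.detach()   # detached from the graph: leaf

print(a.is_leaf, c.is_leaf, d.is_leaf, e.is_leaf)  # True True False True
```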

### Compute gradients

In [None]:
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.tensor([3.0, 4.0], requires_grad=True)

In [None]:
x.data, y.data

In [None]:
x.grad, y.grad

In [None]:
z = torch.dot(x, y) * x
print(z)

$\textbf{x}=[x_1, x_2]$
<br/>
$\textbf{y}=[y_1, y_2]$
<br/>
$\textbf{z} = (x_1*y_1+ x_2*y_2)* [x_1, x_2]$
<br/>
Both $\textbf{z}$ and $\textbf{x}$ are vectors, and the derivative of one vector with respect to another is a matrix (the Jacobian):
<br/>
<br/>
$\dfrac{\partial \textbf{z}}{\partial \textbf{x}} =   
\left[ {\begin{array}{cc}
   \dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} \\
   \dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2}\\
  \end{array} } \right] $
<br/>
Is the Jacobian computed during backpropagation?

In [None]:
try:
    z.backward()
except RuntimeError as re:
    print("RuntimeError:", re)

The error says that the gradient can be computed implicitly only for scalar outputs. Let's make the output scalar by summing $z_1$ and $z_2$:

In [None]:
z = torch.dot(x, y) * x
z = z.sum()
print (z)

In [None]:
z.backward()
print (x.grad.data, y.grad.data)

The backward() function accepts a vector as an argument. The transposed Jacobian is multiplied by this vector, which plays the role of the gradient of a scalar loss $l$ with respect to $\textbf{z}$: 
<br/>
<br/>
\begin{equation}
\label{eq:Jacob}
\tag{1}
\left(\dfrac{\partial \textbf{z}}{\partial \textbf{x}}\right)^T \dfrac{\partial l}{\partial \textbf{z}} =   
\left[ {\begin{array}{cc}
   \dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} \\
   \dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2}\\
  \end{array} } \right]^T 
  \times
  \left[ {\begin{array}{c}
   \dfrac{\partial l}{\partial z_1} \\
   \dfrac{\partial l}{\partial z_2} \\
  \end{array} } \right] = 
  \left[ {\begin{array}{c}
   \dfrac{\partial l}{\partial x_1} \\
   \dfrac{\partial l}{\partial x_2} \\
  \end{array} } \right]
 \end{equation}
<br/>
The following expression should give the same result as the former:

In [None]:
z = torch.dot(x, y) * x
v = torch.tensor([1., 1.])
z.backward(v)
print (x.grad.data, y.grad.data)

The gradients are not the same! The reason is that gradients are always accumulated. We need to set the gradients to zero before the next call of the backward() function:

In [None]:
x.grad.zero_()
y.grad.zero_()
z = torch.dot(x, y) * x
v = torch.tensor([1., 1.])
z.backward(v)
print (x.grad.data, y.grad.data)
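
Equation (1) can also be checked numerically by building the full Jacobian and multiplying its transpose by $v$. The sketch below assumes a PyTorch version that provides torch.autograd.functional.jacobian; the helper `f` is introduced here only for illustration.

```python
import torch
from torch.autograd.functional import jacobian

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.tensor([3.0, 4.0])
v = torch.tensor([1.0, 1.0])

def f(x_):
    # same expression as in the cells above: z = (x . y) * x
    return torch.dot(x_, y) * x_

J = jacobian(f, x.detach())  # full 2x2 Jacobian dz/dx
f(x).backward(v)             # vector-Jacobian product computed by autograd

print(J.t() @ v)  # tensor([20., 23.])
print(x.grad)     # tensor([20., 23.])
```

Autograd never materializes the Jacobian during backward(); it only computes the product, which is why a vector argument is needed for non-scalar outputs.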

# Linear regression

In [None]:
import matplotlib.pyplot as plt

In [None]:
LEARNING_RATE = 1e-2

x = torch.tensor([1, 2, 3, 4, 5],
                     dtype=torch.float32)
y = torch.tensor([5, 6, 8, 9, 10],
                     dtype=torch.float32)

w = torch.tensor([1], requires_grad=True,
                     dtype=torch.float32)
b = torch.tensor([1], requires_grad=True,
                     dtype=torch.float32)

for i in range(1000):
    y_pred = w*x + b
    z = sum((y_pred - y)**2)
    z.backward()
    w.data -= LEARNING_RATE * w.grad.data
    b.data -= LEARNING_RATE * b.grad.data
    w.grad.data.zero_()
    b.grad.data.zero_()

In [None]:
plt.plot(x.numpy(), y.numpy(), 'o')
plt.plot(x.numpy(), y_pred.detach().numpy(), '-')
plt.show()
print("w =", w.data.numpy()[0],"; b =", b.data.numpy()[0])
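
As a sanity check, the result of gradient descent can be compared with the closed-form least-squares solution. This sketch uses np.polyfit, which is not part of the original notebook.

```python
import numpy as np

x_np = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y_np = np.array([5, 6, 8, 9, 10], dtype=np.float64)

# A degree-1 polynomial fit returns [slope, intercept].
w_exact, b_exact = np.polyfit(x_np, y_np, 1)
print("w =", round(w_exact, 4), "; b =", round(b_exact, 4))  # w = 1.3 ; b = 3.7
```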

### Use optimizer

One can use optimizers from torch.optim instead of updating the weights manually. An optimizer accepts the weights that require gradients as a parameter, and its step() method updates the weights of the model based on their gradients. In this case the training looks like:

In [None]:
x = torch.tensor([1, 2, 3, 4, 5],
                     dtype=torch.float32)
y = torch.tensor([5, 6, 8, 9, 10],
                     dtype=torch.float32)

w = torch.tensor([1], requires_grad=True,
                     dtype=torch.float32)
b = torch.tensor([1], requires_grad=True,
                     dtype=torch.float32)

optimizer = torch.optim.SGD([w,b],
                            lr=1e-2)

for i in range(1000):
    y_pred = w*x + b
    z = y_pred - y
    z.backward(gradient=2*z)
    optimizer.step()
    optimizer.zero_grad()

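Note that here we do not compute the sum of squares explicitly. Instead, we pass the vector $v = [2 z_1, \dots, 2 z_5]$, which is the gradient of $\sum_i z_i^2$ with respect to $\textbf{z}$, to reduce the Jacobian to a vector according to equation (1).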
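
The equivalence of the two approaches can be verified on a single step. The following sketch compares the gradients of the explicit scalar loss with those obtained by passing $2\textbf{z}$ to backward(); the names `grad_scalar` and `grad_vector` are illustrative.

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5.])
y = torch.tensor([5., 6., 8., 9., 10.])
w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

# 1) explicit scalar loss: sum of squared residuals
loss = ((w * x + b - y) ** 2).sum()
loss.backward()
grad_scalar = (w.grad.clone(), b.grad.clone())
w.grad.zero_()
b.grad.zero_()

# 2) vector output z with v = 2*z passed to backward()
z = w * x + b - y
z.backward(gradient=2 * z.detach())
grad_vector = (w.grad.clone(), b.grad.clone())

print(grad_scalar, grad_vector)
```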

In [None]:
plt.plot(x.numpy(), y.numpy(), 'o')
plt.plot(x.numpy(), y_pred.detach().numpy(), '-')
plt.show()

print("w =", w.data.numpy()[0],"; b =", b.data.numpy()[0])

All optimizers accept a function that performs the forward and backward pass (a closure) as an argument of step(). SGD and Adam work without it, but for some optimizers the closure is required: the creation of the computational graph as well as the backward pass must be passed to step() as a parameter. For example, the former construction will not work with the LBFGS optimizer:

In [None]:
try:
    optimizer = torch.optim.LBFGS([w,b], lr=LEARNING_RATE)

    for i in range(1000):
        y_pred = w*x + b
        z = y_pred - y
        z.backward(gradient=2*z) # << d sum((dy)**2)
        optimizer.step()
        optimizer.zero_grad()
except TypeError as te:
    print ("TypeError:", te)

So one needs to pass the closure to the optimizer inside the loop. We will also use the tqdm tool to track the progress.

In [None]:
def tt(*x): return torch.tensor(x, dtype=torch.float32)
def tw(*x): return torch.tensor(x, requires_grad=True, dtype=torch.float32)

x = tt(1, 2, 3, 4, 5)
y = tt(5, 6, 8, 9, 10)
w = tw(1)
b = tw(1)

from tqdm.notebook import tqdm

optimizer = torch.optim.LBFGS([w,b], lr=LEARNING_RATE)

def closure():
    # LBFGS may call the closure several times per step to re-evaluate the loss
    optimizer.zero_grad()
    y_pred = w*x + b
    z = sum((y_pred - y)**2)
    z.backward()
    return z

t = tqdm(range(50), desc='Training Loss', leave=True)
for i in t:
    z = optimizer.step(closure)
    t.set_description('Training Loss: %.2g' % z.item())

In [None]:
plt.plot(x.numpy(), y.numpy(), 'o')
# recompute the prediction: y_pred inside closure() is a local variable
plt.plot(x.numpy(), (w*x + b).detach().numpy(), '-')
plt.show()

print("w =", w.data.numpy()[0],"; b =", b.data.numpy()[0])

# Quality control of Tic-Tac pills production

In [None]:
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import numpy as np

In [None]:
# parameters
VALIDATION_SPLIT = 0.2
INPUT_SIZE = 1
NUM_EPOCHS = 2
BATCH_SIZE = 128
LEARNING_RATE = 1e-4

### Build a model

We will use a pre-trained neural network, AlexNet, which was developed to classify images in the ImageNet contest. The torchvision library provides pretrained models which are easy to use. The model weights are downloaded automatically.

In [None]:
alexnet = torchvision.models.alexnet(pretrained=True)
print(alexnet)

The model consists of two parts: "features", which extracts features from an image, and "classifier", which classifies images based on the extracted features. The original classifier distinguishes between 1000 different classes. To adapt AlexNet to our dataset, we first freeze the parameters of the "features" part, and then replace the classifier with one that distinguishes between two classes.

In [None]:
# do not compute gradients for features parameters
for param in alexnet.features.parameters():
    param.requires_grad = False
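
Freezing can be verified by counting the trainable parameters. Note that the attribute is spelled `requires_grad`; Python silently accepts a misspelled attribute name, in which case nothing is frozen. The sketch below uses a small hypothetical stand-in model instead of AlexNet to avoid downloading weights.

```python
import torch.nn as nn

# Hypothetical stand-in: model[0] plays the role of "features",
# model[2] the role of "classifier".
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

for param in model[0].parameters():
    param.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable, "of", total, "parameters are trainable")  # 9 of 49
```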


In [None]:
# replace the model classifier
alexnet.classifier = nn.Sequential(*[nn.Dropout(p=0.5),
                                     nn.Linear(9216, 1000),
                                     nn.ReLU(),
                                     nn.Linear(1000, 1),
                                     nn.Sigmoid()])


### Load Dataset

The dataset consists of about 40000 160x160 images of valid Tic-Tac pills and 1000 images of broken Tic-Tac pills. The dataset can be downloaded from https://goo.gl/CWmLWD .

![alt text](pills.png)

We first create the transformations and then load the dataset using these predefined transformations:

In [None]:
# create transformations
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
data_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    normalize])

# load dataset
dataset = torchvision.datasets.ImageFolder('tictac_dataset/',
                                            transform=data_transform)


AlexNet was trained on color images whose 3 channels were each normalized to a certain mean and standard deviation. We normalize the pill images in the same way so that they have the same statistics. Note that the pill images are grayscale, though we still load them with 3 channels.
<br/>
<br/>
At this stage, no images are loaded in RAM. Instead, the dataset stores the class ids and the paths to the images:

In [None]:
print(dataset.class_to_idx)
print (dataset.imgs[0:3])
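
The normalization applied by the transformations above is a per-channel `(x - mean) / std`. The sketch below reproduces the arithmetic with plain tensor operations on a random stand-in for a grayscale image; it illustrates the formula and is not the torchvision implementation.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

gray = torch.rand(1, 224, 224)   # stand-in for a grayscale image in [0, 1]
rgb = gray.expand(3, -1, -1)     # repeat the single channel 3 times
normalized = (rgb - mean) / std  # broadcast over height and width

print(normalized.shape)  # torch.Size([3, 224, 224])
```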

To split the dataset into training and test sets, we can use the PyTorch utilities:

In [None]:
n = len(dataset)
m = int(VALIDATION_SPLIT*n)
Train, Test = torch.utils.data.random_split(dataset, [n - m, m])
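
The behaviour of random_split can be illustrated on a toy dataset. The sketch assumes a PyTorch version in which random_split accepts a `generator` argument, which makes the split reproducible; the toy dataset is hypothetical.

```python
import torch
from torch.utils.data import TensorDataset, random_split

toy = TensorDataset(torch.arange(100).float().view(-1, 1))  # 100 toy samples
n = len(toy)
m = int(0.2 * n)

train_part, test_part = random_split(
    toy, [n - m, m], generator=torch.Generator().manual_seed(0))

print(len(train_part), len(test_part))  # 80 20
# The two subsets never share a sample.
overlap = set(train_part.indices) & set(test_part.indices)
```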

We also create a function which returns a WeightedRandomSampler that balances the classes of our dataset:

In [None]:
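def make_balance_sampler(dataset):
    # collect the label of every sample (0 = correct, 1 = defect)
    labels = [t[1] for t in dataset]
    num_defect = sum(labels)
    frac = num_defect/len(labels)
    # weight each sample by the frequency of the opposite class, so that
    # both classes are drawn with equal total probability
    weight = np.array([frac, 1-frac])
    samples_weight = np.array([weight[t] for t in labels])
    return torch.utils.data.sampler.WeightedRandomSampler(samples_weight, len(samples_weight))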
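
The effect of these weights can be checked on toy labels. In the sketch below (hypothetical data with 5% defects), roughly half of the drawn samples come from the rare class.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)
labels = np.array([0] * 95 + [1] * 5)  # toy labels: 5% defects
frac = labels.mean()
weight = np.array([frac, 1 - frac])    # same weighting as in the function above
samples_weight = weight[labels]

sampler = WeightedRandomSampler(samples_weight, num_samples=10000,
                                replacement=True)
drawn = labels[list(sampler)]
defect_share = drawn.mean()
print(defect_share)  # close to 0.5
```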


The next step is to create batch generators (data loaders). They can be iterated in a loop and return BATCH_SIZE images at a time together with their labels (0 = correct, 1 = defect). Note that after this step there are still no images loaded in RAM; the images are loaded batch-wise during iteration of the generators. The pin_memory argument makes the loader place batches in page-locked (pinned) host memory, which speeds up the transfer from RAM to GPU memory.

In [None]:
train_sampler = make_balance_sampler(Train)
train_loader = torch.utils.data.DataLoader(Train, batch_size=BATCH_SIZE, 
                                           sampler=train_sampler, pin_memory=False)
test_sampler = make_balance_sampler(Test)
test_loader = torch.utils.data.DataLoader(Test, batch_size=BATCH_SIZE,
                                          sampler=test_sampler, pin_memory=False)

### CUDA in PyTorch 

PyTorch behaviour differs from Keras: CUDA is not used by default, so one needs to specify what runs on CUDA. The way to do it is very simple, though. One needs to initialize a tensor on CUDA (or move it there); then all operations on this tensor are performed on the GPU. In the following cell we specify how to use CUDA automatically if it is available:

In [None]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")
if use_cuda:
    alexnet.cuda()

### Training of neural network
We use the Adam optimizer with a small initial learning rate. We pass only alexnet.classifier.parameters(), as this is the part we want to train.

In [None]:
# initialize weights and optimizer
optimizer = torch.optim.Adam(alexnet.classifier.parameters(), lr=LEARNING_RATE)

AlexNet is an instance of nn.Module. All models created as classes inheriting from nn.Module have two modes: "train" and "eval". The default is training mode. The mode matters for some layers, such as batch normalization or Dropout, which behave differently during evaluation and training. This is how we set alexnet.classifier to training mode and alexnet.features to evaluation mode:

In [None]:
alexnet.classifier = alexnet.classifier.train(True)
alexnet.features = alexnet.features.eval()
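
The difference between the two modes is easy to see on a single Dropout layer. In this sketch, training mode zeroes about half of the inputs and rescales the rest by 1/(1-p), while evaluation mode is the identity:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

drop.train()
out_train = drop(ones)  # entries are either 0 or 1/(1-p) = 2

drop.eval()
out_eval = drop(ones)   # identity in evaluation mode

print(out_train.unique(), out_eval.unique())
```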

The training loop looks similar to the one for linear regression. The only difference is that we need to push the tensors to the GPU. The non_blocking flag requests an asynchronous transfer to the GPU, which can make training a little bit faster.

In [None]:
optimizer = torch.optim.Adam(alexnet.classifier.parameters(), lr=LEARNING_RATE)
for epoch in tqdm(range(NUM_EPOCHS), desc='Epochs'): 
    loss_mini_batch = torch.tensor(0., device=device)
    count = torch.tensor(0., device=device)
    t = tqdm(enumerate(train_loader))
    for i, (images, labels) in t:
        if use_cuda:
            labels = labels.view(-1,1).to(device, dtype=torch.float32,
                                          non_blocking=True)
            images = images.to(device, non_blocking=True)
        else:
            labels = labels.view(-1,1).to(dtype=torch.float32)
        outputs = alexnet(images)
        loss = nn.functional.binary_cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step() 
        optimizer.zero_grad()
        loss_mini_batch += loss.data
        count += torch.tensor(1., device=device)
        t.set_description('Training Loss: %.2g' % (loss_mini_batch/count).data)
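
The loss used in the loop is the binary cross-entropy, $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$. The sketch below checks the built-in function against this formula on a few made-up values:

```python
import torch
import torch.nn.functional as F

outputs = torch.tensor([[0.9], [0.2], [0.6]])  # predicted probabilities
labels = torch.tensor([[1.], [0.], [1.]])      # true classes

manual = -(labels * torch.log(outputs)
           + (1 - labels) * torch.log(1 - outputs)).mean()
builtin = F.binary_cross_entropy(outputs, labels)

print(manual.item(), builtin.item())
```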

### Speed up CUDA computations
In order to speed up training of the model, one can save the images to the hard drive as CUDA tensors. That improves the training speed almost twofold. One can use the following script to convert the images:

```python
from PIL import Image
import os
import multiprocessing as mp
import torch
import torchvision.transforms as transforms


normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
data_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    normalize])


def convert_jpg2bmp(x):
    infname, outdir = x[0], x[1]
    outname = os.path.join(os.path.basename(os.path.dirname(infname)),
                           os.path.basename(infname))
    outname = os.path.join(outdir, outname)
    outname = os.path.splitext(outname)[0] + '.pth'
    img = Image.open(infname)
    tens = data_transform(img)
    tens = tens.to(device='cuda')
    torch.save(tens, outname)

outdir = 'tictac_cuda'

# get list of images
lifiles = []
liclasses =  [name for name in os.listdir("tictac_dataset")]
for cli in liclasses:
    indir = os.path.join('tictac_dataset', cli)
    lii = [os.path.join(indir, name) for name in os.listdir(indir) if name.endswith(".jpg")]
    lifiles = lifiles + lii

# create directories
os.makedirs(outdir, exist_ok=True)
for cli in liclasses:
    diri = os.path.join(outdir, cli)
    os.makedirs(diri, exist_ok=True)

# transfer images to pickled tensors
pool = mp.Pool(processes=6)
liparams = [[i, outdir] for i in lifiles]
pool.map(convert_jpg2bmp, liparams)
pool.close()
```

Note that it takes about half an hour on 6 CPU cores to transform all images. Afterward, one can create the DataLoaders again with the new images and train the model:

In [None]:
# load dataset
dataset = torchvision.datasets.DatasetFolder('tictac_cuda/', loader=torch.load, extensions=('.pth',))

n = len(dataset)
m = int(VALIDATION_SPLIT*n)
Train, Test = torch.utils.data.random_split(dataset, [n - m, m])

train_sampler = make_balance_sampler(Train)
test_sampler = make_balance_sampler(Test)

train_loader = torch.utils.data.DataLoader(Train, batch_size=BATCH_SIZE, 
                                           sampler=train_sampler)

test_loader = torch.utils.data.DataLoader(Test, batch_size=BATCH_SIZE,
                                          sampler=test_sampler)

In [None]:
from tqdm.notebook import tqdm

for epoch in tqdm(range(NUM_EPOCHS), desc='Epochs'):
    
    loss_mini_batch = torch.tensor(0., device=device)
    count = torch.tensor(0., device=device)
    
    t = tqdm(enumerate(train_loader))
    for i, (images, labels) in t:
        labels = labels.view(-1,1).to(device, dtype=torch.float32, non_blocking=True)
        images = torch.cat([images]*3, dim=1)
        outputs = alexnet(images)
        loss = nn.functional.binary_cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step() 
        optimizer.zero_grad()
        
        loss_mini_batch += loss.data
        count += torch.tensor(1., device=device)
        t.set_description('Training Loss: %.2f' % (loss_mini_batch/count).data)

### Save and load model

In [None]:
torch.save({
            'model_state_dict': alexnet.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            }, "alexnetstate_1.pth")

It is worth checking that the state of the optimizer was saved properly by loading it after saving:

In [None]:
# first rebuild the model and the optimizer
alexnet = torchvision.models.alexnet(pretrained=True)
for param in alexnet.features.parameters():
    param.requires_grad = False
alexnet.classifier = nn.Sequential(*[nn.Dropout(p=0.5),
                                     nn.Linear(9216, 1000),
                                     nn.ReLU(),
                                     nn.Linear(1000, 1),
                                     nn.Sigmoid()])
# set the modes after the classifier has been replaced
alexnet.classifier = alexnet.classifier.train(True)
alexnet.features = alexnet.features.eval()
if use_cuda:
    alexnet.cuda()
optimizer = torch.optim.Adam(alexnet.classifier.parameters())

# then load the state of the model and the optimizer
checkpoint = torch.load("alexnetstate_1.pth")
alexnet.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
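
The same round trip can be tried on a tiny model. The sketch below uses a hypothetical linear model and a temporary file; it shows that both the weights and the optimizer settings (here the learning rate) are restored from the checkpoint.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
model(torch.ones(1, 2)).sum().backward()
optimizer.step()  # one step, so that the optimizer has internal state

path = os.path.join(tempfile.mkdtemp(), "state.pth")
torch.save({'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict()}, path)

restored = nn.Linear(2, 1)
restored_opt = torch.optim.Adam(restored.parameters())  # default lr
checkpoint = torch.load(path)
restored.load_state_dict(checkpoint['model_state_dict'])
restored_opt.load_state_dict(checkpoint['optimizer_state_dict'])

print(restored_opt.param_groups[0]['lr'])  # 0.0001
```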

### Model evaluation

In [None]:
# Test the Model
li_proba = []
li_labels = []
alexnet = alexnet.eval()
with torch.no_grad():  # no gradients are needed for evaluation
    for images, labels in test_loader:
        labels = labels.to(device, dtype=torch.float32, non_blocking=True)
        images = torch.cat([images]*3, dim=1)
        outputs = alexnet(images)
        proba = outputs.cpu().numpy()
        li_proba.extend(proba)
        li_labels.extend(labels.cpu().numpy())

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
# confusion matrix
li_proba = np.array(li_proba).ravel()  # flatten (N, 1) outputs to (N,)
li_pred = np.where(li_proba > 0.6, 1, 0)
confusion_matrix(li_labels, li_pred)
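
To make the thresholding step concrete, the confusion counts, precision and recall can be computed by hand with NumPy. The labels and probabilities below are made up for illustration only.

```python
import numpy as np

toy_labels = np.array([0, 0, 1, 1, 0, 1])
toy_proba = np.array([0.1, 0.7, 0.8, 0.4, 0.2, 0.9])
toy_pred = np.where(toy_proba > 0.6, 1, 0)  # same threshold as above

tp = int(np.sum((toy_labels == 1) & (toy_pred == 1)))
fp = int(np.sum((toy_labels == 0) & (toy_pred == 1)))
fn = int(np.sum((toy_labels == 1) & (toy_pred == 0)))
tn = int(np.sum((toy_labels == 0) & (toy_pred == 0)))

precision_at_thr = tp / (tp + fp)
recall_at_thr = tp / (tp + fn)
print([[tn, fp], [fn, tp]])  # [[2, 1], [1, 2]]
```

Each point of a precision-recall curve corresponds to one such threshold.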

In [None]:
# draw precision-recall curve
precision, recall, thr = precision_recall_curve(li_labels, li_proba)

plt.plot(recall, precision)
plt.xlabel('recall')
plt.ylabel('precision')
plt.show()