# Deep learning for computer vision


This notebook will teach you to build and train convolutional networks for image recognition. Brace yourselves.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_DL/blob/spring20/seminar3/seminar3_pytorch.ipynb)

# Tiny ImageNet dataset
This week, we shall focus on the image recognition problem on the Tiny ImageNet dataset:
* 100k training images of shape 3x64x64
* 200 different classes: snakes, spiders, cats, trucks, grasshoppers, gulls, etc.


In [1]:
#!L
import torchvision
import torch
from torchvision import transforms

In [2]:
#!S:bash
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UksGhGn63aQLAfGrAkGzdx69U6waEHPR' -O tinyim3.png
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=19qsD0o7pfAI8UYxgDY18sdRjV0Aantn2' -O tiny_img.py
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=12IrLjz8pss4284xsBAJt6CW6yELPH4tL' -O tiniim.png


In [4]:
#!L
from tiny_img import download_tinyImg200
data_path = '.'
download_tinyImg200(data_path)

./tiny-imagenet-200.zip


In [5]:
#!L
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transforms.ToTensor())
# hold out 20k of the 100k labelled training images...
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])
# ...and split them in half: 10k for validation, 10k for the final test
test_dataset, val_dataset = torch.utils.data.random_split(val_dataset, [10000, 10000])
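`random_split` draws a fresh random permutation on every run, so the train/val/test membership changes between sessions. A minimal sketch, assuming you want reproducible splits: pass a seeded `torch.Generator` through the `generator` argument of `random_split`. The toy `TensorDataset` below only stands in for the 100k-image folder.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# toy stand-in for the image folder: 100 samples with integer labels
toy = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.arange(100) % 10)

def split(seed):
    # a seeded generator makes the permutation (and hence the split) deterministic
    gen = torch.Generator().manual_seed(seed)
    train, held_out = random_split(toy, [80, 20], generator=gen)
    val, test = random_split(held_out, [10, 10], generator=gen)
    return train, val, test

train_a, val_a, test_a = split(0)
train_b, val_b, test_b = split(0)
print(len(train_a), len(val_a), len(test_a))         # 80 10 10
print(list(test_a.indices) == list(test_b.indices))  # True: same seed, same split
```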

In [6]:
#!L
batch_size = 50
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [7]:
#!L
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

## Image examples ##



<tr>
    <td> <img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/tinyim3.png?raw=1" alt="Drawing" style="width:90%"/> </td>
    <td> <img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/tinyim2.png?raw=1" alt="Drawing" style="width:90%"/> </td>
</tr>


<tr>
    <td> <img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/tiniim.png?raw=1" alt="Drawing" style="width:90%"/> </td>
</tr>

# Building a network

Simple neural networks with layers applied on top of one another can be implemented as `torch.nn.Sequential` - just add a list of pre-built modules and let it train.
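`nn.Sequential` can be built either from modules passed to the constructor or incrementally with `add_module`, as the next cells do. A small sketch (layer sizes are arbitrary) showing that both forms compute the same function once they share weights:

```python
import torch
import torch.nn as nn

# form 1: modules passed positionally (they get the names "0", "1", ...)
net_a = nn.Sequential(nn.Flatten(), nn.Linear(12, 5), nn.ReLU(), nn.Linear(5, 3))

# form 2: the same layers added one by one under explicit names
net_b = nn.Sequential()
net_b.add_module('flatten', nn.Flatten())
net_b.add_module('dense1', nn.Linear(12, 5))
net_b.add_module('relu1', nn.ReLU())
net_b.add_module('logits', nn.Linear(5, 3))

# copy the weights so that both networks compute the same function
net_b.dense1.load_state_dict(net_a[1].state_dict())
net_b.logits.load_state_dict(net_a[3].state_dict())

x = torch.randn(4, 3, 2, 2)  # a batch of 4 tiny 3x2x2 "images" -> 12 features each
print(net_a(x).shape)                      # torch.Size([4, 3])
print(torch.allclose(net_a(x), net_b(x)))  # True
```

Named modules are also reachable as attributes (`net_b.dense1`), which is handy when inspecting a trained model.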

In [8]:
#!L
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


Let's start with a dense network for our baseline:

In [9]:
#!L
model = nn.Sequential()

# reshape from "images" to flat vectors
model.add_module('flatten', nn.Flatten())

# dense "head": a ReLU after every hidden layer, otherwise the stacked
# Linear layers would collapse into a single linear map
model.add_module('dense1', nn.Linear(3 * 64 * 64, 1064))
model.add_module('dense1_relu', nn.ReLU())
model.add_module('dense2', nn.Linear(1064, 512))
model.add_module('dense2_relu', nn.ReLU())
model.add_module('dropout0', nn.Dropout(0.05)) 
model.add_module('dense3', nn.Linear(512, 256))
model.add_module('dense3_relu', nn.ReLU())
model.add_module('dropout1', nn.Dropout(0.05))
model.add_module('dense4', nn.Linear(256, 64))
model.add_module('dense4_relu', nn.ReLU())
model.add_module('dropout2', nn.Dropout(0.05))
model.add_module('logits', nn.Linear(64, 200)) # logits for 200 classes


if torch.cuda.is_available():
    device = torch.device('cuda:0')
else:
    device = torch.device('cpu')
model.to(device)
device

device(type='cuda', index=0)

As in our basic tutorial, we train our model with negative log-likelihood, aka cross-entropy.

In [10]:
#!L
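def compute_loss(X_batch, y_batch):
    # move the batch to the device as float images / integer class labels
    X_batch = torch.as_tensor(X_batch, dtype=torch.float32, device=device)
    y_batch = torch.as_tensor(y_batch, dtype=torch.long, device=device)
    # .to(device) also moves a freshly re-created model off the CPU
    logits = model.to(device)(X_batch)
    # F.cross_entropy already averages over the batch (reduction='mean' by default)
    return F.cross_entropy(logits, y_batch)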
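`F.cross_entropy` fuses a log-softmax over the class dimension with the negative log-likelihood of the correct class and, by default, averages over the batch. A quick numerical check of that decomposition on random logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 200)           # 5 samples, 200 classes (as in Tiny ImageNet)
targets = torch.randint(0, 200, (5,))  # correct class index per sample

fused = F.cross_entropy(logits, targets)

# manual version: -log softmax(logits)[correct class], averaged over the batch
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(5), targets].mean()

print(torch.allclose(fused, manual))  # True
```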

### Training on minibatches
* We have 100k images (80k of them in the training split); that's way too many for full-batch gradient descent, so we train on minibatches instead.
* The `DataLoader` objects defined above (`train_batch_gen`, `val_batch_gen`) split the samples into minibatches.
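A `DataLoader` yields `ceil(N / batch_size)` minibatches per epoch, and the last one is smaller when `N` is not a multiple of the batch size (unless `drop_last=True`). With 80,000 training images and `batch_size = 50` that gives 1,600 iterations per epoch. A toy check with a deliberately uneven dataset size:

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset with 1,030 samples so that the last batch is partial
toy = TensorDataset(torch.randn(1030, 3), torch.zeros(1030, dtype=torch.long))
loader = DataLoader(toy, batch_size=50, shuffle=True)

sizes = [x.shape[0] for x, _ in loader]
print(len(loader), math.ceil(1030 / 50))  # 21 21
print(sizes[-1])  # 30: the leftover samples form a smaller final batch
```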

In [11]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

In [None]:
import numpy as np
from tqdm import tqdm

opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

num_epochs = 5 # total amount of full passes over training data

import time

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in tqdm(train_batch_gen):
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.item())
    
    model.train(False) # disable dropout / use averages for batch_norm
    with torch.no_grad():  # no gradients needed for validation
        for X_batch, y_batch in val_batch_gen:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            val_accuracy.append((y_pred == y_batch).float().mean().item())

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

There is no need to run a large fixed number of epochs: you can interrupt training after 5-20 epochs, once validation accuracy stops going up.
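Interrupting by hand works, but the same rule can be automated. A minimal sketch of patience-based early stopping, assuming validation accuracy is computed once per epoch; the helper name and the `patience` value are illustrative, not part of the original notebook. In practice you would also save `model.state_dict()` whenever the best score improves.

```python
def early_stopping_epoch(val_history, patience=3):
    """Return the (1-based) epoch at which training would stop, or None.

    Training stops once validation accuracy has failed to beat the best
    value seen so far for `patience` consecutive epochs.
    """
    best = float('-inf')
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_history, start=1):
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# validation accuracies (%) printed by the batch-norm run in Task 2
history = [12.00, 14.49, 15.40, 15.44, 14.18, 13.89, 13.80, 11.90, 12.35, 12.66]
print(early_stopping_epoch(history, patience=3))  # 7
```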

### Final test

In [None]:
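# evaluate on the held-out test split, not on val_batch_gen (which was used for model selection)
test_batch_gen = torch.utils.data.DataLoader(test_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             num_workers=1)
model.train(False) # disable dropout / use averages for batch_norm
test_batch_acc = []
with torch.no_grad():
    for X_batch, y_batch in test_batch_gen:
        logits = model(X_batch.to(device))
        y_pred = logits.argmax(dim=1).cpu()
        test_batch_acc.append((y_pred == y_batch).float().mean().item())

test_accuracy = np.mean(test_batch_acc)

print("Final results:")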
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 70:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 40:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 20:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow the instructions below")

## Task I: small convolution net
### First step

Let's create a small convolutional network with roughly the following architecture:
* Input layer
* 3x3 convolution with 128 filters and _ReLU_ activation
* 2x2 pooling (or set previous convolution stride to 3)
* Flatten
* Dense layer with 1024 neurons and _ReLU_ activation
* 30% dropout
* Output dense layer.


__Convolutional layers__ in torch are just like all other layers, but with a specific set of parameters:

__`...`__

__`model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3)) # convolution`__

__`model.add_module('pool1', nn.MaxPool2d(2)) # max pooling 2x2`__

__`...`__


Once you're done (and compute_loss no longer raises errors), train it with the __Adam__ optimizer and default parameters (feel free to modify the code above).

If everything is right, you should get at least __16%__ validation accuracy.

__HACK_OF_THE_DAY__: the number of channels should be on the order of the number of class labels.
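Putting the bullet list together with the `add_module` template gives something like the sketch below. It assumes the 2x2 max-pooling variant (not the stride-3 convolution used in the next cells) and checks shapes with a dummy batch; the 128 x 31 x 31 flatten size follows from a 3x3 convolution without padding (64 → 62) followed by 2x2 pooling (62 → 31). Layer names are illustrative.

```python
import torch
import torch.nn as nn

model_sketch = nn.Sequential()
model_sketch.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3))  # 3x64x64 -> 128x62x62
model_sketch.add_module('conv1_relu', nn.ReLU())
model_sketch.add_module('pool1', nn.MaxPool2d(2))                   # 128x62x62 -> 128x31x31
model_sketch.add_module('flatten', nn.Flatten())                    # -> 128*31*31 = 123008 features
model_sketch.add_module('dense1', nn.Linear(128 * 31 * 31, 1024))
model_sketch.add_module('dense1_relu', nn.ReLU())
model_sketch.add_module('dropout', nn.Dropout(0.3))                 # 30% dropout
model_sketch.add_module('logits', nn.Linear(1024, 200))             # one logit per class

x = torch.randn(2, 3, 64, 64)  # dummy batch of two 64x64 RGB images
print(model_sketch(x).shape)   # torch.Size([2, 200])
```

Note that pooling leaves a much larger flattened vector than the stride-3 convolution (123008 vs 56448 features), so the first dense layer is correspondingly heavier.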

### Before we start:
**Stride, Padding and Kernel_size**

In [None]:
from IPython.display import Image
Image(url='https://deeplearning.net/software/theano/_images/numerical_padding_strides.gif')  
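For a convolution or pooling layer with kernel size $k$, stride $s$ and padding $p$ applied to an input of width $W$, the output width is $\lfloor (W + 2p - k)/s \rfloor + 1$ (the `dilation=1` case of the formula in the `nn.Conv2d` documentation). A small helper makes it easy to check the `Linear` input sizes used below, e.g. 56448 = 128 x 21 x 21 after a 3x3 convolution with stride 3:

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a Conv2d / MaxPool2d layer (dilation = 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# 3x3 conv, stride 3, no padding on a 64x64 image (the network in the next cells)
side = conv_out(64, kernel_size=3, stride=3)
print(side, 128 * side * side)  # 21 56448

# 3x3 conv (stride 1) followed by 2x2 max pooling (MaxPool2d(2) uses stride 2)
side = conv_out(conv_out(64, 3), kernel_size=2, stride=2)
print(side)  # 31
```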

In [None]:
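model = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=3, stride=3),  # 3x64x64 -> 128x21x21 (stride 3 instead of pooling)
    nn.ReLU(),
    nn.Flatten(),                                # 128 * 21 * 21 = 56448 features
    nn.Linear(56448, 1024),
    nn.ReLU(),
    nn.Dropout(0.3),                             # 30% dropout, as in the task description
    nn.Linear(1024, 200),                        # logits for 200 classes
)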

In [None]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

In [None]:
from torchsummary import summary  # third-party package: pip install torchsummary

summary(model.to(device), (3, 64, 64), device=device.type)

## retrain it ##

In [None]:
import time
from tqdm import tqdm

num_epochs = 10 # total amount of full passes over training data
batch_size = 50  # number of samples per minibatch (already fixed when the DataLoaders were built)


for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in tqdm(train_batch_gen):
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.item())
    model.train(False) # disable dropout / use averages for batch_norm
    with torch.no_grad():  # no gradients needed for validation
        for X_batch, y_batch in val_batch_gen:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            val_accuracy.append((y_pred == y_batch).float().mean().item())

    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
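The validation loop is repeated verbatim in every training cell. A minimal sketch of a reusable helper (the name `evaluate` is hypothetical) that switches to eval mode, disables gradient tracking with `torch.no_grad()`, and returns accuracy over a whole loader; it runs on a toy model and dataset so it is self-contained:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Fraction of correctly classified samples in `loader`."""
    model.eval()  # dropout off, batch norm uses running statistics
    correct, total = 0, 0
    with torch.no_grad():  # no autograd graph needed for evaluation
        for X_batch, y_batch in loader:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            correct += (y_pred == y_batch).sum().item()
            total += y_batch.numel()
    return correct / total

# toy check: a linear "classifier" that copies its 3 input features to 3 logits
toy_model = nn.Linear(3, 3, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(3))
X = torch.eye(3).repeat(4, 1)    # 12 one-hot inputs
y = torch.tensor([0, 1, 2] * 4)  # label = position of the 1
y[0] = 2                         # make exactly one label wrong
loader = DataLoader(TensorDataset(X, y), batch_size=5)
acc = evaluate(toy_model, loader, torch.device('cpu'))
print(acc)  # 0.9166666666666666 (11 of 12 correct)
```

Counting correct samples, rather than averaging per-batch accuracies, weights every sample equally even when the last batch is smaller.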


__Hint:__ If you don't want to compute shapes by hand, just plug in any shape (e.g. 1 unit) and run compute_loss. You will see something like this:

__`RuntimeError: size mismatch, m1: [5 x 1960], m2: [1 x 64] at /some/long/path/to/torch/operation`__

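See the __1960__ there? That's your actual input shape, i.e. the number of features coming out of the flatten layer. Newer PyTorch versions word the error differently (e.g. `mat1 and mat2 shapes cannot be multiplied (5x1960 and 1x64)`), but the second number of the first shape is still the one you need.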
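Instead of reading the size off an error message, you can also push a dummy image through the convolutional part and measure the flattened width directly. A minimal sketch (the helper name `flat_features` is hypothetical):

```python
import torch
import torch.nn as nn

def flat_features(feature_extractor, input_shape=(3, 64, 64)):
    """Number of features `feature_extractor` produces for one input image."""
    with torch.no_grad():
        dummy = torch.zeros(1, *input_shape)  # a batch of one blank image
        return feature_extractor(dummy).flatten(start_dim=1).shape[1]

features = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=3, stride=3),
    nn.ReLU(),
)
n = flat_features(features)
print(n)  # 56448
```

The measured value can then be used directly as the input size of the first dense layer, e.g. `nn.Linear(n, 1024)`.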

## Task 2: adding normalization

* Add batch norm (with default params) between convolution and ReLU
  * nn.BatchNorm*d (1d for dense, 2d for conv)
  * usually better to put them after linear/conv but before nonlinearity
* Re-train the network with the same optimizer; it should reach at least 20% validation accuracy at peak.

To learn more about **batch normalization** and **internal covariate shift**:

https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c

https://www.youtube.com/watch?v=nUUqwaxLnWs
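In training mode, `nn.BatchNorm2d` normalizes each channel with the mean and (biased) variance computed over the batch and spatial dimensions, then applies a learnable scale $\gamma$ and shift $\beta$: $y = \gamma (x - \mu_c)/\sqrt{\sigma_c^2 + \epsilon} + \beta$. A small check of that formula against the layer, relying on $\gamma = 1$, $\beta = 0$ at initialization:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4, 5, 5) * 3 + 1  # batch of 8, 4 channels, 5x5 feature maps

bn = nn.BatchNorm2d(4)  # default eps=1e-5; gamma=1, beta=0 at initialization
bn.train()              # training mode: normalize with batch statistics
out = bn(x)

# manual per-channel normalization over (batch, height, width)
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance, as used for normalization
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(out, manual, atol=1e-5))  # True
```

In eval mode (`model.train(False)`) the layer instead uses running averages accumulated during training, which is why the evaluation loops switch modes.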

In [12]:
model = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=3, stride=3),
    nn.BatchNorm2d(128),   # batch norm after the convolution, before the nonlinearity
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(56448, 1024),
    nn.BatchNorm1d(1024),  # 1d batch norm for the dense layer
    nn.ReLU(),
    nn.Dropout(),          # default p=0.5
    nn.Linear(1024, 200),
)

In [13]:
opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

In [14]:
import time
import numpy as np
from tqdm import tqdm

num_epochs = 10 # total amount of full passes over training data
batch_size = 50  # number of samples per minibatch (already fixed when the DataLoaders were built)


for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in tqdm(train_batch_gen):
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.item())
    model.train(False) # disable dropout / use averages for batch_norm
    with torch.no_grad():  # no gradients needed for validation
        for X_batch, y_batch in val_batch_gen:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            val_accuracy.append((y_pred == y_batch).float().mean().item())

    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

100%|██████████| 1600/1600 [01:04<00:00, 24.84it/s]
Epoch 1 of 10 took 70.523s
  training loss (in-iteration): 	4.722553
  validation accuracy: 			12.00 %
100%|██████████| 1600/1600 [01:02<00:00, 25.80it/s]
Epoch 2 of 10 took 68.055s
  training loss (in-iteration): 	4.081126
  validation accuracy: 			14.49 %
100%|██████████| 1600/1600 [01:01<00:00, 25.98it/s]
Epoch 3 of 10 took 67.644s
  training loss (in-iteration): 	3.509788
  validation accuracy: 			15.40 %
100%|██████████| 1600/1600 [01:01<00:00, 25.85it/s]
Epoch 4 of 10 took 67.865s
  training loss (in-iteration): 	2.888089
  validation accuracy: 			15.44 %
100%|██████████| 1600/1600 [01:01<00:00, 26.01it/s]
Epoch 5 of 10 took 67.512s
  training loss (in-iteration): 	2.268419
  validation accuracy: 			14.18 %
100%|██████████| 1600/1600 [01:01<00:00, 25.82it/s]
Epoch 6 of 10 took 68.110s
  training loss (in-iteration): 	1.722274
  validation accuracy: 			13.89 %
100%|██████████| 1600/1600 [01:02<00:00, 25.54it/s]
Epoch 7 of 10 took 68.886s
  training loss (in-iteration): 	1.277575
  validation accuracy: 			13.80 %
100%|██████████| 1600/1600 [01:01<00:00, 25.86it/s]
Epoch 8 of 10 took 67.839s
  training loss (in-iteration): 	0.932354
  validation accuracy: 			11.90 %
100%|██████████| 1600/1600 [01:01<00:00, 25.85it/s]
Epoch 9 of 10 took 67.856s
  training loss (in-iteration): 	0.677347
  validation accuracy: 			12.35 %
100%|██████████| 1600/1600 [01:03<00:00, 25.38it/s]
Epoch 10 of 10 took 69.507s
  training loss (in-iteration): 	0.499378
  validation accuracy: 			12.66 %




## Task 3: Data Augmentation

**Augmenti: a spell used to produce water from a wand (Harry Potter Wiki)**

<img src="https://github.com/yandexdataschool/Practical_DL/blob/sem3spring2019/week03_convnets/HagridsHut_PM_B6C28_Hagrid_sHutFireHarryFang.jpg?raw=1" style="width:80%">

`torchvision.transforms` provides a powerful set of tools for image preprocessing and data augmentation.

Here's how it works: we define a pipeline that
* applies random geometric changes such as crops or small rotations (augmentation; the cell below uses random rotations of up to ±15°)
* randomly flips the image horizontally (augmentation)
* then normalizes it (preprocessing)

At test time we don't need random augmentation; we only normalize with the same statistics.

In [15]:
import torchvision
from torchvision import transforms

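# per-channel normalization statistics: the standard ImageNet values, used here as an
# approximation (estimating them on the Tiny ImageNet training images is more precise)
means = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])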

transform_augment = transforms.Compose([
                                        transforms.RandomRotation([-15, 15]),
                                        transforms.RandomHorizontalFlip(),
                                        transforms.ToTensor(),
                                        transforms.Normalize(means, std)                                       
])
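Normalization statistics should come from the training images rather than from random numbers. A minimal sketch that accumulates per-channel sums over a loader of `ToTensor()` images; the helper name is hypothetical, and in the notebook you would pass a `DataLoader` over `ImageFolder('tiny-imagenet-200/train', transform=transforms.ToTensor())`, possibly restricted to a random subset to save time. A toy dataset with known statistics stands in here:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_stats(loader):
    """Per-channel mean and (population) standard deviation over all pixels."""
    n_pixels, total, total_sq = 0, None, None
    for X_batch, _ in loader:
        X_batch = X_batch.double()  # float64 avoids precision loss in the running sums
        # sum over batch, height and width; keep the channel axis
        s = X_batch.sum(dim=(0, 2, 3))
        s2 = (X_batch ** 2).sum(dim=(0, 2, 3))
        total = s if total is None else total + s
        total_sq = s2 if total_sq is None else total_sq + s2
        n_pixels += X_batch.shape[0] * X_batch.shape[2] * X_batch.shape[3]
    mean = total / n_pixels
    std = (total_sq / n_pixels - mean ** 2).sqrt()  # sqrt(E[x^2] - E[x]^2)
    return mean.float(), std.float()

# toy images: channel c has mean c and standard deviation c + 1
torch.manual_seed(0)
scale = torch.tensor([1.0, 2.0, 3.0]).view(1, 3, 1, 1)
shift = torch.tensor([0.0, 1.0, 2.0]).view(1, 3, 1, 1)
imgs = torch.randn(200, 3, 8, 8) * scale + shift
loader = DataLoader(TensorDataset(imgs, torch.zeros(200)), batch_size=32)
toy_mean, toy_std = channel_stats(loader)
print(toy_mean, toy_std)  # close to [0, 1, 2] and [1, 2, 3]
```

The resulting tensors can be passed straight to `transforms.Normalize`.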

In [16]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transform_augment)

In [17]:
# note: both splits share transform_augment, so validation images are augmented as well
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [90000, 10000])

In [18]:
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [19]:
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [20]:
import time
num_epochs = 10 # total amount of full passes over training data
batch_size = 50  # number of samples per minibatch (already fixed when the DataLoaders were built)

# note: this continues training the batch-norm model and optimizer from Task 2;
# re-create `model` and `opt` first to train from scratch on augmented data

for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in tqdm(train_batch_gen):
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.item())
    model.train(False) # disable dropout / use averages for batch_norm
    with torch.no_grad():  # no gradients needed for validation
        for X_batch, y_batch in val_batch_gen:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            val_accuracy.append((y_pred == y_batch).float().mean().item())

    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

100%|██████████| 1800/1800 [01:41<00:00, 17.75it/s]
Epoch 1 of 10 took 110.990s
  training loss (in-iteration): 	4.536655
  validation accuracy: 			18.90 %
100%|██████████| 1800/1800 [01:41<00:00, 17.69it/s]
Epoch 2 of 10 took 111.398s
  training loss (in-iteration): 	4.107036
  validation accuracy: 			21.72 %
100%|██████████| 1800/1800 [01:42<00:00, 17.62it/s]
Epoch 3 of 10 took 111.714s
  training loss (in-iteration): 	3.922238
  validation accuracy: 			22.79 %
100%|██████████| 1800/1800 [01:41<00:00, 17.77it/s]
Epoch 4 of 10 took 110.840s
  training loss (in-iteration): 	3.803493
  validation accuracy: 			22.39 %
100%|██████████| 1800/1800 [01:41<00:00, 17.76it/s]
Epoch 5 of 10 took 110.945s
  training loss (in-iteration): 	3.716403
  validation accuracy: 			23.40 %
100%|██████████| 1800/1800 [01:41<00:00, 17.77it/s]
Epoch 6 of 10 took 110.778s
  training loss (in-iteration): 	3.635981
  validation accuracy: 			22.52 %
100%|██████████| 1800/1800 [01:41<00:00, 17.82it/s]
Epoch 7 of 10 took 110.515s
  training loss (in-iteration): 	3.577114
  validation accuracy: 			23.02 %
100%|██████████| 1800/1800 [01:41<00:00, 17.71it/s]
Epoch 8 of 10 took 111.213s
  training loss (in-iteration): 	3.532259
  validation accuracy: 			24.10 %
100%|██████████| 1800/1800 [01:40<00:00, 17.83it/s]
Epoch 9 of 10 took 110.460s
  training loss (in-iteration): 	3.488895
  validation accuracy: 			22.18 %
100%|██████████| 1800/1800 [01:41<00:00, 17.80it/s]
Epoch 10 of 10 took 110.609s
  training loss (in-iteration): 	3.437845
  validation accuracy: 			23.65 %


For the test data we need __only normalization__, no random cropping or rotation.

In [47]:
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(means, std),  # per channel: (x - means[c]) / std[c], same statistics as in training
])

# NOTE: ImageFolder needs one sub-folder per class; check that val/ is organized
# this way, otherwise loading fails or every image gets the same label
test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', transform=transform_test)
test_batch_gen = torch.utils.data.DataLoader(test_dataset, 
                                             batch_size=batch_size,
                                             shuffle=False,
                                             num_workers=1)

test_accuracy = []

model.eval()  # disable dropout / use running averages for batch_norm
with torch.no_grad():
    for X_batch, y_batch in test_batch_gen:
        logits = model(X_batch.to(device))
        y_pred = logits.argmax(dim=1).cpu()
        test_accuracy.append((y_pred == y_batch).float().mean().item())
print("Test accuracy: ", np.mean(test_accuracy))

FileNotFoundError: ignored
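`ImageFolder` expects one sub-folder per class. In the standard Tiny ImageNet archive, `val/` is laid out differently: images sit in `val/images/` and their labels are listed in `val/val_annotations.txt`, one tab-separated line per image that starts with the file name and the class id (WordNet id). Assuming that layout (check it on your copy), a minimal dataset that reads the annotations and reuses the training set's `class_to_idx`, so that label indices match the ones the model was trained on, could look like this (the class name `TinyImageNetVal` is hypothetical):

```python
import os

from PIL import Image as PILImage
from torch.utils.data import Dataset

class TinyImageNetVal(Dataset):
    """Tiny ImageNet validation images labelled via val_annotations.txt."""
    def __init__(self, root, class_to_idx, transform=None):
        self.image_dir = os.path.join(root, 'images')
        self.transform = transform
        self.samples = []
        with open(os.path.join(root, 'val_annotations.txt')) as f:
            for line in f:
                fields = line.strip().split('\t')
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                file_name, wnid = fields[0], fields[1]
                self.samples.append((file_name, class_to_idx[wnid]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        file_name, label = self.samples[idx]
        img = PILImage.open(os.path.join(self.image_dir, file_name)).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```

With it, `TinyImageNetVal('tiny-imagenet-200/val', dataset.class_to_idx, transform=transform_test)` could take the place of the `ImageFolder` call in the test cell.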

In [46]:
!rm -rf */.ipynb_checkpoints  # hidden names need an explicit leading dot: `*` does not match it

## The Quest For A Better Network

See `practical_dl/homework02` for a full-scale assignment.