# CNN classifier

Congratulations, here's your first homework! You'll learn the art of training deep image classifiers. You might remember `03 seminar`, which covered training a CIFAR10 classifier. This homework is also about training a **CIFAR10 classifier**, but this time you'll do it on your own and with some extra features.

## Data
Your dataset is CIFAR10. See `03 seminar` for how to load the train and val data splits.

**Note:** you may use only the `train` dataset for training.

## Game rules:
The maximum score for this task is **10.0**.

Half of the 10 points come from reaching high val accuracy (thresholds in %, as listed below):

- accuracy > 60.0 -> **1 point**
- accuracy > 70.0 -> **2 points**
- accuracy > 80.0 -> **3 points**
- accuracy > 90.0 -> **4 points**
- accuracy > 92.5 -> **5 points**
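
The thresholds above are strict and only the highest one reached counts, so the accuracy part of the score can be read off with a small lookup. The sketch below is only an illustration: `accuracy_points` and `THRESHOLDS` are hypothetical names, not part of any course grading code.

```python
# Thresholds copied from the list above:
# (strict lower bound on val accuracy in %, points awarded).
THRESHOLDS = [(92.5, 5), (90.0, 4), (80.0, 3), (70.0, 2), (60.0, 1)]


def accuracy_points(val_accuracy: float) -> int:
    """Return the points earned for a validation accuracy given in percent."""
    for bound, points in THRESHOLDS:
        if val_accuracy > bound:  # the rules use a strict ">"
            return points
    return 0


if __name__ == "__main__":
    for acc in (59.9, 60.0, 85.0, 92.5, 94.17):
        print(acc, "->", accuracy_points(acc))
```

Because the comparison is strict, an accuracy of exactly 92.5 falls into the 4-point band in this sketch.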

The other half of the 10 points comes from adding the following features to your training pipeline. It's okay if you see some techniques for the first time (that was the idea). Feel free to google and dive into each topic on your own, it's homework after all:
1. Data augmentations. Check out [this article](https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced) (**1 point**)
2. [LR schedule](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.ReduceLROnPlateau) (**0.5 points**)
3. Fine-tune a pretrained model from [torchvision.models](https://pytorch.org/docs/stable/torchvision/models.html) (except AlexNet!) (**1 point**)
4. Implement a [ResNet model](https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-of-the-art-image-cf51669e1624) (**2 points**)
5. Use [tensorboardX](https://github.com/lanpa/tensorboardX) to monitor the training process (**0.5 points**)

As a result, you have to submit a **notebook with working code** (results will be reproduced during homework checking) and a **short report** (written in the same notebook) about the things you tried and the tasks you managed to implement. Good luck and have fun!

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn

from tqdm import tqdm_notebook as tqdm
from tensorboardX import SummaryWriter
from apex import amp

from datetime import datetime
import os

In [2]:
batch_size = 16
# This notebook assumes a CUDA device: apex amp (used below) targets NVIDIA GPUs.
device = torch.device('cuda')

transform = transforms.Compose([
    transforms.Resize((256,256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

transform_random = transforms.Compose([
    transforms.RandomChoice([transforms.RandomCrop(28),
                             transforms.RandomHorizontalFlip(p=0.75),
                             transforms.RandomAffine(15)
                            ]),
    transforms.Resize((256,256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
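
`transforms.Normalize` applies `(x - mean) / std` to each channel after `ToTensor` has scaled pixel values to [0, 1]. As a worked example in plain Python (no PyTorch needed), the sketch below shows what a mid-gray pixel becomes with the statistics used above; `normalize_pixel` is a hypothetical helper written only for illustration.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means from the transforms above
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations from the transforms above


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to one RGB pixel with values in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


if __name__ == "__main__":
    gray = [0.5, 0.5, 0.5]
    print([round(v, 4) for v in normalize_pixel(gray)])
```

These are the commonly used ImageNet channel statistics, which is what ImageNet-pretrained torchvision models expect. CIFAR10 has its own channel statistics, so this choice is a convention rather than a requirement when training from scratch.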

In [3]:
train_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform_random
)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

print("len(train_dataset) =", len(train_dataset))

val_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print("len(val_dataset) =", len(val_dataset))

Files already downloaded and verified
len(train_dataset) = 50000
Files already downloaded and verified
len(val_dataset) = 10000
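
With `batch_size = 16` and the `DataLoader` default `drop_last=False`, one pass over a dataset of `N` samples yields `ceil(N / batch_size)` batches. A quick check of that arithmetic for the two split sizes printed above, in plain Python (`batches_per_epoch` is a hypothetical helper):

```python
import math


def batches_per_epoch(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader-style iterator yields in one pass."""
    if drop_last:
        # the incomplete final batch is discarded
        return n_samples // batch_size
    # the incomplete final batch is kept
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    print("train batches:", batches_per_epoch(50000, 16))
    print("val batches:", batches_per_epoch(10000, 16))
```

Both split sizes divide evenly by 16, so every batch is full and `drop_last` makes no difference here.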


In [4]:
model = torchvision.models.resnet101(num_classes=10).to(device)

In [10]:
#for feature in model.parameters():
#    feature.requires_grad = False

model.fc = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(inplace=True), nn.Linear(512, 10)).to(device)

In [11]:
#model.half()  # convert to half precision
#for layer in model.modules():
#  if isinstance(layer, nn.BatchNorm2d):
#    layer.float()

In [12]:
criterion = nn.CrossEntropyLoss().to(device)
opt = optim.Adam(model.parameters(), lr=0.0001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1, patience=4)

In [13]:
model, opt = amp.initialize(model, opt, opt_level='O1')

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
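
`loss_scale: dynamic` means the loss is multiplied by a scale factor before `backward()` so that small fp16 gradients do not underflow, and the factor is adjusted during training. The sketch below is a simplified plain-Python model of such a policy, not apex's actual code: the class name `DynamicLossScaler` is hypothetical, and the constants (initial scale 2^16, halving on overflow, doubling after 2000 consecutive clean steps) are assumptions about the defaults.

```python
class DynamicLossScaler:
    """Toy model of dynamic loss scaling (constants are assumptions, see text)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale
        self.factor = factor
        self.window = window
        self.clean_steps = 0  # consecutive steps without inf/nan gradients

    def update(self, overflow: bool) -> bool:
        """Record one step; return True if the optimizer step should be applied."""
        if overflow:
            # Gradients contained inf/nan: skip the step and shrink the scale.
            self.scale /= self.factor
            self.clean_steps = 0
            return False
        self.clean_steps += 1
        if self.clean_steps == self.window:
            # A long run of clean steps: try a larger scale again.
            self.scale *= self.factor
            self.clean_steps = 0
        return True


if __name__ == "__main__":
    scaler = DynamicLossScaler()
    for _ in range(6):  # six overflows in a row
        scaler.update(overflow=True)
    print(scaler.scale)
```

Under these assumptions, six consecutive overflows take the scale from 65536 down to 1024, the same sequence of values that the "Gradient overflow. Skipping step" messages report at the start of training.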


In [8]:
experiment_title = 'resnet101_fp16_batch16'
experiment_name = "{}@{}".format(experiment_title, datetime.now().strftime("%d.%m.%Y-%H:%M:%S"))
writer = SummaryWriter(log_dir=os.path.join("./tb_bench", experiment_name))

In [9]:
#checkpoint = torch.load('checkpoints/resnet50_aug_adam_v2_6epochs.pth')
#model.load_state_dict(checkpoint['model'])
#opt.load_state_dict(checkpoint['optimizer'])


In [14]:
n_epochs = 40
n_epochs_init = 0
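n_iters_total = 0

# create the checkpoint directory up front so torch.save below does not fail
os.makedirs('checkpoints', exist_ok=True)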

for epoch in range(n_epochs_init, n_epochs):
    total_train_loss = 0
    total_val_loss = 0
    correct = 0
    
    model.train()
    for batch in tqdm(train_dataloader):
        # unpack batch
        image_batch, label_batch = batch
        image_batch, label_batch = image_batch.to(device), label_batch.to(device)
        
        # forward
        outputs = model(image_batch)
        loss = criterion(outputs, label_batch)
        total_train_loss += loss.item()
        
        # optimize
        opt.zero_grad()
        with amp.scale_loss(loss, opt) as scaled_loss:
            scaled_loss.backward()
        opt.step()
        # dump statistics
        writer.add_scalar("train/loss", loss.item(), global_step=n_iters_total)
        
        n_iters_total += 1
        
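    # NOTE: this is the sum of per-batch mean losses divided by the number of samples,
    # i.e. the mean per-sample loss divided by batch_size (all batches are full here)
    print("Epoch {} done, total train loss {}.".format(epoch, total_train_loss / len(train_dataset)))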
    
    model.eval()
    with torch.no_grad():
        val_n = 0
        for batch in tqdm(val_dataloader):
            image_batch, label_batch = batch
            # with amp opt_level O1 the casts are inserted automatically, so inputs stay fp32
            image_batch, label_batch = image_batch.to(device), label_batch.to(device)
            outputs = model(image_batch)
            loss = criterion(outputs, label_batch)
            total_val_loss += loss.item()  # accumulate a Python float, not a CUDA tensor
            predicted = torch.argmax(outputs, dim=1)
            correct += (predicted == label_batch).sum().item()
            writer.add_scalar("val/loss", loss.item(), global_step=n_iters_total+val_n)
            val_n += 1
    print("Accuracy {:.4}%, total val loss {}".format(100 * correct / len(val_dataset), total_val_loss / len(val_dataset)))
    
    scheduler.step(total_val_loss)
    
    if epoch % 2 == 0:
        torch.save({'model': model.state_dict(), 
                    'optimizer': opt.state_dict(), 
                    'epoch': epoch,
                    'iter_num': n_iters_total,
                    'loss': loss.item(),  # loss of the last validation batch
                    'accuracy': 100 * correct / len(val_dataset)
                   }, 
                   'checkpoints/resnet101_aug_adam_fp16_batch16_{}epochs.pth'.format(epoch))


Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0

Epoch 0 done, total train loss 0.10452983474016189.




Accuracy 52.52%, total val loss 0.08082591742277145




Epoch 1 done, total train loss 0.07668277584314347.




Accuracy 66.75%, total val loss 0.05811149254441261




Epoch 2 done, total train loss 0.058394878824353215.




Accuracy 73.63%, total val loss 0.04822395741939545



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 3 done, total train loss 0.04799957282066345.




Accuracy 77.65%, total val loss 0.04038304463028908



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 4 done, total train loss 0.041535085111558435.




Accuracy 81.75%, total val loss 0.03358113393187523



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 5 done, total train loss 0.0369346557494998.




Accuracy 83.0%, total val loss 0.03171227499842644



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 6 done, total train loss 0.033328669177889826.




Accuracy 84.36%, total val loss 0.02926837094128132



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 7 done, total train loss 0.030126266874372958.




Accuracy 85.14%, total val loss 0.028081480413675308



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 8 done, total train loss 0.02782021324902773.




Accuracy 85.25%, total val loss 0.027215629816055298



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 9 done, total train loss 0.02556989475876093.




Accuracy 86.38%, total val loss 0.025397149845957756



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 10 done, total train loss 0.023579941663742065.




Accuracy 86.79%, total val loss 0.02379746176302433



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 11 done, total train loss 0.021961733056902886.




Accuracy 88.22%, total val loss 0.021918723359704018



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 12 done, total train loss 0.02036060981720686.




Accuracy 87.5%, total val loss 0.02297026664018631



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 13 done, total train loss 0.01927444676145911.




Accuracy 88.79%, total val loss 0.02121463418006897



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 14 done, total train loss 0.017948158656805754.




Accuracy 88.98%, total val loss 0.020272787660360336



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 15 done, total train loss 0.01681070318996906.




Accuracy 88.77%, total val loss 0.02080303058028221



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 16 done, total train loss 0.015698441453874112.




Accuracy 89.86%, total val loss 0.018597465008497238



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 17 done, total train loss 0.014968456691503524.




Accuracy 89.62%, total val loss 0.019915539771318436



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 18 done, total train loss 0.014009386021494865.




Accuracy 90.01%, total val loss 0.01913630962371826



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 19 done, total train loss 0.013273807274252177.




Accuracy 89.96%, total val loss 0.019078906625509262



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 20 done, total train loss 0.012625008541941643.




Accuracy 90.65%, total val loss 0.018172679468989372



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 21 done, total train loss 0.011920767685770989.




Accuracy 89.43%, total val loss 0.020449506118893623



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 22 done, total train loss 0.01135803699463606.




Accuracy 90.84%, total val loss 0.01751432754099369



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 23 done, total train loss 0.010851656887084246.




Accuracy 90.92%, total val loss 0.017905648797750473



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 24 done, total train loss 0.010126942675560713.




Accuracy 90.24%, total val loss 0.02033935859799385



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 25 done, total train loss 0.010032594983130694.




Accuracy 90.62%, total val loss 0.018118714913725853



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 26 done, total train loss 0.00915913570806384.




Accuracy 90.8%, total val loss 0.018028641119599342



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 27 done, total train loss 0.009100667060017586.




Accuracy 90.98%, total val loss 0.017968079075217247



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0

Epoch 28 done, total train loss 0.004558365713208914.




Accuracy 93.36%, total val loss 0.013153310865163803



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 29 done, total train loss 0.0032020599149167536.




Accuracy 93.57%, total val loss 0.013412940315902233




Epoch 30 done, total train loss 0.0026870989613234997.




Accuracy 93.63%, total val loss 0.013554363511502743



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 31 done, total train loss 0.002495518975555897.




Accuracy 93.67%, total val loss 0.013512806035578251



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 32 done, total train loss 0.002152007151544094.




Accuracy 93.69%, total val loss 0.013819814659655094



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 33 done, total train loss 0.002094041718840599.




Accuracy 93.86%, total val loss 0.01361224614083767




Epoch 34 done, total train loss 0.0018196301843225956.




Accuracy 94.16%, total val loss 0.013119322247803211



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 35 done, total train loss 0.0017015041922032834.




Accuracy 94.08%, total val loss 0.013258152641355991



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 36 done, total train loss 0.0016266229649633168.




Accuracy 94.24%, total val loss 0.013428382575511932



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 37 done, total train loss 0.0016846263094246387.




Accuracy 94.09%, total val loss 0.013224441558122635




Epoch 38 done, total train loss 0.00160100377202034.




Accuracy 94.17%, total val loss 0.013279838487505913



Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0

Epoch 39 done, total train loss 0.0015424308697879314.




Accuracy 94.17%, total val loss 0.01309883687645197
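
The validation losses logged above can be replayed through the plateau rule to see when the learning rate would have been cut. The sketch below is a simplified plain-Python re-implementation of the `ReduceLROnPlateau` logic, assuming its defaults `mode='min'`, a relative `threshold=1e-4`, and no cooldown, together with the notebook's `factor=0.1`, `patience=4`, and initial `lr=0.0001`. `replay_plateau` is a hypothetical helper, and `VAL_LOSS` holds the logged per-epoch values rounded to six decimals. The notebook passes the unnormalized sum to `scheduler.step`, but a relative threshold is insensitive to that constant factor.

```python
# Per-epoch "total val loss" values from the log above, rounded to six decimals.
VAL_LOSS = [
    0.080826, 0.058111, 0.048224, 0.040383, 0.033581, 0.031712, 0.029268, 0.028081,
    0.027216, 0.025397, 0.023797, 0.021919, 0.022970, 0.021215, 0.020273, 0.020803,
    0.018597, 0.019916, 0.019136, 0.019079, 0.018173, 0.020450, 0.017514, 0.017906,
    0.020339, 0.018119, 0.018029, 0.017968, 0.013153, 0.013413, 0.013554, 0.013513,
    0.013820, 0.013612, 0.013119, 0.013258, 0.013428, 0.013224, 0.013280, 0.013099,
]


def replay_plateau(losses, lr=1e-4, factor=0.1, patience=4, threshold=1e-4):
    """Return [(epoch, new_lr), ...] for every epoch after which the LR is reduced."""
    best = float("inf")
    bad_epochs = 0
    reductions = []
    for epoch, loss in enumerate(losses):
        if loss < best * (1.0 - threshold):
            # relative improvement over the best value seen so far
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # more than `patience` epochs without improvement: cut the LR
            lr *= factor
            bad_epochs = 0
            reductions.append((epoch, lr))
    return reductions


if __name__ == "__main__":
    for epoch, lr in replay_plateau(VAL_LOSS):
        print("LR reduced after epoch {} to {:.0e}".format(epoch, lr))
```

Under these assumptions the rule fires after epochs 27 and 33. The first cut lines up with the drop in train loss and the jump in val accuracy from 90.98% to 93.36% at epoch 28.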
