# CNN classifier

Congratulations, here's your first homework! You'll learn the art of training deep image classifiers. You might remember `03 seminar`, where we trained a CIFAR10 classifier. This homework is also about training a **CIFAR10 classifier**, but this time you'll do it on your own and with some extra features.

## Data
Your dataset is CIFAR10. See `03 seminar` for how to load the train and val data splits.

**Note:** you may use only the `train` dataset for training.

## Game rules
The maximum score for this task is **10.0**.

You can earn half of the 10 points by reaching a high val accuracy (in percent), as listed below:

- accuracy > 60.0 -> **1 point**
- accuracy > 70.0 -> **2 points**
- accuracy > 80.0 -> **3 points**
- accuracy > 90.0 -> **4 points**
- accuracy > 92.5 -> **5 points**
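
The thresholds above can be read as a simple lookup. The sketch below is only an illustration: the helper name `accuracy_points` is hypothetical, and it assumes the comparisons are strict (`>`), exactly as written in the list.

```python
# Hypothetical helper: maps val accuracy (in percent) to the points listed above.
# Assumes strict ">" comparisons and that only the highest threshold passed counts.
THRESHOLDS = [(92.5, 5), (90.0, 4), (80.0, 3), (70.0, 2), (60.0, 1)]


def accuracy_points(accuracy):
    """Return the accuracy points for a val accuracy given in percent."""
    for threshold, points in THRESHOLDS:
        if accuracy > threshold:
            return points
    return 0  # accuracy of 60.0 or below earns no accuracy points


example_points = accuracy_points(92.21)  # passes the 90.0 threshold, not 92.5
```

Under these assumptions, an accuracy of exactly 60.0 earns nothing, because the list uses `>` rather than `>=`.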

You can earn the other half of the 10 points by adding the following features to your training pipeline. It's okay if you see some techniques for the first time (that was the idea). Feel free to google and dive into each topic on your own; it's homework after all:
1. Data augmentations. Check out [this article](https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced) (**1 point**)
2. [LR schedule](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.ReduceLROnPlateau) (**0.5 points**)
3. Fine-tune a pretrained model from [torchvision.models](https://pytorch.org/docs/stable/torchvision/models.html) (except AlexNet!) (**1 point**)
4. Implement a [ResNet model](https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-of-the-art-image-cf51669e1624) (**2 points**)
5. Use [tensorboardX](https://github.com/lanpa/tensorboardX) to monitor the training process (**0.5 points**)
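
To give a feel for item 2, here is a minimal pure-Python sketch of the idea behind a plateau-based LR schedule: keep the best metric seen so far, count epochs without improvement, and shrink the learning rate once that count exceeds `patience`. The class name `PlateauSchedule` is hypothetical, and this is a simplification for illustration, not PyTorch's `ReduceLROnPlateau` implementation (it omits the improvement threshold, cooldown, and minimum LR).

```python
class PlateauSchedule:
    """Simplified plateau-based LR schedule (illustrative sketch only)."""

    def __init__(self, lr, factor=0.1, patience=4, mode="max"):
        self.lr = lr
        self.factor = factor      # multiplier applied to lr on a plateau
        self.patience = patience  # epochs without improvement that are tolerated
        self.mode = mode          # "max" for accuracy, "min" for loss
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Update the schedule with this epoch's metric and return the current lr."""
        improved = (
            self.best is None
            or (self.mode == "max" and metric > self.best)
            or (self.mode == "min" and metric < self.best)
        )
        if improved:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


# Usage with made-up accuracies: the metric stops improving after the second epoch.
schedule = PlateauSchedule(lr=0.1, factor=0.1, patience=2, mode="max")
for acc in [50.0, 60.0, 59.0, 58.0, 57.0]:
    lr = schedule.step(acc)
```

Note the `mode` argument: when you monitor accuracy, a higher value is better, so the schedule has to be told to look for a maximum; when you monitor loss, it has to look for a minimum.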

As a result, you have to submit a **notebook with working code** (results will be reproduced during homework checking) and a **short report** (write it in the same notebook) about the things you tried and which tasks you managed to implement. Good luck and have fun!

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn

from tqdm import tqdm_notebook as tqdm

from tensorboardX import SummaryWriter
from datetime import datetime
import os

In [3]:
batch_size = 2
# use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.Resize((256,256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

transform_random = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomApply([transforms.RandomCrop(28)], p=0.5),
    transforms.Resize((256,256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [4]:
train_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform_random
)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

print("len(train_dataset) =", len(train_dataset))

val_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
val_dataloader= torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print("len(val_dataset) =", len(val_dataset))

Files already downloaded and verified
len(train_dataset) = 50000
Files already downloaded and verified
len(val_dataset) = 10000


In [5]:
model = torchvision.models.resnet101(num_classes=10).to(device)

In [6]:
#for feature in model.parameters():
#    feature.requires_grad = False

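# 8192 = 2048 channels x 2x2 spatial map: this assumes an older torchvision ResNet with a fixed
# AvgPool2d(7) applied to 256x256 inputs; with adaptive average pooling the flattened size is 2048
model.fc = nn.Sequential(nn.Linear(8192, 512), nn.ReLU(inplace=True), nn.Linear(512, 10)).to(device)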

In [8]:
criterion = nn.CrossEntropyLoss().to(device)
opt = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
# mode='max' because the scheduler is stepped with val accuracy (higher is better)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(opt, mode='max', factor=0.1, patience=4)

In [9]:
experiment_title = 'resnet101_aug_cifar10'
experiment_name = "{}@{}".format(experiment_title, datetime.now().strftime("%d.%m.%Y-%H:%M:%S"))
writer = SummaryWriter(log_dir=os.path.join("./tb_untr_wave3", experiment_name))

In [8]:
#checkpoint = torch.load('checkpoints/resnet50_aug_adam_v2_6epochs.pth')
#model.load_state_dict(checkpoint['model'])
#opt.load_state_dict(checkpoint['optimizer'])

In [12]:
n_epochs = 2
n_epochs_init = 0
n_iters_total = 0

for epoch in range(n_epochs_init, n_epochs):
    total_train_loss = 0
    total_val_loss = 0
    correct = 0
    
    model.train()
    for batch in tqdm(train_dataloader):
        # unpack batch
        image_batch, label_batch = batch
        image_batch, label_batch = image_batch.to(device), label_batch.to(device)
        
        # forward
        outputs = model(image_batch)
        loss = criterion(outputs, label_batch)
        total_train_loss += loss.item()
        
        # optimize
        opt.zero_grad()
        loss.backward()
        opt.step()
        # dump statistics
        writer.add_scalar("train/loss", loss.item(), global_step=n_iters_total)
        
        n_iters_total += 1
        
    print("Epoch {} done, total train loss {}.".format(epoch, total_train_loss / len(train_dataset)))
    
    model.eval()
    with torch.no_grad():
        for batch in tqdm(val_dataloader):
            image_batch, label_batch = batch
            image_batch, label_batch = image_batch.to(device), label_batch.to(device)
            outputs = model(image_batch)
            loss = criterion(outputs, label_batch)
            total_val_loss += loss.item()  # accumulate a Python float rather than a tensor
            predicted = torch.argmax(outputs, dim=1)
            correct += (predicted == label_batch).sum().item()
    print("Accuracy {:.4}%, total val loss {}".format(100 * correct / len(val_dataset), total_val_loss / len(val_dataset)))
    
    scheduler.step(100 * correct / len(val_dataset))
    
    if epoch % 2 == 0:
        os.makedirs('checkpoints', exist_ok=True)  # torch.save does not create missing directories
        torch.save({'model': model.state_dict(), 
                    'optimizer': opt.state_dict(), 
                    'epoch': epoch,
                    'iter_num': n_iters_total,
                    'loss': total_val_loss,
                    'accuracy': 100 * correct / len(val_dataset)
                   }, 
                   'checkpoints/resnet101_aug_sgd_{}epochs.pth'.format(epoch))

Epoch 0 done, total train loss 0.8297383439004421.
Accuracy 35.81%, total val loss 1.3956438302993774
Epoch 1 done, total train loss 0.7369517471086979.
Accuracy 42.15%, total val loss 1.3140630722045898
Epoch 2 done, total train loss 0.6500496973478794.
Accuracy 51.91%, total val loss 0.792344868183136
Epoch 3 done, total train loss 0.571755489230156.
Accuracy 55.25%, total val loss 0.923947811126709
Epoch 4 done, total train loss 0.5101069788181781.
Accuracy 61.38%, total val loss 0.7555294036865234
Epoch 5 done, total train loss 0.38402678300976756.
Accuracy 59.44%, total val loss 1.3995038270950317
Epoch 6 done, total train loss 0.3606175603461266.
Accuracy 60.6%, total val loss 1.369270920753479

KeyboardInterrupt: 

In [31]:
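model.eval()
# reset the accumulators so values from the last training epoch do not leak into this evaluation
total_val_loss = 0
correct = 0
with torch.no_grad():
    for batch in tqdm(val_dataloader):
        image_batch, label_batch = batch
        image_batch, label_batch = image_batch.to(device), label_batch.to(device)
        outputs = model(image_batch)
        loss = criterion(outputs, label_batch)
        total_val_loss += loss.item()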
        predicted = torch.argmax(outputs, dim=1)
        correct += (predicted == label_batch).sum().item()
print("Accuracy {:.4}%, total val loss {}".format(100 * correct / len(val_dataset), total_val_loss / len(val_dataset)))

Accuracy 70.92%, total val loss 0.0002574552781879902


In [45]:
torch.save({'model': model.state_dict(), 'optimizer': opt.state_dict(), 'epoch': 5, 'loss': loss}, 'checkpoints/resnet50_pre_5epochs.pth')

In [11]:
for param_group in opt.param_groups:
    param_group['lr'] = 0.0003

In [22]:
n_iters_total

550001

In [6]:
transform_random = transforms.Compose([
    transforms.RandomChoice([transforms.RandomCrop(28), transforms.RandomHorizontalFlip(p=0.75)]),
    transforms.Resize((256,256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform_random
)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

print("len(train_dataset) =", len(train_dataset))

val_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
val_dataloader= torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print("len(val_dataset) =", len(val_dataset))

Files already downloaded and verified
len(train_dataset) = 50000
Files already downloaded and verified
len(val_dataset) = 10000


In [29]:
checkpoint = torch.load('checkpoints/resnet50_aug_adam_v2_40epochs.pth')
model.load_state_dict(checkpoint['model'])
opt.load_state_dict(checkpoint['optimizer'])

In [30]:
experiment_title = 'resnet50_cont_aug_adam_3_cifar10'
experiment_name = "{}@{}".format(experiment_title, datetime.now().strftime("%d.%m.%Y-%H:%M:%S"))
writer = SummaryWriter(log_dir=os.path.join("./tb_untr", experiment_name))

In [32]:
n_epochs = 46
n_epochs_init = 41
n_iters_total = 500000

for epoch in range(n_epochs_init, n_epochs):
    total_train_loss = 0
    total_val_loss = 0
    correct = 0
    
    model.train()
    for batch in tqdm(train_dataloader):
        # unpack batch
        image_batch, label_batch = batch
        image_batch, label_batch = image_batch.to(device), label_batch.to(device)
        
        # forward
        outputs = model(image_batch)
        loss = criterion(outputs, label_batch)
        total_train_loss += loss.item()
        
        # optimize
        opt.zero_grad()
        loss.backward()
        opt.step()
        # dump statistics
        writer.add_scalar("train/loss", loss.item(), global_step=n_iters_total)
        
        n_iters_total += 1
        
    print("Epoch {} done, total train loss {}.".format(epoch, total_train_loss / len(train_dataset)))
    
    model.eval()
    with torch.no_grad():
        for batch in tqdm(val_dataloader):
            image_batch, label_batch = batch
            image_batch, label_batch = image_batch.to(device), label_batch.to(device)
            outputs = model(image_batch)
            loss = criterion(outputs, label_batch)
            total_val_loss += loss.item()  # accumulate a Python float rather than a tensor
            predicted = torch.argmax(outputs, dim=1)
            correct += (predicted == label_batch).sum().item()
    print("Accuracy {:.4}%, total val loss {}".format(100 * correct / len(val_dataset), total_val_loss / len(val_dataset)))
    
    #scheduler.step(total_val_loss)
    
    if epoch % 2 == 0:
        os.makedirs('checkpoints', exist_ok=True)  # torch.save does not create missing directories
        torch.save({'model': model.state_dict(), 
                    'optimizer': opt.state_dict(), 
                    'epoch': epoch, 
                    'accuracy': 100 * correct / len(val_dataset)
                   }, 
                   'checkpoints/resnet50_aug_adam_v2_cont2_{}epochs.pth'.format(epoch))

Epoch 41 done, total train loss 0.006137742601714563.
Accuracy 92.19%, total val loss 0.0823383778333664
Epoch 42 done, total train loss 0.005276410705707967.
Accuracy 91.78%, total val loss 0.08470028638839722
Epoch 43 done, total train loss 0.005804890033230185.
Accuracy 92.05%, total val loss 0.08246500045061111
Epoch 44 done, total train loss 0.005602258243680699.
Accuracy 92.21%, total val loss 0.07993371039628983
Epoch 45 done, total train loss 0.005522879205830395.
Accuracy 92.01%, total val loss 0.08301439136266708
