# CNN classifier

Congratulations, here's your first homework! You'll learn the art of training deep image classifiers. You might remember `03 seminar`, which covered training a CIFAR10 classifier. This homework is also about training a **CIFAR10 classifier**, but this time you'll do it on your own and with some extra features.

## Data
Your dataset is CIFAR10. See `03 seminar` for how to load the train and val data splits.

**Note:** you may train only on the `train` split.

## Game rules:
The maximum score for this task is **10.0**.

Half of the 10 points come from reaching high val accuracy (thresholds in percent, as listed below):

- accuracy > 60.0 -> **1 point**
- accuracy > 70.0 -> **2 points**
- accuracy > 80.0 -> **3 points**
- accuracy > 90.0 -> **4 points**
- accuracy > 92.5 -> **5 points**
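
The thresholds above can be read as a simple lookup. Here is a minimal sketch, assuming the accuracy is given in percent and the comparisons are strict, as written in the list; the helper name `accuracy_points` is hypothetical and not part of the course code.

```python
# Thresholds and points copied from the list above (strict ">" comparisons).
ACCURACY_THRESHOLDS = [(92.5, 5), (90.0, 4), (80.0, 3), (70.0, 2), (60.0, 1)]


def accuracy_points(val_accuracy):
    """Return the points earned for a validation accuracy given in percent."""
    # Thresholds are sorted from highest to lowest, so the first match wins.
    for threshold, points in ACCURACY_THRESHOLDS:
        if val_accuracy > threshold:
            return points
    return 0


print(accuracy_points(78.42))  # prints 2
```

Note that an accuracy of exactly 90.0 earns 3 points under this reading, because the comparison is strict.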

The other half of the 10 points come from adding the following features to your training pipeline. It's okay if you see some techniques for the first time (that was the idea). Feel free to google and dive into the topic on your own; it's homework after all:
1. Data augmentations. Check out [this article](https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced) (**1 point**)
2. [LR schedule](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.ReduceLROnPlateau) (**0.5 points**)
3. Finetune a pretrained model from [torchvision.models](https://pytorch.org/docs/stable/torchvision/models.html) (except AlexNet!) (**1 point**)
4. Implement a [ResNet model](https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-of-the-art-image-cf51669e1624) (**2 points**)
5. Use [tensorboardX](https://github.com/lanpa/tensorboardX) to monitor the training process (**0.5 points**)
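
For item 4, the core idea of a ResNet is the residual block: the input is added back to the output of a small stack of convolutions. The sketch below is one possible basic block, not the reference solution for this homework; the class name `BasicBlock` and its arguments are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Two 3x3 convolutions with an identity (or projected) skip connection."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # When the shape changes, project the input with a 1x1 convolution
        # so that it can still be added to the block output.
        self.shortcut = None
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        identity = x if self.shortcut is None else self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The residual connection: add the (possibly projected) input back.
        return self.relu(out + identity)


block = BasicBlock(16, 32, stride=2)
y = block(torch.randn(2, 16, 32, 32))
print(y.shape)  # torch.Size([2, 32, 16, 16])
```

A full model would stack several such blocks between a stem convolution and a final pooling plus linear layer.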

As a result, you have to submit a **notebook with working code** (results will be reproduced during homework checking) and a **short report** (written in the same notebook) about the things you tried and the tasks you managed to implement. Good luck and have fun!

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn

from tqdm import tqdm_notebook as tqdm

from tensorboardX import SummaryWriter
from datetime import datetime
import os

In [2]:
batch_size = 4
# use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

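transform = transforms.Compose([
    # upscale the 32x32 CIFAR10 images before feeding the ImageNet-pretrained network
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    # ImageNet channel statistics, as expected by torchvision's pretrained models
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])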

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [3]:
train_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform
)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

print("len(train_dataset) =", len(train_dataset))

val_dataset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print("len(val_dataset) =", len(val_dataset))

Files already downloaded and verified
len(train_dataset) = 50000
Files already downloaded and verified
len(val_dataset) = 10000
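
With the dataset sizes printed above and `batch_size = 4`, the number of optimizer steps per epoch follows directly. A small sketch of that arithmetic, assuming the `DataLoader` default of keeping the last partial batch (`drop_last=False`); the helper name `batches_per_epoch` is made up for illustration.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    # With drop_last=False the final, possibly smaller batch is kept,
    # so the number of batches is rounded up.
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(50000, 4)
val_batches = batches_per_epoch(10000, 4)
print(train_batches, val_batches)  # prints 12500 2500
```

Such a small batch size means many short steps per epoch; a larger batch would reduce the step count proportionally.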


In [4]:
model = torchvision.models.resnet101(pretrained=True).to(device)

In [5]:
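# freeze the pretrained backbone; only the new head defined below is trained
for param in model.parameters():
    param.requires_grad = False

# 8192 = 2048 channels x 2 x 2 positions left after the final pooling for 256x256 inputs
# in the torchvision version used here; with adaptive pooling use model.fc.in_features instead
model.fc = nn.Sequential(nn.Linear(8192, 512), nn.ReLU(inplace=True), nn.Linear(512, 10)).to(device)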
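
The `8192` input size of the new head can be traced with the usual output-size formula for convolutions and pooling. The sketch below assumes a ResNet whose last pooling layer is a fixed `AvgPool2d(7, stride=1)`, as in older torchvision releases; that assumption is inferred from the number, not stated in the notebook. The helper `conv_out` is illustrative.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = 256                          # Resize((256, 256)) in the transform
size = conv_out(size, 7, 2, 3)      # conv1: 7x7, stride 2, padding 3
size = conv_out(size, 3, 2, 1)      # maxpool: 3x3, stride 2, padding 1
for _ in range(3):                  # layer2, layer3 and layer4 each halve the map
    size = conv_out(size, 3, 2, 1)  # (layer1 keeps the resolution)
size = conv_out(size, 7, 1)         # assumed AvgPool2d(7, stride=1)

fc_in_features = 2048 * size * size  # 2048 channels after layer4
print(size, fc_in_features)          # prints 2 8192
```

With 224x224 inputs the same arithmetic gives a 1x1 map and 2048 features, which is the size the stock `fc` layer expects.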

In [6]:
criterion = nn.CrossEntropyLoss().to(device)
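# frozen backbone parameters never receive gradients, so SGD only updates the new head
opt = optim.SGD(model.parameters(), lr=0.00002, momentum=0.9)
# note: the training loop below never calls scheduler.step(), so this schedule has no effect as written
scheduler = torch.optim.lr_scheduler.ExponentialLR(opt, 0.2)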
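
`ExponentialLR` multiplies the learning rate by `gamma` each time `scheduler.step()` is called. The plain-Python sketch below shows what the values from this cell (`lr=0.00002`, `gamma=0.2`) would give if the scheduler were stepped once per epoch; stepping once per epoch is an assumption, and `exponential_lr` is a hypothetical helper, not a PyTorch function.

```python
def exponential_lr(base_lr, gamma, epoch):
    """Learning rate after `epoch` scheduler steps of exponential decay."""
    return base_lr * gamma ** epoch


schedule = [exponential_lr(2e-5, 0.2, epoch) for epoch in range(5)]
for epoch, lr in enumerate(schedule):
    print("epoch {}: lr = {:.2e}".format(epoch, lr))
```

The rate drops by a factor of 5 per step, so after four steps it is already 625 times smaller than the starting value.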

In [7]:
experiment_title = 'resnet101_sgd_cifar10'
experiment_name = "{}@{}".format(experiment_title, datetime.now().strftime("%d.%m.%Y-%H:%M:%S"))
writer = SummaryWriter(log_dir=os.path.join("./tb", experiment_name))

In [51]:
n_epochs = 5
# TensorBoard global step, continued from earlier runs of this cell rather than restarted at 0
n_iters_total = 112522

for epoch in range(n_epochs):
    total_train_loss = 0
    total_val_loss = 0
    correct = 0
    
    model.train()
    for batch in tqdm(train_dataloader):
        # unpack batch
        image_batch, label_batch = batch
        image_batch, label_batch = image_batch.to(device), label_batch.to(device)
        
        # forward
        outputs = model(image_batch)
        loss = criterion(outputs, label_batch)
        total_train_loss += loss.item()
        
        # optimize
        opt.zero_grad()
        loss.backward()
        opt.step()
        
        # dump statistics
        writer.add_scalar("train/loss", loss.item(), global_step=n_iters_total)
        
        n_iters_total += 1
        
    print("Epoch {} done, total train loss {}.".format(epoch, total_train_loss / len(train_dataset)))
    
    model.eval()
    with torch.no_grad():
        for batch in tqdm(val_dataloader):
            image_batch, label_batch = batch
            image_batch, label_batch = image_batch.to(device), label_batch.to(device)
            outputs = model(image_batch)
            loss = criterion(outputs, label_batch)
            total_val_loss += loss.item()  # accumulate a Python float, not a tensor
            predicted = torch.argmax(outputs, dim=1)
            correct += (predicted == label_batch).sum().item()
    print("Accuracy {:.4}%, total val loss {}".format(100 * correct / len(val_dataset), total_val_loss / len(val_dataset)))
    
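    if epoch % 2 == 0:
        os.makedirs('checkpoints', exist_ok=True)  # torch.save does not create missing directories
        torch.save({'model': model.state_dict(),
                    'optimizer': opt.state_dict(),
                    'epoch': epoch + 3,  # same offset as in the file name
                    'loss': loss
                   },
                   'checkpoints/resnet101_pre_{}epochs.pth'.format(epoch + 3))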

Epoch 0 done, total train loss 0.23197455003499984.
Accuracy 78.42%, total val loss 0.1789761632680893
Epoch 1 done, total train loss 0.20840802507400513.
Accuracy 79.62%, total val loss 0.24619567394256592
Epoch 2 done, total train loss 0.19802272654294967.
Accuracy 79.18%, total val loss 0.20140409469604492
Epoch 3 done, total train loss 0.19229435625970365.
Accuracy 77.95%, total val loss 0.23363646864891052

KeyboardInterrupt: 
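
A note on reading the loss values above: the loop sums the per-batch mean loss (the `CrossEntropyLoss` default) and then divides by the number of samples, not the number of batches. The sketch below converts the reported "Epoch 0" train loss back to an average per batch, assuming every batch holds exactly `batch_size = 4` images; the helper name is hypothetical.

```python
def mean_loss_per_batch(reported_loss, num_samples, num_batches):
    """Undo the division by num_samples and divide by num_batches instead."""
    return reported_loss * num_samples / num_batches


reported = 0.23197455003499984  # "Epoch 0" train loss from the output above
per_batch = mean_loss_per_batch(reported, num_samples=50000, num_batches=12500)
print(round(per_batch, 4))  # roughly 0.93
```

In other words, the printed numbers are the average cross-entropy divided by the batch size, which matters when comparing against runs that use a different batch size.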

In [31]:
# reset the counters so that this cell can be run on its own
correct = 0
total_val_loss = 0

model.eval()
with torch.no_grad():
    for batch in tqdm(val_dataloader):
        image_batch, label_batch = batch
        image_batch, label_batch = image_batch.to(device), label_batch.to(device)
        outputs = model(image_batch)
        loss = criterion(outputs, label_batch)
        total_val_loss += loss.item()
        predicted = torch.argmax(outputs, dim=1)
        correct += (predicted == label_batch).sum().item()
print("Accuracy {:.4}%, total val loss {}".format(100 * correct / len(val_dataset), total_val_loss / len(val_dataset)))

Accuracy 70.92%, total val loss 0.0002574552781879902


In [48]:
torch.save({'model': model.state_dict(), 'optimizer': opt.state_dict(), 'epoch': 8, 'loss': loss}, 'checkpoints/resnet101_sgd_pre_8epochs.pth')

In [49]:
for param_group in opt.param_groups:
    param_group['lr'] = 0.0005

In [41]:
outputs

tensor([[ 1.7677,  1.6172,  0.2359, -0.7298, -1.6142, -1.2679, -1.0457, -1.4637,
          2.4966,  1.9163],
        [-1.7375, -3.0958,  0.9356,  2.8586,  1.3662,  0.8010,  2.2982,  0.5010,
         -2.0614, -0.5737],
        [-0.2894, -5.4796,  2.1962,  2.1734,  3.7695,  3.1381, -1.5267,  3.0031,
         -1.4337, -4.1767],
        [ 0.1378,  7.0258, -2.3110, -1.6613, -2.1242, -1.7423, -0.2865, -1.5042,
          0.5352,  2.2682]], device='cuda:0', grad_fn=<AddmmBackward>)
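
Each row of `outputs` holds one logit per class, and the prediction is the index of the largest one. The plain-Python sketch below applies that to the four rows printed above, using the `classes` tuple from cell 2; the `argmax` helper is written out here only for illustration (the notebook itself uses `torch.argmax`).

```python
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Logits copied from the tensor printed above (one row per image in the batch).
logits = [
    [1.7677, 1.6172, 0.2359, -0.7298, -1.6142, -1.2679, -1.0457, -1.4637, 2.4966, 1.9163],
    [-1.7375, -3.0958, 0.9356, 2.8586, 1.3662, 0.8010, 2.2982, 0.5010, -2.0614, -0.5737],
    [-0.2894, -5.4796, 2.1962, 2.1734, 3.7695, 3.1381, -1.5267, 3.0031, -1.4337, -4.1767],
    [0.1378, 7.0258, -2.3110, -1.6613, -2.1242, -1.7423, -0.2865, -1.5042, 0.5352, 2.2682],
]


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


predicted = [classes[argmax(row)] for row in logits]
print(predicted)  # ['ship', 'cat', 'deer', 'car']
```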

In [24]:
# a single manual training step on the most recently loaded batch (debugging)
image_batch, label_batch = batch
image_batch, label_batch = image_batch.to(device), label_batch.to(device)

# forward
outputs = model(image_batch)
loss = criterion(outputs, label_batch)
total_train_loss += loss.item()

# optimize
opt.zero_grad()
loss.backward()
opt.step()

In [36]:
model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=F

In [47]:
for feature in model.parameters():
    print(feature.grad)

None
[... `None` repeated on every line; 200 lines in the captured output ...]


In [50]:
n_iters_total

112521