# Fine-tuning a VGG Network

In this homework you'll fine-tune a VGG network for dog breed classification (the same dataset as in the practical seminar).

## Loading the data

In [1]:
# this cell downloads the zip archive with the data
! wget "https://www.dropbox.com/s/r11z0ugf2mezxvi/dogs.zip?dl=0" -O dogs.zip

--2025-02-24 21:11:02--  https://www.dropbox.com/s/r11z0ugf2mezxvi/dogs.zip?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/cgwxt4jlpesmb9oq9s19f/dogs.zip?rlkey=ysuiqksb7i8ewm117m5e7lqxh&dl=0 [following]
--2025-02-24 21:11:02--  https://www.dropbox.com/scl/fi/cgwxt4jlpesmb9oq9s19f/dogs.zip?rlkey=ysuiqksb7i8ewm117m5e7lqxh&dl=0
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc9e32e8ac8c9f384dffb41b3d3d.dl.dropboxusercontent.com/cd/0/inline/CkxJIjZsAQct8ITRnTp1W7lsMaeJOb1-M4JVFQ7q0HsBuTBv2SvFQVziXTIt7NUThLulVNj9TZjdu7KvmQub0p7z4F_U5b0hfmJtyXDE88sWnK-oeFBbVU56tK5OgLnyrxEJbWrX-dLip7bzrznMQLt9/file# [following]
--2025-02-24 21:11:03--  https://uc9e32e8ac8c9f384dffb41b3d3d.dl.dropboxusercontent.com/cd/0/in

In [2]:
# this cell extracts the archive. You'll now have a "dogs" folder in Colab
! unzip -qq dogs.zip

## Task 1

Your task is to fine-tune the [VGG11](https://pytorch.org/vision/0.20/models/generated/torchvision.models.vgg11.html) network from torchvision for dog breed classification, tuning the model to reach the best possible test accuracy. You may not use any pretrained model other than this one, or any data other than the dataset provided.

What you can do:
- **Preprocess and augment data**. Note the difference between ordinary data preprocessing (as we did in the practical session) and augmentation. Preprocessing usually refers to how all the data (train and test) is processed before being fed into the network; augmentation is a technique for enlarging the training set with modified copies of its samples. Augmentation should be applied only to training data, never to validation or test data. You can read more about augmentation [here](https://d2l.ai/chapter_computer-vision/image-augmentation.html). Also think about which image augmentations suit the given task: for example, would it be beneficial to flip images vertically in our case?
- **Change/remove/add layers to the network**. You can change the layers of the pre-trained VGG11. Note, however, that newly added layers must not be pre-trained. You may add any layers, e.g. conv, fc, dropout, batchnorm.
- **Tune hyperparameters**, e.g. batch size, learning rate, etc.
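
The preprocessing/augmentation distinction from the first item can be sketched in plain Python. This is only an illustration, not the notebook's albumentations pipeline: the helpers `hflip`, `preprocess`, and `prepare` are hypothetical names, and images are small nested lists of grayscale pixel values. Preprocessing runs for every split, while the random flip runs only when `train=True`.

```python
import random

def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def preprocess(image):
    """Deterministic step applied to every split: scale 0..255 pixels to 0..1."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def prepare(image, train, rng, p=0.5):
    """Augment only training samples, then preprocess every sample."""
    if train and rng.random() < p:
        image = hflip(image)
    return preprocess(image)

rng = random.Random(56)
image = [[0, 255], [51, 102]]
print(prepare(image, train=False, rng=rng))  # validation/test: never flipped
print(prepare(image, train=True, rng=rng))   # training: flipped with probability p
```

Keeping the random step behind the `train` flag means validation and test metrics are always computed on unmodified images.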

If X is your score on the test set, then your task score is calculated as follows: min(0.95, (X-0.75))*5
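
As a quick check of the scoring rule, here is a literal transcription of the formula above into Python. The function name `task_score` is hypothetical, and the sketch assumes the expression is meant exactly as written.

```python
def task_score(test_accuracy):
    """Literal transcription of min(0.95, (X - 0.75)) * 5 from the task text."""
    return min(0.95, test_accuracy - 0.75) * 5

# Print the task score for a few example test accuracies.
for accuracy in (0.75, 0.85, 0.95):
    print(f"X = {accuracy:.2f} -> task score = {task_score(accuracy):.2f}")
```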

In [3]:
%%capture
!pip install pytorch-lightning

In [4]:
import os
import pandas as pd
import numpy as np
from PIL import Image
import gc
import matplotlib.pyplot as plt
import pytorch_lightning as pl
from sklearn.model_selection import train_test_split, KFold, StratifiedGroupKFold, StratifiedKFold
from sklearn.metrics import f1_score
from torch.nn.utils.rnn import pad_sequence
import cv2
import io
import random
import torch
import torch.nn as nn
from torchvision import models
from torch.utils.data import Dataset, DataLoader
from tqdm.auto import tqdm
from sklearn.metrics import accuracy_score  # f1_score is already imported above
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2
from transformers import get_cosine_schedule_with_warmup, get_linear_schedule_with_warmup


In [5]:
class CFG:
    class data:
        train_path ='./dogs/train'
        test_path = './dogs/test'
        val_path = './dogs/valid'
        num_workers = 2
        img_size = 224
        batch_size = 16
        eval_batch_size = 32
        seed = 56
    class model:
        optim = torch.optim.AdamW
        num_labels = 70
        hidden_size = 4096
        cls_drop = 0.1
        scheduler = 'cosine'
        max_epoches = 10
        lr = lr_head = 1e-4
        num_cycles = 0.5
        warmup_ratio = 0.0
        drop = 0.5
        eps = 1e-12
        weight_decay = 0.0
        weight_decay_head = 0.0
        betas = (0.9, 0.999)
    seed = 56

class Transforms:
    transforms_train = A.Compose([
            A.Resize(CFG.data.img_size,CFG.data.img_size),
            A.RandomRotate90(p=0.5),
            A.HorizontalFlip(p=0.5),
            A.VerticalFlip(p=0.5),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0),
            ToTensorV2()
        ])
    transforms_val = A.Compose([
            A.Resize(CFG.data.img_size,CFG.data.img_size),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0),
            ToTensorV2()
        ])
    transforms_test = A.Compose([
            A.Resize(CFG.data.img_size,CFG.data.img_size),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0),
            ToTensorV2()
        ])
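
To see what the `A.Normalize` step does numerically, here is a plain-Python sketch. It assumes the per-channel formula `(pixel / max_pixel_value - mean) / std` with the same ImageNet statistics used in the cell above; `normalize_pixel` is a hypothetical helper, not an albumentations function.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel means, same values as in the transforms
STD = (0.229, 0.224, 0.225)   # per-channel standard deviations

def normalize_pixel(rgb, max_pixel_value=255.0):
    """Scale one RGB pixel to 0..1, then standardize each channel."""
    return tuple(
        (value / max_pixel_value - mean) / std
        for value, mean, std in zip(rgb, MEAN, STD)
    )

print(normalize_pixel((0, 0, 0)))        # black pixel: negative in every channel
print(normalize_pixel((255, 255, 255)))  # white pixel: positive in every channel
```

The same statistics are applied to the train, validation, and test transforms, which is what makes this step preprocessing rather than augmentation.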

In [6]:
# Map class folder names (spaces stripped) to integer labels and back.
# sorted() keeps the mapping deterministic, since os.listdir() returns entries in arbitrary order.
class_names = sorted(os.listdir(CFG.data.train_path))
maper = {v.replace(' ',''):i for i,v in enumerate(class_names)}
inv_maper = {i:v.replace(' ','') for i,v in enumerate(class_names)}

def make_df(root):
    # Collect (label, image_path) pairs from a root/<class>/<image> folder layout.
    pathes, labels = [],[]
    for cls in os.listdir(root):
        for img_name in os.listdir(f"{root}/{cls}"):
            labels.append(maper[cls.replace(' ','')])
            pathes.append(f"{root}/{cls}/{img_name}")

    return pd.DataFrame({'labels': labels, 'image_path': pathes})

train_df = make_df(CFG.data.train_path)
val_df = make_df(CFG.data.val_path)
test_df = make_df(CFG.data.test_path)

In [7]:
class PLDataset(Dataset):
    def __init__(self, df, transforms):
        super().__init__()
        self.cfg = CFG.data
        self.data = df
        self.transforms = transforms

    def __getitem__(self, index):
        row = self.data.iloc[index]
        image = cv2.imread(row['image_path'])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = self.transforms(image=image)['image']

        return {
            'image': image,
            'labels': row['labels']
        }

    def __len__(self):
        return len(self.data)

In [8]:
train_dataset = PLDataset(train_df,Transforms.transforms_train)
val_dataset = PLDataset(val_df,Transforms.transforms_val)
predict_dataset = PLDataset(test_df,Transforms.transforms_test)

In [9]:
train_dataloader = DataLoader(train_dataset, batch_size=CFG.data.batch_size, num_workers=CFG.data.num_workers, pin_memory=True, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=CFG.data.eval_batch_size, num_workers=CFG.data.num_workers, pin_memory=True, shuffle=False)
test_dataloader = DataLoader(predict_dataset, batch_size=CFG.data.eval_batch_size, num_workers=CFG.data.num_workers, pin_memory=True, shuffle=False)

In [10]:
class AverageMeter():
    def __init__(self):
        self.preds = []
        self.labels = []
        self.history = []

    def update(self,y_t,y_p):
        self.labels += y_t
        self.preds += y_p

    def clean(self):
        self.preds = []
        self.labels = []

    def calc_metrics(self):
        metrics = {}
        metrics['f1_macro'] = f1_score(self.labels, self.preds, average='macro')
        metrics['f1_micro'] = f1_score(self.labels, self.preds, average='micro')
        metrics['accuracy'] = accuracy_score(self.labels, self.preds)
        self.history.append(metrics)
        return metrics
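
The three metrics tracked by `AverageMeter` are related: for single-label multiclass predictions, micro-averaged F1 equals accuracy, which is why the two values coincide in the training log. The following stdlib-only sketch illustrates this on a toy example; `f1_scores` is a hypothetical helper and the labels are made up.

```python
def f1_scores(y_true, y_pred):
    """Return (macro F1, micro F1, accuracy) for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    accuracy = sum(tp.values()) / len(y_true)
    return macro, micro, accuracy

macro, micro, accuracy = f1_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(f"macro={macro:.4f} micro={micro:.4f} accuracy={accuracy:.4f}")
```

Macro F1 averages the per-class scores with equal weight, so it differs from accuracy whenever some classes are predicted better than others.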

In [11]:
class ImageEncoder(nn.Module):
    def __init__(self):
        super(ImageEncoder,self).__init__()
        self.cfg = CFG.model
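        # `pretrained=True` is deprecated in recent torchvision; request the ImageNet weights explicitly
        self.encoder = models.vgg11(weights=models.VGG11_Weights.IMAGENET1K_V1, dropout=self.cfg.drop)
        # drop the last dropout and the 1000-class ImageNet layer; the encoder now outputs 4096-d features
        self.encoder.classifier[5] = nn.Identity()
        self.encoder.classifier[6] = nn.Identity()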
        self.cls_drop = nn.Dropout(self.cfg.cls_drop)
        self.fc = nn.Linear(self.cfg.hidden_size, self.cfg.num_labels)
        self._init_weights(self.fc)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    def forward(self, image, return_features=False):
        features = self.encoder(image)
        if return_features:
            return features
        logits = self.fc(self.cls_drop(features))
        return logits

In [12]:
class PLModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.cfg = CFG.model
        self.model = ImageEncoder()
        self.avg_meter = AverageMeter()
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, batch):
        output = self.model(batch['image'])
        return output

    def training_step(self, batch, i):
        logits = self(batch)
        loss = self.criterion(logits,batch['labels'])
        self.log('train_loss', loss.item())
        return loss

    def validation_step(self, batch, i):
        logits = self(batch)
        loss = self.criterion(logits,batch['labels'])
        self.log('val_loss',loss.item())
        preds = logits.argmax(dim=-1).tolist()
        self.avg_meter.update(batch['labels'].tolist(),preds)

    def predict_step(self, batch, i):
        logits = self(batch)
        return logits.softmax(dim=-1).tolist()

    def on_validation_epoch_end(self):
        metrics = self.avg_meter.calc_metrics()
        self.log_dict(metrics)
        print(metrics)
        self.avg_meter.clean()

    def configure_optimizers(self):
        optimizer_parameters = [
            {'params': self.model.encoder.parameters(),
             'lr': self.cfg.lr, 'weight_decay': self.cfg.weight_decay},
            {'params': self.model.fc.parameters(),
             'lr': self.cfg.lr_head, 'weight_decay': self.cfg.weight_decay_head}
        ]

        optim = self.cfg.optim(
            optimizer_parameters,
            lr=self.cfg.lr,
            betas=self.cfg.betas,
            weight_decay=self.cfg.weight_decay,
            eps=self.cfg.eps
        )

        if self.cfg.scheduler == 'cosine':
            scheduler = get_cosine_schedule_with_warmup(optim,
                                                        num_training_steps=self.cfg.num_training_steps,
                                                        num_warmup_steps=int(self.cfg.num_training_steps * self.cfg.warmup_ratio),
                                                        num_cycles=self.cfg.num_cycles)
        elif self.cfg.scheduler == 'linear':
            scheduler = get_linear_schedule_with_warmup(optim,
                                                        num_training_steps=self.cfg.num_training_steps,
                                                        num_warmup_steps=int(self.cfg.num_training_steps * self.cfg.warmup_ratio))
        else:
            return optim

        scheduler = {'scheduler': scheduler,'interval': 'step', 'frequency': 1}

        return [optim], [scheduler]
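
For intuition about the `'cosine'` branch, here is a stdlib-only sketch of the learning-rate multiplier. It assumes the schedule follows the usual linear-warmup-then-cosine-decay formula; `cosine_lr_multiplier` is a hypothetical re-implementation for illustration, and `total_steps = 1000` is a made-up value rather than the notebook's actual step count.

```python
import math

def cosine_lr_multiplier(step, num_training_steps, num_warmup_steps=0, num_cycles=0.5):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # linear warmup from 0 to 1
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # num_cycles=0.5 gives half a cosine wave: from 1 down to 0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

base_lr = 1e-4      # same value as CFG.model.lr
total_steps = 1000  # hypothetical number of training steps
for step in (0, 250, 500, 750, 1000):
    print(step, base_lr * cosine_lr_multiplier(step, total_steps))
```

With `warmup_ratio = 0.0` and `num_cycles = 0.5`, as in `CFG.model`, the learning rate starts at its base value and decays smoothly to zero by the last step.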

In [13]:
pl.seed_everything(56)

56

In [14]:
CFG.model.num_training_steps = len(train_dataloader) * CFG.model.max_epoches
model = PLModule().cuda()

Downloading: "https://download.pytorch.org/models/vgg11-8a719046.pth" to /root/.cache/torch/hub/checkpoints/vgg11-8a719046.pth
100%|██████████| 507M/507M [00:02<00:00, 231MB/s] 
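
The scheduler needs the total number of optimizer steps, which the cell above computes as `len(train_dataloader) * CFG.model.max_epoches`. The sketch below shows the arithmetic behind that product, assuming one optimizer step per batch. `total_training_steps` is a hypothetical helper and the sample count of 8000 is invented for illustration, since the training set size is not shown here.

```python
import math

def total_training_steps(num_samples, batch_size, max_epochs, drop_last=False):
    """Batches per epoch (as len(DataLoader) reports them) times the epoch count."""
    if drop_last:
        steps_per_epoch = num_samples // batch_size
    else:
        steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * max_epochs

# Hypothetical dataset size; batch size and epochs match CFG.
print(total_training_steps(num_samples=8000, batch_size=16, max_epochs=10))
```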


In [15]:
lr_monitor = pl.callbacks.LearningRateMonitor(logging_interval='step')
checkpoint_cb = pl.callbacks.ModelCheckpoint(
    dirpath=f'./outputs_{0}/',
    filename='model_{epoch:02d}-{accuracy:.4f}',
    monitor='accuracy',
    mode='max',
    save_last=True
)

trainer = pl.Trainer(
    accelerator="gpu",
    precision=32,
    callbacks = [lr_monitor],#[lr_monitor,checkpoint_cb],
    logger = pl.loggers.CSVLogger("logs", name="vgg11"),#pl.loggers.WandbLogger(save_code=True),
    enable_checkpointing=False,
    log_every_n_steps=1,
    min_epochs=1,
    devices=1,
    check_val_every_n_epoch=1,
    max_epochs=CFG.model.max_epoches
)

In [16]:
trainer.fit(model,train_dataloader, val_dataloader)

Sanity check:

{'f1_macro': 0.006060606060606061, 'f1_micro': 0.015625, 'accuracy': 0.015625}

Validation metrics after each epoch:

{'f1_macro': 0.7994265262187152, 'f1_micro': 0.8057142857142857, 'accuracy': 0.8057142857142857}

{'f1_macro': 0.8433907623251194, 'f1_micro': 0.8485714285714285, 'accuracy': 0.8485714285714285}

{'f1_macro': 0.8753615542184471, 'f1_micro': 0.8757142857142857, 'accuracy': 0.8757142857142857}

{'f1_macro': 0.8651451451159159, 'f1_micro': 0.87, 'accuracy': 0.87}

{'f1_macro': 0.8903722835781561, 'f1_micro': 0.8914285714285715, 'accuracy': 0.8914285714285715}

{'f1_macro': 0.916561348318218, 'f1_micro': 0.9185714285714286, 'accuracy': 0.9185714285714286}

{'f1_macro': 0.9305557853125301, 'f1_micro': 0.9314285714285714, 'accuracy': 0.9314285714285714}

{'f1_macro': 0.943822664632041, 'f1_micro': 0.9442857142857143, 'accuracy': 0.9442857142857143}

{'f1_macro': 0.9465837754559558, 'f1_micro': 0.9471428571428572, 'accuracy': 0.9471428571428572}

{'f1_macro': 0.9462916106044387, 'f1_micro': 0.9471428571428572, 'accuracy': 0.9471428571428572}


In [19]:
test_preds = np.concatenate(trainer.predict(model,test_dataloader)).argmax(axis=-1)

In [20]:
accuracy_score(test_df['labels'],test_preds)

0.9542857142857143

**Accuracy on the test data**: 0.9542857142857143