# Homework 2. Classification of images.

# Task 1: Image Classification (1 point).

In this task, you will need to train an image classifier. We will be working with a dataset, the name of which will not be revealed. You can check the images available in the dataset by yourself. The dataset has 200 classes and around 5,000 images per class. The classes are numbered from 0 to 199. You can download the dataset [here](https://yadi.sk/d/BNR41Vu3y0c7qA).

The dataset structure is simple — it has `train/` and `val/` directories, which contain the training and validation data. Inside `train/` and `val/`, there are subdirectories corresponding to image classes, and the images themselves are inside these subdirectories.

## Task.

You need to complete one of the two tasks:

1) Achieve validation accuracy of **at least 0.44**. In this task, **using pretrained models and image resizing is forbidden**.

2) Achieve validation accuracy of **at least 0.84**. In this task, resizing and using pretrained models is allowed.

Write a brief report on your experiments. What worked and what didn’t? Why did you decide to do it that way? Be sure to provide links to external code if you are using it. Always reference articles / blog posts / StackOverflow questions / YouTube videos from machine learning creators / courses / tips from Uncle Vasya, and any other additional materials you used.

Your code must pass all `assert` checks below.

You must write the functions `train_one_epoch`, `train`, and `predict` according to the templates below (they largely repeat examples from the seminars). Pay special attention to the `predict` function: it should return a list of losses for all objects in the dataloader, a list of predicted classes for each object from the dataloader, and a list of true classes for each object in the dataloader (in exactly this order).

**Using external data for training is strictly prohibited in both tasks. Also, training on the validation set is forbidden**.

### Evaluation criteria:
The evaluation is calculated by a simple formula: `min(10, 10 * Your accuracy / 0.44)` for the first task and `min(10, 10 * (Your accuracy - 0.5) / 0.34)` for the second. The result is rounded to one decimal place. If you complete both tasks, the maximum of the two scores will be taken.

### Bonus:
You will receive 5 bonus points if you complete both tasks with a score of 10 (a total of 15 points). Otherwise, the maximum of the two scores will be taken, and your bonus will be zero.

### Tips and Recommendations:
- You will likely need to Google a lot about classification and how to make it work. This is normal; everyone Googles. But remember, you must be ready to explain the code you used :)
- Use augmentations. You can use the `torchvision.transforms` module or the [albumentations](https://github.com/albumentations-team/albumentations) library.
- You can either train from scratch or fine-tune (depending on the task) models from `torchvision`.
- We recommend writing a custom dataset class (or using the `ImageFolder` class), which returns images and their corresponding labels, and then creating functions for training based on the templates below. However, we do not require this. If this style is inconvenient, you can write the code in your preferred style. Keep in mind that excessive changes to the templates below will increase the number of questions about your code and increase the likelihood of being called for a defense :)
- Validate. Track errors as early as possible to avoid wasting time.
- To quickly debug the code, try training on a small portion of the dataset (say, 5-10 images, just to check if the code runs). Once you’ve debugged everything, proceed with training on the full dataset.
- Make exactly one change to the model/augmentation/optimizer for each run to understand what influences the result.
- Fix the random seed.
- Start with simple models and gradually move to more complex ones. Training light models saves a lot of time.
- Set a learning rate schedule. Reduce it when the validation loss stops decreasing.
- We recommend using a GPU. If you don’t have one, use Google Colab. If you are uncomfortable using it constantly, write and debug all the code locally on the CPU, and then run the notebook in Colab. The author's solution reaches the required accuracy in Colab in 15 minutes of training.

Good luck & have fun! :)

In [59]:
# !pip install torch
# !pip install torch torchvision torchaudio

In [77]:
# !pip install wget

In [1]:
import numpy as np
import torch
import torchvision
import tqdm
import random
from torch import nn
from torch.nn import functional as F
from sklearn.metrics import accuracy_score
from torchvision.datasets import ImageFolder
from torchvision import transforms
import matplotlib.pyplot as plt
%matplotlib inline
# You may add any imports you need

In [3]:
import warnings
warnings.filterwarnings("ignore")

In [5]:
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

In [7]:
torch.cuda.is_available()

False

In [9]:
def set_random_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True) 
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    np.random.default_rng(seed)
    np.random.seed(seed)
    random.seed(seed)

In [11]:
def plot_history_of_train(train_history, title="Train Loss"):

    plt.figure(figsize=(9, 6))
    plt.title('{}'.format(title))
    plt.plot(train_history, label="train", zorder=1)

    plt.xlabel("train steps")
    
    plt.legend(loc="best")
    plt.grid()

    plt.show()

In [13]:
def plot_history_of_test(val_history, title="Val Loss"):

    plt.figure(figsize=(9, 6))
    plt.title('{}'.format(title))
    plt.plot(val_history, label="test", zorder=1)

    plt.xlabel("test steps")
    
    plt.legend(loc="best")
    plt.grid()

    plt.show()

In [95]:
import requests

url = "https://www.dropbox.com/s/33l8lp62rmvtx40/dataset.zip?dl=1"
response = requests.get(url)

# Saving the dataset to a file
with open("dataset.zip", "wb") as f:
    f.write(response.content)

# Unzipping the dataset
import zipfile
with zipfile.ZipFile("dataset.zip", "r") as zip_ref:
    zip_ref.extractall()


In [97]:
import os

# List files in the current directory
extracted_files = os.listdir()
print(extracted_files)

['Image classification.ipynb', 'dataset', '.ipynb_checkpoints', 'dataset.zip', 'Introduction to Pytorch. Fully connected neural networks.ipynb']


In [99]:
dataset_folder = 'dataset'
dataset_contents = os.listdir(dataset_folder)

print(dataset_contents)

['dataset']


In [101]:
# List the contents of the 'dataset' folder
dataset_path = os.path.join(dataset_folder, 'dataset')
dataset_contents = os.listdir(dataset_path)

print(dataset_contents)

['train', 'val']


In [27]:
# List contents of 'train' and 'val' folders
train_folder = os.path.join(dataset_path, 'train')
val_folder = os.path.join(dataset_path, 'val')

train_contents = os.listdir(train_folder)
val_contents = os.listdir(val_folder)

# print("Train folder contents:", train_contents)
# print("Validation folder contents:", val_contents)

In [91]:
# !wget 'https://www.dropbox.com/s/33l8lp62rmvtx40/dataset.zip?dl=1' -O dataset.zip && unzip -q dataset.zip

zsh:1: command not found: wget


In [15]:
train_dataset = ImageFolder(root="dataset/dataset/train", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor()]))
val_dataset = ImageFolder(root="dataset/dataset/val", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor()]))

### Data Processing

In [17]:
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=False)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)

In [19]:
# calculate the average and standard deviation of the dataset data for its subsequent normalization (we do it by matches so as not to get an error filling RAM)
# taken from the following YouTube video: https://youtu.be/z3kB3ISIPAg

def get_mean_and_std(dataloader):
  mean = 0.
  std = 0.
  all_images_count = 0
  for images, _ in dataloader:
    images_count_in_a_batch = images.size(0)
    images = images.view(images_count_in_a_batch, images.size(1), -1)
    mean += images.mean(2).sum(0)
    std += images.std(2).sum(0)
    all_images_count += images_count_in_a_batch
  
  mean /= all_images_count
  std /= all_images_count

  return mean, std

In [21]:
get_mean_and_std(train_dataloader)

(tensor([0.4802, 0.4481, 0.3975]), tensor([0.2302, 0.2265, 0.2262]))

In [23]:
get_mean_and_std(val_dataloader)

(tensor([0.4824, 0.4495, 0.3981]), tensor([0.2301, 0.2264, 0.2261]))

In [33]:
import PIL
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(hue=.06, saturation=.04),
    transforms.RandomEqualize(p=0.5),
    transforms.RandomHorizontalFlip(),
    transforms.RandomGrayscale(p=0.15),
    transforms.RandomRotation(20),  # Correct way to use resample
    transforms.RandomAutocontrast(p=0.5),
    transforms.RandomInvert(p=0.3),
    transforms.ToTensor(),
    transforms.Normalize((0.4802, 0.4481, 0.3975), (0.2302, 0.2265, 0.2262))
])

val_transform = transforms.Compose([
    transforms.ColorJitter(hue=.06, saturation=.04),
    transforms.RandomEqualize(p=0.5),
    transforms.RandomHorizontalFlip(),
    transforms.RandomGrayscale(p=0.15),
    transforms.RandomRotation(20),  # Correct way to use resample
    transforms.RandomAutocontrast(p=0.5),
    transforms.RandomInvert(p=0.3),
    transforms.ToTensor(),
    transforms.Normalize((0.4824, 0.4495, 0.3981), (0.2301, 0.2264, 0.2261))
])


In [35]:
train_dataset = ImageFolder(root="dataset/dataset/train", transform=train_transform)
val_dataset = ImageFolder(root="dataset/dataset/val", transform=val_transform)

In [41]:
set_random_seed(12)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, drop_last=True, shuffle=True)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=64, drop_last=True, shuffle=False)

In [43]:
# Just very simple sanity checks
assert isinstance(train_dataset[0], tuple)
assert len(train_dataset[0]) == 2
assert isinstance(train_dataset[1][1], int)
print("tests passed")

tests passed


In [45]:
train_dataset[0][0].shape

torch.Size([3, 64, 64])

### Auxiliary functions, model implementation

In [47]:
def train(model, train_dataloader, criterion, optimizer, n_epochs=10, device=None, scheduler=None):

    train_loss_log, train_acc_log = [], []

    for epoch in range(n_epochs):
        print('Epoch number %d' % (epoch + 1))
        train_epoch_loss, train_epoch_true_hits = torch.empty(0), 0.0
        total_amount = 0.0
        model.train()
        for batch in tqdm.tqdm(train_dataloader):
            images, labels = batch
            images = images.to(device)
            labels = labels.to(device)
            total_amount += labels.size(0)
            optimizer.zero_grad()
            y_pred = model(images)
            loss = criterion(y_pred, labels)
            loss.backward()
            optimizer.step()
            
            # log loss for the current epoch and the whole training history
            train_epoch_loss = torch.cat((train_epoch_loss, loss.unsqueeze(0).cpu()))
            train_loss_log.append(loss.item())
            
            # log accuracy for the current epoch and the whole training history
            train_epoch_true_hits += (y_pred.argmax(1) == labels).sum().item()
            train_acc_log.append((y_pred.argmax(1) == labels).sum().item())
            
        mean_loss = torch.mean(train_epoch_loss)
        scheduler.step(mean_loss)
        
        plot_history_of_train(train_loss_log)

        print("Train_epoch_loss:", torch.mean(train_epoch_loss))
        print("Train_epoch_accuracy:", train_epoch_true_hits / total_amount)
    
    
def predict(model, val_dataloder, criterion, n_epochs, device=None):

    val_loss_log, val_acc_log = [], []
  
    model.eval()
    for epoch in range(n_epochs):
        print('Epoch number %d' % (epoch + 1))
        val_epoch_losses, val_epoch_true_hits = torch.empty(0), torch.empty(0)
        predicted_classes = torch.empty(0)
        true_classes = torch.empty(0)                          
        total_amount = 0.0
        with torch.no_grad():                                        
            for batch in tqdm.tqdm(val_dataloader): 
                images, labels = batch 
                images = images.to(device)
                labels = labels.to(device) # берем батч из вал лоадера
                total_amount += labels.size(0)
                true_classes = torch.cat((true_classes, labels.unsqueeze(0).cpu())) 
                y_pred = model(images)                        
                loss = criterion(y_pred, labels)              
                val_epoch_losses = torch.cat((val_epoch_losses, loss.unsqueeze(0).cpu()))
                pred_classes = torch.argmax(y_pred, dim=-1)
                predicted_classes = torch.cat((predicted_classes,  pred_classes.unsqueeze(0).cpu())) 
                val_epoch_true_hits = torch.cat((val_epoch_true_hits, (pred_classes == labels).sum().unsqueeze(0).cpu()))
                val_loss_log.append(val_epoch_losses.mean())
                val_acc_log.append((pred_classes == labels).sum().item())

            print("Val loss:", val_epoch_losses.mean().item())
            print("Val accuracy:", (val_epoch_true_hits.sum() / total_amount).item())
    
            plot_history_of_test(val_loss_log)

            val_epoch_losses = torch.reshape(val_epoch_losses, (-1, 1))
            predicted_classes = torch.reshape(predicted_classes, (-1, 1))
            true_classes = torch.reshape(true_classes, (-1, 1))
    return val_epoch_losses, predicted_classes, true_classes

### Model training, experiment launches

In [49]:
from torchvision.models import resnet18

model_unpretrained = resnet18(pretrained=False)
model_unpretrained.fc = nn.Linear(512, 200)
model_unpretrained.to(device)
optimizer = torch.optim.Adam(model_unpretrained.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3, factor=0.1)
n_epochs = 3

In [51]:
model_unpretrained = torchvision.models.densenet161(pretrained=False)
model_unpretrained.classifier = nn.Linear(2208, 200)
model_unpretrained.to(device)
optimizer = torch.optim.Adam(model_unpretrained.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3, factor=0.1)
n_epochs = 3

In [53]:
model_pretrained = resnet18(pretrained=True)
model_pretrained.to(device)
for param in model_pretrained.parameters():
    param.requires_grad = False
model_pretrained.fc = nn.Linear(512, 200)
optimizer = torch.optim.Adam([p for p in model_pretrained.parameters() if p.requires_grad], lr=0.001)
criterion = nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
n_epochs = 3

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/nikitadvornov/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 53.0MB/s]


In [57]:
train(model_unpretrained, train_dataloader, criterion, optimizer, 8, device, scheduler)

In [32]:
predict(model_unpretrained, val_dataloader, criterion, 10, device)

A simple test to check the correctness of the written code

In [61]:
all_losses, predicted_labels, true_labels = predict(model_unpretrained, val_dataloader, criterion, 2, device)
assert len(predicted_labels) == len(val_dataset) - 16       # 16 - the size of the last (dropped) batch (len(val_dataset) - len(predicted_labels) = 16)
accuracy = accuracy_score(predicted_labels, true_labels)
print("tests passed")

### Checking the received accuracy

After all the experiments that you have done, choose the best of your models, implement and run the `evaluate` function. This function should take a model and a dataloader with validation data as input and return the accuracy calculated on this dataset.

In [65]:
all_losses, predicted_labels, true_labels = predict(model_unpretrained, val_dataloader, criterion, 1, device)
assert len(predicted_labels) == len(val_dataset) - 16
accuracy = accuracy_score(true_labels, predicted_labels)
print("Оценка за это задание составит {} баллов".format(min(10, 10 * accuracy / 0.44)))

### Experiments Report

I've tried different models from torch vision.models. The most successful model turned out to be the densenet16, which is slightly better than the usual reznet. I also tried different step lengths and schedulers, such as, for example, lr_scheduler.Exponential LR and lr_scheduler.ReduceLROnPlateau, settling on the latter as the most well-behaved. The best model in the end is as follows: touch vision.models.densenet161(pretrained=False) with the parameters written in the corresponding cell above (accuracy = 0.417 with 8 training epochs). The code for the second task was also written, and I didn't use anything more than the usual resnet, so the quality there is clearly no better than in the first task (in relative terms, of course).