# Objective
In this chapter, we’ll illustrate the basics of multi-label classification with an example. But first, we want to distinguish between two kinds of classification, multi-class classification and multi-label classification.
## Multi-class Classification 
The model classifies the input image into exactly one of several classes (hence, multi-class); i.e., the input image belongs to one and only one class. For example, suppose the image has a dominant airplane in the foreground and perhaps smaller objects (say, trucks) in the background. This image will be classified into the single category of airplane.



![desert](assets/desert+mountains-label-desert.png)

For computing the loss for multi-class classification, it's convenient to use the torch.nn.CrossEntropyLoss class which combines nn.LogSoftmax() and nn.NLLLoss() into one single class.
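
As a minimal sketch of that equivalence, the loss from `nn.CrossEntropyLoss` can be compared with `nn.LogSoftmax` followed by `nn.NLLLoss`. The logits and targets below are made-up values, not taken from the dataset.

```python
import torch
from torch import nn

# Made-up logits for a batch of 3 images and 5 classes (illustrative values only).
logits = torch.tensor([[2.0, 0.5, -1.0, 0.1, 0.3],
                       [0.2, 1.5, 0.3, -0.7, 0.0],
                       [-0.5, 0.0, 0.4, 3.0, 1.0]])
# Multi-class targets: exactly one class index per image.
targets = torch.tensor([0, 1, 3])

combined = nn.CrossEntropyLoss()(logits, targets)
two_step = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(combined.item(), two_step.item())
same_loss = torch.allclose(combined, two_step)
```

Note that the target here is a single class index per image, which is what makes this setup multi-class rather than multi-label.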

## Multi-label Classification 
“Multi-label” classification means that each image can belong to any
number of the specified classes, including zero (background). So multi-label
classification can be understood as a series of binary classifications per class.
Is the image in class A – yes or no? 
Is the same image in class B – yes or no?
And so on.
This is how we end up with multiple labels/classes for a single image.

![desert+mountains](assets/desert+mountains-label-desert+mountains.png)

For computing the loss for multi-label classification, it's convenient to use the torch.nn.BCEWithLogitsLoss class, which combines a Sigmoid layer and the BCELoss (Binary Cross Entropy Loss) in one single class. By combining the operations into one layer, the implementation can use the log-sum-exp trick, which the PyTorch documentation describes as more numerically stable than a plain Sigmoid followed by a BCELoss.
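
To illustrate why fusing the sigmoid and the cross entropy helps, here is a small pure-Python sketch for a single logit. The function names are hypothetical, and the stable form shown is one standard algebraic rearrangement, not necessarily PyTorch's exact implementation.

```python
import math

def naive_bce(logit, target):
    """Sigmoid followed by binary cross entropy, computed in two separate steps."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stable_bce_with_logits(logit, target):
    """Algebraically equivalent form that never takes the log of 0."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits the two forms agree.
for logit, target in [(0.0, 1.0), (2.0, 0.0), (-3.0, 1.0)]:
    assert math.isclose(naive_bce(logit, target), stable_bce_with_logits(logit, target))

# For a large logit the two-step form breaks down: sigmoid(100) rounds to exactly 1.0
# in floating point, so log(1 - p) becomes log(0).
try:
    naive_bce(100.0, 0.0)
    naive_failed = False
except ValueError:
    naive_failed = True

stable_value = stable_bce_with_logits(100.0, 0.0)
print(naive_failed, stable_value)
```

The two-step version fails on a confident wrong prediction, while the fused version returns a finite loss.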

We note that, in both types of classification, the network layers are the same; only the loss function applied to the network's output changes.

# Dataset Download

The dataset is available at http://www.lamda.nju.edu.cn/data_MIMLimage.ashx

As you would expect, the dataset has two parts, the images of the scenes and the corresponding labels:

(1) The "original" part has the 2000 images. An image can belong to one or more classes. The classes are ['desert', 'mountains', 'sea', 'sunset', 'trees'].

(2) The "processed" part has the labels.

You should have a folder called `original_images` containing the 2000 images and a file called `miml data.mat` containing the labels.

Note, do the steps below __only once__.


In [4]:
!wget http://www.lamda.nju.edu.cn/files/miml-image-data.rar
!unrar e miml-image-data.rar # gives two rar files
!mkdir -p original_images
!unrar e original.rar original_images
!unrar e processed.rar  # produces miml data.mat

Extracting  original_images/1758.jpg                                    92  OK 
Extracting  original_images/1759.jpg                                    92  OK 
Extracting  original_images/1760.jpg                                    92  OK 
Extracting  original_images/1761.jpg                                    92  OK 
Extracting  original_images/1762.jpg                                    92  OK 
Extracting  original_images/1763.jpg                                    92  OK 
Extracting  original_images/1764.jpg                                    92  OK 
Extracting  original_images/1765.jpg                                    93  OK 
Extracting  original_images/1766.jpg                                    93  OK 
Extracting  original_images/1767.jpg                                    93  OK 
Extracting  original_images/1768.jpg                                    93  OK 
Extracting  original_images/1769.jpg                                    93  OK 
Extracting  original_images/1770.jpg 
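
Once extraction has finished, the expected layout (an `original_images` folder plus a `miml data.mat` file) can be checked with a short script. This is only a sketch: `check_dataset_layout` is a hypothetical helper, and it assumes the files were extracted into the current directory.

```python
from pathlib import Path

def check_dataset_layout(dataset_root="."):
    """Return (number of .jpg files in original_images, whether 'miml data.mat' exists)."""
    root = Path(dataset_root)
    images_dir = root / "original_images"
    nb_images = len(list(images_dir.glob("*.jpg"))) if images_dir.is_dir() else 0
    has_labels = (root / "miml data.mat").is_file()
    return nb_images, has_labels

nb_images, has_labels = check_dataset_layout(".")
print(f"images found: {nb_images}, label file present: {has_labels}")
```

If the download steps succeeded, the image count should be 2000 and the label file should be present.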

# Imports
From the PyTorch framework, we import the necessary classes: neural net (nn) models to train the classifier on, optimizers to update the model parameters, and image transforms to resize and normalize the images. The metric functions come from scikit-learn.

In [1]:
import os
import copy
from pathlib import Path
import numpy as np
import pandas as pd
import torch
import torchvision
from torch import nn
from torchvision import models, transforms
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader, random_split
from PIL import Image
from scipy.io import loadmat  # Load MATLAB file.
from sklearn.metrics import f1_score, roc_auc_score

We print the PyTorch and torchvision versions to check that they match the ones used here.

In [2]:
print(f'PyTorch version: {torch.__version__}')
print(f'torchvision version: {torchvision.__version__}')

PyTorch version: 1.7.1
torchvision version: 0.8.2


# Dataset Processing
Our `SceneDataset` class inherits from `Dataset` and overrides the following methods (this is the typical way to subclass `Dataset`):

•	`__len__` to support returning the size of the Dataset instance.

•	`__getitem__` to support indexing such that the ith sample of an instance of SceneDataset can be retrieved.
 
To the class, we add a `get_labels` method to get the labels associated with the image at a given index.


In [3]:
class SceneDataset(Dataset):
    """ Subclass Dataset and override __getitem__ for data-specific indexing
     and __len__ to get the data-specific length.
     We also add a new method, get_labels(index), to avoid going through an expensive __getitem__"""
     
    def __init__(self, df, transforms=None):
        super().__init__()
        self.df = df
        self.transforms = transforms

    def get_labels(self, idx):
        record = self.df.iloc[idx]
        # every column after 'filename' is a 0/1 class label
        return record.iloc[1:].tolist()

    def __getitem__(self, idx):
        record = self.df.iloc[idx]
        image = Image.open(record['filename']).convert("RGB")
        label = torch.tensor(record.iloc[1:].tolist(), dtype=torch.float32)

        if self.transforms is not None:
            image = self.transforms(image)
        return image, label

    def __len__(self):
        return len(self.df)
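
To see how a `DataLoader` uses `__len__` and `__getitem__`, the sketch below swaps the image files for random tensors. `ToyMultiLabelDataset` is a hypothetical stand-in for `SceneDataset`, and the tensor sizes are arbitrary.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyMultiLabelDataset(Dataset):
    """Hypothetical stand-in for SceneDataset that serves random tensors instead of image files."""

    def __init__(self, nb_samples=10, nb_classes=5):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(nb_samples, 3, 8, 8, generator=generator)
        self.labels = torch.randint(0, 2, (nb_samples, nb_classes), generator=generator).float()

    def get_labels(self, idx):
        # Cheap access to the labels without touching the image data.
        return self.labels[idx].tolist()

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

    def __len__(self):
        return len(self.images)

toy_dataset = ToyMultiLabelDataset()
toy_loader = DataLoader(toy_dataset, batch_size=4)
batch_shapes = [(tuple(images.shape), tuple(labels.shape)) for images, labels in toy_loader]
print(len(toy_dataset), batch_shapes)
```

With 10 samples and a batch size of 4, the loader yields two full batches and a final batch of 2, each pairing an image tensor with a multi-hot label tensor.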



# Model
We take a stock ResNet50 model with all the pretrained weights and apply a few changes to the model. First, we create a new head (see method `_create_head`) that consists of three Linear layers.
The new head layers can be summarized as:
```
(fc): Sequential(
      (0): Linear(in_features=2048, out_features=1024, bias=True)
      (1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=1024, out_features=512, bias=True)
      (5): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): Dropout(p=0.3, inplace=False)
      (8): Linear(in_features=512, out_features=5, bias=True)
    )
```
We then replace the last fully connected layer with the new head. A conceptual diagram of the final network (`ExtendedResNetModel`) is shown below.

![resnet with head](assets/new-rn50-head.png)

In [4]:
class ExtendedResNetModel(nn.Module):
    """ Extend ResNet with three new fully connected layers and attach them as a head to a ResNet50 trunk"""

    def __init__(self, nb_classes, dropout_prob=0.3, activation_func=nn.ReLU):
        super().__init__()
        # load the pretrained model as features
        self.rn50_features = models.resnet50(pretrained=True)
        # get the number of in_features of the last Linear layer
        nb_in_features_last = self.rn50_features.fc.in_features
        # freeze the pretrained trunk; only the new head will be trained
        for param in self.rn50_features.parameters():
            param.requires_grad_(False)

        head = self._create_head(nb_in_features_last, nb_classes,
                                 dropout_prob, activation_func)
        self.rn50_features.fc = head  # attach head
        # print(self.rn50_features)

    def _create_head(self, nb_features, nb_classes, dropout_prob=0.3, activation_func=nn.ReLU):
        features_lst = [nb_features, nb_features//2, nb_features//4]
        layers = []
        for in_f, out_f in zip(features_lst[:-1], features_lst[1:]):
            layers.append(nn.Linear(in_f, out_f))
            layers.append(nn.BatchNorm1d(out_f))
            layers.append(activation_func())
            if dropout_prob != 0:
                layers.append(nn.Dropout(dropout_prob))
        layers.append(nn.Linear(features_lst[-1], nb_classes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.rn50_features(x)
        return x
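
The head can be sanity-checked in isolation, without downloading the pretrained ResNet50 weights. The sketch below copies the `_create_head` logic into a standalone function; the names `create_head` and `fake_features` are hypothetical, and the random input only stands in for the 2048-dimensional trunk features.

```python
import torch
from torch import nn

def create_head(nb_features, nb_classes, dropout_prob=0.3, activation_func=nn.ReLU):
    """Standalone copy of the _create_head logic, for inspection only."""
    features_lst = [nb_features, nb_features // 2, nb_features // 4]
    layers = []
    for in_f, out_f in zip(features_lst[:-1], features_lst[1:]):
        layers.append(nn.Linear(in_f, out_f))
        layers.append(nn.BatchNorm1d(out_f))
        layers.append(activation_func())
        if dropout_prob != 0:
            layers.append(nn.Dropout(dropout_prob))
    layers.append(nn.Linear(features_lst[-1], nb_classes))
    return nn.Sequential(*layers)

head = create_head(nb_features=2048, nb_classes=5)
head.eval()  # use running statistics in BatchNorm and disable Dropout

# Random stand-in for the 2048-dimensional features produced by the ResNet50 trunk.
fake_features = torch.randn(4, 2048)
with torch.no_grad():
    logits = head(fake_features)

nb_trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
print(tuple(logits.shape), nb_trainable)
```

The output has one logit per class for each input row. Because the trunk is frozen, the head's parameters are the only ones the optimizer updates.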


In [5]:
def train(model, data_loader, criterion, optimizer, scheduler, nb_epochs=5):

    for epoch in range(nb_epochs):
        result = []
        for phase in ['train', 'val']:
            if phase == "train":  
                model.train()  # put model in training mode
            else:  
                model.eval()  # put model in evaluation mode

            # Track for each epoch
            running_loss = 0.0
            running_f1_score = 0.0
            running_roc_auc_score = 0.0

            for data, targets in data_loader[phase]:
                data, targets = data.to(device), targets.to(device)
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(data)  # forward pass
                    loss = criterion(outputs, targets)
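                    # NOTE: outputs are raw logits, so this threshold corresponds to a
                    # sigmoid probability of about 0.62, not 0.5
                    preds = outputs.data > 0.5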

                    if phase == "train":
                        loss.backward()  # compute gradient of the loss with respect to model parameters
                        optimizer.step()  # update the model parameters
                        scheduler.step()
                        optimizer.zero_grad()

                running_loss += loss.item() * len(data)
                running_f1_score += f1_score(targets.to("cpu").to(torch.int).numpy(),
                                             preds.to("cpu").to(torch.int).numpy(),
                                             average="samples") * len(data)
                running_roc_auc_score += roc_auc_score(targets.to("cpu").to(torch.int).numpy(),
                                                       preds.to("cpu").to(torch.int).numpy(),
                                                       average="samples") * len(data)

            epoch_loss = running_loss / len(data_loader[phase].dataset)
            epoch_f1_score = running_f1_score / len(data_loader[phase].dataset)
            epoch_roc_auc_score = running_roc_auc_score / \
                len(data_loader[phase].dataset)

            result.append(f'Epoch:{epoch} {phase.upper()}: Loss:{epoch_loss:.4f} '
                          f'F1-Score: {epoch_f1_score:.4f} AUC: {epoch_roc_auc_score:.4f}')
        print(result)
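
The line `preds = outputs.data > 0.5` in `train` compares raw logits with 0.5. The pure-Python sketch below, using made-up logits, shows how that differs from thresholding the sigmoid probabilities at 0.5 (which is the same as thresholding the logits at 0).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up logits for one image and five classes (illustrative values only).
logits = [-2.0, 0.2, 0.5, 0.7, 3.0]

preds_logit_threshold = [int(x > 0.5) for x in logits]          # as in `outputs.data > 0.5`
preds_prob_threshold = [int(sigmoid(x) > 0.5) for x in logits]  # threshold on probabilities

print(preds_logit_threshold)
print(preds_prob_threshold)
print(round(sigmoid(0.5), 4))  # probability that corresponds to a logit of 0.5
```

Here the logits 0.2 and 0.5 count as negative under the first rule and as positive under the second, so the choice of threshold affects the reported F1 and AUC values.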

In [6]:
# To download the dataset, see accompanying file, dataset_download_steps.txt.
dataset_root = '.'

# Read the "processed" part for class names and class labels
processed_mat = loadmat(os.path.join(dataset_root, 'miml data.mat'))
class_labels = []
for c in processed_mat['class_name']:  # get the name of each class
    class_labels.append(c[0][0])
nb_classes = len(class_labels)
print('class labels:', class_labels)  # ['desert', 'mountains', 'sea', 'sunset', 'trees']

# Read the labels. If the label vector for the i-th image equals [1, -1, -1, 1, -1], it means:
# the i-th image belongs to the 1st & 4th classes but does not belong to the 2nd, 3rd & 5th classes
labels = copy.deepcopy(processed_mat['targets'].T)
labels[labels == -1] = 0  # map -1 to 0 so that labels are in {0, 1}

class labels: ['desert', 'mountains', 'sea', 'sunset', 'trees']
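
The label remapping can be tried on a toy array. The values below are made up, and the layout (one row per class, one column per image) is an assumption inferred from the transpose in the cell above.

```python
import numpy as np

# Hypothetical targets in the assumed layout: one row per class, one column per image,
# with +1 for "belongs" and -1 for "does not belong".
toy_targets = np.array([[ 1,  1, -1],
                        [-1,  1, -1],
                        [-1, -1,  1],
                        [ 1, -1, -1],
                        [-1, -1,  1]])

toy_labels = toy_targets.T.copy()   # one row per image, one column per class
toy_labels[toy_labels == -1] = 0    # map -1 to 0 so each row is a multi-hot vector

print(toy_labels)
```

Each row of `toy_labels` is now a 0/1 vector that can be used directly as a target for `BCEWithLogitsLoss`. Copying before the in-place assignment leaves the original array untouched.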


In [7]:
# Setup a pandas dataframe with file location and associated (multi) labels as below
#                   filename desert mountains sea sunset trees
# 0  ./original_images/1.jpg      1         0   0      0     0
# 1  ./original_images/2.jpg      1         0   0      0     0
# 2  ./original_images/3.jpg      1         0   0      0     0
# 3  ./original_images/4.jpg      1         1   0      0     0
# 4  ./original_images/5.jpg      1         0   0      0     0

# create empty dataframe with columns, [filename desert mountains sea sunset trees]
data_df = pd.DataFrame(columns=['filename'] + class_labels)
filenames = os.listdir(os.path.join(dataset_root, "original_images/"))
data_df['filename'] = np.array(
    sorted(list(map(lambda x: int(Path(x).stem), np.array(filenames)))))
data_df['filename'] = data_df['filename'].apply(
    lambda x: os.path.join(dataset_root, 'original_images/') + str(x) + '.jpg')
data_df[class_labels] = np.array(labels)

transforms_list = transforms.Compose([transforms.Resize((224, 224)),
                                      transforms.ToTensor(),
                                      transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                                      std=[0.229, 0.224, 0.225])])
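
The filename handling in this cell sorts the image names by their integer stem rather than as strings. Presumably this is so that row i of the dataframe lines up with row i of the label matrix; that the labels are stored in image-number order is an assumption here. A small sketch with a made-up directory listing shows the difference:

```python
from pathlib import Path

# Hypothetical directory listing; os.listdir returns names in arbitrary order.
filenames = ["10.jpg", "2.jpg", "1.jpg", "100.jpg", "9.jpg"]

lexicographic = sorted(filenames)
numeric = sorted(filenames, key=lambda name: int(Path(name).stem))

print(lexicographic)
print(numeric)
```

A plain string sort places `10.jpg` and `100.jpg` before `2.jpg`, which would pair images with the wrong label rows.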

In [8]:
split_ratio = 0.3
dataset = SceneDataset(data_df, transforms_list)
split_point = int(len(dataset) * split_ratio)
trainset, valset = random_split(
    dataset, [len(dataset) - split_point, split_point], generator=torch.Generator().manual_seed(0))
print(f"Train set size: {len(trainset)}; Val set size: {len(valset)}")

batch_size = 128
dataloader = {"train": DataLoader(trainset, shuffle=True, batch_size=batch_size),
              "val": DataLoader(valset, shuffle=True, batch_size=batch_size)}

positive_weights = []
for cls in range(nb_classes):
    positive_samples = float(sum([dataset.get_labels(idx)[cls] == 1 for idx in trainset.indices]))
    negative_samples = float(sum([dataset.get_labels(idx)[cls] == 0 for idx in trainset.indices]))
    pos_weight = negative_samples / positive_samples
    positive_weights.append(pos_weight)
positive_weights = torch.FloatTensor(positive_weights).to('cuda' if torch.cuda.is_available() else 'cpu')
print('Ratio of Negative samples to positive samples per class:', positive_weights)

Train set size: 1400; Val set size: 600
Ratio of Negative samples to positive samples per class: tensor([4.0909, 3.5161, 2.3981, 3.2424, 2.5088], device='cuda:0')
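
To make the role of these ratios concrete, the sketch below compares `nn.BCEWithLogitsLoss(pos_weight=...)` with a manual computation in which the positive term of each class is scaled by its weight. The logits and targets are made-up values, and the manual formula follows the definition in the PyTorch documentation.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Made-up logits and multi-hot targets for 3 images and 2 classes (illustrative values only).
logits = torch.tensor([[1.2, -0.4], [-0.8, 0.3], [0.1, 2.0]])
targets = torch.tensor([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

# pos_weight per class = (number of negatives) / (number of positives).
nb_positive = targets.sum(dim=0)
nb_negative = (1.0 - targets).sum(dim=0)
pos_weight = nb_negative / nb_positive

builtin_loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)

# Manual version: the positive term of each class is scaled by its pos_weight.
# logsigmoid(x) is log(sigmoid(x)); logsigmoid(-x) is log(1 - sigmoid(x)).
per_element = -(pos_weight * targets * F.logsigmoid(logits)
                + (1.0 - targets) * F.logsigmoid(-logits))
manual_loss = per_element.mean()

print(pos_weight.tolist(), builtin_loss.item(), manual_loss.item())
```

A class with few positives gets a weight above 1, so missing one of its positives costs more than a false alarm on a negative.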


In [9]:
torch.manual_seed(0)
model = ExtendedResNetModel(nb_classes=nb_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.BCEWithLogitsLoss(pos_weight=positive_weights)
optimizer = optim.Adam(model.parameters(), lr=0.001)
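# NOTE: eta_min (0.005) is larger than the optimizer's initial lr (0.001), so the learning rate
# rises from 0.001 toward 0.005 over the first T_max steps; train() steps the scheduler once per batch.
sgdr_cos_anneal_sched = lr_scheduler.CosineAnnealingLR(optimizer,  # set learning rate schedule
                                                 T_max=5, eta_min=0.005)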

train(model, dataloader, criterion, optimizer, sgdr_cos_anneal_sched, nb_epochs=10)

['Epoch:0 TRAIN: Loss:0.5277 F1-Score: 0.7289 AUC: 0.8444', 'Epoch:0 VAL: Loss:0.5010 F1-Score: 0.8118 AUC: 0.8901']
['Epoch:1 TRAIN: Loss:0.2617 F1-Score: 0.8901 AUC: 0.9371', 'Epoch:1 VAL: Loss:0.4053 F1-Score: 0.8393 AUC: 0.9073']
['Epoch:2 TRAIN: Loss:0.1828 F1-Score: 0.9283 AUC: 0.9586', 'Epoch:2 VAL: Loss:0.3926 F1-Score: 0.8519 AUC: 0.9138']
['Epoch:3 TRAIN: Loss:0.1162 F1-Score: 0.9577 AUC: 0.9782', 'Epoch:3 VAL: Loss:0.3444 F1-Score: 0.8694 AUC: 0.9224']
['Epoch:4 TRAIN: Loss:0.0857 F1-Score: 0.9688 AUC: 0.9817', 'Epoch:4 VAL: Loss:0.4510 F1-Score: 0.8208 AUC: 0.8928']
['Epoch:5 TRAIN: Loss:0.0740 F1-Score: 0.9740 AUC: 0.9869', 'Epoch:5 VAL: Loss:0.5443 F1-Score: 0.8249 AUC: 0.8953']
['Epoch:6 TRAIN: Loss:0.0542 F1-Score: 0.9827 AUC: 0.9900', 'Epoch:6 VAL: Loss:0.7529 F1-Score: 0.7750 AUC: 0.8629']
['Epoch:7 TRAIN: Loss:0.0422 F1-Score: 0.9847 AUC: 0.9920', 'Epoch:7 VAL: Loss:0.7464 F1-Score: 0.7943 AUC: 0.8776']
['Epoch:8 TRAIN: Loss:0.0363 F1-Score: 0.9912 AUC: 0.9948', 'Epo