# Transfer Learning

## Installing dependencies

In [27]:
DEPENDENCIES = [
    'tf-slim==1.1.0',
    'numpy==1.21.6',
    'pandas==1.3.5',
    'seaborn',
    'torch==1.11.0',
    'torchvision==0.12.0',
    'matplotlib==3.5.3',
    'opencv-python==4.5.4.60',
    'sklearn==0.0.post1',
    'skorch==0.12.1',
    'tqdm',
    'requests',
    'plotly==5.11.0',
    'scikit-image==0.19.3',
]

In [28]:
import sys
import subprocess
import typing as tp
import re

def install_dependencies(dependencies: tp.List[str], show_progress: bool = True) -> tp.Tuple[tp.List[str], tp.List[Exception]]:
    pip = [sys.executable, '-m', 'pip']

    emit = print if show_progress else lambda x: None

    resolved_dependencies, errors = [], []
    for dependency in dependencies:
        emit(f'Installing "{dependency}"...')

        try:
            subprocess.run([*pip, "install", "--root-user-action=ignore", dependency], stdout=subprocess.DEVNULL)
            
            if '==' in dependency:
                dependency = re.search('(.+)==.+', dependency).group(1)

            if '@' in dependency:
                dependency = re.search('(.+) @ .+', dependency).group(1)
            
            pip_freeze = subprocess.Popen((*pip, "freeze"), stdout=subprocess.PIPE)
            output = subprocess.check_output(("grep", "-E", f"^({dependency}==)|({dependency} @).+$"), stdin=pip_freeze.stdout)
            resolved_dependencies.append(output.decode().strip())
        except subprocess.CalledProcessError as e:
            errors.append(e)
    
    return resolved_dependencies, errors

In [29]:
from pathlib import Path

if (Path("/") / "kaggle").is_dir():
    # Running in kaggle
    install_dependencies(DEPENDENCIES)

Installing "tf-slim==1.1.0"...
Installing "numpy==1.21.6"...
Installing "pandas==1.3.5"...
Installing "seaborn"...
Installing "torch==1.11.0"...
Installing "torchvision==0.12.0"...
Installing "matplotlib==3.5.3"...
Installing "opencv-python==4.5.4.60"...
Installing "sklearn==0.0.post1"...
Installing "skorch==0.12.1"...
Installing "tqdm"...
Installing "requests"...
Installing "plotly==5.11.0"...
Installing "scikit-image==0.19.3"...


## Seeding RNGs

Achieving reproducibility in our results, requires initializing (also known as `seeding`) the random number generators (RNG) utilized by our dependencies. In order to do so, we designate a `RANDOM_SEED` number, namely `1234`, and we use it to initialize the following RNGs:

- `numpy` (`np.random.seed`)
- `random` (`random.seed`)
- `torch (CPU)` (`torch.manual_seed`)
- `torch (GPU)` (`torch.cuda.manual_seed`)

The aforementioned RNGs are utilized by `torch`, `numpy` as well as `sklearn` in order to generate random numbers. `random.seed` corresponds to the python standard library RNG. We are seeding each and every one of them in order to cover any possible edge cases, wherein third party code utilizes any of them unbeknownst to us. Lastly, `PYTHONHASHSEED` controls the hashing of str, bytes and datetime objects. More specifically (as stated in the official `Python` documentation):

_"If this variable is not set or set to random, a random value is used to seed the hashes of str, bytes and datetime objects..."_

In [30]:
import os
import torch
import numpy as np
import random

RANDOM_SEED = 1234

if RANDOM_SEED is not None:
    np.random.seed(RANDOM_SEED)
    random.seed(RANDOM_SEED)
    torch.manual_seed(RANDOM_SEED)
    torch.cuda.manual_seed(RANDOM_SEED)
    os.environ["PYTHONHASHSEED"] = str(RANDOM_SEED)

## Loading the dataset

In [181]:
BASE_DIR = Path.cwd()

INPUT_DIR = Path("/") / "kaggle" / "input"
if not INPUT_DIR.is_dir():
    # Not running in Kaggle
    INPUT_DIR = BASE_DIR / 'data'

DATA_DIR = INPUT_DIR / "planets-dataset" / "planet" / "planet" # https://www.kaggle.com/datasets/nikitarom/planets-dataset

TRAIN_SAMPLES_DIR = DATA_DIR / 'train-jpg'
TRAIN_LABELS_FILE = DATA_DIR / 'train_classes.csv'

TEST_SAMPLES_DIR = DATA_DIR / 'test-jpg'
TEST_SAMPLES_DIR_ADDITIONAL = INPUT_DIR / "planets-dataset" / 'test-jpg-additional'
TEST_LABELS_FILE = DATA_DIR / 'sample_submission.csv'

MODEL_WEIGHTS_DIR = INPUT_DIR / 'resnet-weights'
if not MODEL_WEIGHTS_DIR.is_dir():
    # Not running in Kaggle
    MODEL_WEIGHTS_DIR = BASE_DIR / 'models'

We define our custom Dataset class to manipulate batches of data between RAM and Disk more easily. Some point of attentions:

- __init__: we pass the dataframe along with the target, the transformation, the file path and is_train flag. It is important to distinguish the training phase from the testing phase because we use test augmentation. Test augmentation (TTA) is helpful to diversify our training dataset and build a more robust model. It is applied on each image for each batch, meaning that is doesn't increase the length of our training dataset per say, but it transforms each image randomly during execution time.
- __getitem__: we define what the dataset return upon iteration. It needs to load both image and target. collate_fn: we use this function within the following DataLoader instance. It corresponds to the batch manipulation. This is were transform is called. We also proceed to train and test augmentation there.
- collate_fn: we use this function within the following DataLoader instance. It corresponds to the batch manipulation. This is were transform is called. We also proceed to train and test augmentation there.

In [32]:
from torch.utils.data import Dataset
import pandas as pd
import cv2
import torch
import numpy.typing as ntp

Transform = tp.Callable[[torch.Tensor], torch.Tensor]

class AmazonDataset(Dataset):
    def __init__(self, dataset_dir: Path, image_names: tp.List[str], tags: tp.List[tp.List[int]], transform: tp.Optional[Transform] = None) -> None:
        super().__init__()

        self.dataset_dir = dataset_dir
        self.image_names = image_names
        self.tags = tags
        self.transform = transform

    def __len__(self) -> int:
        return len(self.image_names)

    def __getitem__(self, idx: int) -> tp.Tuple[ntp.NDArray[np.float_], ntp.NDArray[np.int_]]:
        image = self.load_image(idx)
        tags = self.load_tags(idx)
        
        return image, tags

    def load_tags(self, idx: int) -> torch.Tensor:
        tags = self.tags[idx]
        tags = torch.as_tensor(tags)
        tags = tags.float()
        
        return tags
    
    def load_image(self, idx: int) -> torch.Tensor:
        image_name = self.image_names[idx]
        filename = f'{image_name}.jpg'
        filepath = self.dataset_dir / filename

        image = cv2.imread(str(filepath))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = torch.tensor(image)
        image = image.permute(2, 0, 1)
        image = self.transform(image)
        image = image.float()
        
        return image

In [33]:
import torchvision.transforms as T

transform_train = T.Compose([
    T.ToPILImage(),
    T.Resize(224),
    T.ToTensor(),
    T.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    T.RandomHorizontalFlip(),
    T.RandomRotation(180)
])

transform_val = T.Compose([
    T.ToPILImage(),
    T.Resize(224),
    T.ToTensor(),
    T.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )
])

In [34]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from torch.utils.data import DataLoader


def create_datasets(
    dataset_dir: Path,
    classes_filepath: Path,
    batch_size: int = 64,
    test_size: float = 0.2,
    shuffle: bool = False,
    limit: tp.Optional[int] = None
) -> tp.Tuple[DataLoader, DataLoader, MultiLabelBinarizer]:
    df = pd.read_csv(classes_filepath)
    
    if limit is not None:
        df = df.head(limit)
    
    df.tags = np.char.split(df.tags.values.astype(str))
    
    df_train, df_val = train_test_split(df, test_size=test_size, shuffle=shuffle)

    encoder = MultiLabelBinarizer()
    tags_train = encoder.fit_transform(df_train.tags)
    tags_val = encoder.transform(df_val.tags)
    
    dataset_train = AmazonDataset(dataset_dir, df_train.image_name.to_numpy(), tags_train, transform_train)
    dataset_val = AmazonDataset(dataset_dir, df_val.image_name.to_numpy(), tags_val, transform_val)

    dataloader_train = DataLoader(
      dataset_train,
      batch_size=batch_size,
      shuffle=True,
    )

    dataloader_val = DataLoader(
      dataset_val,
      batch_size=batch_size,
      shuffle=True,
    )

    return dataloader_train, dataloader_val, encoder

In [35]:
dataloader_train, dataloader_val, encoder = create_datasets(TRAIN_SAMPLES_DIR, TRAIN_LABELS_FILE)

In [36]:
print(f'Training set: {len(dataloader_train)}, Validation set: {len(dataloader_val)}')

Training set: 506, Validation set: 127


For optimal performances, resnet18 need input shape that are multiple of 32 and in our case we have input of size 256. From 256, the closest multiple of 32 is 224.

Therefore, we rescale our input data using this multiple, and we also normalize our dataset based on resnet pretrained mean and standard deviation intensity values. ToTensor() is useful to normalize our image values from 0-255 range to 0-1 range.

In [37]:
from torch import nn
from torchvision import models

class ResNet(nn.Module):
    def __init__(self, freeze: bool = True, dropout: float = 0.2):
        super().__init__()

        self.resnet18 = models.resnet18(pretrained=True)
        for parameter in self.resnet18.parameters():
            parameter.require_grad = not freeze
        
        self.resnet18.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

        self.resnet18.fc = nn.Sequential(
          nn.Flatten(),
          nn.Linear(512, 128), # 512 for resnet18 or 2048 for resnet 50
          nn.ReLU(inplace=True),
          nn.Dropout(dropout),
          nn.Linear(128, 17)
        )

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        return self.resnet18(batch)

    @classmethod
    def from_device(cls, *args, device_id="cpu", **kwargs) -> tp.Tuple["ResNet", torch.device]:
        # We firstly initialize an instance of our model
        model = cls(*args, **kwargs)

        # If the cuda backend is available then change the device type to GPU
        if torch.cuda.is_available():
            device_id = "cuda:0"
            # Given that there are multiple GPUs available wrap the model
            # in `nn.DataParallel` in order to take advantage of them
            if torch.cuda.device_count() > 1:
                model = nn.DataParallel(model)

        # Retrieve the `torch.device` corresponding to `device_id`
        # and transfer the model to it
        device = torch.device(device_id)

        return model.to(device), device

    @classmethod
    def from_file(cls, filename: Path, *args, device_id: str = "cpu", **kwargs) -> tp.Tuple["ResNet", torch.device]:
        # Firstly initialize the model and retrieve the device wherein it is located
        model, device = cls.from_device(*args, device_id=device_id, **kwargs)

        # Load the model state from the supplied file
        # and dynamically remap it to the device at hand using the `map_location`
        model.load_state_dict(torch.load(filename, map_location=device))

        return model, device

In [38]:
from datetime import datetime

class EarlyStoppingStrategy(object):
    def __init__(self, 
        tolerance: int = 5,
        min_delta: float = 0,
        checkpoint_dir: tp.Optional[Path] = None,
    ):
        self.tolerance = tolerance
        self.min_delta = min_delta
        self.checkpoint_dir = checkpoint_dir

        self.best_validation_loss = float('inf')
        self.counter = 0

    def __call__(self, validation_loss, model):
        best_validation_loss = self.best_validation_loss
        self.best_validation_loss = min(self.best_validation_loss, validation_loss)

        if best_validation_loss - validation_loss < self.min_delta:
            # if validation loss value at hand is not at least `min_delta`
            # smaller than the so far smallest validation loss
            # increment the tolerance counter by 1
            # If the counter exceeds the specified tolerance level we should
            # halt the training procedure
            self.counter += 1
            if self.counter > self.tolerance:
                return True
        else:
            # Otherwise (meaning the validation loss has decreased considerably)
            # reset the tolerance counter and persist the model state
            self.counter = 0

            if self.checkpoint_dir is not None:
                filename = f'{model.__class__.__name__}_{datetime.now().strftime("%d_%m_%Y_%H_%M_%S_%f")}.pkl'

                torch.save(model.state_dict(), self.checkpoint_dir / filename)

        return False

In [191]:
import time

from torch import nn, optim
from tqdm.notebook import tqdm

def train(
    model: nn.Module,
    device: torch.device,
    train_loader: DataLoader,
    validation_loader: DataLoader,
    early_stopping_strategy: tp.Optional[EarlyStoppingStrategy] = None,
    n_epochs: int = 100,
    weight_decay: float = 0.0,
    lr=0.001,
    eps=1e-08,
) -> tp.Tuple[int, ...]:
    # Define a method to retrieve the tqdm progress bar postfix data
    batch_index, train_losses, validation_losses = 0, [0], [0]
    def get_postfix():
        return {
            'train': f'{train_losses[-1]:.3f}',
            'validation': f'{validation_losses[-1]:.3f}',
            'batch': f'{batch_index:02d} / {len(train_loader):02d}'
        }

    if early_stopping_strategy is not None:
        def get_postfix():
            return {
                'train': f'{train_losses[-1]:.3f}',
                'validation': f'{validation_losses[-1]:.3f}',
                'tolerance': f'{early_stopping_strategy.counter}/{early_stopping_strategy.tolerance}',
                'batch': f'{batch_index:02d} / {len(train_loader):02d}'
            }

    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), weight_decay=weight_decay, lr=lr, eps=eps)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

    epochs_progress_bar = tqdm(range(n_epochs), desc='epochs', position=0)

    timestamp = time.time()
    for _ in epochs_progress_bar:
        # We set the model to training mode
        model.train()

        train_loss = 0
        for batch_index, (batch_X, batch_y) in enumerate(train_loader):
            # Transfer the data to the available device backend (CPU/GPU)
            batch_X, batch_y, = batch_X.to(device), batch_y.to(device)

            optimizer.zero_grad()

            output = model(batch_X)
            loss = criterion(output, batch_y)
            loss.backward()

            optimizer.step()

            train_loss += loss.item() / len(train_loader)

            epochs_progress_bar.set_postfix(**get_postfix())

        train_losses.append(train_loss)

        epochs_progress_bar.set_postfix(**get_postfix())

        scheduler.step()
        
        # We set the model to evaluation mode
        model.eval()
        with torch.no_grad():
            # Turn of gradient calculation
            validation_loss = 0
            for batch_X, batch_y in validation_loader:
                batch_X, batch_y, = batch_X.to(device), batch_y.to(device)

                output = model(batch_X)
                loss = criterion(output, batch_y)

                validation_loss += loss.item() / len(validation_loader)

            validation_losses.append(validation_loss)

            epochs_progress_bar.set_postfix(**get_postfix())
            
        # Invoke the early stopping strategy with the current validation loss
        # in order to increment/reset its internal tolerance counter
        # and determine whether or not to halt the training process
        if early_stopping_strategy is not None:
            if early_stopping_strategy(validation_losses[-1], model):
                break
    
    return time.time() - timestamp, np.array(train_losses[1:]), np.array(validation_losses[1:])

In [40]:
def get_most_recent_checkpoint(model_weights_directory: Path, prefix: str = 'ResNet', suffix: str = 'pkl') -> ResNet:
    most_recent_checkpoint, most_recent_timestamp = None, datetime.min
    for file in model_weights_directory.glob(f'**/*.{suffix}'):
        filename = file.name
        timestamp = datetime.strptime(filename, f"{prefix}_%d_%m_%Y_%H_%M_%S_%f.{suffix}")
        
        if timestamp > most_recent_timestamp:
            most_recent_checkpoint, most_recent_timestamp = filename, timestamp
    
    return model_weights_directory / most_recent_checkpoint

In [41]:
import plotly.graph_objects as go

def learning_curves(
    train_losses,
    validation_losses,
    title: str = 'Loss per Epoch',
    label_x: str = 'Epochs',
    label_y: str = 'Loss',
) -> None:
    epochs = np.arange(max(len(train_losses), len(validation_losses)))

    go.Figure(data=[
            go.Scatter(name='Training', x=epochs, y=train_losses, mode='lines'),
            go.Scatter(name='Validation', x=epochs, y=validation_losses, mode='lines'),
    ]).update_layout(title=title, xaxis_title=label_x, yaxis_title=label_y).show()

In [42]:
most_recent_checkpoint = get_most_recent_checkpoint(MODEL_WEIGHTS_DIR)

most_recent_checkpoint = None # Comment out if you wanna load a checkpoint
if most_recent_checkpoint is None:
    checkpoint_dir = Path.cwd() / 'models'
    checkpoint_dir.mkdir(parents=True, exist_ok=True)

    early_stopping_strategy = EarlyStoppingStrategy(
        tolerance=5,
        min_delta=0.001,
        checkpoint_dir=checkpoint_dir
    )

    model, device = ResNet.from_device()

    _, train_losses, validation_losses = train(
        model, device,
        dataloader_train, dataloader_val,
        early_stopping_strategy=early_stopping_strategy, n_epochs=20, lr=1e-4
    )
    
    learning_curves(train_losses, validation_losses)
else:
    model, device = ResNet.from_file(most_recent_checkpoint)

epochs:   0%|          | 0/20 [00:00<?, ?it/s]

## Calibrating per-class thresholds

In [244]:
def predict(
    model: nn.Module,
    device: torch.device,
    data_loader: DataLoader,
) -> tp.Tuple[ntp.NDArray[np.int_], ntp.NDArray[np.int_]]:
    model.eval()

    y_true, logits  = [], []
    with torch.no_grad():
        iterator = iter(data_loader)
        for _ in tqdm(range(len(data_loader))):
            try:
                batch_X, batch_y = next(iterator)
            except:
                continue

            batch_X, batch_y = batch_X.to(device), batch_y.to(device)

            output = model(batch_X)

            logits.extend(output.detach().cpu().numpy())
            y_true.extend(batch_y.detach().cpu().numpy())

    return np.vstack(y_true), 1/(1 + np.exp(-np.vstack(logits)))

In [245]:
import itertools
from sklearn.metrics import f1_score, fbeta_score, accuracy_score, precision_score, recall_score
from functools import partial

def get_scorers() -> tp.List[tp.Tuple[str, tp.Callable[[ntp.NDArray[np.int_], ntp.NDArray[np.int_]], np.float_]]]:
    return [
        ('F1 (micro)', partial(f1_score, average='micro', zero_division=0)),
        ('F1 (macro)', partial(f1_score, average='macro', zero_division=0)),
        ('F1 (samples)', partial(f1_score, average='samples', zero_division=0)),
        ('F2 (micro)', partial(fbeta_score, beta=2, average='micro', zero_division=0)),
        ('F2 (macro)', partial(fbeta_score, beta=2, average='macro', zero_division=0)),
        ('F2 (samples)', partial(fbeta_score, beta=2, average='samples', zero_division=0)),
        ('Accuracy', accuracy_score),
        ('Precision', partial(precision_score, average='macro', zero_division=0)),
        ('Recall', partial(recall_score, average='macro', zero_division=0)),
    ]

def evaluate(y_true: ntp.NDArray[np.int_], logits: ntp.NDArray[np.float_]) -> pd.DataFrame:
    scorers, data = get_scorers(), {}
    for i in tqdm(range(2500)):
        thresholds = np.random.uniform(low=0.0, high=0.5, size=17)
        
        y_pred = (logits > thresholds).astype(int)

        scores = []
        for _, scorer in scorers:
            scores.append(scorer(y_true, y_pred))

        data[i] = scores + thresholds.tolist()

    return pd.DataFrame.from_dict(data, columns=[name for name, _ in scorers] + [f'Thresh #{i + 1}' for i in range(17)], orient='index')

In [246]:
from plotly.subplots import make_subplots
from sklearn.metrics import confusion_matrix

def confusion_matrices(y_true: ntp.NDArray[np.int_], logits: ntp.NDArray[np.float_], thresholds: ntp.NDArray[np.float_]):
    fig = make_subplots(cols=5, rows=4)

    for jdx in range(y_true.shape[1]):
        y_val = y_true[:, jdx].ravel()
        y_hat_val = logits[:, jdx].ravel() > thresholds[jdx]
        mat = confusion_matrix(y_val, y_hat_val, normalize='all').ravel()
        try:
            tn, fp, fn, tp = mat * 100
        except ValueError:
            tn, fp, fn, tp = mat[0] * 100, 0, 0, 0

        mat = np.array([[fn, tn], [tp, fp]])
        col = jdx // 4+1
        row = jdx % 4+1
        fig.add_trace(
            go.Heatmap(
                z=mat, text=[[f"fn: {fn:.2f}%", f"tn: {tn:.2f}%"], [f"tp: {tp:.2f}%", f"fp: {fp:.2f}%"]], 
                texttemplate="%{text}", colorscale='Blues', name=encoder.classes_[jdx],
                showscale=False
            ),
            col=col, row=row, 
        )
        fig.update_xaxes(title=encoder.classes_[jdx].replace('_', ' ').title(), showticklabels=False, row=row, col=col)
        fig.update_yaxes(showticklabels=False, row=row, col=col)

    fig.update_layout(
        width=1200, height=800
    )
    fig.show()

In [247]:
from sklearn.metrics import classification_report

def create_report(y_true: ntp.NDArray[np.int_], logits: ntp.NDArray[np.float_], thresholds: ntp.NDArray[np.float_], filepath: tp.Optional[Path] = None):
    report = pd.DataFrame.from_dict(classification_report(y_true, logits > thresholds, output_dict=True))
    
    if filepath is not None:
        report.to_csv(filepath, index=False)
    
    return report

In [248]:
y_val, logits_val = predict(model, device, dataloader_val)

  0%|          | 0/127 [00:00<?, ?it/s]

In [250]:
results_val = evaluate(y_val, logits_val)

  0%|          | 0/2500 [00:00<?, ?it/s]

In [267]:
results_val.sort_values(by='F1 (samples)', ascending=False).head()

Unnamed: 0,F1 (micro),F1 (macro),F1 (samples),F2 (micro),F2 (macro),F2 (samples),Accuracy,Precision,Recall,Thresh #1,...,Thresh #8,Thresh #9,Thresh #10,Thresh #11,Thresh #12,Thresh #13,Thresh #14,Thresh #15,Thresh #16,Thresh #17
2219,0.898369,0.623869,0.910492,0.904825,0.629831,0.918708,0.623394,0.64653,0.637467,0.316926,...,0.064759,0.449993,0.286008,0.314033,0.119246,0.359716,0.371304,0.094508,0.304241,0.44431
2390,0.898264,0.620522,0.910343,0.908882,0.616503,0.921138,0.619442,0.649522,0.616655,0.263325,...,0.383143,0.446828,0.270974,0.357729,0.140666,0.339472,0.293577,0.361706,0.305165,0.281181
502,0.896443,0.62525,0.910083,0.907983,0.638554,0.921925,0.614377,0.607537,0.64866,0.338462,...,0.189026,0.440646,0.254979,0.44262,0.197812,0.412533,0.249795,0.198493,0.485975,0.352592
2291,0.897809,0.6166,0.909332,0.900732,0.615044,0.913699,0.634881,0.650276,0.618096,0.344673,...,0.051448,0.459204,0.428282,0.291222,0.397914,0.499477,0.219491,0.442476,0.312321,0.446771
2380,0.897673,0.612376,0.909248,0.902241,0.610082,0.915876,0.625865,0.646115,0.610723,0.45981,...,0.243519,0.331911,0.292318,0.252769,0.421332,0.49641,0.320304,0.187957,0.400153,0.494042


In [253]:
thresholds = results[results['F1 (samples)'] == results['F1 (samples)'].max()].iloc[:, 9:].values.flatten()

In [255]:
confusion_matrices(y_val, logits_val, thresholds)

In [268]:
create_report(y_val, logits_val, thresholds, 'metrics_val.csv')


Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,11,12,13,14,15,16,micro avg,macro avg,weighted avg,samples avg
precision,0.675028,0.730337,0.518182,0.4,0.0,0.82731,0.846547,0.391304,0.402083,0.803787,...,0.85044,0.955197,0.353452,0.383562,0.0,0.284271,0.652184,0.515484,0.75159,0.72377
recall,0.974256,0.942029,0.290816,0.121212,0.0,0.998944,0.8509,0.529412,0.844639,0.654979,...,0.967957,0.998806,0.994984,0.388889,0.0,0.996528,0.960028,0.676984,0.960028,0.965919
f1-score,0.797498,0.822785,0.372549,0.186047,0.0,0.905062,0.848718,0.45,0.544813,0.721793,...,0.905401,0.976515,0.521611,0.386207,0.0,0.442355,0.776715,0.552036,0.820697,0.806973
support,2486.0,69.0,196.0,66.0,20.0,5683.0,389.0,17.0,914.0,713.0,...,1498.0,7535.0,1595.0,72.0,47.0,1440.0,23266.0,23266.0,23266.0,23266.0


## Evaluating our ResNet architecture

In [257]:
from torch.utils.data import ConcatDataset

def create_test_dataset(
    dataset_dir: Path,
    dataset_dir_additional: Path,
    classes_filepath: Path,
    encoder: MultiLabelBinarizer,
    batch_size: int = 32,
) -> DataLoader:
    df = pd.read_csv(classes_filepath)
    df.tags = np.char.split(df.tags.values.astype(str))

    tags = encoder.transform(df.tags)
    
    test_dataset = AmazonDataset(dataset_dir, df.image_name.to_numpy(), tags, transform_val)
    test_dataset_additional = AmazonDataset(dataset_dir_additional, df.image_name.to_numpy(), tags, transform_val)

    dataloader = DataLoader(
      ConcatDataset([test_dataset, test_dataset_additional]),
      batch_size=batch_size
    )

    return dataloader

In [258]:
dataloader_test = create_test_dataset(TEST_SAMPLES_DIR, TEST_SAMPLES_DIR_ADDITIONAL, TEST_LABELS_FILE, encoder)

In [259]:
y_test, logits_test = predict(model, device, dataloader_test)

  0%|          | 0/3825 [00:00<?, ?it/s]

In [262]:
confusion_matrices(y_test, logits_test, thresholds)

In [269]:
create_report(y_test, logits_test, thresholds, 'metrics_test.csv')


Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.


Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,11,12,13,14,15,16,micro avg,macro avg,weighted avg,samples avg
precision,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,1.0,0.813476,0.294118,1.0,0.829845
recall,0.439469,0.0,0.0,0.0,0.0,0.846752,0.0,0.0,0.0,0.0,...,0.0,0.970595,0.546235,0.0,0.0,0.612968,0.683204,0.200942,0.683204,0.683204
f1-score,0.610598,0.0,0.0,0.0,0.0,0.917018,0.0,0.0,0.0,0.0,...,0.0,0.985078,0.706536,0.0,0.0,0.760049,0.74267,0.234075,0.795856,0.709419
support,40640.0,0.0,0.0,0.0,0.0,40640.0,0.0,0.0,0.0,0.0,...,0.0,40640.0,40640.0,0.0,0.0,40640.0,203200.0,203200.0,203200.0,203200.0
