# <center>CIFAR-100</center>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Libraries-Import" data-toc-modified-id="Librairies-Import-1">Libraries Import</a></span></li><li><span><a href="#Dataset-Loading" data-toc-modified-id="Dataset-Loading-2">Dataset Loading</a></span></li><li><span><a href="#Normalization" data-toc-modified-id="Normalization-3">Normalization</a></span></li><li><span><a href="#DataModule" data-toc-modified-id="DataModule-4">DataModule</a></span></li><li><span><a href="#Model" data-toc-modified-id="Model-5">Model</a></span></li><li><span><a href="#Lightning-Pipeline" data-toc-modified-id="Lightning-Pipeline-6">Lightning Pipeline</a></span></li><li><span><a href="#Model-Training-and-Evaluation" data-toc-modified-id="Model-Training-and-Evaluation-7">Model Training and Evaluation</a></span></li></ul></div>

## Introduction

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. Its 100 classes are grouped into 20 superclasses, and each class contains 600 images: 500 for training and 100 for testing. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).

In this notebook, we train a convolutional neural network to predict the "fine" label of the CIFAR-100 images.

The implementation relies on [`pytorch-lightning`](https://www.pytorchlightning.ai), a powerful and customizable PyTorch framework. All parameters and results, as well as a few sample training images, are logged to TensorBoard.

## Libraries Import

In [1]:
import os
import warnings
import csv
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
from torch.optim.lr_scheduler import OneCycleLR, ReduceLROnPlateau, MultiStepLR
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torchvision.models import resnet18
from torchvision.utils import make_grid
from torchmetrics import ConfusionMatrix
from torchmetrics.functional import accuracy
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint, LearningRateMonitor, EarlyStopping
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning import LightningModule, LightningDataModule

## Dataset Loading

We start by downloading the train and test sets of CIFAR-100.

In [2]:
train_set = datasets.CIFAR100(root="./data", download=True, train=True)
test_set = datasets.CIFAR100(root="./data", download=True, train=False)

Files already downloaded and verified
Files already downloaded and verified


We then count the number of images per class in each set.

In [3]:
train_classes_counts = {}

for image in train_set:
    label = train_set.classes[image[1]]
    if label not in train_classes_counts:
        train_classes_counts[label] = 1
    else:
        train_classes_counts[label] += 1

train_classes_counts

{'cattle': 500,
 'dinosaur': 500,
 'apple': 500,
 'boy': 500,
 'aquarium_fish': 500,
 'telephone': 500,
 'train': 500,
 'cup': 500,
 'cloud': 500,
 'elephant': 500,
 'keyboard': 500,
 'willow_tree': 500,
 'sunflower': 500,
 'castle': 500,
 'sea': 500,
 'bicycle': 500,
 'wolf': 500,
 'squirrel': 500,
 'shrew': 500,
 'pine_tree': 500,
 'rose': 500,
 'television': 500,
 'table': 500,
 'possum': 500,
 'oak_tree': 500,
 'leopard': 500,
 'maple_tree': 500,
 'rabbit': 500,
 'chimpanzee': 500,
 'clock': 500,
 'streetcar': 500,
 'cockroach': 500,
 'snake': 500,
 'lobster': 500,
 'mountain': 500,
 'palm_tree': 500,
 'skyscraper': 500,
 'tractor': 500,
 'shark': 500,
 'butterfly': 500,
 'bottle': 500,
 'bee': 500,
 'chair': 500,
 'woman': 500,
 'hamster': 500,
 'otter': 500,
 'seal': 500,
 'lion': 500,
 'mushroom': 500,
 'girl': 500,
 'sweet_pepper': 500,
 'forest': 500,
 'crocodile': 500,
 'orange': 500,
 'tulip': 500,
 'mouse': 500,
 'camel': 500,
 'caterpillar': 500,
 'man': 500,
 'skunk': 500

In [4]:
test_classes_counts = {}

for image in test_set:
    label = test_set.classes[image[1]]
    if label not in test_classes_counts:
        test_classes_counts[label] = 1
    else:
        test_classes_counts[label] += 1
        
test_classes_counts

{'mountain': 100,
 'forest': 100,
 'seal': 100,
 'mushroom': 100,
 'sea': 100,
 'tulip': 100,
 'camel': 100,
 'butterfly': 100,
 'cloud': 100,
 'apple': 100,
 'skunk': 100,
 'streetcar': 100,
 'rocket': 100,
 'lamp': 100,
 'lion': 100,
 'wolf': 100,
 'rose': 100,
 'orange': 100,
 'dinosaur': 100,
 'chimpanzee': 100,
 'can': 100,
 'keyboard': 100,
 'bicycle': 100,
 'chair': 100,
 'plate': 100,
 'lawn_mower': 100,
 'turtle': 100,
 'palm_tree': 100,
 'shark': 100,
 'pickup_truck': 100,
 'boy': 100,
 'couch': 100,
 'house': 100,
 'porcupine': 100,
 'cockroach': 100,
 'clock': 100,
 'castle': 100,
 'beaver': 100,
 'bee': 100,
 'bottle': 100,
 'pear': 100,
 'baby': 100,
 'flatfish': 100,
 'oak_tree': 100,
 'leopard': 100,
 'snail': 100,
 'crocodile': 100,
 'rabbit': 100,
 'beetle': 100,
 'girl': 100,
 'sunflower': 100,
 'raccoon': 100,
 'train': 100,
 'ray': 100,
 'trout': 100,
 'bowl': 100,
 'snake': 100,
 'orchid': 100,
 'tractor': 100,
 'caterpillar': 100,
 'bus': 100,
 'mouse': 100,
 'cr

The train and test sets both feature the 100 classes to predict and are well-balanced, each class having 500 images in the train set and 100 images in the test set.
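
The same balance check can be written more compactly with `collections.Counter`. The sketch below is only an illustration on a small synthetic label list: `toy_targets`, `toy_classes` and both helper functions are hypothetical names, and with the real dataset one would presumably pass `train_set.targets` and `train_set.classes` instead.

```python
from collections import Counter


def count_images_per_class(targets, classes):
    """Return a {class name: number of images} dict from integer labels."""
    counts = Counter(targets)
    return {classes[label]: counts[label] for label in sorted(counts)}


def is_balanced(counts):
    """True when every class has the same number of images."""
    return len(set(counts.values())) == 1


# Hypothetical toy data: 3 classes with 4 images each
toy_classes = ["apple", "bee", "cloud"]
toy_targets = [0, 1, 2] * 4

toy_counts = count_images_per_class(toy_targets, toy_classes)
print(toy_counts)
print(is_balanced(toy_counts))
```

Working from the label list avoids decoding every image just to read its label, which is what iterating over the dataset object does.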

We further split the train set into a train set and a validation set by holding out 10% of the training images (5,000 images) for validation. This is done in the `CIFAR100DataModule`.

## Normalization

We create a function to compute the normalization statistics for our dataset. These statistics are computed on the train set and applied to the train, validation and test sets.
The function is called in the `setup` method of the `CIFAR100DataModule`.
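
To make the effect of these statistics concrete, here is a minimal sketch on random data. It assumes the images are already stacked in a tensor of shape `(N, 3, H, W)`; `fake_train_images` and `channel_stats` are hypothetical names used for illustration only. After subtracting the per-channel mean and dividing by the per-channel standard deviation, each channel should have a mean close to 0 and a standard deviation close to 1.

```python
import torch


def channel_stats(images):
    """Per-channel mean and std of a batch shaped (N, C, H, W)."""
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std


torch.manual_seed(0)
# Hypothetical stand-in for the training images: 64 random RGB images with values in [0, 1]
fake_train_images = torch.rand(64, 3, 32, 32)
mean, std = channel_stats(fake_train_images)

# Same operation as a channel-wise normalization: (x - mean) / std
normalized = (fake_train_images - mean[None, :, None, None]) / std[None, :, None, None]
norm_mean, norm_std = channel_stats(normalized)
print(norm_mean, norm_std)
```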

In [5]:
def compute_normstats(train_set):
    """
    Function to compute the normalization statistics on the train set.
    
    Takes in: train_set

    Returns: (red channel mean, green channel mean, blue channel mean), (red channel std, green channel std, blue channel std)
    """
    # Stack all images once into a single (N, 3, H, W) tensor instead of reading the dataset once per channel
    images = torch.stack([image for image, _ in train_set], dim=0)
    # Reduce over the image, height and width dimensions to get one statistic per channel
    train_set_mean = tuple(images.mean(dim=(0, 2, 3)).tolist())
    train_set_std = tuple(images.std(dim=(0, 2, 3)).tolist())
    return train_set_mean, train_set_std

## LightningDataModule

Next, we create the DataModule. This class implements several methods to:
- download the dataset (if not already done)
- create the train, validation and test sets and apply all the necessary transforms (including data augmentation)
- create the dataloaders

In the `setup` method, we split the train set into a train set and a validation set. To do so, we shuffle the training image indices with a fixed seed and hold out the last 10% for validation. We checked that this seed yields a reasonably balanced validation set that features all 100 classes.
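
The seeded split and the balance check can be sketched in isolation. The example below is an assumption-laden illustration: `split_indices` is a hypothetical helper and `fake_targets` is a synthetic, class-by-class label layout rather than the real CIFAR-100 label order, so the per-class counts it prints are not those of the actual validation set.

```python
import numpy as np


def split_indices(n_samples, train_fraction=0.9, seed=42):
    """Shuffle the indices with a fixed seed and split them into train / validation."""
    indices = list(range(n_samples))
    np.random.seed(seed)
    np.random.shuffle(indices)
    split = int(np.floor(train_fraction * n_samples))
    return indices[:split], indices[split:]


# Hypothetical labels laid out class by class: 100 classes x 500 images
fake_targets = np.repeat(np.arange(100), 500)

train_indices, validation_indices = split_indices(len(fake_targets))
# Number of validation images per class (a zero would mean a missing class)
validation_counts = np.bincount(fake_targets[validation_indices], minlength=100)
print(len(train_indices), len(validation_indices))
print(validation_counts.min(), validation_counts.max())
```

A random split of this kind is only approximately balanced; a stratified split would guarantee exactly 50 validation images per class, at the cost of slightly more code.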

In [6]:
class CIFAR100DataModule(LightningDataModule):
    
    def __init__(self, batch_size=64, num_workers=2):
        super().__init__()
        self.batch_size = batch_size
        self.num_workers = num_workers
    
    
    def prepare_data(self):
        # Download the train and test sets
        datasets.CIFAR100(root="./data", download=True, train=True)
        datasets.CIFAR100(root="./data", download=True, train=False)

        
    def setup(self, stage=None):
        # Load the train set
        train_set = datasets.CIFAR100(root="./data", train=True, transform=transforms.ToTensor())
        # Compute the normalization statistics on the train set
        train_set_mean, train_set_std = compute_normstats(train_set)
        
        # Create the transforms for all sets
        train_set_transforms = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                            transforms.RandomHorizontalFlip(p=0.5),
                                            transforms.RandomRotation(degrees=15),
                                            transforms.ToTensor(),
                                            transforms.Normalize(train_set_mean, train_set_std, inplace=True)])
        validation_set_transforms = transforms.Compose([transforms.ToTensor(),
                                            transforms.Normalize(train_set_mean, train_set_std, inplace=True)])                                            
        test_set_transforms = transforms.Compose([transforms.ToTensor(),
                                            transforms.Normalize(train_set_mean, train_set_std, inplace=True)])

        
        # Get the train set images indices and shuffle them
        train_set_length = len(train_set)
        indices = list(range(train_set_length))
        np.random.seed(42)
        np.random.shuffle(indices)

        # Calculate the split point to have 10% of the train set as a validation set
        split = int(np.floor(0.9 * train_set_length))

        # Create a sampler for the train set (used in train_dataloader)
        self.train_sampler = SubsetRandomSampler(indices[:split])

        # Get the indices for the validation set (used in val_dataloader)
        self.validation_indices = indices[split:]

        # Create the train, validation and test sets (here, the train set is reloaded a second time but with the appropriate transforms)
        self.cifar100_train = datasets.CIFAR100(root="./data", train=True, transform=train_set_transforms)
        self.cifar100_validation = datasets.CIFAR100(root="./data", train=True, transform=validation_set_transforms)
        self.cifar100_test = datasets.CIFAR100(root="./data", train=False, transform=test_set_transforms)

        # Retrieve classes from the train set
        self.classes = self.cifar100_train.classes
        
        
    def train_dataloader(self):
        cifar100_train = DataLoader(self.cifar100_train, batch_size=self.batch_size, sampler=self.train_sampler, num_workers=self.num_workers)
        return cifar100_train

    
    def val_dataloader(self):
        cifar100_validation = DataLoader(self.cifar100_validation, batch_size=self.batch_size, sampler=self.validation_indices, num_workers=self.num_workers)
        return cifar100_validation

    
    def test_dataloader(self):
        cifar100_test = DataLoader(self.cifar100_test, batch_size=self.batch_size, shuffle=False, num_workers=self.num_workers)
        return cifar100_test
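
Note that `train_dataloader` wraps its indices in a `SubsetRandomSampler`, while `val_dataloader` passes a plain list of indices as `sampler`. The following sketch, on a hypothetical toy dataset, illustrates the difference: a list is visited in the listed order at every epoch, whereas `SubsetRandomSampler` draws the same subset in a random order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Hypothetical toy dataset: sample i simply holds the value i
toy_dataset = TensorDataset(torch.arange(10))
toy_train_indices = [0, 1, 3, 4, 6, 8, 9]
toy_validation_indices = [7, 2, 5]

# A plain list used as sampler: indices are visited in the listed order
validation_loader = DataLoader(toy_dataset, batch_size=2, sampler=toy_validation_indices)
validation_seen = [value for (batch,) in validation_loader for value in batch.tolist()]

# SubsetRandomSampler: same subset, but in a random order
train_loader = DataLoader(toy_dataset, batch_size=2, sampler=SubsetRandomSampler(toy_train_indices))
train_seen = [value for (batch,) in train_loader for value in batch.tolist()]

print(validation_seen)
print(sorted(train_seen))
```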

## Model

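Then, we define a function to create the model. We use the `resnet18` architecture from `torchvision`, without pre-trained weights.
To adapt it to 32x32 images, we replace the first convolutional layer (a 7x7 convolution with stride 2) with a 3x3 convolution with stride 1, and replace the max-pooling layer with an identity, so that the input is not downsampled before the residual blocks.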
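
A little arithmetic shows why these two layers matter for small images. The sketch below follows the side length of a 32x32 input through the network using the standard output-size formula for convolutions and pooling. The helper names are hypothetical, and the layer settings are those of the stock `torchvision` ResNet-18 as we understand them.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def final_feature_map_size(input_size, cifar_stem):
    """Follow the side length of an image through the ResNet-18 stages."""
    if cifar_stem:
        # Modified stem: 3x3 convolution with stride 1, max-pooling replaced by an identity
        size = conv_output_size(input_size, kernel_size=3, stride=1, padding=1)
    else:
        # Stock stem: 7x7 convolution with stride 2, then 3x3 max-pooling with stride 2
        size = conv_output_size(input_size, kernel_size=7, stride=2, padding=3)
        size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    # Four residual stages; the last three halve the resolution with a stride-2 3x3 convolution
    for stride in (1, 2, 2, 2):
        size = conv_output_size(size, kernel_size=3, stride=stride, padding=1)
    return size


print(final_feature_map_size(32, cifar_stem=False))
print(final_feature_map_size(32, cifar_stem=True))
```

With the stock stem, a 32x32 image is reduced to a single 1x1 position before the final pooling layer, while the modified stem keeps a 4x4 feature map.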

In [7]:
def create_model():
    """
    Function to create the model.
    
    Takes in: -

    Returns: model
    """
    model = resnet18(pretrained=False, num_classes=100)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    model.maxpool = nn.Identity()
    return model

## LightningModule

The LightningModule brings together the model, the optimizer and learning rate scheduler, and the training, validation and test logic. It implements the following methods:
- `configure_optimizers`
- `forward`
- `training_step`
- `training_epoch_end`
- `validation_step`
- `validation_epoch_end`
- `test_step`
- `test_epoch_end`
- `on_save_checkpoint`
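
Among these hooks, `test_epoch_end` derives per-class scores from a confusion matrix. As a reminder of what can be read from such a matrix, here is a minimal sketch on a hypothetical 3-class example. It assumes that rows index the true class and columns the predicted class, which is the convention we understand `torchmetrics` to use. `per_class_precision_recall` is an illustrative helper, not part of the notebook.

```python
import numpy as np


def per_class_precision_recall(confusion):
    """confusion[i, j] = number of images of true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    true_positives = np.diag(confusion)
    # Column sums: number of images predicted as each class
    precision = true_positives / confusion.sum(axis=0)
    # Row sums: number of images that truly belong to each class
    recall = true_positives / confusion.sum(axis=1)
    return precision, recall


# Hypothetical 3-class confusion matrix
toy_confusion = [[8, 2, 0],
                 [1, 6, 3],
                 [1, 0, 9]]
precision, recall = per_class_precision_recall(toy_confusion)
print(np.round(precision, 2))
print(np.round(recall, 2))
```

Dividing the diagonal by the column sums gives the precision of each class, while dividing by the row sums gives the recall, which is also the accuracy restricted to the images of that class.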

In [8]:
class CIFAR100ResNet(LightningModule):
    
    def __init__(self, learning_rate, batch_size):
        super().__init__()      
        # Save hyperparameters to the checkpoint
        self.save_hyperparameters()     
        # Creation of the model
        self.model = create_model()
        # Instantiation of the confusion matrix
        self.confmat = ConfusionMatrix(num_classes=100)
        # Instantiation of the number of classes
        self.n_classes = 100 
        # Instantiation of the learning rate
        self.learning_rate = learning_rate
        # Instantiation of the batch_size
        self.batch_size = batch_size
        
        
    def configure_optimizers(self): 
        optimizer = torch.optim.SGD(
            self.parameters(),
            lr=self.hparams.learning_rate,
            momentum=0.9,
            weight_decay=5e-4,
        )
        #scheduler_dict = {
            #"scheduler": MultiStepLR(
                #optimizer,
                #milestones=[60,120,160],
                #gamma=0.2
            #),
            #"interval": "epoch"
        #}        
        scheduler_dict = {
            "scheduler": ReduceLROnPlateau(
                optimizer,
                mode="min",
                factor=0.2,
                patience=10
                ),
            "interval": "epoch",
            "frequency": 1,
            "monitor": "validation_loss"
        }        
        #steps_per_epoch = int(np.ceil(45000 / self.batch_size))
        #scheduler_dict = {
            #"scheduler": OneCycleLR(
            #    optimizer,
            #    max_lr=0.1,
            #    epochs=self.trainer.max_epochs,
            #    steps_per_epoch=steps_per_epoch
            #),
            #"interval": "step"
        #}
        return {"optimizer": optimizer, "lr_scheduler": scheduler_dict}
    
    
    def forward(self, x):
        logits = self.model(x)
        return logits

    
    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        logits = self(inputs)
        loss = F.cross_entropy(logits, targets)
        predictions = torch.argmax(logits, dim=1)
        self.log("train_loss", loss, on_epoch=True, prog_bar=True)
        return {"inputs":inputs, "targets":targets, "predictions":predictions, "loss":loss}    

    
    def training_epoch_end(self, outputs):
        # Log weights and biases for all layers of the model
        for name, params in self.named_parameters():
            self.logger.experiment.add_histogram(name, params,self.current_epoch)
        # Only after the first training epoch, log some of the training images and the model graph
        if self.current_epoch == 0:
            image_samples = outputs[0]["inputs"][:10]
            image_samples = image_samples.cpu()
            image_samples_grid = make_grid(image_samples, normalize=True)
            image_samples_grid = image_samples_grid.numpy()
            fig = plt.figure()
            ax = fig.add_subplot(111)
            ax.imshow(np.transpose(image_samples_grid, (1, 2, 0)))
            self.logger.experiment.add_figure("Training sample normalized images", fig)
            # Add a batch dimension to a single image so it can be used to trace the model graph
            input_sample = outputs[0]["inputs"][0].unsqueeze(0)
            self.logger.experiment.add_graph(self, input_sample)

            
    def validation_step(self, batch, batch_idx):
        inputs, targets = batch
        logits = self(inputs)
        loss = F.cross_entropy(logits, targets)
        predictions = torch.argmax(logits, dim=1)
        acc = accuracy(predictions, targets)
        self.log("validation_loss", loss, on_epoch=True, prog_bar=True)
        self.log("validation_acc", acc, on_epoch=True, prog_bar=True)
        return {"inputs":inputs, "targets":targets, "predictions":predictions, "loss":loss} 

    
    def validation_epoch_end(self, outputs):
        # Concatenate the targets of all batches
        targets = torch.cat([output["targets"] for output in outputs])
        # Concatenate the predictions of all batches
        predictions = torch.cat([output["predictions"] for output in outputs])
        # Compute the confusion matrix
        cm = self.confmat(predictions, targets)
        # Send it to the CPU
        cm = cm.cpu()

                
    def test_step(self, batch, batch_idx):
        inputs, targets = batch
        logits = self(inputs)
        loss = F.cross_entropy(logits, targets)
        probabilities = F.softmax(logits, dim=1)
        predictions = torch.argmax(logits, dim=1)
        acc = accuracy(predictions, targets)
        self.log("test_loss", loss, prog_bar=True)
        self.log("test_acc", acc, prog_bar=True)
        return {"targets":targets, "predictions":predictions, "probabilities":probabilities}

    
    def test_epoch_end(self, outputs):
        targets = torch.cat([output["targets"] for output in outputs])
        predictions = torch.cat([output["predictions"] for output in outputs])
        probabilities = torch.cat([output["probabilities"] for output in outputs])
        # Compute the total prediction accuracy on the full test set
        acc = accuracy(predictions, targets)
        # Compute the confusion matrix
        cm = self.confmat(predictions, targets)
        # Send it to the CPU
        cm = cm.cpu()
        # Calculate the precision for each class (correct predictions of a class / all predictions of that class)
        classes_precisions = []
        for class_id in range(self.n_classes):
            precision = cm[class_id, class_id] / torch.sum(cm[:,class_id])            
            precision = round(precision.item()*100, 1)
            classes_precisions.append(precision)
        # Write the predicted class probabilities of each test image to a csv file
        with open("test_set_predictions.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(self.trainer.datamodule.classes)
            for image_probs in probabilities.cpu().numpy():
                writer.writerow(np.around(image_probs, decimals=2))
        # Write the test set prediction performances to an output file
        with open("test_set_predictions.txt", "w") as f:
            f.write("==================================================\n")
            f.write("ACCURACY\n")
            f.write("==================================================\n")
            f.write("\n")            
            f.write(f"Total: {round(acc.item()*100, 1)}%\n")
            f.write("\n")
            f.write("Precision per class:\n")
            f.write("Class - Precision (%)\n")
            for class_id in range(self.n_classes):
                f.write(f"{self.trainer.datamodule.classes[class_id]} - {classes_precisions[class_id]}\n")
            f.write("\n")
            f.write("\n")
            f.write("==================================================\n")
            f.write("PREDICTIONS DETAIL\n")
            f.write("==================================================\n")
            f.write("Image index - Target class - Predicted class\n")
            for i in range(len(targets)):
                f.write(f"{i} - {self.trainer.datamodule.classes[targets[i]]} - {self.trainer.datamodule.classes[predictions[i]]}\n")
    
    
    def on_save_checkpoint(self, checkpoint):
        # Get the state_dict from self.model to get rid of the "model." prefix
        # (the saved weights can then be loaded directly into a bare model built with create_model)
        checkpoint["state_dict"] = self.model.state_dict()
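
The `ReduceLROnPlateau` settings used in `configure_optimizers` (`factor=0.2`, `patience=10`) can be explored in isolation. The sketch below is a simplified illustration: it attaches the scheduler to a hypothetical dummy parameter and feeds it a validation loss that never improves, whereas in the notebook Lightning calls the scheduler with the logged `validation_loss`.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Hypothetical single parameter, only there so that the optimizer has something to manage
dummy_parameter = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([dummy_parameter], lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.2, patience=10)

learning_rates = []
for epoch in range(12):
    optimizer.step()     # no gradients here, so the parameter is left unchanged
    scheduler.step(1.0)  # pretend the validation loss is stuck at 1.0
    learning_rates.append(optimizer.param_groups[0]["lr"])

print(learning_rates)
```

The first epoch sets the best value, the next ten epochs without improvement are tolerated, and the learning rate is multiplied by 0.2 after the eleventh one.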

## Model Training and Evaluation

Finally, we set some of our hyper-parameters, instantiate the classes defined above as well as some callbacks (tensorboard logger, learning rate monitor, early stopping and checkpoint saving), and train and test our model.
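
The early stopping callback monitors the validation loss with `patience=20`. As a rough mental model, training stops once the monitored value has failed to improve for `patience` consecutive validation checks. The sketch below is a simplified re-implementation of that rule, not PyTorch Lightning's actual code; `epoch_of_early_stop` and the loss curve are hypothetical.

```python
def epoch_of_early_stop(validation_losses, patience=20, min_delta=0.0):
    """Return the epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait_count = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter
            best = loss
            wait_count = 0
        else:
            wait_count += 1
            if wait_count >= patience:
                return epoch
    return None


# Hypothetical loss curve: improves until epoch 50, then stays worse
toy_losses = [1.0 - 0.01 * epoch for epoch in range(51)] + [2.0] * 30
print(epoch_of_early_stop(toy_losses))
```

With these toy values the function returns 70: the best loss is reached at epoch 50, and epoch 70 is the 20th consecutive epoch without improvement.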

In [9]:
# Filter harmless warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", ".*Your `val_dataloader` has `shuffle=True`.*")
warnings.filterwarnings("ignore", ".*Checkpoint directory.*")

# Use one GPU if available, otherwise train on the CPU
gpus = min(1, torch.cuda.device_count())

# Set number of workers (for dataloaders)
num_workers = int(os.cpu_count() / 3)
print(f"Number of workers used: {num_workers}")

# Set maximum number of epochs to train for
max_epochs = 200
print(f"Maximum number of epochs: {max_epochs}")

# Set the batch size
batch_size = 256
print(f"Batch size: {batch_size}")

# Set the initial learning rate
learning_rate = 0.1
print(f"Initial learning rate: {learning_rate}")    

# Instantiate the DataModule
dm = CIFAR100DataModule(batch_size=batch_size, num_workers=num_workers)

# Instantiate the logger
tensorboard_logger = TensorBoardLogger(save_dir="logs")

# Instantiate early stopping based on epoch validation loss
early_stopping = EarlyStopping("validation_loss", patience=20, verbose=True)

# Instantiate a learning rate monitor
lr_monitor = LearningRateMonitor(logging_interval='step')

# Instantiate a checkpoint callback
checkpoint = ModelCheckpoint(
                            dirpath="./checkpoints/",
                            filename="{epoch}-{validation_loss:.2f}",
                            verbose=True,
                            monitor="validation_loss",
                            save_last = False,
                            save_top_k=1,      
                            mode="min",
                            save_weights_only=True
                            )

# Instantiate the trainer
trainer = Trainer(
                gpus=gpus,
                max_epochs=max_epochs, 
                logger=tensorboard_logger,
                log_every_n_steps = 1,
                callbacks=[lr_monitor, early_stopping, checkpoint]
                ) 

# Instantiate the pipeline
pipeline = CIFAR100ResNet(learning_rate=learning_rate, batch_size=batch_size)  
    
# Fit the trainer on the training set
trainer.fit(pipeline, dm)

# Test on the test set
trainer.test(pipeline, dm)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs


Number of workers used: 4
Maximum number of epochs: 200
Batch size: 256
Initial learning rate: 0.1
Files already downloaded and verified
Files already downloaded and verified


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name    | Type            | Params
--------------------------------------------
0 | model   | ResNet          | 11.2 M
1 | confmat | ConfusionMatrix | 0     
--------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.881    Total estimated model params size (MB)


Metric validation_loss improved. New best score: 3.644
Epoch 0, global step 175: validation_loss reached 3.64440 (best 3.64440), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=0-validation_loss=3.64.ckpt" as top 1


Metric validation_loss improved by 0.414 >= min_delta = 0.0. New best score: 3.231
Epoch 1, global step 351: validation_loss reached 3.23067 (best 3.23067), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=1-validation_loss=3.23-v2.ckpt" as top 1


Metric validation_loss improved by 0.151 >= min_delta = 0.0. New best score: 3.080
Epoch 2, global step 527: validation_loss reached 3.07953 (best 3.07953), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=2-validation_loss=3.08.ckpt" as top 1


Metric validation_loss improved by 0.347 >= min_delta = 0.0. New best score: 2.733
Epoch 3, global step 703: validation_loss reached 2.73287 (best 2.73287), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=3-validation_loss=2.73.ckpt" as top 1


Metric validation_loss improved by 0.344 >= min_delta = 0.0. New best score: 2.389
Epoch 4, global step 879: validation_loss reached 2.38936 (best 2.38936), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=4-validation_loss=2.39.ckpt" as top 1


Epoch 5, global step 1055: validation_loss was not in top 1


Metric validation_loss improved by 0.334 >= min_delta = 0.0. New best score: 2.055
Epoch 6, global step 1231: validation_loss reached 2.05528 (best 2.05528), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=6-validation_loss=2.06.ckpt" as top 1


Epoch 7, global step 1407: validation_loss was not in top 1


Metric validation_loss improved by 0.116 >= min_delta = 0.0. New best score: 1.940
Epoch 8, global step 1583: validation_loss reached 1.93967 (best 1.93967), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=8-validation_loss=1.94.ckpt" as top 1


Metric validation_loss improved by 0.035 >= min_delta = 0.0. New best score: 1.905
Epoch 9, global step 1759: validation_loss reached 1.90460 (best 1.90460), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=9-validation_loss=1.90.ckpt" as top 1


Epoch 10, global step 1935: validation_loss was not in top 1


Metric validation_loss improved by 0.016 >= min_delta = 0.0. New best score: 1.889
Epoch 11, global step 2111: validation_loss reached 1.88853 (best 1.88853), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=11-validation_loss=1.89.ckpt" as top 1


Metric validation_loss improved by 0.020 >= min_delta = 0.0. New best score: 1.869
Epoch 12, global step 2287: validation_loss reached 1.86890 (best 1.86890), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=12-validation_loss=1.87.ckpt" as top 1


Metric validation_loss improved by 0.146 >= min_delta = 0.0. New best score: 1.723
Epoch 13, global step 2463: validation_loss reached 1.72307 (best 1.72307), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=13-validation_loss=1.72.ckpt" as top 1


Epoch 14, global step 2639: validation_loss was not in top 1


Epoch 15, global step 2815: validation_loss was not in top 1


Epoch 16, global step 2991: validation_loss was not in top 1


Epoch 17, global step 3167: validation_loss was not in top 1


Epoch 18, global step 3343: validation_loss was not in top 1


Metric validation_loss improved by 0.037 >= min_delta = 0.0. New best score: 1.686
Epoch 19, global step 3519: validation_loss reached 1.68650 (best 1.68650), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=19-validation_loss=1.69.ckpt" as top 1


Epoch 20, global step 3695: validation_loss was not in top 1


Metric validation_loss improved by 0.145 >= min_delta = 0.0. New best score: 1.541
Epoch 21, global step 3871: validation_loss reached 1.54137 (best 1.54137), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=21-validation_loss=1.54.ckpt" as top 1


Epoch 22, global step 4047: validation_loss was not in top 1


Epoch 23, global step 4223: validation_loss was not in top 1


Epoch 24, global step 4399: validation_loss was not in top 1


Epoch 25, global step 4575: validation_loss was not in top 1


Epoch 26, global step 4751: validation_loss was not in top 1


Epoch 27, global step 4927: validation_loss was not in top 1


Epoch 28, global step 5103: validation_loss was not in top 1


Epoch 29, global step 5279: validation_loss was not in top 1


Epoch 30, global step 5455: validation_loss was not in top 1


Metric validation_loss improved by 0.010 >= min_delta = 0.0. New best score: 1.532
Epoch 31, global step 5631: validation_loss reached 1.53167 (best 1.53167), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=31-validation_loss=1.53.ckpt" as top 1


Epoch 32, global step 5807: validation_loss was not in top 1


Epoch 33, global step 5983: validation_loss was not in top 1


Epoch 34, global step 6159: validation_loss was not in top 1


Epoch 35, global step 6335: validation_loss was not in top 1


Epoch 36, global step 6511: validation_loss was not in top 1


Epoch 37, global step 6687: validation_loss was not in top 1


Metric validation_loss improved by 0.046 >= min_delta = 0.0. New best score: 1.486
Epoch 38, global step 6863: validation_loss reached 1.48580 (best 1.48580), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=38-validation_loss=1.49.ckpt" as top 1


Epoch 39, global step 7039: validation_loss was not in top 1


Epoch 40, global step 7215: validation_loss was not in top 1


Epoch 41, global step 7391: validation_loss was not in top 1


Epoch 42, global step 7567: validation_loss was not in top 1


Epoch 43, global step 7743: validation_loss was not in top 1


Epoch 44, global step 7919: validation_loss was not in top 1


Epoch 45, global step 8095: validation_loss was not in top 1


Epoch 46, global step 8271: validation_loss was not in top 1


Epoch 47, global step 8447: validation_loss was not in top 1


Epoch 48, global step 8623: validation_loss was not in top 1


Epoch 49, global step 8799: validation_loss was not in top 1


Metric validation_loss improved by 0.412 >= min_delta = 0.0. New best score: 1.074
Epoch 50, global step 8975: validation_loss reached 1.07382 (best 1.07382), saving model to "C:\Users\APU\Projects\CIFAR-100\checkpoints\epoch=50-validation_loss=1.07.ckpt" as top 1


Epoch 51, global step 9151: validation_loss was not in top 1


Epoch 52, global step 9327: validation_loss was not in top 1


Epoch 53, global step 9503: validation_loss was not in top 1


Epoch 54, global step 9679: validation_loss was not in top 1


Epoch 55, global step 9855: validation_loss was not in top 1


Epoch 56, global step 10031: validation_loss was not in top 1


Epoch 57, global step 10207: validation_loss was not in top 1


Epoch 58, global step 10383: validation_loss was not in top 1


Epoch 59, global step 10559: validation_loss was not in top 1


Epoch 60, global step 10735: validation_loss was not in top 1


Epoch 61, global step 10911: validation_loss was not in top 1


Epoch 62, global step 11087: validation_loss was not in top 1


Epoch 63, global step 11263: validation_loss was not in top 1


Epoch 64, global step 11439: validation_loss was not in top 1


Epoch 65, global step 11615: validation_loss was not in top 1


Epoch 66, global step 11791: validation_loss was not in top 1


Epoch 67, global step 11967: validation_loss was not in top 1


Epoch 68, global step 12143: validation_loss was not in top 1


Epoch 69, global step 12319: validation_loss was not in top 1


Monitored metric validation_loss did not improve in the last 20 records. Best score: 1.074. Signaling Trainer to stop.
Epoch 70, global step 12495: validation_loss was not in top 1
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.7348999977111816, 'test_loss': 1.0653947591781616}
--------------------------------------------------------------------------------


[{'test_loss': 1.0653947591781616, 'test_acc': 0.7348999977111816}]

The model reaches an accuracy of about 73.5% on the test set, with a test loss of about 1.07. Its performance metrics and the evolution of its hyperparameters, as well as image samples from the train set, can be visualised in TensorBoard.
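
The step counters reported in the training log can be cross-checked with a little arithmetic. The sketch below assumes that 90% of the 50,000 training images are used for training, that the last, smaller batch of each epoch is kept, and that both epochs and steps are counted from zero. The helper names are hypothetical.

```python
import math


def steps_per_epoch(n_train_images, batch_size):
    """Number of optimizer steps per epoch when the last, smaller batch is kept."""
    return math.ceil(n_train_images / batch_size)


n_train_images = 50_000 * 9 // 10  # 90% of the original train set
steps = steps_per_epoch(n_train_images, batch_size=256)


def last_step_of_epoch(epoch):
    """Zero-based index of the last step of a zero-based epoch."""
    return (epoch + 1) * steps - 1


print(steps, last_step_of_epoch(0), last_step_of_epoch(70))
```

The resulting values (176 steps per epoch, step 175 at the end of epoch 0 and step 12495 at the end of epoch 70) appear consistent with the "global step" entries in the log.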