# ResNet-50 CNN: PyTorch & CIFAR-10

End-to-end programming tutorial including:

1. Dropout - combats the _overfitting_ prevalent when training ResNet-50 on the CIFAR-10 dataset
1. Progress bar - tracks model training
1. Learning rate scheduler - warmup followed by step decay, as commonly used for deep architectures


#### References 
- [Dive into Deep Learning - ResNet reference](https://d2l.ai/chapter_convolutional-modern/resnet.html)

- [YouTube reference](https://www.youtube.com/watch?v=DkNIBBBvcPs)

Refer to ResNet research paper: __Deep Residual Learning for Image Recognition__ by Kaiming He et al.

Figure 3 on page 4 of the paper shows the ResNet-34 architecture. Each residual block can learn new, more complex features in its 2 conv layers, but it also carries forward what was previously learned/computed through the skip connection (identity mapping). The block's output is therefore a combination of what the CNN has learned before (skip connection) and the new features learned by the 2 conv layers within the ResNet block.

The argument is that the CNN can learn new things without forgetting what it learned before: if the extra layers have nothing useful to add, the block can fall back to the identity. So, in theory, increasing the depth of the CNN should not worsen its performance.
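The identity fallback can be illustrated without any framework. The following is a minimal sketch, not the notebook's implementation: `residual_block` and `zero_residual` are hypothetical helpers, and plain Python lists stand in for feature maps.

```python
def residual_block(x, residual_fn):
    """Add the skip connection: output = residual_fn(x) + x, element-wise."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """A residual branch that has learned nothing: it outputs all zeros."""
    return [0.0 for _ in x]


features = [0.5, -1.0, 2.0]
block_output = residual_block(features, zero_residual)

# The block passes its input through unchanged when the residual branch is zero-
print(block_output == features)
```

With a zero residual branch the block returns its input unchanged, which is why stacking such a block should not, in principle, make the learned representation worse.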

Table 1 on page 5 lists the architecture-specific details. In this tutorial, ResNet-50, ResNet-101 and ResNet-152 are implemented. For ImageNet, the first conv layer uses kernel size = (7, 7), stride = 2, number of kernels = 64, padding = 3, followed by max pooling with kernel size = (3, 3), stride = 2.
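The effect of these stem parameters on the spatial size can be worked out with the usual output-size formula. This is a sketch under stated assumptions: `conv_output_size` is a hypothetical helper, the 224 x 224 input is the common ImageNet crop size, and the max pool padding of 1 is taken from the ImageNet variant that is commented out in the model code of this notebook.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


# ImageNet stem, assuming a 224 x 224 input-
after_conv = conv_output_size(224, kernel_size=7, stride=2, padding=3)
after_pool = conv_output_size(after_conv, kernel_size=3, stride=2, padding=1)

# CIFAR-10 stem used in this notebook (3 x 3 conv, stride 1, padding 1, no max pool)-
cifar_after_conv = conv_output_size(32, kernel_size=3, stride=1, padding=1)

print(after_conv, after_pool, cifar_after_conv)
```

The ImageNet stem shrinks the image by a factor of 4 before the first ResNet layer, which would leave very little of a 32 x 32 CIFAR-10 image. That is a plausible reason for the smaller stem used here.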

ResNet-50 has four ResNet layers. The first ResNet layer consists of a block with:
- 1x1 filter with 64 channels
- 3x3, 64
- 1x1, 256

This block is repeated 3 times.

Same convolutions - the padding is chosen so that a conv layer by itself does not change the spatial size of its input. Stride is what reduces the spatial output: in this implementation, the 3x3 conv of the first block in ResNet layers 2, 3 and 4 uses stride = 2.

One more thing to note is the channel pattern. The first ResNet layer/block uses 64 channels in its first two convs and ends with 256 channels. ResNet layer/block 2 uses 128 channels and ends with 512. ResNet layer/block 3 uses 256 channels and ends with 1024, and ResNet layer/block 4 uses 512 channels and ends with 2048. The ResNet architecture follows the pattern that the output of each ResNet layer/block has 4x the channels of its first two convs (the ```output_channels``` argument in the code below). Consequently, the tensor actually entering ResNet layers 2, 3 and 4 is the previous layer's output, with 256, 512 and 1024 channels respectively.
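The channel and size bookkeeping can be tabulated in a few lines. This is a sketch, not part of the model: it separates the width of the first two convs in a block from the number of channels entering each ResNet layer, and it assumes the strides (1, 2, 2, 2) and the 64-channel, 32 x 32 stem output used by the CIFAR-10 model in this notebook.

```python
EXPANSION = 4
widths = [64, 128, 256, 512]     # channels of the first two convs in each block
strides = [1, 2, 2, 2]           # stride of the first block in each ResNet layer

in_channels, size = 64, 32       # stem output for a 32 x 32 CIFAR-10 image
stages = []
for width, stride in zip(widths, strides):
    out_channels = width * EXPANSION
    size = size // stride        # exact for the even sizes used here
    stages.append((in_channels, width, out_channels, size))
    in_channels = out_channels

for idx, (c_in, width, c_out, hw) in enumerate(stages, start=1):
    print(f"layer{idx}: in = {c_in}, width = {width}, out = {c_out}, size = {hw} x {hw}")
```

The last ResNet layer therefore produces 2048 channels, which matches the `512 * 4` input features of the final dense layer.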

In [1]:
# Specify GPU to be used-
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES = 2

env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=2


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms


from tqdm import tqdm
from tqdm import trange
import matplotlib.pyplot as plt
import numpy as np
import os

In [3]:
# Device configuration-
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"currently available device: {device}")

currently available device: cuda


In [4]:
# Get number of GPUs-
print(f"Number of available GPUs = {torch.cuda.device_count()}")

Number of available GPUs = 1


In [5]:
# Check the current GPU-
print(f"Current GPU = {torch.cuda.current_device()}")

Current GPU = 0


In [6]:
# Get the name of the current GPU
print(f"Name of current GPU = {torch.cuda.get_device_name(torch.cuda.current_device())}")

Name of current GPU = Quadro M6000


In [7]:
# Is PyTorch using a GPU?
print(f"PyTorch using a GPU? {torch.cuda.is_available()}")

PyTorch using a GPU? True


In [8]:
print(f"PyTorch version: {torch.__version__}")

PyTorch version: 1.8.0


In [9]:
# Hyper-parameters-
num_epochs = 65
batch_size = 128
learning_rate = 0.01

In [10]:
print(f"number of epochs for training = {num_epochs} with default LR = {learning_rate}")

number of epochs for training = 65 with default LR = 0.01


In [11]:
# Define transformations for training and test sets-
transform_train = transforms.Compose(
    [
      transforms.RandomCrop(32, padding = 4),
      transforms.RandomHorizontalFlip(),
      transforms.ToTensor(),
      transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
     ]
     )

transform_test = transforms.Compose(
    [
      transforms.ToTensor(),
      transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
     ]
     )
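As a rough illustration of what the normalization step does to a single pixel, the sketch below applies `(value - mean) / std` per channel in plain Python. `normalize_pixel` is a hypothetical helper; the statistics are the ones passed to `transforms.Normalize` above, and pixel values are assumed to be already scaled to [0, 1], as `transforms.ToTensor` does.

```python
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb):
    """Normalize one RGB pixel channel by channel."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


black = normalize_pixel((0.0, 0.0, 0.0))
white = normalize_pixel((1.0, 1.0, 1.0))

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalization the inputs are roughly centred around zero in every channel instead of lying in [0, 1].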

In [12]:
# Load dataset-
train_dataset = torchvision.datasets.CIFAR10(
        root = './data', train = True,
        download = True, transform = transform_train
        )

test_dataset = torchvision.datasets.CIFAR10(
        root = './data', train = False,
        download = True, transform = transform_test
        )

Files already downloaded and verified
Files already downloaded and verified


In [13]:
print(f"len(train_dataset) = {len(train_dataset)} & len(test_dataset) = {len(test_dataset)}")

len(train_dataset) = 50000 & len(test_dataset) = 10000


In [14]:
# Create training and testing loaders-
train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size = batch_size,
        shuffle = True
        )

test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size = batch_size,
        shuffle = False
        )

In [15]:
print(f"len(train_loader) = {len(train_loader)} & len(test_loader) = {len(test_loader)}")

len(train_loader) = 391 & len(test_loader) = 79


In [16]:
# Sanity check-
len(train_dataset) / batch_size, len(test_dataset) / batch_size

(390.625, 78.125)
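The fractional values above explain the loader lengths: the last, partial batch is kept, so the number of batches is rounded up. A small sketch of that arithmetic (`batches_per_epoch` is a hypothetical helper, and it assumes the `DataLoader` default of not dropping the last batch):

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches and size of the final batch when the partial batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


train_batches = batches_per_epoch(50000, 128)
test_batches = batches_per_epoch(10000, 128)
print(train_batches, test_batches)
```

The final training batch holds 80 images and the final test batch 16, which is why the loss is later weighted by batch size before averaging.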

In [17]:
# Sanity check-
images, labels = next(iter(train_loader))

images.size(), labels.shape

(torch.Size([128, 3, 32, 32]), torch.Size([128]))

### Define _ResNet-50_ architecture:

#### Basic _ResNet_ block

Note that a _conv_ layer followed by a _batch norm_ layer should __not__ have a _bias_ term for each filter/kernel, as it's redundant: batch norm subtracts the per-channel mean, which cancels any constant bias. For the mathematical proof, refer [here](https://github.com/arjun-majumdar/CNN_Classifications/blob/master/BatchNorm_bias_cancellation.pdf).
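The cancellation can also be checked numerically. The sketch below is plain Python rather than PyTorch: `batch_norm` is a hypothetical helper that normalizes the values of one channel without the learnable scale and shift, and the numbers are made up for illustration.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one channel: subtract the batch mean, divide by the batch std."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


conv_outputs = [0.3, -1.2, 2.5, 0.7]   # made-up conv outputs for one channel
bias = 4.0                              # constant bias added by the conv layer

without_bias = batch_norm(conv_outputs)
with_bias = batch_norm([v + bias for v in conv_outputs])

max_difference = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(max_difference < 1e-9)
```

Adding the bias shifts the batch mean by the same amount and leaves the variance untouched, so the normalized outputs agree up to floating point error.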

```identity_downsample``` is a conv layer applied to the skip connection whenever the block changes the spatial size or the number of channels of its input. In that case the identity must be adapted to the new shape so that it can still be added to the output of the block's conv layers.
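The rule for when the skip connection needs this projection can be written as a small predicate. This is a sketch: `needs_identity_downsample` is a hypothetical helper that mirrors the condition used when the ResNet layers are built in this notebook, with `width` standing for the `output_channels` argument of the block.

```python
def needs_identity_downsample(input_channels, width, stride, expansion=4):
    """True when the identity's shape would not match the block output."""
    return stride != 1 or input_channels != width * expansion


# First block of ResNet layer 1: 64 channels in, 256 out, same spatial size-
first_block_layer1 = needs_identity_downsample(64, 64, stride=1)

# Remaining blocks of ResNet layer 1: 256 channels in, 256 out-
later_block_layer1 = needs_identity_downsample(256, 64, stride=1)

# First block of ResNet layer 2: spatial size is halved-
first_block_layer2 = needs_identity_downsample(256, 128, stride=2)

print(first_block_layer1, later_block_layer1, first_block_layer2)
```

Only the first block of each ResNet layer changes the shape, so only that block receives a projection on its skip connection.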

In [18]:
class ResNet_block(nn.Module):
    def __init__(self, input_channels, output_channels, identity_downsample = None, stride = 1,
                dropout = 0.2):
        super(ResNet_block, self).__init__()

        # number of channels after a block is 4x 'output_channels'-
        self.expansion = 4

        self.conv1 = nn.Conv2d(
            in_channels = input_channels, out_channels = output_channels,
            kernel_size = 1, stride = 1,
            padding = 0, bias = False)
        self.bn1 = nn.BatchNorm2d(num_features = output_channels)
        
        self.dropout = nn.Dropout(p = dropout)
        
        self.conv2 = nn.Conv2d(
            in_channels = output_channels, out_channels = output_channels,
            kernel_size = 3, stride = stride,
            padding = 1, bias = False)
        self.bn2 = nn.BatchNorm2d(num_features = output_channels)
        
        self.conv3 = nn.Conv2d(
            in_channels = output_channels, out_channels = output_channels * self.expansion,
            kernel_size = 1, stride = 1,
            padding = 0, bias = False)
        self.bn3 = nn.BatchNorm2d(num_features = output_channels * self.expansion)
        
        self.relu = nn.ReLU()
        
        # A conv layer-
        self.identity_downsample = identity_downsample
        

    def forward(self, x):
        identity = x
        
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        x = self.conv3(x)
        x = self.bn3(x)
        
        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)
        
        x += identity
        x = self.relu(x)
        x = self.dropout(x)
        
        return x

    
    def initialize_weights(self):
        for m in self.modules():
            # print(m)
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight)

                '''
                # Do not initialize bias (due to batchnorm)-
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
                '''
            
            elif isinstance(m, nn.BatchNorm2d):
                # Standard initialization for batch normalization-
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight)
                nn.init.constant_(m.bias, 0)

        

### ResNet-50/101/152 architecture is _slightly_ changed for CIFAR-10 dataset instead of ImageNet dataset.

In [19]:
class ResNet(nn.Module):
    '''
    layers - a Python3 list specifying the number of times to use 'ResNet_block'
    '''
    
    def __init__(self, ResNet_block, layers, image_channels = 3, num_classes = 10):
        super(ResNet, self).__init__()
        
        self.input_channels = 64
        
        '''
        # For ImageNet-
        self.conv1 = nn.Conv2d(
            in_channels = image_channels, out_channels = 64,
            kernel_size = 7, stride = 2,
            padding = 3, bias = False)
        '''
        self.conv1 = nn.Conv2d(
            in_channels = image_channels, out_channels = 64,
            kernel_size = 3, stride = 1,
            padding = 1, bias = False)
        self.bn1 = nn.BatchNorm2d(num_features = 64)
        self.relu = nn.ReLU()
        '''
        # For ImageNet-
        self.maxpool = nn.MaxPool2d(
            kernel_size = 3, stride = 2,
            padding = 1
            )
        '''
        
        # ResNet blocks-
        self.layer1 = self._make_layer(ResNet_block, layers[0], output_channels = 64, stride = 1)
        self.layer2 = self._make_layer(ResNet_block, layers[1], output_channels = 128, stride = 2)
        self.layer3 = self._make_layer(ResNet_block, layers[2], output_channels = 256, stride = 2)
        self.layer4 = self._make_layer(ResNet_block, layers[3], output_channels = 512, stride = 2)
        
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)
    
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        # x = self.maxpool(x)  # For ImageNet
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        x = self.avgpool(x)
        # Reshape before passing to dense layer-
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        
        return x


    def _make_layer(self, ResNet_block, num_residual_blocks, output_channels, stride):
        identity_downsample = None
        layers = []
        
        '''
        When do we actually need an identity_downsample, i.e., a conv layer that changes the identity?
        1. Either we change the input size (stride != 1)
        2. Or we change the number of channels (input_channels != output_channels * 4)
        '''
        if stride != 1 or self.input_channels != output_channels * 4:
            identity_downsample = nn.Sequential(
                nn.Conv2d(
                    in_channels = self.input_channels, out_channels = 4 * output_channels,
                    kernel_size = 1, stride = stride,
                    bias = False),
                nn.BatchNorm2d(num_features = output_channels * 4)
                )
        
        # This is the layer that changes the number of channels-
        layers.append(ResNet_block(self.input_channels, output_channels, identity_downsample, stride))
        # After this first block, the number of channels is going to be changed
        
        self.input_channels = output_channels * 4       # 64 x 4 = 256
        # At the end of the first block, the output = 256
        
        for i in range(num_residual_blocks - 1):
            layers.append(ResNet_block(self.input_channels, output_channels, dropout = 0.2))
        
        
        return (nn.Sequential(*layers))
        # *layers unpacks the list so that PyTorch knows that each comes after the other
        


In [25]:
def ResNet50(img_channels, op_neurons = 1000):
    # Function to define ResNet-50 architecture
    return ResNet(ResNet_block, [3, 4, 6, 3], img_channels, op_neurons)

In [26]:
def ResNet101(img_channels, op_neurons = 1000):
    # Function to define ResNet-101 architecture
    return ResNet(ResNet_block, [3, 4, 23, 3], img_channels, op_neurons)

In [27]:
def ResNet152(img_channels, op_neurons = 1000):
    # Function to define ResNet-152 architecture
    return ResNet(ResNet_block, [3, 8, 36, 3], img_channels, op_neurons)
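The numbers in the model names follow from these block lists. A short sketch of the count, assuming the usual convention that only the stem conv, the three convs in each block and the final dense layer are counted, while the projection convs on the skip connections are not. `resnet_depth` is a hypothetical helper.

```python
def resnet_depth(blocks_per_layer, convs_per_block=3):
    """Count weighted layers: stem conv + convs inside the blocks + dense layer."""
    return 1 + convs_per_block * sum(blocks_per_layer) + 1


depths = {
    "ResNet-50": resnet_depth([3, 4, 6, 3]),
    "ResNet-101": resnet_depth([3, 4, 23, 3]),
    "ResNet-152": resnet_depth([3, 8, 36, 3]),
}
print(depths)
```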

In [23]:
def test(model):
    # Three images of (32, 32, 3). Number of in_channels = 3-
    # Create the input on the same device as the model's parameters-
    x = torch.randn(3, 3, 32, 32).to(next(model.parameters()).device)
    
    y = model(x)
    print(f"Output.shape: {y.shape}")
    
    return None


In [28]:
# Initialize a ResNet-50 model-
model = ResNet50(img_channels = 3, op_neurons = 10)

In [29]:
# Sanity check-
model(images).shape

torch.Size([128, 10])

In [30]:
# Sanity check-
test(model)

Output.shape: torch.Size([3, 10])


In [31]:
print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (layer1): Sequential(
    (0): ResNet_block(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (dropout): Dropout(p=0.2, inplace=False)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (identity_downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-0

In [32]:
# Place model on GPU-
model.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (layer1): Sequential(
    (0): ResNet_block(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (dropout): Dropout(p=0.2, inplace=False)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (identity_downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-0

In [33]:
# Count number of layer-wise parameters and total parameters-
tot_params = 0
for param in model.parameters():
    print(f"layer.shape = {param.shape} has {param.nelement()} parameters")
    tot_params += param.nelement()

layer.shape = torch.Size([64, 3, 3, 3]) has 1728 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64, 64, 1, 1]) has 4096 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64, 64, 3, 3]) has 36864 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([256, 64, 1, 1]) has 16384 parameters
layer.shape = torch.Size([256]) has 256 parameters
layer.shape = torch.Size([256]) has 256 parameters
layer.shape = torch.Size([256, 64, 1, 1]) has 16384 parameters
layer.shape = torch.Size([256]) has 256 parameters
layer.shape = torch.Size([256]) has 256 parameters
layer.shape = torch.Size([64, 256, 1, 1]) has 16384 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64]) has 64 parameters
layer.shape = torch.Size([64, 64

In [34]:
print(f"Total number of parameters in ResNet-50 CNN for CIFAR-10 = {tot_params}")

Total number of parameters in ResNet-50 CNN for CIFAR-10 = 23520842
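The total can be cross-checked by hand from the layer shapes. The following is a sketch with hypothetical helpers (`bottleneck_params`, `resnet_params`). It assumes the architecture defined above: conv layers without bias, two parameters per batch norm channel, a 3 x 3 stem with 64 kernels, and a projection on the skip connection of the first block of every ResNet layer.

```python
def bottleneck_params(in_channels, width, downsample, expansion=4):
    """Parameters of one block: three convs, three batch norms, optional projection."""
    out_channels = width * expansion
    convs = in_channels * width + 9 * width * width + width * out_channels
    batch_norms = 2 * width + 2 * width + 2 * out_channels
    shortcut = in_channels * out_channels + 2 * out_channels if downsample else 0
    return convs + batch_norms + shortcut


def resnet_params(blocks_per_layer, image_channels=3, num_classes=10):
    """Total parameters of the CIFAR-10 variant defined in this notebook."""
    total = 64 * image_channels * 3 * 3 + 2 * 64      # stem conv + batch norm
    in_channels = 64
    for width, num_blocks in zip([64, 128, 256, 512], blocks_per_layer):
        total += bottleneck_params(in_channels, width, downsample=True)
        in_channels = width * 4
        total += (num_blocks - 1) * bottleneck_params(in_channels, width, downsample=False)
    total += in_channels * num_classes + num_classes  # dense layer weights + bias
    return total


resnet50_total = resnet_params([3, 4, 6, 3])
print(resnet50_total)
```

The result agrees with the 23520842 parameters counted from `model.parameters()`.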


In [None]:
'''
# Print layer names-
for layer in model.state_dict().keys():
    print(f"{layer} has dimension = {model.state_dict()[layer].shape}")
'''

In [35]:
# Save random initial weights-
torch.save(model.state_dict(), 'ResNet50_random_weights.pth')

In [21]:
# Load randomly initialised weights-
# model.load_state_dict(torch.load('ResNet50_random_weights.pth'))

<All keys matched successfully>

### Train model with _learning rate scheduler_

- Training dataset = 50000, batch size = 128, number of training steps/iterations per epoch = ceil(50000 / 128) = 391

- Initial learning rate warmup: 391 x 10 = 3910 steps or, 10 epochs, during which LR increases linearly from 0 to 0.1

- After warmup, until 25th epoch or, 9775 steps use LR = 0.1

- From 26th epoch until 40th epoch or, 15640 steps use LR = 0.01

- From 41st epoch until 50th epoch or, 19550 steps use LR = 0.001

- From 51st epoch onwards use LR = 0.0001
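The step counts in this list are epoch numbers multiplied by the 391 steps per epoch. A minimal sketch of that conversion (the variable names are chosen for illustration):

```python
steps_per_epoch = 391
warmup_epochs = 10
decay_epochs = [25, 40, 50]    # last epoch trained at LR = 0.1, 0.01 and 0.001

warmup_steps = warmup_epochs * steps_per_epoch
boundaries = [epoch * steps_per_epoch for epoch in decay_epochs]
print(warmup_steps, boundaries)
```

If the batch size or dataset size changes, `steps_per_epoch` changes with it and the boundaries need to be recomputed.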

In [36]:
boundaries = [9775, 15640, 19550]
values = [0.1, 0.01, 0.001, 0.0001]

In [37]:
# Define loss function and optimizer-
loss = nn.CrossEntropyLoss()

# optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.0, momentum = 0.9, weight_decay = 5e-4)

In [38]:
# Sanity check-
optimizer.param_groups[0]['lr']

0.0

In [39]:
def decay_function(step, boundaries = [9775, 15640, 19550], values = [0.1, 0.01, 0.001, 0.0001]):
    '''
    1 epoch has 391 steps/iterations using batch size used above.
    
    Until 25th epoch, or 25 x 391 = 9775 steps, use LR = 0.1
    From 26th epoch until 40th epoch, or 15640 steps use LR = 0.01
    From 41st epoch until 50th epoch or, 19550 steps use LR = 0.001
    From 51st epoch onwards use LR = 0.0001
    '''
    
    for idx, bound in enumerate(boundaries):
        if step < bound:
            return values[idx]

    return values[-1]


In [40]:
class schedule():
    '''
    Linear warmup from 0 to 'initial_learning_rate' over 'warmup_steps' steps,
    followed by an optional step-indexed decay function.
    '''

    def __init__(self, initial_learning_rate = 0.1, warmup_steps = 1000, decay_func = None):
        self.initial_learning_rate = initial_learning_rate
        self.warmup_steps = warmup_steps
        self.decay_func = decay_func
        self.warmup_step_size = initial_learning_rate / warmup_steps
        self.current_lr = 0

    def get_lr(self, step):
        if step <= self.warmup_steps:
            # Warmup: LR grows linearly with the step count (0 at step 0)-
            self.current_lr = step * self.warmup_step_size
        elif self.decay_func:
            self.current_lr = self.decay_func(step)
        # Without a decay function, the LR keeps its post-warmup value
        # (previously this case returned None)

        return self.current_lr


In [41]:
# 391 x 10 = 3910 steps (or, 10 epochs) is learning rate warmup
custom_lr_scheduler = schedule(
    initial_learning_rate = 0.1, warmup_steps = 3910,
    decay_func = decay_function
)
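To see which learning rate the schedule yields at the end of selected epochs, the warmup and decay rules can be combined into one stateless function. This is a sketch, not the notebook's scheduler: `lr_at_step` is a hypothetical helper that assumes the same peak LR, warmup length, boundaries and values as configured above.

```python
def lr_at_step(step, peak_lr=0.1, warmup_steps=3910,
               boundaries=(9775, 15640, 19550),
               values=(0.1, 0.01, 0.001, 0.0001)):
    """Learning rate for a global step: linear warmup, then step decay."""
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    for bound, value in zip(boundaries, values):
        if step < bound:
            return value
    return values[-1]


steps_per_epoch = 391
for epoch in (1, 2, 10, 25, 26, 41, 51):
    # Global step index of the final batch of this epoch (steps start at 0)-
    last_step = epoch * steps_per_epoch - 1
    print(f"epoch {epoch}: LR = {lr_at_step(last_step):.4f}")
```

During warmup the LR rises by about 0.01 per epoch, it stays at 0.1 through epoch 25, and it drops by a factor of 10 at each boundary.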

In [42]:
step = 0

In [43]:
def train_model_progress(model, train_loader):
    '''
    Function to perform one epoch of training by using 'train_loader'.
    Returns loss and number of correct predictions for this epoch.
    '''
    running_loss = 0.0
    running_corrects = 0.0
    
    model.train()
    
    with tqdm(train_loader, unit = 'batch') as tepoch:
        for images, labels in tepoch:
            tepoch.set_description(f"Training: ")
            
            images = images.to(device)
            labels = labels.to(device)
            
            # Get model predictions-
            outputs = model(images)
            
            # Compute loss-
            J = loss(outputs, labels)
            
            # Empty accumulated gradients-
            optimizer.zero_grad()
            
            # Perform backprop-
            J.backward()
            
            # Update parameters-
            optimizer.step()
            
            # Set the learning rate that the next parameter update will use-
            global step
            optimizer.param_groups[0]['lr'] = custom_lr_scheduler.get_lr(step)

            step += 1
            
            # Compute model's performance statistics-
            running_loss += J.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            running_corrects += torch.sum(predicted == labels)
            
            tepoch.set_postfix(
                loss = running_loss / len(train_dataset),
                accuracy = (running_corrects.double().cpu().numpy() / len(train_dataset)) * 100
            )
            
    
    train_loss = running_loss / len(train_dataset)
    train_acc = (running_corrects.double() / len(train_dataset)) * 100
    

    # return running_loss, running_corrects
    return train_loss, train_acc.cpu().numpy()
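The training function multiplies each batch loss by the batch size before dividing by the dataset size. The sketch below shows why, assuming the loss function returns the mean over the batch (the default for `nn.CrossEntropyLoss`); the loss values are made up for illustration.

```python
# Hypothetical (mean batch loss, batch size) pairs: one full batch, one partial batch-
batches = [(2.0, 128), (1.0, 80)]

num_samples = sum(size for _, size in batches)

# Weighting by batch size recovers the mean loss per sample-
weighted_loss = sum(loss * size for loss, size in batches) / num_samples

# Averaging the batch means directly over-weights the smaller batch-
naive_loss = sum(loss for loss, _ in batches) / len(batches)

print(f"weighted = {weighted_loss:.4f}, naive = {naive_loss:.4f}")
```

With 391 batches per epoch the difference is small, but the weighted form is the exact per-sample average.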
    


In [44]:
def test_model_progress(model, test_loader):
    total = 0.0
    correct = 0.0
    running_loss_val = 0.0

    # Set model to evaluation mode once, before iterating over the batches-
    model.eval()

    with torch.no_grad():
        with tqdm(test_loader, unit = 'batch') as tepoch:
            for images, labels in tepoch:
                tepoch.set_description(f"Validation: ")
                
                images = images.to(device)
                labels = labels.to(device)
            
                # Predict using trained model-
                outputs = model(images)
                _, y_pred = torch.max(outputs, 1)
                
                # Compute validation loss-
                J_val = loss(outputs, labels)
                
                running_loss_val += J_val.item() * labels.size(0)
    
                # Total number of labels-
                total += labels.size(0)

                # Total number of correct predictions-
                correct += (y_pred == labels).sum()
                
                tepoch.set_postfix(
                    val_loss = running_loss_val / len(test_dataset),
                    val_acc = 100 * (correct.cpu().numpy() / total)
                )
            
        
    # return (running_loss_val, correct, total)
    val_loss = running_loss_val / len(test_dataset)
    val_acc = (correct / total) * 100

    return val_loss, val_acc.cpu().numpy()


In [45]:
training_history_lr_scheduler = {}

In [46]:
# Initialize parameters for Early Stopping manual implementation-
best_val_loss = 100
loc_patience = 0

In [47]:
for epoch in range(num_epochs):

    train_loss, train_acc = train_model_progress(model, train_loader)
    val_loss, val_acc = test_model_progress(model, test_loader)
    
    print(f"\nepoch: {epoch + 1} training loss = {train_loss:.4f}, "
          f"training accuracy = {train_acc:.2f}%, val_loss = {val_loss:.4f}"
          f", val_accuracy = {val_acc:.2f}% & "
          f"LR = {optimizer.param_groups[0]['lr']:.4f}\n")
    
    training_history_lr_scheduler[epoch + 1] = {
        'loss': train_loss, 'acc': train_acc,
        'val_loss': val_loss, 'val_acc': val_acc,
        'lr': optimizer.param_groups[0]['lr']
    }

    
    # Save best weights achieved until now-
    if (val_loss < best_val_loss):    
        # update 'best_val_loss' variable to lowest loss encountered so far-
        best_val_loss = val_loss

        print(f"Saving model with lowest val_loss = {val_loss:.4f}\n")
        
        # Save trained model with lowest validation loss so far-
        torch.save(model.state_dict(), "ResNet50_lr_scheduler_best_model.pth")
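The checkpointing rule in this loop can be replayed on its own. The sketch below uses the validation losses of the first 10 epochs, copied from the log printed by this cell, and records the epochs at which a new best model would be saved (`saved_epochs` is a name introduced for illustration).

```python
# Validation losses of epochs 1 to 10, copied from the training log-
val_losses = [5.2072, 3.2678, 2.0009, 1.4885, 1.7951,
              1.1412, 1.2434, 1.0091, 0.9949, 1.0925]

best_val_loss = 100
saved_epochs = []
for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best_val_loss:
        # New lowest validation loss: this is where the weights get saved-
        best_val_loss = val_loss
        saved_epochs.append(epoch)

print(saved_epochs, best_val_loss)
```

Epochs 5, 7 and 10 do not improve on the best loss so far, so no checkpoint is written for them.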
    


Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=17.6, loss=2.27]  
Validation: : 100%|██████████| 79/79 [00:14<00:00,  5.58batch/s, val_acc=11, val_loss=5.21]   



epoch: 1 training loss = 2.2709, training accuracy = 17.58%, val_loss = 5.2072, val_accuracy = 10.99% & LR = 0.0100

Saving model with lowest val_loss = 5.2072



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.62batch/s, accuracy=25.3, loss=1.96]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.94batch/s, val_acc=14.6, val_loss=3.27] 



epoch: 2 training loss = 1.9647, training accuracy = 25.33%, val_loss = 3.2678, val_accuracy = 14.59% & LR = 0.0200

Saving model with lowest val_loss = 3.2678



Training: : 100%|██████████| 391/391 [03:56<00:00,  1.66batch/s, accuracy=36, loss=1.7]     
Validation: : 100%|██████████| 79/79 [00:13<00:00,  5.99batch/s, val_acc=33.1, val_loss=2]    



epoch: 3 training loss = 1.6968, training accuracy = 36.03%, val_loss = 2.0009, val_accuracy = 33.13% & LR = 0.0300

Saving model with lowest val_loss = 2.0009



Training: : 100%|██████████| 391/391 [04:05<00:00,  1.59batch/s, accuracy=45.7, loss=1.46]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.01batch/s, val_acc=47, val_loss=1.49]   



epoch: 4 training loss = 1.4612, training accuracy = 45.71%, val_loss = 1.4885, val_accuracy = 46.99% & LR = 0.0400

Saving model with lowest val_loss = 1.4885



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.60batch/s, accuracy=52.6, loss=1.3]   
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.02batch/s, val_acc=41.9, val_loss=1.8]  



epoch: 5 training loss = 1.3035, training accuracy = 52.57%, val_loss = 1.7951, val_accuracy = 41.85% & LR = 0.0500



Training: : 100%|██████████| 391/391 [03:48<00:00,  1.71batch/s, accuracy=58.4, loss=1.16]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.19batch/s, val_acc=61.3, val_loss=1.14] 



epoch: 6 training loss = 1.1556, training accuracy = 58.37%, val_loss = 1.1412, val_accuracy = 61.28% & LR = 0.0600

Saving model with lowest val_loss = 1.1412



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=61.1, loss=1.08]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.10batch/s, val_acc=58.4, val_loss=1.24] 



epoch: 7 training loss = 1.0770, training accuracy = 61.08%, val_loss = 1.2434, val_accuracy = 58.40% & LR = 0.0700



Training: : 100%|██████████| 391/391 [03:55<00:00,  1.66batch/s, accuracy=63.9, loss=1.01]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.99batch/s, val_acc=65.2, val_loss=1.01] 



epoch: 8 training loss = 1.0135, training accuracy = 63.86%, val_loss = 1.0091, val_accuracy = 65.16% & LR = 0.0800

Saving model with lowest val_loss = 1.0091



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=65.8, loss=0.965] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.24batch/s, val_acc=65.5, val_loss=0.995]



epoch: 9 training loss = 0.9653, training accuracy = 65.81%, val_loss = 0.9949, val_accuracy = 65.49% & LR = 0.0900

Saving model with lowest val_loss = 0.9949



Training: : 100%|██████████| 391/391 [03:56<00:00,  1.65batch/s, accuracy=66.9, loss=0.931] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.14batch/s, val_acc=62.5, val_loss=1.09] 



epoch: 10 training loss = 0.9309, training accuracy = 66.95%, val_loss = 1.0925, val_accuracy = 62.47% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.60batch/s, accuracy=68.6, loss=0.893] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.03batch/s, val_acc=67.7, val_loss=0.934]



epoch: 11 training loss = 0.8933, training accuracy = 68.60%, val_loss = 0.9335, val_accuracy = 67.68% & LR = 0.1000

Saving model with lowest val_loss = 0.9335



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.61batch/s, accuracy=70.2, loss=0.851] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.01batch/s, val_acc=69.3, val_loss=0.928]



epoch: 12 training loss = 0.8506, training accuracy = 70.19%, val_loss = 0.9280, val_accuracy = 69.31% & LR = 0.1000

Saving model with lowest val_loss = 0.9280



Training: : 100%|██████████| 391/391 [04:01<00:00,  1.62batch/s, accuracy=71.3, loss=0.825] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.10batch/s, val_acc=68.2, val_loss=0.988]



epoch: 13 training loss = 0.8252, training accuracy = 71.30%, val_loss = 0.9880, val_accuracy = 68.15% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.61batch/s, accuracy=72.4, loss=0.789] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.04batch/s, val_acc=68.1, val_loss=0.929]



epoch: 14 training loss = 0.7893, training accuracy = 72.40%, val_loss = 0.9287, val_accuracy = 68.10% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=73.1, loss=0.767] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.09batch/s, val_acc=70.7, val_loss=0.867]



epoch: 15 training loss = 0.7672, training accuracy = 73.08%, val_loss = 0.8673, val_accuracy = 70.65% & LR = 0.1000

Saving model with lowest val_loss = 0.8673



Training: : 100%|██████████| 391/391 [03:53<00:00,  1.67batch/s, accuracy=74.1, loss=0.743] 
Validation: : 100%|██████████| 79/79 [00:11<00:00,  6.90batch/s, val_acc=75.1, val_loss=0.727] 



epoch: 16 training loss = 0.7430, training accuracy = 74.07%, val_loss = 0.7271, val_accuracy = 75.11% & LR = 0.1000

Saving model with lowest val_loss = 0.7271



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=74.9, loss=0.728] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.22batch/s, val_acc=63.8, val_loss=1.2]  



epoch: 17 training loss = 0.7285, training accuracy = 74.88%, val_loss = 1.1993, val_accuracy = 63.80% & LR = 0.1000



Training: : 100%|██████████| 391/391 [03:55<00:00,  1.66batch/s, accuracy=75.2, loss=0.713] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.05batch/s, val_acc=72.1, val_loss=0.797] 



epoch: 18 training loss = 0.7132, training accuracy = 75.19%, val_loss = 0.7974, val_accuracy = 72.09% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=75.8, loss=0.698] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.96batch/s, val_acc=74, val_loss=0.76]    



epoch: 19 training loss = 0.6977, training accuracy = 75.78%, val_loss = 0.7602, val_accuracy = 74.02% & LR = 0.1000



Training: : 100%|██████████| 391/391 [03:56<00:00,  1.65batch/s, accuracy=76.3, loss=0.685] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.97batch/s, val_acc=76.7, val_loss=0.669] 



epoch: 20 training loss = 0.6850, training accuracy = 76.33%, val_loss = 0.6694, val_accuracy = 76.71% & LR = 0.1000

Saving model with lowest val_loss = 0.6694



Training: : 100%|██████████| 391/391 [03:54<00:00,  1.67batch/s, accuracy=76.7, loss=0.672] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.10batch/s, val_acc=75.5, val_loss=0.726] 



epoch: 21 training loss = 0.6720, training accuracy = 76.72%, val_loss = 0.7263, val_accuracy = 75.49% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.62batch/s, accuracy=77.2, loss=0.66]  
Validation: : 100%|██████████| 79/79 [00:12<00:00,  6.54batch/s, val_acc=72.3, val_loss=0.818]



epoch: 22 training loss = 0.6601, training accuracy = 77.18%, val_loss = 0.8185, val_accuracy = 72.30% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=77.6, loss=0.65]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.11batch/s, val_acc=76.3, val_loss=0.699] 



epoch: 23 training loss = 0.6498, training accuracy = 77.64%, val_loss = 0.6989, val_accuracy = 76.26% & LR = 0.1000



Training: : 100%|██████████| 391/391 [03:56<00:00,  1.65batch/s, accuracy=77.8, loss=0.643] 
Validation: : 100%|██████████| 79/79 [00:14<00:00,  5.49batch/s, val_acc=71.9, val_loss=0.844]



epoch: 24 training loss = 0.6427, training accuracy = 77.81%, val_loss = 0.8439, val_accuracy = 71.88% & LR = 0.1000



Training: : 100%|██████████| 391/391 [04:05<00:00,  1.59batch/s, accuracy=78, loss=0.639]   
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.02batch/s, val_acc=77.6, val_loss=0.673] 



epoch: 25 training loss = 0.6394, training accuracy = 77.97%, val_loss = 0.6728, val_accuracy = 77.59% & LR = 0.1000



Training: : 100%|██████████| 391/391 [03:55<00:00,  1.66batch/s, accuracy=84.4, loss=0.456] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.94batch/s, val_acc=87, val_loss=0.378]   



epoch: 26 training loss = 0.4561, training accuracy = 84.43%, val_loss = 0.3778, val_accuracy = 86.98% & LR = 0.0100

Saving model with lowest val_loss = 0.3778



Training: : 100%|██████████| 391/391 [03:56<00:00,  1.66batch/s, accuracy=86.1, loss=0.401]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.10batch/s, val_acc=87, val_loss=0.377]   



epoch: 27 training loss = 0.4015, training accuracy = 86.09%, val_loss = 0.3767, val_accuracy = 87.00% & LR = 0.0100

Saving model with lowest val_loss = 0.3767



Training: : 100%|██████████| 391/391 [04:07<00:00,  1.58batch/s, accuracy=86.8, loss=0.386]  
Validation: : 100%|██████████| 79/79 [00:11<00:00,  6.60batch/s, val_acc=88, val_loss=0.354]   



epoch: 28 training loss = 0.3863, training accuracy = 86.77%, val_loss = 0.3541, val_accuracy = 87.96% & LR = 0.0100

Saving model with lowest val_loss = 0.3541



Training: : 100%|██████████| 391/391 [03:54<00:00,  1.67batch/s, accuracy=87.2, loss=0.37]  
Validation: : 100%|██████████| 79/79 [00:11<00:00,  6.90batch/s, val_acc=88.5, val_loss=0.337] 



epoch: 29 training loss = 0.3701, training accuracy = 87.17%, val_loss = 0.3368, val_accuracy = 88.47% & LR = 0.0100

Saving model with lowest val_loss = 0.3368



Training: : 100%|██████████| 391/391 [04:11<00:00,  1.55batch/s, accuracy=87.5, loss=0.358]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.00batch/s, val_acc=88.6, val_loss=0.34]  



epoch: 30 training loss = 0.3576, training accuracy = 87.52%, val_loss = 0.3397, val_accuracy = 88.60% & LR = 0.0100



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=88, loss=0.346]    
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.24batch/s, val_acc=88.5, val_loss=0.34]  



epoch: 31 training loss = 0.3463, training accuracy = 88.05%, val_loss = 0.3404, val_accuracy = 88.47% & LR = 0.0100



Training: : 100%|██████████| 391/391 [03:53<00:00,  1.67batch/s, accuracy=88.1, loss=0.342] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.05batch/s, val_acc=88.7, val_loss=0.332] 



epoch: 32 training loss = 0.3423, training accuracy = 88.14%, val_loss = 0.3322, val_accuracy = 88.67% & LR = 0.0100

Saving model with lowest val_loss = 0.3322



Training: : 100%|██████████| 391/391 [04:12<00:00,  1.55batch/s, accuracy=88.4, loss=0.334]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.10batch/s, val_acc=88.6, val_loss=0.333] 



epoch: 33 training loss = 0.3344, training accuracy = 88.45%, val_loss = 0.3326, val_accuracy = 88.60% & LR = 0.0100



Training: : 100%|██████████| 391/391 [03:55<00:00,  1.66batch/s, accuracy=88.7, loss=0.325]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.03batch/s, val_acc=88, val_loss=0.344]   



epoch: 34 training loss = 0.3250, training accuracy = 88.69%, val_loss = 0.3442, val_accuracy = 88.05% & LR = 0.0100



Training: : 100%|██████████| 391/391 [03:53<00:00,  1.67batch/s, accuracy=88.9, loss=0.321]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.20batch/s, val_acc=89.3, val_loss=0.313] 



epoch: 35 training loss = 0.3205, training accuracy = 88.92%, val_loss = 0.3132, val_accuracy = 89.33% & LR = 0.0100

Saving model with lowest val_loss = 0.3132



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=89.1, loss=0.313] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.23batch/s, val_acc=89.2, val_loss=0.323] 



epoch: 36 training loss = 0.3129, training accuracy = 89.08%, val_loss = 0.3228, val_accuracy = 89.20% & LR = 0.0100



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=89.3, loss=0.311]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.98batch/s, val_acc=89.3, val_loss=0.313] 



epoch: 37 training loss = 0.3114, training accuracy = 89.26%, val_loss = 0.3133, val_accuracy = 89.31% & LR = 0.0100



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=89.4, loss=0.305] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.14batch/s, val_acc=89.6, val_loss=0.309] 



epoch: 38 training loss = 0.3047, training accuracy = 89.35%, val_loss = 0.3089, val_accuracy = 89.63% & LR = 0.0100

Saving model with lowest val_loss = 0.3089



Training: : 100%|██████████| 391/391 [03:48<00:00,  1.71batch/s, accuracy=89.5, loss=0.301]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.11batch/s, val_acc=89.6, val_loss=0.31]  



epoch: 39 training loss = 0.3011, training accuracy = 89.47%, val_loss = 0.3095, val_accuracy = 89.63% & LR = 0.0100



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.61batch/s, accuracy=89.8, loss=0.294]  
Validation: : 100%|██████████| 79/79 [00:11<00:00,  6.63batch/s, val_acc=89.6, val_loss=0.318] 



epoch: 40 training loss = 0.2938, training accuracy = 89.81%, val_loss = 0.3177, val_accuracy = 89.58% & LR = 0.0100



Training: : 100%|██████████| 391/391 [04:01<00:00,  1.62batch/s, accuracy=91, loss=0.256]    
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.13batch/s, val_acc=91, val_loss=0.269]   



epoch: 41 training loss = 0.2558, training accuracy = 91.04%, val_loss = 0.2693, val_accuracy = 90.98% & LR = 0.0010

Saving model with lowest val_loss = 0.2693



Training: : 100%|██████████| 391/391 [04:12<00:00,  1.55batch/s, accuracy=91.6, loss=0.241]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.14batch/s, val_acc=91, val_loss=0.268]   



epoch: 42 training loss = 0.2414, training accuracy = 91.65%, val_loss = 0.2677, val_accuracy = 91.02% & LR = 0.0010

Saving model with lowest val_loss = 0.2677



Training: : 100%|██████████| 391/391 [03:46<00:00,  1.73batch/s, accuracy=91.8, loss=0.235] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.98batch/s, val_acc=91.1, val_loss=0.27]  



epoch: 43 training loss = 0.2348, training accuracy = 91.78%, val_loss = 0.2700, val_accuracy = 91.09% & LR = 0.0010



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=91.9, loss=0.233]  
Validation: : 100%|██████████| 79/79 [00:13<00:00,  5.92batch/s, val_acc=91.3, val_loss=0.263] 



epoch: 44 training loss = 0.2335, training accuracy = 91.91%, val_loss = 0.2628, val_accuracy = 91.29% & LR = 0.0010

Saving model with lowest val_loss = 0.2628



Training: : 100%|██████████| 391/391 [03:57<00:00,  1.65batch/s, accuracy=92, loss=0.23]     
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.05batch/s, val_acc=91.1, val_loss=0.267] 



epoch: 45 training loss = 0.2302, training accuracy = 92.04%, val_loss = 0.2666, val_accuracy = 91.09% & LR = 0.0010



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.60batch/s, accuracy=92.2, loss=0.223] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.99batch/s, val_acc=91.3, val_loss=0.267] 



epoch: 46 training loss = 0.2233, training accuracy = 92.25%, val_loss = 0.2667, val_accuracy = 91.27% & LR = 0.0010



Training: : 100%|██████████| 391/391 [03:48<00:00,  1.71batch/s, accuracy=92.4, loss=0.221] 
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.03batch/s, val_acc=91.3, val_loss=0.264] 



epoch: 47 training loss = 0.2214, training accuracy = 92.39%, val_loss = 0.2636, val_accuracy = 91.29% & LR = 0.0010



Training: : 100%|██████████| 391/391 [03:55<00:00,  1.66batch/s, accuracy=92.3, loss=0.221]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.07batch/s, val_acc=91.4, val_loss=0.264] 



epoch: 48 training loss = 0.2208, training accuracy = 92.29%, val_loss = 0.2639, val_accuracy = 91.44% & LR = 0.0010



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.60batch/s, accuracy=92.3, loss=0.22]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.94batch/s, val_acc=91.3, val_loss=0.263] 



epoch: 49 training loss = 0.2203, training accuracy = 92.28%, val_loss = 0.2634, val_accuracy = 91.27% & LR = 0.0010



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=92.5, loss=0.216]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.07batch/s, val_acc=91.3, val_loss=0.265] 



epoch: 50 training loss = 0.2164, training accuracy = 92.50%, val_loss = 0.2651, val_accuracy = 91.26% & LR = 0.0010



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.61batch/s, accuracy=92.7, loss=0.211]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.06batch/s, val_acc=91.5, val_loss=0.259] 



epoch: 51 training loss = 0.2108, training accuracy = 92.70%, val_loss = 0.2588, val_accuracy = 91.48% & LR = 0.0001

Saving model with lowest val_loss = 0.2588



Training: : 100%|██████████| 391/391 [03:54<00:00,  1.67batch/s, accuracy=92.7, loss=0.211]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.96batch/s, val_acc=91.5, val_loss=0.261] 



epoch: 52 training loss = 0.2113, training accuracy = 92.65%, val_loss = 0.2608, val_accuracy = 91.52% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=92.7, loss=0.211] 
Validation: : 100%|██████████| 79/79 [00:13<00:00,  5.84batch/s, val_acc=91.5, val_loss=0.26]  



epoch: 53 training loss = 0.2114, training accuracy = 92.72%, val_loss = 0.2600, val_accuracy = 91.53% & LR = 0.0001



Training: : 100%|██████████| 391/391 [03:57<00:00,  1.64batch/s, accuracy=92.6, loss=0.211]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.01batch/s, val_acc=91.5, val_loss=0.257] 



epoch: 54 training loss = 0.2111, training accuracy = 92.62%, val_loss = 0.2568, val_accuracy = 91.50% & LR = 0.0001

Saving model with lowest val_loss = 0.2568



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=92.9, loss=0.208]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.10batch/s, val_acc=91.5, val_loss=0.259] 



epoch: 55 training loss = 0.2075, training accuracy = 92.85%, val_loss = 0.2588, val_accuracy = 91.52% & LR = 0.0001



Training: : 100%|██████████| 391/391 [03:49<00:00,  1.70batch/s, accuracy=92.8, loss=0.207]  
Validation: : 100%|██████████| 79/79 [00:13<00:00,  5.81batch/s, val_acc=91.5, val_loss=0.261] 



epoch: 56 training loss = 0.2067, training accuracy = 92.81%, val_loss = 0.2609, val_accuracy = 91.51% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=92.7, loss=0.209]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.98batch/s, val_acc=91.5, val_loss=0.264] 



epoch: 57 training loss = 0.2089, training accuracy = 92.74%, val_loss = 0.2642, val_accuracy = 91.52% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.60batch/s, accuracy=92.8, loss=0.207]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.15batch/s, val_acc=91.6, val_loss=0.26]  



epoch: 58 training loss = 0.2073, training accuracy = 92.82%, val_loss = 0.2596, val_accuracy = 91.57% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.61batch/s, accuracy=92.8, loss=0.207]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.06batch/s, val_acc=91.5, val_loss=0.262] 



epoch: 59 training loss = 0.2072, training accuracy = 92.80%, val_loss = 0.2619, val_accuracy = 91.47% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:04<00:00,  1.60batch/s, accuracy=92.8, loss=0.205]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.96batch/s, val_acc=91.5, val_loss=0.261] 



epoch: 60 training loss = 0.2051, training accuracy = 92.83%, val_loss = 0.2607, val_accuracy = 91.47% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.61batch/s, accuracy=92.9, loss=0.206]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.20batch/s, val_acc=91.5, val_loss=0.26]  



epoch: 61 training loss = 0.2064, training accuracy = 92.88%, val_loss = 0.2597, val_accuracy = 91.54% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:01<00:00,  1.62batch/s, accuracy=92.8, loss=0.208] 
Validation: : 100%|██████████| 79/79 [00:11<00:00,  6.83batch/s, val_acc=91.7, val_loss=0.26]  



epoch: 62 training loss = 0.2081, training accuracy = 92.84%, val_loss = 0.2598, val_accuracy = 91.69% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:02<00:00,  1.61batch/s, accuracy=92.9, loss=0.204]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.13batch/s, val_acc=91.5, val_loss=0.262] 



epoch: 63 training loss = 0.2042, training accuracy = 92.94%, val_loss = 0.2624, val_accuracy = 91.54% & LR = 0.0001



Training: : 100%|██████████| 391/391 [03:54<00:00,  1.67batch/s, accuracy=92.9, loss=0.204]  
Validation: : 100%|██████████| 79/79 [00:15<00:00,  5.05batch/s, val_acc=91.5, val_loss=0.261] 



epoch: 64 training loss = 0.2042, training accuracy = 92.91%, val_loss = 0.2609, val_accuracy = 91.53% & LR = 0.0001



Training: : 100%|██████████| 391/391 [04:03<00:00,  1.60batch/s, accuracy=93, loss=0.205]    
Validation: : 100%|██████████| 79/79 [00:15<00:00,  4.96batch/s, val_acc=91.6, val_loss=0.26]  


epoch: 65 training loss = 0.2048, training accuracy = 93.00%, val_loss = 0.2601, val_accuracy = 91.61% & LR = 0.0001
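
The `LR` column in the log above drops tenfold after epochs 25, 40 and 50. The scheduler that produced it is not shown in this part of the notebook, so the following is only a sketch: a plain-Python step schedule whose milestones are inferred from the logged values, not taken from the training code. The function name `step_lr` and its parameters are hypothetical. PyTorch's `torch.optim.lr_scheduler.MultiStepLR` implements the same kind of schedule.

```python
def step_lr(epoch, base_lr=0.1, milestones=(25, 40, 50), gamma=0.1):
    """Learning rate for a 1-indexed epoch.

    The rate is multiplied by `gamma` once for every milestone the epoch
    has already passed. Milestones are an assumption inferred from the log.
    """
    drops = sum(epoch > milestone for milestone in milestones)
    return base_lr * gamma ** drops


# LR values copied from the training log, keyed by epoch.
logged = {25: 0.1, 26: 0.01, 40: 0.01, 41: 0.001, 50: 0.001, 51: 0.0001, 65: 0.0001}

# Epochs where the sketch disagrees with the log (tolerance covers float rounding).
mismatches = [epoch for epoch, lr in logged.items() if abs(step_lr(epoch) - lr) > 1e-12]
print(mismatches)
```

An empty `mismatches` list means the assumed milestones agree with every logged value checked here; it does not establish which scheduler was actually used.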






In [48]:
# Save model from last training epoch-
torch.save(model.state_dict(), "ResNet50_lr_scheduler_last_epoch_model.pth")

In [None]:
# Initialize a new ResNet-50 model-
best_model = ResNet50(img_channels = 3, num_channels = 10)

In [66]:
# Load the trained weights saved after the last training epoch-
best_model.load_state_dict(torch.load('ResNet50_lr_scheduler_last_epoch_model.pth'))

<All keys matched successfully>

In [None]:
# Place model on GPU (if available)-
best_model.to(device)

In [None]:
val_loss, val_acc = test_model_progress(best_model, test_loader)

In [70]:
print(f"ResNet-50 'best' (LR scheduler) model metrics: val_loss = {val_loss:.4f} & val_acc = {val_acc:.2f}%")

ResNet-50 'best' (LR scheduler) model metrics: val_loss = 0.2255 & val_acc = 93.85%


In [71]:
del best_model, model

### Observation:

For this particular experiment, it seems that using ```val_loss``` as the metric for saving the _best_ model is not the optimal choice.

_Highest validation accuracy_ achieved = 93.85%.

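Also, there seems to be some _overfitting_: from epoch 41 onwards the training loss keeps falling (0.2558 to 0.2048) while the validation loss plateaus at about 0.26. _Dropout_ should be employed to counter it.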
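
Both observations can be checked directly against the recorded metrics. The sketch below assumes a history dict shaped like `training_history_lr_scheduler` (epoch number mapped to a dict with `loss`, `acc`, `val_loss` and `val_acc`); the three entries are copied from the training log, and the helper names `best_epoch` and `generalization_gap` are hypothetical.

```python
def best_epoch(history, key, mode):
    """Return the epoch whose `key` metric is lowest ('min') or highest ('max')."""
    pick = min if mode == "min" else max
    return pick(history, key=lambda epoch: history[epoch][key])


def generalization_gap(history, epoch):
    """Training accuracy minus validation accuracy, in percentage points."""
    return history[epoch]["acc"] - history[epoch]["val_acc"]


# Three epochs copied from the training log (not the full history).
history = {
    54: {"loss": 0.2111, "acc": 92.62, "val_loss": 0.2568, "val_acc": 91.50},
    62: {"loss": 0.2081, "acc": 92.84, "val_loss": 0.2598, "val_acc": 91.69},
    65: {"loss": 0.2048, "acc": 93.00, "val_loss": 0.2601, "val_acc": 91.61},
}

by_loss = best_epoch(history, "val_loss", "min")  # epoch a val_loss criterion keeps
by_acc = best_epoch(history, "val_acc", "max")    # epoch a val_acc criterion keeps
gap = generalization_gap(history, 65)

print(f"lowest val_loss at epoch {by_loss}, highest val_acc at epoch {by_acc}")
print(f"train/val accuracy gap at epoch 65 = {gap:.2f} points")
```

On these entries the two criteria select different epochs, which is the mismatch the observation describes. Running the same helpers on the full history would show whether that holds over the whole run.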

In [None]:
# Sanity check-
training_history_lr_scheduler.keys()

In [None]:
training_history_lr_scheduler[12].keys()

In [4]:
import pickle

In [50]:
# Save training metrics as a pickled Python dict for later analysis-
with open("ResNet50_training_history_lr_scheduler.pkl", "wb") as file:
    pickle.dump(training_history_lr_scheduler, file)

In [7]:
with open("ResNet50_training_history_lr_scheduler.pkl", "rb") as file:
    training_history_lr_scheduler = pickle.load(file)

EOFError: Ran out of input
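
`EOFError: Ran out of input` is what `pickle.load` raises when the file it reads is empty. Why the file was empty in this session is not known. A small guard, sketched below with a hypothetical helper `load_history`, avoids the traceback by returning `None` for a missing or zero-byte file; the temporary files are there only to make the example self-contained.

```python
import os
import pickle
import tempfile


def load_history(path):
    """Load a pickled history dict; return None if the file is missing or empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None
    with open(path, "rb") as file:
        return pickle.load(file)


with tempfile.TemporaryDirectory() as tmp_dir:
    # A zero-byte file reproduces the condition behind the EOFError.
    empty_path = os.path.join(tmp_dir, "empty.pkl")
    open(empty_path, "wb").close()
    empty_result = load_history(empty_path)

    # A properly written file round-trips as usual.
    full_path = os.path.join(tmp_dir, "history.pkl")
    with open(full_path, "wb") as file:
        pickle.dump({1: {"val_acc": 50.0}}, file)
    restored = load_history(full_path)

print(empty_result, restored)
```

If `load_history` returns `None` for the real history file, the metrics have to be regenerated or restored from another copy before the plotting cells can run.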

### Training Visualizations

In [3]:
plt.figure(figsize = (9, 7))
plt.plot(list(training_history_lr_scheduler.keys()), [training_history_lr_scheduler[k]['acc'] for k in training_history_lr_scheduler.keys()], label = 'training acc')
plt.plot(list(training_history_lr_scheduler.keys()), [training_history_lr_scheduler[k]['val_acc'] for k in training_history_lr_scheduler.keys()], label = 'val acc')
plt.title("ResNet-50: Training Accuracy")
plt.xlabel("epochs")
plt.ylabel("accuracy (%)")
plt.legend(loc = 'best')
plt.show()

NameError: name 'training_history_lr_scheduler' is not defined

<Figure size 648x504 with 0 Axes>

In [78]:
plt.figure(figsize = (9, 7))
plt.plot(list(training_history_lr_scheduler.keys()), [training_history_lr_scheduler[k]['loss'] for k in training_history_lr_scheduler.keys()], label = 'training loss')
plt.plot(list(training_history_lr_scheduler.keys()), [training_history_lr_scheduler[k]['val_loss'] for k in training_history_lr_scheduler.keys()], label = 'val loss')
plt.xlabel("epochs")
plt.ylabel("loss")
plt.legend(loc = 'best')
plt.title("ResNet-50: Training Loss")
plt.show()

<Figure size 900x700 with 1 Axes>

In [79]:
plt.figure(figsize = (9, 7))
plt.plot(list(training_history_lr_scheduler.keys()), [training_history_lr_scheduler[k]['lr'] for k in training_history_lr_scheduler.keys()])
plt.xlabel("epochs")
plt.ylabel("lr")
plt.title("ResNet-50: Learning-Rate")
plt.show()

<Figure size 900x700 with 1 Axes>
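
The three plotting cells repeat the same pattern, so they could be folded into one helper. This is a sketch rather than the notebook's own code: `metric_series` and `plot_metrics` are hypothetical names, and the history is assumed to be a dict keyed by epoch, as in the cells above.

```python
def metric_series(history, key):
    """Return (epochs, values) for one metric, ordered by epoch."""
    epochs = sorted(history)
    return epochs, [history[epoch][key] for epoch in epochs]


def plot_metrics(history, keys, title, ylabel):
    """Plot one or more metrics from the history on a single figure."""
    # Imported here so metric_series can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.figure(figsize=(9, 7))
    for key in keys:
        epochs, values = metric_series(history, key)
        plt.plot(epochs, values, label=key)
    plt.xlabel("epochs")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.legend(loc="best")
    plt.show()


# Tiny stand-in history, deliberately out of order, to show the sorting.
demo_history = {2: {"lr": 0.1}, 1: {"lr": 0.1}, 3: {"lr": 0.01}}
epochs, lrs = metric_series(demo_history, "lr")
print(epochs, lrs)
```

With the real history loaded, the accuracy figure would then be `plot_metrics(training_history_lr_scheduler, ["acc", "val_acc"], "ResNet-50: Training Accuracy", "accuracy (%)")`.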