# Importing Necessary Libraries and Modules

This first cell of the notebook imports all the necessary libraries and modules that will be used in the rest of the code. Here's a breakdown of each import:

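- **`import numpy as np`**: This imports NumPy under its conventional alias `np`. NumPy provides N-dimensional arrays and numerical routines.

- **`import torch`**: This imports the core PyTorch library. PyTorch is an open-source machine learning framework for building and training neural networks.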

- **`import torchvision`**: This imports the torchvision library, which is part of the PyTorch project. It provides access to popular datasets, model architectures, and image transformations for computer vision tasks.

- **`from torch import nn`**: This imports the `nn` module from torch. The `nn` module provides a set of building blocks for creating neural networks, such as layers (like convolutional layers, linear layers), activation functions, and loss functions.

- **`from torch.nn import CrossEntropyLoss`**: This specifically imports the `CrossEntropyLoss` class from the `torch.nn` module. This is a common loss function used for classification tasks.

- **`from torch.optim import Adam`**: This imports the Adam optimizer from the `torch.optim` module. Optimizers are algorithms used to update the weights of a neural network during training to minimize the loss function. Adam is a popular and effective optimizer.

- **`from torch.utils.data import DataLoader, Dataset`**: This imports the `DataLoader` and `Dataset` classes from `torch.utils.data`. The `Dataset` class is used to represent a collection of data samples, while the `DataLoader` class provides an efficient way to iterate over the dataset in batches during training.

- **`from torchvision import transforms`**: This imports the `transforms` module from torchvision. The `transforms` module provides a collection of common image transformations that can be applied to images for data augmentation and preprocessing.

- **`from torchvision.datasets import ImageFolder`**: This imports the `ImageFolder` class from `torchvision.datasets`. This class is a convenient way to load image datasets organized in a specific folder structure where the class labels are inferred from the directory names.

- **`from torchvision.models import ResNet152_Weights, resnet152`**: This imports the `resnet152` model-building function and `ResNet152_Weights`, an enum describing the pretrained weight sets available for that architecture (`ResNet152_Weights.DEFAULT` selects the best available one). ResNet152 is a 152-layer convolutional neural network. When it is built with pretrained ImageNet weights, it is often used as a starting point for other computer vision tasks.

- **`from tqdm import tqdm`**: This imports the tqdm library. tqdm is a library that provides a smart progress bar for loops, which can be helpful for visualizing the progress of training or other iterative processes.


In [None]:
import numpy as np
import torch
import torchvision
from torch import nn
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import ResNet152_Weights, resnet152
from tqdm import tqdm

In [None]:
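BATCH_SIZE = 64  # number of images per training batch
# LEARNING_RATE = 1e-5  # alternative learning rate, currently disabled
LEARNING_RATE = 5e-6  # step size for the Adam optimizer
NO_OF_ITERATIONS = 300  # number of passes over the training data (epochs)
# "cuda:1" is the second GPU, so this assumes the machine has at least two GPUs
CUDA_DEVICE = "cuda:1" if torch.cuda.is_available() else "cpu"
PATH_TO_MODEL_SAVE = "resnet_model_weight_for_cnc.pth"  # file holding the model's saved state_dict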
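The `CUDA_DEVICE` line above selects `"cuda:1"`, the second GPU, whenever CUDA is available. On a single-GPU machine, moving a model or tensor to `cuda:1` fails. The sketch below shows a more defensive selection. The helper name `pick_device` is hypothetical and not part of the notebook:

```python
import torch


def pick_device(preferred_index: int = 1) -> str:
    """Return 'cuda:<preferred_index>' if that GPU exists, else 'cuda:0', else 'cpu'."""
    if not torch.cuda.is_available():
        return "cpu"
    if preferred_index < torch.cuda.device_count():
        return f"cuda:{preferred_index}"
    # Fall back to the first GPU when the preferred index does not exist.
    return "cuda:0"


CUDA_DEVICE = pick_device(preferred_index=1)
print(CUDA_DEVICE)
```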

# Image Transformations for Data Augmentation

This section of the code defines a sequence of image transformations that will be applied to the images in the dataset. These transformations are used for data augmentation. Each time an image is loaded, it receives a different random variant, so augmentation effectively increases the variety of the training data without collecting new images. This can improve the model's ability to generalize and helps prevent overfitting. The `transforms.Compose` function chains the transformations so they are applied in the listed order.

Here's a breakdown of each transformation:

- **`transforms.Resize((256, 256))`**: This transformation resizes the input image to a fixed size of 256x256 pixels. This is often done to ensure that all images have a consistent size before being fed into a neural network.

- **`transforms.RandomCrop((224, 224))`**: This transformation randomly crops a section of size 224x224 pixels from the resized image. Random cropping introduces variability in the training data by forcing the model to learn features from different parts of the image.

- **`transforms.RandomVerticalFlip(p=0.5)`**: This transformation randomly flips the image vertically with a probability of 0.5.

- **`transforms.RandomHorizontalFlip(p=0.5)`**: This transformation randomly flips the image horizontally with a probability of 0.5. Flipping the image both vertically and horizontally helps the model become invariant to the orientation of objects in the image.

- **`transforms.RandomRotation(degrees=30)`**: This transformation randomly rotates the image by an angle between -30 and +30 degrees. Random rotation helps the model become robust to variations in the rotation of objects.

- **`transforms.ToTensor()`**: This transformation converts a PIL Image or NumPy array of shape (H, W, C) into a PyTorch tensor of shape (C, H, W). For 8-bit images, it also scales pixel values from [0, 255] to [0.0, 1.0]. This step is necessary because PyTorch models operate on tensors.

- **`transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`**: This transformation normalizes each color channel (Red, Green, Blue) of the tensor with `(x - mean) / std`, using the given per-channel values. Normalization is a common preprocessing step that standardizes the input data and can improve training. These particular values are the channel means and standard deviations of the ImageNet training set, which are the statistics that torchvision's ImageNet-pretrained models expect.


In [None]:
transforms_functions = transforms.Compose(
    [
        transforms.Resize((256, 256)),
        transforms.RandomCrop((224, 224)),
        transforms.RandomVerticalFlip(p=0.5),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=30),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

# Creating a Dataset with ImageFolder
This line of code creates a dataset using the `ImageFolder` class from the `torchvision.datasets` module.

Here's a breakdown of what's happening:

- **`dataset_of_cnc`**: This is the variable name assigned to the newly created dataset object.

- **`ImageFolder()`**: This is the constructor for the `ImageFolder` class. This class is designed for datasets where images are organized in subdirectories, with each subdirectory representing a different class.

- **`root="building_only"`**: This argument specifies the root directory of the image dataset. `ImageFolder` expects this folder to contain one subdirectory per class (e.g., `building_only/class1/`, `building_only/class2/`), and it infers the class labels from those subdirectory names [1].

- **`transform=transforms_functions`**: This argument applies the sequence of image transformations defined earlier in the notebook (stored in the `transforms_functions` variable) to each image as it is loaded. These transformations are used for data augmentation and preprocessing.

In [None]:
dataset_of_cnc = ImageFolder(root="building_only", transform=transforms_functions)

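The `ImageFolder` class, which was used to create the `dataset_of_cnc` object, infers the class labels from the names of the subdirectories within the specified root directory [1]. The `classes` attribute stores these class names sorted with Python's default string ordering. In that ordering, uppercase letters come before lowercase ones, which is why `canteen` appears after `Heritage` in the output below. Each class's numeric label is its index in this list, and the same mapping is also available as the `class_to_idx` dictionary.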

Therefore, executing this line of code will display the names of all the classes that were found in the "building_only" directory when the ImageFolder dataset was created. This is useful for verifying that the ImageFolder correctly identified your desired classes.

In [None]:
dataset_of_cnc.classes

['Administrative', 'Chemistry', 'Gurudeb', 'Heritage', 'canteen']

# Creating a DataLoader for the Dataset

This line of code creates a DataLoader object, which is used to efficiently load and iterate over the dataset during the training process.


Here's a breakdown of the components:

- **`training_dataloader_for_cnc`**: This is the variable name assigned to the newly created `DataLoader` object.

- **`DataLoader()`**: This is the constructor for the `DataLoader` class from `torch.utils.data`. It's designed to handle the process of loading data from a dataset in batches.

- **`dataset_of_cnc`**: This is the first argument passed to the `DataLoader`. It is the dataset object that was created in a previous step using `ImageFolder`. The `DataLoader` will load data from this dataset.

- **`batch_size=BATCH_SIZE`**: This argument specifies the number of samples (images and their corresponding labels) that will be loaded in each batch. The value for `BATCH_SIZE` is defined earlier in the notebook; in this case, it is 64. Using batches is crucial for efficient training, especially with large datasets.

- **`shuffle=True`**: This argument indicates whether the data should be shuffled at the beginning of each training epoch. When `shuffle` is set to `True`, the `DataLoader` will randomize the order of the samples in the dataset for each pass through the data. This helps to prevent the model from learning the order of the data and improves the generalization of the model.


In [None]:
training_dataloader_for_cnc = DataLoader(
    dataset_of_cnc, batch_size=BATCH_SIZE, shuffle=True
)
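The following minimal sketch shows how a `DataLoader` splits a dataset into batches. It uses a small synthetic `TensorDataset` in place of the image dataset, and the sizes are chosen only for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake 3x8x8 "images" with integer labels (synthetic data).
images = torch.randn(10, 3, 8, 8)
labels = torch.arange(10) % 5
toy_dataset = TensorDataset(images, labels)

loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

batch_sizes = []
for batch_images, batch_labels in loader:
    # Each batch stacks samples along a new leading (batch) dimension.
    batch_sizes.append(batch_images.shape[0])
    print(batch_images.shape, batch_labels.tolist())

print(batch_sizes)  # [4, 4, 2]; the last batch is smaller unless drop_last=True
```

With `shuffle=True`, the samples in each batch change from epoch to epoch, but the batch sizes do not. Setting `drop_last=True` would discard the final partial batch.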

# Initializing the ResNet152 Model

ResNet152 is a deep convolutional neural network that is commonly used for image classification tasks.

The code calls the `resnet152()` function from the `torchvision.models` module, which creates an instance of the **ResNet152 model**.

- The commented-out line `# resnet_model_for_cnc = resnet152(weights=ResNet152_Weights.DEFAULT)` would, if active, initialize ResNet152 with pretrained weights. These weights are learned on a very large dataset (ImageNet) and provide a good starting point for training on a new dataset, a technique known as transfer learning.

- The active line, `resnet_model_for_cnc = resnet152()`, initializes ResNet152 with randomly initialized weights. As a result, the network starts without any learned features.

The created model is assigned to the variable `resnet_model_for_cnc`.



In [None]:
# resnet_model_for_cnc = resnet152(weights=ResNet152_Weights.DEFAULT)
resnet_model_for_cnc = resnet152()

This line of code simply displays the structure of the neural network model that was just initialized and assigned to the variable `resnet_model_for_cnc`.



In [None]:
resnet_model_for_cnc

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

# Freezing the Model Parameters
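This line of code controls which parameters of the neural network model `resnet_model_for_cnc` are updated during training. Calling `requires_grad_(requires_grad=False)` on the whole model sets `requires_grad` to `False` on every parameter. As a result, no gradients are computed for those parameters and the optimizer cannot update them. The method returns the model itself, which is why the cell echoes the model's structure.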



Let's clarify the difference between `resnet_model_for_cnc` and `resnet_model_for_cnc.fc`.

- **`resnet_model_for_cnc`**: This refers to the entire neural network model object, the complete ResNet152 instance initialized earlier. Think of it as the whole building that houses all the different parts of the network. When you call a method directly on `resnet_model_for_cnc`, it typically affects the entire model.

- **`resnet_model_for_cnc.fc`**: This accesses one specific submodule of `resnet_model_for_cnc`. The `fc` attribute points to the *final fully connected layer* of the ResNet152 architecture; later in this notebook, it is replaced by a small sequence of layers. Think of it as one specific room within the building. When you call a method on `resnet_model_for_cnc.fc`, only that part of the model is affected.

In [None]:
resnet_model_for_cnc.requires_grad_(requires_grad=False)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 
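The effect of `requires_grad_(False)` can be checked on a small stand-in model. This sketch uses a hypothetical `TinyNet` with a final layer named `fc`, which is not part of the notebook. It shows three things: freezing leaves no trainable parameters, autograd then records nothing, and `requires_grad_` returns the module itself (which is why the cell above echoes the whole ResNet):

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for ResNet: a 'backbone' layer plus a final layer named fc."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))


model = TinyNet()
returned = model.requires_grad_(requires_grad=False)  # freeze every parameter
print(returned is model)  # True: the module itself is returned

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # []

out = model(torch.randn(3, 8))
print(out.requires_grad)  # False: no graph was recorded, so out.sum().backward() would raise
```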

# Freezing and Unfreezing Model Layers

This section of the code is related to controlling which parts of the neural network model will be updated during the training process. In deep learning, especially when using pre-trained models (although in this case, the model is initialized with random weights), it's common to selectively train only certain layers of the network. This is often done to fine-tune the model for a specific task while keeping the initial learned features from the earlier layers fixed.

The line in this section focuses on the final fully connected layer of `resnet_model_for_cnc`. In common architectures such as ResNet, this layer is named `fc`.

Let's break this down:

*   **`resnet_model_for_cnc.fc`**: This accesses the fully connected layer of the `resnet_model_for_cnc` object. In a standard ResNet, this is the layer that takes the high-level features extracted by the convolutional layers and maps them to the final class predictions.
*   **`.requires_grad_()`**: This is a method available for PyTorch tensors and modules that allows you to set the `requires_grad` attribute. The underscore at the end of the method name (`_`) indicates that this operation modifies the object in place.
*   **`requires_grad=True`:** This argument sets the `requires_grad` attribute of the parameters within the fully connected layer to `True`.

**What does `requires_grad=True` mean?**

In PyTorch, the `requires_grad` attribute is crucial for the automatic differentiation engine (autograd). When `requires_grad` is set to `True` for a tensor or the parameters within a module, PyTorch will keep track of the operations performed on that tensor or module during the forward pass. This allows PyTorch to compute the gradients of the loss with respect to these parameters during the backward pass.

By calling `resnet_model_for_cnc.fc.requires_grad_(requires_grad=True)`, you indicate that the weights and biases of the final fully connected layer **should** be updated during training. Autograd will compute gradients for these parameters during the backward pass. The optimizer, defined later in the notebook, will then use those gradients to adjust the parameters and minimize the loss function.

In contrast, the previous code cell executed `resnet_model_for_cnc.requires_grad_(requires_grad=False)`, which sets `requires_grad` to `False` for *all* parameters in the entire model. This "freezes" the convolutional backbone (every layer before the final fully connected layer), so those weights are not updated during training. Note that at this point the backbone holds random weights rather than pretrained ones, because the model was created with `resnet152()` and no weights argument.

Therefore, the combined effect of these two lines is to freeze the majority of the ResNet152 model and allow only the parameters of the final fully connected layer to be trained. This is a common transfer-learning strategy, in which a pretrained model serves as a fixed feature extractor and only a new classification head (the fully connected layer) is trained on your dataset. Even when training from scratch, you might selectively freeze layers during initial training phases. Also note that the next section replaces `fc` with a new sequence of layers. Freshly created layers have `requires_grad=True` by default, so the new head is trainable while the backbone stays frozen.

In [None]:
resnet_model_for_cnc.fc.requires_grad_(requires_grad=True)

Linear(in_features=2048, out_features=1000, bias=True)

# Redefining the Final Layer of the Model

This block of code replaces the original final fully connected layer of the `resnet_model_for_cnc` with a new sequence of layers. This is a common practice when adapting a pre-existing model architecture (like ResNet152) for a specific task, especially when the number of output classes for your task is different from the original task the model was trained on.

Here's a breakdown of what's happening:

- **`resnet_model_for_cnc.fc = ...`**: This line assigns a new module to the `fc` attribute of the `resnet_model_for_cnc` object. As explained previously, the `fc` attribute typically refers to the final fully connected layer responsible for producing the class predictions. By assigning a new sequence of layers here, you are essentially replacing the model's original output layer.

- **`torch.nn.Sequential(...)`**: This is a container in PyTorch that holds a sequence of modules. Data will pass through the modules in the order they are added to the Sequential container.

- **`torch.nn.Linear(in_features=2048, out_features=512)`**: This creates the first new layer, which is a linear (or fully connected) layer. It takes an input of size 2048 (which is the output size of the ResNet's layers before the original `fc` layer) and outputs a tensor of size 512. This layer acts as an intermediate layer in the new classification head.

- **`torch.nn.ReLU()`**: This adds a Rectified Linear Unit (ReLU) activation function after the first linear layer. ReLU introduces non-linearity into the model, which is essential for learning complex patterns in the data.

- **`torch.nn.Dropout(p=0.2)`**: This adds a Dropout layer with a dropout probability of 0.2. Dropout is a regularization technique that randomly sets a fraction of input units to zero during training. This helps prevent the model from overfitting by making it less reliant on specific neurons.

- **`torch.nn.Linear(in_features=512, out_features=len(dataset_of_cnc.classes))`**: This is the second linear layer in the sequence. It takes the output of the previous layers (size 512) and maps it to a tensor with a size equal to the number of classes in your dataset. The `len(dataset_of_cnc.classes)` dynamically gets the number of unique classes identified when the `ImageFolder` dataset was created. This layer is the final output layer of your redefined classification head.

- **`torch.nn.Softmax(dim=1)`**: This adds a Softmax activation to the output of the final linear layer. It converts the raw output scores (logits) into probabilities that sum to 1 across the classes. The `dim=1` argument applies the operation across the class dimension, since dimension 0 is the batch. However, `CrossEntropyLoss` (used later as the loss function) applies its own log-softmax to whatever it receives, so with this layer in place, softmax is effectively applied twice during training. Models trained with that loss usually output raw logits and apply softmax only when probabilities are needed at inference time.

In summary, this code replaces the standard final layer of ResNet152 with a new custom classification head consisting of two linear layers, a ReLU activation, and a Dropout layer, followed by a Softmax activation to produce class probabilities. This new head is tailored to the specific number of classes in your dataset.


In [None]:
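# Replace the original 1000-class head with a head sized for this dataset's classes
resnet_model_for_cnc.fc = torch.nn.Sequential(
    torch.nn.Linear(in_features=2048, out_features=512),  # 2048 = size of ResNet152's pooled features
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.2),  # randomly zeroes 20% of activations during training
    torch.nn.Linear(
        in_features=512,
        out_features=len(dataset_of_cnc.classes),  # one output per class
    ),
    # CrossEntropyLoss already applies log-softmax, so softmax runs twice during training
    torch.nn.Softmax(dim=1),
)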
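The sketch below is a quick sanity check of the new head on random feature vectors, which stand in for the 2048-dimensional features that ResNet152 produces before `fc`. It assumes five classes, matching the `dataset_of_cnc.classes` output shown earlier. It also shows that freshly created layers are trainable by default. So the new head is trainable on its own account, not because of the earlier `fc.requires_grad_(True)` call, which acted on the layer this cell discards:

```python
import torch
from torch import nn

NUM_CLASSES = 5  # assumption: the five classes listed by dataset_of_cnc.classes

head = nn.Sequential(
    nn.Linear(in_features=2048, out_features=512),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(in_features=512, out_features=NUM_CLASSES),
    nn.Softmax(dim=1),
)

# New modules are created with requires_grad=True on every parameter.
all_trainable = all(p.requires_grad for p in head.parameters())
print(all_trainable)  # True

head.eval()  # disable dropout for a deterministic forward pass
features = torch.randn(4, 2048)  # stand-in for pooled ResNet features
probs = head(features)
print(probs.shape)       # torch.Size([4, 5])
print(probs.sum(dim=1))  # each row sums to 1, up to floating-point error
```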

# Initializing the Weights and Biases of Linear Layers

This code block iterates through all the modules (layers and sub-modules) within the neural network model `resnet_model_for_cnc` and specifically initializes the weights and biases of any **linear layers** (`nn.Linear`) that it finds. Proper initialization of weights and biases is important for the effective training of neural networks.

Here's a breakdown:

- **`for module in resnet_model_for_cnc.modules():`** This line starts a loop that goes through every `module` within the `resnet_model_for_cnc` model. A module can be a single layer (like a linear layer or a convolutional layer) or a container of other modules (like a `Sequential` module).

- **`if type(module) == nn.Linear:`** Inside the loop, this condition checks if the current `module` is an instance of a `nn.Linear` layer. The code inside this `if` block will only execute for linear layers.

- **`nn.init.kaiming_normal_(module.weight.data, a=0, mode="fan_out", nonlinearity="relu")`**: If the module is a linear layer, this line initializes the **weights** of that layer.
  *   `nn.init.kaiming_normal_`: This is a function from PyTorch's initialization module. It implements the Kaiming initialization method (also known as He initialization), which is particularly well-suited for layers followed by the ReLU activation function, such as the first layer of the redefined head.
  *   `module.weight.data`: This accesses the weight tensor of the linear layer. The trailing `_` in `kaiming_normal_` indicates that the initialization is performed in place on the weight tensor.
  *   `a=0`, `mode="fan_out"`, `nonlinearity="relu"`: The `a` argument is the negative slope and is only used for leaky ReLU. The `mode="fan_out"` setting scales the weights by the number of output units, which preserves the magnitude of gradients in the backward pass. The `nonlinearity="relu"` setting selects the ReLU gain of √2. Together, these arguments draw weights from a normal distribution with standard deviation √(2 / fan_out).

- **`if module.bias is not None:`** This condition checks whether the current linear layer has a **bias** term. By default, `nn.Linear` layers include a bias, but it can be disabled with `bias=False`.

- **`fan_in, fan_out = nn.init._calculate_fan_in_and_fan_out(module.weight.data)`**, **`bound = 1 / (fan_out) ** 0.5`**, **`nn.init.normal_(module.bias, -bound, bound)`**: If the linear layer has a bias, these lines initialize it.
  *   `nn.init._calculate_fan_in_and_fan_out(module.weight.data)`: This private PyTorch helper returns the number of input and output connections of the layer. For a weight of shape `(out_features, in_features)`, `fan_in = in_features` and `fan_out = out_features`. The result is used to scale the bias initialization.
  *   `bound = 1 / (fan_out) ** 0.5`: This computes a scale of 1/√fan_out for the bias values.
  *   `nn.init.normal_(module.bias, -bound, bound)`: This fills the bias tensor in place [1]. The signature of `normal_` is `normal_(tensor, mean, std)`, so this call draws values from a normal distribution with mean `-bound` and standard deviation `bound`, not from a uniform distribution between `-bound` and `bound`. If a uniform range was intended, the matching call is `nn.init.uniform_(module.bias, -bound, bound)`. For comparison, PyTorch's own default bias initialization for `nn.Linear` is uniform in ±1/√fan_in.

In essence, this code initializes the weights of every linear layer in the model with a scheme suited to ReLU activations and gives the biases small random values. In ResNet152, the only linear layers are those in the new classification head. Well-scaled initial values help training converge more effectively.

In [None]:
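for module in resnet_model_for_cnc.modules():  # walks every submodule, including nested ones
    if type(module) == nn.Linear:
        # Kaiming (He) normal init: std = sqrt(2 / fan_out) with the ReLU gain
        nn.init.kaiming_normal_(
            module.weight.data,
            a=0,
            mode="fan_out",
            nonlinearity="relu",
        )
        if module.bias is not None:
            fan_in, fan_out = nn.init._calculate_fan_in_and_fan_out(module.weight.data)
            bound = 1 / (fan_out) ** 0.5
            # normal_(tensor, mean, std): mean=-bound, std=bound (uniform_ would give U(-bound, bound))
            nn.init.normal_(module.bias, -bound, bound)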
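The sketch below applies the same calls to a single `nn.Linear(2048, 512)` layer, the first layer of the new head, to check the numbers. The Kaiming weights should have a standard deviation close to √(2 / fan_out) = 0.0625. The `normal_(bias, -bound, bound)` call centers the biases near `-bound` rather than spreading them uniformly. The `uniform_` call at the end is included only for comparison:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=2048, out_features=512)

nn.init.kaiming_normal_(layer.weight, a=0, mode="fan_out", nonlinearity="relu")
fan_in, fan_out = nn.init._calculate_fan_in_and_fan_out(layer.weight)  # private helper, as in the notebook
print(fan_in, fan_out)  # 2048 512

# Kaiming normal with mode="fan_out" and the ReLU gain targets std = sqrt(2 / fan_out).
expected_std = (2.0 / fan_out) ** 0.5
weight_std = layer.weight.std().item()
print(weight_std, expected_std)  # both close to 0.0625

bound = 1 / fan_out ** 0.5

# What the notebook's call does: normal_(tensor, mean, std) means mean=-bound, std=bound.
nn.init.normal_(layer.bias, -bound, bound)
normal_bias_mean = layer.bias.mean().item()
print(normal_bias_mean)  # close to -bound, about -0.044

# For comparison: a uniform draw in [-bound, bound].
nn.init.uniform_(layer.bias, -bound, bound)
uniform_min, uniform_max = layer.bias.min().item(), layer.bias.max().item()
print(uniform_min >= -bound, uniform_max <= bound)  # True True
```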

# Moving the Model to the Specified Device

This line of code is straightforward but essential in PyTorch for managing where your neural network model resides during computations.
The `.to()` method is a general-purpose PyTorch method for moving objects (like tensors or modules) to a different device. Here it is called on the `resnet_model_for_cnc` object, which represents your neural network. For a module, `.to()` moves its parameters and buffers in place and returns the module itself, which is why the cell prints the model. For a tensor, `.to()` instead returns a new tensor and leaves the original unchanged.

*   `device=CUDA_DEVICE`: This argument specifies the target device for the model. The `CUDA_DEVICE` variable was defined earlier in the notebook based on whether a CUDA-enabled GPU is available.
    *   If a CUDA-enabled GPU is available, `CUDA_DEVICE` is `"cuda:1"`. This refers to the second GPU (GPU indices start at 0), so the model is moved to that GPU. This setting requires the machine to have at least two GPUs.
    *   If a CUDA-enabled GPU is **not** available, `CUDA_DEVICE` is `"cpu"`, and the model stays on the CPU.

Moving the model to a GPU when available is standard practice in deep learning because GPUs are highly optimized for the parallel computations required for neural network training and inference, leading to significantly faster processing compared to using a CPU.

By executing this line, you ensure that your `resnet_model_for_cnc` is ready to perform calculations on the specified device, which is crucial before starting any training or inference operations.

In [None]:
resnet_model_for_cnc.to(device=CUDA_DEVICE)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 
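The following minimal sketch illustrates the device rule with a toy `nn.Linear`. It chooses `"cuda:0"` when a GPU is present, which is an assumption, since the notebook uses `"cuda:1"`. The sketch shows that `Module.to()` works in place and returns the same object, and that inputs must be moved to the same device as the model:

```python
import torch
from torch import nn

device = "cuda:0" if torch.cuda.is_available() else "cpu"  # assumption: first GPU, else CPU

model = nn.Linear(4, 2)
returned = model.to(device=device)
print(returned is model)  # True: Module.to() moves parameters in place and returns the module

# Tensor.to() returns a new tensor, so the result must be assigned.
x = torch.randn(3, 4).to(device)
y = model(x)
print(next(model.parameters()).device, y.device)
```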

# Loading Model Weights

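This line of code loads previously saved weights and biases into the currently initialized `resnet_model_for_cnc`. This is a crucial step if you want to resume training from a checkpoint or reuse a model that you trained and saved earlier. Loading overwrites every parameter and buffer set up so far, including the custom initialization of the linear layers above. However, it does not change any `requires_grad` flags, so the frozen layers stay frozen.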
Here's a breakdown:

*   `resnet_model_for_cnc.load_state_dict(...)`: This method of a PyTorch module loads a `state_dict`. A `state_dict` is a Python dictionary that maps each parameter and buffer name (for example `fc.0.weight`, or the running statistics of BatchNorm layers) to its tensor [2]. By calling `load_state_dict`, you instruct the model to copy the values from the dictionary into its own parameters and buffers. By default, the call is strict, meaning every key must match; the output `<All keys matched successfully>` confirms that they do.

*   `torch.load(...)`: This function is used to deserialize and load an object that was saved using `torch.save`. In this context, it's loading the saved `state_dict` from a file.

*   `PATH_TO_MODEL_SAVE`: This variable holds the file path where the model's `state_dict` was previously saved. This file contains the learned weights and biases of the model from a prior training session.

*   `map_location=CUDA_DEVICE`: This argument to `torch.load` specifies where the loaded `state_dict` should be placed in memory. The `CUDA_DEVICE` variable determines whether the model should be loaded onto a GPU (`cuda:1` if available) or the CPU (`cpu`). This is important to ensure that the loaded weights are in the correct device memory for subsequent operations.

**In summary:** This line of code loads the saved weights and biases of the model from the file specified by `PATH_TO_MODEL_SAVE` and applies them to the `resnet_model_for_cnc` object. This allows the model to start training or inference from a previously saved state [1].

In [None]:
resnet_model_for_cnc.load_state_dict(
    torch.load(PATH_TO_MODEL_SAVE, map_location=CUDA_DEVICE)
)

<All keys matched successfully>
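The sketch below performs a self-contained round trip of `torch.save` and `load_state_dict` on a toy model. The layer sizes and file name are illustrative assumptions. It shows two things: layers without parameters, such as `ReLU` and `Softmax`, contribute no keys, and loading copies values without touching the `requires_grad` flags:

```python
import os
import tempfile

import torch
from torch import nn


def build():
    # Toy model shaped like the notebook's head: Linear, ReLU, Linear, Softmax.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3), nn.Softmax(dim=1))


source = build()
target = build()
target.requires_grad_(False)  # frozen before loading, as in the notebook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_weights.pth")  # hypothetical file name
    torch.save(source.state_dict(), path)
    result = target.load_state_dict(torch.load(path, map_location="cpu"))

print(result)  # <All keys matched successfully>
keys = list(source.state_dict().keys())
print(keys)    # ['0.weight', '0.bias', '2.weight', '2.bias']
print(any(p.requires_grad for p in target.parameters()))  # False: values copied, flags untouched
print(torch.equal(source[0].weight, target[0].weight))   # True
```

Recent PyTorch releases load with `weights_only=True` by default, which is sufficient for a plain `state_dict` like this one.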

In [None]:
torch.cuda.is_available()

True

# Defining the Optimizer

This line of code initializes an optimizer, which is a crucial component in the training process of a neural network. The optimizer's role is to update the model's parameters (weights and biases) during training in a way that minimizes the loss function.
Here's a breakdown:

*   `optimizer`: This is the variable name assigned to the optimizer object that is being created.
*   `Adam(...)`: This is the constructor for the **Adam optimizer**, imported from `torch.optim`. Adam is a popular and effective optimization algorithm that adapts the learning rate for each parameter individually based on estimates of first and second moments of the gradients.
*   `resnet_model_for_cnc.parameters()`: This argument hands the optimizer all of the model's parameters, including the frozen ones; `.parameters()` is the standard way in PyTorch to access a module's parameters. Frozen parameters never receive gradients (their `.grad` stays `None`), and Adam skips such parameters. In practice, therefore, only the trainable classification head is updated.
*   `lr=LEARNING_RATE`: This argument sets the **learning rate** for the optimizer. The learning rate is a hyperparameter that determines the step size at each iteration while moving towards a minimum of the loss function. A smaller learning rate means smaller updates, and a larger learning rate means larger updates. The value for `LEARNING_RATE` is defined earlier in the notebook as `5e-6`. Choosing an appropriate learning rate is important for successful training.

In essence, this line sets up the Adam optimizer to train `resnet_model_for_cnc`. It registers the model's parameters with the optimizer and sets the initial learning rate.

In [None]:
optimizer = Adam(resnet_model_for_cnc.parameters(), lr=LEARNING_RATE)
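The following small check confirms that passing frozen parameters to Adam is harmless. Those parameters receive no gradient, so an optimizer step leaves them unchanged. The toy model and learning rate are illustrative, and the sketch also shows the common idiom of passing only parameters with `requires_grad=True`:

```python
import torch
from torch import nn
from torch.optim import Adam

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model[0].requires_grad_(False)  # freeze the first layer, keep the last one trainable

frozen_before = model[0].weight.clone()
head_before = model[2].weight.clone()

# Passing only trainable parameters makes the intent explicit; Adam would also
# skip parameters whose .grad is None.
optimizer = Adam((p for p in model.parameters() if p.requires_grad), lr=1e-2)

loss = nn.CrossEntropyLoss()(model(torch.randn(5, 4)), torch.tensor([0, 1, 2, 0, 1]))
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(model[0].weight, frozen_before)
head_unchanged = torch.equal(model[2].weight, head_before)
print(frozen_unchanged)  # True: the frozen layer is untouched
print(head_unchanged)    # False: the trainable layer was updated
```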

# Defining the Loss Function

This line creates the loss function used to train the classifier. Here's what it does:

- **`loss_function`**: This is the variable name you are assigning to the created loss function object. You can use this variable later in your code to calculate the loss.

- **`CrossEntropyLoss()`**: This is the constructor for the `CrossEntropyLoss` class, which was imported from the `torch.nn` module earlier in the notebook. `CrossEntropyLoss` is a widely used loss function for classification problems. In an image classification task like this one, the loss function measures the difference between the model's predicted output and the actual target label.

Specifically, `CrossEntropyLoss` expects the model to output raw, unnormalized scores (logits) for each class, so the model itself should not apply a softmax. The target labels are integers representing the index of the correct class. `CrossEntropyLoss` combines two operations:

1. **Softmax**: It applies a softmax to the model's output logits, turning them into probabilities over all classes. These probabilities represent the model's confidence that the input belongs to each class. (Internally, PyTorch computes the logarithm of these probabilities directly with a log-softmax, which is more numerically stable.)

2. **Negative Log Likelihood (NLL) Loss**: It then takes the negative logarithm of the probability assigned to the true class. This measures how "surprised" the model is by the correct answer: the higher the probability the model gives to the correct class, the lower the negative log likelihood and therefore the loss.
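The two steps can be reproduced by hand. The sketch below is plain Python rather than PyTorch, so it only illustrates the arithmetic: it computes a log-softmax with the log-sum-exp trick and then takes the negative log-probability of each target class. Averaging over the batch mirrors the default `reduction='mean'` of `CrossEntropyLoss`. The logits and targets are made-up values.

```python
import math


def log_softmax(logits):
    """Log-probabilities from raw scores, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def cross_entropy(batch_logits, targets):
    """Mean negative log-likelihood of the target classes over a batch."""
    losses = [-log_softmax(logits)[t] for logits, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical batch of 2 samples and 3 classes.
batch_logits = [[2.0, 1.0, 0.1],   # fairly confident in the correct class 0
                [0.5, 0.5, 0.5]]   # no preference between classes (target 2)
targets = [0, 2]

print(cross_entropy(batch_logits[:1], targets[:1]))  # about 0.417
print(cross_entropy(batch_logits[1:], targets[1:]))  # log(3), about 1.099
print(cross_entropy(batch_logits, targets))          # mean of the two
```

A model that outputs identical logits for all C classes gets a loss of log C, which is a handy reference point when reading a training log.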

By using `CrossEntropyLoss`, the training process aims to adjust the model's parameters so that it assigns higher probabilities to the correct classes for the training data, thereby minimizing this loss function. This leads to the model learning to correctly classify the input images.

In simpler terms, `loss_function = CrossEntropyLoss()` sets up the mechanism to tell your model how wrong it is during training, specifically for a task where you are trying to categorize inputs into different classes. The optimizer then uses this loss value to figure out how to adjust the model to make it less wrong in the future.


```python
loss_function = CrossEntropyLoss()
```

# The Training Loop

This block of code represents the core training loop for your neural network model. It iterates through the dataset multiple times (epochs) to train the model.

Let's break down the code step by step:
- **`for i_iteration in tqdm(range(NO_OF_ITERATIONS)):`**

This is the outer loop. It runs `NO_OF_ITERATIONS` times; the variable was set earlier in the notebook to 300. Each iteration makes one full pass over the training data, so each iteration is an epoch. The `tqdm` function wraps the `range` object and displays a progress bar showing how many iterations have been completed, the elapsed time, and an estimate of the remaining time.
- **`training_loss = []`** and **`for i_image, i_target in dataloader_for_cnc:`**

At the start of each outer iteration, an empty list `training_loss` is created to store the loss of every batch in that iteration. The nested loop then iterates over batches of data provided by `dataloader_for_cnc`, which is configured to load the images and their corresponding labels in batches of size `BATCH_SIZE` (64). In each pass of this inner loop, `i_image` contains a batch of images and `i_target` contains the corresponding batch of labels.
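As a rough picture of what the inner loop receives, the pure-Python sketch below shuffles a dataset and splits it into consecutive batches of `BATCH_SIZE`. Unless `drop_last=True` is passed to `DataLoader`, the final batch holds whatever samples are left over. The dataset size of 150 is hypothetical, and the function is a simplification: a real `DataLoader` also collates samples into tensors and can load them with worker processes.

```python
import math
import random


def iterate_batches(samples, batch_size, shuffle=True, drop_last=False, seed=0):
    """Yield lists of samples, roughly the way a DataLoader splits one epoch."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)   # new order of sample indices
    for start in range(0, len(order), batch_size):
        batch = [samples[i] for i in order[start:start + batch_size]]
        if drop_last and len(batch) < batch_size:
            break                            # discard the incomplete last batch
        yield batch


dataset = list(range(150))   # hypothetical dataset of 150 samples
BATCH_SIZE = 64

sizes = [len(b) for b in iterate_batches(dataset, BATCH_SIZE)]
print(sizes)                                  # [64, 64, 22]
print(math.ceil(len(dataset) / BATCH_SIZE))   # 3 batches per epoch
```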
- **`i_image = i_image.to(device=CUDA_DEVICE)`** and **`i_target = i_target.to(device=CUDA_DEVICE)`**

These lines move the current batch of images (`i_image`) and their target labels (`i_target`) to the device stored in the `CUDA_DEVICE` variable. The data must be on the same device as the model's parameters: PyTorch raises an error when an operation mixes tensors on different devices, and keeping both data and model on the GPU is what makes GPU training fast.
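`CUDA_DEVICE` is defined earlier in the notebook. A common way to define such a variable, shown here as an assumption rather than the notebook's actual definition, falls back to the CPU when no GPU is available. The tiny `nn.Linear` model and the random batch are placeholders for `resnet_model_for_cnc` and a real batch.

```python
import torch
from torch import nn

# Assumed definition: use a GPU if one is available, otherwise the CPU.
CUDA_DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 3).to(CUDA_DEVICE)   # placeholder model, moved once
batch = torch.randn(8, 4)                 # placeholder batch, created on the CPU
labels = torch.randint(0, 3, (8,))

# Move each batch to the model's device before the forward pass.
batch = batch.to(device=CUDA_DEVICE)
labels = labels.to(device=CUDA_DEVICE)

output = model(batch)
print(output.device, next(model.parameters()).device)
```

`Tensor.to()` returns a new tensor rather than modifying the original, which is why the loop reassigns `i_image` and `i_target`; `nn.Module.to()`, by contrast, moves the module's parameters in place.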
- **`optimizer.zero_grad()`**  

Before performing the forward pass and calculating gradients, it's crucial to clear any previously computed gradients. This line calls the `zero_grad()` method of the optimizer, which sets the gradients of all optimized parameters to zero. This prevents gradients from accumulating across batches.
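The accumulation that `zero_grad()` prevents is easy to see on a single scalar tensor. In this small sketch the tensor `w`, the function `3 * w`, and the choice of `SGD` are arbitrary placeholders. The gradient of `3 * w` with respect to `w` is 3, so two backward passes without clearing leave 6 in `w.grad`.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)   # any optimizer works for this demo

# Two backward passes without clearing: the gradients are summed into w.grad.
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()
print(accumulated)     # 6.0, i.e. 3 + 3

# optimizer.zero_grad() clears the gradient of every parameter it manages,
# so the next backward pass stores only the current gradient.
optimizer.zero_grad()
(3 * w).backward()
print(w.grad.item())   # 3.0
```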
- **`output = resnet_model_for_cnc(i_image)`**  

This line performs the **forward pass**. The batch of images (`i_image`) is passed through the neural network model (`resnet_model_for_cnc`). The model processes the images and produces an `output` tensor, which represents the model's predictions for each image in the batch.
- **`loss = loss_function(output, i_target)`**  

This line calculates the **loss**. The `loss_function` (which was defined as `CrossEntropyLoss` earlier) compares the model's predictions (`output`) with the actual target labels (`i_target`) and calculates a measure of how well the model is performing. The lower the loss, the better the model's predictions match the true labels [1].
- **`loss.backward()`**  

This line performs the **backward pass**. Based on the calculated `loss`, PyTorch's automatic differentiation engine computes the gradients of the loss with respect to the model's parameters. These gradients indicate how much each parameter should be adjusted to reduce the loss.
- **`optimizer.step()`**  

This line updates the model's parameters. The `optimizer` uses the gradients computed during the backward pass to adjust the weights and biases of the model. The `step()` method applies the chosen optimization algorithm (Adam in this case), nudging each parameter so as to reduce the loss, with step sizes scaled by the learning rate.
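The forward pass, loss, backward pass, and update can be traced by hand on a tiny example. This pure-Python sketch uses a hypothetical 3-class linear classifier with two input features and no bias. Instead of autograd it uses the analytic cross-entropy gradient (for logits z and target class y, the gradient of the loss with respect to z_k is softmax(z)_k minus 1 if k = y, and softmax(z)_k otherwise), and instead of Adam it applies a plain gradient-descent update. The weights, input, and learning rate are made up.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, x):
    """Logits of a bias-free linear classifier: z_k = sum_j W[k][j] * x[j]."""
    return [sum(w_kj * x_j for w_kj, x_j in zip(row, x)) for row in W]


def training_step(W, x, target, lr):
    """One forward / loss / backward / update cycle, done by hand."""
    z = forward(W, x)                             # forward pass
    p = softmax(z)
    loss = -math.log(p[target])                   # cross-entropy loss
    # backward: dL/dz_k = p_k - 1[k == target], then dL/dW[k][j] = dL/dz_k * x[j]
    dz = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
    grad_W = [[dz_k * x_j for x_j in x] for dz_k in dz]
    # update: plain gradient descent (Adam would rescale each parameter's step)
    new_W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grad_W)]
    return new_W, loss


W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # hypothetical 3-class, 2-feature model
x, target = [1.0, 2.0], 1                   # hypothetical sample and its label

losses = []
for step in range(3):
    W, loss = training_step(W, x, target, lr=0.5)
    losses.append(loss)
    print(f"step {step}: loss before update = {loss:.4f}")
```

The first loss is log 3, because all-zero weights give equal logits, and it shrinks on every step as the update raises the logit of the target class.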
- **`training_loss.append(loss.item())`**  

The loss for the current batch is extracted as a Python number using `.item()` and appended to the `training_loss` list. This list accumulates the loss for each batch within the current iteration.
- **`print("Training loss at", i_iteration, "iteration is", np.mean(training_loss))`**  

After processing all batches in an iteration (the inner loop finishes), this line calculates the average training loss for that iteration by taking the mean of the losses stored in the `training_loss` list. It then prints the iteration number and the calculated average training loss. This provides feedback on the model's training progress.
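Because `np.mean(training_loss)` averages per-batch mean losses, every batch counts equally, including a smaller final batch. The plain-Python sketch below, with made-up loss values and batch sizes, contrasts that with a per-sample average in which each batch's mean is weighted by its number of samples. The two agree whenever all batches have the same size.

```python
def mean_of_batch_means(batch_losses):
    """What np.mean(training_loss) computes: every batch has equal weight."""
    return sum(batch_losses) / len(batch_losses)


def per_sample_mean(batch_losses, batch_sizes):
    """Weight each batch's mean loss by the number of samples in that batch."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical epoch: two full batches of 64 samples and a final batch of 22.
batch_losses = [0.90, 0.92, 0.50]
batch_sizes = [64, 64, 22]

print(mean_of_batch_means(batch_losses))              # (0.90 + 0.92 + 0.50) / 3
print(per_sample_mean(batch_losses, batch_sizes))     # weighted by batch size
```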
- **`torch.save(resnet_model_for_cnc.state_dict(), PATH_TO_MODEL_SAVE)`**  

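Finally, after each iteration, this line saves the current state of the model. The `state_dict()` method returns a dictionary containing the model's learnable parameters (weights and biases) along with its persistent buffers, such as batch-normalization running statistics [2]. This dictionary is then saved to the file specified by `PATH_TO_MODEL_SAVE`. Because the same path is used every time, the file is overwritten after each epoch and always holds the most recent weights, which can later be loaded for inference. Only the model's state is saved, not the optimizer's; resuming training exactly where it stopped would also require saving `optimizer.state_dict()`.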
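A common way to make a checkpoint resumable, also described in the PyTorch tutorial on saving and loading models, is to save the model and optimizer state together in one dictionary. The sketch below uses a small `nn.Linear` model as a stand-in for `resnet_model_for_cnc`; the dictionary keys and the temporary file path are illustrative choices, not names from the notebook.

```python
import os
import tempfile

import torch
from torch import nn
from torch.optim import Adam

# Placeholders standing in for resnet_model_for_cnc and its optimizer.
model = nn.Linear(4, 3)
optimizer = Adam(model.parameters(), lr=5e-6)

# One training step so that Adam has internal state (step count, moment estimates).
loss = nn.CrossEntropyLoss()(model(torch.randn(8, 4)), torch.randint(0, 3, (8,)))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save model and optimizer state together (hypothetical file location).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save({"epoch": 0,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict()},
           checkpoint_path)

# Later: rebuild the same objects and restore their state before resuming.
restored_model = nn.Linear(4, 3)
restored_optimizer = Adam(restored_model.parameters(), lr=5e-6)
checkpoint = torch.load(checkpoint_path)
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
```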

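```python
import numpy as np
import torch
from tqdm import tqdm

for i_iteration in tqdm(range(NO_OF_ITERATIONS)):
    training_loss = []  # per-batch losses for this epoch
    for i_image, i_target in dataloader_for_cnc:
        # Move the batch to the same device as the model
        i_image = i_image.to(device=CUDA_DEVICE)
        i_target = i_target.to(device=CUDA_DEVICE)
        optimizer.zero_grad()  # clear gradients left over from the previous batch

        output = resnet_model_for_cnc(i_image)   # forward pass: class logits
        loss = loss_function(output, i_target)   # mean cross-entropy over the batch
        loss.backward()                          # backward pass: compute gradients
        optimizer.step()                         # Adam update of the parameters

        training_loss.append(loss.item())

    print("Training loss at", i_iteration, "iteration is", np.mean(training_loss))

    # Overwrite the saved weights with the latest ones after every epoch
    torch.save(resnet_model_for_cnc.state_dict(), PATH_TO_MODEL_SAVE)
```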

  0%|                                                   | 0/300 [00:00<?, ?it/s]

Training loss at 0 iteration is 0.9054904466583615


  0%|▏                                        | 1/300 [00:24<2:00:50, 24.25s/it]

Training loss at 1 iteration is 0.9049198882920402


  1%|▎                                        | 2/300 [00:48<2:00:06, 24.18s/it]

Training loss at 2 iteration is 0.906199091956729


  1%|▍                                        | 3/300 [01:12<2:00:08, 24.27s/it]

Training loss at 3 iteration is 0.9049553502173651


  1%|▌                                        | 4/300 [01:36<1:59:30, 24.23s/it]

Training loss at 4 iteration is 0.905931830406189


  2%|▋                                        | 5/300 [02:01<1:59:39, 24.34s/it]

Training loss at 5 iteration is 0.9055460606302533


  2%|▊                                        | 6/300 [02:25<1:59:26, 24.38s/it]

Training loss at 6 iteration is 0.9058047334353129


  2%|▉                                        | 7/300 [02:50<1:58:53, 24.35s/it]

Training loss at 7 iteration is 0.9057417455173674


  3%|█                                        | 8/300 [03:14<1:58:30, 24.35s/it]

Training loss at 8 iteration is 0.9057710369427999


  3%|█▏                                       | 9/300 [03:38<1:58:04, 24.34s/it]

Training loss at 9 iteration is 0.9060954224495661


  3%|█▎                                      | 10/300 [04:03<1:57:52, 24.39s/it]

Training loss at 10 iteration is 0.9069323369434902


  4%|█▍                                      | 11/300 [04:27<1:57:34, 24.41s/it]

Training loss at 11 iteration is 0.9063944617907206


  4%|█▌                                      | 12/300 [04:52<1:57:05, 24.39s/it]

Training loss at 12 iteration is 0.9049525119009472


  4%|█▋                                      | 13/300 [05:16<1:56:48, 24.42s/it]

Training loss at 13 iteration is 0.9068637830870492


  5%|█▊                                      | 14/300 [05:41<1:56:21, 24.41s/it]

Training loss at 14 iteration is 0.9064975948560805


  5%|██                                      | 15/300 [06:05<1:55:53, 24.40s/it]

Training loss at 15 iteration is 0.9061731809661502


  5%|██▏                                     | 16/300 [06:29<1:55:22, 24.37s/it]

Training loss at 16 iteration is 0.9054468671480814


  6%|██▎                                     | 17/300 [06:54<1:54:57, 24.37s/it]

Training loss at 17 iteration is 0.9058017276582264


  6%|██▍                                     | 18/300 [07:18<1:54:41, 24.40s/it]

Training loss at 18 iteration is 0.9053418182191395


  6%|██▌                                     | 19/300 [07:43<1:54:32, 24.46s/it]

Training loss at 19 iteration is 0.9069038601148696


  7%|██▋                                     | 20/300 [08:07<1:53:45, 24.38s/it]

Training loss at 20 iteration is 0.9055296097482953


  7%|██▊                                     | 21/300 [08:31<1:52:37, 24.22s/it]

Training loss at 21 iteration is 0.9053895870844523


  7%|██▉                                     | 22/300 [08:55<1:52:38, 24.31s/it]

Training loss at 22 iteration is 0.9053482753889901


  8%|███                                     | 23/300 [09:20<1:52:11, 24.30s/it]

Training loss at 23 iteration is 0.9052759323801313


  8%|███▏                                    | 24/300 [09:44<1:52:43, 24.51s/it]

Training loss at 24 iteration is 0.9052831104823521


  8%|███▎                                    | 25/300 [10:10<1:53:39, 24.80s/it]

Training loss at 25 iteration is 0.9057487845420837


  9%|███▍                                    | 26/300 [10:35<1:53:44, 24.91s/it]

Training loss at 26 iteration is 0.9064892615590777


  9%|███▌                                    | 27/300 [11:00<1:53:32, 24.95s/it]

Training loss at 27 iteration is 0.907181753998711


  9%|███▋                                    | 28/300 [11:24<1:51:57, 24.70s/it]

Training loss at 28 iteration is 0.9054522684642247


 10%|███▊                                    | 29/300 [11:49<1:51:21, 24.66s/it]

Training loss at 29 iteration is 0.9049139647256761


 10%|████                                    | 30/300 [12:13<1:50:08, 24.47s/it]

Training loss at 30 iteration is 0.9048893082709539


 10%|████▏                                   | 31/300 [12:38<1:50:31, 24.65s/it]

Training loss at 31 iteration is 0.9051167141823542


 11%|████▎                                   | 32/300 [13:03<1:50:13, 24.68s/it]

Training loss at 32 iteration is 0.9052298324448722


 11%|████▍                                   | 33/300 [13:27<1:49:01, 24.50s/it]

Training loss at 33 iteration is 0.9054422123091561


 11%|████▌                                   | 34/300 [13:51<1:48:16, 24.42s/it]

Training loss at 34 iteration is 0.9052776467232477


 12%|████▋                                   | 35/300 [14:15<1:47:31, 24.35s/it]

Training loss at 35 iteration is 0.9056197433244615


 12%|████▊                                   | 36/300 [14:39<1:46:47, 24.27s/it]

Training loss at 36 iteration is 0.9061226078442165


 12%|████▉                                   | 37/300 [15:03<1:46:04, 24.20s/it]

Training loss at 37 iteration is 0.9055267033122835


 13%|█████                                   | 38/300 [15:27<1:45:31, 24.17s/it]

Training loss at 38 iteration is 0.9054295363880339


 13%|█████▏                                  | 39/300 [15:52<1:45:09, 24.18s/it]

Training loss at 39 iteration is 0.904909420581091


 13%|█████▎                                  | 40/300 [16:16<1:45:00, 24.23s/it]

Training loss at 40 iteration is 0.9050684769948324


 14%|█████▍                                  | 41/300 [16:40<1:44:54, 24.30s/it]

Training loss at 41 iteration is 0.9054190459705534


 14%|█████▌                                  | 42/300 [17:05<1:44:47, 24.37s/it]

Training loss at 42 iteration is 0.904854246548244


 14%|█████▋                                  | 43/300 [17:29<1:44:12, 24.33s/it]

Training loss at 43 iteration is 0.9058002602486384


 15%|█████▊                                  | 44/300 [17:54<1:44:18, 24.45s/it]

Training loss at 44 iteration is 0.9055039740744091


 15%|██████                                  | 45/300 [18:19<1:44:31, 24.60s/it]

Training loss at 45 iteration is 0.9054190204257057


 15%|██████▏                                 | 46/300 [18:43<1:43:46, 24.51s/it]

Training loss at 46 iteration is 0.9050804376602173


 16%|██████▎                                 | 47/300 [19:08<1:43:27, 24.54s/it]

Training loss at 47 iteration is 0.9060696817579723


 16%|██████▍                                 | 48/300 [19:32<1:43:03, 24.54s/it]

Training loss at 48 iteration is 0.9064055965060279


 16%|██████▌                                 | 49/300 [19:56<1:42:08, 24.42s/it]

Training loss at 49 iteration is 0.904847020194644


 17%|██████▋                                 | 50/300 [20:21<1:41:18, 24.31s/it]

Training loss at 50 iteration is 0.904908021291097


 17%|██████▊                                 | 51/300 [20:45<1:40:36, 24.24s/it]

Training loss at 51 iteration is 0.9062785903612772


 17%|██████▉                                 | 52/300 [21:09<1:40:03, 24.21s/it]

Training loss at 52 iteration is 0.9055063242004031


 18%|███████                                 | 53/300 [21:33<1:39:30, 24.17s/it]

Training loss at 53 iteration is 0.9050418535868326


 18%|███████▏                                | 54/300 [21:57<1:39:07, 24.18s/it]

Training loss at 54 iteration is 0.9066180615198045


 18%|███████▎                                | 55/300 [22:21<1:38:44, 24.18s/it]

Training loss at 55 iteration is 0.9048589212553841


 19%|███████▍                                | 56/300 [22:45<1:38:20, 24.18s/it]

Training loss at 56 iteration is 0.9054267434846788


 19%|███████▌                                | 57/300 [23:10<1:37:55, 24.18s/it]

Training loss at 57 iteration is 0.9051986081259591


 19%|███████▋                                | 58/300 [23:34<1:37:30, 24.18s/it]

Training loss at 58 iteration is 0.9070101692563012


 20%|███████▊                                | 59/300 [23:58<1:37:05, 24.17s/it]

Training loss at 59 iteration is 0.9048965459778195


 20%|████████                                | 60/300 [24:22<1:36:41, 24.17s/it]

Training loss at 60 iteration is 0.9049297230584281


 20%|████████▏                               | 61/300 [24:46<1:36:16, 24.17s/it]

Training loss at 61 iteration is 0.905160095010485


 21%|████████▎                               | 62/300 [25:11<1:36:06, 24.23s/it]

Training loss at 62 iteration is 0.9049426629429772


 21%|████████▍                               | 63/300 [25:35<1:35:55, 24.29s/it]

Training loss at 63 iteration is 0.9048658041726976


 21%|████████▌                               | 64/300 [25:59<1:35:40, 24.32s/it]

Training loss at 64 iteration is 0.9057249994505019


 22%|████████▋                               | 65/300 [26:24<1:35:07, 24.29s/it]

Training loss at 65 iteration is 0.9051159081004915


 22%|████████▊                               | 66/300 [26:48<1:34:36, 24.26s/it]

Training loss at 66 iteration is 0.9056797481718517


 22%|████████▉                               | 67/300 [27:12<1:34:06, 24.23s/it]

Training loss at 67 iteration is 0.9064184796242487


 23%|█████████                               | 68/300 [27:36<1:33:41, 24.23s/it]

Training loss at 68 iteration is 0.9066891017414275


 23%|█████████▏                              | 69/300 [28:00<1:33:18, 24.24s/it]

Training loss at 69 iteration is 0.9064855603944688


 23%|█████████▎                              | 70/300 [28:25<1:32:54, 24.24s/it]

Training loss at 70 iteration is 0.9049062331517538


 24%|█████████▍                              | 71/300 [28:49<1:32:24, 24.21s/it]

Training loss at 71 iteration is 0.9049468693279085


 24%|█████████▌                              | 72/300 [29:13<1:32:00, 24.21s/it]

Training loss at 72 iteration is 0.9048944513003031


 24%|█████████▋                              | 73/300 [29:37<1:31:26, 24.17s/it]

Training loss at 73 iteration is 0.9059579457555499


 25%|█████████▊                              | 74/300 [30:01<1:30:37, 24.06s/it]

Training loss at 74 iteration is 0.9058924885023207


 25%|██████████                              | 75/300 [30:25<1:29:56, 23.99s/it]

Training loss at 75 iteration is 0.9061126084554763


 25%|██████████▏                             | 76/300 [30:48<1:29:13, 23.90s/it]

Training loss at 76 iteration is 0.9064755865505764


 26%|██████████▎                             | 77/300 [31:12<1:28:35, 23.84s/it]

Training loss at 77 iteration is 0.9050779796781994


 26%|██████████▍                             | 78/300 [31:36<1:28:02, 23.80s/it]

Training loss at 78 iteration is 0.9051290154457092


 26%|██████████▌                             | 79/300 [32:00<1:27:42, 23.81s/it]

Training loss at 79 iteration is 0.9053934443564642


 27%|██████████▋                             | 80/300 [32:24<1:27:41, 23.92s/it]

Training loss at 80 iteration is 0.9056432360694522


 27%|██████████▊                             | 81/300 [32:48<1:27:32, 23.99s/it]

Training loss at 81 iteration is 0.9055781534739903


 27%|██████████▉                             | 82/300 [33:12<1:27:20, 24.04s/it]

Training loss at 82 iteration is 0.9049896881693885


 28%|███████████                             | 83/300 [33:36<1:27:09, 24.10s/it]

Training loss at 83 iteration is 0.9049942209607079


 28%|███████████▏                            | 84/300 [34:01<1:26:53, 24.13s/it]

Training loss at 84 iteration is 0.9063838408106849


 28%|███████████▎                            | 85/300 [34:25<1:26:36, 24.17s/it]

Training loss at 85 iteration is 0.9063910501343864


 29%|███████████▍                            | 86/300 [34:49<1:26:16, 24.19s/it]

Training loss at 86 iteration is 0.9049444226991563


 29%|███████████▌                            | 87/300 [35:13<1:25:56, 24.21s/it]

Training loss at 87 iteration is 0.9064444473811558


 29%|███████████▋                            | 88/300 [35:38<1:25:38, 24.24s/it]

Training loss at 88 iteration is 0.905192996774401


 30%|███████████▊                            | 89/300 [36:02<1:25:07, 24.20s/it]

Training loss at 89 iteration is 0.9053756679807391


 30%|████████████                            | 90/300 [36:26<1:24:42, 24.20s/it]

Training loss at 90 iteration is 0.9056510840143476


 30%|████████████▏                           | 91/300 [36:50<1:24:11, 24.17s/it]

Training loss at 91 iteration is 0.9066353525434222


 31%|████████████▎                           | 92/300 [37:14<1:23:45, 24.16s/it]

Training loss at 92 iteration is 0.9053764740626017


 31%|████████████▍                           | 93/300 [37:38<1:23:20, 24.16s/it]

Training loss at 93 iteration is 0.9048656253587632


 31%|████████████▌                           | 94/300 [38:02<1:22:52, 24.14s/it]

Training loss at 94 iteration is 0.9048727779161363


 32%|████████████▋                           | 95/300 [38:27<1:22:32, 24.16s/it]

Training loss at 95 iteration is 0.9051274714015779


 32%|████████████▊                           | 96/300 [38:51<1:21:51, 24.08s/it]

Training loss at 96 iteration is 0.9048752699579511


 32%|████████████▉                           | 97/300 [39:14<1:21:15, 24.02s/it]

Training loss at 97 iteration is 0.904959343728565


 33%|█████████████                           | 98/300 [39:38<1:20:44, 23.98s/it]

Training loss at 98 iteration is 0.9064308830669948


 33%|█████████████▏                          | 99/300 [40:02<1:20:20, 23.98s/it]

Training loss at 99 iteration is 0.9070961219923837


 33%|█████████████                          | 100/300 [40:26<1:19:57, 23.99s/it]

Training loss at 100 iteration is 0.9050456824756804


 34%|█████████████▏                         | 101/300 [40:50<1:19:36, 24.00s/it]

Training loss at 101 iteration is 0.9057389611289615


 34%|█████████████▎                         | 102/300 [41:14<1:18:59, 23.94s/it]

Training loss at 102 iteration is 0.90505998475211


 34%|█████████████▍                         | 103/300 [41:38<1:18:35, 23.93s/it]

Training loss at 103 iteration is 0.9052639035951524


 35%|█████████████▌                         | 104/300 [42:02<1:18:14, 23.95s/it]

Training loss at 104 iteration is 0.9048623073668707


 35%|█████████████▋                         | 105/300 [42:26<1:17:44, 23.92s/it]

Training loss at 105 iteration is 0.9060263860793341


 35%|█████████████▊                         | 106/300 [42:50<1:17:21, 23.92s/it]

Training loss at 106 iteration is 0.9053970944313776


 36%|█████████████▉                         | 107/300 [43:14<1:16:52, 23.90s/it]

Training loss at 107 iteration is 0.9048464979444232


 36%|██████████████                         | 108/300 [43:37<1:16:18, 23.85s/it]

Training loss at 108 iteration is 0.9051718456404549


 36%|██████████████▏                        | 109/300 [44:01<1:15:49, 23.82s/it]

Training loss at 109 iteration is 0.9062347014745077


 37%|██████████████▎                        | 110/300 [44:25<1:15:25, 23.82s/it]

Training loss at 110 iteration is 0.9063822144553775


 37%|██████████████▍                        | 111/300 [44:49<1:14:58, 23.80s/it]

Training loss at 111 iteration is 0.906399993669419


 37%|██████████████▌                        | 112/300 [45:12<1:14:30, 23.78s/it]

Training loss at 112 iteration is 0.9053887441044762


 38%|██████████████▋                        | 113/300 [45:37<1:14:24, 23.88s/it]

Training loss at 113 iteration is 0.9075732089224315


 38%|██████████████▊                        | 114/300 [46:00<1:13:57, 23.86s/it]

Training loss at 114 iteration is 0.905355433622996


 38%|██████████████▉                        | 115/300 [46:25<1:13:53, 23.96s/it]

Training loss at 115 iteration is 0.9049343722207206


 39%|███████████████                        | 116/300 [46:49<1:13:28, 23.96s/it]

Training loss at 116 iteration is 0.9062337307702928


 39%|███████████████▏                       | 117/300 [47:13<1:13:18, 24.03s/it]

Training loss at 117 iteration is 0.9061160456566584


 39%|███████████████▎                       | 118/300 [47:37<1:13:02, 24.08s/it]

Training loss at 118 iteration is 0.9053109344981966


 40%|███████████████▍                       | 119/300 [48:01<1:12:43, 24.11s/it]

Training loss at 119 iteration is 0.9056921686444964


 40%|███████████████▌                       | 120/300 [48:25<1:12:28, 24.16s/it]

Training loss at 120 iteration is 0.9055960149992079


 40%|███████████████▋                       | 121/300 [48:49<1:11:50, 24.08s/it]

Training loss at 121 iteration is 0.9051296200071063


 41%|███████████████▊                       | 122/300 [49:13<1:11:17, 24.03s/it]

Training loss at 122 iteration is 0.905355033420381


 41%|███████████████▉                       | 123/300 [49:37<1:11:03, 24.09s/it]

Training loss at 123 iteration is 0.9057162262144542


 41%|████████████████                       | 124/300 [50:02<1:10:37, 24.08s/it]

Training loss at 124 iteration is 0.9048505851200649


 42%|████████████████▎                      | 125/300 [50:25<1:10:01, 24.01s/it]

Training loss at 125 iteration is 0.9072655950273786


 42%|████████████████▍                      | 126/300 [50:49<1:09:43, 24.04s/it]

Training loss at 126 iteration is 0.9049218069939386


 42%|████████████████▌                      | 127/300 [51:13<1:09:10, 23.99s/it]

Training loss at 127 iteration is 0.9054514027777172


 43%|████████████████▋                      | 128/300 [51:37<1:08:41, 23.96s/it]

Training loss at 128 iteration is 0.9066802291643052


 43%|████████████████▊                      | 129/300 [52:01<1:08:20, 23.98s/it]

Training loss at 129 iteration is 0.9057564252898807


 43%|████████████████▉                      | 130/300 [52:25<1:07:53, 23.96s/it]

Training loss at 130 iteration is 0.9058578127906436


 44%|█████████████████                      | 131/300 [52:49<1:07:20, 23.91s/it]

Training loss at 131 iteration is 0.9049820871580214


 44%|█████████████████▏                     | 132/300 [53:13<1:06:50, 23.87s/it]

Training loss at 132 iteration is 0.9051710821333385


 44%|█████████████████▎                     | 133/300 [53:37<1:06:20, 23.84s/it]

Training loss at 133 iteration is 0.905606902780987


 45%|█████████████████▍                     | 134/300 [54:00<1:05:54, 23.82s/it]

Training loss at 134 iteration is 0.9050662120183309


 45%|█████████████████▌                     | 135/300 [54:24<1:05:28, 23.81s/it]

Training loss at 135 iteration is 0.9056533858889625


 45%|█████████████████▋                     | 136/300 [54:48<1:05:00, 23.78s/it]

Training loss at 136 iteration is 0.9063315249624706


 46%|█████████████████▊                     | 137/300 [55:12<1:04:37, 23.79s/it]

Training loss at 137 iteration is 0.9063502039228167


 46%|█████████████████▉                     | 138/300 [55:35<1:04:16, 23.80s/it]

Training loss at 138 iteration is 0.9055438751266116


 46%|██████████████████                     | 139/300 [55:59<1:03:51, 23.80s/it]

Training loss at 139 iteration is 0.9054458879289173


 47%|██████████████████▏                    | 140/300 [56:23<1:03:28, 23.80s/it]

Training loss at 140 iteration is 0.9057781469254267


 47%|██████████████████▎                    | 141/300 [56:47<1:03:02, 23.79s/it]

Training loss at 141 iteration is 0.9067591003009251


 47%|██████████████████▍                    | 142/300 [57:11<1:02:37, 23.78s/it]

Training loss at 142 iteration is 0.9049953307424273


 48%|██████████████████▌                    | 143/300 [57:34<1:02:12, 23.78s/it]

Training loss at 143 iteration is 0.9062523160661969


 48%|██████████████████▋                    | 144/300 [57:58<1:01:48, 23.77s/it]

Training loss at 144 iteration is 0.9064340875262306


 48%|██████████████████▊                    | 145/300 [58:22<1:01:22, 23.76s/it]

Training loss at 145 iteration is 0.9051262736320496


 49%|██████████████████▉                    | 146/300 [58:46<1:00:58, 23.75s/it]

Training loss at 146 iteration is 0.9053393318539574


 49%|███████████████████                    | 147/300 [59:09<1:00:37, 23.77s/it]

Training loss at 147 iteration is 0.9049869151342482


 49%|███████████████████▏                   | 148/300 [59:33<1:00:13, 23.77s/it]

Training loss at 148 iteration is 0.9049336370967683


 50%|████████████████████▎                    | 149/300 [59:57<59:49, 23.77s/it]

Training loss at 149 iteration is 0.9058132370313009


 50%|███████████████████▌                   | 150/300 [1:00:21<59:44, 23.90s/it]

Training loss at 150 iteration is 0.9063901958011445


 50%|███████████████████▋                   | 151/300 [1:00:45<59:33, 23.98s/it]

Training loss at 151 iteration is 0.905207633972168


 51%|███████████████████▊                   | 152/300 [1:01:09<59:17, 24.03s/it]

Training loss at 152 iteration is 0.9048691789309183


 51%|███████████████████▉                   | 153/300 [1:01:34<59:01, 24.09s/it]

Training loss at 153 iteration is 0.9054853263355437


 51%|████████████████████                   | 154/300 [1:01:58<58:42, 24.13s/it]

Training loss at 154 iteration is 0.9049590911184039


 52%|████████████████████▏                  | 155/300 [1:02:22<58:20, 24.14s/it]

Training loss at 155 iteration is 0.9050839117595127


 52%|████████████████████▎                  | 156/300 [1:02:46<58:01, 24.17s/it]

Training loss at 156 iteration is 0.9051374140239897


 52%|████████████████████▍                  | 157/300 [1:03:11<57:42, 24.22s/it]

Training loss at 157 iteration is 0.9050468263171968


 53%|████████████████████▌                  | 158/300 [1:03:35<57:21, 24.24s/it]

Training loss at 158 iteration is 0.9050030254182362


 53%|████████████████████▋                  | 159/300 [1:03:59<56:57, 24.24s/it]

Training loss at 159 iteration is 0.9056359160514105


 53%|████████████████████▊                  | 160/300 [1:04:23<56:29, 24.21s/it]

Training loss at 160 iteration is 0.9063077597391038


 54%|████████████████████▉                  | 161/300 [1:04:47<56:03, 24.20s/it]

Training loss at 161 iteration is 0.9049949475697109


 54%|█████████████████████                  | 162/300 [1:05:12<55:36, 24.18s/it]

Training loss at 162 iteration is 0.9051911688986278


 54%|█████████████████████▏                 | 163/300 [1:05:36<55:13, 24.19s/it]

Training loss at 163 iteration is 0.9049668141773769


 55%|█████████████████████▎                 | 164/300 [1:06:00<55:07, 24.32s/it]

Training loss at 164 iteration is 0.9052271019844782


 55%|█████████████████████▍                 | 165/300 [1:06:25<54:37, 24.28s/it]

Training loss at 165 iteration is 0.9050610207376026


 55%|█████████████████████▌                 | 166/300 [1:06:49<54:07, 24.23s/it]

Training loss at 166 iteration is 0.905143834295727


 56%|█████████████████████▋                 | 167/300 [1:07:13<53:38, 24.20s/it]

Training loss at 167 iteration is 0.9048427513667515


 56%|█████████████████████▊                 | 168/300 [1:07:37<53:14, 24.20s/it]

Training loss at 168 iteration is 0.9050432926132566


 56%|█████████████████████▉                 | 169/300 [1:08:01<52:47, 24.18s/it]

Training loss at 169 iteration is 0.9056309319677807


 57%|██████████████████████                 | 170/300 [1:08:25<52:08, 24.07s/it]

Training loss at 170 iteration is 0.9049644413448515


 57%|██████████████████████▏                | 171/300 [1:08:49<51:34, 23.99s/it]

Training loss at 171 iteration is 0.9059536485444932


 57%|██████████████████████▎                | 172/300 [1:09:13<51:15, 24.02s/it]

Training loss at 172 iteration is 0.9057877290816534


 58%|██████████████████████▍                | 173/300 [1:09:37<50:53, 24.04s/it]

Training loss at 173 iteration is 0.904870019072578


 58%|██████████████████████▌                | 174/300 [1:10:01<50:30, 24.05s/it]

Training loss at 174 iteration is 0.9055435146604266


 58%|██████████████████████▊                | 175/300 [1:10:25<50:08, 24.07s/it]

Training loss at 175 iteration is 0.9055678361938113


 59%|██████████████████████▉                | 176/300 [1:10:49<49:50, 24.12s/it]

Training loss at 176 iteration is 0.9049153838838849


 59%|███████████████████████                | 177/300 [1:11:14<49:29, 24.14s/it]

Training loss at 177 iteration is 0.90483672278268


 59%|███████████████████████▏               | 178/300 [1:11:38<49:08, 24.17s/it]

Training loss at 178 iteration is 0.9049342473347982


 60%|███████████████████████▎               | 179/300 [1:12:02<48:52, 24.23s/it]

Training loss at 179 iteration is 0.9060987801778884


 60%|███████████████████████▍               | 180/300 [1:12:27<48:34, 24.28s/it]

Training loss at 180 iteration is 0.9050157183692569


 60%|███████████████████████▌               | 181/300 [1:12:51<48:08, 24.27s/it]

Training loss at 181 iteration is 0.9057232964606512


 61%|███████████████████████▋               | 182/300 [1:13:15<47:42, 24.26s/it]

Training loss at 182 iteration is 0.9054338534673055


 61%|███████████████████████▊               | 183/300 [1:13:39<47:16, 24.24s/it]

Training loss at 183 iteration is 0.9053833654948643


 61%|███████████████████████▉               | 184/300 [1:14:03<46:47, 24.20s/it]

Training loss at 184 iteration is 0.9048651740664527


 62%|████████████████████████               | 185/300 [1:14:27<46:18, 24.16s/it]

Training loss at 185 iteration is 0.904855557850429


 62%|████████████████████████▏              | 186/300 [1:14:52<45:52, 24.14s/it]

Training loss at 186 iteration is 0.9062790303003221


 62%|████████████████████████▎              | 187/300 [1:15:16<45:27, 24.14s/it]

Training loss at 187 iteration is 0.9048549589656648


 63%|████████████████████████▍              | 188/300 [1:15:40<45:01, 24.12s/it]

Training loss at 188 iteration is 0.9048506305331275


 63%|████████████████████████▌              | 189/300 [1:16:04<44:38, 24.13s/it]

Training loss at 189 iteration is 0.9049138455163865


 63%|████████████████████████▋              | 190/300 [1:16:28<44:14, 24.13s/it]

Training loss at 190 iteration is 0.9056603284109206


 64%|████████████████████████▊              | 191/300 [1:16:52<43:51, 24.14s/it]

Training loss at 191 iteration is 0.9064746697743734


 64%|████████████████████████▉              | 192/300 [1:17:16<43:29, 24.16s/it]

Training loss at 192 iteration is 0.904921463557652


 64%|█████████████████████████              | 193/300 [1:17:41<43:05, 24.16s/it]

Training loss at 193 iteration is 0.9061563156899952


 65%|█████████████████████████▏             | 194/300 [1:18:05<42:44, 24.20s/it]

Training loss at 194 iteration is 0.9053400442713783


 65%|█████████████████████████▎             | 195/300 [1:18:29<42:20, 24.20s/it]

Training loss at 195 iteration is 0.9066026891980853


 65%|█████████████████████████▍             | 196/300 [1:18:53<41:54, 24.18s/it]

Training loss at 196 iteration is 0.9057065503937858


 66%|█████████████████████████▌             | 197/300 [1:19:17<41:27, 24.15s/it]

Training loss at 197 iteration is 0.905034928094773


 66%|█████████████████████████▋             | 198/300 [1:19:41<41:03, 24.15s/it]

Training loss at 198 iteration is 0.9053737038657779


 66%|█████████████████████████▊             | 199/300 [1:20:06<40:41, 24.17s/it]

Training loss at 199 iteration is 0.9051411350568136


 67%|██████████████████████████             | 200/300 [1:20:30<40:16, 24.16s/it]

Training loss at 200 iteration is 0.9048702801976886


 67%|██████████████████████████▏            | 201/300 [1:20:54<39:50, 24.15s/it]

Training loss at 201 iteration is 0.9048877358436584


 67%|██████████████████████████▎            | 202/300 [1:21:18<39:25, 24.14s/it]

Training loss at 202 iteration is 0.9058529649462018


 68%|██████████████████████████▍            | 203/300 [1:21:42<39:01, 24.14s/it]

Training loss at 203 iteration is 0.9052360568727765


 68%|██████████████████████████▌            | 204/300 [1:22:06<38:38, 24.15s/it]

Training loss at 204 iteration is 0.9049837305432274


 68%|██████████████████████████▋            | 205/300 [1:22:30<38:13, 24.14s/it]

Training loss at 205 iteration is 0.9058908451171148


 69%|██████████████████████████▊            | 206/300 [1:22:55<37:58, 24.24s/it]

Training loss at 206 iteration is 0.9048947918982733


 69%|██████████████████████████▉            | 207/300 [1:23:20<37:53, 24.44s/it]

Training loss at 207 iteration is 0.9056205863044375


 69%|███████████████████████████            | 208/300 [1:23:45<37:38, 24.55s/it]

Training loss at 208 iteration is 0.9049833189873469


 70%|███████████████████████████▏           | 209/300 [1:24:09<37:12, 24.54s/it]

Training loss at 209 iteration is 0.9048540109679812


 70%|███████████████████████████▎           | 210/300 [1:24:34<36:52, 24.58s/it]

Training loss at 210 iteration is 0.904985507329305


 70%|███████████████████████████▍           | 211/300 [1:24:59<36:40, 24.72s/it]

Training loss at 211 iteration is 0.9054850481805348


 71%|███████████████████████████▌           | 212/300 [1:25:24<36:27, 24.86s/it]

Training loss at 212 iteration is 0.9049503888402667


 71%|███████████████████████████▋           | 213/300 [1:25:50<36:31, 25.19s/it]

Training loss at 213 iteration is 0.9049398075966608


 71%|███████████████████████████▊           | 214/300 [1:26:15<35:49, 24.99s/it]

Training loss at 214 iteration is 0.9051804088410877


 72%|███████████████████████████▉           | 215/300 [1:26:39<35:03, 24.75s/it]

Training loss at 215 iteration is 0.9053736811592465


 72%|████████████████████████████           | 216/300 [1:27:03<34:37, 24.74s/it]

Training loss at 216 iteration is 0.9048505907966977


 72%|████████████████████████████▏          | 217/300 [1:27:29<34:24, 24.87s/it]

Training loss at 217 iteration is 0.9049933779807318


 73%|████████████████████████████▎          | 218/300 [1:27:53<33:50, 24.77s/it]

Training loss at 218 iteration is 0.9062277646291823


 73%|████████████████████████████▍          | 219/300 [1:28:18<33:16, 24.64s/it]

Training loss at 219 iteration is 0.9051088321776617


 73%|████████████████████████████▌          | 220/300 [1:28:42<32:48, 24.60s/it]

Training loss at 220 iteration is 0.9055789737474351


 74%|████████████████████████████▋          | 221/300 [1:29:07<32:22, 24.59s/it]

Training loss at 221 iteration is 0.904881219069163


 74%|████████████████████████████▊          | 222/300 [1:29:32<32:07, 24.71s/it]

Training loss at 222 iteration is 0.9051206622804914


 74%|████████████████████████████▉          | 223/300 [1:29:57<31:53, 24.85s/it]

Training loss at 223 iteration is 0.905931810537974


 75%|█████████████████████████████          | 224/300 [1:30:21<31:17, 24.71s/it]

Training loss at 224 iteration is 0.9056844313939413


 75%|█████████████████████████████▎         | 225/300 [1:30:45<30:42, 24.56s/it]

Training loss at 225 iteration is 0.9048564036687216


 75%|█████████████████████████████▍         | 226/300 [1:31:10<30:08, 24.43s/it]

Training loss at 226 iteration is 0.9048787412189302


 76%|█████████████████████████████▌         | 227/300 [1:31:34<29:35, 24.33s/it]

Training loss at 227 iteration is 0.905049721399943


 76%|█████████████████████████████▋         | 228/300 [1:31:58<29:10, 24.31s/it]

Training loss at 228 iteration is 0.9052184025446574


 76%|█████████████████████████████▊         | 229/300 [1:32:23<29:04, 24.57s/it]

Training loss at 229 iteration is 0.9048873924073719


 77%|█████████████████████████████▉         | 230/300 [1:32:48<28:40, 24.59s/it]

Training loss at 230 iteration is 0.9059263467788696


 77%|██████████████████████████████         | 231/300 [1:33:12<28:13, 24.54s/it]

Training loss at 231 iteration is 0.9054490327835083


 77%|██████████████████████████████▏        | 232/300 [1:33:37<27:53, 24.61s/it]

Training loss at 232 iteration is 0.9048812644822257


 78%|██████████████████████████████▎        | 233/300 [1:34:01<27:23, 24.53s/it]

Training loss at 233 iteration is 0.9053123224349249


 78%|██████████████████████████████▍        | 234/300 [1:34:26<26:55, 24.47s/it]

Training loss at 234 iteration is 0.9049779971440634


 78%|██████████████████████████████▌        | 235/300 [1:34:50<26:26, 24.40s/it]

Training loss at 235 iteration is 0.9061618078322637


 79%|██████████████████████████████▋        | 236/300 [1:35:14<26:02, 24.41s/it]

Training loss at 236 iteration is 0.9048848663057599


 79%|██████████████████████████████▊        | 237/300 [1:35:39<25:36, 24.38s/it]

Training loss at 237 iteration is 0.9048364474659875


 79%|██████████████████████████████▉        | 238/300 [1:36:03<25:11, 24.37s/it]

Training loss at 238 iteration is 0.9049970053491139


 80%|███████████████████████████████        | 239/300 [1:36:27<24:41, 24.28s/it]

Training loss at 239 iteration is 0.906182138692765


 80%|███████████████████████████████▏       | 240/300 [1:36:51<24:19, 24.33s/it]

Training loss at 240 iteration is 0.9052738746007284


 80%|███████████████████████████████▎       | 241/300 [1:37:16<23:55, 24.33s/it]

Training loss at 241 iteration is 0.9061042694818406


 81%|███████████████████████████████▍       | 242/300 [1:37:40<23:31, 24.34s/it]

Training loss at 242 iteration is 0.9056747584115892


 81%|███████████████████████████████▌       | 243/300 [1:38:04<23:06, 24.33s/it]

Training loss at 243 iteration is 0.9052943048023042


 81%|███████████████████████████████▋       | 244/300 [1:38:29<22:42, 24.33s/it]

Training loss at 244 iteration is 0.9051377375920614


 82%|███████████████████████████████▊       | 245/300 [1:38:53<22:18, 24.33s/it]

Training loss at 245 iteration is 0.9055569853101458


 82%|███████████████████████████████▉       | 246/300 [1:39:17<21:53, 24.33s/it]

Training loss at 246 iteration is 0.9056113333929152


 82%|████████████████████████████████       | 247/300 [1:39:42<21:29, 24.33s/it]

Training loss at 247 iteration is 0.9053201505116054


 83%|████████████████████████████████▏      | 248/300 [1:40:06<21:04, 24.31s/it]

Training loss at 248 iteration is 0.9049147026879447


 83%|████████████████████████████████▎      | 249/300 [1:40:30<20:38, 24.28s/it]

Training loss at 249 iteration is 0.9048760561715989


 83%|████████████████████████████████▌      | 250/300 [1:40:54<20:13, 24.26s/it]

Training loss at 250 iteration is 0.9060529243378412


 84%|████████████████████████████████▋      | 251/300 [1:41:19<19:50, 24.29s/it]

Training loss at 251 iteration is 0.905105607850211


 84%|████████████████████████████████▊      | 252/300 [1:41:43<19:23, 24.24s/it]

Training loss at 252 iteration is 0.9052436351776123


 84%|████████████████████████████████▉      | 253/300 [1:42:07<19:00, 24.26s/it]

Training loss at 253 iteration is 0.9048604880060468


 85%|█████████████████████████████████      | 254/300 [1:42:31<18:33, 24.21s/it]

Training loss at 254 iteration is 0.9050538937250773


 85%|█████████████████████████████████▏     | 255/300 [1:42:56<18:10, 24.22s/it]

Training loss at 255 iteration is 0.9060616663524083


 85%|█████████████████████████████████▎     | 256/300 [1:43:20<17:45, 24.21s/it]

Training loss at 256 iteration is 0.9059346545310247


 86%|█████████████████████████████████▍     | 257/300 [1:43:44<17:19, 24.18s/it]

Training loss at 257 iteration is 0.9048526542527335


 86%|█████████████████████████████████▌     | 258/300 [1:44:08<16:55, 24.19s/it]

Training loss at 258 iteration is 0.9050086793445405


 86%|█████████████████████████████████▋     | 259/300 [1:44:32<16:31, 24.18s/it]

Training loss at 259 iteration is 0.9052224783670335


 87%|█████████████████████████████████▊     | 260/300 [1:44:57<16:10, 24.25s/it]

Training loss at 260 iteration is 0.905043846084958


 87%|█████████████████████████████████▉     | 261/300 [1:45:22<15:53, 24.46s/it]

Training loss at 261 iteration is 0.9048633745738438


 87%|██████████████████████████████████     | 262/300 [1:45:46<15:32, 24.55s/it]

Training loss at 262 iteration is 0.9051328642027718


 88%|██████████████████████████████████▏    | 263/300 [1:46:11<15:08, 24.56s/it]

Training loss at 263 iteration is 0.9049093921979269


 88%|██████████████████████████████████▎    | 264/300 [1:46:35<14:44, 24.56s/it]

Training loss at 264 iteration is 0.905595722652617


 88%|██████████████████████████████████▍    | 265/300 [1:47:00<14:19, 24.57s/it]

Training loss at 265 iteration is 0.908747789405641


 89%|██████████████████████████████████▌    | 266/300 [1:47:25<13:57, 24.64s/it]

Training loss at 266 iteration is 0.9055998495646885


 89%|██████████████████████████████████▋    | 267/300 [1:47:50<13:34, 24.69s/it]

Training loss at 267 iteration is 0.9052281663531349


 89%|██████████████████████████████████▊    | 268/300 [1:48:14<13:10, 24.71s/it]

Training loss at 268 iteration is 0.9050897388231187


 90%|██████████████████████████████████▉    | 269/300 [1:48:39<12:42, 24.60s/it]

Training loss at 269 iteration is 0.905018477212815


 90%|███████████████████████████████████    | 270/300 [1:49:03<12:15, 24.53s/it]

Training loss at 270 iteration is 0.9049271969568162


 90%|███████████████████████████████████▏   | 271/300 [1:49:28<11:51, 24.54s/it]

Training loss at 271 iteration is 0.9053898851076762


 91%|███████████████████████████████████▎   | 272/300 [1:49:52<11:26, 24.52s/it]

Training loss at 272 iteration is 0.904866479692005


 91%|███████████████████████████████████▍   | 273/300 [1:50:17<11:02, 24.53s/it]

Training loss at 273 iteration is 0.9048932563690912


 91%|███████████████████████████████████▌   | 274/300 [1:50:42<10:42, 24.70s/it]

Training loss at 274 iteration is 0.9050116624150958


 92%|███████████████████████████████████▊   | 275/300 [1:51:06<10:14, 24.60s/it]

Training loss at 275 iteration is 0.9054898676418123


 92%|███████████████████████████████████▉   | 276/300 [1:51:30<09:46, 24.45s/it]

Training loss at 276 iteration is 0.9048448432059515


 92%|████████████████████████████████████   | 277/300 [1:51:55<09:24, 24.56s/it]

Training loss at 277 iteration is 0.9053219358126322


 93%|████████████████████████████████████▏  | 278/300 [1:52:19<08:57, 24.44s/it]

Training loss at 278 iteration is 0.9053112381980533


 93%|████████████████████████████████████▎  | 279/300 [1:52:43<08:31, 24.36s/it]

Training loss at 279 iteration is 0.9049490009035382


 93%|████████████████████████████████████▍  | 280/300 [1:53:08<08:08, 24.42s/it]

Training loss at 280 iteration is 0.9054518483933949


 94%|████████████████████████████████████▌  | 281/300 [1:53:33<07:47, 24.59s/it]

Training loss at 281 iteration is 0.9054341571671622


 94%|████████████████████████████████████▋  | 282/300 [1:53:58<07:23, 24.66s/it]

Training loss at 282 iteration is 0.9049004770460582


 94%|████████████████████████████████████▊  | 283/300 [1:54:23<06:59, 24.70s/it]

Training loss at 283 iteration is 0.9052742577734447


 95%|████████████████████████████████████▉  | 284/300 [1:54:48<06:36, 24.77s/it]

Training loss at 284 iteration is 0.905314442657289


 95%|█████████████████████████████████████  | 285/300 [1:55:12<06:10, 24.68s/it]

Training loss at 285 iteration is 0.904961884021759


 95%|█████████████████████████████████████▏ | 286/300 [1:55:36<05:43, 24.51s/it]

Training loss at 286 iteration is 0.9051718257722401


 96%|█████████████████████████████████████▎ | 287/300 [1:56:01<05:20, 24.64s/it]

Training loss at 287 iteration is 0.9062056002162752


 96%|█████████████████████████████████████▍ | 288/300 [1:56:26<04:55, 24.65s/it]

Training loss at 288 iteration is 0.9049276993388221


 96%|█████████████████████████████████████▌ | 289/300 [1:56:50<04:31, 24.67s/it]

Training loss at 289 iteration is 0.9049006870814732


 97%|█████████████████████████████████████▋ | 290/300 [1:57:16<04:07, 24.79s/it]

Training loss at 290 iteration is 0.9053798346292405


 97%|█████████████████████████████████████▊ | 291/300 [1:57:40<03:42, 24.77s/it]

Training loss at 291 iteration is 0.9057718146414984


 97%|█████████████████████████████████████▉ | 292/300 [1:58:05<03:18, 24.76s/it]

Training loss at 292 iteration is 0.9048514281000409


 98%|██████████████████████████████████████ | 293/300 [1:58:30<02:53, 24.74s/it]

Training loss at 293 iteration is 0.9048615580513364


 98%|██████████████████████████████████████▏| 294/300 [1:58:54<02:28, 24.71s/it]

Training loss at 294 iteration is 0.9052247944332305


 98%|██████████████████████████████████████▎| 295/300 [1:59:18<02:02, 24.52s/it]

Training loss at 295 iteration is 0.9051691208566938


 99%|██████████████████████████████████████▍| 296/300 [1:59:43<01:37, 24.45s/it]

Training loss at 296 iteration is 0.9048652450243632


 99%|██████████████████████████████████████▌| 297/300 [2:00:07<01:13, 24.34s/it]

Training loss at 297 iteration is 0.9055892002014887


 99%|██████████████████████████████████████▋| 298/300 [2:00:31<00:48, 24.32s/it]

Training loss at 298 iteration is 0.9056140354701451


100%|██████████████████████████████████████▊| 299/300 [2:00:56<00:24, 24.48s/it]

Training loss at 299 iteration is 0.9056733335767474


100%|███████████████████████████████████████| 300/300 [2:01:20<00:00, 24.27s/it]
