
### Reference: https://github.com/motokimura

In this notebook, you will need **Python 3.6+** and the following packages:
```
1. pytorch 1.2
2. torchvision
3. numpy
4. matplotlib
5. tqdm (for a better, cuter progress bar. Yay!)
```
To install PyTorch, follow the instructions on the [official website](https://pytorch.org/). The [official documentation](https://pytorch.org/docs/stable/) is also very helpful when you want to look up specific functionality.




### Colab Setup

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import sys
# Set customized_path_to_homework to the Drive folder where you uploaded your homework
customized_path_to_homework = "/content/drive/My Drive/Assignment_3 /Assignment_3"
sys.path.append(customized_path_to_homework)

In [None]:
# Run this to download the dataset; give the path to download_data.sh in your Drive
#!sh "/content/drive/My Drive/Assignment_3 /Assignment_3/download_data.sh"

In [None]:
# Copy the downloaded dataset to your Drive, so you don't have to download it again every time you open Colab.
#!cp -r  /content/VOCdevkit_2007/ '/content/drive/My Drive/Assignment_3 /Assignment_3'

In [None]:
import os
import random
import cv2
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import models

from resnet_yolo import resnet50
from dataset import VocDetectorDataset
from eval_voc import evaluate
from predict import predict_image
from config import VOC_CLASSES, COLORS
import matplotlib.pyplot as plt
from tqdm import tqdm

%matplotlib inline
%load_ext autoreload
%autoreload 2

## Initialization

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print(torch.cuda.get_device_name(device))  # GPU name

Tesla T4


# You Only Look Once: Unified, Real-Time Object Detection 
In this notebook, we implement the loss function and train the **YOLO object detector** (specifically, YOLO-v1).

We adopt a variant of YOLO that:
1. Uses a pretrained ResNet50 classifier as the detector backbone. The pretrained model is provided in `torchvision.models`.
2. Uses a $14\times14$ detection grid instead of $7\times7$ to get finer-grained detections.
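To make the output layout concrete, the sketch below indexes one cell's $5B+20$ prediction vector, assuming the ordering used by the loss code further down: the $B$ boxes as `[x, y, w, h, C]` first, then the 20 VOC class scores. The helper `cell_layout` is hypothetical and only illustrates the indexing.

```python
def cell_layout(B=2, num_classes=20):
    """Channel indices of each box field and of the class scores for one
    grid cell, assuming [box_0, ..., box_{B-1}, class scores] ordering."""
    boxes = []
    for b in range(B):
        base = 5 * b
        boxes.append({"x": base, "y": base + 1, "w": base + 2,
                      "h": base + 3, "conf": base + 4})
    classes = list(range(5 * B, 5 * B + num_classes))
    return boxes, classes


S, B, NUM_CLASSES = 14, 2, 20
boxes, classes = cell_layout(B, NUM_CLASSES)
channels = 5 * B + NUM_CLASSES
print(channels)                        # 30 values per cell
print([box["conf"] for box in boxes])  # [4, 9]: the two confidence channels
print(classes[0], classes[-1])         # 10 29: first and last class channel
print(S * S * channels)                # 5880 output values per image
```

With the $14\times14$ grid, each image therefore produces a `(14, 14, 30)` output tensor.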

Backbone models are usually pretrained on the ImageNet dataset (> 1 million images) with numerous classes. Starting from such a pretrained backbone can therefore greatly shorten the required training time and improve performance.

<img src="figure/example.png" width="450">


In [None]:
# YOLO network hyperparameters
B = 2  # number of bounding box predictions per cell
S = 14  # width/height of network output grid (larger than 7x7 from paper since we use a different network)

## Load the pretrained ResNet classifier
Load the pretrained classifier. By default, this uses the pretrained weights provided by PyTorch.

In [None]:
load_network_path = None
pretrained = True

# use to load a previously trained network
if load_network_path is not None:
    print('Loading saved network from {}'.format(load_network_path))
    net = resnet50().to(device)
    net.load_state_dict(torch.load(load_network_path, map_location=device))
else:
    print('Load pre-trained model')
    net = resnet50(pretrained=pretrained).to(device)

Load pre-trained model


Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth


Some basic hyperparameter settings that you probably don't have to tune.

In [None]:
learning_rate = 0.001
num_epochs = 50
batch_size = 15

# Yolo loss component coefficients (as given in Yolo v1 paper)
lambda_coord = 5
lambda_noobj = 0.5

## Implement the YOLO-v1 loss [80 pts]
Now, implement `YoloLoss` to train your object detector. Read the [original YOLO paper](https://arxiv.org/pdf/1506.02640.pdf) closely before implementing it.

In general, the YOLO loss has 4 components. Suppose the prediction grid has size $(N, S, S, 5B+c)$, where $N$ is the batch size, $S$ is the grid size, $B$ is the number of bounding boxes per cell (each predicting $(x, y, w, h, C)$), and $c$ is the number of classes. The four components are:
1. Bounding box regression loss on the bounding box $(x, y, w, h)$
    - $l_{coord}=\sum_{i=0}^{S^2}\sum_{j=0}^B\mathbb{1}^{obj}_{ij}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right]$
    - $\mathbb{1}^{obj}_{ij}$: equals 1 when an object appears in cell $i$ and bounding box $j$ is responsible for that prediction; 0 otherwise.
2. Contain-object loss on the confidence prediction $C$ (computed only for boxes that are responsible for an object)
    - $l_{contain}=\sum_{i=0}^{S^2}\sum_{j=0}^B\mathbb{1}^{obj}_{ij}(C_i-\hat{C}_i)^2$
    - $C_i$ is the predicted confidence score for cell $i$ from predicted box $j$; its target $\hat{C}_i$ is the IoU between that predicted box and the ground-truth box, as in the original paper.
    - For each grid cell, compute the contain-object loss only for the predicted bounding box with the maximum overlap (IoU) with the ground-truth box.
    - We say that this predicted box with the maximum IoU is **responsible** for the prediction.
3. No-object loss on the confidence prediction $C$ (computed only for boxes in cells that contain no object)
    - $l_{noobj}=\sum_{i=0}^{S^2}\sum_{j=0}^B\mathbb{1}^{noobj}_{ij}(C_i-\hat{C}_i)^2$
    - $\mathbb{1}^{noobj}_{ij}$: equals 1 when **no object appears** in cell $i$.
4. Classification loss
    - $l_{class}=\sum_{i=0}^{S^2}\mathbb{1}_i^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2$
    - $p_i(c)$ is the predicted score for class $c$, and $\mathbb{1}_i^{obj}$ equals 1 when an object appears in cell $i$.

Putting them together, we get the YOLO loss:
\begin{equation}
\mathcal{L}_{yolo}=\lambda_{coord}l_{coord}+l_{contain}+\lambda_{noobj}l_{noobj}+l_{class}
\end{equation}
where $\lambda_{coord}$ and $\lambda_{noobj}$ are hyperparameters (5 and 0.5, as in the paper). The code below contains detailed comments to guide you through implementing the loss.
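As a sanity check on the formula, the following sketch evaluates the weighted loss for a single grid cell with plain Python floats. It follows the description above: squared errors, square roots on $w$ and $h$, one responsible box chosen by IoU whose confidence target is that IoU, and a no-object term applied only to cells without an object. It is only an illustration under simplifying assumptions: all coordinates are in one common unit, the toy data uses 2 classes, the function names are hypothetical, and it does not reproduce the batching and masking of the PyTorch implementation.

```python
import math


def center_to_corners(box):
    """[cx, cy, w, h] -> [x1, y1, x2, y2] (all in the same unit)."""
    cx, cy, w, h = box[:4]
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def cell_loss(pred_boxes, pred_classes, target_box, target_classes,
              lambda_coord=5.0, lambda_noobj=0.5):
    """Weighted YOLO-v1 loss of ONE grid cell.

    pred_boxes: B predicted boxes, each [x, y, w, h, C]
    target_box: ground-truth [x, y, w, h], or None if the cell has no object
    """
    if target_box is None:
        # No-object term: push every box confidence in the cell toward 0.
        return lambda_noobj * sum(p[4] ** 2 for p in pred_boxes)

    # The responsible box is the prediction with the highest IoU.
    t = center_to_corners(target_box)
    ious = [iou(center_to_corners(p), t) for p in pred_boxes]
    j = max(range(len(ious)), key=lambda k: ious[k])
    p = pred_boxes[j]

    l_coord = ((p[0] - target_box[0]) ** 2 + (p[1] - target_box[1]) ** 2
               + (math.sqrt(p[2]) - math.sqrt(target_box[2])) ** 2
               + (math.sqrt(p[3]) - math.sqrt(target_box[3])) ** 2)
    l_contain = (p[4] - ious[j]) ** 2  # confidence target = IoU
    l_class = sum((pc - tc) ** 2 for pc, tc in zip(pred_classes, target_classes))
    return lambda_coord * l_coord + l_contain + l_class


# A cell with one object and two predicted boxes (toy numbers, 2 classes).
with_obj = cell_loss([[0.5, 0.5, 0.4, 0.4, 0.9], [0.1, 0.1, 0.1, 0.1, 0.3]],
                     [0.8, 0.2], [0.5, 0.5, 0.4, 0.4], [1.0, 0.0])
# An empty cell: only the no-object term contributes.
empty = cell_loss([[0.5, 0.5, 0.4, 0.4, 0.2], [0.1, 0.1, 0.1, 0.1, 0.4]],
                  [0.5, 0.5], None, None)
print(with_obj, empty)  # approximately 0.09 and 0.1
```

In the first cell the first box matches the target exactly (IoU 1), so only its confidence error $(0.9-1)^2$ and the class error contribute; the second box is ignored because it is not responsible.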

In [None]:
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
 
class YoloLoss(nn.Module):
    def __init__(self,S,B,l_coord,l_noobj):
        super(YoloLoss,self).__init__()
        self.S = S
        self.B = B
        self.l_coord = l_coord
        self.l_noobj = l_noobj
        
    def compute_iou(self, box1, box2):
        """Compute the intersection over union of two sets of boxes, each box is [x1,y1,x2,y2].
        Args:
          box1: (tensor) bounding boxes, sized [N,4].
          box2: (tensor) bounding boxes, sized [M,4].
        Return:
          (tensor) iou, sized [N,M].
        """
        N = box1.size(0)
        M = box2.size(0)
 
        lt = torch.max(
            box1[:,:2].unsqueeze(1).expand(N,M,2),  # [N,2] -> [N,1,2] -> [N,M,2]
            box2[:,:2].unsqueeze(0).expand(N,M,2),  # [M,2] -> [1,M,2] -> [N,M,2]
        )   
 
        rb = torch.min(
            box1[:,2:].unsqueeze(1).expand(N,M,2),  # [N,2] -> [N,1,2] -> [N,M,2]
            box2[:,2:].unsqueeze(0).expand(N,M,2),  # [M,2] -> [1,M,2] -> [N,M,2]
        )   
 
        wh = rb - lt  # [N,M,2]
        wh[wh<0] = 0  # clip at 0
        inter = wh[:,:,0] * wh[:,:,1]  # [N,M]
 
        area1 = (box1[:,2]-box1[:,0]) * (box1[:,3]-box1[:,1])  # [N,]
        area2 = (box2[:,2]-box2[:,0]) * (box2[:,3]-box2[:,1])  # [M,]
        area1 = area1.unsqueeze(1).expand_as(inter)  # [N,] -> [N,1] -> [N,M]
        area2 = area2.unsqueeze(0).expand_as(inter)  # [M,] -> [1,M] -> [N,M]
 
        iou = inter / (area1 + area2 - inter)
        return iou 
    
    def get_class_prediction_loss(self, classes_pred, classes_target):
        """
        Parameters:
        classes_pred : (tensor) size (n_coord, 20), class scores of the cells that contain an object
        classes_target : (tensor) size (n_coord, 20), class targets of the same cells

        Returns:
        class_loss : scalar, sum of squared errors over all these cells and classes
        """
        ##### CODE #####
        # n_coord: number of grid cells (over the whole batch) that contain an object
        class_loss = F.mse_loss(classes_pred, classes_target, reduction='sum')

        ##### CODE #####
        return class_loss
         
         
    def get_regression_loss(self, box_pred_response, box_target_response):
        """
        Parameters:
        box_pred_response : (tensor) size (-1, 5), predictions of the responsible boxes
        box_target_response : (tensor) size (-1, 5), matching ground-truth boxes
        Note : -1 means the dimension is inferred from the number of elements
        See : https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view_as
         
        Returns:
        reg_loss : scalar
        """
        ##### CODE #####

        loss_xy = F.mse_loss(box_pred_response[:, :2], box_target_response[:, :2], reduction='sum')
        loss_wh = F.mse_loss(torch.sqrt(box_pred_response[:, 2:4]), torch.sqrt(box_target_response[:, 2:4]), reduction='sum')
        reg_loss =  (loss_xy + loss_wh)

        ##### CODE #####
        return reg_loss
         
    def get_contain_object_loss(self, box_pred_response, box_target_response_iou):
        """
        Parameters:
        box_pred_response : (tensor) size (-1, 5), predictions of the responsible boxes
        box_target_response_iou : (tensor) size (-1, 5), column 4 holds the target confidence (IoU)
        Note : -1 means the dimension is inferred from the number of elements
        See : https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view_as
         
        Returns:
        contain_loss : scalar
        """
        ##### CODE #####

        contain_loss = F.mse_loss(box_pred_response[:, 4], box_target_response_iou[:, 4], reduction='sum')  
        ##### CODE #####
        return contain_loss
         
    def get_no_object_loss(self, target_tensor, pred_tensor, no_object_mask):
        """                                                                                                                                                                                        
        Parameters:
        target_tensor : (tensor) size (batch_size, S , S, 30)
        pred_tensor : (tensor) size (batch_size, S , S, 30)
        no_object_mask : (tensor) size (batch_size, S , S)
         
        Returns:
        no_object_loss : scalar
         
        Hints:
        1) Create 2 tensors, no_object_prediction and no_object_target, that keep only
        the cells with no object.
        2) Create a mask no_object_prediction_mask of the same size that is 1 at the
        confidence entries of both bounding boxes.
        3) Use this mask to extract the confidences from no_object_prediction and
        no_object_target, and compute the loss on them.
        """
        ##### CODE #####
        # Boolean indexing with a [batch_size, S, S] mask keeps the last dimension.
        no_object_prediction = pred_tensor[no_object_mask]  # [n_noobj, 30], n_noobj: number of cells without objects
        no_object_target = target_tensor[no_object_mask]    # [n_noobj, 30]

        # Select only the B confidence channels (indices 4, 9, ...) of each cell.
        no_object_prediction_mask = torch.zeros_like(no_object_prediction, dtype=torch.bool)  # [n_noobj, 30]
        for b in range(self.B):
            no_object_prediction_mask[:, 4 + b * 5] = True
        noobj_pred_conf = no_object_prediction[no_object_prediction_mask]  # [n_noobj * B]
        noobj_target_conf = no_object_target[no_object_prediction_mask]    # [n_noobj * B]
        no_object_loss = F.mse_loss(noobj_pred_conf, noobj_target_conf, reduction='sum')
        ##### CODE #####
        return no_object_loss
          
    def find_best_iou_boxes(self, box_target, box_pred):
        """
        Parameters: 
        box_target : (tensor)  size (-1, 5)
        box_pred : (tensor) size (-1, 5)
        Note : -1 means the dimension is inferred from the number of elements
        See : https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view_as
         
        Returns: 
        box_target_iou: (tensor)
        contains_object_response_mask : (tensor)
         
        Hints:
        1) Find the IoUs of each of the 2 bounding boxes of each grid cell of each image.
        2) Set contains_object_response_mask to 1 for the bounding box with the max IoU
        of the 2 bounding boxes of each grid cell.
        3) Use the compute_iou function to find the IoUs.
        4) Before calling compute_iou, preprocess the bounding box coordinates so that
        a box b given as [x, y, w, h] becomes
        x1, y1 = x/S - 0.5*w, y/S - 0.5*h ; x2, y2 = x/S + 0.5*w, y/S + 0.5*h
        Note: here x, y are initially the center of the box and w, h are its width and height.
        This transformation converts center/size coordinates into corner coordinates.
        5) Set the confidence of box_target_iou for that bounding box to the maximum IoU.
        """
        S, B = self.S, self.B
        device = box_target.device
        # Start with no responsible boxes (all False); the loop marks one box per cell.
        contains_object_response_mask = torch.zeros(box_target.size(), dtype=torch.bool, device=device)  # [n_coord x B, 5]
        box_target_iou = torch.zeros(box_target.size(), device=device)  # [n_coord x B, 5], only the last 1=(conf,) is used
        for i in range(0, box_target.size(0), B):
            pred = box_pred[i:i+B].detach()  # predicted bboxes at i-th cell, [B, 5=len([x, y, w, h, conf])]
            pred_xyxy = torch.zeros_like(pred)  # [B, 5=len([x1, y1, x2, y2, conf])]
            # (x, y) are normalized to the cell size and (w, h) to the image size,
            # so divide (x, y) by S to put all four values on the image scale before computing IoU.
            pred_xyxy[:,  :2] = pred[:, :2] / float(S) - 0.5 * pred[:, 2:4]
            pred_xyxy[:, 2:4] = pred[:, :2] / float(S) + 0.5 * pred[:, 2:4]

            # All B target boxes of a cell are identical in this implementation, so the first one suffices.
            target = box_target[i].view(-1, 5)  # target bbox at i-th cell, [1, 5=len([x, y, w, h, conf])]
            target_xyxy = torch.zeros_like(target)  # [1, 5=len([x1, y1, x2, y2, conf])]
            target_xyxy[:,  :2] = target[:, :2] / float(S) - 0.5 * target[:, 2:4]
            target_xyxy[:, 2:4] = target[:, :2] / float(S) + 0.5 * target[:, 2:4]

            iou = self.compute_iou(pred_xyxy[:, :4], target_xyxy[:, :4])  # [B, 1]
            max_iou, max_index = iou.max(0)  # both of size [1]

            # Only the box with the highest IoU is responsible for this cell's object.
            contains_object_response_mask[i + max_index] = True

            # "we want the confidence score to equal the intersection over union (IOU) between
            # the predicted box and the ground truth" (original YOLO paper).
            box_target_iou[i + max_index, 4] = max_iou

        ##### CODE #####
        return box_target_iou, contains_object_response_mask
         
    def forward(self, pred_tensor,target_tensor):
        '''
        pred_tensor: (tensor) size(batchsize,S,S,Bx5+20=30)
                      where B - number of bounding boxes this grid cell is a part of = 2
                            5 - number of bounding box values corresponding to [x, y, w, h, c]
                                where x - x_coord, y - y_coord, w - width, h - height, c - confidence of having an object
                            20 - number of classes
         
        target_tensor: (tensor) size(batchsize,S,S,30)
         
        Returns:
        Total Loss
        '''
        N = pred_tensor.size(0)
         
        total_loss = None
        # Create 2 tensors contains_object_mask and no_object_mask 
        # of size (Batch_size, S, S) such that each value corresponds to if the confidence of having 
        # an object > 0 in the target tensor.

        ##### CODE #####
        contains_object_mask = target_tensor[:, :, :, 4] > 0  # mask for the cells which contain objects. [Batch_size, S, S]
        no_object_mask = target_tensor[:, :, :, 4] == 0 # mask for the cells which do not contain objects. [Batch_size, S, S]

        ##### CODE #####
        """
        Create a tensor contains_object_pred that holds all the predictions for the cells
        whose target confidence is > 0 (i.e. the cells that contain an object).
        Then, split this tensor into 2 tensors:
        1) bounding_box_pred : Contains all the bounding box predictions (x, y, w, h, c) of all grid
                                cells of all images
        2) classes_pred : Contains all the class predictions for each grid cell of each image
        Hint : Use contains_object_mask
        """
        ##### CODE #####
        # Boolean indexing with the [batch_size, S, S] mask keeps the last (5B+20) dimension.
        # N is the batch size, so it must not be used as the per-cell feature size here.
        contains_object_pred = pred_tensor[contains_object_mask]  # [n_coord, 5B+20], n_coord: number of cells with objects
        bounding_box_pred = contains_object_pred[:, :5*self.B].contiguous().view(-1, 5)  # [n_coord x B, 5=len([x, y, w, h, conf])]
        classes_pred = contains_object_pred[:, 5*self.B:]  # [n_coord, C]

        ##### CODE #####
        """
        # Similarly, create 2 tensors bounding_box_target and classes_target
        # using the contains_object_mask.
        """
        ##### CODE #####
        coord_target = target_tensor[contains_object_mask]  # [n_coord, 5B+20]
        bounding_box_target = coord_target[:, :5*self.B].contiguous().view(-1, 5)  # [n_coord x B, 5=len([x, y, w, h, conf])]
        classes_target = coord_target[:, 5*self.B:]  # [n_coord, C]
        ##### CODE #####
        
        #Compute the No object loss here
        # Instruction: finish your get_no_object_loss
        ##### CODE #####
        loss_noobj= self.get_no_object_loss(target_tensor, pred_tensor, no_object_mask)
        ##### CODE #####
        """
        # Compute the iou's of all bounding boxes and the mask for which bounding box 
        # of 2 has the maximum iou the bounding boxes for each grid cell of each image.
        # Instruction: finish your find_best_iou_boxes and use it.
        """
        ##### CODE #####
        box_target_iou, contains_object_response_mask = self.find_best_iou_boxes(bounding_box_target, bounding_box_pred)
        box_target_iou = box_target_iou.to(pred_tensor.device)

        # Create 3 tensors:
        # 1) box_pred_response - bounding box predictions for each grid cell which has the maximum iou
        # 2) box_target_response_iou - bounding box target ious for each grid cell which has the maximum iou
        # 3) box_target_response - bounding box targets for each grid cell which has the maximum iou
        # Hint : Use contains_object_response_mask
        

        ##### CODE #####
        box_pred_response = bounding_box_pred[contains_object_response_mask].view(-1, 5)      # [n_response, 5]
        box_target_response = bounding_box_target[contains_object_response_mask].view(-1, 5)  # [n_response, 5], only the first 4=(x, y, w, h) are used
        box_target_response_iou = box_target_iou[contains_object_response_mask].view(-1, 5)        # [n_response, 5], only the last 1=(conf,) is used
        ##### CODE #####
        """
        # Find the class_loss, containing object loss and regression loss
        """
        ##### CODE #####
 
        class_loss = self.get_class_prediction_loss(classes_pred, classes_target) 
        regression_loss = self.get_regression_loss(box_pred_response, box_target_response)
        containing_object_loss = self.get_contain_object_loss( box_pred_response, box_target_response_iou)

        # Total loss
        total_loss = self.l_coord * regression_loss + containing_object_loss + self.l_noobj * loss_noobj + class_loss

        ##### CODE #####
        return total_loss / N
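The IoU hint in `find_best_iou_boxes` divides the cell-relative centre by $S$ but never adds the cell's column or row index. The pure-Python sketch below (hypothetical helper names, same coordinate convention as the comments in that method) shows why this is still correct: the predicted boxes and the target of a cell share the same cell offset, and IoU does not change when both boxes are translated together. It also shows the responsible-box selection as an argmax over the $B$ IoUs.

```python
def to_corners(box, S, col=0, row=0):
    """Convert [x, y, w, h] to [x1, y1, x2, y2] on the image scale.

    (x, y) is the box centre relative to its grid cell and (w, h) is relative
    to the image. With col = row = 0 this is exactly the x/S - w/2 hint used
    in find_best_iou_boxes; (col, row) adds the true cell offset.
    """
    x, y, w, h = box[:4]
    cx = (col + x) / S
    cy = (row + y) / S
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes (same formula as compute_iou)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def responsible_box(pred_boxes, target_box, S, col=0, row=0):
    """Index and IoU of the predicted box that best overlaps the target."""
    t = to_corners(target_box, S, col, row)
    ious = [iou(to_corners(p, S, col, row), t) for p in pred_boxes]
    j = max(range(len(ious)), key=lambda k: ious[k])
    return j, ious[j]


S = 14
target = [0.5, 0.5, 0.2, 0.2]
preds = [[0.5, 0.5, 0.1, 0.1, 0.7],   # too small, same centre: IoU 0.25
         [0.6, 0.5, 0.2, 0.2, 0.4]]   # right size, slightly shifted

print(responsible_box(preds, target, S))                # box index 1, IoU about 0.931 (= 27/29)
print(responsible_box(preds, target, S, col=5, row=7))  # same index, same IoU up to rounding
```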

In [None]:
criterion = YoloLoss(S, B, lambda_coord, lambda_noobj)
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9, weight_decay=5e-4)

## Reading Pascal Data

Since Pascal VOC 2007 is a small dataset (about 5,000 images in train+val), we combine the train and val splits to train our detector. This is not typically good practice, but we make an exception here to get reasonable detection results from a comparatively small object detection dataset. Use `download_data.sh` to download the dataset.

The training dataset loader also uses a variety of data augmentation techniques, including random shifts, scaling, crops, and flips. Data augmentation is slightly more complicated for detection datasets, since the bounding box annotations must stay consistent with every transformation.
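As an example of keeping annotations consistent, a horizontal flip must mirror the box x-coordinates as well as the pixels. The sketch below assumes boxes stored as pixel corners `[x1, y1, x2, y2]` with continuous coordinates; `hflip_boxes` is a hypothetical helper, and the augmentation code in `dataset.py` (not shown here) may use a different convention, such as `width - 1 - x` for inclusive pixel indices.

```python
def hflip_boxes(boxes, image_width):
    """Mirror [x1, y1, x2, y2] boxes for an image flipped left-right.

    The new left edge comes from the old right edge (and vice versa), so the
    x-coordinates are swapped as well as reflected to keep x1 <= x2.
    The y-coordinates are unchanged.
    """
    return [[image_width - x2, y1, image_width - x1, y2]
            for x1, y1, x2, y2 in boxes]


boxes = [[10, 20, 110, 220], [300, 50, 400, 150]]
print(hflip_boxes(boxes, 500))  # [[390, 20, 490, 220], [100, 50, 200, 150]]
```

Flipping twice returns the original boxes, which is a cheap check for any hand-written augmentation.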

Since the output of the detector network is an $(S, S, 5B+c)$ tensor, we use an encoder to convert the original bounding box coordinates into grid-relative bounding box coordinates matching the expected output. We also use a decoder to convert in the opposite direction, from the network output back to bounding boxes in image coordinates.
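The encoder/decoder pair can be sketched as follows, assuming the convention described in the comments of `find_best_iou_boxes`: the box centre $(x, y)$ is an offset inside its grid cell and $(w, h)$ are normalised by the image size. The functions `encode_box` and `decode_box` are hypothetical; the real encoder in `dataset.py` may differ in details such as how a centre lying exactly on a cell border is assigned. The 448×448 image size is an arbitrary choice for the example.

```python
def encode_box(box, img_w, img_h, S):
    """Pixel box [x1, y1, x2, y2] -> (row, col, [x, y, w, h]).

    (x, y): box centre as an offset inside cell (row, col), in [0, 1).
    (w, h): box size normalised by the image size.
    """
    cx = (box[0] + box[2]) / 2 / img_w  # centre, normalised to [0, 1]
    cy = (box[1] + box[3]) / 2 / img_h
    col = min(int(cx * S), S - 1)       # clamp a centre on the right/bottom edge
    row = min(int(cy * S), S - 1)
    w = (box[2] - box[0]) / img_w
    h = (box[3] - box[1]) / img_h
    return row, col, [cx * S - col, cy * S - row, w, h]


def decode_box(row, col, rel, img_w, img_h, S):
    """Inverse of encode_box: grid-relative box -> pixel [x1, y1, x2, y2]."""
    x, y, w, h = rel
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    bw, bh = w * img_w, h * img_h
    return [cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2]


row, col, rel = encode_box([100, 50, 300, 250], 448, 448, 14)
print(row, col, rel)                            # cell (4, 6), offsets about (0.25, 0.6875)
print(decode_box(row, col, rel, 448, 448, 14))  # approximately [100, 50, 300, 250]
```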

In [None]:
file_root_train = '/content/drive/My Drive/Assignment_3 /Assignment_3/VOCdevkit_2007/VOC2007/JPEGImages/'
annotation_file_train = '/content/drive/My Drive/Assignment_3 /Assignment_3/voc2007.txt'

train_dataset = VocDetectorDataset(root_img_dir=file_root_train,dataset_file=annotation_file_train,train=True, S=S)
train_loader = DataLoader(train_dataset,batch_size=batch_size,shuffle=True,num_workers=2)
print('Loaded %d train images' % len(train_dataset))

Initializing dataset
Loaded 5011 train images


In [None]:
file_root_test = '/content/drive/My Drive/Assignment_3 /Assignment_3/VOCdevkit_2007/VOC2007test/JPEGImages/'
annotation_file_test = '/content/drive/My Drive/Assignment_3 /Assignment_3/voc2007test.txt'

test_dataset = VocDetectorDataset(root_img_dir=file_root_test,dataset_file=annotation_file_test,train=False, S=S)
test_loader = DataLoader(test_dataset,batch_size=batch_size,shuffle=False,num_workers=2)
print('Loaded %d test images' % len(test_dataset))

Initializing dataset
Loaded 4950 test images
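With `batch_size = 15`, these loaders yield 335 training and 330 test batches per epoch, the totals tqdm reports during training. `DataLoader` keeps an incomplete final batch by default (`drop_last=False`), so the last training batch holds a single image; this matters for any code that depends on the batch size. A small arithmetic check (the helper `loader_stats` is hypothetical):

```python
import math


def loader_stats(num_images, batch_size):
    """Batches per epoch and size of the final batch when drop_last=False."""
    num_batches = math.ceil(num_images / batch_size)
    last = num_images - (num_batches - 1) * batch_size
    return num_batches, last


print(loader_stats(5011, 15))  # (335, 1)
print(loader_stats(4950, 15))  # (330, 15)
```

Passing `drop_last=True` to the training `DataLoader` would discard that one-image batch instead.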


## Train detector
Now, train the detector.
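The loop below changes `learning_rate` in place (divided by 10 at epochs 30 and 40), so the schedule it produces depends on where training starts: resuming at, say, epoch 35 would skip the first drop. A stateless alternative, sketched here with a hypothetical `step_lr` helper, computes the rate directly from the 0-based epoch index.

```python
def step_lr(epoch, base_lr=1e-3, milestones=(30, 40), gamma=0.1):
    """Learning rate for a 0-based epoch under a step schedule.

    The rate is multiplied by gamma once for every milestone that the epoch
    has reached, so the result depends only on the epoch index.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Rates around the two drops (0-based epochs, as in the training loop).
for e in (29, 30, 39, 40):
    print(e, step_lr(e))
```

Inside the loop, `learning_rate = step_lr(epoch)` could replace the `if epoch == 30 or epoch == 40` block. PyTorch's built-in `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 40], gamma=0.1)` implements the same kind of step schedule; with it you call `scheduler.step()` once at the end of each epoch instead of setting the learning rate by hand.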

In [None]:
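best_test_loss = np.inf
torch.cuda.empty_cache()

# To resume an earlier run, load its weights here and set start_epoch to the first epoch still to train.
#net.load_state_dict(torch.load('/content/drive/My Drive/Assignment_3 /Assignment_3/best_detector.pth'))
start_epoch = 20  # 0-based; use 0 to train from scratch

for epoch in range(start_epoch, num_epochs):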
    net.train()
    
    # Update learning rate late in training
    if epoch == 30 or epoch == 40:
        learning_rate /= 10.0

    for param_group in optimizer.param_groups:
        param_group['lr'] = learning_rate
     
    print('\n\nStarting epoch %d / %d' % (epoch + 1, num_epochs))
    print('Learning Rate for this epoch: {}'.format(learning_rate))
    
    total_loss = 0.


    
    for i, (images, target) in enumerate(tqdm(train_loader, total=len(train_loader))):
        images, target = images.to(device), target.to(device)

        pred = net(images)
        loss = criterion(pred,target)
        total_loss += loss.item()
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Epoch [%d/%d], average_loss: %.4f'
            % (epoch+1, num_epochs, total_loss / (i+1)))
    
    # evaluate the network on the test data
    with torch.no_grad():
        test_loss = 0.0
        net.eval()
        for i, (images, target) in enumerate(tqdm(test_loader, total=len(test_loader))):
            images, target = images.to(device), target.to(device)

            pred = net(images)
            loss = criterion(pred,target)
            test_loss += loss.item()
        test_loss /= len(test_loader)
    
    if best_test_loss > test_loss:
        best_test_loss = test_loss
        print('Updating best test loss: %.5f' % best_test_loss)
        torch.save(net.state_dict(),os.path.join('/content/drive/My Drive/Assignment_3 /Assignment_3/', 'best_detector.pth'))

    torch.save(net.state_dict(),os.path.join('/content/drive/My Drive/Assignment_3 /Assignment_3/', 'detector.pth'))



Starting epoch 21 / 50
Learning Rate for this epoch: 0.001

Epoch [21/50], average_loss: 4.0496

Updating best test loss: 5.90685

Starting epoch 22 / 50
Learning Rate for this epoch: 0.001

Epoch [22/50], average_loss: 4.5356

Updating best test loss: 5.67353

Starting epoch 23 / 50
Learning Rate for this epoch: 0.001

Epoch [23/50], average_loss: 4.2460

Starting epoch 24 / 50
Learning Rate for this epoch: 0.001
Epoch [24/50], average_loss: 4.6135
Updating best test loss: 5.53341

Starting epoch 25 / 50
Learning Rate for this epoch: 0.001
Epoch [25/50], average_loss: 4.0930

Starting epoch 26 / 50
Learning Rate for this epoch: 0.001
Epoch [26/50], average_loss: 4.0744
Updating best test loss: 5.47407

Starting epoch 27 / 50
Learning Rate for this epoch: 0.001
Epoch [27/50], average_loss: 3.9672
Updating best test loss: 5.46140

Starting epoch 28 / 50
Learning Rate for this epoch: 0.001
Epoch [28/50], average_loss: 3.9457

Starting epoch 29 / 50
Learning Rate for this epoch: 0.001
Epoch [29/50], average_loss: 3.8688

Starting epoch 30 / 50
Learning Rate for this epoch: 0.001
Epoch [30/50], average_loss: 4.0411
Updating best test loss: 5.46100

Starting epoch 31 / 50
Learning Rate for this epoch: 0.0001
Epoch [31/50], average_loss: 3.6485
Updating best test loss: 5.20775

Starting epoch 32 / 50
Learning Rate for this epoch: 0.0001
Epoch [32/50], average_loss: 3.4774
Updating best test loss: 5.17173

Starting epoch 33 / 50
Learning Rate for this epoch: 0.0001
Epoch [33/50], average_loss: 3.4471
Updating best test loss: 5.13012

Starting epoch 34 / 50
Learning Rate for this epoch: 0.0001
Epoch [34/50], average_loss: 3.3995
Updating best test loss: 5.11714

Starting epoch 35 / 50
Learning Rate for this epoch: 0.0001
Epoch [35/50], average_loss: nan

Starting epoch 36 / 50
Learning Rate for this epoch: 0.0001
Epoch [36/50], average_loss: nan

Starting epoch 37 / 50
Learning Rate for this epoch: 0.0001
Epoch [37/50], average_loss: nan

Starting epoch 38 / 50
Learning Rate for this epoch: 0.0001
Epoch [38/50], average_loss: nan

Starting epoch 39 / 50
Learning Rate for this epoch: 0.0001
Epoch [39/50], average_loss: nan

Starting epoch 40 / 50
Learning Rate for this epoch: 0.0001
Epoch [40/50], average_loss: nan

Starting epoch 41 / 50
Learning Rate for this epoch: 1e-05
Epoch [41/50], average_loss: nan

Starting epoch 42 / 50
Learning Rate for this epoch: 1e-05
Epoch [42/50], average_loss: nan

Starting epoch 43 / 50
Learning Rate for this epoch: 1e-05
Epoch [43/50], average_loss: nan

Starting epoch 44 / 50
Learning Rate for this epoch: 1e-05
Epoch [44/50], average_loss: nan

Starting epoch 45 / 50
Learning Rate for this epoch: 1e-05
Epoch [45/50], average_loss: nan

Starting epoch 46 / 50
Learning Rate for this epoch: 1e-05
Epoch [46/50], average_loss: nan

Starting epoch 47 / 50
Learning Rate for this epoch: 1e-05
Epoch [47/50], average_loss: nan

Starting epoch 48 / 50
Learning Rate for this epoch: 1e-05
Epoch [48/50], average_loss: nan

Starting epoch 49 / 50
Learning Rate for this epoch: 1e-05
Epoch [49/50], average_loss: nan

Starting epoch 50 / 50
Learning Rate for this epoch: 1e-05
Epoch [50/50], average_loss: nan

Starting epoch 51 / 50
Learning Rate for this epoch: 1e-05
Epoch [51/50], average_loss: nan

# View example predictions

Now, let's take a look at how the detector performs on a randomly chosen training image:

In [None]:
net.load_state_dict(torch.load('best_detector.pth', map_location=device))
net.eval()  # inference mode: BatchNorm uses running statistics, Dropout is disabled
# select a random image from the training set
image_name = random.choice(train_dataset.fnames)
image = cv2.imread(os.path.join(file_root_train, image_name))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
threshold = 0.1  # confidence threshold passed to predict_image
print('predicting...')
print(image.shape)
result = predict_image(net, image_name, root_img_directory=file_root_train, threshold=threshold)
for left_up, right_bottom, class_name, _, prob in result:
    color = COLORS[VOC_CLASSES.index(class_name)]
    # bounding box
    cv2.rectangle(image, left_up, right_bottom, color, 2)
    # label: class name followed by its rounded score
    label = class_name + ' ' + str(round(prob, 2))
    text_size, baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
    # filled background just above the box, then white label text on top of it
    p1 = (left_up[0], left_up[1] - text_size[1])
    cv2.rectangle(image, (p1[0] - 2 // 2, p1[1] - 2 - baseline), (p1[0] + text_size[0], p1[1] + text_size[1]),
                  color, -1)
    cv2.putText(image, label, (p1[0], p1[1] + baseline), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1, 8)

plt.figure(figsize=(15, 15))
plt.imshow(image)
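The body of `predict_image` is not shown in this section. As a rough illustration of what a YOLO-v1 decoder typically does, here is a minimal NumPy sketch that turns one S x S x (5B + C) output grid into thresholded boxes. The channel layout (B boxes of `[x, y, w, h, conf]` followed by C class probabilities, with `x, y` as offsets inside the cell and `w, h` relative to the whole image) and the name `decode_yolo_v1` are assumptions for illustration only; the notebook's `predict.py` may order channels differently, and a real decoder would also apply non-maximum suppression, which this sketch omits.

```python
import numpy as np


def decode_yolo_v1(pred, num_boxes=2, num_classes=20, threshold=0.1):
    """Decode one S x S x (5B + C) YOLO-v1 grid into (x1, y1, x2, y2, cls, score).

    Hypothetical layout per cell: [x, y, w, h, conf] * B, then C class probs.
    (x, y) are offsets within the cell, (w, h) are relative to the image.
    Returned coordinates are normalized to [0, 1]; no NMS is applied.
    """
    S = pred.shape[0]
    assert pred.shape[2] == num_boxes * 5 + num_classes
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[num_boxes * 5:]
            cls = int(np.argmax(class_probs))  # one class per cell in YOLO-v1
            for b in range(num_boxes):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # class-specific confidence = box confidence * class probability
                score = float(conf * class_probs[cls])
                if score < threshold:
                    continue
                # convert the cell-relative center to an image-relative center
                cx = (col + x) / S
                cy = (row + y) / S
                detections.append((float(cx - w / 2), float(cy - h / 2),
                                   float(cx + w / 2), float(cy + h / 2), cls, score))
    return detections
```

Multiplying the normalized corners by the image width and height gives pixel coordinates comparable to the `left_up` / `right_bottom` pairs drawn in the cell above.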

## Evaluate on Test [20 pts]

To evaluate the detection results, we use mAP (the mean of the per-class average precision). You are expected to get an mAP of at least 49.
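A minimal sketch of the metric, not the code in `eval_voc.py` (which is not shown here): for each class, detections are sorted by confidence, each one counts as a true positive if it overlaps a not-yet-matched ground-truth box of that class with IoU of at least 0.5 (the PASCAL VOC criterion), and AP is the area under the resulting precision/recall curve. The function names are hypothetical, the IoU matching step is assumed to have been done already, and this uses all-point interpolation; `evaluate` may instead use the 11-point VOC 2007 variant, which gives slightly different numbers.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for a single class.

    scores: confidence of every detection of this class over the test set.
    is_true_positive: per detection, True if it matched an unclaimed
        ground-truth box with IoU >= 0.5 (matching assumed done beforehand).
    num_ground_truth: number of ground-truth boxes of this class.
    """
    if num_ground_truth == 0:
        return 0.0  # assumption: classes without ground truth score 0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the envelope: sum precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_aps):
    """mAP is the unweighted mean of the per-class APs."""
    return sum(per_class_aps) / len(per_class_aps)


# Three detections of one class, two ground-truth boxes: TP, FP, TP.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], 2))
```

Here the precision/recall points are (1.0, 0.5), (0.5, 0.5) and (2/3, 1.0), so AP = 0.5 * 1.0 + 0.5 * 2/3 = 5/6.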

In [None]:
from eval_voc import evaluate
from predict import predict_image   #actually, we didn't modify predict, but who knows....
test_aps = evaluate(net, test_dataset_file=annotation_file_test, threshold=threshold)