# Faster R-CNN - Training Runs

Faster R-CNN was trained only on dataset 3, because we had already identified the labeling problem while training the other architectures.

## Dataset download

The dataset is downloaded below.

In [6]:
import os
import gdown

datasets_parent_dir = "datasets"
os.makedirs(datasets_parent_dir, exist_ok=True)

dataset_dir = os.path.join(datasets_parent_dir, "dataset_coco_3")
os.makedirs(dataset_dir, exist_ok=True)

# Path for the ZIP file
destination = os.path.join(dataset_dir, 'dataset_coco_3.zip')

# Download the file from Google Drive
gdown.download(id="19uIFLXV7ku0e7IAqKzmOAei9PUqfFdPg", output=destination, quiet=False)

# Unzip the archive into the dataset folder
!unzip "{destination}" -d "{dataset_dir}/"

# Delete the ZIP file after extraction
os.remove(destination)

Downloading...
From (original): https://drive.google.com/uc?id=19uIFLXV7ku0e7IAqKzmOAei9PUqfFdPg
From (redirected): https://drive.google.com/uc?id=19uIFLXV7ku0e7IAqKzmOAei9PUqfFdPg&confirm=t&uuid=3c7526a6-e5b3-4db1-a49b-2dff98143f30
To: /home/isidro-marco/proyectos/CEIA/materias/bimestre4/vc2/vpc-II/datasets/dataset_coco_3/dataset_coco_3.zip
100%|██████████| 293M/293M [00:20<00:00, 14.4MB/s] 


Archive:  datasets/dataset_coco_3/dataset_coco_3.zip
   creating: datasets/dataset_coco_3/test/
  inflating: datasets/dataset_coco_3/test/train_121_png_jpg.rf.a7ca1cb50a79073ac64e5508f9b476d5.jpg  
  inflating: datasets/dataset_coco_3/test/train_124_png_jpg.rf.5b014b1b3cda024d60959bd7c4dd127a.jpg  
  inflating: datasets/dataset_coco_3/test/train_130_png_jpg.rf.3c1ddb33a79f32a6103d6f51b0420fa2.jpg  
  inflating: datasets/dataset_coco_3/test/train_13_png_jpg.rf.9d96cc7ec5368e3a0ca194de15ba184f.jpg  
  inflating: datasets/dataset_coco_3/test/train_142_png_jpg.rf.cc7b40022c65b9180d4bca8957d2979b.jpg  
  inflating: datasets/dataset_coco_3/test/train_148_png_jpg.rf.1554ad2bd6cb4fde6e0eea4b04acd442.jpg  
  inflating: datasets/dataset_coco_3/test/train_151_png_jpg.rf.0ab95d8ea76584566a8593eca074321a.jpg  
  inflating: datasets/dataset_coco_3/test/train_158_png_jpg.rf.bbe8ff78b3f8279699e11513d5198277.jpg  
  inflating: datasets/dataset_coco_3/test/train_166_png_jpg.rf.f1b97a64169a5ee209291ab876
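
The extraction step above relies on the `unzip` shell command, which may not be available outside a Linux notebook environment. As a minimal sketch, assuming a pure-Python alternative is acceptable, the same unzip-then-delete step can be written with the standard `zipfile` module. The helper name `extract_and_remove` is hypothetical and not part of the original notebook; the demo uses a throwaway archive rather than the real dataset.

```python
import os
import tempfile
import zipfile


def extract_and_remove(zip_path, target_dir):
    """Extract zip_path into target_dir, delete the archive, return member names."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(target_dir)
    os.remove(zip_path)
    return names


# Self-contained demo on a temporary archive (no download involved).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("test/example.txt", "hello")
    extracted = extract_and_remove(demo_zip, os.path.join(tmp, "out"))
    demo_file_exists = os.path.exists(os.path.join(tmp, "out", "test", "example.txt"))
    demo_zip_removed = not os.path.exists(demo_zip)
```

In the notebook this would be called as `extract_and_remove(destination, dataset_dir)` in place of the `!unzip` and `os.remove` lines.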

## Functions for creating the dataloaders

In [1]:
import torch, os, json
from torch.utils.data import Dataset
import numpy as np
from collections import defaultdict
from torchvision import ops
import matplotlib.patches as patches
from torchvision import transforms as T
from PIL import Image
import cv2
import copy
from pycocotools.cocoeval import COCOeval

In [2]:
# the dataset class
class CocoDataset(Dataset):
    def __init__(self, image_folder, annotations_file, width, height, transforms=None):

        if not isinstance(image_folder, str):
            raise ValueError("image_folder should be a string")

        if not isinstance(annotations_file, str):
            raise ValueError("annotations_file should be a string")

        self.transforms = transforms
        self.image_folder = image_folder
        self.annotations_file = annotations_file
        self.height = height
        self.width = width

        with open(annotations_file, "r") as f:
            self.annotations = json.load(f)

        self.image_ids = defaultdict(list)
        for i in self.annotations["images"]:
            self.image_ids[i["id"]] = i  # key = image_id

        self.annotation_ids = defaultdict(list)
        for i in self.annotations["annotations"]:
            self.annotation_ids[i["image_id"]].append(i)  # key = image_id

        self.cats_id2label = {}
        self.label_names = []

        first_label_id = self.annotations["categories"][0]["id"]
        if first_label_id == 0:
            for i in self.annotations["categories"][1:]:
                self.cats_id2label[i["id"]] = i["name"]
                self.label_names.append(i["name"])
        if first_label_id == 1:
            for i in self.annotations["categories"]:
                self.cats_id2label[i["id"]] = i["name"]
                self.label_names.append(i["name"])
        if first_label_id > 1:
            raise AssertionError(
                "Something went wrong in categories, check the annotation file!"
            )

    def get_total_classes_count(self):
        return len(self.cats_id2label)

    def get_classnames(self):
        return [v for k, v in self.cats_id2label.items()]

    def load_images_annotations(self, index):
        image_info = self.image_ids[index]
        image_path = os.path.join(self.image_folder, image_info["file_name"])

        image = cv2.imread(image_path)
        rimage = cv2.cvtColor(
            image, cv2.COLOR_BGR2RGB
        )  # .astype(np.float32) # convert BGR to RGB color format
        rimage = cv2.resize(rimage, (self.width, self.height))
        # rimage /= 255.0
        rimage = Image.fromarray(rimage)

        image_height, image_width = (
            image_info["height"],
            image_info["width"],
        )  # original height & width
        anno_info = self.annotation_ids[index]

        if len(anno_info) == 0:  # for negative images (Images without annotations)
            boxes = torch.zeros((0, 4), dtype=torch.float32)
            labels = torch.zeros((0,), dtype=torch.int64)  # labels must be 1-D, shape [N]
            iscrowd = torch.zeros((boxes.shape[0],), dtype=torch.int64)
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        else:
            boxes = []
            labels_id = []

            for ainfo in anno_info:
                xmin, ymin, w, h = ainfo["bbox"]
                xmax, ymax = xmin + w, ymin + h

                xmin_final = (xmin / image_width) * self.width
                xmax_final = (xmax / image_width) * self.width
                ymin_final = (ymin / image_height) * self.height
                ymax_final = (ymax / image_height) * self.height

                category_id = ainfo["category_id"]

                boxes.append([xmin_final, ymin_final, xmax_final, ymax_final])
                labels_id.append(category_id)

            boxes = torch.as_tensor(
                boxes, dtype=torch.float32
            )  # bounding box to tensor
            area = (boxes[:, 3] - boxes[:, 1]) * (
                boxes[:, 2] - boxes[:, 0]
            )  # area of the bounding boxes
            iscrowd = torch.zeros(
                (boxes.shape[0],), dtype=torch.int64
            )  # no crowd instances
            labels = torch.as_tensor(labels_id, dtype=torch.int64)  # labels to tensor

        # final `target` dictionary
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["area"] = area
        target["iscrowd"] = iscrowd
        image_id = torch.tensor([index])
        target["image_id"] = image_id

        return {
            "image": rimage,
            "height": image_height,
            "width": image_width,
            "target": target,
        }
    
    @staticmethod
    def transform_image_for_inference(image_path,width,height):
        
        image = cv2.imread(image_path)
        ori_h, ori_w, _ = image.shape
        
        oimage = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # RGB copy of the original image, used for display
        oimage = Image.fromarray(oimage)
        oimage = T.ToTensor()(oimage)
        
        rimage = cv2.cvtColor(
            image, cv2.COLOR_BGR2RGB
        )
        rimage = cv2.resize(rimage, (width,height))
        rimage = Image.fromarray(rimage)
        rimage = T.ToTensor()(rimage)
        # rimage = torch.unsqueeze(rimage, 0)
        
        transform_info = {'original_width':ori_w,
                          'original_height':ori_h,
                          'resized_width':width,
                          'resized_height':height,
                          'resized_image':rimage,
                          'original_image':oimage}
        
        return transform_info # this can directly go to model for inference

    @staticmethod
    def display_bbox(
        bboxes, fig, ax, classes=None, in_format="xyxy", color="y", line_width=3
    ):
        if type(bboxes) == np.ndarray:
            bboxes = torch.from_numpy(bboxes)
        if classes:
            assert len(bboxes) == len(classes)
        # convert boxes to xywh format
        bboxes = ops.box_convert(bboxes, in_fmt=in_format, out_fmt="xywh")
        # enumerate keeps the class index aligned with the box even when a "pad" entry is skipped
        for c, box in enumerate(bboxes):
            x, y, w, h = box.numpy()
            # display bounding box
            rect = patches.Rectangle(
                (x, y), w, h, linewidth=line_width, edgecolor=color, facecolor="none"
            )
            ax.add_patch(rect)
            # display category
            if classes:
                if classes[c] == "pad":
                    continue
                ax.text(
                    x + 5, y + 20, classes[c], bbox=dict(facecolor="yellow", alpha=0.5)
                )

        return fig, ax

    def __getitem__(self, idx):

        sample = self.load_images_annotations(idx)
        image_resized = sample["image"]
        target = sample["target"]

        # apply the image transforms
        if self.transforms:
            sample = self.transforms(
                image=image_resized, bboxes=target["boxes"], labels=target["labels"]
            )
            image_resized = sample["image"]
            target["boxes"] = torch.Tensor(sample["bboxes"])

        return T.ToTensor()(image_resized), target

    def __len__(self):
        return len(self.image_ids)
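
The box handling in `load_images_annotations` converts each COCO `[x, y, width, height]` annotation to `[xmin, ymin, xmax, ymax]` and rescales it from the original image size to the resized one. The sketch below isolates that arithmetic in plain Python so it can be checked by hand. The function name `coco_box_to_resized_xyxy` and the sample numbers are illustrative assumptions, not part of the notebook.

```python
def coco_box_to_resized_xyxy(bbox, image_width, image_height,
                             target_width=640, target_height=640):
    """Convert a COCO xywh box to xyxy and rescale it to the resized image."""
    x_min, y_min, w, h = bbox
    x_max, y_max = x_min + w, y_min + h
    return [
        (x_min / image_width) * target_width,
        (y_min / image_height) * target_height,
        (x_max / image_width) * target_width,
        (y_max / image_height) * target_height,
    ]


# A 1280x320 image squeezed to 640x640: x is halved, y is doubled.
box = coco_box_to_resized_xyxy([100, 50, 200, 100], image_width=1280, image_height=320)
print(box)  # [50.0, 100.0, 150.0, 300.0]
```

Because width and height are scaled independently, a non-square image is distorted by the resize, and the boxes are distorted with it.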

Note that a 640x640 resolution was used because of VRAM limitations on the machine where training was run. A higher resolution might nonetheless have been a good option.

In [3]:
def get_datasets(train_image_dir:str,
                 val_image_dir:str,
                 test_image_dir:str,
                 train_coco_json:str,
                 val_coco_json:str,
                 test_coco_json:str):

    train_ds = CocoDataset(
        image_folder=train_image_dir,
        annotations_file=train_coco_json,
        height=640,
        width=640,
    )

    val_ds = CocoDataset(
        image_folder=val_image_dir,
        annotations_file=val_coco_json,
        height=640,
        width=640,
    )
    
    test_ds = CocoDataset(
        image_folder=test_image_dir,
        annotations_file=test_coco_json,
        height=640,
        width=640,
    ) 

    return train_ds, val_ds, test_ds


## Functions for training the model

In [4]:
import time
from torch.utils.tensorboard import SummaryWriter
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    FasterRCNN_ResNet50_FPN_V2_Weights,
)
from tqdm import tqdm
import gc
from pycocotools.coco import COCO
import matplotlib.pyplot as plt

The base Faster R-CNN model available in torchvision was used, initialized with its pretrained weights.

In [5]:
def create_model(num_classes,checkpoint=None,device='cpu'):
    """
    Create a model for object detection using the Faster R-CNN architecture.

    Parameters:
    - num_classes (int): The number of classes for object detection. (Total classes + 1 [for background class])
    - checkpoint (str) : checkpoint path for the pretrained custom model
    - device (str) : cpu / cuda
    Returns:
    - model (torchvision.models.detection.fasterrcnn_resnet50_fpn): The created model for object detection.
    """
    # `weights` supersedes the deprecated `pretrained` / `pretrained_backbone` flags
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT,
    )
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    if checkpoint:
        checkpoint = torch.load(checkpoint, map_location=device)
        model.load_state_dict(checkpoint['model_state_dict'])
    # move the model to the target device whether or not a checkpoint was loaded
    model = model.to(device)
    return model

In [6]:
class InferFasterRCNN:
    def __init__(self, num_classes=None, classnames=()):

        assert num_classes is not None, "Define number of classes"

        self.num_classes = num_classes  # total_class_no + 1 (for background)

        self.classnames = ["__background__"]
        self.classnames.extend(classnames)

        self.colors = np.random.uniform(0, 255, size=(len(self.classnames), 3))

        assert (
            len(self.classnames) == self.num_classes
        ), f"num_classes: {self.num_classes}, len(classnames): {len(self.classnames)}.\
            num_classes should be equal to count of actual classes in classnames list without background + 1"

    def load_model(self, checkpoint, device="cpu"):
        self.device = device
        self.model = create_model(
            self.num_classes, checkpoint=checkpoint, device=self.device
        )
        self.model = self.model.eval()

    def infer_image(self, transform_info ,detection_threshold=0.5, visualize=False):
        
        '''
        transform_info : dict returned by CocoDataset.transform_image_for_inference
        '''
        
        display_unscaled = True
        h_ratio = transform_info['original_height']/transform_info['resized_height']
        w_ratio = transform_info['original_width']/transform_info['resized_width']
        
        orig_image = transform_info['resized_image']
        orig_image = orig_image.cpu().numpy()
        orig_image = np.transpose(orig_image, (1, 2, 0))
        orig_image = np.ascontiguousarray(orig_image, dtype=np.float32)
        image = torch.unsqueeze(transform_info['resized_image'], 0)

        with torch.no_grad():
            self.model = self.model.to(self.device)
            outputs = self.model(image.to(self.device))

        # load all detection to CPU for further operations
        outputs = [{k: v.to("cpu") for k, v in t.items()} for t in outputs]

        results = {}
        _f_boxes,_f_scores,_f_labels = [],[],[]
        
        # carry further only if there are detected boxes
        if len(outputs[0]["boxes"]) != 0:
            boxes = outputs[0]["boxes"].data.numpy() # xyxy
            scores = outputs[0]["scores"].data.numpy()
            labels = outputs[0]["labels"].cpu().numpy()
            
            # filter out boxes according to `detection_threshold`
            for i in range(len(boxes)):
                if scores[i] >= detection_threshold:
                    _f_boxes.append(boxes[i])
                    _f_labels.append(labels[i])
                    _f_scores.append(scores[i])
            
            boxes,labels,scores = _f_boxes,_f_labels,_f_scores
            #boxes = boxes[scores >= detection_threshold].astype(np.int32)
            draw_boxes = boxes.copy()
            
            # get all the predicted class names
            pred_classes = [
                self.classnames[i] for i in labels
            ]
            
            results['unscaled_boxes'] = [[i[0]*w_ratio, i[1]*h_ratio, i[2]*w_ratio, i[3]*h_ratio] for i in boxes] # in original image size
            results['scaled_boxes'] = boxes # in resize image size
            results['scores'] = scores
            results['pred_classes'] = pred_classes
            results['labels'] = labels

            if not display_unscaled:
                # draw the bounding boxes and write the class name on top of it
                for j, box in enumerate(draw_boxes):
                    class_name = pred_classes[j]
                    color = self.colors[self.classnames.index(class_name)]
                    cv2.rectangle(
                        orig_image,
                        (int(box[0]), int(box[1])),
                        (int(box[2]), int(box[3])),
                        color,
                        2,
                    )
                    cv2.putText(
                        orig_image,
                        class_name,
                        (int(box[0]), int(box[1] - 5)),
                        cv2.FONT_HERSHEY_SIMPLEX,
                        0.7,
                        color,
                        2,
                        lineType=cv2.LINE_AA,
                    )

                if visualize:
                    plt.figure(figsize=(10, 10))
                    plt.imshow(orig_image[:,:,::-1])
                    plt.show()
            
            else:
                # draw the bounding boxes and write the class name on top of it
                draw_boxes_scaled = results['unscaled_boxes']
                scaled_orig_image = transform_info['original_image']
                scaled_orig_image = scaled_orig_image.cpu().numpy()
                scaled_orig_image = np.transpose(scaled_orig_image, (1, 2, 0))
                scaled_orig_image = np.ascontiguousarray(scaled_orig_image, dtype=np.float32)
                
                for j, box in enumerate(draw_boxes_scaled):
                    class_name = pred_classes[j]
                    color = self.colors[self.classnames.index(class_name)]
                    cv2.rectangle(
                        scaled_orig_image,
                        (int(box[0]), int(box[1])),
                        (int(box[2]), int(box[3])),
                        color,
                        2,
                    )
                    cv2.putText(
                        scaled_orig_image,
                        class_name,
                        (int(box[0]), int(box[1] - 5)),
                        cv2.FONT_HERSHEY_SIMPLEX,
                        0.7,
                        color,
                        2,
                        lineType=cv2.LINE_AA,
                    )

                if visualize:
                    plt.figure(figsize=(10, 10))
                    plt.imshow(scaled_orig_image)  # [:,:,::-1])
                    plt.show()

        return results

    def infer_image_path(self, image_path, detection_threshold=0.5, visualize=False):

        image = cv2.imread(image_path)
        orig_image = image.copy()

        # BGR to RGB
        image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB).astype(np.float32)
        # make the pixel range between 0 and 1
        image /= 255.0
        # bring color channels to front
        image = np.transpose(image, (2, 0, 1)).astype(np.float32)
        # convert to tensor
        image = torch.tensor(image, dtype=torch.float).cpu()

        # add batch dimension
        image = torch.unsqueeze(image, 0)
        with torch.no_grad():
            self.model = self.model.to(self.device)
            outputs = self.model(image.to(self.device))

        # load all detection to CPU for further operations
        outputs = [{k: v.to("cpu") for k, v in t.items()} for t in outputs]

        
        # carry further only if there are detected boxes
        if len(outputs[0]["boxes"]) != 0:
            boxes = outputs[0]["boxes"].data.numpy()
            scores = outputs[0]["scores"].data.numpy()

            # filter out boxes according to `detection_threshold`
            boxes = boxes[scores >= detection_threshold].astype(np.int32)
            draw_boxes = boxes.copy()

            # get all the predicted class names
            pred_classes = [
                self.classnames[i] for i in outputs[0]["labels"].cpu().numpy()
            ]

            # draw the bounding boxes and write the class name on top of it
            for j, box in enumerate(draw_boxes):
                class_name = pred_classes[j]
                color = self.colors[self.classnames.index(class_name)]
                cv2.rectangle(
                    orig_image,
                    (int(box[0]), int(box[1])),
                    (int(box[2]), int(box[3])),
                    color,
                    2,
                )
                cv2.putText(
                    orig_image,
                    class_name,
                    (int(box[0]), int(box[1] - 5)),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.7,
                    color,
                    2,
                    lineType=cv2.LINE_AA,
                )

            if visualize:
                plt.figure(figsize=(10, 10))
                plt.imshow(orig_image[:, :, ::-1])
                plt.show()

        return outputs
    
    
    def draw_bounding_boxes(self,image, bboxes, class_labels, figsize=(12,12)):
        class_labels = class_labels.cpu().numpy()
        bboxes = bboxes.cpu().numpy()
        for j, box in enumerate(bboxes):
            label = class_labels[j]
            color = self.colors[label]
            cv2.rectangle(
                    image,
                    (int(box[0]), int(box[1])),
                    (int(box[2]), int(box[3])),
                    (0,200,0),
                    1,
                )
            cv2.putText(
                image,
                self.classnames[int(label)],
                (int(box[0] + 15), int(box[1] + 15)),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.5,
                (0,200,0),
                2,
                lineType=cv2.LINE_AA,
            )
        
        plt.figure(figsize=figsize)
        plt.imshow(image)
        plt.show()


In [7]:
class SaveBestModel:
    """
    Class to save the best model while training. If the current epoch's 
    validation loss is lower than the lowest loss seen so far, then save the
    model state.
    """
    def __init__(
        self, best_valid_loss=float('inf'), output_dir = 'weight_outputs',
    ):
        self.best_valid_loss = best_valid_loss
    
        os.makedirs(output_dir,exist_ok=True)
        
        self.output_dir = output_dir
        
    def __call__(
        self, current_valid_loss, 
        epoch, model, optimizer
    ):
        self.model_save_path = f'{self.output_dir}/best_model.pth'
        if current_valid_loss < self.best_valid_loss:
            self.best_valid_loss = current_valid_loss
            print(f"\nBest validation loss: {self.best_valid_loss}")
            print(f"\nSaving best model for epoch: {epoch+1}\n")
            torch.save({
                'epoch': epoch+1,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                }, self.model_save_path)
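
The checkpointing rule used by `SaveBestModel` can be illustrated without PyTorch. The following is a minimal sketch, assuming the only decision is "save when the validation loss improves"; the class name `BestLossTracker` and the loss values are made up for illustration.

```python
class BestLossTracker:
    """Tracks the lowest validation loss seen so far."""

    def __init__(self):
        self.best = float("inf")
        self.best_epoch = None

    def update(self, loss, epoch):
        """Return True when `loss` improves on the best value (a checkpoint would be saved)."""
        if loss < self.best:
            self.best = loss
            self.best_epoch = epoch
            return True
        return False


tracker = BestLossTracker()
val_losses = [3.0, 2.5, 2.7, 2.1]  # hypothetical per-epoch validation losses
saved_epochs = [
    epoch + 1
    for epoch, loss in enumerate(val_losses)
    if tracker.update(loss, epoch + 1)
]
print(saved_epochs)  # [1, 2, 4]
```

Epoch 3 is skipped because its loss (2.7) is worse than the best value recorded at epoch 2 (2.5).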

In [8]:
@torch.inference_mode()  # disable gradient computation and autograd graph construction
def val_one_epoch(model, val_dl, writer, epoch_no, total_epoch, device, log=True):
    with tqdm(val_dl, unit="batch") as tepoch:
        epoch_loss = 0
        _classifier_loss = 0
        _loss_box_reg = 0
        _loss_rpn_box_reg = 0
        _loss_objectness = 0
        for data in tepoch:
            tepoch.set_description(f"Val:Epoch {epoch_no}/{total_epoch}")
            imgs = []
            targets = []
            for d in data:
                imgs.append(d[0].to(device))
                targ = {}
                targ["boxes"] = d[1]["boxes"].to(device)
                targ["labels"] = d[1]["labels"].to(device)
                targets.append(targ)
            loss_dict = model(imgs, targets)

            loss = sum(v for v in loss_dict.values())
            classifier_loss = loss_dict.get("loss_classifier").cpu().detach().numpy()
            loss_box_reg = loss_dict.get("loss_box_reg").cpu().detach().numpy()
            loss_objectness = loss_dict.get("loss_objectness").cpu().detach().numpy()
            loss_rpn_box_reg = loss_dict.get("loss_rpn_box_reg").cpu().detach().numpy()

            epoch_loss += loss.cpu().detach().numpy()
            _classifier_loss += classifier_loss
            _loss_box_reg += loss_box_reg
            _loss_objectness += loss_objectness
            _loss_rpn_box_reg += loss_rpn_box_reg

            tepoch.set_postfix(
                total_loss=epoch_loss,
                loss_classifier=_classifier_loss,
                boxreg_loss=_loss_box_reg,
                obj_loss=_loss_objectness,
                rpn_boxreg_loss=_loss_rpn_box_reg,
            )

        if log:
            writer.add_scalar("Val/total_loss", epoch_loss, epoch_no)
            writer.add_scalar("Val/classifier_loss", _classifier_loss, epoch_no)
            writer.add_scalar("Val/box_reg_loss", _loss_box_reg, epoch_no)
            writer.add_scalar("Val/objectness_loss", _loss_objectness, epoch_no)
            writer.add_scalar("Val/rpn_box_reg_loss", _loss_rpn_box_reg, epoch_no)

    return writer, epoch_loss

def train_one_epoch(model, train_dl, optimizer, writer, epoch_no, total_epoch, device):
    with tqdm(train_dl, unit="batch") as tepoch:
        epoch_loss = 0
        _classifier_loss = 0
        _loss_box_reg = 0
        _loss_rpn_box_reg = 0
        _loss_objectness = 0
        for data in tepoch:
            tepoch.set_description(f"Train:Epoch {epoch_no}/{total_epoch}")
            imgs = []
            targets = []
            for d in data:
                imgs.append(d[0].to(device))
                targ = {}
                targ["boxes"] = d[1]["boxes"].to(device)
                targ["labels"] = d[1]["labels"].to(device)
                targets.append(targ)
            loss_dict = model(imgs, targets)

            loss = sum(v for v in loss_dict.values())
            classifier_loss = loss_dict.get("loss_classifier").cpu().detach().numpy()
            loss_box_reg = loss_dict.get("loss_box_reg").cpu().detach().numpy()
            loss_objectness = loss_dict.get("loss_objectness").cpu().detach().numpy()
            loss_rpn_box_reg = loss_dict.get("loss_rpn_box_reg").cpu().detach().numpy()

            epoch_loss += loss.cpu().detach().numpy()
            _classifier_loss += classifier_loss
            _loss_box_reg += loss_box_reg
            _loss_objectness += loss_objectness
            _loss_rpn_box_reg += loss_rpn_box_reg

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            tepoch.set_postfix(
                total_loss=epoch_loss,
                loss_classifier=_classifier_loss,
                boxreg_loss=_loss_box_reg,
                obj_loss=_loss_objectness,
                rpn_boxreg_loss=_loss_rpn_box_reg,
            )

        writer.add_scalar("Train/total_loss", epoch_loss, epoch_no)
        writer.add_scalar("Train/classifier_loss", _classifier_loss, epoch_no)
        writer.add_scalar("Train/box_reg_loss", _loss_box_reg, epoch_no)
        writer.add_scalar("Train/objectness_loss", _loss_objectness, epoch_no)
        writer.add_scalar("Train/rpn_box_reg_loss", _loss_rpn_box_reg, epoch_no)

    return model, optimizer, writer, epoch_loss
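
Both loops above accumulate the sum of the per-batch losses, so the logged totals grow with the number of batches, and the training and validation curves are not directly comparable when the two dataloaders have different lengths. One possible alternative, shown here only as a sketch, is to log per-batch means instead. The `LossAverager` helper is hypothetical and not part of the notebook; the loss values are invented.

```python
from collections import defaultdict


class LossAverager:
    """Accumulates named losses over batches and reports their mean."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.batches = 0

    def update(self, loss_dict):
        # float() also accepts 0-dim tensors and NumPy scalars
        for name, value in loss_dict.items():
            self.totals[name] += float(value)
        self.batches += 1

    def means(self):
        if self.batches == 0:
            return {}
        return {name: total / self.batches for name, total in self.totals.items()}


averager = LossAverager()
averager.update({"loss_classifier": 0.5, "loss_box_reg": 0.25})
averager.update({"loss_classifier": 1.5, "loss_box_reg": 0.75})
mean_losses = averager.means()
print(mean_losses)  # {'loss_classifier': 1.0, 'loss_box_reg': 0.5}
```

In the training loop, `averager.update(loss_dict)` would be called once per batch and `averager.means()` passed to `writer.add_scalar` at the end of the epoch.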

In [9]:
def save_json(data, file_path):
    def convert(o):
        if isinstance(o, np.float32):
            return float(o)
        raise TypeError(f'Object of type {o.__class__.__name__} is not JSON serializable')

    with open(file_path, 'w') as file:
        json.dump(data, file, default=convert)

def evaluate_model(image_dir,
                   gt_ann_file,
                   model_weight):
    
    _ds = CocoDataset(
            image_folder=image_dir,
            annotations_file=gt_ann_file,
            height=640,
            width=640,
        )
    
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    IF_C = InferFasterRCNN(num_classes=_ds.get_total_classes_count() + 1,
                        classnames=_ds.get_classnames())

    IF_C.load_model(checkpoint=model_weight,
                    device=device)

    image_dir = image_dir

    cocoGt=COCO(annotation_file=gt_ann_file)
    imgIds = cocoGt.getImgIds() # all image ids

    res_id = 1
    res_all = []
        
    for id in tqdm(imgIds,total=len(imgIds)):
        # `id` is already an image id, so look it up directly instead of indexing `imgIds` with it
        img_info = cocoGt.loadImgs(id)[0]
        annIds = cocoGt.getAnnIds(imgIds=img_info['id'])
        ann_info = cocoGt.loadAnns(annIds)
        image_path = os.path.join(image_dir, 
                                img_info['file_name'])
        transform_info = CocoDataset.transform_image_for_inference(image_path,width=640,height=640)
        result = IF_C.infer_image(transform_info=transform_info,
                                visualize=False)

        if len(result)>0:
            pred_boxes_xyxy = result['unscaled_boxes']
            pred_boxes_xywh = [[i[0],i[1],i[2]-i[0],i[3]-i[1]] for i in pred_boxes_xyxy]
            pred_classes = result['pred_classes']
            pred_scores = result['scores']
            pred_labels = result['labels']

            for i in range(len(pred_boxes_xywh)):
                res_temp = {"id":res_id,
                            "image_id":id,
                            "bbox":pred_boxes_xywh[i],
                            "segmentation":[],
                            "iscrowd": 0,
                            "category_id": int(pred_labels[i]),
                            "area":pred_boxes_xywh[i][2]*pred_boxes_xywh[i][3],
                            "score": float(pred_scores[i])}
                res_all.append(res_temp)
                res_id+=1

    save_json_path = 'test_dect.json'
    save_json(res_all,save_json_path)
    
    cocoGt=COCO(gt_ann_file)
    cocoDt=cocoGt.loadRes(save_json_path)

    cocoEval = COCOeval(cocoGt,cocoDt,iouType='bbox')
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()
    
    AP_50_95 = cocoEval.stats.tolist()[0]
    AP_50 = cocoEval.stats.tolist()[1]
    
    del IF_C,_ds
    os.remove(save_json_path)
    
    torch.cuda.empty_cache()
    gc.collect()
    
    return {'AP_50_95':AP_50_95,
            'AP_50':AP_50}
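
Inside `evaluate_model`, predictions made on the resized image are scaled back to the original image size and converted from `xyxy` to the COCO `xywh` convention before being handed to `COCOeval`. The sketch below reproduces that conversion in plain Python. The function name `to_coco_results` and the sample detection are assumptions made for illustration.

```python
def to_coco_results(image_id, boxes_xyxy, labels, scores,
                    w_ratio=1.0, h_ratio=1.0, start_id=1):
    """Build COCO-style detection dicts from xyxy boxes given in resized-image pixels."""
    results = []
    for offset, (box, label, score) in enumerate(zip(boxes_xyxy, labels, scores)):
        x_min, y_min, x_max, y_max = box
        # scale back to the original image, then switch to x, y, width, height
        x, y = x_min * w_ratio, y_min * h_ratio
        w, h = (x_max - x_min) * w_ratio, (y_max - y_min) * h_ratio
        results.append({
            "id": start_id + offset,
            "image_id": image_id,
            "category_id": int(label),
            "bbox": [x, y, w, h],
            "area": w * h,
            "score": float(score),
            "iscrowd": 0,
            "segmentation": [],
        })
    return results


# One detection on a 640x640 input that came from a 1280x320 original image.
detections = to_coco_results(
    image_id=7,
    boxes_xyxy=[[50.0, 100.0, 150.0, 300.0]],
    labels=[1],
    scores=[0.9],
    w_ratio=1280 / 640,
    h_ratio=320 / 640,
)
print(detections[0]["bbox"])  # [100.0, 50.0, 200.0, 100.0]
```

The ratios are original size divided by resized size, matching `w_ratio` and `h_ratio` in `infer_image`.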

In [10]:
def train(
    train_dataset,
    val_dataset,
    epochs=2,
    batch_size=8,
    exp_folder="exp",
    val_eval_freq=1,
):

    date_format = "%d-%m-%Y-%H-%M-%S"
    date_string = time.strftime(date_format)

    exp_folder = os.path.join(exp_folder, "summary", date_string)  # honor the `exp_folder` argument
    writer = SummaryWriter(exp_folder)

    def custom_collate(data):
        return data

    train_dl = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=custom_collate,
        pin_memory=True if torch.cuda.is_available() else False,
    )

    val_dl = torch.utils.data.DataLoader(
        val_dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=custom_collate,
        pin_memory=True if torch.cuda.is_available() else False,
    )

    # Device --
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    # Model --
    faster_rcnn_model = create_model(train_dataset.get_total_classes_count() + 1)
    faster_rcnn_model = faster_rcnn_model.to(device)

    # Optimizer --
    pg0, pg1, pg2 = [], [], []  # optimizer parameter groups

    for k, v in faster_rcnn_model.named_modules():
        if hasattr(v, "bias") and isinstance(v.bias, torch.nn.Parameter):
            pg2.append(v.bias)  # biases
        if isinstance(v, torch.nn.BatchNorm2d) or "bn" in k:
            pg0.append(v.weight)  # no decay
        elif hasattr(v, "weight") and isinstance(v.weight, torch.nn.Parameter):
            pg1.append(v.weight)  # apply decay

    optimizer = torch.optim.SGD(
        pg0, lr=0.001, momentum=0.9, nesterov=True
    ) # BN
    
    optimizer.add_param_group(
        {"params": pg1, "weight_decay":  5e-4}
    )  # add pg1 with weight_decay # Weights
    
    optimizer.add_param_group({"params": pg2}) # Biases
    

    num_epochs = epochs
    save_best_model = SaveBestModel(output_dir=exp_folder)

    for epoch in range(num_epochs):

        faster_rcnn_model, optimizer, writer, epoch_loss = train_one_epoch(
            faster_rcnn_model,
            train_dl,
            optimizer,
            writer,
            epoch + 1,
            num_epochs,
            device,
        )
        
        time.sleep(0.1)

        if (epoch % val_eval_freq == 0) and epoch != 0:  # Do evaluation of validation set
            eval_result = evaluate_model(image_dir=val_dataset.image_folder,
                                         gt_ann_file=val_dataset.annotations_file,
                                         model_weight=save_best_model.model_save_path)
            
            time.sleep(0.1)
            
            writer.add_scalar("Val/AP_50_95", eval_result['AP_50_95'], epoch + 1)
            writer.add_scalar("Val/AP_50", eval_result['AP_50'], epoch + 1)
        
        else:
            writer, val_epoch_loss = val_one_epoch(
                faster_rcnn_model,
                val_dl,
                writer,
                epoch + 1,
                num_epochs,
                device,
                log=True,
            )
        
            time.sleep(0.1)
            
            save_best_model(val_epoch_loss, 
                            epoch, 
                            faster_rcnn_model, 
                            optimizer)
            

    _, _ = val_one_epoch(
        faster_rcnn_model, val_dl, writer, epoch + 1, num_epochs, device, log=False
    )

    writer.add_hparams(
        {"epochs": epochs, "batch_size": batch_size},
        {"Train/total_loss": epoch_loss, "Val/total_loss": val_epoch_loss},
    )
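
The loop above logs `eval_result['AP_50_95']` and `eval_result['AP_50']`. `evaluate_model` is defined elsewhere, so what follows is only a sketch of how such a dictionary could be built, assuming it wraps pycocotools' `COCOeval`, whose `summarize()` stores twelve numbers in `stats` in a fixed order. The helper name `name_coco_stats` and every key other than `AP_50_95` and `AP_50` are hypothetical.

```python
# Order of the 12 values that pycocotools' COCOeval.summarize() writes to `stats`
# for bbox evaluation. Only "AP_50_95" and "AP_50" are key names used by the
# training loop; the remaining names are illustrative assumptions.
COCO_STAT_NAMES = [
    "AP_50_95", "AP_50", "AP_75",
    "AP_small", "AP_medium", "AP_large",
    "AR_max1", "AR_max10", "AR_max100",
    "AR_small", "AR_medium", "AR_large",
]


def name_coco_stats(stats):
    """Pair each COCO summary value with a readable key (hypothetical helper)."""
    if len(stats) != len(COCO_STAT_NAMES):
        raise ValueError(f"expected {len(COCO_STAT_NAMES)} values, got {len(stats)}")
    return {name: float(value) for name, value in zip(COCO_STAT_NAMES, stats)}


# Placeholder input (positions, not real metrics) to show the mapping.
example = name_coco_stats(range(12))
print(example["AP_50_95"], example["AP_50"])
```

pycocotools reports `-1` for a breakdown that has no ground-truth boxes in that size range, so it is probably safer to skip such entries than to log them as scores.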

## Training - dataset 3 - run 1

In [11]:
# Root of the unzipped dataset; each split folder holds its images and its COCO annotation file
dataset_root = r"/home/isidro-marco/proyectos/CEIA/materias/bimestre4/vc2/vpc-II/datasets/dataset_coco_3"

train_image_dir = os.path.join(dataset_root, "train")
val_image_dir = os.path.join(dataset_root, "valid")
test_image_dir = os.path.join(dataset_root, "test")
train_coco_json = os.path.join(train_image_dir, "_annotations.coco.json")
val_coco_json = os.path.join(val_image_dir, "_annotations.coco.json")
test_coco_json = os.path.join(test_image_dir, "_annotations.coco.json")

epochs = 15
batch_size = 4
val_eval_freq = 5  # COCO evaluation runs when the zero-based epoch index is a nonzero multiple of 5
exp_folder = 'results/faster_rcnn_results'
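
With these settings, the condition `(epoch % val_eval_freq == 0) and epoch != 0` in `train` decides, for each epoch, between a COCO evaluation and a validation-loss pass. A small sketch of that schedule, using the same zero-based `epoch` index and the one-based numbering shown in the progress bars (the function name `coco_eval_epochs` is made up for illustration):

```python
def coco_eval_epochs(epochs, val_eval_freq):
    """Return the one-based epochs at which the COCO evaluation branch runs."""
    return [
        epoch + 1                    # progress bars display epoch + 1
        for epoch in range(epochs)   # zero-based index, as in the training loop
        if epoch % val_eval_freq == 0 and epoch != 0
    ]


eval_epochs = coco_eval_epochs(epochs=15, val_eval_freq=5)
loss_epochs = [e for e in range(1, 16) if e not in eval_epochs]
print(eval_epochs)  # [6, 11]
print(loss_epochs)
```

On epochs 6 and 11 the loop therefore computes no validation loss and does not call `save_best_model`; every other epoch takes the `else` branch.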

In [12]:
train_ds, val_ds, test_ds = get_datasets(train_image_dir=train_image_dir,
                                         train_coco_json=train_coco_json,
                                         val_image_dir=val_image_dir,
                                         val_coco_json=val_coco_json,
                                         test_image_dir=test_image_dir,
                                         test_coco_json=test_coco_json)

In [13]:
train(train_ds, val_ds,
      epochs=epochs,
      batch_size=batch_size,
      val_eval_freq=val_eval_freq,
      exp_folder=exp_folder)

Train:Epoch 1/15: 100%|██████████| 423/423 [08:47<00:00,  1.25s/batch, boxreg_loss=119, loss_classifier=105, obj_loss=17.5, rpn_boxreg_loss=11.5, total_loss=253]     
Val:Epoch 1/15: 100%|██████████| 41/41 [00:41<00:00,  1.00s/batch, boxreg_loss=9.75, loss_classifier=7.22, obj_loss=1.11, rpn_boxreg_loss=0.926, total_loss=19]       



Best validation loss: 19.003620147705078

Saving best model for epoch: 1



Train:Epoch 2/15: 100%|██████████| 423/423 [14:16<00:00,  2.03s/batch, boxreg_loss=96.1, loss_classifier=74.1, obj_loss=7.44, rpn_boxreg_loss=8.55, total_loss=186]      
Val:Epoch 2/15: 100%|██████████| 41/41 [00:40<00:00,  1.00batch/s, boxreg_loss=9.67, loss_classifier=6.79, obj_loss=0.945, rpn_boxreg_loss=0.939, total_loss=18.3]    



Best validation loss: 18.33844566345215

Saving best model for epoch: 2



Train:Epoch 3/15: 100%|██████████| 423/423 [06:19<00:00,  1.11batch/s, boxreg_loss=90.1, loss_classifier=66.2, obj_loss=5.41, rpn_boxreg_loss=7.69, total_loss=169]       
Val:Epoch 3/15: 100%|██████████| 41/41 [00:12<00:00,  3.28batch/s, boxreg_loss=9.57, loss_classifier=6.28, obj_loss=0.805, rpn_boxreg_loss=0.965, total_loss=17.6]    



Best validation loss: 17.61784553527832

Saving best model for epoch: 3



Train:Epoch 4/15: 100%|██████████| 423/423 [04:13<00:00,  1.67batch/s, boxreg_loss=87.1, loss_classifier=60.6, obj_loss=4.29, rpn_boxreg_loss=7.12, total_loss=159]     
Val:Epoch 4/15: 100%|██████████| 41/41 [00:12<00:00,  3.28batch/s, boxreg_loss=8.89, loss_classifier=6.31, obj_loss=0.928, rpn_boxreg_loss=0.884, total_loss=17]      



Best validation loss: 17.00993537902832

Saving best model for epoch: 4



Train:Epoch 5/15: 100%|██████████| 423/423 [04:13<00:00,  1.67batch/s, boxreg_loss=83.2, loss_classifier=54.8, obj_loss=3.49, rpn_boxreg_loss=6.61, total_loss=148]       
Val:Epoch 5/15: 100%|██████████| 41/41 [00:12<00:00,  3.29batch/s, boxreg_loss=8.74, loss_classifier=6.29, obj_loss=0.793, rpn_boxreg_loss=0.861, total_loss=16.7]    



Best validation loss: 16.679475784301758

Saving best model for epoch: 5



Train:Epoch 6/15: 100%|██████████| 423/423 [12:34<00:00,  1.78s/batch, boxreg_loss=79.5, loss_classifier=49.7, obj_loss=2.95, rpn_boxreg_loss=6.34, total_loss=139]       
  checkpoint = torch.load(checkpoint, map_location=device)


loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


100%|██████████| 161/161 [00:43<00:00,  3.70it/s]


loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.13s).
Accumulating evaluation results...
DONE (t=0.03s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.603
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.339
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.380
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.152
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDet

Train:Epoch 7/15: 100%|██████████| 423/423 [04:18<00:00,  1.64batch/s, boxreg_loss=76.5, loss_classifier=45, obj_loss=2.34, rpn_boxreg_loss=6.03, total_loss=130]         
Val:Epoch 7/15: 100%|██████████| 41/41 [00:12<00:00,  3.24batch/s, boxreg_loss=9.5, loss_classifier=6.61, obj_loss=1.03, rpn_boxreg_loss=0.865, total_loss=18]        
Train:Epoch 8/15: 100%|██████████| 423/423 [05:35<00:00,  1.26batch/s, boxreg_loss=73.3, loss_classifier=40.2, obj_loss=2.04, rpn_boxreg_loss=5.7, total_loss=121]      
Val:Epoch 8/15: 100%|██████████| 41/41 [00:12<00:00,  3.21batch/s, boxreg_loss=8.66, loss_classifier=6.9, obj_loss=1.15, rpn_boxreg_loss=0.842, total_loss=17.6]      
Train:Epoch 9/15: 100%|██████████| 423/423 [08:55<00:00,  1.27s/batch, boxreg_loss=69.4, loss_classifier=35.7, obj_loss=1.72, rpn_boxreg_loss=5.38, total_loss=112]       
Val:Epoch 9/15: 100%|██████████| 41/41 [00:39<00:00,  1.04batch/s, boxreg_loss=8.95, loss_classifier=7.56, obj_loss=1.56, rpn_boxreg_loss=0.885, total_loss

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


100%|██████████| 161/161 [00:44<00:00,  3.59it/s]


loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.15s).
Accumulating evaluation results...
DONE (t=0.03s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.603
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.339
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.226
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.380
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.152
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDet

Train:Epoch 12/15: 100%|██████████| 423/423 [16:03<00:00,  2.28s/batch, boxreg_loss=59.5, loss_classifier=26.9, obj_loss=1.17, rpn_boxreg_loss=4.6, total_loss=92.1]         
Val:Epoch 12/15: 100%|██████████| 41/41 [00:41<00:00,  1.02s/batch, boxreg_loss=8.79, loss_classifier=9.17, obj_loss=2.09, rpn_boxreg_loss=0.898, total_loss=20.9]     
Train:Epoch 13/15: 100%|██████████| 423/423 [16:07<00:00,  2.29s/batch, boxreg_loss=56, loss_classifier=24.9, obj_loss=0.933, rpn_boxreg_loss=4.37, total_loss=86.3]        
Val:Epoch 13/15: 100%|██████████| 41/41 [00:41<00:00,  1.01s/batch, boxreg_loss=9.18, loss_classifier=10.3, obj_loss=2.7, rpn_boxreg_loss=0.877, total_loss=23]       
Train:Epoch 14/15: 100%|██████████| 423/423 [16:05<00:00,  2.28s/batch, boxreg_loss=52.4, loss_classifier=22.7, obj_loss=0.858, rpn_boxreg_loss=4.11, total_loss=80]       
Val:Epoch 14/15: 100%|██████████| 41/41 [00:41<00:00,  1.01s/batch, boxreg_loss=9.25, loss_classifier=10.2, obj_loss=2.78, rpn_boxreg_loss=0.857, 
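
The validation `total_loss` values printed above can be tabulated to see where the run stopped improving. The sketch below copies only the values that are legible in the log (epochs 6 and 11 ran the COCO evaluation instead, and the entries for epochs 9, 10, 14 and 15 are truncated or missing from the captured output), so it is a reading aid under those gaps, not a full record of the run.

```python
# Validation total_loss per epoch, copied from the progress bars above.
# Epochs with truncated or missing output are deliberately left out.
val_total_loss = {
    1: 19.0, 2: 18.3, 3: 17.6, 4: 17.0, 5: 16.7,
    7: 18.0, 8: 17.6, 12: 20.9, 13: 23.0,
}

best_epoch = min(val_total_loss, key=val_total_loss.get)
best_loss = val_total_loss[best_epoch]

for epoch, loss in sorted(val_total_loss.items()):
    delta = loss - best_loss
    marker = "  <- lowest" if epoch == best_epoch else ""
    print(f"epoch {epoch:2d}: {loss:5.1f} (+{delta:.1f}){marker}")
```

Among the legible entries the minimum falls at epoch 5, which matches the last "Saving best model" message. Later epochs show a rising validation loss while the training loss keeps falling, a pattern consistent with overfitting. Since `evaluate_model` is called with `save_best_model.model_save_path`, both COCO evaluations appear to score that same epoch-5 checkpoint, which would explain why their metrics are identical.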

In [1]:
%load_ext tensorboard
%tensorboard --logdir ../results/faster_rcnn_results