# Model 2: Faster R-CNN ([Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497))

## Imports and Setup


Installing libraries:
 - [fiftyone](https://docs.voxel51.com/api/fiftyone.html)
 - [torch](https://pytorch.org/docs/stable/index.html)
 - [torchvision](https://pytorch.org/vision/stable/index.html)

In [1]:
!pip install fiftyone torch torchvision

Collecting fiftyone
  Downloading fiftyone-0.23.2-py3-none-any.whl (7.9 MB)
Collecting aiofiles (from fiftyone)
  Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)
Collecting argcomplete (from fiftyone)
  Downloading argcomplete-3.2.1-py3-none-any.whl (42 kB)
Collecting boto3 (from fiftyone)
  Downloading boto3-1.34.11-py3-none-any.whl (139 kB)
Collecting dacite<1.8.0,>=1.6.0 (from fiftyone)
  Downloading dacite-1.7.0-py3-none-any.whl (12 kB)
Collecting Deprecated (from fiftyone)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting ftfy (from fiftyone)
  Downloading ftfy-6.1.3-py3-none-any.whl (53 kB)

In [68]:
import torch
import torchvision
from torchvision.transforms import functional as func

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

from PIL import Image
from IPython.display import display

import matplotlib.pyplot as plt

import gc

## Downloading Datasets

1. Downloading the [COCO 2017 validation images](https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html#coco-2017)

In [3]:
coco_val_data = foz.load_zoo_dataset("coco-2017", split="validation")
coco_val_data.persistent = True
coco_classes = coco_val_data.default_classes

Downloading split 'validation' to '/root/fiftyone/coco-2017/validation' if necessary
Downloading annotations to '/root/fiftyone/coco-2017/tmp-download/annotations_trainval2017.zip'
 100% |██████|    1.9Gb/1.9Gb [5.6s elapsed, 0s remaining, 371.3Mb/s]
Extracting annotations to '/root/fiftyone/coco-2017/raw/instances_val2017.json'
Downloading images to '/root/fiftyone/coco-2017/tmp-download/val2017.zip'
 100% |██████|    6.1Gb/6.1Gb [17.1s elapsed, 0s remaining, 380.3Mb/s]
Extracting images to '/root/fiftyone/coco-2017/validation/data'
Writing annotations to '/root/fiftyone/coco-2017/validation/labels.json'
Dataset info written to '/root/fiftyone/coco-2017/info.json'
Loading 'coco-2017' split 'validation'
 100% |███████████████| 5000/5000 [30.6s elapsed, 0s remaining, 131.7 samples/s]
Dataset 'coco-2017-validation' created


Getting information about the COCO 2017 validation images:

In [4]:
print(coco_val_data)

Name:        coco-2017-validation
Media type:  image
Num samples: 5000
Persistent:  True
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


2. Downloading the [Pascal VOC 2012 validation images](https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html#dataset-zoo-voc-2007)

In [7]:
pascal_val_data = foz.load_zoo_dataset("voc-2012", split="validation")
pascal_val_data.persistent = True

Downloading split 'validation' to '/root/fiftyone/voc-2012/validation'
Downloading http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar to /root/fiftyone/voc-2012/tmp-download/VOCtrainval_11-May-2012.tar
100%|██████████| 1999639040/1999639040 [01:33<00:00, 21353241.74it/s]
Extracting /root/fiftyone/voc-2012/tmp-download/VOCtrainval_11-May-2012.tar to /root/fiftyone/voc-2012/tmp-download
 100% |███████████████| 5823/5823 [1.2m elapsed, 0s remaining, 90.8 samples/s]
Dataset info written to '/root/fiftyone/voc-2012/info.json'
Loading 'voc-2012' split 'validation'
 100% |███████████████| 5823/5823 [16.4s elapsed, 0s remaining, 351.0 samples/s]
Dataset 'voc-2012-validation' created


A dictionary that maps COCO class names (the model's output labels) to the corresponding Pascal VOC class names:

In [5]:
alt_classes = {
    "airplane" : "aeroplane",
    "motorcycle" : "motorbike",
    "dining table" : "diningtable",
    "potted plant" : "pottedplant",
    "couch" : "sofa",
    "tv" : "tvmonitor",
    "bicycle" : "bicycle"
}
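
The way this dictionary is used later in `add_predictions` can be sketched in isolation: a label is replaced when it has an entry and passed through unchanged otherwise. The helper name `to_voc_label` and the reduced `example_mapping` are hypothetical and exist only for this illustration.

```python
# Hypothetical, reduced copy of the mapping, for illustration only
example_mapping = {
    "airplane": "aeroplane",
    "couch": "sofa",
    "tv": "tvmonitor",
}


def to_voc_label(coco_label, mapping):
    """Return the mapped name if one exists, else the original label."""
    return mapping.get(coco_label, coco_label)


print(to_voc_label("couch", example_mapping))
print(to_voc_label("dog", example_mapping))
```

Labels without an entry, such as `"dog"`, are already spelled the same way in both datasets and therefore need no mapping.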

Getting information about the Pascal VOC 2012 validation images:

In [8]:
print(pascal_val_data)

Name:        voc-2012-validation
Media type:  image
Num samples: 5823
Persistent:  True
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


## Loading Model

In [9]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# `weights="DEFAULT"` replaces the deprecated `pretrained=True` (torchvision >= 0.13)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

model.to(device)

model_name = "faster_rcnn"

model.eval()

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:01<00:00, 160MB/s]


FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(

# COCO Dataset

## Model Inference

In [10]:
def add_predictions(model, model_name, classes, predictions_view, alt_classes):
    """Run `model` on every sample of `predictions_view` and store the
    detections in the sample field named `model_name`."""
    with fo.ProgressBar() as pb:
        for sample in pb(predictions_view):
            # Load the image as RGB so that every tensor has 3 channels
            image = Image.open(sample.filepath).convert("RGB")
            image = func.to_tensor(image).to(device)
            _, h, w = image.shape

            # Perform inference without tracking gradients
            with torch.no_grad():
                preds = model([image])[0]
            labels = preds["labels"].cpu().numpy()
            scores = preds["scores"].cpu().numpy()
            boxes = preds["boxes"].cpu().numpy()

            # Convert detections to FiftyOne format
            detections = []
            for label, score, box in zip(labels, scores, boxes):
                # Convert [x1, y1, x2, y2] in pixels to
                # [top-left-x, top-left-y, width, height]
                # in relative coordinates in [0, 1] x [0, 1]
                x1, y1, x2, y2 = box
                rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]

                # Use the alternative class name when one is defined
                class_name = classes[label]
                detections.append(
                    fo.Detection(
                        label=alt_classes.get(class_name, class_name),
                        bounding_box=rel_box,
                        confidence=score,
                    )
                )

            # Save predictions to dataset
            sample[model_name] = fo.Detections(detections=detections)
            sample.save()
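
The box conversion inside `add_predictions` can be checked on its own. The sketch below repeats the same arithmetic on plain numbers. The function name `to_relative_xywh` and the example image size are assumptions made for this illustration.

```python
def to_relative_xywh(box, width, height):
    """Convert an absolute [x1, y1, x2, y2] box (pixels) into
    [top-left-x, top-left-y, width, height] relative to the image size."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, (x2 - x1) / width, (y2 - y1) / height]


# A 200x100 pixel box whose top-left corner is at (100, 50) in a 400x200 image
print(to_relative_xywh([100, 50, 300, 150], width=400, height=200))
# → [0.25, 0.25, 0.5, 0.5]
```

All four values fall in [0, 1], so the stored detection does not depend on the image resolution.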

In [11]:
coco_predictions_view = coco_val_data.take(500, seed=51)

add_predictions(model, model_name, coco_classes, coco_predictions_view, alt_classes)

 100% |█████████████████| 500/500 [1.3m elapsed, 0s remaining, 6.6 samples/s]


In [12]:
session = fo.launch_app(coco_predictions_view)


Welcome to

███████╗██╗███████╗████████╗██╗   ██╗ ██████╗ ███╗   ██╗███████╗
██╔════╝██║██╔════╝╚══██╔══╝╚██╗ ██╔╝██╔═══██╗████╗  ██║██╔════╝
█████╗  ██║█████╗     ██║    ╚████╔╝ ██║   ██║██╔██╗ ██║█████╗
██╔══╝  ██║██╔══╝     ██║     ╚██╔╝  ██║   ██║██║╚██╗██║██╔══╝
██║     ██║██║        ██║      ██║   ╚██████╔╝██║ ╚████║███████╗
╚═╝     ╚═╝╚═╝        ╚═╝      ╚═╝    ╚═════╝ ╚═╝  ╚═══╝╚══════╝ v0.23.2

If you're finding FiftyOne helpful, here's how you can get involved:

|
|  ⭐⭐⭐ Give the project a star on GitHub ⭐⭐⭐
|  https://github.com/voxel51/fiftyone
|
|  🚀🚀🚀 Join the FiftyOne Slack community 🚀🚀🚀
|  https://slack.voxel51.com
|






In [14]:
session.freeze()  # fixes the session (like a screenshot of the session)

## Evaluation

In [15]:
def compute_mAP(predictions_view, model_name, confidence, iou_threshold):
  high_conf_view = predictions_view.filter_labels(model_name, F("confidence") > confidence, only_matches=False)

  results = high_conf_view.evaluate_detections(
    model_name,
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
    iou=iou_threshold
  )

  return results.mAP()
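
To make the metric more concrete, here is a minimal sketch of average precision for a single class at a single IoU threshold, using all-point interpolation of the precision-recall curve. It is not FiftyOne's implementation: as the "Performing IoU sweep..." log line indicates, the COCO-style evaluation behind `results.mAP()` also averages over several IoU thresholds and over classes. The function name `average_precision` and its input format are assumptions made for this illustration.

```python
def average_precision(scored_matches, num_gt):
    """AP for one class at one IoU threshold.

    scored_matches: list of (confidence, is_true_positive) per prediction
    num_gt: number of ground-truth objects of this class
    """
    if num_gt == 0:
        return None

    # Rank predictions from most to least confident
    ranked = sorted(scored_matches, key=lambda m: m[0], reverse=True)

    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Interpolate: precision at each rank becomes the best precision
    # reachable at that rank or any later one
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Two ground-truth objects; the second-ranked prediction is a false positive
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2))
```

In this example the interpolated curve has precision 1.0 up to recall 0.5 and 2/3 from there to recall 1.0, so the area is 0.5 + 1/3 = 5/6.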

In [16]:
def compute_mean_iou(predictions_view, confidence):
  # Uses the notebook-level `model_name` and the "eval" fields written by compute_mAP
  high_conf_view = predictions_view.filter_labels(model_name, F("confidence") > confidence, only_matches=False)

  count = 0
  iou_sum = 0.0  # named to avoid shadowing the built-in `sum`

  with fo.ProgressBar() as pb:
      for sample in pb(high_conf_view.iter_samples()):
          for detection in sample[model_name].detections:
              # Only true positives are matched to a ground-truth box
              if detection.eval == 'tp':
                  iou_sum += detection.eval_iou
                  count += 1

  if count > 0:
    return iou_sum / count

  return None

In [17]:
coco_mAP = compute_mAP(coco_predictions_view, model_name, 0.2, 0.5)
coco_mean_iou = compute_mean_iou(coco_predictions_view, 0.2)

Evaluating detections...
 100% |█████████████████| 500/500 [47.0s elapsed, 0s remaining, 8.6 samples/s]
Performing IoU sweep...
 100% |█████████████████| 500/500 [15.0s elapsed, 0s remaining, 34.5 samples/s]
 500 [4.2s elapsed, 111.6 samples/s]


In [18]:
print(f'mAP using coco: {coco_mAP}')
print(f'mean IoU using coco: {coco_mean_iou}')

mAP using coco: 0.36677690094653637
mean IoU using coco: 0.8561567575722661


In [19]:
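def filter_detections(predictions_view, confidence, sort_field):
  # Keep predictions above the confidence threshold (samples left without predictions are retained)
  high_conf_view = predictions_view.filter_labels(model_name, F("confidence") > confidence, only_matches=False)

  # Sort samples by a per-sample evaluation count such as "eval_tp", highest first
  return high_conf_view.sort_by(sort_field, reverse=True)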
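
What `filter_detections` does can be pictured with plain Python data: predictions below the threshold are dropped from each sample, every sample is kept even when no predictions remain, and the samples are then ordered by a stored per-sample count. The dictionary layout and the name `filter_and_rank` are assumptions for this sketch, and it does not use FiftyOne.

```python
def filter_and_rank(samples, min_confidence, count_key):
    """Drop low-confidence detections, keep all samples, and sort the
    samples by the stored count `count_key` in descending order."""
    filtered = []
    for sample in samples:
        kept = [d for d in sample["detections"] if d["confidence"] > min_confidence]
        filtered.append({**sample, "detections": kept})
    return sorted(filtered, key=lambda s: s[count_key], reverse=True)


samples = [
    {"id": "a", "eval_tp": 1, "detections": [{"confidence": 0.9}, {"confidence": 0.3}]},
    {"id": "b", "eval_tp": 4, "detections": [{"confidence": 0.4}]},
]

ranked = filter_and_rank(samples, min_confidence=0.5, count_key="eval_tp")
print([s["id"] for s in ranked])
```

Sample `"b"` comes first because of its higher stored count, even though none of its detections pass the 0.5 threshold. The counts are read as stored and are not recomputed after filtering.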

Samples with the most true positives (TP), showing predictions with confidence > 0.5:

In [20]:
session.view = filter_detections(coco_predictions_view, 0.5, "eval_tp")

In [21]:
session.freeze()  # fixes the session (like a screenshot of the session)

Samples with the most false positives (FP), showing predictions with confidence > 0.5:

In [22]:
session.view = filter_detections(coco_predictions_view, 0.5, "eval_fp")

In [23]:
session.freeze()  # fixes the session (like a screenshot of the session)

# Pascal Dataset

## Model Inference

In [24]:
pascal_predictions_view = pascal_val_data.take(500, seed=51)

add_predictions(model, model_name, coco_classes, pascal_predictions_view, alt_classes)

 100% |█████████████████| 500/500 [1.1m elapsed, 0s remaining, 7.8 samples/s]


In [25]:
session = fo.launch_app(pascal_predictions_view)

In [26]:
session.freeze()  # fixes the session (like a screenshot of the session)

## Evaluation

In [27]:
pascal_mAP = compute_mAP(pascal_predictions_view, model_name, 0.2, 0.5)
pascal_mean_iou = compute_mean_iou(pascal_predictions_view, 0.2)

Evaluating detections...
 100% |█████████████████| 500/500 [21.4s elapsed, 0s remaining, 34.6 samples/s]
Performing IoU sweep...
 100% |█████████████████| 500/500 [6.9s elapsed, 0s remaining, 84.4 samples/s]
 500 [3.0s elapsed, 149.1 samples/s]


In [28]:
print(f'mAP using pascal: {pascal_mAP}')
print(f'mean IoU using pascal: {pascal_mean_iou}')

mAP using pascal: 0.5215609444255508
mean IoU using pascal: 0.8152248596074178


Samples with the most true positives (TP), showing predictions with confidence > 0.5:

In [29]:
session.view = filter_detections(pascal_predictions_view, 0.5, "eval_tp")

In [31]:
session.freeze()  # fixes the session (like a screenshot of the session)

Samples with the most false positives (FP), showing predictions with confidence > 0.5:

In [34]:
session.view = filter_detections(pascal_predictions_view, 0.5, "eval_fp")