# Introduction

In this notebook we perform a basic exploratory data analysis (EDA) of the image data and of the features exposed with the help of the Detectron2 library.

In [1]:
# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2
import os
import torch

%matplotlib inline


# import some common detectron2 utilities
import detectron2
from detectron2.utils.logger import setup_logger
from detectron2.engine import DefaultTrainer, DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import MetadataCatalog, DatasetCatalog, build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# enable detectron2's logging output
setup_logger()

First we must register our images and annotation file with Detectron2's dataset catalog.

In [2]:
register_coco_instances("train_lane_cone_detector", {}, "../test/testdata/train/train.json", "../test/testdata/train")
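
For orientation, here is a minimal sketch of the layout a COCO-style annotation file such as `train.json` typically follows. The actual file is not shown in this notebook, so the structure below is an assumption based on the standard COCO fields. The numeric values are rounded from the output printed later in this notebook. The bare `file_name`, the annotation `id`, and the pairing of the category names with IDs 1 and 2 are illustrative guesses. Full COCO files usually carry further fields such as `area`, which are omitted here.

```python
import json

# Illustrative stand-in for the contents of train.json (not the real file).
coco_annotations = {
    "images": [
        # One entry per image; file_name is assumed to be relative to the image root.
        {"id": 1, "file_name": "frame0034.jpg", "height": 360, "width": 1280},
    ],
    "categories": [
        # Category IDs as stored in the file (they need not start at 0).
        {"id": 1, "name": "lane"},
        {"id": 2, "name": "cone"},
    ],
    "annotations": [
        {
            "id": 1,           # hypothetical annotation ID
            "image_id": 1,     # refers to images[...]["id"]
            "category_id": 1,  # refers to categories[...]["id"]
            "bbox": [1.10, 102.21, 395.03, 243.86],  # [x, y, width, height]
            # One polygon as a flat list: x0, y0, x1, y1, ...
            "segmentation": [[1.10, 311.86, 1.10, 346.07, 328.83, 171.72,
                              396.14, 115.45, 387.31, 102.21, 366.34, 105.52,
                              366.34, 123.17]],
            "iscrowd": 0,
        },
    ],
}

# Every annotation should point at an image and a category that exist.
image_ids = {image["id"] for image in coco_annotations["images"]}
category_ids = {category["id"] for category in coco_annotations["categories"]}
for annotation in coco_annotations["annotations"]:
    assert annotation["image_id"] in image_ids
    assert annotation["category_id"] in category_ids

print(json.dumps(coco_annotations["categories"], indent=2))
```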

We can then load the registered metadata and the dataset records (one dictionary per image).

In [3]:
train_dataset_metadata = MetadataCatalog.get("train_lane_cone_detector")
train_dataset_dicts = DatasetCatalog.get("train_lane_cone_detector")

[02/07 19:24:14 d2.data.datasets.coco]: Loaded 2 images in COCO format from ../test/testdata/train/train.json


Let's take a look at what the metadata contains.

In [4]:
train_dataset_metadata

namespace(name='train_lane_cone_detector',
          json_file='../test/testdata/train/train.json',
          image_root='../test/testdata/train',
          evaluator_type='coco',
          thing_classes=['lane', 'cone'],
          thing_dataset_id_to_contiguous_id={1: 0, 2: 1})

As we can see, the metadata contains a good deal of information about our dataset, namely the location of the annotation file and the location of the images. Its `evaluator_type` is `coco`, reflecting that our images are annotated in the COCO format (our .json file is based on the COCO JSON annotation layout). Lastly, it knows the possible labels for our objects (lane and cone) and maps their category IDs in the annotation file (1 and 2) to contiguous zero-based integers (0 and 1) for Detectron2 to use internally.
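
The relationship between the two ID schemes can be sketched in a few lines of plain Python. The values are copied from the metadata output above; `dataset_id_to_name` is a hypothetical helper introduced only for this illustration, not part of Detectron2.

```python
# Values copied from the metadata printed above.
thing_classes = ["lane", "cone"]
thing_dataset_id_to_contiguous_id = {1: 0, 2: 1}


def dataset_id_to_name(dataset_id):
    """Map a category ID from the annotation file to its class name."""
    # The contiguous ID is the index into thing_classes.
    contiguous_id = thing_dataset_id_to_contiguous_id[dataset_id]
    return thing_classes[contiguous_id]


for dataset_id in sorted(thing_dataset_id_to_contiguous_id):
    contiguous_id = thing_dataset_id_to_contiguous_id[dataset_id]
    print(dataset_id, "->", contiguous_id, "->", dataset_id_to_name(dataset_id))
```

This is why the `category_id` values in the dataset records are 0 and 1 rather than the 1 and 2 stored in the annotation file.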

Next, let's take a look at the list of dictionaries returned by the DatasetCatalog.

In [5]:
train_dataset_dicts

[{'file_name': '../test/testdata/train/frame0034.jpg',
  'height': 360,
  'width': 1280,
  'image_id': 1,
  'annotations': [{'iscrowd': 0,
    'bbox': [1.103448275862069,
     102.20689655172414,
     395.03448275862064,
     243.86206896551727],
    'category_id': 0,
    'segmentation': [[1.103448275862069,
      311.8620689655172,
      1.103448275862069,
      346.0689655172414,
      328.82758620689657,
      171.72413793103448,
      396.13793103448273,
      115.44827586206897,
      387.3103448275862,
      102.20689655172414,
      366.3448275862069,
      105.51724137931035,
      366.3448275862069,
      123.17241379310344]],
    'bbox_mode': <BoxMode.XYWH_ABS: 1>},
   {'iscrowd': 0,
    'bbox': [987.5862068965517,
     76.82758620689656,
     290.20689655172407,
     55.172413793103445],
    'category_id': 0,
    'segmentation': [[1276.6896551724137,
      132,
      1277.7931034482758,
      120.9655172413793,
      987.5862068965517,
      76.82758620689656,
      989.7931

First we notice that the records are stored in a list in which each element has the same format. Let's take a look at one element.

In [6]:
train_dataset_dicts[0]

{'file_name': '../test/testdata/train/frame0034.jpg',
 'height': 360,
 'width': 1280,
 'image_id': 1,
 'annotations': [{'iscrowd': 0,
   'bbox': [1.103448275862069,
    102.20689655172414,
    395.03448275862064,
    243.86206896551727],
   'category_id': 0,
   'segmentation': [[1.103448275862069,
     311.8620689655172,
     1.103448275862069,
     346.0689655172414,
     328.82758620689657,
     171.72413793103448,
     396.13793103448273,
     115.44827586206897,
     387.3103448275862,
     102.20689655172414,
     366.3448275862069,
     105.51724137931035,
     366.3448275862069,
     123.17241379310344]],
   'bbox_mode': <BoxMode.XYWH_ABS: 1>},
  {'iscrowd': 0,
   'bbox': [987.5862068965517,
    76.82758620689656,
    290.20689655172407,
    55.172413793103445],
   'category_id': 0,
   'segmentation': [[1276.6896551724137,
     132,
     1277.7931034482758,
     120.9655172413793,
     987.5862068965517,
     76.82758620689656,
     989.7931034482758,
     85.6551724137931]],
  

Each element describes one specific image file in dictionary format. The dictionary contains the image's file name, its height and width, and an image ID. It also contains the details of the annotations, specifically the bounding boxes, the segmentation polygons, and the category ID each of them belongs to.
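
To make the annotation fields more concrete, the sketch below decodes the first annotation shown above using plain Python. `BoxMode.XYWH_ABS` means the box is stored as `[x, y, width, height]` in absolute pixels, and each segmentation polygon is a flat list of alternating x and y coordinates. The helper names `xywh_to_xyxy` and `polygon_points` are hypothetical and exist only for this illustration. Detectron2 itself offers `BoxMode.convert` for box conversions.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, width, height = bbox
    return [x, y, x + width, y + height]


def polygon_points(flat_polygon):
    """Turn a flat [x0, y0, x1, y1, ...] list into a list of (x, y) pairs."""
    return list(zip(flat_polygon[0::2], flat_polygon[1::2]))


# First annotation of train_dataset_dicts[0], copied from the output above.
bbox = [1.103448275862069, 102.20689655172414,
        395.03448275862064, 243.86206896551727]
segmentation = [[1.103448275862069, 311.8620689655172,
                 1.103448275862069, 346.0689655172414,
                 328.82758620689657, 171.72413793103448,
                 396.13793103448273, 115.44827586206897,
                 387.3103448275862, 102.20689655172414,
                 366.3448275862069, 105.51724137931035,
                 366.3448275862069, 123.17241379310344]]

x_min, y_min, x_max, y_max = xywh_to_xyxy(bbox)
points = polygon_points(segmentation[0])

print("box corners:", (x_min, y_min), (x_max, y_max))
print("polygon has", len(points), "vertices")
```

For this annotation the box corners coincide with the extreme x and y values of the polygon, so the bounding box tightly encloses the segmentation.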

Let's plot one of the images now.

![raw_image](../data/report/raw_image.png)

We can see that the image does indeed contain both a cone and a lane.

In [7]:
cv2.imread(train_dataset_dicts[0]["file_name"]).shape

(360, 1280, 3)

The shape confirms that the image is 360 pixels high and 1280 pixels wide. The trailing 3 is the number of color channels, which `cv2.imread` loads in BGR order.
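
The channel order matters when plotting, because matplotlib expects RGB. The following is a minimal sketch of the conversion. It uses a synthetic array as a stand-in for the real frame (an assumption, so that the snippet runs without the image file), and `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of an H x W x 3 array (BGR <-> RGB)."""
    return image[:, :, ::-1]


# Synthetic stand-in with the same shape as the frame above.
image_bgr = np.zeros((360, 1280, 3), dtype=np.uint8)
image_bgr[:, :, 0] = 255  # channel 0 is blue in BGR order

# The shape is ordered (height, width, channels).
height, width, channels = image_bgr.shape
image_rgb = bgr_to_rgb(image_bgr)

print("height:", height, "width:", width, "channels:", channels)
print("first pixel in BGR:", image_bgr[0, 0], "and in RGB:", image_rgb[0, 0])
```

With the real file, the same helper could be applied to the result of `cv2.imread(train_dataset_dicts[0]["file_name"])` before passing it to `plt.imshow`.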

Let's draw the bounding boxes and segmentations on top of this image.

![labelled_image](../data/report/labelled_image.png)

We can see that the image has appropriate bounding boxes and segmentations drawn around the annotated objects as well.
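
The cell that produced this figure is not included here. One plausible way to draw such an overlay, sketched under that assumption, is Detectron2's `Visualizer` with its `draw_dataset_dict` method. `draw_annotations` is a hypothetical wrapper name, and the default `scale` is an arbitrary choice.

```python
def draw_annotations(record, metadata, scale=1.0):
    """Return an RGB array of the image in `record` with its annotations drawn on top."""
    # Imported inside the function so that defining it does not require
    # OpenCV or Detectron2 to be installed.
    import cv2
    from detectron2.utils.visualizer import Visualizer

    image_bgr = cv2.imread(record["file_name"])
    # Visualizer expects RGB input, while cv2.imread returns BGR.
    visualizer = Visualizer(image_bgr[:, :, ::-1], metadata=metadata, scale=scale)
    vis_image = visualizer.draw_dataset_dict(record)
    return vis_image.get_image()
```

In this notebook it could be called as `plt.imshow(draw_annotations(train_dataset_dicts[0], train_dataset_metadata))`.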

The Detectron2 model will be trained on images like these and will then run inference on an unseen set of images, whose results are used to make a video.