<a href="https://colab.research.google.com/github/FabianGermany/AutonomousDrivingDetectron2/blob/main/Detectron2_Personal_Notebook_GoogleDrive_Instance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic information: Detectron2 algorithm used for Audi A2D2 dataset

<img src="https://github.com/FabianGermany/AutonomousDrivingDetectron2/blob/main/output_data/example_output_object_detection_pretrained.jpg?raw=1" width="900">

In this Colab/Jupyter notebook, we will
* choose a pre-trained model available in the Detectron2 framework
* briefly test the default pre-trained Detectron2 model on a single picture for test and demonstration purposes
* briefly test access to the Audi A2D2 dataset and parse the dataset into the format we need later
* train the model on the Audi A2D2 dataset with training images
* test the model trained on Audi A2D2 with test images in order to evaluate it
* run the default and the trained model on an exemplary video


# 1 General preparations



General helpers

In [None]:
#pretty-print a marker whenever a Colab step has finished
def statement_done():
  print("\n")
  print(30 * "*")
  print("This step is done.")
  print(30 * "*")
  
statement_done()



******************************
This step is done.
******************************


Google Drive

In [None]:
import os, sys
from google.colab import drive
drive.mount('/content/gdrive') #or drive.mount('/content/drive')
statement_done()

Mounted at /content/gdrive


******************************
This step is done.
******************************


Some packages, such as OpenCV, are preinstalled on the Colab server by default and only need to be imported. Others need to be installed via pip. Notebooks sometimes need to reconnect, and re-installing takes a lot of time. You can therefore keep the installation on Google Drive using this solution, but it doesn't work well with Detectron2 and torch 1.8.

(Warning: This only works if the notebook is opened from the `content/notebooks` folder, not if it's opened from GitHub!)

So all in all, I recommend setting the boolean `local_install` to `False`.

In [None]:
#local_install = True
local_install = False
statement_done()



******************************
This step is done.
******************************


Also make sure to choose the desired mode. If the script has already run successfully at least once and we only want to perform an evaluation, it's recommended to set `dataset_json_available` and `load_existing_trained_model` to `True`. With these set to `True`, we simply load our previously stored dataset .json files and our trained model, so we don't need to recalculate everything before running the evaluation.
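The two flags follow a simple compute-once-then-cache pattern. Below is a minimal sketch of the same idea as a helper. It assumes a builder function that returns JSON-serialisable data. `load_or_build_json` and `build_fn` are hypothetical names and are not part of this notebook's `functions.py`. Unlike the manual flag, the helper also rebuilds the data when the cache file does not exist yet:

```python
import json
import os


def load_or_build_json(path, build_fn, use_cache=True):
    """Return cached JSON data from `path`, or build it with `build_fn` and store it.

    `build_fn` is any zero-argument callable returning JSON-serialisable data,
    e.g. a list of {'image': ..., 'json': ...} pairs.
    """
    if use_cache and os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = build_fn()  # expensive step, only run when there is no usable cache
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

For example, you could call `load_or_build_json("training_dict.json", build_training_pairs, use_cache=dataset_json_available)`. Here `build_training_pairs` would be a hypothetical wrapper around the dataset-preparation code in section 4.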

In [None]:
#decide whether to re-generate the A2D2 dict data or just load it if it was already calculated before and stored in cloud
dataset_json_available = True
#dataset_json_available = False

In [None]:
#decide whether to use the existing trained model (stored in the output folder) or re-train the model
load_existing_trained_model = True
#load_existing_trained_model = False
statement_done()

In [None]:
#only do this if local installation
if(local_install):

  #create a path to save the modules in Google Drive
  nb_path = '/content/notebooks'
  os.symlink('/content/gdrive/My Drive/Colab Notebooks', nb_path)
  sys.path.insert(0,nb_path)

statement_done()



******************************
This step is done.
******************************


Install packages/dependencies and import them

Some of these packages, or the specific versions we need, are not installed on Colab by default, so we need the *!pip install* commands. In particular, we need PyTorch 1.8. PyTorch 1.9 is already installed on Colab, but it is not compatible with Detectron2. See more [here](https://detectron2.readthedocs.io/en/latest/tutorials/install.html).

The *--target=$nb_path* option installs the packages locally to Google Drive.

If *--target=$nb_path* is activated, you only need to run this once.
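The pip commands pin torch 1.8.0 built for CUDA 10.1 (`+cu101`) to match the Detectron2 wheel index (`.../cu101/torch1.8/`). The version check further below only tests the `"1.8"` prefix. The following is a sketch of a slightly stricter check that also compares the CUDA tag. The function names and default arguments are hypothetical and simply mirror the pins used above:

```python
def _leading_int(text):
    """Return the integer formed by the leading digits of `text` (0 if none)."""
    digits = ""
    for ch in text:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def parse_torch_version(version):
    """Split a PyTorch version string such as '1.8.0+cu101'.

    Returns ((major, minor, patch), local_tag); local_tag is '' when the
    string has no '+...' suffix.
    """
    public, _, local = version.partition("+")
    parts = [_leading_int(p) for p in public.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad e.g. '1.8' to (1, 8, 0)
    return tuple(parts), local


def matches_detectron2_wheel(version, torch_major_minor=(1, 8), cuda_tag="cu101"):
    """True if `version` has the torch release and CUDA tag of the wheel index used above."""
    (major, minor, _patch), local = parse_torch_version(version)
    return (major, minor) == torch_major_minor and local == cuda_tag
```

In the notebook you would call it as `matches_detectron2_wheel(torch.__version__)` after importing torch.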


In [None]:
#only do this if local installation
if (local_install):
  !pip install --target=$nb_path pyyaml==5.1
  !pip install --target=$nb_path torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
  !pip install --target=$nb_path detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
  
#server installation
else: #not local_install
  !pip install pyyaml==5.1 #if locally: !pip install --target=$nb_path pyyaml==5.1
  !pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
  !pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

#exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime  
statement_done()

Collecting pyyaml==5.1
  Downloading https://files.pythonhosted.org/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz (274kB)



******************************
This step is done.
******************************


In [None]:
# import PyTorch and check the installation:
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
assert torch.__version__.startswith("1.8")   # if Colab changes its default version, install torch 1.8 manually, because Detectron2 currently needs torch 1.8
statement_done()

1.8.0+cu101 False


******************************
This step is done.
******************************


In [None]:
# import some common pre-installed libraries
import numpy as np
import numpy.linalg as la
import matplotlib.pylab as pt
import os, json, cv2, random
from urllib.request import urlopen # open files via URL
from google.colab.patches import cv2_imshow
import pprint
from IPython.display import YouTubeVideo, HTML, display
from base64 import b64encode
from google.colab import files
from os import listdir
from os.path import isfile, join
import glob
statement_done()



******************************
This step is done.
******************************


In [None]:
#import libraries like Detectron2 and some of its utilities that we manually installed before

import detectron2
from detectron2.utils.logger import setup_logger #logger
from detectron2 import model_zoo #pre-trained models
from detectron2.engine import DefaultPredictor #for testing/inference
from detectron2.config import get_cfg #configuration for training and testing
from detectron2.utils.visualizer import Visualizer, ColorMode #visualize inferences
from detectron2.data import MetadataCatalog, DatasetCatalog #for registration/metadata
from detectron2.structures import BoxMode #format of bounding boxes
from detectron2.engine import DefaultTrainer #training models
from detectron2.evaluation import COCOEvaluator, inference_on_dataset #evaluation
from detectron2.data import build_detection_test_loader #evaluation
from detectron2.modeling import build_model #building models

# setup Detectron2 logger
setup_logger()

statement_done()



******************************
This step is done.
******************************


Import custom functions and functions from A2D2 tutorial:

In [None]:
#import functions from functions.py
!wget https://raw.githubusercontent.com/FabianGermany/AutonomousDrivingDetectron2/main/functions.py -q -O functions.py
!python functions.py
statement_done()
import functions



******************************
This step is done.
******************************


Google Colab Access to Google Drive

In [None]:
# Colab deletes local files regularly and on every restart; to prevent this, use Google Drive (another option is to re-download from GitHub via !wget -q)
drive.mount('/content/gdrive')

print("Default directory is: ")
!pwd #show current directory

os.chdir("/content/gdrive/My Drive/Dev/ColabDetectron2Project") #change directory

print("Current directory is: ")
!pwd #show current directory

print("Files in current directory are:\n")
!ls #list files in current directory

statement_done()

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
Default directory is: 
/content
Current directory is: 
/content/gdrive/My Drive/Dev/ColabDetectron2Project
Files in current directory are:

dataset_1_train
dataset_2_test
detectron2
example-personal-input.jpg
example-personal-output.jpg
exemplary_scene_rural_2_muted_input_local.mp4
exemplary_scene_rural_2_muted_output_default_local.mkv
functions.py
output
__pycache__
testing_dict.json
training_dict.json


******************************
This step is done.
******************************


# 2 Choosing a pre-trained Detectron2 model 

Choose a pre-trained model from the Detectron2 model zoo. For comparison purposes, we can try to run the whole script with different models such as Mask R-CNN, Faster R-CNN or RetinaNet.

In [None]:
#model file
#see https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md

#model_path is pointing to the .yaml file (this is the basic model staying the same after training as well)
#--------------------------------------------------------
#model_path = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml" #COCO Instance Segmentation with Mask R-CNN --> this is the default, but it also includes instance segmentation, which I don't need
model_path = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml" #COCO Object Detection with Faster R-CNN --> I will choose this
#model_path = "COCO-Detection/retinanet_R_50_FPN_3x.yaml" #COCO Object Detection with RetinaNet
#model_path = "COCO-Detection/rpn_R_50_FPN_1x.yaml" #COCO region proposals with RPN (from the "RPN & Fast R-CNN" table); only lr sched = 1x available, not 3x
#model_path = "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml" #COCO Panoptic Segmentation with FPN
#model_path = "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml" #COCO Person Keypoint Detection with R-CNN

#model_path_local is the absolute path to the config file in .yaml format
#--------------------------------------------------------
model_path_local = model_zoo.get_config_file(model_path) #e.g. /usr/local/lib/python3.7/dist-packages/detectron2/model_zoo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml


#model_config_path and model_config_path_short point to the weights checkpoint (.pkl, or .pth after our own training). This is the WEIGHTS file, and it will change after training
#--------------------------------------------------------
#see #https://github.com/facebookresearch/detectron2/blob/master/detectron2/model_zoo/model_zoo.py
model_config_path = model_zoo.get_checkpoint_url(model_path) #e.g. https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl
model_config_path_short = model_config_path[42:] #remove first part of URL; e.g. "COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"

#Besides COCO, there are also some models trained on Cityscapes, Pascal VOC or LVIS
statement_done()



******************************
This step is done.
******************************
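`model_config_path[42:]` works because the model-zoo prefix `https://dl.fbaipublicfiles.com/detectron2/` is exactly 42 characters long. If that prefix ever changes, the fixed slice would fail silently. Below is a sketch of a more explicit alternative. `short_checkpoint_path` is a hypothetical helper, and the prefix is taken from the example URL in the cell above:

```python
DETECTRON2_URL_PREFIX = "https://dl.fbaipublicfiles.com/detectron2/"


def short_checkpoint_path(url, prefix=DETECTRON2_URL_PREFIX):
    """Return the part of a model-zoo checkpoint URL after `prefix`.

    Unlike a fixed slice such as url[42:], this raises an error if the URL
    does not start with the expected prefix.
    """
    if not url.startswith(prefix):
        raise ValueError(f"unexpected checkpoint URL: {url}")
    return url[len(prefix):]
```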


Then we create a Detectron2 config and a Detectron2 `DefaultPredictor`. This will be our pre-trained predictor, which we won't change. For comparison purposes, we will later create another config called `cfg2`. It also starts from the pre-trained model, but we will then train it and compare it to `cfg`.

In [None]:
cfg = get_cfg()
cfg.merge_from_file(model_path_local) #use the pre-trained model
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
cfg.MODEL.WEIGHTS = model_config_path
pretrained_predictor = DefaultPredictor(cfg)
statement_done()
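`cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5` means that, at inference time, only detections with a confidence score above 0.5 are returned. Below is a pure-Python illustration of that filtering step. It is not Detectron2's actual code, and `filter_by_score` and the tuple layout are hypothetical:

```python
def filter_by_score(detections, score_thresh=0.5):
    """Keep detections whose confidence is strictly above score_thresh.

    Each detection is a (class_id, score, (x0, y0, x1, y1)) tuple; this only
    illustrates the effect of SCORE_THRESH_TEST.
    """
    return [det for det in detections if det[1] > score_thresh]


detections = [
    (2, 0.91, (10, 20, 110, 220)),  # kept
    (0, 0.50, (0, 0, 5, 5)),        # dropped: not above 0.5
    (7, 0.32, (1, 1, 2, 2)),        # dropped
]
print(filter_by_score(detections))
```

Raising the threshold gives fewer but more confident boxes. Lowering it recovers more objects at the cost of more false positives.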

# 3 Running the pre-trained Detectron2 model on an exemplary image
We do this for test and demonstration purposes before we start training and testing on a large amount of data.

Let's download and show an interesting image:

In [None]:
# load and store personal example image
#im = cv2.imread("./example-input-local.jpg") #this won't work because Colab regularly deletes locally stored files
!wget https://raw.githubusercontent.com/FabianGermany/AutonomousDrivingDetectron2/main/input_data/example_input.jpg -q -O example-personal-input.jpg
im = cv2.imread("./example-personal-input.jpg")

#make image a bit smaller for display into notebook
imS = functions.resize_img(25, im)
cv2_imshow(imS)
statement_done()

Let's use the `DefaultPredictor` to run inference (testing) on this image.

In [None]:
outputs = pretrained_predictor(im) #run the model on the image
statement_done()

In [None]:
# look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
print(outputs["instances"].pred_classes) #prints the classes as int
print(outputs["instances"].pred_boxes) #prints the bounding box values
statement_done()
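`pred_classes` is a tensor of integer class ids. These ids index into the `thing_classes` list of the dataset metadata, here COCO via `MetadataCatalog.get(cfg.DATASETS.TRAIN[0])`. The sketch below hard-codes only the first ten COCO names for illustration. `class_names` is a hypothetical helper. In the notebook you would pass `outputs["instances"].pred_classes.tolist()` and the full `thing_classes` list:

```python
# First ten entries of Detectron2's COCO thing_classes (contiguous ids 0..79);
# the full list comes from MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes.
COCO_FIRST_CLASSES = ["person", "bicycle", "car", "motorcycle", "airplane",
                      "bus", "train", "truck", "boat", "traffic light"]


def class_names(pred_classes, thing_classes):
    """Map integer class ids (e.g. pred_classes.tolist()) to their names."""
    return [thing_classes[int(i)] for i in pred_classes]
```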

Show output as image

In [None]:
# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
output_image = out.get_image()[:, :, ::-1]
output_image_resized = functions.resize_img(20, output_image)
cv2_imshow(output_image_resized)
cv2.imwrite("./example-personal-output.jpg", output_image)
statement_done()

# 4 Checking out and preparing the Audi A2D2 dataset

Before we start training our model on the [Audi A2D2 dataset](https://www.a2d2.audi/a2d2/en.html), we want to quickly make sure that the data is accessible. The data is available on the A2D2 server. Because of its huge size, I downloaded the data to my local storage and picked only a small part of it (3 out of 18 folders). Two of the folders will later form the training dataset and one the testing dataset. Having done this, I uploaded this data to my Google Drive for performance reasons.
<!--The data is already predownloaded on an Amazon AWS S3 bucket to we can run our script on SageMaker later.-->


In [None]:
if (not dataset_json_available):
  link_to_config_file = "https://raw.githubusercontent.com/FabianGermany/AutonomousDrivingDetectron2/main/About_Audi_A2D2/cams_lidars.json" #sensor configuration file 

  #unzip/decompress the training and testing files (in my case I don't run this anymore, because the 15 GB of Google Drive storage is not enough for the zip files and the raw files together,
  # so I unzipped them once and deleted the zip files again)
  !unzip dataset_1_train.zip -d dataset_1_train #unzip Data for bounding boxes for training
  !unzip dataset_2_test.zip -d dataset_2_test #unzip Data for bounding boxes for testing
  
statement_done()


In [None]:
if (not dataset_json_available):
  f = urlopen(link_to_config_file) #open the json file from the web server (urlopen is needed because Python's built-in open() only works with local file paths)
  config = json.load(f)
  #print("This is the whole config file: \n")
  #pprint.pprint(config)
  #for specific data use something like
  #pprint.pprint(config.keys())
  #pprint.pprint(config['lidars'].keys())
  #pprint.pprint(config['lidars']['front_left'])
  #pprint.pprint(config['cameras']['front_left']['view'])
  #etc.

statement_done()

Access exemplary single data (images and bounding box annotation) from dataset:

In [None]:
#Choose an image
file_name = '000000166'
#file_name = '000002351'
#file_name = '000004848' #missing bounding boxes...
#file_name = '000004885' #also super bad annotation
#file_name = '000006752'
statement_done()

In [None]:
#train folder dataset_1_train has the 20181107_132730 and 20181108_091945 folders
#test folder dataset_2_test has 20181016_125231 folder

#show image
print("Current image:")
file_img_original = 'dataset_1_train/20181107_132730/camera/cam_front_center/20181107132730_camera_frontcenter_' + file_name + '.png'
image_original = cv2.imread(file_img_original)
image_original_resized = functions.resize_img(40, image_original)
cv2_imshow(image_original_resized)

statement_done()
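The image and label paths in these cells follow a fixed naming pattern. Below is a sketch of a helper that builds both paths from the scene folder and frame id. `a2d2_frame_paths` is a hypothetical name. The pattern is inferred only from the paths used in this notebook, and the camera-name shortening rule (`cam_front_center` becomes `frontcenter`) has only been checked for the front-center camera:

```python
import posixpath


def a2d2_frame_paths(root, scene, frame_id, camera="cam_front_center"):
    """Build the image and label paths for one A2D2 frame.

    e.g. dataset_1_train/20181107_132730/camera/cam_front_center/
         20181107132730_camera_frontcenter_000000166.png
    """
    stamp = scene.replace("_", "")                           # '20181107_132730' -> '20181107132730'
    cam_short = camera.replace("cam_", "").replace("_", "")  # 'cam_front_center' -> 'frontcenter'
    image = posixpath.join(root, scene, "camera", camera,
                           f"{stamp}_camera_{cam_short}_{frame_id}.png")
    label = posixpath.join(root, scene, "label3D", camera,
                           f"{stamp}_label3D_{cam_short}_{frame_id}.json")
    return image, label
```

For example, `a2d2_frame_paths("dataset_1_train", "20181107_132730", file_name)` returns the same two paths as `file_img_original` above and `file_name_bboxes` below.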

In [None]:
#let's define a function that returns the bounding box and class information of a single json file; we put this into a custom list[dict] format
def get_bounding_boxes_and_classes_from_json(json_path, mute = True):

  if(not mute): print("\nBounding Box Information:")
  boxes = functions.read_bounding_boxes(json_path, mute = mute)
  #points = functions.get_points(boxes[0])

  if(not mute): print("\n\nJSON File Name:")
  if(not mute): print (json_path)

  #if(not mute): print("\n\nBounding Box Information:")
  #if(not mute): pprint.pprint(boxes)

  n_detected_objects = len(boxes)

  if(not mute): print("\n\n2D Bounding Box Coordinates:")

  current_Coord2D_dict =  {'right': '', 'left': '', 'bottom': '', 'top': '', 'class': ''}
  #Coord2D = [{}] * n_detected_objects #init a list of empty dictionaries
  Coord2D = []

  #write values into the dict entries (each bounding box has one list entry)
  for i in range(n_detected_objects):
    current_Coord2D_dict['right'] = boxes[i]['right']
    current_Coord2D_dict['left'] = boxes[i]['left']
    current_Coord2D_dict['bottom'] = boxes[i]['bottom']
    current_Coord2D_dict['top'] = boxes[i]['top']
    current_Coord2D_dict['class'] = boxes[i]['class']
    Coord2D.append(current_Coord2D_dict.copy()) #use copy(): You need to append a copy, otherwise you are just adding references to the same dictionary over and over again:

  return Coord2D, n_detected_objects

statement_done()
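The loop with a shared dict and `.copy()` works. A more compact sketch builds a fresh dict per box with a comprehension, so no copy is needed. Like the function above, it assumes that each box returned by `functions.read_bounding_boxes` has the keys `right`, `left`, `bottom`, `top` and `class`. `boxes_to_coord2d` is a hypothetical name:

```python
BOX_KEYS = ("right", "left", "bottom", "top", "class")


def boxes_to_coord2d(boxes):
    """Keep only the 2D box fields and the class of each box.

    A new dict is created per box, so the entries never share state.
    """
    coord2d = [{key: box[key] for key in BOX_KEYS} for box in boxes]
    return coord2d, len(coord2d)
```

In `get_bounding_boxes_and_classes_from_json`, the loop could then be replaced by `return boxes_to_coord2d(boxes)`.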

In [None]:
#let's run this function on a test json
file_name_bboxes = 'dataset_1_train/20181107_132730/label3D/cam_front_center/20181107132730_label3D_frontcenter_' + file_name + '.json'
result_json, n_objects = get_bounding_boxes_and_classes_from_json(file_name_bboxes, mute = False)
pprint.pprint(result_json)

#print("\n\n3D Coordinates:")
#print(points)

statement_done()

In [None]:
#Double check: Bounding Box on image
print("Image with 2D bounding boxes:")
file_img_original = 'dataset_1_train/20181107_132730/camera/cam_front_center/20181107132730_camera_frontcenter_' + file_name + '.png'
image_original = cv2.imread(file_img_original)


#amount of boxes
n_detected_objects = len(result_json)

#draw bounding boxes
for i in range(n_detected_objects):
  cv2.rectangle(image_original, (int(result_json[i]['top']), int(result_json[i]['left'])), (int(result_json[i]['bottom']), int(result_json[i]['right'])),(36,255,12), 4) #according to the field names this should be (left, top) and (right, bottom), but that doesn't work; the A2D2 labels seem to have the field names mixed up
  cv2.putText(image_original, result_json[i]['class'], (int(result_json[i]['top']), int(result_json[i]['left'])-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36,255,12), 2)
  #cv2.circle(image_original, (100,900), radius=4, color=(36,255,12), thickness=5)

image_original_resized = functions.resize_img(40, image_original)
cv2_imshow(image_original_resized)

#the bounding box fits roughly, but it's actually not very good (reason: maybe the conversion to int?)

statement_done()

The A2D2 annotations are sometimes pretty bad or even missing, as in the files '000004848' and '000004885'.
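Some of the bad boxes can be caught automatically. Below is a sketch of a sanity check for degenerate, inverted or out-of-image boxes. It assumes the empirical ordering observed in the drawing cell, where the values stored under `top`, `left`, `bottom` and `right` act as x0, y0, x1, y1. `to_xyxy`, `is_plausible_box` and the `min_size` threshold are hypothetical. A check like this cannot detect annotations that are missing entirely, such as in '000004848':

```python
def to_xyxy(entry):
    """Return (x0, y0, x1, y1) from one Coord2D entry.

    Assumes the ordering used in this notebook: 'top', 'left', 'bottom',
    'right' act as x0, y0, x1, y1.
    """
    return (float(entry["top"]), float(entry["left"]),
            float(entry["bottom"]), float(entry["right"]))


def is_plausible_box(entry, width, height, min_size=2.0):
    """Reject boxes that are degenerate, inverted or outside the image bounds."""
    x0, y0, x1, y1 = to_xyxy(entry)
    if x1 - x0 < min_size or y1 - y0 < min_size:  # too small or inverted
        return False
    return 0 <= x0 < width and 0 <= y0 < height and 0 < x1 <= width and 0 < y1 <= height
```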

Prepare the dataset

In [None]:
if (dataset_json_available): #already done before, so just load it
  dict_of_training_images_and_json = json.load(open("training_dict.json"))
  dict_of_testing_images_and_json = json.load(open("testing_dict.json"))

else: #not created before, so let's recreate

  #train folder dataset_1_train has the 20181107_132730 and 20181108_091945 folders
  #test folder dataset_2_test has 20181016_125231 folder

  #the training images and json are here:
  path_training_images_1 = 'dataset_1_train/20181107_132730/camera/cam_front_center/' #/20181107132730_camera_frontcenter_...
  path_training_json_1 = 'dataset_1_train/20181107_132730/label3D/cam_front_center/' #/20181107132730_label3D_frontcenter_...
  path_training_images_2 = 'dataset_1_train/20181108_091945/camera/cam_front_center/' #/20181108091945_camera_frontcenter_...
  path_training_json_2 = 'dataset_1_train/20181108_091945/label3D/cam_front_center/'#/20181108091945_label3D_frontcenter_...

  #the testing images and json are here:
  path_testing_images = 'dataset_2_test/20181016_125231/camera/cam_front_center/' #/20181016125231_camera_frontcenter_...
  path_testing_json = 'dataset_2_test/20181016_125231/label3D/cam_front_center/' #/20181016125231_label3D_frontcenter_...

  #list up all elements in those folders
  list_of_training_images_1 = [f for f in listdir(path_training_images_1) if isfile(join(path_training_images_1, f))]
  list_of_training_json_1 = [f for f in listdir(path_training_json_1) if isfile(join(path_training_json_1, f))]
  list_of_training_images_2 = [f for f in listdir(path_training_images_2) if isfile(join(path_training_images_2, f))]
  list_of_training_json_2 = [f for f in listdir(path_training_json_2) if isfile(join(path_training_json_2, f))]
  list_of_testing_images = [f for f in listdir(path_testing_images) if isfile(join(path_testing_images, f))]
  list_of_testing_json = [f for f in listdir(path_testing_json) if isfile(join(path_testing_json, f))]

  #remove json information files from image folders
  list_of_training_images_1 = [x for x in list_of_training_images_1 if x.endswith('png')]
  list_of_training_images_2 = [x for x in list_of_training_images_2 if x.endswith('png')]
  list_of_testing_images = [x for x in list_of_testing_images if x.endswith('png')]

  #add the path to the file into the list to get the full path into the dict later
  list_of_training_images_1_full_path = []
  list_of_training_json_1_full_path = []
  list_of_training_images_2_full_path = []
  list_of_training_json_2_full_path = []
  list_of_testing_images_full_path = []
  list_of_testing_json_full_path = []

  for entry in list_of_training_images_1:
    list_of_training_images_1_full_path.append(path_training_images_1 + str(entry))
  for entry in list_of_training_json_1:
    list_of_training_json_1_full_path.append(path_training_json_1 + str(entry))
  for entry in list_of_training_images_2:
    list_of_training_images_2_full_path.append(path_training_images_2 + str(entry))
  for entry in list_of_training_json_2:
    list_of_training_json_2_full_path.append(path_training_json_2 + str(entry))
  for entry in list_of_testing_images:
    list_of_testing_images_full_path.append(path_testing_images + str(entry))
  for entry in list_of_testing_json:
    list_of_testing_json_full_path.append(path_testing_json + str(entry))

  #merge the two training datasets
  list_of_training_images_full_path = list_of_training_images_1_full_path + list_of_training_images_2_full_path
  list_of_training_json_full_path = list_of_training_json_1_full_path + list_of_training_json_2_full_path

  #sort lists alphabetically (i.e. by folder and file number)
  list_of_training_images_full_path = sorted(list_of_training_images_full_path, key=str.lower)
  list_of_training_json_full_path = sorted(list_of_training_json_full_path, key=str.lower)
  list_of_testing_images_full_path = sorted(list_of_testing_images_full_path, key=str.lower)
  list_of_testing_json_full_path = sorted(list_of_testing_json_full_path, key=str.lower)

  #make pairs of png file and json file
  dict_of_training_images_and_json, dict_of_testing_images_and_json = [], []
  current_training_img_and_json, current_testing_img_and_json = {'image': '', 'json': ''}, {'image': '', 'json': ''}

  for current_training_image, current_training_json in zip(list_of_training_images_full_path, list_of_training_json_full_path):
    current_training_img_and_json['image'] = current_training_image
    current_training_img_and_json['json'] = current_training_json
    dict_of_training_images_and_json.append(current_training_img_and_json.copy())

  for current_testing_image, current_testing_json in zip(list_of_testing_images_full_path, list_of_testing_json_full_path):
    current_testing_img_and_json['image'] = current_testing_image
    current_testing_img_and_json['json'] = current_testing_json
    dict_of_testing_images_and_json.append(current_testing_img_and_json.copy())

  #store dict_of_training_images_and_json and dict_of_testing_images_and_json (the with-blocks make sure the files are flushed and closed)
  with open("training_dict.json", 'w') as f:
    json.dump(dict_of_training_images_and_json, f)
  with open("testing_dict.json", 'w') as f:
    json.dump(dict_of_testing_images_and_json, f)


#print the dict
print('Paths to training data files: ' + str(dict_of_training_images_and_json))
print('Paths to testing data files: ' + str(dict_of_testing_images_and_json))
#pprint.pprint(dict_of_training_images_and_json)

#amount of training and testing images
n_training_elements = len(dict_of_training_images_and_json)
n_testing_elements = len(dict_of_testing_images_and_json)
print("Amount of training images: " + str(n_training_elements))
print("Amount of testing images: " + str(n_testing_elements))

statement_done()
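Pairing the sorted image and JSON lists with `zip` only works if every image has exactly one label file. A single missing or extra file shifts all following pairs. Below is a sketch that pairs files by (timestamp, frame id) parsed from the file name instead and reports images without a label. `frame_key` and `pair_images_and_labels` are hypothetical names. The file-name pattern is inferred from the paths in this notebook:

```python
import posixpath


def frame_key(path):
    """Return (timestamp, frame_id) from an A2D2 file name.

    e.g. '.../20181107132730_camera_frontcenter_000000166.png'
      -> ('20181107132730', '000000166')
    """
    stem = posixpath.splitext(posixpath.basename(path))[0]
    parts = stem.split("_")
    return parts[0], parts[-1]


def pair_images_and_labels(image_paths, json_paths):
    """Pair images and label files by (timestamp, frame_id) instead of list position.

    Returns the list of {'image', 'json'} dicts plus the images without a label.
    """
    labels = {frame_key(p): p for p in json_paths}
    pairs, unmatched = [], []
    for image in sorted(image_paths):
        label = labels.get(frame_key(image))
        if label is None:
            unmatched.append(image)
        else:
            pairs.append({"image": image, "json": label})
    return pairs, unmatched
```

The resulting `pairs` has the same structure as `dict_of_training_images_and_json`, so it could be stored with `json.dump` in the same way.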

In [None]:
#We need a function converting the class name into a number (Detectron2 needs that format)
#According to the Detectron2 documentation: category_id (int, required): an integer in the range [0, num_categories-1] representing the category label. The value num_categories is reserved to represent the “background” category, if applicable.
#the A2D2 docs only list the segmentation classes, not the bounding box classes; the website just says there are 14
#so I searched manually and found 12 classes in my 3 sub-datasets (either I missed something or the sub-datasets don't include the remaining 2 categories)
#this doesn't matter much: any other class name receives an extra number meaning "Misc"

def conv_num(class_name_as_string):
  if(class_name_as_string == 'Animal'):
    return 0
  elif(class_name_as_string == 'Bicycle'):
    return 1
  elif(class_name_as_string == 'Bus'):
    return 2
  elif(class_name_as_string == 'Car'):
    return 3
  elif(class_name_as_string == 'Cyclist'):
    return 4
  elif(class_name_as_string == 'EmergencyVehicle'):
    return 5
  elif(class_name_as_string == 'MotorBiker'):
    return 6
  elif(class_name_as_string == 'Motorcycle'):
    return 7
  elif(class_name_as_string == 'Pedestrian'):
    return 8
  elif(class_name_as_string == 'Truck'):
    return 9
  elif(class_name_as_string == 'UtilityVehicle'):
    return 10
  elif(class_name_as_string == 'VanSUV'):
    return 11
  else: #maybe I missed classes 13/14, but it doesn't matter because they don't seem to be relevant
    return 12


#this means we have 0...12 = 13 classes (including the Misc class)
#so, following the rule above, num_categories = 13 is the value reserved for the background category in Detectron2

bb_classes = ['Animal', 'Bicycle', 'Bus', 'Car', 'Cyclist', 
              'EmergencyVehicle', 'MotorBiker', 'Motorcycle', 
              'Pedestrian', 'Truck', 'UtilityVehicle', 'VanSUV', 'Misc']

n_classes = len(bb_classes)

statement_done()
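`conv_num` and `bb_classes` encode the same mapping twice. Below is a sketch of a dictionary-based equivalent that derives the ids from the class list, so the two cannot drift apart. `A2D2_CLASSES`, `CLASS_TO_ID` and `conv_num_lookup` are hypothetical names. The list is copied from `bb_classes`:

```python
A2D2_CLASSES = ['Animal', 'Bicycle', 'Bus', 'Car', 'Cyclist',
                'EmergencyVehicle', 'MotorBiker', 'Motorcycle',
                'Pedestrian', 'Truck', 'UtilityVehicle', 'VanSUV', 'Misc']
CLASS_TO_ID = {name: idx for idx, name in enumerate(A2D2_CLASSES)}
MISC_ID = CLASS_TO_ID['Misc']  # 12


def conv_num_lookup(class_name):
    """Dictionary-based equivalent of conv_num: unknown names map to 'Misc'."""
    return CLASS_TO_ID.get(class_name, MISC_ID)
```

Adding a class then only requires editing the list (before `'Misc'`), and every id follows automatically.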

Here, the dataset is in a custom format, so we write a function that parses it and converts it into Detectron2's standard format. You need to write such a function whenever you use a dataset in a custom format. See the [Detectron2 custom dataset tutorial](https://detectron2.readthedocs.io/tutorials/datasets.html) for more details. We also need to register our dataset with Detectron2.

In [None]:
#Detectron2 needs a function returning the dataset in list[dict] format; more information: https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html
#the input is also a list[dict], but in another format, so this function converts our list[dict] into Detectron2's standard format (similar to the COCO format)
def Audi_Dataset_Function(data_in_custom_format, demonstration_mode = False):

  #ABOUT
  #----------------------------
  #the format needs to be list[dict] according to Detectron2's standard dataset format; each dict represents one image
  #we need the fields: file_name, height, width, image_id and annotations
  
  '''
  file_name #full path to image (str)
  height, width #shape of image (int)
  image_id #unique ID (str or int)
  annotations #available annotations(list[dict])
    bbox #4 numbers for bounding box (list[float])
    bbox_mode #format of the bbox(int), currently supports: BoxMode.XYXY_ABS, BoxMode.XYWH_ABS
    category_id #an integer in the range [0, num_categories-1] representing the category label. The value num_categories is reserved to represent the “background” category, if applicable.(int)
    segmentation, keypoints, iscrowd #this is not needed for our purposes
  '''

  #for the demonstration mode only use a part of data
  if (demonstration_mode == False): #normal mode
    n_iterations = len(data_in_custom_format) #use all elements
  else: #demonstration mode
    n_iterations = 10 #use only a part of the data
    print("DEMONSTRATION MODE")
    #alternative for demonstration mode: for d in random.sample(range(n_iterations), 3):


  #(1) first we create an empty list of dict
  dataset = []

  #(2) now for each iteration (scene/image) we create the required dict and grab the current image and json file and use its data
  
  for i in range(n_iterations):
    print("Iteration " + str(i+1) + " / " + str(n_iterations))

    #(2.1) We create an empty dict
    current_dict = {}

    #(2.2) now for each image/json we extract the annotation data and convert to desired format
    path_to_image_file = data_in_custom_format[i]['image'] #image file name
    path_to_json_file = data_in_custom_format[i]['json'] #json file name
    BBox_Output, n_available_bounding_boxes = get_bounding_boxes_and_classes_from_json(path_to_json_file) #json data
    height, width = cv2.imread(path_to_image_file).shape[:2] #height and width

    current_dict["file_name"] = path_to_image_file
    current_dict["height"] = height
    current_dict["width"] = width
    current_dict["image_id"] = i #or we could also use the filename etc.

    
    #(2.3) first we create an empty list of dict for the annotations
    annotations = []

    #(2.4) now for each available bounding box we extract the data

    for j in range(n_available_bounding_boxes):

      #(2.4.1) first we extract the data
      BBox_Top = BBox_Output[j]['top']
      BBox_Bottom = BBox_Output[j]['bottom']
      BBox_Left = BBox_Output[j]['left']
      BBox_Right = BBox_Output[j]['right']
      BBox_class = BBox_Output[j]['class']
      BBox_class_numeric = conv_num(BBox_class)

      current_annotation = {
            "bbox": [BBox_Top, BBox_Left, BBox_Bottom, BBox_Right], #TODO: order changed on purpose (see bbox_mode)
            "bbox_mode": BoxMode.XYXY_ABS, #XYXY_ABS means (x0, y0, x1, y1) in absolute floating-point coordinates, normally left, top, right, bottom; because of the A2D2 field mix-up, here this corresponds to top, left, bottom, right
            "category_id": BBox_class_numeric, #class as int type
      }

      #(2.4.2) then we append this annotation dict to the annotation list
      annotations.append(current_annotation.copy())

    #(2.5) then we add the annotation list to the current dict
    current_dict["annotations"] = annotations

    #(2.6) then we append this dict to the list
    dataset.append(current_dict.copy())

  #(3) we finish and return the whole dataset
  return dataset


statement_done()
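The per-image conversion inside `Audi_Dataset_Function` can also be written as a pure function that doesn't touch the disk, which makes it easy to check on hand-made data. The sketch below assumes that height and width are passed in rather than read with `cv2.imread`. `make_record` is a hypothetical name. The constant `XYXY_ABS = 0` stands in for `BoxMode.XYXY_ABS` (an `IntEnum` whose value is 0), so the snippet runs without Detectron2. In the notebook you would use `BoxMode.XYXY_ABS` and pass `conv_num` as `class_to_id`:

```python
XYXY_ABS = 0  # stand-in for detectron2.structures.BoxMode.XYXY_ABS


def make_record(image_id, file_name, height, width, coord2d, class_to_id):
    """Build one Detectron2-style dataset dict from Coord2D entries.

    Box values are taken in the (top, left, bottom, right) order used above,
    which this notebook treats as (x0, y0, x1, y1).
    """
    annotations = [
        {
            "bbox": [float(b["top"]), float(b["left"]), float(b["bottom"]), float(b["right"])],
            "bbox_mode": XYXY_ABS,
            "category_id": class_to_id(b["class"]),
        }
        for b in coord2d
    ]
    return {"file_name": file_name, "height": height, "width": width,
            "image_id": image_id, "annotations": annotations}
```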

In [None]:
#test this function before running it on all data
output_dict_demonstration = Audi_Dataset_Function(dict_of_training_images_and_json, demonstration_mode = True)
pprint.pprint(output_dict_demonstration)

statement_done()

In [None]:
# register our training dataset to Detectron2
# register() expects a callable without arguments that returns list[dict]; the lambda defers parsing until Detectron2 needs the data
DatasetCatalog.register("Custom_Audi_A2D2_Dataset_Training", lambda: Audi_Dataset_Function(dict_of_training_images_and_json, demonstration_mode = False)) #register the whole training dataset
MetadataCatalog.get("Custom_Audi_A2D2_Dataset_Training").set(thing_classes = bb_classes) #declare class names

Audi_Metadata = MetadataCatalog.get("Custom_Audi_A2D2_Dataset_Training") #metadata (class names) of the training dataset; also reused for visualizing test predictions
#access the data via: List[Dict] = DatasetCatalog.get("Custom_Audi_A2D2_Dataset_Training")
statement_done()
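`DatasetCatalog.register` accepts any callable that takes no arguments and returns a `list[dict]`; a lambda is just one way to build it, and `functools.partial` is an equivalent alternative. The sketch below illustrates the deferred loading with a hypothetical stand-in registry (`toy_registry`, `register`, `load_records` are illustrative names, not Detectron2 itself); like Detectron2, it refuses to register the same name twice:

```python
from functools import partial
from typing import Callable, Dict, List

# Hypothetical stand-in for DatasetCatalog: maps a dataset name to a zero-argument loader.
toy_registry: Dict[str, Callable[[], List[dict]]] = {}


def register(name: str, loader: Callable[[], List[dict]]) -> None:
    if name in toy_registry:
        raise AssertionError(f"Dataset '{name}' is already registered!")
    toy_registry[name] = loader


calls: List[str] = []  # records when loading actually happens


def load_records(split: str, demonstration_mode: bool = False) -> List[dict]:
    calls.append(split)
    return [{"file_name": f"{split}_0.png", "image_id": 0}]


# Both forms register a zero-argument callable; nothing is loaded at this point.
register("train", lambda: load_records("train", demonstration_mode=False))
register("test", partial(load_records, "test", demonstration_mode=False))
```

Calling `toy_registry["test"]()` is what triggers the loading, just as `DatasetCatalog.get(...)` does. Because duplicate names are rejected, re-running the registration cell in the same Colab session fails; in recent Detectron2 versions `DatasetCatalog.remove(name)` can be called first.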

To verify that the data is loaded correctly, let's visualize the images and their annotations for a few selected samples of the training set:


In [None]:
output_dict_demonstration = Audi_Dataset_Function(dict_of_training_images_and_json, demonstration_mode = True)
for d in output_dict_demonstration:
    print("Image " + str(d["file_name"] + ":"))
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=Audi_Metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    output_img = out.get_image()[:, :, ::-1]
    output_img_resized = functions.resize_img(50, output_img)
    cv2_imshow(output_img_resized)
statement_done()
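Visual inspection only covers a handful of samples. A complementary programmatic check, assuming the XYXY_ABS convention ([left, top, right, bottom] in pixels), can flag boxes whose coordinates are swapped, that fall outside the image, or that carry an out-of-range category id. `check_dataset_dicts` is a hypothetical helper, not part of Detectron2:

```python
from typing import Dict, List


def check_dataset_dicts(records: List[Dict], num_classes: int) -> List[str]:
    """Return human-readable problems found in Detectron2-style dataset dicts.

    Assumes every bbox is [x0, y0, x1, y1] = [left, top, right, bottom]
    in absolute pixel coordinates (BoxMode.XYXY_ABS).
    """
    problems = []
    for rec in records:
        w, h = rec["width"], rec["height"]
        for k, ann in enumerate(rec.get("annotations", [])):
            x0, y0, x1, y1 = ann["bbox"]
            where = f"{rec['file_name']} (annotation {k})"
            # x0 < x1 and y0 < y1 must hold for a valid XYXY box
            if not (x0 < x1 and y0 < y1):
                problems.append(f"{where}: degenerate or swapped box {ann['bbox']}")
            # the box must lie inside the image
            if x0 < 0 or y0 < 0 or x1 > w or y1 > h:
                problems.append(f"{where}: box {ann['bbox']} outside {w}x{h} image")
            # category_id must index into thing_classes
            if not 0 <= ann["category_id"] < num_classes:
                problems.append(f"{where}: category_id {ann['category_id']} out of range")
    return problems
```

Usage (hypothetical): `for p in check_dataset_dicts(output_dict_demonstration, n_classes): print(p)`. An empty result does not prove the box order is right, but many swapped boxes on non-square images would show up as out-of-image boxes.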

# 5 Training Detectron2 model on Audi A2D2 dataset

The previous steps were successful, so we can now train a Detectron2 model on our custom Audi A2D2 dataset.
This dataset has 14 classes for bounding boxes.
We'll fine-tune a bounding box model that was pre-trained on the COCO dataset and is available in Detectron2's model zoo.

Note that the COCO dataset does not have categories such as "EmergencyVehicle". Since the classification head is retrained for our classes, the fine-tuned model can nevertheless learn to recognize these new classes.

We have a COCO-pretrained R50-FPN Faster R-CNN model that we will fine-tune now on the A2D2 dataset.

In [None]:
if (not load_existing_trained_model): #learn again
  cfg2 = get_cfg()
  cfg2.merge_from_file(model_path_local) #use the COCO-pretrained model from model zoo
  cfg2.DATASETS.TRAIN = ("Custom_Audi_A2D2_Dataset_Training",) #use the A2D2 dataset to train
  cfg2.DATASETS.TEST = () #no test for now
  cfg2.DATALOADER.NUM_WORKERS = 2
  cfg2.MODEL.WEIGHTS = model_config_path  #initialize with COCO-pretrained model from model zoo 
  cfg2.SOLVER.IMS_PER_BATCH = 2
  cfg2.SOLVER.BASE_LR = 0.00025  # pick a good LR
  cfg2.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough, but training longer might be better
  cfg2.SOLVER.STEPS = []        # do not decay learning rate
  cfg2.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this dataset (default: 512)
  cfg2.MODEL.ROI_HEADS.NUM_CLASSES = n_classes  # number of classes (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)

  os.makedirs(cfg2.OUTPUT_DIR, exist_ok=True) #save trained model here
  trainer = DefaultTrainer(cfg2) 
  trainer.resume_or_load(resume=False) #resume=False: load weights from cfg2.MODEL.WEIGHTS and start at iteration 0; resume=True: continue from the last checkpoint in cfg2.OUTPUT_DIR (if one exists)
  trainer.train()

else: #load_existing_trained_model #use existing model to skip some steps and save time

  cfg2 = get_cfg()
  cfg2.merge_from_file(model_path_local)
  #cfg2.DATASETS.TRAIN = ("Custom_Audi_A2D2_Dataset_Training",) #use the A2D2 dataset to train
  #cfg2.DATASETS.TRAIN = ()
  cfg2.DATASETS.TEST = () #no test for now
  cfg2.DATALOADER.NUM_WORKERS = 2
  #cfg2.MODEL.WEIGHTS = model_config_path  #initialize with COCO-pretrained model from model zoo 
  cfg2.MODEL.WEIGHTS = os.path.join(cfg2.OUTPUT_DIR, "model_final.pth")  # path to the model we trained before
  cfg2.SOLVER.IMS_PER_BATCH = 2
  cfg2.SOLVER.BASE_LR = 0.00025  # pick a good LR
  cfg2.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough, but training longer might be better
  cfg2.SOLVER.STEPS = []        # do not decay learning rate
  cfg2.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this dataset (default: 512)
  cfg2.MODEL.ROI_HEADS.NUM_CLASSES = n_classes  # number of classes (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)


  #trainer = DefaultTrainer(cfg2) #this is also needed for evaluation; TODO: it only runs if cfg2.DATASETS.TRAIN is set... TODO: this step is super slow, can I skip it?
  #trainer.resume_or_load(resume=True) #TODO: set to True if only evaluating

statement_done()
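`SOLVER.MAX_ITER` counts iterations, not epochs: each iteration processes `SOLVER.IMS_PER_BATCH` images, so the number of passes over the training set is MAX_ITER × IMS_PER_BATCH / (number of training images). Two small hypothetical helpers (not part of Detectron2) make this explicit:

```python
import math


def epochs_covered(max_iter: int, ims_per_batch: int, num_train_images: int) -> float:
    """Approximate number of passes over the training set."""
    if num_train_images <= 0:
        raise ValueError("num_train_images must be positive")
    return max_iter * ims_per_batch / num_train_images


def iters_for_epochs(epochs: float, ims_per_batch: int, num_train_images: int) -> int:
    """MAX_ITER needed to see the training set `epochs` times (rounded up)."""
    return math.ceil(epochs * num_train_images / ims_per_batch)
```

With MAX_ITER = 300 and IMS_PER_BATCH = 2 the model sees 600 images in total, so on any training split larger than that it does not complete a single epoch. `len(DatasetCatalog.get("Custom_Audi_A2D2_Dataset_Training"))` gives the actual number of training images.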

In [None]:
# Look at training curves in tensorboard:
if (not load_existing_trained_model): 
  %load_ext tensorboard
  %tensorboard --logdir output
statement_done()

# 6 Testing trained model (Inference & evaluation using the trained model)

After the model has been trained, we evaluate it by running inference with the trained model on the A2D2 test data.

First, we create a predictor using the model we just trained:

In [None]:
# Inference should use the config with parameters that are used in training
# cfg2 already exists and already contains everything we've set previously. We changed it a little bit for inference:
cfg2.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3   # set a custom testing threshold (previously 0.7)
cfg2.MODEL.WEIGHTS = os.path.join(cfg2.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
#cfg2.DATASETS.TEST = ("fruits_nuts", ) #todo this comes from above...
finetuned_predictor = DefaultPredictor(cfg2)
statement_done()
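`SCORE_THRESH_TEST` controls which detections the predictor returns: boxes whose confidence is not above the threshold are discarded (Detectron2 additionally applies per-class non-maximum suppression and a per-image detection limit, which this sketch ignores). Lowering it from 0.7 to 0.3 therefore yields more, but less certain, boxes. A toy illustration on hypothetical `(class_name, score)` pairs:

```python
from typing import List, Tuple

Detection = Tuple[str, float]  # hypothetical (class name, confidence score) pair


def apply_score_threshold(dets: List[Detection], thresh: float) -> List[Detection]:
    """Keep only detections whose score is strictly greater than thresh."""
    return [d for d in dets if d[1] > thresh]
```

For example, with detections scored 0.92, 0.55, 0.35 and 0.12, a threshold of 0.7 keeps one box while 0.3 keeps three.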

Then, we randomly select several samples to visualize the prediction results.

In [None]:
output_dict_testing_selection = Audi_Dataset_Function(dict_of_testing_images_and_json, demonstration_mode = True)
for d in output_dict_testing_selection:
    im = cv2.imread(d["file_name"])
    outputs = finetuned_predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1], metadata=Audi_Metadata, scale=0.5)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    output_img = out.get_image()[:, :, ::-1]
    output_img_resized = functions.resize_img(70, output_img)
    cv2_imshow(output_img_resized)
statement_done()

Let's register our testing dataset now:

In [None]:
# register the testing dataset here
DatasetCatalog.register("Custom_Audi_A2D2_Dataset_Testing", lambda: Audi_Dataset_Function(dict_of_testing_images_and_json, demonstration_mode = False)) #register the whole testing dataset
MetadataCatalog.get("Custom_Audi_A2D2_Dataset_Testing").set(thing_classes = bb_classes) #declare class names
statement_done()

We evaluate its performance using the AP metric implemented in the COCO API.
Detectron2's `COCOEvaluator` reports AP as a percentage from 0 to 100 (instead of 0 to 1).

In [None]:
evaluator = COCOEvaluator("Custom_Audi_A2D2_Dataset_Testing", ("bbox", ), False, output_dir="./output/") #tasks must be a tuple, e.g. ("bbox", "segm"); for a single task write ("bbox",) with a trailing comma, otherwise it is just a string
val_loader = build_detection_test_loader(cfg2, "Custom_Audi_A2D2_Dataset_Testing")
statement_done()
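The comment about the trailing comma refers to a general Python rule: parentheses alone do not create a tuple, the comma does. A quick check:

```python
tasks_wrong = ("bbox")   # just a parenthesized string
tasks_right = ("bbox",)  # a one-element tuple

print(type(tasks_wrong).__name__, list(tasks_wrong))  # str ['b', 'b', 'o', 'x']
print(type(tasks_right).__name__, list(tasks_right))  # tuple ['bbox']
```

Any code that iterates over the tasks would see single characters instead of task names when given the plain string.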

In [None]:
if (load_existing_trained_model):
  from detectron2.checkpoint import DetectionCheckpointer
  # inference_on_dataset needs a model, which is usually trainer.model;
  # but if we don't train before, we don't have a trainer, so we build the model ourselves.
  # build_model() only creates the architecture with random weights, so the trained weights must be loaded explicitly:
  model = build_model(cfg2)  # returns a torch.nn.Module
  DetectionCheckpointer(model).load(cfg2.MODEL.WEIGHTS)  # load model_final.pth (set above)
  #https://detectron2.readthedocs.io/en/latest/_modules/detectron2/evaluation/evaluator.html#inference_on_dataset
  #If model is an nn.Module, it will be temporarily set to `eval` mode.
  #If you wish to evaluate a model in `training` mode instead, you can wrap the given model and override its behavior of `.eval()` and `.train()`.
  print(inference_on_dataset(model, val_loader, evaluator))

else:
  print(inference_on_dataset(trainer.model, val_loader, evaluator))
  # another equivalent way to evaluate the model is to use `trainer.test`

statement_done()

The metrics are printed in the output above. The inference speed is reported next to `Total inference pure compute time` as seconds per image; the frame rate in FPS is the reciprocal of that value.
For Faster R-CNN we get `Total inference pure compute time: 0:01:51 (0.126056 s / img per device, on 1 devices)`, which corresponds to about 7.9 FPS.
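A small helper that extracts the per-image time from that log line and converts it to frames per second; the regular expression assumes the exact wording shown above, and `fps_from_log` is a hypothetical name:

```python
import re


def fps_from_log(line: str) -> float:
    """Convert Detectron2's 'x s / img per device' timing into frames per second."""
    match = re.search(r"\(([\d.]+) s / img per device", line)
    if match is None:
        raise ValueError("no 's / img per device' value found")
    return 1.0 / float(match.group(1))


log = "Total inference pure compute time: 0:01:51 (0.126056 s / img per device, on 1 devices)"
print(f"{fps_from_log(log):.1f} FPS")  # 7.9 FPS
```

"Pure compute time" excludes data loading, so the end-to-end frame rate on a video can be lower.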

`COCOEvaluator` reports AP values from 0 to 100. According to [COCO](https://cocodataset.org/#detection-eval), AP is already averaged over all categories and there is no distinction between AP and mAP, so we use the overall AP value for our evaluation.
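`inference_on_dataset` returns a nested dict. For the `bbox` task, `COCOEvaluator` reports the overall `AP` (averaged over IoU thresholds 0.50:0.95), `AP50`, `AP75`, the size-specific `APs`/`APm`/`APl`, and one `AP-<class name>` entry per category. A sketch that extracts the overall value and ranks the classes, assuming the result was stored in a variable first (e.g. `results = inference_on_dataset(...)`, a hypothetical name):

```python
import math
from typing import Dict, List, Tuple


def summarize_bbox_ap(results: Dict[str, Dict[str, float]]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the overall bbox AP and per-class APs sorted from best to worst.

    Classes without ground truth in the test set can have AP = NaN;
    they are placed at the end of the list.
    """
    bbox = results["bbox"]
    per_class = [(key[len("AP-"):], value) for key, value in bbox.items() if key.startswith("AP-")]
    # NaN entries last, the rest by descending AP
    per_class.sort(key=lambda kv: (math.isnan(kv[1]), 0.0 if math.isnan(kv[1]) else -kv[1]))
    return bbox["AP"], per_class
```

Per-class values help to spot classes that are rare in A2D2 and pull the overall AP down.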

# 7 Bonus: Running the model on a video
Let's run both the original pretrained model and our fine-tuned model on a self-recorded example video.

In [None]:
# We will process a video stored in my GitHub (it's also possible to use video from YouTube, for that see the Detectron2 Colab tutorial)
!wget https://raw.githubusercontent.com/FabianGermany/AutonomousDrivingDetectron2/main/input_data/exemplary_scene_rural_2_muted-input.mp4 -q -O exemplary_scene_rural_2_muted_input_local.mp4

statement_done()

In [None]:
#show the original video
mp4 = open('./exemplary_scene_rural_2_muted_input_local.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

display(HTML('Original video:'))
display(HTML("""
<video width=700 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url))

statement_done()

First we use the original model:

In [None]:
# Run frame-by-frame inference demo on this video with the "demo.py" tool provided by Detectron2:
!git clone https://github.com/facebookresearch/detectron2

%run detectron2/demo/demo.py --config-file detectron2/configs/{model_path} --video-input exemplary_scene_rural_2_muted_input_local.mp4 --confidence-threshold 0.6 --output exemplary_scene_rural_2_muted_output_default_local.mkv \
  --opts MODEL.WEIGHTS detectron2://\{model_config_path_short}
#if you want to use semantic segmentation, panoptic segmentation etc., just change the model_path and model_config_path_short variables
statement_done()
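The `{...}` placeholders in the `%run` line are filled in by IPython from the notebook's Python namespace. If that substitution gets in the way (for example with computed paths), the same call can be assembled in plain Python and launched with `subprocess`; note that this runs demo.py in a separate process, so datasets registered in this notebook are not visible to it. `build_demo_command` is a hypothetical helper; the flags are the ones used in the cell above:

```python
import sys
from typing import List, Sequence


def build_demo_command(config_file: str, video_in: str, video_out: str,
                       weights: str, confidence: float,
                       extra_opts: Sequence[str] = ()) -> List[str]:
    """Build the argument list for Detectron2's demo/demo.py on a video."""
    return [
        sys.executable, "detectron2/demo/demo.py",
        "--config-file", config_file,
        "--video-input", video_in,
        "--confidence-threshold", str(confidence),
        "--output", video_out,
        # everything after --opts is a flat list of config KEY VALUE pairs
        "--opts", "MODEL.WEIGHTS", weights, *extra_opts,
    ]
```

Usage (hypothetical): `subprocess.run(build_demo_command(...), check=True)` after `import subprocess`. Passing a list instead of a shell string avoids quoting problems with paths.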



In [None]:
# Download the results
files.download('exemplary_scene_rural_2_muted_output_default_local.mkv')
statement_done()

Now we use our trained model:

In [None]:
# detectron2 was already cloned for the default-model run above, so it does not need to be cloned again.
# Unlike the default run (which loads the model-zoo .pkl weights via detectron2://...), we load our own model_final.pth.
# Without {...}, %run passes text literally, so the path is computed in Python first.
path_weights_trained = os.path.join(cfg2.OUTPUT_DIR, "model_final.pth")
# NUM_CLASSES must match the fine-tuned model, otherwise the box predictor shapes do not match the checkpoint and those weights are not loaded.
# Note: demo.py takes class names from cfg.DATASETS.TEST[0] (COCO for this config file), so the drawn labels may show COCO names.
%run detectron2/demo/demo.py --config-file detectron2/configs/{model_path} --video-input exemplary_scene_rural_2_muted_input_local.mp4 --confidence-threshold 0.4 --output exemplary_scene_rural_2_muted_output_trained_local.mkv \
  --opts MODEL.WEIGHTS {path_weights_trained} MODEL.ROI_HEADS.NUM_CLASSES {n_classes}
statement_done()

#todo adapt threshold

In [None]:
# Download the results
files.download('exemplary_scene_rural_2_muted_output_trained_local.mkv')
statement_done()