# Object Detection Example (Image-based) with Thetis

Thetis can evaluate the AI safety of modern, image-based object detector models.
In this example, we utilize a [Faster R-CNN model by Torchvision](https://pytorch.org/vision/main/models/faster_rcnn.html) in conjunction
with a custom demo data set ([download here](https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip)) to demonstrate the evaluation process for
object detectors. The instructions below should be easy to adapt to your own use case.

## Set Up the Environment
As a first step, install Thetis using pip. We also use the tqdm library to draw progress bars:

```shell
$ pip install thetis tqdm
```

Next, you need to obtain a license in order to use Thetis.


For the current example, you can use the *demo license* located within the same directory as this notebook.
This license only works for our demonstration data set with the exact configuration provided in this notebook.
Use the license file [demo_license_detection.dat](https://raw.githubusercontent.com/EFS-OpenSource/Thetis/main/examples/demo_license_detection.dat).

A customized *full license*, enabling you to run Thetis with your own data sets and settings, is available at our [Subscription Page](https://efs-opensource.github.io/Thetis/subscription.html).

Place the license file either in the working directory of your application or at:

- Windows: `<User>/AppData/Local/Thetis/license.dat`
- Unix: `~/.local/thetis/license.dat`
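
To confirm that a license file is in place before running Thetis, the locations above can be checked with a few lines of Python. This is a minimal sketch and not part of Thetis: `candidate_license_paths` and `find_license_file` are hypothetical helper names, `<User>` is assumed to be the current user's home directory, and the file in the working directory is assumed to be named `license.dat` as well.

```python
import sys
from pathlib import Path


def candidate_license_paths(working_dir=None):
    """Return the license locations described above (working directory first)."""
    working_dir = Path(working_dir) if working_dir is not None else Path.cwd()
    home = Path.home()
    if sys.platform.startswith("win"):
        # Assumption: <User> refers to the current user's home directory.
        user_path = home / "AppData" / "Local" / "Thetis" / "license.dat"
    else:
        user_path = home / ".local" / "thetis" / "license.dat"
    # Assumption: the file in the working directory is also called license.dat.
    return [working_dir / "license.dat", user_path]


def find_license_file(working_dir=None):
    """Return the first candidate path that exists, or None if none is found."""
    for path in candidate_license_paths(working_dir):
        if path.is_file():
            return path
    return None
```

Calling `find_license_file()` returns `None` if no file was found at either location, which is a hint that the license still needs to be copied there or passed explicitly.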

## Install PyTorch
To follow this example, you need a working installation of PyTorch together with Torchvision, which provides the detection model used below. Please follow the installation instructions on the [PyTorch homepage](https://pytorch.org/).

You won't need to train a model from scratch in this example. We only adapt a model for inference.

## Increase Logging Verbosity

For detailed runtime information about Thetis, run the following cell. It attaches a logging handler to the Thetis logger and raises its verbosity to `INFO`.

In [None]:
import logging
import sys

# Configure the "Thetis" logger (not the root logger) to print INFO-level messages to stderr
logger = logging.getLogger("Thetis")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

## Run Inference with the PyTorch Object Detector

First, we need to load and initialize our model, the [Faster R-CNN by Torchvision](https://pytorch.org/vision/main/models/faster_rcnn.html).
Note that the model is pre-trained on the MS COCO data set, which covers many more categories than we need. In our example, we only
work with the categories "person", "bicycle", and "car".

*Note:* If your machine is behind a proxy, you likely need to configure your environment so that the following code
can download the pre-trained model and the data set. Set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables to point
to your proxy server and port before launching Python or Jupyter.
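
If setting the variables in the shell is not an option, they can also be set from within Python, as long as this happens before the first download is triggered. The following is a sketch only: `configure_proxy` is a hypothetical helper, and `http://proxy.example.com:8080` is a placeholder address that must be replaced with your own proxy.

```python
import os

PROXY_VARIABLES = ("HTTP_PROXY", "HTTPS_PROXY")


def configure_proxy(proxy_url):
    """Set the proxy environment variables for the current Python process."""
    for name in PROXY_VARIABLES:
        os.environ[name] = proxy_url
    # Return the resulting settings so they can be inspected or logged.
    return {name: os.environ[name] for name in PROXY_VARIABLES}


# Placeholder address (assumption); uncomment and adapt to your network:
# configure_proxy("http://proxy.example.com:8080")
```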

In [None]:
import numpy as np
from torchvision.io import read_image, ImageReadMode
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights

# initialize object detection model from torchvision model zoo
weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn_v2(weights=weights)
model.eval()

# retrieve necessary image transformations (e.g., normalization, etc.) and available categories
preprocess = weights.transforms()
categories = np.array(weights.meta["categories"])
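
The `categories` array is used later to translate the integer labels predicted by the model into class names and to keep only the classes of interest. The sketch below illustrates that indexing and filtering step with NumPy. The names `toy_categories`, `toy_labels`, and `toy_scores` are made-up stand-ins for illustration, not the real category list or model output.

```python
import numpy as np

# Made-up stand-ins (assumption): a short category list and four predictions.
toy_categories = np.array(["__background__", "person", "bicycle", "car", "dog"])
toy_labels = np.array([1, 4, 3, 1])
toy_scores = np.array([0.98, 0.75, 0.60, 0.30])

# Integer labels index directly into the category array.
names = toy_categories[toy_labels]

# Boolean mask that is True for the categories used in this example.
keep = np.isin(names, ["person", "bicycle", "car"])

# The same mask filters every per-prediction array consistently;
# here it keeps 'person', 'car', 'person' and drops 'dog'.
print(names[keep])
print(toy_scores[keep])
```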

In the next step, download and extract
the [Demo Detection Data Set](https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip), which was generated artificially with
the [Carla simulation engine](https://carla.org/).

In [None]:
import urllib.request
from zipfile import ZipFile

# download demo detection data set
urllib.request.urlretrieve("https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip", "thetis_demo_detection.zip")

# unzip data set to local disk
with ZipFile("thetis_demo_detection.zip", 'r') as zip_ref:
    zip_ref.extractall("demo_detection")

After download and extraction, we can load the JSON annotation
files and run inference with the Torchvision model:

In [None]:
import os
from glob import glob
from tqdm import tqdm
import json
import torch

# set the inference device - if you have a CUDA device, you can set it to "cuda:<idx>"
device = "cpu"
# device = "cuda:0"

model.to(device)

# get a list of all JSON files
annotation_files = glob(os.path.join("demo_detection", "annotations", "*.json"))
data = []

# iterate over all JSON files and retrieve annotations
for filename in tqdm(annotation_files, desc="Running inference on images ..."):
    with open(filename, "r") as open_file:
        anns = json.load(open_file)

    # load respective image, run preprocessing (transformation) and finally run inference
    img = read_image(os.path.join("demo_detection", "img", anns["image_file"]), ImageReadMode.RGB)
    img = [preprocess(img).to(device=device)]

    # make inference and copy back to CPU (if CUDA device has been used for inference)
    with torch.no_grad():
        pred = model(img)[0]
        pred = {k: v.cpu() for k, v in pred.items()}

    # store predicted and target data for current frame
    data.append((pred, anns))
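
Each annotation file is read as a plain dictionary. As an illustration, the sketch below lists the fields that the code in this notebook accesses and checks a record for them. The field names are taken from this notebook; the values in `example_record` and the helper `missing_fields` are hypothetical, and the real files may contain further fields.

```python
# Fields that this notebook reads from every annotation file.
REQUIRED_FIELDS = (
    "image_file", "image_width", "image_height",
    "boxes", "classes", "gender", "age",
)


def missing_fields(record):
    """Return the required fields that are absent from an annotation record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


# Made-up record for illustration only (assumption, not taken from the data set).
example_record = {
    "image_file": "frame_0001.png",
    "image_width": 1280,
    "image_height": 720,
    "boxes": [[100.0, 150.0, 180.0, 320.0]],  # xmin, ymin, xmax, ymax
    "classes": ["person"],
    "gender": ["female"],
    "age": ["adult"],
}

print(missing_fields(example_record))
```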

## Expected Data Format for Object Detection

After loading the ground-truth information and running inference using an AI model (see example above),
we must format our predictions and annotations in a way that can be ingested by Thetis. In object detection evaluation mode,
Thetis expects a Python dictionary for the predictions and annotations, where the keys represent the image identifiers
(e.g., image name) and the values represent the individual (predicted or ground-truth) objects within a single frame.

*Important*: The dictionary for the ground-truth annotations requires a key `__meta__` which holds the width and height
of each image within the data set, provided as a pandas DataFrame. Note that the index of this DataFrame
must match the keys (i.e., the image identifiers) of the Python dictionaries.

In [None]:
import pandas as pd

# Thetis expects a dictionary with image name as key and a pd.DataFrame with predicted information as value.
# A similar format is also expected for the ground-truth annotations with additional sensitive attributes
# used for fairness evaluation. The field "__meta__" is always required with meta information for each frame.
annotations = {"__meta__": pd.DataFrame(columns=["width", "height"])}
predictions = {}

# iterate over all frames with predicted and target information
for pred, anns in data:

    # retrieve predicted labels and bounding boxes
    predicted_labels = categories[pred["labels"].numpy()]
    predicted_boxes = pred["boxes"].numpy().reshape((-1, 4))
    target_boxes = np.array(anns["boxes"]).reshape((-1, 4))

    # boolean mask that keeps only predictions of the categories used in this example
    # (named "keep" to avoid shadowing Python's built-in filter())
    keep = np.isin(predicted_labels, ["person", "bicycle", "car"])
    filename = anns["image_file"]

    # add predicted information as pd.DataFrame
    predictions[filename] = pd.DataFrame.from_dict({
        "labels": predicted_labels[keep],
        "confidence": pred["scores"].numpy()[keep],
        "xmin": predicted_boxes[:, 0][keep],
        "ymin": predicted_boxes[:, 1][keep],
        "xmax": predicted_boxes[:, 2][keep],
        "ymax": predicted_boxes[:, 3][keep],
    })

    # add ground-truth information also as pd.DataFrame with additional sensitive attributes
    annotations[filename] = pd.DataFrame.from_dict({
        "target": anns["classes"],
        "gender": anns["gender"],
        "age": anns["age"],
        "xmin": target_boxes[:, 0],
        "ymin": target_boxes[:, 1],
        "xmax": target_boxes[:, 2],
        "ymax": target_boxes[:, 3],
    })

    # some additional meta information such as image width and height are also required
    annotations["__meta__"].loc[filename] = [anns["image_width"], anns["image_height"]]
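
Before handing the dictionaries to Thetis, it can help to verify the requirement stated above, namely that the index of `__meta__` matches the image identifiers. The function below is a hypothetical helper written for this notebook, not part of the Thetis API. It additionally assumes that predictions and annotations share the same image identifiers, and the toy data uses made-up values.

```python
import pandas as pd


def check_detection_inputs(annotations, predictions):
    """Return a list of problems; an empty list means the basic checks passed."""
    meta = annotations.get("__meta__")
    if meta is None:
        return ["annotations lack the '__meta__' entry"]

    problems = []
    image_ids = set(annotations) - {"__meta__"}
    if set(meta.index) != image_ids:
        problems.append("index of '__meta__' differs from the annotation keys")
    # Assumption: every annotated image also has a predictions entry.
    if set(predictions) != image_ids:
        problems.append("prediction keys differ from the annotation keys")
    return problems


# Toy data with made-up values, mirroring the columns used in this notebook.
boxes = {"xmin": [100.0], "ymin": [150.0], "xmax": [180.0], "ymax": [320.0]}
toy_annotations = {
    "__meta__": pd.DataFrame({"width": [1280], "height": [720]}, index=["frame_0001.png"]),
    "frame_0001.png": pd.DataFrame({"target": ["person"], **boxes}),
}
toy_predictions = {
    "frame_0001.png": pd.DataFrame({"labels": ["person"], "confidence": [0.9], **boxes}),
}

print(check_detection_inputs(toy_annotations, toy_predictions))  # prints []
```

In this notebook, the same check could be run as `check_detection_inputs(annotations, predictions)`.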

## Optional: Store and Load Data Set to/from Disk

Both the inference routine and the data preparation can be time-consuming. To store the prepared data set on disk, use the helper function `write_json_with_pandas`. To reload the data set from disk, use the `read_json_with_pandas` function:

In [None]:
from thetiscore.io import write_json_with_pandas, read_json_with_pandas

# write the JSON-like Python dictionary to disk
write_json_with_pandas(
    json_like=predictions,
    filename="carla_predictions.json",
)

write_json_with_pandas(
    json_like=annotations,
    filename="carla_annotations.json",
)

# load the dictionaries from disk
load_predictions = read_json_with_pandas("carla_predictions.json")
load_annotations = read_json_with_pandas("carla_annotations.json")

## Run AI Safety Evaluation with Thetis

Once your data is in the expected format, call Thetis with the predictions, the ground-truth annotations, and the prepared configuration file:

In [None]:
from thetiscore import thetis


result = thetis(
    config="demo_config_detection.yaml",
    annotations=annotations,
    predictions=predictions,
    output_dir="./output",
    license_file_path="demo_license_detection.dat",
)

For details on configuring Thetis, see the [Configuration](https://efs-opensource.github.io/Thetis/configuration.html) section of the documentation.
For the current example, you can download the [demo configuration file](https://raw.githubusercontent.com/EFS-OpenSource/Thetis/main/examples/demo_config_detection.yaml)
from this repository or via [this download link](https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip).

Thetis returns its findings, the final rating and recommendations for mitigation strategies as a JSON-like dictionary. We capture the dictionary as `result` and can access the different evaluation aspects:

* `result[<task>]['rating_score']` for the rating score of the selected task (e.g., 'fairness' or 'uncertainty').
* `result[<task>]['recommendations']` for the recommendations to mitigate possible issues of the selected task.
* `result[<task>]['rating_enum']` for a categorization of the respective aspect into `'GOOD'`, `'MEDIUM'`,
  or `'BAD'`, depending on the rating score.
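
As a sketch of how these fields might be read, the snippet below walks over a result dictionary and collects the rating per task. `summarize_ratings` is a hypothetical helper, and `mock_result` is made-up data that only mimics the keys described above. The actual dictionary returned by Thetis contains more entries, and its score values may differ in scale.

```python
def summarize_ratings(result):
    """Collect (task, rating_enum, rating_score) for every entry that carries a rating."""
    summary = []
    for task, findings in result.items():
        # Skip entries that are not per-task dictionaries with a rating.
        if isinstance(findings, dict) and "rating_enum" in findings:
            summary.append((task, findings["rating_enum"], findings.get("rating_score")))
    return summary


# Made-up stand-in for the dictionary returned by thetis() (assumption).
mock_result = {
    "fairness": {"rating_score": 0.8, "rating_enum": "GOOD"},
    "uncertainty": {"rating_score": 0.4, "rating_enum": "MEDIUM"},
}

for task, rating, score in summarize_ratings(mock_result):
    print(f"{task}: {rating} (score: {score})")
```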

Note that the remaining evaluation metrics are grouped by the specified IoU (intersection over union) scores, which serve as thresholds for matching
predicted objects with ground-truth objects (e.g., an IoU score of 0.5 might be used to decide whether a prediction
has matched an existing ground-truth object). In the configuration file, you can specify multiple IoU scores
that are taken into account during the final evaluation.
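
For reference, the IoU (intersection over union) of two axis-aligned boxes can be computed as in the sketch below. It uses the standard definition with the `xmin, ymin, xmax, ymax` box layout from this example. The function `iou_xyxy` is illustrative and not necessarily how Thetis computes the overlap internally.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))
```

With an IoU score of 0.5 as the matching threshold, this pair of boxes (IoU of roughly 0.14) would typically not be counted as a match.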