# Object detection example with Thetis

Thetis can evaluate AI systems that perform image-based object detection tasks.
In this example, we use a [Faster R-CNN model by Torchvision](https://pytorch.org/vision/main/models/faster_rcnn.html) in conjunction
with a custom demo dataset ([download here](https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip)) to demonstrate the evaluation process for object detectors. The instructions below should be easy to adapt to your own use case.

## Set up the environment

If you haven't done so already, install Thetis using pip. We also use the TQDM library to draw nice progress bars:

```shell
$ pip install thetis tqdm
```

For this example, you can use the demo license located within the same directory as this notebook.
This license only works for our demonstration dataset with the exact configuration provided in this notebook.
Use the license file [demo_license_detection.dat](https://raw.githubusercontent.com/EFS-OpenSource/Thetis/main/examples/demo_license_detection.dat).

Place the license file either in the working directory of your application or at:

- Windows: `<User>/AppData/Local/Thetis/license.dat`
- Unix: `~/.local/thetis/license.dat`
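
If you prefer to script the copy step, the following is a minimal sketch and not part of Thetis. The helper names `default_license_path` and `install_license` are hypothetical, and the sketch assumes that `<User>` refers to the current user's home directory.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def default_license_path() -> Path:
    """Return the per-user license location listed above for the current platform."""
    if sys.platform.startswith("win"):
        # assumption: <User> is the home directory of the current user
        return Path.home() / "AppData" / "Local" / "Thetis" / "license.dat"
    return Path.home() / ".local" / "thetis" / "license.dat"


def install_license(source: str, target: Optional[Path] = None) -> Path:
    """Copy a license file to the target location, creating parent directories."""
    target = Path(target) if target is not None else default_license_path()
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


# example call, assuming the demo license has been downloaded to the working directory:
# install_license("demo_license_detection.dat")
```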

## Install PyTorch

To follow this example, you need an installation of PyTorch. Please follow the installation instructions on the [PyTorch homepage](https://pytorch.org/).

You won't need to train a model from scratch in this example. We only adapt a model for inference.

## Increase logging verbosity

To obtain detailed runtime information about Thetis, run the following cell. This will add a logging handler to the Thetis logger, increasing the application's verbosity.

In [None]:
import logging
import sys

# Configure the "Thetis" logger to print INFO messages to stderr
logger = logging.getLogger("Thetis")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

## Run inference with the PyTorch object detector

First, we need to load and initialize our model, the [Faster R-CNN by Torchvision](https://pytorch.org/vision/main/models/faster_rcnn.html).
Note that the model is pre-trained on the MS COCO dataset with several categories. In our example, we only
work with the categories "person" and "car".

*Note:* If your machine is behind a proxy, you likely need to configure your environment so that the following code
can download the pre-trained model and dataset. Set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables to point
to your proxy server and port before launching Python or Jupyter.
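
If changing the shell environment is inconvenient, one option is to set the variables from within Python before the first download. The following is a minimal sketch under two assumptions: the proxy address is a hypothetical placeholder, and the downloads go through `urllib`, which reads these variables from the process environment. The helper name `configure_proxy` is not part of any library.

```python
import os
import urllib.request


def configure_proxy(proxy_url: str) -> dict:
    """Point HTTP and HTTPS traffic of this process at the given proxy."""
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        # set both spellings, since some tools only read the lowercase variant
        os.environ[name] = proxy_url
        os.environ[name.lower()] = proxy_url
    # return the proxy settings urllib detects, as a quick check
    return urllib.request.getproxies()


# hypothetical proxy address, replace with your own server and port:
# proxies = configure_proxy("http://proxy.example.com:8080")
```

Variables set this way only affect the current Python process and the processes it starts.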

In [None]:
import numpy as np
from torchvision.io import read_image, ImageReadMode
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights

# initialize object detection model from torchvision model zoo
weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn_v2(weights=weights)
model.eval()

# retrieve necessary image transformations (e.g., normalization, etc.) and available categories
preprocess = weights.transforms()
categories = np.array(weights.meta["categories"])

In the next step, download and extract our [demo detection dataset](https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip)
which we artificially generated using the [Carla simulation engine](https://carla.org/).

In [None]:
import urllib.request
from zipfile import ZipFile

# download demo detection dataset
urllib.request.urlretrieve("https://thetishostedfiles.blob.core.windows.net/demofiles/thetis_demo_detection.zip", "thetis_demo_detection.zip")

# unzip dataset to local disk
with ZipFile("thetis_demo_detection.zip", 'r') as zip_ref:
    zip_ref.extractall("demo_detection")

After download and extraction, we can load the JSON annotation
files and run inference with the Torchvision model:

In [None]:
import os
from glob import glob
from tqdm import tqdm
import json
import torch

# set the inference device - if you have a CUDA device, you can set it to "cuda:<idx>"
device = "cpu"
# device = "cuda:0"

model.to(device)

# get a list of all JSON files
annotation_files = glob(os.path.join("demo_detection", "annotations", "*.json"))
data = []

# iterate over all JSON files and retrieve annotations
for filename in tqdm(annotation_files, desc="Running inference on images ..."):
    with open(filename, "r") as open_file:
        anns = json.load(open_file)

    # load respective image, run preprocessing (transformation) and finally run inference
    img = read_image(os.path.join("demo_detection", "img", anns["image_file"]), ImageReadMode.RGB)
    img = [preprocess(img).to(device=device)]

    # make inference and copy back to CPU (if CUDA device has been used for inference)
    with torch.no_grad():
        pred = model(img)[0]
        pred = {k: v.cpu() for k, v in pred.items()}

    # store predicted and target data for current frame
    data.append((pred, anns))

## Expected data format for object detection

After loading the ground truth data and running inference using an AI model (see the example above), you need to format your predictions and annotations to be compatible with Thetis. In object detection evaluation mode, Thetis expects predictions and annotations to be provided as Python dictionaries. In these dictionaries, the keys should represent the image identifiers (e.g. image names), and the values should represent the individual objects (predicted or ground truth) within each frame.

### Include image metadata

The dictionary for ground truth annotations must include a key `__meta__`, which contains the width and height information for each image in the dataset. This information should be provided as a Pandas DataFrame. The index of this DataFrame must match the keys (image identifiers) of the Python dictionaries used for the ground truth annotations and predictions.
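
To make the layout concrete, here is a minimal sketch for a single image. The image identifier and all values are made up for illustration, the column names follow the ones used in this example, and `check_image_ids` is a hypothetical local sanity check, not a Thetis function.

```python
import pandas as pd

# hypothetical image identifier and values, for illustration only
image_id = "frame_0001.png"

# one DataFrame per image, one row per predicted object
predictions = {
    image_id: pd.DataFrame({
        "labels": ["person", "car"],
        "confidence": [0.92, 0.71],
        "xmin": [10.0, 120.0],
        "ymin": [20.0, 80.0],
        "xmax": [60.0, 300.0],
        "ymax": [180.0, 200.0],
    }),
}

# one DataFrame per image plus the "__meta__" entry indexed by image identifier
annotations = {
    "__meta__": pd.DataFrame({"width": [1280], "height": [720]}, index=[image_id]),
    image_id: pd.DataFrame({
        "target": ["person"],
        "xmin": [12.0],
        "ymin": [18.0],
        "xmax": [58.0],
        "ymax": [182.0],
    }),
}


def check_image_ids(annotations: dict, predictions: dict) -> None:
    """Raise ValueError if the image identifiers are inconsistent."""
    meta_ids = set(annotations["__meta__"].index)
    annotated_ids = set(annotations) - {"__meta__"}
    if annotated_ids != meta_ids:
        raise ValueError("'__meta__' index does not match the annotated images")
    unknown_ids = set(predictions) - meta_ids
    if unknown_ids:
        raise ValueError(f"predictions for unknown images: {sorted(unknown_ids)}")


check_image_ids(annotations, predictions)
```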

In [None]:
import pandas as pd

# Thetis expects a dictionary with image name as key and a pd.DataFrame with predicted information as value.
# A similar format is also expected for the ground truth annotations with additional sensitive attributes
# used for fairness evaluation. The field "__meta__" is always required with meta information for each frame.
annotations = {"__meta__": pd.DataFrame(columns=["width", "height"])}
predictions = {}

# iterate over all frames with predicted and target information
for pred, anns in data:

    # retrieve predicted labels and bounding boxes
    predicted_labels = categories[pred["labels"].numpy()]
    predicted_boxes = pred["boxes"].numpy().reshape((-1, 4))
    target_boxes = np.array(anns["boxes"]).reshape((-1, 4))
    filename = anns["image_file"]

    # boolean mask that keeps only the categories used in this example
    # (named "keep" to avoid shadowing the built-in filter())
    keep = np.isin(predicted_labels, ["person", "car"])

    # add predicted information as pd.DataFrame
    predictions[filename] = pd.DataFrame.from_dict({
        "labels": predicted_labels[keep],
        "confidence": pred["scores"].numpy()[keep],
        "xmin": predicted_boxes[:, 0][keep],
        "ymin": predicted_boxes[:, 1][keep],
        "xmax": predicted_boxes[:, 2][keep],
        "ymax": predicted_boxes[:, 3][keep],
    })

    # add ground truth information also as pd.DataFrame with additional sensitive attributes
    annotations[filename] = pd.DataFrame.from_dict({
        "target": anns["classes"],
        "gender": anns["gender"],
        "age": anns["age"],
        "xmin": target_boxes[:, 0],
        "ymin": target_boxes[:, 1],
        "xmax": target_boxes[:, 2],
        "ymax": target_boxes[:, 3],
    })

    # some additional meta information such as image width and height are also required
    annotations["__meta__"].loc[filename] = [anns["image_width"], anns["image_height"]]

## Optional: store and load dataset to/from disk

The inference routine and data preparation process can be time-consuming. To store the prepared dataset on disk, you can use our helper function `write_json_with_pandas`. You can reload the dataset from disk using the `read_json_with_pandas` function:

In [None]:
from thetis.io import write_json_with_pandas, read_json_with_pandas

# write the JSON-like Python dictionary to disk
write_json_with_pandas(
    json_like=predictions,
    filename="carla_predictions.json",
)

write_json_with_pandas(
    json_like=annotations,
    filename="carla_annotations.json",
)

# load the dictionaries from disk
load_predictions = read_json_with_pandas("carla_predictions.json")
load_annotations = read_json_with_pandas("carla_annotations.json")

## Run Thetis to analyze and evaluate the AI system

Once your data is in the correct format, you can call Thetis with the predictions, ground truth information, and the prepared configuration file:

In [None]:
from thetis import thetis

result = thetis(
    config="demo_config_detection.yaml",
    annotations=annotations,
    predictions=predictions,
    output_dir="./output",
    license_file_path="demo_license_detection.dat",
)

You can download the [demo configuration file](https://raw.githubusercontent.com/EFS-OpenSource/Thetis/main/examples/demo_config_detection.yaml) for this example from our repository. For detailed information on Thetis configuration, refer to the [Configuration](https://efs-opensource.github.io/Thetis/configuration.html) section.

In addition to generating the report in PDF format, which we display below, Thetis also returns its findings, final rating, and recommendations for mitigation strategies as a JSON-like dictionary. We capture this dictionary as `result` and access it as follows:

* `result[<aspect>]` contains a sub-dictionary with results for each aspect of the analysis, e.g. 'fairness' or 'uncertainty'.
* `result[<aspect>]['rating_score']` contains the rating as a score from 0 to 10.
* `result[<aspect>]['rating_enum']` contains the rating as a grade, which can be `'GOOD'`, `'MEDIUM'`, or `'BAD'`, depending on the rating score.
* `result[<aspect>]['recommendations']` contains findings regarding possible issues and recommendations for mitigation.
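
The access pattern above can be sketched as a small loop. This is a minimal illustration: `summarize_ratings` is a hypothetical helper, and `mock_result` is a made-up stand-in for the dictionary returned by `thetis(...)`, so the aspect names and values shown are not real output.

```python
def summarize_ratings(result: dict) -> dict:
    """Collect (score, grade) per aspect from a Thetis-style result dictionary."""
    summary = {}
    for aspect, content in result.items():
        # skip entries that do not look like a rated aspect
        if not isinstance(content, dict) or "rating_score" not in content:
            continue
        summary[aspect] = (content["rating_score"], content.get("rating_enum"))
    return summary


# made-up stand-in for the returned dictionary, for illustration only
mock_result = {
    "fairness": {"rating_score": 7.5, "rating_enum": "GOOD"},
    "uncertainty": {"rating_score": 4.0, "rating_enum": "MEDIUM"},
}

for aspect, (score, grade) in summarize_ratings(mock_result).items():
    print(f"{aspect}: {score}/10 ({grade})")
```

With the real `result`, pass it to the helper in place of `mock_result`.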

Note that the remaining evaluation metrics are grouped by the specified IoU thresholds, which are used to establish correspondences between predicted and ground truth objects. For example, an IoU threshold of 0.5 might be used to decide if the bounding box of a predicted object has sufficient overlap with the bounding box of a ground truth object to count the prediction as referring to that ground truth object. In the configuration file, you can specify multiple IoU thresholds that will be used for comparison in the final evaluation process.
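
For reference, the IoU (intersection over union) of two axis-aligned boxes can be computed as in the sketch below. It uses the `(xmin, ymin, xmax, ymax)` layout from this example and is meant only to illustrate the quantity. How Thetis matches predictions to ground truth objects internally is not shown here, and `box_iou` is a hypothetical helper name.

```python
def box_iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # width and height of the overlap region, clipped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)

iou = box_iou(predicted, ground_truth)  # overlap area 1, union area 7
is_match = iou >= 0.5                   # False for this pair at a threshold of 0.5
```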

In [None]:
# show the PDF report within the current Jupyter notebook
from IPython.display import IFrame

IFrame("./output/report.pdf", width=800, height=1024)