# Sensor Fusion Exercise
Welcome to the EDTH sensor fusion exercise, in which you will:
- Set up an array of two camera sensors, placed 50 meters apart, that observe a 2D map (flat surface)
- Derive an object recognition algorithm to detect and localise a tank and a car in the map
- Implement the core logic for fusing identified objects into a common operational picture

### Setting
We use a local right-handed Cartesian coordinate system in which `x` points north and `y` points east. Each camera faces the objects; its orientation is set by the camera's bearing (azimuth) angle, where a bearing of 0° means facing north and 90° means facing east.
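To make the bearing convention concrete, the following minimal sketch converts a bearing and a range into north/east offsets. Because bearings are measured clockwise from north, the north component is `range * cos(bearing)` and the east component is `range * sin(bearing)`. The helper name `offset_north_east` is hypothetical and not part of the exercise modules.

```python
import math


def offset_north_east(bearing_deg: float, range_m: float) -> tuple[float, float]:
    """Hypothetical helper: convert a bearing (0° = north, 90° = east) and a range into (north, east) offsets."""
    bearing_rad = math.radians(bearing_deg)
    north = range_m * math.cos(bearing_rad)
    east = range_m * math.sin(bearing_rad)
    return north, east


# Facing north: the whole range goes into the north component.
print(offset_north_east(0.0, 10.0))  # (10.0, 0.0)
# Facing east: the whole range goes into the east component (up to floating-point error).
print(offset_north_east(90.0, 10.0))
```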

<img src='resources/setting.png' width=25% height=25%/>

### Modules
This exercise consists of 3 modules:
- `sensor.py` defining the camera logic for streaming images
- `object_recognition.py` defining the algorithm to process camera images, detect objects and generate a unique object identification
- `fusion_station.py` defining the central system to fuse identified objects into a shared representation

### Setup
Install the required packages via [poetry](https://python-poetry.org/docs/) by running the following from the source directory:
```shell
# Ensure Python >= 3.10 is installed; otherwise, install it, e.g. with pyenv
pyenv install 3.10
pyenv global 3.10

# Create the virtual environment and install the dependencies, then activate the environment
poetry install
poetry shell
```


```python
import logging
import math
from typing import Any

import numpy as np
from scipy.optimize import linear_sum_assignment

from sensor_fusion_exercise.fusion_station import FusionStation
from sensor_fusion_exercise.object_recognition import BoundingBox, IdentifiedObject, MockObjectRecognition
from sensor_fusion_exercise.sensor import CameraConfig, Frame, MockCamera, Sensor
from sensor_fusion_exercise.utils import OBJECT_PRIORS_WIDTH, LocationNE, Meters, SensorId

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
```

### 1. Sensor array
First, we place a sensor array in the map consisting of 2 cameras that are 50 meters apart along the east axis. Each camera faces the tank and the car such that both objects are visible in its image, as depicted in the drawing above.

```python
# Both cameras share the same configuration (image size and field of view).
camera_cfg = CameraConfig(image_width=1920, image_height=1080, fov_horizontal=40.0, fov_vertical=22.5)

# Sensor 1 sits at the origin and is rotated 26.6° clockwise from north.
sensor_1 = MockCamera(
    config=camera_cfg, asset_location_ne=LocationNE(north=0.0, east=0.0), bearing_angle=26.6, sensor_id=1
)
# Sensor 2 sits 50 m east of sensor 1 and faces north.
sensor_2 = MockCamera(
    config=camera_cfg, asset_location_ne=LocationNE(north=0.0, east=50.0), bearing_angle=0.0, sensor_id=2
)

sensor_array: list[Sensor[Frame]] = [sensor_1, sensor_2]
```
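Whether an object is visible to a camera can be checked by comparing the bearing from the camera to the object against the camera's bearing and half its horizontal field of view. The sketch below is a hypothetical helper, assuming the field of view is given in degrees and that positions are (north, east) in meters; the sample point used below is illustrative and not the exercise's ground truth.

```python
import math


def is_within_horizontal_fov(
    camera_north: float,
    camera_east: float,
    camera_bearing_deg: float,
    fov_horizontal_deg: float,
    point_north: float,
    point_east: float,
) -> bool:
    """Hypothetical helper: True if the point lies inside the camera's horizontal field of view."""
    # atan2(east, north) yields a bearing measured clockwise from north.
    bearing_to_point = math.degrees(math.atan2(point_east - camera_east, point_north - camera_north))
    # Wrap the angular difference into [-180, 180) so that e.g. 350° vs 10° counts as 20° apart.
    diff = (bearing_to_point - camera_bearing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_horizontal_deg / 2


# A point 100 m north of sensor 2 (east = 50 m), checked against both sensor poses from the cell above.
print(is_within_horizontal_fov(0.0, 0.0, 26.6, 40.0, 100.0, 50.0))  # True
print(is_within_horizontal_fov(0.0, 50.0, 0.0, 40.0, 100.0, 50.0))  # True
```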

### 2. Object detection and localisation from a single view
Next, we initialise an object recognition algorithm by providing ground truth detections for the classes tank and car.
Each detection yields a bounding box in pixel coordinates and a class name.
Given the bounding box size and the identified class, together with a known prior of the object's real width, we can estimate its distance from the camera.

**Note**: This is just one of many possible approaches to localising an object in the map from image coordinates. Alternatives include triangulating the object from two or more camera views (stereo) or using monocular depth-estimation neural networks.
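The triangulation alternative mentioned in the note can be sketched as intersecting two bearing rays, assuming each camera provides a bearing (not a range) to the same object. Note that sensor 1's bearing of 26.6° is close to atan(50/100) ≈ 26.57°, so its optical axis crosses sensor 2's north-facing axis roughly 100 m north of the array. The function name `triangulate_from_bearings` is hypothetical; it treats bearings as infinite lines and does not check that the intersection lies in front of both cameras.

```python
import math

import numpy as np


def triangulate_from_bearings(
    position_1: tuple[float, float],
    bearing_1_deg: float,
    position_2: tuple[float, float],
    bearing_2_deg: float,
) -> tuple[float, float]:
    """Intersect two bearing rays given in (north, east) coordinates and return the intersection."""

    def direction(bearing_deg: float) -> np.ndarray:
        rad = math.radians(bearing_deg)
        return np.array([math.cos(rad), math.sin(rad)])  # (north, east) unit vector

    d1, d2 = direction(bearing_1_deg), direction(bearing_2_deg)
    p1 = np.asarray(position_1, dtype=float)
    p2 = np.asarray(position_2, dtype=float)
    # Solve p1 + t1 * d1 = p2 + t2 * d2, i.e. [d1, -d2] @ [t1, t2] = p2 - p1.
    system = np.column_stack((d1, -d2))
    t1, _t2 = np.linalg.solve(system, p2 - p1)  # raises numpy.linalg.LinAlgError for exactly parallel rays
    north, east = p1 + t1 * d1
    return float(north), float(east)


# Sensor positions from the exercise; the bearing from sensor 1 is computed exactly here for illustration.
target_bearing_1 = math.degrees(math.atan2(50.0, 100.0))
print(triangulate_from_bearings((0.0, 0.0), target_bearing_1, (0.0, 50.0), 0.0))  # approximately (100.0, 50.0)
```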

**TODO**: Fill in the missing distance-estimation logic in the function `monocular_distance_from_object_prior` below.
**Hint**: Trigonometry is your friend for solving this part.

<img src='resources/mono_depth.png' width=25% height=25%/>
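One possible formulation of the hint, sketched under the assumption of an ideal pinhole camera and an object whose full width is roughly perpendicular to the optical axis: the focal length in pixels follows from the horizontal field of view as $f = (W/2) / \tan(\mathrm{FOV}_h/2)$, and similar triangles give $d = f \cdot w_{real} / w_{px}$. The helper below takes plain numbers because the attribute names of `Frame` and `BoundingBox` are not shown here; its name and arguments are hypothetical.

```python
import math


def distance_from_width_prior(
    bbox_width_px: float, image_width_px: int, fov_horizontal_deg: float, real_width_m: float
) -> float:
    """Hypothetical helper: pinhole-model range estimate from the apparent width of an object of known width."""
    if bbox_width_px <= 0:
        raise ValueError("bounding box width must be positive")
    # Focal length in pixels from the horizontal field of view.
    focal_px = (image_width_px / 2) / math.tan(math.radians(fov_horizontal_deg) / 2)
    # Similar triangles: bbox_width_px / focal_px = real_width_m / distance.
    return focal_px * real_width_m / bbox_width_px


# Exercise camera (1920 px wide, fov_horizontal=40.0 assumed to be degrees), a 180 px wide box and an
# illustrative real width of 3.5 m (the actual prior lives in OBJECT_PRIORS_WIDTH).
print(distance_from_width_prior(180, 1920, 40.0, 3.5))  # ≈ 51.3
```

Halving the bounding box width doubles the estimated distance, which is why small pixel errors on distant objects translate into large range errors.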


```python
# Mock detections per sensor ID (the keys match the `sensor_id` values above).
ground_truth_objects = {
    1: [
        IdentifiedObject(
            class_name="tank",
            bounding_box=BoundingBox(880, 500, 180, 100),
            object_location_north_east=LocationNE.null(),
        ),
        IdentifiedObject(
            class_name="car", bounding_box=BoundingBox(800, 500, 60, 100), object_location_north_east=LocationNE.null()
        ),
    ],
    2: [
        IdentifiedObject(
            class_name="car", bounding_box=BoundingBox(1200, 500, 70, 100), object_location_north_east=LocationNE.null()
        ),
        IdentifiedObject(
            class_name="tank",
            bounding_box=BoundingBox(860, 500, 180, 100),
            object_location_north_east=LocationNE.null(),
        ),
    ],
}


def monocular_distance_from_object_prior(identified_object: IdentifiedObject, frame: Frame) -> Meters:
    """
    Calculates the distance based on an object size prior and the respective bounding box size.
    Returns a distance estimate from the camera to the target object in meters.
    """
    ...
    real_object_width = OBJECT_PRIORS_WIDTH[identified_object.class_name]
    # ------------------ TODO: Fill the missing code, ~10min
    #
    #
    # ------------------


object_recognition = MockObjectRecognition(
    ground_truth_objects=ground_truth_objects, monocular_distance_estimation=monocular_distance_from_object_prior
)
```
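Once a distance is available, the detection still has to be placed in the map. A minimal sketch, assuming a pinhole camera whose horizontal image axis `u` grows to the right and the bearing convention from the Setting section: the horizontal pixel offset of the bounding-box centre is converted into an angle relative to the camera bearing, and the distance is then projected along that ray. The helper `locate_detection` and its arguments are hypothetical; the exercise's `MockObjectRecognition` may localise objects differently.

```python
import math


def locate_detection(
    camera_north: float,
    camera_east: float,
    camera_bearing_deg: float,
    bbox_center_u_px: float,
    image_width_px: int,
    fov_horizontal_deg: float,
    distance_m: float,
) -> tuple[float, float]:
    """Hypothetical helper: project a detection into (north, east) map coordinates."""
    focal_px = (image_width_px / 2) / math.tan(math.radians(fov_horizontal_deg) / 2)
    # Horizontal angle between the optical axis and the ray through the bounding-box centre;
    # pixels right of the image centre increase the bearing (clockwise from north).
    offset_deg = math.degrees(math.atan((bbox_center_u_px - image_width_px / 2) / focal_px))
    bearing_rad = math.radians(camera_bearing_deg + offset_deg)
    # Simplification: distance_m is treated as the range along the ray, which is a fair
    # approximation for objects near the image centre.
    return (
        camera_north + distance_m * math.cos(bearing_rad),
        camera_east + distance_m * math.sin(bearing_rad),
    )


# A detection centred in the image of a north-facing camera at east = 50 m, 100 m away.
print(locate_detection(0.0, 50.0, 0.0, 960.0, 1920, 40.0, 100.0))  # (100.0, 50.0)
```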

### 3. Sensor fusion
Finally, having identified and localised the objects in each sensor's view, we can fuse the per-sensor detections into a single common operational picture.
This is done by matching identified objects across sensors if they are similar.
How similarity is defined is up to the engineer (or a neural network).
For this exercise, we define a distance metric between object pairs and use it to build a cost matrix for the assignment problem.
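A minimal sketch of one possible distance metric and the resulting cost matrix, assuming each object carries a class label and an estimated (north, east) position. The metric is the Euclidean map distance plus a large penalty when class labels differ, so the assignment prefers pairing a tank with a tank. `Detection`, `CLASS_MISMATCH_PENALTY` and the penalty value are hypothetical stand-ins, not the exercise's `IdentifiedObject` API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Detection:
    """Hypothetical stand-in for IdentifiedObject: class label plus estimated map position in meters."""

    class_name: str
    north: float
    east: float


# Assumption: a cost far larger than any plausible map distance, so class mismatches are avoided.
CLASS_MISMATCH_PENALTY = 1e6


def distance_metric(a: Detection, b: Detection) -> float:
    """Euclidean distance in the map, plus a penalty when the class labels differ."""
    cost = float(np.hypot(a.north - b.north, a.east - b.east))
    if a.class_name != b.class_name:
        cost += CLASS_MISMATCH_PENALTY
    return cost


def cost_matrix(objects_1: list[Detection], objects_2: list[Detection]) -> np.ndarray:
    """Entry [i, j] holds the cost of matching objects_1[i] with objects_2[j]."""
    return np.array([[distance_metric(a, b) for b in objects_2] for a in objects_1])


# Illustrative positions, not outputs of the exercise.
sensor_1_objects = [Detection("tank", 100.0, 50.0), Detection("car", 80.0, 30.0)]
sensor_2_objects = [Detection("car", 82.0, 30.0), Detection("tank", 101.0, 50.0)]
print(cost_matrix(sensor_1_objects, sensor_2_objects))
```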

First, let's visualise the common operational picture without fusing identified objects, and print the sensor-to-object mapping produced by the object recognition algorithm.

```python
def match_and_fuse_identified_objects(
    _sensor_object_mapping: dict[SensorId, list[IdentifiedObject]],
) -> list[IdentifiedObject]:
    # Placeholder: the fusion logic is implemented in a later cell.
    raise NotImplementedError


fusion_station = FusionStation(sensor_array, object_recognition, match_and_fuse_identified_objects)
sensor_object_mapping = fusion_station.execute_without_fusion()

logger.info("Sensor object mapping:")
for sensor_id, identified_objects in sensor_object_mapping.items():
    logger.info("Sensor ID: %s with identified objects: %s", sensor_id, identified_objects)
```

**TODO**: Next, let's fuse the identified objects into a common representation and visualise the common operational picture again. As a simplification for this exercise, we know that both the car and the tank are visible to each camera; hence, a 2-to-2 assignment exists.

**Hint**: after defining a 2x2 cost matrix, you can solve the assignment problem with [scipy's bipartite-graph matching](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html).
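The assignment step can be sketched as follows, assuming the cost matrix rows index the objects of sensor 1 and its columns those of sensor 2. `linear_sum_assignment` minimises the total cost by default and returns the matched row and column indices. Fusing each matched pair by averaging the two position estimates is just one simple choice, and the helper `match_and_average` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_and_average(
    positions_1: np.ndarray, positions_2: np.ndarray, cost: np.ndarray
) -> list[tuple[int, int, np.ndarray]]:
    """Solve the assignment on `cost` and average the (north, east) positions of each matched pair."""
    row_ind, col_ind = linear_sum_assignment(cost)  # minimises the total cost by default
    return [(int(i), int(j), (positions_1[i] + positions_2[j]) / 2.0) for i, j in zip(row_ind, col_ind)]


# Illustrative (north, east) estimates, not outputs of the exercise.
positions_1 = np.array([[100.0, 50.0], [80.0, 30.0]])  # sensor 1: object A, object B
positions_2 = np.array([[82.0, 30.0], [101.0, 50.0]])  # sensor 2: object B, object A
# Pairwise Euclidean distances via broadcasting: cost[i, j] = |positions_1[i] - positions_2[j]|.
cost = np.linalg.norm(positions_1[:, None, :] - positions_2[None, :, :], axis=-1)
for i, j, fused in match_and_average(positions_1, positions_2, cost):
    print(f"sensor 1 object {i} <-> sensor 2 object {j}: fused position {fused}")
```

Averaging treats both sensors as equally reliable; a weighted average would be a natural refinement if one estimate is known to be noisier.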

```python
def match_and_fuse_identified_objects(
    sensor_object_mapping: dict[SensorId, list[IdentifiedObject]],
) -> list[IdentifiedObject]:
    """
    Matches the identified objects of the first sensor against the identified objects of the second sensor.
    First, a distance metric is defined to quantify the similarity/dissimilarity of object pairs.
    Based on this metric, an optimal assignment between object pairs is found to
    output a list of fused identified objects.
    """

    def distance_metric(object1: IdentifiedObject, object2: IdentifiedObject) -> Any: ...

    ...
    # ------------------ TODO: Fill the missing code, ~20min
    #
    #
    # ------------------


fusion_station = FusionStation(sensor_array, object_recognition, match_and_fuse_identified_objects)
fusion_station.execute_with_fusion()
```