# Assignment 02a - 3D Pedestrian Detection based on a Single Camera


## Goals
The goal of this assignment is to obtain **pedestrian locations in the 3D world (camera reference frame)** on the dataset sequence you've been working on in the practica and in [`01a data visualization`](fa_01a_data_visualization.ipynb).

You should present a solution which works with a **single camera only**.
You are not allowed to make any use of the pedestrian ground truth in the test sequence to improve your detector (e.g. size, appearance, location of pedestrians).
Your detector should in principle work equally well on other unseen test sequences (i.e. be generalizable).


Your approach should work in the following conditions:
- minimum distance of pedestrian to camera: 5 m
- maximum distance of pedestrian to camera: 40 m
- minimum pedestrian height: 1.2 m
- maximum pedestrian height: 1.9 m
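
To get a feel for what these bounds imply in the image, the pinhole camera model gives a projected height of roughly `focal_length * height / distance`. The following is a minimal sketch under that model; the focal length of 1000 px is a hypothetical value chosen for illustration and is not taken from the dataset camera.

```python
def projected_height_px(height_m, distance_m, focal_length_px):
    """Approximate image height (in pixels) of an upright object under the pinhole model."""
    return focal_length_px * height_m / distance_m


# Hypothetical focal length; replace it with the value of the actual camera.
FOCAL_LENGTH_PX = 1000.0

# Shortest pedestrian at the maximum distance and tallest pedestrian at the minimum distance.
smallest = projected_height_px(1.2, 40.0, FOCAL_LENGTH_PX)
largest = projected_height_px(1.9, 5.0, FOCAL_LENGTH_PX)
print(f"projected heights range from {smallest:.0f} px to {largest:.0f} px")
```

Under this assumption the region proposals have to cover more than one order of magnitude in pixel height, which is one reason to derive their size from the distance rather than use a fixed window.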

All notes mentioned in [`00 overview`](fa_00_overview.ipynb) apply, such as the policy on code reuse.

## Input
As in [`01a data visualization`](fa_01a_data_visualization.ipynb), you will work with the custom `Sequence` of `Dataset` with `start_index=1430` and `end_index=1545`.

## Output
- Plots, visualizations and videos within this notebook (but please clear the outputs before handing in the ipynb files)
- Answers to the questions within this notebook
- Evaluation metrics and plots representing the performance of your approach
- A dictionary called `frame_pedestrian_dicts` of the format:

```python
{
1430: [
  {'label_class': 'Pedestrian',
   "extent_object": array([0.41, 0.55, 1.72]),
   'T_cam_object': array([[-0.013857  , -0.9997468 ,  0.01772762,  6.00276488],
          [ 0.10934269, -0.01913807, -0.99381983,  2.63204144],
          [ 0.99390751, -0.01183297,  0.1095802 , 10.77843551],
          [ 0.        ,  0.        ,  0.        ,  1.        ]]),
   'score': 0.851},
  {'label_class': 'Pedestrian',
   "extent_object": array([0.5, 0.5, 1.7]),
   'T_cam_object': array([[-1.38569996e-02, -9.99746799e-01,  1.77276209e-02,
           -1.47727574e+01],
          [ 1.09342687e-01, -1.91380698e-02, -9.93819833e-01,
            1.06022264e+01],
          [ 9.93907511e-01, -1.18329702e-02,  1.09580196e-01,
            2.95320115e+01],
          [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
            1.00000000e+00]]),
   'score': 0.56},
    ...
   ],
1431: [
  {'label_class': 'Pedestrian',
   "extent_object": array([0.41, 0.55, 1.72]),
   'T_cam_object': array([[-0.013857  , -0.9997468 ,  0.01772762,  6.00276488],
          [ 0.10934269, -0.01913807, -0.99381983,  2.63204144],
          [ 0.99390751, -0.01183297,  0.1095802 , 10.77843551],
          [ 0.        ,  0.        ,  0.        ,  1.        ]]),
   'score': 0.851},
    ...
   ],
}
```
The keys represent the frame_index of the `Measurements` within the `Sequence`, each value is a list of `pedestrian_dict`s which resemble the format of the annotated 3D object labels (cf. `Measurements.get_labels_camera()`):
- `label_class`: should be `'Pedestrian'` for all objects
- `extent_object` describes the object extent in `[length, width, height]` (along object's `[x, y, z]` axes, respectively). Cf. Practicum 1.
- `T_cam_object` describes the transformation from the `object` local frame (center bottom of object) into the camera frame `cam` as a homogeneous transformation matrix. The orientation of the object (given by the top-left 3x3 sub-matrix of `T_cam_object`) does not matter, so you can use the identity (`np.eye(3)`). The translation column `T_cam_object[0:3, 3]` describes the bottom center point of the object in camera frame.
- `score`: a floating point number between 0.0 and 1.0 representing the objectness of the object, i.e. the probability of being a pedestrian (as opposed to background).

You can use other custom keys with further information in case you want to use it in a later notebook, but it should be json serializable (i.e., convert numpy arrays to python lists, first). 
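
As an illustration of the format above, the following sketch builds one `pedestrian_dict` with an identity rotation and checks that a converted copy is JSON serializable. The helper names `make_pedestrian_dict` and `to_json_serializable` are hypothetical and not part of the provided code; whether `save_frame_pedestrian_dicts` converts arrays itself is not stated here, so the conversion is shown only as an example.

```python
import json

import numpy as np


def make_pedestrian_dict(bottom_center_cam, extent_object, score):
    """Build a pedestrian_dict with identity orientation (hypothetical helper)."""
    T_cam_object = np.eye(4)
    # The translation column holds the bottom center point in the camera frame.
    T_cam_object[0:3, 3] = bottom_center_cam
    return {
        "label_class": "Pedestrian",
        "extent_object": np.asarray(extent_object, dtype=float),
        "T_cam_object": T_cam_object,
        "score": float(score),  # plain Python float instead of a NumPy scalar
    }


def to_json_serializable(pedestrian_dict):
    """Return a copy in which NumPy arrays are replaced by nested Python lists."""
    return {
        key: value.tolist() if isinstance(value, np.ndarray) else value
        for key, value in pedestrian_dict.items()
    }


example = make_pedestrian_dict([6.0, 2.6, 10.8], [0.5, 0.5, 1.7], np.float32(0.85))
json_string = json.dumps(to_json_serializable(example))
print(json_string)
```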

Please use
```python
from assignment.solution_helpers import save_frame_pedestrian_dicts
save_frame_pedestrian_dicts(frame_pedestrian_dicts)
```
to serialize `frame_pedestrian_dicts` to file (to be potentially loaded in a successive assignment notebook).
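
One possible way to assemble `frame_pedestrian_dicts` is to run the detector on every frame and key the results by frame index. The sketch below assumes that iterating over the sequence yields one `Measurements` object per frame, in order, starting at `start_index`; this is an assumption and not something the provided loader is confirmed to guarantee. `collect_frame_pedestrian_dicts` and `_StubDetector` are hypothetical names, and the stub only mimics the `set_measurements()` / `get_pedestrian_dicts()` interface of the `PedestrianDetector` class introduced further down.

```python
def collect_frame_pedestrian_dicts(detector, sequence, start_index):
    """Run the detector on each frame and key the detections by frame index."""
    frame_pedestrian_dicts = {}
    # Assumption: the sequence is ordered and its first element has index start_index.
    for frame_index, measurements in enumerate(sequence, start=start_index):
        detector.set_measurements(measurements)
        frame_pedestrian_dicts[frame_index] = detector.get_pedestrian_dicts()
    return frame_pedestrian_dicts


class _StubDetector:
    """Stand-in with the detector interface, used only to make this sketch runnable."""

    def set_measurements(self, measurements):
        self.measurements = measurements

    def get_pedestrian_dicts(self):
        return []  # a real detector would return a list of pedestrian_dicts


# Three placeholder frames instead of real Measurements objects.
frame_pedestrian_dicts = collect_frame_pedestrian_dicts(_StubDetector(), ["m0", "m1", "m2"], start_index=1430)
print(sorted(frame_pedestrian_dicts))
```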

### Q 02a.1 Specification of intended solution
In this notebook, you should implement an approach that could potentially run within a vehicle which is only equipped with a **single (mono) camera**, i.e. without LiDAR, radar or stereo/disparity information.

Please describe what you will implement to achieve the goal of obtaining 3D pedestrian locations via a single camera.
Here are some questions to stimulate the specification of your solution:
1. What processing steps and building blocks are needed?
2. Which concepts from the lectures are you planning to incorporate?
3. Which building blocks from the practica and assignment 01a will you be re-using?
4. Which intermediate results can you represent in plots/visualizations/tables to show the graders that your intended solution does the right thing?

### A 02a.1
**Your answer:** (maximum 400 words)
1. First, the detection phase must be addressed. Since no 3D sensor is available to support the detection, a simpler approach will be used. In particular, the ground plane will be taken into account to produce smarter proposal boxes than a plain sliding window approach, especially in terms of size. After the image has been split into smaller patches, the classification phase begins. Since only the given sequence is available, and it also serves as our test set, the most reasonable solution for classifying the image patches is to use a pretrained feature extractor and classifier. Finally, the positive detections will be "brought back" to the 3D environment in order to store the 3D positions of the pedestrians.
2. Since we are working with a single camera only, the knowledge about monocular vision will be fundamental, including the perspective camera model and the two types of transformations. The techniques for region proposal generation will also be useful, especially the part about proposals without 3D knowledge (except for the ground plane). Unfortunately, not much of this knowledge can be applied to the classifier part, due to the lack of a training set.
3. Practicum 1 has a very similar intent to this assignment, so a large part of its procedure and code will be extremely useful. In particular, the function `project_points` and the class `BoundingBox` from `practicum1` will be fundamental. In addition, the function `get_plane_meshgrid` from `fa_01a_data_visualization` will be used to create an initial meshgrid for the proposal boxes.
4. The plots and visualizations I intend to share are mostly k3d plots and images, showing the approach in 3D and 2D space respectively. Since the first part creates candidate boxes in the 3D world, a k3d plot will be shown, followed by its projection into 2D, which will be shown as images. Finally, the results of the last part will again be shown as a k3d plot, since they belong to the 3D environment.
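
The ground-plane idea from point 1 can be sketched as follows: a candidate foot point is placed on the plane, an upright box of standard size is erected on it, and its corners are projected into the image. This is a minimal sketch, not the final implementation; the projection matrix, the plane parameters and the helper names are hypothetical values chosen for illustration, and the camera frame is assumed to have its y axis pointing down.

```python
import numpy as np


def ground_point(plane_model, x, z):
    """Solve a*x + b*y + c*z + d = 0 for y to get a point on the ground plane."""
    a, b, c, d = plane_model
    return np.array([x, -(a * x + c * z + d) / b, z])


def project(P, point_3d):
    """Project a 3D camera-frame point with a 3x4 projection matrix, returning (u, v)."""
    uvw = P @ np.append(point_3d, 1.0)
    return uvw[:2] / uvw[2]


def proposal_box(P, plane_model, x, z, height=1.7, width=0.8):
    """Return (u_min, v_min, u_max, v_max) of an upright box standing on the ground at (x, z)."""
    foot = ground_point(plane_model, x, z)
    # With y pointing down, the top of the box is at y - height.
    top_left = project(P, foot + np.array([-width / 2, -height, 0.0]))
    bottom_right = project(P, foot + np.array([width / 2, 0.0, 0.0]))
    return np.concatenate([top_left, bottom_right])


# Hypothetical camera: focal length 1000 px, principal point (960, 600).
P = np.array([[1000.0, 0.0, 960.0, 0.0],
              [0.0, 1000.0, 600.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
# Hypothetical flat ground 1.5 m below the camera, i.e. y = 1.5 in the camera frame.
plane = (0.0, -1.0, 0.0, 1.5)

near = proposal_box(P, plane, x=0.0, z=10.0)
far = proposal_box(P, plane, x=0.0, z=40.0)
print("box height at 10 m:", near[3] - near[1], "px")
print("box height at 40 m:", far[3] - far[1], "px")
```

The box size in pixels shrinks with the distance automatically, which is the advantage over a fixed-size sliding window.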

YOUR ANSWER HERE

### Q 02a.2 Proactive reflection
1. Which assumptions does your intended solution make?
2. In which situations might your intended solution fail?

### A 02a.2
**Your answer:** (maximum 200 words)
1. The main assumption of my solution is that the proposal boxes fit the pedestrians reasonably well. Since there is no additional aid from lidar or radar, the detection boxes must be built by a systematic, discrete method that obviously cannot fit the pedestrians perfectly every time. The assumption is therefore that the information provided at the end is precise enough for the scope of this problem.
2. Following from the previous answer, my solution is not suitable if the information in the final dictionary needs to be very precise in terms of both position and dimension of the object. In fact, the dimension will be kept at a standard value, since there is no way to determine automatically whether the detection box underfits or overfits the pedestrian. The position is probably slightly more precise, but it is again influenced by the box that is selected as "best fitting".
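
The position error introduced by the discrete grid alone can be bounded with a short calculation. Assuming a regular grid on the ground plane and assuming that a pedestrian is always assigned to the nearest grid point (which the classifier does not guarantee), the worst-case offset is half the diagonal of a grid cell. The function name below is hypothetical.

```python
import math


def max_grid_offset(x_step, z_step):
    """Worst-case distance from a point on the ground to its nearest grid point."""
    return 0.5 * math.hypot(x_step, z_step)


print(round(max_grid_offset(1.0, 1.0), 3))  # 1 m grid   -> 0.707
print(round(max_grid_offset(0.5, 0.5), 3))  # 0.5 m grid -> 0.354
```

Halving the step halves this bound, but it roughly quadruples the number of proposals that have to be classified.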

YOUR ANSWER HERE

# now: HAVE FUN & HAPPY CODING!

In [None]:
# some magic to ease iterative implementation
from IPython import get_ipython

ipython = get_ipython()
if ipython:
    ipython.run_line_magic("load_ext", "autoreload")
    ipython.run_line_magic("autoreload", "2")

If you want to import functions, e.g., from `fa_01a_data_visualization.ipynb` notebook, you can import them as shown below.
If you run into any errors when importing via `ipynb`, please check the hints within [`00 Overview`](fa_00_overview.ipynb).

In [None]:
from ipynb.fs.defs.fa_01a_data_visualization import get_bounding_box_from_object

As stated above, you should provide a single-camera only solution.
So technically speaking, your approach **should not use the following methods** from the `Measurements` object:

In [None]:
from common.sequence_loader import Measurements

forbidden_method_tokens = {"radar", "lidar", "right", "disparity"}
[method_name for method_name in dir(Measurements) if any(token in method_name for token in forbidden_method_tokens)]

## PedestrianDetector class
For your convenience, we created a class `PedestrianDetector` you should use as a base-class for your implementation.

A `PedestrianDetector` encapsulates the functionality of obtaining 3D pedestrian locations in camera frame from a `Measurements` object, i.e. the data of sensors for a single point in time.
When instantiating with `is_debug=True`, it fills the `DebugOutputAggregator` (`self.doa`) with debug outputs, such as visualizations to be retrieved and visualized after detection.
This comes in handy for demonstrating intermediate steps of your approach. Setting `is_debug=False` skips the time-consuming visualizations and allows for faster execution.
After instantiating an object of the class, the intended steps are feeding a `Measurements` object with `set_measurements()` and obtaining a list of `pedestrian_dict`s by calling `get_pedestrian_dicts()`.

Please get familiar with the `PedestrianDetector` class below. And don't worry: we'll show you an example further down.

In [None]:
import numpy as np
from assignment.solution_helpers import DebugOutputAggregator
from common.sequence_loader import Measurements


class PedestrianDetector:
    """
    Pedestrian Detector class.

    Make a subclass of me which fulfills my interface.

    """

    def __init__(self, is_debug=False):
        """
        is_debug: whether to add debug output (plots, figures, images, strings) to self.doa
        """
        self.is_debug = is_debug
        self.doa = DebugOutputAggregator()
        self.measurements = None

    def set_measurements(self, measurements: Measurements):
        """
        measurements: Measurements object of the frame to detect pedestrians in.
        """
        self.measurements = measurements

        # clear debug output aggregator
        self.doa = DebugOutputAggregator()

    def get_pedestrian_dicts(self):
        """
        Perform 3D pedestrian detection on the current Measurements.

        return list of pedestrian_dicts
        """
        pass

## PedestrianDetector example subclass
To show you how to create a subclass of the `PedestrianDetector`, we provide you with the following example below. Please also note how the `DebugOutputAggregator` is being used to add visualizations for debugging and grading (such as k3d plots, Matplotlib figures, images or strings).
Please note that visualizations are only added to `self.doa` (`doa`: short for `DebugOutputAggregator`), if `self.is_debug` is set to `True`.
You should do the same with your implementation.

In [None]:
import cv2
import k3d
import matplotlib.pyplot as plt
from common.k3d_helpers import plot_axes, plot_box


class OraclePedestrianDetector(PedestrianDetector):
    """
    Oracle Pedestrian Detector as a dummy implementation.

    This class demonstrates how to make a subclass of PedestrianDetector and use
    the DebugOutputAggregator to show intermediate results, if is_debug is set to True.
    """

    def __init__(self, is_debug):
        # propagate is_debug flag to mama.
        super().__init__(is_debug)

    def get_pedestrian_dicts(self):
        """
        This function returns the ground truth pedestrian dicts as detections.

        Returning ground truth as detections is kinda cheating, but we do it for demonstration purposes.
        It also shows you that detections and ground truth labels should have the same representation
        to allow for reuse of visualization functions
        """
        # get ground truth labels
        label_dicts = self.measurements.get_labels_camera()
        # filter pedestrians
        pedestrian_dicts = [ld for ld in label_dicts if ld["label_class"] == "Pedestrian"]

        # remove bottom_center_cam to be json serializable later on
        for pedestrian_dict in pedestrian_dicts:
            del pedestrian_dict["bottom_center_cam"]

        # the following lines demonstrate the usage of DebugOutputAggregator to visualize intermediate results
        # as strings, k3d plots, matplotlib figures or images

        if self.is_debug:
            # add example string
            self.doa.add_string(
                **{
                    "name": "description-of-approach",
                    "description": "Description of the approach of OraclePedestrianDetector",
                    "string": "The OraclePedestrianDetector publishes the ground truth pedestrians as detections.\n"
                    "We make use of the fact that ground truth labels and detections are represented in the same format.\n"
                    "Using ground truth objects as detections within *your* solution does not earn you any points.",
                }
            )

        if self.is_debug:
            # add k3d plot with ground truth objects
            plot = k3d.plot(camera_auto_fit=False)
            plot.camera = [0.99, -7.18, -6.40, 0.19, -1.16, 4.17, -0.06, -0.89, 0.44]
            plot += plot_axes()  # camera frame
            # origin frame of k3d plot is the camera frame, so use identity to transform labels
            T_origin_camera = np.eye(4, dtype=np.float32)
            color_red = 0xFF0000
            for pedestrian_dict in pedestrian_dicts:
                plot_box(plot, pedestrian_dict, T_origin_camera=T_origin_camera, color=color_red)
            # add the k3d plot to debug output aggregator
            self.doa.add_k3d_plot(
                **{
                    "name": "ground-truth-pedestrian-dicts",
                    "description": "A plot showing ground truth pedestrian boxes in red.\n"
                    f"There are {len(pedestrian_dicts)} ground truth objects in the scene.\n"
                    "The origin of the plot is the camera frame.",
                    "plot": plot,
                }
            )

        if self.is_debug:
            # add bar plot of pedestrian heights
            fig, ax = plt.subplots()
            extents = np.asarray([pedestrian_dict["extent_object"] for pedestrian_dict in pedestrian_dicts])
            ax.bar(range(extents.shape[0]), extents[:, 2])
            ax.set_xlabel("ground truth pedestrian id")
            ax.set_ylabel("height of pedestrian [m]")
            ax.set_xticks(range(extents.shape[0]))
            plt.close(fig)
            self.doa.add_matplotlib_figure(
                **{
                    "name": "bar-chart-of-pedestrian_heights",
                    "description": "Heights of each pedestrian ground truth object in the scene",
                    "figure": fig,
                }
            )

        if self.is_debug:
            # add example image
            image_draw = self.measurements.get_camera_image()
            uv_image_center = (image_draw.shape[1] // 2, image_draw.shape[0] // 2)
            cv2.circle(image_draw, uv_image_center, radius=20, color=(0, 255, 255), thickness=2)
            self.doa.add_image(
                **{
                    "name": "image-with-circle",
                    "description": "An example image showing how to add images to the DebugOutputAggregator.\n"
                    "It contains a yellow circle in the image center.",
                    "image": image_draw,
                }
            )

        # we make use of the fact that labels and detections are to be represented in the same way
        return pedestrian_dicts

Now let's instantiate an `OraclePedestrianDetector` and run it on a single Measurements object to obtain `pedestrian_dicts` and look at the debug output afterwards.

In [None]:
# run OraclePedestrianDetector on a single frame
from common.sequence_loader import Dataset

dataset = Dataset()
start_index = 1430
end_index = 1545
sequence = dataset.get_custom_sequence(start_index, end_index)

measurements = next(iter(sequence))
pedestrian_detector = OraclePedestrianDetector(is_debug=True)
pedestrian_detector.set_measurements(measurements)
pedestrian_dicts = pedestrian_detector.get_pedestrian_dicts()
pedestrian_dicts

In [None]:
# iterate over all DebugOutputAggregator objects and show them accordingly below this cell
# this is the way we will look at your intermediate results as well
# it is important to us, that you create sufficient intermediate results
# and also use verbose descriptions of the debug outputs
# (as you would use for captions of figures in scientific papers)
#
# you can toggle scrolling of the output by selecting this cell and 'Cell' > 'Current Outputs' > 'Toggle Scrolling'
[None for i in iter(pedestrian_detector.doa)]

## Your own PedestrianDetector
Now it's time to create your own subclass of `PedestrianDetector` and detect pedestrians in 3D in the camera frame.
Please provide visualizations of a few intermediate steps in order to obtain partial credit for concepts/implementation and to show the graders that your approach provides the intended functionality.

We recommend starting with a simple approach and iteratively improving it based on the experience you gain along the way.

You can reuse/copy code and files from the practica from your own group.
If you copy code, please mark it as such, e.g. by adding a comment above denoting where you copied it from.
If you load files, you have to use the environment variable `SOURCE_DIR`, e.g., `os.path.join(os.environ["SOURCE_DIR"], "practicum1", "file_name")`.



In [None]:
# Create your subclass of PedestrianDetector here
# Then instantiate it to an object called `pedestrian_detector`
# and feed it with a single measurement of the provided sequence

from common.sequence_loader import Dataset

from ipynb.fs.defs.fa_01a_data_visualization import get_plane_meshgrid
from ipynb.fs.defs.practicum1 import project_points
from ipynb.fs.defs.practicum1 import BoundingBox
from common.visualization import draw_bbox_to_image
from common.visualization import showimage
from ImagePatch import ImagePatch
from BoundingBox import clip_bbox_to_image
import tensorflow as tf
import os
from preprocessing_fns import preprocessing_fn_mobilenet
from tensorflow.image import non_max_suppression

is_debug = True

# class MyFancyPedestrianDetector(PedestrianDetector):
#     ...
#
# pedestrian_detector = MyFancyPedestrianDetector(is_debug=is_debug)

# YOUR CODE HERE

class MyFancyPedestrianDetector(PedestrianDetector):
    
    # Copied the init function from the example above
    def __init__(self, is_debug):
        super().__init__(is_debug)
        
    # Define a variant of get_plane_meshgrid from fa_01a_data_visualization that deals with ground_plane in camera frame
    @staticmethod
    def get_plane_meshgrid_cf(plane_model, x_min, x_max, z_min, z_max, x_step, z_step):
        a, b, c, d = plane_model
        x = np.arange(x_min, x_max + x_step, x_step)
        z = np.arange(z_min, z_max + z_step, z_step)
        Xs, Zs = np.meshgrid(x, z)
        Ys = (a * Xs + c * Zs + d) / (- b)
        return Xs, Ys, Zs
    
    # Filter the points of the meshgrid to keep only those that project inside the camera image
    @staticmethod
    def filter_points(image, points, projection_matrix):
        # Convert the points to homogeneous coordinates and project them to 2D
        points_aug = np.hstack((points, np.ones((points.shape[0], 1))))
        uvs = project_points(projection_matrix, points_aug)
        # Keep only the points whose 2D projection lands inside the camera image
        image_height, image_width = image.shape[:2]
        inside_image = (
            (uvs[:, 0] >= 0) & (uvs[:, 0] < image_width)
            & (uvs[:, 1] >= 0) & (uvs[:, 1] < image_height)
        )
        return points[inside_image]
    
    # Define a function that calculates the corners of the bounding box in 3D environment
    @staticmethod
    def corners_from_grid(grid_points, height, width):
        corners = []
        for i in range(len(grid_points)):
            corners.append([grid_points[i, 0] - width/2, grid_points[i, 1], grid_points[i, 2]])
            corners.append([grid_points[i, 0] + width/2, grid_points[i, 1], grid_points[i, 2]])
            corners.append([grid_points[i, 0] - width/2, grid_points[i, 1] - height, grid_points[i, 2]])
            corners.append([grid_points[i, 0] + width/2, grid_points[i, 1] - height, grid_points[i, 2]])
        return np.array(corners)
    
    
    # This function generates all the BoundingBox objects to extract the image patches from the image.
    # Rather than using a simple sliding window approach, this procedure first builds a series of possible regions in the 3D
    # environment where people may be, later it projects those points in the image to get the region proposal. This method is
    # based on the knowledge of the ground plane equation that it uses to identify a series of possible pedestrian positions in
    # the image. The advantage of this method with respect to the sliding window approach is that it automatically generates
    # region proposals of different dimension based on the distance from the camera, and it only proposes regions "standing" on
    # the ground plane.
    def get_bbox(self):
        
        # Get the ground plane from the current measurements
        ground_plane = self.measurements.get_ground_plane()
        
        # Create a meshgrid to identify all the possible points on the ground plane that will be analysed
        # Define parameters for the meshgrid
        z_min = 5
        z_max = 40
        x_min = -50   # We can choose very large margins since the points will be filtered later
        x_max = 50
        z_step = 1   # A smaller step gives a denser grid (finer localization) but a slower runtime (step=0.5 takes 15.89s, step=1 takes 7.23s)
        x_step = 1
        # Get points of ground plane
        Xs, Ys, Zs = MyFancyPedestrianDetector.get_plane_meshgrid_cf(ground_plane, x_min, x_max, z_min, z_max, x_step, z_step)
        XYZs = np.vstack([Xs.ravel(), Ys.ravel(), Zs.ravel()]).T
        
        # Get the image and the projection matrix from measurements
        image = self.measurements.get_camera_image()
        projection_matrix = self.measurements.get_camera_projection_matrix()
        # Filter points that fit in camera image
        XYZs_filtered = MyFancyPedestrianDetector.filter_points(image, XYZs, projection_matrix)
        
        # Calculate the corners for the bounding boxes in 3D world
        corners_3d = MyFancyPedestrianDetector.corners_from_grid(XYZs_filtered, 1.70, 0.8)
        
        # Plot the meshgrid with the corresponding corners in the 3D world
        if self.is_debug:
            plot = k3d.plot()
            plot += plot_axes(np.eye(4, dtype=np.float32))
            plot += k3d.points(positions=XYZs_filtered.astype(np.float32), point_size=0.2, color=0xff0000)
            plot += k3d.points(positions=corners_3d.astype(np.float32), point_size=0.1, color=0x00ff00)
#             plot.display()
            self.doa.add_k3d_plot(
                **{
                    "name": "Possible detections in 3D world",
                    "description": "A plot showing all the possible detections represented in a 3D environment. \n"
                    "Notice the cone shaped disposition due to the filtering of the points.",
                    "plot": plot,
                }
            )
        
        # Project the points in 2D
        corners_3d_aug = np.hstack((corners_3d, np.ones((corners_3d.shape[0], 1))))        
        corners_2d = project_points(projection_matrix, corners_3d_aug)
        # Reshape the array to get the corners of each bounding box per row
        corners_2d = corners_2d.reshape(-1, 4, 2)
        
        # Create a list with a bounding box for each set of corners
        bbox_list = []
        for i in range(corners_2d.shape[0]):
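            # Corner 2 is the top-left and corner 1 the bottom-right corner of the box, each stored as (u, v)
            top_left = corners_2d[i, 2].astype(np.int32)
            bottom_right = corners_2d[i, 1].astype(np.int32)
            bbox = BoundingBox(top_left[1], top_left[0], bottom_right[1], bottom_right[0], from_corners=True)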
            bbox_list.append(bbox)
        
        # Print the image with all the possible bounding boxes
        if self.is_debug:
            image_test = self.measurements.get_camera_image()
            for i in range(len(bbox_list)):
                draw_bbox_to_image(image_test, bbox_list[i])
#             showimage(image_test)
            self.doa.add_image(
                **{
                    "name": "All possible bboxes",
                    "description": "The image shows in the 2D environment all the bounding boxes that will be considered for the detection.\n",
                    "image": image_test,
                }
            )
        
        # Finally, create the proposals based on the bounding boxes
        frame_proposals = []
        for i in range(len(bbox_list)):
            clip_bbox_to_image(bbox_list[i], image.shape[:2])
            patch_image = image[bbox_list[i].v:bbox_list[i].v+bbox_list[i].h, bbox_list[i].u:bbox_list[i].u+bbox_list[i].w]
            patch = ImagePatch(patch_image, bbox_list[i])
            frame_proposals.append(patch)
            
        return [XYZs_filtered, frame_proposals]
    
    # Apply the pretrained classifier to the proposals and identify the ones containing a pedestrian.
    # A large part of the following code has been copied from practicum1.
    def find_pedestrian(self):
        
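        # First, define the classifier. A pretrained classifier will be used due to the lack of a training set.
        # The model is loaded only once and cached on the instance, since reloading it for every frame is slow.
        if getattr(self, "patch_classifier", None) is None:
            self.patch_classifier = tf.keras.models.load_model(os.path.join(os.environ["SOURCE_DIR"], "practicum1", "pedestrian_classifier"))
        patch_classifier = self.patch_classifier
        
        # Get the frame_proposals with the get_bbox function
        XYZs_filtered, frame_proposals = self.get_bbox()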
        
        # Preprocess the patches and classify them
        frame_patches = np.concatenate([preprocessing_fn_mobilenet(proposal_patch.image) for proposal_patch in frame_proposals], 0)
        predictions = patch_classifier.predict(frame_patches)
        # Add the score feature of each patch to its description
        for i, pred in enumerate(predictions):
            frame_proposals[i].score = pred
        # Filter the patches to extract only those representing a pedestrian with high certainty
        threshold = 0.6
        pedestrian_patches = [proposal_patch for proposal_patch in frame_proposals if proposal_patch.score >= threshold]
        
        # Show all the pedestrian predictions in the image
        if self.is_debug:
            image_test_1 = self.measurements.get_camera_image()
            for pedestrian in pedestrian_patches:
                bbox = pedestrian.bbox
                draw_bbox_to_image(image_test_1, bbox, color=(255,0,0))
#             showimage(image_test_1)
            self.doa.add_image(
                **{
                    "name": "Pedestrian patches in the image",
                    "description": "All the patches classified as containing a pedestrian are shown in the image\n",
                    "image": image_test_1,
                }
            )
        
        # Reduce the overlapping proposals to a single one using the NMS algorithm.
        # I had to modify the code from practicum 1, since it only generated one patch even for two distinct pedestrians.
        all_bboxes = []
        confidences = []
        nms_patches = []
        overlap_thresh = 0.01
        for frame in pedestrian_patches:
            bbox = np.asarray([frame.bbox.get_bbox_corners()])
            bbox = bbox.reshape(4,)
            all_bboxes.append(bbox)
            confidence = np.asarray([frame.score[0]])
            confidences.append(confidence)
        confidences = np.array(confidences)
        n_confidences = confidences.shape[0]
        confidences = confidences.reshape(n_confidences,)
        if len(all_bboxes) > 0:
            idx = non_max_suppression(np.array(all_bboxes), np.array(confidences), max_output_size=len(all_bboxes), iou_threshold=overlap_thresh)
            for i in idx:
                nms_patch = pedestrian_patches[i]
                nms_patches.append(nms_patch)

        # Print the final result
        if self.is_debug:
            image_test_2 = self.measurements.get_camera_image()
            for pedestrian in nms_patches:
                bbox = pedestrian.bbox
                draw_bbox_to_image(image_test_2, bbox, color=(255,0,0))
#             showimage(image_test_2)
            self.doa.add_image(
                **{
                    "name": "Pedestrian patches in the image after NMS",
                    "description": "The patch with highest certainty of containing a pedestrian is shown on the image\n",
                    "image": image_test_2,
                }
            )
        
        return [XYZs_filtered, frame_proposals, nms_patches]
        
    # To get the position of the pedestrian in 3D world from the "successful" patch, the initial meshgrid is used.
    # The idea is to find the position on the initial meshgrid that generated the pedestrian patch.
    def get_T_cam_object(self):

        # Get the data from the previous functions
        XYZs_filtered, frame_proposals, nms_patches = self.find_pedestrian()

        # Find the index of each pedestrian patch among the initial frame proposals.
        # The NMS patches are the very same objects as the proposals, so an identity check is sufficient.
        # Appending index and score together keeps both lists aligned, even if two detections share the same score.
        indexes = []
        scores = []
        for i, proposal in enumerate(frame_proposals):
            if any(proposal is nms_patch for nms_patch in nms_patches):
                indexes.append(i)
                scores.append(proposal.score)

        # Use the found indexes on the initial grid to find the position of the pedestrian.
        pedestrian_pos = []
        for i in indexes:
            pedestrian_pos.append(XYZs_filtered[i])
        pedestrian_pos = np.array(pedestrian_pos)
        
        # Add a k3d.plot to show pedestrian position in 3D space
        if self.is_debug:
            plot_1 = k3d.plot()
            plot_1 += plot_axes(np.eye(4, dtype=np.float32))
            plot_1 += k3d.points(positions=XYZs_filtered.astype(np.float32), point_size=0.2, color=0xff0000)
            for pos in pedestrian_pos:
                plot_1 += k3d.points(positions=pos.astype(np.float32), point_size=0.5, color=0x0000ff)
#             plot_1.display()
            self.doa.add_k3d_plot(
                **{
                    "name": "Pedestrian position in initial meshgrid",
                    "description": "The blue dot shows the pedestrian patch position within all the initial proposals in 3D space.",
                    "plot": plot_1,
                }
            )
            
        # Create a transformation matrix with a predefined standard rotation and the bbox position just found.
        T_cam_object = []
        for i in range(len(pedestrian_pos)):
            T_cam_object.append(np.array([[0, -1,  0, pedestrian_pos[i, 0]],
                                          [0,  0, -1, pedestrian_pos[i, 1]],
                                          [1,  0,  0, pedestrian_pos[i, 2]],
                                          [0,  0,  0,                   1]]))
        
        return [T_cam_object, scores]
    
    # Makes use of all the other functions to produce the final dicts
    def get_pedestrian_dicts(self):
        
        # Get the T_cam_object and the score for every detection
        T_cam_object, scores = self.get_T_cam_object()
        
        # Create the dictionary with predefined values for 'label_class' and 'extent_object'
        pedestrian_dicts = []
        for i in range(len(T_cam_object)):
            score = scores[i]
            p_dict = {'label_class': 'Pedestrian',
                      # For the extent_object, no way was found to precisely detect the size of the pedestrian based on the
                      # bounding box dimension since it does not reliably fit on the image, therefore some standard dimensions
                      # are used.
                      'extent_object': np.array([0.8, 0.8, 1.70]),
                      'T_cam_object': T_cam_object[i],
                      'score': score[0]}
            pedestrian_dicts.append(p_dict)
            
        return pedestrian_dicts
    
        

pedestrian_detector = MyFancyPedestrianDetector(is_debug=is_debug)

# raise NotImplementedError()

dataset = Dataset()
sequence = dataset.get_custom_sequence(start_index, end_index)

# get first measurements object of the sequence
measurements = next(iter(sequence))
# measurements = sequence[start_index + 55]

# feed measurements
pedestrian_detector.set_measurements(measurements)

pedestrian_dicts = pedestrian_detector.get_pedestrian_dicts()
pedestrian_dicts

In [None]:
# make sure each pedestrian_dict has all required keys present
required_keys = {"label_class", "extent_object", "T_cam_object", "score"}
for pedestrian_dict in pedestrian_dicts:
    assert required_keys.issubset(set(pedestrian_dict.keys()))

In [None]:
# make sure the pedestrian_detector object is a (duck-typed) PedestrianDetector subclass
assert isinstance(pedestrian_dicts, list)
assert {"doa", "get_pedestrian_dicts"}.issubset(set(dir(pedestrian_detector)))

In [None]:
# let's have a look at your debug outputs
# show debug outputs
[None for i in iter(pedestrian_detector.doa)]

In [None]:
# DO NOT DELETE THIS CELL!

In [None]:
# DO NOT DELETE THIS CELL!

### Localize pedestrians on the whole sequence

Please assemble the target structure `frame_pedestrian_dicts` below by iterating over the sequence and obtaining all pedestrian dicts.

Along the way, we collect timing information to get a feeling for the efficiency of the approach.
We are aware that approaches are hard to compare on different hardware, so *real-time* is not the goal we set here.
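
As a rough illustration of how per-frame timing can be collected while iterating, here is a minimal sketch. It does not reproduce `DurationAggregator`: the generator `timed` and the list `durations_s` are hypothetical names, and the `sleep` call merely stands in for the detector.

```python
import time


def timed(iterable, durations_s):
    """Yield items from `iterable` and record how long the caller spends on each one."""
    for item in iterable:
        start = time.perf_counter()
        yield item  # control returns to the caller's loop body here
        durations_s.append(time.perf_counter() - start)


durations_s = []
for frame in timed(range(3), durations_s):
    time.sleep(0.01)  # placeholder for the per-frame detector call

mean_duration_s = sum(durations_s) / len(durations_s)
print(f"mean duration: {mean_duration_s:.3f} s over {len(durations_s)} frames")
```

Because the duration is recorded when the generator resumes, the measurement covers everything done in the loop body for that frame.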

In [None]:
from assignment.solution_helpers import DurationAggregator
from tqdm.notebook import tqdm

sequence = dataset.get_custom_sequence(start_index, end_index)
frame_pedestrian_dicts = {
    1430: [
        {
            # ...
        },
    ]  # frame_index as key. Fill me with pedestrian_dicts using your subclass of PedestrianDetector
}

is_debug = False
pedestrian_detector = None  # overwrite me with your instantiated pedestrian detector class
# YOUR CODE HERE

# Instantiate pedestrian detector
pedestrian_detector = MyFancyPedestrianDetector(is_debug=is_debug)

# raise NotImplementedError()

# log time for running detector on each measurements instance
duration_aggregator = DurationAggregator(is_print_durations=True)
for measurements in tqdm(duration_aggregator.aggregate_durations(sequence), total=len(sequence)):

    pedestrian_detector.set_measurements(measurements)
    refined_proposal_dicts_nms = pedestrian_detector.get_pedestrian_dicts()
    frame_pedestrian_dicts[measurements.get_index()] = refined_proposal_dicts_nms

In [None]:
# show mean duration for processing a single frame to answer the question 02a.3 below.
assert len(duration_aggregator) == len(sequence)
mean_duration_s = duration_aggregator.get_mean_duration_s()
print(f"mean duration: {mean_duration_s:.2f} s")

### Q 02a.3 Runtime
Please reflect on the mean duration of your algorithm.
1. What is the mean duration per timestep of your pedestrian detector on your machine? (see output of cell above)
2. How much speed-up would be needed in order to run it 'real-time' within a car given a sensor measurement update rate of 10 Hz?

Don't overoptimize: your approach should run at most 30 s per timestep (to keep our inference time during grading manageable), though something around 1-3 s per timestep seems a realistic goal.
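
The required speed-up follows directly from the mean duration and the sensor update rate. A minimal sketch, where `required_speedup` is a hypothetical helper and the 2.0 s duration is only an example value:

```python
def required_speedup(mean_duration_s, update_rate_hz=10.0):
    """Factor by which processing must get faster to finish within one sensor period."""
    # The time budget per frame is 1 / update_rate_hz (0.1 s at 10 Hz),
    # so duration / budget equals duration * update_rate_hz.
    return mean_duration_s * update_rate_hz


# Example value only: a detector that needs 2.0 s per frame
speedup = required_speedup(2.0)
print(f"required speed-up: {speedup:.1f}x")
```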

### A 02a.3
**Your answer:** (maximum 150 words)
1. The mean duration per timestep is 7.65 s with a meshgrid step of 1 m. With a step of 0.5 m, it is 15.89 s (without a warm cache the first steps may take longer and therefore increase the overall average).
2. At 7.65 s per timestep the algorithm processes about 0.13 frames per second, while real-time operation at 10 Hz requires 10 frames per second, i.e. at most 0.1 s per frame. The needed speed-up is therefore about 7.65 s / 0.1 s ≈ 76.5×.


In [None]:
import json

# print(frame_pedestrian_dicts[1430])
# check for proper format
from assignment.solution_helpers import save_frame_pedestrian_dicts

# make sure all frames within the sequence are filled with frame pedestrian dicts
assert set(frame_pedestrian_dicts.keys()) == set(sequence.get_indices())

# check for type of output
for fpds in frame_pedestrian_dicts.values():
    for fpd in fpds:
        assert {"label_class", "extent_object", "T_cam_object"}.issubset(set(fpd.keys()))
        assert fpd["T_cam_object"].shape == (4, 4)
        assert fpd["label_class"] == "Pedestrian"

# use save_frame_pedestrian_dicts with is_dry_run=True to check for serializability
is_serializable = True
try:
    save_frame_pedestrian_dicts(frame_pedestrian_dicts, is_dry_run=True)
except TypeError as e:
    print("Error, frame_pedestrian_dicts is not json serializable: %s" % str(e))
    is_serializable = False
if not is_serializable:
    assert False, "See error above"

## Quantitative Evaluation (Image Projections)
Let's evaluate your detector by comparing the projected 2D bounding boxes of the `frame_pedestrian_dicts` you obtained with your approach against ground truth pedestrian bounding boxes (cf. [Practicum 1](../practicum1/practicum1.ipynb)).
Evaluation metrics will be Precision-Recall curves, average precision (IoU=0.2) and mean average precision (mAP).
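
For reference, the intersection over union (IoU) of two axis-aligned boxes can be sketched as follows. This is not the Practicum 1 implementation: `bbox_iou` is a hypothetical helper, and the corner layout `(min_0, min_1, max_0, max_1)` is an assumption that may differ from the order used in `gt_bboxes`.

```python
def bbox_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (min_0, min_1, max_0, max_1)."""
    # Overlap along each axis, clipped at zero when the boxes do not intersect
    inter_0 = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_1 = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_0 * inter_1
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes shifted by half their width overlap in a 5x10 region
iou = bbox_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(f"IoU: {iou:.3f}")
```

A detection counts as a match only if its IoU with a ground truth box reaches the chosen IoU threshold, so a threshold of 0.2 is fairly tolerant of poorly fitting boxes.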

### Ground Truth bounding boxes (image projections)
We fill `gt_bboxes` with a list of pedestrian bounding box coordinates for each image frame (cf. Practicum 1 evaluation).

In [None]:
from assignment.evaluation_helpers import get_gt_bboxes
gt_bboxes = get_gt_bboxes(sequence)
gt_bboxes[0]  # frame 1430

In [None]:
assert len(gt_bboxes) == len(sequence)
assert all(len(bbox) == 4 for bboxes in gt_bboxes for bbox in bboxes)
assert gt_bboxes[0][0] == (762, 1746, 958, 1852)

### Prediction bounding boxes (image projections)
Now, we assemble `sequence_proposals` out of your `frame_pedestrian_dicts` similar to Practicum 1.
`sequence_proposals` contains a list of `ImagePatch`es for each frame.
Each `ImagePatch` contains the projected 2D bounding box and the detector score.

In [None]:
from assignment.evaluation_helpers import get_sequence_proposals
sequence_proposals = get_sequence_proposals(sequence, frame_pedestrian_dicts)
sequence_proposals[0]  # frame 1430

In [None]:
from practicum1.ImagePatch import ImagePatch

assert len(sequence_proposals) == len(gt_bboxes)
assert len(sequence_proposals) == len(sequence)
# sequence_proposals should be of type ImagePatch and have score of proper shape and range
assert all(isinstance(sp, ImagePatch) for sps in sequence_proposals for sp in sps)
assert all(
    len(sp.score) == 1 for sps in sequence_proposals for sp in sps
), "score as in practicum1 needs to be a one-element list"
assert all(sp.score[0] >= 0.0 for sps in sequence_proposals for sp in sps)
assert all(sp.score[0] <= 1.0 for sps in sequence_proposals for sp in sps)
for frame_index, pedestrian_dicts in enumerate(frame_pedestrian_dicts.values()):
    assert len(sequence_proposals[frame_index]) == len(pedestrian_dicts)

### Metrics Dict (image projections)
We use `generate_metrics_dict` as in Practicum 1 to evaluate `sequence_proposals` against `gt_bboxes` for the given `discrimination_thresholds` and `iou_thresholds`.
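
To make the role of the discrimination thresholds concrete, here is a simplified sketch of how one precision/recall pair per threshold can be computed. It is not `generate_metrics_dict`: the function `precision_recall` is hypothetical, it assumes that each detection has already been flagged as true or false positive for a fixed IoU threshold, and returning a precision of 1.0 when no detection survives is a convention chosen here.

```python
import numpy as np


def precision_recall(scores, is_true_positive, num_gt, thresholds):
    """Return an array of shape (len(thresholds), 2) with columns (precision, recall)."""
    scores = np.asarray(scores, dtype=float)
    is_tp = np.asarray(is_true_positive, dtype=bool)
    rows = []
    for threshold in thresholds:
        keep = scores >= threshold  # detections surviving this threshold
        tp = np.count_nonzero(is_tp & keep)
        fp = np.count_nonzero(~is_tp & keep)
        precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
        recall = tp / num_gt if num_gt > 0 else 0.0
        rows.append((precision, recall))
    return np.array(rows)


# Three detections, two of them correct, two ground truth pedestrians in total
pr = precision_recall([0.9, 0.6, 0.3], [True, False, True], num_gt=2, thresholds=[0.0, 0.5, 1.0])
print(pr)
```

Raising the threshold removes low-scoring detections, which typically trades recall for precision.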

In [None]:
from practicum1.evaluation import generate_metrics_dict

discrimination_thresholds = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
iou_thresholds = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

metrics_dict = generate_metrics_dict(sequence_proposals, gt_bboxes, discrimination_thresholds, iou_thresholds)

metrics_dict

In [None]:
assert set(metrics_dict.keys()) == set(iou_thresholds)
assert all(v.shape == (len(discrimination_thresholds), 2) for v in metrics_dict.values())

### Precision-Recall Curve (image projections)
Let's plot the Precision-Recall curve for an IoU threshold of 0.2 (the slider lets you explore other thresholds interactively).
See Practicum 1.

In [None]:
from ipynb.fs.defs.practicum1 import plot_pr_curve
from ipywidgets import fixed, interact, FloatSlider

interact(plot_pr_curve, metrics_dict=fixed(metrics_dict), iou_thresh=FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2))

### Average Precision (image projections)
What is the `average_precision` for `iou_threshold = 0.2`?
Let's reuse code from Practicum 1.
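
The average precision is the area under the Precision-Recall curve. As an illustration of what that area computation amounts to, here is a sketch using the trapezoidal rule in plain NumPy. The helper `average_precision_trapezoid` is hypothetical, and sorting the points by recall is a choice made here to keep the sketch independent of the input order.

```python
import numpy as np


def average_precision_trapezoid(precisions, recalls):
    """Area under the precision-recall curve via the trapezoidal rule."""
    precisions = np.asarray(precisions, dtype=float)
    recalls = np.asarray(recalls, dtype=float)
    order = np.argsort(recalls)  # integrate along increasing recall
    p, r = precisions[order], recalls[order]
    # Each segment contributes width * mean height
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))


# Example curve with three operating points
ap = average_precision_trapezoid([1.0, 0.8, 0.4], [0.0, 0.5, 1.0])
print(f"AP = {ap * 100:.1f}")
```

A perfect detector would keep precision at 1.0 over the full recall range and reach an AP of 1.0 (100 when scaled as in the print statements of this notebook).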


In [None]:
from sklearn.metrics import auc

iou_threshold = 0.2
precisions, recalls = metrics_dict[iou_threshold].T
average_precision = auc(recalls, precisions)

print(f"Average Precision @ IoU thresh. of {iou_threshold:.01f} = {average_precision * 100:.01f} (image projections)")

In [None]:
assert average_precision >= 0.0
assert average_precision <= 1.0

### Mean Average Precision (image projections)
What is the `mean_average_precision` (mAP) of your approach?
Let's reuse code from Practicum 1.

A basic implementation should achieve an mAP value of at least 10%.

In [None]:
from practicum1.metrics import mAP

mean_average_precision = mAP(metrics_dict)
print(f"Mean Average Precision: {mean_average_precision * 100:.01f} (image projections)")

In [None]:
# DO NOT DELETE THIS CELL!

## Video (qualitative evaluation)
Let's create a video over the whole sequence drawing the projected bounding boxes of all detected 3D pedestrians in `frame_pedestrian_dicts`.
We reuse the function `get_bounding_box_from_object` you implemented in `fa_01a_data_visualization`.
The output is a list of images (BGR `np.array`s) called `images_draw`.
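
The projection step behind such a bounding box can be sketched as follows. This is not the `get_bounding_box_from_object` you implemented: `project_box_to_bbox` is a hypothetical function that assumes the object frame origin lies at the box center, that `extent_object` holds the box size along the object axes, and that `P` is a 3x4 matrix mapping homogeneous camera-frame points to homogeneous pixel coordinates. The order of the returned corners is also an assumption.

```python
import numpy as np


def project_box_to_bbox(T_cam_object, extent_object, P):
    """Project the 8 corners of a 3D box and return (u_min, v_min, u_max, v_max)."""
    half = np.asarray(extent_object, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    corners_obj = np.hstack([signs * half, np.ones((8, 1))])  # (8, 4), homogeneous
    corners_cam = np.asarray(T_cam_object, dtype=float) @ corners_obj.T  # (4, 8)
    uvw = np.asarray(P, dtype=float) @ corners_cam  # (3, 8)
    uv = uvw[:2] / uvw[2]  # perspective division
    return uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()


# Example values only: a simple pinhole projection and a pedestrian 10 m in front of the camera
P_example = np.array([[1000.0, 0.0, 960.0, 0.0],
                      [0.0, 1000.0, 600.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
T_example = np.array([[0.0, -1.0, 0.0, 0.0],
                      [0.0, 0.0, -1.0, 0.0],
                      [1.0, 0.0, 0.0, 10.0],
                      [0.0, 0.0, 0.0, 1.0]])
u_min, v_min, u_max, v_max = project_box_to_bbox(T_example, [0.5, 0.5, 1.7], P_example)
print(u_min, v_min, u_max, v_max)
```

The sketch assumes all corners lie in front of the camera; corners with non-positive depth would need to be handled separately.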

In [None]:
from common.visualization import draw_bbox_to_image

def draw_pedestrian_bounding_boxes(frame_pedestrian_dicts, sequence):
    images_draw = []  # fill me with images of the sequence with pedestrian bounding boxes drawn onto
    assert len(frame_pedestrian_dicts) == len(sequence)

    for measurements in tqdm(sequence, total=len(sequence)):
        pedestrian_dicts = frame_pedestrian_dicts[measurements.get_index()]

        image_draw = measurements.get_camera_image()
        P2 = measurements.get_camera_projection_matrix()
        bboxes = [get_bounding_box_from_object(pd, P2) for pd in pedestrian_dicts]
        for bbox in bboxes:
            draw_bbox_to_image(image_draw, bbox, color=(0, 255, 0), thickness=3)
        images_draw.append(image_draw)
    return images_draw


images_draw = draw_pedestrian_bounding_boxes(frame_pedestrian_dicts, sequence)

In [None]:
# make sure we have a video along the whole sequence
assert len(images_draw) == len(sequence)
# make sure we have images of full resolution and color
assert images_draw[0].shape == (1216, 1936, 3)

Let's visualize the video inline via `create_animation`. This might take a minute.

In [None]:
from common.visualization import create_animation
from IPython.display import HTML

anim = create_animation(images_draw)
HTML(anim.to_html5_video())

## Birds-eye view visualization
The above video is handy to analyze the projected bounding boxes.
However, it is hard to judge the depth perception of the detected pedestrians.

Let's create a birds-eye view plot to judge the distance of the objects to the camera frame.


In [None]:
# extract pedestrian positions in birds-eye view for every frame_index from frame_pedestrian_dicts
from collections import defaultdict

frame_ped_positions = dict()
frame_ped_scores = dict()
for frame_index, pedestrian_dicts in frame_pedestrian_dicts.items():
    frame_ped_positions[frame_index] = []
    frame_ped_scores[frame_index] = []
    for pedestrian_dict in pedestrian_dicts:
        ped_position = pedestrian_dict["T_cam_object"][[0, 2], 3]  # take only xz positions (in camera frame)
        frame_ped_positions[frame_index].append(ped_position)
        frame_ped_scores[frame_index].append(pedestrian_dict["score"])
for frame_index, ped_positions in frame_ped_positions.items():
    frame_ped_positions[frame_index] = np.asarray(ped_positions).reshape(-1, 2)
for frame_index, ped_scores in frame_ped_scores.items():
    frame_ped_scores[frame_index] = np.asarray(ped_scores).reshape(-1, 1)
frame_ped_positions[1430], frame_ped_scores[1430]  # frame 1430

In [None]:
# do the same for ground truth pedestrian positions
frame_ped_gts = dict()
for measurements in sequence:
    frame_index = measurements.get_index()
    frame_ped_gts[frame_index] = []
    # subselect pedestrians
    labels_camera = [m for m in measurements.get_labels_camera() if m["label_class"] == "Pedestrian"]
    for label_camera in labels_camera:
        ped_gt = label_camera["T_cam_object"][[0, 2], 3]  # take only xz positions (in camera frame)
        frame_ped_gts[frame_index].append(ped_gt)
for frame_index, ped_gts in frame_ped_gts.items():
    frame_ped_gts[frame_index] = np.asarray(ped_gts).reshape(-1, 2)  # keep shape (N, 2) even for frames without pedestrians
frame_ped_gts[1430]  # frame 1430

In [None]:
# create interactive plot showing detected pedestrian positions and ground truth positions
import matplotlib.pyplot as plt
from IPython.display import HTML
from matplotlib.animation import FuncAnimation

# get bounds for plotting
all_ped_positions = np.vstack(list(frame_ped_positions.values()))
all_ped_gts = np.vstack(list(frame_ped_gts.values()))
all_peds = np.vstack([all_ped_positions, all_ped_gts])
xmax, zmax = np.max(np.abs(all_peds), axis=0)  # symmetric
xmin, zmin = -xmax, -zmax
zmin = 0.0  # make plot start at camera position

fig, ax = plt.subplots(figsize=(15, 12))

def plot_ped_positions(frame_index):
    ax.cla()  # remove content from last frame
    ax.set_xlim(left=xmin - 2.0, right=xmax + 2.0)
    ax.set_ylim(bottom=zmin, top=zmax + 2.0)
    ax.set_aspect("equal")
    ax.set_xlabel("x (camera frame)")
    ax.set_ylabel("z (camera frame)")
    ax.set_title(f"frame: {frame_index}")
    ax.grid(True, alpha=0.5)
    ax.scatter(0.0, 0.0, color="r")  # camera frame

    if frame_ped_gts[frame_index].size > 0:
        ax.scatter(
            frame_ped_gts[frame_index][:, 0], frame_ped_gts[frame_index][:, 1], color="y", s=500, marker="*", alpha=0.6
        )

    if frame_ped_positions[frame_index].size > 0:
        ax.scatter(frame_ped_positions[frame_index][:, 0], frame_ped_positions[frame_index][:, 1])


ani = FuncAnimation(fig, func=plot_ped_positions, frames=list(frame_ped_positions.keys()))
plt.close()  # avoid drawing additional figure below animation
HTML(ani.to_jshtml())

## Quantitative Evaluation (birds-eye view)
In the quantitative evaluation (image projections) we have evaluated the implementation of your approach based on projections of the 3D bounding boxes onto the camera image.
Since your task was to **detect pedestrians in *3D***, and judging from the birds-eye view plot above, a measure based on image projections alone may not be appropriate for evaluating 3D localization.

To close that gap, let's quantitatively evaluate your `frame_pedestrian_dicts` against the ground truth pedestrian objects in birds-eye view.
We represent both your detections and the ground truth objects as circles on the (simplified) ground plane, as opposed to bounding boxes within the image.
The simplified ground plane is spanned by the XZ plane of the camera frame (thus ignoring y values).
We set the variable `radius_m` below to 3 m to be tolerant of inaccurate depth estimates.
Similar to the image-projection based evaluation, we can associate detections with ground truth objects by overlapping their circles (IoU, intersection over union).
Eventually, this also yields a `metrics_dict` which can be interpreted in the same way as we did above for the image-projection based evaluation.
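
To get a feeling for how the circle overlap reacts to localization errors, the IoU of two circles with equal radius can be sketched as below. This is not the implementation in `assignment.evaluation_helpers`: `circle_iou` is a hypothetical helper based on the standard formula for the intersection area of two equal circles.

```python
import math


def circle_iou(center_a, center_b, radius):
    """IoU of two circles with the same radius on the ground plane."""
    d = math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])
    if d >= 2.0 * radius:
        return 0.0  # circles do not overlap
    # Lens-shaped intersection area of two equal circles at center distance d
    intersection = (2.0 * radius**2 * math.acos(d / (2.0 * radius))
                    - 0.5 * d * math.sqrt(4.0 * radius**2 - d**2))
    union = 2.0 * math.pi * radius**2 - intersection
    return intersection / union


# IoU for increasing position errors with a radius of 3 m
for offset_m in (0.0, 1.0, 2.0, 4.0, 6.0):
    print(f"offset {offset_m:.1f} m -> IoU {circle_iou((0.0, 10.0), (offset_m, 10.0), 3.0):.2f}")
```

The IoU drops monotonically with the distance between the centers and reaches zero once the distance equals twice the radius.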

Let's see what average precision (AP) and mean average precision (mAP) we get for the birds-eye view based evaluation.

In [None]:
# create sequence ground plane proposals
from assignment.evaluation_helpers import get_sequence_proposals_circle

sequence_groundplane_proposals = get_sequence_proposals_circle(frame_ped_positions, frame_ped_scores)

# ground plane x/y axes correspond to camera frame x/z axes, respectively
# so don't be confused by the dict keys 'x' and 'y' below
print('The sequence_groundplane_proposals has for every frame a list with a dict for every detection!')
print('The list for the first frame is:\n{}'.format(sequence_groundplane_proposals[0]))

In [None]:
# create ground truth sequence ground plane proposals
from assignment.evaluation_helpers import get_GT_sequence_groundplane_proposals

GT_sequence_groundplane_proposals = get_GT_sequence_groundplane_proposals(frame_ped_gts)

# ground plane x/y axes correspond to camera frame x/z axes, respectively
# so don't be confused by the dict keys 'x' and 'y' below
print('The GT_sequence_groundplane_proposals has for every frame a list with a dict for every pedestrian!')
print('The list for the first frame is:\n{}'.format(GT_sequence_groundplane_proposals[0]))

In [None]:
from assignment.evaluation_helpers import generate_metrics_dict_circle

discrimination_thresholds = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
iou_thresholds = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
radius_m = 3.0  # radius of representing circles for overlap computation

# generate the metrics_dict from birds-eye view based circles
metrics_dict = generate_metrics_dict_circle(sequence_groundplane_proposals,
                                            GT_sequence_groundplane_proposals,
                                            discrimination_thresholds,
                                            iou_thresholds,
                                            radius=radius_m)

### Precision-Recall Curve (birds-eye view)
Let's plot the Precision-Recall curve for an IoU threshold of 0.2 (the slider lets you explore other thresholds interactively).
See Practicum 1.

In [None]:
# plot the precision-recall curve
interact(plot_pr_curve, metrics_dict=fixed(metrics_dict), iou_thresh=FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2))

### Average Precision (birds-eye view)
What is the `average_precision` for `iou_threshold = 0.2`?
Let's reuse code from Practicum 1.


In [None]:
from sklearn.metrics import auc

iou_threshold = 0.2
precisions, recalls = metrics_dict[iou_threshold].T
average_precision = auc(recalls, precisions)

print(f"Average Precision @ IoU thresh. of {iou_threshold:.01f} = {average_precision * 100:.01f} (birds-eye view)")

### Mean Average Precision (birds-eye view)
What is the `mean_average_precision` (mAP) of your approach?
Let's reuse code from Practicum 1.

A basic implementation should achieve an mAP value of at least 5%.

In [None]:
from practicum1.metrics import mAP

mAP_value = mAP(metrics_dict)

print(f'Mean Average Precision: {mAP_value*100:.01f} (birds-eye view)')

### Q 02a.4 Interpretation of experimental results
Please interpret your experimental results:
1. Qualitative: How does your approach behave in terms of false positives and false negatives? (video / birds-eye view plot)
2. Quantitative: Please discuss the Precision-Recall plot, AP and mAP values in comparison to ideally achievable values. Reflect on the differing values between image-projection based evaluation and birds-eye view based evaluation.

### A 02a.4
**Your answer:** (maximum 350 words)
1. Qualitatively, the video shows that the detector is less precise than it could be. First, it struggles to detect pedestrians farther away and focuses mostly on closer ones. Second, it often produces false positives on unusual structures (e.g. motorbikes). Overall, however, it does detect nearby pedestrians, which is the most important property for the safety of a self-driving car. Note also that false positives are much less of an issue than false negatives: in the first case the car stops for no reason, in the second it might run over a pedestrian.
2. For both the image-projection evaluation and the birds-eye view evaluation, the results are satisfactory overall. The Precision-Recall curve is slightly better for the image-projection evaluation than for the birds-eye view one, but in both cases it follows the expected shape and can therefore be considered plausible. Performance is easier to compare through the Average Precision (AP), which is the area under the Precision-Recall curve. The number of false positives and false negatives depends on the classifier threshold; the threshold I currently use, 0.6, is a good compromise and yields good AP values. For the image-projection evaluation, the AP at an IoU threshold of 0.2 is 33.2; it is lower for a higher classifier threshold and does not increase for a lower one. For the birds-eye view evaluation, in contrast, the AP at an IoU threshold of 0.2 keeps increasing as the classifier threshold is lowered; at a threshold of 0.6 it is 28.4, which is a good trade-off given the qualitative analysis. For a more general picture, consider the mean Average Precision (mAP): it is 18.2 for the image-projection evaluation and slightly lower, 17.2, for the birds-eye view evaluation. Overall, the algorithm performs better in the image-projection evaluation than in the birds-eye view evaluation when the classifier threshold is set high enough; with a lower threshold, the many false positives make the birds-eye view evaluation outperform the image-projection one.


## Saving to disk for later usage
Now, we're saving the `frame_pedestrian_dicts` to have them accessible to subsequent modules for tracking and motion planning.
For now, we don't make use of the saved files, though we might do so in assignments of the coming years.
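
NumPy arrays are not JSON serializable by default, which is why the format check above catches a `TypeError`. How `save_frame_pedestrian_dicts` handles this internally is not shown here; the following is only a sketch of one common approach, with the hypothetical helper `to_jsonable` converting arrays to nested lists before dumping.

```python
import json

import numpy as np


def to_jsonable(obj):
    """Recursively convert NumPy arrays and scalars into plain Python types."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, dict):
        return {str(key): to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj


# Example values only, following the frame_pedestrian_dicts layout
example = {1430: [{"label_class": "Pedestrian",
                   "extent_object": np.array([0.8, 0.8, 1.7]),
                   "T_cam_object": np.eye(4),
                   "score": np.float64(0.85)}]}
text = json.dumps(to_jsonable(example))
restored = json.loads(text)
print(restored["1430"][0]["extent_object"])
```

Note that JSON object keys are always strings, so the frame index comes back as `"1430"` rather than `1430` after loading.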

In [None]:
from assignment.solution_helpers import save_frame_pedestrian_dicts

save_frame_pedestrian_dicts(frame_pedestrian_dicts)

In [None]:
# make sure file exists
import os

save_frame_pedestrian_dicts(frame_pedestrian_dicts, is_dry_run=True)
assert os.path.exists(os.path.join(os.environ["SOURCE_DIR"], "assignment", "frame-pedestrian-dicts.json"))

### Q 02a.5 Future Work
1. How could you improve your method even further, e.g., if you had more time at your disposal?

### A 02a.5
**Your answer:** (maximum 150 words)
1. The first thing I would do is optimize the whole algorithm to avoid unnecessary computation and make it faster. A faster algorithm would let me use a finer grid without excessive computation time per frame. A finer grid would give a higher chance of detecting pedestrians farther away and more detections per pedestrian, resulting in a better fit after the selection. A better fit means a more precise estimate of the pedestrian position in each frame, and thus better performance. The other major weakness of my solution is the size of the objects: a possible improvement would be to try different bounding box sizes at every position; if they fit well enough, they would give a better estimate of the object dimensions.
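
The link between bounding box size and object size can be sketched with the pinhole camera model. This is only an illustration of the idea, not part of the implemented detector: the helpers `pixel_height` and `metric_height` are hypothetical, and the focal length of 1000 px is an example value rather than the calibration of the dataset camera.

```python
def pixel_height(height_m, depth_m, focal_length_px):
    """Image height in pixels of an upright object at a given depth (pinhole model)."""
    return focal_length_px * height_m / depth_m


def metric_height(height_px, depth_m, focal_length_px):
    """Inverse relation: metric height from a well-fitting bounding box at a known depth."""
    return height_px * depth_m / focal_length_px


focal_length_px = 1000.0  # example value only
for depth_m in (5.0, 20.0, 40.0):
    low = pixel_height(1.2, depth_m, focal_length_px)
    high = pixel_height(1.9, depth_m, focal_length_px)
    print(f"depth {depth_m:4.1f} m: pedestrian height between {low:.0f} and {high:.0f} px")
```

Trying several box heights per grid position and keeping the best-scoring one would then give an estimate of the pedestrian height through `metric_height`.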



# GREAT JOB!
You've come a long way.
You detected pedestrians in 3D with a single camera from a moving vehicle.
Are you ready for the last challenge?