# DriveNet Closed-Loop Evaluation

**Note: this notebook assumes you've already run the [training notebook](./drivenet_train.ipynb) and stored your model successfully.**

## What is a closed-loop evaluation?
In closed-loop evaluation, DriveNet is in **full control of the AV**. At each time step, we predict the future trajectory and then move the AV to the first position predicted by DriveNet.

At Lyft, we refer to this process with the terms **forward-simulate** or **unroll**.
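
To make the difference with open-loop evaluation concrete, here is a toy sketch in plain NumPy. `toy_policy` is a hypothetical 1D stand-in, not DriveNet. In open loop the policy always receives the logged position. In closed loop it receives its own previous prediction, so a small per-step error accumulates.

```python
import numpy as np

# Hypothetical log: the recorded AV moves 1 m along x at every step (positions for steps 0..10).
logged_x = np.arange(11, dtype=float)


def toy_policy(current_x: float) -> float:
    """Hypothetical policy: always predicts the next position 0.9 m ahead (slower than the log)."""
    return current_x + 0.9


# Open loop: at every step the policy is fed the *logged* position.
open_loop_errors = np.array([abs(toy_policy(logged_x[t]) - logged_x[t + 1]) for t in range(10)])

# Closed loop (unroll): the AV is moved to the prediction, which becomes the next input.
ego_x = logged_x[0]
closed_loop_errors = []
for t in range(10):
    ego_x = toy_policy(ego_x)
    closed_loop_errors.append(abs(ego_x - logged_x[t + 1]))
closed_loop_errors = np.array(closed_loop_errors)

print(open_loop_errors.round(2))    # stays at 0.1 m per step
print(closed_loop_errors.round(2))  # grows by 0.1 m per step, reaching 1.0 m
```

The same 0.1 m bias looks harmless in open loop but compounds once the model's outputs are fed back as inputs.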

![closed-loop](../../images/drivenet/closed-loop.svg)

This evaluation is crucial to assess the real performance of DriveNet. **Ideally, we would test the model on the road in the real world**. However, this is clearly very expensive and scales poorly. Forward-simulation is an attempt to evaluate the system in a setting that is as close as possible to a real road test on the same route.


## What is a good closed-loop metric?
In this setting **metrics are particularly challenging**. We would like to penalise some kinds of simulation drift (e.g. going off-road or into the opposite lane) while tolerating others (e.g. different speed profiles). For example, the metrics we used during open-loop evaluation (e.g. ADE, the average displacement error) would penalise DriveNet for any deviation from the logged trajectory, including a harmless change of speed.
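
A toy illustration of the problem follows. The trajectories are hypothetical. `max_distance_to_path` is just one possible timing-agnostic alternative, shown for contrast, and is not the metric used at Lyft or in this notebook. A candidate that stays in its lane at half speed gets a *worse* ADE than one that keeps the logged speed but sits 2 m to the side.

```python
import numpy as np

# Hypothetical logged trajectory: straight lane along x at 1 m per step (y = 0), steps 1..10.
t = np.arange(1, 11, dtype=float)
gt = np.stack([t, np.zeros_like(t)], axis=1)

# Candidate A: stays in the lane but drives at half the logged speed.
slow_in_lane = np.stack([0.5 * t, np.zeros_like(t)], axis=1)
# Candidate B: keeps the logged speed but is shifted 2 m sideways.
shifted = gt + np.array([0.0, 2.0])

# Densely sampled reference path (the lane centre line from x = 0 to x = 10).
path = np.stack([np.linspace(0.0, 10.0, 101), np.zeros(101)], axis=1)


def ade(pred: np.ndarray, target: np.ndarray) -> float:
    """Average displacement error: mean distance between time-aligned points."""
    return float(np.mean(np.linalg.norm(pred - target, axis=1)))


def max_distance_to_path(pred: np.ndarray, path: np.ndarray) -> float:
    """Largest distance from any predicted point to its closest path sample (ignores timing)."""
    dists = np.linalg.norm(pred[:, None, :] - path[None, :, :], axis=2)
    return float(dists.min(axis=1).max())


print(ade(slow_in_lane, gt), ade(shifted, gt))  # the slow, in-lane candidate gets the worse ADE
print(max_distance_to_path(slow_in_lane, path), max_distance_to_path(shifted, path))
```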

At Lyft L5, we use a large set of different metrics to capture dangerous manoeuvres and behaviours. 

For the sake of simplicity, in this notebook we will be using a very simple proxy to detect if our model is driving in a sensible way, composed of two different metrics:

### Collisions
Our AV should avoid collisions with other agents. However, not all collisions are created equal: while our AV is fully controlled by DriveNet, the other agents simply replay their recorded trajectories and do not react to it (a setting we call **log replay**).

For example, if our AV is slower than the recorded one, agents following it might bump into it from behind. Clearly, this wouldn't happen in a real setting, where other agents can react to our behaviour.
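
The rear-end case can be reproduced with a 1D toy. All positions, speeds and the vehicle length below are hypothetical. The follower replays its logged motion and never brakes, so a slower AV is eventually hit even though the original log is collision-free.

```python
import numpy as np

VEHICLE_LENGTH_M = 4.5  # hypothetical length shared by both vehicles

t = np.arange(11, dtype=float)
logged_ego_x = 10.0 + 1.0 * t     # recorded AV centroid: 1 m per step
follower_x = 3.0 + 1.0 * t        # replayed follower: same speed, 7 m behind (never reacts)
simulated_ego_x = 10.0 + 0.5 * t  # a hypothetical policy driving at half speed


def rear_overlap_steps(ego_x, follower_x, length=VEHICLE_LENGTH_M):
    """Steps at which a follower behind the AV overlaps it (centroid gap below one vehicle length)."""
    return np.flatnonzero(ego_x - follower_x < length)


print(rear_overlap_steps(logged_ego_x, follower_x))     # empty: the log itself is collision-free
print(rear_overlap_steps(simulated_ego_x, follower_x))  # overlaps from step 6 onwards
```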

In this simple example, we won't distinguish between collisions caused by the non-reactivity of other agents and genuine collisions; we will simply report them all, categorised by where they occurred (front, rear or side with respect to the AV).
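
One simple way to assign such a category is sketched below. It is an assumption, not necessarily the rule implemented by `detect_collision` in `l5kit.drivenet.utils`. The other agent's centroid is expressed in the AV frame (x forward, y left) and labelled by its bearing. The `collision_side` helper and the 45° threshold are hypothetical.

```python
import numpy as np


def collision_side(ego_xy, ego_yaw, other_xy, front_rear_half_angle_deg=45.0):
    """Hypothetical rule: label a collision by the bearing of the other agent's centroid in the AV frame."""
    dx, dy = np.asarray(other_xy, dtype=float) - np.asarray(ego_xy, dtype=float)
    # Rotate the world-frame offset by -yaw to express it in the AV frame (x forward, y left).
    c, s = np.cos(-ego_yaw), np.sin(-ego_yaw)
    local_x, local_y = c * dx - s * dy, s * dx + c * dy
    bearing = np.degrees(np.arctan2(local_y, local_x))  # 0 = straight ahead, +/-180 = behind
    if abs(bearing) <= front_rear_half_angle_deg:
        return "front"
    if abs(bearing) >= 180.0 - front_rear_half_angle_deg:
        return "rear"
    return "side"


print(collision_side((0.0, 0.0), 0.0, (5.0, 0.0)))        # agent ahead of the AV
print(collision_side((0.0, 0.0), np.pi / 2, (0.0, -5.0)))  # AV faces +y, agent behind it
```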

However, if we only considered collisions, our AV could pass our tests by driving off-road or in a different lane.

### Distance from Reference Trajectory
To address this issue, we require DriveNet to loosely stick to the original trajectory in the data. By setting the right threshold on this distance, we can allow for different speed profiles and small steering differences, while penalising large deviations such as driving off-road.

We can do so by computing the distance between the first predictions from DriveNet and the corresponding annotated positions in world coordinates.
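
The drift check reduces to a few lines of NumPy. The sketch below applies a 3x3 homogeneous 2D transform by hand. `agent_to_world` and `is_drifting` are hypothetical helpers standing in for l5kit's `transform_points` and the check inside the unroll function. It assumes positions in metres and a `world_from_agent` matrix shaped like the one in the data.

```python
import numpy as np


def agent_to_world(points_xy, world_from_agent):
    """Apply a 3x3 homogeneous 2D transform to an (N, 2) array of points."""
    points_xy = np.asarray(points_xy, dtype=float)
    homogeneous = np.hstack([points_xy, np.ones((len(points_xy), 1))])
    return (homogeneous @ world_from_agent.T)[:, :2]


def is_drifting(pred_agent_xy, gt_agent_xy, world_from_agent, threshold_m=10.0):
    """Flag drift when the first predicted point is farther than threshold_m from the first annotated point."""
    pred_world = agent_to_world(pred_agent_xy, world_from_agent)
    gt_world = agent_to_world(gt_agent_xy, world_from_agent)
    return bool(np.linalg.norm(pred_world[0] - gt_world[0]) > threshold_m)


# Hypothetical pose: AV at (100, 50) in world coordinates, facing +y (yaw = 90 degrees).
world_from_agent = np.array([[0.0, -1.0, 100.0],
                             [1.0, 0.0, 50.0],
                             [0.0, 0.0, 1.0]])
print(agent_to_world(np.array([[1.0, 0.0]]), world_from_agent))  # 1 m ahead of the AV, in world coordinates
```

Since `world_from_agent` is a rigid transform, the distance is the same in either frame. World coordinates are needed anyway, because the first predicted point becomes the AV position in the next frame.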

In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.utils.data.dataloader import default_collate
from tqdm import tqdm

from l5kit.configs import load_config_data
from l5kit.data import LocalDataManager, ChunkedDataset, filter_agents_by_frames
from l5kit.dataset import EgoDataset
from l5kit.rasterization import build_rasterizer
from l5kit.geometry import transform_points, yaw_as_rotation33
from l5kit.visualization import PREDICTED_POINTS_COLOR, draw_trajectory
from l5kit.drivenet.model import DriveNetModel  # noqa: F401 (class of the pickled model loaded below)
from l5kit.drivenet.utils import detect_collision

## Prepare Data path and load cfg

By setting the `L5KIT_DATA_FOLDER` variable, we can point the script to the folder where the data lies.

Then, we load our config file with relative paths and other configurations (rasteriser, training params...).

In [None]:
# set env variable for data
os.environ["L5KIT_DATA_FOLDER"] = "/tmp/l5kit_data"
dm = LocalDataManager(None)
# get config
cfg = load_config_data("./drivenet_config.yaml")

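## Load The Model

We load the model saved by the training notebook, move it to the available device (GPU if present) and switch it to evaluation mode. Since we only run inference, we also disable gradient computation.
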
In [None]:
model_path = "/tmp/drivenet.pt"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# map_location lets a model saved on GPU be loaded on a CPU-only machine
model = torch.load(model_path, map_location=device).to(device)
model = model.eval()
torch.set_grad_enabled(False)

## Load the Evaluation Data
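Unlike training and open-loop evaluation, this setting is intrinsically sequential: the AV position in each frame depends on the prediction made at the previous frame. As such, we won't use any of PyTorch's parallelisation functionalities (e.g. a multi-worker `DataLoader`).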

In [None]:
# ===== INIT DATASET
eval_cfg = cfg["val_data_loader"]
rasterizer = build_rasterizer(cfg, dm)
eval_zarr = ChunkedDataset(dm.require(eval_cfg["key"])).open()
eval_dataset = EgoDataset(cfg, eval_zarr, rasterizer)
print(eval_dataset)

# Define our Unroll function

Our unroll function is actually very simple. At each timestep:
- we **forward the current frame** to DriveNet;
- we **get the predicted trajectory** from the model;
- we **compute our metrics** (collisions and distance from the reference trajectory);
- if we have not collided with other agents and have not drifted too far from the reference trajectory, we use the first point of the predicted trajectory as the AV position for the next frame;
- otherwise, we leave the next frame untouched, so the AV is reset to its original (ground-truth) position.

The function returns not only the RGB frames, but also the errors flagged by each metric (collisions and drifting).

After this function is defined we can iterate through scenes, visualise the result and plot the accumulated metrics very easily.


In [None]:
def unroll_scene(scene_dataset, model, drifting_threshold=10):
    ## prepare return buffers
    images = []
    collisions = []
    driftings = []

    for frame_idx in tqdm(range(len(scene_dataset))):
        data = scene_dataset[frame_idx]
        del data["host_id"]
        data_batch = default_collate([data])
        result = model(data_batch)
        
        predicted_positions = result["positions"].detach().cpu().numpy().squeeze()
        predicted_yaws = result["yaws"].detach().cpu().numpy().squeeze()

        ## store image
        im_ego = rasterizer.to_rgb(data["image"].transpose(1, 2, 0))    
        draw_trajectory(im_ego, transform_points(predicted_positions, data["raster_from_agent"]), PREDICTED_POINTS_COLOR)
        images.append(im_ego[::-1])

        ## compute absolute positions
        pred_positions_m = transform_points(predicted_positions, data["world_from_agent"])
        pred_angles_rad = predicted_yaws + data["yaw"]
        gt_positions_m = transform_points(data["target_positions"], data["world_from_agent"])

        ## detect collisions
        agents_frame = filter_agents_by_frames(scene_dataset.dataset.frames[frame_idx], scene_dataset.dataset.agents)[0]
        collision = detect_collision(data["centroid"], data["yaw"], data["extent"], agents_frame)
        collisions.append(collision)
        if collision[0] != "":
            continue

        ## detect drifting
        drifting = np.linalg.norm(pred_positions_m[0] - gt_positions_m[0]) > drifting_threshold
        driftings.append(drifting)
        if drifting:
            continue

        ## mutate the next frame if we're not at the end of the scene
        frame_mutate_idx = frame_idx + 1
        if frame_mutate_idx < len(scene_dataset):
            scene_dataset.dataset.frames[frame_mutate_idx]["ego_translation"][:2] = pred_positions_m[0]
            scene_dataset.dataset.frames[frame_mutate_idx]["ego_rotation"] = yaw_as_rotation33(pred_angles_rad[0])
    
    return images, collisions, driftings


# Test our function on a few scenes
In this cell, we test our unroll function on a few evenly spaced scenes from the evaluation set.

L5Kit comes with a handy function, `get_scene_dataset`, to extract an individual scene from a bigger dataset; we can use it to unroll a single scene (or a set of scenes) of our choice.
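
Scene selection here uses a stride over the scene indices. A plain `num_scenes // scenes_to_unroll` stride is zero when there are fewer scenes than requested, and `range` rejects a zero step. It can also yield more indices than requested. `evenly_spaced_scene_indices` below is a hypothetical helper that guards against both.

```python
def evenly_spaced_scene_indices(num_scenes, scenes_to_unroll):
    """Pick at most `scenes_to_unroll` roughly evenly spaced scene indices (hypothetical helper)."""
    step = max(1, num_scenes // scenes_to_unroll)  # avoid a zero step when there are few scenes
    return list(range(0, num_scenes, step))[:scenes_to_unroll]


print(evenly_spaced_scene_indices(100, 5))
print(evenly_spaced_scene_indices(12, 5))  # a plain stride of 2 would give 6 indices here
print(evenly_spaced_scene_indices(3, 5))
```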

In [None]:
# ==== EVAL LOOP
scenes_to_unroll = 5
images, collisions, driftings = [], [], []
for scene_idx in range(0, len(eval_zarr.scenes), max(1, len(eval_zarr.scenes) // scenes_to_unroll)):
    scene_dataset = eval_dataset.get_scene_dataset(scene_idx)
    scene_images, scene_collisions, scene_driftings = unroll_scene(scene_dataset, model)
    images.append(scene_images)
    collisions.append(scene_collisions)
    driftings.append(scene_driftings)

# Qualitative Evaluation: Visualise the Closed-Loop

We can visualise the frames we have stored in the previous cell. 

**DriveNet is now in full control of the AV as it moves through the annotated scene.**

In [None]:
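from IPython.display import display, clear_output
from PIL import Image  # `import PIL` alone does not guarantee that PIL.Image is loaded
from time import sleep

# replay the frames of the third unrolled scene as an animation
for frame in images[2]:
    clear_output(wait=True)
    display(Image.fromarray(frame))
    sleep(0.05)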

# Quantitative Evaluation: Plotting Errors from the Closed-Loop

We can collect metrics from different scenes and plot them here.

We split collisions into *rear*, *front* and *side* to better capture the nature of different errors.
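
The aggregation can also be written as a small reusable function over the nested per-scene lists. `summarise_metrics` is a hypothetical helper. It assumes, as the unroll function's `collision[0] != ""` check implies, that each collision entry is a tuple whose first element is the collision type, or an empty string when there was no collision.

```python
from collections import Counter


def summarise_metrics(collisions_per_scene, driftings_per_scene):
    """Flatten per-scene results and count collision types and drifting frames (hypothetical helper)."""
    counts = Counter(c[0] for scene in collisions_per_scene for c in scene if c[0] != "")
    summary = {name: counts.get(name, 0) for name in ("front", "side", "rear")}
    summary["displacement error"] = sum(bool(d) for scene in driftings_per_scene for d in scene)
    return summary


# Hypothetical results for two scenes (collision entries: (type, agent id)).
example_collisions = [[("", -1), ("front", 3)], [("rear", 7), ("", -1), ("front", 2)]]
example_driftings = [[False, True], [True, True, False]]
print(summarise_metrics(example_collisions, example_driftings))
```

Keeping the inputs nested per scene, rather than flattening them in place, leaves the per-scene breakdown available for later analysis.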

In [None]:
# flatten the per-scene results; use new names so that re-running this cell does not break
collision_names = np.asarray([collision[0] for scene_collisions in collisions for collision in scene_collisions])
drifting_flags = np.asarray([drift for scene_driftings in driftings for drift in scene_driftings], dtype=bool)
values = []
names = []

for collision_name in ["front", "side", "rear"]:
    values.append(np.sum(collision_names == collision_name))
    names.append(collision_name)

values.append(np.sum(drifting_flags))
names.append("displacement error")

plt.bar(np.arange(len(names)), values)
plt.xticks(np.arange(len(names)), names)
plt.ylabel("number of frames")
plt.show()