# Trade Study Evaluations

Now that we know how to build algorithms and partial-stack AVs, we can use the simplicity of AVstack to run some trade studies. In these trade studies, we compare different designs against objective performance metrics on scenes.

We will first set up the scene managers. Here, we use the KITTI and nuScenes scene managers. We could just as easily add in the Carla scene manager, if we've downloaded a Carla dataset suitable for AVstack.

In [1]:
import os
import avstack
import avapi
from tqdm import tqdm

%load_ext autoreload
%autoreload 2

data_base = '../../lib-avstack-api/data/'

obj_data_dir_k = os.path.join(data_base, 'KITTI/object')
raw_data_dir_k = os.path.join(data_base, 'KITTI/raw')
obj_data_dir_n = os.path.join(data_base, 'nuScenes')

KSM = avapi.kitti.KittiScenesManager(obj_data_dir_k, raw_data_dir_k, convert_raw=False)
NSM = avapi.nuscenes.nuScenesManager(obj_data_dir_n)
SMs = [KSM, NSM]

Cannot import rss library -- don't worry about this unless you need 'safety' evals
Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Set up AV Models

In [2]:
# lidar perception algorithm (3D)
li_perception = {0:'pointpillars',
                 1:'ssn',
                 2:'pointpillars',
                 3:'ssn',
                 4:'pointpillars',
                 5:'ssn'}

# camera perception algorithm (2D)
ca_perception = {0:None,
                 1:None,
                 2:'fasterrcnn',
                 3:'fasterrcnn',
                 4:'cascade_mask_rcnn',
                 5:'cascade_mask_rcnn'}

# tracking/fusion algorithm
tracking = {0:'basic-box-tracker',
            1:'basic-box-tracker',
            2:'basic-box-tracker-fusion-3stage',
            3:'basic-box-tracker-fusion-3stage',
            4:'basic-box-tracker-fusion-3stage',
            5:'basic-box-tracker-fusion-3stage'}

# which sensor to use to evaluate performance
sensor_eval = {0:'main_lidar',
               1:'main_lidar',
               2:'main_camera',
               3:'main_camera',
               4:'main_camera',
               5:'main_camera'}

# whether we only care about the front half of lidar data
filter_front = {0:False,
                1:False,
                2:True,
                3:True,
                4:True,
                5:True}

# The base ego classes we will use for each (see the source code for details)
vs = avstack.ego.vehicle
AVs = {0:vs.LidarPerceptionAndTrackingVehicle,
       1:vs.LidarPerceptionAndTrackingVehicle,
       2:vs.LidarCameraPerceptionAndTrackingVehicle,
       3:vs.LidarCameraPerceptionAndTrackingVehicle,
       4:vs.LidarCameraPerceptionAndTrackingVehicle,
       5:vs.LidarCameraPerceptionAndTrackingVehicle}
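
Every dictionary above is keyed by the same case index, so a missing or extra key in one of them would silently misconfigure a case. The sketch below shows one way to check this before launching a study. The helper `check_case_tables` and the `*_demo` dictionaries are hypothetical stand-ins written for this illustration and are not part of avstack or avapi.

```python
def check_case_tables(**tables):
    """Return the sorted case indices shared by every table.

    Raises ValueError if any table is missing a case or has an extra one.
    """
    names = list(tables)
    reference = set(tables[names[0]])
    for name in names[1:]:
        keys = set(tables[name])
        if keys != reference:
            diff = sorted(keys ^ reference)
            raise ValueError(f"'{name}' disagrees with '{names[0]}' on cases {diff}")
    return sorted(reference)


# Shortened stand-ins for two of the dictionaries above.
li_perception_demo = {0: 'pointpillars', 1: 'ssn', 2: 'pointpillars'}
ca_perception_demo = {0: None, 1: None, 2: 'fasterrcnn'}

cases = check_case_tables(li_perception=li_perception_demo,
                          ca_perception=ca_perception_demo)
print(cases)  # [0, 1, 2]
```

In the notebook, the same call could be given all six dictionaries by keyword.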

## Run Trade Studies

The avapi package comes with a trade-study evaluation tool. It has more configuration options than we can enumerate here, so we show a selection of options that are easy to understand.

In [6]:
# %%capture
# use ^^^ to suppress output

frame_res_all, seq_res_all = avapi.evaluation.run_trades(
    SMs=SMs,                      # scene managers
    AVs=AVs,                      # av models
    li_perception=li_perception,  # lidar perception
    ca_perception=ca_perception,  # camera perception
    tracking=tracking,            # tracking
    sensor_eval=sensor_eval,      # which sensor to use for ground-truth evaluations
    sensor_eval_super=None,       # if we need to use a larger field-of-view sensor to filter FPs
    trade_type='standard',        # only 'standard' is available at the moment
    filter_front=filter_front,    # whether to filter lidar data to the front-view only
    n_trials_max=3,               # number of scenes to evaluate
    max_dist=100,                 # max distance of objects we care about
    n_cases_max=5,                # how many of the specified cases to run (in the dictionary of the above cell)
    max_frames=150,               # max possible frames per scene
    frame_start=1,                # which starting frame to use
    save_result=True,
    save_file_base='study-1-{}-seq-res.p',
    trial_indices=None)



Running dataset Kitti over 3 trials
   Running trial 0, using index 0
Loads checkpoint by local backend from path: /home/spencer/Documents/Projects/AVstack/avstack-docs/lib-avstack-core/third_party/mmdetection3d/checkpoints/kitti/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class_20220301_150306-37dc2420.pth
      Running dataset: KITTI, case 0


 19%|████████████▋                                                       | 20/107 [01:40<07:17,  5.03s/it]


KeyboardInterrupt: 
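
The run above was interrupted, so it can be useful to check which result files actually exist before building tables. The sketch below assumes that the `{}` placeholder in `save_file_base` is filled with a lowercase dataset name, which is inferred from the loading cell later in this notebook rather than from avapi documentation. The helper `expected_result_files` is hypothetical.

```python
import os
import tempfile

save_file_base = 'study-1-{}-seq-res.p'   # same template as passed to run_trades
datasets = ['kitti', 'nuscenes']          # assumed lowercase names, as used when loading


def expected_result_files(template, names, directory='.'):
    """Map each dataset name to (path, exists) for its result file."""
    out = {}
    for name in names:
        path = os.path.join(directory, template.format(name))
        out[name] = (path, os.path.isfile(path))
    return out


# Demonstration in a scratch directory where only the KITTI file exists.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, save_file_base.format('kitti')), 'wb').close()
    status = expected_result_files(save_file_base, datasets, directory=tmp)
    found = {name: exists for name, (_, exists) in status.items()}

print(found)  # {'kitti': True, 'nuscenes': False}
```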

## Make Results Tables

We can use the LaTeX export functionality of pandas to make tables that can go directly into a LaTeX document.

#### Load the data

Each row is a specific case. Each column is either a descriptor or an output metric. A metric cell may contain either a single aggregate per-scene value or a list of per-frame values, in which case the list length equals the number of frames.

In [4]:
import pickle
import numpy as np
import pandas as pd

# load raw data
ds = ['kitti', 'nuscenes']  # need to update this based on the datasets used
data = []
for d in ds:
    tab_file = 'study-1-{}-seq-res.p'.format(d)  # this must match the save_file_base above
    with open(tab_file, 'rb') as f:
        data.append(pickle.load(f))
        
# convert to dataframe
df = pd.concat(pd.DataFrame.from_dict(dat) for dat in data)
print(df.shape)
df.head(7)

(25, 71)


Unnamed: 0,Case,Dataset,Trial,Metrics_perception_object_3d_tot_TP,Metrics_perception_object_3d_tot_FP,Metrics_perception_object_3d_tot_FN,Metrics_perception_object_3d_tot_T,Metrics_perception_object_3d_mean_precision,Metrics_perception_object_3d_mean_recall,Metrics_tracking_HOTA_HOTA,...,Metrics_prediction_std_ADE,Metrics_prediction_std_FDE,Metrics_prediction_n_with_truth,Metrics_prediction_n_objects,Metrics_perception_object_2d_tot_TP,Metrics_perception_object_2d_tot_FP,Metrics_perception_object_2d_tot_FN,Metrics_perception_object_2d_tot_T,Metrics_perception_object_2d_mean_precision,Metrics_perception_object_2d_mean_recall
0,0,KITTI,0,402,134,44,446,0.762997,0.89704,"[0.6115882159203753, 0.6115882159203753, 0.611...",...,1.037896,2.196316,437,442,,,,,,
1,1,KITTI,0,194,99,252,446,0.5002,0.415999,"[0.40064066411277033, 0.40064066411277033, 0.4...",...,1.199937,1.862107,286,286,,,,,,
2,2,KITTI,0,402,134,44,446,0.762997,0.89704,"[0.7123721722473954, 0.7123721722473954, 0.712...",...,0.643589,1.676179,328,328,347.0,224.0,99.0,446.0,0.62293,0.73162
3,3,KITTI,0,194,99,252,446,0.5002,0.415999,"[0.48204490336550637, 0.48204490336550637, 0.4...",...,0.609636,1.207461,190,190,347.0,224.0,99.0,446.0,0.62293,0.73162
4,4,KITTI,0,402,134,44,446,0.762997,0.89704,"[0.7003655971382674, 0.7003655971382674, 0.700...",...,0.860504,1.507013,298,298,0.0,0.0,446.0,446.0,0.0,0.0
5,0,KITTI,1,349,466,25,374,0.371507,0.903611,"[0.5243092884836676, 0.5243092884836676, 0.524...",...,1.319843,3.869034,804,804,,,,,,
6,1,KITTI,1,211,299,163,374,0.245336,0.390823,"[0.4203230160938515, 0.4203230160938515, 0.420...",...,1.897539,4.178149,711,711,,,,,,
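
Because a metric cell can hold either a scalar or a list, any summary has to reduce list cells to a single number first. The sketch below illustrates the idea with the standard library only. The helper `reduce_cell` is hypothetical, the values are illustrative and not study results, and it assumes list cells are plain Python lists or tuples.

```python
from statistics import median


def reduce_cell(cell):
    """Collapse a metric cell to one float.

    A list or tuple is reduced with the median; a scalar passes through.
    """
    if isinstance(cell, (list, tuple)):
        return float(median(cell))
    return float(cell)


# Illustrative values only, not results from the study.
scalar_cell = 0.75
list_cell = [0.2, 0.4, 0.9]

print(reduce_cell(scalar_cell))  # 0.75
print(reduce_cell(list_cell))    # 0.4
```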


#### Extract Interesting Results

To make the tables, we must define three things: which metrics we are interested in; how to compute an "aggregate" value when a cell holds a list of per-frame metrics; and how to decide which dataset performs "best" for each metric, if we want to underline the best result in the table.

The LaTeX table generation relies on a couple of custom commands. These stack multiple rows as sub-cells within a single table cell. Include the following in your LaTeX preamble for this to work.
```
\newcommand{\tworowsubtablecenter}[2]{\begin{tabular}{@{}c@{}} #1 \\ #2 \end{tabular}}
\newcommand{\tworowsubtableleft}[2]{\begin{tabular}{@{}l@{}} #1 \\ #2 \end{tabular}}
```
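
When these commands are generated from Python with `str.format`, two details matter: the leading backslash must survive string parsing, and literal braces must be doubled. The sketch below shows both. The variable names are chosen for this illustration only.

```python
# Raw string: the backslash is kept, and doubled braces become literal braces.
two_row = r'\tworowsubtablecenter{{{}}}{{{}}}'
cell = two_row.format('K', 'N')
print(cell)  # \tworowsubtablecenter{K}{N}

# Without the r prefix, '\t' is parsed as a tab character and the command name is lost.
broken = '\tworowsubtablecenter{{{}}}{{{}}}'.format('K', 'N')
print(broken.startswith('\t'))  # True
```

Doubling the backslash (`'\\tworowsubtablecenter...'`) is an equivalent alternative to the raw-string prefix.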

In [5]:
# key: column name for metric we are interested in
# value: our short-name we would like to call this
metrics_of_interest = {'Metrics_perception_object_3d_mean_precision':'Per: 3D Prec.',
                       'Metrics_perception_object_3d_mean_recall':'Per: 3D Rec.',
                       'Metrics_perception_object_2d_mean_precision':'Per: 2D Prec.',
                       'Metrics_perception_object_2d_mean_recall':'Per: 2D Rec.',
                       'Metrics_tracking_HOTA_HOTA':'Trk: HOTA',
                       'Metrics_tracking_CLEAR_MOTA':'Trk: MOTA',
                       'Metrics_tracking_CLEAR_MOTP':'Trk: MOTP',
                       'Metrics_prediction_std_ADE':'Pred: ADE',
                       'Metrics_prediction_std_FDE':'Pred: FDE'}

# If not none, it assumes there is a list of metrics we need to infer over
# (e.g., per-frame or by-threshold in case of e.g., P/R curve)
expansion_types = {'Metrics_perception_object_3d_mean_precision':None,
                   'Metrics_perception_object_3d_mean_recall':None,
                   'Metrics_perception_object_2d_mean_precision':None,
                   'Metrics_perception_object_2d_mean_recall':None,
                   'Metrics_tracking_HOTA_HOTA':'value-at-middle',
                   'Metrics_tracking_CLEAR_MOTA':'value-at-middle',
                   'Metrics_tracking_CLEAR_MOTP':'value-at-middle',
                   'Metrics_prediction_std_ADE':None,
                   'Metrics_prediction_std_FDE':None}

# how to evaluate the "goodness" of a case compared to another
metric_best_evaluator = {'Metrics_perception_object_3d_mean_precision':np.nanargmax,
                         'Metrics_perception_object_3d_mean_recall':np.nanargmax,
                         'Metrics_perception_object_2d_mean_precision':np.nanargmax,
                         'Metrics_perception_object_2d_mean_recall':np.nanargmax,
                         'Metrics_tracking_HOTA_HOTA':np.nanargmax,
                         'Metrics_tracking_CLEAR_MOTA':np.nanargmax,
                         'Metrics_tracking_CLEAR_MOTP':np.nanargmax,
                         'Metrics_prediction_std_ADE':np.nanargmin,
                         'Metrics_prediction_std_FDE':np.nanargmin}
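
`np.nanargmax` and `np.nanargmin` return the position of the largest or smallest entry while ignoring NaN values, and they raise a `ValueError` when every entry is NaN. The dependency-free sketch below mirrors that selection logic to make the behavior concrete. The helper `best_index` is hypothetical and, unlike the NumPy functions, returns `None` for an all-NaN input. The numbers are illustrative only.

```python
import math


def best_index(values, higher_is_better=True):
    """Index of the best non-NaN value, or None if every value is NaN."""
    candidates = [(v, i) for i, v in enumerate(values) if not math.isnan(v)]
    if not candidates:
        return None
    if higher_is_better:
        return max(candidates, key=lambda c: c[0])[1]
    return min(candidates, key=lambda c: c[0])[1]


nan = float('nan')
print(best_index([0.57, 0.99]))                        # 1
print(best_index([1.2, nan], higher_is_better=False))  # 0
print(best_index([nan, nan]))                          # None
```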

In [6]:
# Convert to table format in double/triple slash format
mark_best_in_cell = True
mark_best_in_col = True  # currently unused in this cell

single_subrow_formatter = '{}'
# raw strings keep the leading backslash; a plain '\t' would be parsed as a tab
double_subrow_formatter = r'\tworowsubtablecenter{{{}}}{{{}}}'
triple_subrow_formatter = r'\tworowsubtablecenter{{{}}}{{\tworowsubtablecenter{{{}}}{{{}}}}}'
formatters = {1:single_subrow_formatter, 2:double_subrow_formatter, 3:triple_subrow_formatter}

dses = df["Dataset"].unique()
subrow_formatter = formatters[len(dses)]

print(f'Slash dataset ordering is: {dses}')
res_slash_agg = []
for i_case in df['Case'].unique():
    res_slash = {'Case':i_case}
    res_slash.update({'Data':subrow_formatter.format(*[d[0].upper() for d in dses])})
    for met_k, met_v in metrics_of_interest.items():
        res_met_slash = []
        for dataset in df['Dataset'].unique():
            met_res = df[(df['Case'] == i_case) & (df['Dataset'] == dataset)][met_k]
            if expansion_types[met_k] == 'max-over-dict':
                met_vals = [np.max(list(met.values())) for met in met_res]  # dict views must be materialized for numpy
            elif expansion_types[met_k] == 'max-over-list':
                met_vals = [np.max(met) for met in met_res]
            elif expansion_types[met_k] == 'value-at-middle':
                met_vals = [np.median(met) for met in met_res]  # median accomplishes middle value
            elif expansion_types[met_k] is None:
                met_vals = met_res
            else:
                raise NotImplementedError('Expansion {} not implemented'.format(expansion_types[met_k]))
            # NaN-aware median and standard deviation across trials (mn holds the median)
            mn, std = np.nanmedian(met_vals), np.nanstd(met_vals)
            res_met_slash.append((mn, std))
            
        # slash format -- underlining best in cell
        if mark_best_in_cell and (not all([np.isnan(mn) for mn, _ in res_met_slash])):
            best_idx = metric_best_evaluator[met_k]([mn for mn, _ in res_met_slash])
        else:
            best_idx = None
        res_met_slash_new = []
        for i, (mn, std) in enumerate(res_met_slash):
            if np.isnan(mn):
                wstr = 'N/A'
            else:
                if (best_idx is not None) and (i==best_idx):
                    wstr = f'\\underline{{{mn:4.2f} +/- {std:4.2f}}}'
                else:
                    wstr = f'{mn:4.2f} +/- {std:4.2f}'
            res_met_slash_new.append(wstr)

        # Format the whole slash
        res_sla = subrow_formatter.format(*res_met_slash_new)
        res_slash.update({met_v:res_sla})
    res_slash_agg.append(res_slash)

Slash dataset ordering is: ['KITTI' 'nuScenes']


  r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


In [7]:
pd.set_option('display.max_colwidth',1000)
df_slash = pd.DataFrame(res_slash_agg)
lat_str = df_slash.to_latex(index=False, multirow=True, escape=False).replace('\\\\\n', '\\\\ \\midrule\n')
print(lat_str)

\begin{tabular}{rllllllllll}
\toprule
 Case &                        Data &                                                   Per: 3D Prec. &                                                    Per: 3D Rec. &                                                   Per: 2D Prec. &                                                    Per: 2D Rec. &                                                       Trk: HOTA &                                                       Trk: MOTA &                                                       Trk: MOTP &                                                       Pred: ADE &                                                       Pred: FDE \\ \midrule
\midrule
    0 & \tworowsubtablecenter{K}{N} & \tworowsubtablecenter{0.57 +/- 0.20}{\underline{0.99 +/- 0.01}} & \tworowsubtablecenter{\underline{0.90 +/- 0.00}}{0.25 +/- 0.07} &                                 \tworowsubtablecenter{N/A}{N/A} &                                 \tworowsubtablecenter{N/A}{N/A} & \tworowsu

  lat_str = df_slash.to_latex(index=False, multirow=True, escape=False).replace('\\\\\n', '\\\\ \\midrule\n')
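
The `.replace(...)` call in the cell above appends a `\midrule` to every row terminator. The sketch below applies the same replacement to a small hand-written string so the effect is easy to see. The string `lat_demo` is an illustrative stand-in and not actual `to_latex` output.

```python
# A tiny stand-in for the text produced by DataFrame.to_latex; values are illustrative.
lat_demo = (
    '\\toprule\n'
    'Case & Data \\\\\n'
    '\\midrule\n'
    '0 & K \\\\\n'
    '1 & N \\\\\n'
    '\\bottomrule\n'
)

# Same replacement as in the cell above: every row terminator gains a \midrule.
with_rules = lat_demo.replace('\\\\\n', '\\\\ \\midrule\n')
print(with_rules)
```

Because the header row is terminated the same way as the data rows, it also receives a `\midrule`, which then sits next to the one the table already had. This matches the doubled rule visible in the printed output above.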
