### RefAV Tutorial
To start, we split the scenario mining ground truth annotations into one folder per log.

At the same time, we create the ground truth .pkl files that we will use to evaluate performance later.

In [None]:
from pathlib import Path
import os
from refAV.paths import AV2_DATA_DIR, SM_DATA_DIR
from refAV.dataset_conversion import separate_scenario_mining_annotations, create_gt_mining_pkls_parallel

sm_val_feather = Path('av2_sm_downloads/scenario_mining_val_annotations.feather')
separate_scenario_mining_annotations(sm_val_feather, SM_DATA_DIR)
# Use roughly half of the available cores; os.cpu_count() can return None, so fall back to 1.
create_gt_mining_pkls_parallel(sm_val_feather, SM_DATA_DIR, num_processes=max(1, (os.cpu_count() or 1) // 2))

RefAV works by composing small functions that together define a scenario.

Here is an example that uses these compositional functions to define the scenario "moving vehicle behind another vehicle being crossed by a jaywalking pedestrian".

In [None]:
from refAV.utils import *
from refAV.paths import SM_PRED_DIR

dataset_dir = SM_DATA_DIR / 'val'
output_dir = SM_PRED_DIR / 'val'
log_id = '0b86f508-5df9-4a46-bc59-5b9536dbde9f'
log_dir = dataset_dir / log_id

description = 'moving vehicle behind another vehicle being crossed by a jaywalking pedestrian'

peds = get_objects_of_category(log_dir, category='PEDESTRIAN')
peds_on_road = on_road(peds, log_dir)
jaywalking_peds = scenario_not(at_pedestrian_crossing)(peds_on_road, log_dir)

vehicles = get_objects_of_category(log_dir, category='VEHICLE')
moving_vehicles = scenario_and([in_drivable_area(vehicles, log_dir), scenario_not(stationary)(vehicles, log_dir)])
crossed_vehicles = being_crossed_by(moving_vehicles, jaywalking_peds, log_dir)
behind_crossed_vehicle = get_objects_in_relative_direction(crossed_vehicles, moving_vehicles, log_dir,
											direction='backward', max_number=1, within_distance=25)

# output_scenario writes a .pkl file containing the predicted tracks for this scenario
output_scenario(behind_crossed_vehicle, description, log_dir, output_dir)
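
To build intuition for how combinators such as scenario_and and scenario_not compose, here is a minimal toy sketch. It assumes a scenario can be modeled as a dict from track ID to the set of timestamps at which the track matches. That representation, and every name prefixed with toy_, is a hypothetical stand-in for illustration and not RefAV's actual implementation.

```python
from typing import Callable

# Toy scenario representation (an assumption, not RefAV's internal format):
# track ID -> set of timestamps at which the track satisfies the scenario.
Scenario = dict[str, set[int]]


def toy_and(scenarios: list[Scenario]) -> Scenario:
    """Keep only the tracks and timestamps present in every input scenario."""
    if not scenarios:
        return {}
    result = {}
    for track_id in set.intersection(*(set(s) for s in scenarios)):
        shared = set.intersection(*(s[track_id] for s in scenarios))
        if shared:
            result[track_id] = shared
    return result


def toy_not(predicate: Callable[[Scenario], Scenario]) -> Callable[[Scenario], Scenario]:
    """Wrap a predicate so that it returns the timestamps the predicate rejects."""
    def negated(candidates: Scenario) -> Scenario:
        matched = predicate(candidates)
        result = {}
        for track_id, timestamps in candidates.items():
            remaining = timestamps - matched.get(track_id, set())
            if remaining:
                result[track_id] = remaining
        return result
    return negated


def toy_stationary(candidates: Scenario) -> Scenario:
    """Hypothetical predicate: pretend track 'car_b' is parked the whole time."""
    return {t: ts for t, ts in candidates.items() if t == 'car_b'}


toy_vehicles = {'car_a': {0, 1, 2, 3}, 'car_b': {0, 1, 2}}
toy_in_area = {'car_a': {1, 2, 3}, 'car_b': {0, 1}}

toy_moving = toy_not(toy_stationary)(toy_vehicles)
toy_moving_in_area = toy_and([toy_in_area, toy_moving])
print(toy_moving_in_area)  # {'car_a': {1, 2, 3}}
```

The pattern mirrors the cell above: a negated predicate is built first and then applied to candidate tracks, and the intersection keeps only tracks that satisfy every condition at the same timestamps.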

Now that we know how to define a scenario, let's have an LLM do it for us.

Using this function requires an [Anthropic API](https://www.anthropic.com/api) key!

Since LLM calls can get quite expensive, we provide the predicted scenario definitions in the output/llm_scenario_predictions folder.

In [None]:
from refAV.scenario_prediction import predict_scenario_from_description
from refAV.paths import LLM_DEF_DIR

predict_scenario_from_description('vehicle heading toward ego from the side while at an intersection', LLM_DEF_DIR)
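
Because the call above needs an API key, it can help to check for one before starting a long run. The sketch below assumes the key is supplied through the ANTHROPIC_API_KEY environment variable, which is the variable the Anthropic Python SDK reads by default; whether RefAV picks the key up the same way is an assumption, and has_api_key is a hypothetical helper.

```python
import os


def has_api_key(env=os.environ, name='ANTHROPIC_API_KEY'):
    """Return True if the (assumed) API key variable is set to a non-empty value."""
    return bool(env.get(name, '').strip())


if not has_api_key():
    print('No API key found; consider using the provided scenario definitions instead.')
```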

With the basics out of the way, let's run evaluation on the entire validation dataset.
The create_baseline_prediction function calls the LLM scenario definition generator and then
runs the definition to find instances of the prompt.
It can take quite a bit of time to go through all of the logs.

In [None]:
import json
from tqdm import tqdm
from refAV.eval import create_baseline_prediction

log_prompt_input_path = Path('av2_sm_downloads/log_prompt_pairs_val.json')
with open(log_prompt_input_path, 'rb') as f:
	log_prompts = json.load(f)

for log_id, prompts in tqdm(log_prompts.items()):
	for prompt in prompts:
		create_baseline_prediction(prompt, log_id, SM_PRED_DIR, LLM_DEF_DIR)
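
Since the loop above can run for a long time, one option is to keep track of finished (log, prompt) pairs so that an interrupted run can resume where it stopped. The sketch below uses toy data and assumes the JSON file maps each log ID to a list of prompt strings, as the loop implies. pending_pairs is a hypothetical helper, and create_baseline_prediction may already skip existing outputs on its own.

```python
import json

# Toy stand-in for log_prompt_pairs_val.json (assumed shape: log ID -> list of prompts).
toy_log_prompts = json.loads('{"log-a": ["prompt 1", "prompt 2"], "log-b": ["prompt 3"]}')


def pending_pairs(log_prompts, completed):
    """Yield (log_id, prompt) pairs that have not been processed yet."""
    for log_id, prompts in log_prompts.items():
        for prompt in prompts:
            if (log_id, prompt) not in completed:
                yield log_id, prompt


# Pairs finished in an earlier run; in practice these could be loaded from disk.
completed = {('log-a', 'prompt 1')}
todo = list(pending_pairs(toy_log_prompts, completed))
print(len(todo))  # 2
```

Flattening the nested loop this way also gives a progress bar a total that counts prompts rather than logs.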

The combine_matching_pkls function combines all prediction .pkl files into a single .pkl file and all ground truth .pkl files into another. The combined prediction file is the one used for submission to the leaderboard. Running evaluate_pkls evaluates the predicted tracks across four metrics: HOTA-Temporal, HOTA, timestamp-level F1, and scenario-level F1.

In [None]:
from refAV.eval import combine_matching_pkls, evaluate_pkls

eval_output_dir = Path('output/evaluation/val')
combine_matching_pkls(SM_DATA_DIR, SM_PRED_DIR, eval_output_dir)
evaluate_pkls(eval_output_dir / 'combined_predictions.pkl', eval_output_dir / 'combined_gt.pkl')
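
To make the two F1 metrics more concrete, here is a minimal sketch of an F1 score computed over sets. It assumes that the timestamp-level variant compares the sets of timestamps marked as containing the scenario, and that the scenario-level variant compares sets of (log, prompt) pairs. The exact definitions used by evaluate_pkls may differ, and f1_score is a hypothetical helper rather than part of RefAV.

```python
def f1_score(predicted: set, ground_truth: set) -> float:
    """Harmonic mean of precision and recall between two sets of positives."""
    true_positives = len(predicted & ground_truth)
    if true_positives == 0:
        # Covers empty inputs as well; this sketch scores them as 0.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


# Toy timestamps: the prediction overlaps the ground truth on 2 of 4 timestamps.
predicted_timestamps = {0, 1, 2, 3}
ground_truth_timestamps = {2, 3, 4, 5}
print(f1_score(predicted_timestamps, ground_truth_timestamps))  # 0.5
```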

