### RefAV Tutorial
To start, we split the scenario mining ground-truth annotations into per-log folders.

At the same time, we create the ground-truth .pkl files used to evaluate performance later.

In [None]:
from pathlib import Path
import os
from refAV.paths import SM_DATA_DIR
from refAV.dataset_conversion import separate_scenario_mining_annotations, create_gt_mining_pkls_parallel

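sm_val_feather = Path('av2_sm_downloads/scenario_mining_val_annotations.feather')
# Split the val annotations into one folder per log
separate_scenario_mining_annotations(sm_val_feather, SM_DATA_DIR / 'val')
# Build the ground-truth .pkl files in parallel on ~90% of the CPU cores (os.cpu_count() may return None)
create_gt_mining_pkls_parallel(sm_val_feather, SM_DATA_DIR / 'val', num_processes=max(1, int(.9 * (os.cpu_count() or 1))))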
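
Conceptually, the splitting step groups annotation rows by log and writes each group into its own folder. The sketch below illustrates only that idea in plain Python; it is not RefAV's implementation. The row fields (`log_id`, `track_uuid`, `timestamp_ns`), the `annotations.json` filename, and the helper `split_by_log` are hypothetical stand-ins for the real feather-based format.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_log(rows, out_root):
    """Write each log's rows to out_root/<log_id>/annotations.json (hypothetical layout)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["log_id"]].append(row)
    out_root = Path(out_root)
    for log_id, log_rows in groups.items():
        log_dir = out_root / log_id
        log_dir.mkdir(parents=True, exist_ok=True)
        with open(log_dir / "annotations.json", "w") as f:
            json.dump(log_rows, f)
    return sorted(groups)


# Made-up rows standing in for the real annotation table
rows = [
    {"log_id": "log_a", "track_uuid": "t1", "timestamp_ns": 0},
    {"log_id": "log_b", "track_uuid": "t2", "timestamp_ns": 0},
    {"log_id": "log_a", "track_uuid": "t1", "timestamp_ns": 100},
]

with tempfile.TemporaryDirectory() as tmp:
    log_ids = split_by_log(rows, tmp)
    # Count how many rows ended up in each log folder
    counts = {}
    for lid in log_ids:
        with open(Path(tmp) / lid / "annotations.json") as f:
            counts[lid] = len(json.load(f))

print(log_ids, counts)  # ['log_a', 'log_b'] {'log_a': 2, 'log_b': 1}
```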

RefAV takes any base set of predicted tracks and runs a set of filtering operations to identify the relevant portions of each track.
The baseline uses the track predictions from Le3DE2E, the winner of the 2024 AV2 End-to-End Forecasting challenge. This block downloads the track predictions for the val set from Hugging Face.

In [None]:
from huggingface_hub import hf_hub_download
from refAV.dataset_conversion import pickle_to_feather
from refAV.paths import SM_PRED_DIR, AV2_DATA_DIR
from pathlib import Path

repo_id = "CainanD/AV2_Tracker_Predictions"
filename = "Le3DE2E_tracking_predictions_val.pkl"
tracker_predictions_dir = 'tracker_predictions'

# hf_hub_download returns the local path of the downloaded file
tracking_val_predictions = Path(hf_hub_download(repo_id, filename, repo_type='dataset', local_dir=tracker_predictions_dir))

# Convert the tracker predictions to feather format under SM_PRED_DIR / 'val'
pickle_to_feather(AV2_DATA_DIR / 'val', tracking_val_predictions, SM_PRED_DIR / 'val')
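
One way to picture "filtering operations" on tracks: represent a set of tracks as a mapping from track id to the timestamps where it is relevant, and keep only the timestamps that satisfy a condition. This is a hypothetical sketch; `filter_tracks`, the dict layout, and the speed values are assumptions for illustration, not RefAV's internal data structures.

```python
from typing import Callable, Dict, List

# Hypothetical representation: track id -> timestamps where the track is relevant
Scenario = Dict[str, List[int]]


def filter_tracks(tracks: Scenario, keep: Callable[[str, int], bool]) -> Scenario:
    """Keep only the (track, timestamp) pairs for which keep() is True; drop empty tracks."""
    out: Scenario = {}
    for uuid, timestamps in tracks.items():
        kept = [t for t in timestamps if keep(uuid, t)]
        if kept:
            out[uuid] = kept
    return out


# Made-up per-timestamp speeds (m/s) standing in for tracker output
speeds = {("car1", 0): 0.0, ("car1", 1): 3.2, ("car2", 0): 0.1, ("car2", 1): 0.2}
tracks = {"car1": [0, 1], "car2": [0, 1]}

moving = filter_tracks(tracks, lambda uuid, t: speeds[(uuid, t)] > 0.5)
print(moving)  # {'car1': [1]}
```

Only the portion of `car1` where it moves survives, which is the sense in which filtering identifies the relevant portions of each track.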

RefAV works by composing functions that each filter a set of tracked objects; combining these functions defines a scenario.
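
To illustrate what composing such functions means, here is a toy sketch of an AND-style intersection and a NOT-style wrapper over a track-id-to-timestamps mapping. The names `intersect`, `negate`, `fake_on_road`, and `fake_at_crossing` are hypothetical. RefAV's own `scenario_and` and `scenario_not` also take a `log_dir` and operate on real tracker output, so treat this only as a picture of the semantics.

```python
from typing import Callable, Dict, List

# Hypothetical representation: track id -> timestamps where a condition holds
Scenario = Dict[str, List[int]]


def intersect(scenarios: List[Scenario]) -> Scenario:
    """AND: keep a track's timestamps only if they appear in every input scenario."""
    first, *rest = scenarios
    out: Scenario = {}
    for uuid, timestamps in first.items():
        common = set(timestamps)
        for other in rest:
            common &= set(other.get(uuid, []))
        if common:
            out[uuid] = sorted(common)
    return out


def negate(condition: Callable[[Scenario], Scenario]) -> Callable[[Scenario], Scenario]:
    """NOT: wrap a condition so it returns the input timestamps where it does not hold."""
    def wrapped(objects: Scenario) -> Scenario:
        hits = condition(objects)
        out: Scenario = {}
        for uuid, timestamps in objects.items():
            hit_set = set(hits.get(uuid, []))
            remaining = [t for t in timestamps if t not in hit_set]
            if remaining:
                out[uuid] = remaining
        return out
    return wrapped


# Stand-in conditions with made-up results, in place of real map/track queries
def fake_on_road(objects: Scenario) -> Scenario:
    return {"p1": [1, 2], "p2": [0, 1]}


def fake_at_crossing(objects: Scenario) -> Scenario:
    return {"p2": [0, 1]}


peds = {"p1": [0, 1, 2], "p2": [0, 1]}
jaywalking = intersect([fake_on_road(peds), negate(fake_at_crossing)(peds)])
print(jaywalking)  # {'p1': [1, 2]}
```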

Here is an example that uses the compositional functions to define the scenario "moving vehicle behind another vehicle being crossed by a jaywalking pedestrian".

In [None]:
from refAV.utils import *
from refAV.paths import SM_PRED_DIR, SM_DATA_DIR
from IPython.display import Video

dataset_dir = SM_DATA_DIR / 'val'
output_dir = SM_PRED_DIR / 'val'
log_id = '0b86f508-5df9-4a46-bc59-5b9536dbde9f'
log_dir = dataset_dir / log_id

description = 'vehicle behind another vehicle being crossed by a jaywalking pedestrian'

peds = get_objects_of_category(log_dir, category='PEDESTRIAN')
peds_on_road = on_road(peds, log_dir)
jaywalking_peds = scenario_not(at_pedestrian_crossing)(peds_on_road, log_dir)

vehicles = get_objects_of_category(log_dir, category='VEHICLE')
moving_vehicles = scenario_and([in_drivable_area(vehicles, log_dir), scenario_not(stationary)(vehicles, log_dir)])
crossed_vehicles = being_crossed_by(moving_vehicles, jaywalking_peds, log_dir)
behind_crossed_vehicle = get_objects_in_relative_direction(crossed_vehicles, moving_vehicles, log_dir,
											direction='backward', max_number=1, within_distance=25)

# output_scenario writes a .pkl and an .mp4 of the predicted tracks during the scenario
output_scenario(behind_crossed_vehicle, description, log_dir, output_dir, visualize=True)

Video('output/experiments/val/0b86f508-5df9-4a46-bc59-5b9536dbde9f/scenario visualizations/vehicle behind another vehicle being crossed by a jaywalking pedestrian_n6.mp4')
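
The arguments `direction='backward'`, `within_distance=25`, and `max_number=1` can be read roughly as: keep at most one object that lies behind the reference object and within 25 (presumably meters). The sketch below approximates that using only object centers and the reference heading. It is an assumption about the semantics, not RefAV's `get_objects_in_relative_direction`, which may use bounding boxes and per-timestamp poses; `objects_behind` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple


def objects_behind(ref_xy: Tuple[float, float], ref_heading: float,
                   candidates: Dict[str, Tuple[float, float]],
                   within_distance: float = 25.0, max_number: int = 1) -> List[str]:
    """Return up to max_number candidate ids behind the reference object, nearest first."""
    cos_h, sin_h = math.cos(ref_heading), math.sin(ref_heading)
    behind = []
    for uuid, (x, y) in candidates.items():
        dx, dy = x - ref_xy[0], y - ref_xy[1]
        # Project the offset onto the reference heading: negative means behind
        forward = dx * cos_h + dy * sin_h
        dist = math.hypot(dx, dy)
        if forward < 0 and dist <= within_distance:
            behind.append((dist, uuid))
    behind.sort()
    return [uuid for _, uuid in behind[:max_number]]


# Reference vehicle at the origin facing +x; made-up candidate positions
cands = {"a": (-5.0, 1.0), "b": (-12.0, 0.0), "c": (8.0, 0.0), "d": (-30.0, 0.0)}
print(objects_behind((0.0, 0.0), 0.0, cands))                # ['a']
print(objects_behind((0.0, 0.0), 0.0, cands, max_number=3))  # ['a', 'b']
```

Here `c` is ahead and `d` is beyond the distance limit, so neither is returned; `max_number` then caps how many of the remaining objects are kept.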

Now that we know how to define a scenario, let's have an LLM do it for us.

This tutorial supports three different LLMs:
1. [qwen-2-5-7b](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
2. [gemini-2-0-flash-thinking](https://aistudio.google.com/prompts/new_chat)
3. [claude-3-5-sonnet](https://www.anthropic.com/api)

Qwen is an open-source, open-weight model that runs locally. Gemini requires a free API key from AI Studio. Claude requires a paid API key from Anthropic. Since Claude is a paid model, we provide its predicted scenario definitions in the output/llm_scenario_predictions/claude-sonnet-3-5 folder.

In [None]:
from refAV.scenario_prediction import predict_scenario_from_description
from refAV.paths import LLM_DEF_DIR

prompt = 'vehicle heading towards ego from the side while at an intersection'
model = 'gemini-2-0-flash-thinking'

predict_scenario_from_description(prompt, LLM_DEF_DIR, model)

with open(LLM_DEF_DIR/model/(prompt + '.txt'), 'r') as def_file:
    definition_text = def_file.read()
    print(definition_text)
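
The generated definition is plain Python text built from the same compositional functions. Before running LLM output, a cheap safeguard is to parse it and list any called function names outside an expected set. This is a hypothetical check using the standard-library `ast` module; `undefined_calls` is illustrative and the `ALLOWED` set is a small sample, far from RefAV's full function list.

```python
import ast

# Illustrative whitelist; a real one would list every RefAV function a definition may call
ALLOWED = {"get_objects_of_category", "on_road", "scenario_and", "scenario_not", "output_scenario"}


def undefined_calls(definition_text: str, allowed=ALLOWED) -> set:
    """Parse generated code and return names of directly called functions not in `allowed`.

    Functions passed as arguments (e.g. to scenario_not) are not checked here.
    Raises SyntaxError if the text is not valid Python.
    """
    tree = ast.parse(definition_text)
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return called - set(allowed)


text = (
    "peds = get_objects_of_category(log_dir, category='PEDESTRIAN')\n"
    "jay = scenario_not(at_pedestrian_crossing)(peds, log_dir)\n"
    "moved = teleport(jay, log_dir)\n"
)
missing = undefined_calls(text)
print(missing)  # {'teleport'}
```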

With the basics out of the way, let's run evaluation on the entire validation set.
The create_baseline_prediction function calls the LLM scenario definition generator and then runs the definition to find instances of the prompt.
Going through all of the logs can take a long time.

In [None]:
import json
from tqdm import tqdm
from pathlib import Path
from refAV.paths import SM_PRED_DIR, LLM_DEF_DIR
from refAV.eval import create_baseline_prediction

log_prompt_input_path = Path('av2_sm_downloads/log_prompt_pairs_val.json')
with open(log_prompt_input_path, 'r') as f:
	log_prompts = json.load(f)

method_name = 'qwen-2-5-7b'
for i, (log_id, prompts) in enumerate(log_prompts.items()):
	print(log_id)
	for prompt in tqdm(prompts, desc=f'{i}/{len(log_prompts)}'):
		create_baseline_prediction(prompt, log_id, SM_PRED_DIR / 'val', LLM_DEF_DIR, method_name=method_name)
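
Running a definition amounts to executing its text with the compositional functions in scope. A minimal sketch, assuming the definition refers to `log_dir`, `output_dir`, and `description` and ends with an `output_scenario` call; `run_definition`, the stub `output_scenario`, and the fake category lookup are hypothetical and do not reflect how `create_baseline_prediction` is actually implemented. Note that `exec` is not a sandbox, so only run definitions you are prepared to trust.

```python
from typing import Any, Callable, Dict


def run_definition(definition_text: str, functions: Dict[str, Callable],
                   log_dir: Any, output_dir: Any, description: str) -> Dict[str, Any]:
    """Execute generated definition code and capture what it passes to output_scenario."""
    captured: Dict[str, Any] = {}

    # Stub that records the final scenario instead of writing .pkl/.mp4 files
    def output_scenario(scenario, description, *args, **kwargs):
        captured["scenario"] = scenario
        captured["description"] = description

    namespace = dict(functions, output_scenario=output_scenario,
                     log_dir=log_dir, output_dir=output_dir, description=description)
    exec(definition_text, namespace)
    return captured


# Fake category lookup standing in for the real tracker query
def get_objects_of_category(log_dir, category):
    return {"p1": [0, 1]} if category == "PEDESTRIAN" else {}


text = (
    "peds = get_objects_of_category(log_dir, category='PEDESTRIAN')\n"
    "output_scenario(peds, description, log_dir, output_dir)\n"
)
result = run_definition(text, {"get_objects_of_category": get_objects_of_category},
                        log_dir="logs/example", output_dir="out", description="pedestrian")
print(result)  # {'scenario': {'p1': [0, 1]}, 'description': 'pedestrian'}
```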

The combine_pkls function combines all of the per-log prediction and ground-truth .pkl files into a single predictions .pkl file and a single ground-truth .pkl file. The predictions .pkl file is the one submitted to the leaderboard. Running evaluate_pkls evaluates the predicted tracks on four metrics: HOTA-Temporal, HOTA, timestamp-level F1, and scenario-level F1.

In [None]:
from pathlib import Path
from refAV.paths import SM_DATA_DIR, SM_PRED_DIR
from refAV.eval import combine_pkls, evaluate_pkls

eval_output_dir = Path('output/evaluation/val')
combine_pkls(SM_DATA_DIR / 'val', SM_PRED_DIR / 'val', eval_output_dir, method_name=method_name)
metrics = evaluate_pkls(eval_output_dir / f'{method_name}_predictions.pkl', eval_output_dir / 'combined_gt.pkl')

print_indented_dict(metrics)
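
The two F1 metrics can be sketched as follows, assuming predictions and ground truth are reduced to the set of timestamps (per log and prompt) at which the scenario occurs. Timestamp-level F1 pools per-timestamp hits; scenario-level F1 only asks whether each log-prompt pair contains the scenario at all. This is an illustrative approximation with hypothetical helper names: RefAV's evaluation may define matching and averaging differently, and HOTA and HOTA-Temporal are not covered here.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (log_id, prompt)


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    # Convention chosen here: no positives and no predictions counts as perfect
    return 2 * tp / denom if denom else 1.0


def timestamp_f1(pred: Dict[Key, Set[int]], gt: Dict[Key, Set[int]]) -> float:
    """Pool true/false positives over every (log, prompt, timestamp)."""
    tp = fp = fn = 0
    for key in pred.keys() | gt.keys():
        p, g = pred.get(key, set()), gt.get(key, set())
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    return f1(tp, fp, fn)


def scenario_f1(pred: Dict[Key, Set[int]], gt: Dict[Key, Set[int]]) -> float:
    """Treat each (log, prompt) as one binary decision: does the scenario occur at all?"""
    tp = fp = fn = 0
    for key in pred.keys() | gt.keys():
        p, g = bool(pred.get(key)), bool(gt.get(key))
        tp += int(p and g)
        fp += int(p and not g)
        fn += int(g and not p)
    return f1(tp, fp, fn)


# Made-up ground truth and predictions
gt = {("log1", "car turning left"): {1, 2, 3},
      ("log2", "car turning left"): set(),
      ("log3", "pedestrian crossing"): {5}}
pred = {("log1", "car turning left"): {2, 3, 4},
        ("log2", "car turning left"): {7},
        ("log3", "pedestrian crossing"): {5, 6}}

ts, sc = timestamp_f1(pred, gt), scenario_f1(pred, gt)
print(f"{ts:.2f} {sc:.2f}")  # 0.60 0.80
```

The gap between the two numbers shows why both are reported: a method can flag the right log-prompt pairs (scenario level) while still mislabeling many individual timestamps.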

If evaluate_pkls runs successfully, your predictions .pkl file is ready to submit to the EvalAI server. Create a profile on [EvalAI](https://eval.ai/) to receive an account token. The commands below submit to the validation set. Evaluation takes about 10-30 minutes, depending on the number of predicted tracks.

```bash
pip install evalai
evalai set_token <EvalAI_account_token>
evalai challenge 2469 phase 4899 submit --file output/evaluation/val/qwen-2-5-7b_predictions.pkl --large
```
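
Before running the submission commands above, it can help to confirm that the predictions file loads. A hypothetical sanity check, assuming the file is a pickled non-empty dict (the real structure may differ, and loading it may require the same packages used to create it); `check_submission` is an illustrative helper, and you should only unpickle files you created yourself.

```python
import pickle
import tempfile
from pathlib import Path


def check_submission(path) -> int:
    """Load a predictions .pkl and return its number of top-level entries; raise if unusable."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    with open(path, "rb") as f:
        preds = pickle.load(f)
    if not isinstance(preds, dict) or not preds:
        raise ValueError("expected a non-empty dict of predictions")
    size_mb = path.stat().st_size / 1e6
    print(f"{path.name}: {len(preds)} entries, {size_mb:.1f} MB")
    return len(preds)


# Demo on a throwaway file; in practice pass the predictions .pkl from the evaluation step
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_predictions.pkl"
    with open(demo, "wb") as f:
        pickle.dump({("log_a", "car turning left"): []}, f)
    n_entries = check_submission(demo)
```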