# BayScen Evaluation - Paper Results Replication

This notebook evaluates BayScen and the baseline methods and regenerates the evaluation tables reported in the paper:
- **Table II:** Safety-Critical Scenario Discovery (Effectiveness)
- **Table III:** Realism Analysis
- **Table IV:** 3-Way Coverage Quality

## Setup

In [1]:
import pandas as pd
from pathlib import Path

from metrics import (
    evaluate_all_methods,
    generate_table_ii,
    generate_table_iii,
    generate_table_iv,
    save_paper_tables
)

import warnings
warnings.filterwarnings('ignore')

## Configuration

In [2]:
# Choose scenario
SCENARIO = 1  # 1 or 2

# Define paths
scenario_folder = Path(f"Scenario{SCENARIO} Generated Scenarios")
json_folder = Path(f"Scenario{SCENARIO} Execution Results (JSON)")
real_data_path = Path("../data/processed/bayscen_final_data.csv")

# Methods to evaluate
methods = ['bayscen', 'random', 'sitcov', 'PICT_3w', 'PICT_2w', 'CTBC']

print(f"Evaluating Scenario {SCENARIO}")
print(f"Methods: {', '.join(methods)}")

Evaluating Scenario 1
Methods: bayscen, random, sitcov, PICT_3w, PICT_2w, CTBC


## Define Attributes and Domains

In [3]:
if SCENARIO == 1:
    attributes = [
        'Cloudiness', 'WindIntensity', 'Precipitation', 'PrecipitationDeposits',
        'Wetness', 'FogDensity', 'RoadFriction', 'FogDistance'
    ]
    parameter_domains = {
        'Cloudiness': [0, 20, 40, 60, 80, 100],
        'WindIntensity': [0, 20, 40, 60, 80, 100],
        'Precipitation': [0, 20, 40, 60, 80, 100],
        'PrecipitationDeposits': [0, 20, 40, 60, 80, 100],
        'Wetness': [0, 20, 40, 60, 80, 100],
        'RoadFriction': [0.1, 0.2, 0.4, 0.6, 0.8, 1.0],
        'FogDensity': [0, 20, 40, 60, 80, 100],
        'FogDistance': [0, 20, 40, 60, 80, 100]
    }
else:  # Scenario 2
    attributes = [
        'TimeOfDay', 'Cloudiness', 'WindIntensity', 'Precipitation',
        'PrecipitationDeposits', 'Wetness', 'FogDensity',
        'RoadFriction', 'FogDistance'
    ]
    parameter_domains = {
        'TimeOfDay': [-90, -60, -30, 0, 30, 60, 90],
        'Cloudiness': [0, 20, 40, 60, 80, 100],
        'WindIntensity': [0, 20, 40, 60, 80, 100],
        'Precipitation': [0, 20, 40, 60, 80, 100],
        'PrecipitationDeposits': [0, 20, 40, 60, 80, 100],
        'Wetness': [0, 20, 40, 60, 80, 100],
        'RoadFriction': [0.1, 0.2, 0.4, 0.6, 0.8, 1.0],
        'FogDensity': [0, 20, 40, 60, 80, 100],
        'FogDistance': [0, 20, 40, 60, 80, 100]
    }
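
These domains define a grid far too large to run exhaustively. The sketch below uses only the domain sizes from the cell above to count the full grid and the number of distinct 3-way value combinations, which is presumably the universe behind the "All triples" column of Table IV. The helper names `grid_size` and `count_value_triples` are illustrative and are not part of the `metrics` module.

```python
from itertools import combinations
from math import prod

# Domain sizes copied from the cell above (only the number of values matters here).
SCENARIO1_SIZES = {
    "Cloudiness": 6, "WindIntensity": 6, "Precipitation": 6,
    "PrecipitationDeposits": 6, "Wetness": 6, "FogDensity": 6,
    "RoadFriction": 6, "FogDistance": 6,
}
# Scenario 2 adds TimeOfDay, which has 7 values.
SCENARIO2_SIZES = {"TimeOfDay": 7, **SCENARIO1_SIZES}


def grid_size(sizes):
    """Number of distinct full scenarios (one value chosen per attribute)."""
    return prod(sizes.values())


def count_value_triples(sizes):
    """Number of (attribute triple, value triple) combinations across all triples."""
    return sum(prod(sizes[a] for a in combo) for combo in combinations(sizes, 3))


for name, sizes in [("Scenario 1", SCENARIO1_SIZES), ("Scenario 2", SCENARIO2_SIZES)]:
    print(f"{name}: {grid_size(sizes):,} full scenarios, "
          f"{count_value_triples(sizes):,} 3-way value combinations")
```

Because one scenario fixes one value per attribute, it covers exactly one of the 216 (6 × 6 × 6) value combinations of any given attribute triple. A suite with fewer than 216 scenarios therefore cannot reach full 3-way coverage in Scenario 1. In Scenario 2, triples that include TimeOfDay raise this lower bound to 252 (7 × 6 × 6).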

## Run Evaluation (6-8 minutes)

In [4]:
# Evaluate all methods
results = evaluate_all_methods(
    scenario_folder=scenario_folder,
    json_folder=json_folder,
    real_data_path=real_data_path,
    methods=methods,
    attributes=attributes,
    parameter_domains=parameter_domains,
    output_file=Path(f"results/results_scenario{SCENARIO}.csv")
)

print("\n✓ Evaluation complete!")

BAYSCEN EVALUATION - ALL METHODS

Real data: 41767 observations
Methods to evaluate: 6
Attributes: ['Cloudiness', 'WindIntensity', 'Precipitation', 'PrecipitationDeposits', 'Wetness', 'FogDensity', 'RoadFriction', 'FogDistance']

Evaluating methods:   0%|                                                                        | 0/6 [00:00<?, ?it/s]
Evaluating bayscen...
  Scenarios: 648
Evaluating methods:  17%|██████████▋                                                     | 1/6 [00:54<04:30, 54.13s/it]
  ✓ Complete

Evaluating random...
  Scenarios: 648
Evaluating methods:  33%|█████████████████████▎                                          | 2/6 [01:57<03:58, 59.70s/it]
  ✓ Complete

Evaluating sitcov...
  Scenarios: 648
Evaluating methods:  50%|████████████████████████████████                                | 3/6 [03:00<03:03, 61.00s/it]
  ✓ Complete

Evaluating PICT_3w...
  Scenarios: 456
Evaluating methods:  67%|██████████████████████████████████████████▋                     | 4/6 [04:00<02:01, 60.74s/it]
  ✓ Complete

Evaluating PICT_2w...
  Scenarios: 61
Evaluating methods:  83%|█████████████████████████████████████████████████████▎          | 5/6 [04:57<00:59, 59.22s/it]
  ✓ Complete

Evaluating CTBC...
  Scenarios: 95
Evaluating methods: 100%|████████████████████████████████████████████████████████████████| 6/6 [05:54<00:00, 59.11s/it]
  ✓ Complete

✓ Results saved to results\results_scenario1.csv

✓ Evaluation complete!

## Table II: Safety-Critical Scenario Discovery (RQ1: Effectiveness)

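For each method, the table reports the number of safety-critical scenarios found (Count) and that number as a percentage of the method's suite size N (Rate = Count / N × 100).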
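
The column names suggest that each scenario is executed three times before it is flagged. The sketch below shows one plausible way to derive the three flags from per-run outcomes and sum them into the Count columns. The three-run assumption, the use of the mean per-run minimum TTC (inferred from the "Mean TTC < 0.5" headers in Table III), and the names `classify_scenario` and `count_flags` are all assumptions. This is not the logic inside `metrics.generate_table_ii`.

```python
from statistics import mean


def classify_scenario(collided, min_ttc, ttc_threshold=0.5):
    """Flag one scenario from its repeated executions (hypothetical helper).

    collided: one bool per run, True if that run ended in a collision.
    min_ttc:  one float per run, the minimum time-to-collision seen in that run.
    Assumes three runs per scenario, matching the "2/3" and "3/3" column names.
    """
    hits = sum(collided)
    return {
        "ttc_below_threshold": mean(min_ttc) < ttc_threshold,
        "collision_at_least_2_of_3": hits >= 2,
        "collision_3_of_3": hits == 3,
    }


def count_flags(scenarios):
    """Sum each flag over a test suite, giving the Count (#) columns."""
    totals = {}
    for collided, min_ttc in scenarios:
        for flag, value in classify_scenario(collided, min_ttc).items():
            totals[flag] = totals.get(flag, 0) + int(value)
    return totals


suite = [
    ([True, True, False], [0.0, 0.1, 1.2]),    # collides in 2 of 3 runs, mean min TTC ~0.43
    ([True, True, True], [0.0, 0.0, 0.0]),     # collides in every run
    ([False, False, False], [2.5, 3.0, 2.8]),  # benign
]
print(count_flags(suite))
```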

In [6]:
table_ii = generate_table_ii(results)
table_ii

| Method | N | TTC<0.5 Count (#) | TTC<0.5 Rate (%) | Collision (≥2/3) Count (#) | Collision (≥2/3) Rate (%) | Collision (3/3) Count (#) | Collision (3/3) Rate (%) |
|---|---|---|---|---|---|---|---|
| bayscen | 648 | 15 | 2.3 | 32 | 4.9 | 11 | 1.7 |
| random | 648 | 12 | 1.9 | 44 | 6.8 | 8 | 1.2 |
| sitcov | 648 | 13 | 2.0 | 34 | 5.2 | 4 | 0.6 |
| PICT_3w | 456 | 6 | 1.3 | 20 | 4.4 | 1 | 0.2 |
| PICT_2w | 61 | 1 | 1.6 | 2 | 3.3 | 0 | 0.0 |
| CTBC | 95 | 0 | 0.0 | 1 | 1.1 | 0 | 0.0 |
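
Each Rate (%) column is the corresponding count divided by the suite size N, expressed as a percentage and rounded to one decimal place. The short check below recomputes the rates from the Scenario 1 counts and compares them with the reported values. The data literal is copied by hand from the table above, and `rate_pct` is an illustrative helper.

```python
# Rows copied from Table II (Scenario 1):
# method: (N, [counts], [reported rates in %]) for TTC<0.5, Collision (>=2/3), Collision (3/3)
TABLE_II = {
    "bayscen": (648, [15, 32, 11], [2.3, 4.9, 1.7]),
    "random": (648, [12, 44, 8], [1.9, 6.8, 1.2]),
    "sitcov": (648, [13, 34, 4], [2.0, 5.2, 0.6]),
    "PICT_3w": (456, [6, 20, 1], [1.3, 4.4, 0.2]),
    "PICT_2w": (61, [1, 2, 0], [1.6, 3.3, 0.0]),
    "CTBC": (95, [0, 1, 0], [0.0, 1.1, 0.0]),
}


def rate_pct(count, n):
    """Count as a percentage of the suite size, rounded to one decimal."""
    return round(100 * count / n, 1)


# Collect any (method, count, reported rate) whose recomputed rate differs
mismatches = [
    (method, count, reported)
    for method, (n, counts, rates) in TABLE_II.items()
    for count, reported in zip(counts, rates)
    if rate_pct(count, n) != reported
]
print("all rates match" if not mismatches else mismatches)
```

Normalizing matters because the suites differ in size. PICT_2w runs 61 scenarios while BayScen runs 648, so comparing raw counts alone would favor the larger suites.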


## Table III: Realism Analysis of Discovered Safety-Critical Scenarios

In [7]:
table_iii = generate_table_iii(results)
table_iii

| Method | Overall Realism (%) | Mean TTC < 0.5 Count | Mean TTC < 0.5 Realistic (#) | Mean TTC < 0.5 Realism (%) | Collisions (≥2/3) Count | Collisions (≥2/3) Realistic (#) | Collisions (≥2/3) Realism (%) |
|---|---|---|---|---|---|---|---|
| bayscen | 83.5 | 15 | 10 | 73.3 | 32 | 24 | 78.1 |
| random | 30.4 | 12 | 3 | 33.3 | 44 | 15 | 34.1 |
| sitcov | 31.6 | 13 | 4 | 30.8 | 34 | 14 | 44.1 |
| PICT_3w | 33.5 | 6 | 4 | 66.7 | 20 | 9 | 45.0 |
| PICT_2w | 29.5 | 1 | 0 | 0.0 | 2 | 0 | 0.0 |
| CTBC | 29.5 | 0 | 0 | 0.0 | 1 | 0 | 0.0 |


## Table IV: 3-Way Coverage Quality Analysis
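
A 3-way coverage metric measures which value combinations a test suite exercises for every set of three attributes. The sketch below shows one way to compute quantities like those in this table. It relies on assumptions that the `metrics` module does not confirm:

- "All triples" is the share of all possible value triples that the suite covers.
- "Real triples" is the share of triples observed in the real data that the suite covers.
- "Precision" is the share of the suite's own triples that also appear in the real data.
- The real observations are already discretized onto the same value grid as `parameter_domains`.

All function names are hypothetical.

```python
from itertools import combinations, product


def triples_in(rows, attributes):
    """Set of (attribute triple, value triple) pairs exercised by the rows."""
    found = set()
    for row in rows:
        for combo in combinations(attributes, 3):
            found.add((combo, tuple(row[a] for a in combo)))
    return found


def all_possible_triples(attributes, domains):
    """Every (attribute triple, value triple) pair allowed by the domains."""
    return {
        (combo, values)
        for combo in combinations(attributes, 3)
        for values in product(*(domains[a] for a in combo))
    }


def coverage_metrics(suite, real_rows, attributes, domains):
    """Return (all-triple coverage, real-triple coverage, precision, F1) as fractions."""
    suite_t = triples_in(suite, attributes)
    real_t = triples_in(real_rows, attributes)
    universe = all_possible_triples(attributes, domains)
    all_cov = len(suite_t & universe) / len(universe)
    real_cov = len(suite_t & real_t) / len(real_t)
    precision = len(suite_t & real_t) / len(suite_t)
    total = precision + real_cov
    f1 = 2 * precision * real_cov / total if total else 0.0
    return all_cov, real_cov, precision, f1


# Toy example: four binary attributes, two generated scenarios, two real observations.
attrs = ["A", "B", "C", "D"]
domains = {a: [0, 1] for a in attrs}
suite = [dict.fromkeys(attrs, 0), dict.fromkeys(attrs, 1)]
real = [dict.fromkeys(attrs, 0), {"A": 0, "B": 0, "C": 0, "D": 1}]
print(coverage_metrics(suite, real, attrs, domains))
```

Each scenario contributes one triple per attribute combination, which is 56 for the 8 attributes of Scenario 1. Set operations of this size remain cheap in plain Python, even for the 41767 real observations.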

In [8]:
table_iv = generate_table_iv(results)
table_iv

| Method | All-triple coverage | Real-triple coverage | Precision | F1 | N |
|---|---|---|---|---|---|
| bayscen | 59.6% | 94.0% | 79.5% | 86.1 | 648 |
| random | 94.9% | 95.0% | 50.5% | 66.0 | 648 |
| sitcov | 94.7% | 94.9% | 50.5% | 65.9 | 648 |
| PICT_3w | 100.0% | 100.0% | 50.4% | 67.0 | 456 |
| PICT_2w | 27.1% | 27.0% | 50.2% | 35.1 | 61 |
| CTBC | 33.9% | 33.4% | 49.7% | 40.0 | 95 |
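
The F1 column is consistent with the harmonic mean of Precision and real-triple coverage, both in percent. Recomputing it from the rounded table values reproduces every reported F1 to within 0.1, and the small residual comes from rounding of the inputs. The check below uses values copied by hand from the table above. `f1_pct` is an illustrative helper.

```python
def f1_pct(precision_pct, recall_pct):
    """Harmonic mean of two percentages, returned on the same 0-100 scale."""
    total = precision_pct + recall_pct
    return 2 * precision_pct * recall_pct / total if total else 0.0


# method: (Precision %, Real triples %, reported F1), copied from Table IV (Scenario 1)
TABLE_IV = {
    "bayscen": (79.5, 94.0, 86.1),
    "random": (50.5, 95.0, 66.0),
    "sitcov": (50.5, 94.9, 65.9),
    "PICT_3w": (50.4, 100.0, 67.0),
    "PICT_2w": (50.2, 27.0, 35.1),
    "CTBC": (49.7, 33.4, 40.0),
}

for method, (p, r, reported) in TABLE_IV.items():
    print(f"{method:8s} recomputed F1 = {f1_pct(p, r):5.2f}   reported = {reported}")
```

This also explains the ranking. PICT_3w reaches 100% real-triple coverage, but its precision of about 50% caps its F1 at 67.0. BayScen gives up some coverage in exchange for higher precision and ends up with the highest F1 (86.1).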


## Save All Tables

In [9]:
# Save all paper tables to CSV in results folder

output_dir = Path("results")
output_dir.mkdir(exist_ok=True)

save_paper_tables(
    results, 
    output_prefix=f"paper_scenario{SCENARIO}",
    output_dir=output_dir
)

print("\n✓ All tables saved in results/ folder!")
print(f"  - results/paper_scenario{SCENARIO}_II_effectiveness.csv")
print(f"  - results/paper_scenario{SCENARIO}_III_realism.csv")
print(f"  - results/paper_scenario{SCENARIO}_IV_coverage.csv")


✓ Saved Table II to results\paper_scenario1_II_effectiveness.csv
✓ Saved Table III to results\paper_scenario1_III_realism.csv
✓ Saved Table IV to results\paper_scenario1_IV_coverage.csv


### Scenario 2

To replicate the Scenario 2 tables, set `SCENARIO = 2` in the configuration cell and rerun the notebook, or run the following command in a terminal:

```
python evaluate.py --scenario 2
```

## Summary

✓ All paper tables replicated successfully!

The saved CSV files can be used for:
- Importing into spreadsheet software
- Further analysis
- Comparison across scenarios