# Bayesian Network Parametrization - BayScen

This notebook demonstrates how to use the BayScen parametrization pipeline to train a complete Bayesian Network for each of the two scenarios: Scenario 1 (Vehicle-Vehicle) and Scenario 2 (Vehicle-Cyclist).

**Key Improvements:**
- ✅ Single source of truth: Uses `abstract_variables.py` for all abstraction definitions
- ✅ Scenario selection: Supports both Scenario 1 (Vehicle-Vehicle) and Scenario 2 (Vehicle-Cyclist)
- ✅ No code duplication: Automatically extracts structure from abstract variable definitions

**Pipeline Overview:**
1. Load weather data from Norwegian Meteorological Institute
2. Load learned network structure
3. Fit base model with Bayesian estimation (BDeu prior)
4. Extend with abstracted variables (from `abstract_variables.py`)
5. Add T-junction variables (positions and Collision_Point)
6. Save trained models

**Output:**
- `fitted_bayesian_network.pkl`: Base environmental model
- `extended_bayesian_network.pkl`: Model with abstracted variables
- `scenario{N}_full_bayesian_network.pkl`: Complete model ready for scenario generation
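Step 3 of the pipeline overview refers to Bayesian estimation with a BDeu prior. As a rough illustration of what that estimate computes, the sketch below derives one conditional probability table by hand: each cell receives a pseudo-count of `ess / (r * q)`, where `r` is the number of child states and `q` the number of parent configurations. This is not the code in `bn_parametrization.py`; the function name `bdeu_cpd`, the `Precipitation -> Wetness` edge, the state labels, and the equivalent sample size are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import product


def bdeu_cpd(rows, child, parents, states, ess=10.0):
    """Posterior-mean CPD for `child` given `parents` under a BDeu prior.

    rows:   list of dicts mapping variable name -> observed state
    states: dict mapping variable name -> list of possible states
    ess:    equivalent sample size (assumed value, not taken from the notebook)
    """
    r = len(states[child])
    parent_configs = list(product(*(states[p] for p in parents)))
    q = len(parent_configs)
    alpha = ess / (r * q)  # BDeu pseudo-count per (parent config, child state) cell

    # Count how often each (parent configuration, child state) pair occurs
    counts = Counter(
        (tuple(row[p] for p in parents), row[child]) for row in rows
    )

    cpd = {}
    for config in parent_configs:
        n_config = sum(counts[(config, s)] for s in states[child])
        cpd[config] = {
            s: (counts[(config, s)] + alpha) / (n_config + r * alpha)
            for s in states[child]
        }
    return cpd


# Toy data with hypothetical state labels
rows = [
    {"Precipitation": "none", "Wetness": "dry"},
    {"Precipitation": "none", "Wetness": "dry"},
    {"Precipitation": "heavy", "Wetness": "wet"},
]
states = {"Precipitation": ["none", "heavy"], "Wetness": ["dry", "wet"]}

cpd = bdeu_cpd(rows, "Wetness", ["Precipitation"], states, ess=4.0)
for config, dist in cpd.items():
    print(config, dist)
```

Because every cell starts from a positive pseudo-count, parent configurations that never occur in the data still get a valid (uniform) distribution instead of a division by zero.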

## Setup and Imports

In [None]:
import sys
from pathlib import Path

# Add bayscen to path
sys.path.append(str(Path.cwd().parent))

from modeling.bn_parametrization import BayesianNetworkParametrizer
from modeling.bn_utils import load_model, print_cpd_as_dataframe
from abstraction.abstract_variables import ABSTRACT_VARIABLES

import warnings
warnings.filterwarnings('ignore')

## Verify Abstraction Definitions

First, let's verify that the abstracted variables are properly defined in `abstract_variables.py`:

In [None]:
# Display abstracted variable definitions
print("Abstracted Variables from abstract_variables.py:")
print("=" * 70)

for name, var in ABSTRACT_VARIABLES.items():
    if name != 'Collision_Point':  # Skip deterministic variable
        print(f"\n{var.name}:")
        print(f"  Values: {var.values}")
        print(f"  Description: {var.description}")
        print("  Parents:")
        for parent_info in var.parents:
            if len(parent_info) == 3:
                parent, rel, weight = parent_info
                print(f"    - {parent} ({rel}, weight={weight})")

## Scenario 1: Vehicle-Vehicle

Train the model for Scenario 1 (8 environmental variables, no Time_of_Day).

In [None]:
# Define paths
data_path = "../../data/processed/bayscen_final_data.csv"
structure_path = "structure_learning/learned_structures/scenario1_structure.txt"

# Initialize parametrizer for Scenario 1
param_s1 = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=structure_path,
    scenario=1  # Vehicle-Vehicle
)

# Run complete pipeline
param_s1.run_full_pipeline()

## Scenario 2: Vehicle-Cyclist

Train the model for Scenario 2 (9 environmental variables, including Time_of_Day).

In [None]:
# Initialize parametrizer for Scenario 2
param_s2 = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=structure_path,  # Reuses the structure file defined in the Scenario 1 cell
    scenario=2  # Vehicle-Cyclist with Time_of_Day
)

# Run complete pipeline
param_s2.run_full_pipeline()

## Alternative: Step-by-Step Execution

For more control, you can execute each step individually:

In [None]:
# Create new parametrizer instance
param = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=structure_path,
    scenario=1
)

# Step 1: Load data
param.load_data()
print(f"\nData shape: {param.data.shape}")
print(param.data.head())

In [None]:
# Step 2: Load structure
param.load_structure()
print(f"\nNetwork has {len(param.edges)} edges")

In [None]:
# Verify abstracted edges were extracted correctly
print("\nAbstracted edges extracted from abstract_variables.py:")
for parent, child, rel, weight in param.abstracted_edges:
    print(f"  {parent} -> {child} ({rel}, {weight})")

In [None]:
# Step 3: Fit base model
param.fit_base_model()

In [None]:
# Step 4: Add abstracted variables (uses abstract_variables.py)
param.extend_with_abstractions()

In [None]:
# Step 5: Add T-junction variables
param.add_tjunction_variables()

In [None]:
# Step 6: Save models
param.save_models()

## Inspecting the Trained Model

In [None]:
# Load the final model (Scenario 1)
full_model = load_model('models/scenario1_full_bayesian_network.pkl')

print(f"\nModel has {len(full_model.nodes())} nodes:")
print(sorted(full_model.nodes()))

In [None]:
# Inspect CPD for an abstracted variable
visibility_cpd = full_model.get_cpds('Visibility')

print("Visibility CPD (first 10 rows):")
df = print_cpd_as_dataframe(visibility_cpd, max_rows=10)
print(df)

In [None]:
# Inspect Road_Surface CPD
road_surface_cpd = full_model.get_cpds('Road_Surface')

print("Road_Surface CPD (first 10 rows):")
df = print_cpd_as_dataframe(road_surface_cpd, max_rows=10)
print(df)

In [None]:
# Inspect Collision_Point CPD (deterministic mapping)
collision_cpd = full_model.get_cpds('Collision_Point')

print("Collision_Point CPD (sample of 20 rows):")
df = print_cpd_as_dataframe(collision_cpd, max_rows=20)
print(df)

## Compare Scenarios

In [None]:
# Load both models
model_s1 = load_model('models/scenario1_full_bayesian_network.pkl')
model_s2 = load_model('models/scenario2_full_bayesian_network.pkl')

print("Scenario Comparison:")
print("=" * 70)
print("Scenario 1 (Vehicle-Vehicle):")
print(f"  Total nodes: {len(model_s1.nodes())}")
print(f"  Total edges: {len(model_s1.edges())}")

print("\nScenario 2 (Vehicle-Cyclist):")
print(f"  Total nodes: {len(model_s2.nodes())}")
print(f"  Total edges: {len(model_s2.edges())}")

# Find difference
s1_nodes = set(model_s1.nodes())
s2_nodes = set(model_s2.nodes())
extra_in_s2 = s2_nodes - s1_nodes

print(f"\nExtra nodes in Scenario 2: {extra_in_s2}")

## Verify Model Properties

In [None]:
import networkx as nx

# Check model validity
is_valid = full_model.check_model()
print(f"Model is valid: {is_valid}")

# Check for cycles (should be False for a valid DAG)
has_cycles = not nx.is_directed_acyclic_graph(full_model)
print(f"Model has cycles: {has_cycles}")

In [None]:
# Count variable types
environmental_vars = [
    "Cloudiness", "Wind_Intensity", "Precipitation",
    "Precipitation_Deposits", "Wetness", "Fog_Density",
    "Road_Friction", "Fog_Distance"
]

abstracted_vars = ["Road_Surface", "Vehicle_Stability", "Visibility"]

tjunction_vars = [
    "Start_Ego", "Goal_Ego", "Start_Other", "Goal_Other", "Collision_Point"
]

print(f"Environmental variables: {len(environmental_vars)}")
print(f"Abstracted variables: {len(abstracted_vars)}")
print(f"T-junction variables: {len(tjunction_vars)}")
print(f"Total (Scenario 1): {len(environmental_vars) + len(abstracted_vars) + len(tjunction_vars)}")

## Verify Abstraction Extraction

Confirm that the abstracted edges were correctly extracted from `abstract_variables.py`:

In [None]:
# Create a new parametrizer to check extraction
test_param = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=structure_path,
    scenario=1
)

print("Extracted Abstracted Edges:")
print("=" * 70)
print(f"Total edges: {len(test_param.abstracted_edges)}\n")

# Group by child for display
from collections import defaultdict
by_child = defaultdict(list)
for parent, child, rel, weight in test_param.abstracted_edges:
    by_child[child].append((parent, rel, weight))

for child, parents in sorted(by_child.items()):
    print(f"\n{child}:")
    total_weight = sum(w for _, _, w in parents)
    for parent, rel, weight in parents:
        percentage = (weight / total_weight) * 100
        print(f"  <- {parent} ({rel}, {weight:.2f} = {percentage:.1f}%)")
    print(f"  Total weight: {total_weight:.2f}")

print("\n✓ All abstraction structure extracted from abstract_variables.py")

## Command Line Usage

You can also run the parametrization from the command line:

```bash
# Scenario 1 (default)
python bn_parametrization.py

# Scenario 2 (with Time_of_Day)
python bn_parametrization.py --scenario 2
```
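The `--scenario` flag used above could be wired up roughly as follows. This is a minimal sketch using the standard-library `argparse` module; the actual argument handling inside `bn_parametrization.py` is not shown in this notebook, so `build_parser` and the help text are hypothetical. Only the flag name and the default of Scenario 1 come from the commands above.

```python
import argparse


def build_parser():
    """Hypothetical CLI parser mirroring the commands shown above."""
    parser = argparse.ArgumentParser(
        description="Parametrize the BayScen Bayesian Network"
    )
    parser.add_argument(
        "--scenario",
        type=int,
        choices=[1, 2],
        default=1,  # Scenario 1 is the default, as in the commands above
        help="1 = Vehicle-Vehicle, 2 = Vehicle-Cyclist (with Time_of_Day)",
    )
    return parser


# Equivalent to running the script with no flags
default_args = build_parser().parse_args([])
print(default_args.scenario)

# Equivalent to passing `--scenario 2`
cyclist_args = build_parser().parse_args(["--scenario", "2"])
print(cyclist_args.scenario)
```

Restricting `choices` to 1 and 2 makes an unsupported scenario number fail at parse time rather than partway through the pipeline.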

## Next Steps

The trained models are now ready for scenario generation, which proceeds in four steps:

1. **Combinatorial Testing**: Generate all combinations of abstracted variable values (3 × 6 × 6 × 6 = 648 configurations)
2. **Conditional Sampling**: For each configuration, sample concrete scenarios from the BN
3. **Rarity Prioritization**: Focus on rare but valid parameter combinations
4. **Diversity Selection**: Ensure diverse coverage of the parameter space

See the `scenario_generation.py` module for the complete generation pipeline.
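The combinatorial step above can be illustrated with `itertools.product`. The sketch below only reproduces the arithmetic (3 × 6 × 6 × 6 = 648): the factor names and state labels are placeholders, since this notebook does not list the actual value sets, and this is not the code in `scenario_generation.py`.

```python
from itertools import product

# Placeholder factors: only the cardinalities (3, 6, 6, 6) come from the text above
domains = {
    "factor_a": [f"a{i}" for i in range(3)],
    "factor_b": [f"b{i}" for i in range(6)],
    "factor_c": [f"c{i}" for i in range(6)],
    "factor_d": [f"d{i}" for i in range(6)],
}

# One dict per configuration, keyed by factor name
configurations = [
    dict(zip(domains, combo)) for combo in product(*domains.values())
]

print(len(configurations))
print(configurations[0])
```

Each configuration dict could then serve as the evidence for the conditional sampling step.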

## Key Improvements in This Version

### ✅ Single Source of Truth
- All abstraction definitions come from `abstract_variables.py`
- No duplication of parent-child relationships and weights
- Automatic extraction via `_extract_abstracted_edges()`
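To make the extraction idea concrete, here is a minimal sketch of how `(parent, child, relation, weight)` edges could be flattened out of variable definitions. It assumes each definition exposes `name` and a `parents` list of `(parent, relation, weight)` tuples, as the inspection cells earlier in this notebook suggest. The `AbstractVariable` dataclass, the relation label, and the weights are hypothetical stand-ins, not the contents of `abstract_variables.py` or the real `_extract_abstracted_edges()`.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractVariable:
    """Hypothetical stand-in for an entry of ABSTRACT_VARIABLES."""
    name: str
    values: list
    parents: list = field(default_factory=list)  # (parent, relation, weight) tuples


def extract_abstracted_edges(variables):
    """Flatten parent definitions into (parent, child, relation, weight) tuples."""
    edges = []
    for var in variables.values():
        for parent_info in var.parents:
            if len(parent_info) != 3:
                continue  # skip entries that carry no relation/weight
            parent, relation, weight = parent_info
            edges.append((parent, var.name, relation, weight))
    return edges


# Toy definition with made-up relation labels and weights
toy_variables = {
    "Visibility": AbstractVariable(
        name="Visibility",
        values=["low", "high"],
        parents=[("Fog_Density", "negative", 0.6), ("Cloudiness", "negative", 0.4)],
    )
}

edges = extract_abstracted_edges(toy_variables)
for parent, child, relation, weight in edges:
    print(f"{parent} -> {child} ({relation}, {weight})")
```

Deriving the edge list from the definitions, rather than maintaining it separately, is what keeps the two from drifting apart.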

### ✅ Scenario Selection
- Single script handles both Scenario 1 and Scenario 2
- Automatic variable selection based on scenario
- Command-line argument for easy switching

### ✅ Better Maintainability
- Update `abstract_variables.py` once → changes propagate automatically
- No need to manually sync weights/relationships
- Reduces risk of inconsistencies