# Bayesian Network Parametrization - BayScen

This notebook demonstrates how to use the BayScen parametrization pipeline to train a complete Bayesian Network for both scenarios.

**Pipeline Overview:**
1. Load weather data from the Norwegian Meteorological Institute
2. Load learned network structure
3. Fit base model with Bayesian estimation (BDeu prior)
4. Extend with abstracted variables (from `abstract_variables.py`)
5. Add T-junction variables (positions and `Collision_Point`)
6. Save trained models

**Output:**
- `fitted_bayesian_network.pkl`: Base environmental model
- `extended_bayesian_network.pkl`: Model with abstracted variables
- `scenario{N}_full_bayesian_network.pkl`: Complete model ready for scenario generation

## Setup and Imports

In [1]:
import sys
from pathlib import Path

# Add bayscen to path
sys.path.append(str(Path.cwd().parent))

from modeling.bn_parametrization import BayesianNetworkParametrizer
from modeling.bn_utils import load_model, print_cpd_as_dataframe
from abstraction.abstract_variables import ABSTRACT_VARIABLES

import warnings
warnings.filterwarnings('ignore')

## Verify Abstraction Definitions

First, let's verify that the abstracted variables are properly defined in `abstract_variables.py`:

In [2]:
# Display abstracted variable definitions
print("Abstracted Variables from abstract_variables.py:")
print("=" * 70)

for name, var in ABSTRACT_VARIABLES.items():
    if name != 'Collision_Point':  # Skip deterministic variable
        print(f"\n{var.name}:")
        print(f"  Values: {var.values}")
        print(f"  Description: {var.description}")
        print(f"  Parents:")
        for parent_info in var.parents:
            if len(parent_info) == 3:
                parent, rel, weight = parent_info
                print(f"    - {parent} ({rel}, weight={weight})")

Abstracted Variables from abstract_variables.py:

Visibility:
  Values: [0, 20, 40, 60, 80, 100]
  Description: Sensor detection range and object recognition capability
  Parents:
    - Fog_Density (inverse, weight=0.45)
    - Fog_Distance (normal, weight=0.45)
    - Precipitation (inverse, weight=0.1)

Road_Surface:
  Values: [0, 20, 40, 60, 80, 100]
  Description: Vehicle-road traction dynamics and braking capability
  Parents:
    - Road_Friction (normal, weight=0.6)
    - Wetness (inverse, weight=0.2)
    - Precipitation_Deposits (inverse, weight=0.2)

Vehicle_Stability:
  Values: [0, 20, 40, 60, 80, 100]
  Description: External destabilizing forces on vehicle trajectory
  Parents:
    - Wind_Intensity (inverse, weight=0.2)
    - Road_Friction (normal, weight=0.8)


## Scenario 1: Vehicle-Vehicle

Train the model for Scenario 1 (8 environmental variables, no `Time_of_Day`).

In [3]:
# Define paths
data_path = "../../data/processed/bayscen_final_data.csv"
scenario1_structure_path = "structure_learning/learned_structures/scenario1_structure.txt"

# Initialize parametrizer for Scenario 1
param_s1 = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=scenario1_structure_path,
    scenario=1  # Vehicle-Vehicle
)

# Run complete pipeline
param_s1.run_full_pipeline()


BAYESIAN NETWORK PARAMETRIZATION - SCENARIO 1

STEP 1: LOADING DATA (Scenario 1)
✓ Loaded data from ../../data/processed/bayscen_final_data.csv
  Total observations: 41767
✓ Selected 8 base variables
  Variables: Cloudiness, Wind_Intensity, Precipitation, Precipitation_Deposits, Wetness, Fog_Density, Road_Friction, Fog_Distance

STEP 2: LOADING NETWORK STRUCTURE
✓ Loaded structure from structure_learning/learned_structures/scenario1_structure.txt
  Total edges: 9

  Edges:
    Cloudiness -> Precipitation
    Wind_Intensity -> Fog_Density
    Precipitation -> Precipitation_Deposits
    Precipitation -> Fog_Density
    Precipitation_Deposits -> Wetness
    Precipitation_Deposits -> Road_Friction
    Wetness -> Road_Friction
    Fog_Density -> Fog_Distance
    Fog_Density -> Wetness

STEP 3: FITTING BASE MODEL
Estimating parameters with Bayesian estimation (BDeu prior, ESS=5)...
✓ Base model fitted successfully
  Model has 8 nodes
  Model has 9 edges
✓ Model validation passed

STEP 4: AD

## Scenario 2: Vehicle-Cyclist

Train the model for Scenario 2 (9 environmental variables, including `Time_of_Day`).

In [4]:
# Initialize parametrizer for Scenario 2
scenario2_structure_path = "structure_learning/learned_structures/scenario2_structure.txt"
param_s2 = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=scenario2_structure_path,  # Scenario 2 structure adds the Time_of_Day edges
    scenario=2  # Vehicle-Cyclist with Time_of_Day
)

# Run complete pipeline
param_s2.run_full_pipeline()


BAYESIAN NETWORK PARAMETRIZATION - SCENARIO 2

STEP 1: LOADING DATA (Scenario 2)
✓ Loaded data from ../../data/processed/bayscen_final_data.csv
  Total observations: 41767
✓ Selected 9 base variables
  Variables: Cloudiness, Wind_Intensity, Precipitation, Precipitation_Deposits, Wetness, Fog_Density, Road_Friction, Fog_Distance, Time_of_Day

STEP 2: LOADING NETWORK STRUCTURE
✓ Loaded structure from structure_learning/learned_structures/scenario2_structure.txt
  Total edges: 11

  Edges:
    Time_of_Day -> Cloudiness
    Time_of_Day -> Wind_Intensity
    Cloudiness -> Precipitation
    Wind_Intensity -> Fog_Density
    Precipitation -> Precipitation_Deposits
    Precipitation -> Fog_Density
    Precipitation_Deposits -> Wetness
    Precipitation_Deposits -> Road_Friction
    Wetness -> Road_Friction
    Fog_Density -> Fog_Distance
    Fog_Density -> Wetness

STEP 3: FITTING BASE MODEL
Estimating parameters with Bayesian estimation (BDeu prior, ESS=5)...
✓ Base model fitted successfully

## Alternative: Step-by-Step Execution

For more control, you can execute each step individually:

In [59]:
# Create new parametrizer instance
param = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=scenario1_structure_path,
    scenario=1
)

# Step 1: Load data
param.load_data()
print(f"\nData shape: {param.data.shape}")
print(param.data.head())


STEP 1: LOADING DATA (Scenario 1)
✓ Loaded data from ../../data/processed/bayscen_final_data.csv
  Total observations: 41767
✓ Selected 8 base variables
  Variables: Cloudiness, Wind_Intensity, Precipitation, Precipitation_Deposits, Wetness, Fog_Density, Road_Friction, Fog_Distance

Data shape: (41767, 8)
   Cloudiness  Wind_Intensity  Precipitation  Precipitation_Deposits  Wetness  \
0          60              40              0                       0       60   
1          60              60              0                       0      100   
2          40              80              0                      20       40   
3          20              80              0                      20       60   
4          20              80              0                       0      100   

   Fog_Density  Road_Friction  Fog_Distance  
0            0            0.1           100  
1            0            0.2           100  
2            0            0.8           100  
3            0       

In [60]:
# Step 2: Load structure
param.load_structure()
print(f"\nNetwork has {len(param.edges)} edges")


STEP 2: LOADING NETWORK STRUCTURE
✓ Loaded structure from structure_learning/learned_structures/scenario1_structure.txt
  Total edges: 9

  Edges:
    Cloudiness -> Precipitation
    Wind_Intensity -> Fog_Density
    Precipitation -> Precipitation_Deposits
    Precipitation -> Fog_Density
    Precipitation_Deposits -> Wetness
    Precipitation_Deposits -> Road_Friction
    Wetness -> Road_Friction
    Fog_Density -> Fog_Distance
    Fog_Density -> Wetness

Network has 9 edges
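
As a rough illustration of what loading a structure involves, the sketch below parses an edge list into `(parent, child)` tuples. It assumes one `Parent -> Child` edge per line, mirroring the printed output; the real format of `scenario1_structure.txt` is not shown in this notebook, and `parse_edges` is a hypothetical helper, not part of BayScen.

```python
def parse_edges(text):
    """Parse 'Parent -> Child' lines into (parent, child) tuples."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parent, separator, child = line.partition("->")
        if not separator:
            raise ValueError(f"Not an edge line: {line!r}")
        edges.append((parent.strip(), child.strip()))
    return edges


sample = """
Cloudiness -> Precipitation
Wind_Intensity -> Fog_Density
Precipitation -> Precipitation_Deposits
"""

edges = parse_edges(sample)
nodes = sorted({node for edge in edges for node in edge})
print(f"{len(edges)} edges over {len(nodes)} nodes")
```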


In [61]:
# Verify abstracted edges were extracted correctly
print("\nAbstracted edges extracted from abstract_variables.py:")
for parent, child, rel, weight in param.abstracted_edges:
    print(f"  {parent} -> {child} ({rel}, {weight})")


Abstracted edges extracted from abstract_variables.py:
  Fog_Density -> Visibility (inverse, 0.45)
  Fog_Distance -> Visibility (normal, 0.45)
  Precipitation -> Visibility (inverse, 0.1)
  Road_Friction -> Road_Surface (normal, 0.6)
  Wetness -> Road_Surface (inverse, 0.2)
  Precipitation_Deposits -> Road_Surface (inverse, 0.2)
  Wind_Intensity -> Vehicle_Stability (inverse, 0.2)
  Road_Friction -> Vehicle_Stability (normal, 0.8)


In [62]:
# Step 3: Fit base model
param.fit_base_model()


STEP 3: FITTING BASE MODEL
Estimating parameters with Bayesian estimation (BDeu prior, ESS=5)...
✓ Base model fitted successfully
  Model has 8 nodes
  Model has 9 edges
✓ Model validation passed
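
To make the BDeu step concrete, here is a minimal pure-Python sketch of the posterior-mean estimate for a single node. Under a BDeu prior, every cell of the CPD receives the same pseudo-count `ESS / (r * q)`, where `r` is the number of child states and `q` the number of parent configurations. The function name and the toy data are illustrative assumptions; the notebook itself delegates this to `fit_base_model()`.

```python
from collections import Counter
from itertools import product


def bdeu_cpd(rows, child, parents, states, ess=5.0):
    """Posterior-mean CPD P(child | parents) under a BDeu prior."""
    r = len(states[child])
    parent_configs = list(product(*(states[p] for p in parents)))
    q = len(parent_configs)
    alpha = ess / (r * q)  # identical pseudo-count for every cell

    counts = Counter(
        (tuple(row[p] for p in parents), row[child]) for row in rows
    )
    cpd = {}
    for config in parent_configs:
        n_config = sum(counts[(config, s)] for s in states[child])
        cpd[config] = {
            s: (counts[(config, s)] + alpha) / (n_config + alpha * r)
            for s in states[child]
        }
    return cpd


# Toy data: 10 observations, all with Cloudiness = 0
states = {"Cloudiness": [0, 100], "Precipitation": [0, 20]}
rows = (
    [{"Cloudiness": 0, "Precipitation": 0}] * 8
    + [{"Cloudiness": 0, "Precipitation": 20}] * 2
)
cpd = bdeu_cpd(rows, "Precipitation", ["Cloudiness"], states, ess=5.0)
for config, distribution in cpd.items():
    print(config, distribution)
```

The unobserved parent configuration `(100,)` falls back to a uniform distribution, which is the main practical benefit of the prior: no cell of the CPD ends up with zero or undefined probability.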


In [63]:
# Step 4: Add abstracted variables (uses abstract_variables.py)
param.extend_with_abstractions()


STEP 4: ADDING ABSTRACTED VARIABLES
Abstracted variables structure (from abstract_variables.py):

  Visibility:
    <- Fog_Density (inverse, weight=0.45)
    <- Fog_Distance (normal, weight=0.45)
    <- Precipitation (inverse, weight=0.1)

  Road_Surface:
    <- Road_Friction (normal, weight=0.6)
    <- Wetness (inverse, weight=0.2)
    <- Precipitation_Deposits (inverse, weight=0.2)

  Vehicle_Stability:
    <- Wind_Intensity (inverse, weight=0.2)
    <- Road_Friction (normal, weight=0.8)

Computing CPDs for abstracted variables...
Copying CPDs from original model...
  ✓ Cloudiness
  ✓ Precipitation
  ✓ Wind_Intensity
  ✓ Fog_Density
  ✓ Precipitation_Deposits
  ✓ Wetness
  ✓ Road_Friction
  ✓ Fog_Distance

Computing CPDs for abstracted variables...
  Computing Visibility...
  ✓ Visibility
  Computing Road_Surface...
  ✓ Road_Surface
  Computing Vehicle_Stability...
  ✓ Vehicle_Stability

✓ Extended model is valid!

✓ Extended model has 11 nodes
  Added: Visibility, Road_Surface, Veh

In [64]:
# Step 5: Add T-junction variables
param.add_tjunction_variables()


STEP 5: ADDING T-JUNCTION VARIABLES
Added 5 T-junction nodes
  Position variables: Start_Ego, Goal_Ego, Start_Other, Goal_Other
  Collision variable: Collision_Point

Creating CPDs for position variables...
  ✓ Start_Ego
  ✓ Goal_Ego
  ✓ Start_Other
  ✓ Goal_Other

Creating CPD for Collision_Point...
  ✓ Collision_Point

✓ Full model validation passed

Collision scenarios: 12/81 result in collisions
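
A deterministic CPD like `Collision_Point` puts probability 1.0 on exactly one outcome for each of the 3^4 = 81 position configurations. The sketch below shows the general shape. The two colliding configurations in `EXAMPLE_COLLISIONS` are hypothetical placeholders, not the 12 real ones, which are defined in the project code.

```python
from itertools import product

ARMS = ["Left", "Right", "Base"]
OUTCOMES = ["c1", "c2", "c3", "None"]

# Hypothetical entries for illustration only:
# (Start_Ego, Goal_Ego, Start_Other, Goal_Other) -> collision point
EXAMPLE_COLLISIONS = {
    ("Left", "Right", "Base", "Left"): "c1",
    ("Base", "Left", "Right", "Left"): "c2",
}


def deterministic_cpd(collisions):
    """Map every position configuration to a one-hot row over OUTCOMES."""
    table = {}
    for config in product(ARMS, repeat=4):
        outcome = collisions.get(config, "None")
        table[config] = [1.0 if o == outcome else 0.0 for o in OUTCOMES]
    return table


table = deterministic_cpd(EXAMPLE_COLLISIONS)
none_index = OUTCOMES.index("None")
n_collisions = sum(1 for row in table.values() if row[none_index] == 0.0)
print(f"{n_collisions}/{len(table)} configurations map to a collision point")
```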


In [65]:
# Step 6: Save models
param.save_models()


STEP 6: SAVING MODELS
✓ Model saved to C:\Users\BH280005\Documents\bayscen_code\bayscen\modeling\models\fitted_bayesian_network.pkl
✓ Model saved to C:\Users\BH280005\Documents\bayscen_code\bayscen\modeling\models\extended_bayesian_network.pkl
✓ Model saved to C:\Users\BH280005\Documents\bayscen_code\bayscen\modeling\models\scenario1_full_bayesian_network.pkl

✓ FINAL MODEL: C:\Users\BH280005\Documents\bayscen_code\bayscen\modeling\models\scenario1_full_bayesian_network.pkl
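
The `.pkl` extension suggests the models are stored with Python's `pickle` module, although the notebook does not show how `save_models()` or `load_model()` are implemented. Under that assumption, a save/load round trip could look like the sketch below; both helper names are hypothetical and a plain dictionary stands in for the fitted network.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle a model, creating the target directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def load_pickled_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Stand-in object; a fitted network would be pickled the same way.
toy_model = {
    "nodes": ["Cloudiness", "Precipitation"],
    "edges": [("Cloudiness", "Precipitation")],
}

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_model(toy_model, Path(tmp) / "models" / "toy_network.pkl")
    restored = load_pickled_model(saved_path)

print(restored == toy_model)
```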


## Inspecting the Trained Model

In [66]:
# Load the final model (Scenario 1)
full_model = load_model('models/scenario1_full_bayesian_network.pkl')

print(f"\nModel has {len(full_model.nodes())} nodes:")
print(sorted(full_model.nodes()))

✓ Model loaded from models/scenario1_full_bayesian_network.pkl

Model has 16 nodes:
['Cloudiness', 'Collision_Point', 'Fog_Density', 'Fog_Distance', 'Goal_Ego', 'Goal_Other', 'Precipitation', 'Precipitation_Deposits', 'Road_Friction', 'Road_Surface', 'Start_Ego', 'Start_Other', 'Vehicle_Stability', 'Visibility', 'Wetness', 'Wind_Intensity']


In [67]:
# Inspect CPD for an abstracted variable
visibility_cpd = full_model.get_cpds('Visibility')

print("Visibility CPD (first 10 rows):")
df = print_cpd_as_dataframe(visibility_cpd, max_rows=10)
print(df)

Visibility CPD (first 10 rows):
                                             0         20        40   \
Fog_Density Fog_Distance Precipitation                                 
0           0            0              0.049069  0.109205  0.243039   
                         20             0.053834  0.119810  0.266642   
                         40             0.058691  0.130620  0.290700   
                         60             0.063579  0.141499  0.314911   
                         80             0.068437  0.152308  0.338969   
                         100            0.073202  0.162914  0.362572   
            20           0              0.034379  0.076512  0.170280   
                         20             0.036198  0.080560  0.179290   
                         40             0.037907  0.084364  0.187755   
                         60             0.042214  0.093948  0.209085   

                                             60        80        100  
Fog_Density Fog_Distance Precipi

In [68]:
# Inspect Road_Surface CPD
road_surface_cpd = full_model.get_cpds('Road_Surface')

print("Road_Surface CPD (first 10 rows):")
df = print_cpd_as_dataframe(road_surface_cpd, max_rows=10)
print(df)

Road_Surface CPD (first 10 rows):
                                                   0         20        40   \
Road_Friction Wetness Precipitation_Deposits                                 
0.0           0       0                       0.084364  0.187755  0.417856   
                      20                      0.105366  0.234497  0.378965   
                      40                      0.128618  0.286244  0.335911   
                      60                      0.153160  0.340865  0.290466   
                      80                      0.177796  0.395693  0.244848   
                      100                     0.201309  0.448022  0.201309   
              20      0                       0.105366  0.234497  0.378965   
                      20                      0.128618  0.286244  0.335911   
                      40                      0.153160  0.340865  0.290466   
                      60                      0.177796  0.395693  0.244848   

                             

In [69]:
# Inspect Collision_Point CPD (deterministic mapping)
collision_cpd = full_model.get_cpds('Collision_Point')

print("Collision_Point CPD (sample of 20 rows):")
df = print_cpd_as_dataframe(collision_cpd, max_rows=20)
print(df)

Collision_Point CPD (sample of 20 rows):
                                            c1   c2   c3  None
Start_Ego Goal_Ego Start_Other Goal_Other                     
Left      Left     Left        Left        0.0  0.0  0.0   1.0
                               Right       0.0  0.0  0.0   1.0
                               Base        0.0  0.0  0.0   1.0
                   Right       Left        0.0  0.0  0.0   1.0
                               Right       0.0  0.0  0.0   1.0
                               Base        0.0  0.0  0.0   1.0
                   Base        Left        0.0  0.0  0.0   1.0
                               Right       0.0  0.0  0.0   1.0
                               Base        0.0  0.0  0.0   1.0
          Right    Left        Left        0.0  0.0  0.0   1.0
                               Right       0.0  0.0  0.0   1.0
                               Base        0.0  0.0  0.0   1.0
                   Right       Left        0.0  0.0  0.0   1.0
              

## Compare Scenarios

In [70]:
# Load both models
model_s1 = load_model('models/scenario1_full_bayesian_network.pkl')
model_s2 = load_model('models/scenario2_full_bayesian_network.pkl')

print("Scenario Comparison:")
print("=" * 70)
print("Scenario 1 (Vehicle-Vehicle):")
print(f"  Total nodes: {len(model_s1.nodes())}")
print(f"  Total edges: {len(model_s1.edges())}")

print("\nScenario 2 (Vehicle-Cyclist):")
print(f"  Total nodes: {len(model_s2.nodes())}")
print(f"  Total edges: {len(model_s2.edges())}")

# Find difference
s1_nodes = set(model_s1.nodes())
s2_nodes = set(model_s2.nodes())
extra_in_s2 = s2_nodes - s1_nodes

print(f"\nExtra nodes in Scenario 2: {extra_in_s2}")

✓ Model loaded from models/scenario1_full_bayesian_network.pkl
✓ Model loaded from models/scenario2_full_bayesian_network.pkl
Scenario Comparison:
Scenario 1 (Vehicle-Vehicle):
  Total nodes: 16
  Total edges: 21

Scenario 2 (Vehicle-Cyclist):
  Total nodes: 17
  Total edges: 23

Extra nodes in Scenario 2: {'Time_of_Day'}


## Verify Model Properties

In [71]:
# Check model validity
is_valid = full_model.check_model()
print(f"Model is valid: {is_valid}")

# Check for cycles (should be False for a valid DAG)
import networkx as nx
has_cycles = not nx.is_directed_acyclic_graph(full_model)
print(f"Model has cycles: {has_cycles}")

Model is valid: True
Model has cycles: False
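
For readers who want to see what the acyclicity check amounts to, the sketch below applies Kahn's algorithm to the nine Scenario 1 base edges printed earlier. It is a standalone illustration, not the code behind `nx.is_directed_acyclic_graph`; `is_acyclic` is a hypothetical helper.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be ordered."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


scenario1_edges = [
    ("Cloudiness", "Precipitation"),
    ("Wind_Intensity", "Fog_Density"),
    ("Precipitation", "Precipitation_Deposits"),
    ("Precipitation", "Fog_Density"),
    ("Precipitation_Deposits", "Wetness"),
    ("Precipitation_Deposits", "Road_Friction"),
    ("Wetness", "Road_Friction"),
    ("Fog_Density", "Fog_Distance"),
    ("Fog_Density", "Wetness"),
]

base_is_dag = is_acyclic(scenario1_edges)
# Adding a back edge closes the loop Cloudiness -> ... -> Road_Friction -> Cloudiness
with_back_edge = is_acyclic(scenario1_edges + [("Road_Friction", "Cloudiness")])
print(base_is_dag, with_back_edge)
```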


In [72]:
# Count variable types
environmental_vars = [
    "Cloudiness", "Wind_Intensity", "Precipitation",
    "Precipitation_Deposits", "Wetness", "Fog_Density",
    "Road_Friction", "Fog_Distance"
]

abstracted_vars = ["Road_Surface", "Vehicle_Stability", "Visibility"]

tjunction_vars = [
    "Start_Ego", "Goal_Ego", "Start_Other", "Goal_Other", "Collision_Point"
]

print(f"Environmental variables: {len(environmental_vars)}")
print(f"Abstracted variables: {len(abstracted_vars)}")
print(f"T-junction variables: {len(tjunction_vars)}")
print(f"Total (Scenario 1): {len(environmental_vars) + len(abstracted_vars) + len(tjunction_vars)}")

Environmental variables: 8
Abstracted variables: 3
T-junction variables: 5
Total (Scenario 1): 16


## Verify Abstraction Extraction

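Confirm that the abstracted edges were correctly extracted from `abstract_variables.py`. This check uses Scenario 2, where `Visibility` gains `Time_of_Day` as a fourth parent and the two fog weights drop from 0.45 to 0.40: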

In [5]:
# Create a new parametrizer to check extraction
test_param = BayesianNetworkParametrizer(
    data_path=data_path,
    structure_path=scenario2_structure_path,
    scenario=2
)

print("Extracted Abstracted Edges:")
print("=" * 70)
print(f"Total edges: {len(test_param.abstracted_edges)}\n")

# Group by child for display
from collections import defaultdict
by_child = defaultdict(list)
for parent, child, rel, weight in test_param.abstracted_edges:
    by_child[child].append((parent, rel, weight))

for child, parents in sorted(by_child.items()):
    print(f"\n{child}:")
    total_weight = sum(w for _, _, w in parents)
    for parent, rel, weight in parents:
        percentage = (weight / total_weight) * 100
        print(f"  <- {parent} ({rel}, {weight:.2f} = {percentage:.1f}%)")
    print(f"  Total weight: {total_weight:.2f}")

print("\n✓ All abstraction structure extracted from abstract_variables.py")

Extracted Abstracted Edges:
Total edges: 9


Road_Surface:
  <- Road_Friction (normal, 0.60 = 60.0%)
  <- Wetness (inverse, 0.20 = 20.0%)
  <- Precipitation_Deposits (inverse, 0.20 = 20.0%)
  Total weight: 1.00

Vehicle_Stability:
  <- Wind_Intensity (inverse, 0.20 = 20.0%)
  <- Road_Friction (normal, 0.80 = 80.0%)
  Total weight: 1.00

Visibility:
  <- Fog_Density (inverse, 0.40 = 40.0%)
  <- Fog_Distance (normal, 0.40 = 40.0%)
  <- Precipitation (inverse, 0.10 = 10.0%)
  <- Time_of_Day (normal, 0.10 = 10.0%)
  Total weight: 1.00

✓ All abstraction structure extracted from abstract_variables.py


## Command Line Usage

You can also run the parametrization from the command line:

```bash
# Scenario 1 (default)
python bn_parametrization.py

# Scenario 2 (with Time_of_Day)
python bn_parametrization.py --scenario 2
```
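
The internals of `bn_parametrization.py` are not shown here, but a `--scenario` flag that defaults to Scenario 1 could be wired up with `argparse` roughly as follows. Treat this as a sketch: `build_parser`, the help text, and the `choices` restriction are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with a --scenario flag that defaults to Scenario 1."""
    parser = argparse.ArgumentParser(
        description="Parametrize the BayScen Bayesian network"
    )
    parser.add_argument(
        "--scenario",
        type=int,
        choices=[1, 2],
        default=1,
        help="1 = Vehicle-Vehicle, 2 = Vehicle-Cyclist (adds Time_of_Day)",
    )
    return parser


# Passing explicit argument lists keeps the example independent of sys.argv.
args_default = build_parser().parse_args([])
args_s2 = build_parser().parse_args(["--scenario", "2"])
print(args_default.scenario, args_s2.scenario)
```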

## Next Steps

The trained models are now ready for scenario generation! Use them with:

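1. **Combinatorial Testing**: Generate all combinations of abstract variable values: three collision points and six levels each for `Visibility`, `Road_Surface`, and `Vehicle_Stability` (3 × 6 × 6 × 6 = 648 configurations)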
2. **Conditional Sampling**: For each configuration, sample concrete scenarios from the BN
3. **Rarity Prioritization**: Focus on rare but valid parameter combinations
4. **Diversity Selection**: Ensure diverse coverage of the parameter space
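
The combinatorial step can be sketched with `itertools.product`. The example assumes the factor of 3 comes from the `Collision_Point` outcomes `c1`, `c2`, and `c3` (excluding `None`) and that each abstracted variable takes the six levels shown earlier; the actual enumeration in `scenario_generation.py` may be organized differently.

```python
from itertools import product

LEVELS = [0, 20, 40, 60, 80, 100]

FACTORS = {
    "Collision_Point": ["c1", "c2", "c3"],  # assumed: "None" is excluded
    "Visibility": LEVELS,
    "Road_Surface": LEVELS,
    "Vehicle_Stability": LEVELS,
}

# One dict per configuration, keyed by variable name
configurations = [
    dict(zip(FACTORS, combo)) for combo in product(*FACTORS.values())
]

print(len(configurations))
print(configurations[0])
```

Each dictionary can then serve as the evidence for the conditional sampling step.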

See the `scenario_generation.py` module for the complete generation pipeline.

## Key Improvements in This Version

### ✅ Single Source of Truth
- All abstraction definitions come from `abstract_variables.py`
- No duplication of parent-child relationships and weights
- Automatic extraction via `_extract_abstracted_edges()`

### ✅ Scenario Selection
- Single script handles both Scenario 1 and Scenario 2
- Automatic variable selection based on scenario
- Command-line argument for easy switching

### ✅ Better Maintainability
- Update `abstract_variables.py` once → changes propagate automatically
- No need to manually sync weights/relationships
- Reduces risk of inconsistencies