# BayScen Scenario Generation - Tutorial

This notebook demonstrates how to generate test scenarios using the BayScen framework.

## Overview

The generation process has four steps:
1. Load the trained Bayesian Network model
2. Generate scenarios using combinatorial coverage and conditional sampling
3. Evaluate the scenarios (realism, coverage, diversity)
4. Export them for testing

**Scenarios:**
- Scenario 1: Vehicle-Vehicle conflicts (8 environmental variables)
- Scenario 2: Vehicle-Cyclist conflicts (the same 8 environmental variables plus Time_of_Day, 9 in total)

**Modes:**
- `rare`: Prioritize edge cases (recommended for testing)
- `common`: Prioritize typical scenarios
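
One way to picture the two modes: suppose several candidate concrete assignments exist for the same abstract combination, each with a probability under the network. `rare` would then keep the least likely candidate and `common` the most likely. The sketch below only illustrates that idea. `pick_candidate` and the candidate values are hypothetical and are not part of the BayScen API, whose actual selection logic may differ.

```python
def pick_candidate(candidates, prefer_rare):
    """Pick one candidate by its 'probability' field (hypothetical helper)."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # rare mode -> lowest probability, common mode -> highest probability
    chooser = min if prefer_rare else max
    return chooser(candidates, key=lambda c: c["probability"])


# Made-up candidates for illustration only
candidates = [
    {"Cloudiness": 10, "probability": 0.30},
    {"Cloudiness": 90, "probability": 0.02},
]

rare_pick = pick_candidate(candidates, prefer_rare=True)
common_pick = pick_candidate(candidates, prefer_rare=False)
print("rare:", rare_pick)
print("common:", common_pick)
```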

## Setup

In [None]:
import sys
import pickle
import pandas as pd
from pathlib import Path

# Add parent directory to path
sys.path.append(str(Path.cwd().parent))

from generation.scenario_generator import BayesianScenarioGenerator
from generation.evaluation_metrics import (
    evaluate_scenarios,
    compute_attribute_distributions,
    compute_realism,
    compute_coverage
)
from generation.generation_utils import (
    validate_scenarios,
    export_for_carla,
    split_by_collision_point,
    get_summary_statistics
)
from abstraction.abstract_variables import LEAF_NODES

import warnings
# Silence all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

## Configuration

In [None]:
# Choose scenario and mode
SCENARIO = 2  # 1 or 2
MODE = 'rare'  # 'common' or 'rare'

# Define paths
MODEL_PATH = f"../modeling/models/scenario{SCENARIO}_full_bayesian_network.pkl"
DATA_PATH = "../../data/processed/bayscen_final_data.csv"  # For evaluation
OUTPUT_DIR = Path("generated_scenarios")
OUTPUT_DIR.mkdir(exist_ok=True)

print(f"Configuration:")
print(f"  Scenario: {SCENARIO}")
print(f"  Mode: {MODE}")
print(f"  Model: {MODEL_PATH}")
print(f"  Output: {OUTPUT_DIR}")

## Step 1: Load Trained Model

In [None]:
# Load the Bayesian Network
# Note: only unpickle files from a trusted source; pickle can run arbitrary code on load
print("Loading Bayesian Network...")
with open(MODEL_PATH, 'rb') as f:
    model = pickle.load(f)

print(f"✓ Model loaded successfully")
print(f"  Total nodes: {len(model.nodes())}")
print(f"  Total edges: {len(model.edges())}")
print(f"\n  Nodes: {sorted(model.nodes())}")

## Step 2: Define Variables

In [None]:
# Abstracted variables (from abstract_variables.py)
abstracted_variables = LEAF_NODES
print("Abstracted Variables:")
for var, values in abstracted_variables.items():
    print(f"  {var}: {values}")

# Concrete variables (scenario-specific)
concrete_variables_s1 = [
    "Cloudiness",
    "Wind_Intensity",
    "Precipitation",
    "Precipitation_Deposits",
    "Wetness",
    "Fog_Density",
    "Road_Friction",
    "Fog_Distance",
    "Start_Ego",
    "Goal_Ego",
    "Start_Other",
    "Goal_Other"
]

concrete_variables_s2 = ["Time_of_Day"] + concrete_variables_s1

concrete_variables = concrete_variables_s2 if SCENARIO == 2 else concrete_variables_s1

print(f"\nConcrete Variables ({len(concrete_variables)}):")
print(f"  {concrete_variables}")
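
Before creating the generator, it can help to confirm that every variable named here actually exists as a node in the loaded network. The check below is a minimal sketch. `missing_nodes` is a hypothetical helper, not part of BayScen, and the toy node list stands in for `model.nodes()`.

```python
def missing_nodes(model_nodes, required):
    """Return the required variable names that are absent from the model, in order."""
    present = set(model_nodes)
    return [name for name in required if name not in present]


# Toy stand-in for model.nodes(); the real list comes from the loaded network
toy_nodes = ["Cloudiness", "Wetness", "Start_Ego"]
missing = missing_nodes(toy_nodes, ["Cloudiness", "Fog_Density", "Start_Ego"])
print("Missing from model:", missing)
```

In the notebook, a call such as `missing_nodes(model.nodes(), concrete_variables + list(abstracted_variables))` should return an empty list, assuming `abstracted_variables` is a dict keyed by variable name as the loop above suggests.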

## Step 3: Create Generator

In [None]:
# Create generator
prefer_rare = (MODE == 'rare')
similarity_threshold = 0.1
n_samples = 100_000

generator = BayesianScenarioGenerator(
    model=model,
    leaf_nodes=abstracted_variables,
    initial_nodes=concrete_variables,
    similarity_threshold=similarity_threshold,
    n_samples=n_samples,
    use_sampling=True,
    prefer_rare=prefer_rare
)

print("✓ Generator created")
print(f"  Mode: {MODE}")
print(f"  Prefer rare: {prefer_rare}")
print(f"  Similarity threshold: {similarity_threshold}")
print(f"  Number of samples: {n_samples:,}")

## Step 4: Generate Scenarios

This step enumerates every combination of abstracted-variable values and samples concrete parameters for each combination.

**Expected output:** 648 scenarios (6×6×6×3 combinations)

**Time:** ~10-15 minutes
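
The expected count is the product of the abstracted variables' domain sizes. The sketch below reproduces the arithmetic with `itertools.product`. The variable names and values are placeholders. Only the domain sizes (6, 6, 6, 3) are taken from the figure above.

```python
from itertools import product
from math import prod

# Placeholder abstract variables: names and values are invented for illustration;
# only the domain sizes (6, 6, 6, 3) come from the expected count above.
toy_leaf_nodes = {
    "Var_A": list(range(6)),
    "Var_B": list(range(6)),
    "Var_C": list(range(6)),
    "Var_D": list(range(3)),
}

# Product of domain sizes: 6 * 6 * 6 * 3
expected_count = prod(len(values) for values in toy_leaf_nodes.values())

# Enumerate every combination as a {variable: value} dict
combinations = [
    dict(zip(toy_leaf_nodes, combo))
    for combo in product(*toy_leaf_nodes.values())
]
print(expected_count, len(combinations))
```

Assuming `LEAF_NODES` maps each variable name to its list of values, `prod(len(v) for v in abstracted_variables.values())` gives the count to compare against `len(scenarios)` after generation.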

In [None]:
# Generate scenarios
scenarios = generator.generate_scenarios()

print(f"\n{'='*70}")
print(f"Generation Summary:")
print(f"  Total scenarios: {len(scenarios)}")
print(f"  Columns: {list(scenarios.columns)}")
print(f"{'='*70}")

## Step 5: Inspect Generated Scenarios

In [None]:
# Display first few scenarios
print("First 5 scenarios:")
scenarios.head()

In [None]:
# Summary statistics
print("\nSummary Statistics:")
summary = get_summary_statistics(scenarios)
summary

In [None]:
# Distribution by collision point
print("\nScenarios by Collision Point:")
print(scenarios['Collision_Point'].value_counts())

# Split by collision point
by_collision = split_by_collision_point(scenarios)
for cp, df in by_collision.items():
    print(f"  {cp}: {len(df)} scenarios")

In [None]:
# Probability distribution
print("\nProbability Distribution:")
print(f"  Mean: {scenarios['probability'].mean():.6f}")
print(f"  Median: {scenarios['probability'].median():.6f}")
print(f"  Min: {scenarios['probability'].min():.6f}")
print(f"  Max: {scenarios['probability'].max():.6f}")

# Plot histogram
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.hist(scenarios['probability'], bins=50, edgecolor='black')
plt.xlabel('Probability')
plt.ylabel('Count')
plt.title('Probability Distribution')

plt.subplot(1, 2, 2)
plt.hist(scenarios['probability'], bins=50, edgecolor='black', log=True)
plt.xlabel('Probability')
plt.ylabel('Count (log scale)')
plt.title('Probability Distribution (Log Scale)')

plt.tight_layout()
plt.show()

## Step 6: Validate Scenarios

In [None]:
# Validate scenarios
validation = validate_scenarios(scenarios)

print("Validation Results:")
print(f"  Valid: {validation['is_valid']}")
print(f"  Total scenarios: {validation['num_scenarios']}")

if not validation['is_valid']:
    print(f"\n  Issues found:")
    for issue in validation['issues']:
        print(f"    - {issue}")
else:
    print("  ✓ All scenarios are valid!")

## Step 7: Evaluate Scenarios

Compare generated scenarios against real-world data.

In [None]:
# Define evaluation attributes (environmental variables only)
eval_attributes = [
    "Cloudiness",
    "Wind_Intensity",
    "Precipitation",
    "Precipitation_Deposits",
    "Wetness",
    "Fog_Density",
    "Road_Friction",
    "Fog_Distance"
]

if SCENARIO == 2:
    eval_attributes = ["Time_of_Day"] + eval_attributes

print(f"Evaluation attributes: {eval_attributes}")

In [None]:
# Comprehensive evaluation
results = evaluate_scenarios(
    real_data_path=DATA_PATH,
    generated_df=scenarios,
    attributes=eval_attributes,
    print_summary=True
)

### Detailed Metrics

In [None]:
# Realism metric
print(f"\nRealism: {results['realism']:.1f}%")
print(f"  → {results['realism']:.1f}% of scenarios fall within the real-world distribution")

if results['realism'] > 90:
    print("  ✓ Excellent realism!")
elif results['realism'] > 70:
    print("  ✓ Good realism")
else:
    print("  ⚠ Consider adjusting generation parameters")

In [None]:
# Coverage metric
coverage = results['coverage']
print(f"\nCoverage: {coverage['coverage_percentage']:.1f}%")
print(f"  Covered: {coverage['num_covered']}/{coverage['total_real_unique']} unique real scenarios")

if coverage['coverage_percentage'] > 90:
    print("  ✓ Excellent coverage!")
elif coverage['coverage_percentage'] > 70:
    print("  ✓ Good coverage")
else:
    print("  ⚠ Consider increasing scenario diversity")
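
To make the two numbers above concrete, here is one plausible way to compute them with exact matching on attribute combinations. This is a sketch under assumptions, not the implementation behind `evaluate_scenarios`, which may bin values or use a tolerance. `realism_pct`, `coverage_pct`, and the toy records are hypothetical.

```python
def combos(records, attributes):
    """Project each record onto the chosen attributes as a hashable tuple."""
    return [tuple(r[a] for a in attributes) for r in records]


def realism_pct(generated, real, attributes):
    """Percent of generated records whose attribute combination occurs in the real data."""
    real_set = set(combos(real, attributes))
    gen = combos(generated, attributes)
    if not gen:
        return 0.0
    return 100.0 * sum(c in real_set for c in gen) / len(gen)


def coverage_pct(generated, real, attributes):
    """Percent of unique real combinations that appear among the generated records."""
    real_set = set(combos(real, attributes))
    if not real_set:
        return 0.0
    gen_set = set(combos(generated, attributes))
    return 100.0 * len(real_set & gen_set) / len(real_set)


# Made-up records for illustration only
attrs = ["Cloudiness", "Wetness"]
real = [
    {"Cloudiness": 0, "Wetness": 0},
    {"Cloudiness": 0, "Wetness": 0},
    {"Cloudiness": 50, "Wetness": 20},
    {"Cloudiness": 100, "Wetness": 80},
]
generated = [
    {"Cloudiness": 0, "Wetness": 0},
    {"Cloudiness": 50, "Wetness": 20},
    {"Cloudiness": 50, "Wetness": 20},
    {"Cloudiness": 100, "Wetness": 0},
]

realism = realism_pct(generated, real, attrs)
coverage = coverage_pct(generated, real, attrs)
print(f"realism={realism:.1f}%  coverage={coverage:.1f}%")
```

With pandas DataFrames, `df.to_dict("records")` yields the list-of-dicts input these helpers expect.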

In [None]:
# Diversity metric
unique = results['num_unique']
uniqueness = (unique / len(scenarios)) * 100

print(f"\nDiversity:")
print(f"  Unique scenarios: {unique}")
print(f"  Total scenarios: {len(scenarios)}")
print(f"  Uniqueness: {uniqueness:.1f}%")

if uniqueness > 95:
    print("  ✓ Excellent diversity!")
elif uniqueness > 80:
    print("  ✓ Good diversity")
else:
    print("  ⚠ Some duplicate scenarios detected")
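
The uniqueness figure is a ratio of distinct scenarios to total scenarios. The sketch below computes it with exact matching over a chosen set of attributes and guards against an empty input. `uniqueness_pct` is a hypothetical helper. How `results['num_unique']` is actually derived (for example, whether near-duplicates count as distinct) is not shown in this notebook.

```python
def uniqueness_pct(records, attributes):
    """Percent of records that are distinct on the given attributes (exact match)."""
    total = len(records)
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty scenario set
    unique = {tuple(r[a] for a in attributes) for r in records}
    return 100.0 * len(unique) / total


# Made-up records: the second and third are duplicates
toy_scenarios = [
    {"Cloudiness": 0, "Wetness": 0},
    {"Cloudiness": 50, "Wetness": 20},
    {"Cloudiness": 50, "Wetness": 20},
    {"Cloudiness": 100, "Wetness": 80},
]
toy_uniqueness = uniqueness_pct(toy_scenarios, ["Cloudiness", "Wetness"])
print(f"Uniqueness: {toy_uniqueness:.1f}%")
```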

### Distribution Comparison

In [None]:
# Compare distributions for specific attributes
from generation.evaluation_metrics import compare_distributions

# Example: Compare Cloudiness distribution
compare_distributions(
    results['distributions'],
    'Cloudiness',
    plot=True
)

In [None]:
# Compare distributions for the first three attributes
# (remove the [:3] slice to compare all of them)
for attr in eval_attributes[:3]:
    print(f"\n{'='*50}")
    compare_distributions(
        results['distributions'],
        attr,
        plot=False
    )
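
If a single number per attribute is useful alongside the plots, one common choice is the total variation distance between the real and generated value frequencies (0 means identical, 1 means disjoint). This is a swapped-in metric for illustration. It is not necessarily what `compare_distributions` reports, and the helper names and sample values below are hypothetical.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to its share of the sample."""
    if not values:
        raise ValueError("cannot compute frequencies of an empty sample")
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation(real_values, generated_values):
    """Total variation distance between two empirical categorical distributions."""
    p = relative_frequencies(real_values)
    q = relative_frequencies(generated_values)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Made-up Cloudiness samples for illustration only
real_cloudiness = [0, 0, 50, 100]
generated_cloudiness = [0, 50, 50, 50]

tvd = total_variation(real_cloudiness, generated_cloudiness)
print(f"Total variation distance: {tvd:.2f}")
```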

## Step 8: Save Scenarios

In [None]:
# Save to CSV
output_csv = OUTPUT_DIR / f"scenario{SCENARIO}_{MODE}_scenarios.csv"
scenarios.to_csv(output_csv, index=False)
print(f"✓ Saved scenarios to: {output_csv}")

# Save to Excel
output_excel = OUTPUT_DIR / f"scenario{SCENARIO}_{MODE}_scenarios.xlsx"
generator.save_scenarios(scenarios, str(output_excel))
print(f"✓ Saved scenarios to: {output_excel}")

In [None]:
# Save evaluation results
eval_output = OUTPUT_DIR / f"scenario{SCENARIO}_{MODE}_evaluation.pkl"
with open(eval_output, 'wb') as f:
    pickle.dump(results, f)
print(f"✓ Saved evaluation results to: {eval_output}")

## Step 9: Export for CARLA (Optional)

In [None]:
# Export for CARLA simulator
carla_output = OUTPUT_DIR / f"scenario{SCENARIO}_{MODE}_carla.csv"
export_for_carla(scenarios, str(carla_output))
print(f"✓ Exported CARLA-compatible scenarios to: {carla_output}")

## Comparison: Common vs Rare Modes (Optional)

If you've generated both modes, you can compare them:

In [None]:
# Load both modes for comparison
try:
    common_scenarios = pd.read_csv(OUTPUT_DIR / f"scenario{SCENARIO}_common_scenarios.csv")
    rare_scenarios = pd.read_csv(OUTPUT_DIR / f"scenario{SCENARIO}_rare_scenarios.csv")
    
    print("Comparing Common vs Rare Modes:\n")
    
    print(f"Common Scenarios:")
    print(f"  Count: {len(common_scenarios)}")
    print(f"  Mean probability: {common_scenarios['probability'].mean():.6f}")
    print(f"  Median probability: {common_scenarios['probability'].median():.6f}")
    
    print(f"\nRare Scenarios:")
    print(f"  Count: {len(rare_scenarios)}")
    print(f"  Mean probability: {rare_scenarios['probability'].mean():.6f}")
    print(f"  Median probability: {rare_scenarios['probability'].median():.6f}")
    
    # Compare using utility function
    from generation.generation_utils import compare_scenario_sets
    compare_scenario_sets(
        common_scenarios,
        rare_scenarios,
        eval_attributes,
        "Common",
        "Rare"
    )
    
except FileNotFoundError:
    print("Both common and rare scenarios needed for comparison.")
    print("Generate both modes to compare:")
    print("  MODE = 'common'  # Run once")
    print("  MODE = 'rare'    # Run again")

## Summary

This notebook demonstrated:
1. ✓ Loading trained Bayesian Network
2. ✓ Configuring scenario generator
3. ✓ Generating test scenarios
4. ✓ Validating scenario quality
5. ✓ Evaluating realism, coverage, and diversity
6. ✓ Exporting scenarios for testing

**Next Steps:**
- Run scenarios in CARLA simulator
- Analyze test results
- Compare with baseline methods (Random, SitCov, CTBC, PICT)

## Quick Reference

### Generate All Scenario and Mode Combinations

```python
# Scenario 1 - Rare
SCENARIO = 1
MODE = 'rare'
# Run cells above

# Scenario 1 - Common
SCENARIO = 1
MODE = 'common'
# Run cells above

# Scenario 2 - Rare
SCENARIO = 2
MODE = 'rare'
# Run cells above

# Scenario 2 - Common
SCENARIO = 2
MODE = 'common'
# Run cells above
```
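
Rather than editing `SCENARIO` and `MODE` by hand four times, the variants can be enumerated in a loop. The sketch below only builds the model and output paths, following the naming patterns used in the Configuration and Save cells. `variant_paths` is a hypothetical helper, and the generation and evaluation steps would still need to run inside the loop.

```python
from itertools import product
from pathlib import Path


def variant_paths(scenario, mode, output_dir=Path("generated_scenarios")):
    """Build the model and CSV paths for one scenario/mode pair (hypothetical helper)."""
    return {
        "model": f"../modeling/models/scenario{scenario}_full_bayesian_network.pkl",
        "csv": output_dir / f"scenario{scenario}_{mode}_scenarios.csv",
    }


# All four scenario/mode combinations, in the order listed above
variants = [
    (scenario, mode, variant_paths(scenario, mode))
    for scenario, mode in product((1, 2), ("rare", "common"))
]

for scenario, mode, paths in variants:
    print(f"Scenario {scenario} ({mode}): {paths['model']} -> {paths['csv']}")
```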

### Command Line Alternative

```bash
cd bayscen/generation

# Generate all variants
python generate_scenarios.py --scenario 1 --mode rare
python generate_scenarios.py --scenario 1 --mode common
python generate_scenarios.py --scenario 2 --mode rare
python generate_scenarios.py --scenario 2 --mode common
```