# BayScen Data Collection and Processing Tutorial

This notebook demonstrates how to collect and process weather data for BayScen scenario generation.

## Overview

**Data Sources:**

**Frost API** (Norwegian Meteorological Institute)
- Official documentation: https://frost.met.no/
- Credentials: https://frost.met.no/auth/requestCredentials.html
- **Main Station**: Road conditions (friction, wetness, surface type), wind speed, precipitation, visibility
- **Secondary Station**: Cloudiness, which the main station may not report

**Output:** Hourly weather data with 9 parameters ready for Bayesian Network training.

## Step 1: Configuration

First, set up your API credentials and data collection parameters.
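Hard-coding credentials in a notebook makes it easy to commit them by accident. One alternative is to read them from environment variables. The variable names and the `load_frost_credentials` helper below are hypothetical conventions, not something the `collect` module requires. You can assign the returned values to `FROST_CLIENT_ID` and `FROST_CLIENT_SECRET` in the configuration cell.

```python
import os


def load_frost_credentials(
    id_var: str = "FROST_CLIENT_ID",
    secret_var: str = "FROST_CLIENT_SECRET",
) -> tuple[str, str]:
    """Read Frost credentials from (hypothetically named) environment variables."""
    client_id = os.environ.get(id_var, "").strip()
    # The secret may be empty: Frost's documented `requests` examples
    # authenticate with the client ID as username and an empty password.
    client_secret = os.environ.get(secret_var, "").strip()
    if not client_id:
        raise RuntimeError(f"Set {id_var} before running the collection step")
    return client_id, client_secret


# Usage: FROST_CLIENT_ID, FROST_CLIENT_SECRET = load_frost_credentials()
```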

In [None]:
# API Credentials
# Register at: https://frost.met.no/auth/requestCredentials.html
FROST_CLIENT_ID = "YOUR_CLIENT_ID"
FROST_CLIENT_SECRET = "YOUR_CLIENT_SECRET"

# Location and stations
STATION_MAIN = "SN84770"  # Main station with road conditions
STATION_CLOUD = "SN90450"  # Station with cloudiness data

# Time period
START_TIME = "2020-12-01T00:00:00Z"
END_TIME = "2024-03-03T23:59:59Z"
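The period above covers more than three years of hourly data. Splitting it into shorter windows keeps each API request small. This notebook does not show whether `collect_full_dataset` already does this, so the `split_period` helper below is a hypothetical sketch. Each window is formatted as a `start/end` interval, which is the form Frost accepts for its `referencetime` parameter.

```python
from datetime import datetime, timedelta


def split_period(start: str, end: str, days: int = 90) -> list[str]:
    """Split an ISO-8601 UTC period into consecutive 'start/end' windows."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    t0 = datetime.strptime(start, fmt)
    t_end = datetime.strptime(end, fmt)
    step = timedelta(days=days)
    windows = []
    while t0 < t_end:
        # The last window is truncated so it never runs past `end`.
        t1 = min(t0 + step, t_end)
        windows.append(f"{t0.strftime(fmt)}/{t1.strftime(fmt)}")
        t0 = t1
    return windows


# Usage: windows = split_period(START_TIME, END_TIME)
```

Adjacent windows share a boundary instant. After concatenating the results, drop any duplicate timestamps.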

## Step 2: Collect Raw Data

Fetch data from both stations and save it to the `raw/` directory.

In [None]:
from pathlib import Path
from collect import collect_full_dataset

# Collect data
output_files = collect_full_dataset(
    frost_client_id=FROST_CLIENT_ID,
    frost_client_secret=FROST_CLIENT_SECRET,
    station_id_main=STATION_MAIN,
    station_id_cloud=STATION_CLOUD,
    start_time=START_TIME,
    end_time=END_TIME,
    output_dir=Path("raw")
)

print("\n✓ Raw data collection complete!")
for name, path in output_files.items():
    print(f"  {name}: {path}")
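`collect_full_dataset` hides the HTTP details. For orientation, the sketch below shows what a single request to Frost's observations endpoint might look like. The following details come from Frost's public documentation:

- the endpoint URL,
- the `sources`/`elements`/`referencetime` parameters,
- basic auth with the client ID as username.

The helper names and the flat row layout are assumptions, and the real `collect` module may work differently.

```python
FROST_OBSERVATIONS_URL = "https://frost.met.no/observations/v0.jsonld"


def build_frost_params(station: str, elements: list[str], referencetime: str) -> dict:
    """Query parameters for one observations request."""
    return {
        "sources": station,
        "elements": ",".join(elements),
        "referencetime": referencetime,
    }


def parse_frost_payload(payload: dict) -> list[dict]:
    """Flatten Frost's nested JSON into one row per (time, element) observation."""
    rows = []
    for item in payload.get("data", []):
        for obs in item.get("observations", []):
            rows.append({
                "timestamp": item["referenceTime"],
                "station": item.get("sourceId"),
                "element": obs["elementId"],
                "value": obs["value"],
            })
    return rows


# With the `requests` package installed, one request would look like:
# resp = requests.get(
#     FROST_OBSERVATIONS_URL,
#     params=build_frost_params(STATION_MAIN, ["wind_speed"], "2021-01-01/2021-01-02"),
#     auth=(FROST_CLIENT_ID, ""),
# )
# rows = parse_frost_payload(resp.json())
```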

## Step 3: Load Raw Data

Load the collected raw data for processing.

In [None]:
import pandas as pd

# Load main station data
df_main = pd.read_csv("raw/frost_main_station.csv")
df_main['timestamp'] = pd.to_datetime(df_main['timestamp'], utc=True)

print(f"Main station data: {df_main.shape}")
print(f"Columns: {list(df_main.columns)}")
df_main.head()
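Before processing, it can help to check how many hours in the requested period have no observation at all. The sketch below assumes `timestamp` is the only time column, as loaded above. `missing_hours` is a hypothetical helper, not part of `process`.

```python
import pandas as pd


def missing_hours(timestamps: pd.Series, start: str, end: str) -> pd.DatetimeIndex:
    """Return the hourly slots between start and end that have no observation."""
    expected = pd.date_range(
        start=pd.to_datetime(start, utc=True),
        end=pd.to_datetime(end, utc=True),
        freq="h",
    )
    # Floor each observation to its hour so sub-hourly readings count for that slot.
    observed = pd.DatetimeIndex(pd.to_datetime(timestamps, utc=True)).floor("h")
    return expected.difference(observed)


# Usage: print(len(missing_hours(df_main["timestamp"], START_TIME, END_TIME)), "hours without data")
```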

In [None]:
from pathlib import Path

# Load cloudiness data (if available)
df_cloud = None
cloud_file = Path("raw/frost_cloud_station.csv")

if cloud_file.exists():
    df_cloud = pd.read_csv(cloud_file)
    df_cloud['timestamp'] = pd.to_datetime(df_cloud['timestamp'], utc=True)
    print(f"Cloud station data: {df_cloud.shape}")
    display(df_cloud.head())
else:
    print("No cloudiness data from secondary station")
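The two stations report on their own schedules, so the cloudiness series has to be aligned with the main station before it is used. `process_full_pipeline` handles this internally. The sketch below illustrates only one possible approach: floor timestamps to the hour, average, then left-join so every main-station row is kept. The column name `cloud` is a hypothetical placeholder for whichever column holds cloudiness in `frost_cloud_station.csv`.

```python
from typing import Optional

import pandas as pd


def attach_hourly_cloudiness(main: pd.DataFrame, cloud: Optional[pd.DataFrame],
                             value_col: str = "cloud") -> pd.DataFrame:
    """Left-join hourly mean cloudiness onto the main-station rows."""
    out = main.copy()
    out["hour"] = out["timestamp"].dt.floor("h")
    if cloud is None:
        # No secondary-station data: keep the column so later steps still find it.
        out[value_col] = float("nan")
        return out
    hourly = (
        cloud.assign(hour=cloud["timestamp"].dt.floor("h"))
        .groupby("hour", as_index=False)[value_col]
        .mean()
    )
    # how="left" keeps every main-station row; hours without cloud data get NaN.
    return out.merge(hourly, on="hour", how="left")
```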

## Step 4: Process Data

Transform raw data into BayScen format with:
- Hourly aggregation
- Missing value handling
- Discretization to CARLA-compatible ranges
- Final column naming
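The first two steps, hourly aggregation and missing-value handling, can be sketched with pandas resampling. The sketch assumes mean aggregation and time-based interpolation that fills only gaps of at most `max_gap` hours. It is not the actual logic inside `WeatherDataProcessor`, which may use different rules per variable.

```python
import pandas as pd


def to_hourly(df: pd.DataFrame, max_gap: int = 3) -> pd.DataFrame:
    """Average numeric columns per hour, then interpolate only short gaps."""
    hourly = (
        df.set_index("timestamp")
        .select_dtypes(include="number")
        .resample("h")
        .mean()
    )
    filled = hourly.interpolate(method="time", limit_area="inside")
    # Undo interpolation inside gaps longer than `max_gap` hours.
    for col in hourly.columns:
        is_na = hourly[col].isna()
        run_id = (~is_na).cumsum()  # each NaN run shares the id of the value before it
        run_len = is_na.groupby(run_id).transform("sum")
        filled.loc[is_na & (run_len > max_gap), col] = float("nan")
    return filled
```

Leaving long gaps as NaN, rather than partially filling them, avoids inventing many hours of data from two distant readings.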

In [None]:
from process import WeatherDataProcessor

# Initialize processor
processor = WeatherDataProcessor()

# Run full processing pipeline
df_processed = processor.process_full_pipeline(df_main, df_cloud)

print("\n✓ Processing complete!")
print(f"Final shape: {df_processed.shape}")
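The discretization step listed in Step 4 maps continuous measurements onto a small set of levels within the CARLA-style ranges given in the Summary. The bin edges and level values below are invented for illustration and are not BayScen's. The raw column name `precipitation_mm` is also hypothetical. The sketch shows the mechanism only: `pd.cut` with explicit edges and labels, followed by a rename to the final column name.

```python
import pandas as pd

# Hypothetical bins: hourly precipitation in mm -> CARLA-style 0-100 levels.
# With right=False the intervals are [-inf, 0.1), [0.1, 1.0), [1.0, 4.0), [4.0, inf).
PRECIP_EDGES = [-float("inf"), 0.1, 1.0, 4.0, float("inf")]
PRECIP_LEVELS = [0, 30, 60, 100]


def discretize_precipitation(mm: pd.Series) -> pd.Series:
    """Map precipitation amounts to discrete intensity levels (NaN stays NaN)."""
    levels = pd.cut(mm, bins=PRECIP_EDGES, labels=PRECIP_LEVELS, right=False)
    return levels.astype(float)


# Usage on a hypothetical raw column, renaming to the final BayScen name:
# df["Precipitation"] = discretize_precipitation(df.pop("precipitation_mm"))
```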

## Step 5: Inspect Processed Data

View the final dataset structure and distributions.

In [None]:
# Display first few rows
df_processed.head(10)

In [None]:
# Display data types and null counts
print("Data Types:")
print(df_processed.dtypes)
print("\nNull Values:")
print(df_processed.isnull().sum())

In [None]:
# Display value distributions (first 10 distinct values per column)
for col in df_processed.columns:
    print(f"\n{col}:")
    print(df_processed[col].value_counts().sort_index().head(10))
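The Summary lists the expected range of each final column. Checking the processed data against those ranges can catch discretization mistakes early. The ranges below are copied from that list; `check_ranges` is a hypothetical helper.

```python
import pandas as pd

# Ranges taken from the "Final column format" list in the Summary.
EXPECTED_RANGES = {
    "Time of Day": (-90, 90),
    "Cloudiness": (0, 100),
    "Precipitation": (0, 100),
    "Wind Intensity": (0, 100),
    "Fog Density": (0, 100),
    "Fog Distance": (0, 100),
    "Wetness": (0, 100),
    "Precipitation Deposits": (0, 100),
    "Road Friction": (0.0, 1.0),
}


def check_ranges(df: pd.DataFrame) -> dict[str, int]:
    """Count non-null out-of-range values per column; -1 marks a missing column."""
    problems = {}
    for col, (lo, hi) in EXPECTED_RANGES.items():
        if col not in df.columns:
            problems[col] = -1
            continue
        values = df[col].dropna()
        n_bad = int((~values.between(lo, hi)).sum())  # bounds are inclusive
        if n_bad:
            problems[col] = n_bad
    return problems


# Usage: print(check_ranges(df_processed) or "All columns within expected ranges")
```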

## Step 6: Save Processed Data

Save the final dataset for use in BayScen.

In [None]:
# Create output directory
output_dir = Path("processed")
output_dir.mkdir(exist_ok=True)

# Save processed data
output_file = output_dir / "bayscen_final_data.csv"
df_processed.to_csv(output_file, index=False)

print(f"✓ Saved: {output_file}")
print(f"  {len(df_processed)} rows × {len(df_processed.columns)} columns")
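The processed CSV is the hand-off point to Bayesian Network training, so it can be worth confirming that it reads back unchanged. The sketch below assumes all columns are numeric, as in the final column list. `roundtrip_ok` is a hypothetical helper. `check_dtype=False` tolerates differences such as `int64` being read back as `float64`.

```python
from pathlib import Path

import pandas as pd


def roundtrip_ok(df: pd.DataFrame, path: Path) -> bool:
    """Write df to CSV, read it back, and compare values and column names."""
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
    try:
        pd.testing.assert_frame_equal(
            df.reset_index(drop=True), reloaded, check_dtype=False
        )
    except AssertionError:
        return False
    return True


# Usage: assert roundtrip_ok(df_processed, output_file)
```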

## Summary

You now have:

1. **Raw data** in `raw/` directory
   - `frost_main_station.csv` - Road conditions, wind, precipitation, visibility
   - `frost_cloud_station.csv` - Cloudiness data (present only if the secondary station returned data)

2. **Processed data** in `processed/` directory
   - `bayscen_final_data.csv` - Ready for Bayesian Network training

**Next steps:**
- Use the processed data to train Bayesian Networks
- Generate test scenarios with BayScen
- Validate scenarios in CARLA simulation

**Final column format:**
- `Time of Day` - Sun altitude angle in degrees (-90 to 90)
- `Cloudiness` - Cloud coverage (0-100)
- `Precipitation` - Precipitation intensity (0-100)
- `Wind Intensity` - Wind strength (0-100)
- `Fog Density` - Fog thickness (0-100)
- `Fog Distance` - Visibility distance (0-100)
- `Wetness` - Road wetness (0-100)
- `Precipitation Deposits` - Water/snow on road (0-100)
- `Road Friction` - Tire grip (0.0-1.0)
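Most of these columns correspond by name to fields of CARLA's `carla.WeatherParameters`, such as `sun_altitude_angle`, `cloudiness` and `precipitation_deposits`. Road friction is not a weather parameter in CARLA. It is applied separately, for instance through a friction trigger (`static.trigger.friction`) or the vehicle's wheel physics.

The sketch below converts one processed row into keyword arguments for `carla.WeatherParameters`. The column-to-field mapping is an assumption based on matching names. CARLA is not imported, so the code runs anywhere. CARLA describes `fog_distance` as a fog start distance that is not capped at 100, so passing the 0-100 values straight through assumes that scale is intended.

```python
# Assumed mapping from BayScen column names to carla.WeatherParameters fields.
COLUMN_TO_CARLA = {
    "Time of Day": "sun_altitude_angle",
    "Cloudiness": "cloudiness",
    "Precipitation": "precipitation",
    "Wind Intensity": "wind_intensity",
    "Fog Density": "fog_density",
    "Fog Distance": "fog_distance",
    "Wetness": "wetness",
    "Precipitation Deposits": "precipitation_deposits",
}


def row_to_weather_kwargs(row: dict) -> tuple[dict, float]:
    """Split one processed row into WeatherParameters kwargs and road friction."""
    kwargs = {carla_name: float(row[col]) for col, carla_name in COLUMN_TO_CARLA.items()}
    friction = float(row["Road Friction"])  # applied outside WeatherParameters
    return kwargs, friction


# Inside a CARLA client script (not run here):
# kwargs, friction = row_to_weather_kwargs(df_processed.iloc[0].to_dict())
# world.set_weather(carla.WeatherParameters(**kwargs))
```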