# 02 — Hotspot Mapping (Empirical RTI Clusters)

## Objective
Digitize and validate empirically observed road-traffic injury (RTI) hotspots and align them with the Kigali road network.

This notebook will:
- load hotspot configuration from `configs/hotspots.json`
- validate the schema (IDs, lat/lon presence, weights)
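- visualize hotspots over Kigali for sanity checking
- discretize macro hotspot regions into intersection-level seed points and write them back to `configs/hotspots.json`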

## 1.0 Scope & Expected Outputs

### In scope
- Create and validate a hotspot configuration file
- Plot hotspots on a base map (quick sanity check)
- Export a static plot for documentation (optional)

### Outputs
- Hotspot config: `configs/hotspots.json`
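- Interactive map preview: `reports/figures/kigali_hotspots.html`
- Generated hotspot points: `data/processed/network/generated_hotspots.csv`
- Optional static figure: `reports/figures/kigali_hotspots.png`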

In [1]:
import json
from pathlib import Path

PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()

CONFIG_PATH = PROJECT_ROOT / "configs" / "hotspots.json"
FIGURES = PROJECT_ROOT / "reports" / "figures"
FIGURES.mkdir(parents=True, exist_ok=True)

print("CONFIG_PATH:", CONFIG_PATH)

with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    cfg = json.load(f)

print("Loaded metadata:")
print(cfg.get("metadata", {}))
print("Hotspots count:", len(cfg.get("hotspots", [])))

CONFIG_PATH: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/configs/hotspots.json
Loaded metadata:
{'source': 'Patel et al. (2016) injury hotspot study (to be digitized/geocoded)', 'city': 'Kigali, Rwanda', 'version': '0.1', 'notes': 'Fill in lat/lon and weights after extracting hotspot coordinates from the paper/map.'}
Hotspots count: 15


## 2.0 Validate Hotspot Configuration

Before we plot anything, we validate `configs/hotspots.json` to ensure:
- IDs are unique
- names are not placeholders
- lat/lon are present and numeric
- weights are positive

Validation is expected to fail at this stage because the file still contains placeholders.
The file can still be loaded, but it should not be used for simulation until validation passes.
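
The four checks listed above can be sketched in plain Python. This is a minimal illustration, not the actual `validate_hotspots` in `src/data/hotspots.py`, whose implementation is not shown here. The function name `check_hotspots`, the error wording, and the placeholder prefix (inferred from names such as `HOTSPOT_NAME_01`) are assumptions.

```python
import math

# Assumption: placeholder names look like "HOTSPOT_NAME_01"
PLACEHOLDER_PREFIX = "HOTSPOT_NAME_"


def _is_number(value):
    """True for finite ints/floats; bools, None and strings are rejected."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def check_hotspots(cfg):
    """Return a list of human-readable problems found in a hotspot config dict."""
    errors = []
    seen_ids = set()
    for hs in cfg.get("hotspots", []):
        hs_id = hs.get("id", "<missing id>")

        # IDs must be unique
        if hs_id in seen_ids:
            errors.append(f"{hs_id}: duplicate id")
        seen_ids.add(hs_id)

        # Names must be filled in, not left as placeholders
        name = hs.get("name") or ""
        if not name or name.startswith(PLACEHOLDER_PREFIX):
            errors.append(f"{hs_id}: placeholder or empty name")

        # lat/lon must be present and numeric
        for key in ("lat", "lon"):
            if not _is_number(hs.get(key)):
                errors.append(f"{hs_id}: missing or non-numeric {key}")

        # Weights must be strictly positive numbers
        weight = hs.get("weight")
        if not _is_number(weight) or weight <= 0:
            errors.append(f"{hs_id}: weight must be a positive number")
    return errors


# Hypothetical sample: one placeholder entry and one filled-in entry
sample = {
    "hotspots": [
        {"id": "HS01", "name": "HOTSPOT_NAME_01", "lat": None, "lon": None, "weight": 1.0},
        {"id": "HS02", "name": "Example junction", "lat": -1.95, "lon": 30.06, "weight": 2.0},
    ]
}
problems = check_hotspots(sample)
for problem in problems:
    print(" -", problem)
```

Returning a list of messages rather than raising on the first problem lets the notebook report every issue in one pass, which matches how the validation output is printed below.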

In [2]:
import sys

# Make sure repo root is importable so "src/..." imports work in notebooks
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print("Added to sys.path:", sys.path[0])

from src.data.hotspots import load_hotspots, validate_hotspots, hotspots_to_dataframe

cfg = load_hotspots(CONFIG_PATH)

errors = validate_hotspots(cfg)
print("## Hotspot config validation")

if errors:
    print(f"[FAIL] Found {len(errors)} issue(s):")
    for e in errors:
        print(" -", e)
else:
    print("[OK] Hotspot config looks valid.")

df_hotspots = hotspots_to_dataframe(cfg)
print("\n## Preview")
display(df_hotspots.head(10))

Added to sys.path: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems
## Hotspot config validation
[FAIL] Found 30 issue(s):
 - HS01 has missing placeholder name
 - HS01 has missing lat/lon (null)
 - HS02 has missing placeholder name
 - HS02 has missing lat/lon (null)
 - HS03 has missing placeholder name
 - HS03 has missing lat/lon (null)
 - HS04 has missing placeholder name
 - HS04 has missing lat/lon (null)
 - HS05 has missing placeholder name
 - HS05 has missing lat/lon (null)
 - HS06 has missing placeholder name
 - HS06 has missing lat/lon (null)
 - HS07 has missing placeholder name
 - HS07 has missing lat/lon (null)
 - HS08 has missing placeholder name
 - HS08 has missing lat/lon (null)
 - HS09 has missing placeholder name
 - HS09 has missing lat/lon (null)
 - HS10 has missing placeholder name
 - HS10 has missing lat/lon (null)
 - HS11 has missing placeholder name
 - HS11 has missing lat/lon (null)
 - HS12 has missing placeholder name
 - HS12 has missing lat/lon

| | id | name | lat | lon | weight | notes |
|---|---|---|---|---|---|---|
| 0 | HS01 | HOTSPOT_NAME_01 | | | 1.0 | |
| 1 | HS02 | HOTSPOT_NAME_02 | | | 1.0 | |
| 2 | HS03 | HOTSPOT_NAME_03 | | | 1.0 | |
| 3 | HS04 | HOTSPOT_NAME_04 | | | 1.0 | |
| 4 | HS05 | HOTSPOT_NAME_05 | | | 1.0 | |
| 5 | HS06 | HOTSPOT_NAME_06 | | | 1.0 | |
| 6 | HS07 | HOTSPOT_NAME_07 | | | 1.0 | |
| 7 | HS08 | HOTSPOT_NAME_08 | | | 1.0 | |
| 8 | HS09 | HOTSPOT_NAME_09 | | | 1.0 | |
| 9 | HS10 | HOTSPOT_NAME_10 | | | 1.0 | |


## 3.0 Plot Hotspots (Map Preview)

We visualize hotspots on a simple interactive map.

Important:
- Hotspots with missing/invalid `lat/lon` are skipped (and reported).
- The map is still created and saved even if there are 0 valid points.

Output:
- `reports/figures/kigali_hotspots.html`
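
The "skip and report" rule for missing or invalid coordinates can be sketched without pandas as follows. This is an illustration only: the helper names `to_coord` and `split_valid` are hypothetical, and the latitude/longitude range check is an extra assumption that the notebook cell below does not perform.

```python
import math


def to_coord(value, low, high):
    """Return value as a float if it is a finite number within [low, high], else None."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(number) or not (low <= number <= high):
        return None
    return number


def split_valid(hotspots):
    """Split hotspot dicts into (valid, skipped_ids) based on their lat/lon."""
    valid, skipped = [], []
    for hs in hotspots:
        lat = to_coord(hs.get("lat"), -90.0, 90.0)
        lon = to_coord(hs.get("lon"), -180.0, 180.0)
        if lat is None or lon is None:
            skipped.append(hs.get("id"))
        else:
            valid.append({**hs, "lat": lat, "lon": lon})
    return valid, skipped


# Hypothetical records, for illustration only
records = [
    {"id": "A", "lat": "-1.95", "lon": 30.06},  # numeric string is accepted
    {"id": "B", "lat": None, "lon": None},      # missing
    {"id": "C", "lat": "n/a", "lon": 30.10},    # not numeric
    {"id": "D", "lat": 123.0, "lon": 30.10},    # latitude out of range
]
valid, skipped = split_valid(records)
print("Valid:", [hs["id"] for hs in valid])
print("Skipped:", skipped)
```

Only record `A` survives here; the other three are reported as skipped instead of raising an error, so the map can still be built when some entries are incomplete.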

In [3]:
import math

import folium
import pandas as pd

print("## Hotspot Map Preview")

# Kigali city center (approx) for a stable default map center
KIGALI_CENTER = (-1.9536, 30.0606)

df = df_hotspots.copy()

# Convert lat/lon to numeric; missing or non-numeric values become NaN
df["lat_num"] = pd.to_numeric(df["lat"], errors="coerce")
df["lon_num"] = pd.to_numeric(df["lon"], errors="coerce")

valid = df.dropna(subset=["lat_num", "lon_num"]).copy()
invalid_count = len(df) - len(valid)

print("Total hotspots:", len(df))
print("Valid points:", len(valid))
print("Skipped (missing lat/lon):", invalid_count)

m = folium.Map(location=KIGALI_CENTER, zoom_start=12, control_scale=True)

# Plot markers for valid hotspots
for _, row in valid.iterrows():
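    # Fall back to 1.0 when the weight is missing (None or NaN)
    raw_weight = row["weight"]
    weight = 1.0 if raw_weight is None or math.isnan(float(raw_weight)) else float(raw_weight)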
    radius = 4 + min(10, math.sqrt(max(weight, 0.0)) * 2)

    popup_text = f"{row['id']} — {row['name']} (weight={weight})"
    folium.CircleMarker(
        location=(row["lat_num"], row["lon_num"]),
        radius=radius,
        popup=popup_text,
        tooltip=row["id"],
        fill=True,
    ).add_to(m)

out_path = FIGURES / "kigali_hotspots.html"
m.save(str(out_path))
print("[OK] Saved map:", out_path)

m

## Hotspot Map Preview
Total hotspots: 15
Valid points: 0
Skipped (missing lat/lon): 15
[OK] Saved map: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/reports/figures/kigali_hotspots.html


## 4.0 Discretize Macro Hotspot Regions into Intersection Points

Patel et al. (2016) provide hotspot *regions* (derived via kernel density estimation, KDE), not a table of exact point coordinates.

To operationalize this for simulation:
- We define 3 macro hotspot regions (center + radius in meters)
- For each region, we select `n_points` candidate intersections (graph nodes)
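- We rank candidates by a simple proxy: **node degree**, on the assumption that more connected intersections are plausible high-risk points. Ties are broken by distance to the region center (closest first). Because the OSMnx graph is directed, degree counts incoming plus outgoing edges, so a four-way intersection of two-way streets has degree 8.

The result is a reproducible set of 15 hotspot seed points (3 regions × 5 points each) that can be used by the incident generator.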
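
The selection rule can be illustrated on toy data with the standard library alone. The sketch below is not the notebook's implementation (which uses NumPy, pandas and the OSMnx graph). The function `pick_seed_points` and the toy nodes are hypothetical, and the haversine formula is written in its arcsine form, which is equivalent to the `arctan2` form used in the cell that follows.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def pick_seed_points(nodes, center, radius_m, n_points):
    """Keep nodes within radius_m of center, ranked by degree (desc), then distance (asc)."""
    lat0, lon0 = center
    within = []
    for node in nodes:
        dist = haversine_m(node["lat"], node["lon"], lat0, lon0)
        if dist <= radius_m:
            within.append({**node, "dist_m": dist})
    within.sort(key=lambda n: (-n["degree"], n["dist_m"]))
    return within[:n_points]


# Toy nodes around a hypothetical center; ids, coordinates and degrees are made up
center = (-1.9500, 30.0600)
nodes = [
    {"node_id": "a", "lat": -1.9500, "lon": 30.0610, "degree": 6},  # ~111 m away
    {"node_id": "b", "lat": -1.9520, "lon": 30.0600, "degree": 8},  # ~222 m away
    {"node_id": "c", "lat": -1.9505, "lon": 30.0600, "degree": 8},  # ~56 m away
    {"node_id": "d", "lat": -1.9900, "lon": 30.0600, "degree": 8},  # ~4.4 km away, outside radius
]
seeds = pick_seed_points(nodes, center, radius_m=2500.0, n_points=2)
print([s["node_id"] for s in seeds])
```

With these toy values the result is `['c', 'b']`: node `d` is excluded by the radius, node `a` loses on degree, and `c` precedes `b` because it is closer to the center.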

In [8]:
import json

import numpy as np
import osmnx as ox
import pandas as pd

REGIONS_PATH = PROJECT_ROOT / "configs" / "hotspot_regions.json"
GRAPHML_PATH = PROJECT_ROOT / "data" / "processed" / "network" / "kigali.graphml"
OUT_CSV_PATH = PROJECT_ROOT / "data" / "processed" / "network" / "generated_hotspots.csv"

print("REGIONS_PATH:", REGIONS_PATH)
print("GRAPHML_PATH:", GRAPHML_PATH)
print("OUT_CSV_PATH:", OUT_CSV_PATH)


def haversine_m(lat1, lon1, lat2, lon2):
    """Compute great-circle distance in meters (vectorized with numpy)."""
    R = 6371000.0  # Earth radius (m)
    lat1 = np.radians(lat1)
    lon1 = np.radians(lon1)
    lat2 = np.radians(lat2)
    lon2 = np.radians(lon2)

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return R * c

with open(REGIONS_PATH, "r", encoding="utf-8") as f:
    regions_cfg = json.load(f)

regions = regions_cfg.get("regions", [])
print("Regions count:", len(regions))

if not GRAPHML_PATH.exists():
    raise FileNotFoundError(
        "Missing kigali.graphml. Run notebooks/01_osm_to_sumo.ipynb first to generate it."
    )

print("\nLoading Kigali graph...")
G_city = ox.load_graphml(GRAPHML_PATH)
print("Graph loaded.")
print("Nodes:", len(G_city.nodes))
print("Edges:", len(G_city.edges))

print("\nConverting graph nodes to GeoDataFrame...")
gdf_nodes = ox.graph_to_gdfs(G_city, nodes=True, edges=False)
gdf_nodes = gdf_nodes.reset_index().rename(columns={"osmid": "node_id"})

# Ensure lat/lon columns
gdf_nodes["lat"] = gdf_nodes["y"]
gdf_nodes["lon"] = gdf_nodes["x"]

# Node degree as a simple importance proxy
deg = dict(G_city.degree())
gdf_nodes["degree"] = gdf_nodes["node_id"].map(deg).fillna(0).astype(int)

print("Node rows:", len(gdf_nodes))

rows = []

for r in regions:
    region_id = r["region_id"]
    name = r["name"]
    lat = float(r["lat"])
    lon = float(r["lon"])
    radius_m = float(r["radius_m"])
    n_points = int(r["n_points"])

    print(f"\n--- {region_id}: {name} ---")
    print("Center:", (lat, lon), "Radius (m):", radius_m, "n_points:", n_points)

    # Compute each node's distance to the region center, then filter by radius
    candidates = gdf_nodes.copy()

    distances = haversine_m(
        candidates["lat"].to_numpy(),
        candidates["lon"].to_numpy(),
        lat,
        lon,
    )
    candidates["dist_m"] = distances

    within = candidates[candidates["dist_m"] <= radius_m].copy()
    print("Candidates within radius:", len(within))

    if len(within) == 0:
        print("[WARN] No nodes found within radius. Consider increasing radius_m.")
        continue

    # Rank by degree (desc), then closest distance (asc)
    within_sorted = within.sort_values(["degree", "dist_m"], ascending=[False, True]).head(n_points)

    for j, row in enumerate(within_sorted.itertuples(index=False), start=1):
        rows.append(
            {
                "generated_id": f"{region_id}_P{j}",
                "region_id": region_id,
                "region_name": name,
                "node_id": row.node_id,
                "lat": float(row.lat),
                "lon": float(row.lon),
                "degree": int(row.degree),
                "dist_m": float(row.dist_m),
                "weight": float(row.degree),  # simple initial weight proxy
                "notes": "Generated from region by degree ranking",
            }
        )

df_gen = pd.DataFrame(rows)
print("\nGenerated hotspot points:", len(df_gen))
display(df_gen.head(15))

OUT_CSV_PATH.parent.mkdir(parents=True, exist_ok=True)
df_gen.to_csv(OUT_CSV_PATH, index=False)
print("[OK] Saved generated hotspots:", OUT_CSV_PATH)

REGIONS_PATH: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/configs/hotspot_regions.json
GRAPHML_PATH: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/data/processed/network/kigali.graphml
OUT_CSV_PATH: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/data/processed/network/generated_hotspots.csv
Regions count: 3

Loading Kigali graph...
Graph loaded.
Nodes: 18941
Edges: 50228

Converting graph nodes to GeoDataFrame...
Node rows: 18941

--- R1: Central Kigali (CBD / Downtown) ---
Center: (-1.9499, 30.0588) Radius (m): 2500.0 n_points: 5
Candidates within radius: 1091

--- R2: Nyabugogo (Transport hub area) ---
Center: (-1.9396, 30.0445) Radius (m): 2000.0 n_points: 5
Candidates within radius: 511

--- R3: Remera / Giporoso corridor (East) ---
Center: (-1.9579, 30.106) Radius (m): 2500.0 n_points: 5
Candidates within radius: 1301

Generated hotspot points: 15


| | generated_id | region_id | region_name | node_id | lat | lon | degree | dist_m | weight | notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R1_P1 | R1 | Central Kigali (CBD / Downtown) | 279254284 | -1.948202 | 30.057958 | 8 | 210.713897 | 8.0 | Generated from region by degree ranking |
| 1 | R1_P2 | R1 | Central Kigali (CBD / Downtown) | 281375156 | -1.950362 | 30.061309 | 8 | 283.569247 | 8.0 | Generated from region by degree ranking |
| 2 | R1_P3 | R1 | Central Kigali (CBD / Downtown) | 279254295 | -1.946908 | 30.058399 | 8 | 335.676055 | 8.0 | Generated from region by degree ranking |
| 3 | R1_P4 | R1 | Central Kigali (CBD / Downtown) | 12237774265 | -1.950586 | 30.055314 | 8 | 394.810945 | 8.0 | Generated from region by degree ranking |
| 4 | R1_P5 | R1 | Central Kigali (CBD / Downtown) | 281375108 | -1.948339 | 30.064269 | 8 | 632.102299 | 8.0 | Generated from region by degree ranking |
| 5 | R2_P1 | R2 | Nyabugogo (Transport hub area) | 1223047162 | -1.941434 | 30.050202 | 8 | 665.637856 | 8.0 | Generated from region by degree ranking |
| 6 | R2_P2 | R2 | Nyabugogo (Transport hub area) | 1223047194 | -1.942393 | 30.050042 | 8 | 689.716784 | 8.0 | Generated from region by degree ranking |
| 7 | R2_P3 | R2 | Nyabugogo (Transport hub area) | 1223047118 | -1.937843 | 30.053296 | 8 | 996.806177 | 8.0 | Generated from region by degree ranking |
| 8 | R2_P4 | R2 | Nyabugogo (Transport hub area) | 1223047252 | -1.938496 | 30.053403 | 8 | 997.007421 | 8.0 | Generated from region by degree ranking |
| 9 | R2_P5 | R2 | Nyabugogo (Transport hub area) | 1223047198 | -1.936985 | 30.053473 | 8 | 1038.750439 | 8.0 | Generated from region by degree ranking |


[OK] Saved generated hotspots: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/data/processed/network/generated_hotspots.csv


In [9]:
import json
import pandas as pd
from pathlib import Path

HOTSPOTS_JSON_PATH = PROJECT_ROOT / "configs" / "hotspots.json"

if not OUT_CSV_PATH.exists():
    raise FileNotFoundError(f"Missing generated hotspots CSV: {OUT_CSV_PATH}")

df_gen = pd.read_csv(OUT_CSV_PATH)
df_gen = df_gen.sort_values(["region_id", "generated_id"]).reset_index(drop=True)

print("Generated CSV rows:", len(df_gen))
display(df_gen.head(15))

# Load existing hotspots.json to preserve metadata
with open(HOTSPOTS_JSON_PATH, "r", encoding="utf-8") as f:
    hotspots_cfg = json.load(f)

metadata = hotspots_cfg.get("metadata", {})

new_hotspots = []
for i, row in enumerate(df_gen.itertuples(index=False), start=1):
    hs_id = f"HS{i:02d}"
    name = f"{row.generated_id} — {row.region_name}"

    new_hotspots.append(
        {
            "id": hs_id,
            "name": name,
            "lat": float(row.lat),
            "lon": float(row.lon),
            # read_csv represents missing values as NaN, not None
            "weight": float(row.weight) if pd.notna(row.weight) else 1.0,
            "notes": f"node_id={row.node_id}, dist_m={row.dist_m:.1f}, degree={row.degree}",
        }
    )

updated = {"metadata": metadata, "hotspots": new_hotspots}

with open(HOTSPOTS_JSON_PATH, "w", encoding="utf-8") as f:
    json.dump(updated, f, indent=2, ensure_ascii=False)

print("[OK] Updated hotspots.json:", HOTSPOTS_JSON_PATH)
print("Hotspots count:", len(new_hotspots))

Generated CSV rows: 15


| | generated_id | region_id | region_name | node_id | lat | lon | degree | dist_m | weight | notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R1_P1 | R1 | Central Kigali (CBD / Downtown) | 279254284 | -1.948202 | 30.057958 | 8 | 210.713897 | 8.0 | Generated from region by degree ranking |
| 1 | R1_P2 | R1 | Central Kigali (CBD / Downtown) | 281375156 | -1.950362 | 30.061309 | 8 | 283.569247 | 8.0 | Generated from region by degree ranking |
| 2 | R1_P3 | R1 | Central Kigali (CBD / Downtown) | 279254295 | -1.946908 | 30.058399 | 8 | 335.676055 | 8.0 | Generated from region by degree ranking |
| 3 | R1_P4 | R1 | Central Kigali (CBD / Downtown) | 12237774265 | -1.950586 | 30.055314 | 8 | 394.810945 | 8.0 | Generated from region by degree ranking |
| 4 | R1_P5 | R1 | Central Kigali (CBD / Downtown) | 281375108 | -1.948339 | 30.064269 | 8 | 632.102299 | 8.0 | Generated from region by degree ranking |
| 5 | R2_P1 | R2 | Nyabugogo (Transport hub area) | 1223047162 | -1.941434 | 30.050202 | 8 | 665.637856 | 8.0 | Generated from region by degree ranking |
| 6 | R2_P2 | R2 | Nyabugogo (Transport hub area) | 1223047194 | -1.942393 | 30.050042 | 8 | 689.716784 | 8.0 | Generated from region by degree ranking |
| 7 | R2_P3 | R2 | Nyabugogo (Transport hub area) | 1223047118 | -1.937843 | 30.053296 | 8 | 996.806177 | 8.0 | Generated from region by degree ranking |
| 8 | R2_P4 | R2 | Nyabugogo (Transport hub area) | 1223047252 | -1.938496 | 30.053403 | 8 | 997.007421 | 8.0 | Generated from region by degree ranking |
| 9 | R2_P5 | R2 | Nyabugogo (Transport hub area) | 1223047198 | -1.936985 | 30.053473 | 8 | 1038.750439 | 8.0 | Generated from region by degree ranking |


[OK] Updated hotspots.json: /Users/testsolutions/Documents/Academics/mission-capstone/marl-in-ems/configs/hotspots.json
Hotspots count: 15
