# Waterlogging + Route Scoring Training Notebook

This notebook demonstrates a reproducible pipeline that:
- Fetches recent rainfall for Mumbai from Open-Meteo.
- Creates synthetic waterlogging pivots and bad-road points (replace these with real dataset CSVs when available).
- Computes per-route features (flood preference, bad-road penalty, distance) for three example routes.
- Shows how different weights, or a learned model, would alter route ranking and explanations.

Notes:
- This notebook uses synthetic data because no public labelled dataset is embedded. Replace the synthetic data with real CSVs when available.
- If you collect labelled user preferences (which route each user chose), you can train a classifier or regressor on them; the final section outlines the steps.

In [None]:
# Install dependencies (uncomment to run)
# !pip install pandas numpy matplotlib requests scikit-learn geopandas

In [3]:
import math
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from typing import List, Tuple

plt.rcParams['figure.figsize'] = (8,6)

## Helper functions: haversine, preference map, rainfall factor, bad-road penalty

In [4]:
def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * R * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def rainfall_factor(rain_mm):
    if rain_mm <= 0:
        return 0.1
    normalized = min(rain_mm / 100.0, 1.0)
    return 0.1 + 0.9 * (normalized ** 0.6)

def preference_at_point(lat, lon, pivots, spread_km=2.0):
    pref = 0.0
    for p_lat, p_lon, severity in pivots:
        d = haversine_km(lat, lon, p_lat, p_lon)
        influence = math.exp(-d / spread_km)
        pref = max(pref, severity * influence)
    return min(pref, 1.0)

def bad_road_penalty_at_point(lat, lon, bad_points):
    best = 0.0
    for b_lat, b_lon, severity in bad_points:
        d = haversine_km(lat, lon, b_lat, b_lon)
        influence = math.exp(-d / 1.0)
        best = max(best, severity * influence)
    return min(best, 1.0)
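
To make the shapes of these helpers concrete, the sketch below evaluates the exponential decay and the rainfall factor at a few inputs. The `rainfall_factor` body is copied from the cell above and `decay` isolates the falloff used in `preference_at_point`. The final `rain_adjusted_risk` product is only an assumption about how the two could be combined, since the scoring cells in this notebook do not apply `rainfall_factor`.

```python
import math

def rainfall_factor(rain_mm):
    # Same formula as the helper above: floor of 0.1, saturating at 100 mm
    if rain_mm <= 0:
        return 0.1
    normalized = min(rain_mm / 100.0, 1.0)
    return 0.1 + 0.9 * (normalized ** 0.6)

def decay(distance_km, spread_km=2.0):
    # Exponential falloff used by preference_at_point
    return math.exp(-distance_km / spread_km)

# A pivot's influence drops to 1/e (about 37%) at one spread length
for d in (0.0, 2.0, 4.0):
    print(f"distance {d:.0f} km -> influence {decay(d):.3f}")

for rain in (0, 10, 50, 100, 200):
    print(f"rain {rain:>3} mm -> factor {rainfall_factor(rain):.3f}")

# Hypothetical combination (not part of the notebook's scoring):
# scale a route's flood preference by how wet the last 24 hours were.
def rain_adjusted_risk(avg_pref, rain_mm):
    return avg_pref * rainfall_factor(rain_mm)
```

Because the factor never falls below 0.1, a hotspot would still carry a small residual risk on a dry day under this combination.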

## Fetch recent rainfall for Mumbai (Open-Meteo)

In [5]:
# Fetch hourly precipitation for the last 24 hours around Mumbai center
url = "https://api.open-meteo.com/v1/forecast?latitude=19.0760&longitude=72.8777&hourly=precipitation&past_hours=24&forecast_hours=0"
r = requests.get(url, timeout=10)
r.raise_for_status()
data = r.json()
hourly = data.get('hourly', {})
precip = hourly.get('precipitation', [])
total_24h = sum(float(x or 0) for x in precip[:24])
print(f'Open-Meteo total 24h precipitation (Mumbai center): {total_24h:.2f} mm')

Open-Meteo total 24h precipitation (Mumbai center): 0.00 mm
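
The fetch above mixes the network call with the parsing. A minimal sketch, assuming the response keeps the same `hourly.precipitation` layout, separates the two so the summation can be checked offline. `build_url` and `total_precip` are hypothetical helper names, and the sample payload is invented for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"

def build_url(lat, lon, past_hours=24):
    # Same query parameters as the hard-coded URL above
    params = {
        "latitude": lat,
        "longitude": lon,
        "hourly": "precipitation",
        "past_hours": past_hours,
        "forecast_hours": 0,
    }
    return f"{BASE_URL}?{urlencode(params)}"

def total_precip(payload, hours=24):
    # Sum the first `hours` hourly values, treating missing entries (None) as 0
    values = payload.get("hourly", {}).get("precipitation", [])
    return sum(float(v or 0) for v in values[:hours])

# Invented payload for an offline check; real responses carry more fields
sample = {"hourly": {"precipitation": [0.0, 1.5, None, 2.5]}}
print(build_url(19.0760, 72.8777))
print(total_precip(sample))
```

The URL from `build_url` can be passed to `requests.get` exactly as in the cell above.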


## Create synthetic waterlogging pivots and bad-road dataset (replace with real CSVs)

In [6]:
# Synthetic pivots: (lat, lon, severity 0..1)
pivots = [
    (19.0460, 72.8538, 0.9),  # central hotspot
    (19.0728, 72.8826, 0.85),
    (19.0056, 72.8417, 0.95),
    (19.1197, 72.8464, 0.8),
]
# Synthetic bad-road points
bad_points = [
    (19.0757, 72.8772, 0.6),
    (19.0600, 72.8850, 0.7),
    (19.0400, 72.8400, 0.5),
]
print('Pivots:', pivots)
print('Bad road points:', bad_points)

Pivots: [(19.046, 72.8538, 0.9), (19.0728, 72.8826, 0.85), (19.0056, 72.8417, 0.95), (19.1197, 72.8464, 0.8)]
Bad road points: [(19.0757, 72.8772, 0.6), (19.06, 72.885, 0.7), (19.04, 72.84, 0.5)]
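
When a real hotspot file is available, the synthetic list can be swapped for a loader. The sketch below assumes a CSV with `lat`, `lon`, and `severity` columns and returns the same list-of-tuples shape as `pivots`. `load_points` is a hypothetical helper, and the inline CSV text stands in for a file on disk.

```python
import csv
import io

def load_points(fileobj):
    # Read (lat, lon, severity) rows; clamp severity into 0..1
    points = []
    for row in csv.DictReader(fileobj):
        severity = min(max(float(row["severity"]), 0.0), 1.0)
        points.append((float(row["lat"]), float(row["lon"]), severity))
    return points

# Stand-in for an opened CSV file; the second severity is deliberately
# out of range to show the clamping
csv_text = "lat,lon,severity\n19.0460,72.8538,0.9\n19.0728,72.8826,1.2\n"
pivots_from_csv = load_points(io.StringIO(csv_text))
print(pivots_from_csv)
```

The same loader would work for `bad_points`, since both lists share the `(lat, lon, severity)` layout.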


## Example: three synthetic routes (each a list of lat,lng). These simulate the three choices returned by the backend.

In [7]:
route1 = [(19.0760,72.8777),(19.075,72.88),(19.07,72.885),(19.066,72.889)]  # Fastest
route2 = [(19.0760,72.8777),(19.08,72.875),(19.09,72.87),(19.1,72.865)]  # Safer (balanced)
route3 = [(19.0760,72.8777),(19.07,72.87),(19.065,72.863),(19.06,72.856)]  # Safest (artificial)
routes = [route1, route2, route3]
labels = ['fastest','safer','safest']

def avg_rain_along(route, total_24h):
    # For demo: assume total_24h applies uniformly along route; realistic pipeline would sample gridded rainfall
    return [total_24h / len(route) for _ in route]

def avg_preference(route):
    vals = [preference_at_point(lat, lon, pivots) for lat,lon in route]
    return sum(vals)/len(vals)

def avg_bad_penalty(route):
    vals = [bad_road_penalty_at_point(lat, lon, bad_points) for lat,lon in route]
    return sum(vals)/len(vals)

features = []
for label, route in zip(labels, routes):
    # Total path length: sum of haversine distances between consecutive points
    dist_km = sum(haversine_km(lat1, lon1, lat2, lon2)
                  for (lat1, lon1), (lat2, lon2) in zip(route, route[1:]))
    features.append({
        'label': label,
        'avg_pref': avg_preference(route),
        'avg_bad': avg_bad_penalty(route),
        'dist_km': dist_km,
        'rain_24h': total_24h,
    })

df = pd.DataFrame(features)
df

     label  avg_pref   avg_bad   dist_km  rain_24h
0  fastest  0.634333  0.388684  1.643046       0.0
1    safer  0.388620  0.262463  2.987230       0.0
2   safest  0.435883  0.227943  2.893023       0.0
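
The averages above are taken at the route vertices only, so a hotspot lying between two widely spaced vertices contributes little. One possible refinement, sketched here as an assumption rather than something this notebook does, is to interpolate extra points along each segment before averaging. `densify` is a hypothetical helper, and linear interpolation in lat/lon is only reasonable for short, city-scale segments.

```python
def densify(route, steps_per_segment=4):
    # Insert evenly spaced points between consecutive vertices (linear in lat/lon)
    if len(route) < 2:
        return list(route)
    dense = []
    for (lat1, lon1), (lat2, lon2) in zip(route, route[1:]):
        for k in range(steps_per_segment):
            t = k / steps_per_segment
            dense.append((lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
    dense.append(route[-1])  # keep the final vertex
    return dense

route1 = [(19.0760, 72.8777), (19.075, 72.88), (19.07, 72.885), (19.066, 72.889)]
dense1 = densify(route1)
print(len(route1), "->", len(dense1), "points")
```

The dense list can then be passed to `avg_preference` and `avg_bad_penalty` in place of the raw route.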


## Scoring: rule-based vs. alternative weights

In [8]:
# Normalize distance into 0..1 (relative)
df['dist_norm'] = (df['dist_km'] - df['dist_km'].min()) / max(df['dist_km'].max() - df['dist_km'].min(), 1e-6)

def compute_score(row, w_risk=0.6, w_bad=0.3, w_dist=0.1):
    return w_risk * row['avg_pref'] + w_bad * row['avg_bad'] + w_dist * row['dist_norm']

df['score_rule'] = df.apply(compute_score, axis=1)
# Alternative: more weight to bad-roads (simulate tuning/training outcome)
df['score_alt'] = df.apply(lambda r: compute_score(r, w_risk=0.45, w_bad=0.45, w_dist=0.1), axis=1)

print('Rule-based scores (lower=better):')
print(df[['label','avg_pref','avg_bad','dist_km','score_rule']])
print('Alternative weighted scores:')
print(df[['label','score_alt']])

Rule-based scores (lower=better):
     label  avg_pref   avg_bad   dist_km  score_rule
0  fastest  0.634333  0.388684  1.643046    0.497205
1    safer  0.388620  0.262463  2.987230    0.411911
2   safest  0.435883  0.227943  2.893023    0.422904
Alternative weighted scores:
     label  score_alt
0  fastest   0.460358
1    safer   0.392987
2   safest   0.391713
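
The two score columns differ in which route comes out on top. As a quick check, the sketch below recomputes both scores in plain Python from the rounded feature values printed earlier and sorts the routes (lower is better). The `rank` helper is hypothetical.

```python
# Feature values copied (rounded) from the feature table above
rows = [
    {"label": "fastest", "avg_pref": 0.634333, "avg_bad": 0.388684, "dist_km": 1.643046},
    {"label": "safer",   "avg_pref": 0.388620, "avg_bad": 0.262463, "dist_km": 2.987230},
    {"label": "safest",  "avg_pref": 0.435883, "avg_bad": 0.227943, "dist_km": 2.893023},
]

# Min-max normalise distance, as in the scoring cell
d_min = min(r["dist_km"] for r in rows)
d_span = max(max(r["dist_km"] for r in rows) - d_min, 1e-6)
for r in rows:
    r["dist_norm"] = (r["dist_km"] - d_min) / d_span

def rank(rows, w_risk, w_bad, w_dist):
    # Return labels ordered from best (lowest score) to worst
    def score(r):
        return w_risk * r["avg_pref"] + w_bad * r["avg_bad"] + w_dist * r["dist_norm"]
    return [r["label"] for r in sorted(rows, key=score)]

print(rank(rows, 0.6, 0.3, 0.1))    # default rule-based weights
print(rank(rows, 0.45, 0.45, 0.1))  # heavier bad-road weight
```

With these values the default weights put `safer` first, while the alternative weights put `safest` first by a margin of roughly 0.001, so the flip is sensitive to small changes in the inputs.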


## Simple explanation generation (user-friendly)

In [9]:
def explain_row(row):
    parts = []
    if row['dist_norm'] < 0.05:
        parts.append('Shortest distance')
    else:
        parts.append(f"{row['dist_km']:.1f} km route")
    parts.append(f"Flood risk: {row['avg_pref'] * 100:.0f}%")
    if row['avg_bad'] > 0.4:
        parts.append('Known poor road sections, expect rough patches')
    elif row['avg_bad'] > 0.1:
        parts.append('Minor poor-road segments')
    return '; '.join(parts)

df['explanation'] = df.apply(explain_row, axis=1)
df[['label','score_rule','score_alt','explanation']]

     label  score_rule  score_alt                                        explanation
0  fastest    0.497205   0.460358  Shortest distance; Flood risk: 63%; Minor poor...
1    safer    0.411911   0.392987  3.0 km route; Flood risk: 39%; Minor poor-road...
2   safest    0.422904   0.391713  2.9 km route; Flood risk: 44%; Minor poor-road...


## What to do next: replacing synthetic data with real data
- Replace `pivots` with a CSV of observed waterlogging hotspots (columns: lat, lon, severity).
- Replace `bad_points` with known bad-road segments (or points annotated by local teams).
- Instrument the app to log features and user choices into a CSV for training (columns: timestamp, route_id, features..., chosen).
- Train a small model (e.g., XGBoost) on those logged rows to learn the weights; use SHAP values or the model's feature importances to generate explanations.
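
A minimal training skeleton for the last two bullets might look like the following. It is a sketch under several assumptions: the logged rows are simulated here (no real user data exists in this notebook), each row is treated as an independent chosen / not-chosen example, and scikit-learn's `LogisticRegression` stands in for XGBoost because its coefficients map directly onto the hand-set weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["avg_pref", "avg_bad", "dist_norm"]

# Simulated log: users tend to pick routes with a low rule-based cost
n = 2000
X = rng.uniform(0.0, 1.0, size=(n, 3))
true_w = np.array([0.6, 0.3, 0.1])
cost = X @ true_w
p_chosen = 1.0 / (1.0 + np.exp(8.0 * (cost - cost.mean())))
y = (rng.uniform(size=n) < p_chosen).astype(int)  # 1 = route was chosen

model = LogisticRegression()
model.fit(X, y)

# Negative coefficients mean the feature makes a route less likely to be chosen;
# their relative magnitudes play the role of w_risk, w_bad, w_dist
learned = dict(zip(feature_names, model.coef_[0]))
for name, coef in learned.items():
    print(f"{name}: {coef:+.2f}")
```

With real logs, replace the simulated `X` and `y` with the logged feature columns and the `chosen` flag. A model that compares the routes offered together (rather than scoring rows independently) would be a closer fit to the actual choice, but needs the logs grouped by request.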

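This notebook shows that changing the weights (or learning them) alters the route ranking: with the default weights `safer` scores best (0.412 vs 0.423 for `safest`), while weighting bad roads more heavily narrowly favours `safest` (0.392 vs 0.393). The explanation strings produced by `explain_row` depend only on the features, so they change only once explanations are derived from the model itself.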