# Ambulance Route & Hospital Optimization System

## Overview
This project explores an ML-driven decision system for emergency medical services (EMS) that jointly optimizes ambulance routing and hospital selection. 

Rather than routing patients to the nearest hospital by distance alone, the system incorporates:
- Travel time estimation
- Hospital congestion and capacity
- Patient severity and urgency

The goal is to minimize effective treatment delay, meaning the time until treatment begins rather than travel time alone, while respecting real-world operational constraints.

## Problem Statement

Emergency medical routing is a multi-objective decision problem. While navigation systems can optimize for shortest travel time, they do not account for downstream delays caused by hospital congestion or varying clinical urgency.

Transporting a critically injured patient to a congested hospital may increase time-to-treatment, while transporting a stable patient to a distant hospital may unnecessarily increase risk.

This project aims to design a severity-aware optimization system that recommends the best hospital and route for an ambulance by balancing:
- Estimated travel time
- Predicted hospital wait time
- Hospital congestion risk
- Patient severity
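
One way to make this trade-off concrete is a single cost per candidate hospital, expressed in minutes. The sketch below is illustrative only: `hospital_cost`, `best_hospital`, the 30-minute risk penalty, the linear wait discount, and the reading of severity 5 as most critical are all assumptions made for this example, not the project's actual optimization logic. The wait discount leans on the assumption, listed under Key Assumptions, that hospitals may prioritize critically ill patients.

```python
def hospital_cost(travel_min, wait_min, congestion_risk, severity,
                  risk_penalty_min=30.0):
    """Hypothetical cost in minutes for one candidate hospital; lower is better.

    travel_min:       estimated travel time in minutes
    wait_min:         predicted hospital wait time in minutes
    congestion_risk:  value in [0, 1]
    severity:         ordinal 1-5 (assumed here: 5 = most critical)
    """
    if severity not in (1, 2, 3, 4, 5):
        raise ValueError("severity must be an integer from 1 to 5")
    if not 0.0 <= congestion_risk <= 1.0:
        raise ValueError("congestion_risk must be in [0, 1]")
    # Assumption: more severe patients are triaged faster, so they
    # experience a smaller share of the predicted wait (1.0 down to 0.2).
    wait_factor = (6 - severity) / 5
    return travel_min + wait_factor * wait_min + risk_penalty_min * congestion_risk


def best_hospital(candidates, severity):
    """Return the name of the candidate with the lowest hypothetical cost."""
    return min(
        candidates,
        key=lambda name: hospital_cost(severity=severity, **candidates[name]),
    )


# Made-up candidates: A is close but busy, B is far but quiet.
candidates = {
    "A": {"travel_min": 5.0, "wait_min": 40.0, "congestion_risk": 0.2},
    "B": {"travel_min": 25.0, "wait_min": 10.0, "congestion_risk": 0.1},
}

low_severity_choice = best_hospital(candidates, severity=1)
high_severity_choice = best_hospital(candidates, severity=5)
```

With these made-up numbers, the severity-1 patient is sent to the distant, quiet hospital B, while the severity-5 patient is sent to the nearby hospital A, because travel time dominates once the wait is discounted.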

## Project Scope

This project is a simulation-based decision system and is not intended for real-time clinical deployment.

### Included:
- Historical and simulated data
- Machine learning models for wait-time and congestion prediction
- Severity-aware optimization logic
- Geospatial routing and visualization

### Excluded:
- Real-time EMS or 911 data
- Live traffic feeds
- Clinical decision-making or diagnosis
- Production deployment or API services


## Key Assumptions

- Hospital congestion patterns can be reasonably approximated using historical and simulated data.
- Travel time estimates based on road networks and time-of-day adjustments are sufficient for relative comparison.
- Patient severity can be represented on a discrete ordinal scale (1–5).
- Hospitals may prioritize critically ill patients even under high load.
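
The time-of-day adjustment in the second assumption could be as simple as scaling a free-flow travel time by an hourly multiplier. This is a minimal sketch under stated assumptions: the peak-hour set, the multiplier values, and the function names are hypothetical placeholders, not values derived from the project's data.

```python
# Hypothetical peak hours and multipliers; illustrative values only.
PEAK_HOURS = {7, 8, 9, 16, 17, 18}


def time_of_day_multiplier(hour):
    """Return an assumed congestion multiplier for an hour of day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour in PEAK_HOURS:
        return 1.5   # assumed rush-hour slowdown
    if hour >= 22 or hour < 5:
        return 0.8   # assumed overnight speed-up
    return 1.0


def adjusted_travel_minutes(free_flow_minutes, hour):
    """Scale a free-flow travel time by the time-of-day multiplier."""
    return free_flow_minutes * time_of_day_multiplier(hour)


rush_hour = adjusted_travel_minutes(10.0, hour=8)
midday = adjusted_travel_minutes(10.0, hour=12)
overnight = adjusted_travel_minutes(10.0, hour=23)
```

Because a single multiplier is applied to every candidate route at a given hour, it changes absolute travel times but not their ordering. Changing the relative ranking of hospitals would require per-road or per-region multipliers.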


## Ethical Disclaimer

This project is for educational and research purposes only.

All data used is publicly available or simulated. No real patient data is included. The system does not provide medical advice and should not be used for real-world emergency medical decision-making.

Any conclusions drawn from this project reflect simulated scenarios and modeling assumptions, not clinical outcomes.


## System Design Overview

1. Generate or ingest patient emergency scenarios
2. Compute travel-time estimates to candidate hospitals
3. Predict hospital wait times using machine learning
4. Estimate hospital congestion risk
5. Apply severity-aware optimization to rank hospitals
6. Visualize routes and decisions on a map
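
Step 2 amounts to a shortest-path query from the incident location to every candidate hospital. The sketch below uses a plain-Python Dijkstra over a toy graph so that it stays self-contained; in the notebook, `networkx` (imported in the setup cell) provides equivalent functionality, for example `nx.single_source_dijkstra_path_length`. The node names, edge weights, and the function name `travel_times_from` are invented for illustration.

```python
import heapq


def travel_times_from(graph, source):
    """Dijkstra over {node: {neighbor: minutes}}.

    Returns the shortest travel time in minutes to every reachable node.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, minutes in graph.get(node, {}).items():
            candidate = dist + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


# Toy directed road network; edge weights are travel minutes (made up).
roads = {
    "scene": {"a": 4.0, "b": 2.0},
    "a": {"hospital_1": 5.0},
    "b": {"a": 1.0, "hospital_2": 10.0},
}
hospitals = ["hospital_1", "hospital_2"]

times = travel_times_from(roads, "scene")
# Rank candidates by travel time alone; unreachable hospitals sort last.
ranked = sorted(hospitals, key=lambda h: times.get(h, float("inf")))
```

These travel times would then be combined with the predicted wait time and congestion risk from steps 3 and 4 before the final ranking in step 5.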


In [9]:
%pip install pandas numpy matplotlib seaborn scikit-learn xgboost networkx geopy shapely tqdm joblib



In [10]:
# =========================
# Core Data Libraries
# =========================
import pandas as pd
import numpy as np

# =========================
# Visualization
# =========================
import matplotlib.pyplot as plt
import seaborn as sns

# =========================
# Machine Learning
# =========================
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.ensemble import RandomForestRegressor

import xgboost as xgb

# =========================
# Routing & Graphs
# =========================
import networkx as nx

# =========================
# Geospatial
# =========================
from geopy.distance import geodesic
from shapely.geometry import Point

# =========================
# Utilities
# =========================
from tqdm import tqdm
import joblib
import warnings
from pathlib import Path

warnings.filterwarnings("ignore")

# =========================
# Plot settings
# =========================
plt.style.use("seaborn-v0_8")

In [11]:
# =========================
# Set Paths
# =========================

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 50)

# Project root
PROJECT_ROOT = Path("~/Documents/projects/ambulance").expanduser()

RAW_DATA = PROJECT_ROOT / "data" / "raw"
PROCESSED_DATA = PROJECT_ROOT / "data" / "processed"

OSM_PBF_PATH = RAW_DATA / "norcal_openstreet.osm.pbf"

RAW_DATA, PROCESSED_DATA, OSM_PBF_PATH

(PosixPath('/Users/anishkrishnan/Documents/projects/ambulance/data/raw'),
 PosixPath('/Users/anishkrishnan/Documents/projects/ambulance/data/processed'),
 PosixPath('/Users/anishkrishnan/Documents/projects/ambulance/data/raw/norcal_openstreet.osm.pbf'))

In [12]:
# =========================
# Load Raw Datasets
# =========================

er_baseline = pd.read_csv(
    RAW_DATA / "er_wait_baseline.csv"
)

hospital_capacity = pd.read_csv(
    RAW_DATA / "hospital_capacity.csv"
)

In [18]:
# =========================
# Structural Cleaning
# =========================

# Hospital Dataset Granularity: One row is a unique hospital and its capacity information
# ER Dataset Granularity: One row is a single case, labeled with unique case ID & column information

hospital_used = hospital_capacity.copy()
er_used = er_baseline.copy()

# Standardizing column names by lowercasing them in the ER dataset
er_used.columns = er_used.columns.str.lower()

# Enforcing data types for ER data
er_used = er_used.astype({
    'caseid': 'string',
    'shift': 'string',
    'triagelevel': 'string',
    'staffonduty': 'int',
    'waittime_mins': 'float',
    'walkout_yn': 'string',
    'arrivaltime': 'string',
    'age': 'float',
    'criticalcases_onshift': 'int',
    'defect_yn': 'string',
})


# Enforcing data types for hospital data
hospital_string_cols = [
    'hospital_pk', 'collection_week', 'state', 'ccn', 'hospital_name',
    'address', 'city', 'zip', 'hospital_subtype', 'fips_code',
    'geocoded_hospital_address', 'hhs_ids', 'is_corrected',
]
hospital_used = hospital_used.astype({
    **dict.fromkeys(hospital_string_cols, 'string'),
    'is_metro_micro': 'boolean',
})

# Every remaining column is numeric: strip thousands separators, then convert to float
hospital_non_numeric_cols = set(hospital_string_cols) | {'is_metro_micro'}
for col in hospital_used.columns:
    if col not in hospital_non_numeric_cols:
        hospital_used[col] = (
            hospital_used[col]
            .astype('string')
            .str.replace(',', '', regex=False)
            .astype('float64')  # raises if any value is still non-numeric
        )

# Replacing sentinel values (anything <= -99999) in numerical columns with NaN
hospital_numeric_cols = hospital_used.select_dtypes(include=[np.number]).columns

hospital_used[hospital_numeric_cols] = (
    hospital_used[hospital_numeric_cols].mask(hospital_used[hospital_numeric_cols] <= -99999, np.nan)
)

# Replacing NaN values in numerical columns with the median of that column for hospital data
for i in hospital_numeric_cols:
    hospital_used[i] = hospital_used[i].fillna(hospital_used[i].median())
    
# Replacing NaN values in numerical columns with the mean of that column for ER data
er_numeric_cols = er_used.select_dtypes(include=[np.number]).columns

er_used[er_numeric_cols] = (
    er_used[er_numeric_cols]
    .fillna(er_used[er_numeric_cols].mean())
)

# Replacing outlier values (outside Q1 - 4*IQR to Q3 + 4*IQR) in numerical columns
# with the median of that column for ER data
for i in er_numeric_cols:
    Q1 = er_used[i].quantile(0.25)
    Q3 = er_used[i].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 4 * IQR
    upper_bound = Q3 + 4 * IQR
    is_outlier = (er_used[i] < lower_bound) | (er_used[i] > upper_bound)
    # np.where returns a float array here, so integer columns become float
    er_used[i] = np.where(is_outlier, er_used[i].median(), er_used[i])

# Removing duplicate rows in hospital data
hospital_used = hospital_used.drop_duplicates()

# Removing duplicate rows in ER data
er_used = er_used.drop_duplicates()

er_used

| | caseid | shift | triagelevel | staffonduty | waittime_mins | walkout_yn | arrivaltime | age | criticalcases_onshift | defect_yn |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Day | 2 | 5.0 | 140.057253 | No | Afternoon | 45.3 | 2.0 | No |
| 1 | 2 | Night | 1 | 5.0 | 124.620014 | No | Morning | 50.1 | 1.0 | No |
| 2 | 3 | Day | 3 | 5.0 | 143.727658 | No | Evening | 39.7 | 0.0 | No |
| 3 | 4 | Day | 2 | 5.0 | 165.008465 | No | Afternoon | 48.9 | 3.0 | Yes |
| 4 | 5 | Night | 2 | 5.0 | 122.288812 | No | Evening | 55.2 | 2.0 | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 246 | Day | 1 | 5.0 | 119.849211 | No | Morning | 50.1 | 2.0 | No |
| 246 | 247 | Day | 2 | 5.0 | 116.434814 | No | Afternoon | 47.4 | 1.0 | No |
| 247 | 248 | Night | 3 | 5.0 | 112.098044 | No | Evening | 38.2 | 0.0 | No |
| 248 | 249 | Day | 2 | 5.0 | 170.902150 | No | Morning | 49.7 | 2.0 | Yes |
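
The outlier step in the cleaning cell is easier to reason about on a small example. The sketch below reproduces the same rule (replace values outside Q1 − k·IQR to Q3 + k·IQR with the median of the original values) using only the standard library. The wait times are made up, and `replace_outliers_with_median` is a hypothetical helper, not part of the notebook. `statistics.quantiles(..., method="inclusive")` uses linear interpolation between data points, which should correspond to the default behavior of `Series.quantile` in pandas.

```python
import statistics


def replace_outliers_with_median(values, k=4.0):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median.

    The median and quartiles are computed once, from the original values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(values)
    return [median if (v < lower or v > upper) else v for v in values]


# Made-up wait times in minutes, with one moderate and one extreme value.
waits = [100, 110, 120, 130, 140, 150, 160, 170, 250, 900]

wide_fence = replace_outliers_with_median(waits, k=4.0)    # rule used in the notebook
tukey_fence = replace_outliers_with_median(waits, k=1.5)   # conventional 1.5*IQR rule
```

With k = 4 only the extreme value (900) is replaced, while the conventional k = 1.5 also replaces 250. The wider fence used in the notebook therefore keeps moderately long waits in the data and only removes the most extreme values.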


In [2]:
# =========================
# Exploratory Data Analysis
# =========================
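# hospital_used is created in the Structural Cleaning cell; after a kernel
# restart, rerun the cells above before running this one
hospital_used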

NameError: name 'hospital_used' is not defined