# Smart Transport Optimization for SDG 11 — Colab Notebook
Author: STEPHEN ODHIAMBO

This notebook demonstrates an unsupervised learning approach (K-Means clustering) to identify regions
in a city that share similar transport demand characteristics and suggests optimization actions.
Context: Nairobi, Kenya

Run this notebook in Google Colab or locally. The notebook includes a synthetic example and
instructions for replacing the synthetic data with real open data (e.g., Open Data Kenya, World Bank).

In [None]:
# Install dependencies (uncomment if running in a fresh Colab environment)
# !pip install scikit-learn pandas matplotlib seaborn geopandas folium


In [None]:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
# set_theme() is the current name for the older sns.set() alias (seaborn >= 0.11)
sns.set_theme(style='whitegrid')


In [None]:
# Synthetic dataset (replace with real city data)
data = {
    'Region': ['CBD','Westlands','Kibera','Langata','Embakasi','Kasarani','Karen','Ruiru'],
    'Population_Density': [12000, 9000, 15000, 11000, 13000, 9500, 7000, 10000],
    'Avg_Traffic_Volume': [800, 600, 1200, 900, 1100, 750, 400, 700],
    'Distance_to_CBD_km': [0, 3, 5, 6, 8, 7, 10, 12],
    'Bus_Stops': [25, 18, 10, 12, 15, 9, 7, 8]
}
df = pd.DataFrame(data)
df.head()


In [None]:
# Preprocess features
features = ['Population_Density', 'Avg_Traffic_Volume', 'Distance_to_CBD_km', 'Bus_Stops']
X = df[features].values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)


In [None]:
# Choose number of clusters (k)
k = 3
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_scaled)
df['Cluster'] = clusters
df
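
The value `k = 3` above is fixed by hand. One common way to sanity-check that choice is to compare silhouette scores across several candidate values of k. The cell below is a minimal sketch of that idea, not part of the original workflow: it rebuilds the same synthetic feature matrix so it can run on its own, and the candidate range `2..6` is an assumption chosen because the example has only 8 regions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same synthetic feature values as the notebook's `data` dict
# Columns: Population_Density, Avg_Traffic_Volume, Distance_to_CBD_km, Bus_Stops
X = np.array([
    [12000, 800, 0, 25],
    [9000, 600, 3, 18],
    [15000, 1200, 5, 10],
    [11000, 900, 6, 12],
    [13000, 1100, 8, 15],
    [9500, 750, 7, 9],
    [7000, 400, 10, 7],
    [10000, 700, 12, 8],
], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score is only defined for 2 <= k <= n_samples - 1
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# Higher silhouette means tighter, better-separated clusters
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Best k by silhouette:", best_k)
```

With only 8 rows the scores are noisy, so treat the result as a rough guide rather than a definitive answer; on a real dataset the comparison is more informative.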


In [None]:
# Visualize clusters (a qualitative palette keeps the integer cluster labels as distinct colors)
plt.figure(figsize=(8,6))
sns.scatterplot(x='Population_Density', y='Avg_Traffic_Volume', hue='Cluster', palette='tab10', data=df, s=120)
plt.title('City Transport Clusters (K-Means)')
plt.xlabel('Population Density (people/km²)')
plt.ylabel('Average Traffic Volume (vehicles/day)')
plt.show()


In [None]:
# Cluster summary: member regions and mean feature values per cluster
# (use these means as the basis for choosing optimization actions)
summary = df.groupby('Cluster').agg({
    'Region': lambda x: ', '.join(x),
    'Population_Density': 'mean',
    'Avg_Traffic_Volume': 'mean',
    'Distance_to_CBD_km': 'mean',
    'Bus_Stops': 'mean'
}).reset_index()
summary.columns = ['Cluster', 'Regions', 'Avg_Pop_Density', 'Avg_Traffic', 'Avg_Distance_km', 'Avg_Bus_Stops']
summary
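
The summary table gives per-cluster means but does not itself propose actions. The cell below is one hypothetical way to turn the means into suggestions: it compares each cluster's bus stops per unit of traffic with the city-wide ratio. The 0.8 and 1.2 thresholds, the `suggest_action` helper, and the wording of the actions are all illustrative assumptions, not rules from the original notebook or from any transport standard. The data and clustering are repeated so the cell runs on its own.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same synthetic dataset as earlier in the notebook
df = pd.DataFrame({
    'Region': ['CBD', 'Westlands', 'Kibera', 'Langata', 'Embakasi', 'Kasarani', 'Karen', 'Ruiru'],
    'Population_Density': [12000, 9000, 15000, 11000, 13000, 9500, 7000, 10000],
    'Avg_Traffic_Volume': [800, 600, 1200, 900, 1100, 750, 400, 700],
    'Distance_to_CBD_km': [0, 3, 5, 6, 8, 7, 10, 12],
    'Bus_Stops': [25, 18, 10, 12, 15, 9, 7, 8],
})
features = ['Population_Density', 'Avg_Traffic_Volume', 'Distance_to_CBD_km', 'Bus_Stops']

X_scaled = StandardScaler().fit_transform(df[features])
df['Cluster'] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)
cluster_means = df.groupby('Cluster')[features].mean()

# City-wide bus stops per unit of traffic volume, used as the baseline
city_ratio = df['Bus_Stops'].sum() / df['Avg_Traffic_Volume'].sum()

ACTIONS = {
    'under': 'Consider adding bus stops or routes',
    'over': 'Consider reviewing stop spacing or route overlap',
    'ok': 'Monitor; service roughly matches demand',
}

def suggest_action(row):
    """Hypothetical rule: compare a cluster's stops-to-traffic ratio with the city baseline."""
    ratio = row['Bus_Stops'] / row['Avg_Traffic_Volume']
    if ratio < 0.8 * city_ratio:   # assumed threshold: clearly under-served
        return ACTIONS['under']
    if ratio > 1.2 * city_ratio:   # assumed threshold: clearly above baseline
        return ACTIONS['over']
    return ACTIONS['ok']

cluster_means['Suggested_Action'] = cluster_means.apply(suggest_action, axis=1)
print(cluster_means[['Avg_Traffic_Volume', 'Bus_Stops', 'Suggested_Action']])
```

Any real recommendation would need local knowledge (route capacity, road network, budgets); the rule here only shows where such logic could plug into the notebook.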


## How to use this notebook with real data
1. Replace the synthetic `data` dict with a `pd.read_csv()` call that loads your dataset.
2. Ensure features are numeric and fill or drop missing values.
3. Scale features, run KMeans, and visualize clusters.
4. Consider mapping clusters onto a city map using `folium` or `geopandas` for geospatial visuals.
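
Steps 1 to 3 above can be sketched as a single cell. This is a minimal example under stated assumptions: the CSV text stands in for a real file and reuses the notebook's synthetic values with two cells deliberately damaged (one blank, one non-numeric) to show the cleaning step; filling gaps with the column median is one possible choice, and dropping rows with `dropna()` is an equally valid alternative.

```python
import io
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in for a real file; with real data use pd.read_csv("your_file.csv")
# Two values are deliberately damaged to demonstrate cleaning.
csv_text = """Region,Population_Density,Avg_Traffic_Volume,Distance_to_CBD_km,Bus_Stops
CBD,12000,800,0,25
Westlands,9000,600,3,18
Kibera,15000,,5,10
Langata,11000,900,6,12
Embakasi,13000,1100,8,unknown
Kasarani,9500,750,7,9
Karen,7000,400,10,7
Ruiru,10000,700,12,8
"""
df = pd.read_csv(io.StringIO(csv_text))

features = ['Population_Density', 'Avg_Traffic_Volume', 'Distance_to_CBD_km', 'Bus_Stops']

# Step 2: coerce to numeric (non-numeric entries become NaN), then fill gaps
df[features] = df[features].apply(pd.to_numeric, errors='coerce')
df[features] = df[features].fillna(df[features].median())

# Step 3: scale, cluster, and attach the labels
X_scaled = StandardScaler().fit_transform(df[features])
df['Cluster'] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

print(df[['Region', 'Cluster']])
```

For a real dataset, check how many values were coerced or filled before clustering; heavy imputation can distort the clusters.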
