# NOTEBOOK 01: SOCIO-ECONOMIC SPATIAL EXTRACTION
**Objective:** To extract and classify Nairobi's workforce (Origins) and employment centers (Destinations), weighting each residential origin by its population, as inputs to a socio-economic urban transit model.

## PHASE 1: JOB CENTER EXTRACTION (THE DESTINATIONS)
Real-world urban transit planning needs to target actual economic zones, not abstract mathematical clusters. Reflecting the city's decentralized economy, we divide Nairobi's employment hubs into two destination types:
1. **Corporate / White-Collar Hubs:** The formal economy (administrative, financial, and tech centers, plus diplomatic zones).
2. **Industrial & Soko / Blue-Collar Hubs:** The manual and wholesale economy (manufacturing corridors, logistics hubs, and large open-air markets; *soko* is Swahili for "market").

Using the `osmnx` library, we query OpenStreetMap's Nominatim geocoder (`ox.geocode`) for a representative point (latitude, longitude) of each of these economic centers. Places the geocoder cannot resolve, such as Nyamakima, are given manually supplied coordinates instead.
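The lookup-with-override pattern used in Phase 1.3 can be isolated into a small function that runs without network access. This is a sketch under assumptions: `resolve_hubs` and `fake_geocoder` are hypothetical names, and the geocoder is passed in as a parameter so that `ox.geocode` (which returns a `(lat, lon)` tuple) can be swapped for a stub when testing offline. The stub's coordinates are illustrative, not surveyed.

```python
from typing import Callable, Dict, List, Tuple

LatLon = Tuple[float, float]


def resolve_hubs(
    places: List[str],
    hub_type: str,
    geocoder: Callable[[str], LatLon],
    overrides: Dict[str, LatLon],
    start_id: int = 0,
) -> Tuple[List[dict], List[str]]:
    """Return (records, failures). Manual overrides take precedence over the geocoder."""
    records, failures = [], []
    next_id = start_id
    for place in places:
        try:
            # The geocoder is only called when no manual override exists
            lat, lon = overrides[place] if place in overrides else geocoder(place)
        except Exception:
            failures.append(place)
            continue
        records.append({"hub_id": next_id, "name": place.split(",")[0],
                        "type": hub_type, "lat": lat, "lon": lon})
        next_id += 1  # IDs stay sequential even when some places fail
    return records, failures


# Hypothetical stub standing in for ox.geocode: knows one place, fails on the rest
def fake_geocoder(place: str) -> LatLon:
    known = {"Gikomba, Nairobi": (-1.28, 36.84)}  # illustrative coordinates only
    if place not in known:
        raise ValueError(f"no result for {place!r}")
    return known[place]


records, failures = resolve_hubs(
    ["Gikomba, Nairobi", "Nyamakima, Nairobi", "Nowhere, Nairobi"],
    "Industrial_Soko",
    geocoder=fake_geocoder,
    overrides={"Nyamakima, Nairobi": (-1.2825, 36.8244)},
)
print([r["name"] for r in records], failures)
```

In the notebook itself the call would look like `resolve_hubs(corporate_hubs, "Corporate", ox.geocode, manual_coordinates)`, passing `start_id=len(...)` of the first result for the second hub list so IDs do not collide.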

In [13]:
# ==========================================
# PHASE 1.1: ENVIRONMENT & LIBRARY SETUP
# ==========================================
import pandas as pd
import geopandas as gpd
import osmnx as ox
import rasterio
import numpy as np
import os
from shapely.geometry import Point
import warnings
warnings.filterwarnings('ignore')

print("[INFO] Initializing Notebook 01: Socio-Economic Extraction...")

# ==========================================
# PHASE 1.2: DEFINING THE ECONOMIC ENGINES (FULL RIGOROUS LIST)
# ==========================================
corporate_hubs = [
    "Nairobi Central, Nairobi",     # The CBD
    "Upper Hill, Nairobi",          # Financial Center
    "Westlands, Nairobi",           # Corporate/Tech Center
    "Kilimani, Nairobi",            # Decentralized Commercial
    "Gigiri, Nairobi",              # Diplomatic/UN/NGO
    "Riverside, Nairobi",           # Corporate HQs
    "Parklands, Nairobi",           # Medical & Commercial
    "Lavington, Nairobi",           # Agencies & Decentralized Offices
    "Karen, Nairobi"                # Office Parks
]

industrial_soko_hubs = [
    "Industrial Area, Nairobi",     # Heavy Manufacturing
    "Embakasi, Nairobi",            # Aviation & Inland Container Depot
    "Baba Dogo, Nairobi",           # Light Manufacturing
    "Syokimau, Machakos",           # Mombasa Rd Manufacturing Corridor
    "Eastleigh, Nairobi",           # Massive Wholesale/Retail Engine
    "Nyamakima, Nairobi",           # Downtown Logistics & Freight
    "Kariobangi, Nairobi",          # Light Industries & Market
    "Gikomba, Nairobi",             # Largest Soko
    "Muthurwa, Nairobi",            # Transit & Soko
    "Wakulima Market, Nairobi",     # Food Wholesale (Marikiti)
    "Toi Market, Nairobi",          # Apparel/Retail Soko
    "City Park Market, Nairobi"     # Fresh Produce Soko
]

# ==========================================
# PHASE 1.3: EXTRACTING SPATIAL COORDINATES (WITH OVERRIDE)
# ==========================================
print("[INFO] Extracting Job Hub coordinates from OpenStreetMap API...")

# Override dictionary for places the geocoder cannot resolve
manual_coordinates = {
    "Nyamakima, Nairobi": (-1.2825, 36.8244)  # manually supplied (lat, lon)
}

destinations_data = []

def tag_hubs(places, hub_type, label):
    """Geocode each place (or use its manual override) and append a record to destinations_data."""
    for place in places:
        try:
            if place in manual_coordinates:
                lat, lon = manual_coordinates[place]
            else:
                lat, lon = ox.geocode(place)  # (lat, lon) of the best Nominatim match

            destinations_data.append({
                'hub_id': len(destinations_data),  # sequential IDs across both hub lists
                'name': place.split(",")[0],
                'type': hub_type,
                'lat': lat,
                'lon': lon,
            })
            print(f"  [SUCCESS] Tagged {label} Hub: {place}")
        except Exception as e:
            print(f"  [WARNING] Could not locate: {place} ({e})")

# 1. Extract Corporate Hubs
tag_hubs(corporate_hubs, 'Corporate', 'Corporate')

# 2. Extract Industrial & Soko Hubs
tag_hubs(industrial_soko_hubs, 'Industrial_Soko', 'Industrial/Soko')

# 3. Convert to GeoDataFrame & Project to Metric (UTM 37S)
df_destinations = pd.DataFrame(destinations_data)
gdf_destinations = gpd.GeoDataFrame(
    df_destinations, 
    geometry=gpd.points_from_xy(df_destinations.lon, df_destinations.lat), 
    crs="EPSG:4326"
)
gdf_destinations_utm = gdf_destinations.to_crs("EPSG:32737")

print(f"\n[COMPLETE] Extracted {len(gdf_destinations_utm)} major Job Centers.")

[INFO] Initializing Notebook 01: Socio-Economic Extraction...
[INFO] Extracting Job Hub coordinates from OpenStreetMap API...
  [SUCCESS] Tagged Corporate Hub: Nairobi Central, Nairobi
  [SUCCESS] Tagged Corporate Hub: Upper Hill, Nairobi
  [SUCCESS] Tagged Corporate Hub: Westlands, Nairobi
  [SUCCESS] Tagged Corporate Hub: Kilimani, Nairobi
  [SUCCESS] Tagged Corporate Hub: Gigiri, Nairobi
  [SUCCESS] Tagged Corporate Hub: Riverside, Nairobi
  [SUCCESS] Tagged Corporate Hub: Parklands, Nairobi
  [SUCCESS] Tagged Corporate Hub: Lavington, Nairobi
  [SUCCESS] Tagged Corporate Hub: Karen, Nairobi
  [SUCCESS] Tagged Industrial/Soko Hub: Industrial Area, Nairobi
  [SUCCESS] Tagged Industrial/Soko Hub: Embakasi, Nairobi
  [SUCCESS] Tagged Industrial/Soko Hub: Baba Dogo, Nairobi
  [SUCCESS] Tagged Industrial/Soko Hub: Syokimau, Machakos
  [SUCCESS] Tagged Industrial/Soko Hub: Eastleigh, Nairobi
  [SUCCESS] Tagged Industrial/Soko Hub: Nyamakima, Nairobi
  [SUCCESS] Tagged Industrial/Soko Hub:
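
Phase 1.3 projects the hubs to UTM zone 37S (EPSG:32737) because a degree is not a fixed ground distance: a degree of longitude shrinks with the cosine of latitude. The sketch below uses a simple spherical approximation (an assumption; `to_crs` uses the WGS84 ellipsoid) to show this, and also what the 0.012° buffer used in Phase 2.2 means on the ground near Nairobi. `metres_per_degree` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def metres_per_degree(lat_deg: float) -> tuple:
    """Approximate ground length of one degree of latitude and of longitude at lat_deg."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon


lat_nairobi = -1.2825  # latitude of the Nyamakima override in Phase 1.3
ns, ew = metres_per_degree(lat_nairobi)
buffer_deg = 0.012     # buffer radius used in Phase 2.2, in degrees
print(f"0.012 deg north-south ~ {buffer_deg * ns:.0f} m, east-west ~ {buffer_deg * ew:.0f} m")
print(f"one degree of longitude at 60 deg latitude ~ {metres_per_degree(60.0)[1]:.0f} m")
```

Near the equator the two buffer extents agree to well under a metre, so the degree-based buffer is almost circular (about 1.33 km). Far from the equator it would become an ellipse, which is one reason to buffer in a projected, metric CRS such as EPSG:32737 instead.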

## PHASE 2 & 3: THE WORKFORCE & SOCIO-ECONOMIC JOIN
This phase extracts where people live (Origins) and assigns each residential location to a Socio-Economic Tier.

**The Human-in-the-Loop (HITL) Architecture:**
Generic, externally trained models often misread African urban geography. To avoid this, we define the neighborhood classifications by hand, based on local knowledge. We divide the residential areas into three tiers:
* **Tier 1 (High-Income):** Low density, formal infrastructure.
* **Tier 2 (Informal / Blue-Collar):** High density, informal layout.
* **Tier 3 (Middle-Income):** Medium-high density, formal grid layout.
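The tier lists can be kept as a single mapping from tier label to neighborhoods, which makes it easy to check that no neighborhood is accidentally assigned to two tiers. A minimal sketch: `TIER_CONFIG` and `build_tier_lookup` are hypothetical names, and only a few neighborhoods from the Phase 2.1 lists are shown.

```python
from typing import Dict, List

# Hypothetical excerpt of the Phase 2.1 lists, keyed by the tier labels used in the code
TIER_CONFIG: Dict[str, List[str]] = {
    "Tier_1_WhiteCollar": ["Karen, Nairobi", "Kilimani, Nairobi", "Parklands, Nairobi"],
    "Tier_2_Informal": ["Kibera, Nairobi", "Mathare, Nairobi", "Kariobangi, Nairobi"],
    "Tier_3_MiddleIncome": ["South B, Nairobi", "Umoja, Nairobi", "Ngara, Nairobi"],
}


def build_tier_lookup(config: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert tier -> neighborhoods into neighborhood -> tier, rejecting duplicates."""
    lookup: Dict[str, str] = {}
    for tier, places in config.items():
        for place in places:
            name = place.split(",")[0]  # same naming rule as fetch_boundaries
            if name in lookup:
                raise ValueError(f"{name} is listed under both {lookup[name]} and {tier}")
            lookup[name] = tier
    return lookup


lookup = build_tier_lookup(TIER_CONFIG)
print(lookup["Kibera"])
```

Failing loudly on a duplicate keeps the human-in-the-loop classification explicit: a neighborhood should be moved between tiers deliberately, not end up in two lists by accident.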

**The Process:**
1. Download the polygon boundary for every neighborhood listed in the Phase 2.1 configuration lists. If Nominatim returns no polygon for a neighborhood, we fall back to a synthetic circular boundary of roughly 1.3 km radius (a 0.012° buffer) around its geocoded point.
2. Load the WorldPop population raster (people per pixel) and convert every pixel with more than 30 people into a point, or "population dot".
3. Perform a **Spatial Join (Point-in-Polygon)**: for each population dot, the code finds the neighborhood boundary it falls inside and tags the dot with that neighborhood's Socio-Economic Tier.
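`gpd.sjoin(..., predicate="within")` delegates the geometry test to shapely/GEOS, but the idea behind a point-in-polygon join can be shown with the classic even-odd (ray-casting) rule. The sketch below is illustrative only: the square zones use made-up metric coordinates, `point_in_polygon` and `assign_tier` are hypothetical helpers, and the first-match-wins rule is an assumed way to resolve overlapping zones (for example, a synthetic buffer overlapping an official polygon).

```python
from typing import List, Optional, Sequence, Tuple

XY = Tuple[float, float]


def point_in_polygon(pt: XY, ring: Sequence[XY]) -> bool:
    """Even-odd rule: cast a ray towards +x and count how many polygon edges it crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]          # wrap around to close the ring
        if (y1 > y) != (y2 > y):            # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                 # crossing lies to the right of the point
                inside = not inside
    return inside


def assign_tier(pt: XY, zones: List[Tuple[str, str, Sequence[XY]]]) -> Optional[Tuple[str, str]]:
    """Return (neighborhood, tier) of the first zone containing pt, or None (dropped, like an inner join)."""
    for name, tier, ring in zones:
        if point_in_polygon(pt, ring):
            return name, tier
    return None


# Made-up square zones in metres; zone B overlaps the right half of zone A
zones = [
    ("A", "Tier_1_WhiteCollar", [(0, 0), (100, 0), (100, 100), (0, 100)]),
    ("B", "Tier_3_MiddleIncome", [(50, 0), (150, 0), (150, 100), (50, 100)]),
]
print(assign_tier((75, 50), zones))    # inside both A and B; A is listed first
print(assign_tier((125, 50), zones))   # inside B only
print(assign_tier((500, 50), zones))   # outside every zone
```

A dot that falls in no zone is dropped, which matches `how="inner"`; a dot in several zones is counted once, so its population is not double-counted across neighborhoods.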

In [15]:
# ==========================================
# PHASE 2.1: THE HUMAN-IN-THE-LOOP CONFIGURATION 
# ==========================================
print("[INFO] Initiating Socio-Economic Boundary Extraction...")

tier_1_high_income = [
    "Karen, Nairobi", "Muthaiga, Nairobi", "Runda, Nairobi", "Lavington, Nairobi", 
    "Kitisuru, Nairobi", "Nyari, Nairobi", "Gigiri, Nairobi", "Spring Valley, Nairobi", 
    "Kileleshwa, Nairobi", "Kilimani, Nairobi", "Riverside, Nairobi", "Ridgeways, Nairobi", 
    "Loresho, Nairobi", "Hillview, Nairobi", "Lake View, Nairobi", "Kyuna, Nairobi", 
    "Parklands, Nairobi", "Hurlingham, Nairobi"
]

tier_2_informal = [
    "Kibera, Nairobi", "Mathare, Nairobi", "Mukuru Kwa Njenga, Nairobi", 
    "Mukuru Kwa Reuben, Nairobi", "Korogocho, Nairobi", "Kawangware, Nairobi", 
    "Kangemi, Nairobi", "Dandora, Nairobi", "Kariobangi, Nairobi", "Kayole, Nairobi", 
    "Huruma, Nairobi", "Majengo, Nairobi", "Kiambiu, Nairobi", "Viwandani, Nairobi"
]

tier_3_middle_income = [
    "Pipeline, Nairobi", "Umoja, Nairobi", "Donholm, Nairobi", "Buruburu, Nairobi", 
    "Tena, Nairobi", "Imara Daima, Nairobi", "South B, Nairobi", "South C, Nairobi", 
    "Madaraka, Nairobi", "Ngara, Nairobi", "Roysambu, Nairobi", "Kasarani, Nairobi", 
    "Zimmerman, Nairobi", "Langata, Nairobi", "Pangani, Nairobi", "Kahawa West, Nairobi", 
    "Ruaka, Kiambu", "Uthiru, Kiambu", "Fedha, Nairobi", "Nairobi West, Nairobi"
]

# ==========================================
# PHASE 2.2: DOWNLOADING & SYNTHESIZING POLYGONS
# ==========================================
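def fetch_boundaries(places, tier_label):
    """Return one {'neighborhood', 'tier', 'geometry'} record per place (geometry in EPSG:4326)."""
    polygons = []
    for place in places:
        try:
            # 1. Try to get the official boundary polygon from Nominatim.
            # In recent osmnx releases, geocode_to_gdf (default which_result=None) raises
            # when no (Multi)Polygon result exists, so most point-only places end up in
            # the fallback branch below rather than in the Point check that follows.
            gdf = ox.geocode_to_gdf(place)
            geom = gdf['geometry'].iloc[0]

            # Defensive check: if a Point comes back anyway, buffer it.
            # The buffer is in degrees (EPSG:4326); 0.012 deg is ~1.3 km near the equator.
            if geom.geom_type == 'Point':
                geom = geom.buffer(0.012)
                print(f"  [FIXED] Created synthetic buffer for Point-only location: {place}")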
            else:
                print(f"  [SUCCESS] Downloaded official polygon for: {place}")
                
            polygons.append({'neighborhood': place.split(",")[0], 'tier': tier_label, 'geometry': geom})
            
        except Exception:
            try:
                # 2. Hard Fallback: Geocode just the center lat/lon and draw a 1.3km circle
                lat, lon = ox.geocode(place)
                geom = Point(lon, lat).buffer(0.012) # ~1.3km radius
                polygons.append({'neighborhood': place.split(",")[0], 'tier': tier_label, 'geometry': geom})
                print(f"  [FIXED] Geocoded and created synthetic boundary for: {place}")
            except Exception:
                print(f"  [FAILED] Completely failed to locate: {place}")
                
    return polygons

print("\nDownloading Tier 1 Boundaries...")
poly_t1 = fetch_boundaries(tier_1_high_income, "Tier_1_WhiteCollar")
print("\nDownloading Tier 2 Boundaries...")
poly_t2 = fetch_boundaries(tier_2_informal, "Tier_2_Informal")
print("\nDownloading Tier 3 Boundaries...")
poly_t3 = fetch_boundaries(tier_3_middle_income, "Tier_3_MiddleIncome")

# Combine all polygons into one Master Zoning Map
all_polygons = poly_t1 + poly_t2 + poly_t3
gdf_zones = gpd.GeoDataFrame(all_polygons, crs="EPSG:4326")
gdf_zones_utm = gdf_zones.to_crs("EPSG:32737")

print(f"\n[SUCCESS] {len(gdf_zones_utm)} Total Neighborhood Boundaries established.")

# ==========================================
# PHASE 3.1: POPULATION EXTRACTION (WORLDPOP)
# ==========================================
# Absolute path to the WorldPop 2020 UN-adjusted, constrained population-count raster (people per pixel)
RASTER_PATH = r"C:\Users\Administrator\Desktop\Nairobi_Transit_Optimizer\data\raw\ken_ppp_2020_UNadj_constrained.tif"

print("\n[INFO] Extracting workforce dots from WorldPop raster...")
try:
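    with rasterio.open(RASTER_PATH) as src:
        data = src.read(1)              # band 1: people per pixel
        raster_crs = src.crs            # use the file's own CRS instead of assuming EPSG:4326
        # Filter: keep only pixels with more than 30 people. This removes sparsely
        # populated land and any negative nodata fill values.
        mask = data > 30
        rows, cols = np.where(mask)
        pop_values = data[rows, cols]
        # Coordinates of each kept pixel's centre (rasterio's default offset='center')
        xs, ys = rasterio.transform.xy(src.transform, rows, cols)

    df_pixels = pd.DataFrame({'x': xs, 'y': ys, 'population': pop_values})
    gdf_pixels = gpd.GeoDataFrame(
        df_pixels,
        geometry=gpd.points_from_xy(df_pixels.x, df_pixels.y),
        crs=raster_crs
    )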
    gdf_pixels_utm = gdf_pixels.to_crs("EPSG:32737")
    
    print(f"[SUCCESS] Extracted {len(gdf_pixels_utm)} dense residential dots.")

    # ==========================================
    # PHASE 3.2: THE SOCIO-ECONOMIC SPATIAL JOIN
    # ==========================================
    print("[INFO] Executing Spatial Join: Dropping dots into Neighborhood Zones...")
    
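    # Check which boundary polygon every dot falls inside (point-in-polygon)
    gdf_classified_origins = gpd.sjoin(gdf_pixels_utm, gdf_zones_utm, how="inner", predicate="within")

    # Synthetic buffers can overlap each other and official polygons, so one dot may
    # match several zones and its population would be counted more than once.
    # Keep one match per dot: the zone listed first (Tier 1, then Tier 2, then Tier 3).
    gdf_classified_origins = gdf_classified_origins.sort_values('index_right')
    gdf_classified_origins = gdf_classified_origins[~gdf_classified_origins.index.duplicated(keep='first')]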
    
    print("\n--- CLASSIFICATION AUDIT SUMMARY ---")
    summary = gdf_classified_origins['tier'].value_counts()
    print(summary)
    print("------------------------------------\n")
    print(f"[COMPLETE] {len(gdf_classified_origins)} workforce nodes successfully classified.")

except FileNotFoundError:
    print(f"\n[ERROR] Could not find {RASTER_PATH}. Please check the file path.")
except Exception as e:
    print(f"\n[ERROR] An unexpected error occurred: {e}")

[INFO] Initiating Socio-Economic Boundary Extraction...

Downloading Tier 1 Boundaries...
  [SUCCESS] Downloaded official polygon for: Karen, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Muthaiga, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Runda, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Lavington, Nairobi
  [SUCCESS] Downloaded official polygon for: Kitisuru, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Nyari, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Gigiri, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Spring Valley, Nairobi
  [SUCCESS] Downloaded official polygon for: Kileleshwa, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Kilimani, Nairobi
  [SUCCESS] Downloaded official polygon for: Riverside, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Ridgeways, Nairobi
  [FIXED] Geocoded and created synthetic boundary for: Loresho, Nairobi
  [
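
Phase 3.1 turns raster cells into population dots by thresholding the people-per-pixel grid and converting each kept cell's row and column into the coordinates of its centre, which is what `rasterio.transform.xy` does with its default `offset='center'`. The stdlib-only sketch below reimplements that arithmetic for a north-up grid (no rotation terms) so the mapping is explicit. The tiny grid, origin, and 0.001-degree (~111 m) cell size are invented for illustration, and `pixels_to_points` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def pixels_to_points(
    grid: Sequence[Sequence[float]],
    origin_x: float, origin_y: float,   # coordinates of the grid's top-left corner
    pixel_w: float, pixel_h: float,     # pixel size; pixel_h is negative for north-up rasters
    threshold: float,
) -> List[Tuple[float, float, float]]:
    """Return (x, y, population) at the centre of every cell whose value exceeds threshold."""
    points = []
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if value > threshold:  # like `data > 30`; also skips negative nodata fill values
                x = origin_x + (col + 0.5) * pixel_w
                y = origin_y + (row + 0.5) * pixel_h
                points.append((x, y, value))
    return points


# Invented 2x3 grid; -99999 mimics a nodata fill value
grid = [[12.0, 45.0, -99999.0],
        [31.0, 30.0, 80.5]]
pts = pixels_to_points(grid, 36.80, -1.28, 0.001, -0.001, threshold=30)
for pt in pts:
    print(pt)
```

Note that the threshold is strict: a cell with exactly 30 people is dropped, as with `data > 30` in the notebook.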

## PHASE 4: DATA EXPORT
With the Origins classified by Socio-Economic Tier and weighted by population, and the Destinations labeled by job type, both datasets are written to CSV. These files form the input baseline for **Notebook 02**, which runs the multi-tiered transit routing algorithms.
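Because Notebook 02 reads these CSVs back, a quick schema check at the hand-off can catch an empty or malformed export early. A minimal stdlib-only sketch, assuming the origin columns written in Phase 4 (`neighborhood`, `tier`, `population`, `x`, `y`) and the three tier labels used in Phase 2.2; `validate_origins_csv` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io
from typing import Dict, TextIO

REQUIRED_COLUMNS = ["neighborhood", "tier", "population", "x", "y"]
KNOWN_TIERS = {"Tier_1_WhiteCollar", "Tier_2_Informal", "Tier_3_MiddleIncome"}


def validate_origins_csv(handle: TextIO) -> Dict[str, float]:
    """Check columns and tier labels; return total population per tier."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    totals: Dict[str, float] = {}
    for row in reader:
        if row["tier"] not in KNOWN_TIERS:
            raise ValueError(f"unexpected tier label: {row['tier']!r}")
        totals[row["tier"]] = totals.get(row["tier"], 0.0) + float(row["population"])
    if not totals:
        raise ValueError("no origin rows found")
    return totals


# Invented sample in the Phase 4 layout (x/y would be UTM 37S metres)
sample = io.StringIO(
    "neighborhood,tier,population,x,y\n"
    "Kibera,Tier_2_Informal,120.5,255000.0,9856000.0\n"
    "Kibera,Tier_2_Informal,80.0,255100.0,9856000.0\n"
    "Karen,Tier_1_WhiteCollar,35.0,250000.0,9850000.0\n"
)
totals = validate_origins_csv(sample)
print(totals)
```

Against the real export this would be called as `with open(orig_path, newline="") as f: validate_origins_csv(f)`.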

In [16]:
# ==========================================
# PHASE 4: CSV EXPORT
# ==========================================
print("[INFO] Exporting Smart Datasets to CSV...")

# Ensure the output directory exists (absolute path; adjust it to your machine)
output_dir = r"C:\Users\Administrator\Desktop\Nairobi_Transit_Optimizer\data\processed"
os.makedirs(output_dir, exist_ok=True)

# 1. Save the Classified Destinations (The Jobs)
dest_path = os.path.join(output_dir, "classified_destinations.csv")
df_dest_export = pd.DataFrame({
    'hub_id': gdf_destinations_utm['hub_id'],
    'name': gdf_destinations_utm['name'],
    'type': gdf_destinations_utm['type'],
    'x': gdf_destinations_utm.geometry.x,
    'y': gdf_destinations_utm.geometry.y
})
df_dest_export.to_csv(dest_path, index=False)

# 2. Save the Classified Origins (The People)
orig_path = os.path.join(output_dir, "classified_origins.csv")
df_orig_export = pd.DataFrame({
    'neighborhood': gdf_classified_origins['neighborhood'],
    'tier': gdf_classified_origins['tier'],
    'population': gdf_classified_origins['population'],
    'x': gdf_classified_origins.geometry.x,
    'y': gdf_classified_origins.geometry.y
})
df_orig_export.to_csv(orig_path, index=False)

print(f"[SUCCESS] Destinations saved to: {dest_path}")
print(f"[SUCCESS] Origins saved to: {orig_path}")
print("\n[COMPLETE] NOTEBOOK 01 IS FINISHED. YOU MAY NOW PROCEED TO NOTEBOOK 02.")

[INFO] Exporting Smart Datasets to CSV...
[SUCCESS] Destinations saved to: C:\Users\Administrator\Desktop\Nairobi_Transit_Optimizer\data\processed\classified_destinations.csv
[SUCCESS] Origins saved to: C:\Users\Administrator\Desktop\Nairobi_Transit_Optimizer\data\processed\classified_origins.csv

[COMPLETE] NOTEBOOK 01 IS FINISHED. YOU MAY NOW PROCEED TO NOTEBOOK 02.
