# Urban/Rural Analysis of CT Hospital Benefits

This notebook demonstrates how to classify benefit areas as rural or urban using the GHS-SMOD (Global Human Settlement Layer, Settlement Model) raster data.

## Overview

The GHS-SMOD raster classifies each grid cell by degree of urbanisation, following the UN-endorsed Degree of Urbanisation method:
- **Urban codes**: {30, 23, 22, 21} - Urban centres, dense and semi-dense urban clusters, suburban or peri-urban areas
- **Rural codes**: {13, 12, 11} - Rural clusters, low-density and very-low-density rural areas
- **Other codes**: Water bodies (10) and cells without data
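
As a minimal sketch of the grouping above, the code sets can be written as a small lookup. The name `classify_smod` is hypothetical and not part of `geostroke`; the sketch assumes that every code outside the two sets is labelled `other`.

```python
# Hypothetical helper illustrating the urban/rural grouping described above.
URBAN_CODES = {30, 23, 22, 21}
RURAL_CODES = {13, 12, 11}


def classify_smod(code: int) -> str:
    """Map a GHS-SMOD cell value to 'urban', 'rural', or 'other'."""
    if code in URBAN_CODES:
        return "urban"
    if code in RURAL_CODES:
        return "rural"
    return "other"  # assumption: everything else (e.g. water bodies)


for code in (30, 21, 12, 10):
    print(code, classify_smod(code))
```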

## Requirements

1. **Benefit analysis results**: Previously generated using `geostroke.benefit` functions
2. **GHS-SMOD raster**: Download from [GHS-SMOD webpage](https://ghsl.jrc.ec.europa.eu/ghs_smod2023.php)
   - File: `GHS_SMOD_E2025_GLOBE_R2023A_54009_1000_V2_0.tif` (or similar)
   - Place in `raw_data/` directory


In [1]:
# ---- 1. Install what you might still need ----
# pip install geopandas rasterio pandas pyproj shapely

import os
import pickle
from pathlib import Path

# Set the working directory to the project root (one level up from this notebook)
os.chdir('..')

import geostroke as gs
import geopandas as gpd
import numpy as np
import pandas as pd

print(f"✅ Loaded geostroke version {gs.__version__}")
print(f"📁 Project root: {gs.config.ROOT}")
print(f"📁 Data directory: {gs.config.DATA_DIR}")


✅ Loaded geostroke version 1.0.0
📁 Project root: /Users/larsmasanneck/Library/CloudStorage/OneDrive-Personal/Dokumente/Research Basics and Coding Projects/Data Projects/Marc Pawlitzki/GeoStroke
📁 Data directory: /Users/larsmasanneck/Library/CloudStorage/OneDrive-Personal/Dokumente/Research Basics and Coding Projects/Data Projects/Marc Pawlitzki/GeoStroke/raw_data


## Configuration

**Update these paths for your specific files:**

In [2]:
# ---- 2. Point to your inputs ----

BENEFIT_FILE = gs.config.DATA_DIR / "benefit_cache" / "benefit_analysis_e1f09274ca3df68bc9344be034cc005f.pkl"  # <-- change to your actual file


# GHS-SMOD raster path (adjust for your file)
SMOD_RASTER = gs.config.DATA_DIR / "GHS_SMOD_E2025_GLOBE_R2023A_54009_1000_V2_0.tif"

# Check if files exist
print(f"\n🔍 File checks:")
print(f"   Benefit file exists: {'✅' if BENEFIT_FILE and BENEFIT_FILE.exists() else '❌'}")
print(f"   SMOD raster exists: {'✅' if SMOD_RASTER.exists() else '❌'}")

if BENEFIT_FILE:
    print(f"   Benefit file: {BENEFIT_FILE}")
if SMOD_RASTER.exists():
    print(f"   SMOD raster: {SMOD_RASTER}")
else:
    print(f"   ❌ SMOD raster not found at: {SMOD_RASTER}")
    print(f"   📥 Please download from: https://ghsl.jrc.ec.europa.eu/ghs_smod2023.php")



🔍 File checks:
   Benefit file exists: ✅
   SMOD raster exists: ✅
   Benefit file: /Users/larsmasanneck/Library/CloudStorage/OneDrive-Personal/Dokumente/Research Basics and Coding Projects/Data Projects/Marc Pawlitzki/GeoStroke/raw_data/benefit_cache/benefit_analysis_e1f09274ca3df68bc9344be034cc005f.pkl
   SMOD raster: /Users/larsmasanneck/Library/CloudStorage/OneDrive-Personal/Dokumente/Research Basics and Coding Projects/Data Projects/Marc Pawlitzki/GeoStroke/raw_data/GHS_SMOD_E2025_GLOBE_R2023A_54009_1000_V2_0.tif
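
Because the cache file name contains a hash, hard-coding it can be brittle. One possible alternative, sketched here under the assumption that cache files follow the `benefit_analysis_*.pkl` pattern seen above, is to pick the most recently modified match. `find_latest_benefit_file` is a hypothetical helper, not a `geostroke` function.

```python
from pathlib import Path
from typing import Optional


def find_latest_benefit_file(cache_dir: Path) -> Optional[Path]:
    """Return the newest benefit_analysis_*.pkl in cache_dir, or None if there is none."""
    candidates = list(cache_dir.glob("benefit_analysis_*.pkl"))
    if not candidates:
        return None
    # Newest file by modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

In this notebook it could be used as `BENEFIT_FILE = find_latest_benefit_file(gs.config.DATA_DIR / "benefit_cache")`, followed by a check that the result is not `None`.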


## Alternative: Generate Benefit Analysis

If you don't have cached benefit results, run this cell to generate them:


In [3]:
# ---- Alternative: Generate benefit analysis if needed ----

if not BENEFIT_FILE or not BENEFIT_FILE.exists():
    print("🔄 Generating benefit analysis (this may take several minutes)...")
    
    # Quick benefit analysis with coarser resolution for demo
    benefit_gdf = gs.benefit.calculate_time_benefits_parallel(
        ct_penalty=0.0,
        benefit_threshold=10.0,
        grid_resolution=0.01,  # Coarser resolution for faster processing
        time_bins=[5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],  # 5-minute steps up to 60 min
        max_workers=4
    )
    
    print(f"✅ Generated benefit analysis with {len(benefit_gdf):,} points")
    
else:
    print("✅ Benefit file exists, will load from cache")
    benefit_gdf = None  # Will be loaded in next cell


✅ Benefit file exists, will load from cache


## Load Benefit Analysis Data


In [4]:
# ---- 3. Load the benefit GeoDataFrame ----

if benefit_gdf is None:  # Not generated in previous cell
    if BENEFIT_FILE and BENEFIT_FILE.exists():
        print(f"📥 Loading benefit analysis from: {BENEFIT_FILE.name}")
        
        if BENEFIT_FILE.suffix == ".pkl":
            with BENEFIT_FILE.open("rb") as f:
                cached_data = pickle.load(f)
            
            # Handle both old and new cache formats
            if isinstance(cached_data, dict) and "data" in cached_data:
                # New format with metadata
                benefit_gdf = gpd.GeoDataFrame(cached_data["data"], crs="EPSG:4326")
                if "params" in cached_data:
                    print(f"   📋 Analysis parameters: {cached_data['params']}")
            else:
                # Old format - direct data
                benefit_gdf = gpd.GeoDataFrame(cached_data, crs="EPSG:4326")
        else:
            # GeoPackage or other format
            benefit_gdf = gpd.read_file(BENEFIT_FILE)
            
        print(f"✅ Loaded {len(benefit_gdf):,} benefit points")
        
    else:
        raise FileNotFoundError("No benefit file available. Please run benefit analysis first.")

# Display basic info about the benefit data
print(f"\n📊 Benefit Data Summary:")
print(f"   Total points: {len(benefit_gdf):,}")
print(f"   Coordinate system: {benefit_gdf.crs}")
print(f"   Columns: {list(benefit_gdf.columns)}")
print(f"\n   Benefit categories:")
for category, count in benefit_gdf['benefit_category'].value_counts().items():
    print(f"     {category}: {count:,} points")


📥 Loading benefit analysis from: benefit_analysis_e1f09274ca3df68bc9344be034cc005f.pkl
   📋 Analysis parameters: {'ct_suffix': '_all_CTs', 'stroke_suffix': '', 'time_bins': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60], 'ct_penalty': 0, 'benefit_threshold': 10.0, 'grid_resolution': 0.01, 'bounds': None}
✅ Loaded 510,229 benefit points

📊 Benefit Data Summary:
   Total points: 510,229
   Coordinate system: EPSG:4326
   Columns: ['geometry', 'ct_time', 'stroke_time', 'time_benefit', 'benefit_category']

   Benefit categories:
     Neither reachable within 60 min: 260,954 points
     Low (10-20 min): 117,029 points
     Medium (20-30 min): 72,853 points
     High (30+ min): 37,724 points
     Only CT reachable within 60 min: 21,106 points
     Likely irrelevant (<10 min): 563 points
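
The category labels suggest a simple thresholding rule on the time benefit. The following is only a sketch of how such labels could be derived; it is not the `geostroke` implementation. It assumes that `time_benefit` is `stroke_time - ct_time`, that an unreachable facility is represented by `None`, and that bin edges are inclusive at the lower bound. The function name and the handling of the "stroke unit reachable, CT not reachable" case are hypothetical.

```python
from typing import Optional


def benefit_category(ct_time: Optional[float], stroke_time: Optional[float],
                     threshold: float = 10.0) -> str:
    """Hypothetical labelling rule; None means 'not reachable within 60 min'."""
    if ct_time is None and stroke_time is None:
        return "Neither reachable within 60 min"
    if stroke_time is None:
        return "Only CT reachable within 60 min"
    if ct_time is None:
        # Assumption: without a reachable CT hospital there is no CT time benefit
        return "Likely irrelevant (<10 min)"
    benefit = stroke_time - ct_time  # assumed definition of time_benefit
    if benefit < threshold:
        return "Likely irrelevant (<10 min)"
    if benefit < 20:
        return "Low (10-20 min)"
    if benefit < 30:
        return "Medium (20-30 min)"
    return "High (30+ min)"


print(benefit_category(ct_time=10, stroke_time=45))
```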


## Urban/Rural Annotation

Now we'll add urban/rural labels to each benefit point using the GHS-SMOD raster:


In [5]:
# ---- 4. Annotate every Germany grid point with its rural/urban classification ----
from shapely.geometry import Point

# Sanity check
if not SMOD_RASTER.exists():
    raise FileNotFoundError(f"SMOD raster not found at {SMOD_RASTER}")

# Germany outline & polygon
GERMANY = gs.data.load_germany_outline()
GERMANY_POLY = GERMANY.geometry.iloc[0]

# Infer grid resolution from the cached benefit points (fallback 0.01°)
xs_u = np.sort(benefit_gdf.geometry.x.unique())
ys_u = np.sort(benefit_gdf.geometry.y.unique())
try:
    dx = np.median(np.diff(xs_u))
    dy = np.median(np.diff(ys_u))
    GRID_RES = round(float(min(dx, dy)), 5)
    if GRID_RES <= 0 or GRID_RES > 0.05:
        GRID_RES = 0.01
except Exception:
    GRID_RES = 0.01

# Build full Germany grid
minx, miny, maxx, maxy = GERMANY.total_bounds
xs = np.arange(minx, maxx + GRID_RES, GRID_RES)
ys = np.arange(miny, maxy + GRID_RES, GRID_RES)
pts = [Point(x, y) for x in xs for y in ys]
GERMANY_GRID = gpd.GeoDataFrame(
    geometry=[p for p in pts if GERMANY_POLY.contains(p) or GERMANY_POLY.touches(p)],
    crs=GERMANY.crs
)

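# Map categories from benefit_gdf onto the full grid via a dictionary keyed by
# rounded coordinates (avoids a point-to-point spatial join)
def _ck(p):
    """Coordinate key; rounding absorbs floating-point noise between the two grids."""
    return (round(p.x, 5), round(p.y, 5))

cat_lookup = dict(zip(
    benefit_gdf.geometry.apply(_ck),
    benefit_gdf['benefit_category']
))

# Grid points without a cached benefit value fall back to "Likely irrelevant (<10 min)"
GERMANY_GRID['benefit_category'] = GERMANY_GRID.geometry.apply(
    lambda g: cat_lookup.get(_ck(g), "Likely irrelevant (<10 min)")
)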

print(f"Total Germany grid points: {len(GERMANY_GRID):,}")
print("Category counts (points):")
print(GERMANY_GRID['benefit_category'].value_counts())

# Annotate with SMOD
print("🏙️ Annotating grid with SMOD urban/rural classes …")
annotated = gs.urban_rural_annotation.add_smod_labels(GERMANY_GRID, str(SMOD_RASTER))
print("✅ Annotation done")

# Quick check
ur_counts = annotated['urban_rural'].value_counts()
print("Urban/Rural distribution (all points):")
print(ur_counts.apply(lambda v: f"{v:,} ({v/len(annotated)*100:.1f}%)"))

Total Germany grid points: 459,349
Category counts (points):
benefit_category
Likely irrelevant (<10 min)        206068
Low (10-20 min)                    116674
Medium (20-30 min)                  72459
High (30+ min)                      37417
Only CT reachable within 60 min     20162
Neither reachable within 60 min      6569
Name: count, dtype: int64
🏙️ Annotating grid with SMOD urban/rural classes …
✅ Annotation done
Urban/Rural distribution (all points):
urban_rural
rural    412,940 (89.9%)
urban      44,554 (9.7%)
other       1,855 (0.4%)
Name: count, dtype: object
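
The coordinate-key lookup used in the cell above can be illustrated in isolation. This is a minimal sketch with made-up coordinates and labels; `coord_key` mirrors the idea of `_ck` but takes plain floats, so it runs without GeoPandas.

```python
def coord_key(x: float, y: float, ndigits: int = 5) -> tuple:
    """Round a coordinate pair so tiny floating-point differences map to the same key."""
    return (round(x, ndigits), round(y, ndigits))


# Hypothetical cached labels, keyed by rounded coordinates.
cached = {
    coord_key(10.03, 50.00): "High (30+ min)",
    coord_key(10.04, 50.00): "Low (10-20 min)",
}

# Grid coordinates rebuilt by repeated addition can carry floating-point noise.
x = 10.0
for _ in range(3):
    x += 0.01

DEFAULT = "Likely irrelevant (<10 min)"
print(cached.get(coord_key(x, 50.0), DEFAULT))     # found via the rounded key
print(cached.get(coord_key(11.0, 50.0), DEFAULT))  # not cached, so the default applies
```

The lookup only works if both grids share the same origin and spacing; points that are offset by more than the rounding precision silently receive the default label.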


In [6]:
# ---- 5. Urban/Rural summary table (absolute + %) by benefit category ----
print("📋 Generating Urban/Rural Summary by Benefit Category")
print("=" * 70)

CATEGORY_ORDER = [
    "High (30+ min)",
    "Medium (20-30 min)",
    "Low (10-20 min)",
    "Only CT reachable within 60 min",
    "Neither reachable within 60 min",
    "Likely irrelevant (<10 min)",
]

# Counts
cnt = (annotated
       .groupby(['benefit_category', 'urban_rural'])
       .size()
       .rename('n')
       .reset_index())

# Totals per category & percentages
totals = cnt.groupby('benefit_category')['n'].sum().rename('cat_total')
cnt = cnt.merge(totals, on='benefit_category')
cnt['pct_within_cat'] = (cnt['n'] / cnt['cat_total'] * 100).round(1)

# Wide tables
summary_abs = (cnt.pivot_table(index='benefit_category',
                               columns='urban_rural',
                               values='n',
                               fill_value=0)
                 .reindex(CATEGORY_ORDER))
summary_pct = (cnt.pivot_table(index='benefit_category',
                               columns='urban_rural',
                               values='pct_within_cat',
                               fill_value=0)
                 .reindex(CATEGORY_ORDER))

pd.set_option('display.width', None)
print("\n👥 Absolute counts:")
# Format column by column (DataFrame.applymap is deprecated in recent pandas)
print(summary_abs.apply(lambda col: col.map(lambda x: f"{x:,}")))

print("\n📊 Percent within each category:")
print(summary_pct)

# Save
summary_abs.to_csv("urban_rural_by_category_abs.csv")
summary_pct.to_csv("urban_rural_by_category_pct.csv")
print("\n💾 Saved CSVs: urban_rural_by_category_abs.csv, urban_rural_by_category_pct.csv")

# Markdown export
print("\n📋 Markdown (percent table):")
print(summary_pct.to_markdown())

📋 Generating Urban/Rural Summary by Benefit Category

👥 Absolute counts:
urban_rural                      other      rural     urban
benefit_category                                           
High (30+ min)                    47.0   34,966.0   2,404.0
Medium (20-30 min)                93.0   68,063.0   4,303.0
Low (10-20 min)                  242.0  106,732.0   9,700.0
Only CT reachable within 60 min  164.0   19,760.0     238.0
Neither reachable within 60 min  833.0    5,548.0     188.0
Likely irrelevant (<10 min)      476.0  177,871.0  27,721.0

📊 Percent within each category:
urban_rural                      other  rural  urban
benefit_category                                    
High (30+ min)                     0.1   93.4    6.4
Medium (20-30 min)                 0.1   93.9    5.9
Low (10-20 min)                    0.2   91.5    8.3
Only CT reachable within 60 min    0.8   98.0    1.2
Neither reachable within 60 min   12.7   84.5    2.9
Likely irrelevant (<10 min)        0.2   86.3   13.5



## Summary Analysis

Generate summary tables showing the urban/rural breakdown by benefit category:


In [7]:
# ---- 5. Build the summary percentage table ----

print("📋 Generating Urban/Rural Summary by Benefit Category")
print("=" * 55)

summary = gs.urban_rural_annotation.summary_by_category(annotated)

# Display the table in a nice format
print("\n📊 Summary Table:")
print(summary.to_string())

# Also display as markdown for better formatting
print("\n📋 Markdown format (for easy copying):")
print(summary.to_markdown())

# Export to CSV
output_file = "benefit_urban_rural_summary.csv"
summary.to_csv(output_file)
print(f"\n💾 Summary exported to: {output_file}")


📋 Generating Urban/Rural Summary by Benefit Category

📊 Summary Table:
urban_rural                      other   rural  urban   total  pct_urban  pct_rural  pct_other
benefit_category                                                                              
High (30+ min)                      47   34966   2404   37417        6.4       93.4        0.1
Likely irrelevant (<10 min)        476  177871  27721  206068       13.5       86.3        0.2
Low (10-20 min)                    242  106732   9700  116674        8.3       91.5        0.2
Medium (20-30 min)                  93   68063   4303   72459        5.9       93.9        0.1
Neither reachable within 60 min    833    5548    188    6569        2.9       84.5       12.7
Only CT reachable within 60 min    164   19760    238   20162        1.2       98.0        0.8

📋 Markdown format (for easy copying):
| benefit_category                |   other |   rural |   urban |   total |   pct_urban |   pct_rural |   pct_other |
|:----------
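
The percentage columns can be checked by hand from the counts. As a small sketch, the following recomputes the shares for the "High (30+ min)" row of the summary table above; `pct_within_category` is a hypothetical helper name.

```python
def pct_within_category(counts: dict) -> dict:
    """Percentage share of each label within one benefit category, rounded to 0.1."""
    total = sum(counts.values())
    return {label: round(n / total * 100, 1) for label, n in counts.items()}


# Counts taken from the "High (30+ min)" row of the summary table.
high = {"other": 47, "rural": 34966, "urban": 2404}

print(sum(high.values()))         # 37417
print(pct_within_category(high))  # {'other': 0.1, 'rural': 93.4, 'urban': 6.4}
```

Because each share is rounded separately, the three percentages need not add up to exactly 100.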

## Key Insights

Generate some key insights from the analysis:


In [7]:
# ---- 6. Generate key insights ----

print("💡 Key Insights from Urban/Rural Analysis")
print("=" * 45)

# Overall urban vs rural distribution
total_urban = summary['urban'].sum() if 'urban' in summary.columns else 0
total_rural = summary['rural'].sum() if 'rural' in summary.columns else 0
total_other = summary['other'].sum() if 'other' in summary.columns else 0
total_points = total_urban + total_rural + total_other

print(f"\n🌍 Overall Distribution:")
print(f"   Urban areas: {total_urban:,} points ({total_urban/total_points*100:.1f}%)")
print(f"   Rural areas: {total_rural:,} points ({total_rural/total_points*100:.1f}%)")
print(f"   Other areas: {total_other:,} points ({total_other/total_points*100:.1f}%)")

# Find which benefit categories are most rural vs urban
print(f"\n🏙️ Most Urban Benefit Categories:")
if 'pct_urban' in summary.columns:
    urban_sorted = summary.sort_values('pct_urban', ascending=False)
    for category in urban_sorted.index[:3]:
        pct = urban_sorted.loc[category, 'pct_urban']
        count = urban_sorted.loc[category, 'urban'] if 'urban' in urban_sorted.columns else 0
        print(f"   {category}: {pct:.1f}% urban ({count:,} points)")

print(f"\n🌾 Most Rural Benefit Categories:")
if 'pct_rural' in summary.columns:
    rural_sorted = summary.sort_values('pct_rural', ascending=False)
    for category in rural_sorted.index[:3]:
        pct = rural_sorted.loc[category, 'pct_rural']
        count = rural_sorted.loc[category, 'rural'] if 'rural' in rural_sorted.columns else 0
        print(f"   {category}: {pct:.1f}% rural ({count:,} points)")

# Healthcare desert analysis
if "Neither reachable within 60 min" in summary.index:
    desert_row = summary.loc["Neither reachable within 60 min"]
    print(f"\n🏜️ Healthcare Desert Analysis:")
    print(f"   Total healthcare desert areas: {desert_row['total']:,} points")
    if 'pct_rural' in desert_row:
        print(f"   Rural healthcare deserts: {desert_row['pct_rural']:.1f}%")
    if 'pct_urban' in desert_row:
        print(f"   Urban healthcare deserts: {desert_row['pct_urban']:.1f}%")

print(f"\n✅ Analysis complete! Check the exported CSV files for detailed results.")


💡 Key Insights from Urban/Rural Analysis

🌍 Overall Distribution:
   Urban areas: 13,949 points (6.6%)
   Rural areas: 196,454 points (93.1%)
   Other areas: 690 points (0.3%)

🏙️ Most Urban Benefit Categories:
   High (30+ min): 8.7% urban (1,985 points)
   Low (10-20 min): 7.9% urban (7,658 points)
   Medium (20-30 min): 6.8% urban (3,502 points)

🌾 Most Rural Benefit Categories:
   Only CT reachable within 60 min: 97.0% rural (19,509 points)
   Likely irrelevant (<10 min): 96.4% rural (19,607 points)
   Medium (20-30 min): 93.1% rural (48,022 points)

✅ Analysis complete! Check the exported CSV files for detailed results.


In [8]:
# ---- 7. Load and Process Isochrone Data ----

def load_isochrones_as_gdf(suffix: str, facility_type: str, time_bins: list = None) -> gpd.GeoDataFrame:
    """Load isochrone polygons and convert to GeoDataFrame with metadata."""
    if time_bins is None:
        time_bins = [15, 30, 45, 60]  # Focus on key time bands for visualization
    
    # Load facility data to get facility information
    if suffix == "_all_CTs":
        facilities_df = gs.data.load_hospitals_ct()
    else:
        facilities_df = gs.data.load_stroke_units()
    
    rows = []
    
    for time_bin in time_bins:
        cache_path = gs.config.DATA_DIR / f"poly{time_bin}{suffix}.pkl"
        
        if cache_path.exists():
            with open(cache_path, "rb") as f:
                polygons = pickle.load(f)
            
            print(f"📦 Loaded {len(polygons)} polygons for {facility_type} {time_bin}min")
            
            # Create rows for each polygon with metadata
            for i, polygon in enumerate(polygons):
                if polygon and not polygon.is_empty:  # Skip empty/invalid polygons
                    # Get facility info if available
                    facility_name = "Unknown"
                    facility_lat = None
                    facility_lon = None
                    
                    if i < len(facilities_df):
                        facility_row = facilities_df.iloc[i]
                        facility_name = facility_row.get('name', f'Facility_{i}')
                        facility_lat = facility_row.get('latitude')
                        facility_lon = facility_row.get('longitude')
                    
                    rows.append({
                        'geometry': polygon,
                        'time_bin': time_bin,
                        'facility_type': facility_type,
                        'facility_id': i,
                        'facility_name': facility_name,
                        'facility_lat': facility_lat,
                        'facility_lon': facility_lon,
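                        # Rough km² estimate: treats 1° as 111 km in both directions, which
                        # overestimates areas at German latitudes (1° of longitude is nearer 70 km)
                        'area_km2': polygon.area * 111**2,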
                    })
        else:
            print(f"⚠️  Missing isochrone file: {cache_path}")
    
    if rows:
        return gpd.GeoDataFrame(rows, crs="EPSG:4326")
    else:
        return gpd.GeoDataFrame(columns=['geometry', 'time_bin', 'facility_type', 'facility_id', 
                                       'facility_name', 'facility_lat', 'facility_lon', 'area_km2'], 
                               crs="EPSG:4326")

print("🔄 Loading isochrone data for analysis...")
print("   This may take a moment as we process thousands of polygons...")

# Load isochrones for both facility types
stroke_isochrones = load_isochrones_as_gdf("", "Stroke Units")
ct_isochrones = load_isochrones_as_gdf("_all_CTs", "CT Hospitals")

# Combine both datasets
combined_data = pd.concat([stroke_isochrones, ct_isochrones], ignore_index=True)
all_isochrones = gpd.GeoDataFrame(combined_data, crs="EPSG:4326")

print(f"\n✅ Loaded isochrone data:")
print(f"   Stroke units: {len(stroke_isochrones):,} isochrone polygons")
print(f"   CT hospitals: {len(ct_isochrones):,} isochrone polygons")
print(f"   Total: {len(all_isochrones):,} isochrone polygons")

# Display basic statistics
print(f"\n📊 Isochrone Coverage Summary:")
for facility_type in all_isochrones['facility_type'].unique():
    subset = all_isochrones[all_isochrones['facility_type'] == facility_type]
    print(f"   {facility_type}:")
    for time_bin in sorted(subset['time_bin'].unique()):
        time_subset = subset[subset['time_bin'] == time_bin]
        total_area = time_subset['area_km2'].sum()
        avg_area = time_subset['area_km2'].mean()
        print(f"     {time_bin} min: {len(time_subset):,} polygons, {total_area:,.0f} km² total, {avg_area:.1f} km² avg")


🔄 Loading isochrone data for analysis...
   This may take a moment as we process thousands of polygons...
📦 Loaded 349 polygons for Stroke Units 15min
📦 Loaded 349 polygons for Stroke Units 30min
📦 Loaded 349 polygons for Stroke Units 45min
📦 Loaded 349 polygons for Stroke Units 60min
📦 Loaded 1475 polygons for CT Hospitals 15min
📦 Loaded 1475 polygons for CT Hospitals 30min
📦 Loaded 1475 polygons for CT Hospitals 45min
📦 Loaded 1475 polygons for CT Hospitals 60min

✅ Loaded isochrone data:
   Stroke units: 1,396 isochrone polygons
   CT hospitals: 5,900 isochrone polygons
   Total: 7,296 isochrone polygons

📊 Isochrone Coverage Summary:
   Stroke Units:
     15 min: 349 polygons, 75,255 km² total, 215.6 km² avg
     30 min: 349 polygons, 615,651 km² total, 1764.0 km² avg
     45 min: 349 polygons, 1,998,442 km² total, 5726.2 km² avg
     60 min: 349 polygons, 4,438,640 km² total, 12718.2 km² avg
   CT Hospitals:
     15 min: 1,475 polygons, 336,688 km² total, 228.3 km² avg
     30 min
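
The `area_km2` values above come from multiplying square degrees by `111**2`, which ignores that a degree of longitude shrinks with latitude. The sketch below shows the size of that effect with a cos(latitude) correction on a made-up square near 51°N. It is a plain-Python approximation for small polygons, not the notebook's method; for isochrones of this size, reprojecting to an equal-area CRS (for example with GeoPandas `to_crs`) would likely be more accurate.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def shoelace_area(coords):
    """Planar polygon area (in squared input units) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def approx_area_km2(lonlat_coords):
    """Approximate km² for a small lon/lat polygon, scaling longitude by cos(latitude)."""
    mean_lat = sum(lat for _, lat in lonlat_coords) / len(lonlat_coords)
    deg2 = shoelace_area(lonlat_coords)
    return deg2 * KM_PER_DEGREE**2 * math.cos(math.radians(mean_lat))


# A 0.1° x 0.1° square centred near 51°N (hypothetical example)
square = [(10.0, 50.95), (10.1, 50.95), (10.1, 51.05), (10.0, 51.05)]
naive = shoelace_area(square) * 111**2
corrected = approx_area_km2(square)
print(f"naive: {naive:.1f} km², corrected: {corrected:.1f} km²")
```

The uncorrected figure is larger by a factor of roughly 1 / cos(51°), which is worth keeping in mind when reading the km² totals in the coverage summary.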

In [9]:
# ---- 8. Apply Urban/Rural Classification to Isochrones ----

if not SMOD_RASTER.exists():
    print(f"❌ SMOD raster not found: {SMOD_RASTER}")
    print("   Skipping urban/rural classification for isochrones")
    isochrones_annotated = None
else:
    print("🏙️ Applying urban/rural classification to isochrones...")
    print("   Using polygon centroids for classification...")
    
    # Create one representative point per polygon for classification:
    # use the centroid, falling back to representative_point() when the
    # centroid lies outside the polygon (possible for concave shapes)
    points = []
    for geom in all_isochrones.geometry:
        centroid = geom.centroid
        points.append(centroid if geom.contains(centroid) else geom.representative_point())
    
    # Create a point GeoDataFrame for classification
    points_gdf = gpd.GeoDataFrame(
        all_isochrones.drop('geometry', axis=1), 
        geometry=points, 
        crs=all_isochrones.crs
    )
    
    # Apply SMOD classification
    points_annotated = gs.urban_rural_annotation.add_smod_labels(points_gdf, str(SMOD_RASTER))
    
    # Transfer urban/rural labels back to the polygon GeoDataFrame
    isochrones_annotated = all_isochrones.copy()
    isochrones_annotated['smod_code'] = points_annotated['smod_code']
    isochrones_annotated['urban_rural'] = points_annotated['urban_rural']
    
    print(f"✅ Successfully classified {len(isochrones_annotated):,} isochrone polygons")
    
    # Display classification results
    print(f"\n🔍 Isochrone Urban/Rural Distribution:")
    for category, count in isochrones_annotated['urban_rural'].value_counts().items():
        percentage = count / len(isochrones_annotated) * 100
        print(f"   {category}: {count:,} polygons ({percentage:.1f}%)")
    
    print(f"\n🔍 Sample SMOD codes in isochrones:")
    smod_counts = isochrones_annotated['smod_code'].value_counts().head(8)
    for code, count in smod_counts.items():
        print(f"   SMOD {code}: {count:,} polygons")


🏙️ Applying urban/rural classification to isochrones...
   Using polygon centroids for classification...
✅ Successfully classified 7,296 isochrone polygons

🔍 Isochrone Urban/Rural Distribution:
   rural: 3,993 polygons (54.7%)
   urban: 3,296 polygons (45.2%)
   other: 7 polygons (0.1%)

🔍 Sample SMOD codes in isochrones:
   SMOD 11: 2,211 polygons
   SMOD 12: 1,630 polygons
   SMOD 30: 1,564 polygons
   SMOD 21: 1,107 polygons
   SMOD 23: 522 polygons
   SMOD 13: 152 polygons
   SMOD 22: 103 polygons
   SMOD 10: 7 polygons
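
The centroid fallback in the cell above matters because the centroid of a concave polygon can lie outside it. The following plain-Python sketch demonstrates this on a made-up U-shaped polygon; the helper names are hypothetical, and in the notebook itself Shapely's `centroid`, `contains`, and `representative_point()` do this work.

```python
def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon (shoelace-based formula)."""
    area2 = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def point_in_polygon(pt, coords):
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# U-shaped polygon: a 3x3 square with a notch cut out of the top middle
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

c = polygon_centroid(u_shape)
print(c, point_in_polygon(c, u_shape))   # centroid falls into the notch
print(point_in_polygon((0.5, 0.5), u_shape))
```

Sampling the raster at such a centroid would classify a location that the isochrone does not cover, which is why a point guaranteed to lie inside the polygon is the safer choice.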


In [10]:
# ---- 9. Generate Isochrone Summary Tables ----

if isochrones_annotated is not None:
    print("📊 Generating Isochrone Urban/Rural Summary Tables")
    print("=" * 60)
    
    # Create a combined category for analysis
    isochrones_annotated['time_facility'] = (
        isochrones_annotated['facility_type'].astype(str) + " - " + 
        isochrones_annotated['time_bin'].astype(str) + " min"
    )
    
    # Generate summary by time and facility type
    isochrone_summary = gs.urban_rural_annotation.summary_by_category(
        isochrones_annotated, 'time_facility'
    )
    
    print("\n📋 Isochrone Coverage by Time Bin and Facility Type:")
    print(isochrone_summary.to_string())
    
    # Also create separate summaries for better analysis
    print("\n" + "="*60)
    print("📊 DETAILED BREAKDOWNS")
    print("="*60)
    
    # Summary by facility type only
    facility_summary = (
        isochrones_annotated.groupby(['facility_type', 'urban_rural'])
        .size().unstack(fill_value=0)
    )
    facility_summary['total'] = facility_summary.sum(axis=1)
    for col in ['urban', 'rural', 'other']:
        if col in facility_summary.columns:
            facility_summary[f'pct_{col}'] = (
                facility_summary[col] / facility_summary['total'] * 100
            ).round(1)
    
    print("\n📋 Summary by Facility Type:")
    print(facility_summary.to_string())
    
    # Summary by time bin only
    time_summary = (
        isochrones_annotated.groupby(['time_bin', 'urban_rural'])
        .size().unstack(fill_value=0)
    )
    time_summary['total'] = time_summary.sum(axis=1)
    for col in ['urban', 'rural', 'other']:
        if col in time_summary.columns:
            time_summary[f'pct_{col}'] = (
                time_summary[col] / time_summary['total'] * 100
            ).round(1)
    
    print("\n📋 Summary by Time Bin:")
    print(time_summary.to_string())
    
    # Area-weighted analysis
    print("\n" + "="*60)
    print("📊 AREA-WEIGHTED ANALYSIS")
    print("="*60)
    
    # Calculate total coverage area by category
    area_summary = (
        isochrones_annotated.groupby(['facility_type', 'time_bin', 'urban_rural'])
        ['area_km2'].sum().unstack(fill_value=0)
    )
    
    area_summary['total_area'] = area_summary.sum(axis=1)
    for col in ['urban', 'rural', 'other']:
        if col in area_summary.columns:
            area_summary[f'pct_area_{col}'] = (
                area_summary[col] / area_summary['total_area'] * 100
            ).round(1)
    
    print("\n📋 Coverage Area (km²) by Facility Type and Time:")
    print(area_summary.to_string())
    
    # Export summary tables
    output_base = "isochrone_urban_rural"
    isochrone_summary.to_csv(f"{output_base}_detailed.csv")
    facility_summary.to_csv(f"{output_base}_by_facility.csv") 
    time_summary.to_csv(f"{output_base}_by_time.csv")
    area_summary.to_csv(f"{output_base}_area_weighted.csv")
    
    print(f"\n💾 Summary tables exported:")
    print(f"   {output_base}_detailed.csv - Full breakdown")
    print(f"   {output_base}_by_facility.csv - By facility type")
    print(f"   {output_base}_by_time.csv - By time bin")
    print(f"   {output_base}_area_weighted.csv - Area-weighted analysis")
    
else:
    print("⚠️  Skipping summary table generation (no annotated isochrone data)")


📊 Generating Isochrone Urban/Rural Summary Tables

📋 Isochrone Coverage by Time Bin and Facility Type:
urban_rural            other  rural  urban  total  pct_urban  pct_rural  pct_other
time_facility                                                                     
CT Hospitals - 15 min      1    550    924   1475       62.6       37.3        0.1
CT Hospitals - 30 min      0    799    676   1475       45.8       54.2        0.0
CT Hospitals - 45 min      0    957    518   1475       35.1       64.9        0.0
CT Hospitals - 60 min      4   1003    468   1475       31.7       68.0        0.3
Stroke Units - 15 min      1     91    257    349       73.6       26.1        0.3
Stroke Units - 30 min      0    158    191    349       54.7       45.3        0.0
Stroke Units - 45 min      0    217    132    349       37.8       62.2        0.0
Stroke Units - 60 min      1    218    130    349       37.2       62.5        0.3

📊 DETAILED BREAKDOWNS

📋 Summary by Facility Type:
urban_rural    
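
The count-based and area-weighted summaries in the cell above answer different questions and can disagree substantially. A minimal sketch with made-up isochrones (the labels and areas are hypothetical) shows how a few large rural polygons can dominate the area share while being a minority by count.

```python
from collections import defaultdict

# Hypothetical isochrones: (urban_rural label, area in km²)
isochrones = [
    ("urban", 200.0),
    ("urban", 250.0),
    ("urban", 150.0),
    ("rural", 1800.0),
]


def shares(records, weighted):
    """Percentage share per label, either counting polygons or summing their areas."""
    totals = defaultdict(float)
    for label, area in records:
        totals[label] += area if weighted else 1.0
    grand_total = sum(totals.values())
    return {label: round(v / grand_total * 100, 1) for label, v in totals.items()}


print(shares(isochrones, weighted=False))  # {'urban': 75.0, 'rural': 25.0}
print(shares(isochrones, weighted=True))   # {'urban': 25.0, 'rural': 75.0}
```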

In [11]:
# ---- 10. Key Insights from Isochrone Analysis ----

if isochrones_annotated is not None:
    print("💡 Key Insights from Isochrone Coverage Analysis")
    print("=" * 55)
    
    # Overall coverage patterns
    total_isochrones = len(isochrones_annotated)
    stroke_count = len(isochrones_annotated[isochrones_annotated['facility_type'] == 'Stroke Units'])
    ct_count = len(isochrones_annotated[isochrones_annotated['facility_type'] == 'CT Hospitals'])
    
    print(f"\n🌍 Overall Coverage:")
    print(f"   Total isochrone polygons analyzed: {total_isochrones:,}")
    print(f"   Stroke unit isochrones: {stroke_count:,}")
    print(f"   CT hospital isochrones: {ct_count:,}")
    
    # Urban vs rural distribution
    urban_count = len(isochrones_annotated[isochrones_annotated['urban_rural'] == 'urban'])
    rural_count = len(isochrones_annotated[isochrones_annotated['urban_rural'] == 'rural'])
    other_count = len(isochrones_annotated[isochrones_annotated['urban_rural'] == 'other'])
    
    print(f"\n🏙️ Urban/Rural Coverage Distribution:")
    print(f"   Urban-centered isochrones: {urban_count:,} ({urban_count/total_isochrones*100:.1f}%)")
    print(f"   Rural-centered isochrones: {rural_count:,} ({rural_count/total_isochrones*100:.1f}%)")
    print(f"   Other areas: {other_count:,} ({other_count/total_isochrones*100:.1f}%)")
    
    # Time expansion patterns
    print(f"\n⏱️ Time Expansion Patterns:")
    for time_bin in sorted(isochrones_annotated['time_bin'].unique()):
        time_subset = isochrones_annotated[isochrones_annotated['time_bin'] == time_bin]
        urban_pct = len(time_subset[time_subset['urban_rural'] == 'urban']) / len(time_subset) * 100
        rural_pct = len(time_subset[time_subset['urban_rural'] == 'rural']) / len(time_subset) * 100
        avg_area = time_subset['area_km2'].mean()
        
        print(f"   {time_bin} min isochrones: {len(time_subset):,} total, "
              f"{urban_pct:.1f}% urban, {rural_pct:.1f}% rural, avg {avg_area:.1f} km²")
    
    # Facility type differences
    print(f"\n🏥 Facility Type Comparison:")
    for facility_type in isochrones_annotated['facility_type'].unique():
        subset = isochrones_annotated[isochrones_annotated['facility_type'] == facility_type]
        urban_pct = len(subset[subset['urban_rural'] == 'urban']) / len(subset) * 100
        rural_pct = len(subset[subset['urban_rural'] == 'rural']) / len(subset) * 100
        total_area = subset['area_km2'].sum()
        
        print(f"   {facility_type}:")
        print(f"     {len(subset):,} isochrones ({urban_pct:.1f}% urban, {rural_pct:.1f}% rural)")
        print(f"     Total coverage: {total_area:,.0f} km²")
    
    # Coverage efficiency insights
    print(f"\n📊 Coverage Insights:")
    
    # Find which time/facility combinations have highest rural coverage
    # Select the label column before apply() so the grouping columns are not passed
    # to the function (this avoids the pandas deprecation warning)
    max_rural_coverage = (
        isochrones_annotated
        .groupby(['facility_type', 'time_bin'])['urban_rural']
        .apply(lambda s: (s == 'rural').mean() * 100)
        .reset_index(name='rural_pct')
    )
    top_rural = max_rural_coverage.nlargest(3, 'rural_pct')
    
    print(f"\n🌾 Highest Rural Coverage Combinations:")
    for _, row in top_rural.iterrows():
        print(f"   {row['facility_type']} - {row['time_bin']} min: {row['rural_pct']:.1f}% rural")
    
    # Find largest coverage areas
    largest_areas = isochrones_annotated.nlargest(3, 'area_km2')[['facility_name', 'facility_type', 'time_bin', 'area_km2', 'urban_rural']]
    print(f"\n📏 Largest Coverage Areas:")
    for _, row in largest_areas.iterrows():
        name = str(row['facility_name'])  # guard against non-string names (e.g. NaN)
        facility_name = name[:30] + "..." if len(name) > 30 else name
        print(f"   {facility_name} ({row['facility_type']}, {row['time_bin']} min): {row['area_km2']:.0f} km² ({row['urban_rural']})")
    
    print(f"\n✅ Isochrone analysis complete! Check the exported CSV files for detailed results.")
    
else:
    print("⚠️  No isochrone data available for analysis")


💡 Key Insights from Isochrone Coverage Analysis

🌍 Overall Coverage:
   Total isochrone polygons analyzed: 7,296
   Stroke unit isochrones: 1,396
   CT hospital isochrones: 5,900

🏙️ Urban/Rural Coverage Distribution:
   Urban-centered isochrones: 3,296 (45.2%)
   Rural-centered isochrones: 3,993 (54.7%)
   Other areas: 7 (0.1%)

⏱️ Time Expansion Patterns:
   15 min isochrones: 1,824 total, 64.7% urban, 35.1% rural, avg 225.8 km²
   30 min isochrones: 1,824 total, 47.5% urban, 52.5% rural, avg 1732.9 km²
   45 min isochrones: 1,824 total, 35.6% urban, 64.4% rural, avg 5583.5 km²
   60 min isochrones: 1,824 total, 32.8% urban, 66.9% rural, avg 12403.9 km²

🏥 Facility Type Comparison:
   Stroke Units:
     1,396 isochrones (50.9% urban, 49.0% rural)
     Total coverage: 7,127,988 km²
   CT Hospitals:
     5,900 isochrones (43.8% urban, 56.1% rural)
     Total coverage: 29,253,859 km²

📊 Coverage Insights:

🌾 Highest Rural Coverage Combinations:
   CT Hospitals - 60 min: 68.0% rural
   C

  max_rural_coverage = isochrones_annotated.groupby(['facility_type', 'time_bin']).apply(


## Isochrone Analysis Summary

### ✅ What We've Accomplished

This extended analysis covers **both benefit points and isochrone coverage areas**, giving a fuller picture of urban/rural healthcare accessibility:

#### **1. Benefit Point Analysis** 
- 🎯 **Focused analysis** - Areas where CT hospitals provide significant advantages
- 📍 **Point-based** - Specific locations with calculated time benefits
- 🏆 **Decision-focused** - Shows where CT expansion would be most beneficial

#### **2. Isochrone Coverage Analysis**
- 🗺️ **Comprehensive coverage** - All areas reachable by each facility within time thresholds  
- 🔄 **Time progression** - How accessibility expands from 15→30→45→60 minutes
- 📊 **Facility comparison** - Direct comparison between stroke units and CT hospitals
- 🏙️ **Urban/rural patterns** - Geographic distribution of healthcare accessibility
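
To put numbers on the time progression, the sketch below computes the growth factor between consecutive time bins from the average areas printed in the "Time Expansion Patterns" output of this run. The dictionary is typed in by hand from that output, and the variable names are illustrative rather than part of `geostroke`.

```python
# Average isochrone area (km²) per time bin, copied from this run's output.
avg_area_km2 = {15: 225.8, 30: 1732.9, 45: 5583.5, 60: 12403.9}

# Ratio of each bin's average area to the previous bin's average area.
bins = sorted(avg_area_km2)
growth = {
    f"{a}->{b}": avg_area_km2[b] / avg_area_km2[a]
    for a, b in zip(bins, bins[1:])
}

for step, factor in growth.items():
    print(f"{step} min: average area grows x{factor:.2f}")
```

Each step still adds area, but the growth factor shrinks from one step to the next.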

### 📊 Export Files Generated

#### **Benefit Analysis Exports:**
- `benefit_urban_rural_summary.csv` - Benefit categories by urban/rural classification

#### **Isochrone Analysis Exports:**
- `isochrone_urban_rural_detailed.csv` - Full breakdown by time bin and facility type  
- `isochrone_urban_rural_by_facility.csv` - Summary by facility type only
- `isochrone_urban_rural_by_time.csv` - Summary by time bin only
- `isochrone_urban_rural_area_weighted.csv` - Area-weighted coverage analysis
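
The difference between a count-based and an area-weighted breakdown can be sketched on a toy table. The column names (`facility_type`, `urban_rural`, `area_km2`) follow the analysis code above, but the values are made up and the actual layout of the exported CSV files is not assumed here.

```python
import pandas as pd

# Toy stand-in for isochrones_annotated; all values are invented for illustration.
toy = pd.DataFrame({
    "facility_type": ["Stroke Units", "Stroke Units", "CT Hospitals", "CT Hospitals"],
    "urban_rural": ["urban", "rural", "urban", "rural"],
    "area_km2": [100.0, 300.0, 50.0, 150.0],
})

# Count-based share: every isochrone counts once, regardless of size.
count_share = (
    toy.groupby("facility_type")["urban_rural"]
    .value_counts(normalize=True)
    .mul(100)
    .rename("pct_of_isochrones")
)

# Area-weighted share: each isochrone contributes its area in km².
area_by_class = toy.groupby(["facility_type", "urban_rural"])["area_km2"].sum()
area_share = (
    area_by_class
    / area_by_class.groupby(level="facility_type").transform("sum")
    * 100
).rename("pct_of_area")

print(count_share)
print(area_share)
```

In this toy case the count-based split is even, while the area-weighted split leans rural because the rural isochrones are larger.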

### 🎯 Key Applications

#### **Policy & Planning:**
- **Healthcare equity** - Identify urban vs rural access disparities
- **Resource allocation** - Prioritize areas for CT hospital expansion
- **Emergency response** - Understand coverage gaps and response times
- **Infrastructure planning** - Compare different facility deployment strategies

#### **Research Applications:**
- **Accessibility modeling** - Quantify healthcare access patterns  
- **Geographic analysis** - Spatial patterns of healthcare coverage
- **Comparative studies** - Urban vs rural healthcare infrastructure effectiveness
- **Time-distance analysis** - Relationship between travel-time threshold and area covered

### 🔍 Typical Result Patterns

Your results will typically show:

#### **Benefit Analysis Patterns:**
- 🌾 **Rural areas** - More likely to benefit significantly from CT hospitals (longer distances to stroke units)
- 🏙️ **Urban areas** - More equivalent access (multiple nearby options)
- 🏜️ **Healthcare deserts** - Predominantly rural areas with limited access

#### **Isochrone Coverage Patterns:**
- 📈 **Coverage expansion** - The rural share of isochrones grows with travel time (35.1% at 15 min, 66.9% at 60 min in the run above)
- 🏘️ **Urban efficiency** - Higher facility density leads to overlapping coverage
- ⚖️ **Access equity** - Rural facilities serve larger geographic areas but presumably fewer people; confirming this needs the population weighting listed under Next Steps
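
Overlapping coverage also affects how the area totals should be read: the "Total coverage" figures in the Facility Type Comparison are sums of `area_km2`, so land covered by several isochrones is counted several times. A minimal sketch with two hypothetical square coverage areas (coordinates in km, not real isochrones) shows the gap between summed area and the area of the union:

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two hypothetical 10 km x 10 km coverage squares that overlap by half.
iso_a = box(0, 0, 10, 10)
iso_b = box(5, 0, 15, 10)

summed_area = iso_a.area + iso_b.area          # overlap counted twice
union_area = unary_union([iso_a, iso_b]).area  # overlap counted once
overlap_factor = summed_area / union_area

print(f"Summed: {summed_area:.0f} km², union: {union_area:.0f} km², "
      f"factor: {overlap_factor:.2f}")
```

For real isochrones, the same idea would apply to the geometries of one facility type and time bin in a projected CRS; whether `geostroke` already provides such a union step is not assumed here.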

### 🚀 Next Steps

This comprehensive analysis provides the foundation for:
1. **Detailed mapping** - Create publication-ready maps showing coverage patterns
2. **Population analysis** - Weight results by population density for impact assessment  
3. **Scenario modeling** - Test different facility placement strategies
4. **Policy recommendations** - Evidence-based healthcare infrastructure planning

You now have both the **micro-level** (benefit points) and **macro-level** (coverage areas) perspectives on healthcare accessibility across urban and rural Germany!
