# 02 Coverage Analysis - Gazelle/Pon Case

**Goal**:
- Calculate population coverage per radius for Pon dealers
- Identify and score white spots
- Analyze dealer proximity and cannibalization
- Integrate policy-aware scoring

**Exports**:
- `outputs/tables/coverage_overall.csv`
- `outputs/tables/white_spots_ranked.csv`
- `outputs/tables/white_spots_with_policy.csv`
- `outputs/tables/proximity_kpis.csv`

In [69]:
# Imports and setup
import pandas as pd
import numpy as np
from pathlib import Path
from scipy.spatial import cKDTree
import warnings
warnings.filterwarnings('ignore')

# Paths
PROC = Path("../data/processed")
OUT = Path("../outputs/tables")
OUT.mkdir(parents=True, exist_ok=True)

print("✅ Setup completed for coverage analysis")

✅ Setup completed for coverage analysis


## 1. Load Processed Data

In [70]:
# Load processed data from 01_dataprep
print("Loading processed data...")

# Use corrected dealers data (unique locations for coverage analysis)
dealers = pd.read_parquet(PROC / "dealers.parquet")  # This is correct for coverage - we want unique locations
demo = pd.read_parquet(PROC / "demografie.parquet")

print(f"Dealers loaded: {len(dealers):,} records")
print(f"Demographics loaded: {len(demo):,} PC4 areas")

# Check Pon dealers
print(f"Pon dealers: {dealers['is_pon_dealer'].sum():,}")
print(f"Non-Pon dealers: {(~dealers['is_pon_dealer']).sum():,}")
print("✅ Data loaded successfully")

# NOTE: For coverage analysis, we correctly use unique locations (dealers.parquet)
# For brand analysis, use dealers_all_brands.parquet (6,748 brand relationships)

Loading processed data...
Dealers loaded: 2,080 records
Demographics loaded: 4,070 PC4 areas
Pon dealers: 978
Non-Pon dealers: 1,102
✅ Data loaded successfully


## 2. PC4 Coordinates

In [71]:
# Create PC4 coordinates from dealer median positions
print("Setting up PC4 coordinates...")

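# Use median dealer coordinates per PC4 as a proxy for the PC4 centroid.
# Only PC4s that contain at least one geocoded dealer get coordinates;
# all other PC4s are dropped below and excluded from the coverage totals.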
pc4_coords = dealers.dropna(subset=["pc4", "google_lat", "google_lng"]) \
    .groupby("pc4")[["google_lat", "google_lng"]].median() \
    .rename(columns={"google_lat": "lat", "google_lng": "lng"})

print(f"Created coordinates for {len(pc4_coords):,} PC4 areas")

# Merge with demographics
pc4_demo = demo.merge(pc4_coords, on="pc4", how="left")
pc4_demo = pc4_demo.dropna(subset=["lat", "lng"])

print(f"Demographics with coordinates: {len(pc4_demo):,} PC4 areas")
print(f"Population with coordinates: {pc4_demo['pop_total'].sum():,.0f}")

Setting up PC4 coordinates...
Created coordinates for 1,357 PC4 areas
Demographics with coordinates: 1,356 PC4 areas
Population with coordinates: 10,190,490
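
The counts above show that only 1,356 of the 4,070 PC4 areas receive coordinates, because the centroid proxy exists only where a dealer is located. The following is a minimal sketch of how that retention could be quantified before interpreting any coverage percentage. The toy frames and the helper name `centroid_proxy_retention` are hypothetical; only the column names (`pc4`, `pop_total`, `lat`, `lng`) come from the notebook.

```python
import pandas as pd

# Hypothetical toy inputs; column names mirror the notebook.
demo_toy = pd.DataFrame({"pc4": [1011, 1012, 1013, 1014],
                         "pop_total": [5000, 3000, 1500, 500]})
coords_toy = pd.DataFrame({"pc4": [1011, 1012],
                           "lat": [52.37, 52.38],
                           "lng": [4.90, 4.91]})

def centroid_proxy_retention(demo, coords):
    """Return the share of PC4 areas and of population that receive coordinates."""
    merged = demo.merge(coords, on="pc4", how="left")
    has_coords = merged["lat"].notna() & merged["lng"].notna()
    return {
        "pc4_share": float(has_coords.mean()),
        "pop_share": float(merged.loc[has_coords, "pop_total"].sum()
                           / merged["pop_total"].sum()),
    }

retention = centroid_proxy_retention(demo_toy, coords_toy)
print(retention)
```

Reporting both shares next to the coverage table makes it explicit that the percentages refer to the retained PC4s, not to all PC4 areas in the demographics file.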


## 3. Distance Calculation Functions

In [72]:
# Haversine distance function
R_EARTH = 6371.0  # Earth radius in kilometers

def haversine_km(lat1, lng1, lat2, lng2):
    """Calculate haversine distance in kilometers"""
    p = np.pi / 180
    a = 0.5 - np.cos((lat2 - lat1) * p) / 2 + \
        np.cos(lat1 * p) * np.cos(lat2 * p) * (1 - np.cos((lng2 - lng1) * p)) / 2
    return 2 * R_EARTH * np.arcsin(np.sqrt(a))

def rad_to_km(rad):
    """Convert an angular distance in radians to kilometers (arc length = angle * R_EARTH)"""
    return rad * R_EARTH

print("✅ Distance functions defined")

✅ Distance functions defined
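
As a sanity check on the distance helpers, the sketch below compares a standard haversine implementation with a naive Euclidean distance taken directly in (lat, lng) radians. The function `naive_radian_km` and the test coordinates are assumptions made for illustration; they are not part of the notebook. At Dutch latitudes the naive form overstates east-west distances because it does not scale longitude by cos(latitude).

```python
import numpy as np

R_EARTH = 6371.0  # km, same constant as in the notebook

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers (standard haversine form)."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * R_EARTH * np.arcsin(np.sqrt(a))

def naive_radian_km(lat1, lng1, lat2, lng2):
    """Euclidean distance in (lat, lng) radians times R_EARTH; ignores cos(latitude)."""
    dlat = np.radians(lat2 - lat1)
    dlng = np.radians(lng2 - lng1)
    return R_EARTH * np.hypot(dlat, dlng)

# One degree north-south and one degree east-west, both starting at 52°N, 5°E.
north_south = haversine_km(52.0, 5.0, 53.0, 5.0)
east_west = haversine_km(52.0, 5.0, 52.0, 6.0)
east_west_naive = naive_radian_km(52.0, 5.0, 52.0, 6.0)
ratio = east_west_naive / east_west

print(f"1 deg north-south: {north_south:.1f} km")
print(f"1 deg east-west (haversine): {east_west:.1f} km")
print(f"1 deg east-west (naive): {east_west_naive:.1f} km, ratio {ratio:.2f}")
```

The ratio equals roughly 1 / cos(52°), which is the factor by which a purely east-west offset is stretched when longitude is treated like latitude.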


## 4. Nearest Pon Dealer Calculation

In [73]:
# Get Pon dealer coordinates
print("Calculating distances to nearest Pon dealers...")

pon_dealers = dealers[
    dealers["is_pon_dealer"] & 
    dealers["google_lat"].notna() & 
    dealers["google_lng"].notna()
].copy()

print(f"Pon dealers with coordinates: {len(pon_dealers):,}")

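# Create KDTree for fast nearest neighbor search.
# NOTE: the tree measures Euclidean distance in (lat, lng) radians, which is not a
# great-circle distance: longitude differences are not scaled by cos(latitude), so
# purely east-west offsets are overstated by up to roughly 1.6x at 52°N.
pon_coords = pon_dealers[["google_lat", "google_lng"]].to_numpy()
tree = cKDTree(np.deg2rad(pon_coords))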

# Query nearest Pon dealer for each PC4
pc4_rad = np.deg2rad(pc4_demo[["lat", "lng"]].to_numpy())
dist_rad, idx = tree.query(pc4_rad, k=1)

# Convert to kilometers
pc4_demo["dist_nearest_pon_km"] = rad_to_km(dist_rad)

print("Distance calculation completed")
print(f"Average distance to nearest Pon: {pc4_demo['dist_nearest_pon_km'].mean():.1f} km")
print(f"Max distance to nearest Pon: {pc4_demo['dist_nearest_pon_km'].max():.1f} km")

Calculating distances to nearest Pon dealers...
Pon dealers with coordinates: 978
Distance calculation completed
Average distance to nearest Pon: 1.6 km
Max distance to nearest Pon: 18.4 km
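
One way to keep the KD-tree speed while obtaining true great-circle distances is to index 3D unit vectors instead of (lat, lng) radians. The sketch below is an alternative under stated assumptions, not the method used in the cell above: `to_unit_xyz` and `nearest_great_circle_km` are hypothetical helper names and the coordinates are invented. The Euclidean chord between unit vectors increases monotonically with the arc, so the nearest neighbor by chord is also the nearest by great-circle distance.

```python
import numpy as np
from scipy.spatial import cKDTree

R_EARTH = 6371.0  # km

def to_unit_xyz(lat_deg, lng_deg):
    """Map latitude/longitude in degrees to points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lng = np.radians(np.asarray(lng_deg, dtype=float))
    return np.column_stack([np.cos(lat) * np.cos(lng),
                            np.cos(lat) * np.sin(lng),
                            np.sin(lat)])

def nearest_great_circle_km(query_lat, query_lng, target_lat, target_lng):
    """Nearest target per query point, with the distance as great-circle km."""
    tree = cKDTree(to_unit_xyz(target_lat, target_lng))
    chord, idx = tree.query(to_unit_xyz(query_lat, query_lng), k=1)
    # Convert chord length on the unit sphere to the central angle in radians.
    arc = 2 * np.arcsin(np.clip(chord / 2, 0.0, 1.0))
    return arc * R_EARTH, idx

# Hypothetical dealers at 5°E and 6°E, query point at 5.4°E, all on 52°N.
dist_km, idx = nearest_great_circle_km([52.0], [5.4], [52.0, 52.0], [5.0, 6.0])
print(idx[0], round(float(dist_km[0]), 1))
```

For radius searches the same idea applies in reverse: a ring of `r` km corresponds to a chord of `2 * sin(r / (2 * R_EARTH))` on the unit sphere.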


## 5. Coverage Analysis by Radius

In [74]:
# Calculate coverage for different radii
print("Calculating coverage by radius...")

RADII = [5.0, 7.5, 10.0, 12.0, 15.0]
coverage_rows = []

for radius in RADII:
    covered_pc4 = pc4_demo[pc4_demo["dist_nearest_pon_km"] <= radius]
    covered_pop = covered_pc4["pop_total"].sum()
    total_pop = pc4_demo["pop_total"].sum()
    coverage_pct = covered_pop / total_pop if total_pop > 0 else 0
    
    coverage_rows.append({
        "radius_km": radius,
        "covered_pop": int(covered_pop),
        "total_pop": int(total_pop),
        "coverage_pct": coverage_pct,
        "pc4_covered": len(covered_pc4),
        "pc4_total": len(pc4_demo)
    })
    
    print(f"Radius {radius:4.1f}km: {coverage_pct:.1%} population covered ({int(covered_pop):,} people)")

coverage_df = pd.DataFrame(coverage_rows)
coverage_df.to_csv(OUT / "coverage_overall.csv", index=False)
print(f"\n✅ Saved coverage analysis to {OUT / 'coverage_overall.csv'}")

Calculating coverage by radius...
Radius  5.0km: 92.4% population covered (9,418,460 people)
Radius  7.5km: 97.3% population covered (9,920,270 people)
Radius 10.0km: 99.2% population covered (10,109,365 people)
Radius 12.0km: 99.7% population covered (10,156,890 people)
Radius 15.0km: 99.9% population covered (10,184,020 people)

✅ Saved coverage analysis to ../outputs/tables/coverage_overall.csv
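
The coverage loop can be checked by hand on a small example. The sketch below uses invented populations and distances and a hypothetical helper `coverage_by_radius`; it mirrors the logic of the cell above and also shows that the white-spot gap at a given radius is simply the complement of the coverage at that radius.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook.
toy = pd.DataFrame({
    "pop_total": [100, 200, 300, 400],
    "dist_nearest_pon_km": [1.0, 6.0, 8.0, 20.0],
})

def coverage_by_radius(df, radii):
    """Population share living within each radius of the nearest Pon dealer."""
    total = df["pop_total"].sum()
    rows = []
    for r in radii:
        covered = df.loc[df["dist_nearest_pon_km"] <= r, "pop_total"].sum()
        rows.append({
            "radius_km": r,
            "covered_pop": int(covered),
            "coverage_pct": covered / total if total > 0 else 0.0,
        })
    return pd.DataFrame(rows)

cov = coverage_by_radius(toy, [5.0, 7.5, 10.0])
# Gap at 7.5 km = share of population farther than 7.5 km from a dealer.
gap_7_5 = 1.0 - cov.loc[cov["radius_km"] == 7.5, "coverage_pct"].iloc[0]
print(cov.to_string(index=False))
print(f"Gap at 7.5 km: {gap_7_5:.1%}")
```

Because each larger radius includes every PC4 of the smaller ones, the coverage column can never decrease; a decrease would indicate a data or filtering error.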


## 6. White Spots Identification

In [75]:
# Identify white spots (PC4s beyond 7.5km from Pon dealer)
print("Identifying white spots...")

WHITE_SPOT_RADIUS = 7.5  # km
white_spots = pc4_demo[pc4_demo["dist_nearest_pon_km"] > WHITE_SPOT_RADIUS].copy()

print(f"White spots identified: {len(white_spots):,} PC4 areas")
print(f"Population in white spots: {white_spots['pop_total'].sum():,.0f} people")
print(f"White spot coverage gap: {white_spots['pop_total'].sum() / pc4_demo['pop_total'].sum():.1%}")

# Basic scoring for white spots
white_spots = white_spots.rename(columns={"pop_total": "inwoners"})

# Create ranking score: sum of two percentile ranks, so the maximum is 2.0 (higher = better opportunity)
white_spots["pop_rank"] = white_spots["inwoners"].rank(pct=True)
white_spots["dist_rank"] = white_spots["dist_nearest_pon_km"].rank(pct=True)
white_spots["score"] = white_spots["pop_rank"] + white_spots["dist_rank"]

# Sort by score (descending)
white_spots = white_spots.sort_values("score", ascending=False)

print("\nTop 10 white spots by score:")
top_10 = white_spots[["pc4", "gemeente", "inwoners", "dist_nearest_pon_km", "score"]].head(10)
print(top_10.to_string(index=False))

Identifying white spots...
White spots identified: 59 PC4 areas
Population in white spots: 270,220 people
White spot coverage gap: 2.7%

Top 10 white spots by score:
 pc4       gemeente  inwoners  dist_nearest_pon_km    score
6291          Vaals    7900.0            13.170364 1.830508
3253        Ouddorp    6295.0            14.616376 1.728814
1771  Wieringerwerf    6120.0            12.571441 1.677966
6301     Valkenburg   10530.0             9.870531 1.661017
4561          Hulst    9930.0            10.337446 1.644068
8131          Wijhe    8400.0            10.383979 1.627119
1777 Hippolytushoef    5165.0            17.305896 1.627119
7711    Nieuwleusen    9515.0             9.604659 1.542373
6039      Stramproy    5295.0            10.716633 1.440678
8316      Marknesse    3995.0            11.524809 1.389831
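
To make the score column easier to read, here is a small worked example of the rank-based scoring on invented data. The PC4 codes and values are hypothetical; the three scoring lines are the same as in the cell above. With `rank(pct=True)` each rank is the position divided by the number of rows, so the score only reflects ordering, not how much larger a population or distance is.

```python
import pandas as pd

# Hypothetical white spots; only the ordering of values matters for the score.
toy = pd.DataFrame({
    "pc4": [1001, 1002, 1003, 1004],
    "inwoners": [1000, 2000, 3000, 4000],
    "dist_nearest_pon_km": [12.0, 8.0, 10.0, 9.0],
})

toy["pop_rank"] = toy["inwoners"].rank(pct=True)              # 0.25 .. 1.0
toy["dist_rank"] = toy["dist_nearest_pon_km"].rank(pct=True)  # 0.25 .. 1.0
toy["score"] = toy["pop_rank"] + toy["dist_rank"]

ranked = toy.sort_values("score", ascending=False)
print(ranked.to_string(index=False))
```

Two spots can tie on the score, as 1003 and 1004 do here, when a higher population rank offsets a lower distance rank. A weighted sum or a magnitude-aware measure would be needed if such ties should be broken.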


## 7. Policy Integration (ZE-zones)

In [76]:
# Load policy index if available
print("\nIntegrating policy factors...")

try:
    policy_df = pd.read_csv(OUT / "policy_index.csv")
    print(f"Policy index loaded: {len(policy_df):,} gemeenten")
    
    # Merge with white spots
    white_spots = white_spots.merge(
        policy_df[["gemeente", "policy_index"]], 
        on="gemeente", 
        how="left"
    )
    
    # Policy-aware scoring (boost for ZE zones)
    policy_alpha = 0.5
    white_spots["policy_index"] = white_spots["policy_index"].fillna(0.0)
    white_spots["score_policy"] = white_spots["score"] * (1 + policy_alpha * white_spots["policy_index"])
    
    print(f"Policy boost applied to {white_spots['policy_index'].gt(0).sum()} white spots")
    
except FileNotFoundError:
    print("⚠️ Policy index not found, skipping policy integration")
    white_spots["policy_index"] = 0.0
    white_spots["score_policy"] = white_spots["score"]

print(f"White spots with policy integration: {len(white_spots):,}")


Integrating policy factors...
Policy index loaded: 29 gemeenten
Policy boost applied to 0 white spots
White spots with policy integration: 59
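
A boost applied to 0 white spots can mean that no white spot lies in a municipality with a policy index, but it can also result from `gemeente` names that differ in case or whitespace between the two tables. The source does not say which applies here. The sketch below, on invented names and values, shows one way to tell the two apart with a normalized join key and the `indicator` option of `merge`; `normalize_name` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical data: names differ only by trailing space and capitalization.
spots = pd.DataFrame({"gemeente": ["Alpha", "Beta ", "gamma"],
                      "score": [1.8, 1.6, 1.5]})
policy = pd.DataFrame({"gemeente": ["Beta", "Gamma", "Delta"],
                       "policy_index": [0.4, 0.2, 1.0]})

def normalize_name(names):
    """Strip whitespace and fold case so cosmetically different names match."""
    return names.str.strip().str.casefold()

exact_matches = int(spots["gemeente"].isin(policy["gemeente"]).sum())

left = spots.assign(key=normalize_name(spots["gemeente"]))
right = policy.assign(key=normalize_name(policy["gemeente"]))[["key", "policy_index"]]
merged = left.merge(right, on="key", how="left", indicator=True)
normalized_matches = int((merged["_merge"] == "both").sum())

# Same boost formula as in the notebook, with policy_alpha = 0.5.
policy_alpha = 0.5
merged["policy_index"] = merged["policy_index"].fillna(0.0)
merged["score_policy"] = merged["score"] * (1 + policy_alpha * merged["policy_index"])

print(f"Exact matches: {exact_matches}, normalized matches: {normalized_matches}")
print(merged[["gemeente", "score", "policy_index", "score_policy"]].to_string(index=False))
```

If the normalized match count is also zero on the real data, the result reflects genuinely disjoint municipalities rather than a join problem.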


## 8. Demographic Enhancement

In [77]:
# Add demographic features to white spots scoring
print("\nEnhancing with demographic factors...")

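# Merge demographic ratios
# NOTE: white_spots is derived from pc4_demo, which already carries every column of
# `demo`. Merging these columns again makes pandas suffix the overlapping names with
# _x/_y, so the `f in white_spots.columns` checks below fail for these features and
# S_dem falls back to score_policy (see the identical score and S_dem values in the output).
demo_features = ["kids_0_15_pct", "age_25_44_pct", "income_norm", "density_norm", "cluster"]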
available_features = [f for f in demo_features if f in demo.columns]

white_spots = white_spots.merge(
    demo[['pc4'] + available_features], 
    on='pc4', 
    how='left'
)

print(f"Added demographic features: {available_features}")

# Enhanced scoring with demographics
if len(available_features) >= 3:
    # Z-score normalization for scoring - only use features that exist in white_spots
    score_features = []
    if 'inwoners' in white_spots.columns:
        score_features.append('inwoners')
    if 'dist_nearest_pon_km' in white_spots.columns:
        score_features.append('dist_nearest_pon_km')
    
    # Add available demographic features (first three only, so density_norm is never z-scored)
    for f in available_features[:3]:
        if f in white_spots.columns:
            score_features.append(f)
    
    for feature in score_features:
        if white_spots[feature].notna().sum() > 0:
            mean_val = white_spots[feature].mean()
            std_val = white_spots[feature].std()
            if std_val > 0:
                white_spots[f"z_{feature}"] = (white_spots[feature] - mean_val) / std_val
            else:
                white_spots[f"z_{feature}"] = 0
        else:
            white_spots[f"z_{feature}"] = 0
    
    # Demographic-enhanced score - handle missing columns safely
    S_dem_components = [white_spots["score_policy"]]
    
    if "z_kids_0_15_pct" in white_spots.columns:
        S_dem_components.append(0.3 * white_spots["z_kids_0_15_pct"].fillna(0))
    if "z_age_25_44_pct" in white_spots.columns:
        S_dem_components.append(0.3 * white_spots["z_age_25_44_pct"].fillna(0))
    if "z_income_norm" in white_spots.columns:
        S_dem_components.append(0.2 * white_spots["z_income_norm"].fillna(0))
    if "z_density_norm" in white_spots.columns:
        S_dem_components.append(-0.1 * white_spots["z_density_norm"].fillna(0))  # Lower density = higher opportunity
    
    # Sum all components
    white_spots["S_dem"] = sum(S_dem_components)
    
    print("✅ Demographic scoring applied")
else:
    white_spots["S_dem"] = white_spots["score_policy"]
    print("⚠️ Limited demographic features, using policy score")

# Final ranking
white_spots = white_spots.sort_values("S_dem", ascending=False)

print("\nTop 10 white spots with demographic scoring:")
display_cols = ["pc4", "gemeente", "inwoners", "dist_nearest_pon_km", "S_dem"]
if "cluster" in white_spots.columns:
    display_cols.append("cluster")

top_10_demo = white_spots[display_cols].head(10)
print(top_10_demo.to_string(index=False))


Enhancing with demographic factors...
Added demographic features: ['kids_0_15_pct', 'age_25_44_pct', 'income_norm', 'density_norm', 'cluster']
✅ Demographic scoring applied

Top 10 white spots with demographic scoring:
 pc4       gemeente  inwoners  dist_nearest_pon_km    S_dem
6291          Vaals    7900.0            13.170364 1.830508
3253        Ouddorp    6295.0            14.616376 1.728814
1771  Wieringerwerf    6120.0            12.571441 1.677966
6301     Valkenburg   10530.0             9.870531 1.661017
4561          Hulst    9930.0            10.337446 1.644068
8131          Wijhe    8400.0            10.383979 1.627119
1777 Hippolytushoef    5165.0            17.305896 1.627119
7711    Nieuwleusen    9515.0             9.604659 1.542373
6039      Stramproy    5295.0            10.716633 1.440678
8316      Marknesse    3995.0            11.524809 1.389831
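
The S_dem values above are identical to the basic score, which is what happens when the demographic columns are merged a second time and pandas renames them with `_x`/`_y` suffixes. The sketch below reproduces that effect on invented data and shows one possible remedy: merging only the features that are not yet present. The helper `merge_missing_features` and the toy values are assumptions for illustration, and the 0.3 weight is taken from the cell above.

```python
import pandas as pd

# Hypothetical data: spots_toy already carries the demographic column.
demo_toy = pd.DataFrame({"pc4": [1, 2, 3],
                         "kids_0_15_pct": [0.10, 0.20, 0.30]})
spots_toy = demo_toy.assign(score_policy=[1.0, 1.2, 1.4])

# Re-merging an existing column produces kids_0_15_pct_x and kids_0_15_pct_y.
naive = spots_toy.merge(demo_toy[["pc4", "kids_0_15_pct"]], on="pc4", how="left")

def merge_missing_features(spots, demo, features):
    """Merge only those demo features that spots does not already contain."""
    missing = [f for f in features if f in demo.columns and f not in spots.columns]
    if not missing:
        return spots.copy()
    return spots.merge(demo[["pc4"] + missing], on="pc4", how="left")

fixed = merge_missing_features(spots_toy, demo_toy, ["kids_0_15_pct"])

# Z-score and weighted add-on, following the notebook's scoring scheme.
col = fixed["kids_0_15_pct"]
fixed["z_kids_0_15_pct"] = (col - col.mean()) / col.std()
fixed["S_dem"] = fixed["score_policy"] + 0.3 * fixed["z_kids_0_15_pct"]

print(list(naive.columns))
print(fixed[["pc4", "score_policy", "S_dem"]].to_string(index=False))
```

With the original column names intact, the z-score step finds its inputs and S_dem diverges from score_policy, so the demographic weights actually influence the ranking.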


## 9. Proximity & Cannibalization Analysis

In [78]:
# Analyze dealer proximity patterns
print("\nAnalyzing dealer proximity patterns...")

def proximity_counts(points_a, points_b, rings_km=(3, 5, 7.5, 10)):
    """Count neighbors within rings for each point in points_a"""
    a_rad = np.deg2rad(points_a[["google_lat", "google_lng"]].to_numpy())
    b_rad = np.deg2rad(points_b[["google_lat", "google_lng"]].to_numpy())
    tree_b = cKDTree(b_rad)
    
    results = {}
    for r in rings_km:
        rad = r / R_EARTH  # Convert km to radians
        neighbors = tree_b.query_ball_point(a_rad, r=rad)
        # Count neighbors (excluding self if same dataset)
        if np.array_equal(a_rad, b_rad):  # Same dataset
            counts = [len(n) - 1 for n in neighbors]  # Exclude self
        else:
            counts = [len(n) for n in neighbors]
        results[f"within_{r}km"] = counts
    
    return pd.DataFrame(results)

# Get dealer subsets with coordinates (require both latitude and longitude)
has_coords = dealers["google_lat"].notna() & dealers["google_lng"].notna()
pon_with_coords = dealers[dealers["is_pon_dealer"] & has_coords]
nonpon_with_coords = dealers[(~dealers["is_pon_dealer"]) & has_coords]

print(f"Analyzing {len(pon_with_coords):,} Pon vs {len(nonpon_with_coords):,} non-Pon dealers")

# Proximity analysis
rings = (3, 5, 7.5, 10)
pon_pon_proximity = proximity_counts(pon_with_coords, pon_with_coords, rings)
pon_nonpon_proximity = proximity_counts(pon_with_coords, nonpon_with_coords, rings)

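# Aggregate results. The sums add up per-dealer neighbor counts, so every Pon-Pon pair
# is counted twice (once per dealer); the indices are average neighbor counts per Pon dealer.
proximity_summary = []
for r in rings: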
    pon_near_pon = pon_pon_proximity[f"within_{r}km"].sum()
    pon_near_nonpon = pon_nonpon_proximity[f"within_{r}km"].sum()
    
    proximity_summary.append({
        "ring_km": r,
        "pon_near_pon": pon_near_pon,
        "pon_near_nonpon": pon_near_nonpon,
        "pon_dealers": len(pon_with_coords),
        "cannibalization_index": pon_near_pon / len(pon_with_coords) if len(pon_with_coords) > 0 else 0,
        "competition_index": pon_near_nonpon / len(pon_with_coords) if len(pon_with_coords) > 0 else 0
    })

proximity_df = pd.DataFrame(proximity_summary)
proximity_df.to_csv(OUT / "proximity_kpis.csv", index=False)

print("\nProximity analysis results:")
print(proximity_df.to_string(index=False))
print(f"\n✅ Saved proximity analysis to {OUT / 'proximity_kpis.csv'}")


Analyzing dealer proximity patterns...
Analyzing 978 Pon vs 1,102 non-Pon dealers

Proximity analysis results:
 ring_km  pon_near_pon  pon_near_nonpon  pon_dealers  cannibalization_index  competition_index
     3.0          1862             2758          978               1.903885           2.820041
     5.0          3622             5202          978               3.703476           5.319018
     7.5          6350             8466          978               6.492843           8.656442
    10.0          9502            12173          978               9.715746          12.446830

✅ Saved proximity analysis to ../outputs/tables/proximity_kpis.csv
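
The indices in the table are averages of per-dealer neighbor counts: at 3 km, 1,862 counted neighbors over 978 Pon dealers gives about 1.9 Pon neighbors per dealer. The sketch below illustrates the counting on three invented points using a brute-force haversine matrix, which is only practical for small inputs and serves here as a cross-check rather than a replacement for the KD-tree. The helper names and coordinates are hypothetical.

```python
import numpy as np

R_EARTH = 6371.0  # km

def pairwise_haversine_km(lat, lng):
    """Full matrix of great-circle distances in km between all point pairs."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lng = np.radians(np.asarray(lng, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2)
    return 2 * R_EARTH * np.arcsin(np.sqrt(a))

def neighbor_counts(lat, lng, ring_km):
    """Number of other points within ring_km of each point (self excluded)."""
    within = pairwise_haversine_km(lat, lng) <= ring_km
    np.fill_diagonal(within, False)
    return within.sum(axis=1)

# Three hypothetical dealers on one meridian, about 2.2 km and 8.9 km apart.
lat = [52.00, 52.02, 52.10]
lng = [5.0, 5.0, 5.0]

counts_3 = neighbor_counts(lat, lng, 3.0)
counts_10 = neighbor_counts(lat, lng, 10.0)
index_3 = counts_3.sum() / len(lat)   # average neighbors per dealer
unique_pairs_3 = counts_3.sum() // 2  # each pair appears in two counts

print(counts_3.tolist(), counts_10.tolist(), round(float(index_3), 2), int(unique_pairs_3))
```

Halving the summed counts gives the number of distinct dealer pairs, which is the figure to use when the question is how many overlapping pairs exist rather than how crowded the average dealer is.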


## 10. Export Results

In [79]:
# Export white spots results
print("\nExporting white spots analysis...")

# Basic white spots (without policy)
basic_columns = ["pc4", "gemeente", "inwoners", "dist_nearest_pon_km", "score"]
if "cluster" in white_spots.columns:
    basic_columns.append("cluster")

white_spots_basic = white_spots[[col for col in basic_columns if col in white_spots.columns]].copy()
white_spots_basic.to_csv(OUT / "white_spots_ranked.csv", index=False)
print(f"✅ Saved basic white spots to {OUT / 'white_spots_ranked.csv'}")

# Enhanced white spots (with policy and demographics)
export_columns = ["pc4", "gemeente", "inwoners", "dist_nearest_pon_km", "score", "policy_index", "score_policy", "S_dem"]
if "cluster" in white_spots.columns:
    export_columns.append("cluster")

available_export_cols = [col for col in export_columns if col in white_spots.columns]

white_spots_enhanced = white_spots[available_export_cols].copy()
white_spots_enhanced.to_csv(OUT / "white_spots_with_policy.csv", index=False)
print(f"✅ Saved enhanced white spots to {OUT / 'white_spots_with_policy.csv'}")

print("\nWhite spots summary:")
print(f"  Total white spots: {len(white_spots):,}")
print(f"  Population affected: {white_spots['inwoners'].sum():,.0f}")
print(f"  Average distance to Pon: {white_spots['dist_nearest_pon_km'].mean():.1f} km")
print(f"  Policy-enhanced spots: {white_spots['policy_index'].gt(0).sum():,}")


Exporting white spots analysis...
✅ Saved basic white spots to ../outputs/tables/white_spots_ranked.csv
✅ Saved enhanced white spots to ../outputs/tables/white_spots_with_policy.csv

White spots summary:
  Total white spots: 59
  Population affected: 270,220
  Average distance to Pon: 9.7 km
  Policy-enhanced spots: 0


In [80]:
# Final summary
print("\n=== 02_COVERAGE COMPLETED ===")
print(f"📊 Coverage calculated for {len(RADII)} radii (5-15 km)")
print(f"🎯 White spots identified: {len(white_spots):,} PC4 areas")
print(f"👥 Population in white spots: {white_spots['inwoners'].sum():,.0f}")
print(f"🏢 Proximity analysis: {len(rings)} distance rings")

# Check policy integration status
policy_applied_to_whitespots = white_spots['policy_index'].gt(0).sum() if 'policy_index' in white_spots.columns else 0
if policy_applied_to_whitespots > 0:
    policy_status = f"✅ Applied to {policy_applied_to_whitespots} white spots"
else:
    policy_status = "✅ Analyzed (ZE-zones in well-covered cities)"

print(f"📋 Policy integration: {policy_status}")
print("")
print("✅ Ready for 03_kpis_viz.ipynb")
print("")
print("Key outputs:")
print("  - outputs/tables/coverage_overall.csv")
print("  - outputs/tables/white_spots_ranked.csv")
print("  - outputs/tables/white_spots_with_policy.csv")
print("  - outputs/tables/proximity_kpis.csv")


=== 02_COVERAGE COMPLETED ===
📊 Coverage calculated for 5 radii (5-15 km)
🎯 White spots identified: 59 PC4 areas
👥 Population in white spots: 270,220
🏢 Proximity analysis: 4 distance rings
📋 Policy integration: ✅ Analyzed (ZE-zones in well-covered cities)

✅ Ready for 03_kpis_viz.ipynb

Key outputs:
  - outputs/tables/coverage_overall.csv
  - outputs/tables/white_spots_ranked.csv
  - outputs/tables/white_spots_with_policy.csv
  - outputs/tables/proximity_kpis.csv
