# 05 - Active City Index

## Goal
Combines all sub-indicators and computes the Active City Index.

## Inputs
- interim + processed indicator files from notebooks 01-04

## Outputs
- data/processed/muc_active_city_index.(csv|geojson|gpkg)

## Execution
- Run from top to bottom (Restart & Run All).
- This notebook is part of the pipeline 00 -> 05.


# 05 – Active City Index for Munich

Goals of this notebook:

- Merge all prepared city-district datasets  
  (population, parks, sports, mobility/public transport + cycle paths)
- Build a unified analysis GeoDataFrame `gdf_active`
- Define and normalize selected indicators
- Compute a first modular Active City Index (MVP) at district level
- Produce first visualizations of the index (histogram, ranking and choropleth map)
- Export the analysis dataset for further steps and reporting

In [None]:
import geopandas as gpd
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np

BASE_DIR = Path("..")
DATA_DIR = BASE_DIR / "data"
INTERIM_DIR = DATA_DIR / "interim"
PROCESSED_DIR = DATA_DIR / "processed"

BASE_DIR, INTERIM_DIR, PROCESSED_DIR


## A | Data integration & index construction

### Loading the data sources

We load the prepared layers:

- `muc_bezirke_bev_clean.geojson` (districts + population + area)
- `muc_bezirke_parks.geojson` (parks per district)
- `muc_bezirke_sport.geojson` (sports facilities per district)
- `muc_bezirke_mobility.geojson` (public transport stops and cycle paths per district)

In [None]:
# Base: districts + population + area
gdf_base = gpd.read_file(INTERIM_DIR / "muc_bezirke_bev_clean.geojson")

# Parks
gdf_parks = gpd.read_file(PROCESSED_DIR / "muc_bezirke_parks.geojson")[[
    "bez_nr",
    "parks_count",
    "parks_area_ha",
    "parks_pro_1000_einw",
    "parks_area_anteil_prozent"
]]

# Sport
gdf_sport = gpd.read_file(PROCESSED_DIR / "muc_bezirke_sport.geojson")[[
    "bez_nr",
    "sports_count",
    "sports_area_ha",
    "sports_pro_1000_einw",
    "sports_area_anteil_prozent"
]]

# Mobility (public transport + cycle paths)
gdf_mob = gpd.read_file(PROCESSED_DIR / "muc_bezirke_mobility.geojson")[[
    "bez_nr",
    "stops_count",
    "stops_pro_1000_einw",
    "radweg_length_m",
    "radweg_km",
    "radweg_km_pro_km2"
]]


In [None]:
gdf_base.info()

In [None]:
gdf_parks.info()

In [None]:
gdf_sport.info()

In [None]:
gdf_mob.info()

### Merging into the analysis dataset `gdf_active`

The individual datasets are merged on the district key `bez_nr`.
NaN values in indicator columns are interpreted as 0 (e.g. districts without parks or sports facilities).


In [None]:
gdf_active = (
    gdf_base
    .merge(gdf_parks, on="bez_nr", how="left")
    .merge(gdf_sport, on="bez_nr", how="left")
    .merge(gdf_mob,   on="bez_nr", how="left")
)

kennzahl_spalten = [
    "parks_count", "parks_area_ha", "parks_pro_1000_einw", "parks_area_anteil_prozent",
    "sports_count", "sports_area_ha", "sports_pro_1000_einw", "sports_area_anteil_prozent",
    "stops_count", "stops_pro_1000_einw",
    "radweg_length_m", "radweg_km", "radweg_km_pro_km2"
]

for col in kennzahl_spalten:
    gdf_active[col] = gdf_active[col].fillna(0)

gdf_active[[
    "bez_nr", "name", "einwohner",
    "parks_pro_1000_einw",
    "sports_pro_1000_einw",
    "stops_pro_1000_einw",
    "radweg_km_pro_km2"
]].head()
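
A left merge keeps districts that have no partner row, and it can duplicate districts if a key occurs more than once on the right side. The following is a minimal sketch of how both cases could be checked with the `validate` and `indicator` options of pandas' `merge`. The toy tables `base` and `parks` are hypothetical stand-ins for `gdf_base` and `gdf_parks`.

```python
import pandas as pd

# Hypothetical toy tables standing in for gdf_base and gdf_parks
base = pd.DataFrame({"bez_nr": [1, 2, 3], "name": ["A", "B", "C"]})
parks = pd.DataFrame({"bez_nr": [1, 2], "parks_count": [4, 7]})

# validate raises an error if bez_nr is not unique on either side;
# indicator adds a "_merge" column that shows where each row came from
merged = base.merge(
    parks, on="bez_nr", how="left", validate="one_to_one", indicator=True
)

# Districts without a match in the parks table
missing = merged.loc[merged["_merge"] == "left_only", "bez_nr"].tolist()
print(missing)

# Only after this check are the gaps filled with 0
merged["parks_count"] = merged["parks_count"].fillna(0)
```

Listing the unmatched keys before calling `fillna(0)` makes it easier to tell a district that truly has no parks from one that is missing because of a key mismatch.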

### Configuration of the index dimensions and indicators

The Active City Index is designed to be modular.  
For each dimension we define a list of indicator columns:

- **Green space & recreation**
- **Sports infrastructure**
- **Mobility (public transport + cycle paths)**

The lists can easily be adjusted or extended later.

In [None]:
# Green space & recreation
GREEN_INDICATORS = [
    "parks_pro_1000_einw",
    "parks_area_anteil_prozent",
]

# Sports infrastructure
SPORT_INDICATORS = [
    "sports_pro_1000_einw",
    "sports_area_anteil_prozent",
]

# Mobility (public transport + cycle paths)
MOBILITY_INDICATORS = [
    "stops_pro_1000_einw",
    "radweg_km_pro_km2",
]

# Collect all indicators in one list (for normalization);
# dict.fromkeys removes duplicates while keeping a stable order
ALL_INDICATORS = list(dict.fromkeys(
    GREEN_INDICATORS + SPORT_INDICATORS + MOBILITY_INDICATORS
))

ALL_INDICATORS

### Index engine with parameterized normalization & weights

In [None]:
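# Normalization function
def normalize_series(x, method="minmax"):
    """Scale a numeric Series with min-max or z-score normalization."""
    if method == "minmax":
        span = x.max() - x.min()
        if span == 0:
            # Constant column: return 0 everywhere instead of dividing by zero
            return x * 0.0
        return (x - x.min()) / span
    elif method == "zscore":
        std = x.std()
        if std == 0:
            return x * 0.0
        return (x - x.mean()) / std
    else:
        raise ValueError(f"Unknown method: {method!r}")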
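
Min-max scaling divides by the range of a column, so a constant indicator (the same value in every district) leads to 0/0, which pandas evaluates to NaN. The sketch below shows one way to guard against this. `minmax_safe` is a hypothetical helper written for this illustration, and mapping a constant column to 0 is an assumed convention.

```python
import pandas as pd

def minmax_safe(x: pd.Series) -> pd.Series:
    """Hypothetical helper: min-max scaling that maps a constant column to 0."""
    span = x.max() - x.min()
    if span == 0:
        # No variation between districts: avoid 0/0 and return zeros
        return pd.Series(0.0, index=x.index)
    return (x - x.min()) / span

varying = minmax_safe(pd.Series([2.0, 4.0, 6.0]))
constant = minmax_safe(pd.Series([3.0, 3.0, 3.0]))

print(varying.tolist())
print(constant.tolist())
```

A constant indicator carries no information for ranking the districts, so setting it to 0 only keeps the later averages free of NaN values.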

In [None]:
# Function to compute the Active City Index
def compute_active_city_index(
    gdf,
    green_ind,
    sport_ind,
    mob_ind,
    weights=None,
    norm_method="minmax"
):
    """Add normalized indicators, sub-indices and the overall index to gdf.

    Note: the columns are written into the passed frame, which is also returned.
    """
    if weights is None:
        weights = {"green": 1/3, "sport": 1/3, "mob": 1/3}

    all_inds = list(dict.fromkeys(green_ind + sport_ind + mob_ind))

    # 1) Normalization
    for col in all_inds:
        col_norm = col + "_norm"
        gdf[col_norm] = normalize_series(gdf[col], method=norm_method)

    # 2) Sub-indices (unweighted mean of the normalized indicators)
    gdf["index_gruen"] = gdf[[c + "_norm" for c in green_ind]].mean(axis=1)
    gdf["index_sport"] = gdf[[c + "_norm" for c in sport_ind]].mean(axis=1)
    gdf["index_mobil"] = gdf[[c + "_norm" for c in mob_ind]].mean(axis=1)

    # 3) Overall index (weighted sum of the sub-indices)
    gdf["active_city_index"] = (
        weights["green"] * gdf["index_gruen"] +
        weights["sport"] * gdf["index_sport"] +
        weights["mob"]   * gdf["index_mobil"]
    )
    return gdf
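
As written, `compute_active_city_index` assigns new columns to the frame it receives and returns that same frame. The following sketch uses a hypothetical stand-in function `add_score` to illustrate the consequence: the input and the result are one object unless a copy is passed in.

```python
import pandas as pd

def add_score(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: writes a new column into the frame it receives."""
    df["score"] = df["a"] + df["b"]
    return df

original = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Passing the frame directly: input and result are the same object
result = add_score(original)
same_object = result is original

# Passing a copy keeps the input untouched
fresh = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
result_copy = add_score(fresh.copy())
untouched = "score" not in fresh.columns

print(same_object, untouched)
```

Passing a copy, as the later robustness cells do with `active_index.copy()`, avoids overwriting columns of a frame that is still in use.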

Best practice according to the OECD:

### Index with equal weights

In [None]:
active_index = compute_active_city_index(
    gdf_active,
    GREEN_INDICATORS,
    SPORT_INDICATORS,
    MOBILITY_INDICATORS,
    weights={"green": 1/3, "sport": 1/3, "mob": 1/3}
)

In [None]:
cols = [
    "bez_nr",
    "name",
    "active_city_index",
    "index_gruen",
    "index_sport",
    "index_mobil"
]

active_index[cols] \
    .sort_values("active_city_index", ascending=False) \
    .reset_index(drop=True) \
    .head(25)   


## B | Evaluation & visualization

### First evaluation of the Active City Index

To put the index into context we look at:

- the distribution of the index values across all city districts (histogram)
- a ranking of the districts (top/bottom districts)

In [None]:
plt.figure(figsize=(5, 3))
active_index["active_city_index"].hist(bins=10)
plt.xlabel("Active City Index (MVP)")
plt.ylabel("Anzahl Bezirke")
plt.title("Verteilung des Active City Index (München)")
plt.show()

In [None]:

ranking = (
    active_index[cols]
    .sort_values("active_city_index", ascending=False)
    .reset_index(drop=True)
)

In [None]:
top5 = ranking.head(5)
bottom5 = ranking.tail(5)

display(top5)
display(bottom5)

In [None]:
ordered = active_index[["name", "active_city_index"]].sort_values(
    "active_city_index", ascending=True
)

plt.figure(figsize=(6, 6))
plt.barh(ordered["name"], ordered["active_city_index"])
plt.xlabel("Active City Index")
plt.title("Active City Index nach Stadtbezirk")
plt.tight_layout()
plt.show()

### Spatial distribution of the Active City Index

The index is visualized as a choropleth map at district level to reveal spatial patterns.

In [None]:
ax = active_index.plot(
    column="active_city_index",
    legend=True,
    figsize=(6, 6)
)
plt.title("Active City Index (MVP) – Stadtbezirke München")
plt.axis("off")
plt.show()

### Scatter plots & correlations

In [None]:
dims = ["index_gruen", "index_sport", "index_mobil"]
titles = ["Grün", "Sport", "Mobilität"]

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

for col, title, ax in zip(dims, titles, axes):
    ax.scatter(active_index[col], active_index["active_city_index"])
    ax.set_xlabel(f"{title}-Index")
    ax.set_ylabel("Active City Index")
    ax.set_title(f"Active City Index vs. {title}")

plt.tight_layout()
plt.show()

In [None]:
corr_cols = [
    "active_city_index",
    "index_gruen",
    "index_sport",
    "index_mobil",
    "parks_pro_1000_einw",
    "sports_pro_1000_einw",
    "stops_pro_1000_einw",
    "radweg_km_pro_km2",
]

corr = active_index[corr_cols].corr()

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)

ax.set_xticks(np.arange(len(corr_cols)))
ax.set_yticks(np.arange(len(corr_cols)))
ax.set_xticklabels(corr_cols, rotation=45, ha="right")
ax.set_yticklabels(corr_cols)

cbar = plt.colorbar(im, ax=ax)
cbar.set_label("Korrelationskoeffizient")

ax.set_title("Korrelationsmatrix der Active-City-Indikatoren")
plt.tight_layout()
plt.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
active_index.plot(
    column="active_city_index",
    cmap="viridis",
    legend=True,
    edgecolor="black",
    linewidth=0.5,
    ax=ax
)
ax.set_axis_off()
ax.set_title("Active City Index nach Stadtbezirk")
plt.tight_layout()
plt.show()

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

for col, title, ax in zip(
    ["index_gruen", "index_sport", "index_mobil"],
    ["Grün-Index", "Sport-Index", "Mobilitäts-Index"],
    axes
):
    active_index.plot(
        column=col,
        cmap="viridis",
        legend=True,
        edgecolor="black",
        linewidth=0.5,
        ax=ax
    )
    ax.set_axis_off()
    ax.set_title(title)

plt.tight_layout()
plt.show()

In [None]:
assert active_index["active_city_index"].between(0, 1).all()
assert active_index["index_gruen"].between(0, 1).all()
assert len(active_index) == 25  # 25 city districts

## Sensitivity analysis of the weights

In [None]:
# Make sure the three sub-indices exist
for col in ["index_gruen", "index_sport", "index_mobil"]:
    assert col in active_index.columns, f"{col} fehlt in active_index"

# Store the baseline index (equal weights) explicitly in its own column
# (assumes active_city_index is already the equal-weighted index):
active_index["aci_equal"] = active_index["active_city_index"]

# Alternative weighting schemes:
# - green-focused
# - sport-focused
# - mobility-focused

active_index["aci_green_focus"] = (
    0.5  * active_index["index_gruen"] +
    0.25 * active_index["index_sport"] +
    0.25 * active_index["index_mobil"]
)

active_index["aci_sport_focus"] = (
    0.25 * active_index["index_gruen"] +
    0.5  * active_index["index_sport"] +
    0.25 * active_index["index_mobil"]
)

active_index["aci_mob_focus"] = (
    0.25 * active_index["index_gruen"] +
    0.25 * active_index["index_sport"] +
    0.5  * active_index["index_mobil"]
)

active_index[["name", "aci_equal", "aci_green_focus", "aci_sport_focus", "aci_mob_focus"]].head()

In [None]:
# Spearman rank correlation between the index variants

from scipy.stats import spearmanr

index_variants = ["aci_equal", "aci_green_focus", "aci_sport_focus", "aci_mob_focus"]

print("Spearman-Rangkorrelationen der Indexvarianten:\n")

for var in index_variants[1:]:
    rho, p = spearmanr(active_index["aci_equal"], active_index[var])
    print(f"aci_equal vs {var}: ρ = {rho:.3f}, p = {p:.3f}")
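
Spearman's ρ is the Pearson correlation of the rank positions, so it measures whether two index variants order the districts in the same way, regardless of the absolute scores. The sketch below reproduces this with pandas alone. The toy scores and the column `aci_alt` are hypothetical.

```python
import pandas as pd

# Hypothetical scores of four districts under two weighting schemes
scores = pd.DataFrame({
    "aci_equal": [0.2, 0.5, 0.9, 0.4],
    "aci_alt":   [0.1, 0.6, 0.8, 0.7],
})

# Step 1: convert scores to ranks (1 = lowest score)
ranks = scores.rank()

# Step 2: Pearson correlation of the ranks equals Spearman's rho
rho = ranks["aci_equal"].corr(ranks["aci_alt"])
print(round(rho, 3))
```

A value close to 1 means that the alternative weighting barely changes the order of the districts, even if the scores themselves shift.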

In [None]:
# Identify the districts with the largest rank shifts
def rank_positions(df, score_col):
    """
    Returns a dict {district name: rank position},
    rank 1 = highest score.
    """
    ranking = (
        df[["name", score_col]]
        .sort_values(score_col, ascending=False)
        .reset_index(drop=True)
    )
    return {row["name"]: i + 1 for i, row in ranking.iterrows()}


# Rank positions for all variants
ranks_equal       = rank_positions(active_index, "aci_equal")
ranks_green_focus = rank_positions(active_index, "aci_green_focus")
ranks_sport_focus = rank_positions(active_index, "aci_sport_focus")
ranks_mob_focus   = rank_positions(active_index, "aci_mob_focus")

def rank_diff_list(base_ranks, alt_ranks, label):
    diffs = []
    for name in base_ranks:
        diff = alt_ranks[name] - base_ranks[name]  # positive = worse rank
        diffs.append((name, base_ranks[name], alt_ranks[name], diff))
    diffs_sorted = sorted(diffs, key=lambda x: abs(x[3]), reverse=True)
    print(f"\nTop 5 Rangverschiebungen für {label}:")
    for name, base_pos, alt_pos, diff in diffs_sorted[:5]:
        print(f"- {name}: von Rang {base_pos} auf Rang {alt_pos} (Δ = {diff})")

rank_diff_list(ranks_equal, ranks_green_focus, "Grün-fokussierte Gewichtung")
rank_diff_list(ranks_equal, ranks_sport_focus, "Sport-fokussierte Gewichtung")
rank_diff_list(ranks_equal, ranks_mob_focus,   "Mobilitäts-fokussierte Gewichtung")

In [None]:
def top_n_for_variant(df, score_col, n=5):
    return (
        df[["bez_nr", "name", score_col]]
        .sort_values(score_col, ascending=False)
        .head(n)
        .reset_index(drop=True)
    )

print("Top 5 – equal weights:")
display(top_n_for_variant(active_index, "aci_equal"))

print("Top 5 – Grün-fokus:")
display(top_n_for_variant(active_index, "aci_green_focus"))

print("Top 5 – Sport-fokus:")
display(top_n_for_variant(active_index, "aci_sport_focus"))

print("Top 5 – Mobilitäts-fokus:")
display(top_n_for_variant(active_index, "aci_mob_focus"))

### Export


In [None]:
export_cols = [
    # Base information
    "bez_nr",
    "name",
    "einwohner",
    "flaeche_ha",
    "einwohnerdichte",

    # Raw indicators
    "parks_pro_1000_einw",
    "parks_area_anteil_prozent",
    "sports_pro_1000_einw",
    "sports_area_anteil_prozent",
    "stops_pro_1000_einw",
    "radweg_km_pro_km2",

    # Sub-indices
    "index_gruen",
    "index_sport",
    "index_mobil",

    # Overall index + variants
    "active_city_index",
    "aci_equal",
    "aci_green_focus",
    "aci_sport_focus",
    "aci_mob_focus",
]

# Add the geometry column for the geo exports (GeoJSON/GPKG)
export_cols_with_geom = export_cols + ["geometry"]

In [None]:
# DataFrame without geometry
df_export = active_index[export_cols].copy()

output_csv = "../data/processed/muc_active_city_index.csv"

df_export.to_csv(
    output_csv,
    index=False,
    float_format="%.4f"  # rounds numeric values, e.g. 0.123456 -> 0.1235
)

output_csv

In [None]:
gdf_export = active_index[export_cols_with_geom].copy()

output_geojson = "../data/processed/muc_active_city_index.geojson"

gdf_export.to_file(
    output_geojson,
    driver="GeoJSON"
)

output_geojson

In [None]:
output_gpkg = "../data/processed/muc_active_city_index.gpkg"

gdf_export.to_file(
    output_gpkg,
    layer="muc_active_index",
    driver="GPKG"
)

output_gpkg

## Robustness: leave one indicator out (LOO)

In [None]:
INDICATORS = {
    "green": ["parks_pro_1000_einw", "parks_area_anteil_prozent"],
    "sport": ["sports_pro_1000_einw", "sports_area_anteil_prozent"],
    "mob":   ["stops_pro_1000_einw", "radweg_km_pro_km2"],
}

base = compute_active_city_index(
    active_index.copy(),
    INDICATORS["green"],
    INDICATORS["sport"],
    INDICATORS["mob"],
    weights={"green": 1/3, "sport": 1/3, "mob": 1/3}
)

base_scores = base[["bez_nr", "name", "active_city_index"]].rename(
    columns={"active_city_index": "aci_base"}
)

In [None]:
from scipy.stats import spearmanr

results_leave_one_out = []

for dim, cols in INDICATORS.items():
    for col in cols:
        # All indicators of this dimension WITHOUT col
        new_cols = cols.copy()
        new_cols.remove(col)

        gdf_tmp = compute_active_city_index(
            active_index.copy(),
            green_ind=INDICATORS["green"] if dim != "green" else new_cols,
            sport_ind=INDICATORS["sport"] if dim != "sport" else new_cols,
            mob_ind=INDICATORS["mob"]   if dim != "mob"   else new_cols,
        )

        tmp_scores = gdf_tmp[["bez_nr", "active_city_index"]].rename(
            columns={"active_city_index": "aci_loo"}
        )
        merged = base_scores.merge(tmp_scores, on="bez_nr")
        rho, _ = spearmanr(merged["aci_base"], merged["aci_loo"])

        results_leave_one_out.append({
            "dimension": dim,
            "removed_indicator": col,
            "spearman_rho": rho
        })

display(results_leave_one_out)

## Domains

In [None]:
DOMAIN_SCORES = {
    "green":  ["parks_pro_1000_einw_norm", "parks_area_anteil_prozent_norm"],
    "sport":  ["sports_pro_1000_einw_norm", "sports_area_anteil_prozent_norm"],
    "mob":    ["stops_pro_1000_einw_norm", "radweg_km_pro_km2_norm"],
}

for domain, cols in DOMAIN_SCORES.items():
    active_index[f"domain_{domain}"] = active_index[cols].mean(axis=1)

In [None]:
# Radar (spider) plot for a single district
import numpy as np

def plot_bezirk_profile(row):
    labels = ["Grün", "Sport", "Mobilität"]
    values = [
        row["index_gruen"],
        row["index_sport"],
        row["index_mobil"]
    ]
    values += values[:1]  # repeat the first value to close the radar polygon

    angles = np.linspace(0, 2*np.pi, len(labels) + 1)

    fig, ax = plt.subplots(subplot_kw={"polar": True}, figsize=(4, 4))
    ax.plot(angles, values)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(labels)
    ax.set_ylim(0, 1)
    ax.set_title(row["name"])
    plt.show()

# Example for one district:
plot_bezirk_profile(active_index.iloc[0])

# Submission upgrade: research question, robustness and recommendations for action

This section summarizes the central building blocks for the project profile in a compact, traceable form.

## 1) Assessment logic of the indicators

The following table documents for each indicator:
- dimension
- normalization approach
- expected direction of effect on the Active City Index
- short substantive rationale

In [None]:
import pandas as pd
import numpy as np

indicator_logic = pd.DataFrame([
    {"indikator": "parks_pro_1000_einw", "dimension": "Green", "normierung": "per 1,000 inhabitants", "richtung": "higher = better", "begruendung": "Proximity to green spaces for everyday physical activity"},
    {"indikator": "parks_area_anteil_prozent", "dimension": "Green", "normierung": "share of district area", "richtung": "higher = better", "begruendung": "Spatial availability of green space"},
    {"indikator": "sports_pro_1000_einw", "dimension": "Sport", "normierung": "per 1,000 inhabitants", "richtung": "higher = better", "begruendung": "Supply density of sports infrastructure"},
    {"indikator": "sports_area_anteil_prozent", "dimension": "Sport", "normierung": "share of district area", "richtung": "higher = better", "begruendung": "Area available for physical activity/sport"},
    {"indikator": "stops_pro_1000_einw", "dimension": "Mobility", "normierung": "per 1,000 inhabitants", "richtung": "higher = better", "begruendung": "Accessibility of activity-relevant destinations"},
    {"indikator": "radweg_km_pro_km2", "dimension": "Mobility", "normierung": "km per km^2", "richtung": "higher = better", "begruendung": "Density of cycling infrastructure"},
])

indicator_logic

## 2) Methodology formulas (compact)

Baseline methodology used:

1. Min-max normalization per indicator  
   `x_norm = (x - min(x)) / (max(x) - min(x))`

   Special case: if max(x) = min(x), the normalized value is set to 0.

2. Sub-indices  
   `index_gruen = mean(parks_pro_1000_einw_norm, parks_area_anteil_prozent_norm)`  
   `index_sport = mean(sports_pro_1000_einw_norm, sports_area_anteil_prozent_norm)`  
   `index_mobil = mean(stops_pro_1000_einw_norm, radweg_km_pro_km2_norm)`

3. Overall index  
   `active_city_index = (1/3)*index_gruen + (1/3)*index_sport + (1/3)*index_mobil`
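
The aggregation steps above can be traced by hand for a single district. The following sketch uses hypothetical, already normalized indicator values. The numbers are made up for illustration and are not results of this notebook.

```python
# Hypothetical normalized indicator values (range 0..1) for one district
norm = {
    "parks_pro_1000_einw_norm": 0.2,
    "parks_area_anteil_prozent_norm": 0.6,
    "sports_pro_1000_einw_norm": 0.5,
    "sports_area_anteil_prozent_norm": 0.7,
    "stops_pro_1000_einw_norm": 1.0,
    "radweg_km_pro_km2_norm": 0.0,
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Step 2: sub-indices as the mean of the two indicators per dimension
index_gruen = mean([norm["parks_pro_1000_einw_norm"], norm["parks_area_anteil_prozent_norm"]])
index_sport = mean([norm["sports_pro_1000_einw_norm"], norm["sports_area_anteil_prozent_norm"]])
index_mobil = mean([norm["stops_pro_1000_einw_norm"], norm["radweg_km_pro_km2_norm"]])

# Step 3: overall index with equal weights of 1/3 each
active_city_index = (index_gruen + index_sport + index_mobil) / 3

print(round(index_gruen, 3), round(index_sport, 3), round(index_mobil, 3))
print(round(active_city_index, 3))
```

Because every step is an average of values between 0 and 1, the overall index also stays between 0 and 1 under min-max normalization.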

## 3) Robustness: weighting and rank shifts

In [None]:
if "aci_equal" not in active_index.columns:
    active_index["aci_equal"] = active_index["active_city_index"]

if "aci_green_focus" not in active_index.columns:
    active_index["aci_green_focus"] = (
        0.5 * active_index["index_gruen"] +
        0.25 * active_index["index_sport"] +
        0.25 * active_index["index_mobil"]
    )

if "aci_sport_focus" not in active_index.columns:
    active_index["aci_sport_focus"] = (
        0.25 * active_index["index_gruen"] +
        0.5 * active_index["index_sport"] +
        0.25 * active_index["index_mobil"]
    )

if "aci_mob_focus" not in active_index.columns:
    active_index["aci_mob_focus"] = (
        0.25 * active_index["index_gruen"] +
        0.25 * active_index["index_sport"] +
        0.5 * active_index["index_mobil"]
    )

for score_col in ["aci_equal", "aci_green_focus", "aci_sport_focus", "aci_mob_focus"]:
    active_index[f"rank_{score_col}"] = active_index[score_col].rank(ascending=False, method="min").astype(int)

robustheit_ranking = active_index[[
    "bez_nr", "name", "rank_aci_equal", "rank_aci_green_focus", "rank_aci_sport_focus", "rank_aci_mob_focus"
]].copy()

robustheit_ranking["delta_green_vs_equal"] = robustheit_ranking["rank_aci_green_focus"] - robustheit_ranking["rank_aci_equal"]
robustheit_ranking["delta_sport_vs_equal"] = robustheit_ranking["rank_aci_sport_focus"] - robustheit_ranking["rank_aci_equal"]
robustheit_ranking["delta_mob_vs_equal"] = robustheit_ranking["rank_aci_mob_focus"] - robustheit_ranking["rank_aci_equal"]

robustheit_ranking.sort_values("rank_aci_equal").head(10)

In [None]:
score_cols = ["aci_equal", "aci_green_focus", "aci_sport_focus", "aci_mob_focus"]
spearman_matrix = active_index[score_cols].corr(method="spearman")
spearman_matrix

In [None]:
rank_shift_long = robustheit_ranking[[
    "name", "delta_green_vs_equal", "delta_sport_vs_equal", "delta_mob_vs_equal"
]].copy()

rank_shift_long["max_abs_delta"] = rank_shift_long[[
    "delta_green_vs_equal", "delta_sport_vs_equal", "delta_mob_vs_equal"
]].abs().max(axis=1)

rank_shift_long.sort_values("max_abs_delta", ascending=False).head(10)

## 4) Spatial patterns (hotspots/coldspots)

In [None]:
q25 = active_index["active_city_index"].quantile(0.25)
q75 = active_index["active_city_index"].quantile(0.75)

active_index["aci_zone"] = np.select(
    [active_index["active_city_index"] >= q75, active_index["active_city_index"] <= q25],
    ["Hotspot (oberes Quartil)", "Coldspot (unteres Quartil)"],
    default="Mittelbereich",
)

zone_summary = active_index[["name", "active_city_index", "aci_zone"]].sort_values("active_city_index", ascending=False)
zone_summary.head(10)

In [None]:
hotspots = active_index.loc[active_index["aci_zone"].str.startswith("Hotspot"), "name"].tolist()
coldspots = active_index.loc[active_index["aci_zone"].str.startswith("Coldspot"), "name"].tolist()

print("Hotspots (oberes Quartil):")
print(", ".join(hotspots) if hotspots else "-")
print("\nColdspots (unteres Quartil):")
print(", ".join(coldspots) if coldspots else "-")
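
The quartile classification can be checked on a small example. The sketch below applies the same `np.select` pattern to eight hypothetical districts. The shortened zone labels and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical index values for eight districts
toy = pd.DataFrame({
    "name": list("ABCDEFGH"),
    "active_city_index": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

q25 = toy["active_city_index"].quantile(0.25)
q75 = toy["active_city_index"].quantile(0.75)

# The first matching condition wins; everything else falls into the default
toy["aci_zone"] = np.select(
    [toy["active_city_index"] >= q75, toy["active_city_index"] <= q25],
    ["Hotspot", "Coldspot"],
    default="Mittelbereich",
)

hot = toy.loc[toy["aci_zone"] == "Hotspot", "name"].tolist()
cold = toy.loc[toy["aci_zone"] == "Coldspot", "name"].tolist()
print(hot, cold)
```

Since the comparisons use `>=` and `<=`, districts that tie exactly at a quartile boundary are included, so a zone can contain more than a quarter of the districts.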

## 5) Typology of the districts (quadrant logic)

In [None]:
active_index["angebot_score"] = active_index[["index_gruen", "index_sport"]].mean(axis=1)
active_index["angebot_z"] = (active_index["angebot_score"] - active_index["angebot_score"].mean()) / active_index["angebot_score"].std()
active_index["mobil_z"] = (active_index["index_mobil"] - active_index["index_mobil"].mean()) / active_index["index_mobil"].std()

conditions = [
    (active_index["angebot_z"] >= 0) & (active_index["mobil_z"] >= 0),
    (active_index["angebot_z"] >= 0) & (active_index["mobil_z"] < 0),
    (active_index["angebot_z"] < 0) & (active_index["mobil_z"] >= 0),
    (active_index["angebot_z"] < 0) & (active_index["mobil_z"] < 0),
]
labels = [
    "Typ A: starkes Angebot + starke Mobilitaet",
    "Typ B: starkes Angebot + schwache Mobilitaet",
    "Typ C: schwaches Angebot + starke Mobilitaet",
    "Typ D: schwaches Angebot + schwache Mobilitaet",
]
active_index["bezirkstyp"] = np.select(conditions, labels, default="unbestimmt")

(
    active_index[["name", "bezirkstyp", "angebot_score", "index_mobil", "active_city_index"]]
    .sort_values(["bezirkstyp", "active_city_index"], ascending=[True, False])
)

In [None]:
active_index["bezirkstyp"].value_counts().rename_axis("bezirkstyp").to_frame("anzahl_bezirke")
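
The quadrant logic only uses the sign of the z-scores. The sign of a z-score depends solely on whether a value lies above or below the mean, so the four types effectively split the districts at the city-wide mean of each axis. A minimal sketch with hypothetical scores:

```python
import pandas as pd

# Hypothetical sub-index values for four districts
s = pd.Series([0.2, 0.4, 0.9, 0.7])

# z-standardization as used for angebot_z and mobil_z
z = (s - s.mean()) / s.std()

# A non-negative z-score is equivalent to "at or above the mean"
above_mean_by_z = (z >= 0).tolist()
above_mean_direct = (s >= s.mean()).tolist()

print(above_mean_by_z)
print(above_mean_direct)
```

Dividing by the standard deviation therefore does not change the type assignment. It only matters if the magnitude of the z-scores is interpreted as well.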

## 6) Limitations (for the report)

- OSM data are not maintained uniformly everywhere (mapping bias is possible).
- The index primarily measures availability and density, not on-site quality or usability.
- The data are a snapshot in time; OSM and administrative data can change.
- Equal-weight aggregation is transparent but normative; alternative weightings yield partly different rankings.
- Missing socio-economic and health context data can limit the interpretation.

## 7) Concrete recommendations for action (data-driven)

In [None]:
handlungsraum = active_index[[
    "bez_nr", "name", "active_city_index", "index_gruen", "index_sport", "index_mobil"
]].copy()

handlungsraum["schwaechste_dimension"] = handlungsraum[["index_gruen", "index_sport", "index_mobil"]].idxmin(axis=1)
handlungsraum["schwaechste_dimension"] = handlungsraum["schwaechste_dimension"].map({
    "index_gruen": "Gruen",
    "index_sport": "Sport",
    "index_mobil": "Mobilitaet",
})

prioritaet = handlungsraum.sort_values("active_city_index", ascending=True).head(5)
prioritaet

In [None]:
def empfehlung(dim):
    if dim == "Gruen":
        return "Parks/zugaengliche Gruenflaechen ausbauen und besser verteilen"
    if dim == "Sport":
        return "Sportangebote und niederschwellige Bewegungsflaechen ergaenzen"
    return "Radwegevernetzung und Haltestellen-Erreichbarkeit verbessern"

prioritaet_empf = prioritaet[["name", "active_city_index", "schwaechste_dimension"]].copy()
prioritaet_empf["empfehlung"] = prioritaet_empf["schwaechste_dimension"].apply(empfehlung)
prioritaet_empf

## Mini checklist for the project report

- Research question answered: differences and patterns between districts are visible.
- Methodology traceable: indicators, formulas and weighting are documented.
- Robustness shown: weighting variants and rank stability are included.
- Limitations transparent: data and method limits are clearly stated.
- Added value/transfer: concrete areas for action and links to follow-up work are present.

## 8) Quality indicators (new)

Three qualitative components are added at district level:
- **Park accessibility**: share of publicly/freely accessible park areas.
- **Sports facility type/public access**: diversity of sports facility types and their accessibility.
- **Cycle path safety**: share of protected cycling infrastructure in the total cycle path length.

Note: the metrics are OSM-based and should be understood as **proxy indicators**.

In [None]:
import osmnx as ox
import geopandas as gpd
import pandas as pd
import numpy as np

place_name = "München, Deutschland"
public_access_values = {"yes", "public", "permissive", "destination"}

def _safe_series(df, col, default=""):
    if col in df.columns:
        # Fill missing tags first: astype(str) can turn NaN into the string "nan"
        return df[col].fillna(default).astype(str).str.lower()
    return pd.Series([default] * len(df), index=df.index)

def _minmax(series):
    s = pd.to_numeric(series, errors="coerce").fillna(0.0)
    den = s.max() - s.min()
    if den == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / den
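
Many OSM objects carry no `access` or `fee` tag, so the handling of missing values decides whether an untagged park counts as public. In pandas versions before 3.0, `astype(str)` converts NaN into the string `"nan"`, which a later `fillna` can no longer catch. The sketch below fills the gaps first. The toy tag values are hypothetical, and treating an untagged object as public is an assumption taken from the notebook's logic.

```python
import pandas as pd

public_access_values = {"yes", "public", "permissive", "destination"}

# Hypothetical "access" tags: one explicit, one missing, one private
access_raw = pd.Series(["Yes", None, "private"])

# Fill missing tags BEFORE converting to string, then lower-case
access = access_raw.fillna("").astype(str).str.lower()

# Untagged objects ("") are treated as publicly accessible
is_public = access.isin(public_access_values) | (access == "")

print(access.tolist())
print(is_public.tolist())
```

With this order of operations the untagged entry ends up as an empty string and is counted as public, which is what the `(acc == "")` condition in the quality cells is meant to achieve.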

### 8.1 Park accessibility

In [None]:

tags_parks_quality = {"leisure": "park"}
parks_raw = ox.features_from_place(place_name, tags_parks_quality).reset_index()
parks = gpd.GeoDataFrame(parks_raw, geometry="geometry", crs=parks_raw.crs)
parks = parks.to_crs(gdf_base.crs)

acc = _safe_series(parks, "access", "")
fee = _safe_series(parks, "fee", "")

parks["is_public_access"] = acc.isin(public_access_values) | (acc == "")
parks["is_free_access"] = (~fee.isin({"yes", "true", "1"})) | (fee == "")
# Note: the area is in squared CRS units; it is only m² if gdf_base uses a metric projected CRS
parks["park_area_m2"] = parks.geometry.area

parks_join = gpd.sjoin(
    parks[["is_public_access", "is_free_access", "park_area_m2", "geometry"]],
    gdf_base[["bez_nr", "geometry"]],
    how="inner",
    predicate="intersects",
)

park_quality = (
    parks_join.groupby("bez_nr", as_index=False)
    .agg(
        parks_public_share=("is_public_access", "mean"),
        parks_free_share=("is_free_access", "mean"),
        parks_mean_area_m2=("park_area_m2", "mean"),
    )
)

park_quality["parks_quality_access"] = (
    0.5 * park_quality["parks_public_share"] +
    0.3 * park_quality["parks_free_share"] +
    0.2 * _minmax(park_quality["parks_mean_area_m2"])
)

park_quality.head()

### 8.2 Sports facility type / public access

In [None]:

tags_sport_quality = {
    "leisure": ["pitch", "sports_centre", "stadium", "track"],
    "amenity": ["sports_centre"],
    "sport": True,
}

sport_raw = ox.features_from_place(place_name, tags_sport_quality).reset_index()
sport_q = gpd.GeoDataFrame(sport_raw, geometry="geometry", crs=sport_raw.crs)
sport_q = sport_q.to_crs(gdf_base.crs)

sport_q["facility_type"] = (
    _safe_series(sport_q, "sport", "")
    .replace("", np.nan)
    .fillna(_safe_series(sport_q, "leisure", ""))
    .replace("", np.nan)
    .fillna(_safe_series(sport_q, "amenity", ""))
    .replace("", "unknown")
)

acc_s = _safe_series(sport_q, "access", "")
fee_s = _safe_series(sport_q, "fee", "")
sport_q["is_public_access"] = acc_s.isin(public_access_values) | (acc_s == "")
sport_q["is_free_access"] = (~fee_s.isin({"yes", "true", "1"})) | (fee_s == "")

sport_join_q = gpd.sjoin(
    sport_q[["facility_type", "is_public_access", "is_free_access", "geometry"]],
    gdf_base[["bez_nr", "geometry"]],
    how="inner",
    predicate="intersects",
)

sport_quality = sport_join_q.groupby("bez_nr", as_index=False).agg(
    sports_type_diversity=("facility_type", "nunique"),
    sports_public_share=("is_public_access", "mean"),
    sports_free_share=("is_free_access", "mean"),
)

sport_quality["sports_quality_type_public"] = (
    0.5 * _minmax(sport_quality["sports_type_diversity"]) +
    0.3 * sport_quality["sports_public_share"] +
    0.2 * sport_quality["sports_free_share"]
)

sport_quality.head()


### 8.3 Cycle path safety

In [None]:

G = ox.graph_from_place(place_name, network_type="bike")
_, edges = ox.graph_to_gdfs(G)

edges = edges.reset_index(drop=True).copy()
edges = edges.to_crs(gdf_base.crs)

for col in ["cycleway", "cycleway:left", "cycleway:right", "cycleway:both", "segregated", "highway"]:
    if col not in edges.columns:
        edges[col] = ""

protected_vals = {"track", "separate", "opposite_track", "protected"}

def _is_protected(row):
    cyc_vals = {
        str(row.get("cycleway", "")).lower(),
        str(row.get("cycleway:left", "")).lower(),
        str(row.get("cycleway:right", "")).lower(),
        str(row.get("cycleway:both", "")).lower(),
    }
    segregated = str(row.get("segregated", "")).lower()
    highway = str(row.get("highway", "")).lower()
    return bool(cyc_vals.intersection(protected_vals)) or segregated in {"yes", "true", "1"} or highway == "cycleway"

edges["is_protected"] = edges.apply(_is_protected, axis=1)
edges = edges[["is_protected", "geometry"]].copy()

edges_by_bez = gpd.overlay(
    edges,
    gdf_base[["bez_nr", "geometry"]],
    how="intersection",
)

# Note: the length is in CRS units; it is only metres if gdf_base uses a metric projected CRS
edges_by_bez["len_m"] = edges_by_bez.geometry.length

bike_quality = edges_by_bez.groupby("bez_nr", as_index=False).agg(
    bike_total_len_m=("len_m", "sum"),
    bike_protected_len_m=("len_m", lambda s: s[edges_by_bez.loc[s.index, "is_protected"]].sum()),
)

bike_quality["bike_protected_share"] = np.where(
    bike_quality["bike_total_len_m"] > 0,
    bike_quality["bike_protected_len_m"] / bike_quality["bike_total_len_m"],
    0.0,
)
bike_quality["bike_quality_safety"] = bike_quality["bike_protected_share"]

bike_quality.head()
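
The protected share is a length-weighted ratio: protected metres divided by total metres per district. The sketch below shows an equivalent way to compute it without a lambda inside `agg`, by first zeroing the length of unprotected segments. The toy table `toy_edges` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical clipped network segments with their district and length
toy_edges = pd.DataFrame({
    "bez_nr":       [1, 1, 2, 2],
    "len_m":        [100.0, 300.0, 200.0, 200.0],
    "is_protected": [True, False, True, True],
})

# Keep the length only where the segment is protected, otherwise 0
toy_edges["protected_len_m"] = toy_edges["len_m"].where(
    toy_edges["is_protected"], 0.0
)

shares = toy_edges.groupby("bez_nr", as_index=False).agg(
    total_len_m=("len_m", "sum"),
    protected_len_m=("protected_len_m", "sum"),
)
shares["protected_share"] = shares["protected_len_m"] / shares["total_len_m"]

print(shares)
```

Weighting by length rather than counting segments keeps long protected stretches from being outweighed by many short unprotected ones.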


### 8.4 Integration into the index (quality index + enhanced index)

In [None]:

quality_df = (
    gdf_base[["bez_nr", "name", "geometry"]]
    .merge(park_quality[["bez_nr", "parks_quality_access"]], on="bez_nr", how="left")
    .merge(sport_quality[["bez_nr", "sports_quality_type_public"]], on="bez_nr", how="left")
    .merge(bike_quality[["bez_nr", "bike_quality_safety"]], on="bez_nr", how="left")
)

for col in ["parks_quality_access", "sports_quality_type_public", "bike_quality_safety"]:
    quality_df[col] = quality_df[col].fillna(0.0)

quality_df["quality_index"] = quality_df[[
    "parks_quality_access",
    "sports_quality_type_public",
    "bike_quality_safety",
]].mean(axis=1)

active_index_quality = active_index.merge(
    quality_df[["bez_nr", "parks_quality_access", "sports_quality_type_public", "bike_quality_safety", "quality_index"]],
    on="bez_nr",
    how="left",
)

active_index_quality["active_city_index_plus"] = (
    0.7 * active_index_quality["active_city_index"] +
    0.3 * active_index_quality["quality_index"]
)

active_index_quality[[
    "name", "active_city_index", "quality_index", "active_city_index_plus"
]].sort_values("active_city_index_plus", ascending=False).head(10)
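
The enhanced index blends the baseline index (weight 0.7) with the quality index (weight 0.3). The following sketch shows, for two hypothetical districts, how a high quality score can change the order. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical baseline and quality scores for two districts
toy = pd.DataFrame({
    "name": ["A", "B"],
    "active_city_index": [0.60, 0.50],
    "quality_index": [0.20, 0.80],
})

# Same blend as in the notebook: 70 % baseline, 30 % quality
toy["active_city_index_plus"] = (
    0.7 * toy["active_city_index"] + 0.3 * toy["quality_index"]
)

order_base = toy.sort_values("active_city_index", ascending=False)["name"].tolist()
order_plus = toy.sort_values("active_city_index_plus", ascending=False)["name"].tolist()

print(toy["active_city_index_plus"].round(2).tolist())
print(order_base, order_plus)
```

Note that quality components filled with 0 for districts without OSM matches pull `active_city_index_plus` down, so missing data and genuinely low quality cannot be told apart in the blended score.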

In [None]:
# Export of results for the report/appendix (quality part)
quality_export_cols = [
    "bez_nr", "name", "active_city_index", "quality_index", "active_city_index_plus",
    "parks_quality_access", "sports_quality_type_public", "bike_quality_safety",
]

active_index_quality[quality_export_cols].to_csv(
    "../data/processed/muc_active_city_quality_index.csv", index=False, float_format="%.6f"
)

print("Wrote: ../data/processed/muc_active_city_quality_index.csv")