# Affordability Classification

In this notebook, we apply an affordability classification framework to Berlin’s subdistrict dataset.

The goal is to estimate whether a household with a given monthly income can realistically afford an apartment of a given size in each subdistrict, subject to a rent-to-income threshold. The estimate is based on Mietspiegel rent classifications and local income levels.
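As a rough illustration of the idea, the sketch below estimates monthly rent as rent per m² times apartment size and compares it with a share of income. The function name `estimate_affordability` and its return keys are hypothetical; this is not the implementation of `add_affordability`, which also takes a local income column. The `aff_rent_to_income` values in this notebook's output do not equal estimated rent divided by €3,500, so the real calculation presumably differs from this simple ratio.

```python
def estimate_affordability(rent_per_m2, size_m2, monthly_income_eur, threshold=0.30):
    """Simplified, hypothetical affordability check (not the project's function).

    Assumes affordability means: estimated monthly rent is at most
    `threshold` times the household's monthly income.
    """
    est_monthly_rent = rent_per_m2 * size_m2
    rent_to_income = est_monthly_rent / monthly_income_eur
    return {
        "est_monthly_rent": est_monthly_rent,
        "rent_to_income": rent_to_income,
        "affordable": rent_to_income <= threshold,
    }


# Illustrative input only: 10 EUR/m² is a made-up rent level, not a dataset value.
result = estimate_affordability(rent_per_m2=10.0, size_m2=60, monthly_income_eur=3500)
print(result)
```

With these made-up inputs the estimated rent is €600, which is below 30% of €3,500 (€1,050), so the check passes.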

We then use this classification to generate recommendations: subdistricts that are both affordable and aligned with user preferences (e.g. desired lifestyle clusters). This bridges the gap between raw rent/income statistics and actionable insights for Berlin residents.

`Conclusion`

Using the affordability classification, we can identify which subdistricts fall within a reasonable rent-to-income ratio. In our example (monthly income of €3,500, 60 m² apartment, 30% threshold), the model returns the top 5 recommended subdistricts within the user’s preferred cluster. All five are exact matches, with rent-to-income ratios between roughly 14% and 25%.

`Key takeaways`

* The affordability function adapts flexibly to income, apartment size, and thresholds, making it reusable for different household types.

* Relaxed thresholds ensure users still receive recommendations even in tighter rental markets.

* Combined with cluster profiles, this approach not only answers “Where can I afford to live?” but also “Which areas best match my lifestyle within my budget?”
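The threshold relaxation mentioned in the takeaways can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of `top_recommendations`: the function name `recommend_with_relaxation`, the row layout, ranking by lowest rent-to-income ratio, and stopping at the first threshold that yields any hit are all hypothetical choices. The "broader fallbacks" the notebook mentions are not modeled here.

```python
def recommend_with_relaxation(rows, preferred_clusters, k=5,
                              threshold=0.30, relax_thresholds=(0.32, 0.35, 0.40)):
    """Hypothetical sketch: try the base threshold, then each relaxed one in order.

    rows: list of dicts with keys 'name', 'cluster', 'rent_to_income'.
    Returns (top-k rows, threshold that produced them), or ([], None).
    """
    in_cluster = [r for r in rows if r["cluster"] in preferred_clusters]
    for limit in (threshold, *relax_thresholds):
        hits = [r for r in in_cluster if r["rent_to_income"] <= limit]
        if hits:
            # Assumed ranking: lowest rent-to-income ratio first.
            hits.sort(key=lambda r: r["rent_to_income"])
            return hits[:k], limit
    return [], None


# Made-up rows for illustration; these are not values from the dataset.
rows = [
    {"name": "A", "cluster": 1, "rent_to_income": 0.34},
    {"name": "B", "cluster": 1, "rent_to_income": 0.38},
    {"name": "C", "cluster": 2, "rent_to_income": 0.20},
]
picks, used_threshold = recommend_with_relaxation(rows, preferred_clusters=[1])
print([r["name"] for r in picks], used_threshold)
```

In this toy case nothing in cluster 1 passes at 30% or 32%, so the first result appears once the threshold is relaxed to 35%.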

This classification step is an essential component of the Berlin Housing Explorer app, powering its personalized recommendations and helping residents make more informed housing choices.

## Environment & Data

In [None]:
# Ensure project root is importable
import os, sys
PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), ".."))  # parent of notebooks_final
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

# Imports
import pandas as pd
from berlin_housing.tasks.classification import add_affordability, top_recommendations

In [None]:
# Load Data
df = pd.read_csv("../data/processed/final_master_with_k4_clusters.csv")

# Cleanup: drop leftover index columns from earlier CSV exports, if present
df = df.drop(columns=["Unnamed: 0", "index"], errors="ignore")

## Affordability Classification

In [None]:
# Add affordability
df_aff = add_affordability(
    df,
    monthly_income_eur=3500,
    size_m2=60,
    threshold=0.30,
    mietspiegel_col="subdistrict_avg_mietspiegel_classification",
    income_col="subdistrict_avg_median_income_eur",
)

# See top 5 recommendations based on example income
top5 = top_recommendations(
    df_aff,
    preferred_clusters=[1],
    k=5,
    relax_thresholds=(0.32, 0.35, 0.40),  # try these before broader fallbacks
)
top5

| | bezirk | ortsteil | k4_cluster | aff_rent_per_m2 | aff_est_monthly_rent | aff_rent_to_income | cafes | supermarket | green_space | schools | _score | _note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 44 | friedrichshain-kreuzberg | kreuzberg | 1 | 11.679916 | 700.794979 | 0.174936 | 247 | 57 | 1041 | 202 | -0.497564 | exact_match |
| 61 | neukoelln | neukoelln | 1 | 9.588745 | 575.324675 | 0.144763 | 187 | 68 | 801 | 136 | -0.383237 | exact_match |
| 76 | tempelhof-schoeneberg | schoeneberg | 1 | 14.124841 | 847.490446 | 0.203914 | 148 | 55 | 857 | 143 | -0.326086 | exact_match |
| 68 | pankow | prenzlauer berg | 1 | 16.208304 | 972.49827 | 0.205342 | 201 | 71 | 692 | 184 | -0.276658 | exact_match |
| 23 | friedrichshain-kreuzberg | friedrichshain | 1 | 16.432258 | 985.935484 | 0.245415 | 144 | 50 | 776 | 111 | -0.239585 | exact_match |
