# 05 — Conclusion & Report

**Goals**
- Summarize the main findings from the project:
  - Drivers of HPI in Europe
  - Bulgaria’s position relative to Europe
  - Affordability trends and risks
- Answer the research question: **Is Bulgaria’s HPI decoupled from earnings?**
- Draft policy/market implications.
- Produce a concise Markdown report for `/docs/03_conclusion.md` (reproducible from data outputs).


In [1]:
from __future__ import annotations
from pathlib import Path
from typing import Dict, Tuple, Optional

import numpy as np
import pandas as pd

PROC = Path("../data/processed")
FIGS = Path("../reports/figures")
DOCS = Path("../docs")
DOCS.mkdir(parents=True, exist_ok=True)

PROC, FIGS, DOCS

(WindowsPath('../data/processed'),
 WindowsPath('../reports/figures'),
 WindowsPath('../docs'))

## 1) Load precomputed analysis outputs

We use the CSVs produced by earlier notebooks:
- `corr_pooled_eu27.csv` (EU pooled correlations)
- `corr_per_country_eu27.csv` (per-country correlations, incl. Bulgaria)
- `bg_vs_eu_affordability.csv` (Bulgaria vs EU affordability ratio index)
- `affordability_timeseries.csv` (country–year affordability measures)
- `affordability_clusters.csv` (KMeans labels for affordability trajectories)


In [2]:
def robust_read_csv(p: Path) -> pd.DataFrame:
    """
    Read a CSV robustly and raise a clear error if missing.

    Parameters
    ----------
    p : Path
        Path to the CSV file.

    Returns
    -------
    pd.DataFrame
        Parsed DataFrame.

    Raises
    ------
    FileNotFoundError
        If the path does not exist.
    """
    if not p.exists():
        raise FileNotFoundError(f"Missing required file: {p}")
    return pd.read_csv(p)

pooled_corr      = robust_read_csv(PROC / "corr_pooled_eu27.csv")
per_country_corr = robust_read_csv(PROC / "corr_per_country_eu27.csv")
bg_eu_aff        = robust_read_csv(PROC / "bg_vs_eu_affordability.csv")
aff_ts           = robust_read_csv(PROC / "affordability_timeseries.csv")
clusters         = robust_read_csv(PROC / "affordability_clusters.csv")

pooled_corr, per_country_corr.head(), bg_eu_aff.head()


(                                   pair   pearson  spearman      N
 0          HPI vs Net earnings (levels)  0.101448  0.138453  258.0
 1         HPI vs Real earnings (levels) -0.016933  0.032456  258.0
 2          HPI vs Unemployment (levels) -0.475255 -0.544212  258.0
 3              HPI growth vs GDP growth  0.417307  0.415170  232.0
 4  HPI growth vs Inflation (HICP YoY %)  0.279501  0.201042  232.0,
     country  hpi_vs_net_pearson  hpi_vs_net_spearman   N  hpi_vs_real_pearson  \
 0   Austria                 NaN                  NaN   9                  NaN   
 1   Austria            0.922084             0.939394  10             0.780803   
 2   Belgium                 NaN                  NaN   9                  NaN   
 3   Belgium            0.969365             1.000000  10             0.338652   
 4  Bulgaria                 NaN                  NaN   9                  NaN   
 
    hpi_vs_real_spearman  hpi_vs_unemp_pearson  hpi_vs_unemp_spearman  \
 0                   NaN

## 2) Compute summary metrics

We derive key indicators:
- Bulgaria vs EU affordability change (2015→2024)
- Bulgaria’s correlations vs EU mean/median (per-country)
- Pooled EU correlation context (levels & growths)
- Cluster label for Bulgaria


In [3]:
def change_between_years(df: pd.DataFrame, year_col: str, value_col: str, start_year: int, end_year: int) -> Optional[float]:
    """
    Compute value[end_year] - value[start_year].

    Parameters
    ----------
    df : pd.DataFrame
        Data containing `year_col` and `value_col`.
    year_col : str
        Name of year column (e.g., 'year').
    value_col : str
        Name of value column (e.g., 'bg_aff_index').
    start_year : int
        Start year (inclusive).
    end_year : int
        End year (inclusive).

    Returns
    -------
    Optional[float]
        Difference or None if either year is missing.
    """
    s = df.set_index(year_col)[value_col]
    if start_year not in s.index or end_year not in s.index:
        return None
    return float(s.loc[end_year] - s.loc[start_year])


def summarize_afford_changes(bg_eu: pd.DataFrame) -> Dict[str, float]:
    """
    Compute Bulgaria and EU affordability ratio index changes (2015→2024).

    Parameters
    ----------
    bg_eu : pd.DataFrame
        Columns: 'year', 'bg_aff_index', 'eu_afford_ratio_index_mean'.

    Returns
    -------
    Dict[str, float]
        {'bg_change': float, 'eu_change': float}
    """
    out = {
        "bg_change": change_between_years(bg_eu, "year", "bg_aff_index", 2015, 2024),
        "eu_change": change_between_years(bg_eu, "year", "eu_afford_ratio_index_mean", 2015, 2024),
    }
    return out


def extract_bg_vs_eu_corr(per_geo: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Extract Bulgaria's per-country correlations and EU mean/median across countries.

    Parameters
    ----------
    per_geo : pd.DataFrame
        Per-country correlation table (from Notebook 03).

    Returns
    -------
    Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
        (bg_row, eu_mean, eu_median)
    """
    bg_row = per_geo[per_geo["country"] == "Bulgaria"].copy()
    eu_mean = per_geo.drop(columns=["country","N"]).mean(numeric_only=True).to_frame("EU_mean").T
    eu_median = per_geo.drop(columns=["country","N"]).median(numeric_only=True).to_frame("EU_median").T
    return bg_row, eu_mean, eu_median


def pooled_eu_context(pooled: pd.DataFrame) -> pd.DataFrame:
    """
    Return pooled EU correlations for display, sorted by absolute Pearson strength.

    Parameters
    ----------
    pooled : pd.DataFrame
        Columns include 'pair', 'pearson', 'spearman', 'N'

    Returns
    -------
    pd.DataFrame
        Sorted view by |pearson| descending.
    """
    view = pooled.copy()
    view["abs_pearson"] = view["pearson"].abs()
    return view.sort_values("abs_pearson", ascending=False)[["pair","pearson","spearman","N"]]


def get_bg_cluster_label(labels_df: pd.DataFrame) -> Optional[int]:
    """
    Extract Bulgaria's cluster label from the clusters table.

    Parameters
    ----------
    labels_df : pd.DataFrame
        Columns: 'country', 'cluster'.

    Returns
    -------
    Optional[int]
        Cluster id or None if not found.
    """
    row = labels_df[labels_df["country"] == "Bulgaria"]
    if row.empty:
        return None
    return int(row["cluster"].iloc[0])


In [4]:
bg_eu = bg_eu_aff.rename(columns={"eu_afford_ratio_index_mean":"eu_aff_index"})

aff_changes = summarize_afford_changes(
    bg_eu.rename(columns={"eu_aff_index":"eu_afford_ratio_index_mean"})
)
bg_row, eu_mean, eu_median = extract_bg_vs_eu_corr(per_country_corr)
pooled_view = pooled_eu_context(pooled_corr)
bg_cluster = get_bg_cluster_label(clusters)

aff_changes, bg_row, eu_mean, eu_median, pooled_view.head(), bg_cluster


({'bg_change': -11.036503239090763, 'eu_change': 12.2981189823212},
     country  hpi_vs_net_pearson  hpi_vs_net_spearman   N  hpi_vs_real_pearson  \
 4  Bulgaria                 NaN                  NaN   9                  NaN   
 5  Bulgaria            0.995329                  1.0  10             0.968418   
 
    hpi_vs_real_spearman  hpi_vs_unemp_pearson  hpi_vs_unemp_spearman  \
 4                   NaN                   NaN                    NaN   
 5                   1.0             -0.835489              -0.957335   
 
    hpi_g_vs_gdp_g_pearson  hpi_g_vs_gdp_g_spearman  hpi_g_vs_infl_g_pearson  \
 4                0.578875                      0.4                 0.605834   
 5                     NaN                      NaN                      NaN   
 
    hpi_g_vs_infl_g_spearman  
 4                      0.45  
 5                       NaN  ,
          hpi_vs_net_pearson  hpi_vs_net_spearman  hpi_vs_real_pearson  \
 EU_mean            0.901879             0.915634    

## 3) Findings (auto-filled from the data)

Below we generate a short, reproducible text summary that pulls numbers from the data tables above.


In [5]:
def fmt(x: Optional[float], digits: int = 2, pct: bool = False) -> str:
    """
    Format a number for human-readable text.

    Parameters
    ----------
    x : Optional[float]
        Value to format (may be None).
    digits : int
        Decimal places.
    pct : bool
        If True, append '%'.

    Returns
    -------
    str
        Formatted string.
    """
    if x is None or np.isnan(x):
        return "N/A"
    s = f"{x:.{digits}f}"
    return f"{s}%" if pct else s

bg_change = aff_changes["bg_change"]
eu_change = aff_changes["eu_change"]

def safe_get(df: pd.DataFrame, col: str) -> Optional[float]:
    return None if col not in df.columns or df.empty else float(df[col].iloc[0])

bg_hpi_net_p  = safe_get(bg_row, "hpi_vs_net_pearson")
bg_hpi_real_p = safe_get(bg_row, "hpi_vs_real_pearson")
bg_hpi_un_p   = safe_get(bg_row, "hpi_vs_unemp_pearson")
bg_hpi_g_gdp  = safe_get(bg_row, "hpi_g_vs_gdp_g_pearson")
bg_hpi_g_inf  = safe_get(bg_row, "hpi_g_vs_infl_g_pearson")

eu_mean_hpi_net  = eu_mean["hpi_vs_net_pearson"].item()
eu_mean_hpi_real = eu_mean["hpi_vs_real_pearson"].item()
eu_mean_hpi_un   = eu_mean["hpi_vs_unemp_pearson"].item()
eu_mean_hpi_gdp  = eu_mean["hpi_g_vs_gdp_g_pearson"].item()
eu_mean_hpi_inf  = eu_mean["hpi_g_vs_infl_g_pearson"].item()

pooled_pairs = {
    r["pair"]: (float(r["pearson"]), float(r["spearman"]), int(r["N"]))
    for _, r in pooled_corr.iterrows()
}

report_md = f"""# Conclusions & Report

## Drivers of HPI in Europe (EU pooled)
- **Unemployment**: moderately negative association with HPI (Pearson {fmt(pooled_pairs['HPI vs Unemployment (levels)'][0])}).
- **GDP growth**: moderate positive link to HPI growth (Pearson {fmt(pooled_pairs['HPI growth vs GDP growth'][0])}).
- **Inflation (HICP)**: weak–moderate positive link to HPI growth (Pearson {fmt(pooled_pairs['HPI growth vs Inflation (HICP YoY %)'][0])}).
- **Earnings (levels)**: weak link between HPI and net earnings; essentially none with real earnings (Pearson {fmt(pooled_pairs['HPI vs Net earnings (levels)'][0])} and {fmt(pooled_pairs['HPI vs Real earnings (levels)'][0])}).

## Bulgaria vs Europe
- **Correlations (Bulgaria)**:
  - HPI ~ net earnings: Pearson {fmt(bg_hpi_net_p)} (very strong).
  - HPI ~ real earnings: Pearson {fmt(bg_hpi_real_p)} (strong).
  - HPI ~ unemployment: Pearson {fmt(bg_hpi_un_p)} (strong negative).
  - HPI growth ~ GDP growth: Pearson {fmt(bg_hpi_g_gdp)}.
  - HPI growth ~ Inflation: Pearson {fmt(bg_hpi_g_inf)}.

- **EU per-country averages** (mean):
  - HPI ~ net earnings: {fmt(eu_mean_hpi_net)}; HPI ~ real earnings: {fmt(eu_mean_hpi_real)}.
  - HPI ~ unemployment: {fmt(eu_mean_hpi_un)}.
  - HPIg ~ GDPg: {fmt(eu_mean_hpi_gdp)}; HPIg ~ HICPg: {fmt(eu_mean_hpi_inf)}.

**Interpretation:** Bulgaria’s housing prices are **tightly coupled** to wages and macro fundamentals, whereas the EU pattern shows weaker links to earnings.

## Affordability trends
- **Affordability ratio index (HPI / earnings, base=100 at first year):**
  - **Bulgaria**: change 2015→2024 = {fmt(bg_change)} (↓ indicates improving affordability relative to wages).
  - **EU average**: change 2015→2024 = {fmt(eu_change)} (↑ indicates worsening affordability).

**Important caveat:** The ratio index is **relative within each country**; it does **not** represent absolute cross-country affordability levels.

## Research Question
**Is Bulgaria’s HPI decoupled from earnings?**  
**Answer: No.** Over 2015–2024, Bulgaria shows **very strong coupling** between HPI and both net and real earnings. But when translated into affordability, Bulgaria diverges from the EU: its affordability ratio **improved**, while the EU average deteriorated.  

Thus, we **reject H0** and accept H1: Bulgaria’s affordability has diverged positively from Europe.

## Policy & Market Implications (hypotheses)
- **Affordability & income:** Continued wage growth appears to support the housing market. Monitor whether wage gains keep pace with HPI beyond 2024.
- **Macro risk:** Bulgaria’s HPI is responsive to GDP and inflation; adverse macro shocks (growth slowdown, renewed inflation) could transmit quickly to housing.
- **Credit & prudential:** Consider monitoring **debt-to-income** and **loan-to-income** distributions; tighten macroprudential tools if leverage accelerates.
- **Supply side:** Encourage new housing supply and permitting efficiency to prevent price pressures from turning into renewed affordability deterioration.
- **Data gaps:** Track **interest rate** pass-through, credit conditions, and regional heterogeneity within Bulgaria for a finer risk assessment.

## Limitations & Next Steps
- The affordability ratio uses **HPI (index)** over **earnings (EUR)** — good for **within-country trends**, not for absolute cross-country comparisons.
- The earnings concept (household type, net definition) can matter; robustness checks with alternative earnings measures are recommended.
- Extend with **panel regressions** (country FE, time FE) and **instrumental variables** if causal interpretation is needed.
- Enrich with **credit** and **interest rate** data; test sensitivity excluding outliers (e.g., Türkiye).

## Figures (see /reports/figures)
- EU & BG correlation scatterplots
- BG vs EU affordability ratio index
- Regional affordability trajectories
- Clustered affordability trajectories
"""

out_path = DOCS / "03_conclusion.md"
out_path.write_text(report_md, encoding="utf-8")
out_path


WindowsPath('../docs/03_conclusion.md')

# What this notebook produced

- A reproducible **final report** at `docs/03_conclusion.md` summarizing:
  - Drivers of HPI in Europe
  - Bulgaria vs Europe correlations
  - Affordability trends (Bulgaria vs EU)
  - Clear answer to the research question
  - Policy/market implications, limitations, and next steps

You can now open `docs/03_conclusion.md` to review and adjust the narrative text as needed.


In [6]:
print((DOCS / "03_conclusion.md").read_text(encoding="utf-8")[:1200] + "\n...\n")

# Conclusions & Report

## Drivers of HPI in Europe (EU pooled)
- **Unemployment**: moderately negative association with HPI (Pearson -0.48).
- **GDP growth**: moderate positive link to HPI growth (Pearson 0.42).
- **Inflation (HICP)**: weak–moderate positive link to HPI growth (Pearson 0.28).
- **Earnings (levels)**: weak link between HPI and net earnings; essentially none with real earnings (Pearson 0.10 and -0.02).

## Bulgaria vs Europe
- **Correlations (Bulgaria)**:
  - HPI ~ net earnings: Pearson N/A (very strong).
  - HPI ~ real earnings: Pearson N/A (strong).
  - HPI ~ unemployment: Pearson N/A (strong negative).
  - HPI growth ~ GDP growth: Pearson 0.58.
  - HPI growth ~ Inflation: Pearson 0.61.

- **EU per-country averages** (mean):
  - HPI ~ net earnings: 0.90; HPI ~ real earnings: 0.48.
  - HPI ~ unemployment: -0.63.
  - HPIg ~ GDPg: 0.34; HPIg ~ HICPg: 0.22.

**Interpretation:** Bulgaria’s housing prices are **tightly coupled** to wages and macro fundamentals, whereas the 