# DAT5501 Climate–Economic Risk Analysis: Exploration Notebook

This notebook provides lightweight exploratory checks to confirm:
- dataset schema and coverage,
- missingness and key distributions,
- early relationships between CO₂ per capita and GDP per capita / growth volatility.

The main pipeline is implemented in `main.py` and `src/`.

In [None]:
import pandas as pd

from src.utils import get_project_paths, safe_read_csv, describe_missingness

paths = get_project_paths()
paths

In [None]:
panel_path = paths.data_processed / "panel_features.csv"
country_path = paths.data_processed / "country_summary.csv"

panel = safe_read_csv(panel_path)
country = safe_read_csv(country_path)

# Render each preview as its own table instead of one tuple repr.
display(panel.head())
display(country.head())

In [None]:
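# Only a cell's last expression is rendered, so show each check explicitly.
display(panel[["year"]].describe())
print("iso codes:", panel["iso_code"].nunique(), "countries:", panel["country"].nunique())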

In [None]:
describe_missingness(panel).head(15)

In [None]:
country[["avg_co2_per_capita", "mean_gdp_growth", "gdp_growth_volatility", "baseline_gdp_pc"]].describe()

In [None]:
country[["avg_co2_per_capita", "mean_gdp_growth", "gdp_growth_volatility", "baseline_gdp_pc"]].corr(numeric_only=True)

In [None]:
# Only a cell's last expression is rendered, so show both top-10 rankings explicitly.
display(country.sort_values("avg_co2_per_capita", ascending=False).head(10)[["country", "avg_co2_per_capita", "gdp_growth_volatility", "baseline_gdp_pc"]])
display(country.sort_values("gdp_growth_volatility", ascending=False).head(10)[["country", "avg_co2_per_capita", "gdp_growth_volatility", "mean_gdp_growth"]])

## Notes / Findings (to transfer into report)

- Confirmed that time coverage and country coverage align across both datasets after cleaning and merging.
- Missingness is concentrated in early periods and smaller economies; rolling-window features reduce usable years.
- Country-level summaries show large dispersion in emissions and economic stability metrics, supporting the need for regression controls.
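
The note on rolling-window features can be illustrated with a small sketch on synthetic data. Everything here is an assumption for illustration: the toy values, the `gdp_growth` and `rolling_volatility` column names, and the 5-year window with `min_periods` equal to the window length. The actual feature definitions live in `src/` and may differ.

```python
import numpy as np
import pandas as pd

# Synthetic toy panel (not project data): two hypothetical countries, 2000-2009.
years = list(range(2000, 2010))
toy = pd.DataFrame({
    "iso_code": ["AAA"] * 10 + ["BBB"] * 10,
    "year": years * 2,
    # BBB is missing its first two years, mimicking early-period missingness.
    "gdp_growth": [2.0, 2.5, 1.5, 3.0, 2.0, -1.0, 0.5, 2.5, 3.0, 2.0]
                  + [np.nan, np.nan, 4.0, 5.0, -2.0, 6.0, 1.0, 3.0, -4.0, 2.0],
})

WINDOW = 5  # assumed window length, for illustration only

# Rolling standard deviation of growth within each country; a value is
# produced only when the window holds WINDOW non-missing observations.
toy = toy.sort_values(["iso_code", "year"])
toy["rolling_volatility"] = (
    toy.groupby("iso_code")["gdp_growth"]
    .transform(lambda s: s.rolling(WINDOW, min_periods=WINDOW).std())
)

# Compare observed years with years that survive the rolling window.
usable = toy.groupby("iso_code").agg(
    years_with_growth=("gdp_growth", "count"),
    years_with_volatility=("rolling_volatility", "count"),
)
print(usable)
```

Under these assumptions each country loses `WINDOW - 1` years at the start of its series, so a country that already begins late ends up with a noticeably shorter usable span.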