## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day05/solution/day05_solution.ipynb)

# 🔥 Day 5 – Capstone: Linking CO₂ Emissions and Warming
### Guided loops: align → check → model → narrate

The capstone pulls every habit together. You’ll align two flagship climate datasets, validate the overlap, build a polished dual-axis figure, and narrate the evidence connecting fossil emissions to global temperature rise.

## 📇 Data Card — Global CO₂ & NASA Temperature Anomalies
- **CO₂ Source**: Our World in Data / Global Carbon Project (global fossil CO₂ emissions).
- **Temperature Source**: NASA GISTEMP v4 global surface temperature anomalies.
- **Temporal coverage**: CO₂ (1900–2022), Temperature (1880–2024).
- **Units**: CO₂ in gigatonnes/year; temperature anomaly in °C relative to 1951–1980 baseline.
- **Processing notes**: Align on annual `Year`, drop rows with missing values, rescale axes separately.
- **Last updated**: CO₂ download October 2024; GISTEMP January 2025 release.
- **Caveats**: CO₂ excludes land-use change; temperature anomalies are global averages (regional extremes smoothed).
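
To make the "relative to 1951–1980 baseline" unit concrete, here is a minimal sketch of how an anomaly is defined. The temperatures are synthetic placeholders, and the column name `AbsTemp` is hypothetical. This is not how NASA builds GISTEMP; it only illustrates what "anomaly versus a baseline" means.

```python
import pandas as pd

# Synthetic absolute temperatures in °C (placeholders, NOT real observations).
temps = pd.DataFrame({
    "Year": [1950, 1951, 1965, 1980, 2020],
    "AbsTemp": [13.9, 14.0, 14.0, 14.3, 15.0],
})

# Average only the years that fall inside the baseline window.
in_baseline = temps["Year"].between(1951, 1980)
baseline_mean = temps.loc[in_baseline, "AbsTemp"].mean()

# An anomaly is the departure from that baseline average.
temps["Anomaly"] = temps["AbsTemp"] - baseline_mean
print(temps.round(2))
```

By construction, the anomalies inside the baseline window average to zero, which is why early-century values in the real series are mostly negative.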

> 🔎 **What this visualization cannot tell us**: Precise causality, short-term variability drivers, or uncertainty bands. Pair with context about climate sensitivity and non-CO₂ forcings.

## 🗺️ Workflow Map
1. **Setup & helpers**.
2. **Load** each dataset with quick inspections.
3. **Clean & align** overlapping years.
4. **Story scaffold** for chart metadata.
5. **Visualise** dual-axis figure with annotations and accessibility notes.
6. **Reflect** on narrative, uncertainty, and next analytical steps.

## Step 0 · Imports, style, and diagnostics

In [None]:

from pathlib import Path
from textwrap import dedent

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display

sns.set_theme(style="whitegrid")
plt.rcParams.update({
    "axes.titlesize": 18,
    "axes.labelsize": 13,
    "axes.titleweight": "bold",
    "figure.titlesize": 20,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
})


def baseline_style():
    """Reset the Matplotlib/Seaborn style so every figure starts consistent."""
    sns.set_theme(style="whitegrid")
    plt.rcParams.update({
        "axes.titlesize": 18,
        "axes.labelsize": 13,
        "axes.titleweight": "bold",
        "figure.titlesize": 20,
        "xtick.labelsize": 11,
        "ytick.labelsize": 11,
        "legend.title_fontsize": 12,
        "legend.fontsize": 11,
    })
    return plt


def quick_peek(df, expected_columns=None, sample=3, label="DataFrame"):
    """Print a friendly snapshot so students can self-diagnose issues quickly."""
    print(f"\n🔍 {label} preview")
    print(df.head(sample))
    print(f"Rows: {len(df):,} | Columns: {list(df.columns)}")
    if expected_columns:
        missing = [col for col in expected_columns if col not in df.columns]
        if missing:
            print(f"⚠️ Missing column(s): {missing}")
        else:
            print("✅ Columns match the expectation.")
    return df


def expect_rows_between(df, low, high, label="row count"):
    """Print whether len(df) falls within [low, high] and return the row count."""
    rows = len(df)
    if low <= rows <= high:
        print(f"✅ {label.title()} looks right: {rows:,}.")
    else:
        print(f"⚠️ {label.title()} looks off: {rows:,}. Expected between {low:,} and {high:,}.")
    return rows


def validate_story_elements(elements):
    """Report any empty or whitespace-only story fields, then return the dict unchanged."""
    missing = [key for key, value in elements.items() if not value or not str(value).strip()]
    if missing:
        print(f"⚠️ Please fill in these storytelling fields: {', '.join(missing)}")
    else:
        print("✅ Story scaffold is ready — every element is filled in.")
    return elements


def save_last_fig(filename, fig=None, dpi=300):
    """Save the latest Matplotlib figure with consistent export settings."""
    output_path = Path.cwd() / filename
    output_path.parent.mkdir(parents=True, exist_ok=True)
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        fig.savefig(output_path, dpi=dpi, bbox_inches="tight")
        print(f"💾 Saved figure to {output_path}")
    else:
        print("⚠️ No figure detected to save.")
    return output_path

baseline_style()


## Step 1 · Load CO₂ and temperature series

In [None]:

data_dir = Path.cwd() / "data"

co2_df = pd.read_csv(data_dir / "global_co2.csv")
# This mapping is currently a no-op; edit it if your download uses different column names.
co2_df = co2_df.rename(columns={"Year": "Year", "CO2": "CO2"})
quick_peek(co2_df, expected_columns=["Year", "CO2"], label="Global CO₂ emissions")

temp_raw = pd.read_csv(
    data_dir / "GLB.Ts+dSST.csv",
    skiprows=1,
    usecols=[0, 13],
    names=["Year", "TempAnomaly"],
    header=0,
)
temp_raw["TempAnomaly"] = pd.to_numeric(temp_raw["TempAnomaly"], errors="coerce")
quick_peek(temp_raw, expected_columns=["Year", "TempAnomaly"], label="NASA temperature anomalies")
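
The `read_csv` arguments above are easier to trust after seeing them on a tiny stand-in file. The sketch below assumes the GISTEMP layout is a one-line title, then a header row with `Year`, twelve months, and an annual `J-D` column, with `***` marking values not yet available. All numbers are made up for illustration.

```python
import io

import pandas as pd

months = "Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec"
toy_lines = [
    "Land-Ocean: Global Means",                      # title line, removed by skiprows=1
    f"Year,{months},J-D",                            # header row, replaced by names=
    "2001," + ",".join(["0.5"] * 12) + ",0.50",
    "2002," + ",".join(["0.6"] * 12) + ",0.60",
    "2003," + ",".join(["0.6"] * 11) + ",***,***",   # incomplete year
]

toy = pd.read_csv(
    io.StringIO("\n".join(toy_lines)),
    skiprows=1,           # drop the title line
    usecols=[0, 13],      # keep Year and the annual mean (14th column)
    names=["Year", "TempAnomaly"],
    header=0,             # discard the file's own header in favour of names
)

# "***" makes the column text; coercion turns it into NaN so dropna() can remove it later.
toy["TempAnomaly"] = pd.to_numeric(toy["TempAnomaly"], errors="coerce")
print(toy)
```

If the real file's column order differs from this assumption, `usecols=[0, 13]` would silently pick the wrong column, so it is worth checking the preview printed by `quick_peek`.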


### Self-diagnostic: overlapping year range

In [None]:

co2_years = set(co2_df["Year"].dropna().astype(int))
temp_years = set(temp_raw["Year"].dropna().astype(int))
overlap_years = sorted(co2_years & temp_years)
print(f"Overlap spans {overlap_years[0]} → {overlap_years[-1]} ({len(overlap_years)} years)")
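
The first and last overlap years say nothing about holes in between. As an optional extra check, the hypothetical helper below (not part of the original notebook) lists any years missing inside a span; it could be applied to `overlap_years`.

```python
def find_year_gaps(years):
    """Return the years missing between the smallest and largest entries of `years`."""
    present = set(years)
    if not present:
        return []  # nothing to check, and avoids min()/max() on an empty set
    return [y for y in range(min(present), max(present) + 1) if y not in present]


# Toy input with two holes: 1902 and 1905–1906.
gaps = find_year_gaps([1900, 1901, 1903, 1904, 1907])
print(gaps)  # [1902, 1905, 1906]
```

An empty result means the overlap is contiguous, so the year count printed above should equal last minus first plus one.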


## Step 2 · Merge, tidy, and create checkpoints

In [None]:

merged = co2_df.merge(temp_raw, on="Year", how="inner").dropna()
merged = merged[(merged["Year"] >= overlap_years[0]) & (merged["Year"] <= overlap_years[-1])]
merged = merged.sort_values("Year").reset_index(drop=True)
expect_rows_between(merged, 110, 125, label="aligned years")
quick_peek(merged, label="Merged CO₂ + temperature table")
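
An inner merge followed by `dropna()` discards rows without saying which ones. One way to audit that, sketched here on toy frames with invented values, is pandas' `indicator=True` option, which adds a `_merge` column showing where each key was found.

```python
import pandas as pd

# Toy stand-ins for co2_df and temp_raw (values are invented).
co2_toy = pd.DataFrame({"Year": [1900, 1901, 1902], "CO2": [2.0, 2.1, 2.2]})
temp_toy = pd.DataFrame({"Year": [1899, 1900, 1901], "TempAnomaly": [-0.2, -0.1, None]})

# Outer merge keeps every year; _merge is "both", "left_only", or "right_only".
audit = co2_toy.merge(temp_toy, on="Year", how="outer", indicator=True)
counts = audit["_merge"].value_counts()

# Years an inner merge would drop because only one table has them.
dropped_by_inner = sorted(audit.loc[audit["_merge"] != "both", "Year"].tolist())

# Years present in both tables but lost to dropna() because a value is missing.
inner = co2_toy.merge(temp_toy, on="Year", how="inner")
dropped_by_dropna = inner.loc[inner.isna().any(axis=1), "Year"].tolist()

print(counts)
print("Not in both tables:", dropped_by_inner)
print("Missing values:", dropped_by_dropna)
```

Running the same audit on the real frames would show whether the aligned row count falls short because of coverage differences or because of missing values.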


### Progress anchor

In [None]:
display(Image(filename=str(Path.cwd() / 'plots' / 'day05_solution_plot.png'), width=420))

## Step 3 · Story-first chart checklist

In [None]:

latest_year = int(merged["Year"].iloc[-1])
TITLE = "Rising CO₂ Emissions Track with Rising Global Temperatures"
SUBTITLE = f"Global fossil CO₂ (Gt/year) and NASA temperature anomaly (°C), {merged['Year'].iloc[0]}–{latest_year}"
ANNOTATION = f"{latest_year}: CO₂ = {merged['CO2'].iloc[-1]:.1f} Gt, Temp anomaly = {merged['TempAnomaly'].iloc[-1]:.2f} °C"
SOURCE = "Global Carbon Project via OWID; NASA GISTEMP v4"
UNITS = "CO₂ (gigatonnes per year) & temperature anomaly (°C vs. 1951–1980 baseline)"
ACCESSIBILITY_NOTES = "Dual-axis line chart with contrasting colors, color-matched axis labels, light grid, and annotation for latest year."

validate_story_elements({
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
    "ACCESSIBILITY_NOTES": ACCESSIBILITY_NOTES,
})


## Step 4 · Build the dual-axis capstone figure

In [None]:

baseline_style()

fig, ax1 = plt.subplots(figsize=(12, 6))

ax1.plot(merged["Year"], merged["CO2"], color="#6c757d", linewidth=2.4, label="CO₂ emissions")
ax1.fill_between(merged["Year"], merged["CO2"], color="#6c757d", alpha=0.15)
ax1.set_ylabel("CO₂ emissions (Gt/year)", color="#6c757d")
ax1.tick_params(axis="y", labelcolor="#6c757d")

ax2 = ax1.twinx()
ax2.plot(merged["Year"], merged["TempAnomaly"], color="#d62728", linewidth=2.4, label="Temperature anomaly")
ax2.scatter(merged["Year"][::10], merged["TempAnomaly"][::10], color="#d62728", s=35)
ax2.set_ylabel("Temperature anomaly (°C)", color="#d62728")
ax2.tick_params(axis="y", labelcolor="#d62728")

ax1.set_xlabel("Year")
ax1.set_title(f"{TITLE}\n{SUBTITLE}", loc="left")
ax1.grid(alpha=0.3)

ax2.annotate(
    ANNOTATION,
    xy=(latest_year, merged["TempAnomaly"].iloc[-1]),
    xytext=(latest_year - 25, merged["TempAnomaly"].iloc[-1] + 0.3),
    arrowprops=dict(arrowstyle="->", color="#d62728"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#bbbbbb"),
)

ax1.text(0.01, -0.2, f"Source: {SOURCE}", transform=ax1.transAxes, fontsize=10, color="#4f4f4f")
ax1.text(0.01, -0.27, f"Notes: {ACCESSIBILITY_NOTES}", transform=ax1.transAxes, fontsize=10, color="#4f4f4f")

fig.tight_layout()
plt.show()
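
The figure code sets a `label` on each line but never draws a legend. With twin axes, each axis tracks its own handles, so one possible approach is to collect both sets and pass them to a single `legend` call. The sketch below uses invented data and hypothetical names (`ax_left`, `ax_right`); in the notebook the same pattern would apply to `ax1` and `ax2` before `fig.tight_layout()`.

```python
import matplotlib.pyplot as plt

years = [2000, 2010, 2020]  # invented values, for illustration only

fig, ax_left = plt.subplots()
ax_left.plot(years, [25.0, 33.0, 35.0], color="#6c757d", label="CO₂ emissions")

ax_right = ax_left.twinx()
ax_right.plot(years, [0.4, 0.7, 1.0], color="#d62728", label="Temperature anomaly")

# Each axis only knows about its own lines, so gather handles from both.
handles_left, labels_left = ax_left.get_legend_handles_labels()
handles_right, labels_right = ax_right.get_legend_handles_labels()
combined_labels = labels_left + labels_right

# Draw one legend on the top-most axis so it is not hidden behind the twin.
legend = ax_right.legend(handles_left + handles_right, combined_labels, loc="upper left")

plt.close(fig)  # tidy up; in a notebook you would call plt.show() instead
```

A combined legend gives readers who cannot rely on color-matched axis labels a second way to tell the series apart.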


### Export checkpoint

In [None]:
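# Pass the figure explicitly: with the inline backend the figure is closed when its cell finishes,
# so plt.gcf() in this cell would return a new, empty figure and nothing would be saved.
save_last_fig('plots/day05_solution_plot.png', fig=fig)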

## Step 5 · Reflect on synthesis and next steps
- **Claim → Evidence → Visual → Takeaway**:
  - **Claim**: Human-caused CO₂ emissions have risen dramatically and align with the observed warming trend.
  - **Evidence**: Dual-axis figure shows emissions climbing from ~5 to >35 Gt while anomalies increase ~1 °C.
  - **Visual**: Accessible dual-axis line chart with annotation and clear source notes.
  - **Takeaway**: Mitigation requires bending the CO₂ curve; the climate system is responding to cumulative emissions.
- **Limitations**: Dual axes can mislead if scales are manipulated. State each axis's range and units explicitly; the anomaly axis includes negative values, so it cannot start at zero.
- **Potential misreads**: This chart does not prove causality; pair with physics-based explanations.
- **Next questions**: How do other greenhouse gases contribute? What happens when plotting cumulative emissions vs temperature?
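
As a starting point for the cumulative-emissions question, the sketch below shows the mechanics on invented numbers: a running total with `cumsum()` and a Pearson correlation with `Series.corr()`. The column name `CumCO2` is hypothetical, and on the real data the same two lines would run against `merged`. A high correlation here still says nothing about causality.

```python
import pandas as pd

# Invented annual values, for illustration only.
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "CO2": [2.0, 4.0, 6.0, 8.0],             # Gt per year
    "TempAnomaly": [-0.1, 0.0, 0.2, 0.5],    # °C
})

# Running total of emissions up to and including each year.
toy["CumCO2"] = toy["CO2"].cumsum()

# Pearson correlation between cumulative emissions and the anomaly.
r = toy["CumCO2"].corr(toy["TempAnomaly"])

print(toy)
print(f"Pearson r (toy data): {r:.3f}")
```

Because the real CO₂ series starts in 1900, a running total built from it would omit all earlier emissions, which is worth stating in any chart that uses it.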

## Process quality checklist
✅ Loaded both flagship datasets • ✅ Confirmed overlapping years • ✅ Completed story scaffold • ✅ Built annotated dual-axis figure • ✅ Reflected on narrative, ethics, and follow-up analyses.