## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%205%20%E2%80%93%20Capstone%20Project%20CO2%20and%20Climate.ipynb)

# 🔥 Day 5 · Capstone – Stitching CO₂ and Temperature

All week you've built habits: load, check, interpret, narrate. The capstone stitches them together to tell a systems story. We'll keep the same rhythm: small loops that culminate in one polished visual, with an explicit justification for every design choice.

## 🗂️ Data Card · Global CO₂ + NASA GISTEMP
| Field | Details |
| --- | --- |
| Sources | [Our World in Data – Global CO₂ emissions](https://ourworldindata.org/co2-and-greenhouse-gas-emissions) & [NASA GISTEMP v4](https://data.giss.nasa.gov/gistemp/) |
| Temporal coverage | CO₂: 1750–2022; Temperature: 1880–2024 (2025 provisional) |
| Geographic scope | Global totals |
| Units | CO₂ in gigatonnes per year; temperature in °C anomaly vs. 1951–1980 |
| Update cadence | Annual |
| Caveats | CO₂ excludes land-use change prior to 1850 in some versions; NASA 2025 value provisional. Different baselines complicate direct comparisons. |
| What this chart cannot show | Regional emissions, natural variability drivers, or attribution science. Pair with sectoral breakdowns or carbon budget analysis for depth. |
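
The caveat about differing baselines is easier to see with numbers. Below is a minimal sketch of re-expressing an anomaly series against a different reference period. The `rebaseline` helper and the values are hypothetical, invented for illustration; they are not part of the DS4S scaffolds or the NASA data.

```python
import pandas as pd


def rebaseline(anomaly: pd.Series, start: int, end: int) -> pd.Series:
    """Re-express anomalies relative to the mean of years start..end (inclusive)."""
    reference_mean = anomaly.loc[start:end].mean()
    return anomaly - reference_mean


# Hypothetical anomalies indexed by year (not real GISTEMP values).
toy_anomaly = pd.Series(
    [-0.2, -0.1, 0.0, 0.1, 0.4, 0.6],
    index=[1900, 1901, 1902, 1903, 1904, 1905],
    name="TempAnomaly",
)

# Use the first three years as the new reference period.
shifted = rebaseline(toy_anomaly, 1900, 1902)
print(shifted)
```

Changing the baseline moves every value by the same constant, so the trend is unchanged and only the zero line moves. That is why two anomaly figures quoted against different baselines cannot be compared directly.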

In [None]:
# 🔁 Shared scaffolds used across DS4S notebooks
from __future__ import annotations

import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.rcParams.update({
    "figure.dpi": 120,
    "axes.facecolor": "#f8f9fb",
    "axes.grid": True,
    "grid.alpha": 0.25,
    "grid.linestyle": "--",
    "axes.titlesize": 18,
    "axes.labelsize": 12,
    "axes.titleweight": "bold",
    "legend.frameon": False,
    "legend.fontsize": 11,
    "font.family": "DejaVu Sans",
})

def quick_diagnostics(df: pd.DataFrame, dataset_name: str, *, expected_columns: list[str] | None = None, expected_rows: tuple[int, int] | None = None) -> None:
    """Print lightweight diagnostics without stopping execution."""
    print(f"\n🔍 {dataset_name}")
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    if expected_columns is not None:
        missing = [col for col in expected_columns if col not in df.columns]
        if missing:
            warnings.warn(f"Missing expected columns: {missing}")
    if expected_rows is not None:
        low, high = expected_rows
        if not (low <= len(df) <= high):
            warnings.warn(f"Row count {len(df)} outside expected range {expected_rows}")
        else:
            print(f"Row count within expected range {expected_rows}.")
    print("Null counts:")
    print(df.isna().sum())
    print("Preview:")
    print(df.head())
    print("-" * 60)

def expect_value_range(series: pd.Series, *, lower: float | None = None, upper: float | None = None, context: str = "") -> None:
    """Warn when values fall outside an expected numeric window."""
    label = context or series.name or "series"
    if lower is not None and float(series.min()) < lower:
        warnings.warn(f"{label}: minimum {series.min():.3f} is below expected {lower}")
    if upper is not None and float(series.max()) > upper:
        warnings.warn(f"{label}: maximum {series.max():.3f} is above expected {upper}")
    print(f"{label}: {series.min():.3f} → {series.max():.3f}")

def validate_story_elements(*, title: str, subtitle: str, annotation: str, source: str, units: str) -> None:
    """Confirm the storytelling scaffold is filled before plotting."""
    elements = {
        "TITLE": title,
        "SUBTITLE": subtitle,
        "ANNOTATION": annotation,
        "SOURCE": source,
        "UNITS": units,
    }
    missing = [key for key, value in elements.items() if not str(value).strip()]
    if missing:
        warnings.warn(f"Please fill these storytelling fields: {', '.join(missing)}")
    else:
        print("👍 Story scaffold complete.")

def baseline_style(ax: plt.Axes | None = None) -> plt.Axes:
    """Standardise axes styling for consistency across notebooks."""
    ax = ax or plt.gca()
    for spine in ["top", "right"]:
        ax.spines[spine].set_visible(False)
    ax.set_facecolor("#ffffff")
    return ax

def save_last_visual(fig, filename: str, *, subfolder: str = "plots") -> None:
    """Persist the most recent Matplotlib or Plotly figure without failing the run."""
    plots_dir = Path.cwd() / subfolder
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    try:
        if hasattr(fig, "write_image"):
            fig.write_image(str(output_path))
        elif hasattr(fig, "savefig"):
            fig.savefig(output_path, dpi=300, bbox_inches="tight")
        else:
            warnings.warn("Figure type not supported for export; skipping save.")
            return
        print(f"Saved visual to {output_path}")
    except Exception as exc:
        warnings.warn(f"Plot export skipped: {exc}")


## Step 1 · Load the CO₂ emission series
Read the CSV and sanity-check shape, columns, and plausible ranges.

In [None]:
data_dir = Path.cwd() / "data"
df_co2 = pd.read_csv(data_dir / "global_co2.csv")
quick_diagnostics(
    df_co2,
    "Global CO2 emissions (Our World in Data)",
    expected_columns=["Year", "CO2"],
    expected_rows=(250, 300),
)
expect_value_range(df_co2["CO2"], lower=0, upper=45, context="CO2 (gigatonnes)")
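
Shape and range checks do not reveal gaps or repeats in the year column, which matter once the table is joined on `Year`. The sketch below shows one possible extra check on a made-up table; `find_year_gaps` is a hypothetical helper, not one of the shared scaffolds.

```python
import pandas as pd


def find_year_gaps(years: pd.Series) -> list[int]:
    """Return the years missing between the first and last year in the series."""
    observed = set(years.astype(int).tolist())
    full_range = range(int(years.min()), int(years.max()) + 1)
    return [year for year in full_range if year not in observed]


# Hypothetical table with one missing year (2002) and one duplicated year (2003).
toy = pd.DataFrame(
    {"Year": [2000, 2001, 2003, 2003, 2004], "CO2": [25.0, 25.5, 27.0, 27.0, 28.1]}
)

missing_years = find_year_gaps(toy["Year"])
duplicate_years = toy.loc[toy["Year"].duplicated(), "Year"].tolist()
print("Missing:", missing_years)
print("Duplicated:", duplicate_years)
```

On the real file you would pass `df_co2["Year"]` instead of the toy column; both lists should come back empty if the series is annual and complete.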


## Step 2 · Load and clean NASA temperature anomalies
Use the same approach as Day 1: read NASA's `***` placeholders as missing values, convert the anomaly column to numeric, drop the incomplete rows, and confirm the range.

In [None]:
temperature_path = data_dir / "GLB.Ts+dSST.csv"
df_temp = (
    pd.read_csv(
        temperature_path,
        skiprows=1,
        na_values=["***"],
        usecols=["Year", "J-D"],
    )
    .rename(columns={"J-D": "TempAnomaly"})
    .dropna()
)
df_temp["TempAnomaly"] = pd.to_numeric(df_temp["TempAnomaly"], errors="coerce")
df_temp = df_temp.dropna(subset=["TempAnomaly"]).astype({"Year": "int64"})

quick_diagnostics(
    df_temp,
    "NASA GISTEMP annual anomalies",
    expected_columns=["Year", "TempAnomaly"],
    expected_rows=(140, 200),
)
expect_value_range(df_temp["TempAnomaly"], lower=-1.0, upper=1.8, context="TempAnomaly (°C)")
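
To see what the loading options do without opening the NASA file, here is a small sketch that runs the same `read_csv` arguments on an in-memory string. The text only imitates the file's layout (a title line, a header row, and a `***` placeholder); the numbers are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text: one title line, a header row, and made-up values.
raw_text = """Toy title line that the loader skips
Year,Jan,J-D
2021,0.10,0.15
2022,0.20,0.25
2023,0.30,***
"""

toy_temp = pd.read_csv(
    io.StringIO(raw_text),
    skiprows=1,            # skip the title line above the header
    na_values=["***"],     # read the placeholder as a missing value
    usecols=["Year", "J-D"],
).rename(columns={"J-D": "TempAnomaly"})

print(toy_temp)

# Dropping the missing row leaves only years with a complete annual mean.
cleaned = toy_temp.dropna(subset=["TempAnomaly"])
print(cleaned)
```

Because the placeholder is declared in `na_values`, the column is parsed as floats straight away, and the incomplete year disappears when the missing row is dropped.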


## Step 3 · Align timelines and build helper columns
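Join the tables on year, keep the overlapping years from 1900 onward, and compute centered 5-year rolling means to reference in the narrative. Because the window is centered, the first two and last two years have no rolling value.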

In [None]:
df_story = (
    df_co2.set_index("Year")
    .join(df_temp.set_index("Year"), how="inner")
    .loc[1900:]
)

df_story["CO2_rolling5"] = df_story["CO2"].rolling(window=5, center=True).mean()
df_story["Temp_rolling5"] = df_story["TempAnomaly"].rolling(window=5, center=True).mean()

quick_diagnostics(
    df_story.reset_index(),
    "Merged CO2 + temperature table",
    expected_columns=["Year", "CO2", "TempAnomaly", "CO2_rolling5", "Temp_rolling5"],
    expected_rows=(100, 150),
)

recent = df_story.tail(10)
earliest = df_story.head(10)
print("Recent snapshot:")
print(recent)
print("Change vs. early 1900s:")
print(
    {
        "CO2 increase (Gt)": recent["CO2"].mean() - earliest["CO2"].mean(),
        "Temperature increase (°C)": recent["TempAnomaly"].mean() - earliest["TempAnomaly"].mean(),
    }
)
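
The join and the rolling window both trim data in ways that are easy to miss. This sketch repeats the two operations on tiny made-up tables so the effect is visible; the values and the `toy_` names are hypothetical.

```python
import pandas as pd

# Hypothetical series with different year coverage (values are made up).
toy_co2 = pd.DataFrame(
    {"Year": range(2000, 2008), "CO2": [25, 26, 27, 28, 29, 30, 31, 32]}
)
toy_temp = pd.DataFrame(
    {"Year": range(2002, 2010), "TempAnomaly": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]}
)

# An inner join keeps only the years present in both tables (2002 to 2007 here).
toy_story = toy_co2.set_index("Year").join(toy_temp.set_index("Year"), how="inner")

# A centered 5-year window needs two neighbors on each side, so the edges are NaN.
toy_story["CO2_rolling5"] = toy_story["CO2"].rolling(window=5, center=True).mean()
print(toy_story)
```

In the merged notebook table the same thing happens at both ends, so the dashed 5-year lines stop two years before the latest annotated value.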


### Expected combo preview
Your final chart should echo this co-acceleration: emissions rising sharply alongside temperature anomalies. Use the reference below to self-check.

![Preview of the finished capstone chart](../../../plots/day05_solution_plot.png)

## Step 4 · Justify the dual-axis choice
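Dual axes can mislead because the relative scaling of the two axes is arbitrary: stretch one axis and the lines appear to track each other more or less closely. We use them here because the units differ (gigatonnes vs. °C) and the goal is to show co-movement over time. State in your caption that trends, not absolute magnitudes, are the focus.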
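
One way to check that the impression of co-movement does not depend on how the two axes happen to be scaled is to standardize both series and compare them on a shared, unitless scale. The sketch below swaps in z-scores as an alternative technique; it is not what the capstone chart does, and the `zscore` helper and the numbers are hypothetical.

```python
import pandas as pd


def zscore(series: pd.Series) -> pd.Series:
    """Center on the mean and scale by the sample standard deviation."""
    return (series - series.mean()) / series.std()


# Hypothetical values standing in for the merged CO2 + temperature table.
toy_story = pd.DataFrame(
    {"CO2": [20.0, 24.0, 28.0, 32.0, 36.0], "TempAnomaly": [0.2, 0.3, 0.5, 0.6, 0.9]},
    index=pd.Index([1980, 1990, 2000, 2010, 2020], name="Year"),
)

# After standardizing, both columns have mean 0 and standard deviation 1.
standardized = toy_story.apply(zscore)
print(standardized.round(2))
```

If the standardized lines still rise together, the dual-axis picture is not merely a product of axis limits. The trade-off is that the units disappear, which is why the capstone keeps gigatonnes and °C on labeled axes.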

## Step 5 · Compose the capstone visual
Apply the storytelling scaffold, annotate the latest values, and include a caption that reiterates the limitation you just noted.

In [None]:
TITLE = "Global emissions and temperatures have climbed together since 1900"
SUBTITLE = "CO₂ emissions (gigatonnes) vs. NASA GISTEMP anomalies (°C), 1900–2022"
ANNOTATION = "Emissions reached ~37 Gt in 2022 while temperatures sit ~0.9°C above the 1951–1980 baseline."
SOURCE = "Sources: Our World in Data (Global CO₂), NASA GISTEMP v4 (downloaded 2025-01-05)"
UNITS = "Dual-axis: left = CO₂ (gigatonnes), right = Temperature anomaly (°C)"

validate_story_elements(
    title=TITLE,
    subtitle=SUBTITLE,
    annotation=ANNOTATION,
    source=SOURCE,
    units=UNITS,
)

latest_year = df_story.index.max()
latest_row = df_story.loc[latest_year]

fig_capstone, ax_co2 = plt.subplots(figsize=(12, 6))
ax_co2 = baseline_style(ax_co2)

ax_co2.plot(
    df_story.index,
    df_story["CO2"],
    color="#4c72b0",
    linewidth=2.5,
    label="CO₂ emissions",
)
ax_co2.plot(
    df_story.index,
    df_story["CO2_rolling5"],
    color="#1b9e77",
    linewidth=2,
    linestyle="--",
    label="CO₂ (5-year avg)",
)
ax_co2.set_xlabel("Year")
ax_co2.set_ylabel("CO₂ emissions (gigatonnes)", color="#4c72b0")
ax_co2.tick_params(axis="y", labelcolor="#4c72b0")

ax_temp = ax_co2.twinx()
ax_temp.plot(
    df_story.index,
    df_story["TempAnomaly"],
    color="#d73027",
    linewidth=2.5,
    label="Temperature anomaly",
)
ax_temp.plot(
    df_story.index,
    df_story["Temp_rolling5"],
    color="#f46d43",
    linewidth=2,
    linestyle="--",
    label="Temp (5-year avg)",
)
ax_temp.set_ylabel("Temperature anomaly (°C)", color="#d73027")
ax_temp.tick_params(axis="y", labelcolor="#d73027")

ax_co2.set_title(TITLE, loc="left", pad=18)
ax_co2.text(0.0, 1.03, SUBTITLE, transform=ax_co2.transAxes, fontsize=12, color="#555555")

ax_co2.annotate(
    f"{latest_year}: {latest_row['CO2']:.1f} Gt",
    xy=(latest_year, latest_row["CO2"]),
    xytext=(latest_year - 18, latest_row["CO2"] + 4),
    arrowprops=dict(arrowstyle="->", color="#4c72b0"),
    fontsize=11,
    color="#4c72b0",
)
ax_temp.annotate(
    f"{latest_year}: {latest_row['TempAnomaly']:.2f}°C",
    xy=(latest_year, latest_row["TempAnomaly"]),
    xytext=(latest_year - 35, latest_row["TempAnomaly"] + 0.3),
    arrowprops=dict(arrowstyle="->", color="#d73027"),
    fontsize=11,
    color="#d73027",
)

lines_left, labels_left = ax_co2.get_legend_handles_labels()
lines_right, labels_right = ax_temp.get_legend_handles_labels()
ax_co2.legend(lines_left + lines_right, labels_left + labels_right, loc="upper left")

ax_co2.text(
    0.01,
    -0.23,
    # "\n" puts the annotation on its own line above the source and units.
    f"{ANNOTATION}\n{SOURCE} · {UNITS}. Note: Dual axes show trend alignment, not direct magnitude comparisons.",
    transform=ax_co2.transAxes,
    fontsize=10,
    color="#555555",
    va="top",
)

plt.tight_layout()
plt.show()


In [None]:
save_last_visual(fig_capstone, "day05_solution_plot.png")

## 🔍 Reflection & limitations
- Emphasise that correlation ≠ causation; bring in attribution research or carbon budget visuals for rigor.
- Encourage students to write a short Claim → Evidence → Visual → Takeaway paragraph referencing both rolling averages.
- Ask: what additional data (e.g., methane, land-use change) would strengthen or complicate this story?
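
To make the first bullet concrete for students, the sketch below builds two synthetic series that share nothing except an upward trend and compares the correlation of their levels with the correlation of their year-on-year changes. The data are randomly generated and hypothetical; this says nothing about the real CO₂ and temperature records and is only a demonstration of why a high correlation alone is weak evidence.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
years = np.arange(1900, 2020)

# Two hypothetical series: same linear trend, independent random noise.
toy = pd.DataFrame(
    {
        "series_a": 0.3 * (years - 1900) + rng.normal(0, 1, size=years.size),
        "series_b": 0.3 * (years - 1900) + rng.normal(0, 1, size=years.size),
    },
    index=years,
)

# Levels are dominated by the shared trend; differences remove it.
level_corr = toy["series_a"].corr(toy["series_b"])
change_corr = toy["series_a"].diff().corr(toy["series_b"].diff())
print(f"Correlation of levels: {level_corr:.2f}")
print(f"Correlation of year-on-year changes: {change_corr:.2f}")
```

The levels correlate strongly simply because both series trend upward, while the changes barely correlate at all. For the capstone, that is the reason the chart's co-movement needs to be paired with attribution research rather than presented as proof on its own.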