## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%205_%20Capstone.ipynb)

# 🌐 Day 5 – Capstone: Linking CO₂ Emissions and Temperature

The final day blends everything: multi-dataset joins, rigorous diagnostics, thoughtful visualization, and storytelling polish. Students build a stacked dashboard linking historical **CO₂ emissions** and **temperature anomalies**.

> **Teacher sidebar — pacing & differentiation**  
> • Timing: ~60 minutes including share-out.  
> • Suggested checkpoints: after the merge (confirm overlapping years) and after the stacked figure.  
> • Differentiation: fast finishers can add annotations for policy milestones; others can rely on the provided scaffold and spend extra time on interpretation.

## 🗺️ Roadmap for Today

Loop | Focus | What success looks like
--- | --- | ---
0 | Story scaffold | Claim + annotation locked in
1 | Load | Clean CO₂ + temperature tables
2 | Merge & diagnose | Shared timeline, correlations computed
3 | Visualize | Stacked plots with consistent styling & context
4 | Interpret & save | Capstone narrative articulated

## 🗂️ Data Cards — Global CO₂ & NASA Temperature Anomalies

- **CO₂ emissions**: Global fossil CO₂ from Our World in Data (based on Global Carbon Project). Units: gigatonnes of CO₂ per year.  
- **Temperature anomalies**: NASA GISTEMP v4 annual global mean relative to 1951–1980. Units: °C anomaly.  
- **Temporal coverage**: CO₂ (1750–2022), temperature (1880–2024).  
- **Caveats**: The CO₂ series covers fossil emissions only (land-use change is excluded), and early-year estimates carry more uncertainty; anomalies are deviations from the 1951–1980 baseline, not absolute temperatures.  
- **Update cadence**: Annual updates (GCP in November, NASA in January).
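
The anomaly definition in the data card can be made concrete: subtract the mean over the baseline period from every value. The sketch below uses placeholder absolute temperatures (not real measurements) and hypothetical column names `Temp` and `Anomaly`; the NASA file already ships anomalies, so this only illustrates the arithmetic.

```python
import pandas as pd

# Placeholder absolute temperatures (°C) for illustration only.
temps = pd.DataFrame({
    "Year": [1951, 1965, 1980, 2020],
    "Temp": [14.0, 14.0, 14.0, 15.0],
})

# Mean over the baseline window (inclusive on both ends).
baseline = temps.loc[temps["Year"].between(1951, 1980), "Temp"].mean()

# Anomaly = value minus baseline mean.
temps["Anomaly"] = temps["Temp"] - baseline
print(temps)
```

A positive anomaly therefore means "warmer than the baseline average", which is why the zero line in the temperature panel is meaningful.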

> 🎯 **Integrity cue**: Avoid misleading dual axes. Use aligned subplots or normalized series and discuss what we can (trend alignment) and cannot (direct causality) claim.
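
One way to build the "normalized series" mentioned in the cue is to z-score each column so both share a unitless scale. This is a minimal sketch with a hypothetical helper `zscore` and placeholder values (not real measurements); in the notebook the same helper could be applied to the merged table's `CO2` and `TempAnomaly` columns.

```python
import pandas as pd


def zscore(series: pd.Series) -> pd.Series:
    """Rescale a series to mean 0 and standard deviation 1 (hypothetical helper)."""
    return (series - series.mean()) / series.std(ddof=0)


# Placeholder values for illustration only.
demo = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "CO2": [10.0, 12.0, 14.0, 16.0],
    "TempAnomaly": [0.1, 0.2, 0.3, 0.4],
})

normalized = demo.assign(
    CO2_z=zscore(demo["CO2"]),
    TempAnomaly_z=zscore(demo["TempAnomaly"]),
)
print(normalized[["Year", "CO2_z", "TempAnomaly_z"]])
```

Normalized series can share one axis honestly, but the y-axis then reads in standard deviations, so the original units should still appear in the caption.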

In [None]:
# Shared utilities for the DS4S course notebooks
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.dpi': 120,
    'axes.titlesize': 16,
    'axes.labelsize': 13,
    'axes.titlepad': 12,
    'figure.figsize': (10, 5),
})


def load_csv(path: Path, **read_kwargs) -> pd.DataFrame:
    '''Load a CSV and report the basic shape.'''
    df = pd.read_csv(path, **read_kwargs)
    print(f"✅ Loaded {path.name} with {df.shape[0]:,} rows and {df.shape[1]} columns")
    return df


def validate_columns(df: pd.DataFrame, required):
    '''Raise if any required column is missing.'''
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    print(f"✅ Columns present: {', '.join(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int):
    '''Raise if the row count falls outside the inclusive range low-high.'''
    rows = df.shape[0]
    if not (low <= rows <= high):
        raise ValueError(f"Row count {rows} outside expected range {low}-{high}")
    print(f"✅ Row count {rows} within expected {low}-{high}")


def quick_snapshot(df: pd.DataFrame, name: str, n: int = 3):
    '''Print shape, columns, and null counts, then display the first n rows.'''
    print(f"\n{name} snapshot → shape={df.shape}")
    print("Columns:", list(df.columns))
    print("Nulls:\n", df.isna().sum())
    display(df.head(n))


def ensure_story_elements(title: str, subtitle: str, annotation: str, source: str, units: str):
    '''Check that every storytelling field is filled in and return them as a dict.'''
    fields = {
        'TITLE': title,
        'SUBTITLE': subtitle,
        'ANNOTATION': annotation,
        'SOURCE': source,
        'UNITS': units,
    }
    missing = [key for key, value in fields.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"Please complete these storytelling fields: {', '.join(missing)}")
    print("✅ Story scaffold complete →", ", ".join(f"{k}: {v}" for k, v in fields.items()))
    return fields


def save_last_fig(filename: str):
    '''Save the current Matplotlib figure to ./plots at 300 dpi.'''
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    fig = plt.gcf()
    if not fig.axes:
        raise RuntimeError("Run the plotting cell before saving.")
    output_path = plots_dir / filename
    fig.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"📁 Saved figure to {output_path}")


def save_plotly_fig(fig, filename: str):
    '''Save a Plotly figure to ./plots as standalone HTML.'''
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    fig.write_html(str(output_path))
    print(f"📁 Saved interactive figure to {output_path}")

## 🔁 Loop 0 — Story scaffold (3 min)

In [None]:
TITLE = "CO₂ Emissions and Global Temperatures Rise Together"
SUBTITLE = "Global CO₂ emissions (Gt) and NASA temperature anomalies (°C), 1900–2022"
ANNOTATION = "Since 1960, emissions have more than tripled and temperature anomalies have climbed to around +1 °C."
SOURCE = "Our World in Data (Global Carbon Project 2023) & NASA GISTEMP v4 (Jan 2024)"
UNITS = "CO₂ (Gt) + Temperature anomaly (°C)"

story_fields = ensure_story_elements(TITLE, SUBTITLE, ANNOTATION, SOURCE, UNITS)

## 🔁 Loop 1 — Load the datasets (8 min)

In [None]:
data_dir = Path.cwd() / "data"
co2 = load_csv(data_dir / "global_co2.csv")
validate_columns(co2, ["Year", "CO2"])
co2["Year"] = co2["Year"].astype(int)
expect_rows_between(co2, 200, 300)

nasa = load_csv(
    data_dir / "GLB.Ts+dSST.csv",
    skiprows=1,
    usecols=["Year", "J-D"],
)
validate_columns(nasa, ["Year", "J-D"])
nasa = nasa.rename(columns={"J-D": "TempAnomaly"})
nasa["TempAnomaly"] = pd.to_numeric(nasa["TempAnomaly"], errors="coerce")
nasa = nasa.dropna()
nasa["Year"] = nasa["Year"].astype(int)
# Pass n=5 so the snapshot shows every row of the 5-row tail, including the latest year
quick_snapshot(co2.tail(), name="CO₂ tail", n=5)
quick_snapshot(nasa.tail(), name="NASA anomalies tail", n=5)
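
The `errors="coerce"` step matters because a single non-numeric entry would otherwise leave the whole annual column as text. A minimal sketch, assuming incomplete years are marked with a placeholder such as `***` (an assumption about the file; check the raw CSV to confirm):

```python
import pandas as pd

# Toy column mimicking an annual-mean field; "***" is an assumed placeholder
# for a year that is not yet complete.
raw = pd.Series(["0.85", "0.89", "***"], name="J-D")

clean = pd.to_numeric(raw, errors="coerce")  # unparseable strings become NaN
print(clean.isna().sum(), "value(s) coerced to NaN")
print(clean.dropna().tolist())
```

Dropping the resulting NaN rows, as the cell above does, keeps only years with a complete annual mean.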

## 🔁 Loop 2 — Merge & diagnose (10 min)

In [None]:
merged = (
    co2.merge(nasa, on="Year", how="inner")
    .query("Year >= 1900")
    .reset_index(drop=True)
)
expect_rows_between(merged, 120, 150)
quick_snapshot(merged.head(), name="Merged head")
trend_corr = merged["CO2"].corr(merged["TempAnomaly"], method="pearson")
print("Pearson correlation (CO₂ vs anomaly): {:.2f}".format(trend_corr))
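
To confirm which years overlap before trusting the inner join, an outer merge with `indicator=True` labels each row by its origin. The sketch below uses tiny stand-in frames (`co2_demo` and `nasa_demo` are hypothetical, with placeholder values) rather than the course files.

```python
import pandas as pd

# Stand-in frames with placeholder values; the real tables come from Loop 1.
co2_demo = pd.DataFrame({"Year": [1878, 1879, 1880, 1881], "CO2": [1.0, 1.1, 1.2, 1.3]})
nasa_demo = pd.DataFrame({"Year": [1880, 1881, 1882], "TempAnomaly": [-0.2, -0.1, -0.1]})

audit = co2_demo.merge(nasa_demo, on="Year", how="outer", indicator=True)

# `_merge` is "both" for shared years, "left_only" or "right_only" otherwise.
print(audit["_merge"].value_counts())

shared_years = audit.loc[audit["_merge"] == "both", "Year"]
print("Shared years:", shared_years.min(), "to", shared_years.max())
```

Rows tagged `left_only` or `right_only` are exactly the years an inner join silently drops, which makes this a useful checkpoint discussion.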

## 🔁 Loop 3 — Visualize (15 min)

Build stacked plots with a shared x-axis so the two series can be compared year by year without dual-axis confusion.

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(10, 8))
fig.suptitle(TITLE, fontsize=18, y=0.97)

ax1.fill_between(merged["Year"], merged["CO2"], color="#6baed6", alpha=0.6)
ax1.plot(merged["Year"], merged["CO2"], color="#2171b5", linewidth=2)
ax1.set_ylabel("CO₂ emissions (Gt)")
ax1.set_title(SUBTITLE, fontsize=12, loc="left", color="#555555")
ax1.grid(True, axis="y", linestyle="--", alpha=0.4)

ax2.plot(merged["Year"], merged["TempAnomaly"], color="#d62728", linewidth=2)
ax2.axhline(0, color="#444444", linestyle="--", linewidth=1)
ax2.set_ylabel("Temperature anomaly (°C)")
ax2.set_xlabel("Year")
ax2.grid(True, axis="y", linestyle="--", alpha=0.4)

latest = merged.iloc[-1]
ax2.annotate(
    ANNOTATION,
    xy=(latest["Year"], latest["TempAnomaly"]),
    xytext=(latest["Year"] - 25, latest["TempAnomaly"] + 0.4),
    arrowprops=dict(arrowstyle="->", color="#333333"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#555555"),
)
fig.text(
    0.01,
    0.01,
    f"{SOURCE} | Units: {UNITS}",
    fontsize=10,
    color="#555555",
)
plt.tight_layout(rect=[0, 0.03, 1, 0.96])
plt.show()

In [None]:
print("Merged rows plotted:", merged.shape[0])
assert merged.shape[0] == len(ax1.lines[0].get_xdata())
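
For the fast-finisher extension, policy milestones can be marked with labeled vertical lines. A minimal sketch, assuming a hypothetical helper `mark_milestones` and two example milestones (the Kyoto Protocol, adopted in 1997, and the Paris Agreement, adopted in 2015); the plotted values are placeholders, not real data.

```python
import matplotlib.pyplot as plt


def mark_milestones(ax, milestones):
    """Draw a labeled vertical line on `ax` for each year -> label entry."""
    for year, label in milestones.items():
        ax.axvline(year, color="#777777", linestyle=":", linewidth=1)
        # x is in data coordinates, y is a fraction of the axes height
        ax.text(
            year, 0.95, label,
            rotation=90, va="top", ha="right",
            fontsize=9, color="#555555",
            transform=ax.get_xaxis_transform(),
        )


# Placeholder series for illustration only.
years = list(range(1990, 2021, 5))
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

fig_demo, ax_demo = plt.subplots(figsize=(8, 3))
ax_demo.plot(years, values, color="#d62728")
mark_milestones(ax_demo, {1997: "Kyoto Protocol", 2015: "Paris Agreement"})
```

In the capstone figure, the same call could target `ax2` before `plt.show()`.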

## 🔁 Loop 4 — Interpret & save (10 min)

In [None]:
from IPython.display import Markdown

claim = "The climate response mirrors the CO₂ surge of the fossil-fuel era."
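first, last = merged.iloc[0], merged.iloc[-1]  # endpoints of the merged timeline
evidence = (
    f"CO₂ emissions climb from {first['CO2']:.1f} Gt in {int(first['Year'])} "
    f"to {last['CO2']:.1f} Gt in {int(last['Year'])}, while anomalies move from "
    f"{first['TempAnomaly']:+.2f} °C to {last['TempAnomaly']:+.2f} °C "
    f"(Pearson r ≈ {trend_corr:.2f})."
)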
visual = "Stacked Matplotlib plots with shared x-axis (CO₂ fill, temperature line)."
takeaway = "Aligned axes keep the comparison honest; students should note that correlation is consistent with, but does not by itself prove, causation."
Markdown(
    f"""
| Claim | Evidence | Visual | Takeaway |
| --- | --- | --- | --- |
| {claim} | {evidence} | {visual} | {takeaway} |
"""
)
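
Any two series that both trend upward will show a high Pearson correlation, whatever the mechanism, which is one reason the takeaway urges caution. A common check is to correlate year-over-year changes instead of levels. The sketch below uses synthetic data (two independent random walks with drift, seeded for repeatability) to show how the two numbers can differ; it is an illustration, not a result from the course datasets.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 120

# Two independent synthetic series that both drift upward.
a = pd.Series(np.cumsum(1.0 + rng.normal(0, 0.5, n)))
b = pd.Series(np.cumsum(1.0 + rng.normal(0, 0.5, n)))

level_corr = a.corr(b)               # correlation of the trending levels
diff_corr = a.diff().corr(b.diff())  # correlation of year-over-year changes

print(f"Levels: {level_corr:.2f} | First differences: {diff_corr:.2f}")
```

Applied to the merged table, the same pattern would be `merged["CO2"].diff().corr(merged["TempAnomaly"].diff())`. A lower value there would not weaken the physical case for warming; it only shows that the level correlation alone is weak evidence.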

## 💾 Save the figure for the teacher dashboard

In [None]:
save_last_fig("day05_solution_plot.png")

## ✅ Exit Ticket

- What policy milestone would you annotate on this chart and why?  
- How could you adapt this view to explore per-capita emissions?  
- Which uncertainties should we mention when presenting this story to the public?
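
For the per-capita question, one possible adaptation divides total emissions by population before plotting. A minimal sketch, assuming a population table with `Year` and `Population` columns (hypothetical; no such file ships with this notebook) and placeholder numbers:

```python
import pandas as pd

# Placeholder values for illustration; substitute real emissions and population data.
emissions = pd.DataFrame({"Year": [2000, 2010], "CO2": [20.0, 30.0]})  # Gt per year
population = pd.DataFrame({"Year": [2000, 2010], "Population": [5.0e9, 6.0e9]})

per_capita = emissions.merge(population, on="Year", how="inner")

# 1 Gt = 1e9 tonnes, so Gt * 1e9 / people gives tonnes per person.
per_capita["CO2_per_capita_t"] = per_capita["CO2"] * 1e9 / per_capita["Population"]
print(per_capita)
```

The resulting column could replace `CO2` in the top panel, with the y-axis label changed to tonnes per person.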