## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%205_%20Capstone%20%E2%80%93%20CO%E2%82%82%20%26%20Climate.ipynb)

# 🔥 Day 5 – Capstone: CO₂ Emissions & Temperature
### Synthesize the week with a polished story, safeguards, and reflection

This capstone keeps the learn → do → check rhythm, but you now combine two datasets to tell the story of human-caused climate change.

---

## 🧠 Learning Rhythm
- 🔁 Six loops: setup, load emissions, load temperature, align indices, visualize, reflect.
- 🧪 Diagnostics emphasize unit integrity and correlation checks to avoid misleading comparisons.
- 🧱 Scaffold encourages building a narrative with claims, evidence, and limitations.

> **Teacher Sidecar**: Budget 70 minutes. Have students submit the filled scaffold (claim/evidence/visual/takeaway) plus exported figure for quick assessment.

## 📇 Data Card — Global CO₂ & Temperature
- **Sources**: Our World in Data global CO₂ (2023 release) and NASA GISTEMP temperature anomalies.
- **Temporal coverage**: 1900–2023 (annual).
- **Metrics**: CO₂ emissions (gigatonnes/year) and temperature anomaly (°C vs. 1951–1980 baseline).
- **Last updated**: January 2024.
- **Caveats**: NASA anomalies are preliminary for the latest year; dual-axis charts must clearly label units to avoid misinterpretation.
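
One way to make the unit caveat checkable is a small range guard, sketched below. This is illustrative only: `check_plausible_range` and its bounds are assumptions introduced here rather than part of the course helpers, and the toy values are made up, not taken from the datasets.

```python
import pandas as pd


def check_plausible_range(series: pd.Series, lower: float, upper: float, unit: str) -> pd.Series:
    """Raise if any non-null value falls outside [lower, upper] (hypothetical helper)."""
    values = series.dropna()
    outside = values[(values < lower) | (values > upper)]
    if not outside.empty:
        raise ValueError(
            f"{series.name}: {len(outside)} value(s) outside {lower}-{upper} {unit}; "
            "check the units"
        )
    return series


# Toy data (made up): emissions in gigatonnes, and the same values in megatonnes.
toy_gt = pd.Series([2.0, 9.4, 25.5, 37.0], index=[1900, 1960, 2000, 2022], name="CO2")
toy_mt = toy_gt * 1000

check_plausible_range(toy_gt, 0, 60, "Gt/year")  # passes silently

try:
    check_plausible_range(toy_mt, 0, 60, "Gt/year")
except ValueError as err:
    print(f"Caught unit problem: {err}")
```

A guard like this catches a series that arrives in the wrong unit before it ever reaches a dual-axis chart; the bounds themselves are a judgment call and should be adjusted to the data at hand.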

## 🧵 Story Scaffold (Claim → Evidence → Visual → Takeaway)
- **Claim**: Rising CO₂ emissions track closely with global temperature anomalies.
- **Evidence to gather**: Joint dataset with both metrics aligned by year.
- **Visual plan**: Dual-axis line chart with explicit axis labeling and annotation for recent highs.
- **Takeaway**: Human-driven emissions and temperature rise move together, underscoring the urgency of mitigation.


In [None]:

from __future__ import annotations

from pathlib import Path
from typing import Any, Mapping, Sequence

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display

DATA_DIR = Path.cwd() / "data"

sns.set_theme(style="whitegrid", font_scale=1.1)
plt.rcParams.update({
    "axes.titlesize": 16,
    "axes.labelsize": 13,
    "axes.grid": True,
    "figure.figsize": (11, 6),
    "figure.dpi": 120,
})

def ping_environment(packages: Mapping[str, object]) -> None:
    """Print library versions so teachers can confirm the runtime."""
    for label, module in packages.items():
        version = getattr(module, "__version__", "built-in")
        print(f"{label}: {version}")
    print("Environment check complete ✅")

def load_data(file_name: str, /, **kwargs) -> pd.DataFrame:
    """Load a CSV from the shared data folder with a friendly status message."""
    path = DATA_DIR / file_name
    if not path.exists():
        raise FileNotFoundError(f"Expected data file at {path}")
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {file_name} → shape {df.shape}")
    return df

def validate_columns(df: pd.DataFrame, required: Sequence[str]) -> pd.DataFrame:
    """Raise if any required column is missing; return the frame unchanged otherwise."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    print(f"Columns validated ✅ {list(required)}")
    return df

def expect_rows_between(df: pd.DataFrame, lower: int, upper: int, label: str = "rows") -> pd.DataFrame:
    """Raise if the row count falls outside the inclusive range [lower, upper]."""
    n_rows = len(df)
    if not (lower <= n_rows <= upper):
        raise ValueError(
            f"Unexpected {label}: {n_rows} (expected between {lower} and {upper})"
        )
    print(f"{label.capitalize()} check ✅ {n_rows} (expected {lower}-{upper})")
    return df

def quick_peek(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Display a head preview and NA counts for formative assessment."""
    display(df.head(n))
    print("Null values per column:")
    print(df.isna().sum())
    return df

def ensure_metadata(**metadata: str) -> None:
    blanks = [key for key, value in metadata.items() if not str(value).strip()]
    if blanks:
        raise ValueError(f"Please fill in metadata fields: {blanks}")
    print("Story metadata looks great ✅")

def annotate_source(ax: plt.Axes, *, source: str, units: str) -> plt.Axes:
    """Add a source and units footnote below the plot area."""
    ax.text(
        0.0,
        -0.22,
        f"Source: {source}\nUnits: {units}",
        transform=ax.transAxes,
        ha="left",
        fontsize=10,
    )
    return ax

def _resolve_fig(fig: Any | None) -> Any:
    if fig is not None:
        return fig
    if plt.get_fignums():
        return plt.gcf()
    return None

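def save_last_fig(fig: Any | None, filename: str) -> Path:
    """Save a figure under ./plots, falling back to the current Matplotlib figure.

    Objects exposing ``savefig`` are written as images; objects exposing
    ``write_image`` fall back to an HTML export if the image export fails.
    """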
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    resolved = _resolve_fig(fig)
    if resolved is None:
        raise ValueError("No recent figure detected.")

    output_path = plots_dir / filename

    if hasattr(resolved, "savefig"):
        resolved.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path}")
        return output_path

    if hasattr(resolved, "write_image"):
        try:
            resolved.write_image(str(output_path))
            print(f"Saved figure to {output_path}")
            return output_path
        except Exception as exc:
            html_path = output_path.with_suffix(".html")
            resolved.write_html(str(html_path))
            print(f"Saved interactive figure to {html_path} (fallback: {exc})")
            return html_path

    raise ValueError("Don't know how to export this figure type.")


## 🔁 Loop 1 · Confirm the setup
*Goal: Confirm that pandas, Matplotlib, and seaborn are ready and that the data folder is present before the capstone analysis.*

In [None]:
import matplotlib  # the top-level package carries __version__; the pyplot module does not

ping_environment({"pandas": pd, "matplotlib": matplotlib, "seaborn": sns})
assert DATA_DIR.exists(), f"Data directory missing: {DATA_DIR}"
print(f"Data files available: {len(list(DATA_DIR.glob('*')))} items")

## 🔁 Loop 2 · Load global CO₂ emissions
*Goal: Inspect the emissions series and validate expected ranges.*

In [None]:
co2 = load_data("global_co2.csv")
validate_columns(co2, ["Year", "CO2"])
co2 = co2.set_index("Year").sort_index()
expect_rows_between(co2, 160, 190, label="CO₂ annual rows")
print("CO₂ range (Gt):", co2["CO2"].min(), "→", co2["CO2"].max())
quick_peek(co2.tail())


## 🔁 Loop 3 · Load NASA temperature anomalies
*Goal: Bring in the temperature series with matching structure.*

In [None]:
temp = load_data("GLB.Ts+dSST.csv", skiprows=1, usecols=["Year", "J-D"], na_values=["***"])
validate_columns(temp, ["Year", "J-D"])
temp = temp.rename(columns={"J-D": "TempAnomaly"})
# Coerce first so any non-numeric leftovers become NaN, then drop incomplete years.
temp["TempAnomaly"] = pd.to_numeric(temp["TempAnomaly"], errors="coerce")
temp = temp.dropna()
temp = temp.set_index("Year").sort_index()
expect_rows_between(temp, 140, 190, label="temperature annual rows")
print("Temperature anomaly range (°C):", temp["TempAnomaly"].min(), "→", temp["TempAnomaly"].max())
quick_peek(temp.tail())


## 🔁 Loop 4 · Align and diagnose the merged dataset
*Goal: Inner-join the two series on year, check the number of overlapping years, and confirm a strong positive correlation.*

In [None]:
climate = co2.join(temp, how="inner")
expect_rows_between(climate, 120, 190, label="aligned years")
print("Merged shape:", climate.shape)
print(climate.tail())
correlation = climate["CO2"].corr(climate["TempAnomaly"])
print(f"Pearson correlation (CO₂ vs Temp anomaly): {correlation:.2f}")
assert correlation > 0.8, "We expect a strong positive correlation over the full record."
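
A high Pearson correlation on raw levels can come from nothing more than a shared upward trend, so a supplementary diagnostic is to correlate year-over-year changes as well. The sketch below uses synthetic series (not real climate data; the names `SeriesA` and `SeriesB` and all values are made up) whose only link is a common trend. It illustrates why levels alone can mislead; it says nothing by itself about the real CO₂ and temperature records.

```python
import numpy as np
import pandas as pd

# Synthetic data (NOT real measurements): two series that share only an upward trend.
rng = np.random.default_rng(0)
years = np.arange(1900, 2024)
trend = np.linspace(0.0, 1.0, len(years))
toy = pd.DataFrame(
    {
        "SeriesA": 36.0 * trend + rng.normal(0.0, 1.0, len(years)),
        "SeriesB": 1.2 * trend + rng.normal(0.0, 0.1, len(years)),
    },
    index=pd.Index(years, name="Year"),
)

# Pearson correlation on the raw levels is dominated by the shared trend.
levels_corr = toy["SeriesA"].corr(toy["SeriesB"])

# Year-over-year differences remove most of the trend, leaving the independent noise.
diffs = toy.diff().dropna()
diff_corr = diffs["SeriesA"].corr(diffs["SeriesB"])

print(f"Correlation of levels:      {levels_corr:.2f}")
print(f"Correlation of differences: {diff_corr:.2f}")
```

The same two lines (`diff()` followed by `corr`) could be run on `climate` as an extra check. A weaker differenced correlation there would mainly reflect noisy year-to-year temperature variability, which is one reason the narrative should cite physical evidence and not correlation alone.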


## 🔁 Loop 5 · Visualize with story-first metadata
*Goal: Build a dual-axis chart with explicit labeling and annotation.*

In [None]:
TITLE = "CO₂ Emissions and Global Temperature Rise Move Together"
SUBTITLE = "Global totals, 1900–2023"
ANNOTATION = "Both metrics reach record highs in the 2010s and 2020s."
SOURCE = "Our World in Data (CO₂) + NASA GISTEMP (temperature)"
UNITS = "CO₂: gigatonnes/year; Temp: °C anomaly vs 1951–1980"

ensure_metadata(TITLE=TITLE, SUBTITLE=SUBTITLE, ANNOTATION=ANNOTATION, SOURCE=SOURCE, UNITS=UNITS)

fig, ax1 = plt.subplots(figsize=(11, 6))
ax1.plot(climate.index, climate["CO2"], color="#1f77b4", linewidth=2.5, label="CO₂ emissions")
ax1.set_xlabel("Year")
ax1.set_ylabel("CO₂ emissions (Gt/year)", color="#1f77b4")
ax1.tick_params(axis="y", labelcolor="#1f77b4")

ax2 = ax1.twinx()
ax2.plot(climate.index, climate["TempAnomaly"], color="#d62728", linewidth=2.5, label="Temperature anomaly")
ax2.set_ylabel("Temperature anomaly (°C)", color="#d62728")
ax2.tick_params(axis="y", labelcolor="#d62728")

ax1.set_title(f"{TITLE}\n{SUBTITLE}")
ax1.grid(True, which="major", axis="both", linestyle="--", alpha=0.4)

last_year = climate.index.max()
ax1.annotate(
    f"{last_year}: {climate.loc[last_year, 'CO2']:.1f} Gt",
    xy=(last_year, climate.loc[last_year, "CO2"]),
    xytext=(last_year - 20, climate.loc[last_year, "CO2"] + 5),
    arrowprops=dict(arrowstyle="->", color="#1f77b4"),
    color="#1f77b4",
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#1f77b4"),
)
ax2.annotate(
    f"{last_year}: {climate.loc[last_year, 'TempAnomaly']:.2f} °C",
    xy=(last_year, climate.loc[last_year, "TempAnomaly"]),
    xytext=(last_year - 35, climate.loc[last_year, "TempAnomaly"] + 0.3),
    arrowprops=dict(arrowstyle="->", color="#d62728"),
    color="#d62728",
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#d62728"),
)
ax1.text(
    climate.index.min() + 5,
    climate["CO2"].min() + 10,
    ANNOTATION,
    fontsize=11,
    color="#333333",
    bbox=dict(boxstyle="round,pad=0.4", facecolor="white", alpha=0.85),
)
annotate_source(ax1, source=SOURCE, units=UNITS)
fig.tight_layout()
plt.show()


## 🔁 Loop 6 · Interpret and self-check
*Goal: Quantify the change and prepare the capstone narrative.*

In [None]:
base_period = climate.loc[1900:1910]
recent_period = climate.loc[2013:2023]
co2_change = recent_period["CO2"].mean() - base_period["CO2"].mean()
temp_change = recent_period["TempAnomaly"].mean() - base_period["TempAnomaly"].mean()
print(f"Average CO₂ increase vs. early 1900s: {co2_change:.1f} Gt/year")
print(f"Average temperature increase vs. early 1900s: {temp_change:.2f} °C")
assert co2_change > 25, "Expect CO₂ to rise by dozens of gigatonnes."
assert temp_change > 1.0, "Expect temperature anomaly to exceed +1 °C."


### 🧾 Claim → Evidence → Visual → Takeaway (filled)
- **Claim**: Human-driven CO₂ emissions and global temperatures rise in tandem.
- **Evidence**: Strong Pearson correlation plus mean increases computed above.
- **Visual**: Dual-axis chart with annotated recent highs and transparent metadata.
- **Takeaway**: Mitigating emissions is essential to slow further warming.

> **Limitation prompt**: Dual axes can confuse; consider normalizing both series or plotting anomalies on a shared index for alternate presentations.
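
The normalization idea in the limitation prompt could look like the following sketch, which rescales each series to z-scores so both fit on one axis. `to_z_scores` is a hypothetical helper introduced here, and the toy values are made up for illustration.

```python
import pandas as pd


def to_z_scores(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to mean 0 and standard deviation 1 (hypothetical helper)."""
    return (frame - frame.mean()) / frame.std()


# Toy values (made up) in very different units.
toy = pd.DataFrame(
    {"CO2": [2.0, 9.4, 25.5, 37.0], "TempAnomaly": [-0.10, 0.00, 0.40, 0.90]},
    index=pd.Index([1900, 1960, 2000, 2022], name="Year"),
)

standardized = to_z_scores(toy)
print(standardized.round(2))
```

Applied to the notebook's data, `to_z_scores(climate[["CO2", "TempAnomaly"]]).plot()` would draw both lines on a single axis. The trade-off is that the y-axis then reads in standard deviations from each series' mean instead of physical units, so the axis label and caption need to say so.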

---

### 💾 Save your work
Export the figure for your portfolio or presentation.


In [None]:
save_last_fig(fig, "day05_solution_plot.png")