## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/main/days/day05/notebook/day05_starter.ipynb)

# 🔥 Day 5 – Capstone: CO₂, Warming, and the Human Story
Bring everything together by aligning emissions with temperature change and narrating the stakes.

## 🧾 Data Card – CO₂ Emissions & Surface Temperature
- **Sources:** [Global Carbon Project](https://www.globalcarbonproject.org/) (global CO₂) and [NASA GISTEMP v4](https://data.giss.nasa.gov/gistemp/).
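- **Temporal coverage:** Annual values through 2023. GISTEMP v4 begins in 1880 and the emissions record starts earlier; this notebook analyzes 1950 onward.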
- **Units:** CO₂ emissions in billion tonnes; temperature anomaly in °C relative to 1951–1980.
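- **Update cadence:** NASA updates GISTEMP monthly; the Global Carbon Project releases its Global Carbon Budget once a year, usually late in the year.
- **Method notes:** Emissions combine fossil fuel and cement sources; anomalies blend land surface air temperatures with sea surface temperatures (the "Ts+dSST" in the file name).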
- **Caveats:** Early years have higher uncertainty; CO₂ excludes land-use change; comparing different units requires careful scaling.
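GISTEMP anomalies are measured against the 1951–1980 mean. If you ever need a different reference *period*, you can subtract the series' mean over that period. The sketch below uses made-up anomaly values and a hypothetical helper, `rebaseline`. It is not part of the project's `utils.py`.

```python
import pandas as pd


def rebaseline(anomaly: pd.Series, start: int, end: int) -> pd.Series:
    """Shift a year-indexed anomaly series so its mean over start..end is zero (hypothetical helper)."""
    reference_mean = anomaly.loc[start:end].mean()  # label slice, inclusive of both ends
    return anomaly - reference_mean


# Made-up values for illustration only, not real GISTEMP data
toy = pd.Series([0.0, 0.1, 0.2, 0.3, 0.4], index=[1990, 1991, 1992, 1993, 1994])
shifted = rebaseline(toy, 1991, 1993)
print(shifted.round(2))
```

Re-baselining only moves the zero point. Trends and year-to-year differences stay the same.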

## 🧭 Story Scaffold
- **Claim:** How tightly do emissions and warming track each other post-1950?
- **Evidence:** Rate of change, divergence, and policy milestones.
- **Visual:** Indexed dual-line chart with annotations and key milestones.
- **Takeaway:** Stress the causal link but note uncertainties and mitigation opportunities.
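One way to put a number on the "rate of change" evidence is a least-squares linear trend, reported per decade. Here is a minimal sketch using `numpy.polyfit`. The helper name `trend_per_decade` and the toy series (a made-up series rising exactly 0.02 per year) are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def trend_per_decade(years, values) -> float:
    """Least-squares linear slope of values against years, scaled to units per decade (hypothetical helper)."""
    slope_per_year, _intercept = np.polyfit(
        np.asarray(years, dtype=float), np.asarray(values, dtype=float), deg=1
    )
    return float(slope_per_year * 10)


# Made-up series rising exactly 0.02 per year (illustrative only)
toy = pd.DataFrame({"Year": range(1990, 2000)})
toy["TempAnomaly"] = 0.02 * (toy["Year"] - 1990)
print(round(trend_per_decade(toy["Year"], toy["TempAnomaly"]), 3))  # 0.2
```

After Step 4 you could call the same helper on `climate["Year"]` with `climate["TempAnomaly"]` or `climate["CO2_gt"]`. You could also apply it separately to sub-periods such as 1950–1989 and 1990 onward as one possible way to quantify divergence.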

In [None]:
from __future__ import annotations

from pathlib import Path
import sys

import matplotlib.pyplot as plt
import pandas as pd

# Look for the shared utils.py in this folder or up to two parent folders
for candidate in [Path.cwd(), Path.cwd().parent, Path.cwd().parent.parent]:
    utils_path = candidate / "utils.py"
    if utils_path.exists():
        if str(candidate) not in sys.path:
            sys.path.insert(0, str(candidate))
        break
else:  # runs only if the loop finished without finding utils.py
    raise FileNotFoundError("Unable to locate utils.py. Did you download the full project?")

from utils import (
    baseline_style,
    diagnose_dataframe,
    expect_rows_between,
    load_data,
    save_last_fig,
    validate_columns,
    validate_story_elements,
)

baseline_style()


In [None]:
# Example: aligning two time series on a common baseline
example = pd.DataFrame(
    {
        "Year": [2000, 2001, 2002],
        "SeriesA": [10, 12, 15],
        "SeriesB": [100, 120, 150],
    }
)
base_year = 2000
indexed = example.set_index("Year")
# Divide every row by the base-year row: each column becomes a ratio index (base year = 1.0)
normalized = indexed / indexed.loc[base_year]
normalized


In [None]:
# Step 1 – Load global temperature and CO₂ datasets
temperature_raw = load_data(
    "GLB.Ts+dSST.csv",
    skiprows=1,
    usecols=[0, 13],
    names=["Year", "TempAnomaly"],
    header=0,
)
co2_raw = load_data("global_co2.csv")

# --- Starter Notebook elide: load both the NASA temperature file and the CO₂ dataset ---


<details>
<summary>Loading hint</summary>
<ul>
<li>Use <code>load_data</code> for both files.</li>
<li>The temperature CSV needs <code>skiprows</code> and <code>usecols</code> to target the annual mean column.</li>
</ul>
</details>
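The hint's `skiprows`/`usecols` combination targets the annual-mean column. The self-contained sketch below mimics that file layout with an in-memory CSV: a title line, a header row, and made-up values. The layout is assumed from the GISTEMP table format. The sketch selects the same columns by position using plain `pandas.read_csv`. The `na_values=["***"]` argument is an addition. It turns the placeholder that GISTEMP tables use for incomplete values into `NaN` at read time, which Step 2's `errors="coerce"` would otherwise handle.

```python
import io

import pandas as pd

header = "Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON"
row_2000 = "2000," + ",".join(["0.10"] * 12) + ",0.10" + ",0.10" * 5  # made-up values
row_2001 = "2001," + ",".join(["0.20"] * 12) + ",***" + ",0.20" * 5   # annual mean not yet available
text = "\n".join(["Land-Ocean: Global Means", header, row_2000, row_2001]) + "\n"

# skiprows=1 drops the title line so the next row becomes the header
raw = pd.read_csv(io.StringIO(text), skiprows=1, na_values=["***"])
annual = raw.iloc[:, [0, 13]].copy()  # column 0 = Year, column 13 = J-D (January-December mean)
annual.columns = ["Year", "TempAnomaly"]
print(annual)
```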

In [None]:
# Step 2 – Clean the temperature series
temperature = (
    temperature_raw.assign(
        Year=lambda df: pd.to_numeric(df["Year"], errors="coerce"),
        TempAnomaly=lambda df: pd.to_numeric(df["TempAnomaly"], errors="coerce"),
    )
    .dropna()
    .astype({"Year": int})
    .query("Year >= 1950")
)

# --- Starter Notebook elide: coerce numeric values, drop missing rows, and filter from 1950 onward ---

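The order of Step 2's chain matters. An integer column cannot hold `NaN`, so `.dropna()` has to run before `.astype({"Year": int})`. The sketch below runs both orders on a few made-up rows: a stray text year, a `***` placeholder, and one pre-1950 year.

```python
import pandas as pd

# Made-up rows for illustration only
raw = pd.DataFrame(
    {
        "Year": ["1949", "1950", "1951", "notes"],
        "TempAnomaly": ["0.1", "0.2", "***", "0.4"],
    }
)

numeric = raw.assign(
    Year=lambda df: pd.to_numeric(df["Year"], errors="coerce"),
    TempAnomaly=lambda df: pd.to_numeric(df["TempAnomaly"], errors="coerce"),
)

# Casting before dropping missing rows fails: NaN has no integer representation
try:
    numeric.astype({"Year": int})
    cast_before_dropna_ok = True
except ValueError:
    cast_before_dropna_ok = False

cleaned = numeric.dropna().astype({"Year": int}).query("Year >= 1950")
print(cast_before_dropna_ok)
print(cleaned)
```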

In [None]:
# Step 3 – Clean the CO₂ emissions series
# Accept either common name for the annual emissions column
co2_clean = co2_raw.rename(columns={"Annual CO2 emissions": "CO2", "co2": "CO2"})
if "CO2" not in co2_clean.columns:
    raise ValueError("Expected a CO2 column in the emissions dataset.")
co2 = (
    co2_clean[["Year", "CO2"]]
    .rename(columns={"CO2": "CO2_gt"})
    # Coerce before filtering so the Year comparison works even if the column was read as text
    .assign(
        Year=lambda df: pd.to_numeric(df["Year"], errors="coerce"),
        CO2_gt=lambda df: pd.to_numeric(df["CO2_gt"], errors="coerce"),
    )
    .dropna()
    .astype({"Year": int})
    .query("Year >= 1950")
)

# --- Starter Notebook elide: rename the emissions column to CO2_gt and focus on post-1950 data ---


<details>
<summary>Cleaning hint</summary>
<ul>
<li>Rename <code>"Annual CO2 emissions"</code> to something shorter.</li>
<li>Keep 1950 onward so the emissions series lines up with the temperature series filtered in Step 2.</li>
<li>Drop rows with missing data after coercing to numeric.</li>
</ul>
</details>
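Step 3 maps two possible source names onto `CO2`. If a file ever contained *both*, `rename` would produce duplicate `CO2` columns, and the selection that follows would return more columns than expected. The sketch below takes a more defensive approach and picks the first matching column. The helper `pick_emissions_column` is hypothetical, and the toy frame is made up.

```python
import pandas as pd

CANDIDATE_CO2_COLUMNS = ("Annual CO2 emissions", "co2")  # the names Step 3 already accepts


def pick_emissions_column(frame: pd.DataFrame, candidates=CANDIDATE_CO2_COLUMNS) -> pd.DataFrame:
    """Return Year plus the first matching emissions column, renamed to CO2_gt (hypothetical helper)."""
    for name in candidates:
        if name in frame.columns:
            return frame[["Year", name]].rename(columns={name: "CO2_gt"})
    raise ValueError(f"None of {candidates} found; columns are {list(frame.columns)}")


# Made-up frame containing both names
toy = pd.DataFrame(
    {"Year": [1950, 1951], "Annual CO2 emissions": [6.0, 6.5], "co2": [6000.0, 6500.0]}
)
picked = pick_emissions_column(toy)
print(picked.columns.tolist())  # ['Year', 'CO2_gt']
```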

In [None]:
# Step 4 – Merge and run diagnostics
climate = temperature.merge(co2, on="Year", how="inner")
diagnose_dataframe(climate, name="Temperature & CO2 (post-1950)")
validate_columns(climate, ["Year", "TempAnomaly", "CO2_gt"], name="climate")
expect_rows_between(climate, 60, 80, name="climate")

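Step 4's inner merge silently drops any year that appears in only one file. Before you trust the row count, you can run a quick audit with `how="outer"` and `indicator=True`, both standard `DataFrame.merge` parameters, to list the years that did not overlap. The frames below are made up.

```python
import pandas as pd

# Made-up frames whose years only partly overlap
temps = pd.DataFrame({"Year": [1950, 1951, 1952], "TempAnomaly": [0.1, 0.2, 0.3]})
emissions = pd.DataFrame({"Year": [1951, 1952, 1953], "CO2_gt": [6.0, 6.5, 7.0]})

audit = temps.merge(emissions, on="Year", how="outer", indicator=True)
# _merge is "left_only", "right_only", or "both" for each row
dropped = audit.loc[audit["_merge"] != "both", ["Year", "_merge"]]
print(dropped)
```

On the real files, the same audit would show, for example, whether the newest year is present in one source but not the other.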

In [None]:
# Step 5 – Normalize series for a shared axis
baseline_year = 1960
if baseline_year not in climate["Year"].values:
    raise ValueError(f"Baseline year {baseline_year} missing; choose a year present in the merged data.")
baseline_values = climate.set_index("Year").loc[baseline_year]

climate["Temp_index"] = climate["TempAnomaly"] - baseline_values["TempAnomaly"]
climate["CO2_index"] = climate["CO2_gt"] / baseline_values["CO2_gt"]
climate.head()

# --- Starter Notebook elide: compute Temp_index (relative to 1960) and CO2_index (ratio to 1960) ---


<details>
<summary>Indexing hint</summary>
<ul>
<li>Set the dataframe index to <code>Year</code> so you can look up <code>baseline_year</code>.</li>
<li>Subtract the baseline temperature to express change relative to 1960.</li>
<li>Divide emissions by the 1960 value to create a growth index.</li>
</ul>
</details>
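Step 5 deliberately uses two different kinds of index. Dividing by the baseline gives a unit-free ratio, so the emissions index is the same whether the file reports tonnes or gigatonnes. Temperature anomalies sit close to zero and can change sign, which makes a ratio unstable; subtracting the baseline keeps the result in °C. All values below are made up.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame(
    {
        "Year": [1960, 1961, 1962],
        "CO2_tonnes": [9.0e9, 9.9e9, 10.8e9],
        "TempAnomaly": [0.03, -0.01, 0.05],
    }
).set_index("Year")
toy["CO2_gt"] = toy["CO2_tonnes"] / 1e9

base = toy.loc[1960]
ratio_tonnes = toy["CO2_tonnes"] / base["CO2_tonnes"]
ratio_gt = toy["CO2_gt"] / base["CO2_gt"]                # same index: units cancel in a ratio
temp_change = toy["TempAnomaly"] - base["TempAnomaly"]   # difference: stays in °C
temp_ratio = toy["TempAnomaly"] / base["TempAnomaly"]    # ratio: flips sign, not meaningful here
print(pd.DataFrame({"ratio_gt": ratio_gt, "temp_change": temp_change, "temp_ratio": temp_ratio}))
```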

In [None]:
# Step 6 – Story metadata strings
TITLE = "Emissions and Warming Rose Together"
SUBTITLE = "Global CO₂ emissions vs. surface temperature anomaly (indexed to 1960)"
ANNOTATION = "Post-1990 emissions flatten briefly, but temperatures keep climbing."
SOURCE = "Global Carbon Project & NASA GISTEMP (downloaded 2024-04-15)"
UNITS = "CO₂ index (1960=1) and temperature change since 1960 (°C)"

validate_story_elements(
    {
        "TITLE": TITLE,
        "SUBTITLE": SUBTITLE,
        "ANNOTATION": ANNOTATION,
        "SOURCE": SOURCE,
        "UNITS": UNITS,
    }
)

# --- Starter Notebook elide: edit the storytelling metadata ---


In [None]:
# Step 7 – Build the capstone chart
fig, ax = plt.subplots(figsize=(11, 6))
ax.plot(climate["Year"], climate["Temp_index"], color="#d62728", linewidth=2.5, label="Temperature change (°C vs 1960)")
ax.plot(climate["Year"], climate["CO2_index"], color="#1f77b4", linewidth=2.5, label="CO₂ emissions (index, 1960=1)")
ax.axhline(0, color="#555555", linestyle="--", linewidth=1)
ax.set_title(f"{TITLE}\n{SUBTITLE}")
ax.set_xlabel("Year")
ax.set_ylabel(UNITS)
ax.legend(loc="upper left")
ax.text(0.01, -0.2, f"Source: {SOURCE}", transform=ax.transAxes)
annotation_year = 1992 if 1992 in climate["Year"].values else int(climate["Year"].iloc[-1])
ax.annotate(
    ANNOTATION,
    xy=(annotation_year, climate.set_index("Year").loc[annotation_year, "CO2_index"]),
    xytext=(1975, climate["CO2_index"].max() + 0.2),
    arrowprops={"arrowstyle": "->", "color": "#333333"},
    fontsize=11,
)
fig.tight_layout()
capstone_fig = fig
fig

# --- Starter Notebook elide: plot both indexed series with a shared axis and annotation ---

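Step 7 draws a °C difference and a unit-free ratio against a single y-axis, and the data card flags that comparing different units requires careful scaling. One alternative gives each series its own y-axis via `Axes.twinx()`; the sketch below uses made-up indexed values in place of `climate`. Twin axes carry their own risk: where the lines appear to cross depends on how each axis is scaled, so both axes need clear labels.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up indexed values standing in for the real `climate` frame
toy = pd.DataFrame(
    {
        "Year": [1960, 1970, 1980, 1990],
        "Temp_index": [0.0, 0.05, 0.2, 0.35],
        "CO2_index": [1.0, 1.6, 2.0, 2.3],
    }
)

fig, ax_temp = plt.subplots(figsize=(11, 6))
ax_co2 = ax_temp.twinx()  # second y-axis sharing the same x-axis

ax_temp.plot(toy["Year"], toy["Temp_index"], color="#d62728", linewidth=2.5, label="Temperature change (°C vs 1960)")
ax_co2.plot(toy["Year"], toy["CO2_index"], color="#1f77b4", linewidth=2.5, label="CO₂ emissions (index, 1960=1)")

ax_temp.set_xlabel("Year")
ax_temp.set_ylabel("Temperature change since 1960 (°C)", color="#d62728")
ax_co2.set_ylabel("CO₂ emissions index (1960=1)", color="#1f77b4")

# Combine legend entries from both axes into one box
handles_t, labels_t = ax_temp.get_legend_handles_labels()
handles_c, labels_c = ax_co2.get_legend_handles_labels()
ax_temp.legend(handles_t + handles_c, labels_t + labels_c, loc="upper left")
fig.tight_layout()
```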

In [None]:
# Step 8 – Final validation and save option
validate_story_elements(
    {
        "TITLE": TITLE,
        "SUBTITLE": SUBTITLE,
        "ANNOTATION": ANNOTATION,
        "SOURCE": SOURCE,
        "UNITS": UNITS,
    }
)
save_last_fig("day05_capstone_climate.png", fig=capstone_fig)

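`save_last_fig` comes from the project's `utils.py`, and this notebook does not show how it works internally. If you save with Matplotlib directly instead, the source note that Step 7 places below the axes (`y=-0.2` in axes coordinates) may be clipped. Passing `bbox_inches="tight"` to `Figure.savefig` expands the saved region to include it. Here is a minimal sketch with a placeholder figure and a temporary output path.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1960, 2020], [1.0, 2.0])  # placeholder line, not real data
# Text below the axes, positioned like the source line in Step 7
ax.text(0.01, -0.2, "Source: example", transform=ax.transAxes)

out_path = Path(tempfile.mkdtemp()) / "capstone_preview.png"
# bbox_inches="tight" grows the saved area to include artists outside the axes
fig.savefig(out_path, dpi=200, bbox_inches="tight")
plt.close(fig)
print(out_path.stat().st_size > 0)
```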