## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/main/days/day05/notebook/day05_starter.ipynb)

# 🌐 Day 5 – Capstone: Carbon Emissions and Global Temperature

For the final lab, you will combine global CO₂ emissions with NASA temperature anomalies to tell the story of how human activity drives warming.

### Data Card — Emissions and Temperature

| Dataset | File | Units | Coverage | Caveats |
| --- | --- | --- | --- | --- |
| Global CO₂ emissions | `data/global_co2.csv` | Gigatonnes CO₂ | 1750–2023 (annual) | Recent estimates may be revised when inventories update. |
| NASA GISTEMP anomalies | `data/GLB.Ts+dSST.csv` | °C anomaly vs. 1951–1980 | 1880–2024 (annual) | Later years provisional; anomalies relative to baseline. |
| Integration | Inner join on Year | n/a | 1880–2023 overlap | Aligns only the shared years for a consistent view. |
| Caveats | n/a | n/a | n/a | The emissions dataset excludes land-use change; the temperature anomalies use the combined land/ocean record. |


### Step 1 · Imports and helpers

We will use pandas and Matplotlib to build the final dual-axis story.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

from utils import (
    baseline_style,
    expect_rows_between,
    load_csv,
    quick_check,
    save_last_fig,
    validate_columns,
    validate_story_elements,
)

baseline_style()


### Step 2 · Load emissions and temperature datasets

Use the shared `load_csv` helper, which fetches the CSVs from GitHub if they are missing locally.

In [None]:
co2 = load_csv("data/global_co2.csv")
temperatures = load_csv("data/GLB.Ts+dSST.csv", read_csv_kwargs=dict(skiprows=1))
quick_check(co2.head(), name="CO2 preview")
quick_check(temperatures.head(), name="Temperature preview")
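
The course's `load_csv` lives in `utils` and its internals are not shown here. As a rough sketch of the "local first, remote second" idea, the hypothetical function below reads from disk when the file exists and otherwise builds a URL from an assumed `base_url`. The name `load_csv_with_fallback` and the `base_url` parameter are illustrative assumptions, not the real helper's API. `skiprows=1` in the cell above presumably skips the title line that sits above the header row in NASA's CSV.

```python
from pathlib import Path

import pandas as pd


def load_csv_with_fallback(path, base_url, read_csv_kwargs=None):
    """Read a CSV from disk, or from base_url + path when the file is missing.

    Hypothetical stand-in for the course's load_csv helper.
    """
    kwargs = dict(read_csv_kwargs or {})
    local = Path(path)
    if local.exists():
        return pd.read_csv(local, **kwargs)
    # pandas accepts an http(s) URL wherever it accepts a file path.
    return pd.read_csv(f"{base_url.rstrip('/')}/{path}", **kwargs)
```

Passing `read_csv_kwargs` through unchanged keeps dataset-specific quirks, such as the skipped title line, at the call site rather than inside the helper.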


### Step 3 · Clean and align the columns we need

Select the year and value columns, then coerce the values to numeric so that unparseable entries become `NaN`.

In [None]:
co2 = co2.rename(columns={"CO2": "CO2_Gt"})
validate_columns(co2, ["Year", "CO2_Gt"])
co2["CO2_Gt"] = pd.to_numeric(co2["CO2_Gt"], errors="coerce")

temps = (
    temperatures[["Year", "J-D"]]
    .rename(columns={"J-D": "TempAnomaly"})
    .assign(TempAnomaly=lambda df: pd.to_numeric(df["TempAnomaly"], errors="coerce"))
    .dropna(subset=["TempAnomaly"])
)
validate_columns(temps, ["Year", "TempAnomaly"])
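
To see what `errors="coerce"` does before trusting it on the real file, here is a small sketch on toy rows. The numbers are made up, and the `"***"` placeholder is an assumption about how an incomplete year might appear in the annual column. Counting the coerced values before dropping them shows how many rows the cleaning step removes.

```python
import pandas as pd

# Toy rows, not real GISTEMP values; "***" stands in for an incomplete year.
toy = pd.DataFrame({"Year": [2001, 2002, 2003], "J-D": ["0.10", "0.20", "***"]})

clean = (
    toy.rename(columns={"J-D": "TempAnomaly"})
    .assign(TempAnomaly=lambda df: pd.to_numeric(df["TempAnomaly"], errors="coerce"))
)
n_dropped = int(clean["TempAnomaly"].isna().sum())  # entries that failed to parse
clean = clean.dropna(subset=["TempAnomaly"])
print(f"Dropped {n_dropped} unparseable row(s); kept {len(clean)}.")
```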


### Step 4 · Merge the datasets and run diagnostics

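Inner-join on `Year`, drop incomplete rows, then check the row count and the first and last years of the joined range. A gap-free 1880–2023 overlap should yield 144 rows, inside the 100–150 bounds.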

In [None]:
merged = co2.merge(temps, on="Year", how="inner")
merged = merged.dropna()
expect_rows_between(merged, minimum=100, maximum=150)
quick_check(merged.head(), name="Merged head")
quick_check(merged.tail(), name="Merged tail")
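
A row count inside the expected bounds does not by itself prove that every year is present. One possible extra diagnostic, sketched below on a toy frame, lists the years missing between the first and last row. The helper name `missing_years` is hypothetical and not part of `utils`.

```python
import pandas as pd


def missing_years(df, year_col="Year"):
    """Return the years absent between the first and last year in df."""
    years = df[year_col].astype(int)
    expected = set(range(int(years.min()), int(years.max()) + 1))
    return sorted(expected - set(years.tolist()))


# Toy frame with 1883 deliberately left out.
toy = pd.DataFrame({"Year": [1880, 1881, 1882, 1884], "CO2_Gt": [1.0, 1.1, 1.2, 1.4]})
print(missing_years(toy))  # [1883]
```

On the real data, `missing_years(merged)` returning an empty list would indicate a continuous annual record.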


### Step 5 · Prepare rolling averages for smoother context

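A 5-year trailing mean smooths year-to-year noise and highlights the long-term co-movement. With `min_periods=1`, the first four values average fewer than five years.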

In [None]:
merged = merged.assign(
    CO2_rolling=lambda df: df["CO2_Gt"].rolling(window=5, min_periods=1).mean(),
    Temp_rolling=lambda df: df["TempAnomaly"].rolling(window=5, min_periods=1).mean(),
)
quick_check(merged.tail(), name="Merged with rolling")
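
The effect of `min_periods` is easiest to see on a short toy series. This sketch compares the setting used above with the pandas default, which leaves the first four positions as `NaN` until a full five-value window is available.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Same settings as the cell above: partial windows are averaged from the start.
partial = s.rolling(window=5, min_periods=1).mean()
# pandas default: no value until five observations are available.
strict = s.rolling(window=5).mean()

print(pd.DataFrame({"value": s, "partial": partial, "strict": strict}))
```

Either choice is defensible. Partial windows keep the line starting at the first year, at the cost of a noisier first few points.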


### Step 6 · Story checklist

Craft the final narrative before plotting.

In [None]:
story = {
    "title": "Emissions and Temperatures Rise in Lockstep",
    "subtitle": "Global CO₂ emissions (gigatonnes) vs. NASA temperature anomalies (°C)",
    "annotation": "Both metrics accelerate sharply after 1950, underscoring fossil fuel dependence.",
    "source": "Global Carbon Project & NASA GISS (downloaded 2025-01)",
    "units": "Gigatonnes CO₂ / °C anomaly",
}
validate_story_elements(story)
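
The real `validate_story_elements` is defined in `utils` and may check more than this. As a minimal sketch of the idea, the hypothetical `check_story` below assumes the five keys used in the cell above are all required and must be non-empty strings.

```python
REQUIRED_STORY_KEYS = ("title", "subtitle", "annotation", "source", "units")


def check_story(story, required=REQUIRED_STORY_KEYS):
    """Raise ValueError when a required story field is missing or blank.

    Hypothetical sketch; the real validate_story_elements may differ.
    """
    problems = [
        key for key in required
        if not isinstance(story.get(key), str) or not story[key].strip()
    ]
    if problems:
        raise ValueError(f"Story is missing or has empty fields: {', '.join(problems)}")
    return True
```

Failing loudly before plotting means a missing source line or unit label is caught while it is still cheap to fix.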


### Step 7 · Dual-axis plot with safeguards

Use contrasting colours and matching annotations so the two axes are unambiguous.

In [None]:
fig, ax_co2 = plt.subplots(figsize=(11, 6))
ax_temp = ax_co2.twinx()

ax_co2.plot(merged["Year"], merged["CO2_Gt"], color="#495057", linewidth=1.2, alpha=0.5, label="CO₂ emissions (annual)")
ax_co2.plot(merged["Year"], merged["CO2_rolling"], color="#212529", linewidth=2.2, label="CO₂ emissions (5-year mean)")
ax_co2.set_ylabel("CO₂ emissions (Gt)", color="#212529")
ax_co2.tick_params(axis="y", labelcolor="#212529")

ax_temp.plot(merged["Year"], merged["TempAnomaly"], color="#e85d04", linewidth=1.2, alpha=0.4, label="Temperature anomaly (annual)")
ax_temp.plot(merged["Year"], merged["Temp_rolling"], color="#d00000", linewidth=2.2, label="Temperature anomaly (5-year mean)")
ax_temp.set_ylabel("Temperature anomaly (°C)", color="#d00000")
ax_temp.tick_params(axis="y", labelcolor="#d00000")

ax_co2.set_title(f"{story['title']}\n{story['subtitle']}", pad=16)
ax_co2.set_xlabel("Year")
ax_co2.grid(alpha=0.3)
ax_co2.text(0.01, -0.18, f"Source: {story['source']}", transform=ax_co2.transAxes, fontsize=9, ha="left")

ax_temp.annotate(
    story["annotation"],
    xy=(1960, merged.loc[merged["Year"] == 1960, "Temp_rolling"].iloc[0]),
    xytext=(1900, merged["Temp_rolling"].max()),
    arrowprops=dict(arrowstyle="->", color="#d00000"),
    fontsize=11,
    color="#d00000",
)

lines = ax_co2.get_lines() + ax_temp.get_lines()
labels = [line.get_label() for line in lines]
ax_co2.legend(lines, labels, loc="upper left")
plt.tight_layout()
plt.show()
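
The annotation above looks up the row where `Year == 1960` and would raise an `IndexError` if that year were ever missing from `merged`. One possible safeguard, sketched here with a hypothetical helper named `value_near_year`, falls back to the closest available year instead.

```python
import pandas as pd


def value_near_year(df, year, column, year_col="Year"):
    """Return (year, value) for the row whose year is closest to the requested one."""
    if df.empty:
        raise ValueError("Cannot pick an annotation anchor from an empty frame.")
    # On a tie, idxmin keeps the first match, i.e. the earlier row.
    idx = (df[year_col] - year).abs().idxmin()
    return int(df.loc[idx, year_col]), float(df.loc[idx, column])


# Toy frame with 1960 missing; the values are made up.
toy = pd.DataFrame({"Year": [1958, 1959, 1961], "Temp_rolling": [0.01, 0.02, 0.03]})
anchor_year, anchor_value = value_near_year(toy, 1960, "Temp_rolling")
print(anchor_year, anchor_value)
```

The returned pair could then be passed as `xy=(anchor_year, anchor_value)` in `ax_temp.annotate`.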


### Step 8 · Export the figure for archival use

Save the final capstone figure as a high-resolution PNG for assessment.

In [None]:
save_last_fig(fig, "plots/day05_solution_dual_axis.png")
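
How `save_last_fig` chooses its resolution is not shown in this notebook. If you need to export without the helper, a minimal sketch using Matplotlib's own `Figure.savefig` might look like the following. The function name `save_figure` and the 300 DPI default are assumptions, not the course's settings.

```python
from pathlib import Path


def save_figure(fig, path, dpi=300):
    """Create the parent folder if needed and write fig to path at the given DPI."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # bbox_inches="tight" expands the saved area to include text placed
    # outside the axes, such as the source note below the x-axis.
    fig.savefig(target, dpi=dpi, bbox_inches="tight")
    return target
```

For example, `save_figure(fig, "plots/day05_dual_axis_copy.png")` would write a second copy next to the helper's output.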


### Step 9 · Reflection prompts

- How would you explain this chart to someone unfamiliar with climate science?
- Which diagnostics gave you confidence in the merged dataset?
- Note one uncertainty or caveat to mention alongside your story.