## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day05/notebook/day05_starter.ipynb)

# 🔥 Day 5 – Capstone: CO₂ Emissions & Global Temperature
### Synthesizing multiple indicators into a climate storytelling dashboard

Our final project combines global CO₂ emissions with NASA temperature anomalies to show how emissions growth
aligns with warming. We'll reinforce the inspect → clean → visualize cadence, build a multi-panel figure, and
articulate limitations of the evidence.

### 🗂️ Data card — Global CO₂ emissions & NASA temperature anomalies
- **CO₂ Source:** Global Carbon Project (Our World in Data extract `global_co2.csv`)
- **Temperature Source:** NASA GISTEMP v4 annual anomalies (`GLB.Ts+dSST.csv`)
- **Temporal coverage:** 1900–2020 (years where both datasets overlap)
- **Units:**
  - CO₂ emissions: Gigatonnes of CO₂ per year
  - Temperature anomaly: °C relative to the 1951–1980 baseline
- **Collection notes:** Emissions include fossil fuels and cement; temperature anomalies use the blended land and ocean series
- **Caveats:** Recent years may be revised; aerosols, land-use change, and natural variability also influence temperatures
- **Mindful design:** Avoid misleading dual axes; align scales via separate panels and include uncertainty context.
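
To make the "relative to the 1951–1980 baseline" unit concrete, here is a minimal sketch of the arithmetic behind an anomaly: subtract the mean of a reference window from every value. The function name `to_anomalies` and the temperatures are made up for illustration; NASA's actual GISTEMP processing is considerably more involved than this.

```python
def to_anomalies(temps: dict[int, float], start: int, end: int) -> dict[int, float]:
    """Express each value as a difference from the mean of the baseline window."""
    baseline = [t for year, t in temps.items() if start <= year <= end]
    if not baseline:
        raise ValueError("No observations fall inside the baseline window.")
    baseline_mean = sum(baseline) / len(baseline)
    return {year: t - baseline_mean for year, t in temps.items()}


# Hypothetical absolute temperatures in °C (illustrative values, not real data).
demo_temps = {1951: 13.9, 1965: 14.0, 1980: 14.1, 2000: 14.4, 2020: 15.0}
demo_anoms = to_anomalies(demo_temps, 1951, 1980)

for year, anomaly in demo_anoms.items():
    print(f"{year}: {anomaly:+.2f} °C")
```

An anomaly of zero therefore means "equal to the baseline average", not "no warming since pre-industrial times".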

### 1. Set up the environment

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

pd.options.display.float_format = "{:.2f}".format

In [None]:
# Shared helper utilities used throughout the week.
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Iterable, Mapping

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def resolve_data_dir(max_up: int = 5) -> Path:
    """Locate the project-level ``data`` directory regardless of execution location."""
    here = Path.cwd()
    for _ in range(max_up + 1):
        candidate = here / "data"
        if candidate.exists():
            return candidate
        here = here.parent
    raise FileNotFoundError(
        "Could not find a 'data' directory relative to this notebook.\n"
        "If you are running in Colab, mount your drive or upload the data folder first."
    )


DATA_DIR = resolve_data_dir()
PROJECT_ROOT = DATA_DIR.parent
PLOTS_DIR = PROJECT_ROOT / "plots"
PLOTS_DIR.mkdir(parents=True, exist_ok=True)


def baseline_style() -> None:
    """Apply a consistent, high-contrast visual style that is colorblind-friendly."""
    sns.set_theme(style="whitegrid", context="talk", font_scale=0.9)
    plt.rcParams.update(
        {
            "figure.dpi": 120,
            "axes.titlesize": 16,
            "axes.labelsize": 13,
            "legend.fontsize": 11,
            "axes.titleweight": "semibold",
        }
    )


def load_data(filename: str | Path, **kwargs) -> pd.DataFrame:
    """Read a CSV file from the shared data directory and report its shape."""
    path = Path(filename)
    if not path.exists():
        path = DATA_DIR / filename
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {path.name} with shape {df.shape}.")
    return df


def validate_columns(df: pd.DataFrame, required: Iterable[str], *, context: str = "") -> None:
    """Warn if any of the ``required`` columns are missing from ``df``."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        warnings.warn(
            f"Missing expected columns {missing} in {context or 'dataframe'}.\n"
            "Double-check your renaming and loading steps before moving on."
        )
    else:
        print(f"✅ Columns look good: {list(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int, *, label: str = "rows") -> None:
    """Warn if the row count of ``df`` falls outside the inclusive range ``[low, high]``."""
    count = len(df)
    if not (low <= count <= high):
        warnings.warn(
            f"{label} check: expected between {low:,} and {high:,} but found {count:,}."
        )
    else:
        print(f"✅ {label} check: {count:,} rows is within the expected range.")


def quick_diagnose(df: pd.DataFrame, *, sample: int = 3) -> None:
    """Show the first ``sample`` rows and the null count for each column."""
    print("\nPreview of the current dataframe:")
    display(df.head(sample))
    print("\nNull values by column:")
    print(df.isna().sum())


def validate_story_fields(fields: Mapping[str, str]) -> None:
    """Warn about any story fields that are empty or whitespace-only."""
    missing = [name for name, value in fields.items() if not str(value).strip()]
    if missing:
        warnings.warn(
            "The following story fields are blank: " + ", ".join(missing) +
            "\nFill them in so your chart has a clear narrative frame."
        )
    else:
        print("✅ Narrative checklist complete.")


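def save_last_fig(fig: plt.Figure | None, filename: str) -> Path | None:
    """Save ``fig`` (or the current figure if ``None``) to ``PLOTS_DIR`` at 300 dpi and return its path."""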
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        output_path = PLOTS_DIR / filename
        fig.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path.relative_to(PROJECT_ROOT)}")
        return output_path
    warnings.warn("No matplotlib figure available to save yet.")
    return None


### 2. Load and inspect the CO₂ and temperature series
Confirm that the expected columns are present and preview the first rows of each table. The year index is set during cleaning in step 3.
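
A minimal sketch of why the NASA table is read with `skiprows=1`: the file opens with a title line, and the real header row comes second. The miniature table below is made up for illustration, and it assumes missing entries are marked with `***`, as GISTEMP tables have done. Passing `na_values` is one optional way to handle that marker at load time; the cleaning step in this notebook uses `pd.to_numeric(..., errors="coerce")` instead.

```python
import io

import pandas as pd

# Made-up miniature of a GISTEMP-style file: one title line, then the header row.
raw = io.StringIO(
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,J-D\n"
    "2019,0.93,0.95,0.98\n"
    "2020,1.17,1.24,1.01\n"
    "2021,0.81,0.64,***\n"
)

# skiprows=1 drops the title line; na_values turns the *** marker into NaN.
demo = pd.read_csv(raw, skiprows=1, na_values=["***"])
print(demo.dtypes)
print(demo.isna().sum())
```

Without `na_values` (or a later coercion), the `J-D` column would load as text because of the `***` entry.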

In [None]:
co2 = load_data("global_co2.csv")
validate_columns(co2, ["Year", "CO2"])
quick_diagnose(co2.head())

temp = load_data("GLB.Ts+dSST.csv", skiprows=1)
validate_columns(temp, ["Year", "J-D"], context="NASA table")
quick_diagnose(temp.iloc[:5, :5])

### 3. Clean and align the datasets
Coerce years and values to numeric, drop rows with missing values, join the two series on the year index, and add 5-year rolling means.
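
Before running this on the real data, a toy example can show what an inner join on the year index and a rolling mean with `min_periods=1` actually do. The frames `co2_demo` and `temp_demo` and their values are invented for illustration only.

```python
import pandas as pd

# Toy frames with partially overlapping years (values are illustrative).
co2_demo = pd.DataFrame(
    {"CO2": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index([1998, 1999, 2000, 2001], name="Year"),
)
temp_demo = pd.DataFrame(
    {"temp_anomaly_c": [0.2, 0.4, 0.6]},
    index=pd.Index([1999, 2000, 2002], name="Year"),
)

# An inner join keeps only the years present in BOTH frames (1999 and 2000 here).
joined = co2_demo.join(temp_demo, how="inner")

# With min_periods=1 the first rows average fewer than 5 values instead of being NaN.
joined["co2_rolling"] = joined["CO2"].rolling(window=5, min_periods=1).mean()
print(joined)
```

Note the trade-off in `min_periods=1`: the smoothed line starts at the first year, but its earliest points are averages over shorter windows and are noisier than the rest.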

In [None]:
co2_clean = co2.assign(Year=lambda d: pd.to_numeric(d["Year"], errors="coerce")).dropna(subset=["Year", "CO2"])
co2_clean = co2_clean.set_index("Year").sort_index()
expect_rows_between(co2_clean, 170, 210, label="CO₂ yearly records")

temp_clean = (
    temp[["Year", "J-D"]]
    .rename(columns={"J-D": "temp_anomaly_c"})
    .assign(temp_anomaly_c=lambda d: pd.to_numeric(d["temp_anomaly_c"], errors="coerce"))
    .dropna(subset=["temp_anomaly_c"])
    .set_index("Year")
    .sort_index()
)
expect_rows_between(temp_clean, 140, 150, label="temperature records")

merged = (
    co2_clean.join(temp_clean, how="inner")
    .loc[1900:2020]
    .assign(
        co2_rolling=lambda d: d["CO2"].rolling(window=5, min_periods=1).mean(),
        temp_rolling=lambda d: d["temp_anomaly_c"].rolling(window=5, min_periods=1).mean(),
    )
)
quick_diagnose(merged.head())
quick_diagnose(merged.tail())

### 4. Quantify relationships before plotting
Compute descriptive statistics and the emissions–temperature correlation.
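
For reference, the Pearson coefficient is the covariance of the two series divided by the product of their standard deviations. The hand-rolled `pearson` function below is a sketch for building intuition, not a replacement for `Series.corr`, which uses the Pearson method by default.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two equal-length sequences with at least 2 points.")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined for a constant series.")
    return cov / math.sqrt(var_x * var_y)


# Toy inputs: a perfectly increasing pair and a perfectly opposed pair.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(f"{r_up:.2f}, {r_down:.2f}")
```

Any two series that both trend steadily upward will score close to 1, so a high value here describes co-movement and says nothing on its own about mechanism.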

In [None]:
summary = merged[["CO2", "temp_anomaly_c"]].describe()
display(summary)
corr = merged["CO2"].corr(merged["temp_anomaly_c"])
print(f"Pearson correlation (1900-2020): {corr:.2f}")

### 5. Define the storytelling frame
Lock in the narrative scaffolding to guide the visualization design.

In [None]:
TITLE = "Emissions and temperatures have climbed together since 1900"
SUBTITLE = "Global fossil CO₂ emissions vs. NASA temperature anomalies (1900–2020)"
ANNOTATION = "Rolling averages smooth short-term variability; scatter compares paired years"
SOURCE = "Sources: Global Carbon Project, NASA GISTEMP v4"
UNITS = "CO₂ (Gt) & temperature anomaly (°C)"

validate_story_fields({
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
})

### 6. Build the multi-panel figure
Use coordinated subplots instead of dual axes to keep scales interpretable.
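
The core of the layout can be sketched in a few lines: two stacked panels that share the x-axis, each with its own independent y-axis. The series and names here (`series_a`, `series_b`, `fig_demo`) are placeholders with made-up values, used only to show the pattern.

```python
import matplotlib.pyplot as plt

years = [2000, 2001, 2002, 2003]
series_a = [25.0, 26.0, 27.5, 28.0]  # made-up values in one unit
series_b = [0.40, 0.55, 0.62, 0.61]  # made-up values in a different unit

# sharex=True links the year axis; each panel keeps its own y-scale.
fig_demo, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

ax_top.plot(years, series_a, color="#0072B2")
ax_top.set_ylabel("Series A (unit A)")

ax_bottom.plot(years, series_b, color="#D55E00")
ax_bottom.set_ylabel("Series B (unit B)")
ax_bottom.set_xlabel("Year")

fig_demo.tight_layout()
```

Because each panel has its own clearly labeled scale, readers cannot be misled by an arbitrary choice of how two y-axes line up, which is the main risk with a dual-axis chart.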

In [None]:
baseline_style()

fig, axes = plt.subplots(3, 1, figsize=(12, 12), sharex=True, gridspec_kw={"height_ratios": [1, 1, 0.8]})

ax1 = axes[0]
ax1.plot(merged.index, merged["CO2"], color="#595959", label="Annual CO₂ (Gt)")
ax1.plot(merged.index, merged["co2_rolling"], color="#0072B2", linewidth=2.5, label="5-year average")
ax1.set_ylabel("CO₂ emissions (Gt)")
ax1.legend(loc="upper left", frameon=False)
ax1.set_title(TITLE, loc="left")
ax1.text(0.01, 0.05, SUBTITLE, transform=ax1.transAxes, fontsize=11, ha="left")

ax2 = axes[1]
ax2.plot(merged.index, merged["temp_anomaly_c"], color="#CC79A7", label="Annual anomaly (°C)")
ax2.plot(merged.index, merged["temp_rolling"], color="#D55E00", linewidth=2.5, label="5-year average")
ax2.axhline(0, color="#666666", linestyle="--", linewidth=1)
ax2.set_ylabel("Temperature anomaly (°C)")
ax2.legend(loc="upper left", frameon=False)

ax3 = axes[2]
scatter = ax3.scatter(
    merged["CO2"],
    merged["temp_anomaly_c"],
    c=merged.index,
    cmap="viridis",
    edgecolor="white",
    linewidth=0.5,
)
ax3.set_xlabel("CO₂ emissions (Gt)")
ax3.set_ylabel("Temperature anomaly (°C)")
ax3.set_title("Paired annual values")
cbar = fig.colorbar(scatter, ax=ax3, orientation="horizontal", pad=0.25)
cbar.set_label("Year")

last_year = int(merged.index.max())  # cast so the label reads "2020" even if the index is float
last_row = merged.loc[last_year]
ax1.annotate(
    f"{last_year}: {last_row['CO2']:.1f} Gt",
    xy=(last_year, last_row["CO2"]),
    xytext=(last_year - 25, last_row["CO2"] - 10),
    arrowprops=dict(arrowstyle="->", color="#333333"),
)
ax2.annotate(
    f"{last_year}: {last_row['temp_anomaly_c']:+.2f}°C",
    xy=(last_year, last_row["temp_anomaly_c"]),
    xytext=(last_year - 25, last_row["temp_anomaly_c"] + 0.5),
    arrowprops=dict(arrowstyle="->", color="#333333"),
)

fig.tight_layout()
fig.subplots_adjust(bottom=0.15)
fig.text(0.01, 0.02, f"{SOURCE} · {ANNOTATION}", fontsize=10, ha="left")

plt.show()
final_fig_path = save_last_fig(fig, "day05_solution_plot.png")

### 7. Interpret responsibly
- **Key takeaway:** Fossil CO₂ emissions have more than tripled since 1950, and global temperatures now sit over 1°C above the mid-20th-century baseline.
- **Uncertainty & caveats:** Correlation does not capture lagged climate feedbacks or non-CO₂ forcings; data revisions may adjust the most recent years.
- **What this visualization cannot tell us:** Regional impacts, the role of aerosols, and emissions by sector require additional datasets; communicate these caveats when presenting the story.
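
One way to probe the lag caveat is to correlate one series against a time-shifted copy of the other. The helper `lagged_corr` below is a hypothetical sketch on toy data, not part of the notebook's helpers, and a lagged correlation is still only descriptive: it does not establish cause.

```python
import pandas as pd


def lagged_corr(x: pd.Series, y: pd.Series, lag: int) -> float:
    """Correlate x in year t with y in year t + lag (pairs with NaN are dropped)."""
    return x.corr(y.shift(-lag))


# Toy data: y repeats x one year later (values are illustrative).
demo_years = pd.Index(range(2000, 2006), name="Year")
x_demo = pd.Series([1, 3, 2, 5, 4, 6], index=demo_years, dtype=float)
y_demo = pd.Series([0, 1, 3, 2, 5, 4], index=demo_years, dtype=float)

same_year = lagged_corr(x_demo, y_demo, lag=0)
one_year_later = lagged_corr(x_demo, y_demo, lag=1)
print(f"lag 0: {same_year:.2f}, lag 1: {one_year_later:.2f}")
```

In this toy case the same-year correlation understates a relationship that is perfect at a one-year delay, which is why a single same-year coefficient should be presented with care.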

### 8. Process micro-rubric
| Step | Evidence of completion |
| --- | --- |
| Data loaded & validated | CO₂ and temperature columns confirmed, year alignment checked |
| Cleaning documented | Rolling averages computed, merged dataset inspected |
| Story frame filled | Narrative checklist completed before visualization |
| Visualization reviewed | Multi-panel layout replaces dual axes; annotations and colorbar included |
| Reflection written | Takeaway, uncertainty, and limitations clearly stated |