## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%201_%20Introduction%20–%20Climate%20Change%20%26%20Basic%20Plotting.ipynb)

# 🌎 Day 1 – Visualizing Global Warming
### Gentle first steps into Python, pandas, and climate storytelling

Welcome! Every day in this course now follows the same heartbeat:

1. **Learn a little** – read a short concept card.
2. **Do a little** – run a scaffolded code cell.
3. **Check yourself** – confirm with the built-in diagnostics.

We repeat that loop four to six times, building toward a mini deliverable in small steps rather than one giant leap.

---

## 🧠 Learning Rhythm
- 🪜 Progress in micro-loops that take ~10 minutes each.
- 🧪 Frequent self-checks (shapes, columns, assertions) catch slips early.
- 🧩 Stretch prompts appear after the core task so fast finishers stay engaged.

> **Teacher Sidecar**: Expect 45–50 minutes of class time. Look for students whose diagnostics fail; that is your cue to intervene before they fall behind.

## 📇 Data Card — NASA GISTEMP (Global Annual Anomalies)
- **Source**: NASA Goddard Institute for Space Studies (GISTEMP v4)
- **Temporal coverage**: 1880–2024 (annual)
- **Metric**: Global mean surface temperature anomaly relative to 1951–1980 (°C)
- **Collection notes**: Merges land-station and sea-surface records, homogenized for station moves.
- **Last updated**: January 2024 release.
- **Caveats**: Recent months may be preliminary; anomalies, not absolute temperatures.
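
To make the "anomalies, not absolute temperatures" caveat concrete, here is a minimal sketch of how an anomaly can be derived: subtract the mean over a baseline period from every value. The absolute temperatures below are invented for illustration and are not GISTEMP data, and `to_anomalies` is a hypothetical helper that does not appear elsewhere in this notebook.

```python
from statistics import mean

# Hypothetical absolute temperatures (°C) keyed by year; NOT real GISTEMP values.
absolute_temps = {1951: 13.5, 1965: 14.0, 1980: 14.5, 2020: 15.0}


def to_anomalies(temps, base_start, base_end):
    """Return each year's departure from the mean of the baseline period."""
    baseline = mean(
        value for year, value in temps.items() if base_start <= year <= base_end
    )
    return {year: value - baseline for year, value in temps.items()}


anomalies = to_anomalies(absolute_temps, 1951, 1980)
for year, value in anomalies.items():
    print(f"{year}: {value:+.2f} °C")
```

An anomaly of zero therefore means "equal to the baseline average", not "zero degrees".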

## 🧵 Story Scaffold (Claim → Evidence → Visual → Takeaway)
- **Claim**: Global temperatures have risen ~1 °C above the 1951–1980 baseline.
- **Evidence to gather**: NASA annual anomaly time series, especially the latest decade.
- **Visual plan**: Single line chart with a zero baseline and callout for the latest year.
- **Takeaway**: Climate change is observable in one glance of the trend line.


In [None]:
from __future__ import annotations

from pathlib import Path
from typing import Any, Mapping, Sequence

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display

DATA_DIR = Path.cwd() / "data"

sns.set_theme(style="whitegrid", font_scale=1.1)
plt.rcParams.update({
    "axes.titlesize": 16,
    "axes.labelsize": 13,
    "axes.grid": True,
    "figure.figsize": (10, 6),
    "figure.dpi": 120,
})

def ping_environment(packages: Mapping[str, object]) -> None:
    """Print library versions so teachers can confirm the runtime."""
    for label, module in packages.items():
        version = getattr(module, "__version__", "built-in")
        print(f"{label}: {version}")
    print("Environment check complete ✅")

def load_data(file_name: str, /, **kwargs) -> pd.DataFrame:
    """Load a CSV from the shared data folder with a friendly status message."""
    path = DATA_DIR / file_name
    if not path.exists():
        raise FileNotFoundError(f"Expected data file at {path}")
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {file_name} → shape {df.shape}")
    return df

def validate_columns(df: pd.DataFrame, required: Sequence[str]) -> pd.DataFrame:
    """Raise if any required column is missing; return df so checks can be chained."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    print(f"Columns validated ✅ {list(required)}")
    return df

def expect_rows_between(df: pd.DataFrame, lower: int, upper: int, label: str = "rows") -> pd.DataFrame:
    """Raise if the row count falls outside [lower, upper]; return df for chaining."""
    n_rows = len(df)
    if not (lower <= n_rows <= upper):
        raise ValueError(
            f"Unexpected {label}: {n_rows} (expected between {lower} and {upper})"
        )
    print(f"{label.capitalize()} check ✅ {n_rows} (expected {lower}-{upper})")
    return df

def quick_peek(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Display a head preview and NA counts for formative assessment."""
    display(df.head(n))
    print("Null values per column:")
    print(df.isna().sum())
    return df

def ensure_metadata(**metadata: str) -> None:
    """Raise if any story metadata field is blank."""
    blanks = [key for key, value in metadata.items() if not str(value).strip()]
    if blanks:
        raise ValueError(f"Please fill in metadata fields: {blanks}")
    print("Story metadata looks great ✅")

def annotate_source(ax: plt.Axes, *, source: str, units: str) -> plt.Axes:
    """Add a source-and-units note below the axes."""
    ax.text(
        0.0,
        -0.22,
        f"Source: {source}\nUnits: {units}",
        transform=ax.transAxes,
        ha="left",
        fontsize=10,
    )
    return ax

def _resolve_fig(fig: Any | None) -> Any:
    """Return the given figure, else the current Matplotlib figure, else None."""
    if fig is not None:
        return fig
    if plt.get_fignums():
        return plt.gcf()
    return None

def save_last_fig(fig: Any | None, filename: str) -> Path:
    """Save the figure under ./plots; write_image-style figures fall back to HTML on failure."""
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    resolved = _resolve_fig(fig)
    if resolved is None:
        raise ValueError("No recent figure detected.")

    output_path = plots_dir / filename

    if hasattr(resolved, "savefig"):
        resolved.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path}")
        return output_path

    if hasattr(resolved, "write_image"):
        try:
            resolved.write_image(str(output_path))
            print(f"Saved figure to {output_path}")
            return output_path
        except Exception as exc:
            html_path = output_path.with_suffix(".html")
            resolved.write_html(str(html_path))
            print(f"Saved interactive figure to {html_path} (fallback: {exc})")
            return html_path

    raise ValueError("Don't know how to export this figure type.")
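
Because the check helpers above return the DataFrame they receive, they can be chained with `DataFrame.pipe`. The sketch below uses two trimmed-down stand-ins (`check_columns` and `check_row_count`, both hypothetical names) and a tiny made-up frame so that it runs on its own; it illustrates the pattern and is not a replacement for the notebook's helpers.

```python
import pandas as pd


def check_columns(df, required):
    """Raise if a required column is absent; otherwise pass df through unchanged."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


def check_row_count(df, lower, upper):
    """Raise if the row count is outside [lower, upper]; otherwise pass df through."""
    if not (lower <= len(df) <= upper):
        raise ValueError(f"Unexpected row count: {len(df)}")
    return df


# Toy data for illustration only.
toy = pd.DataFrame({"Year": [2000, 2001, 2002], "TempAnomaly": [0.4, 0.5, 0.6]})

# Each check either raises or hands the frame to the next step.
checked = (
    toy.pipe(check_columns, ["Year", "TempAnomaly"])
    .pipe(check_row_count, 1, 10)
)
print(checked.shape)
```

Returning the frame from every check keeps the diagnostics in the same expression as the cleaning steps, so a failed check stops the chain at the point where the problem appears.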


## 🔁 Loop 1 · Confirm the setup
*Goal: Make sure pandas, Matplotlib, and seaborn are available and the data folder exists before touching the data.*

In [None]:
import matplotlib  # pyplot has no __version__ attribute, so report the top-level package

ping_environment({"pandas": pd, "matplotlib": matplotlib, "seaborn": sns})
assert DATA_DIR.exists(), f"Data directory missing: {DATA_DIR}"
print(f"Data files available: {len(list(DATA_DIR.glob('*')))} items")
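
`ping_environment` reads `__version__` from module objects that are already imported. As an alternative sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library), installed packages can be looked up by name without importing them. `installed_versions` is a hypothetical helper, not part of this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["pandas", "matplotlib", "seaborn"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

A missing package shows up as "not installed" instead of raising, which may be easier to read when scanning a whole class's output.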

## 🔁 Loop 2 · Load and inspect the raw data
*Goal: Read the CSV, preview the structure, and confirm key expectations before cleaning.*

In [None]:
raw_temp = load_data(
    "GLB.Ts+dSST.csv",
    skiprows=1,
    usecols=[0, 13],
    names=["Year", "TempAnomaly"],
    header=0,
)
validate_columns(raw_temp, ["Year", "TempAnomaly"])
expect_rows_between(raw_temp, 140, 200, label="annual records")
quick_peek(raw_temp)
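
The `read_csv` arguments above interact in a way that is easy to misread, so here is a small in-memory sketch of the same call. It assumes a layout of one title line, one header row, then a year followed by twelve monthly values and an annual mean. The numbers and the `***` missing-value marker are placeholders for illustration, not real GISTEMP values.

```python
import io

import pandas as pd

# Made-up miniature of the assumed file layout (placeholder numbers).
header = "Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D"
rows = [
    "2001," + ",".join(["0.10"] * 12) + ",0.10",
    "2002," + ",".join(["0.20"] * 12) + ",0.20",
    "2003," + ",".join(["0.30"] * 12) + ",***",  # annual mean marked as missing
]
sample = "\n".join(["Title line to skip", header, *rows])

df = pd.read_csv(
    io.StringIO(sample),
    skiprows=1,                     # drop the title line
    header=0,                       # the next line is the file's own header...
    names=["Year", "TempAnomaly"],  # ...which these names replace
    usecols=[0, 13],                # keep column 0 (year) and column 13 (annual mean)
)
print(df)
print(df.dtypes)
```

Note that the `***` marker leaves `TempAnomaly` as text rather than numbers, which is why a numeric conversion step is still needed after loading.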

## 🔁 Loop 3 · Clean and self-diagnose
*Goal: Convert to numeric types, drop missing values, and double-check ranges.*

In [None]:
tidy_temp = (
    raw_temp.assign(
        Year=lambda df: pd.to_numeric(df["Year"], errors="coerce"),
        TempAnomaly=lambda df: pd.to_numeric(df["TempAnomaly"], errors="coerce"),
    )
    .dropna()
    .sort_values("Year")
)
expect_rows_between(tidy_temp, 140, 200, label="usable rows")
assert tidy_temp["Year"].is_monotonic_increasing, "Years should increase over time"
print("Temperature anomaly range:", tidy_temp["TempAnomaly"].min(), "→", tidy_temp["TempAnomaly"].max())
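
To see what `errors="coerce"` does in the cleaning chain above, the sketch below runs the same steps on a three-row toy frame. The values are invented, and `***` stands in for whatever non-numeric marker a raw file might contain.

```python
import pandas as pd

# Toy frame: strings as they might arrive from a CSV, one with a missing marker.
raw = pd.DataFrame({
    "Year": ["2003", "2001", "2002"],
    "TempAnomaly": ["0.30", "***", "0.20"],
})

tidy = (
    raw.assign(
        Year=lambda df: pd.to_numeric(df["Year"], errors="coerce"),
        TempAnomaly=lambda df: pd.to_numeric(df["TempAnomaly"], errors="coerce"),
    )
    .dropna()             # the "***" row became NaN and is removed here
    .sort_values("Year")  # restore chronological order
)
print(tidy)
print(f"Rows dropped: {len(raw) - len(tidy)}")
```

Comparing the row count before and after `dropna()` is a quick way to confirm that only the expected rows were lost.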

## 🔁 Loop 4 · Build the baseline plot
*Goal: Plot the trend with consistent styling and metadata before polishing.*

In [None]:
TITLE = "Global Warming Is Visible in the Temperature Record"
SUBTITLE = "NASA GISTEMP annual anomaly relative to the 1951–1980 baseline (1880–2024)"
ANNOTATION = "Recent years sit roughly 1 °C above the 1951–1980 baseline."
SOURCE = "NASA GISTEMP v4 (downloaded Jan 2024)"
UNITS = "Degrees Celsius anomaly"

ensure_metadata(TITLE=TITLE, SUBTITLE=SUBTITLE, ANNOTATION=ANNOTATION, SOURCE=SOURCE, UNITS=UNITS)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(tidy_temp["Year"], tidy_temp["TempAnomaly"], color="#d62728", linewidth=2.5)
ax.fill_between(tidy_temp["Year"], tidy_temp["TempAnomaly"], 0, color="#fddede", alpha=0.6)
ax.axhline(0, color="black", linestyle="--", linewidth=1)
ax.set_title(f"{TITLE}\n{SUBTITLE}")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
latest = tidy_temp.iloc[-1]
ax.annotate(
    f"{int(latest.Year)} anomaly: {latest.TempAnomaly:.2f}°C",
    xy=(latest.Year, latest.TempAnomaly),
    xytext=(latest.Year - 25, latest.TempAnomaly + 0.2),
    arrowprops=dict(arrowstyle="->", color="#555555"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#555555"),
)
ax.text(
    tidy_temp["Year"].iloc[5],
    tidy_temp["TempAnomaly"].iloc[5] - 0.5,
    ANNOTATION,
    fontsize=11,
    color="#333333",
    bbox=dict(boxstyle="round,pad=0.4", facecolor="white", alpha=0.85),
)
annotate_source(ax, source=SOURCE, units=UNITS)
plt.show()

## 🔁 Loop 5 · Interpret and self-check
*Goal: Quantify the change and articulate the takeaway using the story scaffold.*

In [None]:
baseline_period = tidy_temp[tidy_temp["Year"].between(1951, 1980)]
recent_period = tidy_temp[tidy_temp["Year"] >= tidy_temp["Year"].max() - 9]
mean_baseline = baseline_period["TempAnomaly"].mean()
mean_recent = recent_period["TempAnomaly"].mean()
delta = mean_recent - mean_baseline
print(f"Baseline anomaly (1951–1980): {mean_baseline:.2f} °C")
print(f"Recent decade anomaly: {mean_recent:.2f} °C")
print(f"Change since baseline: {delta:.2f} °C")
# "Roughly 1 °C" leaves some slack: a ten-year mean can sit slightly below 1.0.
assert delta > 0.8, "The recent mean should be roughly 1 °C warmer than the mid-century baseline."
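
The comparison above boils down to averaging two year windows and subtracting. Here is a pandas-free sketch of that arithmetic, using invented (year, anomaly) pairs and a hypothetical `window_mean` helper. Both window bounds are inclusive, so `latest_year - 9` through `latest_year` spans ten calendar years.

```python
from statistics import mean

# Invented (year, anomaly) pairs for illustration; not real measurements.
records = [
    (1951, -0.1), (1965, 0.0), (1980, 0.1),
    (2015, 0.9), (2020, 1.0), (2024, 1.1),
]


def window_mean(pairs, start, end):
    """Average the anomalies whose year falls within [start, end], inclusive."""
    values = [anomaly for year, anomaly in pairs if start <= year <= end]
    if not values:
        raise ValueError(f"No records between {start} and {end}")
    return mean(values)


latest_year = max(year for year, _ in records)
baseline = window_mean(records, 1951, 1980)
recent = window_mean(records, latest_year - 9, latest_year)
delta = recent - baseline
print(f"Change since baseline: {delta:.2f} °C")
```

The explicit error for an empty window guards against a silent mistake, such as a typo in the baseline years producing a meaningless average.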

### 🧾 Claim → Evidence → Visual → Takeaway (filled)
- **Claim**: Earth is roughly 1 °C warmer than the 1951–1980 average.
- **Evidence**: Compare the computed baseline and recent-decade means above.
- **Visual**: The line chart with a zero baseline and callout for the latest year.
- **Takeaway**: The steady climb shows climate change is ongoing, not a one-off spike.

> **Limitation prompt**: This dataset shows global means; regional extremes and uncertainty bands are not captured here.

---

### 💾 Save your work
Run the next cell to export the most recent figure.


In [None]:
save_last_fig(fig, "day01_solution_plot.png")