## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day01/solution/day01_solution.ipynb)

# 🌎 Day 1 – Visualizing Global Warming
### Guided loops: load → check → plot → reflect

Welcome to your first day of data storytelling. We'll move in short, repeatable cycles: learn a tiny concept, try a focused task, run a quick self-check, then build on it. The final figure is a clean, annotated line chart that shows how Earth's temperature anomalies have climbed since the late 19th century.

## 📇 Data Card — NASA GISTEMP Global Temperature Anomalies
- **Source**: NASA Goddard Institute for Space Studies (GISTEMP v4).<br>Accessed via the course repo copy.
- **Temporal coverage**: 1880–2024 (annual); 2025 is present but provisional and flagged `NaN`.
- **Units**: Temperature anomaly in °C relative to the 1951–1980 baseline.
- **Processing notes**: Converted from wide CSV to `Year`/`TempAnomaly` columns, coercing non-numeric entries to `NaN`.
- **Last updated**: January 2025 data release.
- **Caveats**: Recent months can be revised; anomalies highlight change, not absolute temperature. When interpreting, emphasise long-term trends over single-year spikes.

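To make "anomaly relative to a baseline" concrete, here is a minimal sketch. It assumes made-up absolute temperatures; the `AbsTemp` column and every value are illustrative, not GISTEMP data. It subtracts the mean of the baseline window from each year.

```python
import pandas as pd

# Hypothetical absolute temperatures in °C; NOT real GISTEMP values.
toy = pd.DataFrame({
    "Year": [1950, 1960, 1970, 1980, 2020],
    "AbsTemp": [14.0, 13.9, 14.1, 14.0, 15.0],
})

# Rows that fall inside the 1951–1980 reference window.
in_baseline = toy["Year"].between(1951, 1980)
baseline_mean = toy.loc[in_baseline, "AbsTemp"].mean()

# An anomaly is the departure from the baseline mean.
toy["TempAnomaly"] = toy["AbsTemp"] - baseline_mean
print(toy)
```

By construction, the anomalies inside the baseline window average to roughly zero, which is why the zero line is a useful reference on the final chart.
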
> 🔎 **What this plot cannot tell us**: It does not separate natural vs human drivers, regional variation, or seasonal extremes. Use it as a big-picture signal, not the whole climate story.

## 🗺️ Workflow Map
We will reuse the same cadence all week:
1. **Setup & shared helpers** – consistent style and diagnostics.
2. **Load a dataset** – confirm shape, columns, and plausible ranges.
3. **Inspect** – quick stats and sanity checks.
4. **Story scaffold** – write the title, subtitle, annotation, units, and source before plotting.
5. **Visualize & annotate** – build the chart, confirm accessibility, and save it.
6. **Reflect** – note limitations, uncertainty, and next questions.

## Step 0 · Imports, style, and quick diagnostics
These helpers keep every notebook consistent so you can focus on reasoning instead of rewriting boilerplate.

In [None]:

from pathlib import Path
from textwrap import dedent

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display

sns.set_theme(style="whitegrid")
plt.rcParams.update({
    "axes.titlesize": 18,
    "axes.labelsize": 13,
    "axes.titleweight": "bold",
    "figure.titlesize": 20,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
})


def baseline_style():
    """Reset the Matplotlib/Seaborn style so every figure starts consistent."""
    sns.set_theme(style="whitegrid")
    plt.rcParams.update({
        "axes.titlesize": 18,
        "axes.labelsize": 13,
        "axes.titleweight": "bold",
        "figure.titlesize": 20,
        "xtick.labelsize": 11,
        "ytick.labelsize": 11,
        "legend.title_fontsize": 12,
        "legend.fontsize": 11,
    })
    return plt


def quick_peek(df, expected_columns=None, sample=3, label="DataFrame"):
    """Print a friendly snapshot so students can self-diagnose issues quickly."""
    print(f"\n🔍 {label} preview")
    print(df.head(sample))
    print(f"Rows: {len(df):,} | Columns: {list(df.columns)}")
    if expected_columns:
        missing = [col for col in expected_columns if col not in df.columns]
        if missing:
            print(f"⚠️ Missing column(s): {missing}")
        else:
            print("✅ Columns match the expectation.")
    return df


def expect_rows_between(df, low, high, label="row count"):
    """Report whether the row count falls inside the inclusive range [low, high]."""
    rows = len(df)
    if low <= rows <= high:
        print(f"✅ {label.title()} looks right: {rows:,}.")
    else:
        print(f"⚠️ {label.title()} looks off: {rows:,}. Expected between {low:,} and {high:,}.")
    return rows


def validate_story_elements(elements):
    """Flag storytelling fields that are empty, None, or whitespace-only."""
    missing = [key for key, value in elements.items() if not value or not str(value).strip()]
    if missing:
        print(f"⚠️ Please fill in these storytelling fields: {', '.join(missing)}")
    else:
        print("✅ Story scaffold is ready — every element is filled in.")
    return elements


def save_last_fig(filename, fig=None, dpi=300):
    """Save the latest Matplotlib figure with consistent export settings."""
    output_path = Path.cwd() / filename
    output_path.parent.mkdir(parents=True, exist_ok=True)
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        fig.savefig(output_path, dpi=dpi, bbox_inches="tight")
        print(f"💾 Saved figure to {output_path}")
    else:
        print("⚠️ No figure detected to save.")
    return output_path

baseline_style()

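As a quick illustration of what `validate_story_elements` treats as "missing", the sketch below applies the same emptiness test to a toy dictionary. The field values are placeholders chosen for the example.

```python
# Same emptiness test that validate_story_elements applies to each field:
# None, empty strings, and whitespace-only strings all count as missing.
elements = {
    "TITLE": "Warming trend",
    "SUBTITLE": "   ",   # whitespace only
    "SOURCE": None,      # never filled in
    "UNITS": "°C",
}
missing = [key for key, value in elements.items() if not value or not str(value).strip()]
print(missing)  # ['SUBTITLE', 'SOURCE']
```

Note that any falsy value (for example `0`) would also be flagged, which is fine here because every field is expected to be text.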

## Step 1 · Load the temperature anomaly data
**Micro-task**: read the NASA CSV, keep the `Year` and global annual anomaly, and drop provisional `NaN` rows.

If things look odd, tiered hints in the starter notebook nudge you through `pd.read_csv`, cleaning, and type conversion.

In [None]:

data_dir = Path.cwd() / "data"
temperature_path = data_dir / "GLB.Ts+dSST.csv"

raw_df = pd.read_csv(
    temperature_path,
    skiprows=1,
    usecols=[0, 13],
    names=["Year", "TempAnomaly"],
    header=0,
)
raw_df["TempAnomaly"] = pd.to_numeric(raw_df["TempAnomaly"], errors="coerce")

df = raw_df.dropna(subset=["TempAnomaly"]).copy()

quick_peek(df, expected_columns=["Year", "TempAnomaly"], label="NASA anomalies")
expect_rows_between(df, 140, 150)

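The interplay of `skiprows`, `header`, `names`, and `usecols` can be hard to picture, so here is a small in-memory sketch. It assumes a file layout consistent with the loader above: one title line, one header line, then `Year`, twelve monthly columns, and the annual mean at column index 13. The numbers and the `***` placeholder token are illustrative assumptions, not real data.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of the assumed layout; values are placeholders.
months = "Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec"
toy_csv = StringIO(
    "Land-Ocean: Global Means\n"
    f"Year,{months},J-D\n"
    "2023," + ",".join(["0.5"] * 12) + ",0.5\n"
    "2024," + ",".join(["0.6"] * 12) + ",0.6\n"
    "2025," + ",".join(["0.7"] * 3 + ["***"] * 9) + ",***\n"
)

toy_raw = pd.read_csv(
    toy_csv,
    skiprows=1,                     # drop the title line
    header=0,                       # the next line is the original header...
    names=["Year", "TempAnomaly"],  # ...which these names replace
    usecols=[0, 13],                # Year and the annual-mean column
)

# Non-numeric placeholders become NaN, then the incomplete year is dropped.
toy_raw["TempAnomaly"] = pd.to_numeric(toy_raw["TempAnomaly"], errors="coerce")
toy_df = toy_raw.dropna(subset=["TempAnomaly"]).copy()
print(toy_df)
```

Only the two complete years survive; the provisional row is removed by `dropna` once its annual value has been coerced to `NaN`.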

### Self-diagnostic: year span
Students run this right after loading to confirm the timeline feels right.

In [None]:

start_year, end_year = int(df["Year"].min()), int(df["Year"].max())
print(f"🗓️ Years covered: {start_year} → {end_year}")
if start_year > 1880 or end_year < 2024:
    print("⚠️ Check the column selection or dropna step; we expect 1880–2024 with current data.")
else:
    print("✅ Year range matches the data card.")

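The min/max check confirms the endpoints but not what lies between them. A possible companion check, sketched here with a hypothetical helper named `check_years_contiguous`, looks for duplicated or skipped years.

```python
import pandas as pd


def check_years_contiguous(frame, year_col="Year"):
    """Return (duplicates, gaps) found in the year column; hypothetical helper."""
    years = frame[year_col].astype(int)
    duplicates = sorted(years[years.duplicated()].unique().tolist())
    expected = set(range(int(years.min()), int(years.max()) + 1))
    gaps = sorted(expected - set(years.tolist()))
    return duplicates, gaps


# Toy timeline with one repeated year and one missing year.
toy = pd.DataFrame({"Year": [1880, 1881, 1883, 1883]})
print(check_years_contiguous(toy))  # ([1883], [1882])
```

On the cleaned notebook data, `check_years_contiguous(df)` would be expected to return two empty lists.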

## Step 2 · Explore interim signals
Before plotting, compare early vs. recent decades. This bite-sized calculation gives students an interpretable checkpoint.

In [None]:

baseline_slice = df[df["Year"].between(1951, 1980)]
recent_slice = df[df["Year"] >= 2000]

mean_baseline = baseline_slice["TempAnomaly"].mean()
mean_recent = recent_slice["TempAnomaly"].mean()
change = mean_recent - mean_baseline

print(f"Baseline (1951–1980) anomaly: {mean_baseline:.2f} °C")
print(f"Recent (2000–2024) anomaly: {mean_recent:.2f} °C")
print(f"Change vs. baseline: {change:.2f} °C")

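If you want a finer-grained comparison than two slices, one option is to average by decade. The sketch below uses a hypothetical helper, `decade_means`, and placeholder values rather than real anomalies.

```python
import pandas as pd


def decade_means(frame, year_col="Year", value_col="TempAnomaly"):
    """Average the value column within each decade (1990s, 2000s, ...)."""
    decade = (frame[year_col] // 10) * 10  # e.g. 1995 -> 1990
    return frame.groupby(decade)[value_col].mean().rename_axis("Decade")


# Placeholder values for illustration only.
toy = pd.DataFrame({
    "Year": [1990, 1995, 2000, 2005, 2010],
    "TempAnomaly": [0.25, 0.75, 0.5, 0.5, 1.0],
})
result = decade_means(toy)
print(result)
```

In the notebook, `decade_means(df)` would give one row per decade, which makes the post-1970s climb easy to read off before any plotting.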

### Progress anchor
A thumbnail of the interim target helps students know they are on track.

In [None]:
display(Image(filename=str(Path.cwd() / 'plots' / 'day01_solution_plot.png'), width=380))

## Step 3 · Story-first chart checklist
Everyone fills these out before drawing the chart. The validator keeps formatting consistent and prevents unlabeled visuals.

In [None]:

TITLE = "Global Temperature Anomalies Keep Climbing"
SUBTITLE = "Annual departures from the 1951–1980 average, 1880–2024"
ANNOTATION = "2024 anomaly: {:.2f} °C — among the highest on record".format(df["TempAnomaly"].iloc[-1])
SOURCE = "NASA GISTEMP v4 (downloaded Jan 2025)"
UNITS = "Temperature anomaly (°C relative to 1951–1980)"
ACCESSIBILITY_NOTES = "Single red line on white grid; markers every 5 years; colorblind-safe contrast and 13pt axis labels."

validate_story_elements({
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
    "ACCESSIBILITY_NOTES": ACCESSIBILITY_NOTES,
})

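The annotation above hard-codes the year in its text while reading the value from the last row. A more defensive variant, sketched here on placeholder rows, derives both from the data so the label stays correct when a new year is added.

```python
import pandas as pd

# Placeholder rows standing in for the cleaned `df`; values are not real.
toy = pd.DataFrame({"Year": [2022, 2023, 2024], "TempAnomaly": [0.5, 0.75, 1.0]})

# Sort explicitly so "latest" does not depend on the file's row order.
latest = toy.sort_values("Year").iloc[-1]
annotation = f"{int(latest['Year'])} anomaly: {latest['TempAnomaly']:.2f} °C"
print(annotation)  # 2024 anomaly: 1.00 °C
```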

## Step 4 · Build, annotate, and sanity-check the figure
This cell mirrors the starter notebook’s scaffold but with all TODOs resolved. Comments flag the same guardrails students saw while working.

In [None]:

baseline_style()

fig, ax = plt.subplots(figsize=(11, 6))
ax.plot(df["Year"], df["TempAnomaly"], color="#d62728", linewidth=2.5)
ax.scatter(df["Year"][::5], df["TempAnomaly"][::5], color="#d62728", s=35)
ax.axhline(0, color="#4f4f4f", linestyle="--", linewidth=1)

ax.set_title(f"{TITLE}\n{SUBTITLE}", loc="left")
ax.set_xlabel("Year")
ax.set_ylabel(UNITS)
ax.text(0.01, -0.18, f"Source: {SOURCE}", transform=ax.transAxes, fontsize=10, color="#4f4f4f")
ax.text(0.01, -0.25, f"Notes: {ACCESSIBILITY_NOTES}", transform=ax.transAxes, fontsize=10, color="#4f4f4f")

latest_year = int(df["Year"].iloc[-1])
latest_value = df["TempAnomaly"].iloc[-1]
ax.annotate(
    ANNOTATION,
    xy=(latest_year, latest_value),
    xytext=(latest_year - 20, latest_value + 0.4),
    arrowprops=dict(arrowstyle="->", color="#4f4f4f"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#cccccc"),
)

ax.set_xlim(df["Year"].min(), df["Year"].max())
ax.grid(alpha=0.3)

plt.show()

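The sanity check can also be done programmatically. This is a minimal sketch with a hypothetical helper, `check_axes_labels`, that reports which text elements on an axes are still empty; it is demonstrated on a throwaway figure rather than the chart above.

```python
import matplotlib.pyplot as plt


def check_axes_labels(ax):
    """Return the names of empty text elements on the axes; hypothetical helper."""
    parts = {
        # The notebook sets a left-aligned title, so check that slot first.
        "title": ax.get_title(loc="left") or ax.get_title(),
        "xlabel": ax.get_xlabel(),
        "ylabel": ax.get_ylabel(),
    }
    return [name for name, text in parts.items() if not text.strip()]


demo_fig, demo_ax = plt.subplots()
demo_ax.plot([0, 1], [0, 1])
demo_ax.set_title("Demo title", loc="left")
demo_ax.set_xlabel("Year")
unlabeled = check_axes_labels(demo_ax)
print(unlabeled)  # ['ylabel']
plt.close(demo_fig)
```

Calling `check_axes_labels(ax)` just before `plt.show()` in the cell above should return an empty list.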

### Export checkpoint
The helper standardises filenames so every day’s artifact saves cleanly.

In [None]:
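# Pass the figure explicitly: with the inline backend, the figure from the
# previous cell is closed after display, so plt.gcf() would return an empty one.
save_last_fig('plots/day01_solution_plot.png', fig=fig)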

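To confirm an export actually landed on disk, a check along these lines may help. It is a sketch that writes a throwaway figure to a temporary directory, using the same `dpi` and `bbox_inches` settings as `save_last_fig`; the file name `demo.png` is illustrative.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

demo_fig, demo_ax = plt.subplots()
demo_ax.plot([0, 1], [0, 1])

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "plots" / "demo.png"
    out.parent.mkdir(parents=True, exist_ok=True)
    demo_fig.savefig(out, dpi=300, bbox_inches="tight")
    # A successful export leaves a non-empty file behind.
    saved_ok = out.exists() and out.stat().st_size > 0

plt.close(demo_fig)
print(f"Saved OK: {saved_ok}")
```
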
## Step 5 · Reflect, question, and communicate uncertainty
- **Claim → Evidence → Visual → Takeaway**:
  - **Claim**: Earth has warmed by over 1 °C relative to the mid-20th century baseline.
  - **Evidence**: The anomaly line climbs steadily after the 1970s; the recent five-year average exceeds +1 °C.
  - **Visual**: Single-series line chart with baseline reference and annotated latest point.
  - **Takeaway**: Warming is persistent, not a one-off spike — reinforcing the urgency for mitigation.
- **Limitations**: Annual global mean smooths local extremes; revised data may slightly shift recent values.
- **Potential misreads**: A zoomed-in y-axis can exaggerate short-term wiggles; the zero line and the annotation keep attention on long-term change.
- **Next questions**: How do emissions or regional patterns line up with this global trend? What uncertainties remain around future projections?

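The evidence bullet refers to a recent five-year average, which the earlier cells do not compute directly. One way to check it is sketched below on placeholder values (not real anomalies); in the notebook you would run the same lines on `df`.

```python
import pandas as pd

# Placeholder values for illustration; not real anomalies.
toy = pd.DataFrame({
    "Year": list(range(2018, 2025)),
    "TempAnomaly": [0.5, 0.5, 1.0, 1.0, 1.0, 1.25, 1.25],
})

# Take the five most recent years and average them.
last_five = toy.sort_values("Year").tail(5)
five_year_mean = last_five["TempAnomaly"].mean()
print(f"{last_five['Year'].min()}–{last_five['Year'].max()} mean: {five_year_mean:.2f} °C")
```
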
## Process quality checklist
✅ Loaded data with validation • ✅ Ran interim diagnostics • ✅ Completed story scaffold • ✅ Built accessible figure • ✅ Captured interpretation + uncertainty.