## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%201_%20Introduction%20–%20Climate%20Change%20%26%20Basic%20Plotting.ipynb)

# 🌎 Day 1 · Reading Earth's Thermometer

Welcome! Today is about easing into Python while building trust in the data. We'll alternate short bursts of reading and hands-on coding, so you always know what to run next. By the end you will have:

- A clean table of NASA global temperature anomalies
- Diagnostics that catch common mistakes early
- A story-driven line chart with a clear claim, evidence, and takeaway

We'll move through four focused passes: orient to the raw file, tidy and validate, check the signal, and publish a polished visual.

## 🗂️ Data Card · NASA GISTEMP v4
| Field | Details |
| --- | --- |
| Source | [NASA Goddard Institute for Space Studies (GISTEMP v4)](https://data.giss.nasa.gov/gistemp/) |
| Temporal coverage | 1880–2025 (latest year is provisional) |
| Geographic scope | Global mean surface temperature anomaly |
| Units | Degrees Celsius relative to the 1951–1980 baseline |
| Update cadence | Monthly release, aggregated annually in this file |
| Caveats | 2025 value marked `***` is a placeholder until the annual update; polar regions rely on interpolation; baseline choice affects anomaly magnitude. |
| What this chart cannot show | Regional extremes, uncertainty bands, or causes of warming. Pair with greenhouse gas records for causal context. |
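
The caveat that baseline choice affects anomaly magnitude can be made concrete with a small sketch. The numbers below are invented for illustration (they are not NASA values), and `rebaseline` is a hypothetical helper, not part of GISTEMP or this notebook's scaffolds. Under those assumptions, moving the reference period shifts every anomaly by the same constant, so year-to-year differences and the trend stay the same.

```python
import pandas as pd

# Toy anomalies (made-up values, NOT real GISTEMP data)
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004, 2005],
    "Anomaly": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
})


def rebaseline(df, start, end):
    """Hypothetical helper: re-express anomalies relative to the mean of start..end."""
    in_period = df["Year"].between(start, end)  # inclusive on both ends
    return df["Anomaly"] - df.loc[in_period, "Anomaly"].mean()


# Pretend the reference period is 2004-2005 instead of the original one
toy["Rebased"] = rebaseline(toy, 2004, 2005)

# Every year moves by the same amount, so the shape of the curve is unchanged
shift = (toy["Anomaly"] - toy["Rebased"]).round(6).unique()
print(toy)
print("Constant shift applied to every year:", shift)
```

This is why the units line always names the baseline: a value such as +1.3°C only has meaning next to the reference period it was measured against.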

In [None]:
# 🔁 Shared scaffolds used across DS4S notebooks
from __future__ import annotations

import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.rcParams.update({
    "figure.dpi": 120,
    "axes.facecolor": "#f8f9fb",
    "axes.grid": True,
    "grid.alpha": 0.25,
    "grid.linestyle": "--",
    "axes.titlesize": 18,
    "axes.labelsize": 12,
    "axes.titleweight": "bold",
    "legend.frameon": False,
    "legend.fontsize": 11,
    "font.family": "DejaVu Sans",
})

def quick_diagnostics(df: pd.DataFrame, dataset_name: str, *, expected_columns: list[str] | None = None, expected_rows: tuple[int, int] | None = None) -> None:
    """Print lightweight diagnostics without stopping execution."""
    print(f"\n🔍 {dataset_name}")
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    if expected_columns is not None:
        missing = [col for col in expected_columns if col not in df.columns]
        if missing:
            warnings.warn(f"Missing expected columns: {missing}")
    if expected_rows is not None:
        low, high = expected_rows
        if not (low <= len(df) <= high):
            warnings.warn(f"Row count {len(df)} outside expected range {expected_rows}")
        else:
            print(f"Row count within expected range {expected_rows}.")
    print("Null counts:")
    print(df.isna().sum())
    print("Preview:")
    print(df.head())
    print("-" * 60)

def expect_value_range(series: pd.Series, *, lower: float | None = None, upper: float | None = None, context: str = "") -> None:
    """Warn when values fall outside an expected numeric window."""
    label = context or series.name or "series"
    if lower is not None and float(series.min()) < lower:
        warnings.warn(f"{label}: minimum {series.min():.3f} is below expected {lower}")
    if upper is not None and float(series.max()) > upper:
        warnings.warn(f"{label}: maximum {series.max():.3f} is above expected {upper}")
    print(f"{label}: {series.min():.3f} → {series.max():.3f}")

def validate_story_elements(*, title: str, subtitle: str, annotation: str, source: str, units: str) -> None:
    """Confirm the storytelling scaffold is filled before plotting."""
    elements = {
        "TITLE": title,
        "SUBTITLE": subtitle,
        "ANNOTATION": annotation,
        "SOURCE": source,
        "UNITS": units,
    }
    missing = [key for key, value in elements.items() if not str(value).strip()]
    if missing:
        warnings.warn(f"Please fill these storytelling fields: {', '.join(missing)}")
    else:
        print("👍 Story scaffold complete.")

def baseline_style(ax: plt.Axes | None = None) -> plt.Axes:
    """Standardise axes styling for consistency across notebooks."""
    ax = ax or plt.gca()
    for spine in ["top", "right"]:
        ax.spines[spine].set_visible(False)
    ax.set_facecolor("#ffffff")
    return ax

def save_last_visual(fig, filename: str, *, subfolder: str = "plots") -> None:
    """Persist the most recent Matplotlib or Plotly figure without failing the run."""
    plots_dir = Path.cwd() / subfolder
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    try:
        if hasattr(fig, "write_image"):
            fig.write_image(str(output_path))
        elif hasattr(fig, "savefig"):
            fig.savefig(output_path, dpi=300, bbox_inches="tight")
        else:
            warnings.warn("Figure type not supported for export; skipping save.")
            return
        print(f"Saved visual to {output_path}")
    except Exception as exc:
        warnings.warn(f"Plot export skipped: {exc}")


## Step 1 · Load the anomaly table
1. Read the CSV with `pd.read_csv` (the data lives in `data/GLB.Ts+dSST.csv`).
2. Inspect shape, column names, and a short preview.
3. Confirm the table has roughly one row per year (1880–2025 is 146 years, so expect a row count near that).

Run the diagnostics cell right after loading to catch typos immediately.
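
To see what the `skiprows`, `header`, `usecols`, and `names` arguments do together, here is a minimal sketch on a tiny in-memory file. The layout is an assumption that mirrors what the loader cell expects (one title line, one header line, then `Year`, twelve monthly columns, an annual `J-D` column at position 13, and five more columns); the values are invented, not NASA data.

```python
import io

import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
header = ",".join(["Year", *months, "J-D", "D-N", "DJF", "MAM", "JJA", "SON"])


def make_row(year, annual):
    """Build one toy row: 12 monthly values, the annual value, 5 more columns."""
    return ",".join([str(year), *["0.10"] * 12, annual, *["0.10"] * 5])


toy_csv = "\n".join([
    "Title line that the loader skips",  # removed by skiprows=1
    header,                              # consumed by header=0, then renamed
    make_row(2001, "0.15"),
    make_row(2002, "***"),               # placeholder for an unfinished year
]) + "\n"

toy_raw = pd.read_csv(
    io.StringIO(toy_csv),
    skiprows=1,                        # drop the title line
    header=0,                          # treat the next line as the header row
    usecols=[0, 13],                   # keep Year and the annual (J-D) column
    names=["Year", "TempAnomalyRaw"],  # replace the header names
)
print(toy_raw)
print(toy_raw.dtypes)
```

Because one row holds the `***` placeholder, pandas keeps the anomaly column as text rather than numbers, which is what Step 2 repairs.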

In [None]:
data_dir = Path.cwd() / "data"
temperature_path = data_dir / "GLB.Ts+dSST.csv"

df_raw = pd.read_csv(
    temperature_path,
    skiprows=1,
    usecols=[0, 13],
    names=["Year", "TempAnomalyRaw"],
    header=0,
)

quick_diagnostics(
    df_raw,
    "NASA GISTEMP raw extract",
    expected_columns=["Year", "TempAnomalyRaw"],
    expected_rows=(140, 200),
)


## Step 2 · Clean and focus the fields
Now we make the column useful:

- Convert the anomaly text to numeric values
- Drop rows where NASA still shows `***`
- Ensure `Year` is an integer and sorted

Use the helper checks to verify the cleaned table before moving on.
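
As a quick illustration of the cleaning idea, the sketch below runs the same kind of chain on three invented rows (toy values, not NASA data). `errors="coerce"` turns anything that is not a number, such as `***`, into `NaN`, and `dropna` then removes that row.

```python
import pandas as pd

# Toy input: everything arrives as text, and one year is still a placeholder
toy = pd.DataFrame({
    "Year": ["2001", "2002", "2003"],
    "TempAnomalyRaw": ["0.15", "***", "-0.05"],
})

cleaned = (
    toy.assign(
        # "***" cannot be parsed, so it becomes NaN instead of raising an error
        TempAnomaly=lambda d: pd.to_numeric(d["TempAnomalyRaw"], errors="coerce"),
    )
    .dropna(subset=["TempAnomaly"])  # remove the placeholder row
    .astype({"Year": "int64"})       # years become true integers
    .reset_index(drop=True)
)

n_dropped = len(toy) - len(cleaned)
print(cleaned)
print(f"Rows dropped: {n_dropped}")
```

Counting dropped rows is a cheap safeguard: if far more rows disappear than you expect, the column probably contains something other than the known placeholder.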

In [None]:
df_climate = (
    df_raw.assign(
        TempAnomaly=lambda d: pd.to_numeric(d["TempAnomalyRaw"], errors="coerce"),
    )
    .dropna(subset=["TempAnomaly"])
    .astype({"Year": "int64"})
    .loc[:, ["Year", "TempAnomaly"]]
    .sort_values("Year")
    .reset_index(drop=True)
)

quick_diagnostics(
    df_climate,
    "Cleaned anomaly table",
    expected_columns=["Year", "TempAnomaly"],
    expected_rows=(140, 200),
)
expect_value_range(df_climate["TempAnomaly"], lower=-1.0, upper=1.8, context="TempAnomaly")


## Step 3 · Check the signal before plotting
We pause to read the data like a scientist:

- Compute a centered 5-year rolling mean to smooth short-term swings (the first two and last two years stay empty because their window is incomplete)
- Quantify the change between early and recent decades
- Keep an eye on units and plausible ranges

This small table is a great formative check for you or a teacher walking the room.
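
If the rolling mean is new to you, this small sketch on seven invented numbers shows how a centered 5-point window behaves and how it differs from the default trailing window. The values are toy data chosen so the averages are easy to verify by hand.

```python
import pandas as pd

toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Centered window: each value averages itself with two neighbours on each side
centered = toy.rolling(window=5, center=True).mean()

# Trailing window (the pandas default): each value averages itself and the 4 before it
trailing = toy.rolling(window=5).mean()

print(pd.DataFrame({"value": toy, "centered": centered, "trailing": trailing}))
```

The centered version leaves gaps at both ends, while the trailing version leaves all of its gaps at the start. Either way, the smoothed line on a chart will be shorter than the annual line.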

In [None]:
df_climate["Rolling5"] = df_climate["TempAnomaly"].rolling(window=5, center=True).mean()

recent_slice = df_climate.tail(10)
print("Recent decade snapshot:")
print(recent_slice)

baseline_slice = df_climate.head(10)
print("Earliest decade (1880s) snapshot:")
print(baseline_slice)

temp_change = recent_slice["TempAnomaly"].mean() - baseline_slice["TempAnomaly"].mean()
print(f"Average change vs. 1880s: {temp_change:.2f}°C")


### Expected trend preview
Your final chart will echo this upward sweep. Use it as a compass if your plot looks off.

![Preview of the finished chart](../../../plots/day01_solution_plot.png)

## Step 4 · Craft the story-first visual
Fill the storytelling scaffold, validate it, and then render the figure. Annotate the latest value and leave a caption with units and source so the audience can trust what they see.
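
Before the full figure, here is a stripped-down sketch of the two ideas in this step: labelling the latest point and adding a caption below the axes. It uses five invented values (not NASA data) and hypothetical variable names. The `Agg` backend line is only there so the sketch runs without a display; leave it out in Colab if you want to see the figure inline.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch only
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for illustration
toy = pd.DataFrame({
    "Year": [2001, 2002, 2003, 2004, 2005],
    "TempAnomaly": [0.10, 0.20, 0.15, 0.30, 0.50],
})
latest = toy.iloc[-1]  # last row = most recent year

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(toy["Year"], toy["TempAnomaly"])

# xy is the point being labelled; xytext is where the label text sits
label = f"{int(latest['Year'])}: {latest['TempAnomaly']:.2f}°C"
note = ax.annotate(
    label,
    xy=(latest["Year"], latest["TempAnomaly"]),
    xytext=(latest["Year"] - 2, latest["TempAnomaly"] + 0.05),
    arrowprops=dict(arrowstyle="->"),
)

# transform=ax.transAxes places the caption in axes fractions (0 = left/bottom edge)
caption = ax.text(
    0.0, -0.25, "Toy data for illustration · Units: °C",
    transform=ax.transAxes, va="top",
)
plt.close(fig)
```

Building the label from the data, rather than typing the number by hand, keeps the annotation correct when the file is updated.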

In [None]:
TITLE = "Global temperatures are ~1.3°C warmer than the 1951–1980 average"
SUBTITLE = "NASA GISTEMP annual anomalies relative to the 1951–1980 baseline (1880–2024; latest year provisional)"
ANNOTATION = "Recent years stay above +1°C; 2024 closed near +1.3°C."
SOURCE = "Source: NASA GISS Surface Temperature Analysis (downloaded 2025-01-05)"
UNITS = "Temperature anomaly (°C relative to 1951–1980)"

validate_story_elements(
    title=TITLE,
    subtitle=SUBTITLE,
    annotation=ANNOTATION,
    source=SOURCE,
    units=UNITS,
)

latest = df_climate.iloc[-1]
fig_climate_story, ax = plt.subplots(figsize=(11, 6))
ax = baseline_style(ax)

ax.plot(
    df_climate["Year"],
    df_climate["TempAnomaly"],
    color="#d73027",
    linewidth=2.5,
    label="Annual anomaly",
)
ax.plot(
    df_climate["Year"],
    df_climate["Rolling5"],
    color="#1b7837",
    linewidth=2.5,
    label="5-year average",
)
ax.axhline(0, color="#2f4858", linewidth=1, linestyle="--")

ax.set_title(TITLE, loc="left", pad=18)
ax.text(0.0, 1.04, SUBTITLE, transform=ax.transAxes, fontsize=12, color="#555555")
ax.set_xlabel("Year")
ax.set_ylabel(UNITS)
ax.legend(loc="upper left", frameon=False)

ax.annotate(
    f"{int(latest['Year'])}: {latest['TempAnomaly']:.2f}°C",
    xy=(latest["Year"], latest["TempAnomaly"]),
    xytext=(latest["Year"] - 22, latest["TempAnomaly"] + 0.35),
    arrowprops=dict(arrowstyle="->", color="#333333"),
    fontsize=12,
    color="#333333",
)
ax.text(
    0.01,
    -0.18,
    f"{ANNOTATION}\n{SOURCE} · Units: {UNITS}",
    transform=ax.transAxes,
    fontsize=10,
    color="#555555",
    va="top",
)

plt.tight_layout()
plt.show()


In [None]:
save_last_visual(fig_climate_story, "day01_solution_plot.png")

## 🔍 Reflection & limitations
- What the chart cannot tell us: regional extremes, ocean heat content, or the role of greenhouse gases. Pair with CO₂ data later in the week.
- Design safeguard: avoid dual axes here; we already centre the narrative on a single, well-labeled metric.
- Uncertainty prompt: How might interpolation in the Arctic or changes in station coverage influence the trend? Note your thoughts below before moving on.