## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day01/notebook/day01_starter.ipynb)

# 🌎 Day 1 – Visualizing Global Warming
### Step-by-step introduction to time-series storytelling

Welcome! Today we learn how to read NASA's global temperature record, check our assumptions at every step,
and craft a line chart that clearly communicates how Earth has warmed since the late 19th century.
We'll move in small loops: set up, load data, verify, visualize, and reflect.

### 🗂️ Data card — NASA GISTEMP Global Surface Temperature Anomalies
- **Source:** NASA Goddard Institute for Space Studies – [GISTEMP v4](https://data.giss.nasa.gov/gistemp/)
- **Temporal coverage:** 1880–2024 (annual)
- **Geography:** Global mean surface temperature
- **Units:** Temperature anomaly in °C relative to the 1951–1980 baseline
- **Collection notes:** Combines land-station and sea-surface temperature records with homogenization
- **Last updated:** January 2025 release
- **Caveats:** Recent years may be revised; anomalies describe change relative to baseline, not absolute temperature
- **Mindful design:** Avoid truncated axes; annotate the baseline so viewers grasp the meaning of 0°C.
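
To make the idea of an anomaly relative to a baseline concrete, here is a minimal sketch. The absolute temperatures are made-up numbers for illustration (not NASA data), and the short 1950–1970 window only stands in for the real 1951–1980 reference period.

```python
import pandas as pd

# Hypothetical absolute temperatures in °C, indexed by year (illustrative only).
temps = pd.Series([13.8, 14.0, 14.2, 14.6], index=[1950, 1960, 1970, 2020])

# The baseline is the mean over the reference period (label slicing is inclusive).
baseline = temps.loc[1950:1970].mean()

# An anomaly is the difference from that baseline, so 0°C means "equal to the baseline mean".
anomaly = temps - baseline
print(anomaly.round(2))
```

Read this way, a value of 0°C on the chart does not mean freezing; it means "the same as the reference-period average".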

### 1. Set up the environment

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

pd.options.display.float_format = "{:.3f}".format

In [None]:
# Shared helper utilities used throughout the week.
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Iterable, Mapping

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def resolve_data_dir(max_up: int = 5) -> Path:
    """Locate the project-level ``data`` directory regardless of execution location."""
    here = Path.cwd()
    for _ in range(max_up + 1):
        candidate = here / "data"
        if candidate.exists():
            return candidate
        here = here.parent
    raise FileNotFoundError(
        "Could not find a 'data' directory relative to this notebook.\n"
        "If you are running in Colab, mount your drive or upload the data folder first."
    )


DATA_DIR = resolve_data_dir()
PROJECT_ROOT = DATA_DIR.parent
PLOTS_DIR = PROJECT_ROOT / "plots"
PLOTS_DIR.mkdir(parents=True, exist_ok=True)


def baseline_style() -> None:
    """Apply a consistent, high-contrast visual style that is colorblind-friendly."""
    sns.set_theme(style="whitegrid", context="talk", font_scale=0.9)
    plt.rcParams.update(
        {
            "figure.dpi": 120,
            "axes.titlesize": 16,
            "axes.labelsize": 13,
            "legend.fontsize": 11,
            "axes.titleweight": "semibold",
        }
    )


def load_data(filename: str | Path, **kwargs) -> pd.DataFrame:
    """Read a CSV file from the shared data directory and report its shape."""
    path = Path(filename)
    if not path.exists():
        path = DATA_DIR / filename
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {path.name} with shape {df.shape}.")
    return df


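def validate_columns(df: pd.DataFrame, required: Iterable[str], *, context: str = "") -> None:
    """Warn if any of the ``required`` columns are missing from ``df``."""
    required = list(required)  # materialize once so one-shot iterables can be reused below
    missing = [col for col in required if col not in df.columns]
    if missing:
        warnings.warn(
            f"Missing expected columns {missing} in {context or 'dataframe'}.\n"
            "Double-check your renaming and loading steps before moving on."
        )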
    else:
        print(f"✅ Columns look good: {list(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int, *, label: str = "rows") -> None:
    """Warn if the row count of ``df`` falls outside the inclusive range [low, high]."""
    count = len(df)
    if not (low <= count <= high):
        warnings.warn(
            f"{label} check: expected between {low:,} and {high:,} but found {count:,}."
        )
    else:
        print(f"✅ {label} check: {count:,} rows is within the expected range.")


def quick_diagnose(df: pd.DataFrame, *, sample: int = 3) -> None:
    """Show the first ``sample`` rows and the number of nulls in each column."""
    print("\nPreview of the current dataframe:")
    display(df.head(sample))
    print("\nNull values by column:")
    print(df.isna().sum())


def validate_story_fields(fields: Mapping[str, str]) -> None:
    """Warn about any narrative fields that are empty or whitespace-only."""
    missing = [name for name, value in fields.items() if not str(value).strip()]
    if missing:
        warnings.warn(
            "The following story fields are blank: " + ", ".join(missing) +
            "\nFill them in so your chart has a clear narrative frame."
        )
    else:
        print("✅ Narrative checklist complete.")


def save_last_fig(fig: plt.Figure | None, filename: str) -> Path | None:
    """Save ``fig`` (or the current figure if ``None``) into ``PLOTS_DIR`` and return its path."""
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        output_path = PLOTS_DIR / filename
        fig.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path.relative_to(PROJECT_ROOT)}")
        return output_path
    warnings.warn("No matplotlib figure available to save yet.")
    return None


### 2. Load the raw temperature table
We read the CSV once, keep an untouched copy for reference, and immediately confirm the shape and column names.
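
The load step passes `skiprows=1`, which suggests the file begins with a title line above the real header row. The sketch below reproduces that layout with a tiny in-memory stand-in; the title text and all values are assumptions made for illustration, not the contents of the actual file.

```python
import io

import pandas as pd

# Stand-in for the file layout: one title line, then the header row, then data.
sample = io.StringIO(
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,J-D\n"
    "1880,-0.20,-0.25,-0.18\n"
    "1881,-0.10,-0.15,-0.09\n"
)

# Skipping the first line lets pandas treat the second line as column names.
demo = pd.read_csv(sample, skiprows=1)
print(demo.columns.tolist())
print(demo.shape)
```

Without `skiprows=1`, pandas would take the title line as the header, so a check for the `Year` and `J-D` columns would fail.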

In [None]:
raw_temp = load_data("GLB.Ts+dSST.csv", skiprows=1)
quick_diagnose(raw_temp.iloc[:5, :5])

In [None]:
expect_rows_between(raw_temp, 140, 150, label="annual records")
validate_columns(raw_temp, ["Year", "J-D"], context="raw NASA table")

### 3. Tidy and focus the dataset
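NASA's table has one column per month plus seasonal and annual means; we keep only the annual mean (`J-D`, January through December), coerce it to a numeric type, and drop any year whose value is not a valid number.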
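
As a small illustration of why the numeric coercion matters, the sketch below uses a made-up table in which one annual value is a non-numeric placeholder. The `***` marker and the numbers are assumptions for this example only.

```python
import pandas as pd

# Made-up values; "***" stands in for an annual mean that is not available.
raw = pd.DataFrame({"Year": [2001, 2002, 2003], "J-D": ["0.10", "0.20", "***"]})

tidy = (
    raw.rename(columns={"J-D": "temp_anomaly_c"})
    # errors="coerce" turns anything that is not a number into NaN instead of raising.
    .assign(temp_anomaly_c=lambda d: pd.to_numeric(d["temp_anomaly_c"], errors="coerce"))
    # Rows whose anomaly became NaN are then dropped.
    .dropna(subset=["temp_anomaly_c"])
)
print(tidy.dtypes)
print(len(raw), "->", len(tidy))
```

A single placeholder string is enough to make pandas read the whole column as text, so coercing first keeps later arithmetic and plotting from failing.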

In [None]:
climate = (
    raw_temp[["Year", "J-D"]]
    .rename(columns={"J-D": "temp_anomaly_c"})
    .assign(temp_anomaly_c=lambda d: pd.to_numeric(d["temp_anomaly_c"], errors="coerce"))
    .dropna(subset=["temp_anomaly_c"])
)
quick_diagnose(climate)
validate_columns(climate, ["Year", "temp_anomaly_c"], context="tidy climate data")
expect_rows_between(climate, 140, 150, label="clean annual records")

### 4. Inspect trends before plotting
Check the magnitude of change and compute a centered 5-year rolling average so we can separate short-term wiggles from the long-term trend.
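
A centered rolling window needs neighbors on both sides, so the first and last values of the smoothed series are missing. The toy series below (arbitrary numbers, not temperature data) shows the effect for a window of 5.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# With center=True, each output is the mean of the value, two before it, and two after it.
smooth = s.rolling(window=5, center=True).mean()
print(smooth.tolist())  # [nan, nan, 3.0, 4.0, 5.0, nan, nan]
```

On the real data this means the 5-year average should start two years after the first record and end two years before the latest one.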

In [None]:
summary = climate.describe(percentiles=[0.1, 0.5, 0.9])
display(summary)
print("Largest positive anomaly: {:.2f}°C".format(climate["temp_anomaly_c"].max()))
print("Largest negative anomaly: {:.2f}°C".format(climate["temp_anomaly_c"].min()))

climate["rolling_5yr"] = climate["temp_anomaly_c"].rolling(window=5, center=True).mean()
quick_diagnose(climate.tail())

### 5. Define the storytelling frame
Titles, subtitles, and annotations come first so we know what evidence we need in the plot.

In [None]:
TITLE = "Global temperatures keep climbing"
SUBTITLE = "NASA GISTEMP annual anomalies vs. 1951–1980 average (1880–2024)"
ANNOTATION = "Five-year smoothing shows the persistent rise above the 0°C baseline"
SOURCE = "Source: NASA GISTEMP v4 (Jan 2025 release)"
UNITS = "Temperature anomaly (°C relative to 1951–1980)"

narrative_fields = {
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
}
validate_story_fields(narrative_fields)

### 6. Plot with diagnostic checkpoints
We plot both the annual anomalies and the 5-year rolling mean, add accessibility-minded styling, and mark the latest value.

In [None]:
baseline_style()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(
    climate["Year"],
    climate["temp_anomaly_c"],
    color="#0072B2",
    linewidth=1.5,
    label="Annual anomaly",
)
ax.plot(
    climate["Year"],
    climate["rolling_5yr"],
    color="#D55E00",
    linewidth=2.5,
    label="5-year average",
)
ax.axhline(0, color="#555555", linestyle="--", linewidth=1)

latest = climate.dropna(subset=["temp_anomaly_c"]).iloc[-1]
ax.scatter(latest["Year"], latest["temp_anomaly_c"], color="#CC79A7", zorder=5)
ax.annotate(
    # The row is a float Series, so cast Year to int to avoid a label like "2024.0".
    f"{int(latest['Year'])}: {latest['temp_anomaly_c']:+.2f}°C",
    xy=(latest["Year"], latest["temp_anomaly_c"]),
    xytext=(latest["Year"] - 15, latest["temp_anomaly_c"] + 0.3),
    arrowprops=dict(arrowstyle="->", color="#333333"),
    fontsize=11,
)

ax.set_title(TITLE, loc="left")
ax.set_xlabel("Year")
ax.set_ylabel(UNITS)
ax.text(0.01, 0.02, SUBTITLE, transform=ax.transAxes, fontsize=11, ha="left", va="bottom")
ax.legend(loc="upper left", frameon=False)
ax.text(0.01, -0.12, f"{SOURCE} · {ANNOTATION}", transform=ax.transAxes, fontsize=10, ha="left", va="top")

# Save before showing so the figure is still open regardless of the backend.
final_fig_path = save_last_fig(fig, "day01_solution_plot.png")
plt.show()

### 7. Interpret responsibly
- **Key takeaway:** Global surface temperatures now sit more than 1°C above the mid-20th-century baseline, and every year of the last decade has been far warmer than that baseline, although not every one of those years exceeded 1°C.
- **Uncertainty & caveats:** Measurement updates may nudge recent values; anomalies mask regional extremes; baseline choice affects the numeric value but not the warming trend.
- **What this plot cannot tell us:** It does not show intra-annual variability, regional disparities, or attribution of causes—those require other datasets and models.
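
The caveat that baseline choice changes the numbers but not the trend can be checked with a short sketch. The series below is synthetic (a straight line plus random noise, not GISTEMP values), and the 1991–2020 alternative baseline is an arbitrary choice for illustration.

```python
import numpy as np

years = np.arange(1880, 2025)
rng = np.random.default_rng(0)

# Synthetic anomalies: a linear rise plus noise, for illustration only.
anoms = 0.008 * (years - 1880) - 0.4 + rng.normal(0, 0.1, years.size)

# Re-baseline by subtracting the mean of a different reference period.
mask = (years >= 1991) & (years <= 2020)
rebased = anoms - anoms[mask].mean()

# Fit a straight line to each version; centering the years keeps the fit well conditioned.
x = years - years.mean()
slope_original = np.polyfit(x, anoms, 1)[0]
slope_rebased = np.polyfit(x, rebased, 1)[0]
print(f"{slope_original:.5f} vs {slope_rebased:.5f} °C per year")
```

Subtracting a constant moves every point up or down by the same amount, so the fitted slope is unchanged; only the position of the 0°C line moves.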

### 8. Process micro-rubric
| Step | Evidence of completion |
| --- | --- |
| Data loaded & validated | Shape checks and column validation passed |
| Cleaning documented | Rolling mean added and nulls inspected |
| Story frame filled | Title, subtitle, annotation, source, units set |
| Visualization reviewed | Baseline, annotation, legend, and color-safe palette applied |
| Reflection written | Takeaway plus limitations articulated |