## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/main/days/day01/notebook/day01_starter.ipynb)

# 🌎 Day 1 – Tracking Global Temperature Change
Welcome! Today you will build confidence exploring climate data one small step at a time.

## 🧾 Data Card – NASA GISTEMP Global Mean
- **Source:** [NASA Goddard Institute for Space Studies (GISTEMP v4)](https://data.giss.nasa.gov/gistemp/)
- **Temporal coverage:** 1880–present (annual).
- **Units:** Global surface temperature anomaly in °C relative to the 1951–1980 baseline.
- **Update cadence:** Monthly. Last downloaded 2024-04.
- **Method notes:** Combines land-surface air temperature and sea-surface temperature records, with homogenization adjustments.
- **Caveats:** Early years have wider uncertainty; anomalies are departures from the 1951–1980 mean, not absolute temperatures.
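
To make the baseline idea concrete, here is a minimal sketch of how an anomaly is formed: subtract the mean of a reference period from each value. The `absolute` table and its `TempC` values are made up for illustration, and this is a simplification of the idea, not the GISTEMP procedure itself.

```python
import pandas as pd

# Hypothetical absolute temperatures in °C (made-up values, not NASA data).
absolute = pd.DataFrame(
    {
        "Year": [1951, 1965, 1980, 2000, 2020],
        "TempC": [13.9, 14.0, 14.1, 14.4, 15.0],
    }
)

# Baseline: the mean over the reference period, here 1951-1980.
in_baseline = absolute["Year"].between(1951, 1980)
baseline_mean = absolute.loc[in_baseline, "TempC"].mean()

# Anomaly: each year's departure from that baseline mean.
absolute["Anomaly"] = absolute["TempC"] - baseline_mean
print(absolute)
```

A positive anomaly means the year was warmer than the reference-period average, and the anomalies inside the reference period average out to zero.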

## 🧭 Story Scaffold
Use this template as you work toward the final chart.
- **Claim:** What change or pattern are you highlighting?
- **Evidence:** Which columns and summary stats support it?
- **Visual:** Which chart type and encodings communicate it clearly?
- **Takeaway:** Why does it matter, and what uncertainty should viewers keep in mind?
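
For the **Evidence** step, a summary statistic can be as simple as ranking the years and measuring the gap between the top two. The sketch below uses a made-up `toy` table (the values are placeholders, not NASA data); the names `top_two`, `record_year`, and `margin` are illustrative only.

```python
import pandas as pd

# Made-up anomalies in °C, for illustration only.
toy = pd.DataFrame(
    {
        "Year": [2019, 2020, 2021, 2022, 2023],
        "TempAnomaly": [0.6, 0.7, 0.5, 0.55, 0.9],
    }
)

# Rank the two warmest years, largest anomaly first.
top_two = toy.nlargest(2, "TempAnomaly")
record_year = int(top_two["Year"].iloc[0])

# Margin: how far the record year sits above the runner-up.
margin = top_two["TempAnomaly"].iloc[0] - top_two["TempAnomaly"].iloc[1]
print(f"Record year: {record_year}, margin over runner-up: {margin:.2f} °C")
```

Running the same two lines on your cleaned table is a quick way to check that a claim such as "tops the record" is backed by the numbers.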

In [None]:
from __future__ import annotations

from pathlib import Path
import sys

import matplotlib.pyplot as plt
import pandas as pd

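# Search the working directory and up to two parent folders for utils.py.
# The `else` branch of a `for` loop runs only if the loop never reaches `break`.
for candidate in [Path.cwd(), Path.cwd().parent, Path.cwd().parent.parent]: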
    utils_path = candidate / "utils.py"
    if utils_path.exists():
        if str(candidate) not in sys.path:
            sys.path.insert(0, str(candidate))
        break
else:
    raise FileNotFoundError("Unable to locate utils.py. Did you download the full project?")

from utils import (
    baseline_style,
    diagnose_dataframe,
    expect_rows_between,
    load_data,
    save_last_fig,
    validate_columns,
    validate_story_elements,
)

baseline_style()


In [None]:
# Example: plotting a short time series with an annotation
example_years = pd.Series(range(2018, 2025))
example_values = pd.Series([0.42, 0.44, 0.47, 0.52, 0.61, 0.63, 0.72])
example_df = pd.DataFrame({"Year": example_years, "Anomaly": example_values})

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(example_df["Year"], example_df["Anomaly"], marker="o", color="#2ca02c")
ax.annotate(
    "Example annotation",
    xy=(example_df["Year"].iloc[-1], example_df["Anomaly"].iloc[-1]),
    xytext=(2019.5, 0.75),
    arrowprops={"arrowstyle": "->", "color": "#2ca02c"},
)
ax.set_title("Example: keep annotations concise")
ax.set_xlabel("Year")
ax.set_ylabel("Anomaly (°C)")
plt.close(fig)
fig


In [None]:
# Step 1 – Load the NASA temperature anomalies
temperature_raw = load_data(
    "GLB.Ts+dSST.csv",
    skiprows=1,
    usecols=[0, 13],
    names=["Year", "TempAnomaly"],
    header=0,
)

# TODO: Load the GISTEMP dataset into temperature_raw using load_data


<details>
<summary>Need a nudge loading the data?</summary>
<ul>
<li>Use <code>load_data</code> so the code works locally and in Colab.</li>
<li>The annual mean (the <code>J-D</code> column) is at zero-based column index 13 of the CSV.</li>
<li>Be sure to name the columns <code>Year</code> and <code>TempAnomaly</code>.</li>
</ul>
</details>
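
The loading options are easier to see on a tiny file. The sketch below calls `pd.read_csv` directly on a made-up sample that mimics the layout described above (a title line, a header row, then data). It assumes `load_data` forwards these keyword arguments to `pd.read_csv`; that is an assumption about the course helper, so check `utils.py` if the behavior differs.

```python
import io

import pandas as pd

# Made-up miniature of the file layout: a title line, a header row, then data.
# Columns: Year, 12 monthly values, then the annual mean (J-D) at index 13.
sample_csv = "\n".join(
    [
        "Land-Ocean: Global Means",
        "Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D",
        "1880," + ",".join(["-0.2"] * 12) + ",-0.2",
        "1881," + ",".join(["-0.1"] * 12) + ",-0.1",
    ]
)

sample = pd.read_csv(
    io.StringIO(sample_csv),
    skiprows=1,  # skip the title line
    header=0,  # treat the next line as the header row...
    names=["Year", "TempAnomaly"],  # ...and replace it with these names
    usecols=[0, 13],  # keep only Year and the annual mean
)
print(sample)
```

Passing `header=0` together with `names` tells pandas to discard the file's own header row and use the supplied names instead.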

In [None]:
# Step 2 – Clean and filter the table
temperature_clean = (
    temperature_raw.assign(
        Year=lambda frame: pd.to_numeric(frame["Year"], errors="coerce"),
        TempAnomaly=lambda frame: pd.to_numeric(frame["TempAnomaly"], errors="coerce"),
    )
    .dropna(subset=["Year", "TempAnomaly"])
    .astype({"Year": int})
    .query("Year >= 1880")
    .reset_index(drop=True)
)

# TODO: Convert the Year and TempAnomaly columns to numeric and filter from 1880 onward


<details>
<summary>Need help cleaning?</summary>
<ul>
<li><code>pd.to_numeric(..., errors="coerce")</code> turns bad strings into <code>NaN</code>.</li>
<li>After coercing, drop rows with missing values.</li>
<li>Limit the analysis to years 1880 and later for a consistent record.</li>
</ul>
</details>
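
Here is a small sketch of the cleaning steps on a made-up table. The `"***"` string stands in for a value that is not available yet; the numbers are placeholders, not NASA data.

```python
import pandas as pd

# Made-up rows read in as text; "***" marks a missing annual value.
messy = pd.DataFrame(
    {
        "Year": ["2021", "2022", "2023", "2024"],
        "TempAnomaly": ["0.5", "0.6", "0.7", "***"],
    }
)

cleaned = (
    messy.assign(
        # Strings that cannot be parsed as numbers become NaN.
        Year=lambda frame: pd.to_numeric(frame["Year"], errors="coerce"),
        TempAnomaly=lambda frame: pd.to_numeric(frame["TempAnomaly"], errors="coerce"),
    )
    # Drop any row where either column failed to parse.
    .dropna(subset=["Year", "TempAnomaly"])
    # Convert Year to int only after the NaN rows are gone.
    .astype({"Year": int})
)
print(cleaned)
```

The order matters: converting `Year` to `int` before dropping missing rows would fail if any `Year` value were `NaN`.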

In [None]:
# Step 3 – Quick diagnostics
diagnose_dataframe(temperature_clean, name="Global temperature anomalies")
validate_columns(temperature_clean, ["Year", "TempAnomaly"], name="temperature_clean")
expect_rows_between(temperature_clean, 140, 200, name="temperature_clean")


In [None]:
# Step 4 – Create a smoothed reference line
temperature_with_trend = temperature_clean.assign(
    Rolling5yr=lambda frame: frame["TempAnomaly"].rolling(window=5, center=True).mean()
)

# TODO: Compute a centered 5-year rolling mean called Rolling5yr


<details>
<summary>Rolling mean hint</summary>
<ul>
<li>Start with <code>frame["TempAnomaly"].rolling(window=5, center=True)</code>.</li>
<li><code>.mean()</code> produces the smoothed series.</li>
<li>Assign it into a new column named <code>Rolling5yr</code>.</li>
</ul>
</details>
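
A centered window needs two values on each side, so the first two and last two rows have no 5-year mean. The sketch below shows this on a made-up series; the `min_periods=1` variant is an optional alternative that averages whatever values are available, not something the notebook requires.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Default behavior: a full window of 5 values is required, so the edges are NaN.
smoothed = values.rolling(window=5, center=True).mean()
print(smoothed.tolist())

# Optional: allow partial windows at the edges.
partial = values.rolling(window=5, center=True, min_periods=1).mean()
print(partial.tolist())
```

In the chart this means the rolling line stops two years before the latest annual point unless partial windows are allowed.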

In [None]:
# Step 5 – Story metadata (fill these before plotting)
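LAST_YEAR = int(temperature_clean["Year"].max())  # read the final year from the data instead of hard-coding it
TITLE = f"Global Surface Temperature Anomalies (1880–{LAST_YEAR})"
SUBTITLE = "NASA GISTEMP v4; baseline 1951–1980"
# Confirm that every year named in the annotation has an annual value in temperature_clean.
ANNOTATION = "2023–2024 anomalies top the record by a large margin."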
SOURCE = "NASA GISS (downloaded 2024-04-15)"
UNITS = "°C relative to 1951–1980 average"

validate_story_elements(
    {
        "TITLE": TITLE,
        "SUBTITLE": SUBTITLE,
        "ANNOTATION": ANNOTATION,
        "SOURCE": SOURCE,
        "UNITS": UNITS,
    }
)

# TODO: Customize the five storytelling strings so none are blank


In [None]:
# Step 6 – Build the annotated line chart
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(
    temperature_with_trend["Year"],
    temperature_with_trend["TempAnomaly"],
    label="Annual anomaly",
    color="#d62728",
    marker="o",
    markersize=3,
)
ax.plot(
    temperature_with_trend["Year"],
    temperature_with_trend["Rolling5yr"],
    label="5-year rolling average",
    color="#1f77b4",
    linewidth=2,
)
ax.axhline(0, color="#333333", linestyle="--", linewidth=1)

latest = temperature_with_trend.dropna(subset=["TempAnomaly"]).iloc[-1]
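ax.annotate(
    ANNOTATION,
    xy=(latest["Year"], latest["TempAnomaly"]),
    xytext=(latest["Year"] - 12, latest["TempAnomaly"] + 0.4),
    ha="right",  # text ends left of the arrow so it stays inside the axes
    va="top",  # text hangs below its anchor, which sits at or near the upper y-limit
    arrowprops={"arrowstyle": "->", "color": "#444444"},
    fontsize=11,
)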

ax.set_title(TITLE)
ax.set_xlabel(f"Year — {SUBTITLE}")
ax.set_ylabel(f"Temperature anomaly ({UNITS})")
ax.legend()
ax.text(0, -0.18, f"Source: {SOURCE}", transform=ax.transAxes)
ax.set_ylim(
    temperature_with_trend["TempAnomaly"].min() - 0.4,
    temperature_with_trend["TempAnomaly"].max() + 0.4,
)
fig.tight_layout()
temperature_fig = fig
fig

# TODO: Plot both the raw anomalies and the rolling mean with clear annotations


In [None]:
# Step 7 – Final review and (optional) save
validate_story_elements(
    {
        "TITLE": TITLE,
        "SUBTITLE": SUBTITLE,
        "ANNOTATION": ANNOTATION,
        "SOURCE": SOURCE,
        "UNITS": UNITS,
    }
)
save_last_fig("day01_global_temperature.png", fig=temperature_fig)
