## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%201_%20Introduction%20–%20Climate%20Change%20%26%20Basic%20Plotting.ipynb)

# 🌎 Day 1 – Visualizing Global Warming

Welcome to Day 1 of the **Data Stories for Sustainability** sprint. We will work in quick, repeatable loops:

1. **Learn a little** – read a bite-sized brief.
2. **Do a little** – run guided code with immediate feedback.
3. **Check and reflect** – confirm your output and capture the story.

By the end of today you will have a polished climate line chart plus a reusable workflow you’ll lean on all week.

> **Teacher sidebar — pacing & differentiation**  
> • Total time: ~45 minutes (five short loops, with per-loop estimates in each heading, plus a 5-minute wrap-up).  
> • Quick checks (`quick_snapshot`, assertions) make it easy to spot who is stuck.  
> • Fast finishers can experiment with `df.rolling` trends or alternate annotations; students who need more support stay with the scaffolded path.

## 🗺️ Roadmap for Today

Loop | Focus | What success looks like
--- | --- | ---
0 | Setup & storytelling scaffold | Utilities loaded, story fields filled in
1 | Load & validate the dataset | `Year`, `TempAnomaly` columns verified
2 | Explore & self-diagnose | Shapes, null checks, sanity hints
3 | Plot the trend responsibly | Accessible, annotated line chart
4 | Interpret & save | Claim → Evidence → Visual → Takeaway written down

## 🗂️ Data Card — NASA GISTEMP Global Temperature Anomalies

- **Source**: NASA Goddard Institute for Space Studies (GISS) Surface Temperature Analysis (GISTEMP v4).  
- **Temporal coverage**: 1880 – present (updated yearly).  
- **Units**: °C anomaly relative to the 1951–1980 baseline.  
- **Method notes**: Combines land-surface air temperatures with sea-surface temperatures; station data are homogenized; uncertainty is roughly ±0.05 °C in recent decades.  
- **Last updated**: January 2024 release.  
- **Known caveats**: Coverage is sparse before 1900; values are anomalies rather than absolute °C; revisions happen as new station data arrive.

> 🎯 **Integrity cue**: Keep axes linear, include the zero baseline, and note that anomalies describe deviation from a reference period (not absolute heat).
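To make "anomaly relative to a baseline" concrete, here is a minimal sketch with made-up absolute temperatures (toy numbers, not GISTEMP data) and a two-year window standing in for the real 1951–1980 baseline. An anomaly is each value minus the mean of the baseline period, so the baseline years themselves average to zero.

```python
import pandas as pd

# Toy absolute temperatures in °C (illustrative only, NOT NASA values).
temps = pd.DataFrame({
    "Year": [1950, 1951, 1952, 2000, 2020],
    "TempC": [13.9, 14.0, 14.2, 14.5, 15.1],
})

# Hypothetical short baseline window standing in for 1951–1980.
baseline = temps[temps["Year"].between(1951, 1952)]
baseline_mean = baseline["TempC"].mean()  # (14.0 + 14.2) / 2 = 14.1

# Anomaly = observed value minus the baseline-period mean.
temps["Anomaly"] = temps["TempC"] - baseline_mean
print(temps)
```

A value of +1.0 in the `Anomaly` column reads as "1 °C warmer than the reference period", which says nothing about how hot that year was in absolute terms.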

In [None]:
# Shared utilities for the DS4S course notebooks
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.dpi': 120,
    'axes.titlesize': 16,
    'axes.labelsize': 13,
    'axes.titlepad': 12,
    'figure.figsize': (10, 5),
})


def load_csv(path: Path, **read_kwargs) -> pd.DataFrame:
    '''Load a CSV and report the basic shape.'''
    df = pd.read_csv(path, **read_kwargs)
    print(f"✅ Loaded {path.name} with {df.shape[0]:,} rows and {df.shape[1]} columns")
    return df


def validate_columns(df: pd.DataFrame, required):
    '''Raise if any required column is missing.'''
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    print(f"✅ Columns present: {', '.join(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int):
    '''Raise if the row count falls outside [low, high].'''
    rows = df.shape[0]
    if not (low <= rows <= high):
        raise ValueError(f"Row count {rows} outside expected range {low}-{high}")
    print(f"✅ Row count {rows} within expected {low}-{high}")


def quick_snapshot(df: pd.DataFrame, name: str, n: int = 3):
    '''Print shape, columns, and null counts, then show the first n rows.'''
    print(f"\n{name} snapshot → shape={df.shape}")
    print("Columns:", list(df.columns))
    print("Nulls:\n", df.isna().sum())
    display(df.head(n))


def ensure_story_elements(title: str, subtitle: str, annotation: str, source: str, units: str):
    '''Raise if any storytelling field is blank; otherwise echo and return them.'''
    fields = {
        'TITLE': title,
        'SUBTITLE': subtitle,
        'ANNOTATION': annotation,
        'SOURCE': source,
        'UNITS': units,
    }
    missing = [key for key, value in fields.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"Please complete these storytelling fields: {', '.join(missing)}")
    print("✅ Story scaffold complete →", ", ".join(f"{k}: {v}" for k, v in fields.items()))
    return fields


def save_last_fig(filename: str, fig=None):
    '''Save a Matplotlib figure to plots/ at 300 dpi.

    Inline backends close figures after each cell, so pass `fig` explicitly when
    saving from a later cell; otherwise plt.gcf() returns a new, empty figure.
    '''
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    fig = fig if fig is not None else plt.gcf()
    if not fig.axes:
        raise RuntimeError("Empty figure: pass fig=... or save in the same cell as the plot.")
    output_path = plots_dir / filename
    fig.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"📁 Saved figure to {output_path}")


def save_plotly_fig(fig, filename: str):
    '''Save a Plotly figure to plots/ as standalone HTML.'''
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    fig.write_html(str(output_path))
    print(f"📁 Saved interactive figure to {output_path}")

## 🔁 Loop 0 — Establish today’s story scaffold (2 min read + 1 min action)

Fill in the storytelling fields before plotting. This keeps the narrative front-and-center and prevents “label-last” habits.

In [None]:
TITLE = "Global Temperature Anomalies Are Rising"
SUBTITLE = "NASA GISTEMP, 1880–2023, relative to the 1951–1980 baseline"
ANNOTATION = "2023: ~1.2 °C above the 1951–1980 baseline."
SOURCE = "NASA GISS Surface Temperature Analysis (GISTEMP v4, Jan 2024)"
UNITS = "Temperature anomaly in °C"

story_fields = ensure_story_elements(TITLE, SUBTITLE, ANNOTATION, SOURCE, UNITS)

## 🔁 Loop 1 — Load & validate the climate dataset (3 min)

1. Use `load_csv` to read the curated NASA CSV.  
2. Keep only the year and the global annual anomaly column.  
3. Run the quick diagnostic cell to confirm the structure before moving on.

In [None]:
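data_dir = Path.cwd() / "data"
raw_df = load_csv(
    data_dir / "GLB.Ts+dSST.csv",
    skiprows=1,                     # skip the title line above the header row
    usecols=[0, 13],                # column 0 = Year, column 13 = J-D (Jan–Dec annual mean)
    names=["Year", "TempAnomaly"],  # names for the two kept columns
    header=0,                       # replace the file's own header row with `names`
)
# Missing values appear as "***"; coerce them to NaN so the column is numeric.
raw_df["TempAnomaly"] = pd.to_numeric(raw_df["TempAnomaly"], errors="coerce")
validate_columns(raw_df, ["Year", "TempAnomaly"])
expect_rows_between(raw_df, 140, 170)  # one row per year from 1880 to the present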

In [None]:
quick_snapshot(raw_df, name="NASA anomalies (raw)")
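Positional `usecols=[0, 13]` silently picks the wrong column if the file layout ever shifts. A sketch of an alternative that selects by header name instead, assuming the file has a title line followed by a header row of `Year`, `Jan`…`Dec`, `J-D`, `D-N`, and seasonal columns, with `***` marking gaps. The in-memory sample below uses placeholder numbers, not real GISTEMP values.

```python
from io import StringIO

import pandas as pd

# Assumed GISTEMP-style layout; all numbers are placeholders (NOT NASA data).
sample = """Land-Ocean: Global Means
Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON
1880,-.1,-.2,-.1,-.1,-.1,-.2,-.2,-.1,-.1,-.2,-.2,-.2,-.15,***,***,-.1,-.2,-.2
1881,-.2,-.1,.0,.0,.0,-.2,.0,-.1,-.1,-.2,-.2,-.1,-.10,-.10,-.2,.0,-.1,-.2
1882,.1,.1,***,***,***,***,***,***,***,***,***,***,***,***,.1,***,***,***
"""

df = (
    pd.read_csv(
        StringIO(sample),
        skiprows=1,               # skip the title line
        usecols=["Year", "J-D"],  # select by name: raises if the column disappears
        na_values="***",          # treat the missing-value marker as NaN
    )
    .rename(columns={"J-D": "TempAnomaly"})
)
print(df)
```

Selecting by name trades a little brevity for a loud `ValueError` when the expected column is absent, which fits the notebook's "validate before plotting" habit.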

## 🔁 Loop 2 — Explore & self-diagnose (6 min)

Before plotting, look for warning signs:

- Are years monotonic?  
- Are recent anomalies higher than early years?  
- Any missing values that would break a line chart?
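The three questions above can be turned into explicit checks. This is a minimal sketch with a hypothetical helper `diagnose` and a toy frame (made-up values, not NASA data). Unlike a first-row versus last-row comparison, it compares the mean of the first and last ten rows, which is less sensitive to one unusual year.

```python
import numpy as np
import pandas as pd


def diagnose(df: pd.DataFrame) -> dict:
    """Hypothetical helper: answer the pre-plot questions for a Year/TempAnomaly frame."""
    return {
        "years_monotonic": bool(df["Year"].is_monotonic_increasing),
        "years_unique": bool(df["Year"].is_unique),
        "missing_anomalies": int(df["TempAnomaly"].isna().sum()),
        # Compare the last ten rows with the first ten (NaNs are skipped by mean()).
        "recent_warmer": bool(df["TempAnomaly"].tail(10).mean() > df["TempAnomaly"].head(10).mean()),
    }


# Toy frame: 20 "years" with a steady made-up rise and one gap.
toy = pd.DataFrame({"Year": range(1880, 1900), "TempAnomaly": np.linspace(-0.3, 0.3, 20)})
toy.loc[5, "TempAnomaly"] = np.nan
print(diagnose(toy))
```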

In [None]:
clean_df = (
    raw_df.dropna(subset=["TempAnomaly"])
    .sort_values("Year")
    .assign(RollingTrend=lambda d: d["TempAnomaly"].rolling(window=5, center=True).mean())
)
quick_snapshot(clean_df.tail(), name="Clean anomalies (tail)")
print(
    "Expected rising trend?",
    "✅" if clean_df["TempAnomaly"].iloc[-1] > clean_df["TempAnomaly"].iloc[0] else "⚠️ check calculations",
)
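Why does the blue trend line stop short of both ends of the chart? `rolling(window=5, center=True)` averages each year with two neighbours on each side, so the first and last two rows have no complete window and come out as NaN, which Matplotlib draws as a gap. A small check with toy values; the `min_periods` variant is an optional design choice the notebook does not use.

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Full 5-value windows only: two NaNs at each end.
trend = s.rolling(window=5, center=True).mean()
print(trend.tolist())  # [nan, nan, 2.0, 3.0, 4.0, nan, nan]

# Allow partial windows (at least 3 values) so the edges get a value too.
edges = s.rolling(window=5, center=True, min_periods=3).mean()
print(edges.tolist())
```

Keeping the gaps is the more conservative choice for a public chart: a partial-window average at the edges is computed from fewer years than the rest of the line.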

## 🔁 Loop 3 — Plot with intent (10 min)

Use the shared style, keep the zero baseline visible, and annotate the latest data point to anchor the audience. Run the quick visual check afterwards.

In [None]:
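fig, ax = plt.subplots()
# Annual values plus the centered 5-year rolling mean that carries the long-term story.
ax.plot(clean_df["Year"], clean_df["TempAnomaly"], color="#d62728", linewidth=2, label="Annual anomaly")
ax.plot(clean_df["Year"], clean_df["RollingTrend"], color="#1f77b4", linewidth=3, label="5-year trend")
# Dashed zero line = the 1951–1980 baseline that anomalies are measured against.
ax.axhline(0, color="black", linestyle="--", linewidth=1)
# Main title on top, smaller grey subtitle directly beneath it.
fig.suptitle(TITLE, fontsize=16)
ax.set_title(SUBTITLE, fontsize=13, color="#444444")
ax.set_xlabel("Year")
ax.set_ylabel(UNITS)
ax.legend(loc="upper left")
# Annotate the most recent year to anchor the audience.
latest = clean_df.iloc[-1]
ax.annotate(
    ANNOTATION,
    xy=(latest["Year"], latest["TempAnomaly"]),
    xytext=(latest["Year"] - 25, latest["TempAnomaly"] + 0.3),
    ha="right",  # text ends left of the arrow, so a long label stays inside the axes
    arrowprops=dict(arrowstyle="->", color="black"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="black", alpha=0.8),
)
# Source and units credit below the axes (axes-fraction coordinates).
ax.text(
    0.01,
    -0.18,
    f"{SOURCE} | Units: {UNITS}",
    transform=ax.transAxes,
    fontsize=10,
    color="#555555",
)
# Extra headroom at the top so the annotation box is not clipped.
ax.set_ylim(clean_df["TempAnomaly"].min() - 0.3, clean_df["TempAnomaly"].max() + 0.5)
fig.tight_layout()
plt.show()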

In [None]:
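# ax.lines also contains the zero baseline from axhline, so check the labelled series by name.
labels = [line.get_label() for line in ax.lines]
print("Lines plotted:", labels)
assert {"Annual anomaly", "5-year trend"} <= set(labels), "Need both annual and rolling trend lines."
print("Latest anomaly ({}): {:.2f} °C".format(int(latest["Year"]), latest["TempAnomaly"]))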
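The hard-coded `ANNOTATION` string can drift out of sync with the chart when NASA publishes a new release. One option, sketched here with a hypothetical helper `annotation_from_data` and toy values (not NASA data), is to build the sentence from the last non-missing row so the words always match the plotted point.

```python
import pandas as pd


def annotation_from_data(df: pd.DataFrame, baseline: str = "1951–1980") -> str:
    """Hypothetical helper: describe the latest non-missing anomaly in words."""
    latest = df.dropna(subset=["TempAnomaly"]).sort_values("Year").iloc[-1]
    return f"{int(latest['Year'])} sits {latest['TempAnomaly']:+.2f} °C vs the {baseline} baseline."


# Toy frame: the final year is still incomplete (NaN), as in a mid-year data file.
toy = pd.DataFrame({"Year": [2021, 2022, 2023], "TempAnomaly": [0.5, 0.75, float("nan")]})
print(annotation_from_data(toy))  # 2022 sits +0.75 °C vs the 1951–1980 baseline.
```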

## 🔁 Loop 4 — Claim → Evidence → Visual → Takeaway (5 min)

Record the interpretation so students practice closing the loop with words, not just pixels.

In [None]:
from IPython.display import Markdown

claim = "Earth is unequivocally warmer than it was in the late 19th century."
evidence = (
    "NASA’s global anomaly climbed from roughly −0.2 °C in the 1880s to about +1.2 °C in 2023. "
    "The smoothed trend shows an especially sharp rise after 1970."
)
visual = "Annotated Matplotlib line chart with 5-year rolling mean and zero baseline."
takeaway = (
    "The story is about the magnitude and persistence of warming — not a single spike — so we emphasize the long-term trend."
)
Markdown(
    f"""
| Claim | Evidence | Visual | Takeaway |
| --- | --- | --- | --- |
| {claim} | {evidence} | {visual} | {takeaway} |
"""
)
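The f-string table above breaks into extra columns if a student's claim or evidence contains a `|` or a line break. A minimal sketch of a hypothetical helper `markdown_row` that escapes pipes and flattens newlines before joining the cells:

```python
def markdown_row(cells: list[str]) -> str:
    """Hypothetical helper: join cells into one Markdown table row, escaping '|' and newlines."""
    escaped = [str(c).replace("|", r"\|").replace("\n", " ") for c in cells]
    return "| " + " | ".join(escaped) + " |"


header = ["Claim", "Evidence", "Visual", "Takeaway"]
table = "\n".join([
    markdown_row(header),
    markdown_row(["---"] * len(header)),
    markdown_row(["Warming | cooling?", "toy evidence\nover two lines", "line chart", "trend matters"]),
])
print(table)
```

The resulting string can be passed to `IPython.display.Markdown` exactly like the hand-built table.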

## 💾 Save the figure for the teacher dashboard

In [None]:
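# Inline backends (Colab, Jupyter) close figures after each cell, so plt.gcf() is empty here; save Loop 3's `fig` directly.
(Path.cwd() / "plots").mkdir(parents=True, exist_ok=True)
fig.savefig(Path.cwd() / "plots" / "day01_solution_plot.png", dpi=300, bbox_inches="tight")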
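Why save the `fig` object rather than "the current figure"? This sketch mimics what an inline backend does at the end of a cell by calling `plt.close(fig)`, then shows that the `Figure` object still saves correctly while `plt.gcf()` now hands back a fresh, empty figure. It uses the headless `Agg` backend, a temporary directory, and toy values so it runs anywhere.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1880, 1950, 2023], [-0.2, 0.0, 1.2])  # toy values
plt.close(fig)  # mimic the inline backend closing the figure after the cell

out_dir = Path(tempfile.mkdtemp()) / "plots"
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / "demo.png"
fig.savefig(out_path, dpi=100, bbox_inches="tight")  # the Figure object still renders after close

# plt.gcf() no longer finds the closed figure and creates a new, empty one.
print(out_path.exists(), plt.gcf().axes == [])
```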

## ✅ Exit Ticket

- What surprised you about the rate of change after 1970?  
- Where might the uncertainty be highest in this dataset?  
- How would you explain the concept of a temperature anomaly to a ninth grader?