## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day01/notebook/day01_starter.ipynb)

# 🌎 Day 1 – Visualizing Global Warming
### From raw anomalies to a climate story

We will turn NASA's temperature anomaly table into a polished visual that shows how much warmer the world is today than during the 1951–1980 baseline period.

#### Data card: NASA GISTEMP global mean temperature anomalies
* **Source:** [NASA GISTEMP v4](https://data.giss.nasa.gov/gistemp/) (global land–ocean mean).
* **Temporal coverage:** 1880–2023, monthly and annual temperature anomalies.
* **Units:** Degrees Celsius relative to the 1951–1980 baseline.
* **Refresh cadence:** Updated monthly; downloaded September 2024.
* **Caveats:** Missing values are marked as `***` (for example, the annual mean of a year still in progress); anomalies blend land station records with sea surface temperature (SST) records.
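The loader below passes `skiprows=1` because the GISTEMP CSV has a one-line title above its header row. Here is a minimal sketch of reading that layout directly with pandas. It uses a tiny made-up excerpt: the title text and all numbers are placeholders, not real GISTEMP values. It also treats `***` as missing at read time via `na_values`. This notebook instead coerces placeholders later with `pd.to_numeric`. Whether `utils.load_data` forwards `na_values` to `read_csv` is not shown here.

```python
import io

import pandas as pd

# Hypothetical excerpt in the GISTEMP layout: a title line, a header row, then one
# row per year. The title wording and the numbers are placeholders, not real data.
RAW_CSV = """Title line (placeholder)
Year,Jan,Feb,J-D
1990,.10,.20,.15
1991,.30,.10,.20
1992,.40,***,***
"""

frame = pd.read_csv(
    io.StringIO(RAW_CSV),
    skiprows=1,       # skip the title line so the next line becomes the header
    na_values='***',  # treat the placeholder as a missing value (NaN)
)
print(frame)
print(frame.dtypes)   # Year stays integer; columns containing NaN become float64
```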

In [None]:
# Core imports and shared helpers
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Markdown, display

import utils

utils.baseline_style()


## Step 1: Load the temperature anomalies
Use the shared loader, which fetches the file automatically if it is missing, then run diagnostics to confirm that the table has the expected columns and a plausible number of rows.
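`utils.load_data` is the course's own helper, and its internals are not shown in this notebook. The sketch below shows one way to implement the fetch-if-missing pattern. It assumes a hypothetical `load_or_fetch` function, a local `data/` folder, and a download function supplied by the caller; the real helper may work differently.

```python
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def load_or_fetch(filename, url, data_dir='data', fetch=urlretrieve, **read_kwargs):
    """Hypothetical helper: download `filename` only if no local copy exists, then read it."""
    path = Path(data_dir) / filename
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(path))  # urlretrieve(url, filename) writes the download to disk
    # Extra keyword arguments (e.g. skiprows=1) are passed straight to read_csv
    return pd.read_csv(path, **read_kwargs)
```

Passing `fetch` as a parameter keeps the function testable offline. A fake downloader can stand in for the network, and a second call reuses the cached file instead of downloading again.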

In [None]:
# Example: validate_columns checks that the expected headers are present.
# Mistype one (e.g. 'Valeu' instead of 'Value') to see how a missing column is reported.
demo_df = pd.DataFrame({
    'Year': [2000, 2001],
    'Value': [0.1, 0.2]
})
_ = utils.validate_columns(demo_df, ['Year', 'Value'])
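The exact behaviour of `utils.validate_columns` is defined in the course's `utils` module. As a rough illustration of the idea, here is a hypothetical stand-in called `check_columns` that lists the expected headers a dataframe lacks. Its name and return value are assumptions.

```python
import pandas as pd


def check_columns(df, expected):
    """Hypothetical stand-in: return the expected headers missing from df, in order."""
    missing = [col for col in expected if col not in df.columns]
    if missing:
        print(f'Missing columns: {missing}')
    return missing


demo_df = pd.DataFrame({'Year': [2000, 2001], 'Value': [0.1, 0.2]})
check_columns(demo_df, ['Year', 'Value'])  # returns [] (nothing missing)
check_columns(demo_df, ['Year', 'Valeu'])  # typo: prints and returns ['Valeu']
```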


In [None]:
temperature_raw = utils.load_data('GLB.Ts+dSST.csv', skiprows=1)
utils.diagnostics(
    temperature_raw,
    'GISTEMP anomalies (raw)',
    expected_columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'J-D'],
    expected_row_range=(140, 200),
)


## Step 2: Keep the annual series and tidy the columns
Convert the `J-D` (January–December) annual anomaly column to numbers, drop rows where a `***` placeholder could not be parsed (such as an incomplete current year), and confirm that the dataframe contains only the fields we need.
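To see what the chain in the next cell does to a placeholder, here is the same pattern run on a toy frame. The rows are made up. The last year is treated as incomplete, so its annual value is `***`.

```python
import pandas as pd

# Made-up rows in which both columns arrived as text; the last year is incomplete,
# so its annual value is the '***' placeholder.
raw = pd.DataFrame({
    'Year': ['2021', '2022', '2023'],
    'J-D': ['0.40', '0.50', '***'],
})

tidy = (
    raw.rename(columns={'J-D': 'temp_anomaly_c'})
    .assign(
        Year=lambda df: pd.to_numeric(df['Year'], errors='coerce'),
        # errors='coerce' turns unparseable text such as '***' into NaN
        temp_anomaly_c=lambda df: pd.to_numeric(df['temp_anomaly_c'], errors='coerce'),
    )
    .dropna()                   # remove the row whose anomaly is NaN
    .astype({'Year': 'int64'})  # Year holds no NaN now, so an integer dtype is safe
)
print(tidy)
```

Casting `Year` only after `dropna()` matters. A column that still contains NaN cannot be cast to `int64`.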

In [None]:
temperature_annual = (
    temperature_raw[['Year', 'J-D']]
    .rename(columns={'J-D': 'temp_anomaly_c'})
    .assign(
        Year=lambda df: pd.to_numeric(df['Year'], errors='coerce'),
        temp_anomaly_c=lambda df: pd.to_numeric(df['temp_anomaly_c'], errors='coerce'),
    )
    .dropna()
    .astype({'Year': 'int64'})
)
utils.validate_columns(temperature_annual, ['Year', 'temp_anomaly_c'])
utils.expect_rows_between(temperature_annual, 140, 200)
temperature_annual.head()


## Step 3: Build context for the 1951–1980 baseline
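Anomalies are measured relative to the 1951–1980 average, so the mean anomaly over that window should be close to 0 °C (published values are rounded, so expect a small residual). This check confirms that we are reading the values against the right reference period.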
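Why should the baseline mean be near zero? An anomaly is a value minus the average over the reference window, so averaging the anomalies over that same window cancels out. A toy sketch with made-up absolute temperatures (not real data) shows the arithmetic:

```python
import numpy as np
import pandas as pd

years = np.arange(1941, 2001)
# Hypothetical absolute temperatures (°C) with a steady upward drift; not real data.
series = pd.DataFrame({'Year': years, 'temp_c': 14.0 + 0.01 * (years - 1941)})

# Anomaly = value minus the 1951–1980 mean of that value
baseline = series.query('1951 <= Year <= 1980')['temp_c'].mean()
series['temp_anomaly_c'] = series['temp_c'] - baseline

# Averaged over the same window, the anomalies cancel to (numerically) zero
check = series.query('1951 <= Year <= 1980')['temp_anomaly_c'].mean()
print(f'Baseline mean: {baseline:.3f} °C; mean anomaly in window: {check:.3f} °C')
```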

In [None]:
baseline_window = temperature_annual.query('1951 <= Year <= 1980')
baseline_mean = baseline_window['temp_anomaly_c'].mean()
print(f'Baseline (1951–1980) mean anomaly: {baseline_mean:.3f} °C')
utils.diagnostics(
    baseline_window,
    'Baseline slice',
    expected_columns=['Year', 'temp_anomaly_c'],
    expected_row_range=(25, 35),
)


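## Step 4: Add a smoothed trend line
A centred 5-year rolling average makes the long-term warming trend easier to read, while the annual line stays on the chart for context. We keep the full record from 1880 onward. The first and last two years have no 5-year mean because their window is incomplete.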
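Here is a small numeric example of what `rolling(window=5, center=True)` produces, including the gaps at both ends. The `min_periods` variant is shown only as an alternative. It fills the edges with partial-window means, which are noisier, and the notebook does not use it.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# center=True labels each 5-point mean at the middle of its window, so the first
# and last two positions lack a full window and become NaN.
centred = values.rolling(window=5, center=True).mean()
print(centred.tolist())  # [nan, nan, 3.0, 4.0, 5.0, nan, nan]

# min_periods=3 lets the edges average whatever 3 or 4 points their window covers.
edges_filled = values.rolling(window=5, center=True, min_periods=3).mean()
print(edges_filled.tolist())
```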

In [None]:
temperature_story = temperature_annual.assign(
    rolling_5yr=lambda df: df['temp_anomaly_c'].rolling(window=5, center=True).mean()
)
# Keeps the full record from 1880 onward; raise the start year to zoom in on recent decades
recent = temperature_story.query('Year >= 1880')
utils.expect_rows_between(recent, 130, 200)
recent.tail()


## Step 5: Craft the story-first chart
Set the narrative scaffolding (title, subtitle, annotation, source, units) before drawing the visual so the message leads the design.
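An annotation makes a factual claim, so it is worth checking against the table it describes before it goes on the chart. Here is a minimal sketch using a hypothetical helper, `all_years_exceed`, and a toy frame with the same column names as `temperature_annual`. The values are made up.

```python
import pandas as pd


def all_years_exceed(df, start_year, threshold_c):
    """Hypothetical check: is every anomaly from start_year onward above threshold_c?"""
    window = df.loc[df['Year'] >= start_year, 'temp_anomaly_c']
    # An empty window should not count as support for the claim
    return bool(len(window) > 0 and (window > threshold_c).all())


toy = pd.DataFrame({'Year': [2013, 2014, 2015, 2016],
                    'temp_anomaly_c': [0.5, 0.7, 0.9, 1.0]})
print(all_years_exceed(toy, 2015, 0.8))  # True
print(all_years_exceed(toy, 2014, 0.8))  # False: 2014 is below the threshold
```

In this notebook, a call such as `all_years_exceed(temperature_annual, 2015, 0.8)` would test the wording of `ANNOTATION` against the downloaded data.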

In [None]:
TITLE = 'Global surface temperatures are well above the mid-20th-century norm'
SUBTITLE = 'NASA GISTEMP global mean temperature anomalies, 1951–1980 baseline (°C)'
ANNOTATION = 'Every year since 2015 has exceeded the baseline by more than 0.8 °C.'
SOURCE = 'NASA GISTEMP v4, downloaded September 2024'
UNITS = 'Temperature anomaly (°C)'
metadata = {
    'title': TITLE,
    'subtitle': SUBTITLE,
    'annotation': ANNOTATION,
    'source': SOURCE,
    'units': UNITS,
}
utils.validate_story_elements(metadata)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(recent['Year'], recent['temp_anomaly_c'], color='#1f77b4', label='Annual anomaly', linewidth=1.6)
ax.plot(recent['Year'], recent['rolling_5yr'], color='#d62728', linewidth=2.4, label='5-year mean')
utils.apply_story_template(ax, title=TITLE, subtitle=SUBTITLE, source=SOURCE, units=UNITS)
# Point the annotation arrow at the warmest year in the record
peak = recent.loc[recent['temp_anomaly_c'].idxmax()]
ax.annotate(
    ANNOTATION,
    xy=(peak['Year'], peak['temp_anomaly_c']),
    xycoords='data',
    xytext=(-60, 0),  # offset left of the peak; right-aligned text stays inside the axes
    textcoords='offset points',
    arrowprops=dict(arrowstyle='->', color='#444444'),
    fontsize=11,
    ha='right',
    va='center',
    bbox=dict(boxstyle='round,pad=0.3', fc='white', ec='#888888', alpha=0.85),
)
# A 0 °C anomaly marks the 1951–1980 baseline
ax.axhline(0, color='#666666', linestyle='--', linewidth=1)
ax.legend(loc='upper left')
ax.set_xlim(recent['Year'].min(), recent['Year'].max())
plt.tight_layout()
utils.save_last_fig('day01_solution_plot.png')


In [None]:
display(Markdown(utils.summarize_claim(
    claim='The planet has warmed dramatically over the past four decades.',
    evidence='Annual anomalies now sit more than 0.8 °C above the mid-20th-century baseline.',
    takeaway='Sustained warming means short-term cool years no longer return the climate to its previous state.',
)))
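`utils.summarize_claim` returns Markdown that `display(Markdown(...))` renders, but its layout is defined in the course's `utils` module. Here is a hypothetical stand-in showing one way to assemble the claim, evidence, and takeaway triple. The bold labels and blank-line separation are assumptions.

```python
def format_claim(claim, evidence, takeaway):
    """Hypothetical stand-in: build a three-paragraph Markdown summary."""
    parts = {'Claim': claim, 'Evidence': evidence, 'Takeaway': takeaway}
    # Refuse to render a summary with an empty element
    for label, text in parts.items():
        if not text.strip():
            raise ValueError(f'{label} must not be empty')
    # Blank lines between entries make Markdown render them as separate paragraphs
    return '\n\n'.join(f'**{label}:** {text}' for label, text in parts.items())


print(format_claim('Warming is sustained.',
                   'Recent anomalies sit above the baseline.',
                   'Cool years no longer reset the climate.'))
```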
