## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day05/notebook/day05_starter.ipynb)

# 🔥 Day 5 – Capstone: CO₂ Emissions & Global Temperature
### Connecting greenhouse gases to warming trends

We will merge the global CO₂ emissions record with NASA temperature anomalies and build a two-panel figure, plus a scatter inset, that relates a key driver of climate change to one of its measurable impacts.

#### Data card: CO₂ emissions & temperature anomalies
* **Sources:** [Our World in Data – Global CO₂](https://ourworldindata.org/co2-and-greenhouse-gas-emissions) and [NASA GISTEMP](https://data.giss.nasa.gov/gistemp/).
* **Temporal coverage:** CO₂ emissions for 1750–2023; temperature anomalies for 1880–2023.
* **Units:** CO₂ in gigatonnes; temperature anomalies in °C relative to 1951–1980.
* **Refresh cadence:** Updated annually (CO₂) and monthly (GISTEMP); downloaded September 2024.
* **Caveats:** Pre-1950 CO₂ estimates have higher uncertainty; anomalies are relative to a mid-20th-century baseline.

In [None]:
# Core imports and shared helpers
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Markdown, display

import utils

utils.baseline_style()


## Step 1: Load both datasets
Use `utils.load_data` to read the CO₂ series and the temperature anomalies, then run `utils.diagnostics` on each to check its shape and key columns.

In [None]:
co2_raw = utils.load_data('global_co2.csv')
temp_raw = utils.load_data('GLB.Ts+dSST.csv', skiprows=1)
utils.diagnostics(co2_raw, 'Global CO₂ (raw)', expected_columns=['Year', 'CO2'], expected_row_range=(250, 300))
utils.diagnostics(temp_raw, 'Temperature anomalies (raw)', expected_columns=['Year', 'J-D'], expected_row_range=(140, 200))
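
`utils.load_data` is a course helper whose internals are not shown here. As a rough sketch, assuming it forwards keyword arguments such as `skiprows` to `pandas.read_csv`, the example below shows why the GISTEMP file is read with `skiprows=1`: the first line of the file is a title, and the real header sits on the second line. The in-memory CSV text and its values are made up for illustration.

```python
import io

import pandas as pd

# Made-up miniature of a GISTEMP-style file: a title line, then the header.
# The numbers are illustrative placeholders, not real anomalies.
raw_text = """Land-Ocean: Global Means
Year,Jan,Feb,J-D
2001,0.10,0.20,0.15
2002,0.30,0.40,0.35
2003,0.50,***,***
"""

# skiprows=1 drops the title line so 'Year,Jan,Feb,J-D' becomes the header.
toy_temp = pd.read_csv(io.StringIO(raw_text), skiprows=1)

print(toy_temp.columns.tolist())
print(toy_temp.dtypes)
```

Columns that contain the `***` placeholder are read as text rather than numbers, which is why a numeric conversion step is still needed after loading.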


## Step 2: Tidy the temperature data
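Keep the annual mean (`J-D`, the January–December average), convert both columns to numeric values, and drop incomplete rows. GISTEMP marks missing values with `***`, which `errors='coerce'` turns into `NaN`. This cell does not trim the years; the restriction to the overlap with the CO₂ series happens in the Step 4 merge.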

In [None]:
temp_annual = (
    temp_raw[['Year', 'J-D']]
    .rename(columns={'J-D': 'temp_anomaly_c'})
    .assign(
        Year=lambda df: pd.to_numeric(df['Year'], errors='coerce'),
        temp_anomaly_c=lambda df: pd.to_numeric(df['temp_anomaly_c'], errors='coerce'),
    )
    .dropna()
    .astype({'Year': 'int64'})
)
utils.diagnostics(temp_annual, 'Temperature anomalies (tidy)', expected_columns=['Year', 'temp_anomaly_c'], expected_row_range=(140, 200))
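
To see what the coercion does, here is a minimal sketch on a made-up three-row frame (the values are placeholders, and `n_missing` is a name introduced only for this example). It counts how many rows the `dropna()` call will remove, which is a useful number to glance at before trusting the tidy table.

```python
import pandas as pd

# Made-up rows; '***' stands in for a missing annual mean.
toy_raw = pd.DataFrame(
    {'Year': ['2001', '2002', '2003'], 'J-D': ['0.15', '0.35', '***']}
)

toy_tidy = (
    toy_raw.rename(columns={'J-D': 'temp_anomaly_c'})
    .assign(
        # Unparseable strings such as '***' become NaN instead of raising.
        Year=lambda df: pd.to_numeric(df['Year'], errors='coerce'),
        temp_anomaly_c=lambda df: pd.to_numeric(df['temp_anomaly_c'], errors='coerce'),
    )
)

# Count missing anomalies before they are dropped.
n_missing = int(toy_tidy['temp_anomaly_c'].isna().sum())

toy_tidy = toy_tidy.dropna().astype({'Year': 'int64'})

print(f'rows dropped: {n_missing}')
print(toy_tidy)
```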


## Step 3: Prepare the CO₂ series
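Convert the `Year` and `CO2` columns to numeric and drop rows with missing values. The limit to years with matching temperature data is not applied here; the inner merge in Step 4 takes care of it.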

In [None]:
co2_clean = co2_raw.assign(
    Year=lambda df: pd.to_numeric(df['Year'], errors='coerce').astype('Int64'),
    CO2=lambda df: pd.to_numeric(df['CO2'], errors='coerce'),
)
co2_clean = co2_clean.dropna()
utils.diagnostics(co2_clean, 'CO₂ (tidy)', expected_columns=['Year', 'CO2'], expected_row_range=(250, 300))
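
If you prefer to restrict the CO₂ series explicitly before merging, one possible approach is to filter on the years present in the temperature table. The sketch below uses made-up toy frames; `co2_overlap` and `missing_in_co2` are names introduced for illustration only.

```python
import pandas as pd

# Made-up toy frames standing in for the cleaned CO₂ and temperature tables.
toy_co2 = pd.DataFrame(
    {'Year': [1878, 1879, 1880, 1881], 'CO2': [1.0, 1.1, 1.2, 1.3]}
)
toy_temp = pd.DataFrame(
    {'Year': [1880, 1881, 1882], 'temp_anomaly_c': [-0.2, -0.1, -0.1]}
)

# Keep only CO₂ rows whose year also appears in the temperature table.
co2_overlap = toy_co2[toy_co2['Year'].isin(toy_temp['Year'])].reset_index(drop=True)

# Years that have a temperature value but no CO₂ value.
missing_in_co2 = sorted(set(toy_temp['Year']) - set(toy_co2['Year']))

print(co2_overlap)
print('temperature years without CO₂:', missing_in_co2)
```

Listing the unmatched years makes it easy to spot gaps that an inner merge would otherwise drop silently.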


## Step 4: Merge the datasets and compute derived metrics
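Align the two series on their common years with an inner merge, keeping 1880 onward. Normalised versions of each series can then be derived from the merged frame when the two need to be compared on one scale.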

In [None]:
merged = (
    temp_annual.merge(co2_clean, on='Year', how='inner')
    .query('Year >= 1880')
)
utils.diagnostics(
    merged,
    'Merged climate dataset',
    expected_columns=['Year', 'temp_anomaly_c', 'CO2'],
    expected_row_range=(120, 200),
)
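
The merge cell aligns the series but does not compute normalised columns. A minimal sketch of one common choice, z-scores, is shown below on made-up toy data. The `zscore` helper and the `temp_z` and `co2_z` column names are assumptions for this example, not part of `utils`. The `validate='one_to_one'` argument is optional and makes pandas raise an error if either table has duplicate years.

```python
import pandas as pd

# Made-up toy tables; the values are placeholders, not real measurements.
toy_temp = pd.DataFrame(
    {'Year': [2000, 2001, 2002, 2003], 'temp_anomaly_c': [0.1, 0.2, 0.4, 0.5]}
)
toy_co2 = pd.DataFrame(
    {'Year': [1999, 2000, 2001, 2002, 2003], 'CO2': [9.0, 10.0, 12.0, 14.0, 16.0]}
)

# Inner merge keeps only years present in both tables.
toy_merged = toy_temp.merge(toy_co2, on='Year', how='inner', validate='one_to_one')


def zscore(series):
    """Centre a series on 0 and scale it to unit (population) standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)


toy_merged = toy_merged.assign(
    temp_z=lambda df: zscore(df['temp_anomaly_c']),
    co2_z=lambda df: zscore(df['CO2']),
)

print(toy_merged)
```

After this transformation both columns are unitless, so they can share one axis without either series dominating the scale.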


## Step 5: Build the capstone figure
Stack a CO₂ emissions line above a temperature anomaly panel (annual values plus a 5-year centred mean), then add a scatter inset that relates the two series directly.

In [None]:
TITLE = 'Rising CO₂ emissions track the sharp increase in global temperatures'
SUBTITLE = 'Global totals, 1880–2023'
ANNOTATION = 'Since 1980, CO₂ emissions and temperature anomalies have both accelerated sharply.'
SOURCE = 'Our World in Data (Global CO₂) & NASA GISTEMP'
UNITS = 'Temperature anomaly (°C)'

metadata = {
    'title': TITLE,
    'subtitle': SUBTITLE,
    'annotation': ANNOTATION,
    'source': SOURCE,
    'units': UNITS,
}
utils.validate_story_elements(metadata)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(11, 10), sharex=True, gridspec_kw={'height_ratios': [2, 2]})

ax1.plot(merged['Year'], merged['CO2'], color='#8c564b', linewidth=2.2)
ax1.set_ylabel('CO₂ emissions (Gt)')
ax1.set_title(TITLE, loc='left', pad=14)
ax1.text(0, 1.02, SUBTITLE, transform=ax1.transAxes, fontsize=12, ha='left', va='bottom', color='#4f4f4f')
ax1.grid(alpha=0.2)

ax2.plot(merged['Year'], merged['temp_anomaly_c'], color='#1f77b4', linewidth=2.2, label='Annual anomaly')
ax2.plot(
    merged['Year'],
    merged['temp_anomaly_c'].rolling(window=5, center=True).mean(),
    color='#d62728',
    linewidth=2.6,
    label='5-year mean',
)
ax2.set_ylabel(UNITS)
ax2.axhline(0, color='#666666', linestyle='--', linewidth=1)
ax2.legend(loc='upper left', frameon=False)
ax2.grid(alpha=0.2)

ax2.text(
    0,
    -0.22,
    f"Source: {SOURCE}",
    transform=ax2.transAxes,
    fontsize=10,
    color='#555555',
)

# Apply tight_layout before adding the inset: axes created with fig.add_axes
# are not managed by tight_layout and would trigger a compatibility warning.
fig.tight_layout()

inset_ax = fig.add_axes([0.62, 0.15, 0.32, 0.25])
inset_ax.scatter(merged['CO2'], merged['temp_anomaly_c'], alpha=0.6, s=25, color='#3182bd')
inset_ax.set_xlabel('CO₂ emissions (Gt)')
inset_ax.set_ylabel('Temp anomaly (°C)')
inset_ax.grid(alpha=0.2)

inset_ax.annotate(
    ANNOTATION,
    xy=(merged['CO2'].iloc[-1], merged['temp_anomaly_c'].iloc[-1]),
    xytext=(-120, 60),
    textcoords='offset points',
    arrowprops=dict(arrowstyle='->', color='#444444'),
    fontsize=10,
    ha='left',
    va='bottom',
    bbox=dict(boxstyle='round,pad=0.3', fc='white', ec='#777777', alpha=0.85),
)

utils.save_last_fig('day05_solution_plot.png', fig=fig)
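
The 5-year mean in the lower panel uses `rolling(window=5, center=True)`, which needs two values on each side of a year. The sketch below, on a made-up seven-point series, shows the consequence: the first two and last two years have no smoothed value, so the red line stops short of both ends of the record. Passing `min_periods` is one optional way to extend it, at the cost of averaging fewer points near the edges.

```python
import pandas as pd

# Made-up series standing in for seven consecutive annual anomalies.
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Centred 5-point window: each value is the mean of itself and two neighbours per side.
centred = toy.rolling(window=5, center=True).mean()

# Allow windows with as few as 3 points so the edges are also filled.
relaxed = toy.rolling(window=5, center=True, min_periods=3).mean()

print(pd.DataFrame({'raw': toy, 'centred': centred, 'relaxed': relaxed}))
```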


In [None]:
display(
    Markdown(
        utils.summarize_claim(
            claim='CO₂ emissions and global temperatures are tightly linked.',
            evidence='Both the emission curve and temperature anomalies rise sharply after 1950, with a strong positive relationship in the inset scatter.',
            takeaway='Cutting greenhouse gases is essential to stabilise the climate; the data show no sign of the link weakening.',
        )
    )
)
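
The "strong positive relationship" in the evidence line can be backed with a number. A minimal sketch, using made-up toy values rather than the real merged data, computes the Pearson correlation and a least-squares slope. On the actual notebook data the same two calls would be applied to `merged['CO2']` and `merged['temp_anomaly_c']`.

```python
import numpy as np
import pandas as pd

# Made-up toy values; replace with the merged dataset to get real numbers.
toy = pd.DataFrame(
    {
        'CO2': [10.0, 12.0, 14.0, 16.0, 18.0],
        'temp_anomaly_c': [0.1, 0.25, 0.3, 0.5, 0.6],
    }
)

# Pearson correlation between emissions and anomaly.
r = toy['CO2'].corr(toy['temp_anomaly_c'])

# Straight-line fit: anomaly ≈ slope * CO2 + intercept.
slope, intercept = np.polyfit(toy['CO2'], toy['temp_anomaly_c'], deg=1)

print(f'Pearson r = {r:.3f}')
print(f'slope = {slope:.4f} °C per Gt, intercept = {intercept:.3f} °C')
```

The slope is expressed in °C per gigatonne, which gives the scatter inset a concrete, quotable reading.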
