## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day02/solution/day02_solution.ipynb)

# ⚡ Day 2 – Exploring Energy Transitions
### Guided loops: reshape → check → compare → narrate

We continue the learn-a-little, do-a-little cadence by comparing the rise of renewables against the stubborn dominance of fossil fuels. Each loop adds one analytical skill: reshaping tables, validating joins, layering multiple series, and narrating trade-offs.

## 📇 Data Card — Our World in Data: Renewable Energy Share
- **Sources**: BP Statistical Review / IEA via Our World in Data (2016 methodology).
- **Temporal coverage**: 1965–2022, annual, global and national entities.
- **Units**: Share of total primary energy (% of equivalent primary energy).
- **Files used**: `01 renewable-share-energy.csv`, `06 hydro-share-energy.csv`, `10 wind-share-energy.csv`, `14 solar-share-energy.csv`.
- **Processing notes**: Filter to the `World` entity, align years across series, reshape to wide format for stacked visuals.
- **Last updated**: OWID download refreshed January 2024.
- **Caveats**: Method harmonises BP and IEA estimates; bioenergy and geothermal excluded. Country definitions follow OWID, which may differ from national statistics.

> 🔎 **What this analysis cannot tell us**: Absolute energy demand, fossil fuel breakdown, or grid reliability. Treat the share metric as a relative indicator, not total consumption.
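
To see why a share cannot stand in for total consumption, here is a minimal sketch with made-up numbers. The TWh figures below are hypothetical and not taken from the dataset; they only show that the renewable share can rise while fossil energy use also grows in absolute terms.

```python
# Hypothetical totals in TWh; illustrative only, not OWID values.
periods = {
    "earlier": {"renewable": 5_000, "fossil": 45_000},
    "later": {"renewable": 12_000, "fossil": 68_000},
}


def renewable_share(row):
    """Return the renewable share of the total, in percent."""
    total = row["renewable"] + row["fossil"]
    return 100 * row["renewable"] / total


shares = {name: renewable_share(row) for name, row in periods.items()}
fossil_change = periods["later"]["fossil"] - periods["earlier"]["fossil"]

for name, share in shares.items():
    print(f"{name}: renewable share = {share:.1f}%")
print(f"Fossil use changed by {fossil_change:+,} TWh over the same span")
```

In this toy case the share climbs by five percentage points even though fossil use increases, which is exactly the pattern a share-only chart cannot reveal.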

## 🗺️ Workflow Map
1. **Setup & shared helpers**.
2. **Load multiple CSVs** and verify column alignment.
3. **Filter & merge** world-level series into one tidy table.
4. **Story scaffold** with title, subtitle, annotation, units, and source.
5. **Visualise** a progress line plus stacked composition, checking accessibility as you go.
6. **Reflect** on pace of change, limitations, and policy implications.

## Step 0 · Imports, style, and quick diagnostics
Reusing the shared helper block keeps every day’s workflow familiar.

In [None]:

from pathlib import Path
from textwrap import dedent

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display

sns.set_theme(style="whitegrid")
plt.rcParams.update({
    "axes.titlesize": 18,
    "axes.labelsize": 13,
    "axes.titleweight": "bold",
    "figure.titlesize": 20,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
})


def baseline_style():
    """Reset the Matplotlib/Seaborn style so every figure starts consistent."""
    sns.set_theme(style="whitegrid")
    plt.rcParams.update({
        "axes.titlesize": 18,
        "axes.labelsize": 13,
        "axes.titleweight": "bold",
        "figure.titlesize": 20,
        "xtick.labelsize": 11,
        "ytick.labelsize": 11,
        "legend.title_fontsize": 12,
        "legend.fontsize": 11,
    })
    return plt


def quick_peek(df, expected_columns=None, sample=3, label="DataFrame"):
    """Print a friendly snapshot so students can self-diagnose issues quickly."""
    print(f"\n🔍 {label} preview")
    print(df.head(sample))
    print(f"Rows: {len(df):,} | Columns: {list(df.columns)}")
    if expected_columns:
        missing = [col for col in expected_columns if col not in df.columns]
        if missing:
            print(f"⚠️ Missing column(s): {missing}")
        else:
            print("✅ Columns match the expectation.")
    return df


def expect_rows_between(df, low, high, label="row count"):
    """Report whether the row count falls inside the expected range (inclusive)."""
    rows = len(df)
    if low <= rows <= high:
        print(f"✅ {label.title()} looks right: {rows:,}.")
    else:
        print(f"⚠️ {label.title()} looks off: {rows:,}. Expected between {low:,} and {high:,}.")
    return rows


def validate_story_elements(elements):
    """Flag storytelling fields that are empty or whitespace-only, then return the dict."""
    missing = [key for key, value in elements.items() if not value or not str(value).strip()]
    if missing:
        print(f"⚠️ Please fill in these storytelling fields: {', '.join(missing)}")
    else:
        print("✅ Story scaffold is ready — every element is filled in.")
    return elements


def save_last_fig(filename, fig=None, dpi=300):
    """Save the latest Matplotlib figure with consistent export settings."""
    output_path = Path.cwd() / filename
    output_path.parent.mkdir(parents=True, exist_ok=True)
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        fig.savefig(output_path, dpi=dpi, bbox_inches="tight")
        print(f"💾 Saved figure to {output_path}")
    else:
        print("⚠️ No figure detected to save.")
    return output_path

baseline_style()


## Step 1 · Load renewable energy series
**Micro-task**: read each CSV, keep the `Entity`, `Year`, and share columns, and check that the world rows align. Tiered hints in the starter notebook walk through `pd.read_csv`, filtering, and renaming.

In [None]:

data_dir = Path.cwd() / "data"
files = {
    "total": "01 renewable-share-energy.csv",
    "hydro": "06 hydro-share-energy.csv",
    "wind": "10 wind-share-energy.csv",
    "solar": "14 solar-share-energy.csv",
}

datasets = {}
for key, filename in files.items():
    df_part = pd.read_csv(data_dir / filename)
    datasets[key] = df_part
    quick_peek(df_part, expected_columns=["Entity", "Code", "Year"], label=f"{key.title()} file snapshot")

print("Loaded", ', '.join(files.values()))


### Self-diagnostic: world filter
We focus on the global trend before exploring country splits.

In [None]:

df_total = datasets["total"]
df_world_total = df_total[df_total["Entity"] == "World"].copy()
quick_peek(df_world_total, expected_columns=["Entity", "Year", "Renewables (% equivalent primary energy)"], label="World renewables share")
expect_rows_between(df_world_total, 55, 60)
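
The micro-task asks whether the world rows align across files. One possible way to check this, sketched below on tiny synthetic frames, is to compare the set of years each series provides. The helper name `year_alignment_report` and the demo values are assumptions made for illustration; they are not part of the course helpers.

```python
import pandas as pd


def year_alignment_report(frames, entity="World"):
    """Compare the set of years each series has for one entity."""
    years = {
        name: set(df.loc[df["Entity"] == entity, "Year"])
        for name, df in frames.items()
    }
    shared = set.intersection(*years.values())
    # Years present in a series but absent from at least one other series.
    extras = {name: sorted(ys - shared) for name, ys in years.items()}
    return sorted(shared), extras


# Tiny synthetic frames standing in for the real CSVs (values are made up).
demo = {
    "total": pd.DataFrame({"Entity": ["World"] * 3, "Year": [2000, 2001, 2002]}),
    "solar": pd.DataFrame({"Entity": ["World"] * 2, "Year": [2001, 2002]}),
}
shared_years, extra_years = year_alignment_report(demo)
print("Shared years:", shared_years)
print("Unmatched years:", extra_years)
```

In the notebook, the same call should work on the loaded files as `year_alignment_report(datasets)`, assuming each file keeps its `Entity` and `Year` columns.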


## Step 2 · Merge the core renewable technologies
This loop reinforces column alignment, joining, and tidy reshaping.

In [None]:

df_hydro = datasets["hydro"][datasets["hydro"]["Entity"] == "World"].rename(columns={"Hydro (% equivalent primary energy)": "Hydro"})
df_wind = datasets["wind"][datasets["wind"]["Entity"] == "World"].rename(columns={"Wind (% equivalent primary energy)": "Wind"})
df_solar = datasets["solar"][datasets["solar"]["Entity"] == "World"].rename(columns={"Solar (% equivalent primary energy)": "Solar"})

df_world = (
    df_world_total[["Year", "Renewables (% equivalent primary energy)"]]
    .rename(columns={"Renewables (% equivalent primary energy)": "Total Renewable"})
    .merge(df_hydro[["Year", "Hydro"]], on="Year", how="left")
    .merge(df_wind[["Year", "Wind"]], on="Year", how="left")
    .merge(df_solar[["Year", "Solar"]], on="Year", how="left")
    .sort_values("Year")
)

quick_peek(df_world, expected_columns=["Year", "Total Renewable", "Hydro", "Wind", "Solar"], label="Merged world table")
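
The left merges above succeed silently even when a year is missing or duplicated in one of the series. As a hedged sketch of how the join could be validated, the example below uses the `validate` and `indicator` arguments of pandas `merge` on synthetic data, then reshapes the result to long ("tidy") form with `melt`. The demo frames and their values are made up.

```python
import pandas as pd

# Synthetic stand-ins for two world-level series (values are made up).
total = pd.DataFrame({"Year": [2000, 2001, 2002], "Total Renewable": [7.0, 7.1, 7.3]})
wind = pd.DataFrame({"Year": [2001, 2002], "Wind": [0.1, 0.2]})

merged = total.merge(
    wind,
    on="Year",
    how="left",
    validate="one_to_one",  # raises MergeError if a Year repeats on either side
    indicator=True,         # adds a _merge column: "both" or "left_only"
)
unmatched_years = merged.loc[merged["_merge"] == "left_only", "Year"].tolist()
print("Years with no wind value:", unmatched_years)

# Wide -> long ("tidy") reshape: one row per Year and series.
tidy = merged.drop(columns="_merge").melt(
    id_vars="Year", var_name="Series", value_name="Share (%)"
)
print(tidy)
```

The wide table suits `stackplot`, while the long table is the shape that `sns.lineplot(..., hue="Series")` expects when several series share one axis.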


### Progress anchor
Reference image to help students gauge their intermediate plot.

In [None]:
display(Image(filename=str(Path.cwd() / 'plots' / 'day02_solution_plot.png'), width=420))

## Step 3 · Story-first chart checklist
The metadata scaffold keeps titles, subtitles, annotations, and sources consistent.

In [None]:

latest_year = int(df_world["Year"].max())
latest_share = df_world.loc[df_world["Year"] == latest_year, "Total Renewable"].iloc[0]
start_year = int(df_world["Year"].min())
start_share = df_world.loc[df_world["Year"] == start_year, "Total Renewable"].iloc[0]
TITLE = "Renewables Tripled, Yet Fossil Fuels Still Dominate"
SUBTITLE = f"Share of global primary energy from renewables, {start_year}–{latest_year}"
ANNOTATION = f"{latest_year}: {latest_share:.1f}% — up from {start_share:.1f}% in {start_year}"
SOURCE = "Our World in Data – Renewable energy share (BP & IEA synthesis)"
UNITS = "Share of total primary energy (%)"
ACCESSIBILITY_NOTES = "Line + stackplot use colorblind-safe palette; axes labelled; annotation highlights latest share."

validate_story_elements({
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
    "ACCESSIBILITY_NOTES": ACCESSIBILITY_NOTES,
})
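
The title asserts a tripling as a fixed string, so it is worth confirming the claim against the computed shares before publishing. The sketch below shows one way to do that; `growth_multiple` is a hypothetical helper and the two example shares are placeholders, not values from the dataset.

```python
def growth_multiple(start_share, latest_share):
    """Return how many times larger the latest share is than the starting share."""
    if start_share <= 0:
        raise ValueError("start_share must be positive to compute a multiple")
    return latest_share / start_share


# Placeholder shares in percent; substitute start_share and latest_share
# from the cell above when running inside the notebook.
example_start, example_latest = 5.0, 15.0
multiple = growth_multiple(example_start, example_latest)
point_gain = example_latest - example_start
print(f"Multiple: {multiple:.1f}x | Gain: {point_gain:.1f} percentage points")
```

If the multiple computed from the real data is well below 3, the headline verb should be adjusted to match the evidence.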


## Step 4 · Visualise the trend and composition
We combine a progress line (the overall trend) with a stacked area view (the composition by technology). Inline comments mirror the starter notebook hints.

In [None]:

baseline_style()

fig, (ax_line, ax_stack) = plt.subplots(2, 1, figsize=(12, 10), sharex=True)

# Line plot for overall renewable share
sns.lineplot(data=df_world, x="Year", y="Total Renewable", ax=ax_line, color="#2a9d8f", linewidth=2.5)
ax_line.fill_between(df_world["Year"], 0, df_world["Total Renewable"], color="#2a9d8f", alpha=0.2)
ax_line.set_title(f"{TITLE}\n{SUBTITLE}", loc="left")
ax_line.set_ylabel(UNITS)
ax_line.axhline(10, color="#b5b5b5", linestyle="--", linewidth=1)
ax_line.text(0.01, -0.25, f"Source: {SOURCE}", transform=ax_line.transAxes, fontsize=10, color="#4f4f4f")
ax_line.text(0.01, -0.32, f"Notes: {ACCESSIBILITY_NOTES}", transform=ax_line.transAxes, fontsize=10, color="#4f4f4f")
ax_line.annotate(
    ANNOTATION,
    xy=(latest_year, latest_share),
    xytext=(latest_year - 20, latest_share + 5),
    arrowprops=dict(arrowstyle="->", color="#264653"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#b5b5b5"),
)

# Stacked area chart for technology mix
ax_stack.stackplot(
    df_world["Year"],
    df_world["Hydro"],
    df_world["Wind"],
    df_world["Solar"],
    labels=["Hydro", "Wind", "Solar"],
    colors=["#457b9d", "#e9c46a", "#f4a261"],
    alpha=0.85,
)
ax_stack.set_ylabel(UNITS)
ax_stack.set_xlabel("Year")
ax_stack.legend(loc="upper left", frameon=False)
ax_stack.set_title("Which technologies make up the renewable share?", loc="left")
ax_stack.grid(alpha=0.3)

plt.tight_layout()
plt.show()
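
Two checks can help before trusting the stacked panel. Any gaps left by the left merges would propagate through the cumulative sums a stacked area relies on, and the three stacked series may not add up to the total line. The sketch below illustrates both on a synthetic table. The values are made up, and filling gaps with zero is an assumption that should be checked against the source data.

```python
import pandas as pd

# Synthetic merged table (made-up shares) with a gap in the Solar series.
demo_world = pd.DataFrame({
    "Year": [2000, 2001, 2002],
    "Total Renewable": [7.0, 7.5, 8.0],
    "Hydro": [6.0, 6.0, 6.0],
    "Wind": [0.5, 0.75, 1.0],
    "Solar": [None, 0.25, 0.5],
})

parts = ["Hydro", "Wind", "Solar"]
# Treating a missing share as zero is an assumption; verify it first.
stack_ready = demo_world[parts].fillna(0.0)
# Whatever the three series do not explain (for example, other renewable sources).
other = demo_world["Total Renewable"] - stack_ready.sum(axis=1)
print(other.tolist())
```

A non-zero residual is worth mentioning in the chart notes so readers do not assume the stack equals the total line.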


### Export checkpoint

In [None]:
save_last_fig('plots/day02_solution_plot.png')

## Step 5 · Reflect on progress and gaps
- **Claim → Evidence → Visual → Takeaway**:
  - **Claim**: Renewable share has roughly tripled since the late 1960s but remains a minority of global energy.
  - **Evidence**: The line plot climbs from low single digits to the mid-teens; the stackplot shows hydro’s dominance alongside rapid wind and solar growth after 2000.
  - **Visual**: Dual-panel figure balancing long-run trend (top) with composition detail (bottom).
  - **Takeaway**: Momentum is building, yet the pace is still slower than climate pledges require.
- **Limitations**: Shares mask absolute demand growth; traditional biomass and emerging tech are excluded.
- **Potential misreads**: Avoid interpreting the stackplot as cumulative capacity; it shows part-to-whole shares.
- **Next questions**: How quickly must renewables grow to bend the emissions curve? Which countries are accelerating the mix fastest?
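
As a starting point for the country question, the sketch below ranks entities by their percentage-point change in share between two years. The entities `A` and `B`, the column name `Share`, and all values are hypothetical; the real files use longer column names and would need renaming first.

```python
import pandas as pd

# Made-up shares for two hypothetical entities; not OWID values.
demo = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Year": [2012, 2022, 2012, 2022],
    "Share": [10.0, 14.0, 20.0, 21.0],
})


def change_between(df, start_year, end_year, value_col="Share"):
    """Percentage-point change in value_col per entity between two years."""
    wide = df.pivot(index="Entity", columns="Year", values=value_col)
    return (wide[end_year] - wide[start_year]).sort_values(ascending=False)


pace = change_between(demo, 2012, 2022)
print(pace)
```

Ranking by percentage points favours entities with large absolute gains; a ranking by relative growth would tell a different story, so the choice of metric belongs in the narration.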

## Process quality checklist
✅ Loaded four aligned datasets • ✅ Filtered and merged with diagnostics • ✅ Completed story scaffold • ✅ Built accessible line + stack visuals • ✅ Reflected on limitations and policy angles.