## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%202_%20Energy%20Mix%20Trends.ipynb)

# ⚡ Day 2 – Exploring the Global Energy Mix

Today we practice multi-series storytelling: comparing the growth of **hydro, wind, solar, and total renewables** within the global energy system. The cadence stays the same — short loops that alternate reading, doing, and checking — but the datasets get wider and the narrative gets more nuanced.

> **Teacher sidebar — pacing & differentiation**  
> • Target time: ~50 minutes (story scaffold + four work loops + debrief).  
> • Diagnostic anchors: `quick_snapshot` after each merge surfaces column misalignment quickly.  
> • Differentiation: offer a stretch goal to compute regional breakouts; support learners by pre-filtering the `World` subset if needed.

## 🗺️ Roadmap for Today

Loop | Focus | What success looks like
--- | --- | ---
0 | Story scaffold | Claim and annotation drafted before coding
1 | Load + filter | World-only tables with expected columns
2 | Merge + tidy | One tidy DataFrame with % share columns
3 | Self-diagnose | Checks for ranges, nulls, and inflection points
4 | Visualize & interpret | Accessible multi-series chart + narrative takeaway

## 🗂️ Data Cards — Our World in Data Renewable Energy Series

- **Sources**: Our World in Data, based on BP Statistical Review and Ember (released 2023).  
- **Temporal coverage**: 1965 – 2022 (annual).  
- **Units**: Percent share of primary energy consumption.  
- **Method notes**: BP converts non-fossil electricity to primary energy equivalents (the substitution method); solar and wind series become non-zero in the late 20th century; values may be revised with each annual BP update.  
- **Known caveats**: Hydro dominates early decades; negative values are not possible; missing values reflect years before technology adoption.  
- **Update cadence**: Yearly, typically every June.
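The method note about primary energy equivalents is easier to grasp with a little arithmetic. Under the substitution approach, non-fossil electricity is counted as the fossil-fuel input a thermal plant would need to generate it. A minimal sketch, assuming a single constant plant efficiency of 40% (the published series use a year-specific value) and hypothetical helper names:

```python
def primary_energy_equivalent(electricity_twh: float, efficiency: float = 0.40) -> float:
    """Fossil-fuel input that would be needed to generate the same electricity.

    The 0.40 default efficiency is an illustrative assumption; the published
    series use a year-specific efficiency instead of one constant.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return electricity_twh / efficiency


def share_percent(part: float, total: float) -> float:
    """Percentage share of `part` in `total` (same units for both)."""
    return 100 * part / total


# Toy numbers, not real statistics: 100 TWh of wind power in a 10,000 TWh primary-energy system.
wind_primary = primary_energy_equivalent(100.0)  # 250 TWh primary-equivalent at 40% efficiency
print(f"{wind_primary:.0f} TWh primary-equivalent")
print(f"{share_percent(wind_primary, 10_000):.1f}% share (vs {share_percent(100.0, 10_000):.1f}% if counted 1:1)")
```

The same electricity counts for a larger share once converted, which is one reason the units line must say *primary energy* rather than electricity.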

> 🎯 **Integrity cue**: Always state that these are shares of *primary energy*, not final electricity. Comparing percentages keeps the scale honest and avoids overstating the role of a single technology.

In [None]:
# Shared utilities for the DS4S course notebooks
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
import seaborn as sns

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.dpi': 120,
    'axes.titlesize': 16,
    'axes.labelsize': 13,
    'axes.titlepad': 12,
    'figure.figsize': (10, 5),
})


def load_csv(path: Path, **read_kwargs) -> pd.DataFrame:
    '''Load a CSV and report the basic shape.'''
    df = pd.read_csv(path, **read_kwargs)
    print(f"✅ Loaded {path.name} with {df.shape[0]:,} rows and {df.shape[1]} columns")
    return df


def validate_columns(df: pd.DataFrame, required):
    '''Raise if any required column is absent.'''
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    print(f"✅ Columns present: {', '.join(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int):
    '''Raise if the row count falls outside [low, high].'''
    rows = df.shape[0]
    if not (low <= rows <= high):
        raise ValueError(f"Row count {rows} outside expected range {low}-{high}")
    print(f"✅ Row count {rows} within expected {low}-{high}")


def quick_snapshot(df: pd.DataFrame, name: str, n: int = 3):
    '''Print shape, columns and null counts, then display the first n rows.'''
    print(f"\n{name} snapshot → shape={df.shape}")
    print("Columns:", list(df.columns))
    print("Nulls:\n", df.isna().sum())
    display(df.head(n))


def ensure_story_elements(title: str, subtitle: str, annotation: str, source: str, units: str):
    '''Require every storytelling field to be non-blank before any plotting.'''
    fields = {
        'TITLE': title,
        'SUBTITLE': subtitle,
        'ANNOTATION': annotation,
        'SOURCE': source,
        'UNITS': units,
    }
    missing = [key for key, value in fields.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"Please complete these storytelling fields: {', '.join(missing)}")
    print("✅ Story scaffold complete →", ", ".join(f"{k}: {v}" for k, v in fields.items()))
    return fields


def save_last_fig(filename: str):
    '''Save the current Matplotlib figure to ./plots at 300 dpi.'''
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    fig = plt.gcf()
    if not fig.axes:
        raise RuntimeError("Run the plotting cell before saving.")
    output_path = plots_dir / filename
    fig.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"📁 Saved figure to {output_path}")


def save_plotly_fig(fig, filename: str):
    '''Save a Plotly figure to ./plots as a standalone HTML file.'''
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    fig.write_html(str(output_path))
    print(f"📁 Saved interactive figure to {output_path}")

## 🔁 Loop 0 — Story scaffold (3 min)

Draft the narrative frame before touching data. What change are we trying to reveal?

In [None]:
TITLE = "Renewables More Than Doubled Their Share of Global Energy"
SUBTITLE = "World primary energy mix, 1965–2022"
ANNOTATION = "Wind + solar take off after ~2010."
SOURCE = "Our World in Data (BP Statistical Review 2023, Ember Solar/Wind supplement)"
UNITS = "Share of primary energy (%)"

story_fields = ensure_story_elements(TITLE, SUBTITLE, ANNOTATION, SOURCE, UNITS)

## 🔁 Loop 1 — Load and filter the World aggregates (8 min)

We only need the `World` rows today. Use the shared helpers to load and validate each CSV.

In [None]:
data_dir = Path.cwd() / "data"
renewables = load_csv(data_dir / "01 renewable-share-energy.csv")
hydro = load_csv(data_dir / "06 hydro-share-energy.csv")
wind = load_csv(data_dir / "10 wind-share-energy.csv")
solar = load_csv(data_dir / "14 solar-share-energy.csv")

# Exact value column expected in each OWID file.
world_inputs = {
    "Total Renewable": (renewables, "Renewables (% equivalent primary energy)"),
    "Hydro": (hydro, "Hydro (% equivalent primary energy)"),
    "Wind": (wind, "Wind (% equivalent primary energy)"),
    "Solar": (solar, "Solar (% equivalent primary energy)"),
}

world_frames = {}
for label, (frame, value_col) in world_inputs.items():
    # Validate the column we actually use (checking frame.columns[-1] would always pass).
    validate_columns(frame, ["Entity", "Code", "Year", value_col])
    world_frames[label] = frame.query("Entity == 'World'")[["Year", value_col]]
    # 1965-2022 is 58 annual rows, so 50-70 leaves room for revisions.
    expect_rows_between(world_frames[label], 50, 70)

In [None]:
for label, df in world_frames.items():
    quick_snapshot(df.tail(), name=f"{label} share (tail)")
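`expect_rows_between` only counts rows, so a gap plus a stray extra row would still pass. A complementary check, sketched here on a toy frame (the function name `check_year_coverage` and the sample values are hypothetical), confirms that each year appears once and lists any years missing from the expected window:

```python
import pandas as pd


def check_year_coverage(df: pd.DataFrame, first: int, last: int) -> list:
    """Return the years in first..last (inclusive) that are missing from df["Year"].

    Raises ValueError on duplicate years, which usually means the Entity filter
    kept more than one region.
    """
    years = df["Year"]
    if years.duplicated().any():
        raise ValueError("Duplicate years found; check the Entity filter.")
    return sorted(set(range(first, last + 1)) - set(years))


# Toy World subset (not real values) with a deliberate gap in 1967.
toy = pd.DataFrame({"Year": [1965, 1966, 1968, 1969], "Share": [1.0, 1.1, 1.2, 1.3]})
print(check_year_coverage(toy, 1965, 1969))
```

On the notebook's data you could call `check_year_coverage(world_frames["Hydro"], 1965, 2022)`. A non-empty list for solar or wind may simply reflect the data card's note that those series start when the technology was adopted.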

## 🔁 Loop 2 — Merge and tidy for plotting (8 min)

Merge the technology-specific series into one wide DataFrame (one column per source), then melt it to long (tidy) form for plotting and diagnostics.

In [None]:
energy_wide = world_frames["Total Renewable"].rename(columns={"Renewables (% equivalent primary energy)": "Total Renewable"})
for name, df in world_frames.items():
    if name == "Total Renewable":
        continue
    col_name = df.columns[-1]
    energy_wide = energy_wide.merge(df.rename(columns={col_name: name}), on="Year", how="left")

energy_wide["Year"] = energy_wide["Year"].astype(int)
energy_long = energy_wide.melt(id_vars="Year", var_name="Source", value_name="Share")
energy_long["Share"] = energy_long["Share"].clip(lower=0)
energy_long = energy_long.dropna()
quick_snapshot(energy_wide.tail(), name="Merged energy shares")
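Why `how="left"` and `dropna()`? The solar and wind series can have fewer rows than the total. A self-contained toy version (made-up values, not the OWID series) shows the left merge keeping every year, the gap becoming `NaN`, and the melt turning value columns into `Year / Source / Share` rows:

```python
import pandas as pd

# Toy values (not the OWID series): solar has no row for 2000.
toy_total = pd.DataFrame({"Year": [2000, 2001, 2002], "Total Renewable": [7.0, 7.1, 7.3]})
toy_solar = pd.DataFrame({"Year": [2001, 2002], "Solar": [0.01, 0.02]})

# Left merge keeps every year from the total series; the missing solar year becomes NaN.
toy_wide = toy_total.merge(toy_solar, on="Year", how="left")

# Melt to long (tidy) form: one row per (Year, Source) pair, 3 years x 2 sources = 6 rows.
toy_long = toy_wide.melt(id_vars="Year", var_name="Source", value_name="Share")
print(len(toy_long))

# Drop the (2000, Solar) placeholder so plots and checks only see real observations.
toy_long = toy_long.dropna()
print(len(toy_long))
```

An inner merge would instead silently drop the year 2000 from *every* series, which is why the notebook merges onto the longest series with `how="left"`.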

## 🔁 Loop 3 — Self-diagnose with quick checks (6 min)

Look for data quality issues before plotting:

- Shares should stay between 0 and 100.  
- Total renewable share should exceed the sum of solar + wind alone.  
- Growth should accelerate after ~2000.

In [None]:
assert energy_long["Share"].between(0, 100).all(), "Energy shares should stay within 0-100%."
recent = energy_wide.query("Year >= 2000")
print("Average renewable share since 2000: {:.1f}%".format(recent["Total Renewable"].mean()))
print(
    "Solar + Wind share 2022: {:.1f}%".format(
        energy_wide.loc[energy_wide["Year"].idxmax(), ["Wind", "Solar"]].sum()
    )
)
quick_snapshot(energy_long.query("Year in [1970, 2000, 2022]").head(6), name="Sample long-form rows")
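The cell above checks ranges but not the last bullet ("Growth should accelerate after ~2000") or the nulls named in the roadmap. One way to sketch both, assuming "acceleration" means a larger average year-over-year change after the split year than before it (the helper `mean_annual_change` and the toy values are illustrative):

```python
import numpy as np
import pandas as pd


def mean_annual_change(wide: pd.DataFrame, column: str, start: int, end: int) -> float:
    """Average year-over-year change (percentage points) for years start..end inclusive."""
    window = wide[wide["Year"].between(start, end)].sort_values("Year")
    return float(window[column].diff().mean())  # the first diff is NaN and is skipped


# Toy annual values (not the OWID series); Solar is missing before 1999.
toy = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001, 2002, 2003],
    "Wind": [0.0, 0.0, 0.1, 0.1, 0.2, 0.4, 0.7],
    "Solar": [np.nan, np.nan, 0.0, 0.0, 0.01, 0.01, 0.02],
})

# Null check: which columns have gaps, and how many?
nulls = toy.isna().sum()
print(nulls[nulls > 0])

# Acceleration check: compare the average annual change before and after 2000.
before = mean_annual_change(toy, "Wind", 1997, 2000)
after = mean_annual_change(toy, "Wind", 2000, 2003)
print(f"Wind: {before:.2f} pp/yr up to 2000 vs {after:.2f} pp/yr after")
```

On the notebook's data you might compare `mean_annual_change(energy_wide, "Wind", 1990, 2000)` with `mean_annual_change(energy_wide, "Wind", 2000, 2022)` and inspect `energy_wide.isna().sum()`; early gaps in wind and solar are expected per the data card.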

## 🔁 Loop 4 — Visualize responsibly (12 min)

Use a colorblind-safe palette, label every axis, and highlight the inflection when wind and solar accelerate. Finish with the Claim → Evidence table.

In [None]:
sns.set_theme(style="whitegrid")
series_order = ["Total Renewable", "Hydro", "Wind", "Solar"]
# Fixed source → color mapping from seaborn's colorblind-safe palette.
color_map = dict(zip(series_order, sns.color_palette("colorblind", len(series_order))))

by_year = energy_wide.set_index("Year")
latest_year = int(by_year.index.max())

fig, ax = plt.subplots()
for source, group in energy_long.groupby("Source"):
    ax.plot(group["Year"], group["Share"], label=source, linewidth=2.5, color=color_map[source])

# Title on top, subtitle directly beneath it, then axis labels with units.
fig.suptitle(TITLE, fontsize=16, fontweight="bold")
ax.set_title(SUBTITLE, fontsize=13, color="#444444")
ax.set_xlabel("Year")
ax.set_ylabel(UNITS)
ax.set_ylim(0, by_year[series_order].max().max() + 5)
ax.legend(title="Source", loc="upper left")

# Mark the year wind and solar start to accelerate.
inflection_year = 2010
inflection_value = by_year.loc[inflection_year, ["Wind", "Solar"]].sum()
ax.axvline(inflection_year, color="#888888", linestyle="--", linewidth=1)
ax.scatter([inflection_year], [inflection_value], color="#444444", zorder=3)

# Point at the latest combined wind + solar share; keep the text box inside the axes.
ax.annotate(
    ANNOTATION,
    xy=(latest_year, by_year.loc[latest_year, ["Wind", "Solar"]].sum()),
    xytext=(0.30, 0.80),
    textcoords="axes fraction",
    arrowprops=dict(arrowstyle="->", color="#444444"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#666666"),
)
ax.text(
    0.01,
    -0.2,
    f"{SOURCE} | Units: {UNITS}",
    transform=ax.transAxes,
    fontsize=10,
    color="#555555",
)
plt.tight_layout()
plt.show()

In [None]:
assert energy_wide["Total Renewable"].iloc[-1] >= energy_wide[["Wind", "Solar"]].sum(axis=1).iloc[-1]
print("Latest total renewable share: {:.1f}%".format(energy_wide["Total Renewable"].iloc[-1]))
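The plotting cell hard-codes `inflection_year = 2010`. If you would rather let the data choose, one simple rule (an assumption, not a formal inflection-point test) is the first year in which the combined wind + solar share rises by at least a chosen threshold over the previous year. A sketch on toy values, assuming one row per year, with the hypothetical helper `first_fast_growth_year`:

```python
from typing import Optional

import pandas as pd


def first_fast_growth_year(wide: pd.DataFrame, threshold_pp: float = 0.25) -> Optional[int]:
    """First year in which wind + solar share rose by >= threshold_pp over the prior row.

    Hypothetical helper: "fast growth" is a simple threshold rule here.
    Returns None if no year meets the threshold.
    """
    ordered = wide.sort_values("Year")
    combined = ordered[["Wind", "Solar"]].sum(axis=1)  # sum() skips NaN, so gaps count as 0
    jumps = combined.diff()                            # change vs the previous row
    hits = ordered.loc[jumps >= threshold_pp, "Year"]
    return int(hits.iloc[0]) if not hits.empty else None


# Toy values (not the OWID series).
toy = pd.DataFrame({
    "Year": [2006, 2007, 2008, 2009, 2010, 2011],
    "Wind": [0.3, 0.35, 0.4, 0.5, 0.8, 1.1],
    "Solar": [0.0, 0.0, 0.0, 0.0, 0.1, 0.2],
})
print(first_fast_growth_year(toy))
```

In the notebook you might try `inflection_year = first_fast_growth_year(energy_wide)`. The threshold is a judgment call, so state it next to the chart if you use it.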

In [None]:
from IPython.display import Markdown

claim = "Renewables moved from niche to mainstream in the global energy mix."
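first_row, last_row = energy_wide.iloc[0], energy_wide.iloc[-1]  # earliest and latest years
evidence = (
    f"Total renewable share climbed from {first_row['Total Renewable']:.1f}% in {int(first_row['Year'])} "
    f"to {last_row['Total Renewable']:.1f}% in {int(last_row['Year'])}, while wind + solar rose from "
    f"{first_row[['Wind', 'Solar']].sum():.1f}% to {last_row[['Wind', 'Solar']].sum():.1f}%."
)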
visual = "Four-line Matplotlib chart with colorblind-safe palette and annotated 2022 milestone."
takeaway = "Energy transitions compound slowly, so celebrating growth means acknowledging hydro’s legacy and the recent surge of wind+solar."
Markdown(
    f"""
| Claim | Evidence | Visual | Takeaway |
| --- | --- | --- | --- |
| {claim} | {evidence} | {visual} | {takeaway} |
"""
)

## 💾 Save the figure for the teacher dashboard

In [None]:
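Path("plots").mkdir(exist_ok=True)  # the inline backend closes figures after each cell, so plt.gcf() would be empty here
fig.savefig("plots/day02_solution_plot.png", dpi=300, bbox_inches="tight")  # save the Loop 4 figure object directly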

## ✅ Exit Ticket

- Which renewable technology changed the fastest after 2010?  
- What questions would you ask about regions hidden inside the `World` aggregate?  
- How might you extend this chart for fast finishers (e.g., add stacked area or regional comparison)?
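For the stacked-area extension, one pitfall is stacking "Total Renewable" on top of its own components. A sketch on toy values (not the OWID series) that stacks hydro, wind, solar and a residual "Other renewables" layer, assuming the residual is simply the total minus the three named sources:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy values (not the OWID series), in % of primary energy.
toy = pd.DataFrame({
    "Year": [2000, 2010, 2020],
    "Total Renewable": [8.0, 10.0, 13.0],
    "Hydro": [6.0, 6.5, 7.0],
    "Wind": [0.5, 1.5, 3.0],
    "Solar": [0.0, 0.5, 2.0],
})
# Assumed residual: whatever the total contains beyond the three named sources.
toy["Other renewables"] = toy["Total Renewable"] - toy[["Hydro", "Wind", "Solar"]].sum(axis=1)

# Stack only the components; the top edge of the stack then traces the total.
layers = ["Hydro", "Wind", "Solar", "Other renewables"]
stack_fig, stack_ax = plt.subplots()
stack_ax.stackplot(toy["Year"], *(toy[c] for c in layers), labels=layers)
stack_ax.set_xlabel("Year")
stack_ax.set_ylabel("Share of primary energy (%)")
stack_ax.legend(title="Source", loc="upper left")
```

On the real data, check that the residual column is never negative before stacking; a negative layer would signal mismatched definitions between the files.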