## 🔗 Open Solution in Google Colab

[![Open Solution in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day02/solution/day02_solution.ipynb)

# ⚡ Day 2 – Tracking the Renewable Energy Transition

Today you will layer multiple datasets to see how the world’s energy mix is changing. Each step builds toward a story-ready comparison of total renewable share and the technologies driving it.

### Data Card — Our World in Data: Energy Mix (1965–2022)

| Dataset | Coverage | Units | Notes & Caveats |
| --- | --- | --- | --- |
| `01 renewable-share-energy.csv` | Global + countries, annual | % of primary energy from renewables | Derived from BP Statistical Review; revisions released yearly. |
| `06 hydro-share-energy.csv` | Global + countries, annual | % share | Hydro share of total primary energy. |
| `10 wind-share-energy.csv` | Global + countries, annual | % share | Wind generation share; minor gaps interpolated. |
| `14 solar-share-energy.csv` | Global + countries, annual | % share | Solar share; sparse before 1990. |
| Metadata | Updated July 2024 | Percent of primary energy consumption | Hydro, wind, and solar shares will not sum to the total renewable share because other renewables (bioenergy, geothermal) are excluded. |


### Step 1 · Imports and shared helpers

We will use pandas for wrangling plus Matplotlib/Seaborn for plotting.

In [None]:
# Ensure shared helpers are available when running on Google Colab
import pathlib
import urllib.request

UTILS_PATH = pathlib.Path("utils.py")
if not UTILS_PATH.exists():
    UTILS_URL = "https://raw.githubusercontent.com/DavidLangworthy/ds4s/master/utils.py"
    print("Fetching shared helper module…")
    UTILS_PATH.write_bytes(urllib.request.urlopen(UTILS_URL).read())

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from utils import (
    baseline_style,
    expect_rows_between,
    load_csv,
    quick_check,
    save_last_fig,
    validate_columns,
    validate_story_elements,
)

sns.set_palette("deep")
baseline_style()
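
The helpers imported from `utils` are not shown in this notebook. To give a feel for what such checks involve, the sketch below writes two stand-ins in plain pandas. The names `check_columns` and `check_row_count` are hypothetical, and their behaviour is an assumption rather than the actual `utils.py` implementation.

```python
import pandas as pd


def check_columns(frame: pd.DataFrame, required: list[str]) -> None:
    """Hypothetical stand-in: raise if any required column is missing."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")


def check_row_count(frame: pd.DataFrame, minimum: int, maximum: int) -> None:
    """Hypothetical stand-in: raise if the row count falls outside the range."""
    if not minimum <= len(frame) <= maximum:
        raise ValueError(f"Expected {minimum} to {maximum} rows, found {len(frame)}")


# Made-up frame, only to exercise the two checks.
demo = pd.DataFrame({"Year": [2020, 2021, 2022], "TotalRenewables": [1.0, 2.0, 3.0]})
check_columns(demo, ["Year", "TotalRenewables"])
check_row_count(demo, minimum=1, maximum=5)
```

Failing loudly with a `ValueError` is one design choice; the real helpers may print a warning instead.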


### Step 2 · Load the global renewable share dataset

Start with the total renewable share to anchor the analysis.

In [None]:
renewable_total = load_csv("data/01 renewable-share-energy.csv")
quick_check(renewable_total, name="Renewable share (all entities)")


<details>
<summary>Need a nudge?</summary>

- Expect columns like `Entity`, `Code`, `Year`, `Renewables (% equivalent primary energy)`.
- If your dataset looks empty, check the relative path and quoting (note the leading `01` and the space in the file name).

</details>
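
If you want to confirm the shape yourself rather than rely on `quick_check`, a few plain pandas calls cover the essentials. The sketch below runs on a tiny made-up frame (`toy`, with invented values) that uses the column names listed above; in the notebook you would swap in `renewable_total`.

```python
import pandas as pd

value_col = "Renewables (% equivalent primary energy)"

# Invented values, for illustration only.
toy = pd.DataFrame({
    "Entity": ["World", "World", "Brazil", "Brazil"],
    "Code": ["OWID_WRL", "OWID_WRL", "BRA", "BRA"],
    "Year": [2021, 2022, 2021, 2022],
    value_col: [13.0, 14.0, 46.0, 48.0],
})

n_entities = toy["Entity"].nunique()                       # how many entities are present
year_range = (int(toy["Year"].min()), int(toy["Year"].max()))  # first and last year
has_world = "World" in set(toy["Entity"])                  # is the aggregate row there?
n_missing = int(toy[value_col].isna().sum())               # missing values in the share column

print(n_entities, year_range, has_world, n_missing)
```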

### Step 3 · Focus on the world aggregate

Each file holds one row per entity and year, so filter to `World` now; merging on `Year` alone would otherwise match rows across different entities.

In [None]:
world_total = renewable_total.loc[
    renewable_total["Entity"] == "World",
    ["Year", "Renewables (% equivalent primary energy)"]
].rename(columns={"Renewables (% equivalent primary energy)": "TotalRenewables"})

validate_columns(world_total, ["Year", "TotalRenewables"])
expect_rows_between(world_total, minimum=50, maximum=70)
quick_check(world_total.tail(), name="World renewable share")
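
To see why the filter has to come before the merge, the sketch below joins two small made-up frames on `Year` with and without filtering. All values and the names `total`, `hydro_toy`, `unfiltered`, and `world_only` are illustrative.

```python
import pandas as pd

# Invented values, for illustration only.
total = pd.DataFrame({
    "Entity": ["World", "World", "Brazil", "Brazil"],
    "Year": [2021, 2022, 2021, 2022],
    "TotalRenewables": [13.0, 14.0, 46.0, 48.0],
})
hydro_toy = pd.DataFrame({
    "Entity": ["World", "World", "Brazil", "Brazil"],
    "Year": [2021, 2022, 2021, 2022],
    "Hydro": [6.0, 6.5, 28.0, 29.0],
})

# Without filtering, each year pairs every entity on the left with every
# entity on the right: 2 x 2 entities x 2 years = 8 rows.
unfiltered = total.merge(hydro_toy, on="Year")

# Filtering first gives one row per year; validate="one_to_one" makes pandas
# raise a MergeError if either side has duplicate years.
world_only = total.loc[total["Entity"] == "World", ["Year", "TotalRenewables"]].merge(
    hydro_toy.loc[hydro_toy["Entity"] == "World", ["Year", "Hydro"]],
    on="Year",
    validate="one_to_one",
)

print(len(unfiltered), len(world_only))
```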


### Step 4 · Load the technology-specific shares

Repeat the pattern for hydro, wind, and solar.

In [None]:
hydro = load_csv("data/06 hydro-share-energy.csv")
wind = load_csv("data/10 wind-share-energy.csv")
solar = load_csv("data/14 solar-share-energy.csv")

for label, frame in {"Hydro": hydro, "Wind": wind, "Solar": solar}.items():
    quick_check(frame.head(), name=f"{label} share (preview)")


<details>
<summary>Check your columns</summary>

- Hydro column: `Hydro (% equivalent primary energy)`
- Wind column: `Wind (% equivalent primary energy)`
- Solar column: `Solar (% equivalent primary energy)`

</details>

### Step 5 · Align on the world totals and merge

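Filter each technology file to `World`, rename its value column to a short label, then left-merge onto `world_total` so every year in the total series is kept.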

In [None]:
world_sources = []
for label, frame, value_col in [
    ("Hydro", hydro, "Hydro (% equivalent primary energy)"),
    ("Wind", wind, "Wind (% equivalent primary energy)"),
    ("Solar", solar, "Solar (% equivalent primary energy)"),
]:
    subset = frame.loc[frame["Entity"] == "World", ["Year", value_col]].rename(columns={value_col: label})
    world_sources.append(subset)

world_mix = world_total.copy()
for piece in world_sources:
    world_mix = world_mix.merge(piece, on="Year", how="left")

validate_columns(world_mix, ["Year", "TotalRenewables", "Hydro", "Wind", "Solar"])
quick_check(world_mix.tail(), name="World renewable mix")
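
A left merge keeps every year from the total series, so a technology with no data for a given year shows up as `NaN`. The sketch below counts those gaps and derives the remainder not covered by hydro, wind, and solar. The frame `mix` and the `Other` column are hypothetical, the values are invented, and treating a missing share as zero is an assumption you should check against the data.

```python
import pandas as pd

nan = float("nan")

# Invented values, for illustration only.
mix = pd.DataFrame({
    "Year": [1965, 2000, 2022],
    "TotalRenewables": [6.0, 7.0, 14.0],
    "Hydro": [5.5, 6.0, 6.5],
    "Wind": [nan, 0.1, 3.0],
    "Solar": [nan, nan, 2.0],
})
tech_cols = ["Hydro", "Wind", "Solar"]

# Count missing values per technology after the merge.
missing_per_column = mix[tech_cols].isna().sum()

# sum(axis=1) skips NaN, so a missing share counts as zero here (assumption).
mix["Other"] = mix["TotalRenewables"] - mix[tech_cols].sum(axis=1)

print(missing_per_column)
print(mix[["Year", "Other"]])
```

A clearly negative `Other` value would suggest the series come from different revisions or were filtered inconsistently.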


### Step 6 · Reshape for flexible plotting

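Long format (one row per year and technology) is what Seaborn expects for the bar chart in Step 10; the stacked area in Step 9 still uses the wide `world_mix`.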

In [None]:
world_long = world_mix.melt(
    id_vars="Year",
    value_vars=["Hydro", "Wind", "Solar"],
    var_name="Technology",
    value_name="Share",
)
quick_check(world_long.tail(), name="Long-format renewables")
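
As a sanity check on the reshape, the long frame should have one row per year and technology, and pivoting it back should recover the wide values. The sketch below shows this on a small made-up frame; `wide`, `long`, and `back` are illustrative names.

```python
import pandas as pd

# Invented values, for illustration only.
wide = pd.DataFrame({
    "Year": [2021, 2022],
    "Hydro": [6.0, 6.5],
    "Wind": [2.5, 3.0],
    "Solar": [1.5, 2.0],
})

# Wide to long: 2 years x 3 technologies = 6 rows.
long = wide.melt(id_vars="Year", var_name="Technology", value_name="Share")

# Long back to wide: pivot sorts the technology columns alphabetically,
# so reorder them before comparing.
back = long.pivot(index="Year", columns="Technology", values="Share").reset_index()
back = back[["Year", "Hydro", "Wind", "Solar"]]

print(len(long))
print(back)
```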


### Step 7 · Story checklist

Fill in the narrative scaffolding before rendering charts.

In [None]:
story = {
    "title": "Renewables Triple, but Fossil Fuels Still Dominate",
    "subtitle": "Global share of primary energy from renewables and the mix of hydro, wind, and solar (1965–2022)",
    "annotation": "Wind and solar now provide >6% combined—small but accelerating since 2005.",
    "source": "Our World in Data, BP Statistical Review 2024 edition",
    "units": "% of primary energy",
}
validate_story_elements(story)
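
A hard-coded figure in the annotation can drift out of date when the dataset is revised. One option, sketched below on made-up numbers, is to build the sentence from the merged frame. `mix_toy`, `combined`, and `annotation` are hypothetical names, and the values are invented.

```python
import pandas as pd

# Invented values, for illustration only.
mix_toy = pd.DataFrame({
    "Year": [2021, 2022],
    "Wind": [3.1, 3.5],
    "Solar": [2.1, 2.6],
})

# Pick the row for the most recent year and add the two shares.
latest = mix_toy.loc[mix_toy["Year"].idxmax()]
combined = latest["Wind"] + latest["Solar"]

annotation = f"Wind and solar provide {combined:.1f}% combined in {int(latest['Year'])}."
print(annotation)
```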


### Step 8 · Line plot of the global renewable share

Confirm the curve climbs steadily after 2000.

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(world_mix["Year"], world_mix["TotalRenewables"], color="#2a9d8f", linewidth=2.2, marker="o", label="Total renewables")
ax.fill_between(world_mix["Year"], world_mix["TotalRenewables"], color="#2a9d8f", alpha=0.1)
ax.set_title(story["title"], pad=14)
ax.set_xlabel("Year")
ax.set_ylabel(story["units"])
ax.grid(alpha=0.3)
ax.text(0.01, -0.2, f"Source: {story['source']}", transform=ax.transAxes, fontsize=9, ha="left")
annotation_y = world_mix.loc[world_mix["Year"] == 2015, "TotalRenewables"].iloc[0]
ax.annotate(
    story["annotation"],
    xy=(2015, annotation_y),
    xytext=(1995, 22),
    arrowprops=dict(arrowstyle="->", color="#555"),
    fontsize=11,
)
plt.tight_layout()
plt.show()


<details>
<summary>Quick diagnostic</summary>

- The world share should hover around 5% in 2000 and reach roughly 15% by 2022.
- If the line is flat, double-check you filtered for `World`.

</details>
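
The same diagnostic can be run in code before plotting. The sketch below uses a made-up series (`series`, with round numbers that only echo the approximate figures above) to check that there is one row per year, that the line is not flat, and how far the share has moved since 2000.

```python
import pandas as pd

# Round illustrative numbers, not real data.
series = pd.DataFrame({
    "Year": [2000, 2010, 2022],
    "TotalRenewables": [5.0, 8.0, 15.0],
})

# Duplicate years usually mean the World filter was skipped.
one_row_per_year = series["Year"].is_unique

by_year = series.set_index("Year")["TotalRenewables"]
is_flat = by_year.nunique() == 1                              # a single repeated value
change = by_year.loc[by_year.index.max()] - by_year.loc[2000]  # percentage points since 2000

print(one_row_per_year, is_flat, change)
```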

### Step 9 · Stacked area chart for the technology mix

Expect hydro to dominate early years, with wind and solar climbing after 2005.

In [None]:
fig_stack, ax_stack = plt.subplots(figsize=(10, 5))
tech_order = ["Hydro", "Wind", "Solar"]
colors = ["#264653", "#e76f51", "#f4a261"]
ax_stack.stackplot(
    world_mix["Year"],
    *(world_mix[tech] for tech in tech_order),
    labels=tech_order,
    colors=colors,
    alpha=0.85,
)
ax_stack.set_xlabel("Year")
ax_stack.set_ylabel(story["units"])
ax_stack.set_title("What powers global renewables?")
ax_stack.legend(loc="upper left")
ax_stack.grid(alpha=0.25)
plt.tight_layout()
plt.show()
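
Stacked areas are cumulative, so a `NaN` in a lower layer propagates to every layer drawn above it for that year. If the merged frame has gaps in the early years, one option is to fill them with zero before stacking, as sketched below. The frame `toy_mix` is made up, and treating a missing share as zero is an assumption to verify against the data.

```python
import matplotlib.pyplot as plt
import pandas as pd

nan = float("nan")

# Invented values, for illustration only.
toy_mix = pd.DataFrame({
    "Year": [1980, 2000, 2022],
    "Hydro": [5.5, 6.0, 6.5],
    "Wind": [nan, 0.1, 3.0],
    "Solar": [nan, nan, 2.0],
})
tech = ["Hydro", "Wind", "Solar"]

# Replace gaps with zero so the cumulative sum stays defined (assumption).
filled = toy_mix[tech].fillna(0.0)

fig_demo, ax_demo = plt.subplots(figsize=(6, 3))
layers = ax_demo.stackplot(toy_mix["Year"], *(filled[t] for t in tech), labels=tech)
ax_demo.legend(loc="upper left")
```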


### Step 10 · Compare the latest year with a bar chart

A simple bar chart, one bar per technology, clarifies the current composition.

In [None]:
latest_year = int(world_mix["Year"].max())
latest_mix = world_long.loc[world_long["Year"] == latest_year]
fig_bar, ax_bar = plt.subplots(figsize=(6, 4))
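# Map colours through hue; recent seaborn versions deprecate palette without hue.
sns.barplot(data=latest_mix, x="Technology", y="Share", hue="Technology", dodge=False, palette=colors, ax=ax_bar)
if ax_bar.get_legend() is not None:  # drop the redundant legend if one was drawn
    ax_bar.get_legend().remove()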
ax_bar.set_title(f"Renewable mix in {latest_year}")
ax_bar.set_ylabel(story["units"])
ax_bar.set_xlabel("Technology")
ax_bar.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()
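
With only three bars, printing the value on each one saves the reader from estimating against the axis. The sketch below uses plain Matplotlib on made-up shares; `Axes.bar_label` requires Matplotlib 3.4 or newer, and the names `shares`, `bars`, and `labels` are illustrative.

```python
import matplotlib.pyplot as plt

# Invented values, for illustration only.
shares = {"Hydro": 6.5, "Wind": 3.0, "Solar": 2.0}

fig_demo, ax_demo = plt.subplots(figsize=(5, 3))
bars = ax_demo.bar(list(shares), list(shares.values()), color=["#264653", "#e76f51", "#f4a261"])

# Write each bar's height above it, formatted to one decimal place with a percent sign.
labels = ax_demo.bar_label(bars, fmt="%.1f%%", padding=3)
ax_demo.set_ylabel("% of primary energy")
```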


### Step 11 · Save your favourite figure

Pick the chart that best communicates your story before moving on.

In [None]:
save_last_fig(fig_stack, "plots/day02_solution_stack.png")
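
How `save_last_fig` writes the file is not shown in this notebook. A plain Matplotlib equivalent, sketched below under that caveat, creates the output folder first and then calls `Figure.savefig`. The temporary directory, file name, and `dpi` value are illustrative choices, not what the helper uses.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig_demo, ax_demo = plt.subplots(figsize=(4, 3))
ax_demo.plot([2020, 2021, 2022], [12.0, 13.0, 14.0])  # invented values

# Write into a throwaway folder here; in the notebook you would use "plots/".
out_dir = Path(tempfile.mkdtemp()) / "plots"
out_dir.mkdir(parents=True, exist_ok=True)  # savefig does not create missing folders

out_path = out_dir / "demo_figure.png"
fig_demo.savefig(out_path, dpi=200, bbox_inches="tight")  # tight box keeps labels from being cut off
print(out_path.exists())
```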


### Step 12 · Reflection prompts

- Which checkpoints surfaced issues before plotting?
- What headline will you give this comparison?
- Identify one uncertainty (e.g., missing other renewables) to note in discussion.