## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/main/days/day02/notebook/day02_starter.ipynb)

# ⚡ Day 2 – Following the Renewable Energy Transition
Today you will trace how the world's renewable energy mix has shifted from hydro toward wind and solar.

## 🧾 Data Card – Our World in Data: Renewable Energy Shares
- **Source:** [Our World in Data – Energy](https://ourworldindata.org/energy).
- **Temporal coverage:** 1965–2022 (annual).
- **Units:** Percentage of primary energy consumption.
- **Update cadence:** Updated annually as BP Statistical Review releases new data.
- **Method notes:** Shares computed using primary energy equivalents for each technology.
- **Caveats:** Hydroelectric share dominates early years; values are rounded to one decimal place.

## 🧭 Story Scaffold
Keep this checklist nearby as you explore.
- **Claim:** What shift in the renewable mix stands out?
- **Evidence:** Which metrics (e.g., growth rates, totals) support it?
- **Visual:** How will you encode trend plus composition?
- **Takeaway:** What should audiences remember about pace and scale?
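
One way to make the **Evidence** step concrete is to compute both an absolute and a relative change for each technology. The sketch below uses made-up toy values (not figures from the Our World in Data files), and the names `toy`, `point_change`, `relative_growth`, and `evidence` are illustrative only.

```python
import pandas as pd

# Illustrative values only; these are NOT taken from the course datasets.
toy = pd.DataFrame(
    {
        "Year": [2019, 2020, 2021],
        "Hydro": [7.0, 7.0, 6.5],
        "Wind": [2.0, 2.5, 3.0],
    }
).set_index("Year")

first, last = toy.iloc[0], toy.iloc[-1]

# Percentage-point change: the plain difference between two shares.
point_change = last - first

# Relative growth: the change as a percentage of the starting share.
relative_growth = (last - first) / first * 100

evidence = pd.DataFrame(
    {"point_change": point_change, "relative_growth_pct": relative_growth}
).round(1)
print(evidence)
```

The two metrics can tell different stories: a small technology may show large relative growth while adding only a fraction of a percentage point, so it helps to state which one a claim relies on.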

In [None]:
from __future__ import annotations

from pathlib import Path
import sys

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

for candidate in [Path.cwd(), Path.cwd().parent, Path.cwd().parent.parent]:
    utils_path = candidate / "utils.py"
    if utils_path.exists():
        if str(candidate) not in sys.path:
            sys.path.insert(0, str(candidate))
        break
else:
    raise FileNotFoundError("Unable to locate utils.py. Did you download the full project?")

from utils import (
    baseline_style,
    diagnose_dataframe,
    expect_rows_between,
    load_data,
    save_last_fig,
    validate_columns,
    validate_story_elements,
)

baseline_style()
sns.set_palette("colorblind")


In [None]:
# Example: melting wide data into a long format for plotting
example_df = pd.DataFrame(
    {
        "Year": [2019, 2020, 2021],
        "Hydro": [7.1, 7.0, 6.8],
        "Wind": [2.2, 2.6, 3.0],
    }
)
example_long = example_df.melt(id_vars="Year", var_name="Type", value_name="Share")
example_long


In [None]:
# Step 1 – Load each renewable energy dataset
energy_sources = {
    "Total": "01 renewable-share-energy.csv",
    "Hydro": "06 hydro-share-energy.csv",
    "Wind": "10 wind-share-energy.csv",
    "Solar": "14 solar-share-energy.csv",
}

datasets = {
    name: load_data(filename)
    for name, filename in energy_sources.items()
}

# TODO: Populate the datasets dictionary using load_data


<details>
<summary>Hint for loading files</summary>
<ul>
<li>Iterate over the <code>energy_sources</code> dictionary.</li>
<li>Call <code>load_data</code> on each filename so the CSV loads whether you run the notebook from GitHub (for example in Colab) or locally.</li>
<li>Store each dataframe in the <code>datasets</code> dictionary keyed by its friendly name.</li>
</ul>
</details>
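
For intuition about what "works on GitHub or locally" can mean, here is a minimal sketch of a local-first lookup with a remote fallback. It is not the course's `load_data`, which lives in `utils.py` and may behave differently; `resolve_source`, its parameters, and the `example.com` base URL are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import quote


# Hypothetical helper for illustration; not the real load_data from utils.py.
def resolve_source(filename: str, search_dirs: list[Path], remote_base: str) -> str:
    """Return a local path if the file exists, otherwise a remote URL."""
    for directory in search_dirs:
        candidate = directory / filename
        if candidate.exists():
            return str(candidate)
    # File names such as "06 hydro-share-energy.csv" contain a space,
    # which has to be percent-encoded inside a URL.
    return f"{remote_base.rstrip('/')}/{quote(filename)}"


source = resolve_source(
    "06 hydro-share-energy.csv",
    [Path("dir_that_does_not_exist")],
    "https://example.com/data/",  # placeholder, not the project's real URL
)
print(source)
```

Because `pd.read_csv` accepts both file paths and URLs, a string returned this way could be passed to it directly.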

In [None]:
# Step 2 – Keep only the world-level rows and tidy the columns
metric_lookup = {
    "Total": "Renewables (% equivalent primary energy)",
    "Hydro": "Hydro (% equivalent primary energy)",
    "Wind": "Wind (% equivalent primary energy)",
    "Solar": "Solar (% equivalent primary energy)",
}

world_frames = []
for name, frame in datasets.items():
    column = metric_lookup[name]
    filtered = (
        frame.loc[frame["Entity"] == "World", ["Year", column]]
        .rename(columns={column: "Share"})
        .assign(Technology=name)
    )
    world_frames.append(filtered)

renewables_long = pd.concat(world_frames, ignore_index=True)

# TODO: Filter for Entity == "World", rename to Share, and track the technology name


<details>
<summary>Need a push on the tidy step?</summary>
<ul>
<li>Each CSV's value column starts with the technology name, followed by the unit in parentheses (see <code>metric_lookup</code>).</li>
<li>Select just <code>Year</code> and that column, then rename it to <code>Share</code>.</li>
<li>Add a <code>Technology</code> column so you can combine the tables later.</li>
</ul>
</details>
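
The tidy step described in the hint can be tried on a small scale first. This sketch uses toy frames with made-up values that only mimic the column layout of the real CSVs; `toy_frames`, `tidy_parts`, and `toy_long` are illustrative names.

```python
import pandas as pd

# Toy frames that mimic the layout of the CSVs; the values are made up.
toy_frames = {
    "Hydro": pd.DataFrame(
        {
            "Entity": ["World", "World", "Norway"],
            "Year": [2020, 2021, 2021],
            "Hydro (% equivalent primary energy)": [7.0, 6.8, 65.0],
        }
    ),
    "Wind": pd.DataFrame(
        {
            "Entity": ["World", "Norway"],
            "Year": [2021, 2021],
            "Wind (% equivalent primary energy)": [3.0, 2.0],
        }
    ),
}

tidy_parts = []
for tech, frame in toy_frames.items():
    value_column = f"{tech} (% equivalent primary energy)"
    part = (
        # Keep only world-level rows and the two columns needed.
        frame.loc[frame["Entity"] == "World", ["Year", value_column]]
        # Give every technology the same value column name.
        .rename(columns={value_column: "Share"})
        # Record which technology each row came from.
        .assign(Technology=tech)
    )
    tidy_parts.append(part)

toy_long = pd.concat(tidy_parts, ignore_index=True)
print(toy_long)
```

Building the column name with an f-string works for this toy data, but the notebook's `metric_lookup` is the safer choice because the "Total" file uses `Renewables` rather than `Total` in its column name.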

In [None]:
# Step 3 – Diagnostics
diagnose_dataframe(renewables_long, name="Renewable energy shares (World)")
validate_columns(renewables_long, ["Year", "Share", "Technology"], name="renewables_long")
expect_rows_between(renewables_long, 600, 800, name="renewables_long")


In [None]:
# Step 4 – Summaries to inform the story
latest_year = renewables_long["Year"].max()
composition_latest = renewables_long.query("Year == @latest_year and Technology != 'Total'")
share_growth = (
    renewables_long.query("Technology == 'Total'")
    .sort_values("Year")  # diff() assumes rows are in chronological order
    .assign(Change=lambda df: df["Share"].diff())
    .tail(10)
)
composition_latest


In [None]:
# Step 5 – Prepare data for the final chart
tech_stack = (
    renewables_long
    .query("Technology != 'Total'")
    .pivot(index="Year", columns="Technology", values="Share")
    .fillna(0)
    .sort_index()
)

total_trend = (
    renewables_long
    .query("Technology == 'Total'")
    .set_index("Year")["Share"]
    .sort_index()
)

# TODO: Create tech_stack (Year x Technology) and total_trend series


<details>
<summary>Stack vs. trend hint</summary>
<ul>
<li>Use <code>pivot</code> to turn the long format into columns per technology.</li>
<li>Wind and solar start later; <code>fillna(0)</code> keeps the stackplot tidy.</li>
<li>Build <code>total_trend</code> as a Series indexed by year for the line overlay.</li>
</ul>
</details>
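
To see why the hint pairs `pivot` with `fillna(0)`, the sketch below runs the same reshaping on a toy long table in which wind has no row for the first year. The values are made up, and `toy_long`, `toy_stack`, `toy_total`, and `missing_before` are illustrative names.

```python
import pandas as pd

# Toy long-format table; wind deliberately has no row for the year 2000.
toy_long = pd.DataFrame(
    {
        "Year": [2000, 2001, 2001, 2000, 2001],
        "Technology": ["Hydro", "Hydro", "Wind", "Total", "Total"],
        "Share": [6.0, 6.1, 0.2, 6.0, 6.3],
    }
)

# One column per technology, one row per year.
toy_stack = toy_long[toy_long["Technology"] != "Total"].pivot(
    index="Year", columns="Technology", values="Share"
)

# The missing wind row for 2000 shows up as NaN after the pivot.
missing_before = int(toy_stack.isna().sum().sum())

# Replace the gap with zero so every column has a value for every year.
toy_stack = toy_stack.fillna(0).sort_index()

# The total is kept separately as a Series indexed by year.
toy_total = (
    toy_long[toy_long["Technology"] == "Total"]
    .set_index("Year")["Share"]
    .sort_index()
)

print(missing_before)
print(toy_stack)
```

Filling with zero asserts that the share really was zero in those years. That is a reasonable reading for wind and solar before they appear in the data, but it would be misleading for a gap in the middle of a series.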

In [None]:
# Step 6 – Story metadata for every chart
TITLE = "Renewables Are Accelerating, but Hydro Still Leads"
SUBTITLE = "World energy mix, 1965–2022 (Our World in Data / BP Statistical Review)"
ANNOTATION = "Wind and solar finally rival hydro's contribution after 2015."
SOURCE = "Our World in Data (downloaded 2024-04-15)"
UNITS = "Share of primary energy (%)"

validate_story_elements(
    {
        "TITLE": TITLE,
        "SUBTITLE": SUBTITLE,
        "ANNOTATION": ANNOTATION,
        "SOURCE": SOURCE,
        "UNITS": UNITS,
    }
)

# TODO: Edit each metadata string with your own wording


In [None]:
# Step 7 – Compose the story-first chart
fig, ax = plt.subplots(figsize=(11, 6))
ax.stackplot(
    tech_stack.index,
    tech_stack["Hydro"],
    tech_stack["Wind"],
    tech_stack["Solar"],
    labels=["Hydro", "Wind", "Solar"],
    colors=["#4c72b0", "#55a868", "#c44e52"],
    alpha=0.75,
)
ax.plot(total_trend.index, total_trend.values, color="#1f77b4", linewidth=2.5, label="Total share")
ax.set_title(TITLE)
ax.set_xlabel(f"Year — {SUBTITLE}")
ax.set_ylabel(UNITS)
ax.legend(loc="upper left", frameon=False)
ax.text(0.01, -0.2, f"Source: {SOURCE}", transform=ax.transAxes)
annotation_year = 2016 if 2016 in total_trend.index else int(total_trend.index[-1])
ax.annotate(
    ANNOTATION,
    xy=(annotation_year, total_trend.loc[annotation_year]),
    xytext=(1990, total_trend.max() - 2),
    arrowprops={"arrowstyle": "->", "color": "#333333"},
    fontsize=11,
)
fig.tight_layout()
energy_fig = fig
fig

# TODO: Layer the stackplot with the total trend and annotations


In [None]:
# Step 8 – Final check and save option
validate_story_elements(
    {
        "TITLE": TITLE,
        "SUBTITLE": SUBTITLE,
        "ANNOTATION": ANNOTATION,
        "SOURCE": SOURCE,
        "UNITS": UNITS,
    }
)
save_last_fig("day02_renewables_transition.png", fig=energy_fig)
