## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day02/notebook/day02_starter.ipynb)

# ⚡ Day 2 – Tracking the Energy Transition
### Comparing global renewable energy growth against legacy sources

Today we build on yesterday's workflow and practice layering multiple datasets about the global energy mix.
By the end of the lab you will quantify how quickly renewables are growing, which technologies drive the change,
and how the mix still compares with fossil fuels.

### 🗂️ Data card — Our World in Data energy mix series
- **Source:** Our World in Data, based on BP Statistical Review of World Energy (2023 edition)
- **Temporal coverage:** 1965–2022 (annual)
- **Geography:** Global aggregate (Entity = "World")
- **Units:** Share of primary energy consumption (% of total)
- **Collection notes:** Shares are calculated on an energy-equivalent basis and may not sum to 100% due to rounding
- **Last updated:** Published July 2023; later revisions may update recent years
- **Caveats:** Hydropower includes large-scale projects only; bioenergy not included in this subset; totals omit traditional biomass
- **Mindful design:** Compare percentages on the same baseline; label technologies clearly to avoid confusion.

### 1. Set up the environment

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

pd.options.display.float_format = "{:.2f}".format

In [None]:
# Shared helper utilities used throughout the week.
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Iterable, Mapping

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def resolve_data_dir(max_up: int = 5) -> Path:
    """Locate the project-level ``data`` directory regardless of execution location."""
    here = Path.cwd()
    for _ in range(max_up + 1):
        candidate = here / "data"
        if candidate.exists():
            return candidate
        here = here.parent
    raise FileNotFoundError(
        "Could not find a 'data' directory relative to this notebook.\n"
        "If you are running in Colab, mount your drive or upload the data folder first."
    )


DATA_DIR = resolve_data_dir()
PROJECT_ROOT = DATA_DIR.parent
PLOTS_DIR = PROJECT_ROOT / "plots"
PLOTS_DIR.mkdir(parents=True, exist_ok=True)


def baseline_style() -> None:
    """Apply a consistent, high-contrast visual style that is colorblind-friendly."""
    sns.set_theme(style="whitegrid", context="talk", font_scale=0.9)
    plt.rcParams.update(
        {
            "figure.dpi": 120,
            "axes.titlesize": 16,
            "axes.labelsize": 13,
            "legend.fontsize": 11,
            "axes.titleweight": "semibold",
        }
    )


def load_data(filename: str | Path, **kwargs) -> pd.DataFrame:
    """Read a CSV file from the shared data directory and report its shape."""
    path = Path(filename)
    if not path.exists():
        path = DATA_DIR / filename
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {path.name} with shape {df.shape}.")
    return df


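def validate_columns(df: pd.DataFrame, required: Iterable[str], *, context: str = "") -> None:
    """Warn if any required column is missing; otherwise print a confirmation."""
    required = list(required)  # materialize once so generators can be reused below
    missing = [col for col in required if col not in df.columns]
    if missing:
        warnings.warn(
            f"Missing expected columns {missing} in {context or 'dataframe'}.\n"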
            "Double-check your renaming and loading steps before moving on."
        )
    else:
        print(f"✅ Columns look good: {list(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int, *, label: str = "rows") -> None:
    """Warn if the row count falls outside the inclusive range ``[low, high]``."""
    count = len(df)
    if not (low <= count <= high):
        warnings.warn(
            f"{label} check: expected between {low:,} and {high:,} but found {count:,}."
        )
    else:
        print(f"✅ {label} check: {count:,} rows is within the expected range.")


def quick_diagnose(df: pd.DataFrame, *, sample: int = 3) -> None:
    """Show the first ``sample`` rows and the null count for each column."""
    print("\nPreview of the current dataframe:")
    display(df.head(sample))
    print("\nNull values by column:")
    print(df.isna().sum())


def validate_story_fields(fields: Mapping[str, str]) -> None:
    """Warn about any blank narrative field; otherwise confirm the checklist."""
    missing = [name for name, value in fields.items() if not str(value).strip()]
    if missing:
        warnings.warn(
            "The following story fields are blank: " + ", ".join(missing) +
            "\nFill them in so your chart has a clear narrative frame."
        )
    else:
        print("✅ Narrative checklist complete.")


def save_last_fig(fig: plt.Figure | None, filename: str) -> Path | None:
    """Save ``fig`` (or the current figure) into ``PLOTS_DIR`` and return its path."""
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        output_path = PLOTS_DIR / filename
        fig.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path.relative_to(PROJECT_ROOT)}")
        return output_path
    warnings.warn("No matplotlib figure available to save yet.")
    return None
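
The upward search in `resolve_data_dir` can be tried outside the course repository. The sketch below is illustrative only: `find_data_dir` is a hypothetical stand-in that takes an explicit starting folder instead of `Path.cwd()`, and the nested folder layout is an assumption built inside a temporary directory.

```python
import tempfile
from pathlib import Path


def find_data_dir(start: Path, max_up: int = 5) -> Path:
    """Walk upward from ``start`` until a ``data`` folder is found."""
    here = start
    for _ in range(max_up + 1):
        candidate = here / "data"
        if candidate.is_dir():
            return candidate
        here = here.parent
    raise FileNotFoundError(f"No 'data' directory within {max_up} levels of {start}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "data").mkdir()
    # Assumed layout: the notebook sits three levels below the project root.
    nested = root / "days" / "day02" / "notebook"
    nested.mkdir(parents=True)
    found = find_data_dir(nested)
    found_matches = found == root / "data"
    print(found_matches)
```

Because the loop checks the starting folder first and then each parent, a notebook launched from the project root or from a nested folder should resolve to the same `data` directory, as long as it is no more than `max_up` levels away.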


### 2. Load renewable energy datasets
We pull total renewable share plus hydro, wind, and solar components, then confirm the schema.

In [None]:
renewables_total = load_data("01 renewable-share-energy.csv")
hydro = load_data("06 hydro-share-energy.csv")
wind = load_data("10 wind-share-energy.csv")
solar = load_data("14 solar-share-energy.csv")

In [None]:
required_cols = ["Entity", "Year"]
validate_columns(renewables_total, required_cols + ["Renewables (% equivalent primary energy)"])
validate_columns(hydro, required_cols + ["Hydro (% equivalent primary energy)"])
validate_columns(wind, required_cols + ["Wind (% equivalent primary energy)"])
validate_columns(solar, required_cols + ["Solar (% equivalent primary energy)"])
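
Column names are only part of a schema. As a minimal sketch, and not part of the shared helpers, the hypothetical `check_share_schema` below also checks that `Year` is integer-typed and that shares stay within 0 to 100. The toy values are made up for illustration.

```python
import pandas as pd

# Toy frame mimicking the column layout validated above; the values are made up.
toy = pd.DataFrame(
    {
        "Entity": ["World", "World", "Testland"],
        "Year": [2000, 2001, 2000],
        "Wind (% equivalent primary energy)": [0.1, 0.2, 150.0],
    }
)


def check_share_schema(df: pd.DataFrame, value_col: str) -> list[str]:
    """Return a list of human-readable problems (an empty list means all clear)."""
    problems = []
    for col in ("Entity", "Year", value_col):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems  # later checks need every column to be present
    if not pd.api.types.is_integer_dtype(df["Year"]):
        problems.append("Year is not integer-typed")
    out_of_range = df[(df[value_col] < 0) | (df[value_col] > 100)]
    if not out_of_range.empty:
        problems.append(f"{len(out_of_range)} row(s) outside 0-100%")
    return problems


problems = check_share_schema(toy, "Wind (% equivalent primary energy)")
print(problems)
```

Returning a list rather than warning immediately makes it easy to collect the problems from all four files and review them together.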

### 3. Focus on the world aggregate
Filter for the global totals, keep relevant columns, and verify the year coverage before merging.

In [None]:
def world_slice(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Return the World rows as a clean, year-sorted two-column frame."""
    return (
        df[df["Entity"] == "World"]
        .loc[:, ["Year", value_col]]
        .dropna()
        .sort_values("Year")
        .reset_index(drop=True)
    )


world_total = world_slice(renewables_total, "Renewables (% equivalent primary energy)")
world_hydro = world_slice(hydro, "Hydro (% equivalent primary energy)")
world_wind = world_slice(wind, "Wind (% equivalent primary energy)")
world_solar = world_slice(solar, "Solar (% equivalent primary energy)")

for label, df in {
    "total": world_total,
    "hydro": world_hydro,
    "wind": world_wind,
    "solar": world_solar,
}.items():
    expect_rows_between(df, 55, 60, label=f"{label} yearly records")
    quick_diagnose(df.tail(3))
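
A row count inside the expected range does not prove the years are contiguous, because `dropna()` can silently remove a year in the middle of the series. The sketch below applies the same filter chain to a made-up frame (the column name `share` and all values are assumptions) and lists any missing years.

```python
import pandas as pd

# Made-up rows: one non-World entity, one missing value, and unsorted years.
toy = pd.DataFrame(
    {
        "Entity": ["World", "World", "World", "Testland", "World"],
        "Year": [2003, 2000, 2001, 2000, 2004],
        "share": [1.3, 1.0, None, 9.9, 1.4],
    }
)

world = (
    toy[toy["Entity"] == "World"]
    .loc[:, ["Year", "share"]]
    .dropna()
    .sort_values("Year")
    .reset_index(drop=True)
)

# Compare the years present against a gap-free range between min and max.
first_year, last_year = int(world["Year"].min()), int(world["Year"].max())
present_years = {int(year) for year in world["Year"]}
missing_years = sorted(set(range(first_year, last_year + 1)) - present_years)
print(missing_years)
```

Here 2001 is lost to the null value and 2002 was never in the data, so both appear in `missing_years`. An empty list would mean the slice is safe to merge year by year.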

### 4. Assemble a tidy table for plotting
Combine the series, calculate non-renewable share, and create a long-form version for stacked visuals.

In [None]:
renewables_world = (
    world_total
    .rename(columns={"Renewables (% equivalent primary energy)": "renewable_share_pct"})
    .merge(world_hydro.rename(columns={"Hydro (% equivalent primary energy)": "hydro_pct"}), on="Year")
    .merge(world_wind.rename(columns={"Wind (% equivalent primary energy)": "wind_pct"}), on="Year")
    .merge(world_solar.rename(columns={"Solar (% equivalent primary energy)": "solar_pct"}), on="Year")
)
renewables_world["non_renewable_pct"] = 100 - renewables_world["renewable_share_pct"]
renewables_world["modern_renewables_pct"] = renewables_world["wind_pct"] + renewables_world["solar_pct"]

quick_diagnose(renewables_world.tail())
tidy_mix = renewables_world.melt(
    id_vars="Year",
    value_vars=["hydro_pct", "wind_pct", "solar_pct"],
    var_name="technology",
    value_name="share_pct",
)
quick_diagnose(tidy_mix.head())
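
Two properties of this step are worth checking on a small example: the default inner join keeps only years present in every series, and the named technologies need not add up to the renewable total. The sketch below uses made-up numbers, and `other_pct` is a hypothetical residual column that the notebook itself does not compute.

```python
import pandas as pd

# Made-up series; wind deliberately starts one year later than the others.
total = pd.DataFrame({"Year": [2000, 2001, 2002], "renewable_share_pct": [7.0, 7.2, 7.5]})
hydro = pd.DataFrame({"Year": [2000, 2001, 2002], "hydro_pct": [6.0, 6.1, 6.2]})
wind = pd.DataFrame({"Year": [2001, 2002], "wind_pct": [0.2, 0.3]})

# validate="one_to_one" makes pandas raise if a Year is duplicated on either side.
merged = (
    total
    .merge(hydro, on="Year", validate="one_to_one")
    .merge(wind, on="Year", validate="one_to_one")
)

# Residual share not explained by the technologies we loaded.
merged["other_pct"] = merged["renewable_share_pct"] - merged["hydro_pct"] - merged["wind_pct"]

long_form = merged.melt(
    id_vars="Year",
    value_vars=["hydro_pct", "wind_pct"],
    var_name="technology",
    value_name="share_pct",
)
print(merged)
print(long_form.shape)
```

The year 2000 drops out because the wind series has no row for it, and the long table holds one row per year and technology (2 years × 2 technologies).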

### 5. Define the storytelling frame
Clarify the claim and ensure title, subtitle, annotation, source, and units are ready before plotting.

In [None]:
TITLE = "Renewables are finally bending the energy curve"
SUBTITLE = "Global share of primary energy from hydro, wind, and solar (1965–2022)"
ANNOTATION = "Wind and solar accelerate after 2000 but hydropower still carries half of renewables"
SOURCE = "Source: Our World in Data (BP Statistical Review 2023)"
UNITS = "Share of primary energy consumption (%)"

validate_story_fields({
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
})

### 6. Visualize with checkpoints
Use a two-panel figure: top shows renewables vs. everything else, bottom details the technology mix.

In [None]:
baseline_style()

palette = {
    "renewables": "#0072B2",
    "non_renewables": "#999999",
    "hydro_pct": "#56B4E9",
    "wind_pct": "#009E73",
    "solar_pct": "#F0E442",
}

fig, axes = plt.subplots(2, 1, figsize=(11, 9), sharex=True, gridspec_kw={"height_ratios": [1, 1.2]})

ax_top = axes[0]
ax_top.plot(
    renewables_world["Year"],
    renewables_world["renewable_share_pct"],
    color=palette["renewables"],
    linewidth=2.2,
    label="Renewables",
)
ax_top.plot(
    renewables_world["Year"],
    renewables_world["non_renewable_pct"],
    color=palette["non_renewables"],
    linewidth=1.5,
    linestyle="--",
    label="Everything else",
)
ax_top.set_ylabel("Share of energy (%)")
ax_top.legend(loc="upper right", frameon=False)
ax_top.set_title(TITLE, loc="left")
ax_top.text(0.01, 0.05, SUBTITLE, transform=ax_top.transAxes, fontsize=11, ha="left")

ax_bottom = axes[1]
ax_bottom.stackplot(
    renewables_world["Year"],
    renewables_world["hydro_pct"],
    renewables_world["wind_pct"],
    renewables_world["solar_pct"],
    labels=["Hydro", "Wind", "Solar"],
    colors=[palette["hydro_pct"], palette["wind_pct"], palette["solar_pct"]],
    alpha=0.85,
)
ax_bottom.set_ylabel("Renewable share (%)")
ax_bottom.legend(loc="upper left", frameon=False)
ax_bottom.text(0.01, -0.18, f"{SOURCE} · {ANNOTATION}", transform=ax_bottom.transAxes, fontsize=10, ha="left")
ax_bottom.set_xlabel("Year")

latest_year = renewables_world["Year"].iloc[-1]
latest_share = renewables_world["renewable_share_pct"].iloc[-1]
ax_top.annotate(
    f"{latest_year}: {latest_share:.1f}% renewables",
    xy=(latest_year, latest_share),
    xytext=(latest_year - 15, latest_share + 5),
    arrowprops=dict(arrowstyle="->", color="#333333"),
)

plt.tight_layout()
plt.show()
final_fig_path = save_last_fig(fig, "day02_solution_plot.png")
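
To read the bottom panel correctly, it helps to see that a stacked area chart draws cumulative sums: each band's upper edge is the running total of the layers beneath it. A minimal sketch with made-up shares (the `*_top` column names are illustrative only):

```python
import pandas as pd

# Made-up shares for three years.
toy = pd.DataFrame(
    {
        "Year": [2000, 2010, 2020],
        "hydro_pct": [6.0, 6.5, 7.0],
        "wind_pct": [0.0, 0.5, 2.0],
        "solar_pct": [0.0, 0.0, 1.0],
    }
)

# Row-wise running totals give the upper edge of each stacked band.
edges = toy[["hydro_pct", "wind_pct", "solar_pct"]].cumsum(axis=1)
edges.columns = ["hydro_top", "wind_top", "solar_top"]
edges.insert(0, "Year", toy["Year"])
print(edges)
```

The top of the stack is therefore hydro + wind + solar, which can sit below the total renewable line in the upper panel whenever other renewable sources contribute.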

### 7. Interpret responsibly
- **Key takeaway:** Renewables have tripled their share of global energy since 2000, yet still supply less than 15% of total demand. Hydropower remains the backbone, while wind and solar now drive growth.
- **Uncertainty & caveats:** Share estimates depend on BP's energy accounting; the data exclude traditional biomass and newer technologies like geothermal; figures for recent years may be revised.
- **What this plot cannot tell us:** It omits regional disparities, storage constraints, and absolute consumption volumes. Pair it with country-level data or per-capita metrics for deeper insight.
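
A growth claim in the takeaway can be checked directly from the merged table before it is written down. The sketch below shows one way to do that with made-up numbers; in the notebook, the same calculation would use `renewables_world` and its `renewable_share_pct` column.

```python
import pandas as pd

# Made-up shares, for illustration only.
shares = pd.DataFrame(
    {"Year": [2000, 2010, 2020], "renewable_share_pct": [5.0, 7.5, 10.0]}
)
series = shares.set_index("Year")["renewable_share_pct"]

start_year, end_year = 2000, 2020
multiple = series.loc[end_year] / series.loc[start_year]      # ratio of the two shares
point_change = series.loc[end_year] - series.loc[start_year]  # percentage points gained
cagr = multiple ** (1 / (end_year - start_year)) - 1          # compound annual growth rate

print(f"{multiple:.2f}x, +{point_change:.1f} points, {cagr:.2%} per year")
```

Reporting the multiple alongside the percentage-point change guards against overstating growth from a small base: a share that doubles from 5% to 10% still gains only five points.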

### 8. Process micro-rubric
| Step | Evidence of completion |
| --- | --- |
| Data loaded & validated | Required columns confirmed for all four CSVs |
| Cleaning documented | World slices checked for expected year counts |
| Story frame filled | Narrative checklist (title, subtitle, annotation, source, units) completed |
| Visualization reviewed | Two-panel layout with colorblind-safe palette and annotations |
| Reflection written | Takeaway plus limitations articulated |