## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%202%20%E2%80%93%20Fossil%20Fuels%20vs.%20Renewables.ipynb)

# ⚡ Day 2 – Exploring the Energy Transition
### Compare fossil dependence and renewable momentum with richer visuals

Today we keep the same rhythm: learn a little, do a little, check immediately. Our focus shifts to the **global energy mix**—how quickly renewables are catching up to fossil fuels.

---

## 🧠 Learning Rhythm
- 🔁 Five mini-loops walk from loading multi-table data to a polished stacked area chart and a quick interpretation check.
- 🧪 Diagnostics flag missing columns or unexpected year counts before you plot.
- 🧮 Stretch prompts invite fast finishers to compare specific regions or add moving averages.
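
For the moving-average stretch prompt, here is a minimal sketch. It assumes a table with a `Year` column like the ones built in Loops 2 and 3, and `add_moving_average` is a hypothetical helper, not part of the notebook. The toy values are invented for illustration.

```python
import pandas as pd

def add_moving_average(df: pd.DataFrame, column: str, window: int = 5) -> pd.DataFrame:
    """Return a copy of df with a centered rolling mean of `column`."""
    out = df.sort_values("Year").copy()
    # center=True averages years on both sides; min_periods=1 keeps the
    # first and last years instead of leaving NaN gaps at the edges
    out[f"{column} ({window}-yr avg)"] = (
        out[column].rolling(window=window, center=True, min_periods=1).mean()
    )
    return out

# Toy values for illustration only, not real energy data
toy = pd.DataFrame({"Year": [2018, 2019, 2020, 2021, 2022],
                    "Wind": [1.0, 2.0, 3.0, 4.0, 5.0]})
smoothed = add_moving_average(toy, "Wind", window=3)
print(smoothed)
```

A centered window keeps the smoothed line aligned with the raw series; a trailing window (`center=False`) only uses past years, which suits "what did we know at the time" questions.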

> **Teacher Sidecar**: This lab fits in ~50 minutes. Circulate after Loop 3 to help students who see NaNs—usually a merge issue.

## 📇 Data Card — Our World in Data: Global Energy Shares
- **Source**: Our World in Data compilation of BP Statistical Review (2023 edition).
- **Temporal coverage**: 1965–2022 (annual).
- **Metrics**: Share of primary energy (%) from renewables, plus hydro, wind, and solar components.
- **Last updated**: June 2023.
- **Caveats**: Fossil fuels still dominate but are not charted here, since these files cover renewables only; shares below 0.1% before 1990 may appear as zeros after rounding.
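
A quick illustration of the rounding caveat, using made-up shares rather than values from the dataset: at one decimal place, anything below about 0.05% displays as 0.0 even though the underlying value is not zero.

```python
import pandas as pd

# Made-up shares (percent) for illustration; not values from the dataset
shares = pd.Series([0.03, 0.04, 0.07, 0.12], name="Solar (%)")
rounded = shares.round(1)  # one decimal place, as in a typical table display

print(rounded.tolist())
print(f"{(rounded == 0).sum()} of {len(shares)} nonzero shares now read as 0.0")
```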

## 🧵 Story Scaffold (Claim → Evidence → Visual → Takeaway)
- **Claim**: Renewables are growing fast but remain a minority of the global energy mix.
- **Evidence to gather**: Total renewable share and the contributions from hydro, wind, and solar.
- **Visual plan**: Stacked area chart with an overlay line for the total share.
- **Takeaway**: Even with rapid growth, renewables supply <20% of global energy, signaling the transition gap.


In [None]:

from __future__ import annotations

from pathlib import Path
from typing import Any, Mapping, Sequence

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display

DATA_DIR = Path.cwd() / "data"

sns.set_theme(style="whitegrid", font_scale=1.1)
plt.rcParams.update({
    "axes.titlesize": 16,
    "axes.labelsize": 13,
    "axes.grid": True,
    "figure.figsize": (11, 6),
    "figure.dpi": 120,
})

def ping_environment(packages: Mapping[str, object]) -> None:
    """Print library versions so teachers can confirm the runtime."""
    for label, module in packages.items():
        version = getattr(module, "__version__", "built-in")
        print(f"{label}: {version}")
    print("Environment check complete ✅")

def load_data(file_name: str, /, **kwargs) -> pd.DataFrame:
    """Load a CSV from the shared data folder with a friendly status message."""
    path = DATA_DIR / file_name
    if not path.exists():
        raise FileNotFoundError(f"Expected data file at {path}")
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {file_name} → shape {df.shape}")
    return df

def validate_columns(df: pd.DataFrame, required: Sequence[str]) -> pd.DataFrame:
    """Raise if any required column is absent; otherwise return df unchanged."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    print(f"Columns validated ✅ {list(required)}")
    return df

def expect_rows_between(df: pd.DataFrame, lower: int, upper: int, label: str = "rows") -> pd.DataFrame:
    """Raise if the row count falls outside [lower, upper]; otherwise return df."""
    n_rows = len(df)
    if not (lower <= n_rows <= upper):
        raise ValueError(
            f"Unexpected {label}: {n_rows} (expected between {lower} and {upper})"
        )
    print(f"{label.capitalize()} check ✅ {n_rows} (expected {lower}-{upper})")
    return df

def quick_peek(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Display a head preview and NA counts for formative assessment."""
    display(df.head(n))
    print("Null values per column:")
    print(df.isna().sum())
    return df

def ensure_metadata(**metadata: str) -> None:
    """Raise if any story metadata field is blank or whitespace-only."""
    blanks = [key for key, value in metadata.items() if not str(value).strip()]
    if blanks:
        raise ValueError(f"Please fill in metadata fields: {blanks}")
    print("Story metadata looks great ✅")

def annotate_source(ax: plt.Axes, *, source: str, units: str) -> plt.Axes:
    """Write the source and units below the plot area (axes coordinates)."""
    ax.text(
        0.0,
        -0.22,
        f"Source: {source}\nUnits: {units}",
        transform=ax.transAxes,
        ha="left",
        fontsize=10,
    )
    return ax

def _resolve_fig(fig: Any | None) -> Any:
    """Return the given figure, else the current Matplotlib figure, else None."""
    if fig is not None:
        return fig
    if plt.get_fignums():
        return plt.gcf()
    return None

def save_last_fig(fig: Any | None, filename: str) -> Path:
    """Export a Matplotlib figure (or one with Plotly's write_image/write_html) to ./plots."""
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    resolved = _resolve_fig(fig)
    if resolved is None:
        raise ValueError("No recent figure detected.")

    output_path = plots_dir / filename

    if hasattr(resolved, "savefig"):
        resolved.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path}")
        return output_path

    if hasattr(resolved, "write_image"):
        try:
            resolved.write_image(str(output_path))
            print(f"Saved figure to {output_path}")
            return output_path
        except Exception as exc:
            html_path = output_path.with_suffix(".html")
            resolved.write_html(str(html_path))
            print(f"Saved interactive figure to {html_path} (fallback: {exc})")
            return html_path

    raise ValueError("Don't know how to export this figure type.")


## 🔁 Loop 1 · Confirm the setup
*Goal: Verify the analytics stack is ready and the data folder is reachable.*
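
Counting files only tells you the folder is not empty. A slightly stricter sketch checks for the exact files Loops 2 and 3 load; the file names are copied from those loops, while `missing_data_files` is a hypothetical helper you would adapt if your folder differs.

```python
from pathlib import Path

# File names copied from Loops 2 and 3; adjust if your data folder differs
EXPECTED_FILES = [
    "01 renewable-share-energy.csv",
    "06 hydro-share-energy.csv",
    "10 wind-share-energy.csv",
    "14 solar-share-energy.csv",
]

def missing_data_files(data_dir: Path, expected: list[str] = EXPECTED_FILES) -> list[str]:
    """Return the expected file names that are not present in data_dir."""
    return [name for name in expected if not (data_dir / name).is_file()]

missing = missing_data_files(Path.cwd() / "data")
print("All data files found ✅" if not missing else f"Missing: {missing}")
```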

In [None]:
import matplotlib  # report the package version; pyplot itself may not expose __version__

ping_environment({"pandas": pd, "matplotlib": matplotlib, "seaborn": sns})
assert DATA_DIR.exists(), f"Data directory missing: {DATA_DIR}"
print(f"Data files available: {len(list(DATA_DIR.glob('*')))} items")

## 🔁 Loop 2 · Load the world renewable share series
*Goal: Import the total renewable share dataset and run quick diagnostics.*
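
Loops 2 and 3 repeat the same filter, select, rename, and sort steps. A minimal sketch of a hypothetical `world_series` helper that bundles them; it uses boolean indexing instead of `query` (either works), and the `entity` parameter also covers the "compare specific regions" stretch prompt. The toy rows are invented.

```python
import pandas as pd

def world_series(df: pd.DataFrame, column: str, label: str, entity: str = "World") -> pd.DataFrame:
    """Keep one entity's rows and return a Year/label table sorted by year."""
    missing = {"Entity", "Year", column} - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return (
        df.loc[df["Entity"] == entity, ["Year", column]]
        .rename(columns={column: label})
        .sort_values("Year")
        .reset_index(drop=True)
    )

# Toy rows for illustration only, not real energy data
toy = pd.DataFrame({
    "Entity": ["World", "Europe", "World"],
    "Year": [2022, 2022, 2021],
    "Renewables (% equivalent primary energy)": [14.0, 20.0, 13.0],
})
world_toy = world_series(toy, "Renewables (% equivalent primary energy)", "Total Renewable")
print(world_toy)
```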

In [None]:
total_raw = load_data("01 renewable-share-energy.csv")
validate_columns(total_raw, ["Entity", "Year", "Renewables (% equivalent primary energy)"])
world_total = total_raw.query("Entity == 'World'")[["Year", "Renewables (% equivalent primary energy)"]]
world_total = world_total.rename(columns={"Renewables (% equivalent primary energy)": "Total Renewable"}).sort_values("Year")
expect_rows_between(world_total, 55, 70, label="world annual records")
quick_peek(world_total.tail())


## 🔁 Loop 3 · Merge hydro, wind, and solar components
*Goal: Assemble a tidy table with each renewable technology for the world aggregate.*

In [None]:
component_specs = {
    "Hydro": ("06 hydro-share-energy.csv", "Hydro (% equivalent primary energy)"),
    "Wind": ("10 wind-share-energy.csv", "Wind (% equivalent primary energy)"),
    "Solar": ("14 solar-share-energy.csv", "Solar (% equivalent primary energy)"),
}
components = []
for label, (file_name, column_name) in component_specs.items():
    df = load_data(file_name)
    validate_columns(df, ["Entity", "Year", column_name])
    world_slice = (
        df.query("Entity == 'World'")[["Year", column_name]]
        .rename(columns={column_name: label})
        .sort_values("Year")
    )
    expect_rows_between(world_slice, 55, 70, label=f"{label} annual records")
    components.append(world_slice)

renewables_world = world_total.copy()
for piece in components:
    renewables_world = renewables_world.merge(piece, on="Year", how="left")

renewables_world = renewables_world.fillna(0)
validate_columns(renewables_world, ["Year", "Total Renewable", "Hydro", "Wind", "Solar"])
quick_peek(renewables_world.tail())


## 🔁 Loop 4 · Build the stacked area story
*Goal: Apply consistent metadata, chart styling, and annotations.*
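
In the chart, the top of the hydro, wind, and solar stack sits below the total-share line because the total also includes renewables not charted separately (for example geothermal and biomass). A minimal sketch, assuming the `renewables_world` column names from Loop 3, that makes the gap explicit as an `Other` column you could add as a fourth stack layer; `add_other_renewables` is hypothetical and the toy row is invented.

```python
import pandas as pd

def add_other_renewables(df: pd.DataFrame, parts=("Hydro", "Wind", "Solar")) -> pd.DataFrame:
    """Add an 'Other' column: total renewable share minus the charted components."""
    out = df.copy()
    out["Other"] = out["Total Renewable"] - out[list(parts)].sum(axis=1)
    return out

# Toy row for illustration only, not real data
toy = pd.DataFrame({"Year": [2022], "Total Renewable": [10.0],
                    "Hydro": [5.0], "Wind": [2.5], "Solar": [1.5]})
print(add_other_renewables(toy))
```

A small negative `Other` value in real data would usually point to rounding or a merge problem rather than a real quantity.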

In [None]:
TITLE = "Renewables Surge but Trail Fossil Fuels"
SUBTITLE = "World share of primary energy from renewables, 1965–2022"
ANNOTATION = "Wind and solar accelerate after 2000, yet hydro is still the largest single source."
SOURCE = "Our World in Data • BP Statistical Review (2023)"
UNITS = "Percent of global primary energy"

ensure_metadata(TITLE=TITLE, SUBTITLE=SUBTITLE, ANNOTATION=ANNOTATION, SOURCE=SOURCE, UNITS=UNITS)

fig, ax = plt.subplots(figsize=(11, 6))
ax.stackplot(
    renewables_world["Year"],
    renewables_world["Hydro"],
    renewables_world["Wind"],
    renewables_world["Solar"],
    labels=["Hydro", "Wind", "Solar"],
    alpha=0.8,
    colors=["#4c72b0", "#55a868", "#c44e52"],
)
ax.plot(
    renewables_world["Year"],
    renewables_world["Total Renewable"],
    color="#222222",
    linewidth=3,
    label="Total renewable share",
)
ax.set_title(f"{TITLE}\n{SUBTITLE}")
ax.set_xlabel("Year")
ax.set_ylabel("Share of global primary energy (%)")
ax.legend(loc="upper left", ncol=2)
latest_row = renewables_world.iloc[-1]
ax.annotate(
    f"{int(latest_row['Year'])}: {latest_row['Total Renewable']:.1f}% total",
    xy=(latest_row["Year"], latest_row["Total Renewable"]),
    xytext=(latest_row["Year"] - 12, latest_row["Total Renewable"] + 3),
    arrowprops=dict(arrowstyle="->", color="#222222"),
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#222222"),
)
ax.text(
    renewables_world["Year"].min() + 3,
    max(renewables_world["Total Renewable"].max() - 8, 2),
    ANNOTATION,
    fontsize=11,
    color="#333333",
    bbox=dict(boxstyle="round,pad=0.4", facecolor="white", alpha=0.85),
)
annotate_source(ax, source=SOURCE, units=UNITS)
plt.show()


## 🔁 Loop 5 · Interpret and self-check
*Goal: Quantify change over time and connect back to the scaffold.*
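
A share can grow by only a few percentage points while more than doubling in relative terms, so it helps to keep the two measures apart. A small sketch with made-up numbers, using a hypothetical `describe_change` helper:

```python
def describe_change(start_share: float, end_share: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    if start_share == 0:
        raise ValueError("Relative change is undefined when the starting share is 0.")
    absolute_pp = end_share - start_share
    relative_pct = absolute_pp / start_share * 100
    return absolute_pp, relative_pct

# Made-up shares for illustration: 5% rising to 15%
pp, pct = describe_change(5.0, 15.0)
print(f"{pp:.1f} percentage points, {pct:.0f}% relative growth")
```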

In [None]:
start_year = renewables_world["Year"].min()
end_year = renewables_world["Year"].max()
start_share = renewables_world.loc[renewables_world["Year"] == start_year, "Total Renewable"].iloc[0]
end_share = renewables_world.loc[renewables_world["Year"] == end_year, "Total Renewable"].iloc[0]
growth = end_share - start_share
print(f"Starting share in {start_year}: {start_share:.1f}%")
print(f"Latest share in {end_year}: {end_share:.1f}%")
print(f"Absolute growth: {growth:.1f} percentage points")
assert end_share < 25, "World renewable share should still be under 25%."
assert growth > 0, "The renewable share should be higher at the end of the series than at the start."


### 🧾 Claim → Evidence → Visual → Takeaway (filled)
- **Claim**: Renewables are accelerating yet still supply a minority of energy.
- **Evidence**: Inspect the computed starting vs. latest shares above and note hydro's long-running dominance.
- **Visual**: Stacked area chart plus total-share line, with metadata enforced before rendering.
- **Takeaway**: Transition progress is real but not sufficient; fossil fuels remain >80% of supply.

> **Limitation prompt**: Shares do not reveal total energy demand—pair with absolute consumption for deeper insight.
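
The limitation prompt can be made concrete: a constant share of a growing total is a growing absolute amount. A minimal sketch with invented numbers; `absolute_from_share` is a hypothetical helper, and applying it for real would need a consumption dataset that this notebook does not load.

```python
def absolute_from_share(share_pct: float, total_amount: float) -> float:
    """Convert a percent share into an absolute amount in the units of total_amount."""
    # Multiply before dividing to keep the toy arithmetic exact
    return share_pct * total_amount / 100

# Hypothetical example: a constant 10% share of a total that grows from 100 to 150 units
for total in (100.0, 150.0):
    print(total, absolute_from_share(10.0, total))
```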

---

### 💾 Save your work
Run the next cell to export the latest figure for your portfolio.


In [None]:
save_last_fig(fig, "day02_solution_plot.png")