## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%204%20%E2%80%93%20Mapping%20Biodiversity%20%26%20Deforestation.ipynb)

# 🌍 Day 4 · Mapping Forest Change

Maps demand extra care: projections, color scales, and annotations can all shift how the story lands. We'll keep the loop tight (load, tidy, inspect, then animate the map) and run a quick validation check at each step.

## 🗂️ Data Card · World Bank Forest Area
| Field | Details |
| --- | --- |
| Source | [World Bank – Forest area (% of land area)](https://data.worldbank.org/indicator/AG.LND.FRST.ZS) |
| Temporal coverage | 1990–2021 |
| Geographic scope | Countries and World Bank regions |
| Units | Percent of land area covered by forest |
| Update cadence | Annual |
| Caveats | Administrative boundaries may shift; some small territories missing; satellite revisions occasionally update historical values. |
| What this chart cannot show | Forest quality (primary vs. plantation), biodiversity, or drivers of change. Pair with qualitative case studies for context. |

In [None]:
# 🔁 Shared scaffolds used across DS4S notebooks
from __future__ import annotations

import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.rcParams.update({
    "figure.dpi": 120,
    "axes.facecolor": "#f8f9fb",
    "axes.grid": True,
    "grid.alpha": 0.25,
    "grid.linestyle": "--",
    "axes.titlesize": 18,
    "axes.labelsize": 12,
    "axes.titleweight": "bold",
    "legend.frameon": False,
    "legend.fontsize": 11,
    "font.family": "DejaVu Sans",
})

def quick_diagnostics(df: pd.DataFrame, dataset_name: str, *, expected_columns: list[str] | None = None, expected_rows: tuple[int, int] | None = None) -> None:
    """Print lightweight diagnostics without stopping execution."""
    print(f"\n🔍 {dataset_name}")
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    if expected_columns is not None:
        missing = [col for col in expected_columns if col not in df.columns]
        if missing:
            warnings.warn(f"Missing expected columns: {missing}")
    if expected_rows is not None:
        low, high = expected_rows
        if not (low <= len(df) <= high):
            warnings.warn(f"Row count {len(df)} outside expected range {expected_rows}")
        else:
            print(f"Row count within expected range {expected_rows}.")
    print("Null counts:")
    print(df.isna().sum())
    print("Preview:")
    print(df.head())
    print("-" * 60)

def expect_value_range(series: pd.Series, *, lower: float | None = None, upper: float | None = None, context: str = "") -> None:
    """Warn when values fall outside an expected numeric window."""
    label = context or series.name or "series"
    if lower is not None and float(series.min()) < lower:
        warnings.warn(f"{label}: minimum {series.min():.3f} is below expected {lower}")
    if upper is not None and float(series.max()) > upper:
        warnings.warn(f"{label}: maximum {series.max():.3f} is above expected {upper}")
    print(f"{label}: {series.min():.3f} → {series.max():.3f}")

def validate_story_elements(*, title: str, subtitle: str, annotation: str, source: str, units: str) -> None:
    """Confirm the storytelling scaffold is filled before plotting."""
    elements = {
        "TITLE": title,
        "SUBTITLE": subtitle,
        "ANNOTATION": annotation,
        "SOURCE": source,
        "UNITS": units,
    }
    missing = [key for key, value in elements.items() if not str(value).strip()]
    if missing:
        warnings.warn(f"Please fill these storytelling fields: {', '.join(missing)}")
    else:
        print("👍 Story scaffold complete.")

def baseline_style(ax: plt.Axes | None = None) -> plt.Axes:
    """Standardise axes styling for consistency across notebooks."""
    ax = ax or plt.gca()
    for spine in ["top", "right"]:
        ax.spines[spine].set_visible(False)
    ax.set_facecolor("#ffffff")
    return ax

def save_last_visual(fig, filename: str, *, subfolder: str = "plots") -> None:
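    """Save a Matplotlib or Plotly figure to ./<subfolder>/<filename>.

    Warns instead of raising if export fails (for example, Plotly's write_image
    needs the optional kaleido package).
    """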
    plots_dir = Path.cwd() / subfolder
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    try:
        if hasattr(fig, "write_image"):
            fig.write_image(str(output_path))
        elif hasattr(fig, "savefig"):
            fig.savefig(output_path, dpi=300, bbox_inches="tight")
        else:
            warnings.warn("Figure type not supported for export; skipping save.")
            return
        print(f"Saved visual to {output_path}")
    except Exception as exc:
        warnings.warn(f"Plot export skipped: {exc}")
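
These helpers warn instead of raising, so the notebook keeps running even when a check fails. If you want the warnings as data (say, to print a checklist at the end of a lesson), one option is to capture them with `warnings.catch_warnings(record=True)`. This is a minimal sketch: `run_check_collecting_warnings` and `percent_in_range` are hypothetical names, and `percent_in_range` is a simplified stand-in for `expect_value_range` so the block runs on its own.

```python
import warnings

import pandas as pd


def run_check_collecting_warnings(check, *args, **kwargs) -> list[str]:
    """Run a diagnostic helper and return the warning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't let repeated warnings be suppressed
        check(*args, **kwargs)
    return [str(w.message) for w in caught]


def percent_in_range(series: pd.Series) -> None:
    # Hypothetical stand-in for expect_value_range(series, lower=0, upper=100)
    if series.min() < 0:
        warnings.warn(f"minimum {series.min()} is below 0")
    if series.max() > 100:
        warnings.warn(f"maximum {series.max()} is above 100")


toy = pd.Series([12.5, 48.0, 103.2], name="ForestPercent")
messages = run_check_collecting_warnings(percent_in_range, toy)
print(messages)  # → ['maximum 103.2 is above 100']
```

In the notebook you could pass the real helper instead, e.g. `run_check_collecting_warnings(expect_value_range, df_forest["ForestPercent"], lower=0, upper=100)`.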


## Step 1 · Load the long-format forest table
Read the prepared CSV (one row per country and year) and confirm that the expected columns are present and the row count falls in the expected range.

In [None]:
data_dir = Path.cwd() / "data"
df_forest = pd.read_csv(data_dir / "forest_area_long.csv")

quick_diagnostics(
    df_forest,
    "Forest area long table",
    expected_columns=["Country Name", "Country Code", "Year", "ForestPercent"],
    expected_rows=(8000, 9000),
)
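
The file name suggests the table was reshaped from a wide download, where each year is its own column, into one row per country and year. The preprocessing script isn't shown here, so this is only a sketch of how such a reshape could look with `DataFrame.melt`, assuming a wide layout with `Country Name`, `Country Code`, and one column per year. The two countries and their values are invented.

```python
import pandas as pd

# Invented wide table: one row per country, one column per year (assumed layout)
wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "1990": [60.0, 12.0],
    "1991": [59.5, 12.4],
})

# Treat any all-digit column name as a year column
year_cols = [col for col in wide.columns if col.isdigit()]

long = wide.melt(
    id_vars=["Country Name", "Country Code"],
    value_vars=year_cols,
    var_name="Year",
    value_name="ForestPercent",
)
long["Year"] = long["Year"].astype(int)  # melted column names arrive as strings
print(long)
```

Melting this way yields the same four columns that `quick_diagnostics` checks for above.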


## Step 2 · Prepare numeric fields
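Coerce `Year` and `ForestPercent` to numbers, drop rows that have no usable year, and sort by year so the animation timeline stays in chronological order.

In [None]:
# Coerce to numbers; unparseable entries become NaN instead of raising
df_forest["Year"] = pd.to_numeric(df_forest["Year"], errors="coerce")
df_forest["ForestPercent"] = pd.to_numeric(df_forest["ForestPercent"], errors="coerce")

# A row without a year cannot be placed on the timeline, so drop it before casting
df_forest = df_forest.dropna(subset=["Year"]).astype({"Year": int})

# Sort by year first so the years are encountered in order when building frames
df_forest = df_forest.sort_values(["Year", "Country Code"]).reset_index(drop=True)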

quick_diagnostics(
    df_forest,
    "Forest table after type cleaning",
    expected_columns=["Country Name", "Country Code", "Year", "ForestPercent"],
    expected_rows=(8000, 9000),
)
expect_value_range(df_forest["ForestPercent"], lower=0, upper=100, context="ForestPercent (%)")
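
Two pitfalls are worth knowing when preparing these fields, shown here on invented rows. First, `pd.to_numeric(..., errors="coerce")` turns placeholders such as `".."` (used for missing values in some World Bank exports) into `NaN` instead of raising. Second, sort order matters. Assuming Plotly Express orders animation frames by the first appearance of each `Year` value (its `category_orders` parameter exists to override that default), sorting by country first can push a year out of sequence whenever the first country lacks early data.

```python
import pandas as pd

# Invented rows: AAA has no 1990 value, and one value is a ".." placeholder
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "Year": ["1991", "1992", "1990", "1991", "1992"],
    "ForestPercent": ["40.0", "..", "10.0", "10.5", "11.0"],
})

# Unparseable strings become NaN rather than raising a ValueError
toy["Year"] = pd.to_numeric(toy["Year"], errors="coerce").astype(int)
toy["ForestPercent"] = pd.to_numeric(toy["ForestPercent"], errors="coerce")

# Order in which each year is first encountered under two sort orders
by_country = toy.sort_values(["Country Code", "Year"])["Year"].unique().tolist()
by_year = toy.sort_values(["Year", "Country Code"])["Year"].unique().tolist()

print(toy["ForestPercent"].isna().sum())  # → 1
print(by_country)  # → [1991, 1992, 1990]
print(by_year)     # → [1990, 1991, 1992]
```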


## Step 3 · Spot-check latest year values
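Pull the most recent year to see which entries sit at the extremes. Because the table mixes countries with World Bank regional aggregates, some rows at either end may be regions rather than countries. This doubles as a formative assessment opportunity.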

In [None]:
latest_year = df_forest["Year"].max()
df_latest = df_forest[df_forest["Year"] == latest_year]
print(f"Latest year: {latest_year}")
print(df_latest.sort_values("ForestPercent", ascending=False).head())
print(df_latest.sort_values("ForestPercent").head())
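
A single snapshot shows levels, not change, and the chart title makes a claim about change. Here is a minimal sketch for ranking change between two years, assuming the long-table columns used above; `change_since_baseline` is a hypothetical helper and the rows are invented. The result is in percentage points (60% → 45% is −15 points), which is not the same as the 25% relative loss those numbers also represent.

```python
import pandas as pd


def change_since_baseline(df: pd.DataFrame, start: int, end: int) -> pd.Series:
    """Percentage-point change in ForestPercent from `start` to `end`, per country."""
    # One row per country, one column per year (assumes unique Country Code/Year pairs)
    wide = df.pivot(index="Country Code", columns="Year", values="ForestPercent")
    # Countries missing either year produce NaN and are dropped
    return (wide[end] - wide[start]).dropna().sort_values()


# Invented rows: CCC has no 1990 value, so it cannot be compared
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "Year": [1990, 2021, 1990, 2021, 2021],
    "ForestPercent": [60.0, 45.0, 20.0, 23.0, 5.0],
})
change = change_since_baseline(toy, 1990, 2021)
print(change)
```

In the notebook, `change_since_baseline(df_forest, 1990, latest_year).head()` would list the steepest declines first; regional aggregates will appear alongside countries.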


### Expected map preview
Compare your animation to the reference frame below. Colors should trend from deep greens (high coverage) to pale shades (low coverage).

![Preview of the finished map](../../../plots/day04_solution_plot.png)
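
To check the color expectation above, this sketch samples matplotlib's `YlGn` colormap on 0 to 100, the same range the choropleth below pins with `range_color=[0, 100]`, so a given percentage gets the same color in every frame. Plotly's `YlGn` is also based on the ColorBrewer scheme, but treat the match as approximate. The brightness formula is a rough luma on gamma-encoded sRGB values, not a perceptual measure.

```python
import matplotlib
from matplotlib.colors import Normalize, to_hex

cmap = matplotlib.colormaps["YlGn"]  # sequential ColorBrewer palette
norm = Normalize(vmin=0, vmax=100)   # mirrors range_color=[0, 100]


def approx_brightness(rgba: tuple[float, float, float, float]) -> float:
    """Rough luma from gamma-encoded sRGB channels (Rec. 709 weights)."""
    r, g, b, _ = rgba
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


for pct in (0, 25, 50, 75, 100):
    rgba = cmap(norm(pct))
    print(f"{pct:>3}% forest -> {to_hex(rgba)}  brightness {approx_brightness(rgba):.2f}")
```

Brightness falls steadily as coverage rises, which is what lets readers rank countries by shade alone.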

## Step 4 · Render the animated choropleth
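Use a colorblind-safe sequential palette (`YlGn`), pin the color range to 0–100% so every frame shares one scale, add a clear title and subtitle, and include a caption with the key takeaway, source, and units.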

In [None]:
import plotly.express as px

TITLE = "Forest cover is shrinking fastest across tropical regions"
SUBTITLE = "Share of land area that is forested, 1990–2021 (World Bank)"
ANNOTATION = "Large losses appear in parts of South America, Southeast Asia, and Central Africa; some temperate countries show gradual gains."
SOURCE = "Source: World Bank forest area (% of land area), processed for DS4S (downloaded 2025-01-05)"
UNITS = "Forest area (% of land area)"

validate_story_elements(
    title=TITLE,
    subtitle=SUBTITLE,
    annotation=ANNOTATION,
    source=SOURCE,
    units=UNITS,
)

fig_forest = px.choropleth(
    df_forest,
    locations="Country Code",
    color="ForestPercent",
    hover_name="Country Name",
    animation_frame="Year",
    color_continuous_scale="YlGn",
    range_color=[0, 100],
    labels={"ForestPercent": UNITS},
    template="plotly_white",
)

fig_forest.update_layout(
    title=dict(text=f"{TITLE}<br><sup>{SUBTITLE}</sup>", x=0, font=dict(size=22)),
    margin=dict(l=20, r=20, t=100, b=40),
    coloraxis_colorbar=dict(title=UNITS),
)

fig_forest.update_geos(
    showcountries=True,
    showcoastlines=False,
    showframe=False,
    projection_type="natural earth",
)

fig_forest.add_annotation(
    xref="paper",
    yref="paper",
    x=0.01,
    y=-0.20,
    text=f"{ANNOTATION}<br>{SOURCE} · Units: {UNITS}",
    showarrow=False,
    align="left",
    font=dict(size=12, color="#555555"),
)

fig_forest.show()


In [None]:
save_last_visual(fig_forest, "day04_solution_plot.png")

## 🔍 Reflection & limitations
- Animated choropleths can overwhelm. Pause on key frames and ask students which design choices might mislead (e.g., uneven time gaps, missing small islands).
- Emphasise that percent cover hides forest quality; encourage notes about biodiversity, indigenous stewardship, and carbon density.
- Invite discussion on uncertainties: how might different satellite algorithms shift these values?
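
To ground the discussion of uneven coverage, a quick count of how many years each entry actually has data for can help. This sketch assumes the long-table columns used above; `coverage_gaps` is a hypothetical helper and the rows are invented.

```python
import pandas as pd


def coverage_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Per country: years with a usable ForestPercent value, and years missing one."""
    n_years = df["Year"].nunique()
    observed = (
        df.dropna(subset=["ForestPercent"])
        .groupby("Country Code")["Year"]
        .nunique()
        # Countries whose every value is NaN still get a row, with 0 years of data
        .reindex(df["Country Code"].unique(), fill_value=0)
    )
    summary = pd.DataFrame({"years_with_data": observed})
    summary["years_missing"] = n_years - summary["years_with_data"]
    return summary.sort_values("years_missing", ascending=False)


# Invented rows: BBB has a NaN in 1991, CCC only appears in 1992
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC"],
    "Year": [1990, 1991, 1992, 1990, 1991, 1992, 1992],
    "ForestPercent": [50.0, 49.5, 49.0, 8.0, None, 8.2, 30.0],
})
gaps = coverage_gaps(toy)
print(gaps)
```

Note the limit of this check: a territory with no rows at all never shows up, so missing small islands still need a comparison against an external country list.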