## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day04/notebook/day04_starter.ipynb)

# 🌳 Day 4 – Biodiversity & Deforestation Mapping
### Visualizing forest cover change on a global choropleth map

Today we work with geospatial time-series data to spotlight where forest cover is shrinking or rebounding.
You'll practice validating geocodes, computing change metrics, and producing an animated map that supports
thoughtful conservation storytelling.

### 🗂️ Data card — World Bank forest area (% of land area)
- **Source:** World Bank World Development Indicators, indicator `AG.LND.FRST.ZS`
- **Temporal coverage:** 1990–2021 (annual)
- **Geography:** Countries and territories with ISO3 country codes
- **Units:** Percent of total land area covered by forest
- **Collection notes:** Forest includes natural and planted stands of trees at least 5 meters tall; data are harmonized across FAO submissions
- **Caveats:** Estimates rely on national reporting; some countries interpolate between survey years; excludes tree crops outside forest definitions
- **Mindful design:** Use a perceptually uniform color scale, provide context for the baseline year, and avoid implying causality from a single map.
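
The coverage and caveat notes above can be checked against the data rather than taken on trust. The following is a minimal sketch, assuming the long-format columns used later in this notebook (`Country Code`, `Year`, `ForestPercent`); the helper name `year_coverage` and the toy country codes `AAA` and `BBB` are hypothetical and exist only for illustration.

```python
import pandas as pd


def year_coverage(df, code_col="Country Code", year_col="Year", value_col="ForestPercent"):
    """Summarize the first year, last year, and number of reported years per country."""
    # Ignore rows where the forest share is missing so they do not count as coverage.
    reported = df.dropna(subset=[value_col])
    return (
        reported.groupby(code_col)[year_col]
        .agg(first_year="min", last_year="max", n_years="nunique")
        .reset_index()
    )


# Toy data: "BBB" has no value for 1990, so its series starts late.
toy = pd.DataFrame(
    {
        "Country Code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "Year": [1990, 1991, 1992, 1990, 1991, 1992],
        "ForestPercent": [40.0, 39.5, 39.0, None, 12.0, 12.1],
    }
)
coverage = year_coverage(toy)
late_starters = coverage[coverage["first_year"] > 1990]
print(coverage)
print(late_starters)
```

Once the real file is loaded, `year_coverage(forest)` would show which countries lack a 1990 value and would therefore have no 1990-based change figure.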

### 1. Set up the environment

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
from IPython.display import display
import matplotlib.pyplot as plt  # for save_last_fig compatibility if needed

pd.options.display.float_format = "{:.2f}".format

In [None]:
# Shared helper utilities used throughout the week.
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Iterable, Mapping

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def resolve_data_dir(max_up: int = 5) -> Path:
    """Locate the project-level ``data`` directory regardless of execution location."""
    here = Path.cwd()
    for _ in range(max_up + 1):
        candidate = here / "data"
        if candidate.exists():
            return candidate
        here = here.parent
    raise FileNotFoundError(
        "Could not find a 'data' directory relative to this notebook.\n"
        "If you are running in Colab, mount your drive or upload the data folder first."
    )


DATA_DIR = resolve_data_dir()
PROJECT_ROOT = DATA_DIR.parent
PLOTS_DIR = PROJECT_ROOT / "plots"
PLOTS_DIR.mkdir(parents=True, exist_ok=True)


def baseline_style() -> None:
    """Apply a consistent, high-contrast visual style that is colorblind-friendly."""
    sns.set_theme(style="whitegrid", context="talk", font_scale=0.9)
    plt.rcParams.update(
        {
            "figure.dpi": 120,
            "axes.titlesize": 16,
            "axes.labelsize": 13,
            "legend.fontsize": 11,
            "axes.titleweight": "semibold",
        }
    )


def load_data(filename: str | Path, **kwargs) -> pd.DataFrame:
    """Read a CSV file from the shared data directory and report its shape."""
    path = Path(filename)
    if not path.exists():
        path = DATA_DIR / filename
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {path.name} with shape {df.shape}.")
    return df


def validate_columns(df: pd.DataFrame, required: Iterable[str], *, context: str = "") -> None:
    missing = [col for col in required if col not in df.columns]
    if missing:
        warnings.warn(
            f"Missing expected columns {missing} in {context or 'dataframe'}.\n"
            "Double-check your renaming and loading steps before moving on."
        )
    else:
        print(f"✅ Columns look good: {list(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int, *, label: str = "rows") -> None:
    count = len(df)
    if not (low <= count <= high):
        warnings.warn(
            f"{label} check: expected between {low:,} and {high:,} but found {count:,}."
        )
    else:
        print(f"✅ {label} check: {count:,} rows is within the expected range.")


def quick_diagnose(df: pd.DataFrame, *, sample: int = 3) -> None:
    print("\nPreview of the current dataframe:")
    display(df.head(sample))
    print("\nNull values by column:")
    print(df.isna().sum())


def validate_story_fields(fields: Mapping[str, str]) -> None:
    missing = [name for name, value in fields.items() if not str(value).strip()]
    if missing:
        warnings.warn(
            "The following story fields are blank: " + ", ".join(missing) +
            "\nFill them in so your chart has a clear narrative frame."
        )
    else:
        print("✅ Narrative checklist complete.")


def save_last_fig(fig: plt.Figure | None, filename: str) -> Path | None:
    if fig is None:
        fig = plt.gcf()
    if fig and getattr(fig, "axes", None):
        output_path = PLOTS_DIR / filename
        fig.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path.relative_to(PROJECT_ROOT)}")
        return output_path
    warnings.warn("No matplotlib figure available to save yet.")
    return None


### 2. Load and inspect the forest dataset
Confirm expected columns and look at a few records before deeper analysis.

In [None]:
forest = load_data("forest_area_long.csv")
validate_columns(forest, ["Country Name", "Country Code", "Year", "ForestPercent"])
expect_rows_between(forest, 6000, 7000, label="country-year records")
# Pass the full frame so the null counts cover every row, not just the first five.
quick_diagnose(forest)
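
Column checks confirm that `Country Code` exists, but not that its values are usable geocodes. The sketch below is one possible way to validate them: it separates codes that are not three uppercase letters from codes that look like ISO3 but are regional or income aggregates. The helper `classify_codes` is hypothetical, and `ASSUMED_AGGREGATES` is a partial, illustrative list of World Bank aggregate codes that should be verified against the actual file.

```python
import re

# Shape check only: three uppercase ASCII letters. It does not prove the code is assigned.
ISO3_PATTERN = re.compile(r"[A-Z]{3}")

# Assumption: a partial list of World Bank aggregate codes (world, regions, income groups).
ASSUMED_AGGREGATES = {"WLD", "EUU", "HIC", "LIC", "ARB", "SSF", "OED"}


def classify_codes(codes):
    """Split unique codes into plausible ISO3 codes, known aggregates, and malformed values."""
    result = {"plausible": [], "aggregates": [], "malformed": []}
    # dict.fromkeys removes duplicates while preserving first-seen order.
    for code in dict.fromkeys(codes):
        if not isinstance(code, str) or not ISO3_PATTERN.fullmatch(code):
            result["malformed"].append(code)
        elif code in ASSUMED_AGGREGATES:
            result["aggregates"].append(code)
        else:
            result["plausible"].append(code)
    return result


report = classify_codes(["BRA", "IDN", "WLD", "br", None, "BRA"])
print(report)
```

With the real data, `classify_codes(forest["Country Code"])` gives a quick list of rows to drop or repair before mapping, since aggregate rows do not correspond to a map polygon.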

### 3. Focus on the modern era and compute change metrics
Filter to 1990 onward, compute 5-year rolling medians per country, and calculate the percentage-point change between 1990 and 2020.

In [None]:
forest_recent = forest[forest["Year"] >= 1990].copy()
forest_recent["forest_pct_rolling"] = (
    forest_recent.sort_values(["Country Code", "Year"])
    .groupby("Country Code")["ForestPercent"]
    .transform(lambda s: s.rolling(window=5, min_periods=1).median())
)

baseline = forest_recent[forest_recent["Year"] == 1990][["Country Code", "ForestPercent"]]
latest = forest_recent[forest_recent["Year"] == 2020][["Country Code", "ForestPercent"]]
change = (
    baseline.merge(latest, on="Country Code", suffixes=("_1990", "_2020"))
    .assign(forest_change=lambda d: d["ForestPercent_2020"] - d["ForestPercent_1990"])
)
# Pass the full frame so the null counts cover every country in the comparison.
quick_diagnose(change)
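
To see what the rolling median does, it can help to run the same transform on a tiny example. This sketch uses made-up values and hypothetical country codes (`AAA`, `BBB`). The window is trailing, so each value summarizes the current year and up to four earlier years, and `min_periods=1` means the first years use whatever history is available.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Country Code": ["AAA"] * 6 + ["BBB"] * 6,
        "Year": list(range(1990, 1996)) * 2,
        # "AAA" contains one implausible spike (60.0) in 1992.
        "ForestPercent": [30.0, 29.0, 60.0, 28.0, 27.0, 26.0,
                          10.0, 10.5, 11.0, 11.5, 12.0, 12.5],
    }
)

# Sort first so each window runs in chronological order within a country.
toy = toy.sort_values(["Country Code", "Year"])
toy["rolling_median"] = (
    toy.groupby("Country Code")["ForestPercent"]
    .transform(lambda s: s.rolling(window=5, min_periods=1).median())
)
print(toy)
```

The spike in `AAA` never reaches the smoothed column, which is the reason to prefer a median over a mean here. The trade-off is that the smoothed series lags real trends by a couple of years.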

### 4. Define the storytelling frame
Clarify the narrative elements so the choropleth focuses attention on the right message.

In [None]:
TITLE = "Forest cover is declining in key biodiversity hotspots"
SUBTITLE = "Percent of land area classified as forest, 1990–2020"
ANNOTATION = "Slider reveals regional trajectories; call out countries with steep losses"
SOURCE = "Source: World Bank World Development Indicators"
UNITS = "Forest area (% of land area)"

validate_story_fields({
    "TITLE": TITLE,
    "SUBTITLE": SUBTITLE,
    "ANNOTATION": ANNOTATION,
    "SOURCE": SOURCE,
    "UNITS": UNITS,
})

### 5. Build the animated choropleth
Use a sequential green-to-brown scale for the smoothed forest share, show the 1990→2020 change in the hover tooltip, and add an annotation with usage guidance and the source.

In [None]:
forest_for_plot = forest_recent.merge(
    change[["Country Code", "forest_change"]], on="Country Code", how="left"
)

fig = px.choropleth(
    forest_for_plot,
    locations="Country Code",
    color="forest_pct_rolling",
    hover_name="Country Name",
    hover_data={
        "Year": True,
        "ForestPercent": ":.1f",
        "forest_pct_rolling": ":.1f",
        "forest_change": ":.1f",
    },
    animation_frame="Year",
    color_continuous_scale=["#8FBC8F", "#2E8B57", "#A0522D"],
    range_color=[0, 80],
    title=f"{TITLE}<br><sup>{SUBTITLE}</sup>",
)
fig.update_layout(
    coloraxis_colorbar=dict(title=UNITS),
    margin=dict(l=0, r=0, t=80, b=0),
    annotations=[
        dict(
            text=f"{ANNOTATION}<br>{SOURCE}",
            x=0,
            y=-0.12,
            xref="paper",
            yref="paper",
            showarrow=False,
            align="left",
        )
    ],
)
fig.update_traces(marker_line_width=0.2, marker_line_color="#222222")
fig.show()

html_path = PLOTS_DIR / "day04_solution_map.html"
fig.write_html(html_path)
print(f"Saved interactive map to {html_path.relative_to(PROJECT_ROOT)}")
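
The animated map colors countries by forest share, so the 1990→2020 change only appears in the tooltip. One optional companion, sketched below, is a static map colored by `forest_change` with a diverging scale centered on zero. This is a suggestion rather than part of the original workflow: the helpers `symmetric_range` and `build_change_map` are hypothetical, and the `"BrBG"` scale is one reasonable choice among several.

```python
import math


def symmetric_range(values):
    """Return [-m, m], where m is the largest absolute finite value (1.0 if none)."""
    finite = [
        abs(float(v))
        for v in values
        if v is not None and not math.isnan(float(v))
    ]
    limit = max(finite) if finite else 1.0
    return [-limit, limit]


def build_change_map(change_df):
    """Build a static choropleth of forest_change with zero at the scale's midpoint."""
    # Imported lazily so symmetric_range can be used without plotly installed.
    import plotly.express as px

    return px.choropleth(
        change_df,
        locations="Country Code",
        color="forest_change",
        color_continuous_scale="BrBG",  # brown for losses, green for gains
        # A symmetric range places zero exactly at the neutral midpoint color.
        range_color=symmetric_range(change_df["forest_change"]),
    )


demo_range = symmetric_range([-12.5, 3.0, float("nan")])
print(demo_range)
```

In the notebook, `build_change_map(change).show()` would render the map. Keeping the range symmetric prevents a large loss in one country from making small gains elsewhere look more intense than they are.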

### 6. Interpret responsibly
- **Key takeaway:** Large swaths of South America and Southeast Asia lost more than 5 percentage points of forest cover since 1990, while parts of Europe stabilized or grew forests.
- **Uncertainty & caveats:** National reporting standards vary; some apparent gains come from plantation expansion, not native forest recovery; choropleths can hide subnational variation.
- **What this map cannot tell us:** Drivers of change (logging, fire, policy) and biodiversity quality require complementary datasets; consider pairing the map with time-series charts and qualitative context.
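
Before writing a takeaway like the one above, it is worth listing the countries behind the claim. The following sketch ranks the largest losses and gains in a change table shaped like `change`; the helper `rank_changes` is hypothetical and the toy values are invented for illustration.

```python
import pandas as pd


def rank_changes(change_df, n=3, column="forest_change"):
    """Return the n largest losses and the n largest gains as two dataframes."""
    # Drop rows without a change value so they cannot appear in either ranking.
    valid = change_df.dropna(subset=[column])
    losses = valid.nsmallest(n, column)
    gains = valid.nlargest(n, column)
    return losses, gains


toy_change = pd.DataFrame(
    {
        "Country Code": ["AAA", "BBB", "CCC", "DDD", "EEE"],
        "forest_change": [-8.0, 1.5, -0.2, 4.0, -3.1],
    }
)
losses, gains = rank_changes(toy_change, n=2)
print(losses)
print(gains)
```

Running `rank_changes(change, n=10)` on the real table gives a short list to check against the map and to use when choosing countries for follow-up time-series charts.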

### 7. Process micro-rubric
| Step | Evidence of completion |
| --- | --- |
| Data loaded & validated | Country codes, years, and forest percent columns confirmed |
| Cleaning documented | Rolling medians and 1990→2020 change computed with diagnostics |
| Story frame filled | Title, subtitle, annotation, source, units prepared pre-visualization |
| Visualization reviewed | Color scale, hover data, and annotations align with ethical mapping guidance |
| Reflection written | Takeaway, caveats, and limitations articulated |