## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%204_%20Mapping%20Forests.ipynb)

# 🌲 Day 4 – Mapping Global Forest Change

We move from plots to maps, animating three decades of forest cover change. Today’s emphasis is on geographic integrity, clear color scales, and articulating what the map **can** and **cannot** prove.

> **Teacher sidebar — pacing & differentiation**  
> • Timing: ~55 minutes (story scaffold + four loops + reflection).  
> • Checkpoint: after the diagnostic summary, confirm that `Year` is numeric and that no `Country Code` values are missing.  
> • Differentiation: stretch learners can add regional filters or small multiples; learners who need more scaffolding can work with a pre-filtered subset of continents.

## 🗺️ Roadmap for Today

Loop | Focus | What success looks like
--- | --- | ---
0 | Story scaffold | Title + annotation lock in the narrative
1 | Load & validate | Forest dataset columns confirmed, years typed
2 | Diagnostics | Null checks & range expectations (0–100%)
3 | Visualize | Choropleth animation with ethical framing
4 | Interpret & save | Written takeaway + exported HTML map

## 🗂️ Data Card — World Bank Forest Area (% of Land Area)

- **Source**: World Bank World Development Indicators (AG.LND.FRST.ZS).  
- **Temporal coverage**: 1990 – 2020 (annual).  
- **Units**: Percent of land area covered by forest.  
- **Method notes**: Based on FAO Global Forest Resources Assessments; combines national reports with remote sensing.  
- **Caveats**: Some small nations report infrequently; natural vs plantation forests are not distinguished.  
- **Update cadence**: Historically every five years; now updated annually.

> 🎯 **Integrity cue**: Avoid implying what causes deforestation; the map shows where forest cover changed, not why. Use annotations to note uncertainties and regional context.

In [None]:
# Shared utilities for the DS4S course notebooks
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
import plotly.express as px

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.dpi': 120,
    'axes.titlesize': 16,
    'axes.labelsize': 13,
    'axes.titlepad': 12,
    'figure.figsize': (10, 5),
})


def load_csv(path: Path, **read_kwargs) -> pd.DataFrame:
    '''Load a CSV and report the basic shape.'''
    df = pd.read_csv(path, **read_kwargs)
    print(f"✅ Loaded {path.name} with {df.shape[0]:,} rows and {df.shape[1]} columns")
    return df


def validate_columns(df: pd.DataFrame, required):
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    print(f"✅ Columns present: {', '.join(required)}")


def expect_rows_between(df: pd.DataFrame, low: int, high: int):
    rows = df.shape[0]
    if not (low <= rows <= high):
        raise ValueError(f"Row count {rows} outside expected range {low}-{high}")
    print(f"✅ Row count {rows} within expected {low}-{high}")


def quick_snapshot(df: pd.DataFrame, name: str, n: int = 3):
    print(f"\n{name} snapshot → shape={df.shape}")
    print("Columns:", list(df.columns))
    print("Nulls:\n", df.isna().sum())
    display(df.head(n))


def ensure_story_elements(title: str, subtitle: str, annotation: str, source: str, units: str):
    fields = {
        'TITLE': title,
        'SUBTITLE': subtitle,
        'ANNOTATION': annotation,
        'SOURCE': source,
        'UNITS': units,
    }
    missing = [key for key, value in fields.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"Please complete these storytelling fields: {', '.join(missing)}")
    print("✅ Story scaffold complete →", ", ".join(f"{k}: {v}" for k, v in fields.items()))
    return fields


def save_last_fig(filename: str):
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    fig = plt.gcf()
    if not fig.axes:
        raise RuntimeError("Run the plotting cell before saving.")
    output_path = plots_dir / filename
    fig.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"📁 Saved figure to {output_path}")


def save_plotly_fig(fig, filename: str):
    plots_dir = Path.cwd() / "plots"
    plots_dir.mkdir(parents=True, exist_ok=True)
    output_path = plots_dir / filename
    fig.write_html(str(output_path))
    print(f"📁 Saved interactive figure to {output_path}")

## 🔁 Loop 0 — Story scaffold (3 min)

In [None]:
TITLE = "Forest Cover Is Shrinking in Key Regions"
SUBTITLE = "Share of land area covered by forest, 1990–2020"
ANNOTATION = "Amazon countries lost >5 percentage points of forest cover since 1990."
SOURCE = "World Bank WDI (AG.LND.FRST.ZS), processed 2024"
UNITS = "Forest area (% of land area)"

story_fields = ensure_story_elements(TITLE, SUBTITLE, ANNOTATION, SOURCE, UNITS)

## 🔁 Loop 1 — Load & validate (8 min)

In [None]:
data_dir = Path.cwd() / "data"
forest = load_csv(data_dir / "forest_area_long.csv")
validate_columns(forest, ["Country Name", "Country Code", "Year", "ForestPercent"])
forest["Year"] = forest["Year"].astype(int)
expect_rows_between(forest, 5000, 6000)
# Pass the full frame so the shape and null counts describe every row, not just the first five.
quick_snapshot(forest, name="Forest data")
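
By default `px.choropleth` reads the `locations` column as ISO-3 country codes, so malformed codes or repeated country-year rows are worth catching before mapping. The sketch below runs on a small made-up frame; `check_codes` is a hypothetical helper, not one of the course utilities, and the values are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; not real World Bank values.
sample = pd.DataFrame({
    "Country Code": ["BRA", "BRA", "idn", "FR", None, "PER", "PER"],
    "Year": [1990, 1991, 1990, 1990, 1990, 2000, 2000],
    "ForestPercent": [70.0, 69.8, 65.0, 26.0, 10.0, 58.0, 58.0],
})


def check_codes(df: pd.DataFrame) -> dict:
    """Hypothetical helper: flag codes that are not three uppercase letters
    and (code, year) pairs that appear more than once."""
    codes = df["Country Code"]
    # Missing codes become "" so they count as not well formed here
    # and are reported separately below.
    well_formed = codes.fillna("").str.fullmatch(r"[A-Z]{3}")
    duplicated = df.duplicated(subset=["Country Code", "Year"], keep=False)
    return {
        "malformed_codes": sorted(codes[~well_formed].dropna().unique()),
        "missing_codes": int(codes.isna().sum()),
        "duplicate_rows": int(duplicated.sum()),
    }


report = check_codes(sample)
print(report)
```

A format check like this only catches shape problems. A three-letter code that is not a real country code would still pass, so it complements the visual check of the map rather than replacing it.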

## 🔁 Loop 2 — Diagnostics (6 min)

Ensure percentages stay between 0 and 100, and note any missing codes before mapping.

In [None]:
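# Range-check reported values only; NaN fails `between` and would trip the assert with a misleading message.
assert forest["ForestPercent"].dropna().between(0, 100).all(), "Forest percent must remain within 0–100%."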
null_codes = forest[forest["Country Code"].isna()]
print("Rows with missing codes:", null_codes.shape[0])
# Unweighted mean across countries, not the forested share of the world's land area.
quick_snapshot(forest.groupby("Year")["ForestPercent"].mean().reset_index().tail(), name="Mean forest % across countries, by year")
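
Since the data card notes that some nations report infrequently, it may also help to see which countries lack values for some years; those countries would likely appear blank in the affected animation frames. A minimal sketch on made-up data, with `year_coverage` as a hypothetical helper name:

```python
import pandas as pd

# Made-up rows for illustration only; not real World Bank values.
sample = pd.DataFrame({
    "Country Code": ["BRA", "BRA", "BRA", "TUV", "PER", "PER"],
    "Year": [1990, 1991, 1992, 1990, 1990, 1992],
    "ForestPercent": [70.0, 69.8, 69.6, 33.0, 58.0, 57.5],
})


def year_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count the years each country has a value for
    and compare against the number of distinct years in the data."""
    total_years = df["Year"].nunique()
    coverage = (
        df.dropna(subset=["ForestPercent"])
        .groupby("Country Code")["Year"]
        .nunique()
        .rename("years_present")
        .reset_index()
    )
    coverage["years_missing"] = total_years - coverage["years_present"]
    # Countries with the largest gaps come first.
    return coverage.sort_values("years_missing", ascending=False).reset_index(drop=True)


coverage = year_coverage(sample)
print(coverage)
```

Countries that never appear in the data at all are not listed by this helper; it only measures gaps for codes that have at least one reported value.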

## 🔁 Loop 3 — Visualize (12 min)

Animate forest change with a consistent color scale and explicit annotation text.

In [None]:
fig = px.choropleth(
    forest,
    locations="Country Code",
    color="ForestPercent",
    hover_name="Country Name",
    animation_frame="Year",
    color_continuous_scale="YlGn",
    range_color=[0, 80],
    title=f"{TITLE}<br><sup>{SUBTITLE}</sup>",
    labels={"ForestPercent": UNITS},
)
fig.update_layout(
    margin=dict(l=20, r=20, t=80, b=20),
    coloraxis_colorbar=dict(title="Forest %"),
)
fig.add_annotation(
    text=ANNOTATION,
    x=0.02,
    y=0.02,
    xref="paper",
    yref="paper",
    showarrow=False,
    bgcolor="rgba(255,255,255,0.85)",
    bordercolor="#444",
)
fig

In [None]:
print("Animation frames:", len(fig.frames))
assert len(fig.frames) == forest["Year"].nunique()
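
The map fixes `range_color` at 0 to 80, while the diagnostics allow values up to 100. Values above the top of the range are drawn in the same colour as the top, so it can be useful to know how much of the data that affects. The sketch below uses invented values and a hypothetical helper, `share_outside_range`, to measure that share.

```python
import pandas as pd

# Made-up rows for illustration only; not real World Bank values.
sample = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC", "DDD"],
    "Year": [2020, 2020, 2020, 2020],
    "ForestPercent": [12.0, 45.5, 83.0, 97.0],
})


def share_outside_range(values: pd.Series, low: float, high: float) -> float:
    """Hypothetical helper: fraction of non-null values outside [low, high]."""
    present = values.dropna()
    if present.empty:
        return 0.0
    outside = (present < low) | (present > high)
    return float(outside.mean())


clipped = share_outside_range(sample["ForestPercent"], 0, 80)
print(f"{clipped:.0%} of values fall outside the 0-80 colour range")
```

If the share is large on the real data, widening the range or stating the cap in the colorbar title are two ways to keep the scale honest; which is better depends on the story the map is meant to tell.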

## 🔁 Loop 4 — Interpret & save (8 min)

In [None]:
from IPython.display import Markdown

claim = "Forest cover declined in parts of South America and Southeast Asia despite global stability."
evidence = (
    "The animation shows forest percentages falling by >5 points in Amazon nations while Europe stays near 35%."
)
visual = "Plotly choropleth animation with annotation of Amazon losses."
takeaway = "Maps highlight where change happens, but we need ground truth to attribute causes (logging, agriculture, fire)."
Markdown(
    f"""
| Claim | Evidence | Visual | Takeaway |
| --- | --- | --- | --- |
| {claim} | {evidence} | {visual} | {takeaway} |
"""
)
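
The evidence statement cites a change of more than 5 percentage points, which is easier to defend when it is computed rather than read off the animation. The following is a sketch under stated assumptions: the rows are invented, and `point_change` is a hypothetical helper that subtracts the start-year value from the end-year value for countries reporting both.

```python
import pandas as pd

# Made-up rows for illustration only; not real World Bank values.
sample = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "Year": [1990, 2020, 1990, 2020, 1990],
    "ForestPercent": [60.0, 52.5, 30.0, 31.0, 40.0],
})


def point_change(df: pd.DataFrame, start: int, end: int) -> pd.Series:
    """Hypothetical helper: end-year value minus start-year value, in
    percentage points, for countries that have both years."""
    wide = df.pivot_table(index="Country Code", columns="Year", values="ForestPercent")
    # Countries missing either year produce NaN and are dropped.
    change = (wide[end] - wide[start]).dropna()
    return change.sort_values().rename("change_pp")


change = point_change(sample, 1990, 2020)
print(change)
```

With the course data, the same call on `forest` would give a ranked list of losses and gains to check the annotation against. The result is a difference in percentage points, not a percent change in forest area.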

## 💾 Save the figure for the teacher dashboard

In [None]:
save_plotly_fig(fig, "day04_solution_map.html")

## ✅ Exit Ticket

- Which region changed the most in the animation?  
- What forest dynamics are hidden by percent-of-land metrics?  
- How would you adapt this map for colorblind accessibility (palette, annotations)?