## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/Day%204%20%E2%80%93%20Forests%20%26%20Biodiversity%20Maps.ipynb)

# 🌲 Day 4 – Mapping Forest Change
### Animate deforestation and regrowth with an ethical mapping checklist

We keep the same rhythm (bite-sized loops with immediate validation) while moving to a geographic story about forest cover.

---

## 🧠 Learning Rhythm
- 🔁 Six loops: set up, load forest data, clean and validate, summarize, map, interpret.
- 🧪 Guardrails catch missing ISO codes or impossible percentages before Plotly renders.
- 🗺️ Accessibility reminders include color scale guidance and annotation prompts.

> **Teacher Sidecar**: Allocate ~60 minutes. Encourage students to pause after Loop 4 to interpret the diagnostics before hitting play on the animation.
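The ISO-code guardrail can be made concrete with a minimal sketch. The helper name `find_bad_iso3` and the toy rows are invented for illustration. The check only tests that a code is present and shaped like three uppercase letters, so World Bank aggregate codes (for example `WLD` for World) would still pass and would need a separate country list to exclude.

```python
import pandas as pd


def find_bad_iso3(df: pd.DataFrame, column: str = "Country Code") -> pd.DataFrame:
    """Return rows whose code is missing or not exactly three uppercase letters."""
    # Treat missing codes as empty strings so they fail the pattern match.
    codes = df[column].fillna("").astype(str)
    looks_like_iso3 = codes.str.fullmatch(r"[A-Z]{3}")
    return df[~looks_like_iso3]


# Made-up rows: two well-formed codes, then a missing, a lowercase, and a four-letter code.
demo = pd.DataFrame({
    "Country Code": ["AAA", "BBB", None, "ccc", "DDDD"],
    "ForestPercent": [40.0, 65.5, 30.0, 55.0, 101.2],
})
bad_codes = find_bad_iso3(demo)
# Percentages must lie in [0, 100]; between() is inclusive on both ends by default.
impossible = demo[~demo["ForestPercent"].between(0, 100)]
print(bad_codes)
print(impossible)
```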

## 📇 Data Card: World Bank Forest Area (% of Land)
- **Source**: World Bank World Development Indicators, curated for this course.
- **Temporal coverage**: 1990–2020 (annual).
- **Metric**: Forest area as a percentage of a country's total land area.
- **Last updated**: September 2023 download.
- **Caveats**: Some small territories lack recent updates; values above 100% indicate data entry issues and should be filtered.
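The first caveat (territories without recent updates) can be measured before any mapping. A minimal sketch, assuming the long-form columns used later in this notebook (`Country Code`, `Year`); `stale_countries` is a hypothetical helper and the rows are invented.

```python
import pandas as pd


def stale_countries(df: pd.DataFrame, expected_last_year: int = 2020) -> pd.Series:
    """Return each country's last reported year when it falls short of expected_last_year."""
    last_year = df.groupby("Country Code")["Year"].max()
    return last_year[last_year < expected_last_year]


# Invented rows: AAA reports through 2020, BBB stops in 2018.
demo = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2019, 2020, 2017, 2018],
    "ForestPercent": [40.0, 39.5, 12.0, 11.8],
})
stale = stale_countries(demo)
print(stale)
```

Countries flagged this way drop out of any comparison that slices a single latest year.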

## 🧵 Story Scaffold (Claim → Evidence → Visual → Takeaway)
- **Claim**: Tropical regions show the steepest forest loss since 1990, while temperate zones remain comparatively stable.
- **Evidence to gather**: Country-level forest percentage over time with consistent ISO3 codes.
- **Visual plan**: Plotly choropleth animation with colorblind-safe palette and annotation for key hotspots.
- **Takeaway**: Deforestation is uneven; policy and enforcement matter regionally.


In [None]:

from __future__ import annotations

from pathlib import Path
from typing import Any, Mapping, Sequence

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display

DATA_DIR = Path.cwd() / "data"

sns.set_theme(style="whitegrid", font_scale=1.1)
plt.rcParams.update({
    "axes.titlesize": 16,
    "axes.labelsize": 13,
    "axes.grid": True,
    "figure.figsize": (11, 6),
    "figure.dpi": 120,
})

def ping_environment(packages: Mapping[str, object]) -> None:
    """Print library versions so teachers can confirm the runtime."""
    for label, module in packages.items():
        version = getattr(module, "__version__", "built-in")
        print(f"{label}: {version}")
    print("Environment check complete ✅")

def load_data(file_name: str, /, **kwargs) -> pd.DataFrame:
    """Load a CSV from the shared data folder with a friendly status message."""
    path = DATA_DIR / file_name
    if not path.exists():
        raise FileNotFoundError(f"Expected data file at {path}")
    df = pd.read_csv(path, **kwargs)
    print(f"Loaded {file_name} → shape {df.shape}")
    return df

def validate_columns(df: pd.DataFrame, required: Sequence[str]) -> pd.DataFrame:
    """Raise if any required column is missing; return df so checks can be chained."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    print(f"Columns validated ✅ {list(required)}")
    return df

def expect_rows_between(df: pd.DataFrame, lower: int, upper: int, label: str = "rows") -> pd.DataFrame:
    """Raise if the row count falls outside [lower, upper] (inclusive); return df."""
    n_rows = len(df)
    if not (lower <= n_rows <= upper):
        raise ValueError(
            f"Unexpected {label}: {n_rows} (expected between {lower} and {upper})"
        )
    print(f"{label.capitalize()} check ✅ {n_rows} (expected {lower}-{upper})")
    return df

def quick_peek(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Display a head preview and NA counts for formative assessment."""
    display(df.head(n))
    print("Null values per column:")
    print(df.isna().sum())
    return df

def ensure_metadata(**metadata: str) -> None:
    """Raise if any story metadata field (e.g. TITLE, SOURCE, UNITS) is blank."""
    blanks = [key for key, value in metadata.items() if not str(value).strip()]
    if blanks:
        raise ValueError(f"Please fill in metadata fields: {blanks}")
    print("Story metadata looks great ✅")

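def annotate_source(ax: plt.Axes, *, source: str, units: str) -> plt.Axes:
    """Add a two-line source/units caption below a Matplotlib axes."""
    ax.text(
        0.0,
        -0.22,
        f"Source: {source}\nUnits: {units}",  # "\n" keeps the caption on two lines
        transform=ax.transAxes,
        ha="left",
        fontsize=10,
    )
    return ax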

def _resolve_fig(fig: Any | None) -> Any:
    """Return fig if given, else the current Matplotlib figure, else None."""
    if fig is not None:
        return fig
    if plt.get_fignums():
        return plt.gcf()
    return None

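def save_last_fig(fig: Any | None, filename: str) -> Path:
    """Save a Matplotlib or Plotly figure under ./plots; Plotly falls back to HTML if static export fails."""
    plots_dir = Path.cwd() / "plots"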
    plots_dir.mkdir(parents=True, exist_ok=True)
    resolved = _resolve_fig(fig)
    if resolved is None:
        raise ValueError("No recent figure detected.")

    output_path = plots_dir / filename

    if hasattr(resolved, "savefig"):
        resolved.savefig(output_path, dpi=300, bbox_inches="tight")
        print(f"Saved figure to {output_path}")
        return output_path

    if hasattr(resolved, "write_image"):
        try:
            resolved.write_image(str(output_path))
            print(f"Saved figure to {output_path}")
            return output_path
        except Exception as exc:
            html_path = output_path.with_suffix(".html")
            resolved.write_html(str(html_path))
            print(f"Saved interactive figure to {html_path} (fallback: {exc})")
            return html_path

    raise ValueError("Don't know how to export this figure type.")


## 🔁 Loop 1 · Confirm the setup
*Goal: Check Plotly availability and confirm data files exist.*

In [None]:
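import plotly
import plotly.express as px
# Report the top-level plotly version; plotly.express itself may not expose __version__.
ping_environment({"pandas": pd, "plotly": plotly})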
assert DATA_DIR.exists(), f"Data directory missing: {DATA_DIR}"
print(f"Data files available: {len(list(DATA_DIR.glob('*')))} items")
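Loop 1 counts files but does not name them. A minimal sketch of a stricter check: `missing_files` is hypothetical, and `land_area.csv` is an invented name used only to show a missing file. A temporary directory stands in for `data/` so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def missing_files(data_dir: Path, required: list[str]) -> list[str]:
    """Return the required file names that are not regular files in data_dir."""
    return [name for name in required if not (data_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    # Create only the forest file; "land_area.csv" is left missing on purpose.
    (demo_dir / "forest_area_long.csv").write_text("Country Name,Country Code,Year,ForestPercent\n")
    result = missing_files(demo_dir, ["forest_area_long.csv", "land_area.csv"])

print(result)  # ['land_area.csv']
```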

## 🔁 Loop 2 · Load the forest cover dataset
*Goal: Read the tidy long-form table and preview its structure.*

In [None]:
forest_long = load_data("forest_area_long.csv")
validate_columns(forest_long, ["Country Name", "Country Code", "Year", "ForestPercent"])
expect_rows_between(forest_long, 7000, 9000, label="country-year rows")
# Run on the full table so the NA counts describe every row, not a 5-row sample.
quick_peek(forest_long)
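A tidy long-form table should have exactly one row per country-year. A minimal sketch of that check on invented rows; `duplicated_keys` is a hypothetical helper built on `DataFrame.duplicated(keep=False)`, which marks every copy of a repeated key rather than only the later ones.

```python
import pandas as pd


def duplicated_keys(df: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Return every row whose key combination appears more than once."""
    return df[df.duplicated(subset=keys, keep=False)]


# Invented rows: AAA has two entries for 1991.
demo = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "AAA", "BBB"],
    "Year": [1990, 1991, 1991, 1990],
    "ForestPercent": [50.0, 49.0, 49.5, 20.0],
})
dupes = duplicated_keys(demo, ["Country Code", "Year"])
print(dupes)
```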


## 🔁 Loop 3 · Clean and constrain values
*Goal: Ensure numeric types, valid year range, and realistic percentages.*

In [None]:
forest_long["Year"] = pd.to_numeric(forest_long["Year"], errors="coerce")
forest_long["ForestPercent"] = pd.to_numeric(forest_long["ForestPercent"], errors="coerce")
forest_clean = forest_long.dropna(subset=["Year", "ForestPercent", "Country Code"])
# Keep the documented year range and physically possible percentages (0-100 inclusive).
forest_clean = forest_clean[
    forest_clean["Year"].between(1990, 2020)
    & forest_clean["ForestPercent"].between(0, 100)
]
# Coercion can leave Year as float; cast back so the animation slider reads 1990, not 1990.0.
forest_clean = forest_clean.astype({"Year": int})
expect_rows_between(forest_clean, 6000, 8500, label="valid country-year rows")
print(f"Countries represented: {forest_clean['Country Code'].nunique()}")
print(f"Year span: {int(forest_clean['Year'].min())}–{int(forest_clean['Year'].max())}")
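The cleaning rules above can be tried on a toy frame to see which rows each rule removes. The values are invented, and `..` stands in for a non-numeric missing-value placeholder. Note that `between` is inclusive on both ends by default, so exactly 0% survives.

```python
import pandas as pd

# Invented raw rows, stored as text the way a CSV export might deliver them.
raw = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC", None, "EEE", "FFF"],
    "Year": ["1990", "2005", "not a year", "2010", "2020", "2020"],
    "ForestPercent": ["45.2", "..", "30", "22", "104.5", "0"],
})
# Unparseable text becomes NaN instead of raising.
raw["Year"] = pd.to_numeric(raw["Year"], errors="coerce")
raw["ForestPercent"] = pd.to_numeric(raw["ForestPercent"], errors="coerce")
# Drop rows missing a year, a value, or a code.
clean = raw.dropna(subset=["Year", "ForestPercent", "Country Code"])
# Keep the documented year range and percentages in [0, 100].
clean = clean[clean["Year"].between(1990, 2020) & clean["ForestPercent"].between(0, 100)]
# Year was float because of the NaN; cast back to int now that NaNs are gone.
clean = clean.astype({"Year": int})
print(clean)
```

Only `AAA` (1990) and `FFF` (2020, 0%) survive: `BBB` and `CCC` fail parsing, the fourth row has no code, and `EEE` exceeds 100%.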


## 🔁 Loop 4 · Prep summary slices for quick comparisons
*Goal: Provide checkpoints teachers can use to discuss trends before mapping.*

In [None]:
latest_year = int(forest_clean["Year"].max())
latest_slice = forest_clean[forest_clean["Year"] == latest_year]
print(f"{latest_year} median forest percent: {latest_slice['ForestPercent'].median():.1f}%")
print("Five most-forested countries:")
print(latest_slice.nlargest(5, "ForestPercent")[["Country Name", "ForestPercent"]])
print("Five least-forested countries:")
print(latest_slice.nsmallest(5, "ForestPercent")[["Country Name", "ForestPercent"]])
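Before mapping, a per-year median gives a one-number trend to discuss. A minimal sketch on invented values; it also shows why a median alone can mislead, since country `AAA` loses 12 points while the median barely moves. The median also shifts if the set of reporting countries changes between years.

```python
import pandas as pd

# Invented values for three countries in two years.
demo = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC"] * 2,
    "Year": [1990] * 3 + [2020] * 3,
    "ForestPercent": [60.0, 30.0, 10.0, 48.0, 31.0, 9.0],
})
median_by_year = demo.groupby("Year")["ForestPercent"].median()
change = median_by_year.iloc[-1] - median_by_year.iloc[0]
print(median_by_year)
print(f"Change in median: {change:+.1f} percentage points")  # +1.0
```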


## 🔁 Loop 5 · Build the animated choropleth
*Goal: Apply metadata, ethical color choices, and annotations.*

In [None]:
TITLE = "Tropical Forest Loss Has Outpaced Temperate Regions"
SUBTITLE = "Forest area as a share of land area, 1990–2020"
ANNOTATION = "Watch Brazil, Indonesia, and Congo Basin countries for steep declines."
SOURCE = "World Bank WDI (September 2023 download)"
UNITS = "Percent of land area covered by forest"

ensure_metadata(TITLE=TITLE, SUBTITLE=SUBTITLE, ANNOTATION=ANNOTATION, SOURCE=SOURCE, UNITS=UNITS)

fig = px.choropleth(
    forest_clean,
    locations="Country Code",
    color="ForestPercent",
    hover_name="Country Name",
    animation_frame="Year",
    color_continuous_scale="YlGn",
    range_color=[0, 100],
)
fig.update_layout(
    title={"text": f"{TITLE}<br><sup>{SUBTITLE}</sup>", "x": 0.02, "xanchor": "left"},
    coloraxis_colorbar=dict(title="Forest %", ticksuffix="%"),
)
fig.add_annotation(
    text=ANNOTATION,
    xref="paper",
    yref="paper",
    x=0.02,
    y=0.92,
    showarrow=False,
    align="left",
    bgcolor="rgba(255,255,255,0.85)",
    bordercolor="#333333",
    borderwidth=1,
    font=dict(size=12),
)
fig.add_annotation(
    text=f"Source: {SOURCE}<br>Units: {UNITS}",
    xref="paper",
    yref="paper",
    x=0.0,
    y=-0.12,
    showarrow=False,
    align="left",
    font=dict(size=11, color="#444444"),
)
fig.show()
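An animated choropleth builds each frame from the rows for that year, so a country missing some years may appear blank in those frames. A minimal sketch for spotting such gaps; `coverage_gaps` is hypothetical, the rows are invented, and `pivot` raises a `ValueError` if any country-year pair is duplicated.

```python
import pandas as pd


def coverage_gaps(df: pd.DataFrame) -> pd.Series:
    """Count, per country, how many of the years present in the table have no value."""
    wide = df.pivot(index="Country Code", columns="Year", values="ForestPercent")
    gaps = wide.isna().sum(axis=1)
    return gaps[gaps > 0]


# Invented rows: BBB has no 2000 value.
demo = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Year": [1990, 2000, 2010, 1990, 2010],
    "ForestPercent": [50.0, 48.0, 47.0, 20.0, 19.0],
})
gaps = coverage_gaps(demo)
print(gaps)
```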


## 🔁 Loop 6 · Interpret and self-check
*Goal: Quantify deforestation hotspots for discussion prompts.*

In [None]:
forest_baseline = forest_clean[forest_clean["Year"] == 1990]
forest_latest = forest_clean[forest_clean["Year"] == forest_clean["Year"].max()]
merged = forest_baseline.merge(forest_latest, on="Country Code", suffixes=("_1990", "_latest"))
merged["delta"] = merged["ForestPercent_latest"] - merged["ForestPercent_1990"]
worst_decline = merged.nsmallest(5, "delta")[["Country Name_1990", "delta"]]
print("Largest declines since 1990 (percentage points):")
print(worst_decline.rename(columns={"Country Name_1990": "Country"}))
assert worst_decline["delta"].min() < -15, "Expected at least one country to lose more than 15 percentage points since 1990."
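The inner merge in Loop 6 silently drops countries that lack either a 1990 or a latest-year value. A minimal sketch of an audit using an outer merge with `indicator=True`, a `DataFrame.merge` option that adds a `_merge` column; the frames are invented.

```python
import pandas as pd

# Invented snapshots: BBB lacks a latest value, DDD lacks a 1990 value.
baseline = pd.DataFrame({"Country Code": ["AAA", "BBB", "CCC"], "ForestPercent": [50.0, 40.0, 30.0]})
latest = pd.DataFrame({"Country Code": ["AAA", "CCC", "DDD"], "ForestPercent": [35.0, 31.0, 12.0]})

audit = baseline.merge(
    latest,
    on="Country Code",
    how="outer",
    suffixes=("_1990", "_latest"),
    indicator=True,  # adds _merge: "both", "left_only", or "right_only"
)
dropped = audit.loc[audit["_merge"] != "both", ["Country Code", "_merge"]]
paired = audit[audit["_merge"] == "both"].copy()
paired["delta"] = paired["ForestPercent_latest"] - paired["ForestPercent_1990"]
print(dropped)
print(paired[["Country Code", "delta"]])
```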


### 🧾 Claim → Evidence → Visual → Takeaway (filled)
- **Claim**: Tropical forest regions face sharper declines than temperate ones.
- **Evidence**: Decline table above plus the animated map highlighting Brazil, Indonesia, and Congo Basin losses.
- **Visual**: Choropleth animation with enforced metadata, color guidance, and annotation prompts.
- **Takeaway**: Forest policy impacts are region-specific; conservation wins and losses coexist.

> **Limitation prompt**: Percent cover hides absolute area differences; combine with hectares lost for a fuller story.
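The limitation can be quantified. Since 1 km² = 100 ha, forest area in hectares is the forest percentage divided by 100, times land area in km², times 100, which simplifies to `ForestPercent × land area in km²`. A minimal sketch; `LandAreaKm2` is a hypothetical column that this notebook's dataset does not include, and the rows are invented.

```python
import pandas as pd

# Invented rows; LandAreaKm2 is a hypothetical column you would need to join in.
demo = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "Year": [2020, 2020],
    "ForestPercent": [50.0, 50.0],
    "LandAreaKm2": [1_000.0, 2_000_000.0],
})
HA_PER_KM2 = 100  # 1 km^2 = 100 hectares
demo["ForestHa"] = demo["ForestPercent"] / 100 * demo["LandAreaKm2"] * HA_PER_KM2
print(demo[["Country Code", "ForestPercent", "ForestHa"]])
```

Both rows show 50% cover, yet the second represents 2,000 times more forest.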

---

### 💾 Save your work
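Run the next cell to export the map. `save_last_fig` tries a static PNG first (Plotly's `write_image` needs the `kaleido` package, and a static image captures a single frame of the animation) and falls back to an interactive HTML file if that export fails.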


In [None]:
save_last_fig(fig, "day04_solution_plot.png")
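Static export depends on an optional package. A minimal sketch that checks for `kaleido` before choosing a file suffix, using `importlib.util.find_spec`, which returns `None` when a top-level module is not installed. `choose_export_path` is a hypothetical helper, not part of the notebook's setup cell.

```python
import importlib.util
from pathlib import Path


def choose_export_path(stem: str, plots_dir: Path) -> Path:
    """Return a .png path when kaleido is importable, otherwise an .html path."""
    has_kaleido = importlib.util.find_spec("kaleido") is not None
    suffix = ".png" if has_kaleido else ".html"
    return plots_dir / f"{stem}{suffix}"


target = choose_export_path("day04_solution_plot", Path("plots"))
print(target)
```

This only predicts whether a PNG attempt is likely to succeed; `save_last_fig` still handles the fallback on its own.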