## 🔗 Open This Notebook in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidLangworthy/ds4s/blob/master/days/day04/notebook/day04_starter.ipynb)

# 🌳 Day 4 – Biodiversity & Deforestation Mapping
### Spotting forest loss hotspots with choropleth maps

We will quantify how forest cover has changed since 1990 and visualise the shifts with an accessible, colourblind-safe map.

#### Data card: World Bank – Forest area (% of land area)
* **Source:** [World Bank Indicators](https://data.worldbank.org/indicator/AG.LND.FRST.ZS) based on FAO Forest Resources Assessment.
* **Temporal coverage:** 1990–2022.
* **Units:** Percent of land area covered by forest.
* **Refresh cadence:** Updated annually; downloaded September 2024.
* **Caveats:** Regional aggregates are included; some small territories have missing years.

In [None]:
# Core imports and shared helpers
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from IPython.display import Markdown, display

import utils

utils.baseline_style()


## Step 1: Load and inspect the forest coverage table
The dataset is already tidy: one row per country per year. We'll still run diagnostics to confirm the structure.

In [None]:
forest = utils.load_data('forest_area_long.csv')
utils.diagnostics(
    forest,
    'Forest area share (raw)',
    expected_columns=['Country Name', 'Country Code', 'Year', 'ForestPercent'],
    expected_row_range=(6000, 9000),
)
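
For readers without the course `utils` module, the structural checks behind "tidy" can be approximated in plain pandas. The sketch below is an assumption about what such a check might cover, not the actual implementation of `utils.diagnostics`; the helper name `check_tidy` and the toy rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for forest_area_long.csv (names and values are invented).
toy = pd.DataFrame({
    'Country Name': ['Aland', 'Aland', 'Borduria', 'Borduria'],
    'Country Code': ['ALD', 'ALD', 'BRD', 'BRD'],
    'Year': [1990, 2020, 1990, 2020],
    'ForestPercent': [50.0, 40.0, 10.0, 12.0],
})


def check_tidy(df, key_columns, expected_columns):
    """Hypothetical helper: report missing columns and duplicated key rows."""
    missing = [col for col in expected_columns if col not in df.columns]
    # A tidy long table has at most one row per (country, year) key.
    duplicate_keys = int(df.duplicated(subset=key_columns).sum())
    return {
        'rows': len(df),
        'missing_columns': missing,
        'duplicate_keys': duplicate_keys,
    }


report = check_tidy(
    toy,
    key_columns=['Country Code', 'Year'],
    expected_columns=['Country Name', 'Country Code', 'Year', 'ForestPercent'],
)
print(report)
```

A non-zero `duplicate_keys` count would mean the table is not one row per country per year, which matters later because pivoting averages duplicated keys silently.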


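## Step 2: Filter to country rows for 1990 and 2020
We need both endpoints to compute change, so keep only the 1990 and 2020 rows whose country code has exactly three letters. Note that World Bank aggregates such as `WLD` (World) also use three-letter codes, so this filter alone may not remove them. Countries missing either endpoint are dropped in Step 3.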

In [None]:
country_forest = forest[forest['Country Code'].str.len() == 3]
paired_years = country_forest[country_forest['Year'].isin([1990, 2020])]
utils.diagnostics(
    paired_years,
    'Forest percent (1990 vs 2020)',
    expected_columns=['Country Name', 'Country Code', 'Year', 'ForestPercent'],
    expected_row_range=(300, 600),  # two years per code, so far fewer rows than the raw table
)
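
Because aggregate codes are also three letters long, a length check cannot separate them from countries. One possible approach, sketched below, is to exclude an explicit set of aggregate codes. The set `AGGREGATE_CODES` is an illustrative and incomplete assumption; compare it against the codes actually present in the file before relying on it.

```python
import pandas as pd

# Illustrative, incomplete set of World Bank aggregate codes (an assumption;
# verify against the real data before use).
AGGREGATE_CODES = {
    'WLD', 'EAS', 'ECS', 'LCN', 'MEA', 'NAC', 'SAS', 'SSF',
    'HIC', 'LIC', 'LMC', 'UMC',
}

toy = pd.DataFrame({
    'Country Name': ['World', 'Brazil', 'France', 'High income'],
    'Country Code': ['WLD', 'BRA', 'FRA', 'HIC'],
    'Year': [2020, 2020, 2020, 2020],
    'ForestPercent': [1.0, 2.0, 3.0, 4.0],  # placeholder values, not real data
})

# Every code here passes the length check, aggregates included.
is_three_letters = toy['Country Code'].str.len() == 3
is_aggregate = toy['Country Code'].isin(AGGREGATE_CODES)

countries_only = toy[is_three_letters & ~is_aggregate]
print(countries_only['Country Code'].tolist())
```

An explicit exclusion list is easy to audit, at the cost of needing maintenance when the World Bank adds or renames aggregates.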


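## Step 3: Calculate absolute and relative change
Pivot the paired data so each country has a 1990 and a 2020 column, drop countries missing either endpoint, then compute two metrics: `change_pct`, the absolute change in percentage points, and `pct_change_rel`, the relative change as a percentage of the 1990 level.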

In [None]:
change_table = (
    paired_years.pivot_table(
        index='Country Code',
        columns='Year',
        values='ForestPercent',
    )
    .rename(columns={1990: 'forest_1990', 2020: 'forest_2020'})
)
change_table = change_table.dropna()
change_table = change_table.assign(
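    # Absolute change in percentage points (2020 share minus 1990 share)
    change_pct=lambda df: df['forest_2020'] - df['forest_1990'],
    # Relative change vs the 1990 level; a zero baseline becomes NaN instead of inf
    pct_change_rel=lambda df: (df['forest_2020'] - df['forest_1990']) / df['forest_1990'].replace(0, np.nan) * 100,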
)
change_table = change_table.merge(
    paired_years[['Country Code', 'Country Name']].drop_duplicates(),
    left_index=True,
    right_on='Country Code',
    how='left',
)
utils.diagnostics(
    change_table,
    'Forest change summary',
    expected_columns=['Country Code', 'forest_1990', 'forest_2020', 'change_pct'],
    expected_row_range=(150, 220),
)
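
The difference between a percentage-point change and a relative change is easy to blur, so here is a small worked sketch on made-up numbers. The codes `AAA`, `BBB`, `CCC` and all values are hypothetical; the example also shows one way to handle a zero 1990 baseline, where the relative change is undefined.

```python
import numpy as np
import pandas as pd

# Invented long-format rows: one row per country per year.
long_toy = pd.DataFrame({
    'Country Code': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'CCC'],
    'Year': [1990, 2020, 1990, 2020, 1990, 2020],
    'ForestPercent': [50.0, 40.0, 10.0, 12.0, 0.0, 1.0],
})

# Reshape so each country has one column per year.
wide = (
    long_toy.pivot(index='Country Code', columns='Year', values='ForestPercent')
    .rename(columns={1990: 'forest_1990', 2020: 'forest_2020'})
)

# Percentage points: a plain difference of two shares.
wide['change_pp'] = wide['forest_2020'] - wide['forest_1990']
# Relative change: the difference scaled by the starting share.
# Replacing 0 with NaN avoids dividing by a zero baseline.
wide['change_rel'] = wide['change_pp'] / wide['forest_1990'].replace(0, np.nan) * 100

print(wide.round(1))
```

Country `AAA` loses 10 percentage points, which is a 20% relative decline, while `BBB` gains only 2 percentage points yet shows a 20% relative increase. Small baselines inflate relative change, which is one reason the map colours by percentage points.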


## Step 4: Prepare the map-ready dataframe
Keep the columns we need and round the metrics to one decimal place so the hover labels stay readable.

In [None]:
map_data = change_table[['Country Code', 'Country Name', 'forest_1990', 'forest_2020', 'change_pct', 'pct_change_rel']].copy()
map_data = map_data.assign(
    forest_1990=lambda df: df['forest_1990'].round(1),
    forest_2020=lambda df: df['forest_2020'].round(1),
    change_pct=lambda df: df['change_pct'].round(1),
    pct_change_rel=lambda df: df['pct_change_rel'].round(1),
)
utils.diagnostics(
    map_data,
    'Map dataset',
    expected_columns=['Country Code', 'Country Name', 'change_pct'],
    expected_row_range=(150, 220),
)


## Step 5: Build the choropleth map with story metadata
Use a diverging colour scale centred on zero so gains and losses are equally visible.

In [None]:
TITLE = 'Tropical regions have lost the most forest cover since 1990'
SUBTITLE = 'Change in forest area (% of land area), 1990 to 2020'
ANNOTATION = 'Brazil, Bolivia, and Indonesia show steep declines; parts of Europe record modest gains.'
SOURCE = 'World Bank (AG.LND.FRST.ZS) via FAO Forest Resources Assessment'
UNITS = 'Percentage point change in forest cover'

metadata = {
    'title': TITLE,
    'subtitle': SUBTITLE,
    'annotation': ANNOTATION,
    'source': SOURCE,
    'units': UNITS,
}
utils.validate_story_elements(metadata)

fig = px.choropleth(
    map_data,
    locations='Country Code',
    color='change_pct',
    hover_name='Country Name',
    color_continuous_scale='BrBG',  # brown-to-teal diverging scale; avoids the red/green contrast of 'RdYlGn'
    color_continuous_midpoint=0,
    hover_data={
        'forest_1990': True,
        'forest_2020': True,
        'change_pct': True,
        'pct_change_rel': True,
    },
    labels={'change_pct': 'Change (percentage points)'},
    title=f'{TITLE}<br><sup>{SUBTITLE}</sup>',  # show the subtitle beneath the headline
)
fig.update_layout(
    margin=dict(l=40, r=40, t=90, b=110),  # extra bottom room for the annotations below the map
    coloraxis_colorbar=dict(title='pp change'),
    template='plotly_white',
)
fig.add_annotation(
    text=ANNOTATION,
    x=0.02,
    y=-0.18,
    xref='paper',
    yref='paper',
    align='left',
    showarrow=False,
    font=dict(size=12),
    bgcolor='rgba(255,255,255,0.9)',
    bordercolor='#555555',
    borderwidth=1,
)
fig.add_annotation(
    text=f"Source: {SOURCE}",
    x=0.0,
    y=-0.24,
    xref='paper',
    yref='paper',
    showarrow=False,
    align='left',
    font=dict(size=10, color='#555555'),
)
fig.show()
utils.save_last_fig('day04_solution_map.html', fig=fig)
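
A related design choice with a diverging scale is how far the colour limits extend. If a few countries have extreme changes, most of the map can end up in pale mid-tones. One option, sketched below under stated assumptions, is to cap the limits symmetrically at a percentile of the absolute change and pass the result to the `range_color` argument of `px.choropleth`. The helper name `capped_symmetric_limits`, the 90th-percentile default, and the sample values are all illustrative.

```python
import numpy as np


def capped_symmetric_limits(values, percentile=90):
    """Hypothetical helper: return (-cap, cap) from a percentile of |values|."""
    magnitudes = np.abs(np.asarray(values, dtype=float))
    # nanpercentile ignores missing values when locating the cap.
    cap = float(np.nanpercentile(magnitudes, percentile))
    return (-cap, cap)


# Invented percentage-point changes, including one extreme value and one gap.
sample_changes = [-30.0, -5.0, -2.0, 0.0, 1.0, 3.0, np.nan]
limits = capped_symmetric_limits(sample_changes)
print(limits)
```

Values beyond the cap are drawn in the end colours, so the hover text should still carry the exact figure.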


In [None]:
display(
    Markdown(
        utils.summarize_claim(
            claim='Forest loss is concentrated in tropical countries.',
            evidence='The map highlights double-digit percentage-point declines across South America and Southeast Asia.',
            takeaway='Protecting biodiversity hinges on targeted conservation and sustainable land policies in tropical nations.',
        )
    )
)
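
A map supports a claim visually, but a short ranked table makes the evidence checkable. The sketch below uses pandas `nsmallest` on invented data to list the largest declines; with the real notebook, the same call could be applied to `map_data`. The country names and values here are made up.

```python
import pandas as pd

# Invented stand-in for the map dataset.
toy_map_data = pd.DataFrame({
    'Country Name': ['Aland', 'Borduria', 'Cascara', 'Drevnia'],
    'change_pct': [-10.0, 2.0, -14.5, 0.3],  # percentage-point change, made up
})

# The most negative changes are the largest forest-cover declines.
largest_declines = toy_map_data.nsmallest(2, 'change_pct')
print(largest_declines.to_string(index=False))
```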
