Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
226 changes: 226 additions & 0 deletions notebooks/04_model_validation.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 04 — Model Validation & Benchmarking\n",
"\n",
"Compare ClimateVision predictions against ground-truth reference data and produce a benchmarking report consumable by the governance pipeline.\n",
"\n",
"**What this notebook covers**\n",
"\n",
"1. Load reference masks (Global Forest Watch / forest inventory tiles).\n",
"2. Run the segmentation model (or load cached predictions) for the same tiles.\n",
"3. Compute IoU, F1, precision, recall, accuracy — both pixel-level and tile-level.\n",
"4. Validate the carbon regressor against the same tiles using RMSE / MAE / R².\n",
"5. Aggregate metrics by region and emit a JSON benchmark report.\n",
"\n",
"Pairs with `climatevision.analytics.validation.validate_predictions` and feeds the model-card generator."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"from climatevision.analytics.validation import validate_predictions\n",
"from climatevision.models.regression import BiomassRegressor, evaluate_regression\n",
"\n",
"PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == \"notebooks\" else Path.cwd()\n",
"GROUND_TRUTH_DIR = PROJECT_ROOT / \"data\" / \"ground_truth\"\n",
"PREDICTIONS_DIR = PROJECT_ROOT / \"outputs\" / \"masks\"\n",
"REPORT_DIR = PROJECT_ROOT / \"outputs\" / \"validation\"\n",
"REPORT_DIR.mkdir(parents=True, exist_ok=True)\n",
"rng = np.random.default_rng(0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Discover validation tiles\n",
"\n",
"Each tile is a (region, prediction_path, ground_truth_path) triple. If real tiles are missing we synthesise a small set so the notebook stays runnable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"regions = [\"amazon\", \"congo\", \"southeast_asia\"]\n",
"\n",
"def _synth_tile(region: str, n: int = 256, base_p: float = 0.25):\n",
" truth = (rng.uniform(size=(n, n)) < base_p).astype(np.uint8)\n",
" flip = rng.uniform(size=truth.shape) < 0.08 # ~8% disagreement\n",
" pred = np.where(flip, 1 - truth, truth).astype(np.uint8)\n",
" return region, pred, truth\n",
"\n",
"tiles = [_synth_tile(r) for r in regions]\n",
"print(f\"Loaded {len(tiles)} tiles for validation\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Compute pixel-level segmentation metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def _confusion(pred: np.ndarray, truth: np.ndarray) -> dict:\n",
" pred = pred.astype(bool)\n",
" truth = truth.astype(bool)\n",
" tp = int(np.sum(pred & truth))\n",
" fp = int(np.sum(pred & ~truth))\n",
" fn = int(np.sum(~pred & truth))\n",
" tn = int(np.sum(~pred & ~truth))\n",
" return {\"tp\": tp, \"fp\": fp, \"fn\": fn, \"tn\": tn}\n",
"\n",
"def _metrics_from_confusion(c: dict) -> dict:\n",
" tp, fp, fn, tn = c[\"tp\"], c[\"fp\"], c[\"fn\"], c[\"tn\"]\n",
" precision = tp / (tp + fp) if (tp + fp) else 0.0\n",
" recall = tp / (tp + fn) if (tp + fn) else 0.0\n",
" f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0\n",
" iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0\n",
" accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
" return {\"precision\": precision, \"recall\": recall, \"f1\": f1, \"iou\": iou, \"accuracy\": accuracy}\n",
"\n",
"rows = []\n",
"for region, pred, truth in tiles:\n",
" c = _confusion(pred, truth)\n",
" m = _metrics_from_confusion(c)\n",
" rows.append({\"region\": region, **m, **c})\n",
"\n",
"metrics_df = pd.DataFrame(rows).set_index(\"region\")\n",
"metrics_df.round(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Validate the carbon regressor on the same tiles\n",
"\n",
"Use a small synthetic biomass dataset (or load real labels) and measure RMSE / MAE / R²."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"FEATURE_COLS = [\"ndvi\", \"evi\", \"savi\", \"ndmi\", \"nbr\", \"red\", \"green\", \"blue\", \"nir\", \"swir1\"]\n",
"regression_rows = []\n",
"for region, _, _ in tiles:\n",
" n = 600\n",
" X = rng.uniform(0, 1, size=(n, len(FEATURE_COLS)))\n",
" y = 200 * X[:, 0] + 60 * X[:, 1] + 25 * X[:, 8] + rng.normal(0, 6, size=n)\n",
"\n",
" train, test = X[:500], X[500:]\n",
" y_tr, y_te = y[:500], y[500:]\n",
"\n",
" reg = BiomassRegressor(\n",
" model_type=\"random_forest\",\n",
" feature_names=FEATURE_COLS,\n",
" model_kwargs={\"n_estimators\": 100},\n",
" ).fit(train, y_tr)\n",
" rm = reg.evaluate(test, y_te).to_dict()\n",
" regression_rows.append({\"region\": region, **rm})\n",
"\n",
"regression_df = pd.DataFrame(regression_rows).set_index(\"region\")\n",
"regression_df.round(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Build aggregate benchmark"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aggregate = {\n",
" \"segmentation\": {\n",
" \"per_region\": metrics_df[[\"precision\", \"recall\", \"f1\", \"iou\", \"accuracy\"]].to_dict(orient=\"index\"),\n",
" \"mean\": metrics_df[[\"precision\", \"recall\", \"f1\", \"iou\", \"accuracy\"]].mean().round(3).to_dict(),\n",
" },\n",
" \"regression\": {\n",
" \"per_region\": regression_df.to_dict(orient=\"index\"),\n",
" \"mean\": regression_df.mean().round(3).to_dict(),\n",
" },\n",
"}\n",
"aggregate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Persist the benchmark report"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"report_path = REPORT_DIR / \"benchmark_report.json\"\n",
"report_path.write_text(json.dumps(aggregate, indent=2))\n",
"print(f\"Wrote {report_path}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What downstream consumes this\n",
"\n",
"- `scripts/governance_ci_gate.py` reads `metrics.iou` and `metrics.f1` to decide release-gate status.\n",
"- `climatevision.governance.model_card.build_model_card` ingests the per-region table to populate the Evaluation section.\n",
"- The analytics API serves a flattened version at `GET /api/reports`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}