feat(notebooks): add 04_model_validation benchmarking notebook #48
Merged
Validates segmentation predictions against reference masks and the biomass regressor against held-out labels across Amazon, Congo, and Southeast Asia. Computes IoU, F1, precision, recall, accuracy, and the regression metrics RMSE/MAE/R^2/MAPE. Aggregates per-region and mean values into a single benchmark_report.json that the governance CI gate and the model-card generator consume directly.
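For readers who want the metric definitions spelled out, here is a minimal sketch of how the listed segmentation and regression metrics, and the per-region plus mean aggregation, could be computed. It is not the notebook's code: the function names, the report keys (`per_region`, `mean`), and the sample counts are all assumptions made for illustration, and the metrics assume non-zero denominators and non-zero true values (for MAPE).

```python
import json
import math


def segmentation_metrics(tp, fp, fn, tn):
    """Binary segmentation metrics from pixel-level confusion counts.

    Hypothetical helper; assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "iou": tp / (tp + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, R^2 and MAPE (in percent) for paired label/prediction lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def build_report(per_region):
    """Combine per-region metric dicts with their unweighted mean.

    The {"per_region": ..., "mean": ...} layout is an assumed schema.
    """
    names = next(iter(per_region.values())).keys()
    mean = {
        name: sum(m[name] for m in per_region.values()) / len(per_region)
        for name in names
    }
    return {"per_region": per_region, "mean": mean}


# Made-up confusion counts, one set per region named in the PR.
regions = {
    "amazon": segmentation_metrics(tp=80, fp=10, fn=10, tn=100),
    "congo": segmentation_metrics(tp=70, fp=20, fn=10, tn=100),
    "southeast_asia": segmentation_metrics(tp=75, fp=15, fn=10, tn=100),
}
report = build_report(regions)
print(json.dumps(report["mean"], indent=2))
```

The mean here is an unweighted average over regions; weighting by tile or pixel count would be an equally plausible choice and would give different numbers whenever region sizes differ.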
1 task
obielin (Collaborator) approved these changes on May 5, 2026 and left a comment:
Happy with this from the governance angle — the per-region table is exactly the shape build_model_card consumes for the Fairness section once we attach a region-level disparity metric on top.
One thought for a follow-up (not for this PR): the synthetic confusion matrix is built with a uniform 8% disagreement rate, so the per-region metrics look identical. When real ground-truth tiles land we'll see actual divergence. Approving.
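The point about identical per-region metrics can be illustrated with a small sketch. It assumes the synthetic matrix applies the same 8% disagreement rate to both classes in every region; how the notebook actually builds it is not shown in this PR, so `synthetic_confusion`, the pixel count, the 50% positive fraction, and the max-minus-min disparity gap are all hypothetical.

```python
def synthetic_confusion(n_pixels, positive_fraction, disagreement_rate):
    """Split n_pixels into TP/FP/FN/TN with one error rate for both classes.

    Hypothetical stand-in for the notebook's synthetic data generator.
    """
    positives = round(n_pixels * positive_fraction)
    negatives = n_pixels - positives
    fn = round(positives * disagreement_rate)  # positives predicted negative
    fp = round(negatives * disagreement_rate)  # negatives predicted positive
    return {"tp": positives - fn, "fp": fp, "fn": fn, "tn": negatives - fp}


def iou(c):
    """Intersection over union of the positive class."""
    return c["tp"] / (c["tp"] + c["fp"] + c["fn"])


regions = ["amazon", "congo", "southeast_asia"]

# Same disagreement rate everywhere: every region gets the same IoU.
uniform = {r: iou(synthetic_confusion(10_000, 0.5, 0.08)) for r in regions}
uniform_gap = max(uniform.values()) - min(uniform.values())

# One region with a higher (made-up) error rate: the gap becomes non-zero.
rates = {"amazon": 0.08, "congo": 0.12, "southeast_asia": 0.08}
varied = {r: iou(synthetic_confusion(10_000, 0.5, rates[r])) for r in regions}
varied_gap = max(varied.values()) - min(varied.values())

print(f"uniform gap: {uniform_gap:.4f}, varied gap: {varied_gap:.4f}")
```

A max-minus-min gap is only one possible region-level disparity measure; a ratio or a deviation from the mean would serve the same purpose in a Fairness section.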
Summary
- notebooks/04_model_validation.ipynb: validates segmentation predictions against reference masks and the biomass regressor against held-out labels across Amazon, Congo, and Southeast Asia.
- outputs/validation/benchmark_report.json: the single artifact consumed by scripts/governance_ci_gate.py and governance.model_card.build_model_card.
Why
Sprint deliverable: "Write 04_model_validation.ipynb with baseline validation results."
Test plan
papermill without errors.
Notes for reviewers
validate_predictions from analytics.validation (already on develop) and BiomassRegressor from feat(models): add biomass and carbon-stock regression module #46.
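As context for the Summary's note that benchmark_report.json feeds a CI gate, the following is a rough sketch of what a threshold check over such a report might look like. It does not reproduce scripts/governance_ci_gate.py: the report schema (a top-level `per_region` mapping), the threshold values, and the function names are assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical minimum values; the real gate's thresholds are not in this PR.
MIN_THRESHOLDS = {"iou": 0.70, "f1": 0.80}


def check_report(report, thresholds):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for region, metrics in report["per_region"].items():
        for name, minimum in thresholds.items():
            value = metrics.get(name)
            if value is None or value < minimum:
                failures.append(f"{region}: {name}={value} is below {minimum}")
    return failures


def gate(report_path, thresholds=MIN_THRESHOLDS):
    """Load the report and return a process-style exit code (0 = pass)."""
    report = json.loads(Path(report_path).read_text())
    failures = check_report(report, thresholds)
    for line in failures:
        print(line)
    return 1 if failures else 0


# Usage with a made-up report written to a temporary directory.
sample = {
    "per_region": {
        "amazon": {"iou": 0.85, "f1": 0.92},
        "congo": {"iou": 0.65, "f1": 0.79},
    }
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "benchmark_report.json"
    path.write_text(json.dumps(sample))
    exit_code = gate(path)
```

Checking each region separately, instead of only the mean, keeps a strong region from masking a weak one, which is the concern a per-region table is meant to surface.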