feat(models): add biomass and carbon-stock regression module #46

Merged: Goldokpa merged 1 commit into develop from feature/models-regression on May 5, 2026

@franchaise (Collaborator)

Summary

  • Adds src/climatevision/models/regression.py, which defines BiomassRegressor: a wrapper around sklearn RandomForestRegressor and xgboost.XGBRegressor that exposes a stable fit / predict / evaluate / save / load API.
  • Default feature ordering matches the spectral indices produced by the data preprocessor: NDVI, EVI, SAVI, NDMI, NBR, R, G, B, NIR, SWIR1.
  • Helpers: biomass_to_carbon and biomass_to_co2e use IPCC defaults (carbon fraction 0.47, 44/12 ratio for CO2e).
  • evaluate_regression computes RMSE / MAE / R^2 / MAPE for the eval and model-card pipelines.
  • estimate_biomass_from_indices accepts a dict of per-pixel index arrays and runs inference in one call.
  • Wires symbols into models/__init__.py.
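
The two conversion helpers in the summary amount to a pair of multiplications. A minimal sketch, assuming they accept scalars or NumPy-compatible arrays and expose the carbon fraction as a keyword argument (the signatures and constant names are illustrative, not copied from regression.py):

```python
import numpy as np

# Constants quoted in the PR summary (IPCC defaults).
CARBON_FRACTION = 0.47          # carbon mass per unit of dry biomass
CO2_PER_CARBON = 44.0 / 12.0    # molar mass of CO2 over atomic mass of C


def biomass_to_carbon(biomass, carbon_fraction=CARBON_FRACTION):
    """Convert biomass to carbon stock, in the same mass units as the input."""
    return np.asarray(biomass, dtype=float) * carbon_fraction


def biomass_to_co2e(biomass, carbon_fraction=CARBON_FRACTION):
    """Convert biomass to CO2-equivalent via the carbon stock."""
    return biomass_to_carbon(biomass, carbon_fraction) * CO2_PER_CARBON
```

Under these assumptions, 100 t/ha of biomass maps to 47 t C/ha, and 47 × 44/12 ≈ 172.3 t CO2e/ha.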

Why

Sprint deliverable: "Build carbon.py — Random Forest & XGBoost regression for biomass prediction" / "Implement metrics for regression evaluation (RMSE, MAE, R-squared)." Backs the carbon analytics module Francis is delivering this sprint.

Test plan

  • pytest tests/test_regression.py — 11/11 pass
    • biomass_to_carbon and biomass_to_co2e use the documented constants
    • evaluate_regression returns rmse=mae=0 and r2=1 on a perfect fit
    • shape-mismatch raises ValueError
    • RandomForest fits, predicts, and yields sensible metrics on synthetic data
    • feature_importances() returns one entry per feature name and sums to 1.0
    • unsupported model_type raises ValueError
    • predict() before fit() raises RuntimeError
    • save() / load() round-trip yields identical predictions
    • estimate_biomass_from_indices builds the right feature matrix and matches predict()
    • missing index in the dict raises KeyError
    • serialize_metrics writes a well-formed JSON
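
The feature-matrix and missing-index cases above can be pictured with a small stand-in. This is a sketch only: `build_feature_matrix` is a hypothetical helper name, and the equal-length check is an assumption rather than behavior confirmed by the PR.

```python
import numpy as np

# Default feature order listed in the PR summary.
DEFAULT_FEATURES = ["NDVI", "EVI", "SAVI", "NDMI", "NBR",
                    "R", "G", "B", "NIR", "SWIR1"]


def build_feature_matrix(indices, feature_names=DEFAULT_FEATURES):
    """Stack per-pixel index arrays into an (n_pixels, n_features) matrix.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    missing = [name for name in feature_names if name not in indices]
    if missing:
        # Mirrors the "missing index raises KeyError" test case.
        raise KeyError(f"missing indices: {missing}")
    # Flatten each array so 2-D rasters and 1-D pixel vectors both work.
    columns = [np.asarray(indices[name], dtype=float).ravel()
               for name in feature_names]
    if len({col.shape[0] for col in columns}) != 1:
        # Assumption: arrays of different sizes are rejected up front.
        raise ValueError("all index arrays must contain the same number of pixels")
    # Column order follows feature_names, so it matches the training order.
    return np.column_stack(columns)
```

The resulting matrix can then be passed to a fitted model's predict(), which is why the column order has to match the order used at training time.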

Notes for reviewers

  • xgboost is a soft dependency — import xgboost is guarded so installs without it still work.
  • models/__init__.py extends the existing __all__; no existing imports are touched.
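
One common way to guard an optional import like this is shown below. It is a sketch under assumptions: the model_type strings, the `HAS_XGBOOST` flag, and the `check_model_type` helper are hypothetical, and the module in this PR may report a missing xgboost differently.

```python
# Try the optional dependency once at import time and remember the result.
try:
    from xgboost import XGBRegressor  # only needed for the xgboost backend
    HAS_XGBOOST = True
except ImportError:
    XGBRegressor = None
    HAS_XGBOOST = False

# Assumed names for the two backends; the real module may use other strings.
SUPPORTED_MODEL_TYPES = ("random_forest", "xgboost")


def check_model_type(model_type):
    """Validate model_type and fail clearly if xgboost is requested but absent."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    if model_type == "xgboost" and not HAS_XGBOOST:
        # Assumption: surface the missing dependency only when it is needed.
        raise ImportError("model_type='xgboost' requires the xgboost package")
    return model_type
```

With this pattern, importing the module never fails on a machine without xgboost; the error appears only when the xgboost backend is actually selected.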

Adds BiomassRegressor — a wrapper around sklearn RandomForest and
xgboost.XGBRegressor that exposes a stable fit/predict/evaluate/save
API for ClimateVision pipelines. Default feature ordering matches the
spectral indices produced by the data preprocessor (NDVI, EVI, SAVI,
NDMI, NBR, R, G, B, NIR, SWIR1).

Also adds:
- biomass_to_carbon / biomass_to_co2e helpers using IPCC defaults
  (carbon fraction 0.47, 44/12 ratio for CO2e).
- evaluate_regression for RMSE, MAE, R^2, and MAPE.
- estimate_biomass_from_indices for inference over a dict of
  per-pixel index arrays.
- save() / load() round-trip via pickle.

@Goldokpa (Member) left a comment

Solid foundation. The fit/predict/evaluate/save/load API matches what the analytics module expects and the test coverage (11 cases) hits the important branches — perfect-fit r2=1, save/load round-trip, missing-index KeyError. xgboost guarded as a soft dependency is the right call.

Minor nit (non-blocking): evaluate_regression returns NaN for r2 when ss_tot=0 — fine, just worth a short docstring note so callers don't trip on it. Approving.
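
To illustrate the nit, here is a hedged sketch of what a metrics function with that docstring note could look like. It is not the code from this PR: the handling of zero targets in MAPE and the choice to report MAPE as a percentage are assumptions.

```python
import numpy as np


def evaluate_regression(y_true, y_pred):
    """Return RMSE, MAE, R^2 and MAPE as a dict of floats.

    Note: r2 is NaN when y_true is constant (ss_tot == 0), because R^2 is
    undefined without variance in the target. Callers should check for NaN.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: skip zero targets, where the percentage error is undefined.
    nonzero = y_true != 0
    if nonzero.any():
        mape = float(np.mean(np.abs(residuals[nonzero] / y_true[nonzero])) * 100.0)
    else:
        mape = float("nan")

    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": r2,
        "mape": mape,
    }
```

A caller aggregating metrics across tiles could filter with `math.isnan(metrics["r2"])` before averaging, since a single constant-target tile would otherwise turn the mean into NaN.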

@Goldokpa merged commit 1ce8bcc into develop on May 5, 2026