An automated, cloud-native machine learning pipeline for regional drought forecasting. The system leverages Gradient Boosted Trees and automated MLOps to predict drought severity (FEWS NET classification) across specific regions at lead times of 0 to 12 months. The model was originally written by Tim Busker to run on a hosted HPC cluster; I rewrote it to run on Vertex AI.
| Component | Technology | Role |
|---|---|---|
| Orchestration | Vertex AI Pipelines (KFP) | Manages sequential execution of HP tuning and training. |
| Registry | Google Artifact Registry | Stores versioned Docker images for reproducible environments. |
| Intelligence | XGBoost / Random Forest | Core regressor models for multi-lead time forecasting. |
| Observability | Weights & Biases (W&B) | Live experiment tracking and SHAP value visualization. |
| Automation | GitHub Actions (OIDC) | "Push-to-Deploy" CI/CD with Workload Identity Federation. |
| Data Engine | GCS / Pathlib | Cloud-native storage for master datasets and results. |
The system employs a Modular Aggregation Strategy to capture spatial nuances while maintaining computational efficiency: a separate model is trained for each lead time.
To handle regional heterogeneity, we use a Clustered Regression approach:
- Pastoral (P): Focuses on vegetation and rainfall indices.
- Agro-Pastoral (AP): Incorporates soil moisture and crop-specific metrics.
- Other: Generic land-use zones.
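The per-cluster, per-lead-time training loop can be sketched as follows; the feature names, synthetic data, and use of scikit-learn's `RandomForestRegressor` here are illustrative stand-ins, not the exact production code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative dataset: one row per (county, month) observation,
# labelled with its livelihood cluster.
df = pd.DataFrame({
    "cluster": rng.choice(["P", "AP", "Other"], size=300),
    "rainfall": rng.normal(size=300),
    "ndvi": rng.normal(size=300),
    "drought_score": rng.normal(size=300),
})

LEAD_TIMES = range(0, 13)  # 0- to 12-month forecasts

# One independent model per (cluster, lead time) combination.
models = {}
for cluster, group in df.groupby("cluster"):
    X = group[["rainfall", "ndvi"]]
    for lead in LEAD_TIMES:
        # In the real pipeline the target would be shifted forward by
        # `lead` months; here we reuse the same column for illustration.
        y = group["drought_score"]
        model = RandomForestRegressor(n_estimators=10, random_state=0)
        model.fit(X, y)
        models[(cluster, lead)] = model
```

With 3 clusters and 13 lead times this yields 39 small models, each of which can be tuned and evaluated independently.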
A custom JSON-based handover mechanism connects the Tuning phase to the Execution phase:
- Step A (Tuning): Performs `TimeSeriesSplit` cross-validation to find optimal parameters.
- Step B (Execution): Dynamically loads these parameters from GCS to train the production model.
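A minimal sketch of this tune-then-handover flow, using a local JSON file in place of the GCS object and an illustrative parameter grid (the filename, grid values, and use of scikit-learn's `GradientBoostingRegressor` are assumptions, not the production configuration):

```python
import json
from pathlib import Path

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 4)), rng.normal(size=120)

# Step A (Tuning): time-series-aware cross-validation, then dump best params.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [2, 3]},
    cv=TimeSeriesSplit(n_splits=3),  # folds respect temporal ordering
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

handover = Path("best_params_lead0.json")  # in production: an object in GCS
handover.write_text(json.dumps(search.best_params_))

# Step B (Execution): load the tuned parameters and train the final model.
params = json.loads(handover.read_text())
final_model = GradientBoostingRegressor(random_state=0, **params).fit(X, y)
```

Because the handover artifact is plain JSON, the execution step has no dependency on the tuning code itself, only on the agreed file location.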
The project follows a "Code as Infrastructure" philosophy.
- Code Commit: Developer pushes code to the `main` branch.
- Identity Handshake: GitHub authenticates with GCP via Workload Identity Federation (OIDC).
- Containerization: GitHub Actions builds a Debian-based Docker image containing all spatial dependencies (`GDAL`, `PROJ`).
- Push: The image is pushed to Artifact Registry.
- Trigger: `pipeline_deploy.py` is executed on the GitHub runner, submitting the job to Vertex AI Pipelines.
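A workflow implementing these steps might look roughly like this; action versions, the service-account name, and the provider resource path are placeholders, not the repository's actual values:

```yaml
# .github/workflows/deploy.yml (sketch; names and versions are illustrative)
name: deploy
on:
  push:
    branches: [main]
permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER
          service_account: deployer@PROJECT_ID.iam.gserviceaccount.com
      - run: |
          docker build -t REGION-docker.pkg.dev/PROJECT_ID/REPO/drought:latest .
          docker push REGION-docker.pkg.dev/PROJECT_ID/REPO/drought:latest
      - run: python pipeline_deploy.py
```

The `id-token: write` permission is what lets the runner exchange a short-lived GitHub OIDC token for GCP credentials, so no long-lived service-account key is stored in the repository.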
```
├── .github/workflows/
│   └── deploy.yml           # CI/CD automation script
├── input_collector/         # Training data (CSV/Parquet)
├── src/
│   ├── ML_functions.py      # Core helper functions & GCS logic
│   ├── HP_tuning.py         # Grid search & parameter optimization
│   └── ML_execution.py      # Production training & SHAP analysis
├── Dockerfile               # Production environment definition
├── pipeline_deploy.py       # Vertex AI orchestrator (KFP)
└── pyproject.toml           # Dependency management (uv)
```
All results are automatically versioned and stored for auditability and research.
- Storage: All results are synced to the `ML_results/` directory in Google Cloud Storage.
- Metrics: Detailed evaluation statistics (MAE, RMSE, and $R^2$) are generated for every lead time and county.
- Explainability: SHAP summary plots are logged directly to the Weights & Biases dashboard, providing global and local feature-importance tracking.
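The per-county, per-lead-time statistics correspond to scikit-learn's standard regression metrics; a small worked example on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy observed vs. predicted drought scores for a single county/lead time.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

mae = mean_absolute_error(y_true, y_pred)            # mean |error|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # root of mean squared error
r2 = r2_score(y_true, y_pred)                        # explained-variance score

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
# → MAE=0.150 RMSE=0.158 R2=0.980
```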
To test the environment or run small-scale experiments locally:
1. Generate lockfile: `uv lock`
2. Build locally: `docker build -t drought-test .`
3. Run a test shell: `docker run -it drought-test /bin/bash`