An automated, cloud-native machine learning pipeline for regional drought forecasting. The system leverages Gradient Boosted Trees and automated MLOps to predict drought severity (FEWS NET classification) across specific regions at lead times of 0 to 12 months. The model was originally written by Tim Busker to run on a hosted HPC cluster; I rewrote it to run on Vertex AI.
| Component | Technology | Role |
|---|---|---|
| Orchestration | Vertex AI Pipelines (KFP) | Manages sequential execution of HP tuning and training. |
| Registry | Google Artifact Registry | Stores versioned Docker images for reproducible environments. |
| Intelligence | XGBoost / Random Forest | Core regressor models for multi-lead time forecasting. |
| Observability | Weights & Biases (W&B) | Live experiment tracking and SHAP value visualization. |
| Automation | GitHub Actions (OIDC) | "Push-to-Deploy" CI/CD with Workload Identity Federation. |
| Data Engine | GCS / Pathlib | Cloud-native storage for master datasets and results. |
The system employs a Modular Aggregation Strategy to capture spatial nuances while maintaining computational efficiency: a separate model is trained for each lead time.
To handle regional heterogeneity, we use a Clustered Regression approach:
- Pastoral (P): Focuses on vegetation and rainfall indices.
- Agro-Pastoral (AP): Incorporates soil moisture and crop-specific metrics.
- Other: Generic land-use zones.
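The per-cluster, per-lead-time training loop can be sketched as follows; the feature names, synthetic data, and use of scikit-learn's `RandomForestRegressor` here are illustrative stand-ins, not the exact production code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative dataset: one row per (county, month) observation,
# labelled with its livelihood cluster.
df = pd.DataFrame({
    "cluster": rng.choice(["P", "AP", "Other"], size=300),
    "rainfall": rng.normal(size=300),
    "ndvi": rng.normal(size=300),
    "drought_score": rng.normal(size=300),
})

LEAD_TIMES = range(0, 13)  # 0- to 12-month forecasts

# One independent model per (cluster, lead time) combination.
models = {}
for cluster, group in df.groupby("cluster"):
    X = group[["rainfall", "ndvi"]]
    for lead in LEAD_TIMES:
        # In the real pipeline the target would be shifted forward by
        # `lead` months; here we reuse the same column for illustration.
        y = group["drought_score"]
        model = RandomForestRegressor(n_estimators=10, random_state=0)
        model.fit(X, y)
        models[(cluster, lead)] = model
```

With 3 clusters and 13 lead times this yields 39 small models, each of which can be tuned and evaluated independently.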
A custom JSON-based handover mechanism connects the Tuning phase to the Execution phase:
- Step A (Tuning): Performs `TimeSeriesSplit` cross-validation to find optimal parameters.
- Step B (Execution): Dynamically loads these parameters from GCS to train the production model.
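A minimal sketch of this tune-then-handover flow, using a local JSON file in place of the GCS object and an illustrative parameter grid (the filename, grid values, and use of scikit-learn's `GradientBoostingRegressor` are assumptions, not the production configuration):

```python
import json
from pathlib import Path

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 4)), rng.normal(size=120)

# Step A (Tuning): time-series-aware cross-validation, then dump best params.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [2, 3]},
    cv=TimeSeriesSplit(n_splits=3),  # folds respect temporal ordering
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

handover = Path("best_params_lead0.json")  # in production: an object in GCS
handover.write_text(json.dumps(search.best_params_))

# Step B (Execution): load the tuned parameters and train the final model.
params = json.loads(handover.read_text())
final_model = GradientBoostingRegressor(random_state=0, **params).fit(X, y)
```

Because the handover artifact is plain JSON, the execution step has no dependency on the tuning code itself, only on the agreed file location.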
The project follows a "Code as Infrastructure" philosophy.
- Code Commit: Developer pushes code to the `main` branch.
- Identity Handshake: GitHub authenticates with GCP via Workload Identity Federation (OIDC).
- Containerization: GitHub Actions builds a Debian-based Docker image containing all spatial dependencies (`GDAL`, `PROJ`).
- Push: The image is pushed to Artifact Registry.
- Trigger: `pipeline_deploy.py` is executed on the GitHub runner, submitting the job to Vertex AI Pipelines.
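A workflow implementing these steps might look roughly like this; action versions, the service-account name, and the provider resource path are placeholders, not the repository's actual values:

```yaml
# .github/workflows/deploy.yml (sketch; names and versions are illustrative)
name: deploy
on:
  push:
    branches: [main]
permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER
          service_account: deployer@PROJECT_ID.iam.gserviceaccount.com
      - run: |
          docker build -t REGION-docker.pkg.dev/PROJECT_ID/REPO/drought:latest .
          docker push REGION-docker.pkg.dev/PROJECT_ID/REPO/drought:latest
      - run: python pipeline_deploy.py
```

The `id-token: write` permission is what lets the runner exchange a short-lived GitHub OIDC token for GCP credentials, so no long-lived service-account key is stored in the repository.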
```
├── .github/workflows/
│   └── deploy.yml           # CI/CD automation script
├── input_collector/         # Training data (CSV/Parquet)
├── src/
│   ├── ML_functions.py      # Core helper functions & GCS logic
│   ├── HP_tuning.py         # Grid search & parameter optimization
│   └── ML_execution.py      # Production training & SHAP analysis
├── Dockerfile               # Production environment definition
├── pipeline_deploy.py       # Vertex AI orchestrator (KFP)
└── pyproject.toml           # Dependency management (uv)
```
All results are automatically versioned and stored for auditability and research.
- Storage: All results are synced to the `ML_results/` directory in Google Cloud Storage.
- Metrics: Detailed evaluation statistics (MAE, RMSE, and $R^2$) are generated for every lead time and county.
- Explainability: SHAP summary plots are logged directly to the Weights & Biases dashboard, providing global and local feature-importance tracking.
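The per-county, per-lead-time statistics correspond to scikit-learn's standard regression metrics; a small worked example on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy observed vs. predicted drought scores for a single county/lead time.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

mae = mean_absolute_error(y_true, y_pred)            # mean |error|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # root of mean squared error
r2 = r2_score(y_true, y_pred)                        # explained-variance score

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
# → MAE=0.150 RMSE=0.158 R2=0.980
```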
To test the environment or run small-scale experiments locally:
1. Generate lockfile: `uv lock`
2. Build locally: `docker build -t drought-test .`
3. Run a test shell: `docker run -it drought-test /bin/bash`