Skip to content

feat: add alert delivery worker with email/webhook support#62

Open
algojogacor wants to merge 22 commits into
Climate-Vision:mainfrom
algojogacor:feat/alert-delivery-worker
Open

feat: add alert delivery worker with email/webhook support#62
algojogacor wants to merge 22 commits into
Climate-Vision:mainfrom
algojogacor:feat/alert-delivery-worker

Conversation

@algojogacor
Copy link
Copy Markdown

Summary

Implements the alert delivery worker as requested in #10. Alerts created in the organization_alerts table are now automatically delivered through the configured notification channels.

What Changed

New Files

  • src/climatevision/workers/__init__.py — worker package init
  • src/climatevision/workers/alert_delivery.py — AlertDeliveryWorker background task

Modified Files

  • src/climatevision/db.py — added get_pending_alerts(), increment_delivery_attempt(), mark_alert_failed()
  • src/climatevision/api/main.py — added FastAPI lifespan integration, new GET /api/organizations/{org_id}/alerts/pending endpoint
  • .env.example — added SMTP and worker configuration variables

Solution

AlertDeliveryWorker runs as a FastAPI lifespan background task:

  1. Polls for undelivered alerts every 60 seconds (configurable via ALERT_DELIVERY_POLL_INTERVAL_SECONDS)
  2. Determines delivery channel from the subscription's notification_channel field
  3. For email: sends HTML email via SMTP (configurable via SMTP_* env vars)
  4. For webhook: POSTs JSON payload to the subscription's webhook URL
  5. Implements exponential backoff retry (max 3 attempts, configurable)
  6. Updates delivery status: delivered=1 on success, increments delivery_attempts on failure, marks as permanently failed after exhausting retries

New endpoint: GET /api/organizations/{org_id}/alerts/pending — returns undelivered alerts for monitoring/dashboard use.

Testing

  • All files pass Python syntax checks
  • Import chain verified: db → workers → api
  • SMTP delivery uses Python's built-in smtplib with TLS support
  • Webhook delivery uses urllib for zero-dependency HTTP
  • Worker lifecycle managed via asynccontextmanager lifespan hook

Acceptance Criteria

  • ✅ When an alert is created, the worker attempts delivery within 60 seconds
  • ✅ Failed deliveries are retried up to 3 times with exponential backoff
  • ✅ Delivery status is queryable via API (GET /api/organizations/{org_id}/alerts/pending)
  • ✅ Worker runs as FastAPI background task
  • ✅ Configurable via environment variables

Goldokpa and others added 22 commits March 28, 2026 21:22
…iddleware-audit

Merging Olufemi's API middleware and auth modules
…tics-statistics

Merging Francis's analytics statistics and reporting modules
Defines responsibilities, deliverables, and collaboration guidelines for the Carbon Analytics & Validation role.

Co-Authored-By: Francis Umo <francis.umo@climatevision.org>
Defines responsibilities, deliverables, and collaboration guidelines for the API Development & Integration role.

Co-Authored-By: Olufemi Taiwo <olufemi.taiwo@climatevision.org>
…mate-Vision#7)

* feat(data): add GEE tile downloader with analysis-aware band selection

- Downloads real Sentinel-2 composites via Google Earth Engine
- Reads required bands from config.yaml per analysis_type
- Includes SCL band for downstream cloud masking
- Synthetic fallback with explicit is_synthetic flag when GEE unavailable
- Fix .gitignore so src/climatevision/data/ is no longer ignored

* feat(data): add analysis-specific Sentinel-2 band mapping utilities

- get_bands_for_analysis() reads correct bands from config.yaml
- get_band_indices() maps band names to canonical 13-band stack positions
- is_analysis_enabled() and list_enabled_analysis_types() for config validation
- Includes SCL band helpers for downstream cloud masking

* feat(data): integrate SCL cloud masking and export new pipeline modules

- apply_scl_cloud_mask() masks cloudy pixels using Sentinel-2 SCL band
- Default clear labels: vegetation, bare soils, water, snow
- Update __init__.py to expose gee_downloader and band_mapping utilities

* refactor(data): address PR review feedback

- Remove duplicated config logic in gee_downloader.py; import from band_mapping
- Cache config.yaml load in band_mapping.py via lru_cache
- Read synthetic tile size from config.yaml instead of hardcoding 256
- Remove unused json import in gee_downloader.py
- Add shape validation in apply_scl_cloud_mask

---------

Co-authored-by: Adeolu Mary Oshadare <adeolu@placeholder.com>
…ing (Climate-Vision#8)

* feat(inference): make pipeline analysis-aware with dynamic model loading

- _load_model() now accepts analysis_type and reads in_channels/num_classes from config.yaml
- Per-analysis-type model cache prevents cross-contamination between deforestation/ice/flood models
- _find_best_checkpoint() prefers config.yaml weight path per analysis type
- run_inference() accepts analysis_type, pads/crops to correct n_channels, and returns dynamic class counts
- run_inference_from_file() and run_inference_from_gee() propagate analysis_type parameter

* feat(api): wire analysis_type into prediction endpoints

- Pass body.analysis_type to run_inference_from_gee() in /api/predict
- Pass analysis_type to run_inference_from_file() in /api/predict/upload
- Enables the API to load the correct model and return correct class counts per analysis type

---------

Co-authored-by: Olufemi Taiwo <Olufemitaiwo23@gmail.com>
… flag, add config health validation

- Add cv_dev development key bypass for local testing
- Require X-API-Key on all mutation endpoints (POST predict, orgs, alerts, subscriptions)
- Surface is_synthetic at root of inference response for frontend demo banners
- Expand /api/health to validate config alignment (bands vs in_channels, classes vs num_classes)
- Add FastAPI test client fixture
- Create CI workflow for Python (flake8, pytest) and frontend (npm build)
- Bootstrap tests/ directory structure
- Parametrize UNet init for all 3 analysis types (4ch/2cl, 4ch/3cl, 3ch/3cl)
- Validate forward pass output shapes
- Add Siamese change detection forward shape test
- Link to 6 active good-first-issue and help-wanted issues
- Add claim workflow for new contributors
- Include time estimates and skill-building map
- ../components/map/ -> ../components/Map/
- Fixes vite build failure on Linux (case-sensitive filesystem)
- Fixes pip install failure for gdal and rasterio on Ubuntu runners
- Adds libgdal-dev, gdal-bin, libgl1-mesa-glx
- gdal Python package requires exact system GDAL version matching
- rasterio covers all GDAL functionality we actually use
- Simplify CI system deps to libgl1 only (for opencv runtime)
- Fixes ModuleNotFoundError: No module named 'climatevision'
- pip install -e . registers src/ as an importable package
- ForestDataset with DataLoader support
- Training/validation augmentation pipelines
- Synthetic tile generation for demo/fallback mode
- Add DONE/PENDING task list for April 2026 sprint
- Include actual .github/workflows/ci.yml code in role doc
- Update local CI check commands to match current workflow
- Add AlertDeliveryWorker background task with FastAPI lifespan integration
- Implement email delivery via SMTP with TLS support
- Implement webhook delivery via HTTP POST
- Add exponential backoff retry logic (max 3 attempts)
- Add DB functions: get_pending_alerts, increment_delivery_attempt, mark_alert_failed
- Add GET /api/organizations/{org_id}/alerts/pending endpoint
- Update .env.example with SMTP and worker configuration

Closes Climate-Vision#10
@Oshgig
Copy link
Copy Markdown
Collaborator

Oshgig commented May 16, 2026

Great work @algojogacor — the architecture here is solid. Using a FastAPI lifespan context manager as the worker host is the right call over BackgroundTasks, since it survives cleanly across the server lifecycle rather than being tied to individual request triggers. The exponential backoff, delivery status tracking, and zero-dependency approach (smtplib + urllib) are all good choices.

One thing is blocking merge:

Tests are missing. The test section says "all files pass Python syntax checks" — that's not a test suite. Before this can land I need at least:

  1. A unit test for the retry/backoff logic (mock smtplib.SMTP to raise, verify delivery_attempts increments and the worker stops at MAX_RETRIES).
  2. A test for the GET /api/organizations/{org_id}/alerts/pending endpoint (use TestClient, seed a pending alert, verify the response shape).

The rest of the PR is in great shape. Add those tests and I'll approve.

Note to @Presmanes3: I'm closing #57 in favour of this PR — please see the comment there.

Copy link
Copy Markdown
Collaborator

@femi23 femi23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking on alert delivery, @algojogacor — this is a substantial and well-structured contribution, and it touches a lot of the production paths I care about for inference→alert. CI is green and the lifespan integration is the right shape. A few blocking issues to address before we can land it though:

Blockers

1. _send_webhook references urllib.parse which is never imported. In alert_delivery.py the imports are:

import urllib.request
import urllib.error

but the helper calls urllib.parse.urlencode(payload). This will raise AttributeError: module 'urllib' has no attribute 'parse' the first time the function runs. Either import urllib.parse explicitly or drop the helper (see #2).

2. _send_webhook is dead code and uses the wrong encoder. _process_pending_alerts builds the webhook request inline and never calls the helper. The helper also uses urlencode (form encoding) for what should be a JSON body — that wouldn't work even with the import fix. Please either: (a) delete _send_webhook and keep the inline path, or (b) refactor the inline path to call a corrected helper that does json.dumps(...).encode(). (b) is cleaner.

3. SSRF risk on webhook URLs. urllib.request.urlopen(webhook_url, ...) will happily fetch http://localhost, http://169.254.169.254/... (AWS metadata), file:///etc/passwd, etc. Webhook URLs come from API subscribers, so this is reachable. Please add a small URL validator that:

  • requires https:// (or http:// only for non-loopback, non-private IPs)
  • rejects RFC1918 / loopback / link-local destinations after DNS resolution
    This aligns with the OWASP middleware that landed in #34.

4. Backoff blocks the whole batch. Inside _process_pending_alerts:

if attempts > 0:
    delay = min(2 ** attempts, 300)
    await asyncio.sleep(delay)

This sleeps inline before processing each retried alert, so one alert at attempt 2 stalls every other alert for 4s, attempt 3 stalls 8s, etc. With a backlog this serialises into many minutes per cycle and starves the rest of the queue. Two reasonable fixes:

  • store next_attempt_at on the row and have get_pending_alerts filter by next_attempt_at <= now() — no in-process sleeping, and survives restarts
  • or, at minimum, run the per-alert sleeps via asyncio.gather(...) so they're concurrent
    I'd prefer the first.

Should-fix before merge

5. No tests. This is the largest worker we've added — please include at least:

  • a unit test for _send_email with smtplib.SMTP mocked
  • a unit test for the webhook path with urllib.request.urlopen patched
  • a test that asserts start()/stop() is idempotent and stop() cancels cleanly within the 30s budget

6. get_pending_alerts import inside the function. In db.py:

def get_pending_alerts(...):
    import sqlite3
    with get_connection() as conn:

The sqlite3 import is unused — please drop it.

7. mark_alert_failed doesn't set a terminal flag. Right now "failed" alerts are indistinguishable from "still pending but past max attempts" — both are delivered=0, attempts>=max. Long-term we'll want a delivery_status enum (pending|delivered|failed). Not a hard blocker, but please add a TODO comment in mark_alert_failed so we don't lose it.

Nits

  • _get_smtp_config() defaults from_email to SMTP_USERNAME which may be a login like alerts@gmail.com — fine but worth documenting in .env.example as "often must match SMTP_USERNAME for some providers".
  • The HTML template injects {subject} and {body} into an f-string without escaping. Low risk since they come from our DB, but if any alert message ever carries user-controlled text we have XSS in emails. Either run them through html.escape() or add a comment that the upstream is trusted.
  • lifespan is a closure inside create_app() — fine, but please add a comment that the worker instance is intentionally per-app so test factories get isolation.

Once the four blockers are addressed I'll re-review quickly. Thanks again for the careful work here — looking forward to landing this.

@Goldokpa
Copy link
Copy Markdown
Member

📢 Heads-up: repo history was rewritten today (2026-05-18)

We force-pushed a cleaned history across all branches to remove an internal directory from past commits. Your code and this PR are unaffected — only the commit SHAs underneath have shifted. GitHub will re-render the diff against the new base automatically.

If you have a local clone, please bring it back in sync before pushing anything else:

# Option A (simplest): fresh start
git clone https://github.com/Climate-Vision/ClimateVision.git

# Option B: rebase the existing PR branch in your fork
git fetch origin
git checkout <your-branch>
git rebase origin/main          # likely no conflicts
git push --force-with-lease

Do not git pull on an existing clone — it will produce a messy non-fast-forward state. Either re-clone, or rebase explicitly as above.

Apologies for the interruption — really appreciate your patience here. If anything looks off after rebasing, leave a comment and I'll help unblock right away. Thanks for contributing 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants