waterWatch

A self-hosted Node.js backend that ingests Alberta river conditions (water level and discharge) from Environment and Climate Change Canada's (ECCC) public hydrometric real-time feed — the same data behind rivers.alberta.ca — and caches them behind a storage abstraction for later querying.

What's here now: the province-wide ingestion pipeline plus the multi-user service layered on top of it — a map web UI (public/), passwordless (magic-link) auth, per-station favorites with threshold alert rules, and breach email delivery. It can run as one combined process or split into separate API and ingest processes that coordinate only through Postgres. See docs/ingestion-backend-plan.md for the ingestion backend and docs/plan-highlevel.md for the multi-user email-alerts epic.

Requirements

  • Node.js 20+ (uses native fetch and the built-in test runner)
  • npm 10+

Quick start

npm install
npm start

The server boots on http://localhost:3000 by default. Verify it is healthy:

curl http://localhost:3000/health
# {"status":"ok","service":"waterwatch","env":"development","uptimeSeconds":0,"time":"..."}

For local development with auto-reload:

npm run dev

Storage (Postgres)

waterWatch stores everything in Postgres. Every environment (dev, test, prod) uses the same storage path, and there is no SQLite fallback. Provide the connection via DATABASE_URL (or the discrete PG* variables) in your env file; see Configuration.

For local development and tests, a committed docker-compose.yml stands up a throwaway Postgres 17 matching the .env.example defaults:

docker compose up -d   # Postgres on localhost:5432 (waterwatch/waterwatch)
npm test               # uses the docker-compose default connection

To point the tests at a different database, set TEST_DATABASE_URL:

TEST_DATABASE_URL=postgres://user:pass@host:5432/db npm test

The schema is created automatically on first connect from the consolidated src/storage/schema.sql.

Run with Docker

A multi-stage Dockerfile is provided for self-hosted deployment. The builder stage installs production dependencies against the lockfile and the slim runtime stage ships only those plus the app, running as the non-root node user.

Build the image:

docker build -t waterwatch:latest .

Run it. The container binds 0.0.0.0:3000 and starts ingesting on boot. Provide configuration with --env-file (copy .env.example to .env first) and point DATABASE_URL at a reachable Postgres:

cp .env.example .env   # then edit DATABASE_URL to your Postgres
docker run -d --name waterwatch \
  -p 3000:3000 \
  --env-file .env \
  waterwatch:latest

The image declares a HEALTHCHECK that polls /health, so docker ps shows the container's health. Verify and inspect it like any other deployment:

curl http://localhost:3000/health
curl http://localhost:3000/ingestion/status

Stop it gracefully. The entrypoint uses exec form, so node is PID 1 and receives SIGTERM directly; it then stops the poller, drains in-flight requests, and closes the Postgres pool before exiting:

docker stop waterwatch
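
The shutdown order described above can be sketched as follows. This is a minimal illustration, not the project's actual lifecycle code: createShutdown, poller.stop(), and the log object are hypothetical names, while server.close() and pool.end() follow the promise-returning shapes of Fastify and node-postgres.

```javascript
// Hypothetical helper: poller, server, and pool are stand-ins for the real objects.
function createShutdown({ poller, server, pool, log = console }) {
  let started = false; // guard so a repeated signal does not re-run the sequence

  return async function shutdown(signal) {
    if (started) return false;
    started = true;
    log.info(`received ${signal}, shutting down`);

    poller.stop();        // 1. stop scheduling new polls (assumed method name)
    await server.close(); // 2. stop accepting connections, let in-flight requests drain
    await pool.end();     // 3. release Postgres connections last, once nothing needs them
    return true;
  };
}

// Wiring, left commented out so loading the sketch has no side effects:
// const shutdown = createShutdown({ poller, server, pool });
// process.once('SIGTERM', () => shutdown('SIGTERM').then(() => process.exit(0)));
```

Closing the pool last matters because both the poller and any request still draining may need a database connection until they finish.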

Inspection / debug endpoints

These endpoints exist only to confirm that province-wide ingestion is working — they are operational/debug aids, not a consumer query API. A rich query surface (history ranges, filtering, consumer pagination) is intentionally deferred to the later monitoring/alerting work (see docs/ingestion-backend-plan.md, Phase 5).

| Endpoint | Purpose |
| --- | --- |
| GET /health | Liveness check (process is up). |
| GET /ingestion/status | Last successful poll time, current watermark, total readings, distinct stations. |
| GET /stations | Count + station ids/names; confirms province-wide coverage. |
| GET /stations/:stationId/latest | Most recent reading for one station (spot-check, e.g. the test station). |

# Is data landing and fresh?
curl http://localhost:3000/ingestion/status
# {"lastSuccessAt":"...","watermark":"...","totalReadings":600,"distinctStations":227,...}

# How many stations do we know about?
curl http://localhost:3000/stations
# {"count":1104,"stations":[{"stationId":"05AA004","name":"..."}, ...]}

# Spot-check the test station's latest reading.
curl http://localhost:3000/stations/05BL004/latest
# {"stationId":"05BL004","timestamp":"...","level":3.01,"discharge":177,...}
# Unknown station -> 404 {"error":"not_found","stationId":"..."}
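
For unattended monitoring, the same freshness question can be asked from Node. The sketch below is an illustration under assumptions: it relies only on the lastSuccessAt field shown in the sample response, assumes that field is an ISO 8601 timestamp, and uses a 30-minute staleness threshold chosen purely for the example. The function names are hypothetical.

```javascript
// Minutes since the last successful poll, or Infinity if none has succeeded yet.
// Assumes status.lastSuccessAt is an ISO 8601 string (the README elides the actual value).
function ingestionAgeMinutes(status, now = new Date()) {
  const last = Date.parse(status.lastSuccessAt ?? '');
  if (Number.isNaN(last)) return Infinity;
  return (now.getTime() - last) / 60_000;
}

// maxAgeMinutes is an illustrative threshold, not a waterWatch setting.
function isIngestionFresh(status, now = new Date(), maxAgeMinutes = 30) {
  return ingestionAgeMinutes(status, now) <= maxAgeMinutes;
}

// Fetches /ingestion/status with Node 20's built-in fetch and applies the check.
async function checkIngestion(baseUrl = 'http://localhost:3000') {
  const res = await fetch(new URL('/ingestion/status', baseUrl));
  if (!res.ok) throw new Error(`status endpoint returned ${res.status}`);
  return isIngestionFresh(await res.json());
}
```

A threshold of a few poll intervals tolerates one or two failed polls before reporting staleness.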

npm scripts

| Script | Purpose |
| --- | --- |
| npm start | Start the server (node src/index.js) |
| npm run dev | Start with --watch auto-reload |
| npm test | Run the test suite (node --test) |
| npm run lint | Lint with ESLint |
| npm run lint:fix | Lint and auto-fix |
| npm run format | Format with Prettier |
| npm run format:check | Check formatting without writing |

Configuration

All configuration is read from environment variables once, at startup (src/config/index.js), validated, and exposed as an immutable object. Copy .env.example to .env and adjust as needed. Defaults:

| Variable | Default | Description |
| --- | --- | --- |
| NODE_ENV | development | Runtime environment. |
| LOG_LEVEL | debug (dev) / info (prod) | Minimum log level: trace, debug, info, warn, error, fatal, silent. |
| HOST | 0.0.0.0 | HTTP bind address. |
| PORT | 3000 | HTTP port. |
| PROVINCE | AB | Province/territory filter for the ECCC feed (province-wide ingest). |
| ECCC_BASE_URL | https://api.weather.gc.ca | ECCC OGC API base URL. |
| TEST_STATION | 05BL004 | Station id used for manual spot-checks (not a scope limit). |
| CONTACT_EMAIL | info@<APP_BASE_URL host> | Contact address sent in the ECCC API User-Agent (MSC usage policy); defaults from APP_BASE_URL. |
| POLL_INTERVAL_MINUTES | 10 | Ingestion poll interval (source updates ~every 5 min). |
| INITIAL_BACKFILL_HOURS | 24 | Cold-start backfill window when the store is empty. |
| POLL_OVERLAP_MINUTES | 15 | Re-query window before the watermark to catch late stragglers. |
| RETENTION_DAYS | 30 | Retention window for pruning old readings (the app keeps only the last 30 days). |
| STATION_ACTIVE_WINDOW_DAYS | 30 | Stations are shown on the map / in search only if they reported within this many days; hides dead/discontinued gauges. 0 disables; per-request opt-out via ?includeInactive=true. |
| DATABASE_URL | (docker-compose default) | Postgres connection string. Wins over the discrete PG* vars. |
| PGHOST/PGPORT | localhost/5432 | Discrete Postgres host/port (libpq names) if no DATABASE_URL. |
| PGDATABASE | waterwatch | Postgres database name. |
| PGUSER/PGPASSWORD | waterwatch/waterwatch | Postgres credentials. |
| PG_POOL_SIZE | 10 | Max connections in the pg pool. |
| AUTH_ADMIN_EMAILS | (empty) | Comma-separated admin allow-list (case-insensitive); admin is config-derived, never stored. |
| APP_BASE_URL | http://localhost:3000 | Public base URL for magic-link callbacks and post-login redirect. |
| MAGIC_LINK_TTL_MINUTES | 15 | Single-use magic-link token lifetime (minutes). |
| SESSION_TTL_DAYS | 30 | Session lifetime after login (days). |
| AUTH_COOKIE_NAME | ww_session | Name of the session cookie. |
| AUTH_COOKIE_SECURE | true (prod) / false (dev) | Set the Secure flag on the session cookie. |
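
To make the interplay of INITIAL_BACKFILL_HOURS and POLL_OVERLAP_MINUTES concrete, here is one way the query window for a poll could be derived. It is a sketch of the idea as the descriptions above state it, not the scheduler's actual code; computePollWindow and its parameter names are hypothetical.

```javascript
// Decide which time range to request from the source for one poll.
// - Empty store (no watermark): backfill the last initialBackfillHours.
// - Otherwise: start pollOverlapMinutes before the watermark to catch late readings.
function computePollWindow({ watermark, now, initialBackfillHours = 24, pollOverlapMinutes = 15 }) {
  const MINUTE_MS = 60_000;
  const from =
    watermark == null
      ? new Date(now.getTime() - initialBackfillHours * 60 * MINUTE_MS)
      : new Date(watermark.getTime() - pollOverlapMinutes * MINUTE_MS);
  return { from, to: now };
}

// Example: a warm store with the default 15-minute overlap.
const pollWindow = computePollWindow({
  watermark: new Date('2024-01-02T00:00:00Z'),
  now: new Date('2024-01-02T00:10:00Z'),
});
console.log(pollWindow.from.toISOString()); // 2024-01-01T23:45:00.000Z
```

Because consecutive windows overlap, a design like this presumably depends on writes being idempotent, so a reading fetched twice is stored once.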

Postgres is the only storage backend. A connection (DATABASE_URL or PGDATABASE) is required in every environment; src/config/index.js rejects startup without one.
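
The "read once, validate, freeze" pattern and the DATABASE_URL precedence can be illustrated as below. This is a simplified sketch, not the contents of src/config/index.js: loadConfig and the shape of the returned object are hypothetical, only a handful of variables are covered, and two details are assumptions (that prod corresponds to NODE_ENV=production, and that the CONTACT_EMAIL default uses the hostname without the port).

```javascript
// Build an immutable config object from an env-like map (e.g. process.env).
function loadConfig(env) {
  const nodeEnv = env.NODE_ENV || 'development';
  const isProd = nodeEnv === 'production'; // assumption: "prod" means NODE_ENV=production
  const appBaseUrl = env.APP_BASE_URL || 'http://localhost:3000';

  const port = Number(env.PORT || 3000);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  // DATABASE_URL wins over the discrete PG* variables; one of the two is required.
  let database;
  if (env.DATABASE_URL) {
    database = { connectionString: env.DATABASE_URL };
  } else if (env.PGDATABASE) {
    database = {
      host: env.PGHOST || 'localhost',
      port: Number(env.PGPORT || 5432),
      database: env.PGDATABASE,
      user: env.PGUSER,
      password: env.PGPASSWORD,
    };
  } else {
    throw new Error('Postgres is required: set DATABASE_URL or PGDATABASE');
  }

  const adminEmails = (env.AUTH_ADMIN_EMAILS || '')
    .split(',')
    .map((email) => email.trim().toLowerCase()) // allow-list is case-insensitive
    .filter(Boolean);

  return Object.freeze({
    nodeEnv,
    logLevel: env.LOG_LEVEL || (isProd ? 'info' : 'debug'),
    port,
    appBaseUrl,
    // Assumption: hostname only, so http://localhost:3000 yields info@localhost.
    contactEmail: env.CONTACT_EMAIL || `info@${new URL(appBaseUrl).hostname}`,
    adminEmails: Object.freeze(adminEmails),
    database: Object.freeze(database),
  });
}
```

Failing at startup on a missing connection or a bad port surfaces misconfiguration immediately rather than at the first query.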

Data-growth caveat: province-wide ingestion at ~5-minute cadence across hundreds of active stations accumulates quickly. RETENTION_DAYS bounds the table: the ingestion scheduler prunes readings older than the window on every poll cycle.
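
A rough back-of-the-envelope for that growth, as a sketch: the 300-station figure is an illustrative stand-in for "hundreds of active stations", it assumes every station reports at every 5-minute tick, and both function names are hypothetical.

```javascript
// Upper-bound row count held in the readings table once retention is in steady state.
function estimateRetainedReadings({ stations, cadenceMinutes = 5, retentionDays = 30 }) {
  const readingsPerStationPerDay = (24 * 60) / cadenceMinutes; // 288 at a 5-minute cadence
  return stations * readingsPerStationPerDay * retentionDays;
}

// Readings with a timestamp older than this instant fall outside the retention window.
function retentionCutoff(now, retentionDays = 30) {
  return new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
}

console.log(estimateRetainedReadings({ stations: 300 })); // 2592000
console.log(retentionCutoff(new Date('2024-01-31T00:00:00Z')).toISOString()); // 2024-01-01T00:00:00.000Z
```

Under these assumptions the table levels off at a few million rows, and the ceiling scales linearly with RETENTION_DAYS.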

Project structure

src/
  config/       # Centralized env-based configuration (single source of truth)
  http/         # Fastify server + routes (/health + Phase 5 inspection endpoints)
    routes/
  lib/          # Cross-cutting utilities (structured logger)
  datasource/   # ECCC OGC API client + normalization (Phase 2)
  storage/      # Repository interface + Postgres implementation + migrations
  ingestion/    # Scheduled polling service (Phase 4)
  index.js      # Entrypoint: wires config + logger + server, lifecycle
test/           # Test suite (node:test)
docs/ingestion-backend-plan.md  # Implementation plan and phase status (this backend)
docs/plan-highlevel.md          # High-level plan for the multi-user email-alerts epic
docker-compose.yml  # Local/CI Postgres for dev + tests
Dockerfile      # Multi-stage container build
.dockerignore   # Keeps the build context small / image clean
.env.example    # Sample env file (copy to .env)
ansible/        # Single-node deploy: provision (playbook.yml) + redeploy (deploy.yml)

Deployment

waterWatch deploys as a single node: one host runs Postgres plus the combined app container (node src/index.js = HTTP API + static web UI + ingestion + alerts). The ansible/ directory automates the whole thing — it builds the image on the host from source (no registry, no git on the box), provisions Postgres, renders the env file, and runs the app as a systemd service.

First-time provisioning of a fresh Debian/Ubuntu host (run in order):

cd ansible
ansible-galaxy collection install -r requirements.yml
cp inventory.example.ini inventory.ini
cp group_vars/all.example.yml group_vars/all.yml
cp secrets.example.yml secrets.yml      # set DB password and API tokens
$EDITOR inventory.ini                    # point at your host
$EDITOR group_vars/all.yml               # domain, admin emails, deploy defaults
ansible-playbook firstrun.yml            # timezone/NTP, upgrades, firewall
ansible-playbook playbook.yml            # Postgres + build/run the app
ansible-playbook proxy.yml               # (optional) nginx + Let's Encrypt TLS

Routine code updates — after a host is provisioned, this is the everyday "push my latest code" command. It ships whatever source is currently on disk (committed or not), rebuilds the image, and restarts only if something changed:

cd ansible
ansible-playbook deploy.yml

deploy.yml deliberately does not touch Postgres, the firewall, or backups — it is just the fast app redeploy. Full details (what each playbook does, TLS, outbound email, backups, security notes) live in ansible/README.md; a high-level overview of the moving parts is in docs/DEPLOY.md.

Running by hand (no Ansible)

You can also build and run the container yourself against your own Postgres — see Run with Docker above and docs/DEPLOY.md.

Note

Don't edge-cache the dynamic endpoints. If you front the app with a CDN (e.g. Cloudflare), keep caching off for the JSON API (/api/*, /stations/*, /ingestion/status) — caching them serves stale readings even when ingestion is current. Static assets under public/ are safe to cache.
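
One way to express that rule in code, for example in a reverse-proxy script or a response hook, is sketched below. The function names and the one-hour max-age are illustrative choices, not waterWatch settings, and /health is added to the no-cache list as an assumption since the note above does not mention it.

```javascript
// Paths that return live data and should never be served from an edge cache.
// The first three come from the note above; '/health' is an added assumption.
const DYNAMIC_PREFIXES = ['/api', '/stations', '/ingestion/status', '/health'];

// True when pathname is one of the prefixes or sits underneath one of them.
function isDynamicPath(pathname) {
  return DYNAMIC_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
}

// Cache-Control value to attach to a response for the given path.
function cacheControlFor(pathname) {
  return isDynamicPath(pathname) ? 'no-store' : 'public, max-age=3600';
}

console.log(cacheControlFor('/stations/05BL004/latest')); // no-store
console.log(cacheControlFor('/index.html'));              // public, max-age=3600
```

Matching on the prefix plus a following slash avoids treating an unrelated path such as /stations-map.js as dynamic.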

Future-upgrade path

Except where marked as built, these are deliberately left out of this deliverable (see the Scope boundary in docs/ingestion-backend-plan.md). They are the intended evolution, and the design keeps each change isolated to one seam:

  • Alternative storage engines. Storage lives behind the Repository interface (src/storage/repository.js); the Postgres implementation is built by createRepository() in src/storage/index.js. A different engine would slot in there behind the same interface with no change to ingestion or the HTTP layer, and is validated by reusing the existing storage contract test suite (runRepositoryContract). Queries already use real SQL so semantics carry over.
  • AMQP / Sarracenia push ingestion. The scheduled poller (src/ingestion/scheduler.js) is the only component that decides when new readings arrive. Swapping the poll loop for ECCC's AMQP push notifications changes just that module — it still normalizes via the Phase 2 client and writes through the same repository, so storage and reads are untouched.
  • Ingestion / read-API process split. (Built.) The app can run combined (src/index.js) or as two processes, the read/API server (src/api.js) and the ingest scheduler (src/ingest.js), that coordinate only through Postgres (writer vs. readers). All three entry points reuse the same wiring factories (src/app/wiring.js), so the storage interface stays the seam. Scaling the pieces onto separate hosts is the remaining step.
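
The storage seam from the first bullet can be pictured with a toy example: an alternative engine plus a shared contract check that any engine must pass. Everything here is hypothetical. The method names (upsertReadings, latestReading, countReadings) and checkRepositoryContract are invented for illustration and are not the actual Repository interface or the real runRepositoryContract suite; timestamps are assumed to be UTC ISO 8601 strings in one consistent format, so string comparison orders them correctly.

```javascript
// Toy in-memory engine behind a hypothetical repository interface.
function createInMemoryRepository() {
  const readings = new Map(); // key: `${stationId}|${timestamp}` -> reading

  return {
    // Insert-or-replace, so re-sending the same reading does not duplicate it.
    async upsertReadings(batch) {
      for (const r of batch) readings.set(`${r.stationId}|${r.timestamp}`, { ...r });
      return batch.length;
    },
    // Newest reading for one station, or null if the station is unknown.
    async latestReading(stationId) {
      let latest = null;
      for (const r of readings.values()) {
        if (r.stationId !== stationId) continue;
        if (latest === null || r.timestamp > latest.timestamp) latest = r;
      }
      return latest;
    },
    async countReadings() {
      return readings.size;
    },
  };
}

// Stand-in for a shared contract suite: returns a list of failed expectations.
async function checkRepositoryContract(createRepo) {
  const repo = createRepo();
  const a = { stationId: '05BL004', timestamp: '2024-01-01T00:00:00Z', level: 3.0 };
  const b = { stationId: '05BL004', timestamp: '2024-01-01T00:05:00Z', level: 3.01 };

  await repo.upsertReadings([a, b]);
  await repo.upsertReadings([b]); // duplicate delivery, as an overlapping poll would produce

  const failures = [];
  if ((await repo.countReadings()) !== 2) failures.push('upsert must be idempotent');
  const latest = await repo.latestReading('05BL004');
  if (!latest || latest.timestamp !== b.timestamp) failures.push('latest must be the newest reading');
  if ((await repo.latestReading('unknown')) !== null) failures.push('unknown station must yield null');
  return failures;
}
```

Running the same check against every engine is what lets ingestion and the HTTP layer stay unchanged when the engine behind the interface is swapped.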

Status

All six phases in docs/ingestion-backend-plan.md are complete: backend scaffolding, the ECCC client, storage, the ingestion scheduler, the minimal inspection endpoints, and packaging/ops/run docs (this Docker build + run guide). Phase status is tracked in docs/ingestion-backend-plan.md.

License

MIT
