A self-hosted Node.js backend that ingests Alberta river conditions (water level
and discharge) from Environment and Climate Change Canada's (ECCC) public
hydrometric real-time feed — the same data behind rivers.alberta.ca — and
caches them behind a storage abstraction for later querying.

What's here now: the province-wide ingestion pipeline plus the multi-user service layered on top of it, which adds a map web UI (`public/`), passwordless (magic-link) auth, per-station favorites with threshold alert rules, and breach email delivery. It can run as one combined process or split into separate API and ingest processes that coordinate only through Postgres. See `docs/ingestion-backend-plan.md` for the ingestion backend and `docs/plan-highlevel.md` for the multi-user email-alerts epic.

- Node.js 20+ (uses native `fetch` and the built-in test runner)
- npm 10+

npm install
npm start

The server boots on http://localhost:3000 by default. Verify it is healthy:

curl http://localhost:3000/health
# {"status":"ok","service":"waterwatch","env":"development","uptimeSeconds":0,"time":"..."}

For local development with auto-reload:

npm run dev

waterWatch stores everything in Postgres. There is a single storage path in
every environment (dev, test, prod); there is no SQLite fallback. Provide the
connection via DATABASE_URL (or the discrete PG* variables) in your env file;
see Configuration.
For local development and tests, a committed docker-compose.yml
stands up a throwaway Postgres 17 matching the .env.example defaults:
docker compose up -d # Postgres on localhost:5432 (waterwatch/waterwatch)
npm test # uses the docker-compose default connection

To point the tests at a different database, set TEST_DATABASE_URL:

TEST_DATABASE_URL=postgres://user:pass@host:5432/db npm test

The schema is created automatically on first connect from the consolidated
src/storage/schema.sql.
A multi-stage Dockerfile is provided for self-hosted deployment.
The builder stage installs production dependencies against the lockfile and the
slim runtime stage ships only those plus the app, running as the non-root node
user.
Build the image:
docker build -t waterwatch:latest .

Run it. The container binds 0.0.0.0:3000 and starts ingesting on boot. Provide
configuration with --env-file (copy .env.example to .env
first) and point DATABASE_URL at a reachable Postgres:
cp .env.example .env # then edit DATABASE_URL to your Postgres
docker run -d --name waterwatch \
  -p 3000:3000 \
  --env-file .env \
  waterwatch:latest

The image declares a HEALTHCHECK that polls /health, so docker ps shows the
container's health. Verify and inspect it like any other deployment:
curl http://localhost:3000/health
curl http://localhost:3000/ingestion/status

Stop it gracefully (the entrypoint uses exec form, so node is PID 1 and
receives SIGTERM directly — it stops the poller, drains in-flight requests, and
closes the Postgres pool before exiting):
docker stop waterwatch

These endpoints exist only to confirm that province-wide ingestion is
working — they are operational/debug aids, not a consumer query API. A rich
query surface (history ranges, filtering, consumer pagination) is intentionally
deferred to the later monitoring/alerting work (see
docs/ingestion-backend-plan.md, Phase 5).

| Endpoint | Purpose |
|---|---|
| `GET /health` | Liveness check (process is up). |
| `GET /ingestion/status` | Last successful poll time, current watermark, total readings, distinct stations. |
| `GET /stations` | Count + station ids/names; confirms province-wide coverage. |
| `GET /stations/:stationId/latest` | Most recent reading for one station (spot-check, e.g. the test station). |

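The inspection endpoints in the table above lend themselves to a scripted freshness check. The following is a minimal sketch, assuming the field names that appear in the sample `/ingestion/status` output (`lastSuccessAt`, `totalReadings`). The helpers `isIngestionFresh` and `checkIngestion` and the 30-minute threshold are illustrative and not part of waterWatch.

```javascript
// Decide whether ingestion looks healthy from a /ingestion/status payload.
// Assumption: the payload carries lastSuccessAt (ISO timestamp) and
// totalReadings, as in the sample output. The threshold is arbitrary.
function isIngestionFresh(status, now = new Date(), maxAgeMinutes = 30) {
  if (!status || !status.lastSuccessAt) return false;
  const last = new Date(status.lastSuccessAt);
  if (Number.isNaN(last.getTime())) return false;
  const ageMinutes = (now.getTime() - last.getTime()) / 60_000;
  return ageMinutes <= maxAgeMinutes && status.totalReadings > 0;
}

// Hypothetical wrapper: fetch the status endpoint and apply the check.
// Uses the native fetch available in Node.js 20+.
async function checkIngestion(baseUrl = 'http://localhost:3000') {
  const res = await fetch(`${baseUrl}/ingestion/status`);
  if (!res.ok) throw new Error(`status endpoint returned ${res.status}`);
  return isIngestionFresh(await res.json());
}
```

Keeping the decision in a pure function makes it easy to unit-test without a running server; only `checkIngestion` touches the network.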
# Is data landing and fresh?
curl http://localhost:3000/ingestion/status
# {"lastSuccessAt":"...","watermark":"...","totalReadings":600,"distinctStations":227,...}
# How many stations do we know about?
curl http://localhost:3000/stations
# {"count":1104,"stations":[{"stationId":"05AA004","name":"..."}, ...]}
# Spot-check the test station's latest reading.
curl http://localhost:3000/stations/05BL004/latest
# {"stationId":"05BL004","timestamp":"...","level":3.01,"discharge":177,...}
# Unknown station -> 404 {"error":"not_found","stationId":"..."}

| Script | Purpose |
|---|---|
| `npm start` | Start the server (`node src/index.js`) |
| `npm run dev` | Start with `--watch` auto-reload |
| `npm test` | Run the test suite (`node --test`) |
| `npm run lint` | Lint with ESLint |
| `npm run lint:fix` | Lint and auto-fix |
| `npm run format` | Format with Prettier |
| `npm run format:check` | Check formatting without writing |

All configuration is read from environment variables once, at startup
(src/config/index.js), validated, and exposed as an immutable object. Copy
.env.example to .env and adjust as needed. Defaults:

| Variable | Default | Description |
|---|---|---|
| `NODE_ENV` | `development` | Runtime environment. |
| `LOG_LEVEL` | `debug` (dev) / `info` (prod) | Minimum log level: `trace`, `debug`, `info`, `warn`, `error`, `fatal`, `silent`. |
| `HOST` | `0.0.0.0` | HTTP bind address. |
| `PORT` | `3000` | HTTP port. |
| `PROVINCE` | `AB` | Province/territory filter for the ECCC feed (province-wide ingest). |
| `ECCC_BASE_URL` | `https://api.weather.gc.ca` | ECCC OGC API base URL. |
| `TEST_STATION` | `05BL004` | Station id used for manual spot-checks (not a scope limit). |
| `CONTACT_EMAIL` | `info@<APP_BASE_URL host>` | Contact address sent in the ECCC API User-Agent (MSC usage policy); defaults from `APP_BASE_URL`. |
| `POLL_INTERVAL_MINUTES` | `10` | Ingestion poll interval (source updates ~every 5 min). |
| `INITIAL_BACKFILL_HOURS` | `24` | Cold-start backfill window when the store is empty. |
| `POLL_OVERLAP_MINUTES` | `15` | Re-query window before the watermark to catch late stragglers. |
| `RETENTION_DAYS` | `30` | Retention window for pruning old readings (the app keeps only the last 30 days). |
| `STATION_ACTIVE_WINDOW_DAYS` | `30` | Stations are shown on the map / in search only if they reported within this many days; hides dead/discontinued gauges. `0` disables; per-request opt-out via `?includeInactive=true`. |
| `DATABASE_URL` | (docker-compose default) | Postgres connection string. Wins over the discrete `PG*` vars. |
| `PGHOST` / `PGPORT` | `localhost` / `5432` | Discrete Postgres host/port (libpq names) if no `DATABASE_URL`. |
| `PGDATABASE` | `waterwatch` | Postgres database name. |
| `PGUSER` / `PGPASSWORD` | `waterwatch` / `waterwatch` | Postgres credentials. |
| `PG_POOL_SIZE` | `10` | Max connections in the pg pool. |
| `AUTH_ADMIN_EMAILS` | (empty) | Comma-separated admin allow-list (case-insensitive); admin is config-derived, never stored. |
| `APP_BASE_URL` | `http://localhost:3000` | Public base URL for magic-link callbacks and post-login redirect. |
| `MAGIC_LINK_TTL_MINUTES` | `15` | Single-use magic-link token lifetime (minutes). |
| `SESSION_TTL_DAYS` | `30` | Session lifetime after login (days). |
| `AUTH_COOKIE_NAME` | `ww_session` | Name of the session cookie. |
| `AUTH_COOKIE_SECURE` | `true` (prod) / `false` (dev) | Set the Secure flag on the session cookie. |

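One plausible reading of how `INITIAL_BACKFILL_HOURS` and `POLL_OVERLAP_MINUTES` interact with the watermark is sketched below. It is an illustration of the idea described in the table, not the scheduler's actual code; `computePollWindow` and its parameter names are hypothetical.

```javascript
// Compute the time range to request from the source on one poll cycle.
// Assumption: with a watermark, re-query from (watermark - overlap) to catch
// late-arriving readings; with an empty store, backfill a fixed number of hours.
function computePollWindow({ watermark, now, overlapMinutes = 15, backfillHours = 24 }) {
  const end = new Date(now);
  const start = watermark
    ? new Date(new Date(watermark).getTime() - overlapMinutes * 60_000)
    : new Date(end.getTime() - backfillHours * 3_600_000);
  return { start, end };
}

// Warm poll: the window starts 15 minutes before the watermark.
const warm = computePollWindow({
  watermark: '2024-01-01T12:00:00Z',
  now: '2024-01-01T12:10:00Z',
});

// Cold start: no watermark yet, so the window covers the last 24 hours.
const cold = computePollWindow({ watermark: null, now: '2024-01-02T00:00:00Z' });
```

Because the overlap deliberately re-fetches readings already stored, this approach presumes that writes are idempotent (for example, keyed on station and timestamp).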
Postgres is the only storage backend. A connection (`DATABASE_URL` or `PGDATABASE`) is required in every environment; `src/config/index.js` rejects startup without one.
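The precedence and fail-fast behaviour described above could look roughly like the sketch below. It is not the contents of `src/config/index.js`; `resolveDatabaseConfig` is a hypothetical name, and the returned keys (`connectionString`, `host`, `port`, `database`, `user`, `password`, `max`) are the option names node-postgres accepts for a pool.

```javascript
// Resolve Postgres settings from an env-like object.
// DATABASE_URL wins over the discrete PG* variables; with neither
// DATABASE_URL nor PGDATABASE set, startup is rejected.
function resolveDatabaseConfig(env) {
  const max = Number(env.PG_POOL_SIZE || 10);
  if (env.DATABASE_URL) {
    return Object.freeze({ connectionString: env.DATABASE_URL, max });
  }
  if (!env.PGDATABASE) {
    throw new Error('Postgres connection required: set DATABASE_URL or PGDATABASE');
  }
  return Object.freeze({
    host: env.PGHOST || 'localhost',
    port: Number(env.PGPORT || 5432),
    database: env.PGDATABASE,
    user: env.PGUSER,
    password: env.PGPASSWORD,
    max,
  });
}

// Typical call site (assumed): const dbConfig = resolveDatabaseConfig(process.env);
```

Freezing the result mirrors the "read once, immutable" approach the configuration section describes.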

Data-growth caveat: province-wide ingestion at ~5-minute cadence across hundreds of active stations accumulates quickly. `RETENTION_DAYS` bounds the table: the ingestion scheduler prunes readings older than the window on every poll cycle.

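To put rough numbers on the caveat, the sketch below estimates steady-state table size and the prune cutoff. It assumes one stored row per station per source update and that every station reports at every interval, which overstates real volume; both function names are hypothetical.

```javascript
// Back-of-envelope row counts for a given station count and cadence.
function estimateRetainedRows({ stations, cadenceMinutes = 5, retentionDays = 30 }) {
  const readingsPerStationPerDay = (24 * 60) / cadenceMinutes; // 288 at 5 minutes
  const rowsPerDay = stations * readingsPerStationPerDay;
  return { rowsPerDay, steadyStateRows: rowsPerDay * retentionDays };
}

// Readings with a timestamp before this instant fall outside the window.
function retentionCutoff(now, retentionDays = 30) {
  return new Date(new Date(now).getTime() - retentionDays * 86_400_000);
}

const estimate = estimateRetainedRows({ stations: 227 });
const cutoff = retentionCutoff('2024-01-31T00:00:00Z');
```

With 227 stations (the `distinctStations` figure in the sample status output) this works out to 65,376 rows per day and 1,961,280 rows once the 30-day window is full.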
src/
  config/       # Centralized env-based configuration (single source of truth)
  http/         # Fastify server + routes (/health + Phase 5 inspection endpoints)
    routes/
  lib/          # Cross-cutting utilities (structured logger)
  datasource/   # ECCC OGC API client + normalization (Phase 2)
  storage/      # Repository interface + Postgres implementation + migrations
  ingestion/    # Scheduled polling service (Phase 4)
  index.js      # Entrypoint: wires config + logger + server, lifecycle
test/                            # Test suite (node:test)
docs/ingestion-backend-plan.md   # Implementation plan and phase status (this backend)
docs/plan-highlevel.md           # High-level plan for the multi-user email-alerts epic
docker-compose.yml               # Local/CI Postgres for dev + tests
Dockerfile                       # Multi-stage container build
.dockerignore                    # Keeps the build context small / image clean
.env.example                     # Sample env file (copy to .env)
ansible/                         # Single-node deploy: provision (playbook.yml) + redeploy (deploy.yml)

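The lifecycle handling that the entrypoint is responsible for (on SIGTERM: stop the poller, drain in-flight requests, close the Postgres pool) can be sketched as below. This is an illustration under assumptions, not the code in `src/index.js`: `createShutdown` and `poller.stop()` are hypothetical, while `server.close()` and `pool.end()` correspond to the Fastify and node-postgres methods of those names.

```javascript
// Build an idempotent shutdown function from the three long-lived resources.
function createShutdown({ poller, server, pool, logger = console }) {
  let shuttingDown = null; // memoized promise so repeated signals do not race
  return function shutdown(signal) {
    if (!shuttingDown) {
      shuttingDown = (async () => {
        logger.info(`received ${signal}, shutting down`);
        poller.stop();        // hypothetical scheduler API: no new poll cycles
        await server.close(); // stop accepting connections, finish in-flight requests
        await pool.end();     // release pooled Postgres connections last
      })();
    }
    return shuttingDown;
  };
}

// Wiring sketch (variable names assumed):
// const shutdown = createShutdown({ poller, server, pool });
// process.once('SIGTERM', () => shutdown('SIGTERM').then(() => process.exit(0)));
```

Closing the pool last matters in this ordering: requests still draining may need a database connection until the server has finished closing.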
waterWatch deploys as a single node: one host runs Postgres plus the combined
app container (node src/index.js = HTTP API + static web UI + ingestion +
alerts). The ansible/ directory automates the whole thing — it
builds the image on the host from source (no registry, no git on the box),
provisions Postgres, renders the env file, and runs the app as a systemd service.
First-time provisioning of a fresh Debian/Ubuntu host (run in order):
cd ansible
ansible-galaxy collection install -r requirements.yml
cp inventory.example.ini inventory.ini
cp group_vars/all.example.yml group_vars/all.yml
cp secrets.example.yml secrets.yml # set DB password and API tokens
$EDITOR inventory.ini # point at your host
$EDITOR group_vars/all.yml # domain, admin emails, deploy defaults
ansible-playbook firstrun.yml # timezone/NTP, upgrades, firewall
ansible-playbook playbook.yml # Postgres + build/run the app
ansible-playbook proxy.yml # (optional) nginx + Let's Encrypt TLS

Routine code updates: after a host is provisioned, this is the everyday "push my latest code" command. It ships whatever source is currently on disk (committed or not), rebuilds the image, and restarts only if something changed:

cd ansible
ansible-playbook deploy.yml

deploy.yml deliberately does not touch Postgres, the firewall, or backups;
it is just the fast app redeploy. Full details (what each playbook does, TLS,
outbound email, backups, security notes) live in
ansible/README.md; a high-level overview of the moving
parts is in docs/DEPLOY.md.
You can also build and run the container yourself against your own Postgres — see
Run with Docker above and docs/DEPLOY.md.
Note: Don't edge-cache the dynamic endpoints. If you front the app with a CDN
(e.g. Cloudflare), keep caching off for the JSON API (/api/*,
/stations/*, /ingestion/status). Caching them serves stale readings even
when ingestion is current. Static assets under public/ are safe to cache.
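As an origin-side complement to the CDN rule, the app could also mark dynamic responses as uncacheable. The sketch below is an assumption about how that might be done, not something the source says waterWatch does; `cacheControlFor` is a hypothetical helper, and the commented wiring uses Fastify's `onSend` hook.

```javascript
// Paths that serve live data. /stations itself is included alongside
// /stations/* because the list endpoint is dynamic too.
const DYNAMIC_PREFIXES = ['/api', '/stations', '/ingestion/status'];

// Return a Cache-Control value for dynamic paths, or null to leave
// static assets to whatever policy already applies to them.
function cacheControlFor(url) {
  const path = url.split('?')[0];
  const dynamic = DYNAMIC_PREFIXES.some(
    (prefix) => path === prefix || path.startsWith(`${prefix}/`),
  );
  return dynamic ? 'no-store' : null;
}

// Wiring sketch (server is an assumed Fastify instance):
// server.addHook('onSend', async (request, reply, payload) => {
//   const value = cacheControlFor(request.url);
//   if (value) reply.header('Cache-Control', value);
//   return payload;
// });
```

A header like this only helps if the CDN is configured to respect origin cache directives, so the CDN-side rule in the note remains the primary control.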
These are deliberately not built in this deliverable (see the Scope boundary
in docs/ingestion-backend-plan.md); they are the intended evolution and the design
keeps each change isolated to one seam:
- Alternative storage engines. Storage lives behind the `Repository` interface (`src/storage/repository.js`); the Postgres implementation is built by `createRepository()` in `src/storage/index.js`. A different engine would slot in there behind the same interface with no change to ingestion or the HTTP layer, and is validated by reusing the existing storage contract test suite (`runRepositoryContract`). Queries already use real SQL so semantics carry over.
- AMQP / Sarracenia push ingestion. The scheduled poller (`src/ingestion/scheduler.js`) is the only component that decides when new readings arrive. Swapping the poll loop for ECCC's AMQP push notifications changes just that module: it still normalizes via the Phase 2 client and writes through the same repository, so storage and reads are untouched.
- Ingestion / read-API process split. (Built.) The app can run combined (`src/index.js`) or as two processes, the read/API server (`src/api.js`) and the ingest scheduler (`src/ingest.js`), that coordinate only through Postgres (writer vs. readers). All three reuse the same wiring factories (`src/app/wiring.js`), so the storage interface stays the seam. Scaling the pieces onto separate hosts is the remaining step.

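To make the "storage seam" idea concrete, here is a toy in-memory store shaped like a repository. The method names (`upsertReadings`, `latestForStation`, `pruneOlderThan`, `countReadings`) are invented for illustration and are not the actual `Repository` interface; a real alternative engine would implement whatever `src/storage/repository.js` defines and pass `runRepositoryContract`.

```javascript
// Toy repository: everything the callers see is a small async API,
// so ingestion and HTTP code would not care what sits behind it.
function createInMemoryRepository() {
  const readings = new Map(); // key: `${stationId}|${timestamp}`
  return {
    // Idempotent write: the same station + timestamp overwrites in place.
    async upsertReadings(batch) {
      for (const r of batch) readings.set(`${r.stationId}|${r.timestamp}`, { ...r });
      return batch.length;
    },
    // Timestamps are compared as strings, which is only valid when they
    // all share one UTC ISO-8601 format.
    async latestForStation(stationId) {
      let latest = null;
      for (const r of readings.values()) {
        if (r.stationId !== stationId) continue;
        if (!latest || r.timestamp > latest.timestamp) latest = r;
      }
      return latest;
    },
    async pruneOlderThan(cutoffIso) {
      let removed = 0;
      for (const [key, r] of readings) {
        if (r.timestamp < cutoffIso) {
          readings.delete(key);
          removed += 1;
        }
      }
      return removed;
    },
    async countReadings() {
      return readings.size;
    },
  };
}
```

The same shape also explains why the process split works: a writer and any number of readers only need to agree on this interface and on the database behind it.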
All six phases in docs/ingestion-backend-plan.md are complete: backend
scaffolding, the ECCC client, storage, the ingestion scheduler, the minimal
inspection endpoints, and packaging/ops/run docs (this Docker build + run guide).
Phase status is tracked in docs/ingestion-backend-plan.md.
MIT