XDP47 is a lightweight control plane + agent system for staged software updates on edge fleets (kiosks, POS, IoT). It supports multi-tenant data, wave-based rollouts, policies, and a demo stack with real-ish metrics from co-located apps.
- Control plane (Go/Chi + Postgres): REST + HTML UI for devices, rollouts, artifacts, policies, tenants, tokens, system metrics. Scheduler runs waves and enforces policies (require_ok, grace, ratios, skip_offline).
- Agent (Go): Claims a device, sends heartbeats, reads desired version/channel, downloads the artifact (URI), verifies SHA256, writes to shared state (`/var/lib/xdp47`), and marks applied. Can source metrics from a local app via `XDP47_METRICS_URL`.
- Demo app (Go): Tiny HTTP server per device; reads `applied_version` from the shared volume and serves `/metrics` (cpu/mem/status derived from version). Used in demos to replace synthetic metrics.
- Data model (aggregates): Device, Rollout, RolloutRun, Artifact, Policy, Tenant, ApiToken.
- Create an artifact with URI + SHA (optional signature/metadata).
- (Optional) Define a policy to set gating and offline rules.
- Create a rollout (tenant, selector by labels, channel, waves, artifact_id, policy_id).
- Start rollout: scheduler groups devices into waves; agents pull/apply the artifact.
- Observe status/wave history via UI or `/api/rollouts/{id}/runs`; abort/retry as needed.
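
As a rough sketch of the "create a rollout" step, the snippet below builds (but does not send) an authenticated POST to `/api/rollouts`. The JSON key names, the string-typed IDs, the `rolloutRequest` type, and the `http://localhost:8080` base URL are all assumptions for illustration; check the control plane's handlers for the real request shape.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rolloutRequest mirrors the rollout fields listed above.
// The JSON keys and field types are assumed, not taken from the API.
type rolloutRequest struct {
	Tenant     string            `json:"tenant"`
	Selector   map[string]string `json:"selector"`
	Channel    string            `json:"channel"`
	Waves      int               `json:"waves"`
	ArtifactID string            `json:"artifact_id"`
	PolicyID   string            `json:"policy_id,omitempty"`
}

// newCreateRolloutRequest (hypothetical helper) builds a POST to
// /api/rollouts with a Bearer token. It does not send the request.
func newCreateRolloutRequest(baseURL, token string, r rolloutRequest) (*http.Request, error) {
	body, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/rollouts", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateRolloutRequest("http://localhost:8080", "devtoken123", rolloutRequest{
		Tenant:     "acme",
		Selector:   map[string]string{"site": "store-12"},
		Channel:    "stable",
		Waves:      3,
		ArtifactID: "artifact-1",
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Authorization"))
}
```

Passing the request to `http.DefaultClient.Do(req)` would submit it against a running control plane.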
- DevOps home + diagram: `documentation/devops/README.md` (see `devops_flow.png` for CI/CD → GHCR → Helm → Kubernetes).
- Deploy/Helm/local compose: `documentation/devops/deploy/README.md` and `documentation/devops/docker/README.md`.
- CI/CD specifics: `documentation/devops/pipelines/README.md` (GitHub Actions, Jenkins).
- Security/scanning: `documentation/devops/security/README.md`.
- Testing flows: `documentation/devops/testing/README.md`.
- Tokens: pages for rollouts/policies/tenants/artifacts have a “Bearer token” box. Dev token `devtoken123` has full access; otherwise use a scoped token from `/ui/tenants` (e.g., `rollouts:write`, `artifacts:write`, `policies:write`).
- Devices: `/ui/devices` (live list, auto-refresh). No token needed for GET.
- Artifacts: `/ui/artifacts` (create/list artifacts; needs `artifacts:write` for POST). Required fields: tenant, URI, SHA256; optional: type, version, signature, metadata (JSON).
- Rollouts: `/ui/rollouts` (create/start/retry/abort rollouts; needs `rollouts:write`). Fields: tenant, artifact ID, channel, waves, optional policy, selector (key=value per line or comma-separated).
- Policies: `/ui/policies` (create/list policies; needs `policies:write`).
- Tenants & tokens: `/ui/tenants` (create tenant; create API tokens with scopes; needs `tenants:write`/`tokens:write`).
- System: `/ui/system` (runtime metrics from control pod; no token).
Use the dev compose described in documentation/devops/deploy/README.md for a local control + Postgres + agents + demo apps setup.
- Public GET: `/api/devices`, `/api/rollouts`, `/api/rollouts/{id}/runs`, `/api/artifacts`, `/api/policies`, `/api/tenants`, `/api/tokens`, `/healthz`.
- Auth POST (Bearer): `/api/devices/claim`, `/api/devices/{id}/heartbeat`, `/api/rollouts` + `:start|:retry|:abort`, `/api/artifacts`, `/api/policies`, `/api/tenants`, `/api/tokens`.
- Scopes: `devices:write`, `rollouts:write`, `artifacts:write`, `policies:write`, `tenants:write`, `tokens:write` (dev token bypasses scopes).
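
The scope rules above could be modeled along these lines. This is a sketch of the described behavior only: the `apiToken` type and `authorize` function are hypothetical, and the control plane's real check may differ (for example, in how the dev token is configured or compared).

```go
package main

import "fmt"

// devToken is the development token named in this README.
const devToken = "devtoken123"

// apiToken is a hypothetical in-memory view of a token and its scopes.
type apiToken struct {
	Value  string
	Scopes []string
}

// authorize reports whether t may perform an action needing the given scope.
// The dev token bypasses scope checks; other tokens need an exact match.
func authorize(t apiToken, required string) bool {
	if t.Value == devToken {
		return true
	}
	for _, s := range t.Scopes {
		if s == required {
			return true
		}
	}
	return false
}

func main() {
	scoped := apiToken{Value: "tenant-token", Scopes: []string{"rollouts:write"}}
	fmt.Println(authorize(scoped, "rollouts:write"))
	fmt.Println(authorize(scoped, "artifacts:write"))
	fmt.Println(authorize(apiToken{Value: devToken}, "tokens:write"))
}
```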
- Control: `XDP47_DB_URL`, `XDP47_AGENT_TOKEN`, `XDP47_LISTEN_ADDR`, `XDP47_SCHED_INTERVAL`, `XDP47_SCHED_GRACE`, `XDP47_SCHED_REQUIRE_OK`, `XDP47_SCHED_SKIP_OFFLINE`.
- Agent: `XDP47_CONTROL_URL`, `XDP47_TENANT`, `XDP47_DEVICE_LABELS` (comma key=val), `XDP47_AGENT_TOKEN`, `XDP47_METRICS_URL` (e.g., http://app:9090/metrics), `XDP47_STATE_DIR`, `XDP47_DEVICE_ID` (optional override).
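
To illustrate the comma-separated `key=val` format of `XDP47_DEVICE_LABELS`, here is a small parsing sketch together with an equality-based selector match. Both `parseLabels` and `matches` are hypothetical names, and the exact-match selector semantics are an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels turns "k1=v1,k2=v2" into a map. Empty segments are skipped;
// a segment without "=" or with an empty key is rejected.
func parseLabels(raw string) (map[string]string, error) {
	labels := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		k, v, ok := strings.Cut(pair, "=")
		k = strings.TrimSpace(k)
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid label %q (want key=value)", pair)
		}
		labels[k] = strings.TrimSpace(v)
	}
	return labels, nil
}

// matches reports whether every selector entry is present in labels with
// the same value (assumed semantics for rollout selectors).
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	labels, err := parseLabels("site=store-12, role=pos")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(labels, matches(labels, map[string]string{"role": "pos"}))
}
```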
Chart: source/deploy/helm/xdp47. See documentation/devops/deploy/README.md for install flags, images, and values (control/agent/demo app, Postgres, ingress).
Pipelines, stages, and secrets are documented in documentation/devops/pipelines/README.md (GitHub Actions and Jenkins).
- Control state in Postgres.
- Agent state in volume: `device_id`, `applied_version`, `applied_<artifact_id>`.
- Metrics: from demo app when `XDP47_METRICS_URL` is set; fallback to synthetic random values.
- Waves distribute matching devices evenly. Status: `draft` → `running` → `completed`/`partial`/`failed`/`aborted`.
- Policies override scheduler defaults (require_ok, skip_offline, grace_seconds, min_success_ratio, max_failures_per_wave, abort_on_zero_eligible).
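
A minimal sketch of even wave distribution and a per-wave gate is shown below. It is not the scheduler's code: `splitWaves` and `waveOK` are hypothetical, the contiguous split order is assumed, and the way `min_success_ratio` and `max_failures_per_wave` combine here is one plausible reading of those policy names.

```go
package main

import "fmt"

// splitWaves distributes devices across n waves so that wave sizes differ by
// at most one; earlier waves receive the extra devices.
func splitWaves(devices []string, n int) [][]string {
	if n <= 0 {
		return nil
	}
	waves := make([][]string, n)
	base, extra := len(devices)/n, len(devices)%n
	start := 0
	for i := 0; i < n; i++ {
		size := base
		if i < extra {
			size++
		}
		waves[i] = devices[start : start+size]
		start += size
	}
	return waves
}

// waveOK gates progression to the next wave. A wave fails the gate if it
// exceeds maxFailures or falls below minSuccessRatio. An empty wave passes
// here; abort_on_zero_eligible is not modeled in this sketch.
func waveOK(succeeded, failed int, minSuccessRatio float64, maxFailures int) bool {
	total := succeeded + failed
	if total == 0 {
		return true
	}
	if failed > maxFailures {
		return false
	}
	return float64(succeeded)/float64(total) >= minSuccessRatio
}

func main() {
	devices := []string{"d1", "d2", "d3", "d4", "d5", "d6", "d7"}
	for i, w := range splitWaves(devices, 3) {
		fmt.Printf("wave %d: %v\n", i+1, w)
	}
	fmt.Println(waveOK(3, 1, 0.5, 1))
}
```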
Test commands and smoke suites live in documentation/devops/testing/README.md (unit, migrations, API scripts).