NetEngine is a declarative platform for bootstrapping self-contained, authority-autonomous digital worlds. Give it a YAML spec and it provisions authoritative DNS, a private PKI/ACME CA, OIDC identity (Keycloak), network isolation (nftables), domain and world registries, mail, storage, and org applications — all running in Docker on a single host.
A world is a self-contained internet: its own TLD hierarchy, its own certificate authority, its own identity provider, and its own network policies. NetEngine turns a YAML spec into a live world in under ten minutes.
```bash
netengine up examples/minimal.yaml
```
That single command runs phases 0 through 9 in sequence:
| Phase | What it provisions |
|---|---|
| 0 — Substrate | Docker networks, NTP, orchestrator init |
| 1–2 — DNS | CoreDNS root + platform zones, TLD hierarchy |
| 3 — PKI | step-ca root CA + ACME endpoint |
| 4 — Platform identity | Keycloak realm, admin user, platform OIDC client |
| 5 — Registries | World registry, domain registry, WHOIS server |
| 6 — In-world identity | Per-org Keycloak realms |
| 7 — ANDs | Administrative Network Domains (nftables isolation) |
| 8 — Services | Postfix, MinIO |
| 9 — Org applications | Org app deployments |
Each phase is idempotent: re-running `netengine up` skips phases that have already completed.
- Python 3.13+
- Docker (Engine 24+, Compose optional)
- PostgreSQL 15+ with the pgmq extension. The easiest way is `docker compose up -d db` using the included `docker-compose.yml`.
- Poetry (`pip install poetry`)
```bash
# 1. Clone
git clone https://github.com/Forebase/NetEngine.git
cd NetEngine
# 2. Install dependencies
poetry install
# 3. Verify host prerequisites
poetry run netengine doctor
# 4. Start local Postgres + pgmq (includes pgmq extension pre-installed)
docker compose up -d postgres
# 5. Apply migrations
poetry run python -m netengine.utils.run_migrations
# 6. Boot a minimal world
poetry run netengine up examples/minimal.yaml
```

If you only want host/container checks before configuring Postgres, run `poetry run netengine doctor --skip-db`.

Check status at any time:

```bash
poetry run netengine status
```

Tear down:

```bash
poetry run netengine down
```

Worlds are defined in YAML. See `examples/` for reference:
| File | Description |
|---|---|
| `examples/minimal.yaml` | Bare minimum: no orgs, no ANDs, services off |
| `examples/single-org.yaml` | One organisation with a residential AND |
| `examples/dev-sandbox.yaml` | Full dev setup with orgs, ANDs, mail, storage |
These are the official alpha operator paths. Run them after the Quickstart setup to validate the supported lifecycle.

Path A uses `examples/minimal.yaml`:

```bash
poetry run netengine up examples/minimal.yaml
poetry run netengine up examples/minimal.yaml
poetry run netengine status
poetry run netengine diagnose examples/minimal.yaml
poetry run netengine down --dry-run
poetry run netengine down --yes
```

The second `up` proves idempotency.
Path B uses `examples/single-org.yaml` to prove org identity, DNS delegation, AND profile basics, and registry records:

```bash
poetry run netengine up examples/single-org.yaml
poetry run netengine up examples/single-org.yaml
poetry run netengine status
poetry run netengine diagnose examples/single-org.yaml
poetry run netengine down --dry-run
poetry run netengine down --yes
```

Path C uses `examples/dev-sandbox.yaml` as the feature-rich alpha demo. It is more experimental than Paths A and B, because some of its integrations may still be stabilizing:
```bash
poetry run netengine up examples/dev-sandbox.yaml
poetry run netengine up examples/dev-sandbox.yaml
poetry run netengine status
poetry run netengine diagnose examples/dev-sandbox.yaml
poetry run netengine down --dry-run
poetry run netengine down --yes
```

Alpha acceptance checklist:

- Fresh install works
- Boot completes
- Re-running `up` is idempotent
- `status` is accurate
- `diagnose` explains failures
- `reload` rejects immutable changes
- `down --dry-run` lists resources
- `down --yes` leaves no project-owned Docker resources behind
Large specs can be split across files:

```bash
# Base + environment overlay
poetry run netengine up examples/spec.base.yaml --env dev
# Inline override
poetry run netengine up spec.yaml --set metadata.name=my-world
```

| Variable | Default | Description |
|---|---|---|
| `NETENGINE_DB_URL` | `postgresql://netengine:dev_password@localhost:5432/netengine` | Local Postgres connection string |
| `SUPABASE_URL` + `SUPABASE_SERVICE_KEY` | unset | Set both to use Supabase cloud instead of local Postgres |
| `NETENGINE_STATE_FILE` | `netengines_state.json` | Path to runtime state JSON |
| `NETENGINE_MOCK` | `false` | Set to `true` to skip real Docker/DNS/PKI calls (useful for CI) |
| `NETENGINE_ZONE_DIR` | `./data/coredns` | Directory for CoreDNS zone files |
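
The variables above can be read into a single settings object. The sketch below is hypothetical: `Settings`, `load_settings`, the accepted spellings of `true`, and the "both or neither" check for the Supabase pair are assumptions, not NetEngine's own config loader. It only applies the documented defaults.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Defaults copied from the environment variable table.
DEFAULT_DB_URL = "postgresql://netengine:dev_password@localhost:5432/netengine"
DEFAULT_STATE_FILE = "netengines_state.json"
DEFAULT_ZONE_DIR = "./data/coredns"


@dataclass(frozen=True)
class Settings:
    db_url: str
    supabase_url: Optional[str]
    supabase_service_key: Optional[str]
    state_file: str
    mock: bool
    zone_dir: str

    @property
    def uses_supabase(self) -> bool:
        # Supabase replaces local Postgres only when both variables are set.
        return bool(self.supabase_url and self.supabase_service_key)


def _truthy(raw: str) -> bool:
    # Assumption: "1", "true" or "yes" (any case) enable a flag.
    return raw.strip().lower() in {"1", "true", "yes"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    supabase_url = env.get("SUPABASE_URL") or None
    supabase_key = env.get("SUPABASE_SERVICE_KEY") or None
    # Assumption: setting only one of the pair is treated as a configuration error.
    if (supabase_url is None) != (supabase_key is None):
        raise ValueError("set both SUPABASE_URL and SUPABASE_SERVICE_KEY, or neither")
    return Settings(
        db_url=env.get("NETENGINE_DB_URL", DEFAULT_DB_URL),
        supabase_url=supabase_url,
        supabase_service_key=supabase_key,
        state_file=env.get("NETENGINE_STATE_FILE", DEFAULT_STATE_FILE),
        mock=_truthy(env.get("NETENGINE_MOCK", "false")),
        zone_dir=env.get("NETENGINE_ZONE_DIR", DEFAULT_ZONE_DIR),
    )
```

Passing an explicit mapping (for example `load_settings({"NETENGINE_MOCK": "true"})`) keeps the function easy to test without touching the real environment.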
Alpha migrations are forward-only unless a migration file includes explicit manual rollback notes. Inspect applied migrations with:
```bash
psql "$NETENGINE_DB_URL" -c "SELECT version, dirty FROM schema_migrations;"
psql "$NETENGINE_DB_URL" -c "SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1;"
```

The second command identifies the last applied migration. If a migration fails, stop writers, capture the error and the current `schema_migrations` state, inspect partially created objects, and follow that migration's manual recovery notes before retrying. Without explicit rollback notes, restore from backup; rebuild from scratch only if the database is disposable. Wiping and reapplying migrations is acceptable for local/dev/CI alpha databases with no durable data; it is unsafe for shared, persistent, staging, production, or customer-like environments without an approved backup/restore plan.
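The recovery policy above can be expressed as a small decision function. This is an illustrative sketch only: `DbKind` and `migration_recovery_steps` are hypothetical names, and the step strings simply restate the prose; NetEngine does not ship this helper.

```python
from enum import Enum


class DbKind(Enum):
    LOCAL = "local"
    DEV = "dev"
    CI = "ci"
    SHARED = "shared"
    PERSISTENT = "persistent"
    STAGING = "staging"
    PRODUCTION = "production"
    CUSTOMER_LIKE = "customer-like"


# Only these kinds may ever be wiped, and only when they hold no durable data.
DISPOSABLE_KINDS = frozenset({DbKind.LOCAL, DbKind.DEV, DbKind.CI})


def migration_recovery_steps(
    kind: DbKind, has_rollback_notes: bool, holds_durable_data: bool
) -> list[str]:
    """Return the recovery steps for a failed migration, per the policy above."""
    steps = [
        "stop writers",
        "capture the error and the current schema_migrations state",
        "inspect partially created objects",
    ]
    if has_rollback_notes:
        steps.append("follow the migration's manual recovery notes, then retry")
    elif kind in DISPOSABLE_KINDS and not holds_durable_data:
        steps.append("wipe the database and reapply migrations")
    else:
        steps.append("restore from backup under an approved backup/restore plan")
    return steps
```

Note that a dev database holding durable data still falls through to the backup path: disposability depends on the data, not just the environment label.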
Treat pgmq queue additions as forward schema changes: each addition should create both the queue and its matching `*_dlq` queue. Prefer migrations over manual changes. If a queue must be created manually during alpha recovery, first inspect existing queues and then create both queues explicitly; remove queues only in disposable environments or after confirming that no pending or audit messages are needed:

```bash
psql "$NETENGINE_DB_URL" -c "SELECT queue_name FROM pgmq.list_queues() ORDER BY queue_name;"
psql "$NETENGINE_DB_URL" -c "SELECT pgmq.create('new_queue'); SELECT pgmq.create('new_queue_dlq');"
```

See `docs/runbook.md` for the full rollback and recovery procedure.
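To avoid forgetting the dead-letter half of the pair, the SQL can be generated from the primary queue name. A minimal sketch, assuming a hypothetical helper `queue_pair_sql`; its naming rule is deliberately conservative and is this sketch's own, not pgmq's (pgmq may enforce different limits).

```python
import re

# Conservative rule for this sketch: lowercase letter first, then letters, digits, underscores.
_QUEUE_NAME = re.compile(r"[a-z][a-z0-9_]*")


def queue_pair_sql(name: str) -> str:
    """Return SQL that creates a primary queue and its matching dead-letter queue."""
    if not _QUEUE_NAME.fullmatch(name):
        raise ValueError(f"unsafe queue name: {name!r}")
    if name.endswith("_dlq"):
        raise ValueError("pass the primary queue name; the _dlq queue is derived from it")
    # The name was validated above, so interpolating it into the SQL literal is safe.
    return f"SELECT pgmq.create('{name}'); SELECT pgmq.create('{name}_dlq');"
```

For `new_queue` this produces exactly the statement shown in the second `psql` command, and it refuses names that could break out of the SQL string literal.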
```
┌─────────────────────────────────────────────────┐
│ netengine CLI (click)                           │
│ up / down / status / reload / migrate           │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│ Orchestrator (core/orchestrator.py)             │
│ Sequential phase execution, state machine       │
│ Skip logic (idempotent re-runs)                 │
└────────────────┬────────────────────────────────┘
                 │
        ┌────────┴─────────┐
        │ Phase handlers   │  phases/ + handlers/
        │ 0–9, each with:  │
        │   execute()      │
        │   healthcheck()  │
        │   should_skip()  │
        └────────┬─────────┘
                 │
┌────────────────▼────────────────────────────────┐
│ Event bus (pgmq over Postgres)                  │
│ EventEnvelope with correlation_id               │
│ ConsumerSupervisor for background workers       │
└─────────────────────────────────────────────────┘
```
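The orchestrator's skip logic can be sketched as a loop over handlers exposing `execute()`, `healthcheck()`, and `should_skip()`. This is a simplified model under stated assumptions: `RuntimeState`, `run_phases`, `DemoPhase`, and the rule that a phase counts as completed only after its healthcheck passes are illustrative, not the actual `core/orchestrator.py` code.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class RuntimeState:
    """Hypothetical stand-in for the persisted runtime state."""

    completed: set[int] = field(default_factory=set)
    outputs: dict[int, dict[str, Any]] = field(default_factory=dict)


class PhaseHandler(Protocol):
    number: int

    def should_skip(self, state: RuntimeState) -> bool: ...

    def execute(self, state: RuntimeState) -> dict[str, Any]: ...

    def healthcheck(self, state: RuntimeState) -> bool: ...


def run_phases(
    handlers: Iterable[PhaseHandler],
    state: RuntimeState,
    save: Callable[[RuntimeState], None],
) -> list[int]:
    """Run phases in order, skipping completed ones; return the phases executed."""
    executed: list[int] = []
    for handler in sorted(handlers, key=lambda h: h.number):
        if handler.should_skip(state):
            continue  # idempotent re-run: this phase already completed
        state.outputs[handler.number] = handler.execute(state)
        if not handler.healthcheck(state):
            raise RuntimeError(f"phase {handler.number} failed its healthcheck")
        state.completed.add(handler.number)
        save(state)  # persist after every phase so an interrupted run can resume
        executed.append(handler.number)
    return executed


@dataclass
class DemoPhase:
    """Toy handler that skips itself once recorded as completed."""

    number: int

    def should_skip(self, state: RuntimeState) -> bool:
        return self.number in state.completed

    def execute(self, state: RuntimeState) -> dict[str, Any]:
        return {"phase": self.number}

    def healthcheck(self, state: RuntimeState) -> bool:
        return state.outputs.get(self.number) == {"phase": self.number}
```

Running the ten `DemoPhase` handlers twice executes phases 0 to 9 on the first pass and nothing on the second, mirroring the "second `up` proves idempotency" check.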
Runtime state is persisted to `netengines_state.json` after each phase so that interrupted runs can resume where they left off. The state file carries `schema_version: netengine.runtime_state.v1`. Alpha.1 state files written before versioning existed are detected as v1-compatible and stamped on the next save; unknown future or foreign versions fail closed, with instructions to export using the older release or to migrate through a compatible release. The file is written atomically with `0600` permissions because it can contain runtime secrets (the bootstrap admin password, OIDC client secrets, generated in-world admin passwords, and other phase outputs). Do not commit it, attach it to issues, or copy it into shared logs; use a sanitized support bundle instead.
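The versioning and atomic-write rules can be sketched with the standard library: write to a temporary file in the same directory, restrict it to `0600`, fsync, then `os.replace()` it over the target. The helper names and the example payload keys are hypothetical; only the schema string, the fail-closed check, atomicity, and the permissions come from the text above.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any

STATE_SCHEMA = "netengine.runtime_state.v1"


def check_state_schema(data: dict[str, Any]) -> dict[str, Any]:
    """Accept v1 and unversioned (alpha.1) state; fail closed on anything else."""
    version = data.get("schema_version")
    if version is None:
        # Pre-versioning alpha.1 file: treat as v1; the next save stamps the version.
        return {**data, "schema_version": STATE_SCHEMA}
    if version != STATE_SCHEMA:
        raise RuntimeError(
            f"unsupported runtime state schema {version!r}: export it with the release "
            "that wrote it, or migrate through a compatible release"
        )
    return data


def save_state(path: Path, data: dict[str, Any]) -> None:
    """Write state atomically with owner-only (0600) permissions."""
    payload = {**data, "schema_version": STATE_SCHEMA}
    # A temp file in the same directory keeps os.replace() an atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            os.fchmod(fh.fileno(), 0o600)  # mkstemp already uses 0600 on POSIX; be explicit
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Because readers only ever see either the old file or the fully written new one, a crash mid-write cannot leave a truncated state file behind.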
Spec compatibility is tracked in `metadata.schema_version` (default `netengine.spec.v1`). The loader accepts missing schema versions for existing alpha specs, rejects unsupported versions before boot/import, and includes the spec schema in support bundles so that alpha.2+ can decide whether an alpha.1 world is safe to restore.
Support bundles are produced with `netengine export --out netengine-support-bundle.json` or `GET /api/v1/export`. A bundle includes schema metadata, the world spec, phase completion status, public CA material, and sanitized phase outputs (secret-looking fields and private PEM blocks removed). Restore with `netengine import <bundle-file>` or `POST /api/v1/import`; before replacing local runtime state, import validates the bundle schema, spec schema compatibility, spec parseability, known phases, phase prerequisites, and required outputs.
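Sanitizing phase outputs amounts to a recursive walk that redacts values under secret-looking keys and strips private PEM blocks from strings while keeping public certificates. The sketch below is an assumption about how such a filter could work: the key pattern, the PEM regex, and the `[REDACTED]` marker are illustrative, not NetEngine's actual rules.

```python
import re
from typing import Any

# Assumed heuristics; the exact rules NetEngine applies are not documented here.
SECRET_KEY = re.compile(r"password|secret|token|private[_-]?key|api[_-]?key", re.IGNORECASE)
PRIVATE_PEM = re.compile(
    r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----.*?-----END (?:[A-Z]+ )*PRIVATE KEY-----",
    re.DOTALL,
)
REDACTED = "[REDACTED]"


def sanitize(value: Any, key: str = "") -> Any:
    """Recursively redact secret-looking fields and private PEM blocks.

    Public material such as CERTIFICATE blocks is left untouched.
    """
    if key and SECRET_KEY.search(key):
        return REDACTED
    if isinstance(value, dict):
        return {k: sanitize(v, str(k)) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, str):
        return PRIVATE_PEM.sub(REDACTED, value)
    return value
```

Key-based redaction errs on the side of removing too much (a public `token_endpoint` URL would also be hidden), which is the safer failure mode for a bundle meant to be shared.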
Tearing down a persistent world requires typed confirmation. From the CLI, pass `netengine down --confirm <world-name>`; the operator API requires both `confirm=true` and `confirmation` equal to the world name. `--yes` is intentionally not enough for persistent destructive operations.
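The confirmation rule reduces to a pair of guard checks. A minimal sketch with hypothetical names (`TeardownRefused`, `authorize_cli_teardown`, `authorize_api_teardown`); it models only the documented persistent-world rules and says nothing about how non-persistent worlds are handled.

```python
from typing import Optional


class TeardownRefused(Exception):
    """Raised when a persistent-world teardown lacks typed confirmation."""


def authorize_cli_teardown(world_name: str, *, yes: bool = False, confirm: Optional[str] = None) -> None:
    """CLI rule for persistent worlds: --confirm must repeat the world name."""
    if confirm != world_name:
        hint = "; --yes alone is not enough" if yes else ""
        raise TeardownRefused(f"pass --confirm {world_name} to tear down this persistent world{hint}")


def authorize_api_teardown(world_name: str, *, confirm: bool, confirmation: Optional[str]) -> None:
    """API rule for persistent worlds: confirm=true AND confirmation == world name."""
    if not (confirm and confirmation == world_name):
        raise TeardownRefused("confirm=true and confirmation=<world-name> are both required")
```

Requiring the operator to type the world name, rather than a generic yes flag, guards against running a copied command against the wrong world.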
Events flow between phases via pgmq queues defined in `netengine/events/queues.py::PRIMARY_QUEUES`. There are currently 11 primary queues (`dns_updates`, `oidc_provisioning`, `and_provisioning`, `inworld_admissions`, `services_admissions`, `and_admissions`, `pki_cert_rotation_events`, `drift_events`, `world_health`, `gateway_portal_events`, `phase_events`) plus 11 matching dead-letter queues (`*_dlq`) for failed messages.
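During recovery it helps to compare the expected queue set with what `pgmq.list_queues()` reports. The tuple below mirrors the documented list, but its form is an assumption (the real `PRIMARY_QUEUES` may be defined differently), and `dlq_name` / `missing_queues` are hypothetical helpers.

```python
from collections.abc import Iterable

# Mirrors the documented list; the real definition in
# netengine/events/queues.py::PRIMARY_QUEUES may differ in form.
PRIMARY_QUEUES: tuple[str, ...] = (
    "dns_updates",
    "oidc_provisioning",
    "and_provisioning",
    "inworld_admissions",
    "services_admissions",
    "and_admissions",
    "pki_cert_rotation_events",
    "drift_events",
    "world_health",
    "gateway_portal_events",
    "phase_events",
)


def dlq_name(queue: str) -> str:
    """Dead-letter queue that receives a primary queue's failed messages."""
    return f"{queue}_dlq"


EXPECTED_QUEUES = frozenset(PRIMARY_QUEUES) | {dlq_name(q) for q in PRIMARY_QUEUES}


def missing_queues(existing: Iterable[str]) -> list[str]:
    """Given queue names from pgmq.list_queues(), report which still need creating."""
    return sorted(EXPECTED_QUEUES - set(existing))
```

An empty result means all 22 queues (11 primary plus 11 dead-letter) are present.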
```bash
# Run tests
poetry run pytest
# Type checking
poetry run mypy netengine
# Linting
poetry run black netengine tests
poetry run isort netengine tests
poetry run flake8 netengine
# Mock-mode test (no Docker needed)
NETENGINE_MOCK=true poetry run netengine up examples/minimal.yaml
```

```
netengine/
  cli/         Click CLI (up, down, status, reload, migrate)
  core/        Orchestrator, state, pgmq client, consumer supervisor
  handlers/    Phase implementation handlers (DNS, PKI, gateway, …)
  phases/      Phase handler wrappers (identity, registries, ANDs, services)
  spec/        Pydantic v2 models + YAML loader with cross-field validation
  events/      EventEnvelope schema (locked)
  api/         FastAPI operator API
  logging/     Structured logging (loguru)
  errors.py    Error hierarchy (SubstrateError, DNSError, PKIError, …)
migrations/    SQL schema + pgmq queue setup
examples/      Reference YAML specs
docs/          Architecture decisions, audit findings
```
When a world is running, the operator API is available at https://api.platform.internal:8080. Authentication uses the platform OIDC realm: include a Keycloak-issued bearer token in the `Authorization` header.
```
GET /health        Liveness check
GET /world         Current world spec + phase status
GET /phases/{n}    Individual phase status and output
```
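A minimal standard-library client sketch for these endpoints. The function names, the 10-second timeout, and the assumption that responses are JSON are illustrative, and fetching the token from Keycloak is not shown. Assuming the API's TLS certificate chains to the world's private step-ca root rather than a public CA, the sketch lets you pass that root certificate as `ca_file`.

```python
import json
import ssl
import urllib.request
from typing import Any, Optional

API_BASE = "https://api.platform.internal:8080"


def build_request(path: str, token: str, base: str = API_BASE) -> urllib.request.Request:
    """Build a GET request that carries a Keycloak-issued bearer token."""
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def get_json(path: str, token: str, ca_file: Optional[str] = None) -> Any:
    """Call the operator API; trust the world's root CA when ca_file is given."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(build_request(path, token), context=context, timeout=10) as resp:
        return json.load(resp)
```

For example, `get_json("/phases/3", token, ca_file="root_ca.crt")` would fetch the PKI phase status, where `root_ca.crt` is a hypothetical local copy of the world's root certificate.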
The active development roadmap lives in the GitHub project. Key items:
- End-to-end integration test (real Docker, live DNS query, cert issuance, OIDC login)
- Complete operator API (org CRUD, AND management, domain management)
- Cross-world federation
- `persistent` lifecycle mode (import/export, lifecycle guards, teardown confirmation)
- `netengine down --dry-run`
See LICENSE.