Last updated: 2026-05-03.
Infrastructure and deployment orchestration for the AxiomNode platform.
- Kubernetes base manifests and overlays.
- Environment-specific compose assets.
- Infrastructure validation and deployment automation.
- Cross-repository image build orchestration.
- Local dev orchestration for the full-stack container runtime.
- `dev`: Local-only distribution.
  - Full stack runs via Docker Compose on a developer machine.
  - Service-to-service connections are local (`localhost`/`host.docker.internal`).
  - Entry point: `scripts/dev-local-stack.sh`.
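As an illustration of what a dispatcher like `dev-local-stack.sh` typically looks like, here is a minimal sketch. The compose file path and the use of compose profiles are assumptions, not the real layout, and commands are printed rather than executed so the sketch is safe to run anywhere:

```sh
# Hypothetical compose file location; the real script may differ.
COMPOSE_FILE="environments/dev/docker-compose.yml"

dev_stack() {
  cmd="$1"; shift
  case "$cmd" in
    # Print the docker compose command each subcommand would run.
    up)     echo docker compose -f "$COMPOSE_FILE" --profile "${1:-cpu}" up -d ;;
    down)   echo docker compose -f "$COMPOSE_FILE" down ;;
    status) echo docker compose -f "$COMPOSE_FILE" ps ;;
    logs)   echo docker compose -f "$COMPOSE_FILE" logs -f "$1" ;;
    *)      echo "usage: dev_stack {up|down|status|logs <service>}" ;;
  esac
}

dev_stack up cpu   # prints the docker compose command it would run
```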
- `stg`: Remote Kubernetes distribution on `sebss@amksandbox.cloud`.
  - Public domains route through ingress.
  - The default staging overlay currently deploys the split AI runtime (`ai-engine-api`, `ai-engine-stats`, `ai-engine-cache`) in-cluster.
  - The llama runtime may remain external and is therefore still a runtime-routing concern rather than a pure manifest concern.
  - When you need the full in-cluster AI topology again, use the optional `kubernetes/overlays/stg-with-ai-engine` variant through a manual deploy.
  - CI/CD auto-deploy target after successful image builds on `main`.
- `prod`: Final distribution tier for production scalability.
  - Can run distributed services and external cloud-managed resources (DB, ingress, scaling).
  - Deployment is manual and controlled, not the default automatic target.
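The tiers above map onto kustomize overlays. A small sketch of the overlay selection, assuming the layout implied by `kubernetes/overlays/stg-with-ai-engine` (the apply step itself is normally handled by the deploy workflow, not done by hand):

```sh
overlay_path() {
  # $1: environment (dev|stg|prod)
  # $2: "true" to select the optional in-cluster ai-engine staging variant
  if [ "$1" = "stg" ] && [ "$2" = "true" ]; then
    echo "kubernetes/overlays/stg-with-ai-engine"
  else
    echo "kubernetes/overlays/$1"
  fi
}

# Rendering/applying would then look like (requires cluster access):
#   kubectl kustomize "$(overlay_path stg true)" | kubectl apply -f -
overlay_path stg true   # -> kubernetes/overlays/stg-with-ai-engine
```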
- `kubernetes/`: base resources plus `dev`/`stg`/`prod` overlays.
- `environments/`: compose-based integration environments.
- `terraform/`: infrastructure-as-code modules.
- `.github/workflows/`: CI/CD workflows.
- `kubernetes/README.md`
- `kubernetes/base/README.md`
- `environments/dev/README.md`
- `environments/stg/README.md`
- `environments/prod/README.md`
- `terraform/README.md`
This repository owns deployment and packaging policy, not business behavior.
Concrete ownership includes:
- manifest shape and overlay composition
- image selection and rollout behavior
- cross-repository build orchestration
- environment rendering and rollout validation
Service-specific business contracts, route semantics, and domain validation belong in their respective service repositories.
- `validate-infra.yml`
  - Trigger: push (`main`, `develop`), pull request, manual dispatch.
  - Purpose: validates required infrastructure directories, blocks mutable Kubernetes `:latest` image tags, and renders the `dev`/`stg`/`prod` overlays with `kubectl kustomize`.
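A hedged sketch of what the `:latest` guard could look like; the actual workflow logic may differ:

```sh
check_no_latest_tags() {
  # $1: directory containing Kubernetes manifests.
  # Fail (return 1) when any manifest pins a mutable :latest image tag.
  if grep -rqE 'image:[^#]*:latest' "$1" 2>/dev/null; then
    echo "mutable :latest image tag found in $1" >&2
    return 1
  fi
}

# CI then renders each overlay the same way the workflow does:
#   kubectl kustomize kubernetes/overlays/stg > /dev/null
```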
- `build-push.yaml` (Build & Push Docker Images)
  - Trigger: push (`main`, `develop`) and manual dispatch.
  - Purpose: detects changed services (or the selected service), checks out source repos, and publishes images to GHCR.
  - Notes:
    - Uses `CROSS_REPO_READ_TOKEN` to access private source repos.
    - Publishes `dev` tags, and on `main` also publishes `stg` tags.
    - Covered service repos dispatch this workflow only after their own validation jobs succeed on `main`.
    - Optional `publish_prod_tag=true` on manual dispatch adds mutable `prod` tags for controlled production promotion.
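The tag rules above can be sketched as a small helper. This is assumed logic, not the actual workflow script; the immutable short-SHA tag is the one the staging deploy later pins to:

```sh
image_tags() {
  # $1: branch, $2: commit SHA, $3: "true" when publish_prod_tag is set.
  branch="$1"; sha="$2"; publish_prod="${3:-false}"
  short="$(printf '%s' "$sha" | cut -c1-7)"   # immutable short-SHA tag
  tags="$short dev"                           # every build gets dev
  [ "$branch" = "main" ] && tags="$tags stg"  # main also publishes stg
  [ "$publish_prod" = "true" ] && tags="$tags prod"
  printf '%s\n' "$tags"
}

image_tags main 0123abcdef456 true   # -> 0123abc dev stg prod
```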
- `deploy.yaml` (Deploy to Kubernetes)
  - Trigger: successful completion of `build-push.yaml` on `main`, or manual dispatch.
  - Current policy: automatic deployment is pinned to `stg`.
  - Purpose: validates manifests, renders the selected overlay, applies manifests to k3s, and waits for rollout.
  - Notes:
    - Workflow-driven staging deploys pin changed services to the immutable short-SHA tags produced by the triggering build run.
    - Manual deploys keep the environment tags (`stg`/`prod`) and still force restarts when a mutable tag must be refreshed.
    - Manual staging deploys can opt into the `stg-with-ai-engine` overlay with `include_ai_engine=true`.
  - Safety: rollout status and available-replica checks fail the workflow if services are not healthy.
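The safety gate can be sketched as follows; the namespace comes from the staging chain (`axiomnode-stg`), while the deployment name and timeout are illustrative:

```sh
# Cluster-side checks the workflow runs per service (requires kubectl access):
#   kubectl -n axiomnode-stg rollout status deployment/api-gateway --timeout=180s
#   avail=$(kubectl -n axiomnode-stg get deployment api-gateway \
#             -o jsonpath='{.status.availableReplicas}')

replicas_healthy() {
  # $1: available replicas reported (may be empty), $2: desired replicas.
  avail="${1:-0}"; desired="$2"
  [ "$avail" -ge "$desired" ]
}

replicas_healthy 2 2 && echo healthy   # prints: healthy
```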
1. A service repo receives a push on `main`.
2. That repo's CI validates the build, tests, lint, and any service-specific smoke checks.
3. Only after those checks succeed, the repo dispatches `platform-infra/.github/workflows/build-push.yaml` with a service input.
4. Build/push publishes updated image tags in GHCR.
5. `deploy.yaml` runs and applies the changes to `axiomnode-stg`.
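Step 3 can also be exercised by hand with the GitHub CLI. In this sketch the dispatch command is printed instead of run, and the repository owner, the `service` input name, and the validation command are placeholders:

```sh
dispatch_build() {
  # Print the gh invocation a service repo's CI would perform.
  service="$1"
  echo gh workflow run build-push.yaml \
    --repo "${ORG:-example-org}/platform-infra" -f "service=$service"
}

run_validation() { true; }   # stand-in for the repo's build/test/lint jobs

# Dispatch only after validation succeeds, mirroring the chain above.
run_validation && dispatch_build api-gateway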
Covered automatic-chain services:
- `api-gateway`
- `bff-mobile`
- `bff-backoffice`
- `backoffice`
- `ai-engine-api`
- `ai-engine-stats`
- `microservice-quizz`
- `microservice-wordpass`
- `microservice-users`
Not covered by this automatic GHCR-to-k3s chain:
- `mobile-app`
- external llama runtime hosts
The optional `stg-with-ai-engine` overlay is still required when you want the full in-cluster AI topology, including the llama runtime, for controlled smoke tests or diagnostic measurements.
`platform-infra` describes deployed resources, but not the whole effective topology.
In particular:
- `bff-backoffice` can persist service-target overrides
- `api-gateway` can persist the live ai-engine target
- `ai-engine-api` can persist the active llama target
Operational documentation must therefore be read together with the central runtime-routing documents.
Run all dev services locally with a single script:

```sh
./scripts/dev-local-stack.sh up cpu
```

Useful commands:

```sh
./scripts/dev-local-stack.sh status
./scripts/dev-local-stack.sh logs api-gateway
./scripts/dev-local-stack.sh down
```

Run an in-cluster ai-engine canary against staging without port-forwarding, but only when you have deliberately deployed the optional in-cluster ai-engine manifests:

```sh
./scripts/ai-engine-stg-canary.sh
```

Run a public staging smoke for the edge, aggregated services, apps, and AI-exposed checks, excluding the llama runtime:

```sh
./scripts/smoke-stg-edge.sh
```

Useful overrides:

```sh
GAME_TYPE=word-pass QUERY="sistema solar" ./scripts/ai-engine-stg-canary.sh
QUERY="teorema de pitagoras" CATEGORY_ID=19 NUM_QUESTIONS=3 ./scripts/ai-engine-stg-canary.sh
```

Required repository secrets:

- `CROSS_REPO_READ_TOKEN`
- `GHCR_PULL_USERNAME`
- `GHCR_PULL_TOKEN`
- `K3S_HOST`
- `K3S_USER`
- `K3S_SSH_KEY`
`GITHUB_TOKEN` is used by the build workflow to publish packages to GHCR.
- Keep real environment secret files only in untracked local paths such as `secrets/dev.env`, `secrets/stg.env`, and `secrets/prod.env`.
- Use the committed templates under `secrets/*.env.example` as the starting point.
- Do not commit populated `secrets/*.env` files; `.gitignore` intentionally blocks them.
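A small sketch of the one-time local setup implied by the rules above, using the template file names from this section:

```sh
init_secret_files() {
  # Seed untracked env files from the committed templates,
  # never overwriting an existing (possibly populated) file.
  for env in dev stg prod; do
    [ -f "secrets/${env}.env" ] || cp "secrets/${env}.env.example" "secrets/${env}.env"
  done
}
```

After running it, edit each `secrets/*.env` by hand; the populated files stay out of git.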
This repository should document:
- what gets built and deployed
- which overlays exist and when each one is used
- how automatic versus manual rollout works
- which parts of runtime behavior are outside declarative manifest ownership
- `kubernetes/README.md`
- `environments/dev/README.md`
- `environments/stg/README.md`
- `environments/prod/README.md`