An AWS-native Internal Developer Platform delivered by coding agents. The repo is the platform, GitHub Issues are its state machine, and Cursor / Claude Code / Gemini CLI agents are the primary contributors. Humans review, approve, and unblock.
| Topic | Where to look |
|---|---|
| Operating contract | `AGENTS.md` |
| Agent onboarding | `docs/AGENT-GETTING-STARTED.md` |
| AWS operator onboarding | `docs/onboarding/` (Platform / Organization / Application tracks) |
| Workflow board | Catalyst Progress / Andon (Project #3) |
Catalyst is an Internal Developer Platform (IDP) control plane for AWS: deployable Terraform modules, a FastAPI control-plane service (services/catalyst-api/), a Knack CLI (clients/catalyst-cli/), CI/CD pipelines, and the agent-system itself. It is also an AI-native delivery model (ADR-011): instead of humans driving a portal or a Terraform repo by hand, agents pick up tracked issues and execute against a Gherkin acceptance contract.
Three properties make this work:
- GitHub Issues are the durable state machine. Every unit of work is a labelled issue. The legal `state/*` transitions and the rest of the label vocabulary live in `docs/ADR/STATE-MACHINE.md`; the rationale is ADR-001.
- One operating contract, three tools. `AGENTS.md` is the shared contract for Cursor, Claude Code, and Gemini CLI. Canonical `skills/` and `agents/` trees are wired into each tool's directory by `python platform/bootstrap.py` (ADR-024, ADR-022).
- AWS work is OIDC-only. No long-lived AWS keys live in CI, secrets, or developer laptops. Plan, apply, deploy, and drift all use distinct GitHub OIDC roles (ADR-005, ADR-006).
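
To make the "issues as state machine" idea concrete, here is a minimal Python sketch of a transition check over issue labels. The transition table below is an assumption for illustration only (it uses just the three `state/*` labels named in this README), and the "exactly one `state/*` label per issue" rule is also assumed; the authoritative vocabulary and legal transitions are in `docs/ADR/STATE-MACHINE.md`.

```python
# Illustrative only: the real transition table lives in docs/ADR/STATE-MACHINE.md.
ASSUMED_TRANSITIONS = {
    "state/pending": {"state/agent-working"},
    "state/agent-working": {"state/blocked-on-human"},
    "state/blocked-on-human": {"state/agent-working"},
}


def current_state(labels):
    """Return the single state/* label on an issue (assumes exactly one is present)."""
    states = [label for label in labels if label.startswith("state/")]
    if len(states) != 1:
        raise ValueError(f"expected exactly one state/* label, found {states}")
    return states[0]


def is_legal_transition(labels, target):
    """Check whether moving the issue to `target` is allowed by the assumed table."""
    return target in ASSUMED_TRANSITIONS.get(current_state(labels), set())
```

A real implementation would load the table from the ADR rather than hard-coding it.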
git clone https://github.com/Cloud-Byte-Consulting/Catalyst.git
cd Catalyst
git checkout release
python platform/bootstrap.py # wire .cursor/, .claude/, .gemini/ to canonical skills/ + agents/
python platform/bootstrap.py --check # verify (CI runs this; exits 1 on drift)

Then follow docs/AGENT-GETTING-STARTED.md for the clone → IDE → AGENTS.md → first issue path. Per-IDE specifics (rules, hooks, MCP wiring) live in docs/multi-tool-config.md.
| IDE | Entry file | Generated MCP config |
|---|---|---|
| Cursor | AGENTS.md | `.cursor/mcp.json` |
| Claude Code | CLAUDE.md + AGENTS.md | `.mcp.json` |
| Gemini CLI | GEMINI.md + AGENTS.md | `.gemini/settings.json` |
Picking a first issue. Browse state/pending work with no kind/umbrella:
gh issue list --label "state/pending" --label "type/kaizen" --search "no:assignee"
gh issue list --label "state/pending" --label "kind/docs"

The full label vocabulary and what each kind/* routes to is in docs/ADR/STATE-MACHINE.md §2.7.
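
The commands above filter by label but do not themselves exclude `kind/umbrella` issues. One way to do that is to post-filter the JSON output of `gh issue list --json number,title,labels`. The sketch below assumes that output shape (each label is an object with a `name` field); the sample issues are made up for illustration.

```python
import json


def pickable_issues(gh_json, excluded_label="kind/umbrella"):
    """Drop issues carrying `excluded_label` from `gh issue list --json ...` output."""
    issues = json.loads(gh_json)
    return [
        issue
        for issue in issues
        if excluded_label not in {label["name"] for label in issue.get("labels", [])}
    ]


# Hypothetical sample data in the shape gh emits for --json number,title,labels.
sample = json.dumps([
    {"number": 1, "title": "Fix docs typo",
     "labels": [{"name": "state/pending"}, {"name": "kind/docs"}]},
    {"number": 2, "title": "Tracking umbrella",
     "labels": [{"name": "state/pending"}, {"name": "kind/umbrella"}]},
])
print([issue["number"] for issue in pickable_issues(sample)])  # prints [1]
```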
| Path | What lives here |
|---|---|
| `AGENTS.md`, `CLAUDE.md`, `GEMINI.md` | Operating contract (shared) + tool-specific entry stubs |
| `agents/` | Canonical persona system prompts (terraform-engineer, checkov-expert, security-hardener, cicd-operator, etc.) |
| `skills/` | Canonical agent skills, including catalyst-agent-routing and catalyst-issue-context |
| `platform/` | Bootstrap (bootstrap.py), MCP registry (mcp.servers.json), persona dispatch metadata; see platform/README.md |
| `services/catalyst-api/` | FastAPI control plane (Tier 1/2 golden paths, product catalogue, inventory, deployment history) |
| `clients/catalyst-cli/` | Knack-based CLI for the API |
| `infrastructure/` | Terraform root + modules (network, IAM, runtime, KMS, ECR, ECS, Lambda, Aurora) |
| `.github/workflows/` | CI/CD pipelines; see .github/workflows/README.md |
| `docs/ADR/` | Accepted architectural decisions (start at ADR-001) |
| `docs/onboarding/` | AWS operator runbooks (Platform / Organization / Application) per ADR-012 |
| `diagrams/` | Architecture diagrams (Mermaid + Draw.io); index at diagrams/README.md |
| `registry/` | catalyst-domains.yaml and other machine-queryable indexes for agent routing |
AGENTS.md is the operating contract, shared across Cursor, Claude Code, and Gemini CLI. The contract is six in-session gates (ADR-011) layered on top of the unified agent config (ADR-024):
1. Model decision logging: before implementation and at every material architecture, security, deploy, or scope decision, post an `### Agent Decision Log` comment on the tracking issue (model selected, fallback known, rationale, next).
2. RLM trigger at ~50k chars: long artifacts (diffs, terraform plans, CloudWatch exports, codebase reads spanning >10 files) route through the recursive language model scaffold (ADR-004) before inline reading.
3. AWS OIDC only: no long-lived AWS keys anywhere; plan/apply/deploy/drift use distinct GitHub OIDC roles (ADR-006), enforced by `validate_workflows.py`.
4. Container security gate: Trivy (or Scout) HIGH/CRITICAL gate, SPDX 2.3 + CycloneDX 1.5 SBOMs, SARIF upload to GitHub code scanning on every PR touching a Dockerfile or image.
5. Pre-PR peer review (mandatory): before `gh pr create`, spawn a peer-review sub-agent that checks correctness vs Gherkin AC, ADR compliance, security, coverage (`--cov-fail-under=85` minimum), and naming. Every suggestion is recorded on the tracking issue as ACCEPT or REJECT with a one-line rationale.
6. GitHub MCP secret scanning: `secret_protection` toolset invoked on any PR adding files or credentials before posting the `pr_merged=true` done gate.
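
As an illustration of the RLM trigger gate, the check an agent might run before reading an artifact inline could look like the following sketch. The function name and the exact cutoff semantics (`>=` on characters) are assumptions; the README only states "~50k chars" and "spanning >10 files".

```python
RLM_CHAR_THRESHOLD = 50_000  # "~50k chars"; treating it as a hard cutoff is an assumption
RLM_FILE_THRESHOLD = 10      # "codebase reads spanning >10 files"


def should_route_through_rlm(artifact_text, files_spanned=1):
    """Return True when an artifact is large enough to go through the RLM scaffold."""
    too_long = len(artifact_text) >= RLM_CHAR_THRESHOLD
    too_wide = files_spanned > RLM_FILE_THRESHOLD
    return too_long or too_wide
```

For example, a 60,000-character `terraform plan` output or a read spanning 11 files would both return `True`, while a short diff touching two files would not.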
Beyond the gates, day-1 agent work also follows the PR-open contract: every PR MUST contain Closes #<N> AND be added to the Catalyst Progress project (#3) via gh project item-add 3 --owner Cloud-Byte-Consulting --url <pr-url>. Agents refuse to call gh pr create if either check would fail — see skills/pr-open-contract/SKILL.md.
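
The first half of the PR-open contract is easy to check mechanically. The sketch below is not the logic in `skills/pr-open-contract/SKILL.md`; it is a minimal, assumed version that looks for the literal `Closes #<N>` form described above (case-sensitive, which is stricter than GitHub's own keyword matching).

```python
import re

# Matches the literal "Closes #<N>" form; stricter than GitHub's keyword parsing.
CLOSES_PATTERN = re.compile(r"\bCloses #(\d+)\b")


def closing_issue_numbers(pr_body):
    """Return every issue number referenced as 'Closes #<N>' in a PR body."""
    return [int(number) for number in CLOSES_PATTERN.findall(pr_body)]


def satisfies_closes_check(pr_body):
    """True when the PR body names at least one issue it closes."""
    return bool(closing_issue_numbers(pr_body))
```

A body such as `"Adds onboarding docs.\n\nCloses #331"` passes, while `"Related to #331"` does not.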
Issue body shape, comment headings, and dependency handling are spelled out in docs/issue-execution-gherkin-workflow-2026-05-13.md. Persona routing per topic is in skills/catalyst-agent-routing/SKILL.md. Evidence the contract is followed in practice: docs/ai-workflow-narrative.md.
What still needs humans. PR merges (branch protection on release), any issue in state/blocked-on-human, the one-time AWS account bootstrap, and any agent action the intent-judge rule flags as clarify or deny.
gh issue view 331 --json title,body,labels
python3 scripts/catalyst-issue-context.py --issue 331 # Gherkin AC + handoff summary
# Read skills/catalyst-agent-routing/SKILL.md, pick agents/<persona>.md, load it

Free-form work without an existing issue: python3 scripts/catalyst-ask.py "<task>" to find the right domain in registry/catalyst-domains.yaml, then open an issue with the correct kind/* and construct-address labels.
- Implement against the Gherkin AC on the tracking issue.
- Run the pre-PR peer review sub-agent (gate 5).
- `gh pr create` with `Closes #<N>` in the body.
- `gh project item-add 3 --owner Cloud-Byte-Consulting --url <pr-url>` (requires `read:project` scope on the gh token; run `gh auth refresh -s read:project` if missing).
- `gh pr view --json projectItems` to confirm the project assignment took.
# Terraform sanity (no AWS calls)
terraform -chdir=infrastructure init -backend=false
terraform -chdir=infrastructure fmt -recursive -check
terraform -chdir=infrastructure validate
terraform -chdir=infrastructure test
# API service (coverage gate 93.83%)
pip install -r services/catalyst-api/requirements-dev.txt
pytest services/catalyst-api/tests --cov=services/catalyst-api/catalyst --cov-branch --cov-fail-under=93.83
# CLI client (coverage gate 80%; live tests opt-in via -m live)
pip install -r clients/catalyst-cli/requirements-dev.txt
pytest clients/catalyst-cli/tests -m 'not live' --cov=catalyst_cli --cov-branch --cov-fail-under=80

API smoke against a deployed Catalyst ALB lives in docs/smoke-tests.md. Full CI inventory is in .github/workflows/README.md.
| Doc | What it covers |
|---|---|
| ADR-001 — Issues as state machine | Why GitHub Issues are the durable state machine and comments are the audit trail |
| ADR-002 — Construct hierarchy | The five-level Tenant → Environment → LandingZone → Project → Application model and its label/tag mapping |
| ADR-005 — AWS agentic platform engineering | OIDC-only AWS, Trivy/SBOM/SARIF gates, knack CLI + pytest/moto test conventions |
| ADR-011 — Catalyst agentic workflow | The six-gate operating contract (AGENTS.md is the implementation) |
| ADR-012 — Onboarding experience | The three AWS-operator onboarding tracks under docs/onboarding/ |
| ADR-022 — Multi-tool skill layout | Canonical skills/ and agents/ paths shared across tools |
| ADR-024 — Unified agent config + issue indexing | Bootstrap-generated tool trees and the kind/* label index |
| `docs/ADR/STATE-MACHINE.md` | Authoritative label vocabulary, legal transitions, kind/* routing table |
| `docs/issue-execution-gherkin-workflow-2026-05-13.md` | Issue body shape, Gherkin AC, comment headings |
| `docs/multi-tool-config.md` | Per-IDE wiring (Cursor rules, Claude adapters, Gemini settings, MCP) |
| `docs/ai-workflow-narrative.md` | Evidence trail for the contract: real PRs, decision logs, peer-review threads |
The full index is docs/ADR/ (24 ADRs accepted as of release). DECISIONS.md is a pointer back to it.
For AWS platform operators (not agent contributors), pick the audience-sliced track that matches your role (ADR-012):
| Track | Audience | Runbook | Outcome |
|---|---|---|---|
| A: Platform | Cloud / platform engineering | `docs/onboarding/platform.md` | Fresh AWS account ready for GitOps; Catalyst API reachable |
| B: Organization | Team / lab leaders (Owners) | `docs/onboarding/organization.md` | Tenant hierarchy registered (OUs, landing zones, environments) |
| C: Application | Application teams (Administrators) | `docs/onboarding/application.md` | App onboarded onto an existing tenant via `POST /services/onboard` |
Index and cross-track sequencing: docs/onboarding/README.md. Day-0 canonical sequence: docs/operator-bootstrap.md. The Catalyst GitHub Actions environment variables and secrets are in .github/workflows/README.md.
Seven diagrams, indexed at diagrams/README.md.
GitHub-inline (Mermaid):
- `diagrams/control-plane.md`: request path (caller → ALB → Lambda → DynamoDB/SSM/STS), provisioning tiers, phase ordering enforced by `terraform.yml` + `service-cd.yml`
- `diagrams/ha.md`: multi-AZ ECS topology with Aurora Serverless v2, NAT redundancy, failure-mode coverage
- `diagrams/gitops.md`: PR → plan → review → apply → deploy → andon flow with OIDC role separation
Draw.io (open in IDE plugin or app.diagrams.net):
- `diagrams/network-layer.drawio`: VPC, subnets, NAT, IGW, VPC endpoints, security groups
- `diagrams/application-layer.drawio`: request path, persistence, SigV4 RBAC, ECS Fargate runtime
- `diagrams/agentic-workflow.drawio`: six gates, Issues state machine, peer-review fork, OIDC per phase
- `diagrams/cicd-pipeline.drawio`: all nine GitHub Actions workflows, gates, OIDC roles, outcomes
Authoring conventions: skills/draw-aws-diagrams/SKILL.md.
Catalyst splits provisioning into two tiers; the GitHub Actions pipeline is the single source of truth for everything outside the one-time bootstrap. Full table in .github/workflows/README.md.
| Tier | Owns | How |
|---|---|---|
| Bootstrap (one-time) | IAM bootstrap-admin role, GitHub OIDC provider, catalyst-github-{plan,apply,deploy} roles, RBAC IAM groups, Terraform state S3 bucket, Terraform DynamoDB lock, Catalyst API data S3 bucket | `scripts/bootstrap-aws-account.sh` |
| Pipeline (ongoing) | VPC + subnets + NAT + endpoints, security groups, ECR, ECS cluster + ALB + target group, Lambda runtime, DynamoDB platform-state table, KMS CMKs (ADR-016), Aurora Serverless v2 (ADR-019), optional Network Firewall | terraform.yml (PR plan + release apply) → tf-drift.yml (daily) |
| Service deploy | Catalyst API container image build/push + runtime update (Lambda or ECS via RUNTIME knob, ADR-009) | `service-cd.yml` |
Self-deploy on top of Catalyst itself: catalyst products deploy catalyst-api --construct <addr> (docs/products.md, #103).
For recovery after a teardown, or to trigger a teardown on demand, see docs/teardown.md. The demo environment auto-destructs nightly at 22:00 UTC, Mon–Fri, via teardown-scheduled.yml.
Resolution order per setting: explicit env var → SSM parameter (CATALYST_*_PARAMETER) → Secrets Manager secret (CATALYST_*_SECRET) → built-in default.
| Setting | Env var | Default | Notes |
|---|---|---|---|
| Repository backend | `CATALYST_REPOSITORY` | `memory` | Set to `dynamodb` in production. |
| DynamoDB table name | `CATALYST_DYNAMODB_TABLE` or `CATALYST_DYNAMODB_TABLE_PARAMETER` | resolved from SSM in prod | Module: `infrastructure/modules/dynamodb`. |
| Auth mode | `CATALYST_AUTH_MODE` | `headers` | `sigv4` enforces presigned-STS verification + `iam:ListGroupsForUser`. |
| Group cache TTL | `CATALYST_GROUP_CACHE_TTL` | `300` (seconds) | Per-process IAM group cache. |
| Runtime secrets | `CATALYST_RUNTIME_SECRET_<KEY>_SECRET` | unset | Names a Secrets Manager secret for `<KEY>`. |
| API ingress allowlist | `CATALYST_API_INGRESS_ALLOWLIST` | `["73.239.59.22"]` | JSON array; plain IPv4 entries normalized to `/32`, passed as `TF_VAR_alb_ingress_allowlist`. |
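
The resolution order stated before the table (explicit env var → SSM parameter → Secrets Manager secret → built-in default) can be sketched as below. This is not the code in `services/catalyst-api/`; `resolve_setting` and its injected `fetch_ssm` / `fetch_secret` callables are hypothetical names, used so the example runs without AWS access, and the parameter path in the usage note is made up.

```python
import os


def resolve_setting(name, default=None, *, env=None, fetch_ssm=None, fetch_secret=None):
    """Resolve CATALYST_<name>: env var, then SSM parameter, then secret, then default."""
    env = os.environ if env is None else env
    key = f"CATALYST_{name}"

    # 1. An explicit environment variable always wins.
    if key in env:
        return env[key]

    # 2. CATALYST_<name>_PARAMETER names an SSM parameter to read.
    parameter_name = env.get(f"{key}_PARAMETER")
    if parameter_name and fetch_ssm is not None:
        return fetch_ssm(parameter_name)

    # 3. CATALYST_<name>_SECRET names a Secrets Manager secret to read.
    secret_id = env.get(f"{key}_SECRET")
    if secret_id and fetch_secret is not None:
        return fetch_secret(secret_id)

    # 4. Fall back to the built-in default.
    return default
```

With `env={"CATALYST_DYNAMODB_TABLE_PARAMETER": "/catalyst/table"}` and a fetcher that reads SSM, `resolve_setting("DYNAMODB_TABLE", ...)` returns the parameter's value; with an empty environment, `resolve_setting("REPOSITORY", "memory", env={})` falls through to `"memory"`. In a deployed service the fetchers would presumably wrap the AWS SDK calls for SSM and Secrets Manager.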
Scoped RBAC groups use catalyst-{tenant}--{project}--{role}. Migration notes: docs/migrations/2026-05-15-rbac-scoped-group-delimiter.md.
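
A small sketch of building and parsing names in the `catalyst-{tenant}--{project}--{role}` shape follows. The helper names, the validation rule (segments must be non-empty and must not contain the `--` delimiter), and the example tenant are assumptions; the migration note above is the authority on the actual delimiter rules.

```python
PREFIX = "catalyst-"
DELIMITER = "--"


def scoped_group_name(tenant, project, role):
    """Build a scoped RBAC group name; rejects segments that would break parsing."""
    for part in (tenant, project, role):
        if not part or DELIMITER in part:
            raise ValueError(f"invalid group segment: {part!r}")
    return f"{PREFIX}{tenant}{DELIMITER}{project}{DELIMITER}{role}"


def parse_scoped_group(name):
    """Return (tenant, project, role), or None if the name is not a scoped group."""
    if not name.startswith(PREFIX):
        return None
    parts = name[len(PREFIX):].split(DELIMITER)
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)
```

Using a double-hyphen delimiter lets individual segments contain single hyphens (for example a hypothetical tenant `acme-labs`) without making the name ambiguous.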
- Tracking work / status: Catalyst issues and the Catalyst Progress / Andon board (Project #3). Viewing project items via `gh project` requires the `read:project` scope on your gh token (`gh auth refresh -s read:project` once).
- Issue context for an agent session: `skills/catalyst-issue-context/SKILL.md`. Running `python3 scripts/catalyst-issue-context.py --issue <N>` exports labels, Gherkin AC, and handoff comments.
- Agent routing questions ("which persona owns this?"): `skills/catalyst-agent-routing/SKILL.md` holds the topic-to-persona table.
- Specific personas: see `agents/<name>.md` (e.g. `agents/terraform-engineer.md`, `agents/checkov-expert.md`, `agents/security-hardener.md`, `agents/cicd-operator.md`, `agents/platform-engineering-architect.md`).
- Stuck on a gate / refusal: when an agent posts a `state/blocked-on-human` comment, the comment is the handoff. Read it, decide, then re-label `state/agent-working` (rules in `docs/ADR/STATE-MACHINE.md` §3).
- File a kaizen issue (improvement to docs, gates, contracts, or tooling): open with labels `type/kaizen` · `kind/<domain>` · `tenant/catalyst` · `env/shared` · `lz/shared` · `project/platform` · `app/kaizen` · `state/pending`. The `kind/docs` default routes to `platform-engineering-architect`; other `kind/*` values route per the STATE-MACHINE.md §2.7 table.
- Maintainers: no `CODEOWNERS` or `CONTRIBUTING.md` on `release` yet; opening a kaizen issue is the canonical way to flag both content and policy gaps.
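
The kaizen label set above follows the five-level construct hierarchy from ADR-002 (Tenant → Environment → LandingZone → Project → Application). A minimal sketch of composing those labels is shown here; the `ConstructAddress` class and `kaizen_labels` helper are hypothetical names, and only the label prefixes and default values are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstructAddress:
    """One address in the Tenant/Environment/LandingZone/Project/Application hierarchy."""
    tenant: str
    env: str
    lz: str
    project: str
    app: str

    def labels(self):
        """Render the address as the five construct labels, outermost level first."""
        return [
            f"tenant/{self.tenant}",
            f"env/{self.env}",
            f"lz/{self.lz}",
            f"project/{self.project}",
            f"app/{self.app}",
        ]


# Defaults mirror the kaizen label set quoted in this README.
KAIZEN_ADDRESS = ConstructAddress("catalyst", "shared", "shared", "platform", "kaizen")


def kaizen_labels(kind, address=KAIZEN_ADDRESS):
    """Full label list for a new kaizen issue of the given kind/<domain>."""
    return ["type/kaizen", f"kind/{kind}", *address.labels(), "state/pending"]
```

The resulting list can be passed to `gh issue create` as repeated `--label` flags.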
- Run `python platform/bootstrap.py` after clone or whenever `skills/` / `agents/` change (ADR-024).
- Read `AGENTS.md`: it is the operating contract for every tool (Cursor, Claude Code, Gemini CLI).
- Work an issue per `docs/issue-execution-gherkin-workflow-2026-05-13.md`: Context / Scope / Gherkin AC + structured handoff comments.
- Run the pre-PR peer-review sub-agent (gate 5) before `gh pr create`.
- Honour the PR-open contract: `Closes #<N>` in the body and `gh project item-add 3 --owner Cloud-Byte-Consulting --url <pr-url>` immediately after PR creation. See `skills/pr-open-contract/SKILL.md`.
BSD 3-Clause. Security policy: SECURITY.md.