docs: architecture diagram #1142
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| --- | ||
| title: "System Overview" | ||
| description: "Bird's-eye view of the ctrlplane orchestration flow" | ||
| --- | ||
|
|
||
| This is the developer-facing entry point to the ctrlplane codebase. It shows | ||
| how the apps in this monorepo fit together when a deployment version moves | ||
| from creation to execution. | ||
|
|
||
| ```mermaid | ||
| flowchart TD | ||
| CLI["CLI / curl"]:::ext | ||
| Users["Users<br/>(browser)"]:::ext | ||
| Web["apps/web<br/><i>React + tRPC client</i>"] | ||
| API["apps/api<br/><i>Express + tRPC + webhooks</i>"] | ||
| DB[("Postgres<br/><b>reconcile_work_scope</b>")] | ||
| Engine["apps/workspace-engine<br/><i>Go controllers</i>"] | ||
| Agents["Job agents<br/>GitHub Actions · ArgoCD ·<br/>Terraform Cloud · custom"]:::ext | ||
|
|
||
| Users --> Web | ||
| Web -->|tRPC| API | ||
| CLI -->|"① register version"| API | ||
| API -->|"② enqueue work"| DB | ||
| DB <-->|"③ lease / requeue"| Engine | ||
| Engine -->|"④ dispatch job"| Agents | ||
| Agents -->|"⑤ result"| API | ||
| API -->|"⑥ enqueue follow-up"| DB | ||
|
|
||
| classDef ext fill:#444,stroke:#888,color:#ddd | ||
| ``` | ||
|
|
||
| ## The orchestration loop | ||
|
|
||
| CLI or `curl` calls register a deployment version against `apps/api` (①). The | ||
| api persists the version and writes a work item into the `reconcile_work_scope` | ||
| table in Postgres (②) — **this is the only thing the api does to "start" | ||
| orchestration; it does not call the engine.** | ||
|
|
||
| `apps/workspace-engine` controllers continuously lease items from that queue | ||
| (③), and each controller's output enqueues work for the next controller | ||
| (planning → policy → dispatch). When dispatch fires, the engine reaches out to | ||
| a job agent over HTTPS (④). | ||
|
|
||
| Results come back through webhooks to the api (⑤), which writes the job update | ||
| plus any follow-up work into the queue (⑥). The engine picks it up again. The | ||
| loop ③↔⑥ is the whole orchestration model — every release phase is a trip | ||
| through the queue. |
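
The insert-then-lease handoff described in the orchestration loop can be sketched in a few lines of Go. This is a minimal in-memory illustration, not the engine's code: `WorkItem`, `Queue`, `Lease`, `Drain`, and the kind names `planning`, `policy`, and `dispatch` are all hypothetical stand-ins, and a real lease against `reconcile_work_scope` would presumably mark a row as held rather than delete it.

```go
package main

import "fmt"

// WorkItem is a hypothetical stand-in for a row in reconcile_work_scope.
type WorkItem struct {
	Kind  string
	Scope string
}

// Queue is an in-memory stand-in for the Postgres-backed work queue.
type Queue struct {
	items []WorkItem
}

// Enqueue appends a work item; this is the only way phases hand off work.
func (q *Queue) Enqueue(item WorkItem) {
	q.items = append(q.items, item)
}

// Lease removes and returns the oldest item of the given kind, if any.
// (Simplification: a real lease would hold the row, not remove it.)
func (q *Queue) Lease(kind string) (WorkItem, bool) {
	for i, it := range q.items {
		if it.Kind == kind {
			q.items = append(q.items[:i], q.items[i+1:]...)
			return it, true
		}
	}
	return WorkItem{}, false
}

// nextKind maps each illustrative phase to the kind it enqueues next.
var nextKind = map[string]string{
	"planning": "policy",
	"policy":   "dispatch",
}

// Drain lets each controller lease by its own kind until the queue is idle,
// and returns the order in which items were processed.
func Drain(q *Queue) []string {
	order := []string{"planning", "policy", "dispatch"}
	var trace []string
	for progressed := true; progressed; {
		progressed = false
		for _, kind := range order {
			item, ok := q.Lease(kind)
			if !ok {
				continue
			}
			progressed = true
			trace = append(trace, kind+":"+item.Scope)
			// Handoff happens only through the queue, never by a direct call.
			if next, ok := nextKind[kind]; ok {
				q.Enqueue(WorkItem{Kind: next, Scope: item.Scope})
			}
		}
	}
	return trace
}

func main() {
	q := &Queue{}
	// Step ②: the api's only job is to write the first work item.
	q.Enqueue(WorkItem{Kind: "planning", Scope: "release-target-1"})
	for _, step := range Drain(q) {
		fmt.Println(step)
	}
}
```

Because each phase only reads from and writes to the queue, any phase in this sketch could be moved to a separate process without changing the others, which is the property the docs attribute to the real engine.
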
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,143 @@ | ||
| --- | ||
| title: "Workspace Engine" | ||
| description: "How apps/workspace-engine orchestrates the release lifecycle" | ||
| --- | ||
|
|
||
| The workspace-engine is the Go service that drives every release forward. It | ||
| polls a Postgres work queue (`reconcile_work_scope`), leases items by `kind`, | ||
| and runs the matching controller. Each controller's output is enqueueing more | ||
| work, so a single release moves through phases by chaining items through the | ||
| queue. | ||
|
|
||
| ## The release-flow chain | ||
|
|
||
| When a release-target needs to be evaluated (a new version was created, a | ||
| policy changed, a job finished, a resource started matching), a | ||
| `desired-release` work item lands in the queue. From there: | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| autonumber | ||
| participant Q as reconcile_work_scope | ||
| participant DR as desiredrelease | ||
| participant JE as jobeligibility | ||
| participant JD as jobdispatch | ||
| participant JV as jobverificationmetric | ||
| participant Ext as Job agent | ||
|
|
||
| Note over Q: kind = desired-release<br/>scope = release-target | ||
| Q->>DR: lease | ||
| Note over DR: evaluate policies, pick the<br/>deployable version, resolve<br/>variables, persist release | ||
| DR-->>Q: enqueue kind=job-eligibility | ||
| Q->>JE: lease | ||
| Note over JE: can this release run now?<br/>(concurrency, retry rules) | ||
| JE-->>Q: enqueue kind=job-dispatch | ||
| Q->>JD: lease | ||
| Note over JD: create job, route to the<br/>right job agent | ||
| JD->>Ext: dispatch | ||
| Ext-->>Q: result via api, enqueue kind=job-verification-metric | ||
| Q->>JV: lease | ||
| Note over JV: poll metrics, on completion<br/>re-enqueue desired-release | ||
| JV-->>Q: enqueue kind=desired-release (loop) | ||
| ``` | ||
|
|
||
| Four controllers, one queue between them. **No controller calls another | ||
| directly** — handoff is always via insert-then-lease. That means each phase is | ||
| independently retriable, leasable, and observable, and the engine can run as | ||
| multiple instances safely. | ||
|
|
||
| ## How every controller works | ||
|
|
||
| Every controller is a `reconcile.Processor` registered for one `kind`. The | ||
| pattern is identical across all of them: lease an event, recompute the desired | ||
| state from current Postgres state, persist the result, enqueue follow-up. | ||
|
|
||
| ```mermaid | ||
| flowchart LR | ||
| DB[(reconcile_work_scope)] | ||
| C[Controller<br/>handles one kind] | ||
| DB -->|lease event by kind| C | ||
| C -->|persist results<br/>+ enqueue next kind| DB | ||
| ``` | ||
|
|
||
| Two things make this a reconciler rather than a job runner. First, | ||
| **controllers are stateless** — every invocation re-reads input from Postgres | ||
| rather than carrying state forward in memory. If the world changes between | ||
| events (a policy is disabled, an approval lands, a new version appears), the | ||
| next event picks up the change automatically. Second, **the loop closes back | ||
| to the start** — when a job finishes, `jobverificationmetric` enqueues another | ||
| `desired-release` event and `desiredrelease` recomputes from scratch. | ||
| Idempotent recomputation is the orchestration model. | ||
|
|
||
| ## Inside `desiredrelease` | ||
|
|
||
| `desiredrelease` is the only controller in the chain that does meaningful | ||
| internal work — the other three are mostly routing or checking. Here is what | ||
| happens on a single lease: | ||
|
|
||
| ```mermaid | ||
| flowchart TD | ||
| In([dequeued: desired-release work item]) | ||
| LP[load scope and policies] | ||
| Iter[iterate candidate versions<br/>newest-first] | ||
| Eval[evaluate policy rules<br/>inline via policyeval library] | ||
| Decide{any version passes?} | ||
| NoRel[persist 'no release'] | ||
| Resolve[resolve variables] | ||
| Persist[persist release record] | ||
| Out[enqueue job-eligibility] | ||
|
|
||
| In --> LP --> Iter --> Eval --> Decide | ||
| Decide -->|no| NoRel | ||
| Decide -->|yes| Resolve --> Persist --> Out | ||
| ``` | ||
|
|
||
| Two things worth knowing: | ||
|
|
||
| 1. **Policy evaluation is inline, not a separate controller.** A `policyeval` | ||
| directory exists at `svc/controllers/policyeval/` but that's a different | ||
| controller that writes per-version rule evaluations for the UI. The gating | ||
| logic that decides whether a version can deploy lives in the `policyeval` | ||
| *library subpackage* at `svc/controllers/desiredrelease/policyeval/` and is | ||
| called as a function from inside `desiredrelease`. | ||
| 2. **Versions are evaluated newest-first as a stream.** The controller doesn't | ||
| load all candidate versions then filter — it iterates them and stops at the | ||
| first one that passes all policy rules. That's what makes "skip blocked | ||
| versions but deploy the newest passing one" cheap. | ||
|
|
||
| ## Other release-flow controllers | ||
|
|
||
| **`jobeligibility`** — given a release record, decides whether a job can run | ||
| *right now*. Runs two evaluators: `releasetargetconcurrency` (under the | ||
| configured concurrency cap?) and `retry` (under the retry budget?). If both | ||
| pass, enqueue `job-dispatch`. If not, requeue with `notBefore`. | ||
|
|
||
| **`jobdispatch`** — given a job, picks the right job-agent adapter (GitHub | ||
| Actions, ArgoCD, Terraform Cloud, Argo Workflows, or the test runner) and | ||
| sends the job over HTTPS. The agent's `externalId` is recorded so results can | ||
| be correlated back later. | ||
|
|
||
| **`jobverificationmetric`** — given a finished job, polls verification | ||
| providers (Datadog, HTTP probes, Terraform Cloud run status, etc.) until they | ||
| return pass/fail. On completion, calls `EnqueueDesiredRelease` to close the | ||
| loop. | ||
|
|
||
| ## Controllers outside the release-flow chain | ||
|
|
||
| The `svc/controllers/` directory contains several other controllers that exist | ||
| for UI surface or precomputed state, not for moving a release through phases: | ||
|
|
||
| - `policyeval` (top-level) — computes per-version rule evaluations so the UI | ||
| can show "why isn't this version deploying yet." | ||
| - `deploymentplan` / `deploymentplanresult` — power plan previews and dry-run | ||
| views. | ||
| - `deploymentresourceselectoreval` / `environmentresourceselectoreval` — | ||
| precompute which resources currently match a deployment or environment | ||
| selector. | ||
| - `relationshipeval` — evaluates resource relationship rules into the resource | ||
| graph. | ||
| - `forcedeploy` — handles user-triggered manual deploys (a separate path from | ||
| the policy-gated chain). | ||
|
|
||
| If you're trying to understand "what happens when I push a version," you can | ||
| safely ignore these and focus on the four chain controllers. | ||
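
The newest-first evaluation described under "Inside `desiredrelease`" can be illustrated with a short Go sketch. It is an assumption-laden illustration only: `Version`, `Rule`, and `PickDeployable` are hypothetical names, the single "approved" rule is invented for the example, and the real `policyeval` library's types and signatures are not shown in this PR.

```go
package main

import "fmt"

// Version is a hypothetical candidate deployment version.
type Version struct {
	Tag      string
	Approved bool
}

// Rule is a hypothetical policy rule: it reports whether a version may deploy.
type Rule func(v Version) bool

// PickDeployable walks candidates that are assumed to be ordered newest-first
// and returns the first one that passes every rule, together with the number
// of versions it had to evaluate. It stops as soon as one version passes.
func PickDeployable(newestFirst []Version, rules []Rule) (Version, int, bool) {
	evaluated := 0
	for _, v := range newestFirst {
		evaluated++
		passes := true
		for _, rule := range rules {
			if !rule(v) {
				passes = false
				break
			}
		}
		if passes {
			return v, evaluated, true
		}
	}
	// No candidate passed: the caller would persist "no release".
	return Version{}, evaluated, false
}

func main() {
	candidates := []Version{
		{Tag: "v3", Approved: false}, // newest, blocked by policy
		{Tag: "v2", Approved: true},
		{Tag: "v1", Approved: true},
	}
	rules := []Rule{func(v Version) bool { return v.Approved }}

	v, n, ok := PickDeployable(candidates, rules)
	fmt.Println(v.Tag, n, ok) // prints "v2 2 true"
}
```

In this sketch `v1` is never evaluated, which is the cost saving the docs describe: blocked versions are skipped and iteration ends at the newest passing one.
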
Fix spelling at Line 8 (enqueueing → enqueuing). This is a user-facing docs typo and should be corrected for consistency.
🧰 Tools
🪛 LanguageTool
[grammar] ~8-~8: Ensure spelling is correct
Context: ...controller. Each controller's output is enqueueing more work, so a single release moves th...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)