JacobPEvans · May 24, 2026 · May 22, 2026
diff --git a/.claude/skills/mintlify-docs-update/SKILL.md b/.claude/skills/mintlify-docs-update/SKILL.md
@@ -16,9 +16,9 @@ Discover public repos under `JacobPEvans` and `dryvist`, diff against pages on t
 
 ## When NOT to use
 
-- Authoring deep technical content for a single page. Use the future `mintlify-page-author` skill (issue tracked).
-- Visual polish across the site. Use the future `mintlify-visual-audit` skill (issue tracked).
-- Rewriting `docs.json` from scratch. Use the future `mintlify-nav-sync` skill (issue tracked).
+- Authoring deep technical content for a single page. Edit the page directly — Claude is more capable than a fixed templated skill here.
+- Visual polish across the site. Edit the offending pages directly.
+- Rewriting `docs.json` from scratch. Edit it directly.
 
 ## Workflow
 
@@ -33,7 +33,23 @@ gh repo list dryvist --limit 200 --json name,description,visibility,isArchived,i
 
 Filter to: `visibility == PUBLIC` AND `isFork == false`. Skip the meta `docs` and `JacobPEvans` profile repos.
 
-### Step 2 — Categorize each repo
+### Step 2 — Coverage blacklist
+
+These repos must NOT get a page on the public docs site. Filter the enumerated list against this blacklist before categorization:
+
+| Repo | Reason |
+| --- | --- |
+| `terraform-aws-bedrock` | Test/playground project; not part of the homelab story. |
+| `terraform-aws-static-website` | Being replaced by this docs site itself. |
+| `VisiCore_App_for_AI_Observability` | Work-related — kept out of personal docs. |
+| `VisiCore_TA_AI_Observability` | Work-related — kept out of personal docs. |
+| (any other repo under the `visicore` org) | Work-related — kept out of personal docs. |
+
+Hard rule: when in doubt, do not add the page. Ask the author first.
+
+If a blacklisted repo is found, log it under the `blacklisted` reason in the final summary report — do not attempt to scaffold it.
+
+### Step 3 — Categorize each repo
 
 Map repo name and topics to a sidebar group:
 
@@ -49,11 +65,11 @@ Map repo name and topics to a sidebar group:
 
 Ties → prefer the more specific match. When uncertain, ask before scaffolding.
 
-### Step 3 — Diff against existing pages
+### Step 4 — Diff against existing pages
 
 For each repo, the expected path is `<group-prefix><repo-name>.mdx`. If the file exists, skip. If it doesn't, queue for scaffolding.
 
-### Step 4 — Scaffold
+### Step 5 — Scaffold
 
 For each queued repo, copy `template-repo-page.mdx` and replace the marked placeholders:
 
@@ -70,11 +86,11 @@ Every token in `template-repo-page.mdx` must be replaced. The table below lists
 | `REPO_LAST_ACTIVE` | relative time from `pushedAt` (e.g., `"this week"`, `"3 days ago"`) |
 | `REPO_URL` | `url` field |
 
-**Derived from Step 2 categorization:**
+**Derived from Step 3 categorization:**
 
 | Placeholder | How filled |
 | --- | --- |
-| `SIDEBAR_GROUP_NAME` | the matched group name from Step 2 (e.g., `Infrastructure`, `Nix Ecosystem`, `AI Development`, `Observability`, `Tools`) |
+| `SIDEBAR_GROUP_NAME` | the matched group name from Step 3 (e.g., `Infrastructure`, `Nix Ecosystem`, `AI Development`, `Observability`, `Tools`) |
 
 **Author-filled (skill emits empty markers; author writes the prose):**
 
@@ -95,11 +111,11 @@ Every token in `template-repo-page.mdx` must be replaced. The table below lists
 
 Replacements happen via `Edit` tool with `replace_all: true`. Never use `sed` — this is exact-string replacement.
 
-### Step 5 — Update `docs.json`
+### Step 6 — Update `docs.json`
 
 For each new page, insert its path into the appropriate sidebar group's `pages` array, preserving alphabetical order. Use `Edit` on `docs.json`; never regenerate the file.
 
-### Step 6 — Validate
+### Step 7 — Validate
 
 Run, sequentially:
 
@@ -118,7 +134,7 @@ Hub-and-spoke layouts with `flowchart LR` will still stack their spokes
 vertically — replace the hub node with a horizontal subgraph border, or
 split into smaller diagrams.
 
-### Step 7 — Tiered word-count guard
+### Step 8 — Tiered word-count guard
 
 For every scaffolded page:
 
@@ -133,7 +149,7 @@ Over-budget pages get a `<!-- TIER-GUARD: over budget — consider splitting int
 - New MDX files under the right sidebar group
 - Updated `docs.json`
 - A summary report: `Added N pages: <list>`
-- A list of skipped repos with reasons (`already-documented`, `private`, `archived`, `fork`, `uncategorizable`)
+- A list of skipped repos with reasons (`already-documented`, `private`, `archived`, `fork`, `uncategorizable`, `blacklisted`)
 
 ## Flags (planned)
 
@@ -148,4 +164,3 @@ These flags are interpreted manually in the conversation; there is no CLI binary
 
 - See `tools/automation.mdx` for the user-facing description.
 - See `README.md` in this directory for human-readable usage.
-- See open issues with label `skill` for planned improvements.
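Review note: the Step 1 filter and the new Step 2 blacklist in `SKILL.md` could be sketched roughly as below. This is a minimal illustration, not part of the skill. The function name `skip_reason` and the `owner` argument are hypothetical, the repo dictionaries are assumed to carry the `name`, `visibility`, `isFork` and `isArchived` fields requested from `gh repo list --json`, and the order in which reasons are checked is an assumption (the skill text does not rank them).

```python
from typing import Optional

# Names copied from the Step 2 blacklist table.
BLACKLISTED_REPOS = {
    "terraform-aws-bedrock",
    "terraform-aws-static-website",
    "VisiCore_App_for_AI_Observability",
    "VisiCore_TA_AI_Observability",
}
# "Any other repo under the visicore org" is matched on the owner login.
BLACKLISTED_ORGS = {"visicore"}


def skip_reason(repo: dict, owner: str) -> Optional[str]:
    """Return a skip reason from the summary-report vocabulary, or None to keep the repo.

    Assumption: Step 1 checks (private, fork, archived) run before the blacklist.
    """
    if repo.get("visibility") != "PUBLIC":
        return "private"
    if repo.get("isFork"):
        return "fork"
    if repo.get("isArchived"):
        return "archived"
    if repo["name"] in BLACKLISTED_REPOS or owner.lower() in BLACKLISTED_ORGS:
        return "blacklisted"
    return None
```

A repo that returns `None` here would move on to categorization. The meta `docs` and profile repos are not handled in this sketch because the skill names no skip reason for them.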
diff --git a/about/homelab.mdx b/about/homelab.mdx
@@ -106,8 +106,8 @@ LXC by default. Native packages where possible. Docker is the exception — high
 
 ## Provisioning + configuration
 
-[terraform-proxmox](https://github.com/JacobPEvans/terraform-proxmox) builds the VMs and LXCs. [ansible-proxmox](https://github.com/JacobPEvans/ansible-proxmox) configures the host. [ansible-proxmox-apps](https://github.com/JacobPEvans/ansible-proxmox-apps) configures everything on top.
+[terraform-proxmox](https://github.com/JacobPEvans/terraform-proxmox) builds the VMs and LXCs. [ansible-proxmox](https://github.com/JacobPEvans/ansible-proxmox) configures the host. [ansible-proxmox-apps](https://github.com/JacobPEvans/ansible-proxmox-apps) configures everything on top. For the rationale behind the LXC default and the Docker exception, see [LXC vs Docker](/infrastructure/lxc-vs-docker); for the macOS counterpart that runs the monitoring stack on Kubernetes, see [Kubernetes overview](/infrastructure/kubernetes-overview) and [`orbstack-kubernetes`](/infrastructure/orbstack-kubernetes).
 
 ## DR plan
 
-[terraform-aws](https://github.com/JacobPEvans/terraform-aws) defines a cold AWS footprint sized to take a Splunk failover. Cribl Edge routes can be flipped to the AWS HEC endpoint via config change; the AI-observability dashboards keep working because they target the same indexes.
+[terraform-aws](https://github.com/JacobPEvans/terraform-aws) defines a cold AWS footprint sized to take a Splunk failover. Cribl Edge routes can be flipped to the AWS HEC endpoint via config change; the AI-observability dashboards keep working because they target the same indexes. The full cross-stack map of every collector and where it runs lives at [Monitoring agents](/observability/monitoring-agents).
diff --git a/docs.json b/docs.json
@@ -68,7 +68,18 @@
               "infrastructure/overview",
               "infrastructure/terraform-proxmox",
               "infrastructure/ansible-proxmox",
-              "infrastructure/ansible-proxmox-apps"
+              "infrastructure/ansible-proxmox-apps",
+              "infrastructure/orbstack-kubernetes",
+              "infrastructure/kubernetes-overview",
+              "infrastructure/lxc-vs-docker",
+              "infrastructure/secrets-sops",
+              {
+                "group": "CI/CD",
+                "pages": [
+                  "infrastructure/cicd/overview",
+                  "infrastructure/cicd/terraform-runs-on"
+                ]
+              }
             ]
           },
           {
@@ -123,7 +134,9 @@
             "pages": [
               "observability/overview",
               "observability/ansible-splunk",
-              "observability/tf-splunk-aws"
+              "observability/tf-splunk-aws",
+              "observability/cc-edge-the-mac-pack",
+              "observability/monitoring-agents"
             ]
           },
           {
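Review note: with the nested `CI/CD` group, a `pages` array in `docs.json` now mixes plain path strings with group objects. Anything that walks the navigation (for example, the skill's diff against existing pages) would need to descend into those objects. A minimal sketch, assuming only the shape visible in this hunk; the helper name `iter_page_paths` is hypothetical:

```python
from typing import Iterator


def iter_page_paths(pages: list) -> Iterator[str]:
    """Yield every page path, descending into nested group objects."""
    for entry in pages:
        if isinstance(entry, str):
            # A plain entry is already a page path.
            yield entry
        elif isinstance(entry, dict):
            # A group object carries its own "pages" array.
            yield from iter_page_paths(entry.get("pages", []))


# Abbreviated copy of the Infrastructure group from this diff.
infrastructure_pages = [
    "infrastructure/overview",
    "infrastructure/ansible-proxmox-apps",
    "infrastructure/secrets-sops",
    {
        "group": "CI/CD",
        "pages": [
            "infrastructure/cicd/overview",
            "infrastructure/cicd/terraform-runs-on",
        ],
    },
]
paths = list(iter_page_paths(infrastructure_pages))
```

Here `paths` holds the five page paths in document order, with the two CI/CD pages last.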

diff --git a/infrastructure/cicd/overview.mdx b/infrastructure/cicd/overview.mdx
@@ -0,0 +1,60 @@
+---
+title: "CI/CD"
+description: "Four runner tiers (GitHub-hosted, RunsOn AWS spot, self-hosted Mac, self-hosted locked-down), the PR-plan / OIDC-apply pattern, and the branch ruleset that gates every merge."
+tier: 1
+---
+
+> Every infra change goes through PR-plan, then OIDC-authenticated apply. The runner tier follows the workload — never the other way around.
+
+The CI/CD surface spans four runner tiers, with workflows picking a tier by the work they need to do, not by what's cheapest in the abstract. The patterns below — plan/apply gating, OIDC trust, branch rulesets — are shared across all four tiers.
+
+For how secrets reach a workflow regardless of tier, read [Security](/security/overview) — this page does not duplicate that material.
+
+## Runner tiers
+
+Pick by what the workload actually needs:
+
+| Tier | Where | When to use |
+| --- | --- | --- |
+| **GitHub-hosted** | GitHub Actions cloud (free for public repos) | Public repos. No AWS work, no internal-host access, nothing that needs a private runner. Cheapest path. |
+| **RunsOn AWS spot** | EC2 spot via [`terraform-runs-on`](/infrastructure/cicd/terraform-runs-on) | Private repos. Much cheaper than GitHub-hosted private-repo minutes; same OIDC trust into AWS. Default for IaC apply jobs that authenticate to AWS. |
+| **Self-hosted Mac** | A Mac in the homelab running the Actions runner agent | Any macOS-only requirement: codesigning, `xcrun`, `pmset`/`powermetrics` validation, macOS-native binary builds. There is no cloud equivalent. |
+| **Self-hosted locked-down** | A dedicated runner host in the homelab (separate from the Mac) | Pre-built environments, jobs that need tighter control over what's on the runner, jobs that handle highly sensitive credentials that must never leave the homelab boundary, or anything that needs a network-locked execution environment. |
+
+The decision tree is workload-first: a macOS build picks the Mac tier; an IaC apply picks RunsOn; a public-repo lint picks GitHub-hosted; a sensitive-credential job picks the locked-down self-hosted runner. The cost ordering is "free → very cheap → host-cost → host-cost", but the cost is rarely what drives the choice.
+
+## The shape of every IaC pipeline
+
+| Stage | Trigger | Where it runs | What it does |
+| --- | --- | --- | --- |
+| PR plan | `pull_request` | The tier the repo declares (typically GitHub-hosted or RunsOn) | `terragrunt plan -no-color`, posted via `tf-summarize` as a redacted structural summary — addresses + change actions only, never resolved values |
+| Manual review | human reviewer | n/a | Reads the plan summary, checks impact, approves or asks for revisions |
+| Apply | `push` to `main` after merge | The repo's apply-tier runner, OIDC into the target account | `terragrunt apply -auto-approve` gated by the `production` GitHub Environment approval |
+
+The redacted-plan rule is non-negotiable: PR plan output reveals only resource addresses and change actions. Resolved attribute values — anything an attacker reading a PR could weaponize — never appear in PR comments. See each repo's `docs/ci-plan-output-policy.md` for the rationale.
+
+## Branch protection and merge rules
+
+The `main` branch on every IaC repo is protected by a ruleset, not a legacy branch-protection rule:
+
+- Required signatures (GPG)
+- Required linear history (no merge commits)
+- Required review-thread resolution before merge
+- Squash or rebase merge methods only (no merge-commit option)
+- Copilot Code Review auto-requested on every PR (review-on-open, not review-on-push)
+
+There is intentionally **no required approving review count** on solo-maintained personal repos — the gates that matter are the ruleset checks and the OIDC scope of the apply role. Multi-maintainer org repos under `dryvist` set the count in their own rulesets.
+
+## Where to go next
+
+<CardGroup cols={2}>
+  <Card title="terraform-runs-on" icon="play" href="/infrastructure/cicd/terraform-runs-on">
+    The RunsOn tier — the runner pool itself, OIDC trust, migration guide.
+  </Card>
+  <Card title="Security overview" icon="lock" href="/security/overview">
+    How secrets reach a workflow, across all four runner tiers.
+  </Card>
+  <Card title="Infrastructure overview" icon="server" href="/infrastructure/overview">
+    Where CI/CD fits in the broader Proxmox + AWS picture.
+  </Card>
+</CardGroup>
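Review note: the workload-first decision tree described under "Runner tiers" in `overview.mdx` could be expressed as a small function. This is only a sketch. The `Workload` fields and `pick_tier` are hypothetical names, and the precedence between checks (macOS first, then sensitive credentials, then private or AWS work) is an assumption, since the page lists the cases without ranking them.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GITHUB_HOSTED = "GitHub-hosted"
    RUNS_ON = "RunsOn AWS spot"
    SELF_HOSTED_MAC = "Self-hosted Mac"
    SELF_HOSTED_LOCKED_DOWN = "Self-hosted locked-down"


@dataclass(frozen=True)
class Workload:
    # Hypothetical flags describing what a job needs.
    needs_macos: bool = False
    handles_sensitive_credentials: bool = False
    private_repo: bool = False
    authenticates_to_aws: bool = False


def pick_tier(w: Workload) -> Tier:
    """Pick a runner tier from workload needs; cost is deliberately not an input."""
    if w.needs_macos:
        # There is no cloud equivalent for macOS-only work.
        return Tier.SELF_HOSTED_MAC
    if w.handles_sensitive_credentials:
        # Credentials that must stay inside the homelab boundary.
        return Tier.SELF_HOSTED_LOCKED_DOWN
    if w.private_repo or w.authenticates_to_aws:
        # Default for private repos and IaC apply jobs.
        return Tier.RUNS_ON
    return Tier.GITHUB_HOSTED
```

For example, `pick_tier(Workload(authenticates_to_aws=True))` selects the RunsOn tier, while a bare `Workload()` (a public-repo lint) falls through to GitHub-hosted.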
diff --git a/infrastructure/cicd/terraform-runs-on.mdx b/infrastructure/cicd/terraform-runs-on.mdx
@@ -0,0 +1,78 @@
+---
+title: "Self-hosted GitHub Actions runners"
+description: "Terraform/Terragrunt for RunsOn — self-hosted GitHub Actions runners on AWS EC2 spot. Cheaper, faster, observable."
+tier: 2
+---
+
+import { RepoMeta, RepoFit } from "/snippets/repo-summary.mdx";
+
+> GitHub Actions runners on AWS spot, on demand. ~10× cheaper than GitHub-hosted compute and twice as fast on warm cache.
+
+<RepoMeta language="HCL" status="active" lastActive="this week" repoUrl="https://github.com/JacobPEvans/terraform-runs-on" />
+
+`terraform-runs-on` provisions a [RunsOn](https://runs-on.com) v3 control plane on AWS — API Gateway + Lambda + ECS/Fargate — plus the IAM and networking it needs to spin up EC2 spot runners on demand. Workflows opt in with a `runs-on:` label; runners launch in seconds, run the job, terminate. Cribl.Cloud Free collects OTLP telemetry for runner performance tracking.
+
+## What it does
+
+- Deploys the RunsOn control plane (ECS/Fargate + Lambda + API Gateway) on AWS
+- Spins up EC2 spot runners on demand across 3 availability zones in `us-east-2`
+- Falls back to on-demand instances automatically if spot capacity goes thin (spot circuit breaker)
+- Tags every runner with workflow/job/repo for AWS cost allocation
+- Optional managed WAF (`enable_waf = true`, on by default) protects the public ingress
+- Optional Bedrock IAM grant (`enable_bedrock = true`) lets CI invoke Bedrock models directly
+- Forwards OTLP runner telemetry to Cribl.Cloud Free (zero-cost observability tier)
+
+Cost guardrails (Budgets thresholds, alarm targets, expected spend envelope) live in the repo's own README — they're tuned per-deployment and don't belong in cross-repo docs.
+
+## How it fits
+
+| Trigger | Runtime |
+| --- | --- |
+| `runs-on=...` label in any workflow `runs-on:` clause | A fresh EC2 spot instance per job, terminating on completion |
+
+<RepoFit>
+The compute layer for CI. Replaces GitHub-hosted `ubuntu-latest` runners across the org for any workflow that benefits from cheaper, faster, or larger compute.
+</RepoFit>
+
+## Post-setup hardening
+
+After the first apply finishes and the GitHub App is registered through the ingress URL, flip `enable_admin_routes = false` and re-apply. That closes the public `/admin` and `/setup` routes; the runner + webhook paths keep working.
+
+## Getting started
+
+<Steps>
+  <Step title="Clone and let direnv activate the dev shell">
+    `git clone --bare https://github.com/JacobPEvans/terraform-runs-on.git terraform-runs-on/.git && cd terraform-runs-on && git worktree add main main && cd main && direnv allow`
+  </Step>
+  <Step title="Supply credentials via aws-vault + Doppler">
+    The aws-vault profile is `tf-runs-on`; the Doppler config is inherited from `iac-conf-mgmt/prd`. `RUNSON_LICENSE_KEY` is mapped into `license_key` via `terragrunt.hcl`.
+  </Step>
+  <Step title="Bootstrap">
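+    `aws-vault exec tf-runs-on -- doppler run -- sh -c 'terragrunt init && terragrunt apply'`. Wrapping both commands in one `sh -c` keeps `terragrunt apply` inside the aws-vault and Doppler environment; with a bare `&&`, the apply would run in the outer shell without those credentials. The bootstrap creates its own S3 state + DynamoDB lock table on first run.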
+  </Step>
+  <Step title="Use a runner">
+    In any workflow: `runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/family=c7+m7"`. RunsOn uses the `github.run_id` segment to correlate the runner back to the originating workflow run.
+  </Step>
+</Steps>
+
+## Migrating existing repos
+
+The repo ships `docs/migration-guide.md` — the canonical per-repo playbook: which workflows benefit, which don't, the runner-label catalog used across the org, rollout order, and how to verify a migrated workflow actually landed on a RunsOn runner instead of a GitHub-hosted one.
+
+## CI/CD safety
+
+PR plans are posted via [`tf-summarize`](https://github.com/dineshba/tf-summarize) as a redacted structural summary — resource addresses + change actions only. Resolved attribute values never appear in PR comments. Merge to `main` triggers an OIDC-authenticated `terragrunt apply` (gated by the `production` GitHub Environment approval). See `docs/ci-plan-output-policy.md` for the full rationale.
+
+## Related repos
+
+<CardGroup cols={2}>
+  <Card title="Infrastructure overview" icon="server" href="/infrastructure/overview">
+    Where RunsOn fits in the broader AWS surface.
+  </Card>
+  <Card title="terraform-aws" icon="aws" href="https://github.com/JacobPEvans/terraform-aws">
+    The DR-tier AWS footprint these runners can deploy to.
+  </Card>
+  <Card title="Source on GitHub" icon="github" href="https://github.com/JacobPEvans/terraform-runs-on">
+    Full module, migration guide, CI plan-output policy.
+  </Card>
+</CardGroup>
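Review note: the redacted-plan rule under "CI/CD safety" (resource addresses and change actions only, never resolved values) can be illustrated against Terraform's machine-readable plan, as produced by `terraform show -json`. The sketch below is not how `tf-summarize` is implemented and is not the repo's pipeline; `redacted_summary` is a hypothetical helper that reads only `resource_changes[].address` and `resource_changes[].change.actions` and ignores everything else.

```python
import json
from typing import List


def redacted_summary(plan_json: str) -> List[str]:
    """Return one 'actions: address' line per changed resource, with no attribute values."""
    plan = json.loads(plan_json)
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            # Unchanged resources add noise to a PR comment.
            continue
        # Only the address and the action list are emitted; "before"/"after" are never read.
        lines.append(f"{'/'.join(actions)}: {rc['address']}")
    return lines


# Made-up sample plan for illustration; the "after" values must not leak.
sample_plan = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.state",
         "change": {"actions": ["create"], "after": {"bucket": "hunter2-state"}}},
        {"address": "aws_iam_role.runner",
         "change": {"actions": ["delete", "create"], "after": {"name": "hunter2"}}},
        {"address": "aws_vpc.main",
         "change": {"actions": ["no-op"], "after": {"cidr_block": "10.0.0.0/16"}}},
    ]
})
summary = redacted_summary(sample_plan)
```

The design point is that redaction happens by construction: the function never touches the `before` or `after` blocks, so there is nothing to scrub afterwards.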