Merged
16 commits
41 changes: 28 additions & 13 deletions .claude/skills/mintlify-docs-update/SKILL.md
@@ -16,9 +16,9 @@ Discover public repos under `JacobPEvans` and `dryvist`, diff against pages on t

## When NOT to use

- Authoring deep technical content for a single page. Use the future `mintlify-page-author` skill (issue tracked).
- Visual polish across the site. Use the future `mintlify-visual-audit` skill (issue tracked).
- Rewriting `docs.json` from scratch. Use the future `mintlify-nav-sync` skill (issue tracked).
- Authoring deep technical content for a single page. Edit the page directly — Claude is more capable than a fixed templated skill here.
- Visual polish across the site. Edit the offending pages directly.
- Rewriting `docs.json` from scratch. Edit it directly.

## Workflow

@@ -33,7 +33,23 @@ gh repo list dryvist --limit 200 --json name,description,visibility,isArchived,i

Filter to: `visibility == PUBLIC` AND `isFork == false`. Skip the meta `docs` and `JacobPEvans` profile repos.
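A minimal sketch of this filter in Python, assuming the `gh repo list ... --json` output has already been parsed into dicts with `name`, `visibility`, `isFork`, and `isArchived` keys (the `--json` field list above is truncated, so requesting `isFork` is an assumption). Archived repos are dropped here too, because `archived` is one of the skip reasons in the final report. The function name is hypothetical.

```python
from typing import Any

# Meta repos the skill never documents: the docs site itself and the profile repo.
META_REPOS = {"docs", "JacobPEvans"}

Repo = dict[str, Any]


def filter_enumerated(repos: list[Repo]) -> tuple[list[Repo], list[tuple[str, str]]]:
    """Split parsed `gh repo list` records into (kept, [(name, skip_reason), ...])."""
    kept: list[Repo] = []
    skipped: list[tuple[str, str]] = []
    for repo in repos:
        name = repo["name"]
        if name in META_REPOS:
            continue  # assumption: meta repos are dropped without a report entry
        if repo["visibility"] != "PUBLIC":
            skipped.append((name, "private"))
        elif repo["isFork"]:
            skipped.append((name, "fork"))
        elif repo["isArchived"]:
            skipped.append((name, "archived"))
        else:
            kept.append(repo)
    return kept, skipped
```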

### Step 2 — Categorize each repo
### Step 2 — Coverage blacklist

These repos must NOT get a page on the public docs site. Filter the enumerated list against this blacklist before categorization:

| Repo | Reason |
| --- | --- |
| `terraform-aws-bedrock` | Test/playground project; not part of the homelab story. |
| `terraform-aws-static-website` | Being replaced by this docs site itself. |
| `VisiCore_App_for_AI_Observability` | Work-related — kept out of personal docs. |
| `VisiCore_TA_AI_Observability` | Work-related — kept out of personal docs. |
| (any other repo under the `visicore` org) | Work-related — kept out of personal docs. |

Hard rule: when in doubt, do not add the page. Ask the author first.

If a blacklisted repo is found, log it under the `blacklisted` reason in the final summary report — do not attempt to scaffold it.
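The blacklist check can be sketched as follows. The `owner`/`name` split is an assumption about how enumerated records are keyed, and the comparison is case-insensitive because GitHub treats owner and repo names that way.

```python
# Names from the blacklist table above, case-folded for comparison.
BLACKLISTED_REPOS = {
    n.casefold()
    for n in (
        "terraform-aws-bedrock",
        "terraform-aws-static-website",
        "VisiCore_App_for_AI_Observability",
        "VisiCore_TA_AI_Observability",
    )
}
# Every repo under these orgs is excluded wholesale.
BLACKLISTED_ORGS = {"visicore"}


def is_blacklisted(owner: str, name: str) -> bool:
    """True if the repo must never get a page on the public docs site."""
    return owner.casefold() in BLACKLISTED_ORGS or name.casefold() in BLACKLISTED_REPOS
```

A hit is recorded with the `blacklisted` reason and never reaches scaffolding.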

### Step 3 — Categorize each repo

Map repo name and topics to a sidebar group:

@@ -49,11 +65,11 @@ Map repo name and topics to a sidebar group:

Ties → prefer the more specific match. When uncertain, ask before scaffolding.
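The mapping table itself is collapsed in this diff, so the rules below are hypothetical, chosen to agree with pages already in `docs.json` (`ansible-splunk` under Observability, `ansible-proxmox` under Infrastructure). "More specific" is modeled here as the longest matching name prefix; a same-length tie between two different groups returns `None`, meaning "ask before scaffolding".

```python
from typing import Optional

# Hypothetical (prefix, group) rules; the real mapping table lives in SKILL.md.
RULES = [
    ("ansible-", "Infrastructure"),
    ("ansible-splunk", "Observability"),
    ("terraform-", "Infrastructure"),
    ("tf-splunk", "Observability"),
    ("nix-", "Nix Ecosystem"),
]


def categorize(repo_name: str) -> Optional[str]:
    """Return the sidebar group for a repo, or None if uncategorizable or ambiguous."""
    matches = [(len(prefix), group) for prefix, group in RULES if repo_name.startswith(prefix)]
    if not matches:
        return None  # reported as `uncategorizable`
    best_len = max(length for length, _ in matches)
    best_groups = {group for length, group in matches if length == best_len}
    # Two different groups at the same specificity: ask the author instead of guessing.
    return best_groups.pop() if len(best_groups) == 1 else None
```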

### Step 3 — Diff against existing pages
### Step 4 — Diff against existing pages

For each repo, the expected path is `<group-prefix><repo-name>.mdx`. If the file exists, skip. If it doesn't, queue for scaffolding.
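A sketch of the existence check with `pathlib`. The group-to-prefix map is hypothetical and only covers the two top-level directories visible in `docs.json`.

```python
from pathlib import Path

# Hypothetical group -> path-prefix map, mirroring directories in docs.json.
GROUP_PREFIX = {
    "Infrastructure": "infrastructure/",
    "Observability": "observability/",
}


def needs_scaffold(docs_root: Path, group: str, repo_name: str) -> bool:
    """True if `<group-prefix><repo-name>.mdx` does not exist yet (queue it)."""
    expected = docs_root / f"{GROUP_PREFIX[group]}{repo_name}.mdx"
    return not expected.exists()
```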

### Step 4 — Scaffold
### Step 5 — Scaffold

For each queued repo, copy `template-repo-page.mdx` and replace the marked placeholders:

@@ -70,11 +86,11 @@ Every token in `template-repo-page.mdx` must be replaced. The table below lists
| `REPO_LAST_ACTIVE` | relative time from `pushedAt` (e.g., `"this week"`, `"3 days ago"`) |
| `REPO_URL` | `url` field |
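One way to produce `REPO_LAST_ACTIVE`, assuming `pushedAt` arrives as the ISO 8601 UTC string `gh` emits (for example `2024-05-07T00:00:00Z`). The bucket boundaries and phrases are illustrative rather than the skill's canonical wording, and `now` is a parameter so the output is testable.

```python
from datetime import datetime, timezone
from typing import Optional


def relative_last_active(pushed_at: str, now: Optional[datetime] = None) -> str:
    """Turn a `pushedAt` timestamp into a coarse phrase (illustrative buckets)."""
    # gh emits a trailing "Z"; older Pythons' fromisoformat() needs an explicit offset.
    pushed = datetime.fromisoformat(pushed_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    days = (now - pushed).days
    if days < 1:
        return "today"
    if days < 7:
        return "yesterday" if days == 1 else f"{days} days ago"
    if days < 30:
        weeks = days // 7
        return "last week" if weeks == 1 else f"{weeks} weeks ago"
    months = days // 30
    return "last month" if months == 1 else f"{months} months ago"
```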

**Derived from Step 2 categorization:**
**Derived from Step 3 categorization:**

| Placeholder | How filled |
| --- | --- |
| `SIDEBAR_GROUP_NAME` | the matched group name from Step 2 (e.g., `Infrastructure`, `Nix Ecosystem`, `AI Development`, `Observability`, `Tools`) |
| `SIDEBAR_GROUP_NAME` | the matched group name from Step 3 (e.g., `Infrastructure`, `Nix Ecosystem`, `AI Development`, `Observability`, `Tools`) |

**Author-filled (skill emits empty markers; author writes the prose):**

@@ -95,11 +111,11 @@ Every token in `template-repo-page.mdx` must be replaced. The table below lists

Replacements happen via the `Edit` tool with `replace_all: true`. Never use `sed`; this is exact-string replacement, not regex substitution.
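The same exact-string semantics in a short Python sketch (function names hypothetical): `str.replace` never interprets regex metacharacters, and a final scan enforces the rule that every token must be replaced. Longer tokens are replaced first so a token that is a prefix of another cannot clobber it.

```python
def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace every occurrence of each placeholder token with its value (no regex)."""
    page = template
    # Longest tokens first, so a token that is a prefix of another cannot clobber it.
    for token in sorted(values, key=len, reverse=True):
        page = page.replace(token, values[token])
    return page


def leftover_tokens(page: str, tokens: list[str]) -> list[str]:
    """Tokens still present after filling; the skill requires this list to be empty."""
    return [t for t in tokens if t in page]
```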

### Step 5 — Update `docs.json`
### Step 6 — Update `docs.json`

For each new page, insert its path into the appropriate sidebar group's `pages` array, preserving alphabetical order. Use `Edit` on `docs.json`; never regenerate the file.
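Because `docs.json` is edited in place rather than regenerated, the useful computation is where to insert, not a rewritten file. A sketch, assuming the plain string entries in the target `pages` array are already sorted; nested `{"group": ...}` objects are skipped. The function name is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Union

Page = Union[str, dict]  # a page path, or a nested {"group": ..., "pages": [...]} object


def insertion_anchor(pages: list[Page], new_page: str) -> Optional[str]:
    """Return the existing path `new_page` should follow, or None to insert first.

    Only plain string entries are considered; they are assumed to be sorted.
    """
    paths = [p for p in pages if isinstance(p, str)]
    if new_page in paths:
        raise ValueError(f"{new_page} is already listed")
    idx = bisect_left(paths, new_page)
    return paths[idx - 1] if idx > 0 else None
```

The returned path is the exact string to target with `Edit`, appending the new entry after it; `None` means the new path goes at the front of the array.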

### Step 6 — Validate
### Step 7 — Validate

Run, sequentially:

@@ -118,7 +134,7 @@ Hub-and-spoke layouts with `flowchart LR` will still stack their spokes
vertically — replace the hub node with a horizontal subgraph border, or
split into smaller diagrams.

### Step 7 — Tiered word-count guard
### Step 8 — Tiered word-count guard

For every scaffolded page:

@@ -133,7 +149,7 @@ Over-budget pages get a `<!-- TIER-GUARD: over budget — consider splitting int
- New MDX files under the right sidebar group
- Updated `docs.json`
- A summary report: `Added N pages: <list>`
- A list of skipped repos with reasons (`already-documented`, `private`, `archived`, `fork`, `uncategorizable`)
- A list of skipped repos with reasons (`already-documented`, `private`, `archived`, `fork`, `uncategorizable`, `blacklisted`)
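A sketch of rendering that report. The `Added N pages: <list>` line follows the format above; the per-reason layout of the skipped list is hypothetical.

```python
from collections import defaultdict

SKIP_REASONS = ("already-documented", "private", "archived", "fork", "uncategorizable", "blacklisted")


def summary_report(added: list[str], skipped: list[tuple[str, str]]) -> str:
    """Render added pages, then skipped repos grouped by reason."""
    by_reason: dict[str, list[str]] = defaultdict(list)
    for repo, reason in skipped:
        if reason not in SKIP_REASONS:
            raise ValueError(f"unknown skip reason: {reason}")
        by_reason[reason].append(repo)
    lines = [f"Added {len(added)} pages: {', '.join(added) or '(none)'}"]
    for reason in SKIP_REASONS:  # fixed order keeps reports comparable run to run
        if by_reason[reason]:
            lines.append(f"Skipped ({reason}): {', '.join(sorted(by_reason[reason]))}")
    return "\n".join(lines)
```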

## Flags (planned)

@@ -148,4 +164,3 @@ These flags are interpreted manually in the conversation; there is no CLI binary

- See `tools/automation.mdx` for the user-facing description.
- See `README.md` in this directory for human-readable usage.
- See open issues with label `skill` for planned improvements.
4 changes: 2 additions & 2 deletions about/homelab.mdx
@@ -106,8 +106,8 @@ LXC by default. Native packages where possible. Docker is the exception — high

## Provisioning + configuration

[terraform-proxmox](https://github.com/JacobPEvans/terraform-proxmox) builds the VMs and LXCs. [ansible-proxmox](https://github.com/JacobPEvans/ansible-proxmox) configures the host. [ansible-proxmox-apps](https://github.com/JacobPEvans/ansible-proxmox-apps) configures everything on top.
[terraform-proxmox](https://github.com/JacobPEvans/terraform-proxmox) builds the VMs and LXCs. [ansible-proxmox](https://github.com/JacobPEvans/ansible-proxmox) configures the host. [ansible-proxmox-apps](https://github.com/JacobPEvans/ansible-proxmox-apps) configures everything on top. For the rationale on LXC defaults vs the Docker exception, see [LXC vs Docker](/infrastructure/lxc-vs-docker); for the macOS counterpart that runs the monitoring stack as Kubernetes, see [Kubernetes overview](/infrastructure/kubernetes-overview) and [`orbstack-kubernetes`](/infrastructure/orbstack-kubernetes).

## DR plan

[terraform-aws](https://github.com/JacobPEvans/terraform-aws) defines a cold AWS footprint sized to take a Splunk failover. Cribl Edge routes can be flipped to the AWS HEC endpoint via config change; the AI-observability dashboards keep working because they target the same indexes.
[terraform-aws](https://github.com/JacobPEvans/terraform-aws) defines a cold AWS footprint sized to take a Splunk failover. Cribl Edge routes can be flipped to the AWS HEC endpoint via config change; the AI-observability dashboards keep working because they target the same indexes. The full cross-stack map of every collector and where it runs lives at [Monitoring agents](/observability/monitoring-agents).
17 changes: 15 additions & 2 deletions docs.json
@@ -68,7 +68,18 @@
"infrastructure/overview",
"infrastructure/terraform-proxmox",
"infrastructure/ansible-proxmox",
"infrastructure/ansible-proxmox-apps"
"infrastructure/ansible-proxmox-apps",
"infrastructure/orbstack-kubernetes",
"infrastructure/kubernetes-overview",
"infrastructure/lxc-vs-docker",
"infrastructure/secrets-sops",
{
"group": "CI/CD",
"pages": [
"infrastructure/cicd/overview",
"infrastructure/cicd/terraform-runs-on"
]
}
]
},
{
@@ -123,7 +134,9 @@
"pages": [
"observability/overview",
"observability/ansible-splunk",
"observability/tf-splunk-aws"
"observability/tf-splunk-aws",
"observability/cc-edge-the-mac-pack",
"observability/monitoring-agents"
]
},
{
60 changes: 60 additions & 0 deletions infrastructure/cicd/overview.mdx
@@ -0,0 +1,60 @@
---
title: "CI/CD"
description: "Four runner tiers (GitHub-hosted, RunsOn AWS spot, self-hosted Mac, self-hosted locked-down), the PR-plan / OIDC-apply pattern, and the branch ruleset that gates every merge."
tier: 1
---

> Every infra change goes through PR-plan, then OIDC-authenticated apply. The runner tier follows the workload — never the other way around.

The CI/CD surface spans four runner tiers, with workflows picking a tier by the work they need to do, not by what's cheapest in the abstract. The patterns below — plan/apply gating, OIDC trust, branch rulesets — are shared across all four tiers.

For how secrets reach a workflow regardless of tier, read [Security](/security/overview) — this page does not duplicate that material.

## Runner tiers

Pick by what the workload actually needs:

| Tier | Where | When to use |
| --- | --- | --- |
| **GitHub-hosted** | GitHub Actions cloud (free for public repos) | Public repos. No AWS work, no internal-host access, nothing that needs a private runner. Cheapest path. |
| **RunsOn AWS spot** | EC2 spot via [`terraform-runs-on`](/infrastructure/cicd/terraform-runs-on) | Private repos. Much cheaper than GitHub-hosted private-repo minutes; same OIDC trust into AWS. Default for IaC apply jobs that authenticate to AWS. |
| **Self-hosted Mac** | A Mac in the homelab running the Actions runner agent | Any macOS-only requirement: signing, codesigning, `xcrun`, `pmset`/`powermetrics` validation, macOS-native binary builds. There is no cloud equivalent. |
| **Self-hosted locked-down** | A dedicated runner host in the homelab (separate from the Mac) | Pre-built environments, jobs that need tighter control over what's on the runner, jobs that handle highly-sensitive credentials that must never leave the homelab boundary, or anything that needs a network-locked execution environment. |

The decision tree is workload-first: a macOS build picks the Mac tier; an IaC apply picks RunsOn; a public-repo lint picks GitHub-hosted; a sensitive-credential job picks the locked-down self-hosted runner. The cost ordering is "free → very cheap → host-cost → host-cost", but the cost is rarely what drives the choice.
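The decision tree translates into a few ordered checks. This is a hedged sketch: the `Workload` flags are hypothetical, and checking macOS before sensitive credentials is an assumed precedence that the table does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GITHUB_HOSTED = "GitHub-hosted"
    RUNSON_SPOT = "RunsOn AWS spot"
    SELF_HOSTED_MAC = "Self-hosted Mac"
    SELF_HOSTED_LOCKED = "Self-hosted locked-down"


@dataclass
class Workload:
    # Hypothetical flags describing what a job needs.
    needs_macos: bool = False
    handles_sensitive_creds: bool = False
    needs_network_lockdown: bool = False
    public_repo: bool = True
    touches_aws: bool = False


def pick_tier(w: Workload) -> Tier:
    """Workload-first tier choice; cost is deliberately not an input."""
    if w.needs_macos:
        return Tier.SELF_HOSTED_MAC  # no cloud equivalent
    if w.handles_sensitive_creds or w.needs_network_lockdown:
        return Tier.SELF_HOSTED_LOCKED  # credentials stay inside the homelab
    if w.touches_aws or not w.public_repo:
        return Tier.RUNSON_SPOT  # private repos and OIDC-into-AWS jobs
    return Tier.GITHUB_HOSTED
```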

## The shape of every IaC pipeline

| Stage | Trigger | Where it runs | What it does |
| --- | --- | --- | --- |
| PR plan | `pull_request` | The tier the repo declares (typically GitHub-hosted or RunsOn) | `terragrunt plan -no-color`, posted via `tf-summarize` as a redacted structural summary — addresses + change actions only, never resolved values |
| Manual review | human reviewer | n/a | Reads the plan summary, checks impact, approves or asks for revisions |
| Apply | `push` to `main` after merge | The repo's apply-tier runner, OIDC into the target account | `terragrunt apply -auto-approve` gated by the `production` GitHub Environment approval |

The redacted-plan rule is non-negotiable: PR plan output reveals only resource addresses and change actions. Resolved attribute values — anything an attacker reading a PR could weaponize — never appear in PR comments. See each repo's `docs/ci-plan-output-policy.md` for the rationale.
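To illustrate the redaction rule (this is not `tf-summarize`'s code), the sketch below reads the JSON form of a plan, as produced by `terraform show -json` on a saved plan file, and keeps only each entry's `address` and `change.actions` from `resource_changes`. Attribute values under `change.before` and `change.after` are never read, so they cannot reach a PR comment.

```python
import json


def redacted_summary(plan_json: str) -> list[str]:
    """Reduce a `terraform show -json` plan to 'actions address' lines."""
    plan = json.loads(plan_json)
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue  # unchanged resources add noise, not signal
        # Only the address and the action list are emitted; values are ignored.
        lines.append(f"{'/'.join(actions)} {rc['address']}")
    return lines
```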

## Branch protection and merge rules

The `main` branch on every IaC repo is protected by a ruleset, not a legacy branch-protection rule:

- Required signatures (GPG)
- Required linear history (no merge commits)
- Required review-thread resolution before merge
- Squash or rebase merge methods only (no merge-commit option)
- Copilot Code Review auto-requested on every PR (review-on-open, not review-on-push)

There is intentionally **no required approving review count** on solo-maintained personal repos — the gates that matter are the ruleset checks and the OIDC scope of the apply role. Multi-maintainer org repos under `dryvist` set the count in their own rulesets.
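For reference, a ruleset with these rules might be expressed as the following request body for GitHub's repository rulesets REST API, built in Python here. Treat it as a sketch: the ruleset name is hypothetical, the Copilot review rule is left out, and field names should be checked against GitHub's current rulesets documentation before use.

```python
import json


def main_branch_ruleset() -> dict:
    """Sketch of a ruleset body mirroring the branch rules listed above."""
    return {
        "name": "protect-main",  # hypothetical name
        "target": "branch",
        "enforcement": "active",
        "conditions": {"ref_name": {"include": ["~DEFAULT_BRANCH"], "exclude": []}},
        "rules": [
            {"type": "required_signatures"},
            {"type": "required_linear_history"},
            {
                "type": "pull_request",
                "parameters": {
                    # Solo-maintained repos: no approving-review count.
                    "required_approving_review_count": 0,
                    "dismiss_stale_reviews_on_push": False,
                    "require_code_owner_review": False,
                    "require_last_push_approval": False,
                    "required_review_thread_resolution": True,
                    "allowed_merge_methods": ["squash", "rebase"],
                },
            },
            # The Copilot Code Review rule is intentionally omitted from this sketch.
        ],
    }


if __name__ == "__main__":
    print(json.dumps(main_branch_ruleset(), indent=2))
```

Saved to a file, the body could be posted with `gh api --method POST repos/OWNER/REPO/rulesets --input ruleset.json`, where `OWNER` and `REPO` are placeholders.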

## Where to go next

<CardGroup cols={2}>
<Card title="terraform-runs-on" icon="play" href="/infrastructure/cicd/terraform-runs-on">
The RunsOn tier — the runner pool itself, OIDC trust, migration guide.
</Card>
<Card title="Security overview" icon="lock" href="/security/overview">
How secrets reach a workflow, across all four runner tiers.
</Card>
<Card title="Infrastructure overview" icon="server" href="/infrastructure/overview">
Where CI/CD fits in the broader Proxmox + AWS picture.
</Card>
</CardGroup>
78 changes: 78 additions & 0 deletions infrastructure/cicd/terraform-runs-on.mdx
@@ -0,0 +1,78 @@
---
title: "Self-hosted GitHub Actions runners"
description: "Terraform/Terragrunt for RunsOn — self-hosted GitHub Actions runners on AWS EC2 spot. Cheaper, faster, observable."
tier: 2
---

import { RepoMeta, RepoFit } from "/snippets/repo-summary.mdx";

> GitHub Actions runners on AWS spot, on demand. ~10× cheaper than GitHub-hosted compute and twice as fast on warm cache.

<RepoMeta language="HCL" status="active" lastActive="this week" repoUrl="https://github.com/JacobPEvans/terraform-runs-on" />

`terraform-runs-on` provisions a [RunsOn](https://runs-on.com) v3 control plane on AWS (API Gateway + Lambda + ECS/Fargate), plus the IAM and networking it needs to spin up EC2 spot runners on demand. Workflows opt in with a `runs-on:` label; runners launch in seconds, run the job, and terminate. Cribl.Cloud Free collects OTLP telemetry for runner performance tracking.

## What it does

- Deploys the RunsOn control plane (ECS/Fargate + Lambda + API Gateway) on AWS
- Spins up EC2 spot runners on demand across 3 availability zones in `us-east-2`
- Falls back to on-demand instances automatically if spot capacity goes thin (spot circuit breaker)
- Tags every runner with workflow/job/repo for AWS cost allocation
- Managed WAF on the public ingress, enabled by default (`enable_waf = true`); set it to `false` to opt out
- Optional Bedrock IAM grant (`enable_bedrock = true`) lets CI invoke Bedrock models directly
- Forwards OTLP runner telemetry to Cribl.Cloud Free (zero-cost observability tier)
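RunsOn's own fallback logic is not documented on this page; the generic circuit-breaker pattern that the spot-fallback bullet names looks roughly like this. The thresholds (3 spot failures within 10 minutes trips the breaker for 30 minutes) are hypothetical, and timestamps are plain seconds for testability.

```python
from collections import deque


class SpotCircuitBreaker:
    """Generic breaker: too many recent spot failures means on-demand for a while.

    Thresholds are hypothetical; this is not RunsOn's implementation.
    """

    def __init__(self, max_failures: int = 3, window_s: float = 600, cooldown_s: float = 1800):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.failures: deque[float] = deque()
        self.open_until = 0.0  # while now < open_until, the breaker is open

    def record_spot_failure(self, now: float) -> None:
        self.failures.append(now)
        # Forget failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.open_until = now + self.cooldown_s  # trip: switch to on-demand
            self.failures.clear()

    def capacity_type(self, now: float) -> str:
        return "on-demand" if now < self.open_until else "spot"
```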

Cost guardrails (Budgets thresholds, alarm targets, expected spend envelope) live in the repo's own README — they're tuned per-deployment and don't belong in cross-repo docs.

## How it fits

| Trigger | Runtime |
| --- | --- |
| `runs-on=...` label in any workflow `runs-on:` clause | A fresh EC2 spot instance per job, terminating on completion |

<RepoFit>
The compute layer for CI. Replaces GitHub-hosted `ubuntu-latest` runners across the org for any workflow that benefits from cheaper, faster, or larger compute.
</RepoFit>

## Post-setup hardening

After the first apply finishes and the GitHub App is registered through the ingress URL, flip `enable_admin_routes = false` and re-apply. That closes the public `/admin` and `/setup` routes; the runner + webhook paths keep working.

## Getting started

<Steps>
<Step title="Clone and let direnv activate the dev shell">
`git clone --bare https://github.com/JacobPEvans/terraform-runs-on.git terraform-runs-on/.git && cd terraform-runs-on && git worktree add main main && cd main && direnv allow`
</Step>
<Step title="Supply credentials via aws-vault + Doppler">
Profile is `tf-runs-on`; Doppler config is inherited from `iac-conf-mgmt/prd`. `RUNSON_LICENSE_KEY` is mapped into `license_key` via `terragrunt.hcl`.
</Step>
<Step title="Bootstrap">
Run both steps inside the credential wrapper: `aws-vault exec tf-runs-on -- doppler run -- terragrunt init`, then `aws-vault exec tf-runs-on -- doppler run -- terragrunt apply`. The bootstrap creates its own S3 state + DynamoDB lock table on first run.
</Step>
<Step title="Use a runner">
In any workflow: `runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/family=c7+m7"`. The `github.run_id` segment is what RunsOn correlates back to the originating workflow.
</Step>
</Steps>
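A small hypothetical helper that assembles the same slash-separated label, which can be handy when auditing or generating workflow files during a migration. In a real workflow file the `run_id` segment is filled by the `${{ github.run_id }}` expression rather than a literal.

```python
from typing import Optional, Sequence


def runs_on_label(run_id: int, runner: str, families: Optional[Sequence[str]] = None) -> str:
    """Build a RunsOn `runs-on:` label in the slash-separated form shown above."""
    # run_id comes first so RunsOn can correlate the job with its workflow run.
    parts = [f"runs-on={run_id}", f"runner={runner}"]
    if families:
        # Multiple instance families are joined with '+', as in family=c7+m7.
        parts.append("family=" + "+".join(families))
    return "/".join(parts)
```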

## Migrating existing repos

The repo ships `docs/migration-guide.md` — the canonical per-repo playbook: which workflows benefit, which don't, the runner-label catalog used across the org, rollout order, and how to verify a migrated workflow actually landed on a RunsOn runner instead of a GitHub-hosted one.

## CI/CD safety

PR plans are posted via [`tf-summarize`](https://github.com/dineshba/tf-summarize) as a redacted structural summary — resource addresses + change actions only. Resolved attribute values never appear in PR comments. Merge to `main` triggers an OIDC-authenticated `terragrunt apply` (gated by the `production` GitHub Environment approval). See `docs/ci-plan-output-policy.md` for the full rationale.

## Related repos

<CardGroup cols={2}>
<Card title="Infrastructure overview" icon="server" href="/infrastructure/overview">
Where RunsOn fits in the broader AWS surface.
</Card>
<Card title="terraform-aws" icon="aws" href="https://github.com/JacobPEvans/terraform-aws">
The DR-tier AWS footprint these runners can deploy to.
</Card>
<Card title="Source on GitHub" icon="github" href="https://github.com/JacobPEvans/terraform-runs-on">
Full module, migration guide, CI plan-output policy.
</Card>
</CardGroup>