From c0ff0006337c3eca3747c43b893e9cf5179084e2 Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Sun, 19 Apr 2026 18:09:48 +0200 Subject: [PATCH 1/2] refactor(skills)!: reinstall from upstreams, drop skills-lock.json MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Re-install each agent skill from its original upstream repo via `gh skill install`, so every installed SKILL.md now carries `metadata.github-*` frontmatter pointing at the true origin instead of `devantler-tech/skills`. `gh skill update --all` reads that metadata directly, making the sidecar `skills-lock.json` redundant. - Delete `skills-lock.json`. - Reinstall 11 of 12 skills from upstream via `gh skill install --agent github-copilot --scope project --dir .agents/skills`. - `siderolabs/docs` is not re-installed: the upstream ships `public/skill.md` (lowercase) which `gh skill install` rejects (it only recognises `SKILL.md`). Tracking upstream for a fix — until then, manage that skill out-of-band or drop it. - Bump `update-skills.yaml` to pin the refreshed reusable workflow and swap `skills-lock` for the new `dir` input. BREAKING CHANGE: `skills-lock.json` is removed. The `siderolabs` skill is temporarily unavailable pending an upstream rename to `SKILL.md`. 
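For the record, the bulk reinstall was driven by a small shell loop along these lines. This is a dry-run sketch: it echoes the `gh skill install` invocations rather than running them, the repo/skill pairs shown are only a subset of the 11 reinstalled skills, and the positional `<owner/repo> <skill>` argument order follows the find-skills examples (adjust if your gh-skill version expects a repo-relative path instead):

```shell
# Dry-run: print the reinstall command for each upstream repo/skill pair.
# Pipe the output to `sh` to execute for real.
while read -r repo skill; do
  echo gh skill install "$repo" "$skill" --agent github-copilot --scope project --dir .agents/skills
done <<'EOF'
github/awesome-copilot gh-cli
vercel-labs/skills find-skills
fluxcd/agent-skills gitops-knowledge
EOF
```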
Refs: devantler-tech/skills#16, devantler-tech/actions#95, devantler-tech/reusable-workflows#207 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../SKILL.md | 7 +- .agents/skills/find-skills/SKILL.md | 40 +- .agents/skills/gh-cli/SKILL.md | 7 +- .agents/skills/gh-stack/SKILL.md | 7 +- .agents/skills/git-commit/SKILL.md | 7 +- .agents/skills/github-actions-docs/SKILL.md | 21 +- .agents/skills/github-issues/SKILL.md | 7 +- .agents/skills/gitops-cluster-debug/SKILL.md | 7 +- .agents/skills/gitops-knowledge/SKILL.md | 108 +---- .../skills/gitops-knowledge/evals/evals.json | 20 + .../references/best-practices.md | 5 +- .../references/flux-operator.md | 24 +- .../references/gitless-image-automation.md | 242 ++++++++++ .../references/helmrelease.md | 6 - .../references/kustomization.md | 10 +- .../references/repo-patterns.md | 6 +- .../references/resourcesets.md | 127 ++--- .../gitops-knowledge/references/sources.md | 19 +- .../references/terraform-bootstrap.md | 457 ++++++++++++++++++ .agents/skills/gitops-repo-audit/SKILL.md | 7 +- .agents/skills/refactor/SKILL.md | 7 +- .agents/skills/siderolabs/SKILL.md | 321 ------------ .github/workflows/update-skills.yaml | 6 +- skills-lock.json | 77 --- 24 files changed, 906 insertions(+), 639 deletions(-) create mode 100644 .agents/skills/gitops-knowledge/references/gitless-image-automation.md create mode 100644 .agents/skills/gitops-knowledge/references/terraform-bootstrap.md delete mode 100644 .agents/skills/siderolabs/SKILL.md delete mode 100644 skills-lock.json diff --git a/.agents/skills/copilot-instructions-blueprint-generator/SKILL.md b/.agents/skills/copilot-instructions-blueprint-generator/SKILL.md index fededfb14..b25ed8214 100644 --- a/.agents/skills/copilot-instructions-blueprint-generator/SKILL.md +++ b/.agents/skills/copilot-instructions-blueprint-generator/SKILL.md @@ -1,10 +1,9 @@ --- description: Technology-agnostic blueprint generator for creating comprehensive copilot-instructions.md 
files that guide GitHub Copilot to produce code consistent with project standards, architecture patterns, and exact technology versions by analyzing existing codebase patterns and avoiding assumptions. metadata: - github-path: copilot-instructions-blueprint-generator - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/copilot-instructions-blueprint-generator + github-ref: refs/heads/main + github-repo: https://github.com/github/awesome-copilot github-tree-sha: a67fccf32fc28eda70e5868e65e61e2fce3e64ef name: copilot-instructions-blueprint-generator --- diff --git a/.agents/skills/find-skills/SKILL.md b/.agents/skills/find-skills/SKILL.md index 7b20250a1..56f15f7db 100644 --- a/.agents/skills/find-skills/SKILL.md +++ b/.agents/skills/find-skills/SKILL.md @@ -1,11 +1,10 @@ --- description: Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill. metadata: - github-path: find-skills - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills - github-tree-sha: 9d01b3f7405bfefe19fa4e343c1841303f91889d + github-path: skills/find-skills + github-ref: refs/tags/v1.5.1 + github-repo: https://github.com/vercel-labs/skills + github-tree-sha: 3013fdeb8a11b10b1eb795ec3ae8bfca38f7c26d name: find-skills --- # Find Skills @@ -23,15 +22,16 @@ Use this skill when the user: - Wants to search for tools, templates, or workflows - Mentions they wish they had help with a specific domain (design, testing, deployment, etc.) -## What is `gh skill`? +## What is the Skills CLI? 
-`gh skill` (a GitHub CLI extension) is the package manager for the open agent skills ecosystem. Skills are modular packages that extend agent capabilities with specialized knowledge, workflows, and tools. +The Skills CLI (`npx skills`) is the package manager for the open agent skills ecosystem. Skills are modular packages that extend agent capabilities with specialized knowledge, workflows, and tools. **Key commands:** -- `gh skill search [query]` - Search for skills by keyword -- `gh skill install <owner/repo> <skill> --agent github-copilot --scope user` - Install a skill from GitHub for Copilot -- `gh skill update` - Check for and apply skill updates +- `npx skills find [query]` - Search for skills interactively or by keyword +- `npx skills add <skill>` - Install a skill from GitHub or other sources +- `npx skills check` - Check for skill updates +- `npx skills update` - Update all installed skills **Browse skills at:** https://skills.sh/ @@ -58,14 +58,14 @@ For example, top skills for web development include: If the leaderboard doesn't cover the user's need, run the find command: ```bash -gh skill search [query] +npx skills find [query] ``` For example: -- User asks "how do I make my React app faster?" → `gh skill search react performance` -- User asks "can you help me with PR reviews?" → `gh skill search pr review` -- User asks "I need to create a changelog" → `gh skill search changelog` +- User asks "how do I make my React app faster?" → `npx skills find react performance` +- User asks "can you help me with PR reviews?" → `npx skills find pr review` +- User asks "I need to create a changelog" → `npx skills find changelog` ### Step 4: Verify Quality Before Recommending @@ -92,7 +92,7 @@ React and Next.js performance optimization guidelines from Vercel Engineering. 
(185K installs) To install it: -gh skill install vercel-labs/agent-skills react-best-practices --agent github-copilot --scope user +npx skills add vercel-labs/agent-skills@react-best-practices Learn more: https://skills.sh/vercel-labs/agent-skills/react-best-practices ``` @@ -102,10 +102,10 @@ Learn more: https://skills.sh/vercel-labs/agent-skills/react-best-practices If the user wants to proceed, you can install the skill for them: ```bash -gh skill install <owner/repo> <skill> --agent github-copilot --scope user +npx skills add <skill> -g -y ``` -The `--scope user` flag installs for the user profile. +The `-g` flag installs globally (user-level) and `-y` skips confirmation prompts. ## Common Skill Categories @@ -133,7 +133,7 @@ If no relevant skills exist: 1. Acknowledge that no existing skill was found 2. Offer to help with the task directly using your general capabilities -3. Suggest the user could create their own skill following the Agent Skills specification +3. Suggest the user could create their own skill with `npx skills init` Example: @@ -141,6 +141,6 @@ Example: I searched for skills related to "xyz" but didn't find any matches. I can still help you with this task directly! Would you like me to proceed? -If this is something you do often, you could create your own skill by following: -https://agentskills.io/specification +If this is something you do often, you could create your own skill: +npx skills init my-xyz-skill ``` diff --git a/.agents/skills/gh-cli/SKILL.md b/.agents/skills/gh-cli/SKILL.md index a5e1a63c7..699d8b4c8 100644 --- a/.agents/skills/gh-cli/SKILL.md +++ b/.agents/skills/gh-cli/SKILL.md @@ -1,10 +1,9 @@ --- description: GitHub CLI (gh) comprehensive reference for repositories, issues, pull requests, Actions, projects, releases, gists, codespaces, organizations, extensions, and all GitHub operations from the command line. 
metadata: - github-path: gh-cli - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/gh-cli + github-ref: refs/heads/main + github-repo: https://github.com/github/awesome-copilot github-tree-sha: 437437f7f20bcdbbdb3081cf164435a388c0a39a name: gh-cli --- diff --git a/.agents/skills/gh-stack/SKILL.md b/.agents/skills/gh-stack/SKILL.md index 7156fb442..aa8f28d5f 100644 --- a/.agents/skills/gh-stack/SKILL.md +++ b/.agents/skills/gh-stack/SKILL.md @@ -3,10 +3,9 @@ description: | Manage stacked branches and pull requests with the gh-stack GitHub CLI extension. Use when the user wants to create, push, rebase, sync, navigate, or view stacks of dependent PRs. Triggers on tasks involving stacked diffs, dependent pull requests, branch chains, or incremental code review workflows. metadata: author: github - github-path: gh-stack - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/gh-stack + github-ref: refs/tags/v0.0.1 + github-repo: https://github.com/github/gh-stack github-tree-sha: f900e01d90dd042162d0632dc31ee9541dd07841 version: 0.0.1 name: gh-stack diff --git a/.agents/skills/git-commit/SKILL.md b/.agents/skills/git-commit/SKILL.md index 539b59620..08a688710 100644 --- a/.agents/skills/git-commit/SKILL.md +++ b/.agents/skills/git-commit/SKILL.md @@ -3,10 +3,9 @@ allowed-tools: Bash description: 'Execute git commit with conventional commit message analysis, intelligent staging, and message generation. Use when user asks to commit changes, create a git commit, or mentions "/commit". 
Supports: (1) Auto-detecting type and scope from changes, (2) Generating conventional commit messages from diff, (3) Interactive commit with optional type/scope/description overrides, (4) Intelligent file staging for logical grouping' license: MIT metadata: - github-path: git-commit - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/git-commit + github-ref: refs/heads/main + github-repo: https://github.com/github/awesome-copilot github-tree-sha: 883a6a7466f55a9cd9f22cf1cce2d9333fc9b998 name: git-commit --- diff --git a/.agents/skills/github-actions-docs/SKILL.md b/.agents/skills/github-actions-docs/SKILL.md index fd8a6e436..73a18b878 100644 --- a/.agents/skills/github-actions-docs/SKILL.md +++ b/.agents/skills/github-actions-docs/SKILL.md @@ -1,11 +1,10 @@ --- description: Use when users ask how to write, explain, customize, migrate, secure, or troubleshoot GitHub Actions workflows, workflow syntax, triggers, matrices, runners, reusable workflows, artifacts, caching, secrets, OIDC, deployments, custom actions, or Actions Runner Controller, especially when they need official GitHub documentation, exact links, or docs-grounded YAML guidance. metadata: - github-path: github-actions-docs - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills - github-tree-sha: 3b0e64e871f9e83cb394e3ffff8cb9f6bda99deb + github-path: skills/github-actions-docs + github-ref: refs/heads/main + github-repo: https://github.com/xixu-me/skills + github-tree-sha: 518c85da555f91c346813aa5bb3cc62f9db8bfb3 name: github-actions-docs --- GitHub Actions questions are easy to answer from stale memory. Use this skill to ground answers in official GitHub documentation and return the closest authoritative page instead of generic CI/CD advice. 
@@ -25,10 +24,10 @@ Use this skill when the request is about: Do not use this skill for: -- A specific failing PR check, missing workflow log, or CI failure triage. Use the `gh-cli` skill. -- General GitHub pull request, branch, or repository operations. Use the `gh-cli` skill. -- CodeQL-specific configuration or code scanning guidance. Search the GitHub docs directly. -- Dependabot configuration, grouping, or dependency update strategy. Search the GitHub docs directly. +- A specific failing PR check, missing workflow log, or CI failure triage. Use `gh-fix-ci`. +- General GitHub pull request, branch, or repository operations. Use `github`. +- CodeQL-specific configuration or code scanning guidance. Use `codeql`. +- Dependabot configuration, grouping, or dependency update strategy. Use `dependabot`. ## Workflow @@ -95,8 +94,8 @@ Keep citations close to the claim they support. - Linking the GitHub Actions docs landing page when a narrower page exists - Mixing up reusable workflows and composite actions - Suggesting long-lived cloud credentials when OIDC is the better documented path -- Treating repo-specific CI debugging as a documentation question when it should be handed to `gh-cli` -- Letting adjacent domains absorb the request when a more focused search of GitHub docs is the sharper fit +- Treating repo-specific CI debugging as a documentation question when it should be handed to `gh-fix-ci` +- Letting adjacent domains absorb the request when `codeql` or `dependabot` is the sharper fit ## Bundled Reference diff --git a/.agents/skills/github-issues/SKILL.md b/.agents/skills/github-issues/SKILL.md index 1e7c03fa3..a3a0e7f20 100644 --- a/.agents/skills/github-issues/SKILL.md +++ b/.agents/skills/github-issues/SKILL.md @@ -1,10 +1,9 @@ --- description: Create, update, and manage GitHub issues using MCP tools. 
Use this skill when users want to create bug reports, feature requests, or task issues, update existing issues, add labels/assignees/milestones, set issue fields (dates, priority, custom fields), set issue types, manage issue workflows, link issues, add dependencies, or track blocked-by/blocking relationships. Triggers on requests like "create an issue", "file a bug", "request a feature", "update issue X", "set the priority", "set the start date", "link issues", "add dependency", "blocked by", "blocking", or any GitHub issue management task. metadata: - github-path: github-issues - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/github-issues + github-ref: refs/heads/main + github-repo: https://github.com/github/awesome-copilot github-tree-sha: 44219c182a1435252a1751313b99fb0a79882bb5 name: github-issues --- diff --git a/.agents/skills/gitops-cluster-debug/SKILL.md b/.agents/skills/gitops-cluster-debug/SKILL.md index e8bb35a2f..42213e897 100644 --- a/.agents/skills/gitops-cluster-debug/SKILL.md +++ b/.agents/skills/gitops-cluster-debug/SKILL.md @@ -4,10 +4,9 @@ description: | Debug and troubleshoot Flux CD on live Kubernetes clusters (not local repo files) via the Flux MCP server — inspects Flux resource status, reads controller logs, traces dependency chains, and performs installation health checks. Use when users report failing, stuck, or not-ready Flux resources on a cluster, reconciliation errors, controller issues, artifact pull failures, or need live cluster Flux Operator troubleshooting. 
license: Apache-2.0 metadata: - github-path: gitops-cluster-debug - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/gitops-cluster-debug + github-ref: refs/tags/v0.0.3 + github-repo: https://github.com/fluxcd/agent-skills github-tree-sha: f1463d4de7168c3561f680a2613c3b68578f09ca name: gitops-cluster-debug --- diff --git a/.agents/skills/gitops-knowledge/SKILL.md b/.agents/skills/gitops-knowledge/SKILL.md index a34b5261f..d0136e6f9 100644 --- a/.agents/skills/gitops-knowledge/SKILL.md +++ b/.agents/skills/gitops-knowledge/SKILL.md @@ -1,13 +1,12 @@ --- description: | - Flux CD and Flux Operator expert — answers questions and generates schema-validated YAML for all Flux CRDs (not repo auditing or live cluster debugging). Use when users ask about Flux concepts, want manifests for HelmRelease, Kustomization, GitRepository, OCIRepository, ResourceSet, FluxInstance, or any Flux resource, or need guidance on GitOps repository structure, multi-tenancy, OCI-based delivery, image tag automation, drift detection, preview environments, notifications, or the Flux Web UI and MCP Server. Whenever users mention FluxCD, Flux Operator, or any Flux CRD in a question or manifest generation context, always use this skill. + Flux CD and Flux Operator expert — answers questions and generates schema-validated YAML for all Flux CRDs (not repo auditing or live cluster debugging). Use when users ask about Flux concepts, want manifests for HelmRelease, Kustomization, GitRepository, OCIRepository, ResourceSet, FluxInstance, or any Flux resource, or when users need guidance on GitOps repository structure, bootstrapping Flux with Terraform, multi-tenancy, OCI-based delivery, image tag automation, drift detection, preview environments, notifications, or the Flux Web UI and MCP Server. 
license: Apache-2.0 metadata: - github-path: gitops-knowledge - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills - github-tree-sha: 31faa863a7f6f24ef0b6259f4e6ea6c4e1a87ea9 + github-path: skills/gitops-knowledge + github-ref: refs/tags/v0.0.3 + github-repo: https://github.com/fluxcd/agent-skills + github-tree-sha: 7edabe6723319a256d8aa31ad80a5969c3c521c4 name: gitops-knowledge --- # Flux CD Knowledge Base @@ -18,8 +17,7 @@ to answer questions accurately, generate correct YAML manifests, and explain Flu **Rules:** - Always use the exact apiVersion/kind combinations from the CRD table below. Never invent API versions. - Before generating YAML for any CRD, read its OpenAPI schema from `assets/schemas/` to verify field names, types, and enum values. -- ResourceSet templates use `<< >>` delimiters, NEVER `{{ }}` (Go templates are only used inside ImageUpdateAutomation commit messages). -- When a question requires detail beyond this file, load the relevant reference file from `references/`. Load at most 1-2 reference files per question. +- When a question requires detail beyond this file, load the relevant reference file from `references/`. - Prefer Flux Operator (FluxInstance) for cluster setup. Do not reference `flux bootstrap` or legacy `gotk-*` files. ## What is Flux @@ -119,7 +117,7 @@ spec: - name: infra-controllers # wait for this Kustomization to be Ready ``` -ResourceSets support richer dependencies with `readyExpr` (CEL expressions): +ResourceSets support richer dependencies with `readyExpr` (CEL expressions) and can depend on any type of resource: ```yaml spec: @@ -163,44 +161,6 @@ references it via `postBuild.substituteFrom` or `valuesFrom` will reconcile imme - **Helm chart** → `HelmRelease` - Both can deploy to remote clusters via `kubeConfig` and support `dependsOn`. 
-### How to Reference Helm Charts (3 Patterns) - -**Pattern 1 — HTTPS Helm repository:** -```yaml -# HelmRelease creates a HelmChart automatically -spec: - chart: - spec: - chart: metrics-server - version: "3.x" - sourceRef: - kind: HelmRepository - name: metrics-server -``` - -**Pattern 2 — OCI registry with chartRef (recommended):** -```yaml -# Separate OCIRepository + HelmRelease with chartRef -spec: - chartRef: - kind: OCIRepository - name: nginx-chart -``` - -**Pattern 3 — HelmChart from Git/Bucket source:** -```yaml -# Chart stored in Git, HelmRelease references HelmChart -spec: - chart: - spec: - chart: ./charts/my-app - sourceRef: - kind: GitRepository - name: my-repo -``` - -`chart.spec` and `chartRef` are **mutually exclusive** — use one or the other. - ### ResourceSet vs Kustomization? - **One set of manifests, one deployment** → `Kustomization` @@ -319,7 +279,6 @@ spec: values: crds: enabled: true - keep: false ``` ### 4. FluxInstance with OCI Sync (Gitless GitOps) @@ -431,13 +390,25 @@ spec: prune: true ``` -### 6. Image Automation Pipeline +### 6. Image Automation + +Flux supports two delivery models for updating container images and Helm chart versions. +Pick based on whether the team wants Git commits as the audit log for version changes: -Pipeline: ImageRepository → ImagePolicy → ImageUpdateAutomation. Mark images in YAML with -`# {"$imagepolicy": "namespace:policy-name"}` comment markers for automatic tag updates. +- **Git-based** — `ImageRepository` + `ImagePolicy` + `ImageUpdateAutomation` scan the + registry and commit tag bumps back to Git via `$imagepolicy` YAML markers. Requires + `image-reflector-controller` and `image-automation-controller` on the cluster. Load + `references/image-automation.md`. +- **Gitless** — `ResourceSet` + `ResourceSetInputProvider` (`type: OCIArtifactTag`) + scans the registry and re-renders the `ResourceSet` directly, upgrading the downstream + `HelmRelease` or `Kustomization` without touching Git. 
No bot credentials, no Git + poll lag, no extra controllers. Recommended default for Flux Operator deployments. + Load `references/gitless-image-automation.md`. -For complete YAML examples, tag filtering, commit message templates, and marker formats, -load `references/image-automation.md`. +Gitless is the better fit when the tag lives in Helm values, when tags should differ per +cluster in a fleet, or when the team doesn't want a bot writing to the repo. Git-based is +the better fit when PR-based approval of version bumps is required or when Git must remain +the canonical record of every deployed version. ### 7. Notifications (Slack, GitHub, Webhooks) @@ -457,18 +428,6 @@ load `references/notifications.md`. - HelmRelease: `spec.chart.spec` and `spec.chartRef` are mutually exclusive - FluxInstance: only one per cluster, must be named `flux` -**Required fields often forgotten:** -- `Kustomization.spec.prune` — must be set (true or false), controls garbage collection -- `Kustomization.spec.sourceRef` — must specify kind and name -- `HelmRelease.spec.interval` — required for reconciliation -- `Alert.spec.eventSources` — at least one source required - -**Wrong API versions:** -- Alert and Provider use `v1beta3`, not `v1` — `notification.toolkit.fluxcd.io/v1beta3` -- Receiver uses `v1` — `notification.toolkit.fluxcd.io/v1` -- HelmRelease uses `v2`, not `v1` or `v2beta1` — `helm.toolkit.fluxcd.io/v2` -- ImageRepository and ImagePolicy use `v1` — `image.toolkit.fluxcd.io/v1` - **HelmRelease strategy fields:** - Install/upgrade strategy is at `spec.install.strategy.name` and `spec.upgrade.strategy.name` - Always use `RetryOnFailure` — it retries without rollback or uninstall, avoiding downtime @@ -481,7 +440,6 @@ load `references/notifications.md`. mediaType: "application/vnd.cncf.helm.chart.content.v1.tar+gzip" operation: copy ``` -- Without `layerSelector`, the OCIRepository fetches the full OCI artifact, not the extracted chart. 
## Reference Index @@ -516,19 +474,5 @@ Load at most 1-2 reference files per question. Read schemas for field-level vali | Best practices, dependency management, remediation, versioning | `references/best-practices.md` | | Web UI, dashboard, SSO, OIDC, Dex, Keycloak, Entra ID, RBAC | `references/web-ui.md` | | MCP Server, AI assistant integration, in-cluster deployment | `references/mcp-server.md` | - -## FluxInstance Enums - -**Cluster types:** `kubernetes`, `openshift`, `aws`, `azure`, `gcp` - -**Cluster sizes:** `small` (5 concurrency, 512Mi), `medium` (10, 1Gi), `large` (20, 3Gi) - -**Components:** `source-controller`, `kustomize-controller`, `helm-controller`, -`notification-controller`, `image-reflector-controller`, `image-automation-controller`, `source-watcher` - -**Sync kinds:** `GitRepository`, `OCIRepository`, `Bucket` - -**Distribution variants:** `upstream-alpine`, `enterprise-alpine`, `enterprise-distroless`, `enterprise-distroless-fips` - -For enums of other CRDs (HelmRelease strategies, Provider types, ImagePolicy types, -ResourceSetInputProvider types, etc.), check the relevant reference file or OpenAPI schema. +| Terraform bootstrap of Flux Operator | `references/terraform-bootstrap.md` | +| Gitless image automation (ResourceSet + OCIArtifactTag) | `references/gitless-image-automation.md` | diff --git a/.agents/skills/gitops-knowledge/evals/evals.json b/.agents/skills/gitops-knowledge/evals/evals.json index c7b389a7f..97dbc396a 100644 --- a/.agents/skills/gitops-knowledge/evals/evals.json +++ b/.agents/skills/gitops-knowledge/evals/evals.json @@ -102,6 +102,26 @@ "The apps ResourceSet inputs include entries for frontend and backend", "Generated Kustomizations include spec.prune (required field)" ] + }, + { + "id": 6, + "prompt": "I'm bootstrapping Flux Operator on a new EKS cluster named 'staging' with Terraform. The fleet is a private GitHub repo at https://github.com/acme-corp/fleet.git. 
We authenticate with a GitHub App (app ID, installation owner, and PEM private key passed as Terraform variables). I need: (1) the Terraform module block that loads the FluxInstance manifest from our fleet repo, (2) the managed Secret for GitHub App authentication, (3) a runtime_info entry publishing CLUSTER_REGION=eu-west-2, (4) the FluxInstance YAML with Git sync and a kustomize patch that wires postBuild.substituteFrom to the flux-runtime-info ConfigMap. Show the recommended repo layout too.", + "expected_output": "The recommended sibling layout (terraform/ and clusters/ at repo root), a module block using controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes with ${path.root}/../clusters/... paths, a managed Secret containing githubAppID/githubAppInstallationOwner/githubAppPrivateKey, runtime_info with CLUSTER_REGION, and a FluxInstance with Git sync (provider: github) and a kustomize.patches entry adding postBuild.substituteFrom referencing flux-runtime-info.", + "files": [], + "expectations": [ + "The Terraform module source is 'controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes'", + "The 'instance_yaml' path uses '${path.root}/../clusters/' (NOT '${path.root}/clusters/')", + "The repo layout places 'terraform/' and 'clusters/' as siblings at the repo root (clusters/ is NOT nested under terraform/)", + "managed_resources.secrets_yaml composes a Secret with stringData keys githubAppID, githubAppInstallationOwner, and githubAppPrivateKey", + "The GitHub App Secret name matches the FluxInstance spec.sync.pullSecret (or is 'flux-system' if pullSecret is omitted)", + "managed_resources.runtime_info.data contains CLUSTER_REGION with value 'eu-west-2'", + "The FluxInstance spec.sync.kind is 'GitRepository' and spec.sync.url points at the GitHub repo", + "The FluxInstance spec.sync.provider is 'github' (required for GitHub App auth)", + "The FluxInstance spec.kustomize.patches adds postBuild.substituteFrom targeting the flux-system Kustomization and 
referencing the flux-runtime-info ConfigMap", + "The output references or loads the 'terraform-bootstrap.md' reference, OR demonstrates its patterns (sibling layout, ${path.root}/.. paths, managed_resources.runtime_info)", + "The output does NOT use the legacy 'fluxcd/flux' Terraform provider or 'flux bootstrap' CLI", + "The Terraform variable holding the GitHub App PEM is marked 'sensitive = true'" + ] } ] } diff --git a/.agents/skills/gitops-knowledge/references/best-practices.md b/.agents/skills/gitops-knowledge/references/best-practices.md index a6c22bd04..224b038f5 100644 --- a/.agents/skills/gitops-knowledge/references/best-practices.md +++ b/.agents/skills/gitops-knowledge/references/best-practices.md @@ -66,7 +66,6 @@ Prescriptive guidance for production Flux deployments with Flux Operator. - **Service account impersonation:** Use `serviceAccountName` on Kustomizations and HelmReleases with per-namespace ServiceAccounts and RoleBindings for least-privilege access. - ## Flux Operator Configuration - **Cluster sizing:** Match `cluster.size` to workload count — `small` for dev (<50 resources), @@ -84,8 +83,6 @@ Prescriptive guidance for production Flux deployments with Flux Operator. - **Alerts on failures:** Create Provider + Alert for Slack/Teams with `eventSeverity: error` watching all Kustomizations and HelmReleases. Every cluster should have failure notifications. -- **Receivers for fast sync:** Set up Receiver webhooks for GitHub/GitLab push events to trigger - immediate GitRepository reconciliation instead of waiting for the poll interval. - **Intervals:** Use short intervals for sources (`5m`) and longer intervals for appliers (`30m`). Receivers handle immediate triggers; intervals are the fallback. - **Prune: true:** Always set `prune: true` on Kustomizations to enable garbage collection @@ -97,3 +94,5 @@ Prescriptive guidance for production Flux deployments with Flux Operator. 
(Dex, Keycloak, Microsoft Entra ID, OpenShift) and can be exposed via Ingress. - **Image automation isolation:** Run image automation controllers on a dedicated cluster to isolate Git write access from production clusters. +- **Receivers for fast sync:** Set up Receiver webhooks for GitHub/GitLab push events to trigger + immediate GitRepository reconciliation instead of waiting for the poll interval. diff --git a/.agents/skills/gitops-knowledge/references/flux-operator.md b/.agents/skills/gitops-knowledge/references/flux-operator.md index a33256ff1..f8a7de68e 100644 --- a/.agents/skills/gitops-knowledge/references/flux-operator.md +++ b/.agents/skills/gitops-knowledge/references/flux-operator.md @@ -16,17 +16,18 @@ helm install flux-operator oci://ghcr.io/controlplaneio-fluxcd/charts/flux-opera **Terraform:** ```hcl -resource "helm_release" "flux_operator" { - name = "flux-operator" - namespace = "flux-system" - create_namespace = true - repository = "oci://ghcr.io/controlplaneio-fluxcd/charts/flux-operator" - chart = "flux-operator" +module "flux_operator_bootstrap" { + source = "controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes" + + gitops_resources = { + instance_yaml = file("${path.root}/../clusters/${var.cluster_name}/flux-system/flux-instance.yaml") + } } ``` -For self-managed installation via Flux itself, use a ResourceSet that depends on the -HelmRelease CRD: +For the full Terraform bootstrap workflow — load `references/terraform-bootstrap.md`. 
+ +To automatically update the operator when new versions are released: ```yaml apiVersion: fluxcd.controlplane.io/v1 @@ -48,6 +49,9 @@ spec: spec: interval: 10m url: oci://ghcr.io/controlplaneio-fluxcd/charts/flux-operator + layerSelector: + mediaType: "application/vnd.cncf.helm.chart.content.v1.tar+gzip" + operation: copy ref: semver: '*' - apiVersion: helm.toolkit.fluxcd.io/v2 @@ -65,12 +69,10 @@ spec: install: strategy: name: RetryOnFailure - retryInterval: 3m upgrade: force: true strategy: name: RetryOnFailure - retryInterval: 3m values: multitenancy: enabled: true @@ -119,7 +121,7 @@ spec: | Field | Type | Description | |-------|------|-------------| -| `distribution.version` | string | Semver range (`2.x`, `2.5.x`) or exact version | +| `distribution.version` | string | Semver range (`2.x`, `2.8.x`) or exact version | | `distribution.registry` | string | Container registry (e.g., `ghcr.io/fluxcd`) | | `distribution.variant` | string | `upstream-alpine`, `enterprise-alpine`, `enterprise-distroless`, `enterprise-distroless-fips` | | `distribution.artifact` | string | OCI artifact URL with distribution manifests | diff --git a/.agents/skills/gitops-knowledge/references/gitless-image-automation.md b/.agents/skills/gitops-knowledge/references/gitless-image-automation.md new file mode 100644 index 000000000..6d6ee936e --- /dev/null +++ b/.agents/skills/gitops-knowledge/references/gitless-image-automation.md @@ -0,0 +1,242 @@ +# Gitless Image Automation Reference + +Gitless image automation updates running workloads when new container images or Helm +chart versions are published — **without committing tag bumps back to Git**. It uses +Flux Operator's `ResourceSetInputProvider` to scan OCI registries and re-renders the +owning `ResourceSet` whenever a new tag or digest is detected. 
+ +Contrast with the Git-based variant (`ImageRepository` + `ImagePolicy` + +`ImageUpdateAutomation`, see `references/image-automation.md`): that one mutates YAML +in Git and relies on Flux to re-apply on the next reconciliation. Gitless skips Git +entirely and upgrades the `HelmRelease` (or `Kustomization`) directly. + +**When to pick which:** + +| Concern | Git-based | Gitless | +|---|---|---| +| Audit trail of tag bumps | Git commits | Flux events / notifications | +| PR-based approval workflow for version bumps | Yes (via `push.branch` + PR) | No | +| Requires a bot with write access to the repo | Yes | No | +| Requires `image-automation-controller` + `image-reflector-controller` | Yes | No — only `flux-operator` + helm/kustomize controllers | + +This reference assumes familiarity with `ResourceSet` and `ResourceSetInputProvider` +basics — load `references/resourcesets.md` first if those are new. + +## How It Works + +1. One `ResourceSetInputProvider` per artifact you want to track (Helm chart, + container image). Each scans its registry on its own schedule and exports the + latest matching `tag` and `digest`. +2. A `ResourceSet` consumes all providers via `inputsFrom` with + `inputStrategy: Permute` and templates the tag/digest into a `HelmRelease` or + `Kustomization`. +3. When a provider detects a new tag or digest, the `ResourceSet` re-renders. The + rendered `HelmRelease` values (or `Kustomization.spec.images`) change, which + triggers the downstream controller to upgrade the release. + +## Step 1 — Input Providers + +One provider per artifact. 
Each runs independently: + +```yaml +apiVersion: fluxcd.controlplane.io/v1 +kind: ResourceSetInputProvider +metadata: + name: podinfo-chart + namespace: apps + annotations: + fluxcd.controlplane.io/reconcileEvery: "15m" +spec: + type: OCIArtifactTag + url: oci://ghcr.io/stefanprodan/charts/podinfo + filter: + semver: ">=6.0.0" + limit: 1 +--- +apiVersion: fluxcd.controlplane.io/v1 +kind: ResourceSetInputProvider +metadata: + name: podinfo-image + namespace: apps + annotations: + fluxcd.controlplane.io/reconcileEvery: "5m" +spec: + type: OCIArtifactTag + url: oci://ghcr.io/stefanprodan/podinfo + filter: + includeTag: "latest" + limit: 1 +--- +apiVersion: fluxcd.controlplane.io/v1 +kind: ResourceSetInputProvider +metadata: + name: redis-image + namespace: apps +spec: + type: OCIArtifactTag + url: oci://docker.io/redis + filter: + semver: ">0.0.0-0" + includeTag: ".*-alpine$" + limit: 1 +``` + +### Filter Shapes + +- `semver: ">=6.0.0"` — highest matching semver tag. +- `includeTag: "latest"` — a specific floating tag. The exported `tag` stays `latest`; + the `digest` changes on every push, and that's what triggers redeployment. +- `semver` + `includeTag` — semver ordering within a tag variant (e.g. only `*-alpine` + builds). +- `excludeTag` — regex to drop tags (e.g. `".*-rc.*"` to skip release candidates). + +### `limit: 1` Is Required + +Without `limit: 1`, the provider emits every matching tag. Combined with +`inputStrategy: Permute` across multiple providers, this produces a combinatorial +explosion of `HelmRelease` instances (one per tag-combination). For image automation +you want exactly one release pinned to the latest tag — always set `limit: 1`. + +### Registry Authentication + +- **Public registries** — no auth needed. +- **Private registries with a pull secret** — reference a + `kubernetes.io/dockerconfigjson` `Secret` via `spec.secretRef.name`. 
+- **Cloud IAM / workload identity** — use `ACRArtifactTag`, `ECRArtifactTag`, or + `GARArtifactTag` instead of `OCIArtifactTag` and bind a `serviceAccountName` + annotated for the cloud (IRSA for EKS, Workload Identity for GKE, Workload Identity + for AKS). + +## Step 2 — ResourceSet Consuming the Providers + +```yaml +apiVersion: fluxcd.controlplane.io/v1 +kind: ResourceSet +metadata: + name: podinfo + namespace: apps +spec: + inputStrategy: + name: Permute + inputsFrom: + - kind: ResourceSetInputProvider + name: podinfo-chart + - kind: ResourceSetInputProvider + name: podinfo-image + - kind: ResourceSetInputProvider + name: redis-image + resources: + - apiVersion: source.toolkit.fluxcd.io/v1 + kind: OCIRepository + metadata: + name: podinfo + namespace: apps + spec: + interval: 12h + url: oci://ghcr.io/stefanprodan/charts/podinfo + ref: + tag: << inputs.podinfo_chart.tag >> + - apiVersion: helm.toolkit.fluxcd.io/v2 + kind: HelmRelease + metadata: + name: podinfo + namespace: apps + spec: + interval: 30m + releaseName: podinfo + chartRef: + kind: OCIRepository + name: podinfo + values: + image: + tag: "<< inputs.podinfo_image.tag >>@<< inputs.podinfo_image.digest >>" + redis: + enabled: true + tag: "<< inputs.redis_image.tag >>@<< inputs.redis_image.digest >>" +``` + +### Name-to-Key Transformation + +Template keys must be valid identifiers, so hyphens in the provider's `metadata.name` +are converted to underscores in the template: + +| `ResourceSetInputProvider` name | Template key | +|---|---| +| `podinfo-chart` | `inputs.podinfo_chart` | +| `podinfo-image` | `inputs.podinfo_image` | +| `redis-image` | `inputs.redis_image` | + +A provider named `podinfo-image` exporting `tag: 6.5.4` and `digest: sha256:abc…` is +consumed as `<< inputs.podinfo_image.tag >>` and `<< inputs.podinfo_image.digest >>`. + +### Pin Tag + Digest + +Always template **both** tag and digest — `tag@digest`. For immutable semver tags this +is belt-and-braces. 
For mutable floating tags (`latest`, `main`) the tag string never +changes but the digest does on every push, and the digest change is what causes the +rendered `HelmRelease` values to differ, triggering an upgrade. + +## Step 3 — Images Not Exposed in Helm Values + +Many community charts don't expose every container image in values. Use Kustomize +post-renderers on the `HelmRelease` (or `.spec.images` on a `Kustomization`) to patch +images after rendering: + +```yaml +apiVersion: helm.toolkit.fluxcd.io/v2 +kind: HelmRelease +spec: + postRenderers: + - kustomize: + images: + - name: ghcr.io/stefanprodan/podinfo + newTag: << inputs.podinfo_image.tag | quote >> + digest: << inputs.podinfo_image.digest | quote >> +``` + +For a Flux `Kustomization`: + +```yaml +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +spec: + images: + - name: ghcr.io/stefanprodan/podinfo + newTag: << inputs.podinfo_image.tag | quote >> + digest: << inputs.podinfo_image.digest | quote >> +``` + +## Notifications on Update + +Wire a `Provider` + `Alert` (see `references/notifications.md`) to emit Slack/Teams +notifications when the `ResourceSet` or resulting `HelmRelease` changes state. Event +sources to watch: `ResourceSet`, `HelmRelease`, `Kustomization`, `OCIRepository`. 
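+
+A minimal sketch of such an `Alert`, assuming a notification `Provider` named
+`slack` already exists in the namespace (names are illustrative):
+
+```yaml
+apiVersion: notification.toolkit.fluxcd.io/v1beta3
+kind: Alert
+metadata:
+  name: podinfo-updates
+  namespace: apps
+spec:
+  providerRef:
+    name: slack
+  eventSeverity: info
+  eventSources:
+    - kind: ResourceSet
+      name: podinfo
+    - kind: HelmRelease
+      name: podinfo
+    - kind: OCIRepository
+      name: podinfo
+```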
+ +## Operations + +Suspend a single provider to pause updates for one artifact without freezing the rest +of the app: + +```shell +flux-operator -n apps suspend rsip redis-image +flux-operator -n apps resume rsip redis-image +``` + +Suspend the `ResourceSet` to freeze the entire deployment workflow: + +```shell +flux-operator -n apps suspend rset podinfo +``` + +Force an immediate registry scan: + +```shell +flux-operator -n apps reconcile rsip podinfo-image +``` + +Dry-run the `ResourceSet` locally with mock inputs to verify the template renders: + +```shell +flux-operator build rset -f podinfo-resourceset.yaml \ + --inputs-from-provider static-inputs.yaml +``` diff --git a/.agents/skills/gitops-knowledge/references/helmrelease.md b/.agents/skills/gitops-knowledge/references/helmrelease.md index fec8a50d1..efaa1de7a 100644 --- a/.agents/skills/gitops-knowledge/references/helmrelease.md +++ b/.agents/skills/gitops-knowledge/references/helmrelease.md @@ -55,15 +55,9 @@ spec: chartRef: kind: OCIRepository name: cert-manager-chart - install: - strategy: - name: RetryOnFailure - retryInterval: 3m upgrade: - force: true strategy: name: RetryOnFailure - retryInterval: 3m values: crds: enabled: true diff --git a/.agents/skills/gitops-knowledge/references/kustomization.md b/.agents/skills/gitops-knowledge/references/kustomization.md index 32f78e4f8..2f5b3f87c 100644 --- a/.agents/skills/gitops-knowledge/references/kustomization.md +++ b/.agents/skills/gitops-knowledge/references/kustomization.md @@ -149,16 +149,12 @@ Without a `secretRef`, the decryption uses Cloud KMS with workload identity (GCP ## Health Checks -By default, Kustomization checks standard Kubernetes readiness conditions. 
Custom health checks -can be defined for resources that don't follow standard patterns: +By default, Kustomization checks standard Kubernetes readiness conditions when `wait: true`: ```yaml spec: - healthChecks: - - apiVersion: apps/v1 - kind: Deployment - name: my-app - namespace: my-app + wait: true + timeout: 5m ``` Custom health check expressions using CEL (fields: `current` required, `inProgress` and `failed` optional): diff --git a/.agents/skills/gitops-knowledge/references/repo-patterns.md b/.agents/skills/gitops-knowledge/references/repo-patterns.md index a222f918b..980e84215 100644 --- a/.agents/skills/gitops-knowledge/references/repo-patterns.md +++ b/.agents/skills/gitops-knowledge/references/repo-patterns.md @@ -254,7 +254,7 @@ Clusters: ### Directory Structure -**Fleet repo (d2-fleet):** +**Fleet repo:** ``` fleet/ ├── clusters/ @@ -275,7 +275,7 @@ fleet/ └── apps.yaml # Applications ResourceSet ``` -**Infrastructure repo (d2-infra):** +**Infrastructure repo:** ``` infra/ ├── components/ @@ -296,7 +296,7 @@ infra/ └── kube-prometheus-stack.yaml ``` -**Applications repo (d2-apps):** +**Applications repo:** ``` apps/ ├── components/ diff --git a/.agents/skills/gitops-knowledge/references/resourcesets.md b/.agents/skills/gitops-knowledge/references/resourcesets.md index b96bc1597..203b84a05 100644 --- a/.agents/skills/gitops-knowledge/references/resourcesets.md +++ b/.agents/skills/gitops-knowledge/references/resourcesets.md @@ -8,7 +8,7 @@ self-service platforms. 
## Canonical YAML -Based on the D2 reference architecture fleet pattern — deploys per-tenant namespaces with +Based on the Gitless reference architecture fleet pattern — deploys per-tenant namespaces with OCIRepository sources and Kustomizations: ```yaml @@ -25,7 +25,6 @@ spec: kind: ResourceSet name: infra ready: true - readyExpr: "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')" inputs: - tenant: "frontend" tag: "${ARTIFACT_TAG}" @@ -162,25 +161,76 @@ spec: ### Permute (Cartesian Product) -Computes the Cartesian product of all input sources. Useful when combining -independent dimensions. +`Permute` computes the Cartesian product of all input sources. In practice, the +**primary reason teams use `Permute` is not the cross-product but the namespaced field +access** it provides: fields from each source are placed under a key named after the +source object, so values from different providers (or from inline `.spec.inputs`) +don't collide. The canonical shape uses `limit: 1` on every +`ResourceSetInputProvider`, yielding exactly one permutation. + +**Canonical shape — multiple providers, one permutation.** Combining chart version + +image tag + image tag for a single `HelmRelease` (see +`references/gitless-image-automation.md` for the full image-automation pattern): + +```yaml +spec: + inputStrategy: + name: Permute + inputsFrom: + - kind: ResourceSetInputProvider + name: chart-version # limit: 1 → exports one tag + - kind: ResourceSetInputProvider + name: image-tag # limit: 1 → exports one tag+digest + # 1 × 1 = 1 permutation. Inside templates: + # << inputs.chart_version.tag >> + # << inputs.image_tag.tag >>@<< inputs.image_tag.digest >> +``` + +Without `Permute`, both providers' fields would merge into a flat `inputs.tag`, which +would clash. `Permute` keeps them under distinct keys. 
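+
+A sketch of the single input set this yields (values illustrative), with each
+provider's fields placed under its own normalized key:
+
+```yaml
+chart_version:
+  tag: "6.9.0"
+image_tag:
+  tag: "6.9.0"
+  digest: "sha256:abc123..."
+```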
+ +**True cross-product — static dimensions × one provider.** When an actual Cartesian +product is wanted, combine an inline `.spec.inputs` list of dimensions with `limit: 1` +providers: ```yaml spec: inputStrategy: name: Permute inputs: - - region: "us-east" - - region: "eu-west" + - region: us-east + - region: eu-west inputsFrom: - - name: apps-provider # provides [{app: "web"}, {app: "api"}] - # Produces 4 sets: us-east/web, us-east/api, eu-west/web, eu-west/api + - kind: ResourceSetInputProvider + name: image-tag # limit: 1 + # 2 × 1 = 2 permutations: one HelmRelease per region, both pinned to the current image. ``` -**Permute field access:** In Permute mode, inputs from different sources are accessed via -normalized source names. Normalization rules: uppercase → lowercase, spaces/punctuation -(including `-`) → underscores, non-alphanumeric removed. For example, a provider named -`git-tags` is accessed as `inputs.git_tags.tag`, not `inputs.tag`. +**Field access.** Each source's input set is placed under a key derived from the +*normalized name of the object* providing it — **NOT** under its source fields +directly. Normalization: uppercase → lowercase; spaces/punctuation (including `-`) → +underscores; non-alphanumeric removed. + +| Object providing inputs | Template key | +|---|---| +| `ResourceSetInputProvider` named `image-tag` | `inputs.image_tag` | +| `ResourceSetInputProvider` named `chart-version` | `inputs.chart_version` | +| Inline `.spec.inputs` on a `ResourceSet` named `my-apps` | `inputs.my_apps` | + +Two always-flat accessors exist alongside the namespaced keys: + +- `<< inputs.id >>` — auto-generated unique ID per permutation. +- `<< inputs.provider.{apiVersion,kind,name,namespace} >>` — metadata about the source. + +**Inline inputs under Permute — common gotcha.** When `.spec.inputs` is set and +`Permute` is on, those inline inputs are keyed under the **ResourceSet's own +normalized name**. 
So a ResourceSet named `my-apps` with inline input `{region: +us-east}` needs `<< inputs.my_apps.region >>`, not `<< inputs.region >>`. This +differs from `Flatten` (the default), where inline inputs are accessed flat. + +**Never omit `limit: 1`.** Exporting multiple tags from a single provider and letting +`Permute` cross them produces N redundant `HelmRelease`s for the same app — not what +you want. The operator stalls the `ResourceSet` at 10,000 permutations as a guard. ## Dependencies @@ -195,14 +245,13 @@ spec: name: infra namespace: flux-system ready: true - readyExpr: "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')" # Wait for a CRD to exist (no readiness check needed) - apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition name: helmreleases.helm.toolkit.fluxcd.io - # Wait for a Kustomization to be Ready + # Wait for a Kustomization to be Ready at creation time (no updates needed after that) - apiVersion: kustomize.toolkit.fluxcd.io/v1 kind: Kustomization name: infra-configs @@ -368,9 +417,9 @@ spec: ## Use Cases -### Multi-Component Orchestration (D2 Pattern) +### Multi-Component Orchestration (Gitless Pattern) -The D2 reference architecture uses a chain of ResourceSets: +The Gitless reference architecture uses a chain of ResourceSets: 1. **policies** — Creates ValidatingAdmissionPolicies (no inputs needed) 2. **infra** — Creates per-component namespaces + OCIRepository + Kustomization for infrastructure (cert-manager, monitoring) 3. **apps** — Creates per-tenant namespaces + OCIRepository + Kustomization for applications (frontend, backend) @@ -407,43 +456,11 @@ spec: # ... 
deploy app at the PR's commit SHA ``` -### Image Automation with ResourceSets - -Use a ResourceSet to deploy ImageRepository + ImagePolicy + ImageUpdateAutomation across -multiple component repositories: +### Gitless Image Automation with ResourceSets -```yaml -spec: - inputs: - - namespace: "apps" - repository: "https://github.com/org/apps.git" - pushBranch: "image-updates" - - namespace: "infra" - repository: "https://github.com/org/infra.git" - pushBranch: "image-updates" - resources: - - apiVersion: source.toolkit.fluxcd.io/v1 - kind: GitRepository - metadata: - name: << inputs.namespace >> - namespace: << inputs.namespace >> - spec: - url: << inputs.repository >> - ref: - branch: main - - apiVersion: image.toolkit.fluxcd.io/v1 - kind: ImageUpdateAutomation - metadata: - name: << inputs.namespace >> - namespace: << inputs.namespace >> - spec: - sourceRef: - kind: GitRepository - name: << inputs.namespace >> - git: - push: - branch: << inputs.pushBranch >> - update: - path: "./components" - strategy: Setters -``` +`ResourceSet` + `ResourceSetInputProvider` of `type: OCIArtifactTag` implements image +update automation without committing tag bumps to Git — the provider scans the +registry, the `ResourceSet` re-renders, and the downstream `HelmRelease` or +`Kustomization` upgrades directly. For the full pattern (provider filters, Permute +strategy, `tag@digest` pinning, post-renderers for images not in Helm values) load +`references/gitless-image-automation.md`. 
diff --git a/.agents/skills/gitops-knowledge/references/sources.md b/.agents/skills/gitops-knowledge/references/sources.md
index ef06b0af9..5f1d60424 100644
--- a/.agents/skills/gitops-knowledge/references/sources.md
+++ b/.agents/skills/gitops-knowledge/references/sources.md
@@ -21,10 +21,9 @@ spec:
     branch: main
   secretRef:
     name: git-credentials
-  ignore: |
-    # exclude non-deployment files
-    /*
-    !/deploy
+  sparseCheckout:
+    - deploy/
+    - charts/
 ```
 
 **Key spec fields:**
@@ -38,10 +37,10 @@ spec:
 | `ref.semver` | string | Semver constraint (e.g., `>=1.0.0 <2.0.0`) |
 | `ref.commit` | string | Exact commit SHA |
 | `secretRef.name` | string | Secret with credentials |
-| `ignore` | string | `.gitignore`-style patterns to exclude from artifact |
+| `sparseCheckout` | []string | Directories to check out; only these paths are included in the artifact |
 | `recurseSubmodules` | bool | Include Git submodules (default: false) |
 | `insecure` | bool | Skip TLS verification for HTTP URLs |
-| `verify.provider` | string | Signature verification provider (`cosign`) |
+| `verify.secretRef.name` | string | Secret with PGP public keys |
 
 **Authentication secrets:**
 
@@ -50,6 +49,9 @@ For HTTPS — Secret with `username` and `password` (or token) fields:
 stringData:
   username: git
   password: ghp_xxxxxxxxxxxx
+  ca.crt: # Optional CA certificate
+  tls.crt: # Optional TLS certificate for mTLS
+  tls.key: # Optional TLS key for mTLS
 ```
 
 For SSH — Secret with `identity` (private key) and `known_hosts` fields:
@@ -61,7 +63,7 @@ stringData:
   known_hosts: github.com ssh-ed25519 AAAA...
 ```
 
-For GitHub App — Secret with `githubAppID`, `githubAppInstallationID`, `githubAppPrivateKey`.
+For GitHub App — Secret with `githubAppID`, `githubAppPrivateKey`, and either `githubAppInstallationID` or `githubAppInstallationOwner`.
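+
+A sketch of the GitHub App secret (all values are placeholders):
+
+```yaml
+stringData:
+  githubAppID: "123456"
+  githubAppInstallationID: "78901234"
+  githubAppPrivateKey: |
+    -----BEGIN RSA PRIVATE KEY-----
+    ...
+    -----END RSA PRIVATE KEY-----
+```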
## OCIRepository @@ -98,6 +100,7 @@ spec: | `ref.semver` | string | Semver constraint for tag selection | | `ref.digest` | string | Exact digest (`sha256:...`) | | `secretRef.name` | string | Secret of type `kubernetes.io/dockerconfigjson` | +| `certSecretRef.name` | string | Secret with TLS CA and client certs for mTLS auth | | `provider` | string | Cloud OIDC provider for keyless auth: `aws`, `azure`, `gcp` | | `layerSelector.mediaType` | string | Filter OCI layers by media type | | `layerSelector.operation` | string | `extract` (default) or `copy` | @@ -155,6 +158,7 @@ spec: | `url` | string | Helm repository HTTPS URL | | `interval` | duration | How often to fetch the index | | `secretRef.name` | string | Secret with `username`/`password` for auth | +| `certSecretRef.name` | string | Secret with TLS CA and client certs for mTLS auth | | `provider` | string | Cloud OIDC provider for keyless auth | | `passCredentials` | bool | Pass credentials to chart download URLs | | `type` | string | `default` (HTTPS) or `oci` — but prefer `OCIRepository` for OCI registries | @@ -225,6 +229,7 @@ spec: | `region` | string | AWS region (default: `us-east-1`) | | `provider` | string | `generic` (default), `aws`, `azure`, `gcp` | | `secretRef.name` | string | Secret with `accesskey` and `secretkey` fields | +| `certSecretRef.name` | string | Secret with TLS CA and client certs for mTLS auth | | `insecure` | bool | Use HTTP instead of HTTPS | | `prefix` | string | S3 key prefix filter | | `ignore` | string | `.gitignore`-style patterns to exclude | diff --git a/.agents/skills/gitops-knowledge/references/terraform-bootstrap.md b/.agents/skills/gitops-knowledge/references/terraform-bootstrap.md new file mode 100644 index 000000000..6fc4444fb --- /dev/null +++ b/.agents/skills/gitops-knowledge/references/terraform-bootstrap.md @@ -0,0 +1,457 @@ +# Terraform Flux Operator Bootstrap Reference + +The 
[`controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes`](https://registry.terraform.io/modules/controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes/latest) +Terraform module bootstraps Flux Operator and a `FluxInstance` into a Kubernetes cluster +using a Kubernetes `Job`. Use this module when the cluster is provisioned with Terraform +and Flux should take over GitOps reconciliation afterwards. + +## Ownership Model + +The module solves the bootstrap ownership problem by splitting responsibilities: + +- **Terraform** manages only the ephemeral bootstrap mechanism: a namespace, RBAC, + mounted manifests, and a bootstrap `Job`. +- **Flux** and **Flux Operator** take over steady-state reconciliation of all GitOps + resources once the bootstrap `Job` completes. + +## Repository Layout + +The Terraform root module and the Flux manifests should both live at the repo root, +as siblings. `clusters/` stays at the top level so Flux can reconcile it as the fleet +source of truth, and the Terraform directory stays isolated to the bootstrap concern: + +```text +repo/ +├── terraform/ # Terraform root module +│ ├── main.tf +│ ├── providers.tf +│ └── variables.tf +└── clusters/ + └── staging/ # reconciled by Flux via FluxInstance.spec.sync.path + └── flux-system/ + ├── flux-instance.yaml # applied by the bootstrap Job + ├── flux-operator-values.yaml # shared between Terraform and the Flux-managed HelmRelease + ├── flux-operator.yaml # ResourceSet wrapping the Flux Operator HelmRelease + ├── runtime-info.yaml # Git-managed fields of flux-runtime-info (optional) + └── kustomization.yaml # configMapGenerator for flux-operator-values +``` + +Because the Terraform root is a subdirectory, reach up with `${path.root}/..` when +loading manifests, and parameterize the cluster name: + +```hcl +module "flux_operator_bootstrap" { + source = "controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes" + + gitops_resources = { + instance_yaml = 
file("${path.root}/../clusters/${var.cluster_name}/flux-system/flux-instance.yaml") + operator_chart = { + values_yaml = file("${path.root}/../clusters/${var.cluster_name}/flux-system/flux-operator-values.yaml") + } + } +} +``` + +Set `FluxInstance.spec.sync.path` to `clusters/${var.cluster_name}` so Flux reconciles +the same directory after bootstrap — pulling in `flux-operator.yaml` and the generated +`flux-operator-values` `ConfigMap` via `kustomization.yaml`. + +## Provider Configuration + +Callers must configure the HashiCorp `helm` and `kubernetes` providers. For local +clusters (KinD, minikube, Docker Desktop) the simplest setup uses `config_path`: + +```hcl +provider "kubernetes" { + config_path = "~/.kube/config" +} + +provider "helm" { + kubernetes = { + config_path = "~/.kube/config" + } +} +``` + +For cloud clusters, derive the connection from the cluster data source or module +outputs. Example for EKS: + +```hcl +provider "kubernetes" { + host = data.aws_eks_cluster.this.endpoint + cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data) + token = data.aws_eks_cluster_auth.this.token +} + +provider "helm" { + kubernetes = { + host = data.aws_eks_cluster.this.endpoint + cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data) + token = data.aws_eks_cluster_auth.this.token + } +} +``` + +The module does not require cluster connectivity during `terraform plan`, so it can +live in the **same root module** that creates the cluster. Wire the providers to the +cluster module's outputs and add `depends_on`: + +```hcl +module "eks" { + source = "terraform-aws-modules/eks/aws" + # ... 
+} + +provider "kubernetes" { + host = module.eks.cluster_endpoint + cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) + exec = { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } +} + +module "flux_operator_bootstrap" { + depends_on = [module.eks] + source = "controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes" + # ... +} +``` + +## GitOps vs Managed Resources + +The module draws a clear line between resources Flux owns and resources Terraform owns. + +**`gitops_resources`** — applied once with create-if-missing semantics, then reconciled +by Flux: +- `instance_yaml` (required): the `FluxInstance` manifest. +- `operator_chart`: Flux Operator Helm chart repository, version, and values. +- `prerequisites.yamls`: ordered manifests applied before the `FluxInstance` + (e.g. Karpenter `NodePool`s, Gateway API CRDs). +- `prerequisites.charts`: Helm charts installed before Flux (e.g. CSI drivers, CNI). + +**`managed_resources`** — reconciled on every bootstrap `Job` run with server-side apply, +correcting drift from manual `kubectl` changes: +- `secrets_yaml`: multi-document `Secret` manifest string reconciled into the + `FluxInstance` target namespace. Namespace must be omitted or match the target. +- `runtime_info`: key-value data published as a `ConfigMap` named `flux-runtime-info` + in the target namespace. + +Managed resources are tracked in an inventory and garbage-collected when removed from +the input. + +### Helm Chart Adoption by Flux + +Each `gitops_resources.prerequisites.charts[]` entry can set `flux_adoption_check` to +let Flux take over the release after bootstrap: + +- **Without `flux_adoption_check`** — the chart is installed create-if-missing. + Subsequent bootstrap runs skip re-install. +- **With `flux_adoption_check`** — the `Job` checks the referenced resource (e.g. a + `Deployment`) for Flux ownership labels. 
If Flux has adopted it, the chart is + skipped entirely. If not adopted, the full unlock/recover/upgrade flow runs — the + same flow used for the Flux Operator chart itself. + +Use `flux_adoption_check` when the chart will be re-declared as a `HelmRelease` in +the fleet repo and reconciled by Flux post-bootstrap. + +## Revision and Drift + +The bootstrap `Job` re-runs automatically whenever any input content changes. When all +inputs are unchanged, `terraform plan` shows zero diff. The `revision` input is a number +to bump for forcing a re-run without changing content. + +Secret values never appear in the Terraform state — `managed_resources` is marked +`sensitive` and only a SHA-256 hash of the content is persisted. + +## Runtime Info and Variable Substitution + +When `managed_resources.runtime_info` is set, the bootstrap `Job`: + +1. Creates a `ConfigMap` named `flux-runtime-info` in the `FluxInstance` target namespace + with the provided `data`, `labels`, and `annotations`. +2. Substitutes `${variable}` references in all input manifests (`instance_yaml`, + `prerequisites.yamls`, `operator_chart.values_yaml`, per-chart `values_yaml`) using + `flux envsubst --strict` before applying them. + +This lets input manifests use `${cluster_name}` style references resolved at bootstrap +time. For steady-state reconciliation of the same variables, patch the generated +`flux-system` `Kustomization` (created from `.spec.sync`) with `postBuild.substituteFrom` +referencing the same `ConfigMap`: + +```yaml +apiVersion: fluxcd.controlplane.io/v1 +kind: FluxInstance +metadata: + name: flux + namespace: flux-system +spec: + # ... 
+ kustomize: + patches: + - target: + kind: Kustomization + name: flux-system + patch: | + - op: add + path: /spec/postBuild + value: + substituteFrom: + - kind: ConfigMap + name: flux-runtime-info +``` + +### Co-Owning `flux-runtime-info` with Git + +Terraform-owned runtime info and Git-owned runtime info can coexist in the **same** +`flux-runtime-info` `ConfigMap` using server-side apply field ownership. Terraform +writes only the fields it knows (cloud region, account ID, cluster ID), while a +Git-managed `runtime-info.yaml` writes everything else (artifact tag, environment, +cluster name, domain). + +Split by authority: + +- **Terraform-owned fields** — values known only to the infra provisioner, e.g. + `CLUSTER_REGION`, `ACCOUNT_ID`. Set via `managed_resources.runtime_info.data`. +- **Git-owned fields** — values that belong in the fleet repo, e.g. `ARTIFACT_TAG`, + `ENVIRONMENT`, `CLUSTER_NAME`. Reconciled by Flux from a `ConfigMap` in + `clusters//flux-system/runtime-info.yaml`. + +The Git-managed `ConfigMap` must set `kustomize.toolkit.fluxcd.io/ssa: "Merge"` so +kustomize-controller merges its fields instead of replacing the whole `ConfigMap`, +preserving the fields Terraform owns: + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: flux-runtime-info + namespace: flux-system + labels: + toolkit.fluxcd.io/runtime: "true" + reconcile.fluxcd.io/watch: Enabled + annotations: + kustomize.toolkit.fluxcd.io/ssa: "Merge" +data: + ARTIFACT_TAG: latest + ENVIRONMENT: staging + CLUSTER_NAME: staging-1 + CLUSTER_DOMAIN: staging.example.com +``` + +Because Terraform and kustomize-controller act as different SSA field managers, each +owns the keys it sets — neither clobbers the other on reconciliation. + +## Node Scheduling + +When the cluster uses dedicated nodes with taints, configure affinity and tolerations +at three layers: + +**Bootstrap `Job`** — `job.affinity` and `job.tolerations`. Use the taint key/value +that the dedicated node pool carries (e.g. 
a `dedicated=flux:NoSchedule` taint on a +Karpenter `NodePool`): + +```hcl +job = { + tolerations = [{ + key = "dedicated" + operator = "Equal" + value = "flux" + effect = "NoSchedule" + }] +} +``` + +**Flux Operator** — `gitops_resources.operator_chart.values_yaml`: + +```hcl +gitops_resources = { + operator_chart = { + values_yaml = yamlencode({ + tolerations = [{ + key = "dedicated" + operator = "Equal" + value = "flux" + effect = "NoSchedule" + }] + }) + } +} +``` + +**Flux controllers** (source-controller, etc.) — `.spec.kustomize.patches` in the +`FluxInstance` manifest, targeting `kind: Deployment`. + +If node pools are managed by Karpenter or similar, pass the `NodePool` manifests as +`gitops_resources.prerequisites.yamls` so target nodes exist before the bootstrap +`Job` schedules. + +When the bootstrap `Job` must install a CNI plugin (e.g. Cilium) before pod networking +is available, set `job.host_network = true` so the `Job` runs on the host network. + +## Shared Operator Values File + +A single `flux-operator-values.yaml` can be reused by Terraform (bootstrap) and Flux +(steady-state). Place the file in the fleet repo, bundle it into a `ConfigMap` via +`configMapGenerator`, and reference it from a Flux-managed `HelmRelease` using +`valuesFrom`. The `reconcile.fluxcd.io/watch: Enabled` label on the `ConfigMap` triggers +helm-controller to reconcile when values change. + +During bootstrap, load the same file with `file(...)`: + +```hcl +operator_chart = { + values_yaml = file("${path.root}/../clusters/${var.cluster_name}/flux-system/flux-operator-values.yaml") +} +``` + +When certain fields must differ during bootstrap (e.g. disabling the web UI before +Gateway API CRDs exist), merge overrides in. 
Terraform's `merge()` is **shallow** — it +replaces top-level keys, so override entire top-level keys, not nested fields: + +```hcl +operator_chart = { + values_yaml = yamlencode(merge( + yamldecode(file("${path.root}/../clusters/${var.cluster_name}/flux-system/flux-operator-values.yaml")), + { web = { enabled = false } }, + )) +} +``` + +Wrap the Flux-managed operator `HelmRelease` in a `ResourceSet` that `dependsOn` the +CRDs its values reference (e.g. `httproutes.gateway.networking.k8s.io` when +`web.httpRoute.enabled: true`) so the operator is only upgraded after the CRDs are +installed. + +## Sync Source Authentication + +When `FluxInstance.spec.sync` points at a private Git repository or OCI registry, +compose the matching `Secret` into `managed_resources.secrets_yaml`. The `Secret` +name must match `spec.sync.pullSecret` (default `flux-system` if omitted). + +**Git PAT (GitLab, Bitbucket, classic GitHub):** + +```hcl +locals { + git_auth_secret = yamlencode({ + apiVersion = "v1" + kind = "Secret" + metadata = { name = "flux-system" } + type = "Opaque" + stringData = { + username = "git" + password = var.git_token + } + }) +} + +module "flux_operator_bootstrap" { + source = "controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes" + + managed_resources = { + secrets_yaml = local.git_auth_secret + } +} +``` + +**GitHub App** (preferred for GitHub — avoids PAT rotation): + +```hcl +locals { + git_auth_secret = yamlencode({ + apiVersion = "v1" + kind = "Secret" + metadata = { name = "flux-system" } + type = "Opaque" + stringData = { + githubAppID = var.github_app_id + githubAppInstallationOwner = var.github_app_installation_owner + githubAppPrivateKey = var.github_app_pem + } + }) +} +``` + +Set `FluxInstance.spec.sync.provider: github` and `pullSecret: flux-system` when using +a GitHub App. 
Mark `git_token` and `github_app_pem` as `sensitive = true` in their +`variable` blocks — the secret content never appears in the Terraform state regardless, +but marking sensitive prevents leaks in plan/apply output. + +A single expression can branch between auth modes by merging into `stringData` based on +which variables are set, so one module instance handles public repos, PAT-authenticated +repos, and GitHub App-authenticated repos uniformly. + +**OCI pull secret** — for `spec.sync.kind: OCIRepository` pointing at a private +registry (e.g. GHCR), emit a `kubernetes.io/dockerconfigjson` `Secret`. The same +secret can also be used as a Helm chart `pullSecret` on downstream `OCIRepository` +resources. Set `spec.sync.pullSecret` to this `Secret` name. + +The dockerconfig JSON is embedded in a YAML heredoc using single-quoted scalar +syntax, so any single quote inside the JSON must be doubled with +`replace(..., "'", "''")` to avoid breaking the YAML: + +```hcl +locals { + ghcr_auth_dockerconfigjson = jsonencode({ + auths = { + "ghcr.io" = { + username = "flux" + password = var.oci_token + auth = base64encode("flux:${var.oci_token}") + } + } + }) +} + +module "flux_operator_bootstrap" { + source = "controlplaneio-fluxcd/flux-operator-bootstrap/kubernetes" + + managed_resources = { + secrets_yaml = <<-YAML + apiVersion: v1 + kind: Secret + metadata: + name: ghcr-auth + type: kubernetes.io/dockerconfigjson + stringData: + .dockerconfigjson: '${replace(local.ghcr_auth_dockerconfigjson, "'", "''")}' + YAML + } +} +``` + +## Managed Secrets from External Stores + +Pull secret values from AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, or +HashiCorp Vault using Terraform `data` sources and compose them into +`managed_resources.secrets_yaml`: + +```hcl +data "aws_secretsmanager_secret_version" "git_credentials" { + secret_id = "flux/staging/git-credentials" +} + +managed_resources = { + secrets_yaml = <<-YAML + apiVersion: v1 + kind: Secret + metadata: + name: 
git-credentials + type: Opaque + stringData: + password: '${data.aws_secretsmanager_secret_version.git_credentials.secret_string}' + YAML +} +``` + +Drift from manual `kubectl` changes is corrected on every `Job` run. + +## Debugging Failed Bootstraps + +Set `debug_on_failure = true` to relay the bootstrap `Job` logs to Terraform output +when the `Job` fails or stalls. Requirements on the Terraform execution environment: + +- `bash` on `PATH` (Git Bash satisfies this on Windows) +- `kubectl` on `PATH`, configured with credentials for the target cluster +- the `hashicorp/null` provider (~> 3.2) declared in `required_providers` diff --git a/.agents/skills/gitops-repo-audit/SKILL.md b/.agents/skills/gitops-repo-audit/SKILL.md index 28c2e728f..c47135c3f 100644 --- a/.agents/skills/gitops-repo-audit/SKILL.md +++ b/.agents/skills/gitops-repo-audit/SKILL.md @@ -4,10 +4,9 @@ description: | Audit and validate Flux CD GitOps repositories by scanning local repo files (not live clusters) — runs Kubernetes schema validation, detects deprecated Flux APIs, reviews RBAC/multi-tenancy/secrets management, and produces a prioritized GitOps report. Use when users ask to audit, analyze, validate, review, or security-check a GitOps repo. license: Apache-2.0 metadata: - github-path: gitops-repo-audit - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/gitops-repo-audit + github-ref: refs/tags/v0.0.3 + github-repo: https://github.com/fluxcd/agent-skills github-tree-sha: 6a0bce48f3bda341fce4ad2c4c798b40b40b98c4 name: gitops-repo-audit --- diff --git a/.agents/skills/refactor/SKILL.md b/.agents/skills/refactor/SKILL.md index 098a338a8..8eb362415 100644 --- a/.agents/skills/refactor/SKILL.md +++ b/.agents/skills/refactor/SKILL.md @@ -2,10 +2,9 @@ description: Surgical code refactoring to improve maintainability without changing behavior. 
Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. license: MIT metadata: - github-path: refactor - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills + github-path: skills/refactor + github-ref: refs/heads/main + github-repo: https://github.com/github/awesome-copilot github-tree-sha: 01cfb6a412faf9d680880cc3f631c419121a9916 name: refactor --- diff --git a/.agents/skills/siderolabs/SKILL.md b/.agents/skills/siderolabs/SKILL.md deleted file mode 100644 index b0aef22aa..000000000 --- a/.agents/skills/siderolabs/SKILL.md +++ /dev/null @@ -1,321 +0,0 @@ ---- -compatibility: Requires talosctl and/or omnictl. Talos is API-driven and does not support SSH. -description: Deploy and operate Kubernetes clusters using Talos Linux and Omni. Use when generating/applying Talos machine configuration, managing cluster lifecycle in Omni, and troubleshooting common Talos/Omni workflows. -license: Apache-2.0 -metadata: - author: siderolabs - github-path: siderolabs - github-pinned: 5fe05e6dd751519bdc212d80499429651392ac7e - github-ref: 5fe05e6dd751519bdc212d80499429651392ac7e - github-repo: https://github.com/devantler-tech/skills - github-tree-sha: 07d1124f8a7f1a8ab72072e2491b9c30d49339e9 - mintlify-proj: siderolabs - version: "1.0" -name: siderolabs ---- -> **Version notice:** Verify the Talos and Omni documentation links in this file against the currently supported versions before following version-specific guidance. 
- -# SideroLabs best practices - -**Always consult the [Talos](https://docs.siderolabs.com/talos/v1.12/overview/what-is-talos) and [Omni](https://docs.siderolabs.com/omni/getting-started/getting-started) docs for configuration, latest features and best practices** - -If you are not already connected to the SideroLabs MCP server, [https://docs.siderolabs.com/mcp](https://docs.siderolabs.com/mcp), add it so that you can search more efficiently. - -Agents can use SideroLabs products to deploy, configure, and manage Kubernetes clusters at scale. - -SideroLabs created and currently manages two products: - -- **Talos Linux**: Talos Linux is an API-Managed, secure, immutable, and minimal operating system for Kubernetes. -- **Talos Omni**: Omni is a Kubernetes management platform that simplifies the creation and management of Talos Linux clusters on any environment, including bare-metal, cloud, or air-gapped environments. - -## Key concepts - -- **Machine Configuration**: YAML-based declarative configuration for each node -- **talosctl**: CLI tool for interacting with Talos API and managing machines -- **KubeSpan**: Automatic WireGuard mesh networking for hybrid clusters -- **System Extensions**: Container-based mechanism for adding functionality without modifying core OS -- **Image Factory**: Service for generating customized Talos images with extensions and kernel modules -- **Omni**: SaaS or self-hosted central point of access for multi-cluster management across environments - -## The Talos Linux image - -The Talos image is a bootable operating system image of Talos Linux that you use to install and run Talos on a machine (VM, bare metal, or cloud instance). - -Download the right Talos Linux image for your operating system from the [Image factory](https://factory.talos.dev/). 
- -## Integration - -Talos Linux and Omni integrate with: -- **Kubernetes**: Native Kubernetes API with RBAC, audit logging, and service accounts -- **Container Registries**: Docker Hub, Quay, GitHub Container Registry, private registries -- **Identity Providers**: SAML (Okta, Entra ID, Workspace One), OIDC (Tailscale), Keycloak -- **Cloud Platforms**: AWS, Azure, GCP, DigitalOcean, Hetzner, Scaleway, Akamai, Oracle, Exoscale, Upcloud, Vultr, CloudStack, OpenStack, Nocloud -- **Virtualization**: VMware, KVM, Hyper-V, Proxmox, OpenNebula, Xen, Vagrant -- **Networking**: WireGuard, Calico, Cilium, Multus CNI -- **Storage**: Rook/Ceph, local storage, Synology CSI, standard Kubernetes storage classes -- **Monitoring**: Metrics server, etcd metrics, Prometheus-compatible endpoints -- **Infrastructure-as-Code**: Cluster Templates, omnictl CLI - -## Install Talos and Omni CLI tools - -### Install via Homebrew (Recommended for macOS and Linux): - -```bash -brew install siderolabs/tap/sidero-tools -``` - -### Install talosctl with curl: - -```bash -curl -sL https://talos.dev/install | sh -``` - -### Install omnictl with curl: - -```bash -curl -sL https://talos.dev/install-omnictl | sh -``` - -## Workflows - -### Create a Talos Linux cluster - -1. Boot machines with a Talos Linux image. -2. `talosctl gen config --install-disk ` -3. Apply machine configuration: `talosctl apply-config --insecure --nodes --file ` -4. Bootstrap etcd **once**: `talosctl bootstrap --nodes ` -5. Fetch kubeconfig: `talosctl kubeconfig --nodes ` -6. Check health: `talosctl health --nodes ` -7. Validate Kubernetes registration: `kubectl get nodes` - -### Create a Talos Linux cluster with Omni - -1. Download Omni-managed boot media from Omni UI. -2. Boot machines so they register into Omni. -3. Create a cluster template YAML. -4. Validate the template: `omnictl cluster template validate -f ` -5. Sync declared state to Omni: `omnictl cluster template sync -f ` -6. 
Fetch kubeconfig: `omnictl kubeconfig -c ` -7. Download talosconfig: `omnictl talosconfig --cluster ` -8. Merge `talosconfig` and `kubeconfig` configuration: - ```bash - # Merge Talos configuration - talosctl config merge $HOME/Downloads/talosconfig.yaml - - # Merge kubeconfig (combine and flatten) - export KUBECONFIG=~/.kube/config:$HOME/Downloads/talos-default-kubeconfig.yaml - kubectl config view --flatten > ~/.kube/config - ``` -9. Verify nodes: `kubectl get nodes` - -## CLI reference - -### talosctl (allowed actions) - -- `talosctl logs ` - view service logs -- `talosctl upgrade --image ` - upgrade Talos -- `talosctl patch machineconfig --nodes -p ` - patch machine configuration -- `talosctl rollback` - rollback OS version -- `talosctl reset` - **destructive** wipe; requires explicit warning - -Additionally, refer to the [Talos for Linux Admins](https://docs.siderolabs.com/talos/v1.12/learn-more/talos-for-linux-admins) to learn about the Talos alternative for Linux commands. - -### omnictl CLI reference - -Here are some omnictl commands and their uses: - -- `omnictl apply --file ` - create and update a resource using a YAML file as input -- `omnictl cluster delete ` - delete all cluster resources. -- `omnictl config info` - show information about current context. - -## Local configuration file locations - -### talosctl -- `~/.talos/config` - -### omnictl -- Linux: `~/.talos/omni/config` -- macOS: `~/Library/Application Support/omni/config` -- Windows: `%USERPROFILE%\.talos\omni\config` - -## Common gotchas (things agents must not mess up) - -1. **No SSH on Talos.** Never suggest SSH or SSH-based commands. -2. **No in-node file edits.** Never suggest editing `/etc`, `/var`, or other files on Talos nodes, and never reference editors or shell sessions on Talos nodes. Local editing on the operator machine is OK when using supported API-driven workflows such as `talosctl`. -3. **No package managers.** Talos does not support apt, yum, apk, pacman, etc. -4. 
**No kubeadm.** Talos does not use kubeadm for initialization or upgrades. -5. **Bootstrap is one-time.** Never suggest retry loops or re-running bootstrap unless explicitly recovering from a failed creation. -6. **Be explicit when operations are destructive.** Especially `talosctl reset`. -7. **Do not modify system certificates or systemd units.** Talos uses API-managed services only. -8. **Do not bypass Omni reconciliation.** When a cluster is Omni-managed, changes must go through Omni. -9. **Never invent unsupported integrations or commands.** - -## Allowed agent behavior - -- Generate, patch, and validate Talos machine configuration. -- Suggest `talosctl` or `omnictl` commands. -- Provide step-by-step cluster lifecycle workflows. -- Refer to official documentation links. -- Summarize or explain Talos/Omni concepts. -- Warn users when an action is destructive. - -## Skills - -### Talos Linux cluster deployment - -- Deploy Talos Linux clusters on 15+ cloud platforms (AWS, Azure, GCP, DigitalOcean, Hetzner, Scaleway, etc.) -- Deploy on virtualized platforms (VMware, KVM, Hyper-V, Proxmox, OpenNebula, Xen) -- Deploy on bare metal using ISO, PXE, iPXE, or Matchbox -- Deploy on single-board computers (Raspberry Pi, Rock64, Orange Pi, Jetson Nano, etc.) -- Deploy locally using Docker, QEMU, or VirtualBox for testing -- Support for air-gapped deployments without internet access - -### Machine configuration management - -- Apply machine configuration via `talosctl apply-config` -- Edit machine configuration with `talosctl edit machineconfig` using interactive editor -- Apply JSON patches to machine configuration with `talosctl patch machineconfig` -- Retrieve current machine configuration with `talosctl get machineconfig` -- Support for immediate configuration updates without reboot for networking, logging, kubelet, kernel args, and more -- Reproducible machine configuration for consistent deployments - -### Upgrade Talos Linux Cluster -1. 
Use `talosctl upgrade` to initiate upgrade -2. Specify target Talos version -3. Upgrade rolls through nodes automatically -4. Control plane nodes upgraded with leader election -5. Worker nodes upgraded sequentially -6. Verify cluster health after upgrade - -### Backup and Restore Etcd -1. Create etcd backup with `talosctl etcd backup` -2. Store backup securely off-cluster -3. In case of disaster, restore from backup -4. Use `talosctl etcd restore` to recover cluster state -5. Verify cluster functionality after restoration - -### Networking Configuration -- Configure static IP addresses, DHCP, or dynamic network settings -- Set up network interfaces with bonds, bridges, and VLANs -- Configure WireGuard VPN for secure inter-node communication -- Enable KubeSpan for hybrid clusters spanning edge, datacenter, and cloud -- Virtual IP (VIP) configuration for high availability -- Host DNS configuration and egress domain filtering -- Predictable interface naming and device selectors -- Support for multihoming and corporate proxies - -### Cluster Scaling and Workload Management -- Scale clusters up by adding new machines to control plane or worker roles -- Scale clusters down by removing machines -- Deploy workloads using standard Kubernetes manifests -- Interactive dashboard for cluster visualization and management -- Support for workers running on control plane nodes -- Cluster autoscaling with Karpenter or Kubernetes Cluster Autoscaler - -### Security and Access Control -- Role-based access control (RBAC) for Talos API -- Certificate authority rotation and management -- Machine configuration OAuth for secure access -- SAML and OIDC authentication integration -- Disk encryption with Omni as Key Management Server -- SELinux support for enhanced security -- Image verification and secure boot support -- Break-glass emergency access for disaster recovery - -### Storage and Disk Management -- Configure disk layouts (system, user, resource partitions) -- Disk encryption with 
LUKS -- Swap configuration -- Support for existing volumes and raw volumes -- Disk management with layout templates and resource allocation - -### Container Runtime and Image Management -- Containerd configuration and management -- Image cache and pull-through cache for faster deployments -- Registry mirror configuration with authentication and TLS -- Static pod deployment -- Image factory for custom Talos images with system extensions -- Support for custom kernel modules and GPU drivers - -### Hardware and GPU Support -- NVIDIA GPU support (proprietary and open-source drivers) -- NVIDIA Fabric Manager for multi-GPU systems -- AMD GPU support -- Custom kernel argument configuration -- PCI device driver rebinding -- Hardware-specific platform configuration - -### System Extensions and Customization -- Build custom system extensions as container images -- Install system extensions during cluster creation or runtime -- Kernel module compilation and installation -- Custom kernel argument configuration -- Overlay system for additional customizations -- OCI base specification support for extension development - -### Cluster Operations and Maintenance -- Etcd backup and restore for disaster recovery -- Etcd maintenance and defragmentation -- Watchdog timer configuration for automatic recovery -- Cgroups analysis for resource monitoring -- Talos upgrade management with rolling updates -- Machine reset and factory reset capabilities -- Support bundle generation for troubleshooting - -### Omni Cluster Management -- Create and manage clusters from registered machines -- Cluster templates for declarative infrastructure-as-code -- Machine registration from bare metal (ISO, PXE), cloud (AWS, Azure, GCP, Hetzner), or manual provisioning -- Infrastructure providers for bare metal, cloud, and virtualization platforms -- Cluster autoscaling with dynamic machine provisioning -- Etcd backup and restore management -- Audit logging for compliance and security -- Talos configuration 
overrides and patches -- NTP server configuration -- Support bundle generation - -### Authentication and Authorization -- SAML integration with Okta, Unifi Identity Enterprise, Workspace One, Entra ID, Oracle Cloud -- OIDC login with Tailscale -- Access Control Lists (ACLs) for fine-grained permissions -- Role-based access control (Admin, User, None roles) -- Automatic user provisioning on first login -- Keycloak integration for self-hosted deployments - -### High Availability and Disaster Recovery -- 3-node control plane for HA clusters -- Etcd consensus-based fault tolerance -- Automatic etcd backups with configurable intervals -- Disaster recovery procedures for cluster restoration -- KubeSpan for hybrid cluster resilience - -### Configure Network for Hybrid Cluster with KubeSpan -1. Enable KubeSpan in machine configuration -2. Configure WireGuard settings (private key, listen port) -3. Add peer configurations with public keys and endpoints -4. Talos automatically discovers peers via discovery service -5. Full mesh WireGuard network established across all nodes -6. Cluster spans edge, datacenter, and cloud seamlessly - -### Build Custom Talos Image with System Extensions -1. Define system extensions as container images -2. Create schematic with extension references -3. Use Image Factory to generate custom image -4. Download ISO, kernel, or disk image -5. Boot machines with custom image -6. Extensions automatically installed during boot - -## Context - -**Talos Linux Philosophy**: Talos is designed with a single purpose - running Kubernetes. 
It removes unnecessary complexity by: -- Using API-driven configuration instead of SSH/files -- Maintaining immutable root filesystem -- Minimizing installed packages -- Defaulting to secure settings -- Supporting declarative, reproducible deployments - -**Deployment Models**: -- Standalone Talos clusters managed via talosctl -- Omni SaaS for managed multi-cluster deployments -- Self-hosted Omni for air-gapped or on-premises environments -- Hybrid deployments spanning multiple infrastructure types diff --git a/.github/workflows/update-skills.yaml b/.github/workflows/update-skills.yaml index 776bd2135..a22b60e70 100644 --- a/.github/workflows/update-skills.yaml +++ b/.github/workflows/update-skills.yaml @@ -15,8 +15,6 @@ jobs: permissions: contents: write pull-requests: write - uses: devantler-tech/reusable-workflows/.github/workflows/update-copilot-skills.yaml@6eea016969bceed84f4fb38c8e79fb04a2cadc31 # main @ 2026-04-18 + uses: devantler-tech/reusable-workflows/.github/workflows/update-copilot-skills.yaml@ee83c5e5cbfb7107701ddda9a7754aeab87d3a0a # main @ 2026-04-19 with: - skills-lock: skills-lock.json - agent: github-copilot - scope: project + dir: .agents/skills diff --git a/skills-lock.json b/skills-lock.json deleted file mode 100644 index 9c7f5cae8..000000000 --- a/skills-lock.json +++ /dev/null @@ -1,77 +0,0 @@ -{ - "version": 1, - "skills": { - "copilot-instructions-blueprint-generator": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "find-skills": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "gh-cli": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "gh-stack": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": 
"5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "git-commit": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "github-actions-docs": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "github-issues": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "gitops-cluster-debug": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "gitops-knowledge": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "gitops-repo-audit": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "refactor": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - }, - "siderolabs": { - "source": "devantler-tech/skills", - "sourceType": "github", - "ref": "v0.2.5", - "digest": "5fe05e6dd751519bdc212d80499429651392ac7e" - } - } -} From 6d88117f8d6c64e1e383b3c6b03ad564e8e52948 Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Sun, 19 Apr 2026 18:17:09 +0200 Subject: [PATCH 2/2] fix(update-skills): pin to merged reusable-workflows commit on main The previous PR-head SHA was unreachable from the default branch after reusable-workflows#207 squash-merged, triggering zizmor's "commit with no history in referenced repository" alert. Repoint at the merge commit on `main`. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/workflows/update-skills.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/update-skills.yaml b/.github/workflows/update-skills.yaml index a22b60e70..3640ee643 100644 --- a/.github/workflows/update-skills.yaml +++ b/.github/workflows/update-skills.yaml @@ -15,6 +15,6 @@ jobs: permissions: contents: write pull-requests: write - uses: devantler-tech/reusable-workflows/.github/workflows/update-copilot-skills.yaml@ee83c5e5cbfb7107701ddda9a7754aeab87d3a0a # main @ 2026-04-19 + uses: devantler-tech/reusable-workflows/.github/workflows/update-copilot-skills.yaml@5a334687e73feec66012e62db518716a8618417a # v1.39.0+ (post skills-lock refactor) with: dir: .agents/skills