
feat(ci): add self-hosted renovate alongside dependabot #737

Merged: mchmarny merged 2 commits into NVIDIA:main from njhensley:feat/renovate-self-hosted on May 5, 2026

@njhensley
Member

Summary

Introduces a self-hosted Renovate runner that shadows Dependabot during a soft-launch phase. Renovate is intended to replace Dependabot and extends its coverage by also tracking the tool versions pinned in .settings.yaml (the project's single source of truth), something Dependabot cannot do.

Motivation / Context

Dependabot covers gomod, github-actions, Dockerfile-only docker, and terraform. It cannot manage the 28 tool versions pinned in .settings.yaml, which currently drift because they are bumped by hand. Renovate's customManager handles them via # renovate: annotations and bundles updates per top-level YAML section.
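As a rough illustration of annotation-driven extraction, the Python sketch below pairs an annotation pattern with one pattern per value shape (plain scalar, image:tag string, YAML list item) and applies each combined pattern to a made-up .settings.yaml excerpt independently, roughly like Renovate's default `any` matchStringsStrategy. The YAML keys, versions, and exact patterns are assumptions for illustration, not the expressions in renovate.json5. Renovate's matchStrings spell named groups `(?<name>…)`, while Python uses `(?P<name>…)`.

```python
import re

# Hypothetical .settings.yaml excerpt: keys, datasources, and versions are
# illustrative placeholders, not copied from the real file.
SETTINGS_YAML = """\
build_tools:
  # renovate: datasource=github-releases depName=golangci/golangci-lint depType=build_tools
  golangci_lint: v1.62.0
testing_tools:
  # renovate: datasource=docker depName=kindest/node depType=testing_tools
  kind_node_image: "kindest/node:v1.32.0"
languages:
  go_versions:
    # renovate: datasource=golang-version depName=go depType=languages
    - 1.24.2
"""

# Annotation line shared by every shape; depType carries the top-level YAML key.
ANNOTATION = (
    r"#[ \t]*renovate:[ \t]*datasource=(?P<datasource>\S+)[ \t]+"
    r"depName=(?P<depName>\S+)[ \t]+depType=(?P<depType>\S+)[ \t]*\n"
)

# Value line after the annotation. The shapes are written to be mutually
# exclusive, because each combined pattern is applied to the whole file.
SHAPES = {
    # key: v1.2.3 (value may not contain '/' or ':')
    "plain": r'[ \t]*[\w-]+:[ \t]*"?(?P<currentValue>[^\s":/]+)"?[ \t]*$',
    # key: registry/image:tag (capture only the tag after the last ':')
    "image_tag": r'[ \t]*[\w-]+:[ \t]*"?[^\s"]+:(?P<currentValue>[^\s":]+)"?[ \t]*$',
    # - v1.2.3 (YAML list item)
    "list_item": r'[ \t]*-[ \t]*"?(?P<currentValue>[^\s"]+)"?[ \t]*$',
}


def extract(text: str) -> list[dict[str, str]]:
    """Return one dict per annotated dependency found by any shape."""
    deps = []
    for shape, value_pattern in SHAPES.items():
        for match in re.finditer(ANNOTATION + value_pattern, text, re.MULTILINE):
            deps.append({**match.groupdict(), "shape": shape})
    return deps


if __name__ == "__main__":
    for dep in extract(SETTINGS_YAML):
        print(dep["depType"], dep["depName"], dep["currentValue"], dep["shape"])
```

Because depType is captured straight from the annotation, grouping updates per top-level section reduces to bucketing the extracted dicts by that field.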

Dependabot stays in place; Phase E (cutover) is a follow-up PR that removes dependabot.yml + dependabot-auto-merge.yaml once Renovate has produced one healthy weekday cycle.

Fixes: N/A
Related: pattern modeled on NVIDIA/gpu-operator

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Build/CI/tooling

Component(s) Affected

  • Other: CI / dependency management

Implementation Notes

What's covered

| Source | Manager | Status vs Dependabot |
|---|---|---|
| go.mod | gomod (groups: kubernetes, golang-x, opencontainers) | preserved verbatim |
| .github/workflows/*.yaml, .github/actions/*/action.yml | github-actions (digest-pinned, grouped per cycle) | preserved + grouped |
| validators/*/Dockerfile | dockerfile | preserved |
| infra/**/*.tf | terraform (grouped) | preserved + grouped |
| site/package.json | npm (bundled with docs_tools into docs group) | NEW |
| recipes/components/*/values.yaml | helm-values (auto-detects image.repository/image.tag only) | NEW (partial) |
| .settings.yaml (28 tool entries) | custom regex manager | NEW |
| .settings.yaml nvkind SHA | dedicated git-refs digest customManager | NEW |
| .settings.yaml chainsaw_checksums | postUpgradeTasks → tools/update-chainsaw-checksums | NEW |
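The bundling in the table can be pictured as an ordered rule list where a later match overrides an earlier one, which is how Renovate applies packageRules. The Python sketch below is a hypothetical model of only the .settings.yaml section, docs, and supply-chain bundles; the Dep type, the underscore-to-hyphen naming, and the example package names are assumptions, and the per-manager gomod/terraform/github-actions groups are left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass(frozen=True)
class Dep:
    manager: str            # e.g. "regex", "github-actions", "npm"
    dep_name: str
    dep_type: str = ""      # top-level .settings.yaml key for regex-managed deps
    package_file: str = ""


# Cross-manager bundle: matched on package name regardless of manager.
SUPPLY_CHAIN_PATTERNS = ("anchore/*", "sigstore/*")


def group_for(dep: Dep) -> Optional[str]:
    """Return the bundle name; later rules override earlier ones."""
    group = None
    # Section bundle: one PR per .settings.yaml top-level key (build_tools -> build-tools).
    if dep.manager == "regex" and dep.dep_type:
        group = dep.dep_type.replace("_", "-")
    # Docs bundle: the docs_tools section (hugo) plus npm deps from site/package.json.
    if dep.dep_type == "docs_tools" or (
        dep.manager == "npm" and dep.package_file == "site/package.json"
    ):
        group = "docs"
    # Supply-chain bundle spans the regex and github-actions managers.
    if any(fnmatch(dep.dep_name, pattern) for pattern in SUPPLY_CHAIN_PATTERNS):
        group = "supply-chain"
    return group


if __name__ == "__main__":
    print(group_for(Dep("github-actions", "sigstore/cosign-installer")))
```

Putting the supply-chain rule last is what lets an anchore/* pin declared in a .settings.yaml section land in the same PR as the matching GitHub Action.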

Design decisions

  • Self-hosted via renovatebot/github-action with the built-in GITHUB_TOKEN. Events created with GITHUB_TOKEN do not start new workflow runs, so Renovate's PRs would not trigger CI on their own; the repo's /ok reviewer-comment policy re-fires CI on bot PRs, which sidesteps that limitation. No PAT/App needed.
  • Custom regex manager for .settings.yaml — annotations look like # renovate: datasource=<DS> depName=<DN> depType=<section> and drive section-based PR grouping. The depType is the YAML top-level key (e.g. build_tools, testing_tools). Three matchString shapes cover plain scalars, image:tag strings, and YAML list items.
  • nvkind (pinned by main-branch SHA) gets a dedicated git-refs digest customManager with a distinct # renovate-digest: annotation prefix so the broad regex doesn't double-extract it.
  • Group consolidation — section bundles in .settings.yaml (build-tools, linting, security-tools, testing-tools, docs-tools, test-images, languages); cross-manager bundles for supply-chain (anchore/* + sigstore/* spanning regex + github-actions); single docs bundle merging hugo (.settings.yaml) with site/package.json.
  • Conservative auto-merge — positive-listed. Only github-actions/gomod/npm patches plus an explicit allow-list of build/lint/security tooling auto-merge. Cluster-impacting pins (helm, kubectl, kind, kwok, chainsaw, karpenter, gpu-operator, kindest/node, CUDA, Go toolchain, node, hugo, nvkind) require human review even on patches.
  • Release cooldown: minimumReleaseAge: "3 days" globally, raised to "7 days" for auto-merged updates. Defends against malicious-publish ratchet attacks (event-stream / colors.js / node-ipc style). internalChecksFilter: "strict" filters out releases that are still too young, which keeps the dashboard clean.
  • Schedule: cron: "0 5 * * 1-5" (weekdays 05:00 UTC) is the single source of truth. Intentionally NO schedule: field in renovate.json5. Self-hosted Renovate runs only when the workflow fires, so a config-side schedule would be redundant and would create an alignment trap (mismatched windows would silently hold every PR in "Awaiting Schedule").
  • Supply-chain image pinning: both Renovate runner image references (Makefile validator + workflow) are digest-pinned to sha256:00185c0d… for consistency with the project's GitHub Actions pinning policy. The two references must be bumped in lockstep, as noted in inline comments.
  • Permissions — explicit contents: write, pull-requests: write, issues: write, and statuses: write. The last is load-bearing: Renovate calls POST /repos/{owner}/{repo}/statuses/{sha} after each branch creation to write a stability status check (tied to the cooldown / merge-confidence flow). Without it, the call 403s and Renovate's error handler maps the failure to a misleading "repository-changed" abort. Worth noting for any future workflows trying to use Renovate with restrictive permissions.
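A hypothetical Python model of the auto-merge and cooldown policy above: the deny-list of cluster-impacting pins is checked first, then the patch-only managers, then a tooling allow-list; the minimum release age is 7 days for anything that would auto-merge and 3 days otherwise. The allow-list entries, the Update type, and the assumption that allow-listed tools auto-merge patch and minor updates are illustrative only; the authoritative rules are the packageRules in renovate.json5.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Managers whose patch updates may auto-merge.
PATCH_AUTOMERGE_MANAGERS = {"github-actions", "gomod", "npm"}
# Hypothetical allow-list of build/lint/security tools.
TOOLING_ALLOWLIST = {"golangci-lint", "shellcheck", "yamllint"}
# Cluster-impacting pins listed in the PR: never auto-merge, even on patches.
HUMAN_REVIEW = {
    "helm", "kubectl", "kind", "kwok", "chainsaw", "karpenter", "gpu-operator",
    "kindest/node", "cuda", "go", "node", "hugo", "nvkind",
}


@dataclass
class Update:
    manager: str
    dep_name: str
    update_type: str        # "patch", "minor", "major", or "digest"
    released_at: datetime


def automerge(update: Update) -> bool:
    if update.dep_name in HUMAN_REVIEW:
        return False
    if update.manager in PATCH_AUTOMERGE_MANAGERS and update.update_type == "patch":
        return True
    # Assumption: allow-listed tools auto-merge patch and minor updates.
    return update.dep_name in TOOLING_ALLOWLIST and update.update_type in {"patch", "minor"}


def minimum_release_age(update: Update) -> timedelta:
    # 3 days globally, raised to 7 days for anything that would auto-merge.
    return timedelta(days=7 if automerge(update) else 3)


def ready(update: Update, now: datetime) -> bool:
    """True once the release is old enough to pass the cooldown."""
    return now - update.released_at >= minimum_release_age(update)


if __name__ == "__main__":
    now = datetime(2026, 5, 5, tzinfo=timezone.utc)
    patch = Update("gomod", "golang.org/x/net", "patch", now - timedelta(days=5))
    print(automerge(patch), ready(patch, now))  # True False
```

Checking the deny-list first is what makes "human review even on patches" hold no matter which later rule would otherwise match.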

CI integration

  • make lint-renovate validates .github/renovate.json5 against the same ghcr.io/renovatebot/renovate:43@sha256:… image the workflow uses. Intentionally NOT part of make lint (that target stays Docker-free); invoked directly by CI.
  • verify-renovate job in .github/workflows/merge-gate.yaml runs make lint-renovate only when .github/renovate.json5 changes (path-filtered via dorny/paths-filter, mirroring the verify-licenses pattern). Skip job + aggregate gate wiring keep the required-status semantics correct.
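The path-filtered gate can be reasoned about as a small function of the changed files and the lint outcome. The Python sketch below is hypothetical: the filter list and function names are invented, and it leaves out the separate skip job, showing only why a skipped verify-renovate result has to count as passing while a failed one must block. GitHub reports each needed job's result as success, failure, cancelled, or skipped.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical filter mirroring the dorny/paths-filter entry for the job.
RENOVATE_PATHS = [".github/renovate.json5"]


def renovate_config_changed(changed_files: List[str]) -> bool:
    return any(fnmatch(path, pattern) for path in changed_files for pattern in RENOVATE_PATHS)


def verify_renovate_result(changed_files: List[str], lint_passes: bool) -> str:
    """Result the aggregate gate would see for the verify-renovate job."""
    if not renovate_config_changed(changed_files):
        return "skipped"  # make lint-renovate never runs
    return "success" if lint_passes else "failure"


def aggregate_gate(needs_results: Dict[str, str]) -> bool:
    """Pass unless an upstream job failed or was cancelled."""
    return all(result in ("success", "skipped") for result in needs_results.values())


if __name__ == "__main__":
    results = {"verify-renovate": verify_renovate_result(["pkg/foo.go"], lint_passes=False)}
    print(results, aggregate_gate(results))
```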

Soft-launch plan

.github/dependabot.yml and .github/workflows/dependabot-auto-merge.yaml are deliberately untouched. Once this PR merges:

  1. Trigger Renovate via workflow_dispatch and confirm the dashboard issue + first PRs land cleanly.
  2. Watch one weekday cron cycle.
  3. Open a follow-up PR removing the Dependabot config and auto-merge workflow.

Testing

make lint-renovate    # passes against the digest-pinned validator image
actionlint .github/workflows/renovate.yaml .github/workflows/merge-gate.yaml   # clean

The configuration was exercised end-to-end on a fork (njhensley/aicr) before this PR:

  • 4+ Renovate runs producing real PRs across managers (regex, github-actions, npm, dockerfile).
  • verify-renovate gate firing exactly when .github/renovate.json5 changed and skipping otherwise.
  • statuses:write permission verified — the stability-status call writes successfully (renovate/stability-days status: Updates have met minimum release age requirement).
  • Cooldown filtering young releases as designed.
  • Group consolidation producing single PRs spanning multiple managers (renovate/supply-chain bundled anchore/* from both .settings.yaml regex and github-actions workflow files).
  • tools/update-chainsaw-checksums post-upgrade hook validated as idempotent (zero diff against the currently-pinned v0.2.14).
  • The verify-renovate path-filter validated by toggling unrelated changes.

This is a CI/config-only change — no Go code is touched.

Risk Assessment

  • Low — Dependabot remains in place; Renovate runs alongside it during the soft-launch phase. Worst case (Renovate misbehaves) → revert this PR; Dependabot was never disrupted.

Rollout notes: See .github/RENOVATE.md for the full coverage description and the policy choices documented inline in renovate.json5. Cutover is a follow-up PR removing the Dependabot config; this PR does not delete anything.

Checklist

  • Linter passes (make lint-renovate, actionlint on the new + modified workflows)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality — N/A (the configuration is the test surface; verify-renovate validates it on every PR that touches it)
  • I updated docs (.github/RENOVATE.md is the canonical reference)
  • Changes follow existing patterns in the codebase (digest pinning, dorny/paths-filter gating, .settings.yaml as source of truth)
  • Commits are cryptographically signed (git commit -S)

Introduces a self-hosted Renovate runner that shadows dependabot during
a soft-launch phase. Renovate is intended to replace dependabot and
extends its coverage by also tracking the tool versions pinned in
.settings.yaml (the project's single source of truth), something
dependabot cannot do.

## Coverage delta over dependabot

| Source | Manager | Status |
|---|---|---|
| go.mod | gomod | preserved (kubernetes/golang-x/opencontainers groups carried over verbatim) |
| .github/workflows/*, .github/actions/*/action.yml | github-actions | preserved + grouped into one PR/cycle |
| validators/*/Dockerfile | dockerfile | preserved |
| infra/**/*.tf | terraform | preserved + grouped |
| site/package.json | npm | NEW |
| recipes/components/*/values.yaml | helm-values | NEW (partial — auto-detects image.repository/image.tag shape) |
| .settings.yaml (28 tool entries) | custom regex manager | NEW |
| .settings.yaml chainsaw checksums | postUpgradeTasks → tools/update-chainsaw-checksums | NEW |
| .settings.yaml nvkind SHA | dedicated git-refs digest customManager | NEW |

## Key design decisions

- **Self-hosted via renovatebot/github-action** with the built-in
  GITHUB_TOKEN. Events created with GITHUB_TOKEN do not start new
  workflow runs, but the repo's /ok reviewer-comment policy re-fires
  CI on bot PRs, which sidesteps that limitation. No PAT/App needed.

- **Custom regex manager for .settings.yaml** — annotations look like
  `# renovate: datasource=<DS> depName=<DN> depType=<section>` and
  drive the per-section grouping below. Each section in the YAML maps
  to its own bundled PR.

- **git-refs digest customManager for nvkind** — uses a distinct
  `# renovate-digest:` annotation prefix so the broad regex doesn't
  double-extract it. Captures the 40-char SHA into currentDigest.

- **Group consolidation** — section-based bundles for .settings.yaml
  (build-tools, linting, security-tools, testing-tools, docs-tools,
  test-images, languages); cross-manager bundles for `supply-chain`
  (anchore/* + sigstore/* spanning regex + github-actions); single
  `docs` bundle merging hugo (.settings.yaml) with site/package.json.

- **Conservative auto-merge** — positive-listed. Only github-actions/
  gomod/npm patches plus an explicit allow-list of build/lint/security
  tooling auto-merge. Cluster-impacting pins (helm, kubectl, kind,
  kwok, chainsaw, karpenter, gpu-operator, kindest/node, CUDA, Go
  toolchain, node, hugo, nvkind) require human review.

- **Release cooldown** — `minimumReleaseAge: "3 days"` globally,
  raised to "7 days" for auto-merged updates. Defends against
  malicious-publish ratchet attacks (event-stream, colors.js,
  node-ipc, ...). `internalChecksFilter: "strict"` excludes too-young
  releases entirely from the dashboard.

- **Schedule** — workflow cron at `0 5 * * 1-5` (weekdays 05:00 UTC)
  is the single source of truth. There is intentionally no
  second-layer `schedule:` in renovate.json5 — self-hosted Renovate
  runs only when the workflow fires, so a config-side schedule is
  redundant and creates an alignment trap.

- **Supply-chain image pinning** — both Renovate runner image
  references (Makefile validator + workflow) digest-pinned to
  `sha256:00185c0d…` for consistency with the project's GitHub
  Actions pinning policy. Lockstep noted in inline comments.

- **Permissions** — explicit `contents: write`, `pull-requests: write`,
  `issues: write`, **and `statuses: write`**. The last is load-bearing:
  Renovate calls `POST /repos/{owner}/{repo}/statuses/{sha}` after each
  branch creation to write a stability status check (tied to the
  cooldown / merge-confidence flow). Without it, the call 403s and
  Renovate's error handler maps the failure to "repository-changed",
  silently aborting the run.

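To see why the distinct prefix prevents double extraction, the Python sketch below runs a broad `# renovate:` pattern and a separate `# renovate-digest:` pattern over a made-up excerpt. The annotation fields, YAML keys, and SHA are placeholders, and both patterns are simplified stand-ins for the ones in renovate.json5: the broad pattern cannot match the digest annotation because `renovate-digest:` never contains the literal `renovate:`, and the digest pattern only accepts a 40-character hex SHA, captured as currentDigest.

```python
import re

# Hypothetical .settings.yaml excerpt; keys, fields, and the SHA are placeholders.
SETTINGS_YAML = """\
build_tools:
  # renovate: datasource=github-releases depName=golangci/golangci-lint depType=build_tools
  golangci_lint: v1.62.0
testing_tools:
  # renovate-digest: datasource=git-refs depName=nvkind depType=testing_tools
  nvkind_sha: 0123456789abcdef0123456789abcdef01234567
"""

# Annotation fields plus the start of the following "key: " line.
FIELDS = (
    r"[ \t]*datasource=(?P<datasource>\S+)[ \t]+depName=(?P<depName>\S+)"
    r"[ \t]+depType=(?P<depType>\S+)[ \t]*\n[ \t]*[\w-]+:[ \t]*"
)

# Broad manager: needs the literal "renovate:" prefix, so it skips digest annotations.
BROAD = re.compile(r"#[ \t]*renovate:" + FIELDS + r"(?P<currentValue>\S+)")
# Digest manager: its own prefix, and the value must be a 40-char hex SHA.
DIGEST = re.compile(r"#[ \t]*renovate-digest:" + FIELDS + r"(?P<currentDigest>[0-9a-f]{40})\b")


def extract(pattern: re.Pattern, text: str) -> list:
    return [m.groupdict() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print([d["depName"] for d in extract(BROAD, SETTINGS_YAML)])  # ['golangci/golangci-lint']
    print(extract(DIGEST, SETTINGS_YAML)[0]["currentDigest"])
```
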
## CI integration

- `make lint-renovate` validates .github/renovate.json5 against the
  same `ghcr.io/renovatebot/renovate:43@sha256:…` image the workflow
  uses. NOT part of the `make lint` aggregate (that target stays
  Docker-free); invoked directly by CI.

- `verify-renovate` job in `.github/workflows/merge-gate.yaml` runs
  `make lint-renovate` only when `.github/renovate.json5` changes
  (path-filtered via dorny/paths-filter, mirroring the
  verify-licenses pattern). Skip job + aggregate gate wiring keep
  the required-status semantics correct.

## Soft-launch plan

See `.github/RENOVATE.md` for the Phase A–E rollout plan. The dependabot
config and auto-merge workflow remain in place until Phase D confirms
Renovate is healthy on the live repo end-to-end. Phase E is a
follow-up PR removing them.

This branch was extracted from a soft-launch practice exercise on the
maintainer's fork (njhensley/aicr) where every component was exercised
end-to-end: 4+ Renovate runs producing real PRs across manager types,
verify-renovate gate firing path-filtered, status checks writing
correctly, cooldown filtering young releases, group consolidation
producing single PRs across managers, and the chainsaw post-upgrade
hook validated as idempotent. The thrash from the practice exercise
(several PRs chasing a misleading "repository-changed" error message
that turned out to be a missing statuses:write permission) is squashed
out of this branch — only the converged final state lands here.

@njhensley force-pushed the feat/renovate-self-hosted branch from fd7d152 to d10d943 on May 4, 2026 23:48

@mchmarny (Member) left a comment

Approving. Substantial, well-documented infra enhancement with a conservative soft-launch posture (Dependabot stays, auto-merge positive-listed + 7-day cooldown, cluster-impacting pins reserved for human review). The inline policy comments and RENOVATE.md make the design decisions reviewable in isolation, and the regression breadcrumbs (the configurationFile: / dual-extraction note, the statuses:write debug story) will save someone hours later.

CI is green on the verify-renovate gate; the GPU jobs in progress are unrelated to the config surface. Three non-blocking inline notes (vendor handling, regex edge case, sed pre/post-check kudos) — none gate merge.

Inline comment threads: .github/renovate.json5 (2), tools/update-chainsaw-checksums (1)
@mchmarny added this to the v1 milestone on May 5, 2026
@mchmarny merged commit 2beb2a8 into NVIDIA:main on May 5, 2026
30 checks passed