ci: reduce PR feedback loop with targeted caching #1294

Merged
jeremyeder merged 8 commits into ambient-code:main from jeremyeder:feature/ci-improvements on Apr 17, 2026

@jeremyeder jeremyeder commented Apr 11, 2026

Summary

  • E2E Docker layer caching: Replace plain docker build with docker/build-push-action@v7 using GHA cache. Reads layers from the components-build-deploy workflow's cache scopes (frontend-amd64, etc.) so E2E gets warm layers from the last main build. Expected savings: 3-5 min.
  • E2E kind binary caching: Cache the kind binary between runs with actions/cache@v4. Pin version in env var for cache key stability. ~15s saved per run.
  • Lint golangci-lint consolidation: Replace two sequential golangci-lint passes (default + test tags) with a single pass using --build-tags=test (superset). ~30s saved.
  • Unit tests pipx: Replace uncached pip install junit2html with pipx run junit2html (pipx is pre-installed on GHA runners). ~10s saved.

Current PR wall-clock P50: ~10.4m (E2E bottleneck)
Expected PR wall-clock P50: ~5-7m
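
As a rough check on the estimate above, the arithmetic can be sketched as follows. It assumes the E2E job remains the critical path (the summary calls it the bottleneck), so only the E2E savings (3-5 min of layer caching plus ~15s for the kind binary) shorten wall-clock time. The lint and unit-test savings would not shorten it if those jobs run in parallel with E2E. The helper names are illustrative, not part of the PR.

```shell
#!/usr/bin/env bash
# Rough estimate of the new P50, assuming E2E remains the critical path.
# All figures come from the PR summary; the helper names are hypothetical.

# remaining_seconds BASELINE_S SAVED_S -> prints BASELINE_S - SAVED_S
remaining_seconds() {
  echo $(( $1 - $2 ))
}

# format_mmss SECONDS -> prints minutes and zero-padded seconds
format_mmss() {
  printf '%dm%02ds\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

baseline=624                     # ~10.4 min current P50, in seconds
low_saving=$(( 3 * 60 + 15 ))    # 3 min of cached layers + 15 s for kind
high_saving=$(( 5 * 60 + 15 ))   # 5 min of cached layers + 15 s for kind

format_mmss "$(remaining_seconds "$baseline" "$low_saving")"    # slower end of the estimate
format_mmss "$(remaining_seconds "$baseline" "$high_saving")"   # faster end of the estimate
```

Under these assumptions the result lands at roughly 5 to 7 minutes, consistent with the stated expectation.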

Test plan

  • Verify E2E workflow runs successfully with cached builds on a PR that changes component code
  • Verify E2E workflow still pulls :latest for unchanged components
  • Verify lint workflow passes with single golangci-lint pass
  • Verify unit-tests backend job generates HTML report via pipx
  • Confirm GHA cache scopes don't conflict with components-build-deploy

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores

    • CI workflows tightened: actions pinned to specific commits, E2E image handling changed to conditional build-or-pull steps with cache-aware builds, added a step to show built images, and introduced caching for the cluster binary to speed E2E runs
    • Linting consolidated into a single, more efficient pass
    • Unit-test reporting updated to avoid repeated installs and speed up report generation
  • Documentation

    • Added CI improvements plan and design specification detailing changes and validation steps

coderabbitai Bot commented Apr 11, 2026

Walkthrough

Pins multiple GitHub Action references to commit SHAs; refactors E2E image preparation into per-image conditional BuildKit build vs pull-and-retag steps with cache scopes and a “show built images” step; adds actions/cache for the kind binary using a workflow-level KIND_VERSION; consolidates Go lint passes; switches JUnit HTML generation to pipx run junit2html.
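
The per-image build-or-pull decision described in the walkthrough might look roughly like the sketch below. It is not the workflow's actual step: a plain docker build stands in for docker/build-push-action, and prepare_image, the DRY_RUN switch, and the image name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the per-image decision: build when the component changed,
# otherwise reuse the published :latest image under the :e2e-test tag.
# prepare_image and DRY_RUN are hypothetical; plain `docker build` stands
# in for docker/build-push-action here.

# prepare_image IMAGE CHANGED CONTEXT
prepare_image() {
  local image="$1" changed="$2" context="$3"
  local run=""
  # With DRY_RUN=1 the docker commands are printed instead of executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

  if [ "$changed" = "true" ]; then
    $run docker build -t "${image}:e2e-test" "$context"
  else
    $run docker pull "${image}:latest"
    $run docker tag "${image}:latest" "${image}:e2e-test"
  fi
}

# Dry run for an unchanged component (image name is a placeholder).
DRY_RUN=1 prepare_image quay.io/example/frontend false components/frontend
```

Either branch ends with an image tagged :e2e-test, so later steps do not need to know which path was taken.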

Changes

| Cohort / File(s) | Summary |
|---|---|
| E2E workflow (.github/workflows/e2e.yml) | Adds workflow-level env.KIND_VERSION and top-level permissions; pins multiple action uses to commit SHAs; replaces a single build/pull script with four conditional steps that either run docker/build-push-action@... with cache-from/cache-to when detect-changes outputs true, or docker pull :latest + retag to :e2e-test when unchanged; adds a “Show built images” step; caches kind via actions/cache@v4 and downloads kind only on cache miss using KIND_VERSION. |
| Lint workflow (.github/workflows/lint.yml) | Pins action uses to commit SHAs; consolidates backend golangci-lint passes by merging default + --build-tags=test into a single invocation (--build-tags=test); updates other Go lint jobs to pinned golangci-lint action SHAs while preserving existing args. |
| Unit tests workflow (.github/workflows/unit-tests.yml) | Pins several action uses and kubeflow/pipelines junit-summary to commit SHAs; replaces pip install junit2html + direct invocation with pipx run junit2html for HTML report generation while keeping existing report paths and conditional/continue-on-error behavior. |
| Docs: plan & design (docs/superpowers/plans/2026-04-11-ci-improvements.md, docs/superpowers/specs/2026-04-11-ci-improvements-design.md) | Adds plan and design documents describing the E2E BuildKit/cache approach, kind binary caching, lint consolidation, unit-test reporting change, YAML validation steps, risk/mitigation notes, and expected PR feedback improvements. |
| Docs lint workflow (.github/workflows/docs-lint.yml) | Pins actions/checkout and actions/setup-node to commit SHAs and upgrades Vale binary from v3.12.1 to v3.14.1; keeps Node version and npm cache config unchanged. |
sequenceDiagram
    actor Workflow
    participant Detect as "detect-changes"
    participant Builder as "docker/build-push-action"
    participant Registry as "Container Registry"
    participant Cache as "GHA cache (layers)"
    participant KindCache as "actions/cache (kind)"
    participant Runner as "Job runner"

    Workflow->>Detect: run per-image detection
    Detect-->>Workflow: outputs (changed / unchanged)
    alt changed
        Workflow->>Builder: build image (cache-from/cache-to)
        Builder->>Cache: pull/push layer caches
        Builder->>Registry: push :e2e-test
    else unchanged
        Workflow->>Registry: docker pull :latest
        Registry-->>Workflow: image
        Workflow->>Registry: docker tag -> :e2e-test
    end
    Workflow->>Runner: show built images

    %% kind binary flow
    Workflow->>KindCache: restore key(KIND_VERSION)
    alt cache hit
        KindCache-->>Workflow: kind present
    else cache miss
        Workflow->>Registry: download kind (using KIND_VERSION)
        Workflow->>KindCache: save kind to cache
    end
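
The kind binary flow at the end of the diagram can be sketched in shell as below. This only illustrates the hit/miss logic. In the workflow, actions/cache handles restore and save. The KIND_VERSION value, the key format, and the helper names are assumptions. The download URL follows the pattern in kind's install documentation.

```shell
#!/usr/bin/env bash
# Sketch of the kind caching flow: derive a cache key from OS and version,
# and download the binary only when no cached copy is present.
# The version pin, key format, and helper names are assumptions.

KIND_VERSION="${KIND_VERSION:-v0.27.0}"   # hypothetical pin
KIND_DIR="${KIND_DIR:-$HOME/k8s-tools}"   # directory named in the commit log

# kind_cache_key OS VERSION -> prints a key that changes on version bumps
kind_cache_key() {
  echo "kind-$1-$2"
}

# ensure_kind: no-op on cache hit, download on cache miss.
ensure_kind() {
  if [ -x "${KIND_DIR}/kind" ]; then
    echo "cache hit"
    return 0
  fi
  echo "cache miss"
  mkdir -p "$KIND_DIR"
  curl -fsSL -o "${KIND_DIR}/kind" \
    "https://kind.sigs.k8s.io/dl/${KIND_VERSION}/kind-linux-amd64"
  chmod +x "${KIND_DIR}/kind"
}
```

Because the version is part of the key, bumping KIND_VERSION produces a new key and forces a fresh download instead of reusing a stale binary.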
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format (ci: scope) and accurately describes the main objective: targeted caching improvements to reduce PR feedback latency. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Performance And Algorithmic Complexity | ✅ Passed | PR modifies only GitHub Actions workflow YAML and documentation files (6 files, 0 application code files). No performance regressions possible. |
| Security And Secret Handling | ✅ Passed | PR contains no hardcoded secrets, tokens, plaintext credentials, or sensitive data exposure. All GitHub Actions secret references use proper context syntax and token masking is correctly implemented. |
| Kubernetes Resource Safety | ✅ Passed | PR modifies only GitHub Actions workflows and documentation; Kubernetes Resource Safety check targets manifest-level issues not applicable to these file types. |

@jeremyeder jeremyeder enabled auto-merge (squash) April 11, 2026 05:02

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/lint.yml (1)

144-149: ⚠️ Potential issue | 🟠 Major

Backend lint skips production-only files with --build-tags=test.

components/backend/handlers/k8s_clients_for_request_prod.go has //go:build !test and won't be linted by the current invocation. Add a second pass without build-tag restriction to catch production-only code.

Proposed fix
      - name: Run golangci-lint (all build tags)
        uses: golangci/golangci-lint-action@v9
        with:
          version: latest
          working-directory: components/backend
          args: --timeout=5m --build-tags=test
+      - name: Run golangci-lint (default build tags)
+        uses: golangci/golangci-lint-action@v9
+        with:
+          version: latest
+          working-directory: components/backend
+          args: --timeout=5m
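
To check whether this finding applies, one could list the Go files whose build constraint excludes the test tag. The sketch below assumes GNU or BSD grep and matches only a plain `//go:build !test` line. Compound expressions such as `!test && linux` and legacy `// +build` lines are not covered. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# List Go files whose build constraint excludes the "test" tag, i.e. files
# that a single `--build-tags=test` lint pass would skip.
# Only plain `//go:build !test` lines are matched; the helper is hypothetical.

# files_excluded_by_test_tag DIR -> prints matching file paths, sorted
files_excluded_by_test_tag() {
  grep -rlE --include='*.go' \
    '^//go:build[[:space:]]+!test[[:space:]]*$' "$1" | sort
}
```

Running `files_excluded_by_test_tag components/backend` from the repository root would print any production-only files. An empty result would suggest the single pass is sufficient for that directory.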
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/lint.yml around lines 144 - 149, CI currently runs
golangci-lint only with --build-tags=test, which skips production-only files
like components/backend/handlers/k8s_clients_for_request_prod.go (//go:build
!test); add a second golangci-lint step that also runs in the same
working-directory (components/backend) but without --build-tags (or with an
explicit empty/omitted build-tags arg) so production-only files are linted too,
keeping the existing step for test-tagged checks; target the existing action
block name "Run golangci-lint (all build tags)" or add a new step named e.g.
"Run golangci-lint (no build tags)" to make this explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/e2e.yml:
- Around line 12-17: Add an explicit permissions block at the top-level of this
workflow (near the existing env/KIND_VERSION and concurrency keys) that scopes
GITHUB_TOKEN to only the least-privilege scopes required for the job (do NOT
rely on default write); for example set only the specific permissions you need
such as contents: read, actions: read, id-token: write, or packages: write as
appropriate for your cache/action usage, and remove any broader write defaults.
Also pin every third-party action used in the workflow to an exact commit SHA
(instead of floating tags) so the run is hermetic and auditable.
- Around line 95-96: Replace all mutable action tags with immutable commit SHAs
for each action usage (e.g., replace docker/build-push-action@v7 and the other
`@v6/`@v4/@v3 occurrences with their corresponding full commit SHAs) so every
"uses:" entry is pinned; update the 12 action instances referenced (including
docker/build-push-action and the other actions in the file) to their exact SHA
values. Also add a top-level permissions: block to the workflow that scopes the
GITHUB_TOKEN to the minimal required permissions for this workflow (declare only
the specific permission keys needed, e.g., read/write for specific resources
used), ensuring the token is not granted broad defaults. Ensure changes touch
the workflow root (top-level) and every "uses:" line that currently has a
version tag.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a3e90040-f7a9-491d-bf5b-3d9c35e7ab58

📥 Commits

Reviewing files that changed from the base of the PR and between 8a2310a and bcdfbc7.

📒 Files selected for processing (5)
  • .github/workflows/e2e.yml
  • .github/workflows/lint.yml
  • .github/workflows/unit-tests.yml
  • docs/superpowers/plans/2026-04-11-ci-improvements.md
  • docs/superpowers/specs/2026-04-11-ci-improvements-design.md

Comment thread .github/workflows/e2e.yml
Comment thread .github/workflows/e2e.yml Outdated
Ambient Code Bot and others added 7 commits April 16, 2026 23:16
Targeted caching wins across all PR workflows plus E2E image reuse
via shared GHA BuildKit cache scopes. Goal: cut PR wall-clock from
~10.4m P50 to ~5-7m.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

4 tasks: E2E Docker layer caching (shared scopes from components-build),
kind binary caching, golangci-lint consolidation, junit2html via pipx.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test build tag is a superset: linting with --build-tags=test covers
all production files plus test-tagged files. Reduces CI runtime by
eliminating redundant pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace monolithic shell script with individual docker/build-push-action
steps. Each component build now reads GHA cache from components-build-deploy
workflow (scope: {component}-amd64) and writes to e2e-specific scope
(scope: e2e-{component}) for future runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add workflow-level KIND_VERSION env var and use actions/cache@v4 to
store the kind binary in ~/k8s-tools/kind. Cache key includes OS and
version for invalidation on upgrades. Avoids redundant downloads on
every workflow run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Pin all 13 action references in e2e.yml to commit SHAs
- Pin all action references in lint.yml and unit-tests.yml to commit SHAs
- Add explicit permissions block (contents: read, actions: write) to e2e.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
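
The cache scope naming from the E2E caching commit above can be illustrated with a small sketch that composes cache-from and cache-to values for one component. The scope names follow the commit message. The helper names, the use of mode=max, and reading back from the E2E scope are assumptions rather than the workflow's confirmed settings.

```shell
#!/usr/bin/env bash
# Compose BuildKit GHA cache settings for one component, following the scope
# names in the commit message: read "{component}-amd64" (written by
# components-build-deploy) and write "e2e-{component}".
# Helper names, mode=max, and reading the E2E scope back are assumptions.

# cache_from COMPONENT -> one cache source per line
cache_from() {
  printf 'type=gha,scope=%s-amd64\n' "$1"
  printf 'type=gha,scope=e2e-%s\n' "$1"
}

# cache_to COMPONENT -> the single scope this workflow writes to
cache_to() {
  printf 'type=gha,mode=max,scope=e2e-%s\n' "$1"
}

cache_from frontend
cache_to frontend
```

Writing only to an e2e-prefixed scope is what would keep E2E runs from overwriting the layers that components-build-deploy stores, which is the conflict the test plan asks to rule out.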
@jeremyeder jeremyeder force-pushed the feature/ci-improvements branch from bcdfbc7 to 7c9a714 Compare April 17, 2026 03:22
netlify Bot commented Apr 17, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | e12e42f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/69e1a90ad74acc0008a9fac2 |

v3.12.1 release artifact is returning truncated gzip, breaking CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/docs-lint.yml:
- Around line 16-19: Add an explicit least-privilege permissions block for the
workflow (or specific job) so GITHUB_TOKEN is scoped; update the workflow
containing uses: actions/checkout and uses: actions/setup-node to include a
permissions: block with at minimum contents: read at the top-level of the
workflow or inside the job that runs these actions to ensure the token cannot
write by default.
- Line 27: The workflow step that runs "curl -sfL
https://github.com/errata-ai/vale/releases/download/v3.14.1/vale_3.14.1_Linux_64-bit.tar.gz
| tar xz -C /usr/local/bin vale" lacks integrity checks; change it to first
download both the archive and the corresponding checksum file
"vale_3.14.1_checksums.txt", compute/verify the SHA-256 checksum (e.g., with
sha256sum -c or by comparing sha256sum output) against the expected entry in the
checksum file, and only call tar xz -C /usr/local/bin vale after the checksum
verification succeeds, failing the job if the checksum does not match.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1398b3cf-b8e9-40c6-926e-658687d8a2a6

📥 Commits

Reviewing files that changed from the base of the PR and between 7c9a714 and e12e42f.

📒 Files selected for processing (1)
  • .github/workflows/docs-lint.yml

Comment on lines +16 to +19
  uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

  - name: Setup Node.js
- uses: actions/setup-node@v6
+ uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6

⚠️ Potential issue | 🟠 Major

Add explicit least-privilege permissions for this workflow.

This workflow does not scope GITHUB_TOKEN permissions explicitly. Add a minimal permissions block (contents: read) at workflow or job level.

Proposed minimal permission scope
 name: Docs Lint
 
 on:
   pull_request:
     paths:
       - 'docs/**'
+
+permissions:
+  contents: read
 
 jobs:
   lint:
     name: Lint Documentation
As per coding guidelines, "Pin action versions to SHA. Verify secrets are not exposed and permissions are scoped."

  - name: Install Vale
    run: |
-     curl -sfL https://github.com/errata-ai/vale/releases/download/v3.12.1/vale_3.12.1_Linux_64-bit.tar.gz | tar xz -C /usr/local/bin vale
+     curl -sfL https://github.com/errata-ai/vale/releases/download/v3.14.1/vale_3.14.1_Linux_64-bit.tar.gz | tar xz -C /usr/local/bin vale

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -type f -name "docs-lint.yml" .github/workflows/

Repository: ambient-code/platform

Length of output: 235


🏁 Script executed:

cat -n .github/workflows/docs-lint.yml | sed -n '20,35p'

Repository: ambient-code/platform

Length of output: 640


🌐 Web query:

Vale errata-ai releases checksum SHA verification GitHub

💡 Result:

Vale (from Errata AI) provides SHA-256 checksums for verifying downloaded release binaries on GitHub at https://github.com/errata-ai/vale/releases. Each release includes a vale_{version}_checksums.txt file listing SHA-256 hashes for all platform-specific archives (e.g., Linux, macOS, Windows tar.gz/zip files). To verify:

  1. Download the desired release assets from a tag page, e.g., https://github.com/errata-ai/vale/releases/tag/v3.13.1
  2. Download vale_3.13.1_checksums.txt
  3. Run sha256sum -c vale_3.13.1_checksums.txt (Linux/macOS) or an equivalent on Windows (e.g., certutil -hashfile).

Releases are GPG-signed commits (e.g., key ID B44A6E0381723797). Official docs recommend binary downloads but do not explicitly detail checksum verification steps; it follows standard GitHub practices. No separate PGP-signed binaries or Cosign noted.

Add checksum verification for Vale binary.

curl | tar downloads and extracts without integrity validation. Vale releases include SHA-256 checksums at vale_3.14.1_checksums.txt—use them to verify the archive before extraction.

Recommended pattern
     - name: Install Vale
       run: |
-        curl -sfL https://github.com/errata-ai/vale/releases/download/v3.14.1/vale_3.14.1_Linux_64-bit.tar.gz | tar xz -C /usr/local/bin vale
+        VALE_VERSION="3.14.1"
+        VALE_TGZ="/tmp/vale_${VALE_VERSION}_Linux_64-bit.tar.gz"
+        VALE_URL="https://github.com/errata-ai/vale/releases/download/v${VALE_VERSION}/vale_${VALE_VERSION}_Linux_64-bit.tar.gz"
+        curl -fsSL -o "${VALE_TGZ}" "${VALE_URL}"
+        # sha256sum -c resolves file names relative to the current directory, so run it from /tmp
+        curl -fsSL "https://github.com/errata-ai/vale/releases/download/v${VALE_VERSION}/vale_${VALE_VERSION}_checksums.txt" | grep "Linux_64-bit.tar.gz" | (cd /tmp && sha256sum -c -)
+        tar -xzf "${VALE_TGZ}" -C /usr/local/bin vale
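
A variant of the same check that does not depend on the working directory is sketched below. It assumes the checksums file uses the common `<sha256>  <filename>` layout and that sha256sum is available (as on Linux runners). verify_archive is a hypothetical helper, not part of the workflow.

```shell
#!/usr/bin/env bash
# Verify a downloaded archive against a "<sha256>  <filename>" checksums file
# before extracting it. verify_archive is a hypothetical helper.

# verify_archive ARCHIVE CHECKSUMS_FILE -> exit 0 only if the hash matches
verify_archive() {
  local archive="$1" checksums="$2"
  local name expected actual
  name="$(basename "$archive")"
  # Look up the expected hash for this exact file name (second column).
  expected="$(awk -v f="$name" '$2 == f { print $1 }' "$checksums")"
  actual="$(sha256sum "$archive" | awk '{ print $1 }')"
  # Fail when the file is not listed or the hashes differ.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A step could then call `verify_archive "$VALE_TGZ" /tmp/checksums.txt && tar -xzf "$VALE_TGZ" -C /usr/local/bin vale`, so extraction happens only after the hash comparison succeeds. The checksums path here is a placeholder.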

@jeremyeder jeremyeder disabled auto-merge April 17, 2026 04:14
@jeremyeder jeremyeder merged commit c9394ec into ambient-code:main Apr 17, 2026
27 checks passed
jeremyeder added a commit that referenced this pull request Apr 21, 2026
PR #1294 reintroduced context: components/runners, but PR #1260 had
already changed the Dockerfile to COPY . (expecting the context to be
the ambient-runner subdirectory). This caused pip3 install to fail with
"Neither setup.py nor pyproject.toml found".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
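
A pre-flight check along the following lines could catch this kind of context mismatch before the image build starts. It is a sketch only: context_has_python_project is a hypothetical helper, and the components/runners/ambient-runner path in the usage note is inferred from the commit message rather than confirmed.

```shell
#!/usr/bin/env bash
# Pre-flight check for a Python image build: after `COPY .`, pip needs the
# packaging metadata to sit at the root of the build context.
# context_has_python_project is a hypothetical helper.

# context_has_python_project CONTEXT_DIR -> exit 0 if metadata is present
context_has_python_project() {
  local context="$1"
  [ -f "${context}/pyproject.toml" ] || [ -f "${context}/setup.py" ]
}
```

For example, `context_has_python_project components/runners/ambient-runner || exit 1` placed before the build step would fail fast with a clear cause instead of surfacing later as a pip error inside the build.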