feat(specs): add custom runner image specification#1563
Conversation
✅ Deploy Preview for cheerful-kitten-f556a0 canceled.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughDefines a stable custom runner image contract: required AG‑UI HTTP endpoints and response formats, Python/runtime and filesystem invariants, entrypoint/signal/non‑root constraints, ProjectSettings ChangesRunner image contract & ProjectSettings override
🚥 Pre-merge checks | ✅ 8✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
specs/agents/runner-image.spec.md (2)
93-93: 💤 Low valueClarify path description to avoid confusion.
The phrase "MUST contain installed
ambient_runnerpackage" could be misread to mean the pip package must be installed at/app/ambient-runner, when it actually means this directory contains the application code (main.py) that imports the package installed elsewhere in site-packages.📝 Clearer phrasing
-| `/app/ambient-runner` | Runner package source and working directory | MUST contain installed `ambient_runner` package | +| `/app/ambient-runner` | Runner application root and working directory | MUST contain main.py and application code; requires `ambient_runner` package installed via pip |🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@specs/agents/runner-image.spec.md` at line 93, The spec line for `/app/ambient-runner` is ambiguous about where the pip-installed ambient_runner resides; update the wording so it clearly states that `/app/ambient-runner` contains the application source (e.g., main.py) which imports the `ambient_runner` package installed in site-packages, not that the pip package itself is installed at that path; reference the `/app/ambient-runner` directory, the application entrypoint `main.py`, and the `ambient_runner` package in the revised sentence to make this distinction explicit.
461-461: ⚡ Quick winConsider blocking contract version mismatches by default.
The spec makes version checking advisory-only (CP logs warning but creates pod anyway). However, if a custom image uses contract v2 with breaking changes and the CP expects v1, the session will fail unpredictably at runtime rather than being rejected upfront.
💡 Alternative design
Make blocking the default with operator opt-in for mismatches:
-The CP MAY read this label at pod creation time and log a warning if the contract version does not match the expected version. This is advisory — the CP SHALL NOT block pod creation based on contract version mismatch. +The CP SHALL read this label at pod creation time. If the contract version does not match the expected version, the CP SHALL transition the session to `Failed` with a condition describing the mismatch UNLESS the operator has set `ALLOW_CONTRACT_VERSION_MISMATCH=true`.This preserves flexibility for operators who explicitly opt in while preventing accidental incompatibilities.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@specs/agents/runner-image.spec.md` at line 461, Update the sentence about contract-version handling so the Control Plane (CP) SHALL by default reject pod creation on a contract version mismatch instead of merely warning; add a clear operator-configurable override (e.g., an "allowContractMismatch" opt-in flag) that, when enabled, permits the previous advisory behavior and logs a warning; ensure the wording references the "contract version" label and the CP's behavior at "pod creation" so readers can locate and implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 274-276: Document that ProjectSettings.runner_image can override
image but not agent-type config (RUNNER_TYPE, resource limits, state dir) and
add a Failure Modes entry describing the cryptic Python import error when a
custom image lacks the required bridge implementation (e.g., ClaudeBridge,
GeminiCLIBridge, LangGraphBridge) for the session's runner type; update the
recommendations to advise building custom images FROM the standard base to
inherit all bridges and add a runtime validation step in the session creation
flow (where ProjectSettings.runner_image is applied) that inspects the image or
performs a quick probe to confirm the presence of the required bridge for the
requested RUNNER_TYPE and surface a clear, actionable error if missing.
---
Nitpick comments:
In `@specs/agents/runner-image.spec.md`:
- Line 93: The spec line for `/app/ambient-runner` is ambiguous about where the
pip-installed ambient_runner resides; update the wording so it clearly states
that `/app/ambient-runner` contains the application source (e.g., main.py) which
imports the `ambient_runner` package installed in site-packages, not that the
pip package itself is installed at that path; reference the
`/app/ambient-runner` directory, the application entrypoint `main.py`, and the
`ambient_runner` package in the revised sentence to make this distinction
explicit.
- Line 461: Update the sentence about contract-version handling so the Control
Plane (CP) SHALL by default reject pod creation on a contract version mismatch
instead of merely warning; add a clear operator-configurable override (e.g., an
"allowContractMismatch" opt-in flag) that, when enabled, permits the previous
advisory behavior and logs a warning; ensure the wording references the
"contract version" label and the CP's behavior at "pod creation" so readers can
locate and implement the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 34cd54f5-c174-486c-a499-0113c9af9cf5
📒 Files selected for processing (1)
specs/agents/runner-image.spec.md
b49e8eb to
2308ab4
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 154-164: The spec enforces a contradictory UID requirement: it
mandates a fixed UID 1001 via Dockerfile `USER 1001` while also recommending
OpenShift arbitrary-UID compatibility (e.g. `chmod -R g=u`), which conflicts
under restrictive SCCs; change the normative contract to require non-root
runtime behavior (`runAsNonRoot: true`, `allowPrivilegeEscalation: false`,
`drop: ["ALL"]` and no root at runtime) and demote `UID 1001`/`Dockerfile USER
1001` to a base-image default or recommendation, keeping the OpenShift
compatibility guidance (`chmod -R g=u` on writable paths) as a SHOULD rather
than a SHALL so implementations can satisfy `runAsNonRoot` without a fixed UID.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7fd1e84c-8a85-4262-a291-b54a4719e4c9
📒 Files selected for processing (1)
specs/agents/runner-image.spec.md
903e3e4 to
ac61fed
Compare
| - GIVEN a project with `runner_image` unset | ||
| - AND a session with a specific runner type | ||
| - WHEN the CP provisions the pod | ||
| - THEN the pod uses the image from the agent registry entry for that runner type |
There was a problem hiding this comment.
Meaning there can be only one base image for each runner type.
|
|
||
| The CP SHALL support an operator-level allowlist of permitted registries via `RUNNER_IMAGE_ALLOWED_REGISTRIES` (comma-separated hostnames). When set, images from unlisted registries SHALL be rejected and the session SHALL transition to `Failed` with a descriptive condition. | ||
|
|
||
| When the allowlist is unset, the CP SHALL accept any registry. Operators SHOULD configure the allowlist in production deployments. |
There was a problem hiding this comment.
Should we set a default?
| | Runner token authentication (AG-UI, CP, gRPC) | `ambient_runner` package | Preserve the package | | ||
| | NetworkPolicy (ingress and egress) | Cluster operator | None — pod inherits namespace policies | | ||
|
|
||
| Custom images MUST NOT bundle credentials, tokens, or secrets in the image layers. All credentials SHALL be fetched at runtime via cluster-internal API endpoints as defined in `../security/security.spec.md`. |
There was a problem hiding this comment.
Security scans like this should be included in the test suite. Prodsec also has skills.
|
A few comments added. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 334-347: Add a conformance test that verifies the runner bridge
for each supported RUNNER_TYPE is present and loadable in custom images: for
each RUNNER_TYPE, start a session using a custom image lacking that bridge and
assert the pod logs contain an error naming the missing bridge module, the pod
exits non-zero, and the control plane marks the session Failed; update the
conformance checklist/spec sections referencing RUNNER_TYPE and the "Bridge
Mismatch" requirement so the suite explicitly covers bridge
availability/loadability and log assertion for missing modules.
- Line 86: Update the requirement that "/app MUST exist; writeable by UID 1001"
to be compatible with arbitrary non-root UIDs: replace the hardcoded "UID 1001"
reference with wording that /app must be writeable by the container's effective
non-root UID (for example "writeable by the container's effective non-root UID
or a group writable location when running under an arbitrary UID such as
OpenShift SCC"), and apply the same change to the related statements referenced
around lines 164-167 so the spec consistently allows arbitrary non-root UIDs
instead of a fixed UID.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 34201e0f-462a-4b2b-9b12-a2cc0c0e66ed
📒 Files selected for processing (1)
specs/agents/runner-image.spec.md
Amber Code Review — PR #1563PR: Custom Runner Image Specification ( SummaryThis PR adds a well-researched specification for custom runner image support. The stable contract sections (HTTP endpoints, filesystem layout, environment variables, SecurityContext) are thorough and security-aware. The spec is ready for structural review — the findings below are about gaps that need to be addressed before implementation begins, not errors in the spec's logic. FindingsBlockerNone. Critical1. No feature flag gating mentioned — CLAUDE.md mandates: "Feature flags strongly recommended: Gate new features behind Unleash flags." This is a new, admin-facing platform capability. The spec defines behavior for ProjectSettings, pod provisioning, and RBAC — but says nothing about gating this behind a flag. Without an explicit gate, implementation PRs may ship this directly to production without a rollout control mechanism. Standard violated: CLAUDE.md § Feature flags strongly recommended. 2. Missing full-stack consumer enumeration — CLAUDE.md states: "Auth/credential/API changes must update ALL consumers (backend, CLI, SDK, runner, sidecar) in the same PR." The spec adds Standard violated: CLAUDE.md § Full-stack awareness. Major3. Broken relative path for security spec — line 8 The actual file is at Suggested fix: 4. No API surface defined for the new ProjectSettings fields The RBAC requirement states users need Suggested fix: Add a section (or a cross-reference to the 5. Conformance test suite ownership is unspecified — "Base Image Publishing" section The spec says "The platform SHALL publish a conformance test suite" but doesn't define: who owns it, where it lives in the repo, when it ships relative to the feature, or what CI workflow executes it. This will be silently dropped during implementation. Suggested fix: Add: Minor6. Image pull policy table doesn't address cost/predictability tradeoff The table sets Suggested fix: Add a note after the table: "Workspace admins SHOULD pin production images by digest to guarantee reproducibility and avoid unnecessary registry round-trips on pod start." 7. The spec covers: Secret doesn't exist → Suggested fix: Add: "The CP SHALL only look up Positive Highlights
Recommendations (Prioritized)
Review by Amber — Ambient Code automated review agent |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 178-323: The spec adds ProjectSettings.runner_image and
runner_image_pull_secret without gating; add a new Requirement that these fields
are gated by the Unleash flag custom-runner-image: when disabled, PATCH/PUT
attempts to set runner_image or runner_image_pull_secret must be rejected with a
descriptive error (403/validation), and session provisioning logic (where image
selection occurs) must ignore ProjectSettings.runner_image and fall back to
normal precedence; reference ProjectSettings.runner_image,
ProjectSettings.runner_image_pull_secret, the session creation/path that selects
runner image, and the flag name custom-runner-image in the new paragraph.
- Around line 13-21: Add an "Implementation Impact" section to the spec that
enumerates all components that must change for ProjectSettings runner image
support and point to relevant files: list Backend (ProjectSettings handler
including validation, RBAC, registry allowlist check — reference
internal/api/project_settings.go), Operator (image selection precedence and pod
spec generation — reference pkg/operator/session_controller.go), CLI (acpctl
commands to set/get runner_image and runner_image_pull_secret), SDKs (client
library updates for new ProjectSettings fields), and Frontend (UI for workspace
admins to configure custom images); for each bullet include one-line notes on
what to update (e.g., validation rules, RBAC checks, precedence logic, new CLI
flags/commands, SDK model fields, UI form and permissions).
- Around line 454-467: Under the "Requirement: Conformance Test Suite" section
add explicit metadata: declare a repository location (e.g.,
scripts/runner-conformance/ or tests/conformance/runner/) and the exact
command/Make target to run the suite (e.g., make test-conformance
IMAGE=<custom-image>), name the owning team (e.g., platform-agents) responsible
for maintenance and updates, describe the CI integration (e.g., a GitHub Actions
workflow that runs on base-image builds and pull requests touching runner
images), and specify the shipping milestone (e.g., "must ship with the first
base image public release"); place this text directly beneath the "Requirement:
Conformance Test Suite" heading so the spec explicitly covers location,
ownership, CI workflow, and delivery timing.
- Around line 180-184: The spec adds ProjectSettings fields `runner_image` and
`runner_image_pull_secret` but omits the API surface; update the document by
either adding a cross-reference to the ProjectSettings API in
control-plane.spec.md or ambient-model.spec.md or by embedding an explicit API
contract: declare the endpoint (e.g., PATCH /api/projects/{project}/settings),
the JSON payload keys (`runner_image`, `runner_image_pull_secret`), validation
rules (400 for invalid image/ref syntax, 403 for disallowed registry, 404 for
missing secret), and expected responses (200 on success, 400/403/404 as above),
and ensure the ProjectSettings resource definition includes these two string
fields so implementers can locate `runner_image` and `runner_image_pull_secret`.
- Line 9: Update the incorrect relative link string in runner-image.spec.md:
replace the reference `../security/security.spec.md` with the correct path
`../standards/security/security.spec.md` so the spec in specs/agents points to
specs/standards/security/security.spec.md.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2def2899-7590-4cdd-ab8d-01a160bcd98a
📒 Files selected for processing (1)
specs/agents/runner-image.spec.md
| - `runner.spec.md` — Runner runtime, AG-UI protocol, bridge layer | ||
| - `../control-plane/control-plane.spec.md` — Pod provisioning, image selection, env var injection | ||
| - `../api/ambient-model.spec.md` — ProjectSettings, Session data model | ||
| - `../security/security.spec.md` — Per-session SA isolation, credential boundaries |
There was a problem hiding this comment.
Fix the relative path to the security spec.
The path ../security/security.spec.md is incorrect. From specs/agents/, the correct relative path to specs/standards/security/security.spec.md is ../standards/security/security.spec.md.
📝 Proposed fix
- - `../security/security.spec.md` — Per-session SA isolation, credential boundaries
+ - `../standards/security/security.spec.md` — Per-session SA isolation, credential boundaries📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - `../security/security.spec.md` — Per-session SA isolation, credential boundaries | |
| - `../standards/security/security.spec.md` — Per-session SA isolation, credential boundaries |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/agents/runner-image.spec.md` at line 9, Update the incorrect relative
link string in runner-image.spec.md: replace the reference
`../security/security.spec.md` with the correct path
`../standards/security/security.spec.md` so the spec in specs/agents points to
specs/standards/security/security.spec.md.
| ## Purpose | ||
|
|
||
| The Ambient Runner ships a single image containing Python, git, Node.js, Go, and several CLI tools. Workspace admins who need additional tools — Terraform, kubectl, language-specific SDKs, internal CLIs — have no supported extension path short of forking the image. | ||
|
|
||
| This spec defines a **stable runner contract** (the set of filesystem paths, HTTP endpoints, environment variables, and security constraints that custom images must preserve), a **Dockerfile FROM extension model** (users layer tools onto a published base image), and a **ProjectSettings-driven image override** (workspace admins declare a custom image per project). | ||
|
|
||
| The extension model is Dockerfile FROM only. Init hooks (scripts run at pod startup) were rejected: they are non-reproducible across pods, add startup latency, require runtime network egress that conflicts with NetworkPolicy isolation, and create OpenShift SCC conflicts when installing system packages. | ||
|
|
||
| This spec covers only the **image boundary** — what must be true about a container image for the platform to run it as a runner. Runner internals (bridge layer, gRPC transport, credential management) are defined in `runner.spec.md`. Pod provisioning mechanics are defined in `control-plane.spec.md`. |
There was a problem hiding this comment.
Add an Implementation Impact section listing all affected components.
The spec adds ProjectSettings fields but does not enumerate which components must be modified. Per reviewer feedback, this blocks implementation planning.
Add a section (e.g., after Purpose or before Stable Runner Contract) listing:
- Backend: ProjectSettings handler (validation, RBAC, registry allowlist check)
- Operator: Image selection precedence logic, pod spec generation
- CLI (acpctl): Commands to set/get
runner_imageandrunner_image_pull_secret - SDKs: Client library updates for new ProjectSettings fields
- Frontend: UI for workspace admins to configure custom images
Include file-level pointers where applicable (e.g., pkg/operator/session_controller.go, internal/api/project_settings.go).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/agents/runner-image.spec.md` around lines 13 - 21, Add an
"Implementation Impact" section to the spec that enumerates all components that
must change for ProjectSettings runner image support and point to relevant
files: list Backend (ProjectSettings handler including validation, RBAC,
registry allowlist check — reference internal/api/project_settings.go), Operator
(image selection precedence and pod spec generation — reference
pkg/operator/session_controller.go), CLI (acpctl commands to set/get
runner_image and runner_image_pull_secret), SDKs (client library updates for new
ProjectSettings fields), and Frontend (UI for workspace admins to configure
custom images); for each bullet include one-line notes on what to update (e.g.,
validation rules, RBAC checks, precedence logic, new CLI flags/commands, SDK
model fields, UI form and permissions).
| ### Requirement: Conformance Test Suite | ||
|
|
||
| The platform SHALL publish a conformance test suite that validates a custom runner image against the stable contract. The test suite SHALL verify: | ||
|
|
||
| - AG-UI endpoints respond correctly (`/health`, `/capabilities`, `/`) | ||
| - Required filesystem paths exist and are writeable | ||
| - The runner process starts within the expected timeout | ||
| - The runner runs as a non-root user | ||
| - CP-injected environment variables are not overridden by the image | ||
| - Required bridge implementations are present and loadable for each supported `RUNNER_TYPE` | ||
|
|
||
| The test suite SHALL produce a pass/fail result suitable for CI/CD integration. | ||
|
|
||
| The test suite SHOULD include security checks: non-root user verification, no SUID binaries, and base image provenance validation. Operators MAY extend the suite with additional security scanning (vulnerability scanning, SBOM generation) using their existing tooling. |
There was a problem hiding this comment.
Specify conformance test suite location, ownership, and CI integration.
The spec requires a conformance test suite but does not define where it lives, who owns it, how it's invoked, or when it ships.
Add details for:
- Location: Repository path (e.g.,
scripts/runner-conformance/ortests/conformance/runner/) - Ownership: Team responsible for maintenance
- CI workflow: How the suite is run (GitHub Actions workflow, manual script)
- Shipping milestone: When the suite must be available (e.g., "ships with first base image release")
📋 Example addition
The conformance test suite SHALL be published at `scripts/runner-conformance/` and maintained by the platform-agents team. The suite SHALL be runnable via `make test-conformance IMAGE=<custom-image>` and integrated into CI for base image builds. The suite MUST ship before the first public release of the base image.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/agents/runner-image.spec.md` around lines 454 - 467, Under the
"Requirement: Conformance Test Suite" section add explicit metadata: declare a
repository location (e.g., scripts/runner-conformance/ or
tests/conformance/runner/) and the exact command/Make target to run the suite
(e.g., make test-conformance IMAGE=<custom-image>), name the owning team (e.g.,
platform-agents) responsible for maintenance and updates, describe the CI
integration (e.g., a GitHub Actions workflow that runs on base-image builds and
pull requests touching runner images), and specify the shipping milestone (e.g.,
"must ship with the first base image public release"); place this text directly
beneath the "Requirement: Conformance Test Suite" heading so the spec explicitly
covers location, ownership, CI workflow, and delivery timing.
There was a problem hiding this comment.
This is implementation detail specific, not really for the spec
|
CodeRabbit chat interactions are restricted to organization members for this repository. Ask an organization member to interact with CodeRabbit, or set |
Define the stable runner contract and a ProjectSettings-driven image override so workspace admins can layer tools onto the base runner via Dockerfile FROM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address PR feedback: - Add venv guidance to Python Runtime Contract - Add agent registry inline explanation - Clarify runner type / custom image relationship - Add network hardening forward-looking note - Add Conformance Test Suite requirement with security checks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4e975b0 to
dbe8f9f
Compare
Summary
specs/agents/runner-image.spec.mddefining the stable runner contract and a workspace-level custom image overrideDockerfile FROMon a published base image — no init hooksrunner_imageandrunner_image_pull_secretfields on ProjectSettings let workspace admins configure a custom runner per projectDetails
The spec establishes the boundary between "what the platform guarantees" and "what custom images can change." Key design decisions:
Test plan
runner.spec.mdandcontrol-plane.spec.md🤖 Generated with Claude Code
Summary by CodeRabbit