feat(specs): add custom runner image specification#1563

Open
jbpratt wants to merge 7 commits into ambient-code:main from jbpratt:spec/custom-runner-image

Collaborator

@jbpratt jbpratt commented May 12, 2026

Summary

  • Adds specs/agents/runner-image.spec.md defining the stable runner contract and a workspace-level custom image override
  • Custom images are built via Dockerfile FROM on a published base image — no init hooks
  • New runner_image and runner_image_pull_secret fields on ProjectSettings let workspace admins configure a custom runner per project
  • Defines stable interfaces: AG-UI HTTP endpoints, filesystem layout, entrypoint contract, environment variables, security constraints, and Python runtime requirements
  • Includes image selection precedence (ProjectSettings > agent registry > operator default), registry allowlist validation, RBAC, and failure mode scenarios
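The selection precedence in the last bullet can be sketched as a simple fallthrough. This is a minimal illustration under stated assumptions, not the control plane's implementation; the function and parameter names are hypothetical and do not come from the spec.

```python
from typing import Optional


def select_runner_image(
    project_runner_image: Optional[str],
    registry_image: Optional[str],
    operator_default_image: str,
) -> str:
    """Return the first configured image, highest precedence first.

    Hypothetical helper: all names here are illustrative only.
    """
    # 1. ProjectSettings.runner_image wins when a workspace admin has set it.
    if project_runner_image:
        return project_runner_image
    # 2. Otherwise use the agent registry entry for the session's runner type.
    if registry_image:
        return registry_image
    # 3. Finally fall back to the operator-level default image.
    return operator_default_image
```

An unset or empty project-level value falls through to the next source, which matches the description's statement that the override is optional.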

Details

The spec establishes the boundary between "what the platform guarantees" and "what custom images can change." Key design decisions:

  • Dockerfile FROM only — init hooks rejected due to non-reproducibility, startup latency, network dependency, and OpenShift SCC conflicts
  • ProjectSettings, not Session — image trust is an admin concern; all sessions in a project use the same vetted image
  • Agent registry is orthogonal — custom image overrides the container image but preserves RUNNER_TYPE, resources, and sandbox config from the registry

Test plan

  • Review spec for completeness against runner.spec.md and control-plane.spec.md
  • Verify GIVEN/WHEN/THEN scenarios are testable
  • Confirm implementation touchpoints table is accurate

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive spec for custom runner images: required AG-UI HTTP endpoints/port, Python 3.12+ with preserved minor version, mandatory filesystem paths, and lifecycle/startup expectations.
    • Clarified runtime/security constraints: prefer no CMD/ENTRYPOINT overrides, non-root runtime (build-time root allowed), OpenShift arbitrary-UID guidance, forbidden overrides of CP-injected env vars, and graceful shutdown handling.
    • ProjectSettings support: per-project runner image/pull-secret, precedence rules, pull-policy logic, RBAC/update requirement, and note that changes affect new sessions only.
    • Added base image publishing requirements and a conformance test suite for endpoint, filesystem, env, and runtime checks.

netlify Bot commented May 12, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 6e3e36f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a05f818b91bb10008c3b0ef |

Contributor

coderabbitai Bot commented May 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Defines a stable custom runner image contract: required AG‑UI HTTP endpoints and response formats, Python/runtime and filesystem invariants, entrypoint/signal/non‑root constraints, ProjectSettings runner_image override (validation, secrets, pull policy, precedence), failure modes, security expectations, and base-image OCI contract labeling.

Changes

Runner image contract & ProjectSettings override

| Layer | File(s) | Summary |
| --- | --- | --- |
| AG‑UI HTTP contract and required endpoints | specs/agents/runner-image.spec.md (lines 1–173) | Specifies AGUI_PORT (default 8001) and required endpoints: /, /interrupt, /health, /capabilities, /events/{thread_id}. Endpoint URLs and response formats are stable and must not be removed or changed; additional AG‑UI endpoints may be provided by ambient_runner. |
| Python, packaging, and filesystem invariants | specs/agents/runner-image.spec.md (lines 61–167) | Requires Python ≥3.12; runner must use the base image's Python major.minor and include the ambient_runner package. Mandates preserved paths: /workspace, /app, /app/ambient-runner, /app/vertex, /tmp. |
| Entrypoint, startup, and signal handling | specs/agents/runner-image.spec.md (lines 61–167) | Describes CMD/ENTRYPOINT expectations (prefer not overriding; wrappers must exec the runner), runner must listen on AGUI_PORT, handle SIGTERM for graceful shutdown (runner as PID 1 or direct child), and start within pod startup timeout. |
| Control‑plane environment variables and runtime UID/security rules | specs/agents/runner-image.spec.md (lines 61–167) | Enumerates CP-injected env vars that custom images must not override (e.g., SESSION_ID, PROJECT_NAME, WORKSPACE_PATH, AGUI_PORT, backend/grpc/token endpoints, INITIAL_PROMPT, IS_RESUME, CREDENTIAL_IDS, RUNNER_TYPE). Requires non‑root runtime (UID 1001), allowPrivilegeEscalation: false, and dropped capabilities; root allowed only during build. |
| ProjectSettings fields, precedence, and scope | specs/agents/runner-image.spec.md (lines 178–318) | Adds runner_image and runner_image_pull_secret to ProjectSettings. Selection precedence: ProjectSettings > agent registry image > operator RUNNER_IMAGE. Project-level overrides affect only newly created sessions and do not change agent-type-specific settings. |
| Image reference validation, allowlist, and pull credentials | specs/agents/runner-image.spec.md (lines 178–318) | Defines image reference syntax/host validation, optional registry allowlist via RUNNER_IMAGE_ALLOWED_REGISTRIES, and imagePullSecrets usage limited to the project namespace with kubernetes.io/dockerconfigjson Secret type. ImagePullPolicy: IfNotPresent for digests and localhost/ refs; Always for other tags. |
| RBAC and operational constraints | specs/agents/runner-image.spec.md (lines 178–318) | Requires project_settings:update permission to modify runner_image or runner_image_pull_secret. Changes apply only to subsequently created sessions. |
| Failure modes and session state transitions | specs/agents/runner-image.spec.md (lines 321–360) | Documents outcomes for failures: AG‑UI health/readiness timeouts → session Failed; startup crashes or missing bridge for declared RUNNER_TYPE → session error/Failed; image pull failures → pull error/backoff and Failed as applicable. |
| Security boundary and isolation expectations | specs/agents/runner-image.spec.md (lines 362–417) | Specifies platform responsibilities: enforce non‑root PodSecurityContext, per‑session ServiceAccount isolation, NetworkPolicy inheritance, and that credentials/tokens must be fetched/cleared by ambient_runner; custom images must not bake credentials. |
| Base image publishing, OCI contract label, and conformance tests | specs/agents/runner-image.spec.md (lines 428–476) | Requires base images to publish a stable versioned base and include OCI label io.ambient-code.runner-contract-version="1". Label mismatches surface a pod-creation warning. Requires a conformance test suite verifying AG‑UI endpoints, filesystem and permission expectations, non‑root runtime, env var non‑overrides, and bridge availability. |
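The pull-policy rule summarized in the table (IfNotPresent for digests and localhost/ refs, Always for other tags) could be expressed roughly as follows. This is a sketch only: the function name is hypothetical, and treating any reference containing `@` as digest-pinned is a simplifying assumption rather than the spec's validation logic.

```python
def image_pull_policy(image_ref: str) -> str:
    """Pick a Kubernetes imagePullPolicy for a runner image reference.

    Hypothetical helper; digest detection is deliberately simplified.
    """
    # Digest-pinned references (name@sha256:...) are immutable, so a cached
    # copy on the node is safe to reuse.
    if "@" in image_ref:
        return "IfNotPresent"
    # localhost/ references denote locally built images with no remote registry.
    if image_ref.startswith("localhost/"):
        return "IfNotPresent"
    # Any other tag is mutable, so always check the registry.
    return "Always"
```

Under this rule, `quay.io/acme/runner:latest` (a hypothetical tag-based reference) would be pulled on every pod start, while the same image pinned by digest would not.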
🚥 Pre-merge checks | ✅ 8
✅ Passed checks (8 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format and clearly describes the main change: a new specification for custom runner images. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Performance And Algorithmic Complexity | ✅ Passed | PR adds only documentation (specs/agents/runner-image.spec.md, 481 lines Markdown). No code, algorithms, or performance-critical patterns present. Check not applicable to specification-only changes. |
| Security And Secret Handling | ✅ Passed | Spec-only PR. Properly defines RBAC, credential isolation (no bundled secrets, per-turn clear), non-root execution, namespace-scoped Secrets. No hardcoded secrets or auth bypasses detected. |
| Kubernetes Resource Safety | ✅ Passed | Documentation spec only. Properly addresses pod security context, RBAC, and namespace-scoped secrets. Check not applicable to specification files. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
specs/agents/runner-image.spec.md (2)

93-93: 💤 Low value

Clarify path description to avoid confusion.

The phrase "MUST contain installed ambient_runner package" could be misread to mean the pip package must be installed at /app/ambient-runner, when it actually means this directory contains the application code (main.py) that imports the package installed elsewhere in site-packages.

📝 Clearer phrasing
-| `/app/ambient-runner` | Runner package source and working directory | MUST contain installed `ambient_runner` package |
+| `/app/ambient-runner` | Runner application root and working directory | MUST contain main.py and application code; requires `ambient_runner` package installed via pip |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` at line 93, The spec line for
`/app/ambient-runner` is ambiguous about where the pip-installed ambient_runner
resides; update the wording so it clearly states that `/app/ambient-runner`
contains the application source (e.g., main.py) which imports the
`ambient_runner` package installed in site-packages, not that the pip package
itself is installed at that path; reference the `/app/ambient-runner` directory,
the application entrypoint `main.py`, and the `ambient_runner` package in the
revised sentence to make this distinction explicit.

461-461: ⚡ Quick win

Consider blocking contract version mismatches by default.

The spec makes version checking advisory-only (CP logs warning but creates pod anyway). However, if a custom image uses contract v2 with breaking changes and the CP expects v1, the session will fail unpredictably at runtime rather than being rejected upfront.

💡 Alternative design

Make blocking the default with operator opt-in for mismatches:

-The CP MAY read this label at pod creation time and log a warning if the contract version does not match the expected version. This is advisory — the CP SHALL NOT block pod creation based on contract version mismatch.
+The CP SHALL read this label at pod creation time. If the contract version does not match the expected version, the CP SHALL transition the session to `Failed` with a condition describing the mismatch UNLESS the operator has set `ALLOW_CONTRACT_VERSION_MISMATCH=true`.

This preserves flexibility for operators who explicitly opt in while preventing accidental incompatibilities.
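The alternative proposed in this nitpick (block by default, operator opt-in to allow) can be sketched as a small decision function. It illustrates the reviewer's suggestion, not the behavior the spec currently defines. The function name is hypothetical, `allow_mismatch` stands in for the `ALLOW_CONTRACT_VERSION_MISMATCH` setting, and treating a missing label as a mismatch is an assumption.

```python
from typing import Optional, Tuple

# Value of the io.ambient-code.runner-contract-version label the CP expects.
EXPECTED_CONTRACT_VERSION = "1"


def evaluate_contract_label(
    label_value: Optional[str], allow_mismatch: bool = False
) -> Tuple[bool, str]:
    """Return (create_pod, message) for an image's contract-version label."""
    if label_value == EXPECTED_CONTRACT_VERSION:
        return True, "contract version matches"
    detail = (
        f"image contract version {label_value!r} "
        f"!= expected {EXPECTED_CONTRACT_VERSION!r}"
    )
    if allow_mismatch:
        # Operator opted in: keep the advisory behavior and only warn.
        return True, "warning: " + detail
    # Default under the proposal: reject up front instead of failing at runtime.
    return False, "session Failed: " + detail
```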

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` at line 461, Update the sentence about
contract-version handling so the Control Plane (CP) SHALL by default reject pod
creation on a contract version mismatch instead of merely warning; add a clear
operator-configurable override (e.g., an "allowContractMismatch" opt-in flag)
that, when enabled, permits the previous advisory behavior and logs a warning;
ensure the wording references the "contract version" label and the CP's behavior
at "pod creation" so readers can locate and implement the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 274-276: Document that ProjectSettings.runner_image can override
image but not agent-type config (RUNNER_TYPE, resource limits, state dir) and
add a Failure Modes entry describing the cryptic Python import error when a
custom image lacks the required bridge implementation (e.g., ClaudeBridge,
GeminiCLIBridge, LangGraphBridge) for the session's runner type; update the
recommendations to advise building custom images FROM the standard base to
inherit all bridges and add a runtime validation step in the session creation
flow (where ProjectSettings.runner_image is applied) that inspects the image or
performs a quick probe to confirm the presence of the required bridge for the
requested RUNNER_TYPE and surface a clear, actionable error if missing.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 34cd54f5-c174-486c-a499-0113c9af9cf5

📥 Commits

Reviewing files that changed from the base of the PR and between 28874a9 and 0add287.

📒 Files selected for processing (1)
  • specs/agents/runner-image.spec.md

Comment thread specs/agents/runner-image.spec.md Outdated
@jbpratt jbpratt force-pushed the spec/custom-runner-image branch from b49e8eb to 2308ab4 Compare May 12, 2026 13:50
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 154-164: The spec enforces a contradictory UID requirement: it
mandates a fixed UID 1001 via Dockerfile `USER 1001` while also recommending
OpenShift arbitrary-UID compatibility (e.g. `chmod -R g=u`), which conflicts
under restrictive SCCs; change the normative contract to require non-root
runtime behavior (`runAsNonRoot: true`, `allowPrivilegeEscalation: false`,
`drop: ["ALL"]` and no root at runtime) and demote `UID 1001`/`Dockerfile USER
1001` to a base-image default or recommendation, keeping the OpenShift
compatibility guidance (`chmod -R g=u` on writable paths) as a SHOULD rather
than a SHALL so implementations can satisfy `runAsNonRoot` without a fixed UID.
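The normative core proposed in this comment (non-root at runtime, no privilege escalation, all capabilities dropped, no fixed UID) could be checked along these lines. This is a rough sketch: the function name is hypothetical, and the flat dict merges pod-level and container-level securityContext fields for brevity, which is a simplification of the real Kubernetes layout.

```python
def security_context_violations(ctx: dict) -> list[str]:
    """List the ways a (flattened) securityContext breaks the non-root rules."""
    violations = []
    if ctx.get("runAsNonRoot") is not True:
        violations.append("runAsNonRoot must be true")
    # Any non-zero UID is acceptable, including arbitrary UIDs assigned by
    # an OpenShift SCC; only root (UID 0) is rejected.
    if ctx.get("runAsUser") == 0:
        violations.append("runAsUser must not be 0 (root)")
    if ctx.get("allowPrivilegeEscalation") is not False:
        violations.append("allowPrivilegeEscalation must be false")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        violations.append('capabilities.drop must include "ALL"')
    return violations
```

Because the check rejects only UID 0, a context with `runAsUser: 1001` and one with a large cluster-assigned UID both pass, which is the compatibility the comment asks for.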

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7fd1e84c-8a85-4262-a291-b54a4719e4c9

📥 Commits

Reviewing files that changed from the base of the PR and between 0add287 and e576f72.

📒 Files selected for processing (1)
  • specs/agents/runner-image.spec.md

Comment thread specs/agents/runner-image.spec.md Outdated
@jbpratt jbpratt force-pushed the spec/custom-runner-image branch 2 times, most recently from 903e3e4 to ac61fed Compare May 13, 2026 08:02
Comment thread specs/agents/runner-image.spec.md
Comment thread specs/agents/runner-image.spec.md
Comment thread specs/agents/runner-image.spec.md Outdated
- GIVEN a project with `runner_image` unset
- AND a session with a specific runner type
- WHEN the CP provisions the pod
- THEN the pod uses the image from the agent registry entry for that runner type
Contributor

This means there can be only one base image per runner type.


The CP SHALL support an operator-level allowlist of permitted registries via `RUNNER_IMAGE_ALLOWED_REGISTRIES` (comma-separated hostnames). When set, images from unlisted registries SHALL be rejected and the session SHALL transition to `Failed` with a descriptive condition.

When the allowlist is unset, the CP SHALL accept any registry. Operators SHOULD configure the allowlist in production deployments.
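A minimal sketch of the allowlist check described above, assuming the registry host is the first path component of the reference when it looks like a hostname (the usual container-reference convention) and `docker.io` otherwise. Both function names are hypothetical, and the host extraction is a simplification rather than the spec's validation rules.

```python
from typing import Optional


def registry_host(image_ref: str) -> str:
    """Extract the registry host from an image reference (simplified)."""
    first, sep, _rest = image_ref.partition("/")
    # A first component containing "." or ":" (or equal to "localhost") is
    # treated as a registry host; otherwise the reference has no explicit host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_registry_allowed(image_ref: str, allowlist_env: Optional[str]) -> bool:
    """Apply a RUNNER_IMAGE_ALLOWED_REGISTRIES-style comma-separated allowlist."""
    # Unset (or blank) allowlist: accept any registry.
    if allowlist_env is None or not allowlist_env.strip():
        return True
    allowed = {host.strip() for host in allowlist_env.split(",") if host.strip()}
    return registry_host(image_ref) in allowed
```

With an allowlist of `quay.io,ghcr.io`, a reference such as `docker.io/acme/runner:1` (a hypothetical image) would be rejected, and the session would then transition to `Failed` as the requirement states.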
Contributor

Should we set a default?

Comment thread specs/agents/runner-image.spec.md
Comment thread specs/agents/runner-image.spec.md
| Runner token authentication (AG-UI, CP, gRPC) | `ambient_runner` package | Preserve the package |
| NetworkPolicy (ingress and egress) | Cluster operator | None — pod inherits namespace policies |

Custom images MUST NOT bundle credentials, tokens, or secrets in the image layers. All credentials SHALL be fetched at runtime via cluster-internal API endpoints as defined in `../security/security.spec.md`.
Contributor

Security scans like this should be included in the test suite. Prodsec also has skills.

@jeremyeder
Contributor

A few comments added.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 334-347: Add a conformance test that verifies the runner bridge
for each supported RUNNER_TYPE is present and loadable in custom images: for
each RUNNER_TYPE, start a session using a custom image lacking that bridge and
assert the pod logs contain an error naming the missing bridge module, the pod
exits non-zero, and the control plane marks the session Failed; update the
conformance checklist/spec sections referencing RUNNER_TYPE and the "Bridge
Mismatch" requirement so the suite explicitly covers bridge
availability/loadability and log assertion for missing modules.
- Line 86: Update the requirement that "/app MUST exist; writeable by UID 1001"
to be compatible with arbitrary non-root UIDs: replace the hardcoded "UID 1001"
reference with wording that /app must be writeable by the container's effective
non-root UID (for example "writeable by the container's effective non-root UID
or a group writable location when running under an arbitrary UID such as
OpenShift SCC"), and apply the same change to the related statements referenced
around lines 164-167 so the spec consistently allows arbitrary non-root UIDs
instead of a fixed UID.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 34201e0f-462a-4b2b-9b12-a2cc0c0e66ed

📥 Commits

Reviewing files that changed from the base of the PR and between 903e3e4 and 0a117ad.

📒 Files selected for processing (1)
  • specs/agents/runner-image.spec.md

Comment thread specs/agents/runner-image.spec.md Outdated
Comment thread specs/agents/runner-image.spec.md
@markturansky
Contributor

Amber Code Review — PR #1563

PR: Custom Runner Image Specification (specs/agents/runner-image.spec.md)
Scope: Pure spec addition — no production code changed.


Summary

This PR adds a well-researched specification for custom runner image support. The stable contract sections (HTTP endpoints, filesystem layout, environment variables, SecurityContext) are thorough and security-aware. The spec is ready for structural review — the findings below are about gaps that need to be addressed before implementation begins, not errors in the spec's logic.


Findings

Blocker

None.


Critical

1. No feature flag gating mentioned — specs/agents/runner-image.spec.md

CLAUDE.md mandates: "Feature flags strongly recommended: Gate new features behind Unleash flags." This is a new, admin-facing platform capability. The spec defines behavior for ProjectSettings, pod provisioning, and RBAC — but says nothing about gating this behind a flag. Without an explicit gate, implementation PRs may ship this directly to production without a rollout control mechanism.

Standard violated: CLAUDE.md § Feature flags strongly recommended.
Suggested fix: Add a requirement section stating that runner_image support shall be gated behind an Unleash flag (e.g., custom-runner-image), with behavior when flag is disabled (fall through to existing image selection, ignore the field).


2. Missing full-stack consumer enumeration — specs/agents/runner-image.spec.md

CLAUDE.md states: "Auth/credential/API changes must update ALL consumers (backend, CLI, SDK, runner, sidecar) in the same PR." The spec adds runner_image and runner_image_pull_secret to ProjectSettings — but never enumerates which components need to change: backend handler, CLI (acpctl), ambient-sdk, and the frontend UI. Without this, implementation PRs will miss consumers.

Standard violated: CLAUDE.md § Full-stack awareness.
Suggested fix: Add an "Implementation Impact" section that explicitly lists every component that must be updated: backend ProjectSettings handler, operator pod provisioning logic, CLI acpctl project-settings command, Go + Python SDK types, and the frontend Settings page — referencing which files are the entry points.


Major

3. Broken relative path for security spec — line 8

  - `../security/security.spec.md` — Per-session SA isolation, credential boundaries

The actual file is at specs/standards/security/security.spec.md. Using ../security/security.spec.md resolves to specs/agents/../security/security.spec.md = specs/security/security.spec.md, which does not exist.

Suggested fix:

  - `../standards/security/security.spec.md` — Per-session SA isolation, credential boundaries
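The path arithmetic behind this finding can be reproduced with the standard library. The snippet only normalizes the two relative links against `specs/agents`; it does not check whether either file exists in the repository.

```python
import posixpath

# Directory containing runner-image.spec.md, as stated in the PR.
spec_dir = "specs/agents"

# Resolve the link as written in the spec and as proposed in the fix.
broken = posixpath.normpath(posixpath.join(spec_dir, "../security/security.spec.md"))
fixed = posixpath.normpath(
    posixpath.join(spec_dir, "../standards/security/security.spec.md")
)

print(broken)  # specs/security/security.spec.md
print(fixed)   # specs/standards/security/security.spec.md
```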

4. No API surface defined for the new ProjectSettings fields

The RBAC requirement states users need project_settings:update to modify runner_image, but the spec never defines whether this goes through an existing REST endpoint (PATCH /api/projects/{project}/settings) or a new one. The frontend and SDK cannot be implemented without knowing the API shape.

Suggested fix: Add a section (or a cross-reference to the ambient-model.spec.md) that explicitly shows the API payload shape for the new fields and which existing endpoint handles them.


5. Conformance test suite ownership is unspecified — "Base Image Publishing" section

The spec says "The platform SHALL publish a conformance test suite" but doesn't define: who owns it, where it lives in the repo, when it ships relative to the feature, or what CI workflow executes it. This will be silently dropped during implementation.

Suggested fix: Add: The conformance test suite SHALL live at scripts/runner-conformance/ and SHALL be invoked in CI via the existing component-benchmarks workflow or a dedicated workflow. The test suite SHALL ship in the same milestone as the base image.


Minor

6. Image pull policy table doesn't address cost/predictability tradeoff

The table sets Always for tag-based images (the common case). This means every pod start triggers a registry round-trip. The spec doesn't acknowledge this or recommend digest-pinning for production. Given the existing guidance in the Security Boundary section about building FROM digest, this inconsistency may cause surprise.

Suggested fix: Add a note after the table: "Workspace admins SHOULD pin production images by digest to guarantee reproducibility and avoid unnecessary registry round-trips on pod start."


7. runner_image_pull_secret validation doesn't cover wrong-namespace case

The spec covers: Secret doesn't exist → Failed. But it doesn't address: Secret exists in a different namespace. In multi-tenant clusters, a user could reference a secret name that resolves in a different project's namespace.

Suggested fix: Add: "The CP SHALL only look up runner_image_pull_secret in the project's own namespace. References that resolve to a Secret in a different namespace SHALL be rejected."
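One way to picture the suggested scoping rule is a lookup keyed on the project's own namespace. The sketch below uses an in-memory mapping in place of the Kubernetes API; the function name, the mapping shape, and the error strings are all hypothetical. The expected Secret type comes from the walkthrough summary of the spec.

```python
from typing import Dict, Optional, Tuple

DOCKERCONFIG_TYPE = "kubernetes.io/dockerconfigjson"


def pull_secret_error(
    secrets: Dict[Tuple[str, str], str],
    project_namespace: str,
    secret_name: str,
) -> Optional[str]:
    """Return an error message, or None when the pull secret is usable.

    `secrets` maps (namespace, name) to the Secret type and stands in for
    a cluster lookup.
    """
    # Only the project's own namespace is consulted, so a Secret with the
    # same name elsewhere can never satisfy the reference.
    secret_type = secrets.get((project_namespace, secret_name))
    if secret_type is None:
        return f"Secret {secret_name!r} not found in namespace {project_namespace!r}"
    if secret_type != DOCKERCONFIG_TYPE:
        return (
            f"Secret {secret_name!r} has type {secret_type!r}, "
            f"expected {DOCKERCONFIG_TYPE!r}"
        )
    return None
```

Keying the lookup on the namespace makes the cross-namespace case indistinguishable from a missing Secret, which avoids leaking whether a Secret exists in another project.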


Positive Highlights

  • The stable runner contract is an excellent design artifact — explicit tables for HTTP endpoints, filesystem paths, and env vars give implementors and custom image authors a clear, unambiguous target.
  • Security boundary section cleanly separates what the image is responsible for vs. what the platform enforces externally. This prevents reimplementation anti-patterns.
  • Failure mode scenarios are detailed and cover the real failure paths (bridge mismatch, ImagePullBackOff, crash on start) that matter operationally.
  • Init hook rejection rationale is well-argued (non-reproducibility, startup latency, NetworkPolicy conflicts, OpenShift SCC).
  • The image selection precedence (ProjectSettings → agent registry → operator env var → hardcoded) is clear and extensible.

Recommendations (Prioritized)

  1. Add feature flag requirement — block implementation PRs from shipping uncontrolled.
  2. Add implementation impact section — backend, operator, CLI, SDK, frontend, with file-level pointers.
  3. Fix the broken security spec path — trivial, do now.
  4. Define the API shape — PATCH endpoint + payload schema for the new fields.
  5. Nail down conformance test suite ownership — file location, CI integration, milestone.
  6. Address the pull-secret namespace scoping gap.
  7. Add digest-pinning production recommendation to the pull policy table.

Review by Amber — Ambient Code automated review agent

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 178-323: The spec adds ProjectSettings.runner_image and
runner_image_pull_secret without gating; add a new Requirement that these fields
are gated by the Unleash flag custom-runner-image: when disabled, PATCH/PUT
attempts to set runner_image or runner_image_pull_secret must be rejected with a
descriptive error (403/validation), and session provisioning logic (where image
selection occurs) must ignore ProjectSettings.runner_image and fall back to
normal precedence; reference ProjectSettings.runner_image,
ProjectSettings.runner_image_pull_secret, the session creation/path that selects
runner image, and the flag name custom-runner-image in the new paragraph.
- Around line 13-21: Add an "Implementation Impact" section to the spec that
enumerates all components that must change for ProjectSettings runner image
support and point to relevant files: list Backend (ProjectSettings handler
including validation, RBAC, registry allowlist check — reference
internal/api/project_settings.go), Operator (image selection precedence and pod
spec generation — reference pkg/operator/session_controller.go), CLI (acpctl
commands to set/get runner_image and runner_image_pull_secret), SDKs (client
library updates for new ProjectSettings fields), and Frontend (UI for workspace
admins to configure custom images); for each bullet include one-line notes on
what to update (e.g., validation rules, RBAC checks, precedence logic, new CLI
flags/commands, SDK model fields, UI form and permissions).
- Around line 454-467: Under the "Requirement: Conformance Test Suite" section
add explicit metadata: declare a repository location (e.g.,
scripts/runner-conformance/ or tests/conformance/runner/) and the exact
command/Make target to run the suite (e.g., make test-conformance
IMAGE=<custom-image>), name the owning team (e.g., platform-agents) responsible
for maintenance and updates, describe the CI integration (e.g., a GitHub Actions
workflow that runs on base-image builds and pull requests touching runner
images), and specify the shipping milestone (e.g., "must ship with the first
base image public release"); place this text directly beneath the "Requirement:
Conformance Test Suite" heading so the spec explicitly covers location,
ownership, CI workflow, and delivery timing.
- Around line 180-184: The spec adds ProjectSettings fields `runner_image` and
`runner_image_pull_secret` but omits the API surface; update the document by
either adding a cross-reference to the ProjectSettings API in
control-plane.spec.md or ambient-model.spec.md or by embedding an explicit API
contract: declare the endpoint (e.g., PATCH /api/projects/{project}/settings),
the JSON payload keys (`runner_image`, `runner_image_pull_secret`), validation
rules (400 for invalid image/ref syntax, 403 for disallowed registry, 404 for
missing secret), and expected responses (200 on success, 400/403/404 as above),
and ensure the ProjectSettings resource definition includes these two string
fields so implementers can locate `runner_image` and `runner_image_pull_secret`.
- Line 9: Update the incorrect relative link string in runner-image.spec.md:
replace the reference `../security/security.spec.md` with the correct path
`../standards/security/security.spec.md` so the spec in specs/agents points to
specs/standards/security/security.spec.md.
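
The status-code mapping proposed in the `runner_image` API item above (400 for invalid syntax, 403 for a disallowed registry, 404 for a missing secret, 200 on success) could be sketched roughly as follows. This is an illustration only: the function name, its parameters, the order of the checks, and the simplified image-reference pattern are all assumptions, not part of the spec or the existing backend.

```python
import re
from http import HTTPStatus

# Deliberately loose image-reference pattern, for illustration only; a real
# implementation would use a proper OCI reference parser.
_IMAGE_REF = re.compile(
    r"^(?P<registry>[a-z0-9.-]+(?::\d+)?)/(?P<repo>[a-z0-9._/-]+)"
    r"(?::(?P<tag>[\w.-]+)|@(?P<digest>sha256:[a-f0-9]{64}))$"
)


def validate_runner_image(image, pull_secret, allowed_registries, existing_secrets):
    """Return the HTTP status a settings update might produce (hypothetical)."""
    match = _IMAGE_REF.match(image)
    if match is None:
        return HTTPStatus.BAD_REQUEST   # 400: invalid image reference syntax
    if match.group("registry") not in allowed_registries:
        return HTTPStatus.FORBIDDEN     # 403: registry is not on the allowlist
    if pull_secret and pull_secret not in existing_secrets:
        return HTTPStatus.NOT_FOUND     # 404: referenced pull secret is missing
    return HTTPStatus.OK                # 200: update accepted
```

Checking syntax before the allowlist keeps the 403 path from having to interpret malformed references; whether the real handler orders its checks this way is an open design choice.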

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2def2899-7590-4cdd-ab8d-01a160bcd98a

📥 Commits

Reviewing files that changed from the base of the PR and between 0a117ad and 22ec809.

📒 Files selected for processing (1)
  • specs/agents/runner-image.spec.md

- `runner.spec.md` — Runner runtime, AG-UI protocol, bridge layer
- `../control-plane/control-plane.spec.md` — Pod provisioning, image selection, env var injection
- `../api/ambient-model.spec.md` — ProjectSettings, Session data model
- `../security/security.spec.md` — Per-session SA isolation, credential boundaries
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the relative path to the security spec.

The path `../security/security.spec.md` is incorrect. From `specs/agents/`, the correct relative path to `specs/standards/security/security.spec.md` is `../standards/security/security.spec.md`.

📝 Proposed fix
-  - `../security/security.spec.md` — Per-session SA isolation, credential boundaries
+  - `../standards/security/security.spec.md` — Per-session SA isolation, credential boundaries
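
The path arithmetic behind this fix can be checked with a short sketch. It assumes only what the comment states, namely that the spec lives in `specs/agents/`; the helper name `resolve` is made up for illustration.

```python
import posixpath

SPEC_DIR = "specs/agents"  # directory containing runner-image.spec.md


def resolve(link):
    """Resolve a relative link as seen from SPEC_DIR."""
    return posixpath.normpath(posixpath.join(SPEC_DIR, link))


# The original link climbs one level and lands in specs/security/.
old_target = resolve("../security/security.spec.md")
# The proposed link lands in specs/standards/security/ instead.
new_target = resolve("../standards/security/security.spec.md")
```

Here `old_target` is `specs/security/security.spec.md`, while `new_target` is `specs/standards/security/security.spec.md`, the location the review comment names.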
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` at line 9, Update the incorrect relative
link string in runner-image.spec.md: replace the reference
`../security/security.spec.md` with the correct path
`../standards/security/security.spec.md` so the spec in specs/agents points to
specs/standards/security/security.spec.md.

Comment on lines +13 to +21
## Purpose

The Ambient Runner ships a single image containing Python, git, Node.js, Go, and several CLI tools. Workspace admins who need additional tools — Terraform, kubectl, language-specific SDKs, internal CLIs — have no supported extension path short of forking the image.

This spec defines a **stable runner contract** (the set of filesystem paths, HTTP endpoints, environment variables, and security constraints that custom images must preserve), a **Dockerfile FROM extension model** (users layer tools onto a published base image), and a **ProjectSettings-driven image override** (workspace admins declare a custom image per project).

The extension model is Dockerfile FROM only. Init hooks (scripts run at pod startup) were rejected: they are non-reproducible across pods, add startup latency, require runtime network egress that conflicts with NetworkPolicy isolation, and create OpenShift SCC conflicts when installing system packages.

This spec covers only the **image boundary** — what must be true about a container image for the platform to run it as a runner. Runner internals (bridge layer, gRPC transport, credential management) are defined in `runner.spec.md`. Pod provisioning mechanics are defined in `control-plane.spec.md`.
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add an Implementation Impact section listing all affected components.

The spec adds ProjectSettings fields but does not enumerate which components must be modified. Per reviewer feedback, this blocks implementation planning.

Add a section (e.g., after Purpose or before Stable Runner Contract) listing:

  • Backend: ProjectSettings handler (validation, RBAC, registry allowlist check)
  • Operator: Image selection precedence logic, pod spec generation
  • CLI (`acpctl`): Commands to set/get `runner_image` and `runner_image_pull_secret`
  • SDKs: Client library updates for new ProjectSettings fields
  • Frontend: UI for workspace admins to configure custom images

Include file-level pointers where applicable (e.g., `pkg/operator/session_controller.go`, `internal/api/project_settings.go`).
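
For the Operator item, the precedence given in the PR summary (ProjectSettings > agent registry > operator default) might look like the sketch below. It is illustrative only: the function and parameter names are hypothetical, and the operator itself may well be written in another language.

```python
from typing import Optional


def select_runner_image(project_settings_image: Optional[str],
                        registry_image: Optional[str],
                        operator_default: str) -> str:
    """Pick the runner image: ProjectSettings > agent registry > operator default."""
    for candidate in (project_settings_image, registry_image):
        if candidate:  # treat None and "" as unset
            return candidate
    return operator_default
```

Treating an empty string as unset is an assumption; the spec may prefer to reject an empty `runner_image` at validation time.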

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` around lines 13 - 21, Add an
"Implementation Impact" section to the spec that enumerates all components that
must change for ProjectSettings runner image support and point to relevant
files: list Backend (ProjectSettings handler including validation, RBAC,
registry allowlist check — reference internal/api/project_settings.go), Operator
(image selection precedence and pod spec generation — reference
pkg/operator/session_controller.go), CLI (acpctl commands to set/get
runner_image and runner_image_pull_secret), SDKs (client library updates for new
ProjectSettings fields), and Frontend (UI for workspace admins to configure
custom images); for each bullet include one-line notes on what to update (e.g.,
validation rules, RBAC checks, precedence logic, new CLI flags/commands, SDK
model fields, UI form and permissions).

Comment on lines +454 to +467
### Requirement: Conformance Test Suite

The platform SHALL publish a conformance test suite that validates a custom runner image against the stable contract. The test suite SHALL verify:

- AG-UI endpoints respond correctly (`/health`, `/capabilities`, `/`)
- Required filesystem paths exist and are writeable
- The runner process starts within the expected timeout
- The runner runs as a non-root user
- CP-injected environment variables are not overridden by the image
- Required bridge implementations are present and loadable for each supported `RUNNER_TYPE`

The test suite SHALL produce a pass/fail result suitable for CI/CD integration.

The test suite SHOULD include security checks: non-root user verification, no SUID binaries, and base image provenance validation. Operators MAY extend the suite with additional security scanning (vulnerability scanning, SBOM generation) using their existing tooling.
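
A minimal sketch of how two of the listed checks and the pass/fail aggregation could be structured is shown below. Every name here is hypothetical, the list of required paths is left as a parameter because the filesystem contract is not reproduced in this excerpt, and the endpoint and bridge checks are omitted.

```python
import os


def check_non_root(uid):
    """The runner must not run as root (UID 0)."""
    return uid != 0


def check_paths_writable(paths):
    """Every required path must exist as a directory and be writable."""
    return all(os.path.isdir(p) and os.access(p, os.W_OK) for p in paths)


def run_suite(checks):
    """Run named checks; return (passed, failures) for CI consumption."""
    failures = [name for name, check in checks.items() if not check()]
    return (not failures, failures)
```

A CI wrapper could call `run_suite` with a dictionary of zero-argument callables and exit non-zero when `passed` is false, which would give the pass/fail result the requirement asks for.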
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Specify conformance test suite location, ownership, and CI integration.

The spec requires a conformance test suite but does not define where it lives, who owns it, how it's invoked, or when it ships.

Add details for:

  • Location: Repository path (e.g., scripts/runner-conformance/ or tests/conformance/runner/)
  • Ownership: Team responsible for maintenance
  • CI workflow: How the suite is run (GitHub Actions workflow, manual script)
  • Shipping milestone: When the suite must be available (e.g., "ships with first base image release")
📋 Example addition
The conformance test suite SHALL be published at `scripts/runner-conformance/` and maintained by the platform-agents team. The suite SHALL be runnable via `make test-conformance IMAGE=<custom-image>` and integrated into CI for base image builds. The suite MUST ship before the first public release of the base image.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` around lines 454 - 467, Under the
"Requirement: Conformance Test Suite" section add explicit metadata: declare a
repository location (e.g., scripts/runner-conformance/ or
tests/conformance/runner/) and the exact command/Make target to run the suite
(e.g., make test-conformance IMAGE=<custom-image>), name the owning team (e.g.,
platform-agents) responsible for maintenance and updates, describe the CI
integration (e.g., a GitHub Actions workflow that runs on base-image builds and
pull requests touching runner images), and specify the shipping milestone (e.g.,
"must ship with the first base image public release"); place this text directly
beneath the "Requirement: Conformance Test Suite" heading so the spec explicitly
covers location, ownership, CI workflow, and delivery timing.

Collaborator Author

This is an implementation-specific detail, not really something for the spec.

@coderabbitai
Contributor

coderabbitai Bot commented May 13, 2026

CodeRabbit chat interactions are restricted to organization members for this repository. Ask an organization member to interact with CodeRabbit, or set chat.allow_non_org_members: true in your configuration.

jbpratt and others added 4 commits May 13, 2026 19:15
Define the stable runner contract and a ProjectSettings-driven image
override so workspace admins can layer tools onto the base runner via
Dockerfile FROM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Address PR feedback:
- Add venv guidance to Python Runtime Contract
- Add agent registry inline explanation
- Clarify runner type / custom image relationship
- Add network hardening forward-looking note
- Add Conformance Test Suite requirement with security checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jbpratt jbpratt force-pushed the spec/custom-runner-image branch from 4e975b0 to dbe8f9f Compare May 14, 2026 00:16
