feat(runner,manifests): alpha migration PR 6+7 — runners and Kustomize overlays#1379
feat(runner,manifests): alpha migration PR 6+7 — runners and Kustomize overlays#1379mergify[bot] merged 6 commits intomainfrom
Conversation
✅ Deploy Preview for cheerful-kitten-f556a0 canceled.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces a bidirectional gRPC transport layer enabling persistent session-based message delivery between a Kubernetes-deployed ambient runner and a control-plane API server. It adds gRPC infrastructure (client, session/inbox message APIs), integrates gRPC into the Claude bridge via a listener and message writer, updates endpoints and middleware for event streaming, enhances credential handling with kubeconfig and CP token support, and provisions OpenShift deployment manifests. Changes
Sequence Diagram(s)sequenceDiagram
participant Pod as Ambient Runner Pod
participant Bridge as ClaudeBridge
participant CPServer as Control Plane
participant SSEClient as SSE Client
Note over Pod,CPServer: gRPC Initialization (Lifespan)
Pod->>Bridge: start_grpc_listener(grpc_url)
Bridge->>CPServer: Create AmbientGRPCClient
Bridge->>CPServer: Start WatchSessionMessages stream
Note over Bridge: Ready event set
Pod->>Pod: Pre-register SSE queue
Pod->>CPServer: Push INITIAL_PROMPT via gRPC
Note over Pod,CPServer: Session Message Delivery
CPServer->>Bridge: Stream SessionMessage (event_type=user)
Bridge->>Bridge: Parse RunnerInput from payload
Bridge->>Bridge: Invoke bridge.run() for turn
Note over Bridge,SSEClient: Event Fan-out
Bridge->>Bridge: Emit events from run()
Bridge->>Bridge: Enqueue to active_streams[thread_id]
Bridge->>CPServer: Push to session_messages (assistant)
SSEClient->>Pod: GET /events/{thread_id}
Pod->>SSEClient: Stream events (TEXT, TOOL_CALL, etc)
Note over Bridge,CPServer: Terminal Event & Cleanup
Bridge->>Bridge: Emit RUN_FINISHED
Bridge->>CPServer: Push status=completed
Bridge->>Bridge: Remove active_streams[thread_id]
SSEClient->>SSEClient: Stream ends
sequenceDiagram
participant Client as Client
participant App as Ambient Runner App
participant Auth as Auth/Crypto
participant CPServer as CP Token Server
Note over Client,CPServer: CP Token Fetch (On Startup)
App->>Auth: refresh_bot_token()
alt CP Config Present
Auth->>Auth: Read AMBIENT_CP_TOKEN_PUBLIC_KEY
Auth->>Auth: RSA-OAEP encrypt SESSION_ID
Auth->>CPServer: POST /token with Bearer header
CPServer->>CPServer: Validate request
CPServer->>Auth: Return {token: ...}
Auth->>Auth: Cache token (set_bot_token)
Auth->>App: Return refreshed token
else No CP Config
Auth->>Auth: Fall back to get_bot_token()
Auth->>App: Return cached/mounted token
end
Note over App,Client: Subsequent API Calls
App->>Auth: get_bot_token()
Auth->>Auth: Return cached CP token first
App->>CPServer: Request with bearer token
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
|
fe35a64 to
7f7227d
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/runners/ambient-runner/ag_ui_gemini_cli/types.py (1)
96-96:⚠️ Potential issue | 🟠 MajorAvoid logging raw event payload content on parse failures.
line[:120]can include user prompts or tokens from malformed JSON and leak sensitive data into logs. Log metadata only (for example, size/source/hash), not payload text.As per coding guidelines
components/runners/ambient-runner/**/*.py: Check subprocess handling, timeout management, and that secrets are not logged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ag_ui_gemini_cli/types.py` at line 96, The warning currently logs raw payload via logger.warning("Failed to parse JSONL line: %s", line[:120]) which may leak sensitive user data; change this to log only metadata about the failing input (e.g., its byte/char length, source identifier if available, and a non-reversible hash like SHA256) and remove any substring of the original payload. Locate the logger.warning call that references the variable line in types.py and replace the message to include computed metadata (length, source/name, sha256 or similar) while ensuring no payload text is included in the log.components/runners/ambient-runner/ambient_runner/app.py (1)
305-308:⚠️ Potential issue | 🟠 MajorDuplicate router registration.
The
events_routeris already registered at line 263. This second registration (lines 306-308) adds duplicate routes.🐛 Remove duplicate registration
- # Between-run event stream (always registered) - from ambient_runner.endpoints.events import router as events_router - - app.include_router(events_router) caps = bridge.capabilities()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/app.py` around lines 305 - 308, Remove the duplicate router registration by deleting the second app.include_router(events_router) call; ensure only the first registration of events_router (the existing import from ambient_runner.endpoints.events and the initial app.include_router(events_router) at the earlier location) remains so routes are not registered twice.components/runners/ambient-runner/ambient_runner/platform/auth.py (1)
401-409:⚠️ Potential issue | 🟠 MajorJira credential field mismatch: env vars not set when backend returns
apiToken.
fetch_jira_credentials()(line 304) andmcp.py(line 264) both check forapiToken, butpopulate_runtime_credentials()here checks fortoken(line 405). If the backend returns{"apiToken": "..."}, the env varsJIRA_URL,JIRA_API_TOKEN, andJIRA_EMAILare silently skipped.Use consistent field name: update line 405 to check
apiToken, and line 407 to read fromapiToken.Proposed fix
- elif jira_creds.get("token"): + elif jira_creds.get("apiToken"): os.environ["JIRA_URL"] = jira_creds.get("url", "") - os.environ["JIRA_API_TOKEN"] = jira_creds.get("token", "") + os.environ["JIRA_API_TOKEN"] = jira_creds.get("apiToken", "") os.environ["JIRA_EMAIL"] = jira_creds.get("email", "")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/platform/auth.py` around lines 401 - 409, populate_runtime_credentials() is checking for jira_creds.get("token") but fetch_jira_credentials() and mcp.py use "apiToken", so credential data from the backend is being skipped; change the check in populate_runtime_credentials() to look for "apiToken" and read jira_creds.get("apiToken") when setting JIRA_API_TOKEN (keep JIRA_URL and JIRA_EMAIL assignments as-is) so the environment vars are populated consistently with fetch_jira_credentials() and mcp.py.
🧹 Nitpick comments (1)
components/runners/ambient-runner/ambient_runner/endpoints/tasks.py (1)
58-62: Align synthetictask:completedpayload shape with adapter-emitted payloads.
stop_taskcurrently emits a reduced value object, while adapter notifications includeusageandoutput_filefields (components/runners/ambient-runner/ag_ui_claude_sdk/adapter.py, Line 679-694). Keeping one schema avoids client-side branching.As per coding guidelines, "Documentation and refactoring suggestions are welcome when substantive."Proposed schema-alignment diff
completed_event = CustomEvent( type=EventType.CUSTOM, name="task:completed", value={ "task_id": task_id, "status": "stopped", "summary": "Task stopped by user", + "usage": None, + "output_file": None, }, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/endpoints/tasks.py` around lines 58 - 62, The synthetic payload emitted in stop_task (the value dict for the "task:completed" notification) is missing fields present in adapter-emitted payloads—specifically usage and output_file—leading to divergent schemas; update the stop_task emission in ambient_runner/endpoints/tasks.py so its value includes the same keys as adapter notifications (task_id, status, summary, usage, output_file), populating usage and output_file with sensible defaults (e.g., zeroed usage object and null/empty output_file) or actual data if available, and ensure the event name "task:completed" and the value shape exactly match the adapter's payload contract used in ag_ui_claude_sdk/adapter.py (lines where usage/output_file are set).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/runners/ambient-runner/ambient_runner/_session_messages_api.py`:
- Around line 103-117: The UNAUTHENTICATED retry path calls
self._grpc_client.reconnect() and then mutates self._push_rpc/self._metadata
without guarding against failures; wrap the reconnect +
new_api/_push_rpc/_metadata assignment in a try/except Exception block, catch
any exception as e, log it (e.g., logger.exception or logger.error with context
including session_id) and do not let it propagate—exit the retry branch (do not
continue the loop) so the failure is handled gracefully and the outer flow
remains fire-and-forget. Keep the rest of the UNAUTHENTICATED handling (the
warning log and reconnect attempt) as-is but protect the reconnect and new_api
assignments with this local try/except.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py`:
- Around line 215-220: This call sequence (clear_runtime_credentials(),
populate_runtime_credentials(self._context),
populate_mcp_server_credentials(self._context)) mutates process-global env and
shared token files and must be serialized across all concurrent bridge runs;
introduce and use a bridge-wide async lock (e.g. self._creds_lock) to guard the
entire refresh path and also use the same lock around the
clear_runtime_credentials() call in run()'s finally block so no other thread can
clear or overwrite credentials mid-turn; update methods/blocks invoking these
helpers (the location that sets self._last_creds_refresh and the run() finally
block) to acquire the bridge-wide lock before calling
clear_runtime_credentials/populate_* and release after.
- Around line 482-503: start_grpc_listener currently starts the
GRPCSessionListener and returns immediately; change it to await the listener's
readiness with a bounded timeout: after calling self._grpc_listener.start(), use
asyncio.wait_for(self._grpc_listener.ready.wait(), timeout=GRPC_STARTUP_TIMEOUT)
(or fetch a configured timeout) and raise a clear RuntimeError or log and clean
up if the wait times out; reference the start_grpc_listener method, the
GRPCSessionListener instantiation, the self._grpc_listener.start() call, and the
self._grpc_listener.ready event to implement this bounded readiness wait.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py`:
- Line 72: The log call prints the raw ambient_mcp_url which may include
sensitive userinfo or query tokens; change the logger.info usage around the
ambient_mcp_url variable (the logger.info("Added ambient MCP sidecar server
(SSE): %s", ambient_mcp_url) call) to redact secrets before logging — e.g.,
parse ambient_mcp_url and remove userinfo and query params (or log only
scheme+host+path or a fixed label), then log the redacted value or omit the URL
entirely.
In `@components/runners/ambient-runner/ambient_runner/endpoints/run.py`:
- Around line 85-97: session_id currently falls back to the empty string which
disables gRPC persistence; change it to use the same fallback ClaudeBridge.run()
uses by deriving session_id from run_agent_input.thread_id or the bridge context
(e.g., session_id = run_agent_input.thread_id or (getattr(bridge, "context",
None) and bridge.context.session_id)), and pass that value to
grpc_push_middleware instead of "" so gRPC persistence is preserved; update the
session_id variable near its declaration and ensure grpc_push_middleware is
given None or the actual session id rather than an empty string.
In `@components/runners/ambient-runner/ambient_runner/middleware/grpc_push.py`:
- Around line 101-109: The loop yields events from event_stream then calls the
blocking function _push_event which synchronously invokes
grpc_client.session_messages.push(), blocking the async loop; change this to
fire-and-forget by offloading the blocking gRPC call: either call
asyncio.get_running_loop().run_in_executor(None, _push_event, grpc_client, sid,
event) or create an asyncio.create_task that runs an async wrapper (e.g.
_push_event_async) which itself uses run_in_executor to invoke
grpc_client.session_messages.push(), ensure you do not await it so streaming
stays non-blocking and handle/log exceptions inside the wrapper; keep the
existing grpc_client.close() in the finally block.
In `@components/runners/ambient-runner/tests/test_app_initial_prompt.py`:
- Line 119: Replace the deprecated call to asyncio.get_event_loop() assigned to
real_loop with asyncio.get_running_loop() so the async test uses the running
loop; locate the assignment to real_loop and change the function call from
asyncio.get_event_loop() to asyncio.get_running_loop() to match the existing use
of get_running_loop() elsewhere in the test.
In `@components/runners/ambient-runner/tests/test_grpc_writer.py`:
- Around line 27-34: The test helper make_writer currently converts an explicit
None grpc_client into a MagicMock, so the GRPCMessageWriter._write_message
branch where self._grpc_client is None is never exercised; update make_writer to
accept and pass through a sentinel (or allow grpc_client=None to be passed
through) so you can construct GRPCMessageWriter with grpc_client=None in the
test that targets the early-return path, and add/adjust the test at Line ~211 to
call make_writer(grpc_client=None) (or directly instantiate GRPCMessageWriter
with grpc_client=None) to verify the early return in
GRPCMessageWriter._write_message.
---
Outside diff comments:
In `@components/runners/ambient-runner/ag_ui_gemini_cli/types.py`:
- Line 96: The warning currently logs raw payload via logger.warning("Failed to
parse JSONL line: %s", line[:120]) which may leak sensitive user data; change
this to log only metadata about the failing input (e.g., its byte/char length,
source identifier if available, and a non-reversible hash like SHA256) and
remove any substring of the original payload. Locate the logger.warning call
that references the variable line in types.py and replace the message to include
computed metadata (length, source/name, sha256 or similar) while ensuring no
payload text is included in the log.
In `@components/runners/ambient-runner/ambient_runner/app.py`:
- Around line 305-308: Remove the duplicate router registration by deleting the
second app.include_router(events_router) call; ensure only the first
registration of events_router (the existing import from
ambient_runner.endpoints.events and the initial
app.include_router(events_router) at the earlier location) remains so routes are
not registered twice.
In `@components/runners/ambient-runner/ambient_runner/platform/auth.py`:
- Around line 401-409: populate_runtime_credentials() is checking for
jira_creds.get("token") but fetch_jira_credentials() and mcp.py use "apiToken",
so credential data from the backend is being skipped; change the check in
populate_runtime_credentials() to look for "apiToken" and read
jira_creds.get("apiToken") when setting JIRA_API_TOKEN (keep JIRA_URL and
JIRA_EMAIL assignments as-is) so the environment vars are populated consistently
with fetch_jira_credentials() and mcp.py.
---
Nitpick comments:
In `@components/runners/ambient-runner/ambient_runner/endpoints/tasks.py`:
- Around line 58-62: The synthetic payload emitted in stop_task (the value dict
for the "task:completed" notification) is missing fields present in
adapter-emitted payloads—specifically usage and output_file—leading to divergent
schemas; update the stop_task emission in ambient_runner/endpoints/tasks.py so
its value includes the same keys as adapter notifications (task_id, status,
summary, usage, output_file), populating usage and output_file with sensible
defaults (e.g., zeroed usage object and null/empty output_file) or actual data
if available, and ensure the event name "task:completed" and the value shape
exactly match the adapter's payload contract used in ag_ui_claude_sdk/adapter.py
(lines where usage/output_file are set).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: cb6929ac-dbdc-4c8b-af01-5602bfdb974b
📒 Files selected for processing (47)
components/runners/ambient-runner/.mcp.jsoncomponents/runners/ambient-runner/ag_ui_claude_sdk/adapter.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/handlers.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/hooks.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.pycomponents/runners/ambient-runner/ag_ui_gemini_cli/adapter.pycomponents/runners/ambient-runner/ag_ui_gemini_cli/types.pycomponents/runners/ambient-runner/ambient_runner/_grpc_client.pycomponents/runners/ambient-runner/ambient_runner/_inbox_messages_api.pycomponents/runners/ambient-runner/ambient_runner/_session_messages_api.pycomponents/runners/ambient-runner/ambient_runner/app.pycomponents/runners/ambient-runner/ambient_runner/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/mcp.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/prompts.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/tools.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.pycomponents/runners/ambient-runner/ambient_runner/endpoints/events.pycomponents/runners/ambient-runner/ambient_runner/endpoints/run.pycomponents/runners/ambient-runner/ambient_runner/endpoints/tasks.pycomponents/runners/ambient-runner/ambient_runner/middleware/__init__.pycomponents/runners/ambient-runner/ambient_runner/middleware/grpc_push.pycomponents/runners/ambient-runner/ambient_runner/middleware/secret_redaction.pycomponents/runners/ambient-runner/ambient_runner/platform/auth.pycomponents/runners/ambient-runner/ambient_runner/platform/context.pycomponents/runners/ambient-runner/ambient_runner/platform/prompts.pycomponents/runners/ambient-runner/ambient_runner/platform/utils.pycomponents/runners/ambient-runner/ambient_runner/tools/backend_api.pycomponents/runners/ambient-runner/architecture.mdcomponents/runners/ambient-runner/pyproject.tomlcomponents/runners/ambient-runner/tests/test_app_initial_prompt.pycomponents/runners/ambient-runner/tests/test_auto_push.pycomponents/runners/ambient-runner/tests/test_bridge_claude.pycomponents/runners/ambient-runner/tests/test_e2e_api.pycomponents/runners/ambient-runner/tests/test_events_endpoint.pycomponents/runners/ambient-runner/tests/test_gemini_auth.pycomponents/runners/ambient-runner/tests/test_gemini_cli_adapter.pycomponents/runners/ambient-runner/tests/test_gemini_session.pycomponents/runners/ambient-runner/tests/test_google_drive_e2e.pycomponents/runners/ambient-runner/tests/test_grpc_client.pycomponents/runners/ambient-runner/tests/test_grpc_transport.pycomponents/runners/ambient-runner/tests/test_grpc_writer.pycomponents/runners/ambient-runner/tests/test_shared_session_credentials.py
💤 Files with no reviewable changes (7)
- components/runners/ambient-runner/tests/test_gemini_auth.py
- components/runners/ambient-runner/tests/test_e2e_api.py
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.py
- components/runners/ambient-runner/tests/test_gemini_cli_adapter.py
- components/runners/ambient-runner/tests/test_gemini_session.py
- components/runners/ambient-runner/tests/test_auto_push.py
- components/runners/ambient-runner/ag_ui_gemini_cli/adapter.py
| except grpc.RpcError as exc: | ||
| if ( | ||
| attempt == 0 | ||
| and exc.code() == grpc.StatusCode.UNAUTHENTICATED | ||
| and self._grpc_client is not None | ||
| ): | ||
| logger.warning( | ||
| "[GRPC PUSH→] UNAUTHENTICATED — reconnecting with fresh token (session=%s)", | ||
| session_id, | ||
| ) | ||
| self._grpc_client.reconnect() | ||
| new_api = self._grpc_client.session_messages | ||
| self._push_rpc = new_api._push_rpc | ||
| self._metadata = new_api._metadata | ||
| continue |
There was a problem hiding this comment.
Unhandled exception if reconnect() fails during UNAUTHENTICATED retry.
If grpc_client.reconnect() raises (e.g., CP token endpoint unreachable), the exception escapes the except grpc.RpcError block and propagates up — potentially crashing the caller. The outer except Exception at line 125 won't catch it since it's not within the try block at that point.
Wrap the reconnect path in its own try/except to maintain fire-and-forget semantics.
Proposed fix
if (
attempt == 0
and exc.code() == grpc.StatusCode.UNAUTHENTICATED
and self._grpc_client is not None
):
logger.warning(
"[GRPC PUSH→] UNAUTHENTICATED — reconnecting with fresh token (session=%s)",
session_id,
)
- self._grpc_client.reconnect()
- new_api = self._grpc_client.session_messages
- self._push_rpc = new_api._push_rpc
- self._metadata = new_api._metadata
- continue
+ try:
+ self._grpc_client.reconnect()
+ new_api = self._grpc_client.session_messages
+ self._push_rpc = new_api._push_rpc
+ self._metadata = new_api._metadata
+ continue
+ except Exception as reconnect_exc:
+ logger.warning(
+ "[GRPC PUSH→] reconnect failed (session=%s): %s",
+ session_id,
+ reconnect_exc,
+ )
+ return None
logger.warning(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/_session_messages_api.py`
around lines 103 - 117, The UNAUTHENTICATED retry path calls
self._grpc_client.reconnect() and then mutates self._push_rpc/self._metadata
without guarding against failures; wrap the reconnect +
new_api/_push_rpc/_metadata assignment in a try/except Exception block, catch
any exception as e, log it (e.g., logger.exception or logger.error with context
including session_id) and do not let it propagate—exit the retry branch (do not
continue the loop) so the failure is handled gracefully and the outer flow
remains fire-and-forget. Keep the rest of the UNAUTHENTICATED handling (the
warning log and reconnect attempt) as-is but protect the reconnect and new_api
assignments with this local try/except.
| # Fresh credentials for this user on every run | ||
| clear_runtime_credentials() | ||
| await populate_runtime_credentials(self._context) | ||
| await populate_mcp_server_credentials(self._context) | ||
| self._last_creds_refresh = time.monotonic() | ||
|
|
There was a problem hiding this comment.
Serialize credential refresh/cleanup across concurrent runs.
These helpers mutate process-global env vars and shared token files. Doing that on every run without a bridge-wide lock means one thread can clear or overwrite another thread’s credentials mid-turn; the per-thread session lock below does not protect this path.
Possible fix
class ClaudeBridge(PlatformBridge):
def __init__(self) -> None:
super().__init__()
+ self._credentials_lock = asyncio.Lock()
self._adapter: ClaudeAgentAdapter | None = None
...
- clear_runtime_credentials()
- await populate_runtime_credentials(self._context)
- await populate_mcp_server_credentials(self._context)
- self._last_creds_refresh = time.monotonic()
+ async with self._credentials_lock:
+ clear_runtime_credentials()
+ await populate_runtime_credentials(self._context)
+ await populate_mcp_server_credentials(self._context)
+ self._last_creds_refresh = time.monotonic()The same lock also needs to guard the clear_runtime_credentials() call in run()'s finally block, otherwise another active turn can still get its credentials cleared underneath it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py`
around lines 215 - 220, This call sequence (clear_runtime_credentials(),
populate_runtime_credentials(self._context),
populate_mcp_server_credentials(self._context)) mutates process-global env and
shared token files and must be serialized across all concurrent bridge runs;
introduce and use a bridge-wide async lock (e.g. self._creds_lock) to guard the
entire refresh path and also use the same lock around the
clear_runtime_credentials() call in run()'s finally block so no other thread can
clear or overwrite credentials mid-turn; update methods/blocks invoking these
helpers (the location that sets self._last_creds_refresh and the run() finally
block) to acquire the bridge-wide lock before calling
clear_runtime_credentials/populate_* and release after.
| async def start_grpc_listener(self, grpc_url: str) -> None: | ||
| """Start the gRPC session listener for this bridge. | ||
|
|
||
| Separated from _setup_platform so it can be called after platform | ||
| setup completes, with a bounded timeout for readiness. Only valid | ||
| when AMBIENT_GRPC_ENABLED=true and AMBIENT_GRPC_URL are both set. | ||
| """ | ||
| if self._context is None: | ||
| raise RuntimeError("Cannot start gRPC listener: context not set") | ||
| if self._grpc_listener is not None: | ||
| logger.warning("gRPC listener already started — skipping duplicate start") | ||
| return | ||
|
|
||
| from ambient_runner.bridges.claude.grpc_transport import GRPCSessionListener | ||
|
|
||
| session_id = self._context.session_id | ||
| self._grpc_listener = GRPCSessionListener( | ||
| bridge=self, | ||
| session_id=session_id, | ||
| grpc_url=grpc_url, | ||
| ) | ||
| self._grpc_listener.start() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C4 'class GRPCSessionListener|def start\(|ready|start_grpc_listener\(' \
components/runners/ambient-runner/ambient_runner \
--type=pyRepository: ambient-code/platform
Length of output: 50377
🏁 Script executed:
rg -n 'start_grpc_listener' components/runners/ambient-runner --type=pyRepository: ambient-code/platform
Length of output: 2673
Implement bounded readiness timeout inside start_grpc_listener.
The docstring claims this method provides "a bounded timeout for readiness," but the implementation returns immediately after start() without awaiting the ready event. The caller in app.py compensates by waiting explicitly (line 137), but this shifts responsibility outside the method and violates the stated API contract.
Suggested fix
self._grpc_listener = GRPCSessionListener(
bridge=self,
session_id=session_id,
grpc_url=grpc_url,
)
self._grpc_listener.start()
+ try:
+ await asyncio.wait_for(self._grpc_listener.ready.wait(), timeout=5)
+ except Exception:
+ await self._grpc_listener.stop()
+ self._grpc_listener = None
+ raise
logger.info(
"gRPC listener started: session=%s url=%s",
session_id,
grpc_url,
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py`
around lines 482 - 503, start_grpc_listener currently starts the
GRPCSessionListener and returns immediately; change it to await the listener's
readiness with a bounded timeout: after calling self._grpc_listener.start(), use
asyncio.wait_for(self._grpc_listener.ready.wait(), timeout=GRPC_STARTUP_TIMEOUT)
(or fetch a configured timeout) and raise a clear RuntimeError or log and clean
up if the wait times out; reference the start_grpc_listener method, the
GRPCSessionListener instantiation, the self._grpc_listener.start() call, and the
self._grpc_listener.ready event to implement this bounded readiness wait.
| "type": "sse", | ||
| "url": f"{ambient_mcp_url.rstrip('/')}/sse", | ||
| } | ||
| logger.info("Added ambient MCP sidecar server (SSE): %s", ambient_mcp_url) |
There was a problem hiding this comment.
Avoid logging raw AMBIENT_MCP_URL values
Line 72 logs the full URL from environment; if it contains userinfo/query tokens, this leaks secrets into logs.
Suggested fix
- logger.info("Added ambient MCP sidecar server (SSE): %s", ambient_mcp_url)
+ logger.info("Added ambient MCP sidecar server (SSE)")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger.info("Added ambient MCP sidecar server (SSE): %s", ambient_mcp_url) | |
| logger.info("Added ambient MCP sidecar server (SSE)") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py` at
line 72, The log call prints the raw ambient_mcp_url which may include sensitive
userinfo or query tokens; change the logger.info usage around the
ambient_mcp_url variable (the logger.info("Added ambient MCP sidecar server
(SSE): %s", ambient_mcp_url) call) to redact secrets before logging — e.g.,
parse ambient_mcp_url and remove userinfo and query params (or log only
scheme+host+path or a fixed label), then log the redacted value or omit the URL
entirely.
| session_id = run_agent_input.thread_id or "" | ||
|
|
||
| async def event_stream(): | ||
| try: | ||
| async for event in bridge.run( | ||
| run_agent_input, | ||
| current_user_id=current_user_id, | ||
| current_user_name=current_user_name, | ||
| caller_token=caller_token, | ||
| async for event in grpc_push_middleware( | ||
| bridge.run( | ||
| run_agent_input, | ||
| current_user_id=current_user_id, | ||
| current_user_name=current_user_name, | ||
| caller_token=caller_token, | ||
| ), | ||
| session_id=session_id, | ||
| ): |
There was a problem hiding this comment.
Use the same session fallback as bridge.run().
thread_id is optional here, but ClaudeBridge.run() already falls back to bridge.context.session_id when it is missing. Passing "" into grpc_push_middleware silently disables gRPC persistence for those runs even though the run itself still succeeds.
Possible fix
- session_id = run_agent_input.thread_id or ""
+ session_id = (
+ run_agent_input.thread_id
+ or getattr(getattr(bridge, "context", None), "session_id", "")
+ or ""
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/endpoints/run.py` around
lines 85 - 97, session_id currently falls back to the empty string which
disables gRPC persistence; change it to use the same fallback ClaudeBridge.run()
uses by deriving session_id from run_agent_input.thread_id or the bridge context
(e.g., session_id = run_agent_input.thread_id or (getattr(bridge, "context",
None) and bridge.context.session_id)), and pass that value to
grpc_push_middleware instead of "" so gRPC persistence is preserved; update the
session_id variable near its declaration and ensure grpc_push_middleware is
given None or the actual session id rather than an empty string.
| try: | ||
| async for event in event_stream: | ||
| yield event | ||
| _push_event(grpc_client, sid, event) | ||
| finally: | ||
| try: | ||
| grpc_client.close() | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
Blocking gRPC call on async event loop.
_push_event() calls grpc_client.session_messages.push() synchronously (blocking I/O). This blocks the event loop while waiting for the gRPC response, which can stall event streaming.
The docstring says "fire-and-forget" but the current implementation is synchronous. Either offload to an executor or use asyncio.create_task with a wrapper.
🔧 Proposed fix using run_in_executor
try:
async for event in event_stream:
yield event
- _push_event(grpc_client, sid, event)
+ # Fire-and-forget: offload blocking gRPC call to thread pool
+ loop = asyncio.get_running_loop()
+ loop.run_in_executor(None, _push_event, grpc_client, sid, event)
finally:Or wrap in a task if you want error visibility:
asyncio.create_task(_push_event_async(grpc_client, sid, event))🧰 Tools
🪛 Ruff (0.15.10)
[error] 108-109: try-except-pass detected, consider logging the exception
(S110)
[warning] 108-108: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/middleware/grpc_push.py`
around lines 101 - 109, The loop yields events from event_stream then calls the
blocking function _push_event which synchronously invokes
grpc_client.session_messages.push(), blocking the async loop; change this to
fire-and-forget by offloading the blocking gRPC call: either call
asyncio.get_running_loop().run_in_executor(None, _push_event, grpc_client, sid,
event) or create an asyncio.create_task that runs an async wrapper (e.g.
_push_event_async) which itself uses run_in_executor to invoke
grpc_client.session_messages.push(), ensure you do not await it so streaming
stays non-blocking and handle/log exceptions inside the wrapper; keep the
existing grpc_client.close() in the finally block.
| mock_cls.from_env.return_value = mock_client | ||
|
|
||
| executor_calls = [] | ||
| real_loop = asyncio.get_event_loop() |
There was a problem hiding this comment.
Deprecated asyncio.get_event_loop() — use get_running_loop() instead.
In async context, get_event_loop() is deprecated since Python 3.10. Since the test is already async, use asyncio.get_running_loop() which you're already calling on line 121.
Proposed fix
- real_loop = asyncio.get_event_loop()
+ real_loop = asyncio.get_running_loop()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| real_loop = asyncio.get_event_loop() | |
| real_loop = asyncio.get_running_loop() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/tests/test_app_initial_prompt.py` at line
119, Replace the deprecated call to asyncio.get_event_loop() assigned to
real_loop with asyncio.get_running_loop() so the async test uses the running
loop; locate the assignment to real_loop and change the function call from
asyncio.get_event_loop() to asyncio.get_running_loop() to match the existing use
of get_running_loop() elsewhere in the test.
| def make_writer(grpc_client=None): | ||
| if grpc_client is None: | ||
| grpc_client = MagicMock() | ||
| return GRPCMessageWriter( | ||
| session_id="sess-1", | ||
| run_id="run-1", | ||
| grpc_client=grpc_client, | ||
| ) |
There was a problem hiding this comment.
grpc_client=None never hits the no-client branch.
make_writer() converts an explicit None into a MagicMock, so Line 211 still exercises the normal push path. The GRPCMessageWriter._write_message() early return for self._grpc_client is None remains untested.
Possible fix
+_DEFAULT = object()
+
-def make_writer(grpc_client=None):
- if grpc_client is None:
+def make_writer(grpc_client=_DEFAULT):
+ if grpc_client is _DEFAULT:
grpc_client = MagicMock()
return GRPCMessageWriter(
session_id="sess-1",
run_id="run-1",
grpc_client=grpc_client,
)Also applies to: 208-213
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/tests/test_grpc_writer.py` around lines 27
- 34, The test helper make_writer currently converts an explicit None
grpc_client into a MagicMock, so the GRPCMessageWriter._write_message branch
where self._grpc_client is None is never exercised; update make_writer to accept
and pass through a sentinel (or allow grpc_client=None to be passed through) so
you can construct GRPCMessageWriter with grpc_client=None in the test that
targets the early-return path, and add/adjust the test at Line ~211 to call
make_writer(grpc_client=None) (or directly instantiate GRPCMessageWriter with
grpc_client=None) to verify the early return in
GRPCMessageWriter._write_message.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/runners/ambient-runner/ambient_runner/platform/auth.py (2)
167-195:⚠️ Potential issue | 🟠 MajorExpired caller token + expired
BOT_TOKENnever reaches the refresh path.If the caller-token request gets 401/403 and the BOT_TOKEN fallback also gets 401/403, this branch raises
PermissionErrorimmediately. That skips_retry_with_fresh_bot_token(...), so the new CP refresh logic never runs in the exact case it is needed most.Suggested fix
try: with _urllib_request.urlopen(fallback_req, timeout=10) as resp: return resp.read().decode("utf-8", errors="replace") + except _urllib_request.HTTPError as fallback_err: + if fallback_err.code in (401, 403): + return _retry_with_fresh_bot_token(fallback_err.code) + logger.warning( + f"{credential_type} BOT_TOKEN fallback failed: {fallback_err}" + ) + raise PermissionError( + f"{credential_type} authentication failed after caller-token fallback" + ) from fallback_err except Exception as fallback_err: logger.warning( f"{credential_type} BOT_TOKEN fallback also failed: {fallback_err}" ) raise PermissionError(Also applies to: 202-235
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/platform/auth.py` around lines 167 - 195, The current fallback except block raises PermissionError immediately when the BOT_TOKEN fallback fails, which prevents the `_retry_with_fresh_bot_token(...)` refresh path from running; change the except handler inside the caller-token-expired branch to detect if `fallback_err` is an HTTP error with code 401 or 403 and, in that case, call and return `_retry_with_fresh_bot_token(fallback_err.code)` instead of raising; otherwise log and raise the existing PermissionError. Apply the same adjustment to the parallel block handling the other occurrence so that both places reference `_retry_with_fresh_bot_token`, `credential_type`, and the fallback request flow (`get_bot_token`, `fallback_req`, `context.current_user_id`) consistently.
125-140:⚠️ Potential issue | 🔴 CriticalReject non-HTTP(S) backend URLs before
urlopen().The current guard only checks
parsed.hostname, so values likefile:///...skip the allowlist entirely and still reachurlopen()with the Authorization header attached, defeating the exfiltration protection. Schemes likeftp://with a truthy hostname are caught, butfile://with no hostname bypass the check.Suggested fix
url = ( f"{base}/api/ambient/v1/projects/{project_id}/credentials/{credential_id}/token" ) # Reject non-cluster URLs to prevent token exfiltration via user-overridden env vars parsed = urlparse(base) + if parsed.scheme not in {"http", "https"} or not parsed.hostname: + logger.error(f"Refusing to send credentials to invalid backend URL: {base}") + return {} if parsed.hostname and not ( parsed.hostname.endswith(".svc.cluster.local") or parsed.hostname.endswith(".svc") or parsed.hostname == "localhost" or parsed.hostname == "127.0.0.1" ): logger.error( f"Refusing to send credentials to external host: {parsed.hostname}" ) return {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/platform/auth.py` around lines 125 - 140, The guard currently only checks parsed.hostname and misses non-HTTP(S) schemes like file:// which have no hostname; update the validation around url/parsed (constructed from base, project_id, credential_id) to first ensure parsed.scheme is either "http" or "https" and reject any other scheme, then perform the existing hostname allowlist checks (endswith(".svc.cluster.local"), ".svc", "localhost", "127.0.0.1"); if the scheme is invalid or the hostname fails the allowlist, log the refusal (use the same logger message) and return {} before calling urlopen().
♻️ Duplicate comments (4)
components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py (1)
72-72:⚠️ Potential issue | 🟠 MajorDo not log raw
AMBIENT_MCP_URLLine 72 logs the full URL value; this can leak secrets if userinfo or query tokens are present.
Suggested fix
- logger.info("Added ambient MCP sidecar server (SSE): %s", ambient_mcp_url) + logger.info("Added ambient MCP sidecar server (SSE)")As per coding guidelines,
components/runners/ambient-runner/**/*.py: "Check subprocess handling, timeout management, and that secrets are not logged."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py` at line 72, The log on logger.info currently prints the raw ambient_mcp_url which can leak credentials; update the code around the logger.info call in mcp.py to sanitize ambient_mcp_url before logging by parsing the URL (e.g., via urllib.parse), removing any userinfo (username:password) and stripping/obfuscating sensitive query parameters (e.g., token, api_key, auth) or replacing them with "[REDACTED]", then log the sanitized URL or only host/path instead of the raw ambient_mcp_url; ensure you reference ambient_mcp_url and the logger.info call when making the change.components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py (2)
482-508:⚠️ Potential issue | 🟠 MajorWait for gRPC listener readiness before reporting startup success.
The docstring promises a bounded readiness timeout, but
start_grpc_listener()returns right afterstart()and logs success before the listener is actually ready. A badgrpc_urlcan therefore slip past startup and fail later.Possible fix
self._grpc_listener = GRPCSessionListener( bridge=self, session_id=session_id, grpc_url=grpc_url, ) self._grpc_listener.start() + try: + await asyncio.wait_for(self._grpc_listener.ready.wait(), timeout=5) + except Exception: + await self._grpc_listener.stop() + self._grpc_listener = None + raise logger.info( "gRPC listener started: session=%s url=%s", session_id, grpc_url, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py` around lines 482 - 508, The start_grpc_listener function currently logs success immediately after calling self._grpc_listener.start(); change it to wait for the GRPCSessionListener to report readiness (or fail) before logging/returning: after creating GRPCSessionListener (bridge=self, session_id=session_id, grpc_url=grpc_url) call its readiness API (e.g., await self._grpc_listener.wait_ready(timeout=...) or poll self._grpc_listener.is_ready() with a bounded timeout), raise or log an error and clear self._grpc_listener on failure, and only log "gRPC listener started" and return after the listener is confirmed ready; ensure you propagate or surface the underlying connection error when readiness times out or fails.
215-219:⚠️ Potential issue | 🔴 CriticalSerialize credential refresh and cleanup across runs.
Line 216 and Line 335 still mutate process-global env vars and shared token files without a bridge-wide lock. Two overlapping turns can clear or overwrite each other’s credentials mid-run.
Possible fix
class ClaudeBridge(PlatformBridge): def __init__(self) -> None: super().__init__() + self._credentials_lock = asyncio.Lock() self._adapter: ClaudeAgentAdapter | None = None ... - clear_runtime_credentials() - await populate_runtime_credentials(self._context) - await populate_mcp_server_credentials(self._context) - self._last_creds_refresh = time.monotonic() + async with self._credentials_lock: + clear_runtime_credentials() + await populate_runtime_credentials(self._context) + await populate_mcp_server_credentials(self._context) + self._last_creds_refresh = time.monotonic() ... - clear_runtime_credentials() + async with self._credentials_lock: + clear_runtime_credentials()Also applies to: 330-335
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py` around lines 215 - 219, The credential refresh/cleanup calls (clear_runtime_credentials(), populate_runtime_credentials(self._context), populate_mcp_server_credentials(self._context)) must be serialized with a bridge-wide async lock to prevent concurrent runs from clearing/overwriting each other’s env vars and token files; add a bridge-scoped asyncio.Lock (e.g., self._creds_lock) or module-level lock, acquire it before calling clear_runtime_credentials()/populate_runtime_credentials()/populate_mcp_server_credentials and release it after, and apply the same locking around the other credential-mutation site (the block referenced around lines 330-335) so all credential mutations use the same lock. Ensure the lock is created once (constructor or module init) and used with async with self._creds_lock to avoid deadlocks.components/runners/ambient-runner/ambient_runner/_session_messages_api.py (1)
103-117:⚠️ Potential issue | 🟠 MajorKeep reconnect failures inside the fire-and-forget contract.
push()says it logs and returnsNoneon transport errors, but this retry branch still letsreconnect()/session_messagessetup exceptions escape to the caller. Wrap the reconnect/update block locally and fall back toNoneif it fails.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/_session_messages_api.py` around lines 103 - 117, The retry branch inside push() can raise from self._grpc_client.reconnect() or accessing new_api = self._grpc_client.session_messages, so wrap the reconnect and metadata/_push_rpc update (the block that calls self._grpc_client.reconnect(), reads self._grpc_client.session_messages, assigns self._push_rpc and self._metadata) in a local try/except; on any exception log a warning/error and break/return None to preserve the fire-and-forget contract instead of letting exceptions escape to the caller. Ensure you reference push(), self._grpc_client.reconnect(), session_messages, _push_rpc and _metadata when locating and protecting the code.
🧹 Nitpick comments (2)
components/runners/ambient-runner/tests/test_grpc_client.py (1)
95-97: Assert specific exception type for invalid public key.
pytest.raises(Exception)is too broad. The cryptography library raisesValueErrororTypeErrorfor invalid PEM data — asserting on the specific type improves test reliability.♻️ Proposed fix
def test_invalid_public_key_raises(self): - with pytest.raises(Exception): + with pytest.raises((ValueError, TypeError)): _encrypt_session_id("not a pem key", "session-id")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/tests/test_grpc_client.py` around lines 95 - 97, The test test_invalid_public_key_raises is using a too-broad pytest.raises(Exception); update it to assert the specific exceptions raised by the cryptography PEM parsing (e.g., use pytest.raises((ValueError, TypeError))) when calling _encrypt_session_id("not a pem key", "session-id") so the test verifies the exact error types from the PEM parsing path in _encrypt_session_id.components/manifests/overlays/mpp-openshift/ambient-api-server.yaml (1)
78-83: EnablereadOnlyRootFilesystem: trueon the main container.The main api-server container (lines 78–83) can safely use
readOnlyRootFilesystem: trueby mounting writable paths (logs, temp) asemptyDirvolumes. The initContainer (lines 137–142) likely needs write access for database migration and should remainfalse.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/overlays/mpp-openshift/ambient-api-server.yaml` around lines 78 - 83, Set readOnlyRootFilesystem: true on the main api-server container's securityContext (replace the current false) and ensure any paths that need to be writable by that container (logs, tmp, etc.) are mounted as emptyDir volumes; keep the initContainer's readOnlyRootFilesystem as false because it requires write access for DB migrations. Update the container spec where securityContext is defined (securityContext -> readOnlyRootFilesystem) and add corresponding emptyDir volume mounts (and volume entries) for the writable paths so the main container remains functional while being read-only.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/manifests/base/ambient-control-plane-service.yml`:
- Around line 4-18: The Deployment lacks a namespace so the pod cannot resolve
serviceAccountName ambient-control-plane (and its secret
ambient-control-plane-token) which live in ambient-code; add metadata.namespace:
ambient-code to the Deployment top-level metadata block (the metadata section
for the resource that contains spec.replicas and selector) and likewise add
namespace: ambient-code to any other resource metadata blocks that reference
serviceAccountName ambient-control-plane (the template/spec area referenced
around lines 33–37) so the manifest explicitly targets the ambient-code
namespace.
In `@components/manifests/base/rbac/control-plane-clusterrole.yaml`:
- Around line 22-24: The ClusterRole currently grants broad create/update/delete
on cluster-scoped "secrets" and "services" via the resources: ["secrets",
"serviceaccounts", "services", "pods"] and verbs:
["get","list","watch","create","update","patch","delete","deletecollection"];
change this to follow least-privilege by removing mutation verbs for
cluster-scoped secrets/services (keep only read verbs if cluster-wide read is
needed) and/or move secrets/services mutations into a namespaced
Role/RoleBinding targeted at the specific namespaces that need it; update the
ClusterRole resource list (the resources array and verbs array) to only include
safe verbs for cluster scope and create a separate Role (with
create/update/patch/delete verbs) for "secrets" and "services" limited to the
required namespace(s), then bind that Role where necessary instead of using the
ClusterRole binding.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml`:
- Around line 47-97: The PostgreSQL container (name: postgresql) is missing a
resources stanza; add a resources block to the container spec for postgresql
with appropriate requests and limits (CPU and memory) to ensure scheduler
placement and stable QoS (e.g., requests for CPU/memory and corresponding
limits), and apply the same policy to any similar DB containers; update the
container spec near readinessProbe/livenessProbe (within the same container
definition that mounts ambient-api-server-db-data) to include resources so the
pod gets Guaranteed/Burstable QoS as required by the manifests guidelines.
- Around line 95-97: The volume named ambient-api-server-db-data is configured
as emptyDir which makes /var/lib/pgsql/data ephemeral; replace this with a
PersistentVolumeClaim-backed volume and update the workload to use that PVC (or
convert the Deployment to a StatefulSet) so ambient-api-server retains
PostgreSQL data across restarts; create a matching PersistentVolumeClaim
resource (or reference an existing storageClass) and mount it where
ambient-api-server uses /var/lib/pgsql/data, ensuring the PVC name replaces
ambient-api-server-db-data in the pod spec and the claim has appropriate
accessMode and storage size for the database.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`:
- Around line 83-84: The CP_TOKEN_URL environment variable points to the wrong
namespace; update the value of CP_TOKEN_URL in the Deployment manifest from
"http://ambient-control-plane.ambient-code--ambient-s0.svc:8080/token" to use
the runtime-int namespace so the service can be reached (change to
"http://ambient-control-plane.ambient-code--runtime-int.svc:8080/token"); locate
the CP_TOKEN_URL entry in the ambient-control-plane Deployment/overlay and
replace the namespace portion of the URL accordingly so it matches the service's
namespace used by the ambient-control-plane service.
In
`@components/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.py`:
- Around line 359-360: The listener is incorrectly removing shared SSE queues
from bridge._active_streams (the run_queue/active_streams.pop(thread_id, None)
call), which can drop other /events subscribers; remove that unconditional pop
and instead either (A) don’t modify bridge._active_streams from this listener at
all and let the endpoint-owned cleanup path handle removal, or (B) if this
listener must remove queues it created, track ownership when creating the queue
(e.g., a self._owned_streams set keyed by thread_id) and only pop
bridge._active_streams[thread_id] when thread_id is in that ownership set;
update the code around run_queue, active_streams, thread_id and
bridge._active_streams accordingly.
- Around line 420-427: The current assignment to assistant_text selects the
first (oldest) assistant message from self._accumulated_messages, causing stale
content to be resent; change the selection to pick the most recent assistant
message by iterating the messages in reverse (e.g.,
reversed(self._accumulated_messages) or indexing from the end) and return the
first item whose role == "assistant" (update the expression that computes
assistant_text to use the reversed sequence so PushSessionMessage uses the
latest assistant reply and not the oldest snapshot).
In `@components/runners/ambient-runner/ambient_runner/endpoints/events.py`:
- Around line 70-73: The code creates an unbounded asyncio.Queue (queue =
asyncio.Queue()) contrary to the function docstring and tests that expect a
bounded queue; update the queue creation in the active_streams initialization
branch to use a bounded queue (e.g., asyncio.Queue(maxsize=100)) so the variable
queue and the active_streams[thread_id] entry are created with a max size,
keeping the rest of the logic in this function (references: active_streams,
thread_id, queue) unchanged.
In `@docs/internal/proposals/alpha-to-main-migration.md`:
- Around line 75-78: The doc states that manifests are combined into "PR 6 —
runners + manifests: Auth, Credentials, gRPC, SSE, and Kustomize Overlays" but
later sections still model a separate PR 7; update the ordering graph and all
narrative references so PR7 is removed and its manifests path is merged into
PR6. Locate the heading/title "PR 6 — runners + manifests: Auth, Credentials,
gRPC, SSE, and Kustomize Overlays" and the later paragraphs/graph nodes that
mention "PR 7" or a separate "manifests" path (the ordering graph and the
narrative around PR sequencing) and replace them to reflect a single
consolidated PR6 path, ensuring dependencies/“Depends on” lines and any
sequencing arrows reference PR6 (and PR2/PR4 as needed) instead of PR7 so
there’s no conflicting execution guidance.
---
Outside diff comments:
In `@components/runners/ambient-runner/ambient_runner/platform/auth.py`:
- Around line 167-195: The current fallback except block raises PermissionError
immediately when the BOT_TOKEN fallback fails, which prevents the
`_retry_with_fresh_bot_token(...)` refresh path from running; change the except
handler inside the caller-token-expired branch to detect if `fallback_err` is an
HTTP error with code 401 or 403 and, in that case, call and return
`_retry_with_fresh_bot_token(fallback_err.code)` instead of raising; otherwise
log and raise the existing PermissionError. Apply the same adjustment to the
parallel block handling the other occurrence so that both places reference
`_retry_with_fresh_bot_token`, `credential_type`, and the fallback request flow
(`get_bot_token`, `fallback_req`, `context.current_user_id`) consistently.
- Around line 125-140: The guard currently only checks parsed.hostname and
misses non-HTTP(S) schemes like file:// which have no hostname; update the
validation around url/parsed (constructed from base, project_id, credential_id)
to first ensure parsed.scheme is either "http" or "https" and reject any other
scheme, then perform the existing hostname allowlist checks
(endswith(".svc.cluster.local"), ".svc", "localhost", "127.0.0.1"); if the
scheme is invalid or the hostname fails the allowlist, log the refusal (use the
same logger message) and return {} before calling urlopen().
---
Duplicate comments:
In `@components/runners/ambient-runner/ambient_runner/_session_messages_api.py`:
- Around line 103-117: The retry branch inside push() can raise from
self._grpc_client.reconnect() or accessing new_api =
self._grpc_client.session_messages, so wrap the reconnect and metadata/_push_rpc
update (the block that calls self._grpc_client.reconnect(), reads
self._grpc_client.session_messages, assigns self._push_rpc and self._metadata)
in a local try/except; on any exception log a warning/error and break/return
None to preserve the fire-and-forget contract instead of letting exceptions
escape to the caller. Ensure you reference push(),
self._grpc_client.reconnect(), session_messages, _push_rpc and _metadata when
locating and protecting the code.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py`:
- Around line 482-508: The start_grpc_listener function currently logs success
immediately after calling self._grpc_listener.start(); change it to wait for the
GRPCSessionListener to report readiness (or fail) before logging/returning:
after creating GRPCSessionListener (bridge=self, session_id=session_id,
grpc_url=grpc_url) call its readiness API (e.g., await
self._grpc_listener.wait_ready(timeout=...) or poll
self._grpc_listener.is_ready() with a bounded timeout), raise or log an error
and clear self._grpc_listener on failure, and only log "gRPC listener started"
and return after the listener is confirmed ready; ensure you propagate or
surface the underlying connection error when readiness times out or fails.
- Around line 215-219: The credential refresh/cleanup calls
(clear_runtime_credentials(), populate_runtime_credentials(self._context),
populate_mcp_server_credentials(self._context)) must be serialized with a
bridge-wide async lock to prevent concurrent runs from clearing/overwriting each
other’s env vars and token files; add a bridge-scoped asyncio.Lock (e.g.,
self._creds_lock) or module-level lock, acquire it before calling
clear_runtime_credentials()/populate_runtime_credentials()/populate_mcp_server_credentials
and release it after, and apply the same locking around the other
credential-mutation site (the block referenced around lines 330-335) so all
credential mutations use the same lock. Ensure the lock is created once
(constructor or module init) and used with async with self._creds_lock to avoid
deadlocks.
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py`:
- Line 72: The log on logger.info currently prints the raw ambient_mcp_url which
can leak credentials; update the code around the logger.info call in mcp.py to
sanitize ambient_mcp_url before logging by parsing the URL (e.g., via
urllib.parse), removing any userinfo (username:password) and
stripping/obfuscating sensitive query parameters (e.g., token, api_key, auth) or
replacing them with "[REDACTED]", then log the sanitized URL or only host/path
instead of the raw ambient_mcp_url; ensure you reference ambient_mcp_url and the
logger.info call when making the change.
---
Nitpick comments:
In `@components/manifests/overlays/mpp-openshift/ambient-api-server.yaml`:
- Around line 78-83: Set readOnlyRootFilesystem: true on the main api-server
container's securityContext (replace the current false) and ensure any paths
that need to be writable by that container (logs, tmp, etc.) are mounted as
emptyDir volumes; keep the initContainer's readOnlyRootFilesystem as false
because it requires write access for DB migrations. Update the container spec
where securityContext is defined (securityContext -> readOnlyRootFilesystem) and
add corresponding emptyDir volume mounts (and volume entries) for the writable
paths so the main container remains functional while being read-only.
In `@components/runners/ambient-runner/tests/test_grpc_client.py`:
- Around line 95-97: The test test_invalid_public_key_raises is using a
too-broad pytest.raises(Exception); update it to assert the specific exceptions
raised by the cryptography PEM parsing (e.g., use pytest.raises((ValueError,
TypeError))) when calling _encrypt_session_id("not a pem key", "session-id") so
the test verifies the exact error types from the PEM parsing path in
_encrypt_session_id.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 79983dab-f08b-4ce6-8720-198bc1e49513
📒 Files selected for processing (81)
components/manifests/base/ambient-control-plane-service.ymlcomponents/manifests/base/kustomization.yamlcomponents/manifests/base/rbac/control-plane-clusterrole.yamlcomponents/manifests/base/rbac/control-plane-clusterrolebinding.yamlcomponents/manifests/base/rbac/control-plane-sa.yamlcomponents/manifests/base/rbac/kustomization.yamlcomponents/manifests/overlays/cluster-reader/README.mdcomponents/manifests/overlays/cluster-reader/kustomization.yamlcomponents/manifests/overlays/cluster-reader/service-account.yamlcomponents/manifests/overlays/kind-local/operator-env-patch.yamlcomponents/manifests/overlays/local-dev/ambient-api-server-route.yamlcomponents/manifests/overlays/mpp-openshift/README.mdcomponents/manifests/overlays/mpp-openshift/ambient-api-server-args-patch.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-db.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-route.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-service-ca-patch.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane-sa.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane.yamlcomponents/manifests/overlays/mpp-openshift/ambient-cp-tenant-sa.yamlcomponents/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yamlcomponents/manifests/overlays/mpp-openshift/ambient-tenant-ingress-netpol.yamlcomponents/manifests/overlays/mpp-openshift/kustomization.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-runtime-int.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-s0.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/kustomization.yamlcomponents/manifests/overlays/openshift-dev/ambient-api-server-args-patch.yamlcomponents/manifests/overlays/openshift-dev/ambient-api-server-env-patch.yamlcomponents/manifests/overlays/openshift-dev/kustomization.yamlcomponents/manifests/overlays/production/ambient-api-server-route.yamlcomponents/manifests/overlays/production/kustomization.yamlcomponents/runners/ambient-runner/.mcp.jsoncomponents/runners/ambient-runner/ag_ui_claude_sdk/adapter.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/handlers.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/hooks.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.pycomponents/runners/ambient-runner/ag_ui_gemini_cli/adapter.pycomponents/runners/ambient-runner/ag_ui_gemini_cli/types.pycomponents/runners/ambient-runner/ambient_runner/_grpc_client.pycomponents/runners/ambient-runner/ambient_runner/_inbox_messages_api.pycomponents/runners/ambient-runner/ambient_runner/_session_messages_api.pycomponents/runners/ambient-runner/ambient_runner/app.pycomponents/runners/ambient-runner/ambient_runner/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/mcp.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/prompts.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/tools.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.pycomponents/runners/ambient-runner/ambient_runner/endpoints/events.pycomponents/runners/ambient-runner/ambient_runner/endpoints/run.pycomponents/runners/ambient-runner/ambient_runner/endpoints/tasks.pycomponents/runners/ambient-runner/ambient_runner/middleware/__init__.pycomponents/runners/ambient-runner/ambient_runner/middleware/grpc_push.pycomponents/runners/ambient-runner/ambient_runner/middleware/secret_redaction.pycomponents/runners/ambient-runner/ambient_runner/platform/auth.pycomponents/runners/ambient-runner/ambient_runner/platform/context.pycomponents/runners/ambient-runner/ambient_runner/platform/prompts.pycomponents/runners/ambient-runner/ambient_runner/platform/utils.pycomponents/runners/ambient-runner/ambient_runner/tools/backend_api.pycomponents/runners/ambient-runner/architecture.mdcomponents/runners/ambient-runner/pyproject.tomlcomponents/runners/ambient-runner/tests/test_app_initial_prompt.pycomponents/runners/ambient-runner/tests/test_auto_push.pycomponents/runners/ambient-runner/tests/test_bridge_claude.pycomponents/runners/ambient-runner/tests/test_e2e_api.pycomponents/runners/ambient-runner/tests/test_events_endpoint.pycomponents/runners/ambient-runner/tests/test_gemini_auth.pycomponents/runners/ambient-runner/tests/test_gemini_cli_adapter.pycomponents/runners/ambient-runner/tests/test_gemini_session.pycomponents/runners/ambient-runner/tests/test_google_drive_e2e.pycomponents/runners/ambient-runner/tests/test_grpc_client.pycomponents/runners/ambient-runner/tests/test_grpc_transport.pycomponents/runners/ambient-runner/tests/test_grpc_writer.pycomponents/runners/ambient-runner/tests/test_shared_session_credentials.pydocs/internal/proposals/alpha-to-main-migration.md
💤 Files with no reviewable changes (10)
- components/runners/ambient-runner/tests/test_gemini_auth.py
- components/runners/ambient-runner/tests/test_e2e_api.py
- components/manifests/overlays/cluster-reader/service-account.yaml
- components/manifests/overlays/cluster-reader/kustomization.yaml
- components/runners/ambient-runner/tests/test_auto_push.py
- components/manifests/overlays/cluster-reader/README.md
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.py
- components/runners/ambient-runner/tests/test_gemini_cli_adapter.py
- components/runners/ambient-runner/ag_ui_gemini_cli/adapter.py
- components/runners/ambient-runner/tests/test_gemini_session.py
✅ Files skipped from review due to trivial changes (36)
- components/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.py
- components/manifests/overlays/local-dev/ambient-api-server-route.yaml
- components/runners/ambient-runner/ambient_runner/tools/backend_api.py
- components/manifests/base/rbac/kustomization.yaml
- components/runners/ambient-runner/pyproject.toml
- components/manifests/overlays/production/kustomization.yaml
- components/runners/ambient-runner/ag_ui_claude_sdk/handlers.py
- components/runners/ambient-runner/ambient_runner/middleware/secret_redaction.py
- components/runners/ambient-runner/ambient_runner/bridges/claude/prompts.py
- components/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.py
- components/runners/ambient-runner/ag_ui_claude_sdk/hooks.py
- components/runners/ambient-runner/tests/test_google_drive_e2e.py
- components/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yaml
- components/runners/ambient-runner/ambient_runner/bridges/claude/session.py
- components/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yaml
- components/manifests/overlays/mpp-openshift/ambient-control-plane-sa.yaml
- components/runners/ambient-runner/ambient_runner/endpoints/tasks.py
- components/manifests/overlays/mpp-openshift/ambient-api-server-service-ca-patch.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-runtime-int.yaml
- components/manifests/overlays/mpp-openshift/ambient-tenant-ingress-netpol.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac.yaml
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.py
- components/manifests/overlays/production/ambient-api-server-route.yaml
- components/manifests/overlays/mpp-openshift/ambient-cp-tenant-sa.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-s0.yaml
- components/runners/ambient-runner/ambient_runner/platform/context.py
- components/manifests/overlays/mpp-openshift/tenant-rbac/kustomization.yaml
- components/runners/ambient-runner/ag_ui_claude_sdk/adapter.py
- components/manifests/overlays/mpp-openshift/kustomization.yaml
- components/manifests/base/kustomization.yaml
- components/manifests/overlays/openshift-dev/kustomization.yaml
- components/manifests/overlays/mpp-openshift/README.md
- components/manifests/base/rbac/control-plane-sa.yaml
- components/manifests/overlays/mpp-openshift/ambient-api-server-route.yaml
- components/runners/ambient-runner/tests/test_app_initial_prompt.py
- components/runners/ambient-runner/architecture.md
🚧 Files skipped from review as they are similar to previous changes (8)
- components/runners/ambient-runner/.mcp.json
- components/runners/ambient-runner/ambient_runner/bridge.py
- components/runners/ambient-runner/ambient_runner/endpoints/run.py
- components/runners/ambient-runner/ambient_runner/platform/prompts.py
- components/runners/ambient-runner/ambient_runner/bridges/claude/tools.py
- components/runners/ambient-runner/ag_ui_gemini_cli/types.py
- components/runners/ambient-runner/tests/test_grpc_writer.py
- components/runners/ambient-runner/ambient_runner/app.py
| metadata: | ||
| name: ambient-control-plane | ||
| labels: | ||
| app: ambient-control-plane | ||
| spec: | ||
| replicas: 1 | ||
| selector: | ||
| matchLabels: | ||
| app: ambient-control-plane | ||
| template: | ||
| metadata: | ||
| labels: | ||
| app: ambient-control-plane | ||
| spec: | ||
| serviceAccountName: ambient-control-plane |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Deployment namespace + SA/secret refs
rg -n -C2 'kind:\s*Deployment|name:\s*ambient-control-plane|namespace:|serviceAccountName|ambient-control-plane-token' \
components/manifests/base/ambient-control-plane-service.yml
# SA/Secret namespaces
rg -n -C2 'kind:\s*ServiceAccount|kind:\s*Secret|name:\s*ambient-control-plane|ambient-control-plane-token|namespace:' \
components/manifests/base/rbac/control-plane-sa.yaml
# Check whether base kustomization enforces a namespace
rg -n -C2 '^namespace:|resources:' components/manifests/base/kustomization.yamlRepository: ambient-code/platform
Length of output: 1164
Add metadata.namespace: ambient-code to Deployment
This Deployment references serviceAccountName: ambient-control-plane and secret ambient-control-plane-token (lines 18, 36), both of which exist in namespace ambient-code. Without an explicit namespace in the Deployment metadata and no namespace transformer in kustomization, pod startup will fail with service account resolution errors. Add namespace: ambient-code to lines 4–7 metadata block.
Also applies to: 33-37
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/base/ambient-control-plane-service.yml` around lines 4 -
18, The Deployment lacks a namespace so the pod cannot resolve
serviceAccountName ambient-control-plane (and its secret
ambient-control-plane-token) which live in ambient-code; add metadata.namespace:
ambient-code to the Deployment top-level metadata block (the metadata section
for the resource that contains spec.replicas and selector) and likewise add
namespace: ambient-code to any other resource metadata blocks that reference
serviceAccountName ambient-control-plane (the template/spec area referenced
around lines 33–37) so the manifest explicitly targets the ambient-code
namespace.
| - apiGroups: [""] | ||
| resources: ["secrets", "serviceaccounts", "services", "pods"] | ||
| verbs: ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection"] |
There was a problem hiding this comment.
Cluster-wide secret/service mutation is over-privileged
Lines 22–24 grant create/update/delete (plus read/list/watch) on secrets and services at ClusterRole scope. Through the binding, this gives broad credential and service-plane control across namespaces, which is a high-risk blast radius.
As per coding guidelines, components/manifests/**/*.yaml: RBAC must follow least-privilege.
🧰 Tools
🪛 Trivy (0.69.3)
[error] 22-24: Manage secrets
ClusterRole 'ambient-control-plane' shouldn't have access to manage resource 'secrets'
Rule: KSV-0041
(IaC/Kubernetes)
[error] 22-24: Manage Kubernetes networking
ClusterRole 'ambient-control-plane' should not have access to resources ["services", "endpoints", "endpointslices", "networkpolicies", "ingresses"] for verbs ["create", "update", "patch", "delete", "deletecollection", "impersonate", "*"]
Rule: KSV-0056
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/base/rbac/control-plane-clusterrole.yaml` around lines
22 - 24, The ClusterRole currently grants broad create/update/delete on
cluster-scoped "secrets" and "services" via the resources: ["secrets",
"serviceaccounts", "services", "pods"] and verbs:
["get","list","watch","create","update","patch","delete","deletecollection"];
change this to follow least-privilege by removing mutation verbs for
cluster-scoped secrets/services (keep only read verbs if cluster-wide read is
needed) and/or move secrets/services mutations into a namespaced
Role/RoleBinding targeted at the specific namespaces that need it; update the
ClusterRole resource list (the resources array and verbs array) to only include
safe verbs for cluster scope and create a separate Role (with
create/update/patch/delete verbs) for "secrets" and "services" limited to the
required namespace(s), then bind that Role where necessary instead of using the
ClusterRole binding.
| - name: postgresql | ||
| image: registry.redhat.io/rhel9/postgresql-16:latest | ||
| ports: | ||
| - containerPort: 5432 | ||
| name: postgresql | ||
| env: | ||
| - name: POSTGRESQL_USER | ||
| valueFrom: | ||
| secretKeyRef: | ||
| key: db.user | ||
| name: ambient-api-server-db | ||
| - name: POSTGRESQL_PASSWORD | ||
| valueFrom: | ||
| secretKeyRef: | ||
| key: db.password | ||
| name: ambient-api-server-db | ||
| - name: POSTGRESQL_DATABASE | ||
| valueFrom: | ||
| secretKeyRef: | ||
| key: db.name | ||
| name: ambient-api-server-db | ||
| volumeMounts: | ||
| - name: ambient-api-server-db-data | ||
| mountPath: /var/lib/pgsql/data | ||
| subPath: pgdata | ||
| readinessProbe: | ||
| exec: | ||
| command: | ||
| - /bin/sh | ||
| - -c | ||
| - pg_isready -U "$POSTGRESQL_USER" | ||
| initialDelaySeconds: 10 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| livenessProbe: | ||
| exec: | ||
| command: | ||
| - /bin/sh | ||
| - -c | ||
| - pg_isready -U "$POSTGRESQL_USER" | ||
| initialDelaySeconds: 30 | ||
| periodSeconds: 30 | ||
| timeoutSeconds: 5 | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| volumes: | ||
| - name: ambient-api-server-db-data | ||
| emptyDir: {} |
There was a problem hiding this comment.
Add resource requests/limits for the new PostgreSQL container.
This workload ships without a resources stanza, so scheduler placement and QoS are undefined for the DB pod.
Possible fix
containers:
- name: postgresql
image: registry.redhat.io/rhel9/postgresql-16:latest
+ resources:
+ requests:
+ cpu: 250m
+ memory: 512Mi
+ limits:
+ cpu: "1"
+ memory: 1Gi
ports:
- containerPort: 5432
name: postgresqlAs per coding guidelines, "components/manifests/**/*.yaml: - RBAC must follow least-privilege. Resource limits/requests required on containers."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: postgresql | |
| image: registry.redhat.io/rhel9/postgresql-16:latest | |
| ports: | |
| - containerPort: 5432 | |
| name: postgresql | |
| env: | |
| - name: POSTGRESQL_USER | |
| valueFrom: | |
| secretKeyRef: | |
| key: db.user | |
| name: ambient-api-server-db | |
| - name: POSTGRESQL_PASSWORD | |
| valueFrom: | |
| secretKeyRef: | |
| key: db.password | |
| name: ambient-api-server-db | |
| - name: POSTGRESQL_DATABASE | |
| valueFrom: | |
| secretKeyRef: | |
| key: db.name | |
| name: ambient-api-server-db | |
| volumeMounts: | |
| - name: ambient-api-server-db-data | |
| mountPath: /var/lib/pgsql/data | |
| subPath: pgdata | |
| readinessProbe: | |
| exec: | |
| command: | |
| - /bin/sh | |
| - -c | |
| - pg_isready -U "$POSTGRESQL_USER" | |
| initialDelaySeconds: 10 | |
| periodSeconds: 10 | |
| timeoutSeconds: 5 | |
| livenessProbe: | |
| exec: | |
| command: | |
| - /bin/sh | |
| - -c | |
| - pg_isready -U "$POSTGRESQL_USER" | |
| initialDelaySeconds: 30 | |
| periodSeconds: 30 | |
| timeoutSeconds: 5 | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| capabilities: | |
| drop: | |
| - ALL | |
| volumes: | |
| - name: ambient-api-server-db-data | |
| emptyDir: {} | |
| - name: postgresql | |
| image: registry.redhat.io/rhel9/postgresql-16:latest | |
| resources: | |
| requests: | |
| cpu: 250m | |
| memory: 512Mi | |
| limits: | |
| cpu: "1" | |
| memory: 1Gi | |
| ports: | |
| - containerPort: 5432 | |
| name: postgresql | |
| env: | |
| - name: POSTGRESQL_USER | |
| valueFrom: | |
| secretKeyRef: | |
| key: db.user | |
| name: ambient-api-server-db | |
| - name: POSTGRESQL_PASSWORD | |
| valueFrom: | |
| secretKeyRef: | |
| key: db.password | |
| name: ambient-api-server-db | |
| - name: POSTGRESQL_DATABASE | |
| valueFrom: | |
| secretKeyRef: | |
| key: db.name | |
| name: ambient-api-server-db | |
| volumeMounts: | |
| - name: ambient-api-server-db-data | |
| mountPath: /var/lib/pgsql/data | |
| subPath: pgdata | |
| readinessProbe: | |
| exec: | |
| command: | |
| - /bin/sh | |
| - -c | |
| - pg_isready -U "$POSTGRESQL_USER" | |
| initialDelaySeconds: 10 | |
| periodSeconds: 10 | |
| timeoutSeconds: 5 | |
| livenessProbe: | |
| exec: | |
| command: | |
| - /bin/sh | |
| - -c | |
| - pg_isready -U "$POSTGRESQL_USER" | |
| initialDelaySeconds: 30 | |
| periodSeconds: 30 | |
| timeoutSeconds: 5 | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| capabilities: | |
| drop: | |
| - ALL | |
| volumes: | |
| - name: ambient-api-server-db-data | |
| emptyDir: {} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml`
around lines 47 - 97, The PostgreSQL container (name: postgresql) is missing a
resources stanza; add a resources block to the container spec for postgresql
with appropriate requests and limits (CPU and memory) to ensure scheduler
placement and stable QoS (e.g., requests for CPU/memory and corresponding
limits), and apply the same policy to any similar DB containers; update the
container spec near readinessProbe/livenessProbe (within the same container
definition that mounts ambient-api-server-db-data) to include resources so the
pod gets Guaranteed/Burstable QoS as required by the manifests guidelines.
| volumes: | ||
| - name: ambient-api-server-db-data | ||
| emptyDir: {} |
There was a problem hiding this comment.
emptyDir makes this database state ephemeral.
Any pod restart, eviction, or rollout will wipe /var/lib/pgsql/data, which means ambient-api-server loses its backing store. This needs a PVC-backed volume (ideally behind a StatefulSet) unless the overlay is intentionally disposable.
Possible direction
- volumes:
- - name: ambient-api-server-db-data
- emptyDir: {}
+ volumes:
+ - name: ambient-api-server-db-data
+ persistentVolumeClaim:
+ claimName: ambient-api-server-db-data🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml`
around lines 95 - 97, The volume named ambient-api-server-db-data is configured
as emptyDir which makes /var/lib/pgsql/data ephemeral; replace this with a
PersistentVolumeClaim-backed volume and update the workload to use that PVC (or
convert the Deployment to a StatefulSet) so ambient-api-server retains
PostgreSQL data across restarts; create a matching PersistentVolumeClaim
resource (or reference an existing storageClass) and mount it where
ambient-api-server uses /var/lib/pgsql/data, ensuring the PVC name replaces
ambient-api-server-db-data in the pod spec and the claim has appropriate
accessMode and storage size for the database.
| - name: CP_TOKEN_URL | ||
| value: "http://ambient-control-plane.ambient-code--ambient-s0.svc:8080/token" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the configured CP token URL in this Deployment
rg -n -C2 'CP_TOKEN_URL|ambient-code--ambient-s0|ambient-control-plane' \
components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml
# Inspect service identity/namespace in the same overlay
rg -n -C2 'kind:\s*Service|name:\s*ambient-control-plane|namespace:' \
components/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yaml
# Confirm both manifests are part of this overlay build
rg -n -C2 'resources:|ambient-control-plane.yaml|ambient-control-plane-svc.yaml' \
components/manifests/overlays/mpp-openshift/kustomization.yamlRepository: ambient-code/platform
Length of output: 1815
Fix namespace mismatch in CP_TOKEN_URL
Line 84 references ambient-code--ambient-s0 but the service is deployed in ambient-code--runtime-int (line 5 of the service manifest). The Deployment will fail to reach the token endpoint. Update the URL to http://ambient-control-plane.ambient-code--runtime-int.svc:8080/token.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`
around lines 83 - 84, The CP_TOKEN_URL environment variable points to the wrong
namespace; update the value of CP_TOKEN_URL in the Deployment manifest from
"http://ambient-control-plane.ambient-code--ambient-s0.svc:8080/token" to use
the runtime-int namespace so the service can be reached (change to
"http://ambient-control-plane.ambient-code--runtime-int.svc:8080/token"); locate
the CP_TOKEN_URL entry in the ambient-control-plane Deployment/overlay and
replace the namespace portion of the URL accordingly so it matches the service's
namespace used by the ambient-control-plane service.
| assistant_text = next( | ||
| ( | ||
| m.get("content") or "" | ||
| for m in self._accumulated_messages | ||
| if m.get("role") == "assistant" | ||
| ), | ||
| "", | ||
| ) |
There was a problem hiding this comment.
Select the latest assistant reply, not the first one in the snapshot.
MESSAGES_SNAPSHOT is the whole conversation, so this picks the oldest assistant message. After the first turn, PushSessionMessage will resend stale text instead of the reply that just finished.
Suggested fix
assistant_text = next(
(
m.get("content") or ""
- for m in self._accumulated_messages
+ for m in reversed(self._accumulated_messages)
if m.get("role") == "assistant"
),
"",
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| assistant_text = next( | |
| ( | |
| m.get("content") or "" | |
| for m in self._accumulated_messages | |
| if m.get("role") == "assistant" | |
| ), | |
| "", | |
| ) | |
| assistant_text = next( | |
| ( | |
| m.get("content") or "" | |
| for m in reversed(self._accumulated_messages) | |
| if m.get("role") == "assistant" | |
| ), | |
| "", | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@components/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.py`
around lines 420 - 427, The current assignment to assistant_text selects the
first (oldest) assistant message from self._accumulated_messages, causing stale
content to be resent; change the selection to pick the most recent assistant
message by iterating the messages in reverse (e.g.,
reversed(self._accumulated_messages) or indexing from the end) and return the
first item whose role == "assistant" (update the expression that computes
assistant_text to use the reversed sequence so PushSessionMessage uses the
latest assistant reply and not the oldest snapshot).
| queue: asyncio.Queue = existing | ||
| else: | ||
| queue = asyncio.Queue() | ||
| active_streams[thread_id] = queue |
There was a problem hiding this comment.
Unbounded queue contradicts docstring claim.
Line 49 docstring states "Creates a bounded asyncio.Queue" but line 72 creates asyncio.Queue() without maxsize. If the consumer is slow (network lag), this could grow unboundedly. Consider adding maxsize to match the bounded queue semantics documented and used in tests (maxsize=100).
🔧 Proposed fix
else:
- queue = asyncio.Queue()
+ queue = asyncio.Queue(maxsize=100)
active_streams[thread_id] = queue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| queue: asyncio.Queue = existing | |
| else: | |
| queue = asyncio.Queue() | |
| active_streams[thread_id] = queue | |
| queue: asyncio.Queue = existing | |
| else: | |
| queue = asyncio.Queue(maxsize=100) | |
| active_streams[thread_id] = queue |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/endpoints/events.py` around
lines 70 - 73, The code creates an unbounded asyncio.Queue (queue =
asyncio.Queue()) contrary to the function docstring and tests that expect a
bounded queue; update the queue creation in the active_streams initialization
branch to use a bounded queue (e.g., asyncio.Queue(maxsize=100)) so the variable
queue and the active_streams[thread_id] entry are created with a max size,
keeping the rest of the logic in this function (references: active_streams,
thread_id, queue) unchanged.
|
Already merged to main as 7f7227d via squash merge. Closing. |
Pull request was closed
7f7227d to
f23044a
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/runners/ambient-runner/tests/test_shared_session_credentials.py (1)
855-864:⚠️ Potential issue | 🟠 MajorCleanup fails when
_GH_WRAPPER_PATHis empty.Pipeline failure:
wrapper.unlink(missing_ok=True)raisesIsADirectoryErrorbecause_GH_WRAPPER_PATHis""at import time (set dynamically byinstall_gh_wrapper).Path("")resolves to.(current directory).Proposed fix
def _cleanup(self): """Remove wrapper artifacts created during tests.""" import ambient_runner.platform.auth as _auth_mod _auth_mod._gh_wrapper_installed = False - wrapper = Path(_GH_WRAPPER_PATH) - wrapper.unlink(missing_ok=True) + if _GH_WRAPPER_PATH: + wrapper = Path(_GH_WRAPPER_PATH) + if wrapper.exists() and wrapper.is_file(): + wrapper.unlink(missing_ok=True) wrapper_dir = Path(_GH_WRAPPER_DIR) - if wrapper_dir.exists() and not any(wrapper_dir.iterdir()): - wrapper_dir.rmdir() + if _GH_WRAPPER_DIR and wrapper_dir.exists(): + if wrapper_dir.is_dir() and not any(wrapper_dir.iterdir()): + wrapper_dir.rmdir()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/tests/test_shared_session_credentials.py` around lines 855 - 864, The test _cleanup function currently calls Path(_GH_WRAPPER_PATH).unlink which fails when _GH_WRAPPER_PATH is an empty string; modify _cleanup to first check that _GH_WRAPPER_PATH is a non-empty string and that Path(_GH_WRAPPER_PATH).exists() and is_file() before calling unlink, and only then attempt to remove the wrapper file; leave the existing logic that resets ambient_runner.platform.auth._gh_wrapper_installed and that removes _GH_WRAPPER_DIR only if it exists and is empty (use Path(...).iterdir() to check emptiness) so we never attempt to unlink the current directory when install_gh_wrapper hasn't set a path.
♻️ Duplicate comments (3)
components/manifests/base/rbac/control-plane-clusterrole.yaml (1)
22-24:⚠️ Potential issue | 🟠 MajorOver-privileged cluster-wide mutation on secrets/services
Line 22–24 gives cluster-wide mutate/delete on
secretsandservices, which is excessive for a ClusterRole and materially increases blast radius. Keep only read verbs here (if needed), and move mutation of these resources to namespace-scopedRole/RoleBindingin only the required namespaces.Suggested RBAC tightening (example)
- apiGroups: [""] - resources: ["secrets", "serviceaccounts", "services", "pods"] - verbs: ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection"] + resources: ["serviceaccounts", "pods"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] + +- apiGroups: [""] + resources: ["secrets", "services"] + verbs: ["get", "list", "watch"]As per coding guidelines,
components/manifests/**/*.yaml: RBAC must follow least-privilege.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/base/rbac/control-plane-clusterrole.yaml` around lines 22 - 24, The ClusterRole currently grants cluster-wide mutation and deletion on "secrets" and "services" via the verbs list in the ClusterRole resource; remove mutation verbs ("create","update","patch","delete","deletecollection") for the "secrets" and "services" resource entries in this ClusterRole and restrict them to only read verbs ("get","list","watch") if any cluster-wide read is required, then create namespace-scoped Role(s) and RoleBinding(s) that grant the necessary mutating verbs for "secrets" and "services" only in the specific namespaces that need them; update the ClusterRole's resources/verbs block (the ClusterRole resource definition containing resources: ["secrets","serviceaccounts","services","pods"] and verbs: [...]) and add corresponding Role/RoleBinding manifests targeting the required namespaces.components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml (1)
83-84:⚠️ Potential issue | 🔴 CriticalFix broken in-cluster token endpoint namespace
Line 84 points
CP_TOKEN_URLtoambient-code--ambient-s0, while this Deployment runs inambient-code--runtime-int; token fetches will fail.Suggested fix
- - name: CP_TOKEN_URL - value: "http://ambient-control-plane.ambient-code--ambient-s0.svc:8080/token" + - name: CP_TOKEN_URL + value: "http://ambient-control-plane.$(CP_RUNTIME_NAMESPACE).svc:8080/token"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml` around lines 83 - 84, CP_TOKEN_URL in the manifest is hardcoded to the wrong namespace; update the CP_TOKEN_URL environment variable value (the "value" for name: CP_TOKEN_URL) to point to the in-cluster token service in the current Deployment namespace (ambient-code--runtime-int) instead of ambient-code--ambient-s0 so token fetches succeed; locate the CP_TOKEN_URL env var in the ambient-control-plane Deployment and change the URL host segment to ambient-code--runtime-int (or derive it from the Pod namespace at runtime if preferred).components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml (1)
47-97:⚠️ Potential issue | 🟠 MajorPostgreSQL workload is under-specified and currently non-durable.
Line 47 defines a DB container without
resources, and Line 97 stores database data onemptyDir. That combination gives unpredictable scheduling/QoS and data loss on restart/eviction/rollout. Add CPU/memory requests+limits and move/var/lib/pgsql/datato a PVC-backed volume (or explicitly document this overlay as disposable).Suggested patch direction
containers: - name: postgresql image: registry.redhat.io/rhel9/postgresql-16:latest + resources: + requests: + cpu: 250m + memory: 512Mi + limits: + cpu: "1" + memory: 1Gi @@ volumes: - name: ambient-api-server-db-data - emptyDir: {} + persistentVolumeClaim: + claimName: ambient-api-server-db-dataAs per coding guidelines, "
components/manifests/**/*.yaml: - RBAC must follow least-privilege. Resource limits/requests required on containers."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml` around lines 47 - 97, The PostgreSQL container "postgresql" is missing resource requests/limits and is using an ephemeral volume "ambient-api-server-db-data" backed by emptyDir; add a resources block (cpu/memory requests and limits) to the container spec for QoS and predictable scheduling, replace the emptyDir volume with a PVC reference (volumes: name: ambient-api-server-db-data -> persistentVolumeClaim: claimName: e.g. ambient-api-server-db-pvc) so /var/lib/pgsql/data (volumeMount on that volume, subPath: pgdata) is durable, and ensure a matching PVC manifest is created or documented; keep existing readinessProbe/livenessProbe and securityContext unchanged.
🧹 Nitpick comments (1)
components/runners/ambient-runner/tests/test_grpc_client.py (1)
95-97: Catch the specific exception type from cryptography.
pytest.raises(Exception)is too broad. The cryptography library raisesValueErrorfor invalid PEM data.Proposed fix
def test_invalid_public_key_raises(self): - with pytest.raises(Exception): + with pytest.raises(ValueError): _encrypt_session_id("not a pem key", "session-id")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/tests/test_grpc_client.py` around lines 95 - 97, The test test_invalid_public_key_raises currently uses pytest.raises(Exception) which is too broad; change it to expect the specific exception type raised by the cryptography library (ValueError) when _encrypt_session_id is given invalid PEM input — update the pytest.raises(Exception) call in test_invalid_public_key_raises to pytest.raises(ValueError) so the test asserts the correct error from _encrypt_session_id.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml`:
- Line 48: Replace the non-deterministic image tag
"registry.redhat.io/rhel9/postgresql-16:latest" with a pinned immutable
reference (either a specific patch tag like postgresql-16.Y.Z or the image
digest sha256:...) in the container spec, and add a resources block under that
same container (resources.requests and resources.limits) to specify CPU and
memory for the DB workload (e.g., set reasonable DB-specific cpu/memory request
and higher limits) so the manifest is reproducible and meets resource
guidelines.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server.yaml`:
- Around line 56-57: The manifest currently uses the floating tag
quay.io/ambient_code/vteam_api_server:latest for both the init and main
containers (seen on the image lines around where the init container and main
container are defined); update both occurrences to use immutable image digests
(quay.io/ambient_code/vteam_api_server@sha256:...) or a fixed release tag
instead of :latest so the deployed image is deterministic and cannot change
silently at runtime—replace the two image fields accordingly.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`:
- Line 25: The manifest uses mutable image tags (e.g., the image entry
"quay.io/ambient_code/vteam_control_plane:latest") which should be pinned;
update each image reference (the control-plane image at the "image:
quay.io/ambient_code/vteam_control_plane:latest" entry and the other two image
entries referenced at the same pattern on lines ~54 and ~86) to use either a
specific version tag or an immutable digest (sha256) instead of :latest, e.g.,
replace ":latest" with a concrete semver tag or "@sha256:..." for each image
reference to ensure deterministic, auditable releases.
In `@components/runners/ambient-runner/ambient_runner/bridge.py`:
- Around line 230-252: The docstring for inject_message is incorrect: it says
the default is a no-op but the method actually raises NotImplementedError;
update the inject_message docstring (the method in bridge.py) to state that the
default implementation raises NotImplementedError and that subclasses must
override inject_message to handle inbound messages (include mention of the
raised exception and the parameters session_id, event_type, payload for
clarity).
---
Outside diff comments:
In `@components/runners/ambient-runner/tests/test_shared_session_credentials.py`:
- Around line 855-864: The test _cleanup function currently calls
Path(_GH_WRAPPER_PATH).unlink which fails when _GH_WRAPPER_PATH is an empty
string; modify _cleanup to first check that _GH_WRAPPER_PATH is a non-empty
string and that Path(_GH_WRAPPER_PATH).exists() and is_file() before calling
unlink, and only then attempt to remove the wrapper file; leave the existing
logic that resets ambient_runner.platform.auth._gh_wrapper_installed and that
removes _GH_WRAPPER_DIR only if it exists and is empty (use Path(...).iterdir()
to check emptiness) so we never attempt to unlink the current directory when
install_gh_wrapper hasn't set a path.
---
Duplicate comments:
In `@components/manifests/base/rbac/control-plane-clusterrole.yaml`:
- Around line 22-24: The ClusterRole currently grants cluster-wide mutation and
deletion on "secrets" and "services" via the verbs list in the ClusterRole
resource; remove mutation verbs
("create","update","patch","delete","deletecollection") for the "secrets" and
"services" resource entries in this ClusterRole and restrict them to only read
verbs ("get","list","watch") if any cluster-wide read is required, then create
namespace-scoped Role(s) and RoleBinding(s) that grant the necessary mutating
verbs for "secrets" and "services" only in the specific namespaces that need
them; update the ClusterRole's resources/verbs block (the ClusterRole resource
definition containing resources: ["secrets","serviceaccounts","services","pods"]
and verbs: [...]) and add corresponding Role/RoleBinding manifests targeting the
required namespaces.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml`:
- Around line 47-97: The PostgreSQL container "postgresql" is missing resource
requests/limits and is using an ephemeral volume "ambient-api-server-db-data"
backed by emptyDir; add a resources block (cpu/memory requests and limits) to
the container spec for QoS and predictable scheduling, replace the emptyDir
volume with a PVC reference (volumes: name: ambient-api-server-db-data ->
persistentVolumeClaim: claimName: e.g. ambient-api-server-db-pvc) so
/var/lib/pgsql/data (volumeMount on that volume, subPath: pgdata) is durable,
and ensure a matching PVC manifest is created or documented; keep existing
readinessProbe/livenessProbe and securityContext unchanged.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`:
- Around line 83-84: CP_TOKEN_URL in the manifest is hardcoded to the wrong
namespace; update the CP_TOKEN_URL environment variable value (the "value" for
name: CP_TOKEN_URL) to point to the in-cluster token service in the current
Deployment namespace (ambient-code--runtime-int) instead of
ambient-code--ambient-s0 so token fetches succeed; locate the CP_TOKEN_URL env
var in the ambient-control-plane Deployment and change the URL host segment to
ambient-code--runtime-int (or derive it from the Pod namespace at runtime if
preferred).
---
Nitpick comments:
In `@components/runners/ambient-runner/tests/test_grpc_client.py`:
- Around line 95-97: The test test_invalid_public_key_raises currently uses
pytest.raises(Exception) which is too broad; change it to expect the specific
exception type raised by the cryptography library (ValueError) when
_encrypt_session_id is given invalid PEM input — update the
pytest.raises(Exception) call in test_invalid_public_key_raises to
pytest.raises(ValueError) so the test asserts the correct error from
_encrypt_session_id.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 4ba40ae2-323e-432e-bb47-2eae8c345c3f
📒 Files selected for processing (80)
components/manifests/base/ambient-control-plane-service.ymlcomponents/manifests/base/kustomization.yamlcomponents/manifests/base/rbac/control-plane-clusterrole.yamlcomponents/manifests/base/rbac/control-plane-clusterrolebinding.yamlcomponents/manifests/base/rbac/control-plane-sa.yamlcomponents/manifests/base/rbac/kustomization.yamlcomponents/manifests/overlays/cluster-reader/README.mdcomponents/manifests/overlays/cluster-reader/kustomization.yamlcomponents/manifests/overlays/cluster-reader/service-account.yamlcomponents/manifests/overlays/kind-local/operator-env-patch.yamlcomponents/manifests/overlays/local-dev/ambient-api-server-route.yamlcomponents/manifests/overlays/mpp-openshift/README.mdcomponents/manifests/overlays/mpp-openshift/ambient-api-server-args-patch.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-db.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-route.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-service-ca-patch.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane-sa.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane.yamlcomponents/manifests/overlays/mpp-openshift/ambient-cp-tenant-sa.yamlcomponents/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yamlcomponents/manifests/overlays/mpp-openshift/ambient-tenant-ingress-netpol.yamlcomponents/manifests/overlays/mpp-openshift/kustomization.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-runtime-int.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-s0.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/kustomization.yamlcomponents/manifests/overlays/openshift-dev/ambient-api-server-args-patch.yamlcomponents/manifests/overlays/openshift-dev/ambient-api-server-env-patch.yamlcomponents/manifests/overlays/openshift-dev/kustomization.yamlcomponents/manifests/overlays/production/ambient-api-server-route.yamlcomponents/manifests/overlays/production/kustomization.yamlcomponents/runners/ambient-runner/.mcp.jsoncomponents/runners/ambient-runner/ag_ui_claude_sdk/adapter.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/handlers.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/hooks.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.pycomponents/runners/ambient-runner/ag_ui_gemini_cli/types.pycomponents/runners/ambient-runner/ambient_runner/_grpc_client.pycomponents/runners/ambient-runner/ambient_runner/_inbox_messages_api.pycomponents/runners/ambient-runner/ambient_runner/_session_messages_api.pycomponents/runners/ambient-runner/ambient_runner/app.pycomponents/runners/ambient-runner/ambient_runner/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/mcp.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/prompts.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/tools.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.pycomponents/runners/ambient-runner/ambient_runner/endpoints/events.pycomponents/runners/ambient-runner/ambient_runner/endpoints/run.pycomponents/runners/ambient-runner/ambient_runner/endpoints/tasks.pycomponents/runners/ambient-runner/ambient_runner/middleware/__init__.pycomponents/runners/ambient-runner/ambient_runner/middleware/grpc_push.pycomponents/runners/ambient-runner/ambient_runner/middleware/secret_redaction.pycomponents/runners/ambient-runner/ambient_runner/platform/auth.pycomponents/runners/ambient-runner/ambient_runner/platform/context.pycomponents/runners/ambient-runner/ambient_runner/platform/prompts.pycomponents/runners/ambient-runner/ambient_runner/platform/utils.pycomponents/runners/ambient-runner/ambient_runner/tools/backend_api.pycomponents/runners/ambient-runner/architecture.mdcomponents/runners/ambient-runner/pyproject.tomlcomponents/runners/ambient-runner/tests/test_app_initial_prompt.pycomponents/runners/ambient-runner/tests/test_auto_push.pycomponents/runners/ambient-runner/tests/test_bridge_claude.pycomponents/runners/ambient-runner/tests/test_e2e_api.pycomponents/runners/ambient-runner/tests/test_events_endpoint.pycomponents/runners/ambient-runner/tests/test_gemini_auth.pycomponents/runners/ambient-runner/tests/test_gemini_cli_adapter.pycomponents/runners/ambient-runner/tests/test_gemini_session.pycomponents/runners/ambient-runner/tests/test_google_drive_e2e.pycomponents/runners/ambient-runner/tests/test_grpc_client.pycomponents/runners/ambient-runner/tests/test_grpc_transport.pycomponents/runners/ambient-runner/tests/test_grpc_writer.pycomponents/runners/ambient-runner/tests/test_shared_session_credentials.pydocs/internal/proposals/alpha-to-main-migration.md
💤 Files with no reviewable changes (9)
- components/runners/ambient-runner/tests/test_gemini_auth.py
- components/manifests/overlays/cluster-reader/README.md
- components/runners/ambient-runner/tests/test_e2e_api.py
- components/runners/ambient-runner/tests/test_gemini_cli_adapter.py
- components/runners/ambient-runner/tests/test_gemini_session.py
- components/manifests/overlays/cluster-reader/kustomization.yaml
- components/runners/ambient-runner/tests/test_auto_push.py
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.py
- components/manifests/overlays/cluster-reader/service-account.yaml
✅ Files skipped from review due to trivial changes (37)
- components/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.py
- components/runners/ambient-runner/ag_ui_claude_sdk/handlers.py
- components/runners/ambient-runner/pyproject.toml
- components/manifests/overlays/local-dev/ambient-api-server-route.yaml
- components/runners/ambient-runner/tests/test_google_drive_e2e.py
- components/manifests/base/rbac/kustomization.yaml
- components/runners/ambient-runner/ag_ui_claude_sdk/hooks.py
- components/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yaml
- components/manifests/base/rbac/control-plane-sa.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-runtime-int.yaml
- components/runners/ambient-runner/ambient_runner/bridges/claude/prompts.py
- components/manifests/overlays/mpp-openshift/ambient-control-plane-sa.yaml
- components/manifests/overlays/production/kustomization.yaml
- components/manifests/base/kustomization.yaml
- components/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yaml
- components/runners/ambient-runner/ambient_runner/endpoints/tasks.py
- components/runners/ambient-runner/.mcp.json
- components/runners/ambient-runner/ambient_runner/platform/context.py
- components/runners/ambient-runner/ambient_runner/middleware/secret_redaction.py
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-s0.yaml
- components/manifests/overlays/production/ambient-api-server-route.yaml
- components/manifests/overlays/mpp-openshift/ambient-cp-tenant-sa.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac.yaml
- components/manifests/overlays/mpp-openshift/ambient-api-server-service-ca-patch.yaml
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.py
- components/manifests/overlays/mpp-openshift/ambient-tenant-ingress-netpol.yaml
- components/runners/ambient-runner/ambient_runner/tools/backend_api.py
- components/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.py
- components/manifests/overlays/mpp-openshift/tenant-rbac/kustomization.yaml
- components/runners/ambient-runner/tests/test_grpc_writer.py
- components/runners/ambient-runner/ag_ui_claude_sdk/adapter.py
- components/manifests/overlays/mpp-openshift/README.md
- components/manifests/base/rbac/control-plane-clusterrolebinding.yaml
- components/manifests/base/ambient-control-plane-service.yml
- components/manifests/overlays/mpp-openshift/ambient-api-server-route.yaml
- components/runners/ambient-runner/architecture.md
- components/runners/ambient-runner/ambient_runner/bridges/claude/session.py
🚧 Files skipped from review as they are similar to previous changes (8)
- components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py
- docs/internal/proposals/alpha-to-main-migration.md
- components/manifests/overlays/mpp-openshift/kustomization.yaml
- components/runners/ambient-runner/tests/test_bridge_claude.py
- components/runners/ambient-runner/ambient_runner/platform/utils.py
- components/manifests/overlays/openshift-dev/kustomization.yaml
- components/runners/ambient-runner/ambient_runner/platform/prompts.py
- components/runners/ambient-runner/ambient_runner/app.py
| type: RuntimeDefault | ||
| containers: | ||
| - name: postgresql | ||
| image: registry.redhat.io/rhel9/postgresql-16:latest |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find mutable image tags across manifest overlays
rg -n --iglob 'components/manifests/**/*.yaml' '^\s*image:\s+\S+:latest\s*$'Repository: ambient-code/platform
Length of output: 2704
🏁 Script executed:
cat -n components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml | head -60Repository: ambient-code/platform
Length of output: 1835
🏁 Script executed:
cat -n components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml | sed -n '48,80p'Repository: ambient-code/platform
Length of output: 1378
🏁 Script executed:
cat -n components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml | sed -n '80,95p'Repository: ambient-code/platform
Length of output: 620
Add resource limits and pin DB image to immutable digest.
Line 48 uses :latest tag, weakening reproducibility and rollback safety. Additionally, the container lacks required resource limits and requests per manifest guidelines. Pin to an explicit patch version or digest, and set resource constraints (CPU/memory limits for database workload).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml` at
line 48, Replace the non-deterministic image tag
"registry.redhat.io/rhel9/postgresql-16:latest" with a pinned immutable
reference (either a specific patch tag like postgresql-16.Y.Z or the image
digest sha256:...) in the container spec, and add a resources block under that
same container (resources.requests and resources.limits) to specify CPU and
memory for the DB workload (e.g., set reasonable DB-specific cpu/memory request
and higher limits) so the manifest is reproducible and meets resource
guidelines.
| image: quay.io/ambient_code/vteam_api_server:latest | ||
| imagePullPolicy: Always |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and get its content
if [ -f "components/manifests/overlays/mpp-openshift/ambient-api-server.yaml" ]; then
echo "=== File exists, showing content with line numbers ==="
cat -n "components/manifests/overlays/mpp-openshift/ambient-api-server.yaml"
else
echo "File not found at specified path"
# Try to search for similar files
fd -t f "ambient-api-server.yaml"
fiRepository: ambient-code/platform
Length of output: 7736
Pin vteam_api_server images to immutable digests.
Lines 56 and 86 both use :latest, which can silently change at runtime. Use immutable digests or fixed release tags for both init and main containers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-api-server.yaml` around
lines 56 - 57, The manifest currently uses the floating tag
quay.io/ambient_code/vteam_api_server:latest for both the init and main
containers (seen on the image lines around where the init container and main
container are defined); update both occurrences to use immutable image digests
(quay.io/ambient_code/vteam_api_server@sha256:...) or a fixed release tag
instead of :latest so the deployed image is deterministic and cannot change
silently at runtime—replace the two image fields accordingly.
| type: RuntimeDefault | ||
| containers: | ||
| - name: ambient-control-plane | ||
| image: quay.io/ambient_code/vteam_control_plane:latest |
There was a problem hiding this comment.
Pin container references to immutable versions
Using :latest for control-plane, runner, and MCP images makes releases non-deterministic and increases supply-chain risk. Pin tags or digests.
Suggested fix pattern
- image: quay.io/ambient_code/vteam_control_plane:latest
+ image: quay.io/ambient_code/vteam_control_plane@sha256:<control-plane-digest>
...
- - name: RUNNER_IMAGE
- value: "quay.io/ambient_code/vteam_claude_runner:latest"
+ - name: RUNNER_IMAGE
+ value: "quay.io/ambient_code/vteam_claude_runner@sha256:<runner-digest>"
...
- - name: MCP_IMAGE
- value: "quay.io/ambient_code/vteam_mcp:latest"
+ - name: MCP_IMAGE
+ value: "quay.io/ambient_code/vteam_mcp@sha256:<mcp-digest>"Also applies to: 54-54, 86-86
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml` at
line 25, The manifest uses mutable image tags (e.g., the image entry
"quay.io/ambient_code/vteam_control_plane:latest") which should be pinned;
update each image reference (the control-plane image at the "image:
quay.io/ambient_code/vteam_control_plane:latest" entry and the other two image
entries referenced at the same pattern on lines ~54 and ~86) to use either a
specific version tag or an immutable digest (sha256) instead of :latest, e.g.,
replace ":latest" with a concrete semver tag or "@sha256:..." for each image
reference to ensure deterministic, auditable releases.
| async def inject_message( | ||
| self, session_id: str, event_type: str, payload: str | ||
| ) -> None: | ||
| """Inject an inbound session message into the active run. | ||
|
|
||
| Called by the run endpoint for each ``SessionMessage`` received via | ||
| ``WatchSessionMessages`` gRPC stream while a run is in progress. | ||
|
|
||
| Override in bridge subclasses to handle inbound messages — e.g. to | ||
| interrupt the current Claude turn and inject a new user message. | ||
|
|
||
| Default: no-op (inbound messages are silently dropped). | ||
|
|
||
| Args: | ||
| session_id: The session this message belongs to. | ||
| event_type: The message event type string. | ||
| payload: The raw JSON payload string. | ||
| """ | ||
| raise NotImplementedError( | ||
| f"{type(self).__name__} does not support inject_message " | ||
| f"(session_id={session_id!r}, event_type={event_type!r}). " | ||
| "Override inject_message() in your bridge subclass to handle inbound messages." | ||
| ) |
There was a problem hiding this comment.
Docstring contradicts implementation.
Line 241 says "Default: no-op (inbound messages are silently dropped)" but the implementation raises NotImplementedError. Update the docstring to reflect the actual behavior.
Proposed fix
- Default: no-op (inbound messages are silently dropped).
+ Default: raises NotImplementedError — override in subclasses.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def inject_message( | |
| self, session_id: str, event_type: str, payload: str | |
| ) -> None: | |
| """Inject an inbound session message into the active run. | |
| Called by the run endpoint for each ``SessionMessage`` received via | |
| ``WatchSessionMessages`` gRPC stream while a run is in progress. | |
| Override in bridge subclasses to handle inbound messages — e.g. to | |
| interrupt the current Claude turn and inject a new user message. | |
| Default: no-op (inbound messages are silently dropped). | |
| Args: | |
| session_id: The session this message belongs to. | |
| event_type: The message event type string. | |
| payload: The raw JSON payload string. | |
| """ | |
| raise NotImplementedError( | |
| f"{type(self).__name__} does not support inject_message " | |
| f"(session_id={session_id!r}, event_type={event_type!r}). " | |
| "Override inject_message() in your bridge subclass to handle inbound messages." | |
| ) | |
| async def inject_message( | |
| self, session_id: str, event_type: str, payload: str | |
| ) -> None: | |
| """Inject an inbound session message into the active run. | |
| Called by the run endpoint for each ``SessionMessage`` received via | |
| ``WatchSessionMessages`` gRPC stream while a run is in progress. | |
| Override in bridge subclasses to handle inbound messages — e.g. to | |
| interrupt the current Claude turn and inject a new user message. | |
| Default: raises NotImplementedError — override in subclasses. | |
| Args: | |
| session_id: The session this message belongs to. | |
| event_type: The message event type string. | |
| payload: The raw JSON payload string. | |
| """ | |
| raise NotImplementedError( | |
| f"{type(self).__name__} does not support inject_message " | |
| f"(session_id={session_id!r}, event_type={event_type!r}). " | |
| "Override inject_message() in your bridge subclass to handle inbound messages." | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/runners/ambient-runner/ambient_runner/bridge.py` around lines 230
- 252, The docstring for inject_message is incorrect: it says the default is a
no-op but the method actually raises NotImplementedError; update the
inject_message docstring (the method in bridge.py) to state that the default
implementation raises NotImplementedError and that subclasses must override
inject_message to handle inbound messages (include mention of the raised
exception and the parameters session_id, event_type, payload for clarity).
…hancements, and Kustomize overlays from alpha PR 6 of alpha-to-main migration (combined with PR 7 — manifests). Runners: - gRPC transport for session message streaming - gRPC client for control-plane token endpoint - Inbox and session messages APIs with delta buffer - Credential system: fetch/populate/clear, gh CLI wrapper - SSE flush-per-chunk, unbounded tap queue - CP OIDC token for backend credential fetches (RSA keypair auth) - New deps: cryptography, grpcio, protobuf - Tests: grpc_client, grpc_transport, grpc_writer, events_endpoint, app_initial_prompt, expanded bridge_claude and shared_session_credentials Manifests: - mpp-openshift overlay: NetworkPolicy, gRPC Route, CP token Service, RBAC, MCP sidecar, RoleBinding namespace fixes - production overlay updates - openshift-dev overlay - Removed deprecated cluster-reader overlay - All overlays pass kustomize build Migration plan updated: PRs 1-5 marked merged, PR 6+7 combined. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
TestGhWrapper._cleanup() used stale top-level imports of _GH_WRAPPER_PATH
(always ""), causing Path(".").unlink() → IsADirectoryError. Now reads from
the module object with empty-string guards. Also fixes e2e workflow Docker
build context from components/runners to components/runners/ambient-runner
so pyproject.toml is found during image build.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
cd8d939 to
c586206
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
components/runners/ambient-runner/tests/test_shared_session_credentials.py (1)
452-453: Avoid hard-coded request count in the test server loopLine 452 pins the handler loop to
6requests. This is brittle and can leave the thread blocked when provider calls change.Suggested change
- thread = Thread( - target=lambda: [server.handle_request() for _ in range(6)], daemon=True - ) + thread = Thread(target=server.serve_forever, daemon=True) thread.start() @@ finally: + server.shutdown() server.server_close() thread.join(timeout=2)Also applies to: 485-486
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/tests/test_shared_session_credentials.py` around lines 452 - 453, The test currently hard-codes a fixed number of calls to server.handle_request() (target=lambda: [server.handle_request() for _ in range(6)]) which is brittle; change the thread target to run the server loop until explicitly stopped (e.g., use server.serve_forever() in the thread or replace the fixed-list lambda with a while loop controlled by a threading.Event that calls server.handle_request() repeatedly), and ensure the test calls server.shutdown() or sets the stop event to terminate the thread; update both occurrences that use server.handle_request() so the test no longer depends on a hard-coded request count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@components/runners/ambient-runner/tests/test_shared_session_credentials.py`:
- Around line 452-453: The test currently hard-codes a fixed number of calls to
server.handle_request() (target=lambda: [server.handle_request() for _ in
range(6)]) which is brittle; change the thread target to run the server loop
until explicitly stopped (e.g., use server.serve_forever() in the thread or
replace the fixed-list lambda with a while loop controlled by a threading.Event
that calls server.handle_request() repeatedly), and ensure the test calls
server.shutdown() or sets the stop event to terminate the thread; update both
occurrences that use server.handle_request() so the test no longer depends on a
hard-coded request count.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 25d9e292-aeaf-424c-b317-588183a93a09
📒 Files selected for processing (2)
.github/workflows/e2e.ymlcomponents/runners/ambient-runner/tests/test_shared_session_credentials.py
clear_runtime_credentials was deleting the Google Workspace credentials file between turns, causing workspace-mcp to fall back to an inaccessible localhost OAuth flow. The file is now intentionally preserved. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (6)
components/manifests/base/rbac/control-plane-clusterrole.yaml (1)
17-27:⚠️ Potential issue | 🟠 MajorCluster-wide mutation permissions violate least-privilege.
Lines 18-20 grant cluster-wide RoleBinding CRUD; lines 22-27 grant cluster-wide secrets/services/pods/jobs CRUD. This allows control-plane to mutate sensitive resources and escalate privileges across all namespaces. Scope these to namespace-specific Roles/RoleBindings in project namespaces instead.
As per coding guidelines, RBAC must follow least-privilege.
🔒 Restrict to namespace-scoped Roles
Remove cluster-wide mutation verbs from ClusterRole:
-# RoleBindings (reconcile group access from ProjectSettings) -- apiGroups: ["rbac.authorization.k8s.io"] - resources: ["rolebindings"] - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] -# Session runner resources (provision/deprovision per-session workloads in project namespaces) -- apiGroups: [""] - resources: ["secrets", "serviceaccounts", "services", "pods"] - verbs: ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection"] -- apiGroups: ["batch"] - resources: ["jobs"] - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] +# RoleBindings (read-only cluster-wide; mutations via namespace Roles) ++ apiGroups: ["rbac.authorization.k8s.io"] + resources: ["rolebindings"] + verbs: ["get", "list", "watch"] +# Session runner resources (read-only cluster-wide; mutations via namespace Roles) ++ apiGroups: [""] + resources: ["secrets", "serviceaccounts", "services", "pods"] + verbs: ["get", "list", "watch"] ++ apiGroups: ["batch"] + resources: ["jobs"] + verbs: ["get", "list", "watch"]Create namespace-scoped Role (e.g.,
base/rbac/control-plane-project-role.yaml):apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: ambient-control-plane-project # Applied per project namespace via RoleBinding rules: - apiGroups: ["rbac.authorization.k8s.io"] resources: ["rolebindings"] verbs: ["create", "update", "patch", "delete"] - apiGroups: [""] resources: ["secrets", "serviceaccounts", "services", "pods"] verbs: ["create", "update", "patch", "delete", "deletecollection"] - apiGroups: ["batch"] resources: ["jobs"] verbs: ["create", "update", "patch", "delete"]Bind this Role in each project namespace where control-plane provisions resources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/base/rbac/control-plane-clusterrole.yaml` around lines 17 - 27, The ClusterRole currently grants cluster-wide mutation permissions for rolebindings, secrets, serviceaccounts, services, pods and jobs; remove those mutation rules from the ClusterRole in control-plane-clusterrole.yaml and instead create a namespace-scoped Role (e.g., name it ambient-control-plane-project) that contains rules for rbacs rolebindings and core/batch resources with only the necessary verbs (create, update, patch, delete, deletecollection as appropriate) and then bind that Role into each project namespace via a RoleBinding; update any references that expect cluster-wide permissions to reference the new RoleBinding so control-plane provisioning is limited to project namespaces only.components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml (2)
83-84:⚠️ Potential issue | 🔴 CriticalFix CP token endpoint namespace mismatch.
Line 84 points to
ambient-code--ambient-s0, but this deployment is inambient-code--runtime-int(Line 5). This will break token fetches from control-plane.Suggested fix
- - name: CP_TOKEN_URL - value: "http://ambient-control-plane.ambient-code--ambient-s0.svc:8080/token" + - name: CP_TOKEN_URL + value: "http://ambient-control-plane.ambient-code--runtime-int.svc:8080/token"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml` around lines 83 - 84, The CP_TOKEN_URL environment variable is pointing to the wrong namespace (ambient-code--ambient-s0) causing token fetches to fail; update the CP_TOKEN_URL value to use the deployment's actual namespace (ambient-code--runtime-int) so the URL becomes http://ambient-control-plane.ambient-code--runtime-int.svc:8080/token and ensure CP_TOKEN_URL in the manifest for the ambient-control-plane container is the corrected value.
25-25:⚠️ Potential issue | 🟠 MajorReplace mutable
:latestimage tags with immutable versions.Line 25, Line 54, and Line 86 use mutable tags, which makes rollouts non-deterministic and weakens supply-chain control.
Suggested fix pattern
- image: quay.io/ambient_code/vteam_control_plane:latest + image: quay.io/ambient_code/vteam_control_plane@sha256:<control-plane-digest> ... - - name: RUNNER_IMAGE - value: "quay.io/ambient_code/vteam_claude_runner:latest" + - name: RUNNER_IMAGE + value: "quay.io/ambient_code/vteam_claude_runner@sha256:<runner-digest>" ... - - name: MCP_IMAGE - value: "quay.io/ambient_code/vteam_mcp:latest" + - name: MCP_IMAGE + value: "quay.io/ambient_code/vteam_mcp@sha256:<mcp-digest>"Also applies to: 54-54, 86-86
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml` at line 25, Replace all occurrences of the mutable image reference "quay.io/ambient_code/vteam_control_plane:latest" with an immutable identifier (either a fixed semver tag or an image digest like quay.io/ambient_code/vteam_control_plane@sha256:<digest>) so rollouts are deterministic; update every instance of that image string in the manifest (the three occurrences currently using :latest) and ensure any deployment/Pod specs that reference the image use the same pinned value.components/runners/ambient-runner/ambient_runner/middleware/grpc_push.py (1)
101-105:⚠️ Potential issue | 🟠 MajorAvoid blocking the async stream with synchronous gRPC pushes
Line 104 executes a synchronous network call in the async loop, which can stall streaming on slow/unavailable gRPC.
Suggested fix
+import asyncio ... try: async for event in event_stream: yield event - _push_event(grpc_client, sid, event) + # fire-and-forget without blocking the event loop + loop = asyncio.get_running_loop() + loop.run_in_executor(None, _push_event, grpc_client, sid, event)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/middleware/grpc_push.py` around lines 101 - 105, The async generator in grpc_push.py yields events while calling the synchronous _push_event (in the loop that iterates over event_stream), which can block the event stream; change the call to run _push_event off the event loop (e.g., use asyncio.to_thread or loop.run_in_executor) or convert _push_event into an async non-blocking function and await it, so the async for event in event_stream loop no longer performs blocking network I/O and the generator remains responsive.components/runners/ambient-runner/tests/test_grpc_writer.py (1)
27-34:⚠️ Potential issue | 🟠 Major
Noneclient path is still untestedLine 27 converts explicit
grpc_client=NoneintoMagicMock, so Line 211 does not exercise the no-client early return.Suggested fix
+_DEFAULT = object() -def make_writer(grpc_client=None): - if grpc_client is None: +def make_writer(grpc_client=_DEFAULT): + if grpc_client is _DEFAULT: grpc_client = MagicMock() return GRPCMessageWriter( session_id="sess-1", run_id="run-1", grpc_client=grpc_client, )Also applies to: 208-213
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/tests/test_grpc_writer.py` around lines 27 - 34, The helper make_writer currently replaces a passed grpc_client=None with a MagicMock so tests never exercise the no-client early-return in GRPCMessageWriter; update tests to explicitly construct GRPCMessageWriter(session_id="sess-1", run_id="run-1", grpc_client=None) or modify make_writer to accept a sentinel (e.g., keep_none=True) that preserves None so you can assert the early-return behavior when grpc_client is None (targeting make_writer and GRPCMessageWriter constructor/write path).components/runners/ambient-runner/ambient_runner/endpoints/run.py (1)
85-97:⚠️ Potential issue | 🟠 MajorPreserve gRPC persistence when
thread_idis absentLine 85 falls back to
"", so gRPC push can be disabled even thoughbridge.run()still succeeds via context/session fallback. Derivesession_idfromthread_idor bridge context.Suggested fix
- session_id = run_agent_input.thread_id or "" + session_id = ( + run_agent_input.thread_id + or getattr(getattr(bridge, "context", None), "session_id", None) + ) ... - session_id=session_id, + session_id=session_id,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/ambient_runner/endpoints/run.py` around lines 85 - 97, The code currently sets session_id = run_agent_input.thread_id or "" which disables grpc_push_middleware when thread_id is absent; change session_id derivation to use run_agent_input.thread_id if present else obtain the session id from the bridge's context (e.g., bridge.context.session_id or bridge.context.get("session_id")) so grpc_push_middleware receives a persistent session id even when thread_id is missing; update the session_id assignment near the grpc_push_middleware call (symbols: session_id, run_agent_input.thread_id, bridge.run, grpc_push_middleware) to prefer thread_id then bridge context before falling back to empty.
🧹 Nitpick comments (1)
components/runners/ambient-runner/tests/test_grpc_transport.py (1)
579-594: Minor: unusedclientvariable.Line 581 unpacks
clientbut only useswriter. Considerwriter, _client = self._writer()or justwriter, _ = self._writer().Otherwise, test coverage is comprehensive.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/runners/ambient-runner/tests/test_grpc_transport.py` around lines 579 - 594, The test test_push_offloaded_to_executor_not_inline unpacks an unused second value from self._writer(); change the unpack to ignore the unused client (e.g., writer, _ = self._writer() or writer, _client = self._writer()) so the unused-variable warning is resolved while keeping the call to self._writer() intact; update the unpack at the start of test_push_offloaded_to_executor_not_inline to use an underscore-prefixed variable name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`:
- Around line 55-64: The overlay references two missing
Secrets—ambient-api-server (keys clientId, clientSecret) used by the
ambient-control-plane container env vars and ambient-api-server-db (keys
db.user, db.password, db.name) used by the ambient-api-server-db
Deployment—causing CreateContainerConfigError; add Kubernetes Secret manifests
named ambient-api-server and ambient-api-server-db to this overlay with the
referenced keys (clientId, clientSecret and db.user, db.password, db.name
respectively) or update the env/valueFrom to point to existing Secrets; ensure
the secret keys match exactly and are included in the overlay manifests so the
pods can mount them at deploy time.
In
`@components/manifests/overlays/openshift-dev/ambient-api-server-args-patch.yaml`:
- Around line 20-21: The patch currently disables authentication/authorization
via the flags --enable-jwt=false and --enable-authz=false while also allowing
wildcard CORS origins (origins: "*"), which is unsafe; update the manifest so
you do not ship both JWT and authz disabled together — re-enable at least one of
them (e.g., remove or set --enable-authz=true or --enable-jwt=true in the
ambient API server args) and replace the wildcard CORS entry with a restricted
dev-only origin list or make the CORS origin configurable via an
environment/overlay value (reference the flags --enable-jwt, --enable-authz and
the CORS origins entry) so dev defaults are safer and can be tightened in
CI/prod.
- Around line 10-12: Add an explicit securityContext and resource limits to the
api-server container: in the containers -> - name: api-server block, add a
container-level securityContext with runAsNonRoot: true, runAsUser (non-zero
UID), allowPrivilegeEscalation: false, privileged: false, capabilities: { drop:
["ALL"] }, readOnlyRootFilesystem: true and seccompProfile: { type:
"RuntimeDefault" } (or Pod-level equivalents if used); also add
resources.requests and resources.limits (cpu/memory) to satisfy resource policy;
ensure no conflicting fields remain and that these settings appear alongside the
existing command and args for the api-server container.
In `@components/runners/ambient-runner/ambient_runner/app.py`:
- Around line 272-280: The events_router is being registered twice via
app.include_router(events_router), causing duplicate route registration; locate
both occurrences of app.include_router(events_router) (one with the core
endpoints block and one in the between-run event stream block) and remove the
redundant include so events_router is included only once in the intended
location; ensure any related comments/documentation reflect the single
registration and keep app.include_router(run_router),
app.include_router(interrupt_router), and app.include_router(health_router)
as-is.
---
Duplicate comments:
In `@components/manifests/base/rbac/control-plane-clusterrole.yaml`:
- Around line 17-27: The ClusterRole currently grants cluster-wide mutation
permissions for rolebindings, secrets, serviceaccounts, services, pods and jobs;
remove those mutation rules from the ClusterRole in
control-plane-clusterrole.yaml and instead create a namespace-scoped Role (e.g.,
name it ambient-control-plane-project) that contains rules for rbacs
rolebindings and core/batch resources with only the necessary verbs (create,
update, patch, delete, deletecollection as appropriate) and then bind that Role
into each project namespace via a RoleBinding; update any references that expect
cluster-wide permissions to reference the new RoleBinding so control-plane
provisioning is limited to project namespaces only.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`:
- Around line 83-84: The CP_TOKEN_URL environment variable is pointing to the
wrong namespace (ambient-code--ambient-s0) causing token fetches to fail; update
the CP_TOKEN_URL value to use the deployment's actual namespace
(ambient-code--runtime-int) so the URL becomes
http://ambient-control-plane.ambient-code--runtime-int.svc:8080/token and ensure
CP_TOKEN_URL in the manifest for the ambient-control-plane container is the
corrected value.
- Line 25: Replace all occurrences of the mutable image reference
"quay.io/ambient_code/vteam_control_plane:latest" with an immutable identifier
(either a fixed semver tag or an image digest like
quay.io/ambient_code/vteam_control_plane@sha256:<digest>) so rollouts are
deterministic; update every instance of that image string in the manifest (the
three occurrences currently using :latest) and ensure any deployment/Pod specs
that reference the image use the same pinned value.
In `@components/runners/ambient-runner/ambient_runner/endpoints/run.py`:
- Around line 85-97: The code currently sets session_id =
run_agent_input.thread_id or "" which disables grpc_push_middleware when
thread_id is absent; change session_id derivation to use
run_agent_input.thread_id if present else obtain the session id from the
bridge's context (e.g., bridge.context.session_id or
bridge.context.get("session_id")) so grpc_push_middleware receives a persistent
session id even when thread_id is missing; update the session_id assignment near
the grpc_push_middleware call (symbols: session_id, run_agent_input.thread_id,
bridge.run, grpc_push_middleware) to prefer thread_id then bridge context before
falling back to empty.
In `@components/runners/ambient-runner/ambient_runner/middleware/grpc_push.py`:
- Around line 101-105: The async generator in grpc_push.py yields events while
calling the synchronous _push_event (in the loop that iterates over
event_stream), which can block the event stream; change the call to run
_push_event off the event loop (e.g., use asyncio.to_thread or
loop.run_in_executor) or convert _push_event into an async non-blocking function
and await it, so the async for event in event_stream loop no longer performs
blocking network I/O and the generator remains responsive.
In `@components/runners/ambient-runner/tests/test_grpc_writer.py`:
- Around line 27-34: The helper make_writer currently replaces a passed
grpc_client=None with a MagicMock so tests never exercise the no-client
early-return in GRPCMessageWriter; update tests to explicitly construct
GRPCMessageWriter(session_id="sess-1", run_id="run-1", grpc_client=None) or
modify make_writer to accept a sentinel (e.g., keep_none=True) that preserves
None so you can assert the early-return behavior when grpc_client is None
(targeting make_writer and GRPCMessageWriter constructor/write path).
---
Nitpick comments:
In `@components/runners/ambient-runner/tests/test_grpc_transport.py`:
- Around line 579-594: The test test_push_offloaded_to_executor_not_inline
unpacks an unused second value from self._writer(); change the unpack to ignore
the unused client (e.g., writer, _ = self._writer() or writer, _client =
self._writer()) so the unused-variable warning is resolved while keeping the
call to self._writer() intact; update the unpack at the start of
test_push_offloaded_to_executor_not_inline to use an underscore-prefixed
variable name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5de9ffc6-c173-42d3-a50d-6651218ebf70
📒 Files selected for processing (80)
components/manifests/base/ambient-control-plane-service.ymlcomponents/manifests/base/kustomization.yamlcomponents/manifests/base/rbac/control-plane-clusterrole.yamlcomponents/manifests/base/rbac/control-plane-clusterrolebinding.yamlcomponents/manifests/base/rbac/control-plane-sa.yamlcomponents/manifests/base/rbac/kustomization.yamlcomponents/manifests/overlays/cluster-reader/README.mdcomponents/manifests/overlays/cluster-reader/kustomization.yamlcomponents/manifests/overlays/cluster-reader/service-account.yamlcomponents/manifests/overlays/kind-local/operator-env-patch.yamlcomponents/manifests/overlays/local-dev/ambient-api-server-route.yamlcomponents/manifests/overlays/mpp-openshift/README.mdcomponents/manifests/overlays/mpp-openshift/ambient-api-server-args-patch.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-db.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-route.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server-service-ca-patch.yamlcomponents/manifests/overlays/mpp-openshift/ambient-api-server.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane-sa.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yamlcomponents/manifests/overlays/mpp-openshift/ambient-control-plane.yamlcomponents/manifests/overlays/mpp-openshift/ambient-cp-tenant-sa.yamlcomponents/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yamlcomponents/manifests/overlays/mpp-openshift/ambient-tenant-ingress-netpol.yamlcomponents/manifests/overlays/mpp-openshift/kustomization.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-runtime-int.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-s0.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac.yamlcomponents/manifests/overlays/mpp-openshift/tenant-rbac/kustomization.yamlcomponents/manifests/overlays/openshift-dev/ambient-api-server-args-patch.yamlcomponents/manifests/overlays/openshift-dev/ambient-api-server-env-patch.yamlcomponents/manifests/overlays/openshift-dev/kustomization.yamlcomponents/manifests/overlays/production/ambient-api-server-route.yamlcomponents/manifests/overlays/production/kustomization.yamlcomponents/runners/ambient-runner/.mcp.jsoncomponents/runners/ambient-runner/ag_ui_claude_sdk/adapter.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/handlers.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/hooks.pycomponents/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.pycomponents/runners/ambient-runner/ag_ui_gemini_cli/types.pycomponents/runners/ambient-runner/ambient_runner/_grpc_client.pycomponents/runners/ambient-runner/ambient_runner/_inbox_messages_api.pycomponents/runners/ambient-runner/ambient_runner/_session_messages_api.pycomponents/runners/ambient-runner/ambient_runner/app.pycomponents/runners/ambient-runner/ambient_runner/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/grpc_transport.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/mcp.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/prompts.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/claude/tools.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.pycomponents/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.pycomponents/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.pycomponents/runners/ambient-runner/ambient_runner/endpoints/events.pycomponents/runners/ambient-runner/ambient_runner/endpoints/run.pycomponents/runners/ambient-runner/ambient_runner/endpoints/tasks.pycomponents/runners/ambient-runner/ambient_runner/middleware/__init__.pycomponents/runners/ambient-runner/ambient_runner/middleware/grpc_push.pycomponents/runners/ambient-runner/ambient_runner/middleware/secret_redaction.pycomponents/runners/ambient-runner/ambient_runner/platform/auth.pycomponents/runners/ambient-runner/ambient_runner/platform/context.pycomponents/runners/ambient-runner/ambient_runner/platform/prompts.pycomponents/runners/ambient-runner/ambient_runner/platform/utils.pycomponents/runners/ambient-runner/ambient_runner/tools/backend_api.pycomponents/runners/ambient-runner/architecture.mdcomponents/runners/ambient-runner/pyproject.tomlcomponents/runners/ambient-runner/tests/test_app_initial_prompt.pycomponents/runners/ambient-runner/tests/test_auto_push.pycomponents/runners/ambient-runner/tests/test_bridge_claude.pycomponents/runners/ambient-runner/tests/test_e2e_api.pycomponents/runners/ambient-runner/tests/test_events_endpoint.pycomponents/runners/ambient-runner/tests/test_gemini_auth.pycomponents/runners/ambient-runner/tests/test_gemini_cli_adapter.pycomponents/runners/ambient-runner/tests/test_gemini_session.pycomponents/runners/ambient-runner/tests/test_google_drive_e2e.pycomponents/runners/ambient-runner/tests/test_grpc_client.pycomponents/runners/ambient-runner/tests/test_grpc_transport.pycomponents/runners/ambient-runner/tests/test_grpc_writer.pycomponents/runners/ambient-runner/tests/test_shared_session_credentials.pydocs/internal/proposals/alpha-to-main-migration.md
💤 Files with no reviewable changes (9)
- components/runners/ambient-runner/tests/test_e2e_api.py
- components/manifests/overlays/cluster-reader/kustomization.yaml
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/session.py
- components/runners/ambient-runner/tests/test_gemini_session.py
- components/runners/ambient-runner/tests/test_auto_push.py
- components/runners/ambient-runner/tests/test_gemini_cli_adapter.py
- components/runners/ambient-runner/tests/test_gemini_auth.py
- components/manifests/overlays/cluster-reader/service-account.yaml
- components/manifests/overlays/cluster-reader/README.md
✅ Files skipped from review due to trivial changes (38)
- components/runners/ambient-runner/ambient_runner/middleware/secret_redaction.py
- components/runners/ambient-runner/tests/test_google_drive_e2e.py
- components/runners/ambient-runner/ag_ui_claude_sdk/reasoning_events.py
- components/manifests/overlays/local-dev/ambient-api-server-route.yaml
- components/runners/ambient-runner/ambient_runner/tools/backend_api.py
- components/runners/ambient-runner/ag_ui_claude_sdk/handlers.py
- components/runners/ambient-runner/ambient_runner/bridges/langgraph/bridge.py
- components/runners/ambient-runner/ag_ui_claude_sdk/hooks.py
- components/manifests/overlays/mpp-openshift/ambient-cp-token-netpol.yaml
- components/manifests/base/rbac/control-plane-sa.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-runtime-int.yaml
- components/runners/ambient-runner/.mcp.json
- components/manifests/overlays/mpp-openshift/ambient-control-plane-sa.yaml
- components/manifests/base/kustomization.yaml
- components/manifests/overlays/mpp-openshift/ambient-tenant-ingress-netpol.yaml
- components/manifests/overlays/mpp-openshift/ambient-cp-tenant-sa.yaml
- components/manifests/base/rbac/kustomization.yaml
- components/runners/ambient-runner/ambient_runner/endpoints/tasks.py
- components/manifests/overlays/production/ambient-api-server-route.yaml
- components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/bridge.py
- components/manifests/overlays/production/kustomization.yaml
- components/manifests/overlays/mpp-openshift/tenant-rbac/kustomization.yaml
- components/runners/ambient-runner/ambient_runner/platform/context.py
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac.yaml
- components/runners/ambient-runner/ag_ui_claude_sdk/adapter.py
- components/manifests/overlays/openshift-dev/kustomization.yaml
- components/manifests/overlays/mpp-openshift/ambient-api-server-service-ca-patch.yaml
- components/manifests/base/ambient-control-plane-service.yml
- components/runners/ambient-runner/ambient_runner/bridges/claude/session.py
- components/manifests/overlays/mpp-openshift/README.md
- components/manifests/overlays/mpp-openshift/tenant-rbac/ambient-control-plane-rbac-s0.yaml
- components/manifests/base/rbac/control-plane-clusterrolebinding.yaml
- components/manifests/overlays/mpp-openshift/ambient-api-server-route.yaml
- components/runners/ambient-runner/tests/test_events_endpoint.py
- components/runners/ambient-runner/tests/test_app_initial_prompt.py
- components/runners/ambient-runner/architecture.md
- components/manifests/overlays/mpp-openshift/kustomization.yaml
- components/manifests/overlays/mpp-openshift/ambient-control-plane-svc.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
- components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py
- components/runners/ambient-runner/pyproject.toml
- docs/internal/proposals/alpha-to-main-migration.md
- components/runners/ambient-runner/ambient_runner/bridges/claude/prompts.py
- components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py
| - name: OIDC_CLIENT_ID | ||
| valueFrom: | ||
| secretKeyRef: | ||
| name: ambient-api-server | ||
| key: clientId | ||
| - name: OIDC_CLIENT_SECRET | ||
| valueFrom: | ||
| secretKeyRef: | ||
| name: ambient-api-server | ||
| key: clientSecret |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find Secret/ExternalSecret/SealedSecret definitions for ambient-api-server.
rg -n -C3 'kind:\s*(Secret|ExternalSecret|SealedSecret)|name:\s*ambient-api-server|namespace:\s*ambient-code--runtime-int' components/manifests
# 2) Check kustomizations that include secret resources or generators.
rg -n -C4 'secretGenerator|resources:|ambient-api-server' components/manifests
# Expected:
# - At least one manifest path that provisions ambient-api-server for ambient-code--runtime-int,
# or a clearly referenced external secret mechanism included by the overlay.Repository: ambient-code/platform
Length of output: 50378
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check what's actually in the ambient-api-server.yaml file for mpp-openshift
echo "=== Content of mpp-openshift/ambient-api-server.yaml ==="
head -160 components/manifests/overlays/mpp-openshift/ambient-api-server.yaml | tail -30
# Look specifically for Secret definitions for ambient-api-server in mpp-openshift
echo -e "\n=== Searching for Secret kind in mpp-openshift overlay ==="
rg -n "^kind: Secret" components/manifests/overlays/mpp-openshift/
# Check mpp-openshift kustomization for secretGenerator or secret resources
echo -e "\n=== Searching mpp-openshift kustomization.yaml for secrets ==="
rg -n -A2 -B2 "secretGenerator|resources:" components/manifests/overlays/mpp-openshift/kustomization.yaml | head -60
# Check if ambient-api-server Secret exists explicitly
echo -e "\n=== Checking for Secret ambient-api-server in entire mpp-openshift ==="
rg -n "kind: Secret" components/manifests/overlays/mpp-openshift/ -A2 | grep -A2 "ambient-api-server" || echo "No ambient-api-server Secret found"Repository: ambient-code/platform
Length of output: 1349
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Final check: is there any ExternalSecret or SealedSecret for ambient-api-server in mpp-openshift?
echo "=== Checking for ExternalSecret/SealedSecret definitions ==="
rg -n "kind: (ExternalSecret|SealedSecret)" components/manifests/overlays/mpp-openshift/ || echo "None found"
# Check if ambient-api-server-db is defined (to see pattern for secrets in this overlay)
echo -e "\n=== Checking ambient-api-server-db.yaml to understand secret provisioning pattern ==="
rg -n "kind: Secret" components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml -A5 | head -20Repository: ambient-code/platform
Length of output: 223
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if ambient-api-server-db.yaml exists and what it contains
echo "=== Checking ambient-api-server-db.yaml ==="
if [ -f components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml ]; then
cat components/manifests/overlays/mpp-openshift/ambient-api-server-db.yaml
else
echo "File does not exist"
fi
# List all files in mpp-openshift to see what's actually there
echo -e "\n=== Files in mpp-openshift overlay ==="
ls -la components/manifests/overlays/mpp-openshift/Repository: ambient-code/platform
Length of output: 3813
Add Secret ambient-api-server to the mpp-openshift overlay.
Lines 58–64 reference keys clientId and clientSecret from Secret ambient-api-server, which is not defined in this overlay. The pod will fail with CreateContainerConfigError on deployment.
Additionally, the ambient-api-server-db Deployment also references a missing ambient-api-server-db Secret (with keys db.user, db.password, db.name). Both Secrets must be provisioned before the overlay can run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/overlays/mpp-openshift/ambient-control-plane.yaml`
around lines 55 - 64, The overlay references two missing
Secrets—ambient-api-server (keys clientId, clientSecret) used by the
ambient-control-plane container env vars and ambient-api-server-db (keys
db.user, db.password, db.name) used by the ambient-api-server-db
Deployment—causing CreateContainerConfigError; add Kubernetes Secret manifests
named ambient-api-server and ambient-api-server-db to this overlay with the
referenced keys (clientId, clientSecret and db.user, db.password, db.name
respectively) or update the env/valueFrom to point to existing Secrets; ensure
the secret keys match exactly and are included in the overlay manifests so the
pods can mount them at deploy time.
| containers: | ||
| - name: api-server | ||
| command: |
There was a problem hiding this comment.
Add explicit pod/container securityContext hardening
The container currently relies on default security context; set explicit non-root and no-privilege-escalation controls to avoid root-capable defaults (Line 10–12, Line 48–68).
Suggested hardening patch
spec:
template:
spec:
+ securityContext:
+ seccompProfile:
+ type: RuntimeDefault
containers:
- name: api-server
+ securityContext:
+ runAsNonRoot: true
+ allowPrivilegeEscalation: false
+ readOnlyRootFilesystem: true
+ capabilities:
+ drop:
+ - ALL
command:
- /usr/local/bin/ambient-api-serverAlso applies to: 48-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@components/manifests/overlays/openshift-dev/ambient-api-server-args-patch.yaml`
around lines 10 - 12, Add an explicit securityContext and resource limits to the
api-server container: in the containers -> - name: api-server block, add a
container-level securityContext with runAsNonRoot: true, runAsUser (non-zero
UID), allowPrivilegeEscalation: false, privileged: false, capabilities: { drop:
["ALL"] }, readOnlyRootFilesystem: true and seccompProfile: { type:
"RuntimeDefault" } (or Pod-level equivalents if used); also add
resources.requests and resources.limits (cpu/memory) to satisfy resource policy;
ensure no conflicting fields remain and that these settings appear alongside the
existing command and args for the api-server container.
| - --enable-jwt=false | ||
| - --enable-authz=false |
There was a problem hiding this comment.
Avoid shipping fully disabled auth with wildcard CORS
Line 20–21 disables both JWT and authz, and Line 39 allows * origins. Even for dev, this combination is high risk if route/network boundaries are misconfigured.
Safer dev defaults
- - --enable-jwt=false
- - --enable-authz=false
+ - --enable-jwt=true
+ - --enable-authz=true
@@
- - --cors-allowed-origins=*
+ - --cors-allowed-origins=https://ambient-dev.example.comAlso applies to: 39-40
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@components/manifests/overlays/openshift-dev/ambient-api-server-args-patch.yaml`
around lines 20 - 21, The patch currently disables authentication/authorization
via the flags --enable-jwt=false and --enable-authz=false while also allowing
wildcard CORS origins (origins: "*"), which is unsafe; update the manifest so
you do not ship both JWT and authz disabled together — re-enable at least one of
them (e.g., remove or set --enable-authz=true or --enable-jwt=true in the
ambient API server args) and replace the wildcard CORS entry with a restricted
dev-only origin list or make the CORS origin configurable via an
environment/overlay value (reference the flags --enable-jwt, --enable-authz and
the CORS origins entry) so dev defaults are safer and can be tightened in
CI/prod.
Tests were refactored to remove CREDENTIAL_IDS but _fetch_credential
still requires it to look up the credential_id and build the API URL.
Without it, the function returns {} immediately without hitting the
test HTTP server. Also restores credential_id-based response matching
in the lifecycle test.
713 passed, 11 skipped locally.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…hot tests The session-list screenshot test failed because setTheme clicked the toggle button before it was visible in the DOM. Add explicit visibility wait with 10s timeout to setTheme, matching the pattern already used in the waitForThemeToggle setup step. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Merge Queue Status
This pull request spent 11 seconds in the queue, including 1 second running CI. Required conditions to merge |
|
| File | Component | Mode |
|---|---|---|
components/runners/ambient-runner/.mcp.json |
runner | warn |
components/runners/ambient-runner/architecture.md |
runner | warn |
components/runners/ambient-runner/pyproject.toml |
runner | warn |
No action required — these components are in warn mode. Consider using the component's agent workflow for future changes.
📖 Specs: Runner Spec · Runner Constitution
Summary
PR 6+7 of the alpha-to-main migration (plan).
Combined runners (PR 6) and manifests (PR 7) since all component dependencies have landed.
Runners (44 files, ~4.9K insertions)
cryptography>=42.0.0,grpcio>=1.60.0,protobuf>=4.25.0Manifests (33 files, ~980 insertions)
cluster-readeroverlayMigration plan updated
Verification
ruff check .✅ruff format --check .✅python -m pytest tests/✅ (707 passed, 11 skipped)kustomize build✅ (base, kind-local, mpp-openshift, openshift-dev, production)test_observability_mlflow_integration.py(missing optionalmlflowdep, unchanged file)Test plan
kustomize buildfor all overlaysvalidate-manifestsCI job passesAfter this PR
Only PR 7.1 (cleanup) remains — delete the migration plan file.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Infrastructure