fix(security,api-server,control-plane): CodeRabbit fixes, deletecollection fallback, idempotent agent start#1290
Conversation
… MCP, and control-plane - Add project ownership validation on credential Get/Patch/Delete/GetToken (IDOR) - Validate projectID format in credential List to prevent search injection - Guard MCP sidecar creation on CP token exchange config availability - Add nil-guard for TokenFunc in mention.NewResolver - Add language tag to SKILL.md fenced code block 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
…forbidden The MPP tenant SA lacks deletecollection permission on pods, secrets, serviceaccounts, and services in runner namespaces. Session cleanup silently fails and orphaned pods accumulate consuming ~4Gi each. All Delete*ByLabel methods now try deletecollection first, and on 403 Forbidden fall back to list matching resources then delete individually. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
…e session
If an agent already has a session in Pending, Creating, or Running phase,
POST /projects/{id}/agents/{agent_id}/start now returns 200 with the
existing session instead of creating a duplicate. New sessions are still
created with 201 when no active session exists.
Adds ActiveByAgentID to session DAO and service layers.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds active-session lookup and reuse to the start flow; tightens credential project ID validation and ownership checks; introduces DAO/service methods for active sessions and mock support; changes resolver construction to return errors and updates callers; adds DeleteCollection fallback logic; forwards HTTP(S)/NO_PROXY through control plane into spawned pods and MCP sidecar; minor documentation code-fence update. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant StartHandler
participant SessionService
participant SessionDAO
participant Database
Client->>StartHandler: POST /agents/:agent_id/start
StartHandler->>StartHandler: extract ctx & agent_id
StartHandler->>SessionService: ActiveByAgentID(ctx, agent_id)
SessionService->>SessionDAO: ActiveByAgentID(ctx, agent_id)
SessionDAO->>Database: Query latest session WHERE agent_id=? AND phase IN (Pending,Creating,Running)
alt active session found
Database-->>SessionDAO: session row
SessionDAO-->>SessionService: *Session
SessionService-->>StartHandler: *Session
StartHandler->>StartHandler: build StartResponse(existing)
StartHandler-->>Client: 200 OK + existing session
else no active session
Database-->>SessionDAO: no rows
SessionDAO-->>SessionService: nil
SessionService-->>StartHandler: nil
StartHandler->>StartHandler: unread inbox, create session, push start prompt, start session
StartHandler-->>Client: 200 OK + new session
end
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/ambient-control-plane/internal/reconciler/kube_reconciler.go (1)
408-409:⚠️ Potential issue | 🟠 MajorMCP enablement flag is inconsistent with sidecar creation.
On Line 423,
buildEnv(..., useMCPSidecar, ...)is computed fromMCPImageonly, but on Lines 444-449 the sidecar can be skipped when CP token config is missing. That leavesAMBIENT_MCP_URLconfigured without an MCP container, which can break delegation flows at runtime.Proposed fix
- useMCPSidecar := r.cfg.MCPImage != "" + hasMCPImage := r.cfg.MCPImage != "" + useMCPSidecar := hasMCPImage && r.cfg.CPTokenURL != "" && r.cfg.CPTokenPublicKey != "" containers := []interface{}{ map[string]interface{}{ @@ - "env": r.buildEnv(ctx, session, sdk, useMCPSidecar, credentialIDs), + "env": r.buildEnv(ctx, session, sdk, useMCPSidecar, credentialIDs), @@ - if useMCPSidecar { - if r.cfg.CPTokenURL == "" || r.cfg.CPTokenPublicKey == "" { + if hasMCPImage { + if !useMCPSidecar { r.logger.Warn().Str("session_id", session.ID).Msg("MCP sidecar skipped: CP_TOKEN_URL or CPTokenPublicKey not configured") } else { containers = append(containers, r.buildMCPSidecar(session.ID)) r.logger.Info().Str("session_id", session.ID).Msg("MCP sidecar enabled for session") } }As per coding guidelines, "Flag only errors, security risks, or functionality-breaking problems."
Also applies to: 423-423, 444-449
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/ambient-control-plane/internal/reconciler/kube_reconciler.go` around lines 408 - 409, The MCP sidecar decision is inconsistent: useMCPSidecar is set from MCPImage alone but the code later can skip creating the MCP container when the CP token config is missing, which can leave AMBIENT_MCP_URL set without a sidecar. Change the sidecar flag to reflect both presence of r.cfg.MCPImage and the CP token config check (make useMCPSidecar = r.cfg.MCPImage != "" && <same CP token condition used when skipping creation>), update the buildEnv(...) call to use that unified boolean, and ensure AMBIENT_MCP_URL (and any other MCP-related env) is not injected when useMCPSidecar is false; update the code paths that currently skip container creation to rely on the same useMCPSidecar variable (references: useMCPSidecar, r.cfg.MCPImage, buildEnv, and the CP token config check/sidecar creation block).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/ambient-api-server/plugins/agents/start_handler.go`:
- Around line 59-67: The handler currently ignores errors from
h.session.ActiveByAgentID; update the call to capture the error (e.g., existing,
svcErr := h.session.ActiveByAgentID(ctx, agentID)), check if svcErr != nil and
call handlers.HandleError(ctx, w, svcErr) then return, and only proceed to build
StartResponse and write the 200 JSON when existing != nil; keep using
StartResponse and sessions.PresentSession for the successful path.
In `@components/ambient-api-server/plugins/sessions/service.go`:
- Around line 269-275: The ActiveByAgentID method in sqlSessionService currently
swallows all DAO errors by returning (nil, nil); instead, check the error from
sessionDao.ActiveByAgentID using errors.Is(err, gorm.ErrRecordNotFound) and only
treat ErrRecordNotFound as a nil result, otherwise wrap/convert the real DB
error into a non-nil *errors.ServiceError and return it (do not return nil, nil
on real errors). Update the caller (the start handler that invokes
ActiveByAgentID) to fail fast when a non-nil ServiceError is returned rather
than proceeding to Create, so transient DB failures do not lead to duplicate
sessions.
---
Outside diff comments:
In `@components/ambient-control-plane/internal/reconciler/kube_reconciler.go`:
- Around line 408-409: The MCP sidecar decision is inconsistent: useMCPSidecar
is set from MCPImage alone but the code later can skip creating the MCP
container when the CP token config is missing, which can leave AMBIENT_MCP_URL
set without a sidecar. Change the sidecar flag to reflect both presence of
r.cfg.MCPImage and the CP token config check (make useMCPSidecar =
r.cfg.MCPImage != "" && <same CP token condition used when skipping creation>),
update the buildEnv(...) call to use that unified boolean, and ensure
AMBIENT_MCP_URL (and any other MCP-related env) is not injected when
useMCPSidecar is false; update the code paths that currently skip container
creation to rely on the same useMCPSidecar variable (references: useMCPSidecar,
r.cfg.MCPImage, buildEnv, and the CP token config check/sidecar creation block).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e6cdb68f-ed8b-46a9-8866-8d557bd7f3f3
📒 Files selected for processing (11)
.claude/skills/devflow/SKILL.mdcomponents/ambient-api-server/plugins/agents/start_handler.gocomponents/ambient-api-server/plugins/credentials/handler.gocomponents/ambient-api-server/plugins/sessions/dao.gocomponents/ambient-api-server/plugins/sessions/mock_dao.gocomponents/ambient-api-server/plugins/sessions/service.gocomponents/ambient-control-plane/internal/kubeclient/kubeclient.gocomponents/ambient-control-plane/internal/reconciler/kube_reconciler.gocomponents/ambient-mcp/mention/resolve.gocomponents/ambient-mcp/mention/resolve_test.gocomponents/ambient-mcp/tools/sessions.go
| if existing, _ := h.session.ActiveByAgentID(ctx, agentID); existing != nil { | ||
| resp := &StartResponse{ | ||
| Session: sessions.PresentSession(existing), | ||
| } | ||
| w.Header().Set("Content-Type", "application/json") | ||
| w.WriteHeader(http.StatusOK) | ||
| json.NewEncoder(w).Encode(resp) | ||
| return | ||
| } |
There was a problem hiding this comment.
Handler should propagate errors from ActiveByAgentID check.
Currently discards the error with _. If the service is updated to return real errors (per comment on service.go), this should be:
existing, svcErr := h.session.ActiveByAgentID(ctx, agentID)
if svcErr != nil {
handlers.HandleError(ctx, w, svcErr)
return
}
if existing != nil {
// return 200 with existing session
}This ensures idempotency isn't silently bypassed during DB issues.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/ambient-api-server/plugins/agents/start_handler.go` around lines
59 - 67, The handler currently ignores errors from h.session.ActiveByAgentID;
update the call to capture the error (e.g., existing, svcErr :=
h.session.ActiveByAgentID(ctx, agentID)), check if svcErr != nil and call
handlers.HandleError(ctx, w, svcErr) then return, and only proceed to build
StartResponse and write the 200 JSON when existing != nil; keep using
StartResponse and sessions.PresentSession for the successful path.
| func (s *sqlSessionService) ActiveByAgentID(ctx context.Context, agentID string) (*Session, *errors.ServiceError) { | ||
| session, err := s.sessionDao.ActiveByAgentID(ctx, agentID) | ||
| if err != nil { | ||
| return nil, nil | ||
| } | ||
| return session, nil | ||
| } |
There was a problem hiding this comment.
Swallowing all errors breaks idempotency under DB instability.
Returning (nil, nil) for any DAO error treats DB connection failures the same as "no active session found." Combined with the handler discarding the error (start_handler.go:59), a transient DB failure during the check could lead to duplicate session creation if the DB recovers before Create().
Consider distinguishing gorm.ErrRecordNotFound from other errors:
Proposed fix
func (s *sqlSessionService) ActiveByAgentID(ctx context.Context, agentID string) (*Session, *errors.ServiceError) {
session, err := s.sessionDao.ActiveByAgentID(ctx, agentID)
if err != nil {
+ if errors.Is(err, gorm.ErrRecordNotFound) {
+ return nil, nil
+ }
+ return nil, errors.GeneralError("failed to check active session: %v", err)
- return nil, nil
}
return session, nil
}Then update the handler to fail fast on real errors instead of proceeding to create.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (s *sqlSessionService) ActiveByAgentID(ctx context.Context, agentID string) (*Session, *errors.ServiceError) { | |
| session, err := s.sessionDao.ActiveByAgentID(ctx, agentID) | |
| if err != nil { | |
| return nil, nil | |
| } | |
| return session, nil | |
| } | |
| func (s *sqlSessionService) ActiveByAgentID(ctx context.Context, agentID string) (*Session, *errors.ServiceError) { | |
| session, err := s.sessionDao.ActiveByAgentID(ctx, agentID) | |
| if err != nil { | |
| if errors.Is(err, gorm.ErrRecordNotFound) { | |
| return nil, nil | |
| } | |
| return nil, errors.GeneralError("failed to check active session: %v", err) | |
| } | |
| return session, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/ambient-api-server/plugins/sessions/service.go` around lines 269 -
275, The ActiveByAgentID method in sqlSessionService currently swallows all DAO
errors by returning (nil, nil); instead, check the error from
sessionDao.ActiveByAgentID using errors.Is(err, gorm.ErrRecordNotFound) and only
treat ErrRecordNotFound as a nil result, otherwise wrap/convert the real DB
error into a non-nil *errors.ServiceError and return it (do not return nil, nil
on real errors). Update the caller (the start handler that invokes
ActiveByAgentID) to fail fast when a non-nil ServiceError is returned rather
than proceeding to Create, so transient DB failures do not lead to duplicate
sessions.
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
…er and MCP sidecar pods 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/ambient-control-plane/internal/reconciler/kube_reconciler.go`:
- Around line 447-452: The code currently computes useMCPSidecar from MCPImage
only but still passes it into buildEnv causing AMBIENT_MCP_URL to be set even
when the CP token config is missing; change the logic to compute an enabled flag
(e.g., enabledMCPSidecar or enableMCPSidecar) that is true only when
useMCPSidecar is true AND r.cfg.CPTokenURL and r.cfg.CPTokenPublicKey are
non-empty, pass that new flag into buildEnv instead of useMCPSidecar, and only
call buildMCPSidecar/append the container when this combined flag is true so
AMBIENT_MCP_URL is not advertised when the sidecar is skipped (references:
useMCPSidecar, buildEnv, buildMCPSidecar, AMBIENT_MCP_URL).
- Around line 647-655: The code only injects uppercase proxy envs
(HTTP_PROXY/HTTPS_PROXY/NO_PROXY) which misses lowercase variants; update the
reconciler to append both uppercase and lowercase names when setting env (add
envVar("http_proxy", r.cfg.HTTPProxy), envVar("https_proxy", r.cfg.HTTPSProxy),
envVar("no_proxy", r.cfg.NoProxy) alongside the existing uppercase ones in the
code that constructs env using envVar and r.cfg.HTTPProxy/HTTPSProxy/NoProxy)
and also adjust the config loader in config.go to read lowercase fallbacks
(check os.Getenv("http_proxy") etc. when populating the same config fields) so
config values set via lowercase envs are preserved end-to-end.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8498cea5-7991-49af-88f3-e4c4136ec713
📒 Files selected for processing (4)
.claude/skills/ambient-pr-test/SKILL.mdcomponents/ambient-control-plane/cmd/ambient-control-plane/main.gocomponents/ambient-control-plane/internal/config/config.gocomponents/ambient-control-plane/internal/reconciler/kube_reconciler.go
✅ Files skipped from review due to trivial changes (1)
- .claude/skills/ambient-pr-test/SKILL.md
| if r.cfg.CPTokenURL == "" || r.cfg.CPTokenPublicKey == "" { | ||
| r.logger.Warn().Str("session_id", session.ID).Msg("MCP sidecar skipped: CP_TOKEN_URL or CPTokenPublicKey not configured") | ||
| } else { | ||
| containers = append(containers, r.buildMCPSidecar(session.ID)) | ||
| r.logger.Info().Str("session_id", session.ID).Msg("MCP sidecar enabled for session") | ||
| } |
There was a problem hiding this comment.
Don’t advertise MCP when the sidecar is skipped.
Lines 447-452 can omit the MCP container, but useMCPSidecar was already computed from MCPImage alone at Line 411 and passed into buildEnv at Line 426. That still sets AMBIENT_MCP_URL (Lines 607-609), so the runner will target localhost:8090 even though nothing is listening there.
Suggested fix
- useMCPSidecar := r.cfg.MCPImage != ""
+ useMCPSidecar := r.cfg.MCPImage != "" && r.cfg.CPTokenURL != "" && r.cfg.CPTokenPublicKey != ""
containers := []interface{}{
map[string]interface{}{
"name": "ambient-code-runner",
@@
- if useMCPSidecar {
- if r.cfg.CPTokenURL == "" || r.cfg.CPTokenPublicKey == "" {
- r.logger.Warn().Str("session_id", session.ID).Msg("MCP sidecar skipped: CP_TOKEN_URL or CPTokenPublicKey not configured")
- } else {
- containers = append(containers, r.buildMCPSidecar(session.ID))
- r.logger.Info().Str("session_id", session.ID).Msg("MCP sidecar enabled for session")
- }
+ if r.cfg.MCPImage != "" && !useMCPSidecar {
+ r.logger.Warn().Str("session_id", session.ID).Msg("MCP sidecar skipped: CP_TOKEN_URL or CPTokenPublicKey not configured")
+ }
+ if useMCPSidecar {
+ containers = append(containers, r.buildMCPSidecar(session.ID))
+ r.logger.Info().Str("session_id", session.ID).Msg("MCP sidecar enabled for session")
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/ambient-control-plane/internal/reconciler/kube_reconciler.go`
around lines 447 - 452, The code currently computes useMCPSidecar from MCPImage
only but still passes it into buildEnv causing AMBIENT_MCP_URL to be set even
when the CP token config is missing; change the logic to compute an enabled flag
(e.g., enabledMCPSidecar or enableMCPSidecar) that is true only when
useMCPSidecar is true AND r.cfg.CPTokenURL and r.cfg.CPTokenPublicKey are
non-empty, pass that new flag into buildEnv instead of useMCPSidecar, and only
call buildMCPSidecar/append the container when this combined flag is true so
AMBIENT_MCP_URL is not advertised when the sidecar is skipped (references:
useMCPSidecar, buildEnv, buildMCPSidecar, AMBIENT_MCP_URL).
| if r.cfg.HTTPProxy != "" { | ||
| env = append(env, envVar("HTTP_PROXY", r.cfg.HTTPProxy)) | ||
| } | ||
| if r.cfg.HTTPSProxy != "" { | ||
| env = append(env, envVar("HTTPS_PROXY", r.cfg.HTTPSProxy)) | ||
| } | ||
| if r.cfg.NoProxy != "" { | ||
| env = append(env, envVar("NO_PROXY", r.cfg.NoProxy)) | ||
| } |
There was a problem hiding this comment.
Forward lowercase proxy vars too, not just uppercase.
This path only injects HTTP_PROXY / HTTPS_PROXY / NO_PROXY, and components/ambient-control-plane/internal/config/config.go Lines 81-83 only read those uppercase forms. That drops deployments configured with lowercase proxy envs, and tools based on curl/libcurl in the runner or MCP sidecar may not honor uppercase HTTP_PROXY, so outbound calls can still bypass the proxy or fail.
Suggested fix
--- a/components/ambient-control-plane/internal/config/config.go
+++ b/components/ambient-control-plane/internal/config/config.go
@@
- HTTPProxy: os.Getenv("HTTP_PROXY"),
- HTTPSProxy: os.Getenv("HTTPS_PROXY"),
- NoProxy: os.Getenv("NO_PROXY"),
+ HTTPProxy: firstEnv("HTTP_PROXY", "http_proxy"),
+ HTTPSProxy: firstEnv("HTTPS_PROXY", "https_proxy"),
+ NoProxy: firstEnv("NO_PROXY", "no_proxy"),
}
@@
func envOrDefault(key, fallback string) string {
if v := os.Getenv(key); v != "" {
return v
}
return fallback
}
+
+func firstEnv(keys ...string) string {
+ for _, key := range keys {
+ if v := os.Getenv(key); v != "" {
+ return v
+ }
+ }
+ return ""
+}--- a/components/ambient-control-plane/internal/reconciler/kube_reconciler.go
+++ b/components/ambient-control-plane/internal/reconciler/kube_reconciler.go
@@
if r.cfg.HTTPProxy != "" {
- env = append(env, envVar("HTTP_PROXY", r.cfg.HTTPProxy))
+ env = append(env,
+ envVar("HTTP_PROXY", r.cfg.HTTPProxy),
+ envVar("http_proxy", r.cfg.HTTPProxy),
+ )
}
if r.cfg.HTTPSProxy != "" {
- env = append(env, envVar("HTTPS_PROXY", r.cfg.HTTPSProxy))
+ env = append(env,
+ envVar("HTTPS_PROXY", r.cfg.HTTPSProxy),
+ envVar("https_proxy", r.cfg.HTTPSProxy),
+ )
}
if r.cfg.NoProxy != "" {
- env = append(env, envVar("NO_PROXY", r.cfg.NoProxy))
+ env = append(env,
+ envVar("NO_PROXY", r.cfg.NoProxy),
+ envVar("no_proxy", r.cfg.NoProxy),
+ )
}Verify the current code only handles uppercase proxy vars end-to-end:
#!/bin/bash
set -euo pipefail
echo "== Proxy env loading from config =="
rg -n 'os\.Getenv\("(HTTP_PROXY|HTTPS_PROXY|NO_PROXY|http_proxy|https_proxy|no_proxy)"\)' \
components/ambient-control-plane/internal/config/config.go
echo
echo "== Proxy env injection into pods =="
rg -n 'envVar\("(HTTP_PROXY|HTTPS_PROXY|NO_PROXY|http_proxy|https_proxy|no_proxy)"' \
components/ambient-control-plane/internal/reconciler/kube_reconciler.goAlso applies to: 845-853
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/ambient-control-plane/internal/reconciler/kube_reconciler.go`
around lines 647 - 655, The code only injects uppercase proxy envs
(HTTP_PROXY/HTTPS_PROXY/NO_PROXY) which misses lowercase variants; update the
reconciler to append both uppercase and lowercase names when setting env (add
envVar("http_proxy", r.cfg.HTTPProxy), envVar("https_proxy", r.cfg.HTTPSProxy),
envVar("no_proxy", r.cfg.NoProxy) alongside the existing uppercase ones in the
code that constructs env using envVar and r.cfg.HTTPProxy/HTTPSProxy/NoProxy)
and also adjust the config loader in config.go to read lowercase fallbacks
(check os.Getenv("http_proxy") etc. when populating the same config fields) so
config values set via lowercase envs are preserved end-to-end.
Summary
deletecollection, session cleanup now falls back to list+delete individually — prevents orphaned pods accumulatingPOST /projects/{id}/agents/{agent_id}/startreturns 200 with existing active session instead of creating duplicates. New sessions still return 201.Test plan
go build ./...passes for api-server, control-plane, ambient-mcpgo vet ./...passes for all three🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests