fix(control-plane,manifests): resolve MCP sidecar TLS and api-server proxy failures #1546

Merged
mergify[bot] merged 7 commits into main from fix/mcp-sidecar-tls-and-backend-proxy
May 11, 2026

@markturansky markturansky commented May 11, 2026

Summary

  • MCP sidecar TLS fix: The ambient-mcp sidecar container in runner pods was missing the OpenShift service-ca volume mount and SSL_CERT_FILE env var, causing x509: certificate signed by unknown authority errors when calling ambient-api-server over HTTPS (service-serving certs). Added both, matching the existing runner container configuration.
  • Backend proxy fix: The ambient-api-server reverse proxy plugin defaults BACKEND_URL to http://localhost:8080 when unset. Since the backend runs in a separate pod (backend-service:8080), all proxied requests (credentials, session listing) returned 502. Added BACKEND_URL to the production env patch.
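
The TLS failure in the first bullet can be reproduced in isolation. The Go sketch below is an illustration, not code from this PR: it starts a throwaway HTTPS server with `httptest`, then calls it once with an empty trust pool and once with the server's certificate added. The local test certificate stands in for the OpenShift service CA. In the pod, the CA bundle would come from the mounted file that `SSL_CERT_FILE` points at (Go's `crypto/x509` reads that variable on Linux when loading system roots), assuming the sidecar is a Go binary. The helper names `probe`, `isUnknownAuthority`, and `demo` are hypothetical.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// probe issues a GET against url, trusting only the CAs in roots.
func probe(url string, roots *x509.CertPool) error {
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: roots},
	}}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// isUnknownAuthority reports whether err is the x509 "unknown authority" failure.
func isUnknownAuthority(err error) bool {
	var ua x509.UnknownAuthorityError
	return errors.As(err, &ua)
}

// demo returns the error seen without the CA and the error seen with it.
func demo() (withoutCA, withCA error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	// Empty pool: analogous to the sidecar before the service-ca mount was added.
	withoutCA = probe(srv.URL, x509.NewCertPool())

	// Pool containing the issuing certificate: analogous to the fixed sidecar.
	trusted := x509.NewCertPool()
	trusted.AddCert(srv.Certificate())
	withCA = probe(srv.URL, trusted)
	return withoutCA, withCA
}

func main() {
	withoutCA, withCA := demo()
	fmt.Println("without CA, unknown authority:", isUnknownAuthority(withoutCA))
	fmt.Println("with CA, error:", withCA)
}
```

The same client code succeeds or fails depending only on which roots it trusts, which is why the fix is a configuration change rather than a code change in the MCP server itself.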

Root Cause

These issues were exposed by the ambient-control-plane deployment fix (env[15] value/valueFrom conflict on CP_RUNTIME_NAMESPACE). Once the control-plane was redeployed with the correct manifest, the MCP sidecar started being injected into runner pods, surfacing the missing TLS config. The BACKEND_URL gap has existed since the api-server proxy plugin was introduced but was previously masked.
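
The `env[15]` conflict mentioned above is a condition the Kubernetes API server rejects: an env entry may set `value` or `valueFrom`, but not both. A small pre-deploy check for it could look like the Go sketch below. The `envVar` and `envVarSource` types are hypothetical stand-ins for the real `corev1` types (only the relevant fields are modeled), and the values in `main` are invented for illustration; the PR does not show how the original conflict was written.

```go
package main

import "fmt"

// envVarSource is a simplified stand-in for corev1.EnvVarSource.
type envVarSource struct {
	FieldPath string
}

// envVar is a simplified stand-in for corev1.EnvVar.
type envVar struct {
	Name      string
	Value     string
	ValueFrom *envVarSource
}

// valueConflicts returns the indexes of entries that set both Value and ValueFrom.
func valueConflicts(env []envVar) []int {
	var bad []int
	for i, e := range env {
		if e.Value != "" && e.ValueFrom != nil {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	env := []envVar{
		{Name: "EXAMPLE_PLAIN", Value: "x"},
		// Hypothetical reproduction of the conflict: both fields populated.
		{
			Name:      "CP_RUNTIME_NAMESPACE",
			Value:     "example-namespace",
			ValueFrom: &envVarSource{FieldPath: "metadata.namespace"},
		},
	}
	for _, i := range valueConflicts(env) {
		fmt.Printf("env[%d] %s sets both value and valueFrom\n", i, env[i].Name)
	}
}
```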

Test plan

  • Deploy updated ambient-control-plane image to cluster
  • Create a new session and verify MCP tools (list_sessions, list_projects) work without TLS errors
  • Verify credential fetching (github, gitlab, etc.) returns proper responses instead of 502
  • Verify acp_list_sessions works without 502

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Enhanced MCP sidecar container configuration to support TLS certificate validation and CA bundle access for secure internal communication.
    • Configured the API server with backend service endpoint URL settings for production deployments, enabling proper service-to-service connectivity in the ambient control plane.

user and others added 6 commits May 5, 2026 14:22
Initial draft of the security specification covering:
- OpenShift namespace-scoped build agent ServiceAccount
- Control Plane SA as the single SRE-owned cluster identity
- Per-project Vertex AI credential scoping
- User SSO token propagation into runners
- Integration credential lifecycle (Credential provider=*)
- Dynamic MCP credential watching (sidecar vs pod mode)
- Per-session ServiceAccount isolation (closing the shared-SA gap)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Move security.md to specs/security/security.spec.md. Extract HOW content
(credential authorization model, RBAC runtime grant semantics, proxy
authentication, design decisions) from ambient-model.spec.md into the
security spec. Model spec retains WHAT (schemas, endpoints, provider enum,
permission matrix, CLI mappings). Cross-references link both documents.

Also moves ambient-model.spec.md from specs/sessions/ to specs/api/ and
updates all references (BOOKMARKS.md, design README, devflow skill,
workflows).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Rewrite to Requirement:/Scenario: format with RFC 2119 keywords (SHALL/MUST/SHOULD)
- Fix broken GFM table (double pipe in Design Decisions header separator)
- Remove implementation details (file paths, function names) from spec
- Use "Project" consistently instead of "namespace" for Ambient boundary; add terminology note
- Register api/ and security/ domains in specs/index.spec.md
- Fix BOOKMARKS.md domain label (sessions -> api)
- Remove Draft/Authors/Last Updated metadata header to match other specs
- Replace fragile §N anchors with descriptive anchor links in model spec cross-refs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Credentials are now global resources bound to Projects via RoleBindings
instead of project-scoped with a project_id FK. This eliminates
duplication when the same PAT is used across multiple Projects. Adds
vertex and kubeconfig to the provider enum. Splits the security spec
Accounts and Tokens table into scoped subsections for independent
implementation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add credential:owner and credential:viewer roles for self-service CRUD
- Update security spec: credentials are global, bound via RoleBindings
- Fix all stale references: endpoint paths, K8s analogy, named patterns,
  design decisions (5-scope RBAC), key invariants, accounts table scopes
- Update cross-reference anchor from project-scoped to credential-access

Addresses review feedback from jsell-rh on PR #1514.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…proxy failures

The MCP sidecar in runner pods failed with "x509: certificate signed by
unknown authority" because it lacked the OpenShift service-ca volume mount
and SSL_CERT_FILE env var. Additionally, the ambient-api-server returned
502 on proxied requests (credentials, session listing) because BACKEND_URL
was never set, defaulting to unreachable localhost:8080.

- Add service-ca volume mount and SSL_CERT_FILE to MCP sidecar container
- Add BACKEND_URL to production ambient-api-server env patch

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot commented May 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d3d3ebf-c100-4135-885e-442b255de7fe

📥 Commits

Reviewing files that changed from the base of the PR and between a455af3 and df04193.

📒 Files selected for processing (2)
  • components/ambient-control-plane/internal/reconciler/kube_reconciler.go
  • components/manifests/overlays/production/ambient-api-server-env-patch.yaml

📝 Walkthrough


The PR configures TLS certificate access for the MCP sidecar container in the reconciler and sets the backend service URL for the production API server deployment through independent manifest and code updates.

Changes

MCP Sidecar TLS Configuration

| Layer / File(s) | Summary |
|---|---|
| Environment & Volume Setup (`components/ambient-control-plane/internal/reconciler/kube_reconciler.go`) | `buildMCPSidecar` adds the `SSL_CERT_FILE` environment variable and a read-only volumeMount for the `service-ca` volume, enabling the MCP sidecar to access the CA bundle at `/etc/pki/ca-trust/extracted/pem/service-ca.crt`. |
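
The shape of that change can be sketched in Go as follows. This is not the actual `buildMCPSidecar` code: the `container`, `envVar`, and `volumeMount` types are simplified stand-ins for the `corev1` types, `withServiceCA` is a hypothetical helper, and mounting the volume at the directory (rather than at the file via a subPath) is an assumption. Only the variable name, the volume name, and the bundle path come from the summary above.

```go
package main

import "fmt"

// Simplified stand-ins for corev1.EnvVar, corev1.VolumeMount, corev1.Container.
type envVar struct{ Name, Value string }

type volumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

type container struct {
	Name         string
	Env          []envVar
	VolumeMounts []volumeMount
}

const (
	serviceCAVolume = "service-ca"
	serviceCADir    = "/etc/pki/ca-trust/extracted/pem" // assumed mount point
	serviceCAFile   = serviceCADir + "/service-ca.crt"
)

// withServiceCA returns a copy of c that trusts the service CA.
// It is idempotent: applying it twice adds nothing the second time.
func withServiceCA(c container) container {
	out := c
	out.Env = append([]envVar(nil), c.Env...)
	out.VolumeMounts = append([]volumeMount(nil), c.VolumeMounts...)

	hasEnv := false
	for _, e := range out.Env {
		if e.Name == "SSL_CERT_FILE" {
			hasEnv = true
		}
	}
	if !hasEnv {
		out.Env = append(out.Env, envVar{Name: "SSL_CERT_FILE", Value: serviceCAFile})
	}

	hasMount := false
	for _, m := range out.VolumeMounts {
		if m.Name == serviceCAVolume {
			hasMount = true
		}
	}
	if !hasMount {
		out.VolumeMounts = append(out.VolumeMounts, volumeMount{
			Name:      serviceCAVolume,
			MountPath: serviceCADir,
			ReadOnly:  true,
		})
	}
	return out
}

func main() {
	sidecar := withServiceCA(container{Name: "ambient-mcp"})
	fmt.Printf("%+v\n", sidecar)
}
```

Keeping the env var and the mount in one helper makes it harder for a new container in the same pod to get one without the other, which is the gap this PR closed for the sidecar.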

API Server Backend Configuration

| Layer / File(s) | Summary |
|---|---|
| Environment Patch (`components/manifests/overlays/production/ambient-api-server-env-patch.yaml`) | Production deployment adds the `BACKEND_URL` environment variable, pointing to `http://backend-service.ambient-code.svc:8080`, in the api-server container. |
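
To illustrate why an unset `BACKEND_URL` surfaced as 502 responses, here is a minimal Go sketch of an env-with-default lookup feeding a reverse proxy. It is not the api-server plugin's code: `backendURL`, `proxyStatus`, `demo`, and the `/api/sessions` path are hypothetical, and the standard library's `httputil.ReverseProxy` is used as a stand-in for whatever proxy the plugin actually uses. The default value `http://localhost:8080` is taken from the PR description.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"os"
)

const defaultBackendURL = "http://localhost:8080"

// backendURL returns BACKEND_URL if set, otherwise the localhost default.
func backendURL(getenv func(string) string) string {
	if v := getenv("BACKEND_URL"); v != "" {
		return v
	}
	return defaultBackendURL
}

// proxyStatus proxies one GET to target and returns the status the caller sees.
func proxyStatus(target string) (int, error) {
	u, err := url.Parse(target)
	if err != nil {
		return 0, err
	}
	proxy := httputil.NewSingleHostReverseProxy(u)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/sessions", nil)
	proxy.ServeHTTP(rec, req)
	return rec.Code, nil
}

// demo proxies to a reachable backend, then to the same address after shutdown.
func demo() (live, dead int, err error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	if live, err = proxyStatus(backend.URL); err != nil {
		backend.Close()
		return 0, 0, err
	}
	backend.Close() // nothing listens at this address any more
	dead, err = proxyStatus(backend.URL)
	return live, dead, err
}

func main() {
	fmt.Println("resolved backend:", backendURL(os.Getenv))
	live, dead, err := demo()
	fmt.Println(live, dead, err)
}
```

When the upstream cannot be reached, `ReverseProxy`'s default error handler answers with 502 Bad Gateway, which matches the symptom reported for credential and session-listing requests.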
🚥 Pre-merge checks | ✅ 7 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Kubernetes Resource Safety | ⚠️ Warning | Child resources (Pods, Secrets, ServiceAccounts) missing OwnerReferences for garbage collection. Resource limits and security contexts are configured correctly. | Add OwnerReferences to Pod, Secret, and ServiceAccount resources created in ensurePod, ensureVertexSecret, and ensureServiceAccount to enable automatic cleanup. |

✅ Passed checks (7 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format with appropriate type (fix), scopes (control-plane, manifests), and clearly describes the two main fixes: MCP sidecar TLS configuration and api-server backend proxy setup. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Performance And Algorithmic Complexity | ✅ Passed | No performance issues. Changes are config additions only: SSL/TLS env var and volume mount, plus BACKEND_URL env var. No algorithmic complexity, loops, N+1, or unbounded growth. |
| Security And Secret Handling | ✅ Passed | PR adds TLS cert config to MCP sidecar and BACKEND_URL for internal K8s routing. No plaintext secrets, auth gaps, injection vulnerabilities, or sensitive data leakage. Clean on all security checks. |


@mergify mergify Bot added the queued label May 11, 2026

mergify Bot commented May 11, 2026

Merge Queue Status

  • Entered queue: 2026-05-11 14:02 UTC · Rule: default
  • Checks skipped · PR is already up-to-date
  • Merged: 2026-05-11 14:03 UTC · at df0419308f0f33eda38e344c0a24f7040eeb5045 · squash

This pull request spent 42 seconds in the queue, including 9 seconds running CI.

@mergify mergify Bot merged commit 136db29 into main May 11, 2026
39 checks passed
@mergify mergify Bot deleted the fix/mcp-sidecar-tls-and-backend-proxy branch May 11, 2026 14:03
@mergify mergify Bot removed the queued label May 11, 2026

netlify Bot commented May 11, 2026

Deploy Preview for cheerful-kitten-f556a0 failed.

| Name | Link |
|---|---|
| 🔨 Latest commit | 858dc3f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a01debd13670000083fbd95 |

markturansky added a commit that referenced this pull request May 11, 2026
…nt-code namespace (#1553)

## Summary
- Adds a `NetworkPolicy` (`allow-from-runner-namespaces`) to the base
kustomize manifests that permits ingress from runner pods (`app:
ambient-code-runner`) in any namespace to the `ambient-code` namespace
- Fixes `INITIAL_PROMPT TimeoutError` caused by default-deny
NetworkPolicies blocking cross-namespace traffic from runner pods to
`backend-service`
- Already applied on the live `hcmais01ue1` cluster; this PR codifies
the fix in manifests

## Context
This is a follow-up to PR #1546 (MCP sidecar TLS + BACKEND_URL fixes).
The NetworkPolicy was applied on-cluster during incident response but
was not included in the merged PR.

## Test plan
- [x] Verified on live cluster: runner pods can reach `backend-service`
and `INITIAL_PROMPT` succeeds
- [ ] `kustomize build components/manifests/overlays/production` renders
without errors
- [ ] Next production deploy includes the NetworkPolicy

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added network security policy to restrict ingress traffic, allowing
connections only from designated runner pods in their respective
namespaces.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: user <u@example.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>