From f897aac7c803e74e0510655930a11e00313b708c Mon Sep 17 00:00:00 2001
From: Sam Xu
Date: Thu, 14 May 2026 23:02:00 -0700
Subject: [PATCH] feat(cloud-codex): codex CLI routes through LiteLLM, not
 direct chatgpt.com
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Multi-runtime ≠ multi-auth-surface. Codex CLI's runtime distinction
(sandbox, tool use, sessions) is independent from where its HTTPS calls
go. Point codex CLI at LiteLLM instead of chatgpt.com so:

- single auth surface across openclaw and codex runtimes
- one rotator, one cluster-bound auth.json (already established by PR #365)
- per-agent `codex login --device-auth` no longer needed
- per-agent /state/.codex/auth.json no longer needed
- shared quota pool across all agents
- LiteLLM observability captures all model traffic regardless of runtime

What changes:

- Boot script seeds ~/.codex/config.toml with model_provider=litellm,
  base_url pointing at the LiteLLM service, wire_api=responses (matches
  the chatgpt/ bridge's Responses-API shape), and env_key=LITELLM_API_KEY.
- LITELLM_API_KEY is exported from a k8s Secret
  (cloud-codex-<name>-litellm-key, optional so the pod can boot before
  the key exists; a warning is logged if it is missing).
- Drops the "wait for /state/.codex/auth.json" gate — no longer needed
  since codex CLI no longer holds its own auth.

Operator setup (per agent):

1. POST /api/registry/install (cloud-codex/<name>)
2. Mint AgentInstallation runtime token → secret cloud-codex-<name>-token
3. Mint LiteLLM virtual key → secret cloud-codex-<name>-litellm-key
4. helm upgrade — pod boots, no device-auth needed

The cloud-codex pod's PVC still holds /state/.commonly/tokens/<name>.json
(the commonly agent run loop's CAP token); only the codex auth.json went
away.
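Step 3 of the operator setup can be sketched as follows. The Secret name
pattern and the LITELLM_API_KEY key name follow this patch; the agent
name, the key value, and the manifest layout are placeholders for
illustration:

```shell
# Sketch: render the per-agent LiteLLM key Secret. NAME and LITELLM_KEY
# are placeholders; only the cloud-codex-<name>-litellm-key pattern and
# the LITELLM_API_KEY data key come from this patch.
NAME="demo"
LITELLM_KEY="sk-litellm-placeholder"

SECRET_YAML=$(cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: cloud-codex-${NAME}-litellm-key
type: Opaque
stringData:
  LITELLM_API_KEY: ${LITELLM_KEY}
EOF
)

# In a real cluster you would pipe this to: kubectl apply -n <ns> -f -
printf '%s\n' "$SECRET_YAML"
```

The key itself would be minted from the LiteLLM proxy beforehand; the
manifest is shown inline so the name pattern is explicit.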
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../agents/cloud-codex-deployment.yaml | 59 +++++++++++++------
 k8s/helm/commonly/values.yaml          |  6 ++
 2 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/k8s/helm/commonly/templates/agents/cloud-codex-deployment.yaml b/k8s/helm/commonly/templates/agents/cloud-codex-deployment.yaml
index 369b00db..0a58edc9 100644
--- a/k8s/helm/commonly/templates/agents/cloud-codex-deployment.yaml
+++ b/k8s/helm/commonly/templates/agents/cloud-codex-deployment.yaml
@@ -147,25 +147,35 @@ spec:
               EOF
               chmod 600 /state/.commonly/tokens/${COMMONLY_AGENT_NAME}.json

-              # Wait for codex auth.json. ChatGPT binds OAuth to the IP that
-              # ran device-auth; running `codex login --device-auth` INSIDE
-              # this pod is the whole point. If auth.json is missing, sit
-              # idle and log clear instructions so the operator's first
-              # `kubectl exec` shows them exactly what to do.
-              if [ ! -s /state/.codex/auth.json ]; then
-                echo "[cloud-codex] no codex auth.json on PVC — waiting for device-auth"
-                echo "[cloud-codex] run this once to bind the cluster session:"
-                echo "[cloud-codex]   kubectl exec -n {{ include "commonly.namespace" $ }} -it deploy/cloud-codex-{{ $name }} -- codex login --device-auth"
-                echo "[cloud-codex] (after completing in browser, the pod will resume on next reboot)"
-                # Sleep loop so operator can exec in. Restart-on-success is the
-                # cleanest UX — when auth.json appears, we want to re-enter the
-                # main path, and the simplest way to do that is a fresh boot.
-                while [ ! -s /state/.codex/auth.json ]; do sleep 10; done
-                echo "[cloud-codex] auth.json present — restarting to enter run loop"
-                exit 0
-              fi
+              # Seed ~/.codex/config.toml so codex CLI routes its model calls
+              # through LiteLLM instead of straight to chatgpt.com. The LiteLLM
+              # pod already holds the cluster-IP-bound auth.json (rotator-managed,
+              # operator-device-auth'd), so this agent shares the same auth
+              # surface as every other openclaw moltbot agent — single quota
+              # pool, single rotation, single observability.
+              #
+              # Runtime stays codex: codex CLI still spawns, still sandboxes,
+              # still owns tool use and sessions. Only the HTTPS layer is proxied.
+              mkdir -p /state/.codex
+              cat > /state/.codex/config.toml <<EOF
+              model_provider = "litellm"
+
+              [model_providers.litellm]
+              name = "litellm"
+              base_url = "{{ $.Values.agents.litellmBaseUrl }}"
+              wire_api = "responses"
+              env_key = "LITELLM_API_KEY"
+              EOF
+
+              if [ -z "${LITELLM_API_KEY:-}" ]; then
+                echo "[cloud-codex] warning: LITELLM_API_KEY not set; codex model calls will fail"
+                echo "[cloud-codex] mint a LiteLLM virtual key into secret cloud-codex-{{ $name }}-litellm-key"
+              fi

diff --git a/k8s/helm/commonly/values.yaml b/k8s/helm/commonly/values.yaml
--- a/k8s/helm/commonly/values.yaml
+++ b/k8s/helm/commonly/values.yaml
+  # Cluster-internal LiteLLM endpoint that codex-runtime agents route
+  # model calls through; the boot script writes it into ~/.codex/config.toml
+  # as the litellm provider's base_url. Override per environment via
+  # agents.litellmBaseUrl.
+  litellmBaseUrl: http://litellm:4000/v1
   # Per-agent map. Each key is the agent name that maps to an
   # AgentInstallation already created via /api/registry/install. The
   # token secret should be pre-populated with the cm_agent_* runtime
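
The seeded routing config can be reproduced and sanity-checked outside
the cluster. The field names and values below follow this patch's commit
message (model_provider=litellm, wire_api=responses,
env_key=LITELLM_API_KEY); the literal base_url is the values.yaml
default, and the exact TOML layout is an assumption about Codex CLI's
model_providers schema:

```shell
# Sketch: the config.toml shape the boot script seeds, with a local check
# that the three routing-critical fields are present. Values come from this
# patch; the TOML layout itself is assumed, not copied from the template.
CODEX_CONFIG=$(cat <<'EOF'
model_provider = "litellm"

[model_providers.litellm]
name = "litellm"
base_url = "http://litellm:4000/v1"
wire_api = "responses"
env_key = "LITELLM_API_KEY"
EOF
)

for field in 'model_provider = "litellm"' 'wire_api = "responses"' 'env_key = "LITELLM_API_KEY"'; do
  echo "$CODEX_CONFIG" | grep -qF "$field" || { echo "missing: $field"; exit 1; }
done
echo "config ok"
```

With env_key set, codex CLI reads the key from the LITELLM_API_KEY
environment variable at runtime, which is exactly what the optional
per-agent Secret provides.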