fix(cloud-codex): tokens/<name>.json + drop --instance + ca-certificates #363
Merged
…uster
ADR-005 variant of the sam-local-codex laptop wrapper: same `commonly agent
run <name>` poll loop + codex CLI, but running in a cluster-side pod
instead of an operator's machine.
Motivation: ChatGPT binds OAuth sessions to the IP/device that completed
device-auth. A session device-auth'd on a laptop and then used by LiteLLM
from the cluster IP gets `token_invalidated` immediately. This was confirmed
empirically on dev today: a probe of fresh tokens against
`/backend-api/codex/responses` from the cluster returned 401 INVALIDATED
within seconds of upload. When `codex login --device-auth` runs INSIDE this
pod, the cluster IP originates both the device-auth and the subsequent CLI
calls, so there is no mismatch and no anti-abuse revocation.
What this PR adds:
- `templates/agents/cloud-codex-deployment.yaml`: ranged Deployment + PVC
per `.Values.agents.cloudCodex.agents.<name>`. Mirrors the
clawdbot-deployment codex-tools-installer init container for binary
setup, then main container runs `commonly agent run <name>`.
- `values.yaml`: top-level `agents.cloudCodex` block (disabled by default).
- `values-dev.yaml`: enables one agent (`cody`) bound to the demo pod.
Operator flow (one-time per agent install):
1. POST /api/registry/install with agentName=cloud-codex + instanceId=<name>
2. Mint a runtime token; put in k8s Secret cloud-codex-<name>-token
3. `helm upgrade` — pod boots; init installs CLIs
4. `kubectl exec -it deploy/cloud-codex-<name> -- codex login --device-auth`
(completes in operator's browser; auth.json lands on the PVC at
/state/.codex/auth.json — bound to cluster IP)
5. Restart pod; main container picks up auth.json and starts the run
loop. Replies use codex CLI in this pod, so OpenAI sees one stable
client = one stable session = no invalidation.
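Steps 2-5 of the operator flow above could be scripted roughly as follows. This is a sketch under stated assumptions, not tooling shipped by this PR: the Secret key `token`, the release name `commonly`, the chart path `./chart`, and the `RUNTIME_TOKEN` variable are all hypothetical, and namespaces are omitted. Step 1 and the token minting are left out because their request shapes are not shown here. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of operator steps 2-5 for one cloud-codex agent.
# Assumptions (not from the chart): Secret key "token", release name
# "commonly", chart path "./chart".
set -eu

AGENT_NAME="${AGENT_NAME:-cody}"
RELEASE="${RELEASE:-commonly}"   # hypothetical release name
CHART="${CHART:-./chart}"        # hypothetical chart path
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_agent() {
  name="$1"
  # Step 2: store the already-minted runtime token in the per-agent Secret.
  run kubectl create secret generic "cloud-codex-${name}-token" \
    --from-literal=token="${RUNTIME_TOKEN:-<runtime-token>}"
  # Step 3: roll out the Deployment + PVC; the init container installs the CLIs.
  run helm upgrade "$RELEASE" "$CHART" -f values-dev.yaml
  # Step 4: device-auth from inside the pod so the session binds to the cluster IP.
  run kubectl exec -it "deploy/cloud-codex-${name}" -- codex login --device-auth
  # Step 5: restart so the main container picks up /state/.codex/auth.json.
  run kubectl rollout restart "deploy/cloud-codex-${name}"
}

install_agent "$AGENT_NAME"
```

Running it with `DRY_RUN=0 RUNTIME_TOKEN=... AGENT_NAME=cody` would execute the commands instead of printing them; step 4 stays interactive because the device-auth code is completed in the operator's browser.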
Identity continuity (ADR-001 §3): AgentInstallation + User row predate
this pod and survive its restart/redeploy. The pod is just runtime.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues caught when the first cloud-codex-cody pod tried to boot
into the agent run loop:
1. `commonly agent run <name>` doesn't accept a `--instance` flag —
the template was passing it and the CLI exited "unknown option".
The instance/token resolution happens via the per-agent token file,
not via the run subcommand.
2. The container was writing `~/.commonly/config.json` with an
`agentTokens` block — that's not the shape `loadAgentToken()` reads.
The CLI looks at `~/.commonly/tokens/<name>.json` (one file per
agent, per saveAgentToken in cli/src/commands/agent.js). Switch to
that file shape so the run subcommand actually finds the token.
3. node:22-bookworm-slim doesn't ship ca-certificates, so codex CLI's
outbound TLS to auth.openai.com / api.openai.com fails ("error
sending request"). Install ca-certificates at boot (idempotent —
apt skips what's there) so the device-auth + run loop work.
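Taken together, the three fixes above amount to a boot sequence like the one sketched here. It is illustrative only and is not the template's actual script: the `{"token": ...}` file body, the `COMMONLY_AGENT_TOKEN` variable, the `--boot` switch, and the function names are assumptions, and the real file shape is whatever `saveAgentToken` in cli/src/commands/agent.js writes. It also assumes a Debian-based image running as root and a token that needs no JSON escaping.

```shell
#!/bin/sh
# Boot-time prep sketch for the cloud-codex container (illustrative only).
set -eu

# Default Debian location of the bundle generated by ca-certificates.
CA_BUNDLE="${CA_BUNDLE:-/etc/ssl/certs/ca-certificates.crt}"

# Issue 3: run the installer only if the CA bundle is missing or empty.
# Arguments after the bundle path are the installer command.
ensure_ca_certificates() {
  bundle="$1"; shift
  if [ -s "$bundle" ]; then
    return 0
  fi
  "$@"
}

install_with_apt() {
  apt-get update
  apt-get install -y --no-install-recommends ca-certificates
}

# Issue 2: one token file per agent at ~/.commonly/tokens/<name>.json.
# The {"token": ...} body is a HYPOTHETICAL shape, not the CLI's real one.
write_agent_token() {
  name="$1"; token="$2"
  dir="$HOME/.commonly/tokens"
  mkdir -p "$dir"
  # Subshell keeps the restrictive umask local to the file creation.
  ( umask 077; printf '{"token":"%s"}\n' "$token" > "$dir/$name.json" )
  echo "$dir/$name.json"
}

# Only act when invoked with --boot, so the functions stay reusable.
if [ "${1:-}" = "--boot" ]; then
  ensure_ca_certificates "$CA_BUNDLE" install_with_apt
  write_agent_token "$COMMONLY_AGENT_NAME" "$COMMONLY_AGENT_TOKEN" >/dev/null
  # Issue 1: no --instance flag; the token file identifies the agent.
  exec commonly agent run "$COMMONLY_AGENT_NAME"
fi
```

Checking for the bundle before calling apt is a design choice of this sketch; the message above describes the template as simply installing at boot and relying on apt to skip what is already there.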
Also split COMMONLY_REGISTRY_AGENT_NAME from COMMONLY_AGENT_NAME so
the local file alias and the server-side registry agentName can
diverge (Cody's local alias is "cody"; her registry install is
"cloud-codex").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n shape + drop --instance
Summary
Three boot-time issues caught running the first cloud-codex-cody pod:
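1. `commonly agent run <name>` doesn't accept an `--instance` flag; the template was passing it and the CLI exited with "unknown option".
2. The container was writing `~/.commonly/config.json` with an `agentTokens` block; the CLI's `loadAgentToken()` reads `~/.commonly/tokens/<name>.json`.
3. node:22-bookworm-slim doesn't ship ca-certificates, so the codex CLI's outbound TLS fails; ca-certificates is now installed at boot.

Also splits `COMMONLY_REGISTRY_AGENT_NAME` from `COMMONLY_AGENT_NAME` so the local file alias and the server-side registry agentName can diverge (Cody's local alias is "cody"; her registry install is "cloud-codex").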
Test plan
🤖 Generated with Claude Code