fix(codex): route ChatGPT OAuth via CODEX_HOME env, bump OpenClaw to 2026.4.25 by prez2307 · Pull Request #404 · Isol8AI/isol8

prez2307 · 2026-04-28T03:09:08Z

Summary

The chatgpt_oauth provisioning path has been broken since it shipped. Two compounding bugs:

We wrote {"openai-codex": {"codexHome": "..."}} to openclaw.json. codexHome is not an OpenClaw config field in any version, on any branch; we invented it. OpenClaw rejected the config and crashed the container. (Verified by reading the upstream source and git log -S.)
Even with a valid config, our pinned v2026.4.5 has no support for a pre-staged auth.json at all: Codex auth in 4.5 was interactive-browser-only, which is broken in headless ECS regardless. Reading a pre-staged auth.json first landed in v2026.4.7 (commit 7e0e2f81e5 added extensions/openai/openai-codex-cli-auth.ts).

What this PR does

OpenClaw bump: openclaw-version.json + apps/infra/openclaw/Dockerfile → alpine/openclaw:2026.4.25 (current stable; adds hardened OAuth refresh on top of the 4.7 base feature). Per-env tags are set to <upstream>-bootstrap so build-openclaw-image.yml rebuilds the extended image.
Drop the fake config knob: core/containers/config.py no longer emits the openai-codex provider block. It is omitted entirely so the bundled provider plugin's defaults apply (an empty {} would still fail the base schema's baseUrl/models validator).
Use the real one: core/containers/ecs_manager.py injects CODEX_HOME=/home/node/.openclaw/codex as an environment: entry on the per-user task. The path is the in-container view of <EFS>/users/{user_id}/codex/ after the access-point chroot, which is exactly where pre_stage_codex_auth already writes auth.json.
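A minimal sketch of the path mapping and env entry described above, assuming the access point roots `/users/{user_id}` at `/home/node/.openclaw` and treating `<EFS>` as the filesystem root. The helper names are hypothetical, not the actual ecs_manager.py code; only the `{"name": ..., "value": ...}` shape is the standard ECS container-definition `environment` format.

```python
import posixpath

# Assumed from the PR description: the per-user EFS access point exposes
# /users/{user_id} inside the container as /home/node/.openclaw.
CONTAINER_OPENCLAW_ROOT = "/home/node/.openclaw"
CODEX_HOME = posixpath.join(CONTAINER_OPENCLAW_ROOT, "codex")


def efs_to_container_path(efs_path: str, user_id: str) -> str:
    """Translate a backend-side EFS path into the path the container sees (hypothetical helper)."""
    user_root = f"/users/{user_id}"
    rel = posixpath.relpath(efs_path, user_root)
    if rel == ".." or rel.startswith("../"):
        # Anything outside the user's root is invisible through the access point.
        raise ValueError(f"{efs_path!r} is outside {user_root!r}")
    return posixpath.normpath(posixpath.join(CONTAINER_OPENCLAW_ROOT, rel))


def chatgpt_oauth_environment() -> list[dict[str, str]]:
    """ECS `environment` entries to add on the chatgpt_oauth path (hypothetical helper)."""
    return [{"name": "CODEX_HOME", "value": CODEX_HOME}]


# pre_stage_codex_auth writes <EFS>/users/{user_id}/codex/auth.json and OpenClaw
# reads $CODEX_HOME/auth.json; both must name the same file.
assert efs_to_container_path("/users/u123/codex/auth.json", "u123") == posixpath.join(CODEX_HOME, "auth.json")
```

Using `relpath` rather than a string prefix check keeps `/users/u10/...` from being mistaken for a path under `/users/u1`.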

Verification

Confirmation of the fix shape, grounded in the upstream source:

// extensions/openai/openai-codex-cli-auth.ts (HEAD)
function resolveCodexCliHome(env) {
  const configured = trimNonEmptyString(env.CODEX_HOME);
  if (!configured) return path.join(resolveRequiredHomeDir(), ".codex");
  return path.resolve(configured);
}
function readCodexCliAuthFile(env) {
  const authPath = path.join(resolveCodexCliHome(env), "auth.json");
  ...
}
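For backend-side tests, the same resolution rule can be mirrored in Python. This is a hypothetical port, not upstream code: it assumes `trimNonEmptyString` returns the trimmed value (or nothing for blank input), and it takes the home directory as a parameter instead of calling upstream's `resolveRequiredHomeDir()`.

```python
import posixpath
from collections.abc import Mapping


def resolve_codex_cli_home(env: Mapping[str, str], home_dir: str) -> str:
    """Hypothetical Python port of upstream resolveCodexCliHome."""
    configured = (env.get("CODEX_HOME") or "").strip()
    if not configured:
        # Unset or blank CODEX_HOME falls back to <home>/.codex.
        return posixpath.join(home_dir, ".codex")
    # Node's path.resolve() makes the path absolute and normalizes it.
    return posixpath.abspath(configured)


def codex_cli_auth_path(env: Mapping[str, str], home_dir: str) -> str:
    """Where OpenClaw looks for the pre-staged credentials."""
    return posixpath.join(resolve_codex_cli_home(env, home_dir), "auth.json")
```

With CODEX_HOME set as in this PR, the lookup lands on `/home/node/.openclaw/codex/auth.json`; without it, it falls back to `~/.codex/auth.json`, where nothing was staged.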

Test plan

pytest tests/unit — 1085 pass
test_config_provider_routing.test_chatgpt_oauth_branch updated to assert the openai-codex block is absent
test_provision_chatgpt_oauth_pre_stages_auth_before_service_create extended to assert CODEX_HOME env var lands on the per-user task
Watch deploy + smoke-test on dev with a fresh ChatGPT OAuth flow
Follow-up PR to bump per-env tags from <upstream>-bootstrap → <upstream>-<sha> once build-openclaw-image finishes
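The two new test assertions boil down to checks like the following. A minimal sketch: `env_value` and `assert_chatgpt_oauth_shape` are hypothetical helpers, and `providers` stands for whatever mapping config.py emits for provider entries (its exact location in openclaw.json isn't shown in this PR). `register_kwargs` follows the shape boto3's `register_task_definition` accepts, with the main container assumed at index 0 to match the `idx == 0` merge in the reviewed diff.

```python
from typing import Any, Optional


def env_value(container_def: dict[str, Any], name: str) -> Optional[str]:
    """Return the value of an ECS `environment` entry, or None if it is absent."""
    for entry in container_def.get("environment") or []:
        if entry.get("name") == name:
            return entry.get("value")
    return None


def assert_chatgpt_oauth_shape(providers: dict[str, Any], register_kwargs: dict[str, Any]) -> None:
    # No invented provider block: the bundled openai-codex plugin defaults apply.
    assert "openai-codex" not in providers
    # The per-user task's main container (index 0) must carry CODEX_HOME.
    main = register_kwargs["containerDefinitions"][0]
    assert env_value(main, "CODEX_HOME") == "/home/node/.openclaw/codex"
```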

Caveats

The bootstrap tag intentionally won't resolve. The CDK deploy triggered when this PR merges to main will fail to pull the image until the follow-up tag-bump PR lands. That is the project's documented two-PR flow for OpenClaw image bumps, not a regression.
Once the new image is in ECR, existing dev containers need to be re-provisioned to pick up the new openclaw.json shape and CODEX_HOME env. We'll do that as part of the dev clean test.

🤖 Generated with Claude Code

…2026.4.25

The chatgpt_oauth path has been broken since it shipped:

* `_provider_block` wrote `{"openai-codex": {"codexHome": "..."}}` to openclaw.json. `codexHome` is NOT an OpenClaw config key; it doesn't exist anywhere in upstream, on any tag or branch. We invented it.
* OpenClaw 2026.4.5 (our pin) had zero pre-staged-auth.json support anyway. Codex auth in 4.5 only worked via interactive browser OAuth, which is broken in headless ECS regardless of config shape.

The actual upstream contract (added in v2026.4.7, extensions/openai/openai-codex-cli-auth.ts):

* Read `${CODEX_HOME}/auth.json` (default `${HOME}/.codex/auth.json`).
* Expected payload: `{"auth_mode":"chatgpt","tokens":{access_token, refresh_token, account_id?}}`, the same shape we already write in `pre_stage_codex_auth`.

Three coordinated changes:

1. `openclaw-version.json` + `apps/infra/openclaw/Dockerfile` → `alpine/openclaw:2026.4.25` (4.7 is the minimum required for the feature; 4.25 is current stable with hardened OAuth refresh logic). Per-env tags reset to `<upstream>-bootstrap` so build-openclaw-image.yml rebuilds the extended image; the real `<upstream>-<sha>` tag will land in a follow-up PR (matches the CI workflow's documented flow).
2. `core/containers/config.py` → drop the bogus `codexHome` entry. Omit the `openai-codex` provider block entirely so the bundled provider plugin's defaults apply; an empty `{}` would still fail the base-schema validator (which requires `baseUrl` + `models`).
3. `core/containers/ecs_manager.py` → on the chatgpt_oauth path, inject `CODEX_HOME=/home/node/.openclaw/codex` as an `environment:` entry on the per-user task definition. The path is the in-container view of `<EFS>/users/{user_id}/codex/` after the per-user EFS access point chroots `/users/{user_id}` to `/home/node/.openclaw`. `pre_stage_codex_auth` already writes `auth.json` to that backend-side path, so OpenClaw finds it cold on first boot.

Tests:

* `test_config_provider_routing` updated: chatgpt_oauth must NOT emit an `openai-codex` provider entry.
* `test_provision_chatgpt_oauth_pre_stages_auth_before_service_create` extended to assert the per-user task carries the CODEX_HOME env var.
* All 1085 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
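A minimal sketch of writing that payload, assuming only the shape quoted in the commit message. `build_codex_auth_payload` and `write_codex_auth` are hypothetical and not the actual `pre_stage_codex_auth`; the temp-file-plus-rename step is a design choice added here so a container booting mid-write never reads a half-written file.

```python
import contextlib
import json
import os
import tempfile
from typing import Any, Optional


def build_codex_auth_payload(
    access_token: str, refresh_token: str, account_id: Optional[str] = None
) -> dict[str, Any]:
    """Build the {"auth_mode": "chatgpt", "tokens": {...}} payload; account_id is optional."""
    tokens: dict[str, str] = {"access_token": access_token, "refresh_token": refresh_token}
    if account_id is not None:
        tokens["account_id"] = account_id
    return {"auth_mode": "chatgpt", "tokens": tokens}


def write_codex_auth(codex_dir: str, payload: dict[str, Any]) -> str:
    """Atomically write <codex_dir>/auth.json and return its path."""
    os.makedirs(codex_dir, exist_ok=True)
    target = os.path.join(codex_dir, "auth.json")
    # mkstemp creates the file readable/writable by the owner only (0600).
    fd, tmp = tempfile.mkstemp(dir=codex_dir, prefix=".auth.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f)
        # A rename within one directory is atomic on POSIX-compliant filesystems.
        os.replace(tmp, target)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp)
        raise
    return target
```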

…table tag (#405)

PR #404 set the pin to alpine/openclaw:2026.4.25, but upstream never published that exact tag: the stable line only has 2026.4.25-slim, and only the 2026.4.25-beta.X series has published fat (non-slim) variants. Our Dockerfile extends the base with apt-get layers (ffmpeg, ripgrep, 1password-cli, etc.), so the slim base would break our skill bundling. Beta.11 is the latest 2026.4.25-line tag with the fat variant we need.

Verified via the Docker Hub API:

curl -sL 'https://hub.docker.com/v2/repositories/alpine/openclaw/tags?name=2026.4.25'

build-openclaw-image.yml run 25031784091 failed with:

ERROR: docker.io/alpine/openclaw:2026.4.25: not found

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
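The tag check above can be scripted. A minimal sketch, assuming the Docker Hub v2 tags endpoint returns JSON with `results[].name` and a `next` page URL; `fetch_tag_names`, `fat_tags`, and `latest_fat_beta` are hypothetical helpers, and treating a `slim` dash-component as the marker of a slim variant is an assumption about upstream's naming.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Optional

TAGS_URL = "https://hub.docker.com/v2/repositories/alpine/openclaw/tags"


def fetch_tag_names(name_filter: str) -> list[str]:
    """Return every tag name matching `name_filter`, following pagination."""
    url: Optional[str] = f"{TAGS_URL}?{urllib.parse.urlencode({'name': name_filter})}"
    names: list[str] = []
    while url:
        with urllib.request.urlopen(url, timeout=10) as resp:
            page = json.load(resp)
        names.extend(result["name"] for result in page.get("results", []))
        url = page.get("next")
    return names


def fat_tags(names: list[str], release: str) -> list[str]:
    """Tags equal to `release` or starting with `release-`, excluding slim variants."""
    on_line = [n for n in names if n == release or n.startswith(release + "-")]
    return [n for n in on_line if "slim" not in n.split("-")]


def latest_fat_beta(names: list[str], release: str) -> Optional[str]:
    """Highest `<release>-beta.N` tag, ordered numerically (beta.11 > beta.3)."""
    pattern = re.compile(rf"^{re.escape(release)}-beta\.(\d+)$")
    betas = [(int(m.group(1)), n) for n in names if (m := pattern.match(n))]
    return max(betas)[1] if betas else None
```

Numeric ordering matters: sorting tag strings lexicographically would rank `beta.3` above `beta.11`.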

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecfb9a6d15


chatgpt-codex-connector · 2026-04-28T03:13:21Z

+            if idx == 0 and environment_for_task:
+                # Same merge story for environment — keep CDK base entries
+                # (CHOKIDAR_USEPOLLING, CLAWHUB_WORKDIR) intact.
+                base_env = list(cd_copy.get("environment") or [])
+                cd_copy["environment"] = base_env + list(environment_for_task)


Keep CODEX_HOME when cloning task defs outside provisioning

chatgpt_oauth now depends on CODEX_HOME being present in the task definition, but this env merge only happens when environment_for_task is explicitly passed. Paths like resize_user_container still call _build_register_kwargs_from_base(...) without environment_for_task (see resize_user_container around lines 561-567), so a resize/re-register drops CODEX_HOME from the new revision. After that, OAuth users fall back to ~/.codex/auth.json and lose access to the EFS-staged credentials, so inference fails after a resize/redeploy cycle.
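One way to address this, sketched under assumptions: derive CODEX_HOME from the user's provider inside the shared clone path, so every caller (provisioning, resize, redeploy) gets it whether or not it passes `environment_for_task`. The function names and the `provider` argument are hypothetical; this is not what the PR merged.

```python
from typing import Any, Optional

CODEX_HOME_ENTRY = {"name": "CODEX_HOME", "value": "/home/node/.openclaw/codex"}


def merge_environment(base: list[dict[str, str]], extra: list[dict[str, str]]) -> list[dict[str, str]]:
    """Merge ECS env entries by name; later entries win, first-seen order is kept."""
    merged: dict[str, str] = {}
    for entry in [*base, *extra]:
        merged[entry["name"]] = entry["value"]
    return [{"name": name, "value": value} for name, value in merged.items()]


def provider_environment(provider: str) -> list[dict[str, str]]:
    """Env entries a provider always needs, independent of the calling code path."""
    return [dict(CODEX_HOME_ENTRY)] if provider == "chatgpt_oauth" else []


def clone_container_defs(
    base_defs: list[dict[str, Any]],
    provider: str,
    environment_for_task: Optional[list[dict[str, str]]] = None,
) -> list[dict[str, Any]]:
    """Copy the base container defs, layering provider env onto the main container (idx 0)."""
    out = []
    for idx, cd in enumerate(base_defs):
        cd_copy = dict(cd)  # shallow copy; the environment list is rebuilt, never mutated
        if idx == 0:
            extra = provider_environment(provider) + list(environment_for_task or [])
            cd_copy["environment"] = merge_environment(list(cd.get("environment") or []), extra)
        out.append(cd_copy)
    return out
```

Merging by name also keeps CODEX_HOME from appearing twice when a caller passes it explicitly and the provider rule adds it too.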


…71624b4 (#409)

Verified in ECR:

isol8/openclaw-extended:2026.4.25-slim-71624b4
digest sha256:0409487c9c3b9d2bdcf2f5386357c852aa3041abc7afd40be444e2381ee14e4a
pushed 2026-04-27 23:42:31 EDT

Built by build-openclaw-image run 25032408047 against main 71624b4 (the gh-via-apt fix from #408, on top of the 4.25-slim switch from #407, on top of the codex-auth env-var fix from #404).

This unblocks the deploy chain: CDK has been failing every cycle since #404 because dev.tag pointed at the placeholder *-bootstrap value. Once this PR merges, deploy.yml will pull the new image and the per-user container task def will reference it on the next provision.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…milies (#410)

The CDK base task def and per-user clones used to register into the same ECS family (`isol8-{env}-openclaw`). #299 had to add an SSM-pinned ARN to work around the resulting "describe-by-family returns a per-user clone" problem.

Then today (#404→#409 deploy chain) we hit the OTHER edge of that workaround: the ARN is injected into the backend via `ecs.Secret.fromSsmParameter`, which resolves at TASK STARTUP only and never refreshes. The running backend started with SSM=rev 1011 cached and cloned per-user task defs from that stale revision through every subsequent CDK deploy, producing a per-user task def 1016 that pulled a placeholder image (`2026.4.25-bootstrap`) which doesn't exist in ECR.

Two underlying causes:

1. Co-mingled families forced a workaround.
2. The workaround cached its input at startup.

Fix the FAMILY problem and both downstream issues collapse:

* `EcsManager._build_register_kwargs_from_base` now describes the bare base family (e.g. `isol8-dev-openclaw`). That family contains ONLY CDK base revisions because per-user clones go to `<base>-user`, so ECS returns the latest base revision deterministically.
* Per-user clones register into `f"{base['family']}-user"` so they don't pollute the base family. Existing per-user task defs on the old family stay valid (they're full ARNs); they age out as users re-provision.
* Drop the SSM param + `ECS_TASK_DEFINITION` env var + the `ecs.Secret.fromSsmParameter` wiring + the cached `self._task_def`. Less code, no startup cache to go stale, no incident class.
* Drop the lingering CFN `exportValue` from container-stack (added in #299 to keep the cross-stack import alive across the SSM transition; no consumers now).

Tests updated:

- `test_clones_task_def_with_access_point` asserts describe-by-family AND that the registered family is `<base>-user`.
- `test_resize_reads_env_from_base_not_current` docstring updated to reflect the new mechanism.
- All 1085 unit tests pass.
- 11 pre-existing CDK Jest failures unrelated to this change.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
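The family split can be sketched against boto3's ECS client. Assumptions: `latest_base_task_def` and `register_kwargs_from_base` are hypothetical names, and the read-only field list reflects fields that `describe_task_definition` returns but `register_task_definition` does not accept. Describing a bare family name (no `:revision`) returns its latest ACTIVE revision, which is what makes the lookup deterministic once clones live in `<base>-user`.

```python
from typing import Any

# Returned by describe_task_definition but not accepted by register_task_definition.
READ_ONLY_FIELDS = frozenset({
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
})


def latest_base_task_def(ecs_client: Any, base_family: str) -> dict[str, Any]:
    """Describe the bare base family; it holds only CDK base revisions."""
    return ecs_client.describe_task_definition(taskDefinition=base_family)["taskDefinition"]


def register_kwargs_from_base(base: dict[str, Any]) -> dict[str, Any]:
    """Strip read-only fields and retarget the clone at the `<base>-user` family."""
    kwargs = {k: v for k, v in base.items() if k not in READ_ONLY_FIELDS}
    kwargs["family"] = f"{base['family']}-user"
    return kwargs


# Usage with a real client (not executed here):
#   ecs = boto3.client("ecs")
#   kwargs = register_kwargs_from_base(latest_base_task_def(ecs, "isol8-dev-openclaw"))
#   ecs.register_task_definition(**kwargs)
```

Because the lookup happens per call instead of being cached at backend startup, a new CDK base revision is picked up on the next clone without restarting anything.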

prez2307 merged commit 9d4388f into main Apr 28, 2026
1 check passed

prez2307 deleted the fix/codex-auth-via-env-var branch April 28, 2026 03:10

prez2307 mentioned this pull request Apr 28, 2026

fix(openclaw): pin to 2026.4.25-beta.11 (upstream skipped fat 4.25 stable) #405

Merged


chatgpt-codex-connector Bot reviewed Apr 28, 2026


This was referenced Apr 28, 2026

fix(openclaw): pin to 2026.4.22 (latest fat stable, no beta) #406

Merged

fix(openclaw): bump dev/prod tag to 2026.4.25-slim-71624b4 (built + in ECR) #409

Merged

prez2307 merged 1 commit into main from fix/codex-auth-via-env-var
