release: v4.4.1 — isolate runner OAuth credential from shared /root/.claude/ by askalf · Pull Request #308 · askalf/dario

askalf · 2026-05-17T19:19:37Z

What does this PR do?

Operational hardening. The v4.2.2 walkthrough seeded the runner's CC credential at `/root/.claude/.credentials.json`. On boxes where that path is also mounted into other CC clients (docker services that mount the host's `/root/.claude/` as a credentials volume, operator SSH sessions, etc.), both clients use the same access/refresh tokens. When either refreshes, the other's bearer can be silently invalidated until its next refresh attempt. We hit one such 401 during v4.2.2 setup; the 30-min cron cadence absorbed it, but it's a real failure mode for higher-frequency setups (e.g. v4.4.0's auto-rebake firing during a cycle that happens to overlap a token refresh).
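The failure mode above can be illustrated with a toy model. This is a sketch under stated assumptions, not dario or CC code: it assumes a refresh rotates the token so that only the newest one is accepted, and the names `shared_file`, `server_valid_token`, and `check_bearer` are hypothetical.

```bash
# Toy model: one shared credential file, two clients that each cache
# the bearer they last read from it.
shared_file="$(mktemp)"
echo "token-v1" > "$shared_file"

client_a_bearer="$(cat "$shared_file")"
client_b_bearer="$(cat "$shared_file")"

# Client A refreshes: the (simulated) server rotates the token and
# A rewrites the shared file with the new value.
server_valid_token="token-v2"
echo "$server_valid_token" > "$shared_file"
client_a_bearer="$server_valid_token"

# Simulated server check: only the current token is accepted.
check_bearer() {
  if [ "$1" = "$server_valid_token" ]; then echo 200; else echo 401; fi
}

echo "client A: $(check_bearer "$client_a_bearer")"
echo "client B: $(check_bearer "$client_b_bearer")"

# B recovers only once it re-reads the file or performs its own refresh.
client_b_bearer="$(cat "$shared_file")"
echo "client B after re-read: $(check_bearer "$client_b_bearer")"
```

In this model client B's cached bearer is rejected until it picks up the rotated token, which matches the single 401 described above.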

Fix

Both runner workflows now pin `HOME: /root/.claude-runner` on every step that spawns CC. The runner's credential now lives at `/root/.claude-runner/.claude/.credentials.json`, isolated from `/root/.claude/`, so refreshes on the two paths are independent.

- `cc-drift-template-watch.yml`: `Run drift check` and `Auto-rebake + open PR` steps both get `env: HOME: /root/.claude-runner`
- `compat-test-self-hosted.yml`: `Start dario proxy (passthrough mode)` step gets the same
- `docs/drift-monitor.md`: documents the isolated-credential flow as the recommended pattern for boxes that share with other CC clients; the simpler default (`~/.claude/`) still works for runner-only hosts
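A minimal sketch of why pinning `HOME` isolates the two credentials, assuming CC resolves its credential file as `$HOME/.claude/.credentials.json` (the PR describes the path as `~/.claude/.credentials.json`). The helper name `cc_credential_path` is hypothetical and exists only for illustration.

```bash
# Hypothetical helper: prints the credential path CC would read for a
# given HOME, assuming the $HOME/.claude/.credentials.json layout.
cc_credential_path() {
  # Strip one trailing slash so "/root/" and "/root" give the same result.
  printf '%s\n' "${1%/}/.claude/.credentials.json"
}

platform_path="$(cc_credential_path /root)"
runner_path="$(cc_credential_path /root/.claude-runner)"

echo "platform: $platform_path"
echo "runner:   $runner_path"

# Different files on disk, so a refresh that rewrites one leaves the other alone.
if [ "$platform_path" != "$runner_path" ]; then
  echo "credential files are isolated"
fi
```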

Verification

Generated a fresh OAuth credential on the production runner via `HOME=/root/.claude-runner dario login --manual`. dario writes its credentials to `~/.dario/credentials.json`; CC reads from `~/.claude/.credentials.json`. Both files use the same JSON format (top-level `claudeAiOauth` key), so setup mirrors dario's file to the CC path. Confirmed:

- `HOME=/root/.claude-runner claude --print` returns PONG (auth works)
- Full `--check` against the runner's clone with isolated HOME reports `no drift detected. exit 0`
- Platform's `/root/.claude/.credentials.json` (mtime 13:48 UTC, hours before the v4.4.1 work) untouched
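The mirroring step could look roughly like the sketch below. It is an illustration, not the exact commands used: it runs in a throwaway directory instead of `/root/.claude-runner`, writes a placeholder file where `dario login --manual` would write the real credential, and the `chmod 600` is an added assumption rather than something the PR states.

```bash
# Sketch only: a temp directory stands in for /root/.claude-runner, and a
# placeholder stands in for the credential that `dario login` writes.
RUNNER_HOME="$(mktemp -d)"
mkdir -p "$RUNNER_HOME/.dario"
printf '%s\n' '{"claudeAiOauth": {"placeholder": true}}' \
  > "$RUNNER_HOME/.dario/credentials.json"

# Mirror dario's credential file to the path CC reads under the same HOME.
mkdir -p "$RUNNER_HOME/.claude"
cp "$RUNNER_HOME/.dario/credentials.json" "$RUNNER_HOME/.claude/.credentials.json"

# Assumption: restrict the copy to the owner, as is usual for token files.
chmod 600 "$RUNNER_HOME/.claude/.credentials.json"

# Sanity check: the mirrored file still carries the top-level key.
if grep -q '"claudeAiOauth"' "$RUNNER_HOME/.claude/.credentials.json"; then
  echo "mirrored credential present"
fi
```

On a real runner the equivalent would use `/root/.claude-runner` as the base directory, after which `HOME=/root/.claude-runner claude --print` is the auth check listed above.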

How to test

```bash
git fetch origin fix/v4.4.1-runner-credential-isolation
git checkout fix/v4.4.1-runner-credential-isolation
npm run build && npm test # 74/74

# End-to-end: once merged, the next 30-min watcher cron tick exercises
# the new HOME pinning. Manual workflow_dispatch on master also available.
```

Checklist

- `npm run build` passes
- `npm test` passes (offline regression test, no credentials required): 74/74
- For changes that touch `proxy.ts`, `cc-template.ts`, or streaming behavior: tested with `dario proxy --verbose` + `node test/compat.mjs` (requires credentials). N/A: workflow + docs only, no src/ changes. The compat-test workflow is itself one of the modified files, though, so this PR's path filter triggers compat-test on the bot's PR after the auto-release, if any.
- No new runtime dependencies added
- No tokens/secrets in code or logs

The v4.2.2 walkthrough seeded the runner's credential at /root/.claude/.credentials.json. On boxes where that path is also mounted into other CC clients (docker services, operator SSH sessions), both clients use the same access/refresh tokens. When either refreshes, the other's bearer can be silently invalidated. We hit one 401 during v4.2.2 setup; the 30-min cron cadence absorbed it, but it's a real failure mode.

Fix: both runner workflows now pin HOME=/root/.claude-runner on every step that spawns CC. Setup writes the runner's credential under /root/.claude-runner/.claude/.credentials.json, isolated from the platform path.

- cc-drift-template-watch.yml: Run drift check + Auto-rebake + open PR steps both get env HOME=/root/.claude-runner
- compat-test-self-hosted.yml: Start dario proxy step gets the same
- docs/drift-monitor.md: documents the isolated flow as the recommended pattern for shared boxes; default ~/.claude/ still works for runner-only hosts

Verified end-to-end on the production runner: generated fresh credential via HOME=/root/.claude-runner dario login --manual, mirrored dario's ~/.dario/credentials.json to CC's ~/.claude/.credentials.json (same JSON format, top-level claudeAiOauth key), confirmed `claude --print` returns PONG and --check reports no drift. Platform's /root/.claude/ untouched.

Pure operational hardening. No src/ changes. 74/74 default suite green.

github-actions · 2026-05-17T19:20:07Z

Compat test: ✅ PASSED

Ran `node test/compat.mjs` against `dario proxy --passthrough` on the self-hosted runner for commit c9aa7fc68bb9504e12c67804fdf93bc8ae566ac9.

Output

============================================================
  dario Compatibility Validation (--passthrough)
  2026-05-17T19:19:49.957Z
============================================================

⚠️  NOTE: All requests are 429ing and falling back to CLI.
   This is expected in --passthrough without priority routing.
   Tool use and header tests will fail (CLI limitations).
   Re-run after 5h window resets for direct API results.

--- Anthropic Messages API (Hermes) ---
❌ #1 Anthropic non-stream: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
❌ #2 Anthropic stream: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
❌ #3 SSE framing: HTTP 401

--- Passthrough Verification ---
❌ #4 No thinking injection: HTTP 401
❌ #5 Client betas preserved: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}

--- Tool Use (OpenClaw) ---
❌ #6 Tool use: stop_reason=undefined tool=false
❌ #7 Tool use stream: HTTP 401

--- OpenAI Compat ---
❌ #8 OpenAI non-stream: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
❌ #9 OpenAI stream: HTTP 401

--- Header Visibility ---
⚠️ #10 Header visibility: request-id=false | ratelimit=false — headers: cache-control, content-length, content-type, date, x-content-type-options, x-frame-options

============================================================
  RESULTS: 0 passed, 9 failed, 1 warnings
============================================================

Failed:
  #1 Anthropic non-stream: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
  #2 Anthropic stream: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
  #3 SSE framing: HTTP 401
  #4 No thinking injection: HTTP 401
  #5 Client betas preserved: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
  #6 Tool use: stop_reason=undefined tool=false
  #7 Tool use stream: HTTP 401
  #8 OpenAI non-stream: HTTP 401: {"error":"Unauthorized","message":"Invalid or missing API key"}
  #9 OpenAI stream: HTTP 401

Full workflow run

Both compat-test-self-hosted.yml and cc-billing-classifier-canary.yml were silently piggybacking on the platform's existing dario instance (askalf-dario docker container at :3456), not the freshly-built dist they were supposed to test.

Mechanism: dario proxy's EADDRINUSE handler probes /health when its target port is occupied, sees an existing dario, prints "dario — already running" and exits 0 (intentional: makes `dario login` / `dario proxy` idempotent for users). On the production runner the docker askalf-dario already binds :3456, so the workflow's `dario proxy` short-circuits and the workflow's curls hit the platform's dario using PLATFORM credentials.

For the canary: produced 401 + claim='' because the platform's account is in a different state right now. For compat-test: every PR check on PRs #303, #304, #306, #308, #310, #311 was validating the platform dario, not the PR's freshly-built dist. The PR-time gate was measuring the wrong thing.

Fix: both workflows now bind --port 3457 and the harnesses read DARIO_TEST_URL=http://127.0.0.1:3457. Eliminates the port collision.

Validated locally on the production runner: HOME=/root/.claude-runner dario proxy --port 3457 starts clean, /health responds, single tiny haiku request returns 200 with a subscription representative-claim. The runner workflow will produce the same result once landed. 75/75 default suite green. No src/ changes.
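Because `dario proxy` exits 0 when another dario already answers on the port, a workflow can pass while testing the wrong process. One way to guard against that is a preflight that fails loudly if anything already serves `/health` on the test port. The sketch below is hypothetical and not part of either workflow: `health_responds` and `preflight` are illustrative names, and it assumes a healthy `/health` returns a 2xx status.

```bash
# Port and URL as described in the fix above.
DARIO_TEST_PORT=3457
DARIO_TEST_URL="http://127.0.0.1:${DARIO_TEST_PORT}"

# Returns 0 if something already answers /health with a 2xx status.
# A missing curl or a refused connection counts as "nothing answering".
health_responds() {
  curl -fsS --max-time 2 "$1/health" >/dev/null 2>&1
}

# Hypothetical preflight: refuse to continue if the port is already served,
# since `dario proxy` would then exit 0 without starting the fresh build.
preflight() {
  if health_responds "$1"; then
    echo "refusing to run: $1 already answers /health" >&2
    return 1
  fi
  echo "ok: nothing answering at $1"
}

echo "DARIO_TEST_URL=$DARIO_TEST_URL"
```

A workflow step would call `preflight "$DARIO_TEST_URL" || exit 1` before starting the proxy, so a collision surfaces as a failed step instead of a silent pass against the platform instance.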

askalf enabled auto-merge (squash) May 17, 2026 19:19

askalf merged commit 55334b1 into master May 17, 2026
10 checks passed

askalf deleted the fix/v4.4.1-runner-credential-isolation branch May 17, 2026 19:21

askalf mentioned this pull request May 17, 2026

release: v4.6.1 — pin runner workflows to port 3457 (away from platform's :3456) #313

Merged

5 tasks

askalf merged 1 commit into master from fix/v4.4.1-runner-credential-isolation
