release: v4.6.1 — pin runner workflows to port 3457 (away from platform's :3456)#313

Merged
askalf merged 1 commit into master from fix/v4.6.1-runner-port-isolation
May 17, 2026

Conversation

@askalf
Owner

@askalf askalf commented May 17, 2026

What does this PR do?

Fixes a silent test-of-the-wrong-binary problem that's been hiding since v4.3.0.

What broke

When v4.6.0's billing canary first ran on the production runner, it returned `representative-claim: ''` and a 401 — but `claude --print` on the same box with the same `HOME` was returning PONG cleanly. Investigation:

  1. `dario proxy` has a friendly EADDRINUSE handler. When its target port is occupied, it probes `/health`, sees an existing dario, prints "dario — already running" and exits 0. Intentional: makes `dario login` and `dario proxy` idempotent for users.
  2. On the production runner, the askalf platform's own docker container (`askalf-dario`) already binds `:3456`.
  3. So every "Start dario proxy" step in the runner workflows short-circuited, exited 0, and the subsequent curls in the workflow hit the platform's dario — using the platform's `/root/.claude/.credentials.json` (not the runner-isolated `/root/.claude-runner/.claude/.credentials.json` from v4.4.1).
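One way a workflow step could notice this kind of short-circuit is to check that the process it launched is still alive before sending any requests. The following is a minimal sketch, not something this PR adds: `require_alive` is a hypothetical helper, `sleep 30` stands in for a long-running server, and the check assumes the shell has already reaped an exited child (bash does so promptly).

```shell
# Hypothetical guard, not part of dario or the workflows in this PR.
# Succeeds only if the background process with the given PID is still running.
require_alive() {
  pid="$1"
  if kill -0 "$pid" 2>/dev/null; then
    return 0
  fi
  echo "process $pid already exited; another instance may own the port" >&2
  return 1
}

# Stand-in for a server that keeps running after launch.
sleep 30 &
server_pid=$!
require_alive "$server_pid" && echo "still running"

# Clean up the stand-in process.
kill "$server_pid"
wait "$server_pid" 2>/dev/null || true
```

A launcher that exits 0 because another instance already holds the port would fail this check, so the step would stop instead of sending requests to the wrong instance.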

For the canary that's an obvious 401 (platform credential in a different state). For compat-test, it's worse: every PR check on PRs #303, #304, #306, #308, #310, #311 was validating the platform dario, not the PR's freshly-built dist. The PR-time gate was measuring the wrong thing.

Fix

Both runner workflows now bind `--port 3457` (away from the platform's `:3456`), and the harnesses read `DARIO_TEST_URL=http://127.0.0.1:3457`. This eliminates the port collision.

  • `compat-test-self-hosted.yml`: `Start dario proxy` adds `--port 3457`; both the Start and Run Tests steps set `DARIO_TEST_URL=http://127.0.0.1:3457` in their env; the readiness probe and the PR-comment fallback both point at :3457.
  • `cc-billing-classifier-canary.yml`: `Start dario proxy` adds `--port 3457`; canary curl posts to `:3457/v1/messages`.
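As a belt-and-braces measure, a harness could also refuse to run when its target URL points at the platform's port. This is a sketch under assumptions: `check_test_url` is a hypothetical helper that this PR does not add, and the only values taken from the PR are the `DARIO_TEST_URL` variable and the two port numbers.

```shell
# Hypothetical pre-flight check, not part of the workflows in this PR.
# Rejects an empty URL and any URL whose port is the platform's 3456.
check_test_url() {
  url="${1:-}"
  case "$url" in
    "")
      echo "DARIO_TEST_URL is not set" >&2
      return 1
      ;;
    *:3456|*:3456/*)
      echo "refusing to test against platform port 3456: $url" >&2
      return 1
      ;;
  esac
  return 0
}

check_test_url "http://127.0.0.1:3457" && echo "ok to run tests"
```

In a workflow step this would be called as `check_test_url "$DARIO_TEST_URL"` before the test command.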

Validation

Manual run on the production runner with these flags:

```bash
HOME=/root/.claude-runner DARIO_QUIET_TLS=1 dario proxy --port 3457
# → starts clean, /health responds, single tiny haiku request returns 200
# → representative-claim is a subscription value
```

The workflow path will produce the same result once landed — both gates will start actually validating the PR's dist instead of the platform's.

How to test

```bash
git fetch origin fix/v4.6.1-runner-port-isolation
git checkout fix/v4.6.1-runner-port-isolation
npm run build && npm test # 75/75

# End-to-end: this PR touches a workflow file in compat-test-self-hosted.yml's
# path filter, so compat-test fires on the PR. THAT run is the validation:
# it should resolve to compat tests passing against THIS branch's
# freshly-built dist, not the platform dario at :3456.

# Also, manually dispatch the canary after merge:
gh workflow run cc-billing-classifier-canary.yml --ref master

# Expected: exit 0, no alert (canary healthy = subscription). If 4.6.0's
# fake-401 was a misdiagnosis rather than an actual classifier issue, this
# will close out cleanly. If it was a real classifier flip, the canary
# now produces an honest signal.
```

Checklist

  • `npm run build` passes
  • `npm test` passes (offline regression test, no credentials required) — 75/75
  • For changes that touch `proxy.ts`, `cc-template.ts`, or streaming behavior: tested with `dario proxy --verbose` + `node test/compat.mjs` (requires credentials) — N/A: workflow + CHANGELOG only, no src/ changes
  • No new runtime dependencies added
  • No tokens/secrets in code or logs

Both compat-test-self-hosted.yml and cc-billing-classifier-canary.yml
were silently piggybacking on the platform's existing dario
instance (askalf-dario docker container at :3456), not the
freshly-built dist they were supposed to test.

Mechanism: dario proxy's EADDRINUSE handler probes /health when
its target port is occupied, sees an existing dario, prints
"dario — already running" and exits 0 (intentional: makes
`dario login` / `dario proxy` idempotent for users). On the
production runner the docker askalf-dario already binds :3456,
so the workflow's `dario proxy` short-circuits and the workflow's
curls hit the platform's dario using PLATFORM credentials.

For the canary: produced 401 + claim='' because the platform's
account is in a different state right now.

For compat-test: every PR check on PRs #303, #304, #306, #308,
#310, #311 was validating the platform dario, not the PR's
freshly-built dist. The PR-time gate was measuring the wrong
thing.

Fix: both workflows now bind --port 3457 and the harnesses read
DARIO_TEST_URL=http://127.0.0.1:3457. Eliminates the port
collision.

Validated locally on the production runner:
HOME=/root/.claude-runner dario proxy --port 3457 starts clean,
/health responds, single tiny haiku request returns 200 with a
subscription representative-claim. The runner workflow will produce
the same result once landed.

75/75 default suite green. No src/ changes.
@askalf askalf enabled auto-merge (squash) May 17, 2026 21:36
@github-actions
Contributor

Compat test: ❌ FAILED

Ran node test/compat.mjs against dario proxy --passthrough on the self-hosted runner for commit d0742455f8613ddcb098b62050b34a2cb5e8dde9.

Output
(no output captured)

Full workflow run

@askalf askalf merged commit ea76eaf into master May 17, 2026
9 of 10 checks passed
@askalf askalf deleted the fix/v4.6.1-runner-port-isolation branch May 17, 2026 21:38
askalf added a commit that referenced this pull request May 17, 2026
v4.6.1 declared --port 3457 (space-separated) but dario's CLI only
parses --port=3457 (equals). Space-separated form silently falls
through to default 3456, which is what the platform's askalf-dario
docker container already binds. Result: v4.6.1's compat-test on
PR #313 still bound :3456, still hit the platform dario.

This is actually the system catching itself: v4.6.1's
compat-test on its own PR failed with the same proxy.log output
("dario — already running on http://localhost:3456") that v4.6.1
was meant to fix. The runner is now testing the right thing
frequently enough that bugs in the harness can't hide.

Six --port 3457 → --port=3457 substitutions across the two
workflow files. Same change in spirit as v4.6.1; same change in
code as a one-character typo.

74/74 default suite green. No src/ changes.
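
The failure mode described in this commit message can be illustrated with a small sketch. The parser below is hypothetical and is not dario's actual argument-parsing code; it only models a CLI that recognizes the `--port=N` form and falls back to a default of 3456 otherwise.

```shell
# Hypothetical equals-only parser, written to illustrate the failure mode.
# This is NOT dario's real implementation.
parse_port_equals_only() {
  port=3456                          # default when no --port=N is seen
  for arg in "$@"; do
    case "$arg" in
      --port=*) port="${arg#--port=}" ;;
    esac
  done
  echo "$port"
}

parse_port_equals_only --port=3457   # prints 3457
parse_port_equals_only --port 3457   # prints 3456: "--port" and "3457" are separate args
```

Because the space-separated form produces no error, the only visible symptom is that the process ends up on the default port.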