ci(diag): isolate OAuth failure to agent SDK vs runner by bootjp · Pull Request #698 · bootjp/elastickv

bootjp · 2026-04-28T13:48:31Z

Purpose (temporary diagnostic)

Insert a step that runs claude --print directly on the runner with the OAuth token from the secret, bypassing anthropics/claude-code-action and the agent SDK entirely. The same invocation is known-good locally (clean HOME, only CLAUDE_CODE_OAUTH_TOKEN in env), so the result of this step in CI tells us where the bug lives:

Diag step	`Run Claude Code` step	Conclusion
✅ pass	❌ fail (`Could not resolve [authentication]`)	bug is in the action / agent SDK's auth forwarding to its child claude process — minimal upstream repro
❌ fail	❌ fail	environmental: GitHub-hosted runner ↔ Anthropic OAuth backend (network, region, IP)

Scope

Adds one step before the existing Run Claude Code step.
Does NOT touch the Run Claude Code step (still v1.0.107 pinned + env).
Step is marked TEMPORARY in comments. Will be removed in a follow-up once the question is answered.

Token-leak hardening (inline rationale)

set -euo pipefail only — never set -x; explicit set +x defends against ACTIONS_STEP_DEBUG=true also enabling xtrace.
The token is only referenced via env (CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets... }}) and is never expanded into a command line. claude inherits it via fork, not via argv.
install.sh stdout/stderr are dropped (>/dev/null 2>&1).
No --debug or --verbose on claude.
HOME=$(mktemp -d) is a fresh tmp dir, so cached creds and ~/.claude/.credentials.json go to an ephemeral location that is destroyed with the runner.
GitHub Actions' secret-masking covers any literal occurrence in stdout/stderr regardless.

Operational note. As an additional safety net, after the diag PR runs once we plan to rotate CLAUDE_CODE_OAUTH_TOKEN (re-issue via claude setup-token, update the secret, revoke the old session at claude.ai). Even in the unlikely event the diag step's logs leaked the token in some unmasked form, rotation makes the leaked value useless.

Test plan

Run the workflow once by @claude review-ing this PR (or any PR).
Read both step results:
- Sanity-check OAuth on the runner (diag only) — pass/fail
- Run Claude Code — pass/fail
Decide next move from the conclusion table above.
Rotate CLAUDE_CODE_OAUTH_TOKEN.
Open a follow-up PR that removes the diag step.

Self-review

CI workflow change only — single new step. 5-lens collapses to: no data-loss / concurrency / consistency surface, no perf impact (one extra ~10s step on @claude runs only), no test coverage applicable.

The risk surface is "could leak the token in logs"; mitigations are documented inline and in this body, plus the rotation step in the test plan keeps a safety margin even against unknown-unknowns.

Add a temporary diagnostic step that runs `claude --print` directly on the runner with the OAuth token from the secret, bypassing the action / agent SDK entirely. The same invocation is known to succeed locally (clean HOME, only CLAUDE_CODE_OAUTH_TOKEN as env), so: - if this step also succeeds in CI: the bug is isolated to how the agent SDK forwards auth to its spawned child, and we have a minimal upstream repro. - if this step also fails in CI: the failure is environmental (network / runner) rather than SDK-only, and the issue moves up one layer. Token-leak hardening is documented inline. The step is intended to be removed (and the token rotated) once the question is answered.

gemini-code-assist · 2026-04-28T13:48:37Z

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

coderabbitai · 2026-04-28T13:48:50Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 1 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1bfd25fc-e6ef-4871-af67-c14904a30198

📥 Commits

Reviewing files that changed from the base of the PR and between e019a21 and d3d05ba.

📒 Files selected for processing (1)

.github/workflows/claude.yml

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/claude-action-oauth-diag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd3ae28858

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-28T13:50:16Z

+      - name: Sanity-check OAuth on the runner (diag only)
+        env:
+          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+        run: |


Allow diagnostic failure without skipping main Claude step

If claude --print fails in this diagnostic step, the job stops immediately and Run Claude Code is skipped by default, which prevents the pass/fail matrix in the commit message from ever being observed in the failure case. In the exact scenario this diagnostic is meant to detect (runner-side OAuth failure), the workflow won’t execute the comparison step, so the isolation logic is incomplete.

Useful? React with 👍 / 👎.

Addressed in 800a99a: marked the verify step continue-on-error: true so a runner-side OAuth failure no longer aborts the job. Run Claude Code still runs, both step results are observable, and the (diag fail / main fail) → "environmental" matrix entry can be detected.

chatgpt-codex-connector · 2026-04-28T13:50:16Z

+        env:
+          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}


Limit OAuth secret exposure to the claude command

Setting CLAUDE_CODE_OAUTH_TOKEN at step scope makes it available to the curl | bash installer and any subprocesses, even though only the final claude --print call needs it. This unnecessarily broadens the secret exposure window; if the installer script or its execution path is compromised/noisy, the token can be leaked before the diagnostic command runs.

Useful? React with 👍 / 👎.

Addressed in 800a99a: split into two steps — Install Claude CLI for diag (no secret in env, runs curl | bash) and Verify OAuth on the runner (secret scoped only to this step, used only by the single claude --print call). Installer subprocesses now have zero exposure to the OAuth token.

…error P1: with set -e the diag step would abort the job on its own failure, skipping `Run Claude Code` — exactly the (diag fail / main fail) case the comparison matrix is meant to detect. Mark the verify step `continue-on-error: true` so both step results are always observed. P2: Token at step env scope is also visible to the curl|bash installer and any subprocesses it spawns. Split into two steps: the installer has no secret in env; only the verify step has it, scoped to a single claude invocation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 800a99aa43

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-28T13:54:10Z

+      - name: Install Claude CLI for diag (no secrets in env)
+        run: |
+          set -euo pipefail
+          set +x
+          curl -fsSL https://claude.ai/install.sh | bash -s -- 2.1.119 >/dev/null 2>&1


Allow installer diagnostic step to fail without aborting job

The new Install Claude CLI for diag step is still mandatory, so a transient failure downloading https://claude.ai/install.sh will stop the workflow before Run Claude Code executes. That means temporary diagnostic infrastructure can block normal @claude handling and you still lose the intended pass/fail isolation matrix whenever install fails. Make this installer step non-blocking (or otherwise ensure the main Claude step always runs) so diagnostics cannot short-circuit the primary workflow.

Useful? React with 👍 / 👎.

Addressed in d3d05ba: marked the installer step continue-on-error: true as well. Now neither a transient install failure (e.g. fetching https://claude.ai/install.sh) nor a runner-side OAuth failure can abort the job — Run Claude Code always runs, and the (install fail / main fail) case is observable from the run summary just like (verify fail / main fail).

@claude

Round-2 P1: a transient failure of curl https://claude.ai/install.sh would otherwise abort the job and skip Run Claude Code, letting the temporary diagnostic infrastructure block normal @claude handling. Both diag steps are now continue-on-error: true. If install fails, the verify step then fails too (no claude on PATH) but workflow proceeds to the main step, and the (install fail / main fail) case is also observable from the run summary.

bootjp enabled auto-merge April 28, 2026 13:49

chatgpt-codex-connector Bot reviewed Apr 28, 2026

View reviewed changes

bootjp disabled auto-merge April 28, 2026 14:06

bootjp merged commit 25ec963 into main Apr 28, 2026
11 checks passed

bootjp deleted the fix/claude-action-oauth-diag branch April 28, 2026 14:09

bootjp mentioned this pull request Apr 28, 2026

ci: remove Claude Code OAuth diagnostic steps #699

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(diag): isolate OAuth failure to agent SDK vs runner#698

ci(diag): isolate OAuth failure to agent SDK vs runner#698
bootjp merged 3 commits intomainfrom
fix/claude-action-oauth-diag

bootjp commented Apr 28, 2026

Uh oh!

gemini-code-assist Bot commented Apr 28, 2026

Uh oh!

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Uh oh!

bootjp Apr 28, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Uh oh!

bootjp Apr 28, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Uh oh!

bootjp Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		env:
		CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}

Conversation

bootjp commented Apr 28, 2026

Purpose (temporary diagnostic)

Scope

Token-leak hardening (inline rationale)

Test plan

Self-review

Uh oh!

gemini-code-assist Bot commented Apr 28, 2026

Uh oh!

coderabbitai Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading