feat(cloud): add custom gh credential helper and enhanced token fetch retry logic #288
khaliqgant merged 3 commits into main
The exponential backoff retry logic introduced in a23bffa caused message delivery delays of 2-4 minutes when injections failed. This was too aggressive for real-time agent communication.

Changes:
- Removed MAX_INJECTION_RETRIES (5) and INJECTION_RETRY_BASE_MS (2000)
- Reverted to immediate failure reporting without retry loops
- Fixed logError to always output (was incorrectly gated by debug flag)

Messages now fail immediately when injection fails, allowing the system to recover faster rather than blocking in exponential backoff loops.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Record complete investigation process for 2-4 minute message latency regression
- Document root cause: commit a23bffa's exponential retry backoff (2000ms × 2^n)
- Record strategy evaluation: 5 retry approaches analyzed, full revert chosen
- Track implementation decisions and credential blocker resolution
- Confidence: 90% - Fix verified, expected to restore 30s baseline latency

Trajectory: traj_i2h6krqx2iun

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
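The backoff arithmetic behind the regression can be sketched as follows. This is a minimal TypeScript illustration, assuming the delay before retry n is INJECTION_RETRY_BASE_MS × 2^n for n = 0..4. The helper names `backoffDelaysMs` and `totalBackoffMs` are hypothetical, and the actual loop in a23bffa may differ.

```typescript
// Constants as named in the revert commit message; the loop shape is an assumption.
const MAX_INJECTION_RETRIES = 5;
const INJECTION_RETRY_BASE_MS = 2000;

// Delay before each retry n (0-based): baseMs * 2^n.
function backoffDelaysMs(retries: number, baseMs: number): number[] {
  return Array.from({ length: retries }, (_, n) => baseMs * 2 ** n);
}

// Total time spent sleeping if every retry fails.
function totalBackoffMs(retries: number, baseMs: number): number {
  return backoffDelaysMs(retries, baseMs).reduce((sum, d) => sum + d, 0);
}

console.log(backoffDelaysMs(MAX_INJECTION_RETRIES, INJECTION_RETRY_BASE_MS));
// [ 2000, 4000, 8000, 16000, 32000 ]
console.log(totalBackoffMs(MAX_INJECTION_RETRIES, INJECTION_RETRY_BASE_MS));
// 62000
```

Under these assumptions a single failing injection sleeps for about 62 seconds before giving up, so a few consecutive failures could plausibly add up to the reported 2-4 minutes.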
…retry logic

- Create gh-credential-relay: Custom credential helper for gh CLI to fetch tokens from cloud API on-demand
- No more workarounds: gh commands automatically get fresh tokens without pre-fetch setup
- Enhance entrypoint.sh GH_TOKEN fetch with:
  * Remove -f flag to capture error responses and HTTP status
  * Add 3-attempt retry with exponential backoff (2s, 4s, 8s)
  * Classify errors: don't retry auth errors, do retry transient failures
  * Detailed logging for debugging (enable with GH_CREDENTIAL_DEBUG=1)
  * HTTP status capture (200, 401/403, timeout, etc)
  * Graceful fallback to git credential helper chain if API unavailable

This addresses the recurring GitHub auth issues by:
1. Fixing silent failures caused by curl -f flag
2. Adding automatic retries for transient API timeouts
3. Providing gh CLI on-demand token fetching (no initialization overhead)

Files:
- deploy/workspace/gh-credential-relay: New credential helper script
- deploy/workspace/entrypoint.sh: Enhanced GH_TOKEN fetch logic + gh helper config

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
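The retry policy described in this commit (classify the HTTP status, stop on auth errors, back off on transient failures) can be sketched in TypeScript. This is an illustration of the policy only, not the shell code in entrypoint.sh. The names `classifyStatus`, `fetchTokenWithRetry`, `fetchOnce`, and `sleep` are hypothetical, status 0 is an assumed stand-in for "no HTTP response", and treating every non-200, non-401/403 status as transient is an assumption.

```typescript
type Outcome = "success" | "auth_error" | "transient";

interface FetchResult {
  status: number; // HTTP status, or 0 when no response arrived (assumed convention)
  body: string;
}

// Map an HTTP status to a retry decision.
function classifyStatus(status: number): Outcome {
  if (status === 200) return "success";
  if (status === 401 || status === 403) return "auth_error";
  return "transient"; // timeouts, 5xx, and anything else (assumption)
}

// fetchOnce and sleep are injected so the policy can be exercised without a network.
async function fetchTokenWithRetry(
  fetchOnce: () => Promise<FetchResult>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 2000,
): Promise<string | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let result: FetchResult;
    try {
      result = await fetchOnce();
    } catch {
      result = { status: 0, body: "" }; // connection failure or timeout
    }
    const outcome = classifyStatus(result.status);
    if (outcome === "success") return result.body;
    if (outcome === "auth_error") return null; // retrying cannot fix bad credentials
    // Delays double per attempt (2s, 4s, ...). This sketch skips the wait after
    // the final attempt; whether the real script does so is not stated.
    if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  return null; // caller falls back to the git credential helper chain
}
```

Returning `null` rather than throwing mirrors the "graceful fallback" item above: the caller can continue with the existing credential helper chain when the API is unavailable.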
    if [[ -f "/app/deploy/workspace/gh-credential-relay" ]]; then
      sudo install -m 755 /app/deploy/workspace/gh-credential-relay /usr/local/bin/gh-credential-relay
      gh config set -h github.com credential-helper gh-credential-relay 2>/dev/null || true
🔴 Workspace entrypoint uses sudo install after dropping to non-root user, but image does not include sudo
In cloud/gateway mode, entrypoint.sh attempts to install the new gh credential helper using sudo, but the workspace base image doesn't install sudo (it installs gosu, not sudo). Since the script runs with set -euo pipefail, this will abort container startup.
Actual: container fails at runtime with sudo: command not found (or permission failure) when CLOUD_API_URL/WORKSPACE_ID/WORKSPACE_TOKEN are set.
Expected: credential helper install should succeed (run as root before gosu), or the helper should be pre-installed at build time.
Relevant code:
- Privilege drop happens early: deploy/workspace/entrypoint.sh:90-92.
- Later, it tries to install the helper via sudo (deploy/workspace/entrypoint.sh:246-249):

    if [[ -f "/app/deploy/workspace/gh-credential-relay" ]]; then
      sudo install -m 755 /app/deploy/workspace/gh-credential-relay /usr/local/bin/gh-credential-relay
      gh config set -h github.com credential-helper gh-credential-relay ...
    fi

The base image’s apt packages include gosu but not sudo (see deploy/workspace/Dockerfile.base), so sudo is not available.
Impact: Workspace containers in cloud mode can fail to start entirely.
Recommendation: Install gh-credential-relay during the Docker build (preferred), or move the install step into the root-only section before exec gosu .... Avoid sudo entirely in the container.
         reject(new Error(`Request timeout after ${timeout}ms`));
       }, timeout);

-      socket.on('connect', () => socket.write(JSON.stringify(req) + '\n'));
+      socket.on('connect', () => socket.write(encodeFrame(envelope)));

       socket.on('data', (data) => {
         // Ignore data if we've already timed out
         if (timedOut) return;

-        buffer += data.toString();
-        const lines = buffer.split('\n');
-        buffer = lines.pop() || '';
-
-        for (const line of lines) {
-          if (!line.trim()) continue;
-          try {
-            const response = JSON.parse(line);
-            if (response.id === id) {
-              clearTimeout(timeoutId);
-              socket.end();
-              if (response.error) reject(new Error(response.error));
-              else resolve(response.payload as T);
-              return;
-            }
-          } catch (parseError) {
-            // Log parse errors in debug mode for easier troubleshooting
-            if (process.env.DEBUG || process.env.RELAY_DEBUG) {
-              console.error('[RelayClient] Failed to parse daemon response:', line, parseError);
+        const frames = parser.push(data);
+        for (const response of frames) {
+          // Check if this is a response to our request
+          if (response.id === id || (response as { payload?: { replyTo?: string } }).payload?.replyTo === id) {
+            clearTimeout(timeoutId);
+            socket.end();
+            // Handle error responses
+            if (response.type === 'ERROR') {
+              const errorPayload = response.payload as { message?: string; code?: string };
+              reject(new Error(errorPayload?.message || errorPayload?.code || 'Unknown error'));
+            } else if ((response.payload as { error?: string })?.error) {
+              reject(new Error((response.payload as { error: string }).error));
+            } else {
+              resolve(response.payload as T);
             }
+            return;
           }
         }
       });
🔴 MCP RelayClient sends framed requests without HELLO handshake and uses message types the daemon connection does not handle
packages/mcp/src/client.ts changed to send a framed protocol envelope directly on connect, but it never performs the daemon’s required HELLO/WELCOME handshake. Additionally, it issues request types like STATUS, INBOX, and LIST_AGENTS that are not handled by packages/daemon/src/connection.ts (which only processes real protocol envelopes once ACTIVE).
Actual: MCP requests will be rejected because the connection stays in HANDSHAKING and any non-HELLO frame triggers BAD_REQUEST / no response. MCP tools will fail to communicate with a running daemon.
Expected: MCP client should (1) send a valid HELLO envelope first and wait for WELCOME, and (2) only use message types supported by the daemon, or connect to an actual supported request/response API endpoint.
MCP client writes immediately on connect (packages/mcp/src/client.ts:103-105):

    socket.on('connect', () => socket.write(encodeFrame(envelope)));

No code sends HELLO at all (entire file).

Daemon requires HELLO before ACTIVE:
- Connection constructor starts in HANDSHAKING: packages/daemon/src/connection.ts:107-109.
- Non-HELLO frames in HANDSHAKING fall through and ultimately hit handleSend/onMessage guarded by ACTIVE or emit errors: packages/daemon/src/connection.ts:194-217 and packages/daemon/src/connection.ts:293-303.
Impact: MCP server (agent-relay mcp serve) will be unable to send messages/spawn/etc., breaking editor integrations.
(Refers to lines 81-170)
Recommendation: Implement the standard daemon handshake: send HELLO (with agent name + entityType), parse WELCOME, then send supported envelopes (e.g., SEND, SPAWN, etc.). If MCP needs request/response semantics for status/inbox/list, add explicit daemon message types/handlers or use an existing HTTP API instead of inventing new types.
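The handshake ordering recommended here can be sketched as a small client-side gate that sends HELLO first and holds other envelopes until WELCOME arrives. This is a hedged TypeScript illustration: the `Envelope` shape, the `HandshakeGate` class, and the HELLO payload field names (`agent`, `entityType`) are assumptions made for the example and are not taken from the daemon's protocol definitions.

```typescript
type ConnState = "HANDSHAKING" | "ACTIVE";

// Hypothetical envelope shape; the real protocol type may carry more fields.
interface Envelope {
  type: string;
  id: string;
  payload?: unknown;
}

class HandshakeGate {
  private state: ConnState = "HANDSHAKING";
  private pending: Envelope[] = [];
  private send: (e: Envelope) => void;

  constructor(send: (e: Envelope) => void) {
    this.send = send;
  }

  // Call from the socket 'connect' handler: HELLO must be the first frame.
  start(agent: string, entityType: string): void {
    this.send({ type: "HELLO", id: "hello-1", payload: { agent, entityType } });
  }

  // Queue requests until the daemon has acknowledged the handshake.
  request(e: Envelope): void {
    if (this.state === "ACTIVE") this.send(e);
    else this.pending.push(e);
  }

  // Call for every decoded frame; WELCOME flips the gate and flushes the queue.
  onFrame(e: Envelope): void {
    if (this.state === "HANDSHAKING" && e.type === "WELCOME") {
      this.state = "ACTIVE";
      for (const queued of this.pending.splice(0)) this.send(queued);
    }
  }
}
```

In the client above, `send` would wrap `socket.write(encodeFrame(...))` and `onFrame` would be called for each frame returned by `parser.push(data)`. The gate only fixes ordering; request types the daemon does not handle would still need daemon-side support.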
Summary
Resolve recurring GitHub authentication issues.

Root Cause
The previous implementation called curl with -sf. The -f flag makes curl fail silently on HTTP errors, which suppressed the error responses and made debugging impossible. With no retry logic, transient API timeouts caused permanent failures.

Changes
1. New: deploy/workspace/gh-credential-relay
2. Enhanced: deploy/workspace/entrypoint.sh
   - Removed the -f flag from curl to capture error responses and HTTP status
   - Detailed logging for debugging (enable with GH_CREDENTIAL_DEBUG=1)

Benefits

Test Plan

Fixes recurring credential helper issues.