feat: cloud platform client (login, relay, queue) #132
Add CLI-side plumbing for the new failproofai cloud platform:
- src/auth: device-code Google OAuth flow, token storage at ~/.failproofai/auth.json (0600)
- src/relay: rotate-then-process queue (no race with concurrent hooks), WebSocket relay daemon with exponential backoff, orphan recovery across crashes, and client_event_id for idempotent retries
- Hook handler fire-and-forgets events to the local queue (~0.5ms appendFileSync) and lazy-starts the daemon if it's not running, so users never need to manage daemon lifecycle
- bin/failproofai.mjs: login / logout / whoami / relay start|stop|status / sync subcommands, plus internal --relay-daemon entrypoint
- .gitignore: exclude platform/ (closed-source server + frontend lives in a separate private repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
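The description mentions exponential backoff for the relay daemon without giving parameters. A minimal sketch of one common form follows; the function name `backoffDelayMs`, the 1 s base, and the 30 s cap are assumptions chosen for illustration, not values taken from this PR.

```typescript
// Hypothetical helper: delay before reconnect attempt number `attempt` (0-based).
// Doubles the delay on each attempt and clamps it at `capMs`.
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Delays for the first six attempts under the assumed defaults.
const delays = [0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n));
console.log(delays);
```

A production daemon would often add random jitter to avoid synchronized reconnects; that is left out here to keep the sketch deterministic.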
📝 Walkthrough
Added cloud platform authentication (login/logout/whoami) with device-code flow, filesystem-backed token storage, and a background relay daemon that queues hook events locally and syncs them to a remote server via WebSocket or REST. Integrated relay startup in hook handler and CLI. Updated CLI to support new auth and relay management subcommands.
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as CLI (login)
participant Server as Server (auth)
participant Browser as Browser
participant TokenStore as Token Store
participant Daemon as Relay Daemon
User->>CLI: failproofai login
CLI->>Server: POST /api/v1/auth/device-code
Server-->>CLI: device_code, user_code, verification_uri, interval
CLI->>Browser: open verification_uri
Browser->>User: Display user_code
User->>Browser: Verify code
CLI->>Server: Poll POST /api/v1/auth/device-token
Server-->>CLI: (pending...)
User->>Browser: Approve
CLI->>Server: Poll POST /api/v1/auth/device-token
Server-->>CLI: access_token, refresh_token, expires_at
CLI->>TokenStore: writeTokens()
TokenStore->>TokenStore: Write to ~/.failproofai/auth.json
CLI->>Daemon: ensureRelayRunning()
Daemon-->>CLI: Started (or already running)
CLI-->>User: Logged in as <email>
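The polling step in the login diagram above can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/auth/login.ts`: the `status` field, the `PollFn` type, and the injectable `sleep` parameter are hypothetical and exist so the loop can run without a server.

```typescript
// Response shape: token field names follow the diagram; the `status`
// discriminator is an assumption made for this sketch.
interface TokenResponse {
  status: "pending" | "ok";
  access_token?: string;
  refresh_token?: string;
  expires_at?: number;
}

// Hypothetical transport: one call corresponds to one POST to the device-token endpoint.
type PollFn = (deviceCode: string) => Promise<TokenResponse>;

// Polls at the server-provided interval until tokens arrive or the deadline passes.
async function pollForTokens(
  poll: PollFn,
  deviceCode: string,
  intervalMs: number,
  deadlineMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<TokenResponse> {
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    const resp = await poll(deviceCode);
    if (resp.status === "ok") return resp;
    await sleep(intervalMs);
  }
  throw new Error("Device authorization timed out");
}
```

Note that the deadline is only checked between polls, which is why each individual request also needs its own timeout.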
sequenceDiagram
participant Hook as Hook Handler
participant Queue as Queue (pending.jsonl)
participant Daemon as Relay Daemon
participant Server as Server (ingest)
participant TokenStore as Token Store
Hook->>Queue: appendToServerQueue(activity + toolInput)
Queue->>Queue: Append JSON line to pending.jsonl
Hook->>Daemon: ensureRelayRunning()
Daemon->>Daemon: Start detached process if needed
Daemon->>Queue: claimPendingBatch()
Queue->>Queue: Atomic rename pending.jsonl → processing-<ts>-<pid>.jsonl
Daemon->>TokenStore: readTokens() + refresh if needed
Daemon->>Server: WebSocket connect
Daemon->>Server: Send batch of events (with client_event_id)
Server-->>Daemon: Acknowledged
Daemon->>Queue: deleteProcessingFile(processing-<ts>-<pid>.jsonl)
Daemon->>Queue: Repeat on flush interval or file watch
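The atomic rename step in the diagram above is what keeps concurrent hook writes from colliding with a batch in flight. A minimal sketch, assuming a `queueDir` parameter and the file names shown in the diagram (the real `claimPendingBatch` in `src/relay/queue.ts` may differ):

```typescript
import { appendFileSync, existsSync, mkdtempSync, renameSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Moves pending.jsonl aside in one rename so later appends create a fresh file.
// Returns the claimed file's path, or null when nothing is pending.
function claimPendingBatch(queueDir: string): string | null {
  const pending = join(queueDir, "pending.jsonl");
  const processing = join(queueDir, `processing-${Date.now()}-${process.pid}.jsonl`);
  try {
    renameSync(pending, processing);
    return processing;
  } catch (err) {
    // ENOENT only means the queue is empty; any other error is a real failure.
    if ((err as { code?: string }).code === "ENOENT") return null;
    throw err;
  }
}

// Demo against a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "queue-demo-"));
appendFileSync(join(dir, "pending.jsonl"), '{"event":"example"}\n');
const claimed = claimPendingBatch(dir);
console.log("claimed:", claimed);
console.log("pending still present:", existsSync(join(dir, "pending.jsonl")));
```

Because the rename happens within a single directory, it is atomic on POSIX filesystems, so a hook appending at the same moment writes either to the old file (now claimed) or to a new `pending.jsonl`.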
sequenceDiagram
participant User
participant CLI as CLI (sync)
participant Queue as Queue
participant Daemon as Daemon (one-shot)
participant Server as Server (ingest)
participant TokenStore as Token Store
User->>CLI: failproofai sync
CLI->>Daemon: runOneShotSync()
Daemon->>TokenStore: readTokens()
Daemon->>Queue: Read all pending + orphaned files
Daemon->>Server: POST /api/v1/ingest/events (batch REST)
Server-->>Daemon: 200 OK
Daemon->>Queue: deleteProcessingFile() for all sent
Daemon-->>CLI: Return event count
CLI-->>User: Synced N events
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 13
🧹 Nitpick comments (1)
src/auth/login.ts (1)
35-45: Add timeout to network calls to prevent indefinite hangs.
`postJson` and the `fetch` call in `logout` have no timeout. If a server stalls during TLS handshake, NAT drop, or load-balancer half-open, both the initial `device-code` call and polling loop will block forever, freezing the CLI. The `deadline` check in the polling loop provides no protection since it's bypassed by a hung individual fetch call.
🔧 Proposed fix
-async function postJson<T>(url: string, body: unknown): Promise<T> {
-  const resp = await fetch(url, {
-    method: "POST",
-    headers: { "Content-Type": "application/json" },
-    body: JSON.stringify(body),
-  });
+async function postJson<T>(url: string, body: unknown, timeoutMs = 15_000): Promise<T> {
+  const resp = await fetch(url, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(body),
+    signal: AbortSignal.timeout(timeoutMs),
+  });
   if (!resp.ok) {
     throw new Error(`${url} → ${resp.status} ${resp.statusText}`);
   }
   return (await resp.json()) as T;
 }
Apply the same timeout pattern to the direct `fetch` call in `src/auth/logout.ts:12` for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/auth/login.ts` around lines 35 - 45, postJson currently uses fetch without any timeout and the logout fetch call is also unprotected, which can hang forever; modify postJson and the direct fetch in logout to use an AbortController with a configurable deadline (e.g., constant or parameter) so each request is aborted after the timeout: create an AbortController, start a timer that calls controller.abort() after the timeout, pass controller.signal to fetch in postJson (and to the fetch in logout), clear the timer on success or error, and map an AbortError to a clear timeout error message so callers can handle it.
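The AbortController variant described in the prompt above might look like the sketch below. The wrapper name `withTimeout` and its error message are hypothetical; the idea is only to show the timer being cleared on every path and the abort being mapped to a clear timeout error.

```typescript
// Runs `run` with an AbortSignal that fires after `timeoutMs`.
// Hypothetical helper; a caller would pass the signal on to fetch, e.g.
//   withTimeout((signal) => fetch(url, { method: "POST", signal }), 15_000)
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } catch (err) {
    // If our own timer triggered the abort, report it as a timeout.
    if (controller.signal.aborted) {
      throw new Error(`Request timed out after ${timeoutMs} ms`);
    }
    throw err;
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Compared with `AbortSignal.timeout(ms)` from the proposed fix, this form is longer but lets the caller control the error message and works on runtimes that predate `AbortSignal.timeout`.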
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/failproofai.mjs`:
- Around line 351-357: The CLI reports "Failed to start daemon" because
ensureRelayRunning() spawns a detached child and returns immediately, then
relayStatus() runs before the child has written its PID; update
ensureRelayRunning (or the spawn path it calls) to avoid the race by either (a)
having the parent write the PID file synchronously using child.pid immediately
after spawn() returns, or (b) having ensureRelayRunning poll relayStatus() with
a short timeout/retry loop (e.g., check a few times with small delays) before
returning so the subsequent relayStatus() call sees the daemon; locate and
modify the spawn/daemon-starting code invoked by ensureRelayRunning (and any
helper like spawnDaemon) to implement one of these fixes.
In `@src/auth/login.ts`:
- Around line 22-33: The Windows branch in openBrowser uses [" /c","start", url]
which treats the first quoted token as the window title; update the Windows args
in openBrowser so cmd is "cmd" and args are ["/c", "start", '""', urlQuoted] —
i.e., insert an explicit empty title argument ('""') before the URL and ensure
the URL is quoted/escaped for cmd (so spaces, & and " are handled) when building
the urlQuoted value; leave non-Windows branches unchanged and keep using
spawn(..., { detached: true, stdio: "ignore" }).unref().
In `@src/auth/logout.ts`:
- Around line 11-19: The logout flow can hang because fetch has no timeout;
update the try block that calls fetch(`${tokens.server_url}/api/v1/auth/logout`,
...) to use an AbortController: create controller = new AbortController(), pass
controller.signal to fetch, start a timeout (e.g., 3–5s) that calls
controller.abort(), and clear that timer in finally so the timer doesn't leak;
ensure the fetch call still sends JSON body with tokens.refresh_token and that
the catch remains best-effort so clearTokens() (or the rest of logout) proceeds
immediately after timeout/abort.
In `@src/auth/token-store.ts`:
- Around line 27-35: writeTokens currently creates AUTH_DIR and writes AUTH_FILE
using the process umask, then tightens permissions with chmodSync, leaving a
race where tokens may be world-readable; change mkdirSync(AUTH_DIR) to set
directory mode explicitly (mode: 0o700) and create the token file atomically
with restrictive permissions rather than relying on chmodSync: open a temp file
inside AUTH_DIR with an explicit mode 0o600 (using fs.openSync/ fs.writeSync or
fs.writeFile with a file descriptor), write the JSON, fs.fsync the descriptor,
close it, then fs.renameSync the temp file to AUTH_FILE to make the replace
atomic; remove the post-write chmodSync fallback (or keep only as a
best-effort), and use the same approach when creating AUTH_DIR if it may already
exist (ensure mode 0o700 is applied on initial creation).
In `@src/hooks/handler.ts`:
- Around line 172-189: The hook currently enqueues every activity via
appendToServerQueue(...) regardless of auth state, causing unbounded
PENDING_FILE growth for unauthenticated users; update the hook in handler.ts to
check the same isLoggedIn() condition used by ensureRelayRunning() before
calling appendToServerQueue, i.e., import or call isLoggedIn() and skip the
append when it returns false; alternatively (or additionally) add a defensive
cap inside appendToServerQueue (e.g., max bytes or line count, rotating or
dropping oldest entries) so callers cannot grow the queue indefinitely—refer to
appendToServerQueue, ensureRelayRunning, isLoggedIn, and PENDING_FILE when
making changes.
In `@src/relay/daemon.ts`:
- Around line 120-125: The current mapping that sets client_event_id only
in-memory (the events = lines.map(...) block using JSON.parse and randomUUID())
must persist IDs and avoid crashing on malformed JSON: read and parse lines
one-by-one, for each valid JSON ensure client_event_id exists (assign
randomUUID() if missing) and write back the updated records to the processing
file (or an atomic tmp file swap) so IDs survive restarts/retries; for lines
that fail JSON.parse, move them to a quarantine/invalid file or append to a
bad-lines log and skip them (do not throw), and ensure the enqueue/send path
uses the persisted file content (so the client_event_id is durable). Also apply
the same fix to the other processing block referenced (around the 226-254
region) so both places persist IDs and handle malformed lines consistently.
- Around line 61-65: The three fetch calls that post to auth/refresh and the
other endpoints currently can hang; wrap each fetch (the one assigning to resp
and the other two fetch invocations) with an AbortController and a setTimeout
that calls controller.abort() after a reasonable timeout (e.g., 5–15s), handle
AbortError distinctly, and add retry logic with exponential backoff
(configurable max attempts) so transient network stalls are retried; ensure you
clear the timeout on success and propagate or log final errors consistently.
- Around line 101-107: The WebSocket connect Promise (the block using ws.onopen,
ws.onerror and ws.send(token)) can hang; modify it to add a connection timeout,
an onclose handler and defensive send handling: set a timer (e.g., const
timeoutId = setTimeout(...)) that rejects the Promise with a timeout Error if
not opened in time, add ws.onclose to reject if the socket closes before open,
wrap ws.send(token) in try/catch and reject on send failure, and ensure you
clear the timer and remove/disable ws.onopen/ws.onerror/ws.onclose handlers on
any resolve or reject to avoid leaks.
In `@src/relay/pid.ts`:
- Around line 29-52: The liveness check in isProcessAlive currently treats any
exception from process.kill(pid, 0) as "not alive"; change isProcessAlive to
catch the thrown error (e.g., catch (err)) and inspect err.code: if code ===
'ESRCH' return false (process does not exist), if code === 'EPERM' return true
(process exists but not signalable), otherwise rethrow the error (or return
false only for known cases). This preserves stopRelay behavior (which calls
isProcessAlive) so clearPid() is only invoked when ESRCH reports the process
truly gone.
In `@src/relay/queue.ts`:
- Around line 79-83: The current catch for renameSync(PENDING_FILE,
processingFile) (in the claim routine handling PENDING_FILE/processingFile)
swallows all errors and returns null; change it to inspect the caught error
(catch (err)) and only return null for benign races like err.code === 'ENOENT',
otherwise rethrow or surface the error (e.g., processLogger.error and throw) so
real filesystem failures are not treated as an empty queue.
- Around line 35-49: The directory and queue file are created with default
permissions (subject to umask), which can be too permissive; update ensureDir()
to create QUEUE_DIR with mode 0o700 and ensure appendToServerQueue
writes/creates PENDING_FILE with mode 0o600 (matching token-store posture).
Specifically, in ensureDir() set mkdirSync(..., { recursive: true, mode: 0o700
}) and after calling appendFileSync(PENDING_FILE, ...) ensure the file
permissions are set to 0o600 (e.g., check for file existence and use chmodSync
or create the file with correct mode before writing) so QUEUE_DIR and
PENDING_FILE have strict private permissions.
- Around line 18-33: Queue entries are persisting sensitive raw values
(toolInput, cwd, transcriptPath, reason) into QueueEntry and sending them via
appendToServerQueue/relay daemon; change the enqueue flow to sanitize or
allowlist before persistence. In the code path that constructs the QueueEntry
(the handler that currently passes parsed.tool_input into appendToServerQueue),
replace raw toolInput with a redacted/filtered object (e.g., remove keys
matching secret patterns, strip file paths, and null out transcriptPath/cwd or
replace with sanitized basenames), and ensure appendToServerQueue only receives
this sanitized QueueEntry shape; also update the daemon send path that reads
persisted entries to assume those fields are redacted. Locate symbols
QueueEntry, appendToServerQueue, and the handler function that builds the queue
entry and apply the sanitization logic there (or add a sanitizeQueueEntry
helper) so no raw secrets/paths are written or relayed.
---
Nitpick comments:
In `@src/auth/login.ts`:
- Around line 35-45: postJson currently uses fetch without any timeout and the
logout fetch call is also unprotected, which can hang forever; modify postJson
and the direct fetch in logout to use an AbortController with a configurable
deadline (e.g., constant or parameter) so each request is aborted after the
timeout: create an AbortController, start a timer that calls controller.abort()
after the timeout, pass controller.signal to fetch in postJson (and to the fetch
in logout), clear the timer on success or error, and map an AbortError to a
clear timeout error message so callers can handle it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 20e3423f-6cdf-427d-94e4-f7b6fb6bac1b
📒 Files selected for processing (10)
- .gitignore
- CHANGELOG.md
- bin/failproofai.mjs
- src/auth/login.ts
- src/auth/logout.ts
- src/auth/token-store.ts
- src/hooks/handler.ts
- src/relay/daemon.ts
- src/relay/pid.ts
- src/relay/queue.ts
Addressed all 13 CodeRabbit comments in a follow-up commit, grouped as follows:
- Auth / tokens
- Queue (sanitization + safety)
- Relay daemon
- PID
- CLI
- Server (matching contract)

🤖 Generated with Claude Code
Security / privacy:
- Atomic 0o600 token write with umask-safe openSync + renameSync
- ~/.failproofai created with 0o700, queue files with 0o600
- Sanitize queue entries before relay: drop toolInput and transcriptPath,
hash cwd (SHA-256), redact AWS/JWT/GitHub/OpenAI/Bearer patterns in reason
- appendToServerQueue is a no-op when user is not logged in
- Cap pending.jsonl at 50 MB to prevent unbounded growth
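The sanitization step listed above can be illustrated roughly as follows. The entry shapes, the `cwdHash` field name, the `[REDACTED]` placeholder, and the specific regular expressions are all assumptions for this sketch; the patterns actually used in the commit are not shown in this PR text.

```typescript
import { createHash } from "node:crypto";

// Hypothetical raw entry; only field names mentioned in this PR are used.
interface RawEntry {
  cwd: string;
  reason: string;
  toolInput?: unknown;
  transcriptPath?: string;
}

// Hypothetical sanitized shape: toolInput and transcriptPath are dropped entirely.
interface SanitizedEntry {
  cwdHash: string;
  reason: string;
}

// Illustrative patterns only; a real redaction list would need more care.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g, // AWS access key ID
  /gh[pousr]_[A-Za-z0-9]{36,}/g, // GitHub token
  /sk-[A-Za-z0-9_-]{20,}/g, // OpenAI-style key
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // Bearer header value
  /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWT
];

function sanitizeQueueEntry(entry: RawEntry): SanitizedEntry {
  let reason = entry.reason;
  for (const pattern of SECRET_PATTERNS) {
    reason = reason.replace(pattern, "[REDACTED]");
  }
  return {
    // Hash the working directory so events can be grouped without exposing the path.
    cwdHash: createHash("sha256").update(entry.cwd).digest("hex"),
    reason,
  };
}
```

Building the output from an explicit allowlist of fields, rather than deleting keys from the input, means a newly added sensitive field is not relayed by accident.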
Reliability:
- Generate client_event_id at enqueue (persisted) for idempotent retries
- ReplacingMergeTree on server dedups by (user_id, event_id) on merge
- Ack protocol: relay waits for {ack: batch_id} before deleting the
processing file; without an ack the file is kept for retry on reconnect
- Skip malformed JSON lines instead of wedging the processing file
- claimPendingBatch only swallows ENOENT; other FS errors throw so events
are never silently stranded
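Two of the reliability points above, assigning `client_event_id` at enqueue time and skipping malformed lines, can be sketched together. The helper names `toQueueLine` and `parseQueueLines` are hypothetical, and treating a line without an ID as skipped is an assumption of this sketch.

```typescript
import { randomUUID } from "node:crypto";

interface QueuedEvent {
  client_event_id: string;
  [key: string]: unknown;
}

// Serializes an event for the queue. The ID is assigned once, at enqueue time,
// so every retry of the same line carries the same ID.
function toQueueLine(event: Record<string, unknown>): string {
  return JSON.stringify({ client_event_id: randomUUID(), ...event }) + "\n";
}

// Parses a processing file line by line. Malformed lines are counted and
// skipped rather than thrown on, so one bad line cannot wedge the batch.
function parseQueueLines(content: string): { events: QueuedEvent[]; skipped: number } {
  const events: QueuedEvent[] = [];
  let skipped = 0;
  for (const line of content.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as QueuedEvent).client_event_id === "string"
      ) {
        events.push(parsed as QueuedEvent);
      } else {
        skipped += 1;
      }
    } catch {
      skipped += 1;
    }
  }
  return { events, skipped };
}
```

Since the ID lives in the file rather than in daemon memory, a crash between send and ack leads to a resend with identical IDs, which the server can deduplicate.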
Network hardening:
- AbortSignal.timeout on every fetch (login/logout/refresh/sync)
- WebSocket connect has 15s timeout + onclose handler before open so
the connect promise always settles
- 30s ack timeout per batch; connection close aborts pending acks
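The "connect promise always settles" guarantee above can be sketched with a small helper. `SocketLike` is a hypothetical minimal interface used so the example runs without a network; a real `WebSocket` has event-typed handlers and would need a thin adapter.

```typescript
// Hypothetical minimal socket surface for this sketch.
interface SocketLike {
  onopen: (() => void) | null;
  onerror: (() => void) | null;
  onclose: (() => void) | null;
  close(): void;
}

// Resolves when the socket opens; rejects on error, early close, or timeout.
function waitForOpen(ws: SocketLike, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Stop the timer and detach handlers so nothing fires after settling.
    const cleanup = (): void => {
      clearTimeout(timer);
      ws.onopen = null;
      ws.onerror = null;
      ws.onclose = null;
    };
    const fail = (message: string): void => {
      cleanup();
      reject(new Error(message));
    };
    const timer = setTimeout(() => {
      fail(`WebSocket connect timed out after ${timeoutMs} ms`);
      ws.close();
    }, timeoutMs);
    ws.onopen = () => {
      cleanup();
      resolve();
    };
    ws.onerror = () => fail("WebSocket error before open");
    ws.onclose = () => fail("WebSocket closed before open");
  });
}
```

Every exit path goes through `cleanup`, so the promise settles exactly once and no timer is left running.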
CLI polish:
- isProcessAlive distinguishes ESRCH (dead) from EPERM (alive-but-unsignalable)
- `relay start` polls the PID up to 2s after spawn, fixing the race that
made the first start report "Failed to start daemon"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
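The ESRCH/EPERM distinction from the CLI polish notes can be sketched as below. The injectable `kill` parameter is an assumption added so the error branches can be exercised without real processes; the actual `isProcessAlive` in `src/relay/pid.ts` may have a different signature.

```typescript
// Hypothetical seam for testing; defaults to the real process.kill.
type KillFn = (pid: number, signal: number) => unknown;

// Signal 0 performs the existence and permission checks without delivering a signal.
function isProcessAlive(
  pid: number,
  kill: KillFn = (p, s) => process.kill(p, s),
): boolean {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "ESRCH") return false; // no such process
    if (code === "EPERM") return true; // exists, but we may not signal it
    throw err; // anything else is unexpected
  }
}

console.log("current process alive:", isProcessAlive(process.pid));
```

Treating EPERM as alive matters for `relay stop`: clearing the PID file for a daemon that is still running under another user would let a second daemon start alongside it.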
Summary
- […] `appendFileSync`), then lazy-starts a background relay daemon that streams events to the server via WebSocket, with zero added latency to the policy evaluation path
- […] `login` (Google OAuth device-code flow), `logout`, `whoami`, `relay start|stop|status`, `sync`
- […] `rename` → fresh `pending.jsonl`) so concurrent hook writes never collide with the batch being flushed. Orphan `processing-*.jsonl` files from crashed daemons are reprocessed on next startup; every event gets a `client_event_id` for idempotent server-side dedup

Files
- `src/auth/{login,logout,token-store}.ts`: OAuth flow + `~/.failproofai/auth.json` (0600)
- `src/relay/{queue,pid,daemon}.ts`: queue + daemon with backoff, refresh, orphan recovery
- `src/hooks/handler.ts`: enqueue + lazy-start after `persistHookActivity()`
- `bin/failproofai.mjs`: new subcommand dispatch + internal `--relay-daemon` entrypoint
- `.gitignore`: exclude `platform/` (closed-source server/frontend lives in a separate private repo)
- `CHANGELOG.md`: `Unreleased` entry

Test plan
- […] (`npx tsc --noEmit`)
- `bun ./bin/failproofai.mjs --help` shows the new commands
- `bun ./bin/failproofai.mjs whoami` prints "Not logged in"
- `bun ./bin/failproofai.mjs relay status` prints "Relay daemon not running"
- `bun ./bin/failproofai.mjs sync` errors with "Not logged in" (graceful)
- `failproofai login` → browser OAuth → token saved → relay auto-starts → events flow to server

🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
- `login`, `logout`, and `whoami` authentication commands
- `relay` command with `start`, `stop`, and `status` subcommands to manage the sync daemon
- `sync` command for manual event synchronization