feat: cloud platform client (login, relay, queue) #132

Merged
NiveditJain merged 3 commits into main from luv-132 on Apr 22, 2026
Conversation

@NiveditJain
Member

@NiveditJain NiveditJain commented Apr 22, 2026

Summary

  • Add CLI-side plumbing for the new failproofai cloud platform (closed-source server + dashboard live in a separate private repo)
  • Hook handler fire-and-forgets events into a local queue (~0.5ms appendFileSync), then lazy-starts a background relay daemon that streams events to the server via WebSocket — zero added latency to the policy evaluation path
  • New CLI subcommands: login (Google OAuth device-code flow), logout, whoami, relay start|stop|status, sync
  • Relay daemon survives reboots: the hook handler auto-respawns it (~1ms PID check when running, ~5ms spawn when needed). Uses a rotate-then-process queue pattern (atomic rename → fresh pending.jsonl) so concurrent hook writes never collide with the batch being flushed. Orphan processing-*.jsonl files from crashed daemons are reprocessed on next startup; every event gets a client_event_id for idempotent server-side dedup
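
The rotate-then-process pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names (`enqueue`, `claimBatch`, `readBatch`, `completeBatch`, `findOrphans`) and the temporary directory are assumptions made for the example, and it returns null only on ENOENT so that other filesystem errors surface.

```typescript
import { appendFileSync, mkdtempSync, readFileSync, readdirSync, renameSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Throwaway directory so the sketch never touches a real queue.
const queueDir = mkdtempSync(join(tmpdir(), "queue-sketch-"));
const pendingFile = join(queueDir, "pending.jsonl");

/** Hook side: append one JSON line and swallow any error (fail-open). */
function enqueue(event: Record<string, unknown>): void {
  try {
    appendFileSync(pendingFile, JSON.stringify(event) + "\n");
  } catch {
    // Queue problems must never affect the hook's own result.
  }
}

/** Daemon side: atomically claim everything queued so far, or null if nothing is pending. */
function claimBatch(): string | null {
  const claimed = join(queueDir, `processing-${Date.now()}-${process.pid}.jsonl`);
  try {
    renameSync(pendingFile, claimed); // later appends create a fresh pending.jsonl
    return claimed;
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") return null; // empty queue
    throw err;
  }
}

/** Read a claimed batch back as parsed events. */
function readBatch(file: string): unknown[] {
  return readFileSync(file, "utf8").split("\n").filter(Boolean).map((line) => JSON.parse(line));
}

/** Delete a batch once it has been delivered. */
function completeBatch(file: string): void {
  unlinkSync(file);
}

/** Startup recovery: processing-*.jsonl files left behind by a crashed daemon. */
function findOrphans(): string[] {
  return readdirSync(queueDir)
    .filter((name) => name.startsWith("processing-"))
    .map((name) => join(queueDir, name));
}
```

Because the rename moves the whole file in one step, a hook that appends while a batch is being flushed simply writes to a new pending.jsonl, and a batch that was claimed but never completed stays on disk for `findOrphans` to pick up.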

Files

  • src/auth/{login,logout,token-store}.ts — OAuth flow + ~/.failproofai/auth.json (0600)
  • src/relay/{queue,pid,daemon}.ts — queue + daemon with backoff, refresh, orphan recovery
  • src/hooks/handler.ts — enqueue + lazy-start after persistHookActivity()
  • bin/failproofai.mjs — new subcommand dispatch + internal --relay-daemon entrypoint
  • .gitignore — exclude platform/ (closed-source server/frontend lives in a separate private repo)
  • CHANGELOG.md — Unreleased entry

Test plan

  • Typecheck passes (npx tsc --noEmit)
  • bun ./bin/failproofai.mjs --help shows the new commands
  • bun ./bin/failproofai.mjs whoami prints "Not logged in"
  • bun ./bin/failproofai.mjs relay status prints "Relay daemon not running"
  • bun ./bin/failproofai.mjs sync errors with "Not logged in" (graceful)
  • Hook handler continues to work (server queue is fail-open — errors are swallowed)
  • End-to-end login flow once the platform server is deployed: failproofai login → browser OAuth → token saved → relay auto-starts → events flow to server

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added cloud platform client with login, logout, and whoami authentication commands
    • Hook events are now queued locally and automatically synced to the cloud server via background relay daemon
    • Added relay command with start, stop, and status subcommands to manage the sync daemon
    • Added sync command for manual event synchronization
    • Implemented secure device-code authentication flow for user login

NiveditJain and others added 2 commits April 22, 2026 00:13
Add CLI-side plumbing for the new failproofai cloud platform:
- src/auth: device-code Google OAuth flow, token storage at ~/.failproofai/auth.json (0600)
- src/relay: rotate-then-process queue (no race with concurrent hooks), WebSocket relay daemon with exponential backoff, orphan recovery across crashes, and client_event_id for idempotent retries
- Hook handler fire-and-forgets events to the local queue (~0.5ms appendFileSync) and lazy-starts the daemon if it's not running, so users never need to manage daemon lifecycle
- bin/failproofai.mjs: login / logout / whoami / relay start|stop|status / sync subcommands, plus internal --relay-daemon entrypoint
- .gitignore: exclude platform/ (closed-source server + frontend lives in a separate private repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Warning

Rate limit exceeded

@NiveditJain has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 8 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 43 minutes and 8 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d8f78fb5-8280-4eb1-861a-28d6de4c83ce

📥 Commits

Reviewing files that changed from the base of the PR and between bef4f21 and 60f943f.

📒 Files selected for processing (8)
  • bin/failproofai.mjs
  • src/auth/login.ts
  • src/auth/logout.ts
  • src/auth/token-store.ts
  • src/hooks/handler.ts
  • src/relay/daemon.ts
  • src/relay/pid.ts
  • src/relay/queue.ts
📝 Walkthrough

Walkthrough

Added cloud platform authentication (login/logout/whoami) with device-code flow, filesystem-backed token storage, and a background relay daemon that queues hook events locally and syncs them to a remote server via WebSocket or REST. Integrated relay startup in hook handler and CLI. Updated CLI to support new auth and relay management subcommands.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: .gitignore, CHANGELOG.md | Added ignore rule for /platform directory; documented new cloud client features including login, logout, whoami, relay, and sync subcommands. |
| Authentication: src/auth/login.ts, src/auth/logout.ts, src/auth/token-store.ts | Implemented device-code OAuth login flow with automatic browser opening, token polling, and relay auto-start. Added logout and whoami commands. Created filesystem-backed token persistence in ~/.failproofai/auth.json with safe permissions. |
| Relay Queue & Process Management: src/relay/queue.ts, src/relay/pid.ts | Introduced file-backed event queue in ~/.failproofai/cache/server-queue with atomic batch claiming for idempotent processing. Added PID management utilities for relay daemon lifecycle (read, write, check liveness, stop, status). |
| Relay Daemon: src/relay/daemon.ts | Implemented daemon process that maintains persistent WebSocket connection to server, batches queued events, handles token refresh with expiry checks, recovers orphaned processing files on startup, and retries with exponential backoff. Includes one-shot sync via REST for CLI usage. |
| CLI Integration: bin/failproofai.mjs | Added early --relay-daemon fast-path for daemon startup. Expanded subcommand set to include login, logout, whoami, relay (start/stop/status), and sync. Each routes to corresponding module and handles exit codes. |
| Hook Integration: src/hooks/handler.ts | Modified hook handler to enqueue activity to relay queue and lazily start relay daemon after local persistence, both fire-and-forget with error swallowing. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as CLI (login)
    participant Server as Server (auth)
    participant Browser as Browser
    participant TokenStore as Token Store
    participant Daemon as Relay Daemon

    User->>CLI: failproofai login
    CLI->>Server: POST /api/v1/auth/device-code
    Server-->>CLI: device_code, user_code, verification_uri, interval
    CLI->>Browser: open verification_uri
    Browser->>User: Display user_code
    User->>Browser: Verify code
    CLI->>Server: Poll POST /api/v1/auth/device-token
    Server-->>CLI: (pending...)
    User->>Browser: Approve
    CLI->>Server: Poll POST /api/v1/auth/device-token
    Server-->>CLI: access_token, refresh_token, expires_at
    CLI->>TokenStore: writeTokens()
    TokenStore->>TokenStore: Write to ~/.failproofai/auth.json
    CLI->>Daemon: ensureRelayRunning()
    Daemon-->>CLI: Started (or already running)
    CLI-->>User: Logged in as <email>
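
The polling half of the login diagram above could look something like the sketch below. The `PollResult` shape and the `pollForToken` helper are hypothetical, since the server's actual response contract is not shown in this PR; the single poll request is injected so the loop can be exercised without a network.

```typescript
// Hypothetical result shape; the real device-token response is not shown in this PR.
type PollResult =
  | { status: "pending" }
  | { status: "approved"; accessToken: string };

/**
 * Call pollOnce every intervalMs until it reports approval or the deadline passes.
 * Each pollOnce call is expected to carry its own request timeout, because the
 * deadline here is only checked between polls.
 */
async function pollForToken(
  pollOnce: () => Promise<PollResult>,
  intervalMs: number,
  timeoutMs: number,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await pollOnce();
    if (result.status === "approved") return result.accessToken;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("device-code login timed out");
}
```

In a real client `pollOnce` would wrap the POST to the device-token endpoint, and `intervalMs` would come from the `interval` value returned with the device code.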
sequenceDiagram
    participant Hook as Hook Handler
    participant Queue as Queue (pending.jsonl)
    participant Daemon as Relay Daemon
    participant Server as Server (ingest)
    participant TokenStore as Token Store

    Hook->>Queue: appendToServerQueue(activity + toolInput)
    Queue->>Queue: Append JSON line to pending.jsonl
    Hook->>Daemon: ensureRelayRunning()
    Daemon->>Daemon: Start detached process if needed
    Daemon->>Queue: claimPendingBatch()
    Queue->>Queue: Atomic rename pending.jsonl → processing-<ts>-<pid>.jsonl
    Daemon->>TokenStore: readTokens() + refresh if needed
    Daemon->>Server: WebSocket connect
    Daemon->>Server: Send batch of events (with client_event_id)
    Server-->>Daemon: Acknowledged
    Daemon->>Queue: deleteProcessingFile(processing-<ts>-<pid>.jsonl)
    Daemon->>Queue: Repeat on flush interval or file watch
sequenceDiagram
    participant User
    participant CLI as CLI (sync)
    participant Queue as Queue
    participant Daemon as Daemon (one-shot)
    participant Server as Server (ingest)
    participant TokenStore as Token Store

    User->>CLI: failproofai sync
    CLI->>Daemon: runOneShotSync()
    Daemon->>TokenStore: readTokens()
    Daemon->>Queue: Read all pending + orphaned files
    Daemon->>Server: POST /api/v1/ingest/events (batch REST)
    Server-->>Daemon: 200 OK
    Daemon->>Queue: deleteProcessingFile() for all sent
    Daemon-->>CLI: Return event count
    CLI-->>User: Synced N events

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 Hop, hop, hop—the relay goes,
Events queued where the JSON flows.
Login, logout, sync—a triad of grace,
Cloud-bound whispers in cyberspace!
~✨ CodeRabbit's Relay Sprite

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: cloud platform client (login, relay, queue)' accurately and concisely summarizes the main change—adding cloud platform client functionality with key components. |
| Description check | ✅ Passed | The PR description is comprehensive, covering objectives, architecture, files changed, and a detailed test plan. However, the checklist in the template (lint, tsc, test, build) is not explicitly marked as complete. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (1)
src/auth/login.ts (1)

35-45: Add timeout to network calls to prevent indefinite hangs.

postJson and the fetch call in logout have no timeout. If a server stalls during TLS handshake, NAT drop, or load-balancer half-open, both the initial device-code call and polling loop will block forever, freezing the CLI. The deadline check in the polling loop provides no protection since it's bypassed by a hung individual fetch call.

🔧 Proposed fix
-async function postJson<T>(url: string, body: unknown): Promise<T> {
-  const resp = await fetch(url, {
-    method: "POST",
-    headers: { "Content-Type": "application/json" },
-    body: JSON.stringify(body),
-  });
+async function postJson<T>(url: string, body: unknown, timeoutMs = 15_000): Promise<T> {
+  const resp = await fetch(url, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify(body),
+    signal: AbortSignal.timeout(timeoutMs),
+  });
   if (!resp.ok) {
     throw new Error(`${url} → ${resp.status} ${resp.statusText}`);
   }
   return (await resp.json()) as T;
 }

Apply the same timeout pattern to the direct fetch call in src/auth/logout.ts:12 for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/auth/login.ts` around lines 35 - 45, postJson currently uses fetch
without any timeout and the logout fetch call is also unprotected, which can
hang forever; modify postJson and the direct fetch in logout to use an
AbortController with a configurable deadline (e.g., constant or parameter) so
each request is aborted after the timeout: create an AbortController, start a
timer that calls controller.abort() after the timeout, pass controller.signal to
fetch in postJson (and to the fetch in logout), clear the timer on success or
error, and map an AbortError to a clear timeout error message so callers can
handle it.
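
The AbortController approach described in this prompt can be sketched as a small wrapper. `withTimeout` is a hypothetical helper name; the operation is passed in as a function that receives the abort signal, so the same wrapper would fit postJson and the logout request.

```typescript
/**
 * Run an abortable operation and abort it after timeoutMs.
 * An abort caused by the timer is reported as a clear timeout error.
 */
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } catch (err) {
    // If our own timer fired, replace the generic abort error with a timeout message.
    if (controller.signal.aborted) {
      throw new Error(`request timed out after ${timeoutMs} ms`);
    }
    throw err;
  } finally {
    clearTimeout(timer); // runs on success and on error, so the timer never leaks
  }
}
```

A call site would then look like `withTimeout((signal) => fetch(url, { method: "POST", signal }), 15_000)`. The proposed fix above uses `AbortSignal.timeout` instead, which is shorter; the wrapper is mainly useful when a custom error message is wanted.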
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/failproofai.mjs`:
- Around line 351-357: The CLI reports "Failed to start daemon" because
ensureRelayRunning() spawns a detached child and returns immediately, then
relayStatus() runs before the child has written its PID; update
ensureRelayRunning (or the spawn path it calls) to avoid the race by either (a)
having the parent write the PID file synchronously using child.pid immediately
after spawn() returns, or (b) having ensureRelayRunning poll relayStatus() with
a short timeout/retry loop (e.g., check a few times with small delays) before
returning so the subsequent relayStatus() call sees the daemon; locate and
modify the spawn/daemon-starting code invoked by ensureRelayRunning (and any
helper like spawnDaemon) to implement one of these fixes.

In `@src/auth/login.ts`:
- Around line 22-33: The Windows branch in openBrowser uses [" /c","start", url]
which treats the first quoted token as the window title; update the Windows args
in openBrowser so cmd is "cmd" and args are ["/c", "start", '""', urlQuoted] —
i.e., insert an explicit empty title argument ('""') before the URL and ensure
the URL is quoted/escaped for cmd (so spaces, & and " are handled) when building
the urlQuoted value; leave non-Windows branches unchanged and keep using
spawn(..., { detached: true, stdio: "ignore" }).unref().

In `@src/auth/logout.ts`:
- Around line 11-19: The logout flow can hang because fetch has no timeout;
update the try block that calls fetch(`${tokens.server_url}/api/v1/auth/logout`,
...) to use an AbortController: create controller = new AbortController(), pass
controller.signal to fetch, start a timeout (e.g., 3–5s) that calls
controller.abort(), and clear that timer in finally so the timer doesn't leak;
ensure the fetch call still sends JSON body with tokens.refresh_token and that
the catch remains best-effort so clearTokens() (or the rest of logout) proceeds
immediately after timeout/abort.

In `@src/auth/token-store.ts`:
- Around line 27-35: writeTokens currently creates AUTH_DIR and writes AUTH_FILE
using the process umask, then tightens permissions with chmodSync, leaving a
race where tokens may be world-readable; change mkdirSync(AUTH_DIR) to set
directory mode explicitly (mode: 0o700) and create the token file atomically
with restrictive permissions rather than relying on chmodSync: open a temp file
inside AUTH_DIR with an explicit mode 0o600 (using fs.openSync/ fs.writeSync or
fs.writeFile with a file descriptor), write the JSON, fs.fsync the descriptor,
close it, then fs.renameSync the temp file to AUTH_FILE to make the replace
atomic; remove the post-write chmodSync fallback (or keep only as a
best-effort), and use the same approach when creating AUTH_DIR if it may already
exist (ensure mode 0o700 is applied on initial creation).

In `@src/hooks/handler.ts`:
- Around line 172-189: The hook currently enqueues every activity via
appendToServerQueue(...) regardless of auth state, causing unbounded
PENDING_FILE growth for unauthenticated users; update the hook in handler.ts to
check the same isLoggedIn() condition used by ensureRelayRunning() before
calling appendToServerQueue, i.e., import or call isLoggedIn() and skip the
append when it returns false; alternatively (or additionally) add a defensive
cap inside appendToServerQueue (e.g., max bytes or line count, rotating or
dropping oldest entries) so callers cannot grow the queue indefinitely—refer to
appendToServerQueue, ensureRelayRunning, isLoggedIn, and PENDING_FILE when
making changes.

In `@src/relay/daemon.ts`:
- Around line 120-125: The current mapping that sets client_event_id only
in-memory (the events = lines.map(...) block using JSON.parse and randomUUID())
must persist IDs and avoid crashing on malformed JSON: read and parse lines
one-by-one, for each valid JSON ensure client_event_id exists (assign
randomUUID() if missing) and write back the updated records to the processing
file (or an atomic tmp file swap) so IDs survive restarts/retries; for lines
that fail JSON.parse, move them to a quarantine/invalid file or append to a
bad-lines log and skip them (do not throw), and ensure the enqueue/send path
uses the persisted file content (so the client_event_id is durable). Also apply
the same fix to the other processing block referenced (around the 226-254
region) so both places persist IDs and handle malformed lines consistently.
- Around line 61-65: The three fetch calls that post to auth/refresh and the
other endpoints currently can hang; wrap each fetch (the one assigning to resp
and the other two fetch invocations) with an AbortController and a setTimeout
that calls controller.abort() after a reasonable timeout (e.g., 5–15s), handle
AbortError distinctly, and add retry logic with exponential backoff
(configurable max attempts) so transient network stalls are retried; ensure you
clear the timeout on success and propagate or log final errors consistently.
- Around line 101-107: The WebSocket connect Promise (the block using ws.onopen,
ws.onerror and ws.send(token)) can hang; modify it to add a connection timeout,
an onclose handler and defensive send handling: set a timer (e.g., const
timeoutId = setTimeout(...)) that rejects the Promise with a timeout Error if
not opened in time, add ws.onclose to reject if the socket closes before open,
wrap ws.send(token) in try/catch and reject on send failure, and ensure you
clear the timer and remove/disable ws.onopen/ws.onerror/ws.onclose handlers on
any resolve or reject to avoid leaks.

In `@src/relay/pid.ts`:
- Around line 29-52: The liveness check in isProcessAlive currently treats any
exception from process.kill(pid, 0) as "not alive"; change isProcessAlive to
catch the thrown error (e.g., catch (err)) and inspect err.code: if code ===
'ESRCH' return false (process does not exist), if code === 'EPERM' return true
(process exists but not signalable), otherwise rethrow the error (or return
false only for known cases). This preserves stopRelay behavior (which calls
isProcessAlive) so clearPid() is only invoked when ESRCH reports the process
truly gone.

In `@src/relay/queue.ts`:
- Around line 79-83: The current catch for renameSync(PENDING_FILE,
processingFile) (in the claim routine handling PENDING_FILE/processingFile)
swallows all errors and returns null; change it to inspect the caught error
(catch (err)) and only return null for benign races like err.code === 'ENOENT',
otherwise rethrow or surface the error (e.g., processLogger.error and throw) so
real filesystem failures are not treated as an empty queue.
- Around line 35-49: The directory and queue file are created with default
permissions (subject to umask), which can be too permissive; update ensureDir()
to create QUEUE_DIR with mode 0o700 and ensure appendToServerQueue
writes/creates PENDING_FILE with mode 0o600 (matching token-store posture).
Specifically, in ensureDir() set mkdirSync(..., { recursive: true, mode: 0o700
}) and after calling appendFileSync(PENDING_FILE, ...) ensure the file
permissions are set to 0o600 (e.g., check for file existence and use chmodSync
or create the file with correct mode before writing) so QUEUE_DIR and
PENDING_FILE have strict private permissions.
- Around line 18-33: Queue entries are persisting sensitive raw values
(toolInput, cwd, transcriptPath, reason) into QueueEntry and sending them via
appendToServerQueue/relay daemon; change the enqueue flow to sanitize or
allowlist before persistence. In the code path that constructs the QueueEntry
(the handler that currently passes parsed.tool_input into appendToServerQueue),
replace raw toolInput with a redacted/filtered object (e.g., remove keys
matching secret patterns, strip file paths, and null out transcriptPath/cwd or
replace with sanitized basenames), and ensure appendToServerQueue only receives
this sanitized QueueEntry shape; also update the daemon send path that reads
persisted entries to assume those fields are redacted. Locate symbols
QueueEntry, appendToServerQueue, and the handler function that builds the queue
entry and apply the sanitization logic there (or add a sanitizeQueueEntry
helper) so no raw secrets/paths are written or relayed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 20e3423f-6cdf-427d-94e4-f7b6fb6bac1b

📥 Commits

Reviewing files that changed from the base of the PR and between 56f2201 and bef4f21.

📒 Files selected for processing (10)
  • .gitignore
  • CHANGELOG.md
  • bin/failproofai.mjs
  • src/auth/login.ts
  • src/auth/logout.ts
  • src/auth/token-store.ts
  • src/hooks/handler.ts
  • src/relay/daemon.ts
  • src/relay/pid.ts
  • src/relay/queue.ts

Comment thread bin/failproofai.mjs
Comment thread src/auth/login.ts
Comment thread src/auth/logout.ts
Comment thread src/auth/token-store.ts
Comment thread src/hooks/handler.ts Outdated
Comment thread src/relay/daemon.ts
Comment thread src/relay/pid.ts
Comment thread src/relay/queue.ts Outdated
Comment thread src/relay/queue.ts
Comment thread src/relay/queue.ts Outdated
@NiveditJain
Member Author

Addressed all 13 CodeRabbit comments in a follow-up commit:

Auth / tokens

  • token-store.ts — Tokens now written atomically via openSync(mode 0o600) + renameSync; ~/.failproofai created with mode 0o700. No world-readable window.
  • login.ts — Windows start now uses empty title arg: cmd /c start "" <url>. All fetch calls wrapped in AbortSignal.timeout(10_000).
  • logout.ts — fetch wrapped in AbortSignal.timeout(3_000) so local clear isn't blocked by a slow server.
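
A minimal sketch of the atomic, restrictive-permission write described for token-store.ts, under stated assumptions: `writePrivateJson`, the temp-file naming, and the demo directory are illustrative and not taken from the PR. The file is created with mode 0o600 at open time (the umask can only remove bits, never add them), then renamed over the target.

```typescript
import { closeSync, fsyncSync, mkdirSync, mkdtempSync, openSync, readFileSync, renameSync, statSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

/** Write JSON so the file is never visible with permissions wider than 0o600. */
function writePrivateJson(dir: string, name: string, data: unknown): string {
  mkdirSync(dir, { recursive: true, mode: 0o700 }); // mode applies only when the directory is created
  const target = join(dir, name);
  const temp = join(dir, `.${name}.${process.pid}.${Date.now()}.tmp`);
  const fd = openSync(temp, "wx", 0o600); // "wx" fails rather than reusing an existing temp file
  try {
    writeSync(fd, JSON.stringify(data, null, 2));
    fsyncSync(fd); // flush contents before the rename makes them visible
  } finally {
    closeSync(fd);
  }
  renameSync(temp, target); // atomic replace on the same filesystem
  return target;
}

// Demo in a throwaway directory rather than a real home directory.
const demoDir = join(mkdtempSync(join(tmpdir(), "auth-sketch-")), "failproofai-demo");
const authFile = writePrivateJson(demoDir, "auth.json", { access_token: "example" });
const savedMode = statSync(authFile).mode & 0o777;
const savedTokens = JSON.parse(readFileSync(authFile, "utf8"));
```

Readers of the target path see either the old file or the complete new one, never a partially written token file.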

Queue (sanitization + safety)

  • Sensitive fields dropped/redacted before persistence: toolInput and transcriptPath dropped entirely; cwd replaced with SHA-256 cwd_hash; reason passed through regex redactor for AWS keys, JWTs, GH tokens, sk-* API keys, and Bearer *.
  • client_event_id UUID now generated at enqueue time (persisted to disk) so retries dedup cleanly.
  • Queue dir created with 0o700, files with 0o600.
  • claimPendingBatch now returns null only on ENOENT; other FS errors throw so events don't silently strand.
  • readProcessingFile skips malformed JSON lines instead of throwing.
  • 50 MB cap on pending file prevents unbounded growth when a logged-out user accumulates events forever.
  • appendToServerQueue is a no-op when user is not logged in (checks auth.json existence).
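
The sanitization step listed above might be sketched as follows. The regular expressions are illustrative guesses at the named secret families, not the PR's actual patterns, and `RawEvent`, `SanitizedEvent`, `redact`, and `sanitize` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface RawEvent {
  cwd: string;
  reason: string;
  toolInput?: unknown;
  transcriptPath?: string;
}

interface SanitizedEvent {
  cwd_hash: string;
  reason: string;
}

// Illustrative patterns only; real token formats vary and these are assumptions.
const SECRET_PATTERNS: RegExp[] = [
  /Bearer\s+[A-Za-z0-9._~+\/=-]+/g, // Authorization bearer values
  /AKIA[0-9A-Z]{16}/g, // AWS access key IDs
  /gh[pousr]_[A-Za-z0-9]{20,}/g, // GitHub tokens
  /sk-[A-Za-z0-9_-]{20,}/g, // sk-* API keys
  /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWTs
];

/** Replace anything that looks like a known secret with a fixed marker. */
function redact(text: string): string {
  return SECRET_PATTERNS.reduce((out, pattern) => out.replace(pattern, "[REDACTED]"), text);
}

/** Build the persisted shape: toolInput and transcriptPath are simply never copied. */
function sanitize(raw: RawEvent): SanitizedEvent {
  return {
    cwd_hash: createHash("sha256").update(raw.cwd).digest("hex"),
    reason: redact(raw.reason),
  };
}
```

Constructing a new object from an allowlist of fields, rather than deleting keys from the raw event, means a field added to the raw event later is dropped by default.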

Relay daemon

  • All fetch calls wrapped in AbortSignal.timeout.
  • WebSocket connect now has a 15s timeout, handles onclose before open, and cleans up handlers on settle.
  • Ack protocol added: each batch is sent with a batch_id and the daemon waits for {ack: batch_id} from the server before considering the batch delivered. Processing file is only deleted after every batch is acked. On connection close or 30s ack timeout, the processing file remains and is reprocessed on next startup.
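
The daemon side of the ack protocol could be organized around a small tracker like the one below. This is a sketch under assumptions: `AckTracker` and its method names are hypothetical, and the WebSocket itself is left out so that only the bookkeeping (one pending promise per batch_id, a per-batch timeout, and abort on connection close) is shown.

```typescript
interface Waiter {
  resolve: () => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class AckTracker {
  private readonly waiters = new Map<string, Waiter>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  /** Resolves when {ack: batchId} arrives; rejects on timeout or abortAll. */
  waitForAck(batchId: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      const timer = setTimeout(
        () => this.fail(batchId, new Error(`ack timeout for batch ${batchId}`)),
        this.timeoutMs,
      );
      this.waiters.set(batchId, { resolve, reject, timer });
    });
  }

  /** Feed every incoming socket message here; unrelated or malformed messages are ignored. */
  handleMessage(raw: string): void {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return;
    }
    if (typeof parsed !== "object" || parsed === null) return;
    const ack = (parsed as { ack?: unknown }).ack;
    if (typeof ack !== "string") return;
    const waiter = this.waiters.get(ack);
    if (!waiter) return;
    clearTimeout(waiter.timer);
    this.waiters.delete(ack);
    waiter.resolve();
  }

  /** Call from the socket's close handler so no waiter hangs forever. */
  abortAll(reason: string): void {
    for (const batchId of Array.from(this.waiters.keys())) {
      this.fail(batchId, new Error(reason));
    }
  }

  get pendingCount(): number {
    return this.waiters.size;
  }

  private fail(batchId: string, err: Error): void {
    const waiter = this.waiters.get(batchId);
    if (!waiter) return;
    clearTimeout(waiter.timer);
    this.waiters.delete(batchId);
    waiter.reject(err);
  }
}
```

The caller would delete the processing file only after every `waitForAck` promise for that file has resolved; any rejection leaves the file on disk for a later retry.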

PID

  • isProcessAlive now returns true on EPERM (process exists but unsignalable), false only on ESRCH. Prevents spawning a second daemon when the original was reparented.
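
A sketch of the liveness check as described, using signal 0 (which performs the existence and permission checks without delivering a signal). Rethrowing unknown error codes is one of the two options the review suggested and is an assumption here, not necessarily what the PR does.

```typescript
/** True if a process with this PID exists, even if we are not allowed to signal it. */
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: check only, nothing is delivered
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "EPERM") return true; // exists, but owned by another user
    if (code === "ESRCH") return false; // no such process
    throw err; // anything else is unexpected and should not be read as "dead"
  }
}
```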

CLI

  • relay start now polls the PID file up to 2s after spawn before reporting status, fixing the "Failed to start daemon" race.
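
The post-spawn wait could be a generic poll-with-deadline helper along these lines. `pollUntil` is a hypothetical name, the 100 ms interval is an assumption, and the PID-file check is passed in as a callback rather than reproduced.

```typescript
/**
 * Re-run check until it returns true or timeoutMs elapses.
 * Returns whether the condition was met in time.
 */
async function pollUntil(
  check: () => boolean,
  timeoutMs = 2000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For relay start, the callback would read the PID file and confirm the recorded process is alive; the CLI would report failure only when the helper returns false.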

Server (matching contract)

  • ClickHouse table switched to ReplacingMergeTree keyed on (user_id, event_id) — server-side dedup for retried events.
  • WebSocket ingest parses {batch_id, events} and emits {ack: batch_id} after a successful insert; sends {error, batch_id} on insert failure.

🤖 Generated with Claude Code

Security / privacy:
- Atomic 0o600 token write with umask-safe openSync + renameSync
- ~/.failproofai created with 0o700, queue files with 0o600
- Sanitize queue entries before relay: drop toolInput and transcriptPath,
  hash cwd (SHA-256), redact AWS/JWT/GitHub/OpenAI/Bearer patterns in reason
- appendToServerQueue is a no-op when user is not logged in
- Cap pending.jsonl at 50 MB to prevent unbounded growth

Reliability:
- Generate client_event_id at enqueue (persisted) for idempotent retries
- ReplacingMergeTree on server dedups by (user_id, event_id) on merge
- Ack protocol: relay waits for {ack: batch_id} before deleting the
  processing file; without an ack the file is left in place for retry on
  reconnect
- Skip malformed JSON lines instead of wedging the processing file
- claimPendingBatch only swallows ENOENT; other FS errors throw so events
  are never silently stranded

Network hardening:
- AbortSignal.timeout on every fetch (login/logout/refresh/sync)
- WebSocket connect has 15s timeout + onclose handler before open so
  the connect promise always settles
- 30s ack timeout per batch; connection close aborts pending acks

CLI polish:
- isProcessAlive distinguishes ESRCH (dead) from EPERM (alive-but-unsignalable)
- `relay start` polls the PID up to 2s after spawn, fixing the race that
  made the first start report "Failed to start daemon"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NiveditJain NiveditJain merged commit a0c364a into main Apr 22, 2026
9 checks passed