Add Agent Relay Cloud onboarding design document #35
Merged
khaliqgant merged 39 commits into main on Dec 30, 2025
Design for cloud-hosted agent-relay with multi-provider authentication:
- GitHub OAuth as primary auth + repo connection
- Provider credential vault (API keys + OAuth tokens)
- Support for Claude, Codex, Gemini, and custom providers
- Team templates for quick setup
- Security considerations for credential storage

bd-cloud-onboarding

Remove API key support in favor of login-based auth for all providers:
- All providers now use "Login with X" buttons
- OAuth tokens stored instead of API keys
- Automatic token refresh before expiry
- GitHub Copilot auto-connected via signup
- Updated security model for OAuth token lifecycle

bd-cloud-onboarding

Claude Code currently uses browser-based OAuth that's not fully headless-compatible. Updated design with practical alternatives:
- Device authorization flow (enter code at anthropic.com)
- Credential import from local Claude installation
- Provider status table showing actual OAuth support levels

References GitHub issue anthropics/claude-code#7100 for context.

bd-cloud-onboarding

Both Claude Code and Codex use browser-based OAuth that doesn't support redirect URIs for third-party apps. Device flow (RFC 8628) is the solution.

Added:
- Sequence diagram showing device flow protocol
- Provider-specific device flow URLs table
- Complete UI flow with wireframes for both providers
- TypeScript implementation (DeviceFlowAuth class)
- Express API routes with background polling
- React component with state machine

References:
- anthropics/claude-code#7100 (headless auth request)
- openai/codex#2798 (remote auth request)

bd-cloud-onboarding

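For readers unfamiliar with RFC 8628, the background polling step mentioned in this commit can be sketched roughly as follows. This is not the PR's `DeviceFlowAuth` class: `pollForToken`, `DeviceGrant`, and the injected `poll` and `sleep` functions are hypothetical names chosen for illustration, and the transport is injected so the loop can be exercised without a real provider.

```typescript
// Sketch of the RFC 8628 token polling loop (all names are hypothetical).
type TokenResponse =
  | { access_token: string; refresh_token?: string; expires_in?: number }
  | { error: "authorization_pending" | "slow_down" | "access_denied" | "expired_token" };

interface DeviceGrant {
  device_code: string;
  interval: number;   // seconds between polls, as returned by the provider
  expires_in: number; // seconds until the device code expires
}

type PollFn = (deviceCode: string) => Promise<TokenResponse>;
type SleepFn = (ms: number) => Promise<void>;

async function pollForToken(grant: DeviceGrant, poll: PollFn, sleep: SleepFn): Promise<string> {
  let intervalSec = grant.interval;
  let waitedSec = 0;
  while (waitedSec < grant.expires_in) {
    await sleep(intervalSec * 1000);
    waitedSec += intervalSec;
    const res = await poll(grant.device_code);
    if ("access_token" in res) return res.access_token;
    if (res.error === "slow_down") {
      intervalSec += 5; // RFC 8628 section 3.5: increase the interval by 5 seconds
      continue;
    }
    if (res.error === "authorization_pending") continue; // user has not finished yet
    throw new Error(`device flow failed: ${res.error}`);
  }
  throw new Error("device flow failed: expired_token");
}
```

In a real deployment `poll` would POST the device code to the provider's token endpoint with the `urn:ietf:params:oauth:grant-type:device_code` grant type; that request is left out here because the provider URLs are not part of this page.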
Agent Relay supports both deployment models with unified auth:
1. Cloud Hosted: Everything runs in cloud, users just connect accounts
2. Self-Hosted: Agents run locally with optional cloud sync
3. Self-Hosted + Cloud: Local execution with cloud auth/dashboard

Added:
- Deployment model diagrams
- Feature comparison table
- Self-hosted onboarding CLI flow
- Credential sync architecture
- Hybrid mode (agent-relay cloud connect)

bd-cloud-onboarding

Clarify that authentication is always handled by Agent Relay Cloud, not self-hosted. The two models are:
1. Cloud Hosted: We run everything (auth + compute + repos)
2. Self-Hosted: User brings compute, auth still via our cloud

Updated:
- Deployment diagrams showing auth always in cloud
- Feature comparison table (removed offline auth option)
- Self-hosted setup flow using `agent-relay cloud` commands
- Credential sync architecture diagram

bd-cloud-onboarding

Self-hosted users must connect to our cloud to authenticate since Claude/Codex require browser-based OAuth. This creates intentional friction to encourage cloud adoption.

Key changes:
- Auth runs on our servers, not user's headless server
- User opens URL in browser, tokens sync to their server
- Token refresh continues via cloud (ongoing dependency)
- Added friction comparison table (cloud = easy, self = more steps)
- QR code option for mobile auth

Business rationale: Cloud should be path of least resistance.

bd-cloud-onboarding

Implements one-click workspace provisioning with:
- Database layer (PostgreSQL) for users, credentials, workspaces, repos
- Credential vault with AES-256-GCM encryption for OAuth tokens
- Workspace provisioner supporting Fly.io, Railway, and Docker
- API routes for auth, providers, workspaces, repos, onboarding
- CLI proxy authentication for Claude Code and Codex
- Device flow OAuth for Google/Gemini

Auth strategy:
- Google: Real OAuth device flow (works today)
- Claude/Codex: CLI-based auth with URL proxy through our UI
- GitHub: Web OAuth (for signup)

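As a rough illustration of what AES-256-GCM encryption of an OAuth token involves, here is a minimal sketch using Node's built-in `node:crypto` module. The helper names `encryptToken` and `decryptToken` and the `iv | tag | ciphertext` storage layout are assumptions made for this example; the PR's vault module may be organized differently.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers; the layout below (iv | tag | ciphertext) is an assumption.
const IV_BYTES = 12;  // 96-bit nonce, the size recommended for GCM
const TAG_BYTES = 16; // GCM authentication tag length

function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // must be unique per encryption under the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptToken(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the data or tag was modified
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The key must be exactly 32 bytes for AES-256. Because GCM is authenticated, a tampered ciphertext fails at `decipher.final()` rather than decrypting to garbage, which is the property a credential store would likely rely on.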
- Add custom_domain and custom_domain_status fields to workspaces
- Add API endpoints for domain management:
  - POST /workspaces/:id/domain - Set custom domain
  - POST /workspaces/:id/domain/verify - Verify DNS & provision SSL
  - DELETE /workspaces/:id/domain - Remove custom domain
- DNS verification via CNAME lookup
- SSL provisioning for Fly.io and Railway
- Database index for custom domain lookups

Users can now use their own domains (e.g., agents.acme.com) instead of the default workspace-xxx.agentrelay.dev URLs.

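The CNAME-based verification step might look something like the sketch below. `verifyCustomDomain` and the `ResolveCname` type are hypothetical; the resolver is injected so the check can be tested offline, and in Node it could be backed by `dns.promises.resolveCname`.

```typescript
// Hypothetical sketch: the resolver is passed in rather than imported.
type ResolveCname = (hostname: string) => Promise<string[]>;

// Lowercase and strip a trailing dot so "Host.Example.COM." matches "host.example.com".
const normalizeHost = (h: string): string => h.toLowerCase().replace(/\.$/, "");

async function verifyCustomDomain(
  domain: string,
  expectedTarget: string,
  resolveCname: ResolveCname,
): Promise<boolean> {
  try {
    const records = await resolveCname(domain);
    return records.some((r) => normalizeHost(r) === normalizeHost(expectedTarget));
  } catch {
    return false; // no CNAME record, unknown host, or resolver failure
  }
}
```

With this shape, a workspace at the default `workspace-xxx.agentrelay.dev` hostname would count as verified once the user's domain has a CNAME pointing at that hostname.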
- Add plan field to users (free, pro, team, enterprise)
- Custom domains require Team or Enterprise plan
- Returns 402 Payment Required with upgrade link for free/pro users
- Default URLs use subdomains: workspace-xxx.agentrelay.dev

Pricing model:
- Free/Pro: Subdomains only (included)
- Team/Enterprise: Custom domains ($10/mo add-on)

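The plan gate described here can be pictured as a small pure function that a route handler would call before touching the domain. `checkCustomDomainAccess`, `GateResult`, and the response body shape are illustrative assumptions, not the PR's actual middleware.

```typescript
type Plan = "free" | "pro" | "team" | "enterprise";

// Hypothetical result shape; the real API response may differ.
interface GateResult {
  status: 200 | 402;
  body?: { error: string; upgradeUrl: string };
}

// Plans the commit lists as eligible for custom domains.
const CUSTOM_DOMAIN_PLANS: ReadonlySet<Plan> = new Set<Plan>(["team", "enterprise"]);

function checkCustomDomainAccess(plan: Plan, upgradeUrl: string): GateResult {
  if (CUSTOM_DOMAIN_PLANS.has(plan)) return { status: 200 };
  return {
    status: 402, // Payment Required
    body: { error: "Custom domains require a Team or Enterprise plan", upgradeUrl },
  };
}
```

A handler would pass the user's `plan` field and an upgrade link of its choosing, then forward the returned status and body unchanged when the status is 402.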
- Add workspace_members table for team collaboration
- Support roles: owner, admin, member, viewer
- Add team invitation system with accept/decline
- Add user avatar_url from GitHub
- Add findByGithubUsername and findByEmail to users
- Create /api/teams routes for member management
- Update /api/auth/me to include avatar, plan, pending invites
- Team members require Team/Enterprise plan

Workspace member permissions:
- Owner: Full control, can delete workspace
- Admin: Can invite/remove members, edit settings
- Member: Can use agents, view all
- Viewer: Read-only access

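One way to read the permission list above is as a strict hierarchy in which each role inherits everything below it. That hierarchy is an assumption (the commit does not state it explicitly), and the `Action` names and `can` helper below are hypothetical.

```typescript
type Role = "owner" | "admin" | "member" | "viewer";
type Action = "delete_workspace" | "manage_members" | "edit_settings" | "use_agents" | "view";

// Assumed ordering: a higher rank includes all permissions of lower ranks.
const ROLE_RANK: Record<Role, number> = { viewer: 0, member: 1, admin: 2, owner: 3 };

// Lowest role allowed to perform each action, following the commit's list.
const MIN_ROLE: Record<Action, Role> = {
  delete_workspace: "owner",
  manage_members: "admin",
  edit_settings: "admin",
  use_agents: "member",
  view: "viewer",
};

function can(role: Role, action: Action): boolean {
  return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]];
}
```

For example, `can("admin", "manage_members")` is true while `can("admin", "delete_workspace")` is false, matching the Owner and Admin lines above.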
New src/resiliency/ module provides:

## Health Monitor (health-monitor.ts)
- Periodic process liveness checks
- Configurable health check intervals and timeouts
- Max consecutive failures before marking dead
- Memory/CPU usage tracking
- Event-based notifications

## Structured Logger (logger.ts)
- JSON format for production, pretty format for dev
- Log levels with filtering (debug, info, warn, error, fatal)
- Context propagation (correlation IDs, agent names)
- File output with rotation
- Child loggers for scoped context

## Metrics (metrics.ts)
- Per-agent crash/restart/spawn counters
- System-wide health status
- Prometheus-compatible export format
- Metric history for trending
- JSON export for dashboards

## Supervisor (supervisor.ts)
- Ties together health + logging + metrics
- Auto-restart with configurable limits
- Crash notifications
- Force restart capability
- Overall status reporting

Key improvements:
- Agents auto-restart on crash (up to 5 attempts)
- Dead process detection via PID checks
- Structured logs for debugging
- Metrics endpoint for observability
- Event-based notifications for alerts

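Two of the key improvements, PID-based liveness checks and capped auto-restart, can be sketched in a few lines. `isProcessAlive` and `RestartPolicy` are hypothetical names rather than the module's real exports; the default of 5 restarts mirrors the figure in the commit message.

```typescript
// Liveness probe: signal 0 sends nothing, it only checks that the PID can be signalled.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Tracks restart attempts per agent and refuses once the cap is reached.
class RestartPolicy {
  private readonly attempts = new Map<string, number>();
  private readonly maxRestarts: number;

  constructor(maxRestarts: number = 5) {
    this.maxRestarts = maxRestarts;
  }

  shouldRestart(agent: string): boolean {
    const used = this.attempts.get(agent) ?? 0;
    if (used >= this.maxRestarts) return false;
    this.attempts.set(agent, used + 1);
    return true;
  }

  // Call after the agent has been healthy for a while to clear its budget.
  reset(agent: string): void {
    this.attempts.delete(agent);
  }
}
```

A supervisor loop built on these would poll `isProcessAlive` on an interval and, on failure, consult `shouldRestart` before respawning. When and whether the counter is reset is a policy choice this sketch leaves open.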
Implements Continuous-Claude-v2 inspired context persistence:
- Ledger-based state storage for agent context
- Handoff protocol for task continuation across restarts
- Provider-specific context injection:
  - Claude: Uses hooks to inject context into CLAUDE.md
  - Codex: Uses config for periodic context refresh via system prompt
  - Gemini: Updates system instruction file
- Auto-save functionality with configurable intervals
- Integrated with supervisor for automatic context save on crash/restart

The new architecture makes the relay daemon the default mode:
- Orchestrator manages multiple workspaces (repos) from a single API
- Dashboard becomes primary interface for project switching
- No separate "bridge" command needed

New modules:
- orchestrator.ts: Top-level service managing workspace daemons
- workspace-manager.ts: Add/remove/switch workspaces
- agent-manager.ts: Spawn/stop agents with resiliency integration
- api.ts: REST and WebSocket API for dashboard
- types.ts: Core types for workspaces, agents, events

API endpoints:
- GET/POST /workspaces - List/add workspaces
- POST /workspaces/:id/switch - Switch active workspace
- GET/POST /workspaces/:id/agents - List/spawn agents
- WebSocket for real-time events

New components for multi-workspace navigation:
- WorkspaceSelector: Dropdown for switching between workspaces
- AddWorkspaceModal: Modal for adding new repository paths
- useOrchestrator hook: Connects to orchestrator API and WebSocket

Features:
- Real-time workspace/agent updates via WebSocket
- Provider detection (Claude, Codex, Gemini)
- Git branch display
- Status indicators (active, inactive, error)

Integrates multi-workspace support into the dashboard:
- Connect useOrchestrator hook for workspace/agent management
- Add WorkspaceSelector dropdown in sidebar for switching projects
- Add AddWorkspaceModal for adding new repositories
- Convert workspaces to projects for unified navigation
- Update spawn/release handlers to use orchestrator when available

Implements complete billing system for Agent Relay Cloud:
- Billing types and plan definitions (Free, Pro, Team, Enterprise)
- Stripe service for customer, subscription, and payment management
- Billing API endpoints (checkout, portal, webhooks, invoices)
- PricingPlans component with monthly/yearly toggle
- BillingPanel for subscription overview and management
- Usage tracking and plan limit comparisons

Mission-control themed landing page with:
- Animated agent network visualization with glowing connections
- Live demo section showing agents collaborating in real-time
- Dark atmospheric design with cyan/purple/orange accent colors
- Responsive layout with smooth animations
- Features, providers, and pricing sections
- Terminal-style CTA with realistic CLI output
- Static HTML version for SEO and fast initial load
- React component version for dynamic interactions

Resolves conflicts:
- App.tsx: Keep workspace switching + orchestrator logic, add shadow agent support
- hooks/index.ts: Export both useOrchestrator and useAgentLogs
- index.ts: Merge component exports with workspace/billing components
- issues.jsonl: Accept upstream beads issues

- Add .js extensions to all ESM imports in cloud and resiliency modules
- Install missing type declarations (@types/pg, @types/cors, etc.)
- Fix Stripe API compatibility with type assertions for API version changes
- Fix Redis/connect-redis API compatibility with type assertions
- Fix WebSocket import in daemon/api.ts
- Rename DaemonConfig to ApiDaemonConfig to avoid duplicate export
- Exclude landing page from main tsconfig (it's React/browser code)
- Fix type annotations for event handlers in supervisor.ts
- Fix logger delete operator issue using destructuring
- Fix SpawnConfig type mismatch in dashboard App.tsx

- Add Dockerfile for main cloud service (Railway deployment)
- Add railway.json with healthcheck configuration
- Add workspace Dockerfile and fly.toml template for Fly.io machines
- Add deploy scripts for Railway and Fly.io setup
- Add .env.cloud.example with domain configuration template
- Update FlyProvisioner with custom domain support:
  - Support FLY_REGION for workspace placement
  - Support FLY_WORKSPACE_DOMAIN for custom subdomains (e.g., ws.agent-relay.com)
  - Auto-provision SSL certificates for custom hostnames
  - Enable auto-stop/start for cost optimization
- Update all provisioners to use consistent image naming

Cloud-Daemon Sync:
- Add /api/daemons endpoints for daemon registration and linking
- Implement API key authentication for daemon-to-cloud communication
- Add CloudSyncService for heartbeat, agent discovery, and credential sync
- Support cross-machine message relay through cloud queue

Database:
- Add Drizzle ORM schema with full type safety
- Create drizzle.ts client with typed query helpers
- Add linkedDaemons table for daemon registration
- Add subscriptions and usage_records tables for billing

Local Development:
- Add docker-compose.dev.yml for full local cloud stack
- Add init-db.sql for PostgreSQL schema initialization
- Add npm scripts: db:generate, db:migrate, db:push, db:studio

Architecture:
- Each machine gets one API key (not per-project)
- Daemon reports all agents from all projects on that machine
- Cloud aggregates agents from all linked machines
- Messages can be relayed across machines via cloud queue

CLI Commands:
- Add `agent-relay cloud link` to connect machine to cloud
- Add `agent-relay cloud unlink` to disconnect from cloud
- Add `agent-relay cloud status` to show sync status
- Add `agent-relay cloud sync` to manually sync credentials
- Implements browser-based OAuth flow with API key verification
- Stores config securely in ~/.local/share/agent-relay/

CSS to Tailwind:
- Convert sidebar-container CSS classes to Tailwind utilities
- Convert workspace-selector-container to Tailwind classes
- Remove appStyles CSS export (kept empty for backwards compat)
- Use Tailwind theme tokens (bg-sidebar-bg, border-sidebar-border)

- Add CloudSessionProvider to wrap the dashboard with session management
- Add useSession hook for detecting expired sessions
- Add SessionExpiredModal component for re-login prompts
- Add cloudApi client with automatic session expiration detection
- Update auth.ts with session endpoint and error codes
- Add ProjectGroup schema with coordinator agent configuration
- Refactor db layer to use Drizzle ORM with strong typing
- Add WorkspaceMemberQueries for team management
- Fix null/undefined type conversions in vault

Coordinator Agents:
- Add coordinators API at /api/project-groups/:groupId/coordinator
- Add coordinator service for lifecycle management (start/stop/restart)
- Support enable/disable, configuration updates for coordinators

Plan-based Limits:
- Add plan limits service with tier definitions (free/pro/team/enterprise)
- Add middleware to enforce workspace and agent count limits
- Add usage API for tracking quotas (/api/usage, /api/usage/summary)
- Update workspaces API to check limits before creation
- Return 402 errors with upgrade prompts when limits exceeded

- Change limits from per-workspace agents to global concurrent agents
- Add repo count limit (3/20/100/unlimited per tier)
- Add coordinatorsEnabled flag (Pro+ only)
- Update middleware: checkRepoLimit, checkAgentLimit, checkCoordinatorAccess
- Update usage API to return new limit structure
- Free: 1 workspace, 3 repos, 2 concurrent agents, 10 compute hrs
- Pro: 5 workspaces, 20 repos, 10 concurrent agents, 100 compute hrs
- Team: 20 workspaces, 100 repos, 50 concurrent agents, 500 compute hrs

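The tier numbers in this commit map naturally onto a lookup table. The sketch below is an assumption about shape only: apart from `coordinatorsEnabled`, the field names and the `canStartAgent` and `canAddRepo` helpers are hypothetical, and the Enterprise row is modelled as unlimited using `Infinity`.

```typescript
type Plan = "free" | "pro" | "team" | "enterprise";

// Field names other than coordinatorsEnabled are illustrative.
interface PlanLimits {
  maxWorkspaces: number;
  maxRepos: number;
  maxConcurrentAgents: number;
  computeHours: number;
  coordinatorsEnabled: boolean;
}

const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxWorkspaces: 1, maxRepos: 3, maxConcurrentAgents: 2, computeHours: 10, coordinatorsEnabled: false },
  pro: { maxWorkspaces: 5, maxRepos: 20, maxConcurrentAgents: 10, computeHours: 100, coordinatorsEnabled: true },
  team: { maxWorkspaces: 20, maxRepos: 100, maxConcurrentAgents: 50, computeHours: 500, coordinatorsEnabled: true },
  enterprise: {
    maxWorkspaces: Infinity,
    maxRepos: Infinity,
    maxConcurrentAgents: Infinity,
    computeHours: Infinity,
    coordinatorsEnabled: true,
  },
};

// Global (not per-workspace) check: counts agents running across all workspaces.
function canStartAgent(plan: Plan, runningAgents: number): boolean {
  return runningAgents < PLAN_LIMITS[plan].maxConcurrentAgents;
}

function canAddRepo(plan: Plan, currentRepos: number): boolean {
  return currentRepos < PLAN_LIMITS[plan].maxRepos;
}
```

Note that `Infinity` does not survive `JSON.stringify` (it becomes `null`), so an API returning these limits would need some other encoding for "unlimited".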
- Update LandingPage.tsx pricing section with new limits:
  - Free: 1 workspace, 3 repos, 2 concurrent agents, 10 hrs
  - Pro: 5 workspaces, 20 repos, 10 agents, 100 hrs, coordinators
  - Team: 20 workspaces, 100 repos, 50 agents, 500 hrs
  - Enterprise: Unlimited everything
- Create dedicated PricingPage.tsx with:
  - Monthly/annual billing toggle (20% discount)
  - Plan cards with visual limit indicators
  - Feature comparison table
  - FAQ section explaining compute hours, BYOK, coordinators
  - Orbital animation CTA section
- Add coordinator Pro-only restriction to coordinators API
- Fix TypeScript warnings array type in usage.ts
- Add pricing page styles to styles.css

- Move landing pages into dashboard/landing for build compatibility
- / now serves LandingPage
- /pricing serves PricingPage
- /app serves the dashboard (post-login)
- Remove old /landing route

…odes

Create comprehensive documentation for three deployment modes:
- CLOUD.md: Getting started with agent-relay.com managed service
- SELF-HOSTED.md: Running on own infrastructure with cloud auth
- LOCAL.md: Standalone local development usage

- Add scripts/dev.sh to start daemon + Next.js dashboard in tmux
- Add dev:start, dev:stop, dev:attach npm scripts
- Update LOCAL.md with simplified quickstart guide
- Dashboard dev server on port 4281 with hot reload

- Add .github/workflows/docker.yml to build and push images on release
- Publishes agent-relay and agent-relay-workspace images
- Supports linux/amd64 and linux/arm64 platforms
- Update all docs and docker-compose to use agentworkforce org

khaliqgant pushed a commit that referenced this pull request on Dec 30, 2025
Key changes to match cloud-first paradigm:
- OAuth handled by cloud API (src/cloud/api/integrations/slack.ts)
- Credentials stored in encrypted vault, not local config files
- SlackService in src/cloud/services/ for token management
- Database schema via Drizzle ORM (slack_integrations table)
- Daemon bridge syncs credentials via cloud-sync.ts
- Orchestrator manages SlackBridge lifecycle per workspace
- Plan-gated access (Pro+ only)
- Dashboard UI with SlackIntegrationPanel component

Follows same patterns as provider credentials from PR #35.

bd-slack-integration

- AgentList: Solo agents (like Lead) now display without redundant group header
- AgentList: Reduced spacing between agents (gap-2 → gap-1)
- Header: Added notification badge on mobile hamburger menu for unread messages
- App: Track unread messages when sidebar closed on mobile
- LogViewer: Fixed auto-scroll to re-enable when user scrolls back to bottom
- MessageList: Fixed auto-scroll reliability with setTimeout and instant behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Pull request overview
This PR introduces the foundational cloud infrastructure for Agent Relay, including landing/pricing pages, credential management, workspace provisioning, and multi-provider authentication. The design supports GitHub OAuth as primary authentication, a secure credential vault for API keys/OAuth tokens, and integration with Claude, Codex, Gemini, and custom providers. Additional features include team templates, coordinator agents for project groups, and Stripe-based subscription management.
Key changes:
- Landing and pricing pages with mission control aesthetic
- Secure credential vault using AES-256-GCM encryption
- Workspace provisioning for Fly.io, Railway, and Docker
- Database schema with Drizzle ORM for users, workspaces, project groups, and credentials
- Billing integration with Stripe for subscription tiers (free, pro, team, enterprise)
- Plan limits service with usage tracking and quota management
Reviewed changes
Copilot reviewed 70 out of 119 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/dashboard/landing/index.ts | Export landing and pricing page components |
| src/dashboard/landing/PricingPage.tsx | Full-featured pricing page with plan comparison and FAQ |
| src/dashboard/landing/LandingPage.tsx | Landing page with hero section, live demo, and feature showcase |
| src/dashboard/app/pricing/page.tsx | Next.js route wrapper for pricing page |
| src/dashboard/app/page.tsx | Switched main page from dashboard to landing page |
| src/dashboard/app/globals.css | Updated global styles for mission control theme |
| src/dashboard/app/app/page.tsx | New route for authenticated dashboard app |
| src/dashboard-server/server.ts | Enhanced dashboard server with agent online checks and processing state updates |
| src/dashboard-server/metrics.ts | Updated offline threshold to 30 seconds |
| src/daemon/workspace-manager.ts | Manager for multiple workspaces with switching support |
| src/daemon/types.ts | Core types for daemon (workspaces, agents, events) |
| src/daemon/server.ts | Updated daemon server to support processing state callbacks |
| src/daemon/router.ts | Added processing state change notifications |
| src/daemon/orchestrator.ts | Top-level orchestrator for managing workspace daemons |
| src/daemon/index.ts | Added exports for orchestrator and workspace manager |
| src/daemon/cloud-sync.ts | Cloud sync service for cross-machine agent coordination |
| src/daemon/api.ts | REST and WebSocket API for daemon communication |
| src/daemon/agent-manager.ts | Manages agents across workspaces with resiliency |
| src/cloud/vault/index.ts | Secure credential vault with AES-256-GCM encryption |
| src/cloud/services/planLimits.ts | Plan limits and usage tracking service |
| src/cloud/services/coordinator.ts | Coordinator agent service for project groups |
| src/cloud/server.ts | Express server with session management and CSRF protection |
| src/cloud/provisioner/index.ts | Workspace provisioning for Fly.io, Railway, Docker |
| src/cloud/index.ts | Main entry point for cloud infrastructure |
| src/cloud/db/schema.ts | Drizzle ORM schema for PostgreSQL |
| src/cloud/db/migrations/0001_initial.sql | Initial database migration |
| src/cloud/db/index.ts | Database layer exports and query namespaces |
| src/cloud/db/drizzle.ts | Drizzle database client with type-safe queries |
| src/cloud/config.ts | Cloud configuration with environment variable loading |
| src/cloud/billing/types.ts | Billing types for subscriptions and payments |
| src/cloud/billing/service.ts | Stripe integration for billing operations |
| src/cloud/billing/plans.ts | Subscription plan definitions and comparisons |
| src/cloud/billing/index.ts | Billing module exports |
| src/bridge/types.ts | Added shadow execution mode fields to SpawnRequest |