feat: Phase 1 local performance optimizations by khaliqgant · Pull Request #126 · AgentWorkforce/relay

khaliqgant · 2026-01-10T03:41:21Z

Summary

Local Performance Optimizations

Per-connection write queues with backpressure
- Configurable high/low water marks (default 1500/500)
- Max queue size of 2000 messages
- Socket drain handling for slow consumers
Batched SQLite writes
- Configurable batch size (default 50 messages)
- Time-based flush (default 100ms)
- Memory-based flush (default 1MB)
- Metrics for monitoring batch behavior
Token bucket rate limiter with generous defaults
- 500 messages/sec sustained rate
- 1000 message burst capacity
- Per-agent tracking with auto-cleanup

Cloud Sync Improvements

Auto-detect workspace from git remote
- Daemon detects repo full name (e.g., AgentWorkforce/relay) from git remote
- Includes repo context in message sync payloads
Auto-link daemon to workspace
- When messages sync, if daemon isn't linked but repo is found in a workspace, it auto-links
- No manual workspace linking required if repo is already connected in dashboard
New DB query: findByRepoFullName
- Look up workspaces by repository for automatic resolution

Workspace Reliability

Worker thread health server
- Health check runs on separate thread, independent of main event loop
- Ensures health checks respond even during heavy compute (next build, etc.)
- Prevents false-positive machine restarts on Fly.io
- Falls back to main thread health check if worker fails to start

Dashboard Improvements

Agent interrupt button in LogViewerPanel
- Send Ctrl+C to agents stuck in loops to refocus them
- Interrupt by name via /agents/by-name/:name/interrupt endpoint
- Visual feedback during interrupt operation

Test plan

Run npm test - all 1197 tests pass
Run npm run build - compiles successfully
Verify health worker starts on dedicated port
Verify main server falls back gracefully if worker fails
Test interrupt button on running agent

🤖 Generated with Claude Code

- Add per-connection write queues with backpressure - Configurable high/low water marks (default 1500/500) - Max queue size of 2000 messages - Socket drain handling for slow consumers - Add batched SQLite writes - Configurable batch size (default 50 messages) - Time-based flush (default 100ms) - Memory-based flush (default 1MB) - Metrics for monitoring batch behavior - Add token bucket rate limiter with generous defaults - 500 messages/sec sustained rate - 1000 message burst capacity - Per-agent tracking with auto-cleanup These changes optimize the local daemon experience while preparing the architecture for cloud sync improvements.

- Add SyncQueue with adaptive batching - Size trigger (100 messages) - Time trigger (200ms) - Bytes trigger (512KB) - Add gzip compression for large payloads - Threshold: 1KB - Typical 60-80% reduction for message batches - Add disk spillover for offline resilience - Automatic spill on sync failure - Recovery on startup - Configurable max spill files - Add retry with exponential backoff - 3 retries by default - Exponential delay (1s, 2s, 4s) - Integrate into CloudSyncService - Real-time queueMessageForSync() method - Stats via getSyncQueueStats() - Graceful shutdown with flush This ensures zero message loss even during network outages and reduces bandwidth usage significantly.

- cloud-sync.test.ts: await stop() before testing disconnected state - connection.test.ts: wait for setImmediate drain before checking writes

- Add bulk-ingest.ts with multi-row INSERT and staging table strategies - Optimized connection pool (20 max, 30s idle timeout, 10s connection timeout) - Auto-select strategy based on batch size: - < 1000 rows: multi-row INSERT with chunking - > 1000 rows: staging table with single dedup INSERT SELECT - Enhanced /api/daemons/messages/stats with pool health metrics - Proper JSONB serialization for data/payloadMeta fields

Copilot

Pull request overview

This PR implements Phase 1 local performance optimizations for the agent relay system, focusing on three key areas: write queue management, batched SQLite operations, and rate limiting. The changes aim to improve throughput and reliability for high-volume message processing while maintaining backward compatibility.

Changes:

Added per-connection write queues with configurable backpressure handling to prevent blocking on slow consumers
Implemented batched SQLite writes with configurable size, time, and memory-based flush triggers
Added token bucket rate limiting with generous defaults (500 msg/sec sustained, 1000 burst) and per-agent tracking

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
src/storage/batched-sqlite-adapter.ts	New batched SQLite adapter that buffers writes and flushes based on size/time/memory thresholds
src/storage/adapter.ts	Updated storage configuration to support 'sqlite-batched' type and batch configuration options
src/daemon/sync-queue.ts	New optimized cloud sync queue with compression, disk spillover, and retry logic
src/daemon/router.ts	Integrated rate limiter into message routing with configurable limits
src/daemon/rate-limiter.ts	New token bucket rate limiter implementation with per-agent tracking
src/daemon/connection.ts	Added write queue with backpressure handling and socket drain management
src/daemon/connection.test.ts	Updated test to await write queue drain using setImmediate
src/daemon/cloud-sync.ts	Integrated optimized sync queue with spill recovery on startup
src/daemon/cloud-sync.test.ts	Updated tests to await async stop() method
src/cloud/db/index.ts	Exported new bulk ingest utilities for high-volume operations
src/cloud/db/drizzle.ts	Enhanced connection pool with optimized settings and error logging
src/cloud/db/bulk-ingest.ts	New bulk insert utilities using raw SQL for performance
src/cloud/api/daemons.ts	Updated message sync endpoint to use optimized bulk insert with health monitoring

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-10T03:42:33Z

+   * Queue a message for batched writing.
+   * May trigger an immediate flush if thresholds are exceeded.
+   */
+  async saveMessage(message: StoredMessage): Promise<void> {


The saveMessage method lacks test coverage for the batching behavior, including verification that messages are properly queued and flushed based on size, time, and memory thresholds. Consider adding tests that verify flush conditions and ensure no message loss occurs during batched operations.

Copilot · 2026-01-10T03:42:33Z

+  /**
+   * Flush all pending messages to SQLite.
+   */
+  async flush(): Promise<void> {


The flush method lacks test coverage for concurrent flush scenarios and error handling during batch writes. Consider adding tests that verify the re-queuing behavior when writes fail and ensure that concurrent flushes are properly synchronized.

Copilot · 2026-01-10T03:42:33Z

+   * Queue a message for sync to cloud.
+   * May trigger an immediate flush if thresholds are exceeded.
+   */
+  async enqueue(message: StoredMessage): Promise<void> {


The enqueue method lacks test coverage for compression triggers, disk spillover behavior, and recovery scenarios. Consider adding tests that verify compression is applied when payloads exceed the threshold and that spilled messages are properly recovered on startup.

Copilot · 2026-01-10T03:42:33Z

+    try {
+      await fs.mkdir(this.config.spillDir, { recursive: true });
+
+      const filename = `spill-${Date.now()}-${Math.random().toString(36).slice(2, 10)}.json`;


Using Math.random() for file naming could lead to collisions in high-concurrency scenarios. Consider using a more robust identifier such as a UUID or an atomic counter to ensure uniqueness.

Copilot · 2026-01-10T03:42:34Z

+   * Try to acquire a token for the given agent.
+   * Returns true if the message should be allowed, false if rate limited.
+   */
+  tryAcquire(agentName: string): boolean {


The tryAcquire method lacks test coverage for the token bucket refill logic, especially edge cases such as rapid successive calls and boundary conditions when tokens are at exactly 1.0. Consider adding tests that verify correct token accounting across various time intervals.

Copilot · 2026-01-10T03:42:34Z

+   * Drain the write queue to the socket.
+   * Respects socket backpressure by waiting for 'drain' events.
+   */
+  private drain(): void {


The drain method lacks test coverage for socket backpressure scenarios, including verification that the drain event handler is properly registered and that messages are queued when the socket buffer is full. Consider adding tests that simulate slow consumers and verify correct backpressure handling.

Copilot · 2026-01-10T03:42:34Z

+export async function bulkInsertMessages(
+  pool: Pool,
+  messages: NewAgentMessage[],
+  chunkSize = 100
+): Promise<BulkInsertResult> {


The bulkInsertMessages function lacks test coverage for chunk processing, error handling within chunks, and deduplication behavior. Consider adding tests that verify correct handling of partial failures and that duplicate messages are properly counted.

Copilot · 2026-01-10T03:42:34Z

+export async function streamingBulkInsert(
+  pool: Pool,
+  messages: NewAgentMessage[]
+): Promise<BulkInsertResult> {


The streamingBulkInsert function lacks test coverage for transaction rollback scenarios and staging table cleanup. Consider adding tests that verify the temporary table is properly cleaned up on commit/rollback and that errors during staging don't leave orphaned data.

- Replace Math.random() with UUID for spill file naming to avoid collisions in high-concurrency scenarios - Add comprehensive tests for batched-sqlite-adapter.ts covering batch size, time, and memory threshold triggers - Add tests for sync-queue.ts including enqueue behavior, compression, and disk spillover with recovery - Add tests for rate-limiter.ts covering token bucket refill logic and boundary conditions - Add tests for connection.ts drain/backpressure handling including socket buffer full scenarios and queue cleanup - Add tests for bulk-ingest.ts covering chunk processing, error handling, and deduplication behavior 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add git-remote.ts utility for parsing git remote URLs - Add findByRepoFullName() to workspace queries for repo lookup - Include repoFullName in message sync payload from daemon - Auto-link daemon to workspace when syncing if repo matches - Better error messages when workspace can't be resolved This enables messages to sync to cloud automatically without requiring explicit workspace linking, as long as the repo is connected to a workspace in the dashboard. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…socket-baseline-architecture-zpoVU

- Prefix unused caught errors with _ (codex-auth-helper, dashboard-server) - Prefix unused parameters with _ (workspaces, sync-queue, trajectory/config, auth-detection) - Remove unused imports (bulk-ingest.test, drizzle, intro-expiration, daemon/api, cli-auth.test, rate-limiter.test, sync-queue.test, user-directory.test, batched-sqlite-adapter.test, pty-wrapper, tmux-wrapper) - Change let to const where variable is never reassigned (sync-queue.test) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Runs health check endpoint on a separate worker thread to ensure responses even when main event loop is blocked by heavy compute tasks (e.g., next build). Includes: - health-worker.ts: Minimal HTTP server in worker thread - health-worker-manager.ts: Manages worker lifecycle and stats updates - Updated Fly provisioner to use dedicated health port (3889) - Integrated into dashboard-server with fallback to main thread 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Allows users to send Ctrl+C (SIGINT) to agents stuck in loops, giving them an opportunity to refocus. Includes: - AgentManager: interrupt() and interruptByName() methods - Daemon API: POST /agents/:id/interrupt and /agents/by-name/:name/interrupt - Dashboard API: interruptAgent() client method - LogViewerPanel: Interrupt button with warning color styling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

khaliqgant · 2026-01-10T09:30:37Z

cc/ @willwashburn

claude added 4 commits January 10, 2026 01:39

fix: update tests for async stop() and write queue drain

d7b62ad

- cloud-sync.test.ts: await stop() before testing disconnected state - connection.test.ts: wait for setImmediate drain before checking writes

khaliqgant requested a review from Copilot January 10, 2026 03:41

Copilot AI reviewed Jan 10, 2026

View reviewed changes

khaliqgant and others added 7 commits January 10, 2026 00:54

chore: close workspace detection issues

7432e69

Merge branch 'main' of github.com:khaliqgant/agent-relay into claude/…

cea25a1

…socket-baseline-architecture-zpoVU

khaliqgant merged commit 426cf62 into main Jan 10, 2026
7 checks passed

khaliqgant deleted the claude/socket-baseline-architecture-zpoVU branch January 10, 2026 09:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Phase 1 local performance optimizations#126

feat: Phase 1 local performance optimizations#126
khaliqgant merged 11 commits intomainfrom
claude/socket-baseline-architecture-zpoVU

khaliqgant commented Jan 10, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

khaliqgant commented Jan 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

khaliqgant commented Jan 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Local Performance Optimizations

Cloud Sync Improvements

Workspace Reliability

Dashboard Improvements

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

khaliqgant commented Jan 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

khaliqgant commented Jan 10, 2026 •

edited

Loading