Merged
- Create worker_pool.rs with multi-instance model loading
- Each worker has its own QuantizedModelState + Metal GPU device
- Request channel distributes work via tokio mpsc
- Semaphore tracks available workers for backpressure
- Auto-detect worker count based on system memory (~2GB per worker)
- Update InferenceCoordinator: maxConcurrent 1 → 3, reduced cooldowns
- Fall back to a single BF16 instance when LoRA adapters are requested

Before: 1 request/~6s plus a 30s timeout cascade
After: 4 requests/~6s in parallel, no timeouts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
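The auto-detect heuristic above can be sketched as follows. This is a hypothetical TypeScript sketch of the sizing logic (the real implementation lives in `worker_pool.rs` in Rust); `workerCountForMemory` and its parameters are illustrative names, not the actual API:

```typescript
// Budget ~2GB of RAM per loaded model instance, always keep at least one
// worker, and cap at a configured maximum.
const BYTES_PER_WORKER = 2 * 1024 ** 3; // ~2GB per worker (assumption from the commit)

function workerCountForMemory(availableBytes: number, maxWorkers = 4): number {
  const byMemory = Math.floor(availableBytes / BYTES_PER_WORKER);
  return Math.min(maxWorkers, Math.max(1, byMemory));
}
```

The clamp to at least one worker matters: on a memory-constrained machine the pool degrades to the old single-instance behaviour rather than failing to start.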
- Daemons start in dependency order: critical → integration → lightweight
- Critical path (data, command, events, session): 350ms max
- Integration daemons wait for DataDaemon before starting
- Lightweight daemons (health, widget, logger) start immediately
- Phase breakdown metrics logged for observability

Phases:
- critical: 4 daemons, max=207ms (UI can render)
- integration: 7 daemons, max=3518ms (AIProvider bottleneck)
- lightweight: 7 daemons, max=130ms

Total startup: 3531ms (critical path ready much sooner)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
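The wave-based startup above can be sketched as phases of parallel starts, where each phase only begins after the previous one settles. This is a minimal illustration, assuming a `Daemon` shape and a `startInWaves` helper that are hypothetical (the real DaemonOrchestrator API is not shown in this PR text):

```typescript
type Daemon = { name: string; start: () => Promise<void> };

// Daemons within a phase start in parallel; phases themselves are serial,
// so integration daemons cannot begin before the critical phase completes.
async function startInWaves(phases: Daemon[][]): Promise<string[]> {
  const order: string[] = [];
  for (const phase of phases) {
    await Promise.all(
      phase.map(async (d) => {
        await d.start();
        order.push(d.name);
      })
    );
  }
  return order;
}
```

Because the slowest daemon in a phase gates that phase, the total is dominated by the integration wave (the AIProvider bottleneck above), while the UI can render as soon as the critical wave finishes.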
- Fix SystemOrchestrator navigate command (remove invalid --path param)
- Fix launch-and-capture.ts: check ping, refresh if connected, open if not
- Fix SystemMetricsCollector countCommands to use ping instead of browser logs
- Add deterministic UUIDs for seeded users (Joel, Claude Code)
- Improve UserDaemonServer error logging for persona client creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Browser launch:
- ALWAYS open a browser window, don't just reload
- Ensures the user sees something even if the WebSocket is connected but the window is closed
- Both SystemOrchestrator and launch-and-capture now open the browser unconditionally

UUID fix:
- stringToUUID now generates valid 36-char UUIDs (was generating 32-char)
- Last segment is now correctly 12 chars instead of 8

ChatWidget:
- Use the server backend for $in queries (localStorage doesn't support $in)
- Add debug logging for member loading

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
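The UUID fix above comes down to segment lengths: a canonical UUID is 36 characters in an 8-4-4-4-12 hyphenated layout, and the bug emitted only 8 characters in the final segment (32 chars total). The sketch below is a hypothetical stand-in for `stringToUUID` — the hashing scheme (an FNV-1a-style mix) is illustrative, not the project's actual digest:

```typescript
// Deterministically map a string to a canonical 8-4-4-4-12 UUID string.
function stringToUUID(input: string): string {
  let h = 0x811c9dc5;
  let hex = "";
  // Accumulate 32 hex chars (8 per round) from a simple FNV-1a-style mix.
  for (let i = 0; hex.length < 32; i++) {
    h ^= input.charCodeAt(i % input.length) + i;
    h = Math.imul(h, 0x01000193) >>> 0;
    hex += h.toString(16).padStart(8, "0");
  }
  hex = hex.slice(0, 32);
  // The last segment must be 12 chars; the original bug cut it to 8.
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20, 32)}`;
}
```

Determinism is the point for seeded users: the same seed name always resolves to the same UUID across reseeds.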
- DataReadBrowserCommand now supports backend:'server' to bypass localStorage
- ChatWidget uses the server backend for room queries to avoid stale cache
- Fixes members not loading due to localStorage not supporting $in

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
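The routing decision above can be sketched as a predicate: queries that use operators the localStorage adapter can't evaluate (such as `$in`) must go to the server backend. The helper names below (`requiresServerBackend`, `pickBackend`) are hypothetical; the real command takes a `backend: 'server'` option:

```typescript
type Query = Record<string, unknown>;

// True if any field in the query uses the $in operator, which the
// localStorage-backed adapter cannot evaluate.
function requiresServerBackend(query: Query): boolean {
  return Object.values(query).some(
    (v) => typeof v === "object" && v !== null && "$in" in v
  );
}

function pickBackend(query: Query): "server" | "local" {
  return requiresServerBackend(query) ? "server" : "local";
}
```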
CONSOLIDATION (Ministry of Code Deletion):
- RoutingService is now THE single source of truth for room/user resolution
- Added resolveRoomIdentifier() and resolveUserIdentifier() convenience functions
- Added a name-fallback query for legacy support
- Migrated ChatSendServerCommand: deleted findRoom(), uses RoutingService
- Migrated ChatAnalyzeServerCommand: deleted resolveRoom(), uses RoutingService
- Migrated ChatPollServerCommand: deleted inline resolution, uses RoutingService
- WallTypes.isRoomUUID() now delegates to RoutingService.isUUID()
- MainWidget: deleted dead handleTabClick/handleTabClose, simplified openContentTab

ETHOS (CLAUDE.md):
- Added "The Compression Principle": one logical decision, one place
- Added "The Methodical Process": 8 mandatory steps, outlier validation
- Encoded the Ministry philosophy: deletion without loss = compression = efficiency

Net change: +306/-298 lines (compression-neutral while adding documentation)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
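The single-source-of-truth resolution above can be sketched as: accept either a UUID or a legacy room name, and resolve to the canonical id in one place. This is a hypothetical sketch — the real `resolveRoomIdentifier()` queries the database for the name fallback rather than taking a map:

```typescript
// Canonical 8-4-4-4-12 UUID shape (what RoutingService.isUUID checks).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUUID(value: string): boolean {
  return UUID_RE.test(value);
}

// Resolve a room identifier: pass UUIDs through, fall back to a name lookup
// for legacy callers. `byName` stands in for the real database query.
function resolveRoomIdentifier(value: string, byName: Map<string, string>): string | undefined {
  if (isUUID(value)) return value;
  return byName.get(value.toLowerCase());
}
```

With every caller delegating here, a change to resolution rules (say, accepting slugs) is one edit, not four — the "one logical decision, one place" principle above.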
Key fixes:
- data-clear.ts: Clear session metadata during reseed to prevent stale entityIds from persisting (root cause of the corrupted-UUID bug)
- MainWidget.ts: Add userId setup with retry in openContentFromUrl() and initializeContentTabs() so ContentService can persist to the database
- RoutingService.ts: Fix example UUID in a comment
- SchemaBasedFactory.ts: Fix hardcoded test UUID

The corrupted UUID (5e71a0c8-0303-4eb8-a478-3a121248) was caused by stale session metadata files that weren't cleared during data reseed. The session files stored old entityIds that no longer existed after reseeding the database.

ContentState persistence now works: tabs are saved to the database with correct UUIDs. Tab restore on refresh still needs investigation due to session-management timing issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: The browser's OfflineStorageAdapter caches user_states in localStorage. When tabs are opened, the server database is updated, but localStorage retains stale data. On page refresh, loadUserContext() would get old cached data with fewer/no openItems.

Fix: Add `backend: 'server'` to the user_states query in loadUserContext() to bypass the localStorage cache and always fetch fresh contentState from the server database.

Also added temporary debug logging to help diagnose initialization-timing issues between loadUserContext() and initializeContentTabs().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- inference-grpc: Fix dead code, use pool stats, proper strip_prefix
- data-daemon: Fix HDD acronym, add type alias for complex type
- inference: Collapse nested if-let
- model.rs: Use struct literal instead of removed new()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: UserDaemon's initializeDeferred() tried to create persona clients before JTAGSystem.daemons was populated. DataDaemon emits system:ready during its initialize() phase, triggering UserDaemon's ensurePersonaClients(), which needs CommandDaemon. But CommandDaemon wasn't yet registered to JTAGSystem.daemons (that only happens AFTER orchestrator.startAll() returns).

Fix:
1. CommandDaemonServer now registers itself on globalThis during its initialize() phase, providing early access for other daemons
2. JTAGSystem.getCommandsInterface() now checks globalThis first, falling back to this.daemons for compatibility

Also fixed a Clippy duplicate_mod warning in training-worker:
- logger_client.rs now re-exports JTAG protocol types
- messages.rs uses the re-exports instead of including jtag_protocol directly

Verified: all 11 AI personas are now healthy and responding to messages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
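The early-registration fix above can be sketched as: the command daemon publishes itself on `globalThis` inside `initialize()`, and lookups check `globalThis` before the registry that is only populated after `startAll()` returns. The registry key and interface shape below are illustrative, not the real JTAGSystem API:

```typescript
interface CommandsInterface { execute(name: string): string }

const REGISTRY_KEY = "__jtagCommandDaemon"; // hypothetical key name

class CommandDaemonServer implements CommandsInterface {
  execute(name: string): string { return `ran:${name}`; }
  initialize(): void {
    // Register early so daemons reacting to system:ready in the same startup
    // wave can find us before orchestrator.startAll() returns.
    (globalThis as unknown as Record<string, unknown>)[REGISTRY_KEY] = this;
  }
}

// Check the early-registration slot first, then fall back to the normal
// daemon registry for compatibility.
function getCommandsInterface(daemons: Map<string, CommandsInterface>): CommandsInterface | undefined {
  const early = (globalThis as unknown as Record<string, unknown>)[REGISTRY_KEY];
  return (early as CommandsInterface | undefined) ?? daemons.get("command");
}
```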
- Per-persona inference logging confirmed (Helper AI, Teacher AI, etc.)
- System utilities correctly show [unknown] in Rust logs
- AI responses verified working via Candle gRPC and cloud APIs
- Version bump 1.0.7184

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. SignalDetector: switch from local (slow) to Groq (fast)
   - Was flooding the local inference queue with classification calls
   - Groq responds in <1s vs ~10s locally
   - Frees the local queue for actual persona responses

2. CandleGrpcAdapter: add prompt truncation (24K-char limit)
   - Prevents "narrow invalid args" tensor-dimension errors
   - Large RAG contexts were sending 74,000+ char prompts
   - The model has an 8K-token (~32K-char) context window
   - Truncation preserves the system prompt + recent messages

Before: constant queue backlog, tensor errors, hangs
After: workers have availability, no tensor errors, faster responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
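The truncation policy above (always keep the system prompt, then keep messages newest-first until the budget is spent) can be sketched like this. `truncatePrompt` and its joining scheme are hypothetical; the real adapter assembles the prompt differently:

```typescript
// Keep the system prompt unconditionally, then walk messages from most
// recent backwards so recency wins when the character budget runs out.
function truncatePrompt(system: string, messages: string[], limit = 24_000): string {
  const kept: string[] = [];
  let used = system.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].length > limit) break;
    kept.unshift(messages[i]);
    used += messages[i].length;
  }
  return [system, ...kept].join("\n");
}
```

Dropping from the oldest end is what keeps a 74K-char RAG context from overflowing the ~32K-char window while the latest turns of the conversation survive intact.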
The chat widget was unlatching from the bottom when large messages arrived because the fixed 200px threshold was too small.

Changes:
- Add isLatchedToBottom state to track user intent
- Dynamic threshold: the max of the configured value, 50% of the viewport, or 500px
- ResizeObserver checks latch state instead of distance
- Scroll handler updates the latch with a tighter 100px threshold
- Scroll listener is active when autoScroll is enabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
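The dynamic threshold above reduces to a pure computation, sketched here with hypothetical helper names (the real widget reads these values from the DOM and its config):

```typescript
// A fixed 200px threshold breaks when one incoming message is taller than
// that, so take the max of the configured value, half the viewport, and a
// 500px floor.
function latchThreshold(configPx: number, viewportPx: number): number {
  return Math.max(configPx, viewportPx * 0.5, 500);
}

// Latched to the bottom if the gap below the visible area is within the threshold.
function isLatched(scrollTop: number, viewportPx: number, contentPx: number, configPx: number): boolean {
  const distanceFromBottom = contentPx - (scrollTop + viewportPx);
  return distanceFromBottom <= latchThreshold(configPx, viewportPx);
}
```

Scaling with the viewport means a tall message can push content up by nearly half a screen without the widget deciding the user scrolled away.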
scrollToEnd was called immediately after adding items to the DOM, but the browser hadn't laid them out yet. Using a double requestAnimationFrame ensures the DOM is fully rendered before scrolling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pull request overview
This PR appears to be a major refactoring focused on daemon/worker infrastructure: memory management for inference workers, a worker-pool implementation for concurrent inference, improved process-management scripts, and extensive cleanup of deprecated test and script files.
Changes:
- Added memory limits and worker pool support for inference workers
- Refactored Rust worker modules to avoid duplicate code and improve structure
- Improved shell scripts for starting/stopping workers with better process tracking
- Updated TypeScript widgets and system core for better content state management
- Removed 40+ deprecated test and script files
Reviewed changes
Copilot reviewed 142 out of 144 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| workers-config.json | Added memory limits configuration and per-worker memory settings |
| workers/training/src/messages.rs | Refactored to re-export protocol types from logger_client |
| workers/stop-workers.sh | Updated to use binary names from config for process termination |
| workers/start-workers.sh | Added memory limit parsing and per-worker log files |
| workers/shared/logger_client.rs | Added re-exports of JTAG protocol types |
| workers/inference-grpc/* | Added worker pool, GPU synchronization, persona tracking |
| widgets/* | Improved content state management and user identifier handling |
| system/core/* | Added DaemonOrchestrator for wave-based parallel startup |
| system/routing/* | Added room name lookup and server-side resolution functions |
| Multiple scripts/* | Deleted 40+ deprecated test/utility scripts |
Files not reviewed (1)
- src/debug/jtag/package-lock.json: Language not supported
Copilot's inline comment, on this snippet:

```typescript
    queuedMessages: daemon.startupQueueSize
  });

  const totalMs = endTime - startTime;
```

Unused variable `totalMs`.
Summary
Brief description of changes and why they're needed
Change Type & Scale
Scale:
Testing & Verification
- `npm run lint` passes
- `npm test` passes
- `python python-client/ai-portal.py --cmd tests` passes

AI Development Notes
Status & Readiness
Known Issues: (if any)
Files Changed
List key files and why they changed
Related Issues
Fixes #(issue) or Relates to #(issue)
For AI Agents: use `python python-client/ai-portal.py --dashboard` to verify system health after merging