
Conversation

olesho
Collaborator

@olesho olesho commented Sep 18, 2025

Summary by CodeRabbit

  • New Features

    • Centralized LLM configuration manager with UI API, per-request overrides, cross-tab sync, and a runtime JSON‑RPC configure_llm.
  • Improvements

    • Multi-provider support with nested model tiers (main/mini/nano), per-request api_key/endpoint, automated mode defaults/bypass, credential refresh, cancellation support, richer agent descriptors and tracing, MCP connectors UI and provider integrations, and tool name/behavior updates (e.g., unified extractor name).
  • Documentation

    • Added design docs, provider docs, .env example, and runtime/examples for multi-provider evaluation flows.

coderabbitai bot commented Sep 18, 2025

Walkthrough

Adds a centralized frontend LLMConfigurationManager, a configure_llm JSON‑RPC on the eval server, nested/per-request and persistent LLM model/provider overrides, automated-mode credential bypasses, cross‑tab sync, examples/docs/config updates, and wiring so evaluations use the unified configuration and per-client overrides.

Changes

Cohort / File(s) Summary of Changes
Design & Docs
MODEL-CONFIGS.md, eval-server/nodejs/CLAUDE.md
New LLM config design doc; CLAUDE.md expanded to document multi‑provider support, configure_llm JSON‑RPC and nested per‑tier model configuration.
Server env & config
eval-server/nodejs/.env.example, eval-server/nodejs/src/config.js
Adds .env template and expands server CONFIG with multi-provider creds (openai, litellm, groq, openrouter), defaults (provider/main/mini/nano), auth secret, and clients/evals dirs.
Eval server RPC & API
eval-server/nodejs/src/lib/EvalServer.js, eval-server/nodejs/src/api-server.js
Adds JSON‑RPC handling (configure_llm), per-connection llmConfig storage/merge, nested model processing helpers, wiring to pass nested model into evaluations, and getClientManager() export.
Server examples
eval-server/nodejs/examples/*
Examples import CONFIG, introspect providers, demonstrate per‑eval model selection across providers, adjust ports, and show automated/no‑auth flows.
Frontend build & entry
front_end/panels/ai_chat/BUILD.gn
Adds many new modules to ai_chat build (including core/LLMConfigurationManager.ts, agent implementations, MCP dialogs, and MCP SDK v2 entries).
LLM configuration manager
front_end/panels/ai_chat/core/LLMConfigurationManager.ts
New singleton LLMConfigurationManager and LLMConfig interface: localStorage-backed config, per‑request overrides, persistence, listeners, cross‑tab sync, validation and redacted logging.
Agent wiring & execution
front_end/panels/ai_chat/core/AgentService.ts, .../AgentNodes.ts, .../ConfigurableAgentTool.ts, agent_framework/AgentRunner.ts
Replace ad-hoc localStorage reads with manager APIs; provider array construction (openai/litellm/groq/openrouter); config-change subscription and refreshCredentials(); execution abort support, execution registry (AbortController), propagate agentDescriptor metadata through traces and tools.
Evaluation protocol & agent
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts, .../EvaluationAgent.ts, evaluation/EvaluationAgent.ts
Adds LLMConfigurationRequest/Response types and helpers; extends EvaluationParams.model with api_key/endpoint; applies per-request overrides (nested or flat) via manager during evaluate/chat flows; adds handler to persist configuration; automated-mode auth bypass.
Frontend UI & config
front_end/panels/ai_chat/ui/AIChatPanel.ts, front_end/panels/ai_chat/common/EvaluationConfig.ts
Integrates manager into AIChatPanel, adds setLLMConfiguration, reads models/provider from manager, updates WS default to ws://localhost:8082, and conditions secret handling for automated mode.
MCP & tooling surface
front_end/panels/ai_chat/mcp/*, core/ToolSurfaceProvider.ts, core/ToolNameMap.ts
Major MCP v2 integration: new MCPConfig/MCPRegistry/MCP dialogs, provider/token management, per-server connection/retry/events, async tool registration, LLM-based MCP tool router, tool name mapping changes and tests.
Tools & extractors
front_end/panels/ai_chat/tools/*, eval-server/nodejs/evals/schema-extractor/*, schemas/client.schema.json
Rename extractor tool id extract_schema_data → extract_data across tools, evals, schemas, and tests; add abortable/sleep helpers and thread AbortSignal through tools.
Agent descriptors & metadata
front_end/panels/ai_chat/core/AgentDescriptorRegistry.ts, core/BaseOrchestratorAgent.ts
New AgentDescriptorRegistry for computing/caching agent descriptors (promptHash, toolsetHash); orchestrator/agent configs include optional version; descriptor used to enrich traces and evaluation metadata.
Tests & examples
front_end/panels/ai_chat/**/__tests__/*, eval-server/nodejs/examples/*
New and updated tests for ToolNameMap, MCP integration/registration/config, ToolExecutorNode; many import path updates; examples expanded for multi-provider demos.
Misc frontend changes
core/StateGraph.ts, core/Types.ts, core/ConfigurableGraph.ts, core/State.ts, core/Logger.ts, UI components, CSS
Add AbortSignal support across graphs/types/nodes, public Logger.isDevelopment(), input APIs (InputBar.setInputValue/clearInput), UI additions (MCP dialogs, chat example chips), and version bump.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant UI as Frontend (AIChatPanel)
  participant CM as LLMConfigurationManager
  participant Eval as EvalServer (Node)
  Note over UI,Eval: Persistent runtime configuration (configure_llm)
  UI->>Eval: JSON‑RPC configure_llm { provider, apiKey?, endpoint?, models, partial? }
  Eval->>Eval: validate + merge per‑connection llmConfig
  Eval-->>UI: JSON‑RPC response { status: "success", appliedConfig }
  UI->>CM: saveConfiguration(appliedConfig)
  UI->>UI: AgentService.refreshCredentials()
sequenceDiagram
  autonumber
  participant Client as Frontend Client
  participant CM as LLMConfigurationManager
  participant Eval as EvalServer
  Note over Client,Eval: Per-request model override during evaluate/chat
  Client->>CM: setOverride({ provider, mainModel, miniModel?, nanoModel?, apiKey?, endpoint? })
  Client->>Eval: evaluate { model: nested-or-flat config }
  Eval->>Eval: merge connection llmConfig + eval overrides -> nested modelConfig
  Eval-->>Client: evaluation result
  Client->>CM: clearOverride()

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • tysonthomas9

Poem

In burrows of bytes I nudge config to bloom,
Models line like carrots—main, mini, nano in room.
A configure_llm hop, JSON‑RPC sings clear,
Overrides flutter, automated mode near.
Logs glow like clover—evals leap, joyful cheer. 🐇🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 46.30%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title "Local Dockerised Eval Server" references a real part of this changeset (multiple eval-server files and configuration were modified), so it is related to the PR. However, it captures only a subset of the work: the diff also contains extensive front-end and protocol changes (LLMConfigurationManager, MCP UI/connectors, agent descriptors, tool renames, and JSON-RPC configure_llm support) that the title does not reflect, making it a partial summary rather than a complete description.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/local-eval-api-try3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


claude bot commented Sep 18, 2025

Claude finished @olesho's task —— View job


PR Review: Local Dockerised Eval Server

I've completed a comprehensive review of this substantial PR implementing a local dockerized evaluation server with advanced LLM configuration management. This is an impressive addition with solid architecture, though there are several areas for improvement.

Review Progress:

  • Analyze core eval server changes
  • Review LLM configuration management
  • Check security implementations
  • Evaluate error handling and validation
  • Assess performance considerations
  • Review test coverage needs
  • Provide comprehensive feedback

🎯 Overall Assessment

This PR introduces a sophisticated evaluation server system with excellent separation of concerns and flexible configuration management. The implementation demonstrates strong architectural decisions with room for security and robustness improvements.

✅ Strengths

Architecture & Design

  • Excellent modular architecture with clear separation between eval server, agent service, and LLM management
  • Flexible LLM configuration system supporting both manual and automated modes with override capabilities
  • Comprehensive protocol design with proper JSON-RPC implementation and type safety
  • Well-structured tracing integration for observability

Code Quality

  • Consistent error handling patterns across most components
  • Good logging implementation with appropriate detail levels
  • Proper WebSocket connection management with heartbeat and reconnection logic
  • Comprehensive model configuration documentation (MODEL-CONFIGS.md)

🔍 Areas for Improvement

Critical Security Issues

1. API Key Exposure Risk (api-server.js:349, EvaluationAgent.ts:377-392)

// ISSUE: API keys logged at info level
logger.info('Setting early configuration override', {
  hasApiKey: !!apiKey,
  apiKeyLength: apiKey?.length  // Should not log length
});

Recommendation: Never log API key lengths or related metadata. Use debug level for configuration details.
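
As a minimal sketch (the helper name and field list are illustrative, not from this PR), a redaction pass applied before anything reaches the logger:

// Hypothetical redaction helper; field names are assumptions based on the
// config shapes in this PR, not an existing utility.
const SECRET_KEYS = new Set(['apiKey', 'api_key', 'authKey', 'secretKey']);

function redactSecrets<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map(redactSecrets) as unknown as T;
  }
  if (value && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = SECRET_KEYS.has(k) ? '[redacted]' : redactSecrets(v);
    }
    return out as T;
  }
  return value;
}

// Usage: logger.debug('Setting early configuration override', redactSecrets(override));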

2. Insufficient Input Validation (api-server.js:263-264)

if (!requestBody.input || typeof requestBody.input !== 'string') {
  throw new Error('Missing or invalid "input" field. Expected a string.');
}

Recommendation: Add comprehensive validation for all request parameters, including size limits and sanitization.
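
A sketch of tighter validation, assuming the `input: string` field shown above; the size limit and helper name are illustrative:

// Sketch of request-body validation for the HTTP wrapper.
const MAX_INPUT_LENGTH = 64 * 1024; // 64 KB, arbitrary example limit

interface ResponsesRequest {
  input: string;
}

function validateResponsesRequest(body: unknown): ResponsesRequest {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Request body must be a JSON object.');
  }
  const { input } = body as Record<string, unknown>;
  if (typeof input !== 'string' || input.length === 0) {
    throw new Error('Missing or invalid "input" field. Expected a non-empty string.');
  }
  if (input.length > MAX_INPUT_LENGTH) {
    throw new Error(`"input" exceeds the ${MAX_INPUT_LENGTH}-character limit.`);
  }
  return { input: input.trim() };
}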

3. CORS Security (api-server.js:61)

res.setHeader('Access-Control-Allow-Origin', '*');

Recommendation: Replace wildcard CORS with specific allowed origins for production use.
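
A sketch of an env-driven allowlist (the ALLOWED_ORIGINS variable name is an assumption, not part of this PR):

import type { IncomingMessage, ServerResponse } from 'node:http';

const allowedOrigins = new Set(
  (process.env.ALLOWED_ORIGINS ?? 'http://localhost:8080').split(',').map(o => o.trim())
);

function applyCors(req: IncomingMessage, res: ServerResponse): void {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.has(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Vary', 'Origin');
  }
  // Unknown origins get no CORS header at all, instead of '*'.
}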

Memory & Performance Concerns

4. Potential Memory Leaks (EvalServer.js:117-123)

for (const [clientId, connection] of this.connectedClients) {
  connection.rpcClient.cleanup();
  // Missing: Remove event listeners and clear timers
}

Recommendation: Ensure all event listeners, intervals, and timeouts are properly cleaned up.
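
A sketch of a fuller teardown, assuming hypothetical `heartbeatTimer` and WebSocket fields on the tracked connection (the actual EvalServer.js shape may differ):

interface TrackedConnection {
  rpcClient: { cleanup(): void };
  ws: { removeAllListeners(): void; terminate(): void };
  heartbeatTimer?: ReturnType<typeof setInterval>;
}

function shutdownConnections(connectedClients: Map<string, TrackedConnection>): void {
  for (const [, connection] of connectedClients) {
    connection.rpcClient.cleanup();
    if (connection.heartbeatTimer) {
      clearInterval(connection.heartbeatTimer); // stop heartbeat timer
    }
    connection.ws.removeAllListeners();          // drop event listeners
    connection.ws.terminate();                   // close the socket
  }
  connectedClients.clear();
}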

5. Missing Request Size Limits (api-server.js:88-91)

for await (const chunk of req) {
  body += chunk; // No size limit
}

Recommendation: Implement request size limits to prevent DoS attacks.
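
A sketch of a size-capped body reader (the 1 MB cap is an illustrative default, not a value from this PR):

import type { IncomingMessage } from 'node:http';

const MAX_BODY_BYTES = 1024 * 1024; // 1 MB

async function readBody(req: IncomingMessage): Promise<string> {
  let received = 0;
  const chunks: Buffer[] = [];
  for await (const chunk of req) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    received += buf.length;
    if (received > MAX_BODY_BYTES) {
      req.destroy(); // stop reading from the socket
      throw new Error(`Request body exceeds ${MAX_BODY_BYTES} bytes.`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks).toString('utf8');
}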

Error Handling & Reliability

6. Incomplete Error Context (EvaluationAgent.ts:633-670)

} catch (error) {
  const errorMessage = error instanceof Error ? error.message : 'Unknown error';
  // Missing: Error stack traces, evaluation context
}

Recommendation: Include evaluation context and stack traces in error responses for better debugging.

7. Inconsistent Timeout Handling (api-server.js:415, EvaluationAgent.ts:573)

timeout: 1500000, // 25 minutes - very long

Recommendation: Implement configurable timeouts with reasonable defaults and proper cleanup on timeout.
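
A sketch of a configurable timeout wrapper with guaranteed cleanup; the EVALUATION_TIMEOUT_MS variable and 5-minute default are assumptions:

const DEFAULT_TIMEOUT_MS = Number(process.env.EVALUATION_TIMEOUT_MS ?? 5 * 60 * 1000);

async function withTimeout<T>(work: Promise<T>, ms: number = DEFAULT_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Evaluation timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer) {
      clearTimeout(timer); // always clear so the timer cannot keep the process alive
    }
  }
}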

Code Quality Issues

8. Complex Configuration Logic (EvaluationAgent.ts:352-412)
The model configuration parsing has nested conditionals and multiple fallbacks that are hard to follow.
Recommendation: Extract to dedicated configuration parser with unit tests.
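
A sketch of such a parser for the nested-or-flat wire format; the field names follow the protocol in this PR, but the function shape and fallback order are illustrative:

type Provider = 'openai' | 'litellm' | 'groq' | 'openrouter';

interface TierConfig { provider?: Provider; model?: string; api_key?: string; }

interface WireModelConfig {
  provider?: Provider;
  api_key?: string;
  endpoint?: string;
  main_model?: string | TierConfig;
  mini_model?: string | TierConfig;
  nano_model?: string | TierConfig;
}

interface ParsedModelConfig {
  provider: Provider;
  apiKey?: string;
  endpoint?: string;
  mainModel?: string;
  miniModel?: string;
  nanoModel?: string;
}

function parseModelConfig(raw: WireModelConfig): ParsedModelConfig {
  // Accept both "main_model": "gpt-4" and "main_model": { model, provider, api_key }.
  const tier = (t?: string | TierConfig): TierConfig =>
    typeof t === 'string' ? { model: t } : (t ?? {});
  const main = tier(raw.main_model);
  const mini = tier(raw.mini_model);
  const nano = tier(raw.nano_model);
  return {
    provider: main.provider ?? raw.provider ?? 'openai',
    apiKey: main.api_key ?? raw.api_key,
    endpoint: raw.endpoint,
    mainModel: main.model,
    miniModel: mini.model ?? main.model,
    nanoModel: nano.model ?? mini.model ?? main.model,
  };
}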

9. Missing Validation (LLMConfigurationManager.ts:157-167)

setOverride(config: Partial<LLMConfig>): void {
  this.overrideConfig = { ...config }; // No validation
}

Recommendation: Add validation for provider compatibility and required fields.
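
A sketch of what that validation could look like; the rules below (especially the LiteLLM endpoint-only allowance) are illustrative, not the manager's actual behavior:

type LLMProvider = 'openai' | 'litellm' | 'groq' | 'openrouter';

interface LLMConfig {
  provider: LLMProvider;
  apiKey?: string;
  endpoint?: string;
  mainModel?: string;
  miniModel?: string;
  nanoModel?: string;
}

function validateOverride(config: Partial<LLMConfig>): void {
  const providers: LLMProvider[] = ['openai', 'litellm', 'groq', 'openrouter'];
  if (config.provider && !providers.includes(config.provider)) {
    throw new Error(`Unsupported provider: ${config.provider}`);
  }
  if (config.apiKey !== undefined && config.apiKey.trim() === '') {
    throw new Error('apiKey, when provided, must be a non-empty string');
  }
  // LiteLLM can run without an API key only when an endpoint is supplied.
  if (config.provider === 'litellm' && !config.endpoint && !config.apiKey) {
    throw new Error('LiteLLM override requires an endpoint or an API key');
  }
}

// setOverride(config) would call validateOverride(config) before assigning.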

🧪 Test Coverage Gaps

The PR lacks comprehensive test coverage for:

  • Configuration override scenarios
  • Error handling edge cases
  • WebSocket connection failure scenarios
  • Multi-provider model configuration
  • Tracing integration

Recommendation: Add unit tests for core components and integration tests for the evaluation flow.

🚀 Performance Recommendations

1. Connection Pooling

Implement connection pooling for HTTP requests to external LLM providers to reduce latency.
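
A sketch using Node's built-in https.Agent keep-alive pooling; pool sizes are illustrative:

import https from 'node:https';

// Shared keep-alive agent so repeated provider calls reuse TCP connections.
const providerAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 10,     // concurrent sockets per host
  maxFreeSockets: 5,  // idle sockets kept warm
});

function postJson(url: string, body: unknown): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = https.request(url, {
      method: 'POST',
      agent: providerAgent,
      headers: { 'Content-Type': 'application/json' },
    }, res => {
      let data = '';
      res.on('data', chunk => { data += chunk; });
      res.on('end', () => resolve(data));
    });
    req.on('error', reject);
    req.end(JSON.stringify(body));
  });
}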

2. Configuration Caching

Cache parsed configurations to avoid repeated localStorage access.
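
A sketch of a parse-once cache invalidated by the same 'storage' event the manager already listens to for cross-tab sync:

function createConfigCache<T>(load: () => T): { get(): T; invalidate(): void } {
  let cached: T | null = null;
  const cache = {
    get(): T {
      if (cached === null) {
        cached = load(); // single localStorage read + parse
      }
      return cached;
    },
    invalidate(): void {
      cached = null;
    },
  };
  // Another tab changed the config; re-read lazily on next access.
  window.addEventListener('storage', () => cache.invalidate());
  return cache;
}

// Usage: const configCache = createConfigCache(() => manager.getConfiguration());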

3. Request Debouncing

Add debouncing for rapid configuration changes to prevent unnecessary reinitializations.
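
A sketch of a generic debounce applied to the config-change handler; the 250 ms window and handler name are assumptions:

function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Only the last change in a 250 ms burst triggers a graph rebuild, e.g.:
// const onConfigChanged = debounce(() => agentService.refreshCredentials(), 250);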

🔧 Implementation Suggestions

1. Enhanced Error Types

export class EvaluationError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly evaluationId?: string,
    public readonly context?: Record<string, any>
  ) {
    super(message);
  }
}

2. Configuration Validation Schema

const CONFIG_SCHEMA = {
  provider: { type: 'string', enum: ['openai', 'litellm', 'groq', 'openrouter'] },
  apiKey: { type: 'string', minLength: 10 },
  models: { type: 'object', required: ['main'] }
};

3. Request Rate Limiting

Implement rate limiting for evaluation requests to prevent abuse.
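
A sketch of a fixed-window per-client limiter for evaluation requests; the 30-per-minute limit is illustrative:

const WINDOW_MS = 60_000;
const MAX_REQUESTS_PER_WINDOW = 30;

const windows = new Map<string, { start: number; count: number }>();

function allowRequest(clientId: string, now: number = Date.now()): boolean {
  const win = windows.get(clientId);
  if (!win || now - win.start >= WINDOW_MS) {
    windows.set(clientId, { start: now, count: 1 }); // start a new window
    return true;
  }
  if (win.count >= MAX_REQUESTS_PER_WINDOW) {
    return false; // caller should respond with HTTP 429
  }
  win.count += 1;
  return true;
}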

📋 Immediate Action Items

High Priority

  1. Fix API key logging - Remove all API key metadata from logs
  2. Add input validation - Implement comprehensive request validation
  3. Configure CORS properly - Replace wildcard with specific origins
  4. Add request size limits - Prevent DoS via large payloads

Medium Priority

  1. Improve error handling - Add context and stack traces
  2. Add configuration validation - Validate provider compatibility
  3. Implement cleanup - Ensure proper resource cleanup on shutdown
  4. Add unit tests - Test core configuration and evaluation logic

Low Priority

  1. Optimize performance - Add connection pooling and caching
  2. Enhanced documentation - Add API documentation and examples

🏆 Conclusion

This is a well-architected and feature-rich addition that significantly enhances the evaluation capabilities. The core design is solid with excellent separation of concerns. The main concerns are around security hardening, comprehensive testing, and production readiness.

Recommendation: Address the security issues and add basic test coverage before merging. The architecture is sound and will provide a strong foundation for evaluation workflows.

Overall Score: 7.5/10 - Excellent architecture with important security/robustness improvements needed.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
front_end/panels/ai_chat/ui/AIChatPanel.ts (3)

51-63: Stop logging/previewing secrets in StorageMonitor.
Current logs emit value length/preview for keys containing 'openrouter' or 'ai_chat'. This risks API key leakage in dev/prod logs.

-    localStorage.setItem = (key: string, value: string) => {
-      if (key.includes('openrouter') || key.includes('ai_chat')) {
-        logger.debug(`=== LOCALSTORAGE SET ===`);
-        logger.debug(`Key: ${key}`);
-        logger.debug(`Value exists: ${!!value}`);
-        logger.debug(`Value length: ${value?.length || 0}`);
-        logger.debug(`Value preview: ${value?.substring(0, 50) + (value?.length > 50 ? '...' : '') || 'null'}`);
-        logger.debug(`Timestamp: ${new Date().toISOString()}`);
-      }
-      return this.originalSetItem(key, value);
-    };
+    localStorage.setItem = (key: string, value: string) => {
+      if (key.includes('openrouter') || key.includes('ai_chat')) {
+        logger.debug('=== LOCALSTORAGE SET ===');
+        logger.debug(`Key: ${key}`);
+        logger.debug('Value: [redacted]');
+        logger.debug(`Timestamp: ${new Date().toISOString()}`);
+      }
+      return this.originalSetItem(key, value);
+    };
@@
-      if (key.includes('openrouter') || key.includes('ai_chat')) {
+      if (key.includes('openrouter') || key.includes('ai_chat')) {
         logger.debug(`=== LOCALSTORAGE REMOVE ===`);
         logger.debug(`Key: ${key}`);
         logger.debug(`Timestamp: ${new Date().toISOString()}`);
       }
       return this.originalRemoveItem(key);
     };

Also gate StorageMonitor behind a debug flag to avoid shipping monkey‑patching:

-    // Initialize storage monitoring for debugging
-    StorageMonitor.getInstance();
+    // Initialize storage monitoring for debugging
+    if (BUILD_CONFIG.DEBUG) {
+      StorageMonitor.getInstance();
+    }

Also applies to: 66-73


1057-1076: Remove API‑key visibility logs on OAuth success.
Logging “API key length” and method is unnecessary and leaks sensitive metadata.

-      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
-      const authMethod = localStorage.getItem('openrouter_auth_method');
-      logger.info('- API key exists:', !!apiKey);
-      logger.info('- API key length:', apiKey?.length || 0);
-      logger.info('- Auth method:', authMethod);
+      // Avoid logging credential presence/length in production
+      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
+      const authMethod = localStorage.getItem('openrouter_auth_method');
+      logger.info('OpenRouter OAuth success');

1496-1527: Avoid logging API key prefix/length.
These logs leak secrets to console/storage. Log boolean presence only.

-        logger.info('Retrieved API key:');
-        logger.info('- Exists:', !!apiKey);
-        logger.info('- Length:', apiKey?.length || 0);
-        logger.info('- Prefix:', apiKey?.substring(0, 8) + '...' || 'none');
+        logger.info('Retrieved API key: [present=%s]', !!apiKey);
eval-server/nodejs/src/config.js (1)

65-71: Enforce OPENAI_API_KEY per guidelines.
Guideline requires OPENAI_API_KEY (and PORT default). validateConfig currently makes it optional unless requireLLM=true; make it required by default.

-export function validateConfig(requireLLM = false) {
+export function validateConfig(requireLLM = true) {
@@
-  // Only require OpenAI API key if LLM judge is explicitly needed
-  if (requireLLM && !CONFIG.llm.apiKey) {
+  // Require OpenAI API key by default for evaluator/judge usage
+  if (!CONFIG.llm.apiKey) {
     errors.push('OPENAI_API_KEY is required when using LLM judge');
   }

Optionally warn if no provider creds exist (CONFIG.providers.*) even when judge not used.

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (2)

212-218: Global LLM override + maxConcurrency=3 risks cross‑evaluation credential bleed. Set to 1 or make overrides per‑evaluation.

Because LLMConfigurationManager override is global, concurrent evaluations can stomp each other’s API key/provider. Until per‑call config is plumbed, reduce concurrency here.

Apply this diff:

-        maxConcurrency: 3,
+        // NOTE: Global LLM override is not concurrency-safe. Keep at 1 until per-eval config is implemented.
+        maxConcurrency: 1,

If you prefer to keep 3, we need to make AgentService/LLM client accept per-call config and eliminate the global override path.


335-342: Redact sensitive payloads from logs (models, keys, message text).

Avoid logging raw params.model, merged inputs, or full user message. Keep booleans/lengths only.

Apply this diff:

-    logger.info('Received evaluation request', {
+    logger.info('Received evaluation request', {
       evaluationId: params.evaluationId,
       tool: params.tool,
       url: params.url,
       isChat: params.tool === 'chat',
-      modelOverride: params.input?.ai_chat_model,
-      modelConfig: params.model
+      modelOverride: !!params.input?.ai_chat_model,
+      hasModelConfig: !!params.model
     });
-    logger.info('DEBUG: Checking for model configuration override', {
-      evaluationId: params.evaluationId,
-      hasModel: !!params.model,
-      model: params.model
-    });
+    logger.info('DEBUG: Checking for model configuration override', {
+      evaluationId: params.evaluationId,
+      hasModel: !!params.model
+    });
-        logger.info('DEBUG: Created merged input for chat evaluation', {
-          evaluationId: params.evaluationId,
-          hasOverrideConfig: !!overrideConfig,
-          overrideConfig,
-          mergedInput: {
-            ...mergedInput,
-            api_key: mergedInput.api_key ? `${mergedInput.api_key.substring(0, 10)}...` : undefined
-          }
-        });
+        logger.info('DEBUG: Created merged input for chat evaluation', {
+          evaluationId: params.evaluationId,
+          hasOverride: configManager.hasOverride?.() || !!overrideConfig,
+          mergedInputPreview: {
+            provider: mergedInput.provider,
+            main_model: mergedInput.main_model,
+            mini_model: mergedInput.mini_model,
+            nano_model: mergedInput.nano_model,
+            has_api_key: !!mergedInput.api_key
+          }
+        });
-    logger.info('Starting chat evaluation', {
-      message: input.message,
+    logger.info('Starting chat evaluation', {
+      messageLength: String(input.message || '').length,
       timeout,
       hasTracingContext: !!tracingContext
     });

Also applies to: 346-351, 367-375, 561-569, 821-826

🧹 Nitpick comments (26)
eval-server/nodejs/.env.example (1)

31-34: Satisfy dotenv-linter ordering and trailing newline

Pure formatting, but easy to fix to keep linters green and examples clean.

Apply:

-PORT=8080
-HOST=127.0.0.1
+HOST=127.0.0.1
+PORT=8080
@@
-LITELLM_ENDPOINT=http://localhost:4000
-LITELLM_API_KEY=your-litellm-api-key-here
+LITELLM_API_KEY=your-litellm-api-key-here
+LITELLM_ENDPOINT=http://localhost:4000
@@
-DEFAULT_PROVIDER=openai
-DEFAULT_MAIN_MODEL=gpt-4
-DEFAULT_MINI_MODEL=gpt-4-mini
-DEFAULT_NANO_MODEL=gpt-3.5-turbo
+DEFAULT_MAIN_MODEL=
+DEFAULT_MINI_MODEL=
+DEFAULT_NANO_MODEL=
+DEFAULT_PROVIDER=openai
@@
-LOG_LEVEL=info
-LOG_DIR=./logs
+LOG_DIR=./logs
+LOG_LEVEL=info
@@
-AUTH_SECRET_KEY=
+AUTH_SECRET_KEY=
+

Also applies to: 39-41, 42-45

eval-server/nodejs/CLAUDE.md (2)

110-131: Add security note for runtime LLM config (api keys over RPC)

Recommend calling out that apiKey in configure_llm must only traverse trusted, local, encrypted channels, and keys must be redacted in logs.

Patch suggestion:

   "id": "config-request-id"
 }

+#### Security considerations
+- Only use configure_llm over trusted local connections (wss:// or ws://localhost).
+- Never log raw API keys; always redact in server/client logs.
+- Consider scoping keys per‑client session and expiring overrides on disconnect.
+


133-163: Clarify naming conventions between protocol and UI

Protocol example uses snake_case (main_model, mini_model, nano_model) while frontend uses camelCase in LLMConfigurationManager. Add a note that the server/bridge maps these fields.

front_end/panels/ai_chat/core/AgentNodes.ts (1)

595-605: Validate config and fall back to node defaults; pass endpoint when present

If override/localStorage is incomplete, tool.execute may get undefined apiKey/model. Validate and fall back to constructor defaults. Also pass endpoint for LiteLLM.



-          const configManager = LLMConfigurationManager.getInstance();
-          const config = configManager.getConfiguration();
-
-          return await selectedTool.execute(toolArgs as any, {
-            apiKey: config.apiKey,
-            provider: config.provider,
-            model: config.mainModel,
-            miniModel: config.miniModel,
-            nanoModel: config.nanoModel
-          });
+          const mgr = LLMConfigurationManager.getInstance();
+          const cfg = mgr.getConfiguration();
+          const fallback = {
+            provider: this.provider,
+            model: this.modelName,
+            miniModel: this.miniModel,
+            nanoModel: this.nanoModel,
+          };
+          // Minimal validation: require provider+model; keep apiKey optional to support OAuth/OpenRouter flows
+          const provider = cfg.provider || (fallback.provider as any);
+          const model = cfg.mainModel || (fallback.model as any);
+          return await selectedTool.execute(toolArgs as any, {
+            apiKey: cfg.apiKey,
+            provider,
+            model,
+            miniModel: cfg.miniModel || fallback.miniModel,
+            nanoModel: cfg.nanoModel || fallback.nanoModel,
+            ...(cfg.endpoint ? { endpoint: cfg.endpoint } : {}),
+          });
eval-server/nodejs/examples/with-http-wrapper.js (1)

14-17: Example ports diverge from .env; consider env-driven defaults

The example binds EvalServer to 8082 and HTTP to 8080 while .env defaults to 8080. Consider reading from env with safe defaults to prevent confusion.

-const evalServer = new EvalServer({
-  // No authKey - authentication disabled for automated mode
-  host: '127.0.0.1',
-  port: 8082
-});
+const evalServer = new EvalServer({
+  // No authKey - authentication disabled for automated mode (local only)
+  host: process.env.HOST || '127.0.0.1',
+  port: Number(process.env.PORT || 8082),
+});
@@
-const httpWrapper = new HTTPWrapper(evalServer, {
-  port: 8080,
-  host: '127.0.0.1'
-});
+const httpWrapper = new HTTPWrapper(evalServer, {
+  port: Number(process.env.HTTP_PORT || 8080),
+  host: process.env.HOST || '127.0.0.1',
+});

Bind only to loopback in examples to avoid exposing no‑auth endpoints.

Also applies to: 21-23, 28-38

eval-server/nodejs/examples/multiple-evals.js (5)

29-34: Prefer CONFIG.defaults for model names.
Hardcoding 'gpt-4' drifts from server defaults; read defaults so examples follow env.

-      main_model: {
-        provider: 'openai',
-        model: 'gpt-4',
-        api_key: CONFIG.providers.openai.apiKey
-      }
+      main_model: {
+        provider: 'openai',
+        model: CONFIG.defaults.mainModel,
+        api_key: CONFIG.providers.openai.apiKey
+      }

45-51: Same: use CONFIG.defaults or provider‑specific defaults.

-        model: 'llama-3.1-8b-instant',
+        model: CONFIG.defaults.mainModel, // or a GROQ default read from CONFIG if added

62-68: Same: avoid hardcoding OpenRouter model in example.

-        model: 'anthropic/claude-3-sonnet',
+        model: CONFIG.defaults.mainModel,

79-86: Same: LiteLLM model constant may not exist at endpoint.
Use CONFIG.defaults or make the model user‑provided to avoid 404s at proxies.

-        model: 'claude-3-haiku-20240307',
+        model: CONFIG.defaults.mainModel,

100-105: Minor: read host/port from CONFIG for consistency.

-const server = new EvalServer({
-  authKey: 'hello',
-  host: '127.0.0.1',
-  port: 8080
-});
+const server = new EvalServer({
+  authKey: 'hello',
+  host: CONFIG.server.host || '127.0.0.1',
+  port: CONFIG.server.port || 8080
+});
front_end/panels/ai_chat/ui/AIChatPanel.ts (2)

361-376: Use LLMConfigurationManager for provider fallback.
This reads from localStorage; prefer config manager to keep a single source of truth.

-    const currentProvider = localStorage.getItem(PROVIDER_SELECTION_KEY) || 'openai';
+    const currentProvider = LLMConfigurationManager.getInstance().getProvider();

1971-1981: Restore StorageMonitor hooks on panel hide.
To avoid global side effects across DevTools sessions, call restore().

   override willHide(): void {
     // Explicitly remove any event listeners to prevent memory leaks
+    try { StorageMonitor.getInstance().restore(); } catch {}
     if (this.#boundOnMessagesChanged) {
       this.#agentService.removeEventListener(AgentEvents.MESSAGES_CHANGED, this.#boundOnMessagesChanged);
     }
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

92-95: Tighten typing and align casing for provider fields.
Use a union type for provider in EvaluationParams.model to match LLMConfigurationParams; keep casing consistent across protocol objects to reduce translation glue.

-    provider?: string;
+    provider?: 'openai' | 'litellm' | 'groq' | 'openrouter';
-    api_key?: string;    // New: per-request API key
+    api_key?: string;    // Note: snake_case for wire protocol; avoid logging
eval-server/nodejs/src/api-server.js (2)

37-43: Defaults drift from CONFIG.defaults; unify source of truth.
Hardcoded 'gpt-4.1*' differs from src/config.js defaults. Load defaults from CONFIG to prevent divergence.

+import { CONFIG } from './config.js';
@@
-        this.configDefaults = {
-          model: {
-            main_model: 'gpt-4.1',
-            mini_model: 'gpt-4.1-mini',
-            nano_model: 'gpt-4.1-nano',
-            provider: 'openai'
-          }
-        };
+        this.configDefaults = {
+          model: {
+            main_model: CONFIG.defaults.mainModel,
+            mini_model: CONFIG.defaults.miniModel,
+            nano_model: CONFIG.defaults.nanoModel,
+            provider: CONFIG.defaults.provider
+          }
+        };
@@
-      this.configDefaults = {
-        model: {
-          main_model: 'gpt-4.1',
-          mini_model: 'gpt-4.1-mini',
-          nano_model: 'gpt-4.1-nano',
-          provider: 'openai'
-        }
-      };
+      this.configDefaults = {
+        model: {
+          main_model: CONFIG.defaults.mainModel,
+          mini_model: CONFIG.defaults.miniModel,
+          nano_model: CONFIG.defaults.nanoModel,
+          provider: CONFIG.defaults.provider
+        }
+      };

Also applies to: 48-55


58-66: CORS wildcard on local API.
Fine for local dev; consider restricting or making it configurable before shipping.

front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)

64-70: Defensive provider parsing

getProvider casts arbitrary storage values to LLMProvider. Guard against invalid strings.

   getProvider(): LLMProvider {
     if (this.overrideConfig?.provider) {
       return this.overrideConfig.provider;
     }
     const stored = localStorage.getItem(STORAGE_KEYS.PROVIDER);
-    return (stored as LLMProvider) || 'openai';
+    const allowed: LLMProvider[] = ['openai','litellm','groq','openrouter'];
+    return allowed.includes(stored as LLMProvider) ? (stored as LLMProvider) : 'openai';
   }
front_end/panels/ai_chat/core/AgentService.ts (2)

647-681: Use config manager for endpoint detection; avoid direct localStorage

#doesCurrentConfigRequireApiKey reads localStorage directly. Prefer the centralized manager to avoid key drift.

   #doesCurrentConfigRequireApiKey(): boolean {
     try {
       // Check the selected provider
-      const selectedProvider = this.#configManager.getProvider();
+      const selectedProvider = this.#configManager.getProvider();
       
       // OpenAI provider always requires an API key
       if (selectedProvider === 'openai') {
         return true;
       }
       // Groq provider always requires an API key
       if (selectedProvider === 'groq') {
         return true;
       }
       // OpenRouter provider always requires an API key
       if (selectedProvider === 'openrouter') {
         return true;
       }
       // For LiteLLM, only require API key if no endpoint is configured
       if (selectedProvider === 'litellm') {
-        const hasLiteLLMEndpoint = Boolean(localStorage.getItem('ai_chat_litellm_endpoint'));
+        const hasLiteLLMEndpoint = !!this.#configManager.getEndpoint();
         // If we have an endpoint, API key is optional
         return !hasLiteLLMEndpoint;
       }
       // Default to requiring API key for any unknown provider
       return true;

4-4: “Cache break” comments

Harmless, but consider removing before merge to keep history noise down.

Also applies to: 26-27

MODEL-CONFIGS.md (1)

56-62: Fix markdownlint issues (MD036, MD024)

Replace emphasis-styled headings with proper “### …” headings and dedupe repeated headings.

Also applies to: 62-62, 384-384

eval-server/nodejs/examples/library-usage.js (1)

50-53: LiteLLM should be considered configured with endpoint only

Examples mark LiteLLM configured only if apiKey AND endpoint are present. The rest of the system allows endpoint‑only LiteLLM.

-  if (CONFIG.providers.litellm.apiKey && CONFIG.providers.litellm.endpoint) {
+  if (CONFIG.providers.litellm.endpoint) {
     availableProviders.push('litellm');
     console.log('   ✅ LiteLLM configured');
   }
eval-server/nodejs/src/lib/EvalServer.js (1)

662-678: Minor logging nit: messageType for JSON‑RPC

sendMessage logs data.type which is undefined for JSON‑RPC responses/requests; consider guarding to avoid “undefined”.

-          messageType: data.type
+          messageType: (data && data.type) ? data.type : (data && data.jsonrpc ? 'jsonrpc' : 'unknown')
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (5)

352-365: Include endpoint in early override extraction; else LiteLLM/OpenRouter endpoint overrides are ignored.

You parse provider/model/apiKey but omit endpoint; add it to extraction and override.

Apply this diff:

-      const provider = mainModel?.provider || params.model.provider || 'openai';
-      const apiKey = mainModel?.api_key || params.model.api_key;
+      const provider = mainModel?.provider || params.model.provider || 'openai';
+      const apiKey = mainModel?.api_key || params.model.api_key;
       const modelName = mainModel?.model || mainModel || params.model.main_model;
       const miniModel = (params.model.mini_model as any)?.model || params.model.mini_model;
       const nanoModel = (params.model.nano_model as any)?.model || params.model.nano_model;
+      const endpoint = (params.model as any)?.endpoint || (mainModel as any)?.endpoint;
@@
-        configManager.setOverride({
+        configManager.setOverride({
           provider,
           apiKey,
           mainModel: modelName,
           miniModel: miniModel || modelName,
-          nanoModel: nanoModel || miniModel || modelName
+          nanoModel: nanoModel || miniModel || modelName,
+          endpoint
         });

Also applies to: 385-393


545-551: Also pass through endpoint when building merged chat input.

This ensures the chat path can use LiteLLM/OpenRouter endpoints provided in the request.

Apply this diff:

           ...(params.model && {
             // Extract nested model configuration properly
             main_model: (params.model.main_model as any)?.model || params.model.main_model,
             mini_model: (params.model.mini_model as any)?.model || params.model.mini_model,
             nano_model: (params.model.nano_model as any)?.model || params.model.nano_model,
             provider: (params.model.main_model as any)?.provider || params.model.provider,
-            api_key: (params.model.main_model as any)?.api_key || (params.model as any).api_key
+            api_key: (params.model.main_model as any)?.api_key || (params.model as any).api_key,
+            endpoint: (params.model as any)?.endpoint || (params.model.main_model as any)?.endpoint
           }),

300-307: Auth failure path: short‑circuit earlier on known failure.

You send the verify message and later reject the auth promise. Consider rejecting immediately after sending the negative verification to avoid a dangling connection.

Also applies to: 306-308


4-4: Nit: “Cache break” comment timestamp is great; consider a link to the change or commit for traceability.


978-986: Minor: Don’t log whether an API key exists when processing configuration.

Even boolean presence can be sensitive in shared logs. Safe to keep in debug builds only.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74c9f40 and 78b020b.

⛔ Files ignored due to path filters (1)
  • eval-server/nodejs/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (19)
  • MODEL-CONFIGS.md (1 hunks)
  • eval-server/nodejs/.env.example (1 hunks)
  • eval-server/nodejs/CLAUDE.md (4 hunks)
  • eval-server/nodejs/examples/library-usage.js (3 hunks)
  • eval-server/nodejs/examples/multiple-evals.js (4 hunks)
  • eval-server/nodejs/examples/with-http-wrapper.js (1 hunks)
  • eval-server/nodejs/src/api-server.js (6 hunks)
  • eval-server/nodejs/src/config.js (1 hunks)
  • eval-server/nodejs/src/lib/EvalServer.js (5 hunks)
  • front_end/panels/ai_chat/BUILD.gn (2 hunks)
  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2 hunks)
  • front_end/panels/ai_chat/common/EvaluationConfig.ts (3 hunks)
  • front_end/panels/ai_chat/core/AgentNodes.ts (2 hunks)
  • front_end/panels/ai_chat/core/AgentService.ts (9 hunks)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1 hunks)
  • front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (3 hunks)
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (11 hunks)
  • front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (2 hunks)
  • front_end/panels/ai_chat/ui/AIChatPanel.ts (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
eval-server/nodejs/src/config.js

📄 CodeRabbit inference engine (eval-server/nodejs/CLAUDE.md)

eval-server/nodejs/src/config.js: Centralize configuration via environment variables in src/config.js
Require OPENAI_API_KEY and PORT (default 8080) from environment configuration

Files:

  • eval-server/nodejs/src/config.js
🧠 Learnings (10)
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/config.js : Require OPENAI_API_KEY and PORT (default 8080) from environment configuration

Applied to files:

  • eval-server/nodejs/.env.example
  • front_end/panels/ai_chat/core/AgentService.ts
  • eval-server/nodejs/examples/with-http-wrapper.js
  • eval-server/nodejs/src/config.js
  • eval-server/nodejs/examples/library-usage.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/config.js : Centralize configuration via environment variables in src/config.js

Applied to files:

  • eval-server/nodejs/.env.example
  • front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
  • front_end/panels/ai_chat/common/EvaluationConfig.ts
  • eval-server/nodejs/examples/multiple-evals.js
  • eval-server/nodejs/src/config.js
  • eval-server/nodejs/examples/library-usage.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/evaluator.js : LLM evaluator must integrate with the OpenAI API to judge agent responses

Applied to files:

  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts
  • eval-server/nodejs/examples/multiple-evals.js
  • eval-server/nodejs/CLAUDE.md
  • front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
  • eval-server/nodejs/examples/library-usage.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/server.js : WebSocket server must accept agent connections, manage agent lifecycle (connect/ready/disconnect), orchestrate evaluation sessions, and handle bidirectional RPC communication

Applied to files:

  • eval-server/nodejs/examples/with-http-wrapper.js
  • front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
  • eval-server/nodejs/src/lib/EvalServer.js
  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must call Evaluate(request: string) -> string on connected agents

Applied to files:

  • eval-server/nodejs/src/api-server.js
  • eval-server/nodejs/src/lib/EvalServer.js
  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/evaluator.js : LLM evaluator must return structured JSON with scores and reasoning

Applied to files:

  • eval-server/nodejs/examples/multiple-evals.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must implement JSON-RPC 2.0 protocol

Applied to files:

  • eval-server/nodejs/src/lib/EvalServer.js
  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/logger.js : Log all RPC calls, evaluations, and connection events

Applied to files:

  • eval-server/nodejs/src/lib/EvalServer.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must correlate requests and responses using unique IDs

Applied to files:

  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must handle timeouts and error conditions

Applied to files:

  • eval-server/nodejs/CLAUDE.md
🧬 Code graph analysis (11)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (5)
front_end/panels/ai_chat/core/AgentService.ts (2)
  • logger (295-306)
  • event (686-722)
front_end/panels/ai_chat/ui/AIChatPanel.ts (14)
  • logger (957-1050)
  • logger (1358-1428)
  • logger (1578-1591)
  • logger (1635-1638)
  • window (1055-1102)
  • localStorage (822-845)
  • provider (1478-1541)
  • event (1689-1696)
  • event (1701-1705)
  • event (1710-1714)
  • event (1719-1723)
  • event (1728-1732)
  • event (1737-1741)
  • error (1948-1961)
front_end/panels/ai_chat/core/Logger.ts (1)
  • createLogger (316-318)
front_end/panels/ai_chat/LLM/LLMTypes.ts (1)
  • LLMProvider (142-142)
front_end/panels/ai_chat/core/Constants.ts (1)
  • STORAGE_KEYS (39-45)
front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/ui/AIChatPanel.ts (4)
  • apiKey (1191-1222)
  • apiKey (1264-1283)
  • apiKey (1596-1630)
  • localStorage (822-845)
front_end/panels/ai_chat/core/AgentService.ts (3)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/Graph.ts (1)
  • createAgentGraph (24-42)
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (1)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/common/EvaluationConfig.ts (1)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
eval-server/nodejs/examples/multiple-evals.js (2)
eval-server/nodejs/src/config.js (2)
  • CONFIG (5-63)
  • CONFIG (5-63)
eval-server/nodejs/examples/library-usage.js (3)
  • response (63-71)
  • response (121-135)
  • response (195-223)
front_end/panels/ai_chat/core/AgentNodes.ts (1)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
eval-server/nodejs/src/lib/EvalServer.js (2)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • logger (295-306)
eval-server/nodejs/src/config.js (2)
  • CONFIG (5-63)
  • CONFIG (5-63)
front_end/panels/ai_chat/ui/AIChatPanel.ts (2)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • logger (295-306)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (5)
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (5)
  • isLLMConfigurationRequest (297-299)
  • LLMConfigurationRequest (259-264)
  • createLLMConfigurationResponse (314-334)
  • createErrorResponse (240-255)
  • ErrorCodes (141-155)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • AgentService (44-786)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
eval-server/nodejs/examples/library-usage.js (3)
  • response (63-71)
  • response (121-135)
  • response (195-223)
eval-server/nodejs/examples/library-usage.js (1)
eval-server/nodejs/src/config.js (2)
  • CONFIG (5-63)
  • CONFIG (5-63)
🪛 dotenv-linter (3.3.0)
eval-server/nodejs/.env.example

[warning] 6-6: [UnorderedKey] The HOST key should go before the PORT key

(UnorderedKey)


[warning] 16-16: [UnorderedKey] The LITELLM_API_KEY key should go before the LITELLM_ENDPOINT key

(UnorderedKey)


[warning] 27-27: [UnorderedKey] The DEFAULT_MAIN_MODEL key should go before the DEFAULT_PROVIDER key

(UnorderedKey)


[warning] 28-28: [UnorderedKey] The DEFAULT_MINI_MODEL key should go before the DEFAULT_PROVIDER key

(UnorderedKey)


[warning] 29-29: [UnorderedKey] The DEFAULT_NANO_MODEL key should go before the DEFAULT_PROVIDER key

(UnorderedKey)


[warning] 33-33: [UnorderedKey] The LOG_DIR key should go before the LOG_LEVEL key

(UnorderedKey)


[warning] 45-45: [EndingBlankLine] No blank line at the end of the file

(EndingBlankLine)

🪛 markdownlint-cli2 (0.17.2)
MODEL-CONFIGS.md

56-56: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


62-62: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


384-384: Multiple headings with the same content

(MD024, no-duplicate-heading)

🔇 Additional comments (33)
front_end/panels/ai_chat/BUILD.gn (1)

59-60: LGTM: Manager included in both sources and metadata

The new LLMConfigurationManager.ts is correctly added to build and metadata lists.

Also applies to: 196-197

front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (1)

277-281: Good: no secret leakage in logs

You're logging only boolean presence and verification result, not the secrets themselves.

front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2)

4-4: Cache-break comment is fine

Doc-only; non-functional.


11-11: AUTOMATED_MODE credential bypass — override wiring verified

validateCredentials() intentionally skips checks in AUTOMATED_MODE; the per-request override path supplies the API key into LLMClient → OpenAIProvider at runtime. Key pointers: front_end/panels/ai_chat/LLM/OpenAIProvider.ts (526–532), front_end/panels/ai_chat/core/LLMConfigurationManager.ts (103–107), front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (874–879), front_end/panels/ai_chat/core/AgentService.ts (116–137), front_end/panels/ai_chat/LLM/LLMClient.ts (78–82). Optional: guard verbose logs that expose API keys/payloads behind debug.

front_end/panels/ai_chat/core/AgentNodes.ts (1)

18-18: Central config import looks good

Using the singleton avoids drift across nodes.

front_end/panels/ai_chat/common/EvaluationConfig.ts (2)

33-36: Default ws://localhost:8082 matches examples

Nice alignment with the example start script.


53-64: Automated mode: good defaults; avoid persisting secrets

Behavior matches intent — automated mode defaults evaluation to enabled and endpoint to ws://localhost:8082 and does not persist the secret key. I attempted a repo-wide search for other hardcoded 8080/8082 references, but ripgrep returned "No files were searched" in this environment, so I could not verify. Manually run a repo-wide search for "localhost:8080", "localhost:8082", "ws://localhost:8082", and the ai_chat localStorage keys (ai_chat_evaluation_*) to avoid docs/UI confusion. Also applies to: 68–70, 108–111.

eval-server/nodejs/CLAUDE.md (1)

45-51: configure_llm implemented and wired in EvalServer.js

Switch case 'configure_llm' (eval-server/nodejs/src/lib/EvalServer.js:346) calls this.handleConfigureLLM; async handleConfigureLLM is implemented at eval-server/nodejs/src/lib/EvalServer.js:382.

eval-server/nodejs/.env.example (1)

5-6: Align default ports and pin model IDs — verification required

Ripgrep returned "No files were searched", so repo-wide verification couldn't complete.

  • Port: eval-server/nodejs/.env.example (lines 5–6) sets PORT=8080 while examples/server/frontend use 8082 — unify to 8082 (recommended) or update examples to 8080.
  • Models: replace placeholders (gpt-4, gpt-4-mini, gpt-3.5-turbo) with pinned, supported IDs for stability (e.g., gpt-4-0613, gpt-3.5-turbo-0613) or leave blank with a comment referencing OpenAI's canonical model list.
  • Action: run a repo-wide search for ports and model placeholders (e.g., rg -n --hidden -S 'localhost:808[0-9]' && rg -n --hidden -S 'gpt-4|gpt-4-mini|gpt-3.5-turbo') or allow re-run; verification is incomplete until that search succeeds.
eval-server/nodejs/examples/multiple-evals.js (3)

12-12: Good: centralized config import.
Importing CONFIG keeps example wiring in sync with server defaults.


94-96: Nice touch: provider/default label in stack log.
Clearer operator feedback without leaking secrets.


132-141: Good: runtime log mentions provider and model only.
No API keys in logs. Keep it that way.

front_end/panels/ai_chat/ui/AIChatPanel.ts (6)

321-327: Getter fallback logic LGTM.
Using config manager + fallback to main is correct.


330-337: Nano fallback chain LGTM.
[nano -> mini -> main] is sensible.


339-347: Provider resolution via config manager improves cohesion.


350-358: Same for mini+provider.


727-742: Config manager initialization at construction: good timing.


1108-1123: Update tests/e2e to call setLLMConfiguration instead of editing localStorage directly.

Search for setLLMConfiguration returned only its definition at front_end/panels/ai_chat/ui/AIChatPanel.ts:1111 — no call sites detected; update any tests/e2e that previously mutated localStorage to use this API. Run: rg -n "setLLMConfiguration\(" --type=ts -C2 to re-check.

eval-server/nodejs/src/config.js (1)

7-9: Server port defaults OK.
parseInt fallback to 8080 covers unset/invalid env.

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

259-334: LLM configuration RPC: API shape looks good.
Clear request/response types and guards.

front_end/panels/ai_chat/core/AgentService.ts (4)

192-205: Redundant API‑key gate after prior validation

Following the fix above, this gate remains useful for non‑AUTOMATED_MODE. No change needed, just confirming behavior.

If you’d like to keep one source of truth, you can drop the apiKey portion from validateConfiguration entirely and rely on this gate for runtime enforcement in non‑AUTOMATED_MODE.


344-349: AUTOMATED_MODE cold‑start still hit by validation

sendMessage initializes the graph in AUTOMATED_MODE when #graph is absent, but initialize() → #initializeLLMClient would previously fail validation. The earlier fix resolves this path.

After applying the validation bypass, please test an AUTOMATED_MODE run with no stored API key to confirm first message succeeds.


292-306: Config change handler looks good

Marking the service uninitialized and clearing the graph is the right minimal action on config changes.


171-177: Log detail is appropriate

Provider list and counts are useful; no secrets are logged.

eval-server/nodejs/examples/library-usage.js (1)

76-85: Good progressive demo flow

Basic eval, then model selection, then provider config demos — solid developer UX.

eval-server/nodejs/src/lib/EvalServer.js (3)

329-379: JSON‑RPC handler: solid structure

Routing requests and returning JSON‑RPC compliant responses is correct.


701-729: Model config layering looks good

Nested main/mini/nano with evaluation overrides is sensible.

Also applies to: 730-748


199-205: Exposing getClientManager() is fine

Public accessor is a useful addition.

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (5)

193-195: Good: routes configure_llm messages to a dedicated handler.

This integrates the new JSON‑RPC method cleanly.


689-762: Tool timeout handling looks solid.

Clear timeouts, trace updates, and error propagation are handled correctly.


782-793: Heartbeat is fine.

Interval cleared on disconnect; low risk.


1-1048: Repo‑wide verification helpers

Run these to confirm assumptions and spot remaining hazards.


278-284: Restrict automated-mode auth bypass to local endpoints and make it explicit opt‑in.

AUTOMATED_MODE currently skips auth for any endpoint — gate the bypass so it only applies to loopback/private hosts (localhost, 127.0.0.1, ::1, *.local) and require an explicit allow flag (e.g., AUTOMATED_MODE_AUTH_BYPASS) in addition to AUTOMATED_MODE; refuse bypass and log a warning for non-local endpoints.
Location: front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (around lines 278–284).

Comment on lines 270 to 273

      logger.info('Processing responses request:', {
        input: requestBody.input,
-       modelConfig
+       modelConfig: nestedModelConfig
      });

⚠️ Potential issue

Do not log full model config (leaks API keys).
This prints nestedModelConfig containing api_key. Remove or redact.

-      logger.info('Processing responses request:', {
-        input: requestBody.input,
-        modelConfig: nestedModelConfig
-      });
+      logger.info('Processing responses request', { input: requestBody.input });

If you need visibility, log only provider/model names without secrets.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      logger.info('Processing responses request:', {
-        input: requestBody.input,
-        modelConfig: nestedModelConfig
-      });
+      logger.info('Processing responses request', { input: requestBody.input });
🤖 Prompt for AI Agents
In eval-server/nodejs/src/api-server.js around lines 270 to 273, the current
logger.info call prints nestedModelConfig which may contain secrets (api_key);
remove or redact any sensitive fields before logging. Replace logging of the
full object with a sanitized object that only includes non-sensitive metadata
(e.g., provider name and model name) or explicitly remove api_key/credentials
fields, and ensure nested structures are traversed to strip any secret keys
before passing data to logger.info.

Comment on lines +384 to +450
  async handleConfigureLLM(connection, params) {
    if (!connection.registered) {
      throw new Error('Client must be registered before configuring LLM');
    }

    const { provider, apiKey, endpoint, models, partial = false } = params;

    // Validate provider
    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
    if (!supportedProviders.includes(provider)) {
      throw new Error(`Unsupported provider: ${provider}. Supported providers: ${supportedProviders.join(', ')}`);
    }

    // Validate models
    if (!models || !models.main) {
      throw new Error('Main model is required');
    }

    // Store configuration for this client connection
    if (!connection.llmConfig) {
      connection.llmConfig = {};
    }

    // Apply configuration (full or partial update)
    if (partial && connection.llmConfig) {
      // Partial update - merge with existing config
      connection.llmConfig = {
        ...connection.llmConfig,
        provider: provider || connection.llmConfig.provider,
        apiKey: apiKey || connection.llmConfig.apiKey,
        endpoint: endpoint || connection.llmConfig.endpoint,
        models: {
          ...connection.llmConfig.models,
          ...models
        }
      };
    } else {
      // Full update - replace entire config
      connection.llmConfig = {
        provider,
        apiKey: apiKey || CONFIG.providers[provider]?.apiKey,
        endpoint: endpoint || CONFIG.providers[provider]?.endpoint,
        models: {
          main: models.main,
          mini: models.mini || models.main,
          nano: models.nano || models.mini || models.main
        }
      };
    }

    logger.info('LLM configuration updated', {
      clientId: connection.clientId,
      provider: connection.llmConfig.provider,
      models: connection.llmConfig.models,
      hasApiKey: !!connection.llmConfig.apiKey,
      hasEndpoint: !!connection.llmConfig.endpoint
    });

    return {
      status: 'success',
      message: 'LLM configuration updated successfully',
      appliedConfig: {
        provider: connection.llmConfig.provider,
        models: connection.llmConfig.models
      }
    };
  }

⚠️ Potential issue

configure_llm: partial updates require provider/models even when not changing them

For partial=true, the code still validates provider and models.main unconditionally. This rejects legitimate partial updates.

Use an effectiveProvider (from params or existing/default), and only require models.main for full updates.

   async handleConfigureLLM(connection, params) {
     if (!connection.registered) {
       throw new Error('Client must be registered before configuring LLM');
     }

-    const { provider, apiKey, endpoint, models, partial = false } = params;
+    const { provider, apiKey, endpoint, models, partial = false } = params;

-    // Validate provider
-    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
-    if (!supportedProviders.includes(provider)) {
-      throw new Error(`Unsupported provider: ${provider}. Supported providers: ${supportedProviders.join(', ')}`);
-    }
+    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
+    const existing = connection.llmConfig || {};
+    const effectiveProvider = provider || existing.provider || CONFIG.defaults.provider;
+    if (!supportedProviders.includes(effectiveProvider)) {
+      throw new Error(`Unsupported provider: ${effectiveProvider}. Supported providers: ${supportedProviders.join(', ')}`);
+    }

-    // Validate models
-    if (!models || !models.main) {
-      throw new Error('Main model is required');
-    }
+    // Validate models: required only for full update
+    if (!partial && (!models || !models.main)) {
+      throw new Error('Main model is required');
+    }

     // Store configuration for this client connection
     if (!connection.llmConfig) {
       connection.llmConfig = {};
     }

     // Apply configuration (full or partial update)
     if (partial && connection.llmConfig) {
       // Partial update - merge with existing config
       connection.llmConfig = {
         ...connection.llmConfig,
-        provider: provider || connection.llmConfig.provider,
-        apiKey: apiKey || connection.llmConfig.apiKey,
-        endpoint: endpoint || connection.llmConfig.endpoint,
+        provider: effectiveProvider,
+        apiKey: apiKey ?? connection.llmConfig.apiKey ?? CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint ?? connection.llmConfig.endpoint ?? CONFIG.providers[effectiveProvider]?.endpoint,
         models: {
           ...connection.llmConfig.models,
           ...models
         }
       };
     } else {
       // Full update - replace entire config
       connection.llmConfig = {
-        provider,
-        apiKey: apiKey || CONFIG.providers[provider]?.apiKey,
-        endpoint: endpoint || CONFIG.providers[provider]?.endpoint,
+        provider: effectiveProvider,
+        apiKey: apiKey || CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint || CONFIG.providers[effectiveProvider]?.endpoint,
         models: {
-          main: models.main,
-          mini: models.mini || models.main,
-          nano: models.nano || models.mini || models.main
+          main: models.main,
+          mini: models.mini || models.main,
+          nano: models.nano || models.mini || models.main
         }
       };
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  async handleConfigureLLM(connection, params) {
-    if (!connection.registered) {
-      throw new Error('Client must be registered before configuring LLM');
-    }
-
-    const { provider, apiKey, endpoint, models, partial = false } = params;
-
-    // Validate provider
-    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
-    if (!supportedProviders.includes(provider)) {
-      throw new Error(`Unsupported provider: ${provider}. Supported providers: ${supportedProviders.join(', ')}`);
-    }
-
-    // Validate models
-    if (!models || !models.main) {
-      throw new Error('Main model is required');
-    }
-
-    // Store configuration for this client connection
-    if (!connection.llmConfig) {
-      connection.llmConfig = {};
-    }
-
-    // Apply configuration (full or partial update)
-    if (partial && connection.llmConfig) {
-      // Partial update - merge with existing config
-      connection.llmConfig = {
-        ...connection.llmConfig,
-        provider: provider || connection.llmConfig.provider,
-        apiKey: apiKey || connection.llmConfig.apiKey,
-        endpoint: endpoint || connection.llmConfig.endpoint,
-        models: {
-          ...connection.llmConfig.models,
-          ...models
-        }
-      };
-    } else {
-      // Full update - replace entire config
-      connection.llmConfig = {
-        provider,
-        apiKey: apiKey || CONFIG.providers[provider]?.apiKey,
-        endpoint: endpoint || CONFIG.providers[provider]?.endpoint,
-        models: {
-          main: models.main,
-          mini: models.mini || models.main,
-          nano: models.nano || models.mini || models.main
-        }
-      };
-    }
-
-    logger.info('LLM configuration updated', {
-      clientId: connection.clientId,
-      provider: connection.llmConfig.provider,
-      models: connection.llmConfig.models,
-      hasApiKey: !!connection.llmConfig.apiKey,
-      hasEndpoint: !!connection.llmConfig.endpoint
-    });
-
-    return {
-      status: 'success',
-      message: 'LLM configuration updated successfully',
-      appliedConfig: {
-        provider: connection.llmConfig.provider,
-        models: connection.llmConfig.models
-      }
-    };
-  }
+  async handleConfigureLLM(connection, params) {
+    if (!connection.registered) {
+      throw new Error('Client must be registered before configuring LLM');
+    }
+
+    const { provider, apiKey, endpoint, models, partial = false } = params;
+
+    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
+    const existing = connection.llmConfig || {};
+    const effectiveProvider = provider || existing.provider || CONFIG.defaults.provider;
+    if (!supportedProviders.includes(effectiveProvider)) {
+      throw new Error(`Unsupported provider: ${effectiveProvider}. Supported providers: ${supportedProviders.join(', ')}`);
+    }
+
+    // Validate models: required only for full update
+    if (!partial && (!models || !models.main)) {
+      throw new Error('Main model is required');
+    }
+
+    // Store configuration for this client connection
+    if (!connection.llmConfig) {
+      connection.llmConfig = {};
+    }
+
+    // Apply configuration (full or partial update)
+    if (partial && connection.llmConfig) {
+      // Partial update - merge with existing config
+      connection.llmConfig = {
+        ...connection.llmConfig,
+        provider: effectiveProvider,
+        apiKey: apiKey ?? connection.llmConfig.apiKey ?? CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint ?? connection.llmConfig.endpoint ?? CONFIG.providers[effectiveProvider]?.endpoint,
+        models: {
+          ...connection.llmConfig.models,
+          ...models
+        }
+      };
+    } else {
+      // Full update - replace entire config
+      connection.llmConfig = {
+        provider: effectiveProvider,
+        apiKey: apiKey || CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint || CONFIG.providers[effectiveProvider]?.endpoint,
+        models: {
+          main: models.main,
+          mini: models.mini || models.main,
+          nano: models.nano || models.mini || models.main
+        }
+      };
+    }
+
+    logger.info('LLM configuration updated', {
+      clientId: connection.clientId,
+      provider: connection.llmConfig.provider,
+      models: connection.llmConfig.models,
+      hasApiKey: !!connection.llmConfig.apiKey,
+      hasEndpoint: !!connection.llmConfig.endpoint
+    });
+
+    return {
+      status: 'success',
+      message: 'LLM configuration updated successfully',
+      appliedConfig: {
+        provider: connection.llmConfig.provider,
+        models: connection.llmConfig.models
+      }
+    };
+  }
🤖 Prompt for AI Agents
In eval-server/nodejs/src/lib/EvalServer.js around lines 384–450, change the
configure LLM logic so partial updates don't force provider/models validation:
compute an effectiveProvider = params.provider || connection.llmConfig?.provider
|| CONFIG.defaultProvider; only validate supportedProviders when params.provider
is present (or when there is no existing provider at all), and only require
models.main for full updates (partial === false or no existing models). For
partial updates, merge models permissively (do not throw if params.models or
params.models.main is missing) and when filling apiKey/endpoint for the applied
config use params values or fall back to connection.llmConfig values or
CONFIG.providers[effectiveProvider] defaults. Ensure logging and returned
appliedConfig reflect the final effectiveProvider and merged models.

Comment on lines 124 to 169
// Validate and add the selected provider
const validation = this.#configManager.validateConfiguration();
if (!validation.isValid) {
throw new Error(`Configuration validation failed: ${validation.errors.join(', ')}`);
}

if (provider === 'openrouter' && openrouterKey) {
providers.push({
provider: 'openrouter' as const,
apiKey: openrouterKey
});

// Only add the selected provider if it has valid configuration
switch (provider) {
case 'openai':
if (apiKey) {
providers.push({
provider: 'openai' as const,
apiKey
});
}
break;
case 'litellm':
if (endpoint) {
providers.push({
provider: 'litellm' as const,
apiKey: apiKey || '', // Can be empty for some LiteLLM endpoints
providerURL: endpoint
});
}
break;
case 'groq':
if (apiKey) {
providers.push({
provider: 'groq' as const,
apiKey
});
}
break;
case 'openrouter':
if (apiKey) {
providers.push({
provider: 'openrouter' as const,
apiKey
});
}
break;
}

if (providers.length === 0) {
let errorMessage = 'OpenAI API key is required for this configuration';
if (provider === 'litellm') {
errorMessage = 'LiteLLM endpoint is required for this configuration';
} else if (provider === 'groq') {
errorMessage = 'Groq API key is required for this configuration';
} else if (provider === 'openrouter') {
errorMessage = 'OpenRouter API key is required for this configuration';
}
throw new Error(errorMessage);
throw new Error(`No valid configuration found for provider ${provider}`);
}

⚠️ Potential issue

AUTOMATED_MODE blocked by strict validation; skip credential checks when enabled

#initializeLLMClient calls validateConfiguration unconditionally. In AUTOMATED_MODE this rejects configs that intentionally omit API keys (key provided per‑request). Pass an “ignore credentials” flag.

-    const validation = this.#configManager.validateConfiguration();
+    const validation = this.#configManager.validateConfiguration({
+      ignoreCredentialRequirements: BUILD_CONFIG.AUTOMATED_MODE
+    });
     if (!validation.isValid) {
       throw new Error(`Configuration validation failed: ${validation.errors.join(', ')}`);
     }

Note: This change depends on LLMConfigurationManager.validateConfiguration supporting the new option. See suggested diff there.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/core/AgentService.ts around lines 124 to 169, the
unconditional call to this.#configManager.validateConfiguration() causes failure
when AUTOMATED_MODE intentionally omits API keys; update the call to pass an
option like { ignoreCredentials: true } when AUTOMATED_MODE is enabled (e.g.,
detect the mode before validation) so credential presence is not enforced, and
relax the subsequent provider checks to allow adding providers when
AUTOMATED_MODE is enabled (do not require apiKey for providers in that mode, but
still include providerURL for litellm); ensure this change aligns with the
updated LLMConfigurationManager.validateConfiguration signature.

Comment on lines +675 to +684
    // Clean up early override if we set one
    if (hasSetEarlyOverride) {
      try {
        const configManager = LLMConfigurationManager.getInstance();
        configManager.clearOverride();
        logger.info('Cleared early configuration override', { evaluationId: params.evaluationId });
      } catch (clearError) {
        logger.warn('Failed to clear early configuration override:', clearError);
      }
    }

⚠️ Potential issue

Global override cleanup race across finally blocks.

handleEvaluationRequest and executeChatEvaluation both clear the same global override. With concurrency>1, overrides for other evaluations may be cleared unexpectedly.

Short‑term: set maxConcurrency=1 (see prior comment). Long‑term: remove global override; pass config to AgentService per call (e.g., AgentService.sendMessage(input, { provider, apiKey, models… })) and make AgentService use that context instead of global state.
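A rough sketch of the long-term shape; the options parameter and PerCallLLMConfig type are hypothetical and do not exist in the current AgentService API:

```typescript
// Sketch: pass the LLM config explicitly per call instead of via a global override.
interface PerCallLLMConfig {
  provider: 'openai' | 'litellm' | 'groq' | 'openrouter';
  apiKey?: string;
  endpoint?: string;
  models: { main: string, mini?: string, nano?: string };
}

interface SendMessageOptions {
  llmConfig?: PerCallLLMConfig;  // falls back to persisted configuration when omitted
  signal?: AbortSignal;
}

class AgentServiceSketch {
  async sendMessage(input: string, options: SendMessageOptions = {}): Promise<string> {
    const config = options.llmConfig ?? this.loadPersistedConfig();
    // The LLM client for this call is built from `config` alone, so concurrent
    // evaluations with different overrides cannot clobber each other.
    return this.runGraph(input, config, options.signal);
  }

  private loadPersistedConfig(): PerCallLLMConfig {
    return { provider: 'openai', models: { main: 'main-model-placeholder' } };  // placeholder values
  }

  private async runGraph(_input: string, _config: PerCallLLMConfig, _signal?: AbortSignal): Promise<string> {
    return '';  // placeholder for the real graph execution
  }
}
```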

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts around lines
675-684, the code clears a global LLMConfigurationManager override
(clearOverride) which races with other evaluations and can wipe overrides for
concurrent runs; short-term, set maxConcurrency=1 to avoid concurrent clears,
and remove the clearOverride call here; long-term, stop using global overrides:
change AgentService APIs to accept an explicit per-call config object (provider,
apiKey, models, etc.), thread that config through EvaluationAgent ->
AgentService.sendMessage(..., config) and make AgentService use that passed
context instead of reading/writing LLMConfigurationManager global state, then
delete code paths that set or clear global overrides.

Comment on lines +335 to +356
### ✅ Phase 1: Core Infrastructure (COMPLETED)
1. **Create LLMConfigurationManager** singleton class - `/front_end/panels/ai_chat/core/LLMConfigurationManager.ts`
2. **Update AgentService** to use manager instead of localStorage
3. **Update AIChatPanel** model selection methods
4. **Add configuration refresh mechanisms**

### ✅ Phase 2: Per-Request Override Support (COMPLETED)
5. **Update EvaluationAgent** per-request override logic
6. **Update Graph/AgentNodes** configuration passing via LLMConfigurationManager
7. **Per-request override functionality** implemented

### ✅ Phase 3: Persistent Configuration API (COMPLETED)
8. **Extend EvaluationProtocol** with `configure_llm` method
9. **Implement `configure_llm` handler** in EvaluationAgent
10. 🔄 **Add eval-server support** for persistent configuration (server-side implementation needed)
11. **Add configuration validation** and error handling

### 🔄 Phase 4: Testing & Documentation (NEXT)
12. **Maintain backwards compatibility** during transition
13. 🔄 **Add comprehensive tests** for both modes
14. **Update documentation** and usage examples
15. 🔄 **Performance testing** with multiple tabs and configurations

⚠️ Potential issue

🧩 Analysis chain

Doc/code mismatch: partial configure_llm vs server enforcement

The doc says partial updates are supported, but server handleConfigureLLM currently requires provider and models.main in all cases. See EvalServer.js suggestions to align behavior.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- MODEL-CONFIGS.md lines 335-356 ----"
if [ -f MODEL-CONFIGS.md ]; then
  sed -n '335,356p' MODEL-CONFIGS.md || true
else
  echo "MODEL-CONFIGS.md not found in repo root"
fi

echo
echo "---- Searching for configure_llm / handleConfigureLLM / EvalServer ----"
rg -n --hidden --no-ignore -S 'configure_llm|configure-llm|handleConfigureLLM|configureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

echo
echo "---- Showing context for matches (±5 lines) ----"
rg -n -C5 --hidden --no-ignore -S 'configure_llm|configure-llm|handleConfigureLLM|configureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

echo
echo "---- Searching for EvalServer filename / class / eval-server directory ----"
rg -n --hidden --no-ignore -S 'class\s+EvalServer|EvalServer.js|EvalServer.ts|eval-server' -g '!**/node_modules/**' || true
find . -type f \( -iname 'evalserver*' -o -iname '*eval-server*' -o -iname 'EvalServer.js' -o -iname 'EvalServer.ts' \) -print || true

echo
echo "---- Done ----"

Length of output: 1950


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- File: eval-server/nodejs/src/lib/EvalServer.js (full) ----"
if [ -f eval-server/nodejs/src/lib/EvalServer.js ]; then
  nl -ba eval-server/nodejs/src/lib/EvalServer.js | sed -n '1,240p' || true
else
  echo "File not found: eval-server/nodejs/src/lib/EvalServer.js"
fi

echo
echo "---- Search for configure/handleConfigure keywords across repo ----"
rg -n --hidden --no-ignore -S 'configure_llm|configure-llm|configureLLM|handleConfigureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

echo
echo "---- Show matches with context (±10 lines) ----"
rg -n -C10 --hidden --no-ignore -S 'configure_llm|configure-llm|configureLLM|handleConfigureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

Length of output: 520


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="eval-server/nodejs/src/lib/EvalServer.js"
echo "---- File: $FILE ----"
if [ -f "$FILE" ]; then
  awk '{printf "%5d: %s\n", NR, $0}' "$FILE" || true
else
  echo "File not found: $FILE"
fi

echo
echo "---- Repo-wide search for configure/handleConfigure patterns ----"
grep -RIn --exclude-dir=node_modules -C5 -E 'configure(_|-)?llm|configureLLM|handleConfigureLLM|handle_configure_llm' . || true

echo
echo "---- Done ----"

Length of output: 44342


EvalServer unconditionally rejects partial configure_llm updates (provider + models.main required)

  • Evidence: eval-server/nodejs/src/lib/EvalServer.js — handleConfigureLLM (lines 384–400) destructures params and then unconditionally validates provider and models.main, so calls with params.partial === true but missing provider/models will throw.
  • Action: Make validation conditional — require provider/models.main only when partial is false (or validate only the fields present when partial is true). Ensure partial merges only when connection.llmConfig exists; otherwise reject with a clear error. (Change validation at eval-server/nodejs/src/lib/EvalServer.js lines ~391–400 and adjust the partial-merge logic at ~408–432.)
  • Docs: Reconcile MODEL-CONFIGS.md / eval-server/nodejs/CLAUDE.md examples with the server behavior or update docs to reflect stricter requirements.
🤖 Prompt for AI Agents
In MODEL-CONFIGS.md around lines 335 to 356: the comment notes eval-server
currently rejects partial configure_llm updates because EvalServer.js
unconditionally validates provider and models.main; update
eval-server/nodejs/src/lib/EvalServer.js (around lines ~391–400 and ~408–432) so
validation of provider and models.main only occurs when params.partial is false,
and when params.partial is true validate only the fields present (and perform a
partial merge only if connection.llmConfig exists, otherwise return a clear
error indicating no existing config to patch); also adjust the partial-merge
logic to merge safely and persist the result, and update MODEL-CONFIGS.md /
eval-server/nodejs/CLAUDE.md examples to reflect the new behavior or stricter
requirements.


claude bot commented Sep 19, 2025

Claude finished @olesho's task —— View job


PR Review: Local Dockerised Eval Server

I've completed a comprehensive review of this substantial PR implementing a local dockerized evaluation server with advanced LLM configuration management. This is an impressive addition with solid architecture, though there are several areas for improvement.

Review Progress:

  • Analyze core eval server changes
  • Review LLM configuration management
  • Check security implementations
  • Evaluate error handling and validation
  • Assess performance considerations
  • Review test coverage needs
  • Provide comprehensive feedback

🎯 Overall Assessment

This PR introduces a sophisticated evaluation server system with excellent separation of concerns and flexible configuration management. The implementation demonstrates strong architectural decisions with room for security and robustness improvements.

✅ Strengths

Architecture & Design

  • Excellent modular architecture with clear separation between eval server, agent service, and LLM management
  • Flexible LLM configuration system supporting both manual and automated modes with override capabilities
  • Comprehensive protocol design with proper JSON-RPC implementation and type safety
  • Well-structured tracing integration for observability

Code Quality

  • Consistent error handling patterns across most components
  • Good logging implementation with appropriate detail levels
  • Proper WebSocket connection management with heartbeat and reconnection logic
  • Comprehensive model configuration documentation (MODEL-CONFIGS.md)

🔍 Areas for Improvement

Critical Security Issues

1. API Key Exposure Risk (api-server.js:349, EvaluationAgent.ts:377-392)

// ISSUE: API keys logged at info level
logger.info('Setting early configuration override', {
  hasApiKey: !!apiKey,
  apiKeyLength: apiKey?.length  // Should not log length
});

Recommendation: Never log API key lengths or related metadata. Use debug level for configuration details.

2. Insufficient Input Validation (api-server.js:263-264)

if (!requestBody.input || typeof requestBody.input !== 'string') {
  throw new Error('Missing or invalid "input" field. Expected a string.');
}

Recommendation: Add comprehensive validation for all request parameters, including size limits and sanitization.

3. CORS Security (api-server.js:61)

res.setHeader('Access-Control-Allow-Origin', '*');

Recommendation: Replace wildcard CORS with specific allowed origins for production use.
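A minimal sketch of an origin allowlist for the Node HTTP handler; the ALLOWED_ORIGINS variable is an assumed configuration knob, not an existing one:

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';

// Sketch: reflect only known origins instead of '*'.
const allowedOrigins = new Set(
  (process.env.ALLOWED_ORIGINS ?? 'http://localhost:8080').split(',').map(origin => origin.trim()),
);

function applyCors(req: IncomingMessage, res: ServerResponse): void {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.has(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Vary', 'Origin');  // keep caches from mixing responses across origins
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
    res.setHeader('Access-Control-Allow-Headers', 'Content-Type, Authorization');
  }
  // Unlisted origins receive no CORS headers and are rejected by the browser.
}
```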

Memory & Performance Concerns

4. Potential Memory Leaks (EvalServer.js:117-123)

for (const [clientId, connection] of this.connectedClients) {
  connection.rpcClient.cleanup();
  // Missing: Remove event listeners and clear timers
}

Recommendation: Ensure all event listeners, intervals, and timeouts are properly cleaned up.
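One common pattern, sketched here with illustrative names, is to register every timer and listener with a per-connection disposer so shutdown can release them in one pass:

```typescript
import type { EventEmitter } from 'node:events';

// Sketch: track timers and listeners per connection so dispose() releases them all.
type Cleanup = () => void;

class ConnectionCleanup {
  private cleanups: Cleanup[] = [];

  trackInterval(handle: ReturnType<typeof setInterval>): void {
    this.cleanups.push(() => clearInterval(handle));
  }

  trackListener(target: EventEmitter, event: string, listener: (...args: unknown[]) => void): void {
    target.on(event, listener);
    this.cleanups.push(() => target.off(event, listener));
  }

  dispose(): void {
    for (const cleanup of this.cleanups.splice(0)) {
      cleanup();  // clears registered intervals and removes registered listeners
    }
  }
}
```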

5. Missing Request Size Limits (api-server.js:88-91)

for await (const chunk of req) {
  body += chunk; // No size limit
}

Recommendation: Implement request size limits to prevent DoS attacks.
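A sketch of a size-capped body reader; the 1 MB limit is an arbitrary example value:

```typescript
import type { IncomingMessage } from 'node:http';

// Sketch: stop reading the request body once it exceeds a configurable cap.
const MAX_BODY_BYTES = 1024 * 1024;  // 1 MB, example only

async function readBodyWithLimit(req: IncomingMessage, maxBytes = MAX_BODY_BYTES): Promise<string> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of req) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    total += buf.length;
    if (total > maxBytes) {
      req.destroy();  // stop consuming the socket
      throw new Error(`Request body exceeds ${maxBytes} bytes`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks).toString('utf8');
}
```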

Error Handling & Reliability

6. Incomplete Error Context (EvaluationAgent.ts:633-670)

} catch (error) {
  const errorMessage = error instanceof Error ? error.message : 'Unknown error';
  // Missing: Error stack traces, evaluation context
}

Recommendation: Include evaluation context and stack traces in error responses for better debugging.

7. Inconsistent Timeout Handling (api-server.js:415, EvaluationAgent.ts:573)

timeout: 1500000, // 25 minutes - very long

Recommendation: Implement configurable timeouts with reasonable defaults and proper cleanup on timeout.
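One way to do this, sketched with an assumed EVAL_TIMEOUT_MS environment variable, is a race that always clears its timer:

```typescript
// Sketch: configurable timeout with guaranteed timer cleanup.
const DEFAULT_EVAL_TIMEOUT_MS = Number(process.env.EVAL_TIMEOUT_MS ?? 5 * 60 * 1000);  // 5 min example default

async function withTimeout<T>(work: Promise<T>, ms = DEFAULT_EVAL_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Evaluation timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) {
      clearTimeout(timer);  // never leave the timer pending once the work settles
    }
  }
}
```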

Code Quality Issues

8. Complex Configuration Logic (EvaluationAgent.ts:352-412)
The model configuration parsing has nested conditionals and multiple fallbacks that are hard to follow.
Recommendation: Extract to dedicated configuration parser with unit tests.

9. Missing Validation (LLMConfigurationManager.ts:157-167)

setOverride(config: Partial<LLMConfig>): void {
  this.overrideConfig = { ...config }; // No validation
}

Recommendation: Add validation for provider compatibility and required fields.
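A sketch of what such validation could look like before the override is stored; the specific rules and messages are illustrative:

```typescript
// Sketch: reject obviously invalid overrides before storing them.
type LLMProvider = 'openai' | 'litellm' | 'groq' | 'openrouter';

interface LLMConfigOverride {
  provider?: LLMProvider;
  apiKey?: string;
  endpoint?: string;
  mainModel?: string;
}

function validateOverride(config: LLMConfigOverride): string[] {
  const errors: string[] = [];
  const providers: LLMProvider[] = ['openai', 'litellm', 'groq', 'openrouter'];
  if (config.provider && !providers.includes(config.provider)) {
    errors.push(`Unsupported provider: ${config.provider}`);
  }
  if (config.provider === 'litellm' && !config.endpoint) {
    errors.push('LiteLLM overrides require an endpoint');
  }
  if (config.mainModel !== undefined && config.mainModel.trim() === '') {
    errors.push('mainModel must be a non-empty string');
  }
  return errors;
}

// setOverride(config) would call validateOverride(config) and throw if any errors are returned.
```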

🧪 Test Coverage Gaps

The PR lacks comprehensive test coverage for:

  • Configuration override scenarios
  • Error handling edge cases
  • WebSocket connection failure scenarios
  • Multi-provider model configuration
  • Tracing integration

Recommendation: Add unit tests for core components and integration tests for the evaluation flow.

🚀 Performance Recommendations

1. Connection Pooling

Implement connection pooling for HTTP requests to external LLM providers to reduce latency.

2. Configuration Caching

Cache parsed configurations to avoid repeated localStorage access.

3. Request Debouncing

Add debouncing for rapid configuration changes to prevent unnecessary reinitializations.

🔧 Implementation Suggestions

1. Enhanced Error Types

export class EvaluationError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly evaluationId?: string,
    public readonly context?: Record<string, any>
  ) {
    super(message);
  }
}

2. Configuration Validation Schema

const CONFIG_SCHEMA = {
  provider: { type: 'string', enum: ['openai', 'litellm', 'groq', 'openrouter'] },
  apiKey: { type: 'string', minLength: 10 },
  models: { type: 'object', required: ['main'] }
};

3. Request Rate Limiting

Implement rate limiting for evaluation requests to prevent abuse.
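A minimal in-memory sliding-window limiter sketch; the window, limit, and per-client key are example choices:

```typescript
// Sketch: per-client sliding-window rate limiter for evaluation requests.
const WINDOW_MS = 60_000;    // 1-minute window (example)
const MAX_REQUESTS = 30;     // max evaluations per window per client (example)

const requestLog = new Map<string, number[]>();

function allowRequest(clientKey: string, now = Date.now()): boolean {
  const cutoff = now - WINDOW_MS;
  const recent = (requestLog.get(clientKey) ?? []).filter(timestamp => timestamp > cutoff);
  if (recent.length >= MAX_REQUESTS) {
    requestLog.set(clientKey, recent);
    return false;  // caller should respond with HTTP 429
  }
  recent.push(now);
  requestLog.set(clientKey, recent);
  return true;
}
```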

📋 Immediate Action Items

High Priority

  1. Fix API key logging - Remove all API key metadata from logs
  2. Add input validation - Implement comprehensive request validation
  3. Configure CORS properly - Replace wildcard with specific origins
  4. Add request size limits - Prevent DoS via large payloads

Medium Priority

  1. Improve error handling - Add context and stack traces
  2. Add configuration validation - Validate provider compatibility
  3. Implement cleanup - Ensure proper resource cleanup on shutdown
  4. Add unit tests - Test core configuration and evaluation logic

Low Priority

  1. Optimize performance - Add connection pooling and caching
  2. Enhanced documentation - Add API documentation and examples

🏆 Conclusion

This is a well-architected and feature-rich addition that significantly enhances the evaluation capabilities. The core design is solid with excellent separation of concerns. The main concerns are around security hardening, comprehensive testing, and production readiness.

Recommendation: Address the security issues and add basic test coverage before merging. The architecture is sound and will provide a strong foundation for evaluation workflows.

Overall Score: 7.5/10 - Excellent architecture with important security/robustness improvements needed.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
front_end/panels/ai_chat/core/AgentService.ts (1)

651-651: Replace direct localStorage access with configuration manager

The method uses this.#configManager.getProvider() for the provider but still directly accesses localStorage for the endpoint check at line 670. This inconsistency could lead to mismatched state.

🧹 Nitpick comments (3)
front_end/panels/ai_chat/core/AgentService.ts (3)

113-178: Consider adding provider-specific error messages

The initialization logic is solid with AUTOMATED_MODE support. However, when validation fails, the error message could be more specific about which provider failed and why.

Consider enhancing the error message to include the provider name:

     if (!validation.isValid) {
-      throw new Error(`Configuration validation failed: ${validation.errors.join(', ')}`);
+      throw new Error(`Configuration validation failed for ${provider}: ${validation.errors.join(', ')}`);
     }

296-307: Consider handling ongoing operations during configuration changes

The configuration change handler properly marks the service for reinitialization. However, if there's an ongoing graph operation when the configuration changes, it might lead to undefined behavior.

Consider canceling any ongoing operations before clearing the graph:

   #handleConfigurationChange(): void {
     logger.info('LLM configuration changed, reinitializing if needed');
 
     // If we're initialized, we need to reinitialize with new configuration
     if (this.#isInitialized) {
+      // Cancel any ongoing graph operations
+      if (this.#runningGraphStatePromise) {
+        logger.info('Canceling ongoing graph operation due to config change');
+        this.#runningGraphStatePromise = undefined;
+      }
+
       // Mark as uninitialized to force reinit on next use
       this.#isInitialized = false;
       this.#graph = undefined;
 
       logger.info('Marked agent service for reinitialization due to config change');
     }
   }

313-328: Add error recovery mechanism for failed reinitialization

The refresh method properly reinitializes the service, but if reinitialization fails, the service is left in an unusable state.

Consider adding a fallback to restore the previous state if reinitialization fails:

   async refreshCredentials(): Promise<void> {
     logger.info('Refreshing credentials and reinitializing agent service');
 
+    // Store current state in case we need to rollback
+    const previousInitialized = this.#isInitialized;
+    const previousGraph = this.#graph;
+
     this.#isInitialized = false;
     this.#graph = undefined;
 
     // Force reinitialization on next use
     try {
       const config = this.#configManager.getConfiguration();
       await this.initialize(config.apiKey || null, config.mainModel, config.miniModel || '', config.nanoModel || '');
       logger.info('Agent service reinitialized successfully');
     } catch (error) {
       logger.error('Failed to reinitialize agent service:', error);
+      // Restore previous state if we were previously initialized
+      if (previousInitialized && previousGraph) {
+        this.#isInitialized = previousInitialized;
+        this.#graph = previousGraph;
+        logger.warn('Restored previous state after failed reinitialization');
+      }
       throw error;
     }
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78b020b and 1ab0393.

📒 Files selected for processing (3)
  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2 hunks)
  • front_end/panels/ai_chat/core/AgentService.ts (9 hunks)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts
  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/config.js : Require OPENAI_API_KEY and PORT (default 8080) from environment configuration

Applied to files:

  • front_end/panels/ai_chat/core/AgentService.ts
🧬 Code graph analysis (1)
front_end/panels/ai_chat/core/AgentService.ts (3)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (42-372)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/Graph.ts (1)
  • createAgentGraph (24-42)
🔇 Additional comments (4)
front_end/panels/ai_chat/core/AgentService.ts (4)

17-18: LGTM! Clean centralized configuration management setup

Good architectural decision to use a centralized configuration manager. The imports are properly organized and the singleton pattern ensures consistent state management across the application.

Also applies to: 24-24


62-90: LGTM! Proper initialization with configuration change handling

Good initialization of the configuration manager and subscription to configuration changes. This enables real-time config updates without requiring a restart.


193-205: LGTM! Proper AUTOMATED_MODE bypass for API key validation

The implementation correctly handles AUTOMATED_MODE by bypassing API key requirements when appropriate. The provider-specific error messages are clear and helpful.


345-350: LGTM! Proper AUTOMATED_MODE initialization in sendMessage

Good implementation of AUTOMATED_MODE support that ensures the graph is initialized even without an API key. This allows automated testing and evaluation flows to work correctly.

2. ✓ Fixed multi-provider credential preservation in saveProviderSpecificSettings()
3. ✓ Secured logging in both handleStorageChange() and getDebugInfo()

claude bot commented Sep 19, 2025

Claude finished @olesho's task —— View job


I'll analyze this and get back to you.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (3)

253-293: Skip‑credential checks: addressed and looks good.

The AUTOMATED_MODE bypass via skipCredentialChecks is correctly implemented.


295-336: Per‑provider credential persistence: addressed and looks good.

Only the active provider’s keys are updated; others are preserved.


373-389: Debug info redaction: looks good.

apiKey is masked in both override and current configs.

🧹 Nitpick comments (7)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (7)

65-71: Harden provider parsing; avoid unsafe cast.

Validate stored provider against the allowed set instead of casting arbitrary strings.

Apply this diff:

   getProvider(): LLMProvider {
     if (this.overrideConfig?.provider) {
       return this.overrideConfig.provider;
     }
-    const stored = localStorage.getItem(STORAGE_KEYS.PROVIDER);
-    return (stored as LLMProvider) || 'openai';
+    const stored = localStorage.getItem(STORAGE_KEYS.PROVIDER);
+    switch (stored) {
+      case 'openai':
+      case 'litellm':
+      case 'groq':
+      case 'openrouter':
+        return stored;
+      default:
+        return 'openai';
+    }
   }

145-153: Don’t populate optional fields with empty strings in getConfiguration.

Map empty strings to undefined for optional fields to align with LLMConfig and avoid downstream truthiness bugs.

Apply this diff:

   getConfiguration(): LLMConfig {
     return {
       provider: this.getProvider(),
-      apiKey: this.getApiKey(),
+      apiKey: this.getApiKey() || undefined,
       endpoint: this.getEndpoint(),
       mainModel: this.getMainModel(),
-      miniModel: this.getMiniModel(),
-      nanoModel: this.getNanoModel(),
+      miniModel: this.getMiniModel() || undefined,
+      nanoModel: this.getNanoModel() || undefined,
     };
   }

266-269: Trim whitespace when validating mainModel.

Prevent accepting a string of spaces.

Apply this diff:

-    if (!config.mainModel) {
+    if (!config.mainModel?.trim()) {
       errors.push('Main model is required');
     }

341-358: Redact endpoint and tighten event‑key typing in storage logs.

Endpoints can reveal internal hosts or bearer tokens in URLs; redact them too. Also avoid the any cast by using a Set check.

Apply this diff:

-  private handleStorageChange(event: StorageEvent): void {
-    if (event.key && Object.values(STORAGE_KEYS).includes(event.key as any)) {
-      const sensitiveKeys = new Set([
+  private handleStorageChange(event: StorageEvent): void {
+    const allKeys = new Set(Object.values(STORAGE_KEYS));
+    if (event.key && allKeys.has(event.key)) {
+      const sensitiveKeys = new Set<string>([
         STORAGE_KEYS.OPENAI_API_KEY,
         STORAGE_KEYS.LITELLM_API_KEY,
         STORAGE_KEYS.GROQ_API_KEY,
         STORAGE_KEYS.OPENROUTER_API_KEY,
+        STORAGE_KEYS.LITELLM_ENDPOINT, // endpoints may be sensitive
       ]);
-      const redacted =
-        sensitiveKeys.has(event.key as any) ? '(redacted)' :
+      const redacted =
+        sensitiveKeys.has(event.key) ? '(redacted)' :
         (event.newValue ? `${event.newValue.slice(0, 8)}…` : null);
       logger.debug('Configuration changed in another tab', {
         key: event.key,
         newValue: redacted
       });
       this.notifyListeners();
     }
   }

363-371: Debounce change notifications to avoid listener storms.

Saving updates multiple keys, which can spam listeners across tabs. Batch via microtask.

Apply this diff:

 export class LLMConfigurationManager {
   private static instance: LLMConfigurationManager;
   private overrideConfig?: Partial<LLMConfig>; // Override for automated mode
   private changeListeners: Array<() => void> = [];
+  private notifyQueued = false;
   private notifyListeners(): void {
-    this.changeListeners.forEach(listener => {
-      try {
-        listener();
-      } catch (error) {
-        logger.error('Error in configuration change listener:', error);
-      }
-    });
+    if (this.notifyQueued) return;
+    this.notifyQueued = true;
+    queueMicrotask(() => {
+      this.notifyQueued = false;
+      for (const listener of this.changeListeners) {
+        try {
+          listener();
+        } catch (error) {
+          logger.error('Error in configuration change listener:', error);
+        }
+      }
+    });
   }

Also applies to: 43-50


239-251: Return an unsubscribe and prevent duplicate listeners.

Improves ergonomics and avoids accidental duplicate registrations.

Apply this diff:

-  addChangeListener(listener: () => void): void {
-    this.changeListeners.push(listener);
-  }
+  addChangeListener(listener: () => void): () => void {
+    if (!this.changeListeners.includes(listener)) {
+      this.changeListeners.push(listener);
+    }
+    return () => this.removeChangeListener(listener);
+  }

199-220: Minor: consider grouping saves to reduce cross‑tab churn.

If feasible, persist fewer keys (e.g., skip writing unchanged values) to minimize storage events.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ab0393 and 855fb9a.

📒 Files selected for processing (1)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (3)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • logger (296-307)
front_end/panels/ai_chat/core/Logger.ts (1)
  • createLogger (316-318)
front_end/panels/ai_chat/LLM/LLMTypes.ts (1)
  • LLMProvider (142-142)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (3)

537-560: Don't merge persisted config into chat input unless an override is actually set.

Using getConfiguration() unconditionally can clobber per-request values with localStorage. Check hasOverride() first.


677-686: Global override cleanup race across finally blocks.

handleEvaluationRequest and executeChatEvaluation both clear the same global override. With concurrency>1, overrides for other evaluations may be cleared unexpectedly.


978-1023: Support partial LLM configuration updates but rollback on validation failure needs improvement.

Current handler correctly validates after saving but the rollback mechanism (restoring currentConfig) has issues: saveConfiguration might fail silently or partially apply changes. Instead, validate the merged config first, then save only if valid.

Apply this diff to validate before persisting:

       // Merge configurations based on partial flag
       const mergedConfig = {
         provider: params.partial ? (params.provider ?? currentConfig.provider) : params.provider,
         apiKey: params.partial ? (params.apiKey ?? currentConfig.apiKey) : params.apiKey,
         endpoint: params.partial ? (params.endpoint ?? currentConfig.endpoint) : params.endpoint,
         mainModel: params.partial ? (params.models?.main ?? currentConfig.mainModel) : params.models.main,
         miniModel: params.partial ? (params.models?.mini ?? currentConfig.miniModel) : params.models?.mini,
         nanoModel: params.partial ? (params.models?.nano ?? currentConfig.nanoModel) : params.models?.nano
       };

-      // Validate the merged configuration
-      const validation = configManager.validateConfiguration();
-
-      // Check validation after setting the merged config temporarily
-      configManager.saveConfiguration(mergedConfig);
-      const postSaveValidation = configManager.validateConfiguration();
-
-      if (!postSaveValidation.isValid) {
-        // Restore the original config if validation fails
-        configManager.saveConfiguration(currentConfig);
+      // Temporarily set override to validate the merged config
+      configManager.setOverride(mergedConfig);
+      const validation = configManager.validateConfiguration();
+      configManager.clearOverride();
+
+      if (!validation.isValid) {
         // Send error response with validation errors
         const errorResponse = createErrorResponse(
           id,
           ErrorCodes.INVALID_PARAMS,
           'Invalid configuration',
-          { errors: postSaveValidation.errors }
+          { errors: validation.errors }
         );

         if (this.client) {
           this.client.send(errorResponse);
         }

         logger.error('Configuration validation failed', {
-          errors: postSaveValidation.errors
+          errors: validation.errors
         });

         return;
       }
+
+      // Save configuration only after validation passes
+      configManager.saveConfiguration(mergedConfig);
🧹 Nitpick comments (1)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (1)

4-4: Remove or update the obsolete cache break comment.

The comment references a date from September 2025, which appears to be in the future. This should either be removed or updated with an accurate date.

-// Cache break: 2025-09-17T17:38:00Z - Fixed API key validation
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 855fb9a and d588841.

📒 Files selected for processing (1)
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/evaluator.js : LLM evaluator must integrate with the OpenAI API to judge agent responses

Applied to files:

  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
🧬 Code graph analysis (1)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (6)
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (6)
  • isLLMConfigurationRequest (297-299)
  • createAuthVerifyMessage (197-203)
  • LLMConfigurationRequest (259-264)
  • createErrorResponse (240-255)
  • ErrorCodes (141-155)
  • createLLMConfigurationResponse (314-334)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • AgentService (44-787)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (2)
  • LLMConfigurationManager (42-390)
  • hasOverride (184-186)
front_end/panels/ai_chat/models/ChatTypes.ts (1)
  • ChatMessage (74-75)
front_end/panels/ai_chat/tracing/TracingConfig.ts (1)
  • withTracingContext (313-315)
🔇 Additional comments (3)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (3)

278-284: LGTM! Clean automated-mode authentication bypass.

The implementation correctly checks BUILD_CONFIG.AUTOMATED_MODE and sends a verified auth response early, bypassing the secret key verification flow for Docker/CI deployments.


352-413: Early override configuration extraction handles both flat and nested formats well.

The debug logging is comprehensive and the extraction logic correctly handles both nested (main_model: { provider, model, api_key }) and flat formats with proper fallbacks.


958-961: Override cleanup OK — maxConcurrency = 3; confirm concurrency strategy.

Finally block clears the config override. The register message sets maxConcurrency: 3 in front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts; since concurrency >1 a race condition is still possible — either reduce to 1 or add per-evaluation locking/scoped overrides.

Comment on lines +848 to +871
    const provider = input.provider || config.provider || 'openai';
    const mainModel = input.main_model || config.mainModel || this.judgeModel;
    const miniModel = input.mini_model ?? config.miniModel ?? this.miniModel;
    const nanoModel = input.nano_model ?? config.nanoModel ?? this.nanoModel;
    const apiKey = input.api_key ?? config.apiKey ?? agentService.getApiKey();
    const endpoint = input.endpoint ?? config.endpoint;

    logger.info('Setting consolidated configuration override for chat evaluation', {
      provider: provider,
      mainModel: mainModel,
      hasApiKey: !!apiKey,
      hasEndpoint: !!endpoint,
      isInitialized: agentService.isInitialized()
    });

    // Always reinitialize with the current model and explicit mini/nano
    await agentService.initialize(agentService.getApiKey(), modelName, miniModel, nanoModel);

    // Set single consolidated override with all fallback values
    configManager.setOverride({
      provider: provider,
      mainModel: mainModel,
      miniModel: miniModel,
      nanoModel: nanoModel,
      apiKey: apiKey || undefined,
      endpoint: endpoint || undefined
    });

🛠️ Refactor suggestion

🧩 Analysis chain

Single override point with merged fallback values (looks good but ensure no duplicates).

The consolidation looks correct: you set a single override with proper fallback values (input → config → defaults), avoiding the duplicate override issue. Just ensure this is the only setOverride call in this flow.


🏁 Script executed:

#!/bin/bash
# Verify there's only one setOverride call in the executeChatEvaluation flow
rg -A3 -B3 'configManager\.setOverride' front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts | grep -n 'setOverride'

Length of output: 76


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts"
echo "File: $file"
echo
echo "Search for executeChatEvaluation definition:"
rg -n --line-number 'executeChatEvaluation' "$file" || true
echo
echo "Matches for configManager.setOverride with context:"
rg -n -C5 'configManager\.setOverride' "$file" || true

Length of output: 1023


Consolidate duplicate configManager.setOverride calls — keep a single consolidated override or explicitly justify both.

  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts — configManager.setOverride at line 385.
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts — configManager.setOverride at line 864.
🤖 Prompt for AI Agents
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts lines ~385 and
848-871: there are duplicate configManager.setOverride calls; keep a single
consolidated override (the one at 848-871 which already collects provider,
mainModel, miniModel, nanoModel, apiKey and endpoint) or, if both are required
for different lifecycles, add a clear comment justifying the second call and
ensure the earlier call only sets the minimal/intentional fields. Fix by
removing the redundant call at line 385 (or merging its unique fields into the
consolidated call at 848-871), and run a quick search for usages relying on the
earlier override to confirm no behavioral regression.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (17)
eval-server/nodejs/schemas/client.schema.json (1)

156-158: Add "llm_judge" to client.schema.json enum for backward compatibility

Many eval YAMLs under eval-server/nodejs/evals use type: "llm_judge" (underscore). Expand the enum in eval-server/nodejs/schemas/client.schema.json to accept both forms to avoid validation failures.

           "type": "string",
-          "enum": ["llm-judge", "snapshot", "hybrid"]
+          "enum": ["llm-judge", "llm_judge", "snapshot", "hybrid"]
front_end/panels/ai_chat/tools/FetcherTool.ts (1)

99-116: Abort handling: stop processing on AbortError instead of accumulating failures.

Current loop catches AbortError per URL and continues, producing many failed entries post‑abort. Short‑circuit and return partial results.

   for (const url of urlsToProcess) {
     throwIfAborted();
     try {
       logger.info('Processing URL', { url });
       const fetchedContent = await this.fetchContentFromUrl(url, reasoning, ctx);
       results.push(fetchedContent);
     } catch (error: any) {
-      logger.error('Error processing URL', { url, error: error.message, stack: error.stack });
-      results.push({
-        url,
-        title: '',
-        markdownContent: '',
-        success: false,
-        error: `Failed to process URL: ${error.message}`
-      });
+      if (error instanceof DOMException && error.name === 'AbortError') {
+        logger.info('Fetch aborted; stopping further URL processing.');
+        return {
+          sources: results,
+          success: results.some(r => r.success),
+          error: 'aborted'
+        };
+      }
+      logger.error('Error processing URL', { url, error: error.message, stack: error.stack });
+      results.push({
+        url,
+        title: '',
+        markdownContent: '',
+        success: false,
+        error: `Failed to process URL: ${error.message}`
+      });
     }
   }
front_end/panels/ai_chat/tools/Tools.ts (3)

1129-1159: Escape selector to avoid JS eval breakage/injection

selector is embedded unescaped into an evaluated string. Quotes or backticks in selector can break the expression or be abused.

Use JSON.stringify for safe injection:

-      const result = await target.runtimeAgent().invoke_evaluate({
-        expression: `(() => {
-          const element = document.querySelector("${selector}");
+      const result = await target.runtimeAgent().invoke_evaluate({
+        expression: `(() => {
+          const sel = ${JSON.stringify(selector)};
+          const element = document.querySelector(sel);
           if (!element) {
             return {
               success: false,
-              message: "Element not found with selector: ${selector}"
+              message: "Element not found with selector: " + sel
             };
           }

1204-1213: Escape query string in evaluated code

query is directly interpolated; escape to prevent syntax errors/injection.

-          const query = "${query}";
+          const query = ${JSON.stringify(query)};
           const limit = ${limit};

3181-3199: Escape XPath in evaluated code

xpath is injected into document.evaluate without escaping; can break the expression.

-          expression: `
-            (function() {
-              const element = document.evaluate("${xpath}", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
+          expression: `
+            (function() {
+              const xpath = ${JSON.stringify(xpath)};
+              const element = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
               if (!element) return { found: false };
front_end/panels/ai_chat/evaluation/test-cases/schema-extractor-tests.ts (2)

120-126: Replace remaining 'extract_schema' UI references with 'extract_data'

TOOL_TEST_MAPPING and the default toolType still reference 'extract_schema' which doesn't match ToolRegistry (it registers 'extract_data'); update these to 'extract_data'.

  • front_end/panels/ai_chat/ui/EvaluationDialog.ts:28 — TOOL_TEST_MAPPING key 'extract_schema'
  • front_end/panels/ai_chat/ui/EvaluationDialog.ts:175 — initial state toolType: 'extract_schema'

24-31: Action required: 'extract_data' routes to SchemaBasedExtractorTool (not CombinedExtractionTool)

createSchemaTest uses 'extract_data' but ToolRegistry.registerToolFactory('extract_data', () => new SchemaBasedExtractorTool()) — CombinedExtractionTool is named 'navigate_url_and_extraction' and is not bound to 'extract_data'.

Files to inspect/fix:

  • front_end/panels/ai_chat/evaluation/test-cases/schema-extractor-tests.ts (uses 'extract_data')
  • front_end/panels/ai_chat/agent_framework/implementation/ConfiguredAgents.ts (registerToolFactory('extract_data', ...))
  • front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts (name = 'extract_data')
  • front_end/panels/ai_chat/tools/CombinedExtractionTool.ts (name = 'navigate_url_and_extraction')

Resolution options:

  • Update tests to use 'navigate_url_and_extraction' if CombinedExtractionTool is intended, or
  • Change ConfiguredAgents.ts to register CombinedExtractionTool under 'extract_data' and verify CombinedExtractionTool's argument/response contract matches the test expectations.
config/gni/devtools_grd_files.gni (1)

661-676: Missing LLMConfigurationManager in GRD may break imports at runtime

AI summary and PR context reference LLMConfigurationManager, but it isn't listed here. If any frontend code imports it, the GRD omission will 404 in DevTools.

Add it alongside the other ai_chat core files:

   "front_end/panels/ai_chat/core/AgentErrorHandler.js",
+  "front_end/panels/ai_chat/core/LLMConfigurationManager.js",
   "front_end/panels/ai_chat/core/AgentDescriptorRegistry.js",
front_end/panels/ai_chat/ui/ChatView.ts (1)

1015-1032: Avoid navigating DevTools UI with window.location; open in new tab via host API

Using window.location.href navigates the DevTools front-end away from DevTools. Use InspectorFrontendHost to open external links safely.

Apply:

+import * as Host from '../../../core/host/host.js';
@@
-    // Navigate to OpenAI API keys page in current window
-    window.location.href = 'https://platform.openai.com/settings/organization/api-keys';
+    // Open in a new tab to avoid navigating DevTools UI
+    Host.InspectorFrontendHost.InspectorFrontendHostInstance.openInNewTab(
+      'https://platform.openai.com/settings/organization/api-keys'
+    );

Optional: prefer DevTools settings storage over raw localStorage for provider selection.

front_end/panels/ai_chat/ui/__tests__/AIChatPanel.test.ts (1)

194-209: Fix assertion: Map.get returns undefined, not null.

Your mock localStorage uses a Map; after delete, Map.get(...) returns undefined. The current strictEqual(null) will fail.

Apply:

-      // Should clear mini model (falls back to main model)
-      assert.strictEqual(mockLocalStorage.get('ai_chat_mini_model'), null);
+      // Should clear mini model (falls back to main model)
+      assert.strictEqual(mockLocalStorage.get('ai_chat_mini_model'), undefined);
+      // Or, if you want to assert via the Storage API semantics:
+      assert.strictEqual(window.localStorage.getItem('ai_chat_mini_model'), null);
front_end/panels/ai_chat/core/ToolNameMap.ts (1)

46-57: Potential spec violation: collision resolution can exceed 64 chars.

After appending - and _, names can exceed 64 even if sanitize() truncated. This violates the provider regex and may be rejected.

Apply:

   let candidate = sanitize(original);
   if (sanitizedToOriginal.has(candidate) && sanitizedToOriginal.get(candidate) !== original) {
     const suffix = shortHash(original);
-    const base = candidate.replace(/_+$/g, '');
-    candidate = base + '-' + suffix;
+    const base = candidate.replace(/_+$/g, '');
+    // Ensure total length <= 64 after adding "-<suffix>"
+    const maxBase = Math.max(1, 64 - 1 - suffix.length);
+    candidate = (base.length > maxBase ? base.slice(0, maxBase) : base) + '-' + suffix;
   }
   let unique = candidate;
   let counter = 1;
   while (sanitizedToOriginal.has(unique) && sanitizedToOriginal.get(unique) !== original) {
-    const add = `_${counter++}`;
-    unique = candidate + add;
+    const add = `_${counter++}`;
+    // Ensure total length <= 64 when adding numeric suffix
+    const maxCandidate = Math.max(1, 64 - add.length);
+    unique = candidate.length > maxCandidate ? candidate.slice(0, maxCandidate) + add : candidate + add;
   }

Also consider logging when post-collision trimming occurs for observability.

front_end/panels/ai_chat/ui/AIChatPanel.ts (5)

53-64: Stop logging secret previews in StorageMonitor (severe secret leakage).

Current logs include value previews/lengths for keys that may contain API secrets. Remove them.

-    localStorage.setItem = (key: string, value: string) => {
-      if (key.includes('openrouter') || key.includes('ai_chat')) {
-        logger.debug(`=== LOCALSTORAGE SET ===`);
-        logger.debug(`Key: ${key}`);
-        logger.debug(`Value exists: ${!!value}`);
-        logger.debug(`Value length: ${value?.length || 0}`);
-        logger.debug(`Value preview: ${value?.substring(0, 50) + (value?.length > 50 ? '...' : '') || 'null'}`);
-        logger.debug(`Timestamp: ${new Date().toISOString()}`);
-      }
-      return this.originalSetItem(key, value);
-    };
+    localStorage.setItem = (key: string, value: string) => {
+      if (key.includes('openrouter') || key.includes('ai_chat')) {
+        logger.debug(`=== LOCALSTORAGE SET ===`);
+        logger.debug(`Key: ${key}`);
+        logger.debug(`Timestamp: ${new Date().toISOString()}`);
+      }
+      return this.originalSetItem(key, value);
+    };

1071-1091: Remove API key presence/length logging on OAuth success.

Avoid logging any characteristics of secrets.

-    window.addEventListener('openrouter-oauth-success', async () => {
+    window.addEventListener('openrouter-oauth-success', async () => {
       logger.info('=== OAUTH SUCCESS EVENT RECEIVED IN AICHATPANEL ===');
-      logger.info('Timestamp:', new Date().toISOString());
-      logger.info('Current localStorage state for OpenRouter:');
-      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
-      const authMethod = localStorage.getItem('openrouter_auth_method');
-      logger.info('- API key exists:', !!apiKey);
-      logger.info('- API key length:', apiKey?.length || 0);
-      logger.info('- Auth method:', authMethod);
+      logger.info('Timestamp:', new Date().toISOString());
+      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');

1104-1115: Avoid logging secret lengths on storage change events.

-      if (event.key === 'ai_chat_openrouter_api_key' || 
+      if (event.key === 'ai_chat_openrouter_api_key' || 
           event.key === 'openrouter_auth_method') {
         logger.info('=== STORAGE CHANGE EVENT FOR OPENROUTER ===');
         logger.info('Changed key:', event.key);
-        logger.info('Old value exists:', !!event.oldValue);
-        logger.info('New value exists:', !!event.newValue);
-        logger.info('New value length:', event.newValue?.length || 0);
         logger.info('Re-initializing agent service after storage change...');
         this.#initializeAgentService();
       }

1144-1154: Remove API key presence/length logs from refreshCredentials.

-    logger.info('Current OpenRouter storage state:');
-    const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
-    const authMethod = localStorage.getItem('openrouter_auth_method');
-    logger.info('- API key exists:', !!apiKey);
-    logger.info('- API key length:', apiKey?.length || 0);
-    logger.info('- Auth method:', authMethod);
+    // Redacted OpenRouter storage state (do not log secrets)

1532-1541: Do not log API key prefixes.

Even partial keys are sensitive.

-        logger.info('Retrieved API key:');
-        logger.info('- Exists:', !!apiKey);
-        logger.info('- Length:', apiKey?.length || 0);
-        logger.info('- Prefix:', apiKey?.substring(0, 8) + '...' || 'none');
+        logger.info('Retrieved API key (redacted):', { exists: !!apiKey });
front_end/panels/ai_chat/agent_framework/AgentRunner.ts (1)

889-902: Complex handoff logic with nested session handling could lead to lost sessions

The handoff execution for LLM-triggered tools has complex session management that could result in orphaned or incorrectly linked sessions if an error occurs during handoff.

Consider extracting the handoff session management into a separate, well-tested utility function to reduce complexity and potential for errors.
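A rough sketch of such a utility, with hypothetical names and session fields (not taken from AgentRunner.ts), assuming the parent session should always reach a terminal state when a handoff fails:

// Hypothetical helper: links a child (handoff) session to its parent and
// guarantees the parent reaches a terminal state even if the handoff throws.
interface AgentSessionLike {
  status: 'running' | 'completed' | 'error';
  endTime?: Date;
  terminationReason?: string;
  childSessionIds: string[];
}

async function runHandoff<T>(
  parent: AgentSessionLike,
  childSessionId: string,
  handoff: () => Promise<T>,
): Promise<T> {
  parent.childSessionIds.push(childSessionId);
  try {
    return await handoff();
  } catch (error) {
    // Surface the failure on the parent so the child session is never orphaned.
    parent.status = 'error';
    parent.terminationReason = 'handoff_failed';
    throw error;
  } finally {
    parent.endTime = parent.endTime ?? new Date();
  }
}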

Comment on lines +12 to 14
tool: "extract_data"
timeout: 45000


🛠️ Refactor suggestion

🧩 Analysis chain

Tool rename to "extract_data" and longer timeout look good; verify end‑to‑end support.

Confirm server/tool registry, schemas, and tests accept "extract_data" (no lingering "extract_schema_data" usage).


🏁 Script executed:

#!/bin/bash
# Check for any remaining references to the old tool name
rg -nP '\bextract_schema_data\b' -C2

# Spot YAML evals still using the old tool field
rg -nP --type=yaml -n -C2 '\btool:\s*"?extract_schema_data"?'

# Sanity-check that the new name is present in configs and schema enums
rg -nP -n -C2 '\btool:\s*"?extract_data"?'
rg -nP -n -C3 '"evaluation"\s*:.*\{[^}]*"tool"[^}]*\benum\b[^}]*\bextract_data\b'

Length of output: 9746


Replace remaining "extract_schema_data" references with "extract_data" and confirm server/tool registry + tests accept the new name.

Docs/examples still use the old tool name while eval YAMLs/templates use "extract_data" — update the docs/snippets and ensure any server enums/switches or tests accept "extract_data".

Affected files (examples found):

  • eval-server/nodejs/docs/PROTOCOL.md
  • eval-server/nodejs/docs/YAML_SCHEMA.md
  • eval-server/nodejs/docs/CLIENT_SETUP.md
  • eval-server/nodejs/docs/TRIGGERING_EVALUATIONS.md
🤖 Prompt for AI Agents
In eval-server/nodejs/evals/schema-extractor/google-search-001.yaml around lines
12 to 14, the YAML uses the new tool name "extract_data" but other docs and code
still reference "extract_schema_data"; search and replace all remaining
"extract_schema_data" occurrences with "extract_data" in the listed docs
(eval-server/nodejs/docs/PROTOCOL.md, YAML_SCHEMA.md, CLIENT_SETUP.md,
TRIGGERING_EVALUATIONS.md) and update any server-side registries/enums/switch
statements and unit/integration tests that validate tool names to accept
"extract_data" (or map the old name to the new one). After changes, run the test
suite and CI to ensure no references remain and the new name is accepted.

- "Coordinated property type selection through action_agent"
- "Applied search filters through proper action_agent calls"
- "Extracted property listings with schema_based_extractor"
- "Extracted property listings with extract_data"

🛠️ Refactor suggestion

🧩 Analysis chain

Ensure repo-wide extractor rename alignment.

Double-check that client schema/tool enums and templates now use extract_data.

Run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Find lingering old extractor names..."
rg -n -C1 -g '!**/node_modules/**' -e 'schema_based_extractor' -e 'extract_schema_data' || true

echo
echo "Validate schema/tool enums reference extract_data:"
fd default-client.yaml | xargs -I{} rg -n -C1 'extract_data' {}
fd client.schema.json | xargs -I{} rg -n -C2 'extract_data|extract_schema_data|schema_based_extractor' {}

Length of output: 465


Ensure repo-wide extractor rename alignment

default-client.yaml already uses "extract_data" (lines 29–31) but client.schema.json still enumerates "extract_schema_streamlined" alongside "extract_data" (lines 131–135); update/remove the legacy enum value and align any templates/libs to "extract_data", then run a repo-wide search for other old extractor names.

🤖 Prompt for AI Agents
In eval-server/nodejs/evals/web-task-agent/realestate-001.yaml around line 52
the extractor name is "extract_data" but client.schema.json (around lines
131–135) still includes the legacy enum value "extract_schema_streamlined";
remove that legacy enum entry from client.schema.json (leaving only
"extract_data"), update any templates or library references that still use
"extract_schema_streamlined" to "extract_data", and run a repo-wide
search/replace to ensure all extractor names are aligned and update tests or
docs if required.

- "Coordinated property type selection through action_agent"
- "Applied search filters through proper action_agent calls"
- "Extracted property listings with schema_based_extractor"
- "Extracted property listings with extract_data"

🛠️ Refactor suggestion

🧩 Analysis chain

Consistency check: fully migrate to extract_data across configs/schemas.

Ensure no lingering references to schema_based_extractor or extract_schema_data remain in evals, templates, or schemas.

Run to verify:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for old extractor names..."
rg -n -C1 -g '!**/node_modules/**' -e 'schema_based_extractor' -e 'extract_schema_data' || true

echo
echo "Confirming extract_data is used for evaluation.tool values..."
rg -n -C1 -g '!**/node_modules/**' -e 'tool:\s*"extract_data"' -e '"tool"\s*:\s*"extract_data"' || true

echo
echo "Check client schema enum still doesn't reference old names..."
fd client.schema.json | xargs -I{} rg -n -C2 -e 'extract_schema_data|schema_based_extractor|extract_data' {}

Length of output: 629


Consistency check — fully migrate to extract_data across configs/schemas.
No matches for schema_based_extractor or extract_schema_data, but client.schema.json's enum still contains "extract_schema_streamlined" alongside "extract_data" — remove/rename that legacy entry so only "extract_data" remains.
client.schema.json (enum, ~lines 131–135)

🤖 Prompt for AI Agents
In eval-server/nodejs/evals/web-task-agent/web-task-agent-realestate-001.yaml
around line 52 and in client.schema.json around lines 131–135, the enum still
contains the legacy value "extract_schema_streamlined" alongside "extract_data";
remove (or rename) that legacy enum entry so the enum only includes
"extract_data", and scan the repo/configs for any remaining references to
"extract_schema_streamlined" (rename them to "extract_data" or delete) to
complete the migration to extract_data.

wait_timeout: 5000

tool: "extract_schema_data"
tool: "extract_data"

🛠️ Refactor suggestion

🧩 Analysis chain

Potential schema key mismatch with new tool name

You switched to tool: "extract_data" but input still uses “schema” (not “extraction_schema” like other configs). If the new tool expects extraction_schema, this example will fail.

  • Confirm the expected input key for extract_data.
  • If needed, update this template to:
    • input.extraction_schema: <schema…>
    • Or ensure the tool supports both keys.

Also check validator type naming consistency (“llm-judge” vs “llm_judge” used elsewhere).

Run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Search extract_data handlers:"
rg -n -C3 -S 'extract_data'
echo
echo "Check llm[-_]judge variants:"
rg -n -C1 -S '\bllm[_-]judge\b' eval-server

Length of output: 101830


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== SchemaBasedExtractorTool top ==="
sed -n '1,260p' front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts || true

echo
echo "=== Search for related arg/interface names and any 'extraction_schema' usage ==="
rg -n --hidden --no-ignore 'SchemaExtractionArgs|SchemaExtractionResult|extraction_schema|extractionSchema' || true

echo
echo "=== client.schema.json (start) ==="
sed -n '1,260p' eval-server/nodejs/schemas/client.schema.json || true

echo
echo "=== Search for 'schema' keys in eval-server templates/evals to confirm usage ==="
rg -n --hidden -S --glob 'eval-server/**' '\binput:\s*$' -n -C1 || true
rg -n --hidden -S --glob 'eval-server/**' '\bschema\b' -n -C1 || true

Length of output: 87952


No change for extract_data input; fix inconsistent llm-judge naming

  • extract_data expects input.schema (SchemaBasedExtractorTool requires a "schema") — default-client.yaml's input.schema is correct (see front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts and eval-server/nodejs/templates/default-client.yaml).
  • Validation type is inconsistent: client.schema.json and EvalServer.js expect validation.type == "llm-judge" (hyphen) but many eval YAMLs use "llm_judge" (underscore) — e.g., files under eval-server/nodejs/evals/** (streamlined-schema-extractor/*.yaml). Normalize validation.type to "llm-judge" across YAMLs or update eval-server/nodejs/schemas/client.schema.json and eval-server/nodejs/src/lib/EvalServer.js to accept "llm_judge" as well.
🤖 Prompt for AI Agents
In eval-server/nodejs/templates/default-client.yaml around line 30, there's a
mismatch in validation.type naming: some YAMLs use "llm_judge" (underscore)
while client.schema.json and EvalServer.js expect "llm-judge" (hyphen). Fix by
normalizing to "llm-judge" across all eval YAMLs under
eval-server/nodejs/evals/** (replace "llm_judge" with "llm-judge");
alternatively, if you prefer to accept the underscore, update
eval-server/nodejs/schemas/client.schema.json and
eval-server/nodejs/src/lib/EvalServer.js to allow both "llm-judge" and
"llm_judge" (e.g., accept either value in validation.type) and then run tests to
ensure validation passes.
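If the second option is preferred, a small normalization step before validation would accept both spellings; a sketch (the function name and placement are illustrative, and the real EvalServer is plain JavaScript):

// Maps the legacy "llm_judge" spelling to the canonical "llm-judge" value
// before the YAML is validated against client.schema.json.
type ValidationType = 'llm-judge' | 'snapshot' | 'hybrid';

function normalizeValidationType(type: string): ValidationType {
  const canonical = type === 'llm_judge' ? 'llm-judge' : type;
  if (canonical === 'llm-judge' || canonical === 'snapshot' || canonical === 'hybrid') {
    return canonical;
  }
  throw new Error(`Unsupported validation.type: ${type}`);
}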

Comment on lines +556 to +560
if (abortSignal?.aborted) {
  logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
  const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
  return { ...abortResult, agentSession: currentSession };
}

⚠️ Potential issue

Early abort check doesn't clean up properly

When execution is aborted, the function returns immediately without proper cleanup or status updates for the agent session.

 // Check if execution has been aborted
 if (abortSignal?.aborted) {
   logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
+  // Update session status before returning
+  currentSession.status = 'error';
+  currentSession.endTime = new Date();
+  currentSession.terminationReason = 'cancelled';
   const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
   return { ...abortResult, agentSession: currentSession };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (abortSignal?.aborted) {
  logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
  const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
  return { ...abortResult, agentSession: currentSession };
}
if (abortSignal?.aborted) {
  logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
  // Update session status before returning
  currentSession.status = 'error';
  currentSession.endTime = new Date();
  currentSession.terminationReason = 'cancelled';
  const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
  return { ...abortResult, agentSession: currentSession };
}

*/
export class SchemaBasedExtractorTool implements Tool<SchemaExtractionArgs, SchemaExtractionResult> {
name = 'extract_schema_data';
name = 'extract_data';

⚠️ Potential issue

🧩 Analysis chain

Rename to 'extract_data' is consistent with config changes.

Good alignment with YAML/tooling.


🏁 Script executed:

#!/bin/bash
# Ensure no references to the old name linger in code/JSON/YAML/tests
rg -nP '\bextract_schema_data\b' -C2

Length of output: 5869


Rename to 'extract_data' — update remaining docs referencing 'extract_schema_data'

Change in SchemaBasedExtractorTool.ts is correct, but search shows many occurrences of "extract_schema_data" in eval-server/nodejs/docs (YAML_SCHEMA.md, CLIENT_SETUP.md, PROTOCOL.md, TRIGGERING_EVALUATIONS.md). Update those docs or add a compatibility alias so clients aren’t broken.

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts around line 39,
the tool name was changed to 'extract_data' but many docs and clients still
reference 'extract_schema_data'; update all occurrences in
eval-server/nodejs/docs (YAML_SCHEMA.md, CLIENT_SETUP.md, PROTOCOL.md,
TRIGGERING_EVALUATIONS.md) to 'extract_data' and/or add a compatibility alias so
old clients continue working: either update documentation examples and protocol
specs to the new name, and add server-side handling that accepts
'extract_schema_data' and maps it to 'extract_data' (or vice versa) to avoid
breaking clients while rolling out the rename.
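A compatibility alias could be as small as a lookup applied wherever tool names are resolved; a sketch with a hypothetical helper name:

// Hypothetical alias table: resolves legacy tool names to the current one so
// evals and docs written against "extract_schema_data" keep working while the
// rename rolls out.
const TOOL_NAME_ALIASES: Record<string, string> = {
  extract_schema_data: 'extract_data',
  schema_based_extractor: 'extract_data',
};

function resolveToolName(requested: string): string {
  return TOOL_NAME_ALIASES[requested] ?? requested;
}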

Comment on lines +183 to 186
// Add delay before retry to prevent overloading the LLM (abortable)
if (attempt > 1) {
  await new Promise(resolve => setTimeout(resolve, this.RETRY_DELAY_MS));
  await this.sleep(this.RETRY_DELAY_MS, ctx?.abortSignal);
}

🛠️ Refactor suggestion

🧩 Analysis chain

LGTM: retry delay is now abortable; consider propagating abort to LLM calls too

Nice improvement. Also pass ctx.abortSignal into callLLMWithTracing (if supported) so in‑flight calls cancel promptly.

// If callLLMWithTracing supports it, add:
// options: { retryConfig: {...}, signal: ctx?.abortSignal }

Run to verify types and support:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Check LLMContext includes abortSignal
rg -n -C3 -P '(interface|type)\s+LLMContext\b'
rg -n -C2 '\babortSignal\b' | head -n 50
# Inspect callLLMWithTracing signature/options for 'signal' support
rg -n -C5 -P 'export\s+(async\s+)?function\s+callLLMWithTracing\b|callLLMWithTracing\('

Length of output: 20076


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- front_end/panels/ai_chat/tools/Tools.ts (around LLMContext) ----"
sed -n '1,240p' front_end/panels/ai_chat/tools/Tools.ts || true

echo
echo "---- front_end/panels/ai_chat/tools/LLMTracingWrapper.ts (full) ----"
sed -n '1,400p' front_end/panels/ai_chat/tools/LLMTracingWrapper.ts || true

echo
echo "---- front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts (around call sites) ----"
sed -n '1,520p' front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts || true

Length of output: 29124


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for LLMCallOptions and LLMClient..."
rg -n --hidden -S "LLMCallOptions" || true
rg -n --hidden -S "interface LLMClient\b|class LLMClient\b|export class LLMClient" || true

echo
echo "---- front_end/panels/ai_chat/LLM/LLMTypes.ts (if exists) ----"
sed -n '1,240p' front_end/panels/ai_chat/LLM/LLMTypes.ts || true

echo
echo "---- front_end/panels/ai_chat/LLM/LLMClient.ts (if exists) ----"
sed -n '1,400p' front_end/panels/ai_chat/LLM/LLMClient.ts || true

Length of output: 19739


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  "front_end/panels/ai_chat/LLM/LLMProvider.ts"
  "front_end/panels/ai_chat/LLM/OpenAIProvider.ts"
  "front_end/panels/ai_chat/LLM/OpenRouterProvider.ts"
  "front_end/panels/ai_chat/LLM/LiteLLMProvider.ts"
  "front_end/panels/ai_chat/LLM/GroqProvider.ts"
)

for f in "${files[@]}"; do
  echo "---- $f ----"
  sed -n '1,420p' "$f" || true
  echo
done

echo "---- Search for callWithMessages definitions/usages ----"
rg -n --hidden "callWithMessages\(" || true

Length of output: 56564


Propagate AbortSignal through LLM call stack

sleep(...) is already abortable via ctx.abortSignal, but callLLMWithTracing won't cancel in‑flight LLM requests: LLMCallOptions (LLMTypes.ts) has no signal and providers don't pass an AbortSignal to fetch. Action: add signal?: AbortSignal to LLMCallOptions, accept/forward options.signal in callLLMWithTracing -> LLMClient.call -> provider.callWithMessages, and pass { signal: options.signal } to fetch in provider implementations (OpenAIProvider.ts, LiteLLMProvider.ts, OpenRouterProvider.ts, GroqProvider.ts). Then update call sites (e.g. front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts) to include options: { retryConfig: {...}, signal: ctx?.abortSignal } so LLM calls abort promptly.

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts around lines
183-186, the code awaits an abortable sleep but does not propagate
ctx.abortSignal into the LLM call stack so in‑flight LLM requests cannot be
cancelled; add support for AbortSignal by adding signal?: AbortSignal to
LLMCallOptions (LLMTypes.ts), update callLLMWithTracing to accept and forward
options.signal into LLMClient.call, modify LLMClient.call to forward
options.signal to provider.callWithMessages, update each provider implementation
(OpenAIProvider.ts, LiteLLMProvider.ts, OpenRouterProvider.ts, GroqProvider.ts)
to accept an optional signal and pass { signal: options.signal } into fetch/use
of HTTP clients, and finally update call sites (including this file) to include
options: { retryConfig: {...}, signal: ctx?.abortSignal } so LLM requests are
aborted promptly when the context is cancelled.
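A minimal sketch of the signal threading described above (the option shape and provider method signature are assumptions based on the names in this comment, not the actual interfaces):

// Assumed shapes; the real LLMCallOptions and provider interfaces live under
// front_end/panels/ai_chat/LLM/.
interface LLMCallOptions {
  retryConfig?: { maxRetries: number };
  signal?: AbortSignal;  // new optional field threaded down from ctx.abortSignal
}

interface LLMMessage { role: string; content: string; }

// Provider side: forwarding the signal to fetch lets in-flight requests abort.
async function callWithMessages(
  endpoint: string,
  messages: LLMMessage[],
  options?: LLMCallOptions,
): Promise<unknown> {
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
    signal: options?.signal,  // cancellation propagates to the HTTP request
  });
  return response.json();
}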

Comment on lines +2124 to +2136
#onMCPConnectorsClick(): void {
  MCPConnectorsCatalogDialog.show({
    onClose: async () => {
      // Refresh MCP registry when catalog is closed in case new connectors were added
      try {
        await MCPRegistry.init(true);
        logger.info('MCP registry refreshed after catalog closed');
      } catch (error) {
        logger.error('Failed to refresh MCP registry after catalog closed', error);
      }
    }
  });
}

⚠️ Potential issue

🧩 Analysis chain

Confirm MCP reinit semantics on catalog close.

You call MCPRegistry.init(true) but not refresh(); other places do both. Ensure init(true) covers refresh or call refresh() here too.


🏁 Script executed:

#!/bin/bash
# Inspect MCPRegistry.init to verify whether it performs refresh and what it returns.
rg -n -C3 --type=ts 'class\s+MCPRegistry|static\s+init\s*\(|init\s*\('

Length of output: 23632


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="front_end/panels/ai_chat/mcp/MCPRegistry.ts"
echo "== File and lines around init =="
nl -ba "$file" | sed -n '420,520p'
echo
echo "== Search for refresh definition/calls in file =="
rg -n --hidden --no-ignore -C2 '\brefresh\s*\(' "$file" || true
echo
echo "== Show surrounding lines for any refresh definition (if found) =="
rg -n --hidden --no-ignore '\basync\s+refresh\s*\(|\brefresh\s*:\s*async\b' "$file" -n -C3 || true

Length of output: 73


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="front_end/panels/ai_chat/mcp/MCPRegistry.ts"

echo "== init region (420-500) =="
awk 'NR>=420 && NR<=500{printf "%6d: %s\n",NR,$0}' "$file" || true
echo
echo "== lines with 'refresh(' (with context) =="
rg -n -C3 '\brefresh\s*\(' "$file" || true
echo
echo "== possible 'async refresh' definitions =="
rg -n '\basync\s+refresh\s*\(' "$file" || true
echo
echo "== fallback: full file with numbers (first 900 lines) =="
awk '{printf "%6d: %s\n",NR,$0}' "$file" > /tmp/mcprestxt || true
sed -n '1,900p' /tmp/mcprestxt || true

Length of output: 31635


Call MCPRegistry.refresh() after init(true) in the catalog onClose handler.

init(...) only connects and runs refreshToolsForServer (it lists tools) — refresh() is required to register tools with ToolRegistry. Update front_end/panels/ai_chat/ui/AIChatPanel.ts (onClose handler ~lines 2126–2132) to call await MCPRegistry.refresh() after await MCPRegistry.init(true).

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/ui/AIChatPanel.ts around lines 2124 to 2136, the
onClose handler currently calls await MCPRegistry.init(true) but does not call
MCPRegistry.refresh(), so tools are not registered with the ToolRegistry; update
the try block to await MCPRegistry.init(true) followed immediately by await
MCPRegistry.refresh(), and then log the success (e.g., "MCP registry refreshed
after catalog closed") in the same try; keep the existing catch to log errors.

Comment on lines +1 to 4
The MIT License (MIT)

Copyright (c) 2024 Anthropic, PBC
Copyright (c) 2015-2017 Evgeny Poberezkin


⚠️ Potential issue

🧩 Analysis chain

License mismatch: content appears to be Ajv’s MIT license, not MCP SDK’s.

This is a compliance risk. Restore the correct MCP SDK license here and place Ajv’s license under the Ajv vendored path (or avoid vendoring Ajv docs). Mixing third‑party licenses under a misleading path can cause attribution errors.


🏁 Script executed:

#!/bin/bash
# Show indicators that this directory contains Ajv content rather than MCP SDK
head -n 3 front_end/third_party/mcp-sdk/README.md || true
rg -n 'Ajv' front_end/third_party/mcp-sdk -C2 || true
rg -n 'Anthropic|Model Context Protocol|MCP' front_end/third_party/mcp-sdk -C2 || true

Length of output: 284608


License mismatch — restore MCP SDK license and relocate Ajv's MIT

front_end/third_party/mcp-sdk contains Ajv sources (README.md, lib/, dist/, ajv/) but front_end/third_party/mcp-sdk/LICENSE is Ajv’s MIT (Evgeny Poberezkin). Replace that file with the correct MCP SDK license, and move Ajv’s MIT into the Ajv vendored path (e.g. front_end/third_party/mcp-sdk/ajv/LICENSE or front_end/third_party/ajv/LICENSE) — or stop vendoring Ajv docs.

Comment on lines +136 to +143
const { code, state } = this.parseAuthCallback(url);
const expected = sessionStorage.getItem(this.STATE_KEY);
if (expected && state && expected !== state) {
  throw new Error('oauth_invalid_state');
}
// Cleanup state now that we used it
sessionStorage.removeItem(this.STATE_KEY);
// Return user to original page

⚠️ Potential issue

Enforce OAuth state validation when expected is set.

If a stored state exists, require an exact match; missing state should be rejected. Prevents CSRF on the redirect.

-          const expected = sessionStorage.getItem(this.STATE_KEY);
-          if (expected && state && expected !== state) {
-            throw new Error('oauth_invalid_state');
-          }
-          // Cleanup state now that we used it
-          sessionStorage.removeItem(this.STATE_KEY);
+          const expected = sessionStorage.getItem(this.STATE_KEY);
+          if (expected) {
+            if (!state || expected !== state) {
+              throw new Error('oauth_invalid_state');
+            }
+            // Cleanup state only after successful validation
+            sessionStorage.removeItem(this.STATE_KEY);
+          }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { code, state } = this.parseAuthCallback(url);
const expected = sessionStorage.getItem(this.STATE_KEY);
if (expected && state && expected !== state) {
  throw new Error('oauth_invalid_state');
}
// Cleanup state now that we used it
sessionStorage.removeItem(this.STATE_KEY);
// Return user to original page
const { code, state } = this.parseAuthCallback(url);
const expected = sessionStorage.getItem(this.STATE_KEY);
if (expected) {
  if (!state || expected !== state) {
    throw new Error('oauth_invalid_state');
  }
  // Cleanup state only after successful validation
  sessionStorage.removeItem(this.STATE_KEY);
}
// Return user to original page
🤖 Prompt for AI Agents
In front_end/third_party/mcp-sdk/MCPOAuthProvider.ts around lines 136 to 143,
the OAuth state validation only rejects when both expected and state exist but
differ; update the logic so that if a stored expected state exists it must be
present in the callback and must match exactly. Replace the current conditional
with a check that if (expected) then if (!state || expected !== state) throw new
Error('oauth_invalid_state');, then proceed to remove the STATE_KEY and continue
returning the user.

@olesho olesho closed this Sep 23, 2025