
Conversation

olesho
Collaborator

@olesho olesho commented Sep 18, 2025

Summary by CodeRabbit

  • New Features

    • Centralized LLM configuration manager with UI API, per-request overrides, cross-tab sync, and a runtime JSON‑RPC configure_llm.
  • Improvements

    • Multi-provider support with nested model tiers (main/mini/nano), per-request api_key/endpoint, automated mode defaults/bypass, credential refresh, cancellation support, richer agent descriptors and tracing, MCP connectors UI and provider integrations, and tool name/behavior updates (e.g., unified extractor name).
  • Documentation

    • Added design docs, provider docs, .env example, and runtime/examples for multi-provider evaluation flows.

coderabbitai bot commented Sep 18, 2025

Walkthrough

Adds a centralized frontend LLMConfigurationManager, a configure_llm JSON‑RPC on the eval server, nested/per-request and persistent LLM model/provider overrides, automated-mode credential bypasses, cross‑tab sync, examples/docs/config updates, and wiring so evaluations use the unified configuration and per-client overrides.

Changes

Cohort / File(s) Summary of Changes
Design & Docs
MODEL-CONFIGS.md, eval-server/nodejs/CLAUDE.md
New LLM config design doc; CLAUDE.md expanded to document multi‑provider support, configure_llm JSON‑RPC and nested per‑tier model configuration.
Server env & config
eval-server/nodejs/.env.example, eval-server/nodejs/src/config.js
Adds .env template and expands server CONFIG with multi-provider creds (openai, litellm, groq, openrouter), defaults (provider/main/mini/nano), auth secret, and clients/evals dirs.
Eval server RPC & API
eval-server/nodejs/src/lib/EvalServer.js, eval-server/nodejs/src/api-server.js
Adds JSON‑RPC handling (configure_llm), per-connection llmConfig storage/merge, nested model processing helpers, wiring to pass nested model into evaluations, and getClientManager() export.
Server examples
eval-server/nodejs/examples/*
Examples import CONFIG, introspect providers, demonstrate per‑eval model selection across providers, adjust ports, and show automated/no‑auth flows.
Frontend build & entry
front_end/panels/ai_chat/BUILD.gn
Adds many new modules to ai_chat build (including core/LLMConfigurationManager.ts, agent implementations, MCP dialogs, and MCP SDK v2 entries).
LLM configuration manager
front_end/panels/ai_chat/core/LLMConfigurationManager.ts
New singleton LLMConfigurationManager and LLMConfig interface: localStorage-backed config, per‑request overrides, persistence, listeners, cross‑tab sync, validation and redacted logging.
Agent wiring & execution
front_end/panels/ai_chat/core/AgentService.ts, .../AgentNodes.ts, .../ConfigurableAgentTool.ts, agent_framework/AgentRunner.ts
Replace ad-hoc localStorage reads with manager APIs; provider array construction (openai/litellm/groq/openrouter); config-change subscription and refreshCredentials(); execution abort support, execution registry (AbortController), propagate agentDescriptor metadata through traces and tools.
Evaluation protocol & agent
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts, .../EvaluationAgent.ts, evaluation/EvaluationAgent.ts
Adds LLMConfigurationRequest/Response types and helpers; extends EvaluationParams.model with api_key/endpoint; applies per-request overrides (nested or flat) via manager during evaluate/chat flows; adds handler to persist configuration; automated-mode auth bypass.
Frontend UI & config
front_end/panels/ai_chat/ui/AIChatPanel.ts, front_end/panels/ai_chat/common/EvaluationConfig.ts
Integrates manager into AIChatPanel, adds setLLMConfiguration, reads models/provider from manager, updates WS default to ws://localhost:8082, and conditions secret handling for automated mode.
MCP & tooling surface
front_end/panels/ai_chat/mcp/*, core/ToolSurfaceProvider.ts, core/ToolNameMap.ts
Major MCP v2 integration: new MCPConfig/MCPRegistry/MCP dialogs, provider/token management, per-server connection/retry/events, async tool registration, LLM-based MCP tool router, tool name mapping changes and tests.
Tools & extractors
front_end/panels/ai_chat/tools/*, eval-server/nodejs/evals/schema-extractor/*, schemas/client.schema.json
Rename extractor tool id extract_schema_data → extract_data across tools, evals, schemas, and tests; add abortable/sleep helpers and thread AbortSignal through tools.
Agent descriptors & metadata
front_end/panels/ai_chat/core/AgentDescriptorRegistry.ts, core/BaseOrchestratorAgent.ts
New AgentDescriptorRegistry for computing/caching agent descriptors (promptHash, toolsetHash); orchestrator/agent configs include optional version; descriptor used to enrich traces and evaluation metadata.
Tests & examples
front_end/panels/ai_chat/**/__tests__/*, eval-server/nodejs/examples/*
New and updated tests for ToolNameMap, MCP integration/registration/config, ToolExecutorNode; many import path updates; examples expanded for multi-provider demos.
Misc frontend changes
core/StateGraph.ts, core/Types.ts, core/ConfigurableGraph.ts, core/State.ts, core/Logger.ts, UI components, CSS
Add AbortSignal support across graphs/types/nodes, public Logger.isDevelopment(), input APIs (InputBar.setInputValue/clearInput), UI additions (MCP dialogs, chat example chips), and version bump.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant UI as Frontend (AIChatPanel)
  participant CM as LLMConfigurationManager
  participant Eval as EvalServer (Node)
  Note over UI,Eval: Persistent runtime configuration (configure_llm)
  UI->>Eval: JSON‑RPC configure_llm { provider, apiKey?, endpoint?, models, partial? }
  Eval->>Eval: validate + merge per‑connection llmConfig
  Eval-->>UI: JSON‑RPC response { status: "success", appliedConfig }
  UI->>CM: saveConfiguration(appliedConfig)
  UI->>UI: AgentService.refreshCredentials()
sequenceDiagram
  autonumber
  participant Client as Frontend Client
  participant CM as LLMConfigurationManager
  participant Eval as EvalServer
  Note over Client,Eval: Per-request model override during evaluate/chat
  Client->>CM: setOverride({ provider, mainModel, miniModel?, nanoModel?, apiKey?, endpoint? })
  Client->>Eval: evaluate { model: nested-or-flat config }
  Eval->>Eval: merge connection llmConfig + eval overrides -> nested modelConfig
  Eval-->>Client: evaluation result
  Client->>CM: clearOverride()

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • tysonthomas9

Poem

In burrows of bytes I nudge config to bloom,
Models line like carrots—main, mini, nano in room.
A configure_llm hop, JSON‑RPC sings clear,
Overrides flutter, automated mode near.
Logs glow like clover—evals leap, joyful cheer. 🐇🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 46.30%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title "Local Dockerised Eval Server" references a real part of this changeset (multiple eval-server files and configuration were modified), so it is related to the PR. However, it captures only a subset of the work: the diff also contains extensive front-end and protocol changes (LLMConfigurationManager, MCP UI/connectors, agent descriptors, tool renames, and JSON-RPC configure_llm support) that the title does not reflect, making it a partial summary rather than a complete description.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/local-eval-api-try3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


claude bot commented Sep 18, 2025

Claude finished @olesho's task —— View job


PR Review: Local Dockerised Eval Server

I've completed a comprehensive review of this substantial PR implementing a local dockerized evaluation server with advanced LLM configuration management. This is an impressive addition with solid architecture, though there are several areas for improvement.

Review Progress:

  • Analyze core eval server changes
  • Review LLM configuration management
  • Check security implementations
  • Evaluate error handling and validation
  • Assess performance considerations
  • Review test coverage needs
  • Provide comprehensive feedback

🎯 Overall Assessment

This PR introduces a sophisticated evaluation server system with excellent separation of concerns and flexible configuration management. The implementation demonstrates strong architectural decisions with room for security and robustness improvements.

✅ Strengths

Architecture & Design

  • Excellent modular architecture with clear separation between eval server, agent service, and LLM management
  • Flexible LLM configuration system supporting both manual and automated modes with override capabilities
  • Comprehensive protocol design with proper JSON-RPC implementation and type safety
  • Well-structured tracing integration for observability

Code Quality

  • Consistent error handling patterns across most components
  • Good logging implementation with appropriate detail levels
  • Proper WebSocket connection management with heartbeat and reconnection logic
  • Comprehensive model configuration documentation (MODEL-CONFIGS.md)

🔍 Areas for Improvement

Critical Security Issues

1. API Key Exposure Risk (api-server.js:349, EvaluationAgent.ts:377-392)

// ISSUE: API keys logged at info level
logger.info('Setting early configuration override', {
  hasApiKey: !!apiKey,
  apiKeyLength: apiKey?.length  // Should not log length
});

Recommendation: Never log API key lengths or related metadata. Use debug level for configuration details.
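
As a minimal sketch (the helper name and field list are illustrative, not from this PR), a redaction pass applied before anything reaches the logger:

// Hypothetical redaction helper; field names are assumptions based on the
// config shapes in this PR, not an existing utility.
const SECRET_KEYS = new Set(['apiKey', 'api_key', 'authKey', 'secretKey']);

function redactSecrets<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map(redactSecrets) as unknown as T;
  }
  if (value && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = SECRET_KEYS.has(k) ? '[redacted]' : redactSecrets(v);
    }
    return out as T;
  }
  return value;
}

// Usage: logger.debug('Setting early configuration override', redactSecrets(override));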

2. Insufficient Input Validation (api-server.js:263-264)

if (!requestBody.input || typeof requestBody.input !== 'string') {
  throw new Error('Missing or invalid "input" field. Expected a string.');
}

Recommendation: Add comprehensive validation for all request parameters, including size limits and sanitization.
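
A sketch of tighter validation, assuming the `input: string` field shown above; the size limit and helper name are illustrative:

// Sketch of request-body validation for the HTTP wrapper.
const MAX_INPUT_LENGTH = 64 * 1024; // 64 KB, arbitrary example limit

interface ResponsesRequest {
  input: string;
}

function validateResponsesRequest(body: unknown): ResponsesRequest {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Request body must be a JSON object.');
  }
  const { input } = body as Record<string, unknown>;
  if (typeof input !== 'string' || input.length === 0) {
    throw new Error('Missing or invalid "input" field. Expected a non-empty string.');
  }
  if (input.length > MAX_INPUT_LENGTH) {
    throw new Error(`"input" exceeds the ${MAX_INPUT_LENGTH}-character limit.`);
  }
  return { input: input.trim() };
}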

3. CORS Security (api-server.js:61)

res.setHeader('Access-Control-Allow-Origin', '*');

Recommendation: Replace wildcard CORS with specific allowed origins for production use.
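
A sketch of an env-driven allowlist (the ALLOWED_ORIGINS variable name is an assumption, not part of this PR):

import type { IncomingMessage, ServerResponse } from 'node:http';

const allowedOrigins = new Set(
  (process.env.ALLOWED_ORIGINS ?? 'http://localhost:8080').split(',').map(o => o.trim())
);

function applyCors(req: IncomingMessage, res: ServerResponse): void {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.has(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Vary', 'Origin');
  }
  // Unknown origins get no CORS header at all, instead of '*'.
}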

Memory & Performance Concerns

4. Potential Memory Leaks (EvalServer.js:117-123)

for (const [clientId, connection] of this.connectedClients) {
  connection.rpcClient.cleanup();
  // Missing: Remove event listeners and clear timers
}

Recommendation: Ensure all event listeners, intervals, and timeouts are properly cleaned up.
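
A sketch of a fuller teardown, assuming hypothetical `heartbeatTimer` and WebSocket fields on the tracked connection (the actual EvalServer.js shape may differ):

interface TrackedConnection {
  rpcClient: { cleanup(): void };
  ws: { removeAllListeners(): void; terminate(): void };
  heartbeatTimer?: ReturnType<typeof setInterval>;
}

function shutdownConnections(connectedClients: Map<string, TrackedConnection>): void {
  for (const [, connection] of connectedClients) {
    connection.rpcClient.cleanup();
    if (connection.heartbeatTimer) {
      clearInterval(connection.heartbeatTimer); // stop heartbeat timer
    }
    connection.ws.removeAllListeners();          // drop event listeners
    connection.ws.terminate();                   // close the socket
  }
  connectedClients.clear();
}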

5. Missing Request Size Limits (api-server.js:88-91)

for await (const chunk of req) {
  body += chunk; // No size limit
}

Recommendation: Implement request size limits to prevent DoS attacks.
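
A sketch of a size-capped body reader (the 1 MB cap is an illustrative default, not a value from this PR):

import type { IncomingMessage } from 'node:http';

const MAX_BODY_BYTES = 1024 * 1024; // 1 MB

async function readBody(req: IncomingMessage): Promise<string> {
  let received = 0;
  const chunks: Buffer[] = [];
  for await (const chunk of req) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    received += buf.length;
    if (received > MAX_BODY_BYTES) {
      req.destroy(); // stop reading from the socket
      throw new Error(`Request body exceeds ${MAX_BODY_BYTES} bytes.`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks).toString('utf8');
}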

Error Handling & Reliability

6. Incomplete Error Context (EvaluationAgent.ts:633-670)

} catch (error) {
  const errorMessage = error instanceof Error ? error.message : 'Unknown error';
  // Missing: Error stack traces, evaluation context
}

Recommendation: Include evaluation context and stack traces in error responses for better debugging.

7. Inconsistent Timeout Handling (api-server.js:415, EvaluationAgent.ts:573)

timeout: 1500000, // 25 minutes - very long

Recommendation: Implement configurable timeouts with reasonable defaults and proper cleanup on timeout.
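
A sketch of a configurable timeout wrapper with guaranteed cleanup; the EVALUATION_TIMEOUT_MS variable and 5-minute default are assumptions:

const DEFAULT_TIMEOUT_MS = Number(process.env.EVALUATION_TIMEOUT_MS ?? 5 * 60 * 1000);

async function withTimeout<T>(work: Promise<T>, ms: number = DEFAULT_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Evaluation timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer) {
      clearTimeout(timer); // always clear so the timer cannot keep the process alive
    }
  }
}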

Code Quality Issues

8. Complex Configuration Logic (EvaluationAgent.ts:352-412)
The model configuration parsing has nested conditionals and multiple fallbacks that are hard to follow.
Recommendation: Extract to dedicated configuration parser with unit tests.
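
A sketch of such a parser for the nested-or-flat wire format; the field names follow the protocol in this PR, but the function shape and fallback order are illustrative:

type Provider = 'openai' | 'litellm' | 'groq' | 'openrouter';

interface TierConfig { provider?: Provider; model?: string; api_key?: string; }

interface WireModelConfig {
  provider?: Provider;
  api_key?: string;
  endpoint?: string;
  main_model?: string | TierConfig;
  mini_model?: string | TierConfig;
  nano_model?: string | TierConfig;
}

interface ParsedModelConfig {
  provider: Provider;
  apiKey?: string;
  endpoint?: string;
  mainModel?: string;
  miniModel?: string;
  nanoModel?: string;
}

function parseModelConfig(raw: WireModelConfig): ParsedModelConfig {
  // Accept both "main_model": "gpt-4" and "main_model": { model, provider, api_key }.
  const tier = (t?: string | TierConfig): TierConfig =>
    typeof t === 'string' ? { model: t } : (t ?? {});
  const main = tier(raw.main_model);
  const mini = tier(raw.mini_model);
  const nano = tier(raw.nano_model);
  return {
    provider: main.provider ?? raw.provider ?? 'openai',
    apiKey: main.api_key ?? raw.api_key,
    endpoint: raw.endpoint,
    mainModel: main.model,
    miniModel: mini.model ?? main.model,
    nanoModel: nano.model ?? mini.model ?? main.model,
  };
}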

9. Missing Validation (LLMConfigurationManager.ts:157-167)

setOverride(config: Partial<LLMConfig>): void {
  this.overrideConfig = { ...config }; // No validation
}

Recommendation: Add validation for provider compatibility and required fields.
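
A sketch of what that validation could look like; the rules below (especially the LiteLLM endpoint-only allowance) are illustrative, not the manager's actual behavior:

type LLMProvider = 'openai' | 'litellm' | 'groq' | 'openrouter';

interface LLMConfig {
  provider: LLMProvider;
  apiKey?: string;
  endpoint?: string;
  mainModel?: string;
  miniModel?: string;
  nanoModel?: string;
}

function validateOverride(config: Partial<LLMConfig>): void {
  const providers: LLMProvider[] = ['openai', 'litellm', 'groq', 'openrouter'];
  if (config.provider && !providers.includes(config.provider)) {
    throw new Error(`Unsupported provider: ${config.provider}`);
  }
  if (config.apiKey !== undefined && config.apiKey.trim() === '') {
    throw new Error('apiKey, when provided, must be a non-empty string');
  }
  // LiteLLM can run without an API key only when an endpoint is supplied.
  if (config.provider === 'litellm' && !config.endpoint && !config.apiKey) {
    throw new Error('LiteLLM override requires an endpoint or an API key');
  }
}

// setOverride(config) would call validateOverride(config) before assigning.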

🧪 Test Coverage Gaps

The PR lacks comprehensive test coverage for:

  • Configuration override scenarios
  • Error handling edge cases
  • WebSocket connection failure scenarios
  • Multi-provider model configuration
  • Tracing integration

Recommendation: Add unit tests for core components and integration tests for the evaluation flow.

🚀 Performance Recommendations

1. Connection Pooling

Implement connection pooling for HTTP requests to external LLM providers to reduce latency.
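
A sketch using Node's built-in https.Agent keep-alive pooling; pool sizes are illustrative:

import https from 'node:https';

// Shared keep-alive agent so repeated provider calls reuse TCP connections.
const providerAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 10,     // concurrent sockets per host
  maxFreeSockets: 5,  // idle sockets kept warm
});

function postJson(url: string, body: unknown): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = https.request(url, {
      method: 'POST',
      agent: providerAgent,
      headers: { 'Content-Type': 'application/json' },
    }, res => {
      let data = '';
      res.on('data', chunk => { data += chunk; });
      res.on('end', () => resolve(data));
    });
    req.on('error', reject);
    req.end(JSON.stringify(body));
  });
}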

2. Configuration Caching

Cache parsed configurations to avoid repeated localStorage access.
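
A sketch of a parse-once cache invalidated by the same 'storage' event the manager already listens to for cross-tab sync:

function createConfigCache<T>(load: () => T): { get(): T; invalidate(): void } {
  let cached: T | null = null;
  const cache = {
    get(): T {
      if (cached === null) {
        cached = load(); // single localStorage read + parse
      }
      return cached;
    },
    invalidate(): void {
      cached = null;
    },
  };
  // Another tab changed the config; re-read lazily on next access.
  window.addEventListener('storage', () => cache.invalidate());
  return cache;
}

// Usage: const configCache = createConfigCache(() => manager.getConfiguration());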

3. Request Debouncing

Add debouncing for rapid configuration changes to prevent unnecessary reinitializations.
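
A sketch of a generic debounce applied to the config-change handler; the 250 ms window and handler name are assumptions:

function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Only the last change in a 250 ms burst triggers a graph rebuild, e.g.:
// const onConfigChanged = debounce(() => agentService.refreshCredentials(), 250);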

🔧 Implementation Suggestions

1. Enhanced Error Types

export class EvaluationError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly evaluationId?: string,
    public readonly context?: Record<string, any>
  ) {
    super(message);
  }
}

2. Configuration Validation Schema

const CONFIG_SCHEMA = {
  provider: { type: 'string', enum: ['openai', 'litellm', 'groq', 'openrouter'] },
  apiKey: { type: 'string', minLength: 10 },
  models: { type: 'object', required: ['main'] }
};

3. Request Rate Limiting

Implement rate limiting for evaluation requests to prevent abuse.
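
A sketch of a fixed-window per-client limiter for evaluation requests; the 30-per-minute limit is illustrative:

const WINDOW_MS = 60_000;
const MAX_REQUESTS_PER_WINDOW = 30;

const windows = new Map<string, { start: number; count: number }>();

function allowRequest(clientId: string, now: number = Date.now()): boolean {
  const win = windows.get(clientId);
  if (!win || now - win.start >= WINDOW_MS) {
    windows.set(clientId, { start: now, count: 1 }); // start a new window
    return true;
  }
  if (win.count >= MAX_REQUESTS_PER_WINDOW) {
    return false; // caller should respond with HTTP 429
  }
  win.count += 1;
  return true;
}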

📋 Immediate Action Items

High Priority

  1. Fix API key logging - Remove all API key metadata from logs
  2. Add input validation - Implement comprehensive request validation
  3. Configure CORS properly - Replace wildcard with specific origins
  4. Add request size limits - Prevent DoS via large payloads

Medium Priority

  1. Improve error handling - Add context and stack traces
  2. Add configuration validation - Validate provider compatibility
  3. Implement cleanup - Ensure proper resource cleanup on shutdown
  4. Add unit tests - Test core configuration and evaluation logic

Low Priority

  1. Optimize performance - Add connection pooling and caching
  2. Enhanced documentation - Add API documentation and examples

🏆 Conclusion

This is a well-architected and feature-rich addition that significantly enhances the evaluation capabilities. The core design is solid with excellent separation of concerns. The main concerns are around security hardening, comprehensive testing, and production readiness.

Recommendation: Address the security issues and add basic test coverage before merging. The architecture is sound and will provide a strong foundation for evaluation workflows.

Overall Score: 7.5/10 - Excellent architecture with important security/robustness improvements needed.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
front_end/panels/ai_chat/ui/AIChatPanel.ts (3)

51-63: Stop logging/previewing secrets in StorageMonitor.
Current logs emit value length/preview for keys containing 'openrouter' or 'ai_chat'. This risks API key leakage in dev/prod logs.

-    localStorage.setItem = (key: string, value: string) => {
-      if (key.includes('openrouter') || key.includes('ai_chat')) {
-        logger.debug(`=== LOCALSTORAGE SET ===`);
-        logger.debug(`Key: ${key}`);
-        logger.debug(`Value exists: ${!!value}`);
-        logger.debug(`Value length: ${value?.length || 0}`);
-        logger.debug(`Value preview: ${value?.substring(0, 50) + (value?.length > 50 ? '...' : '') || 'null'}`);
-        logger.debug(`Timestamp: ${new Date().toISOString()}`);
-      }
-      return this.originalSetItem(key, value);
-    };
+    localStorage.setItem = (key: string, value: string) => {
+      if (key.includes('openrouter') || key.includes('ai_chat')) {
+        logger.debug('=== LOCALSTORAGE SET ===');
+        logger.debug(`Key: ${key}`);
+        logger.debug('Value: [redacted]');
+        logger.debug(`Timestamp: ${new Date().toISOString()}`);
+      }
+      return this.originalSetItem(key, value);
+    };
@@
-      if (key.includes('openrouter') || key.includes('ai_chat')) {
+      if (key.includes('openrouter') || key.includes('ai_chat')) {
         logger.debug(`=== LOCALSTORAGE REMOVE ===`);
         logger.debug(`Key: ${key}`);
         logger.debug(`Timestamp: ${new Date().toISOString()}`);
       }
       return this.originalRemoveItem(key);
     };

Also gate StorageMonitor behind a debug flag to avoid shipping monkey‑patching:

-    // Initialize storage monitoring for debugging
-    StorageMonitor.getInstance();
+    // Initialize storage monitoring for debugging
+    if (BUILD_CONFIG.DEBUG) {
+      StorageMonitor.getInstance();
+    }

Also applies to: 66-73


1057-1076: Remove API‑key visibility logs on OAuth success.
Logging “API key length” and method is unnecessary and leaks sensitive metadata.

-      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
-      const authMethod = localStorage.getItem('openrouter_auth_method');
-      logger.info('- API key exists:', !!apiKey);
-      logger.info('- API key length:', apiKey?.length || 0);
-      logger.info('- Auth method:', authMethod);
+      // Avoid logging credential presence/length in production
+      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
+      const authMethod = localStorage.getItem('openrouter_auth_method');
+      logger.info('OpenRouter OAuth success');

1496-1527: Avoid logging API key prefix/length.
These logs leak secrets to console/storage. Log boolean presence only.

-        logger.info('Retrieved API key:');
-        logger.info('- Exists:', !!apiKey);
-        logger.info('- Length:', apiKey?.length || 0);
-        logger.info('- Prefix:', apiKey?.substring(0, 8) + '...' || 'none');
+        logger.info('Retrieved API key: [present=%s]', !!apiKey);
eval-server/nodejs/src/config.js (1)

65-71: Enforce OPENAI_API_KEY per guidelines.
Guideline requires OPENAI_API_KEY (and PORT default). validateConfig currently makes it optional unless requireLLM=true; make it required by default.

-export function validateConfig(requireLLM = false) {
+export function validateConfig(requireLLM = true) {
@@
-  // Only require OpenAI API key if LLM judge is explicitly needed
-  if (requireLLM && !CONFIG.llm.apiKey) {
+  // Require OpenAI API key by default for evaluator/judge usage
+  if (!CONFIG.llm.apiKey) {
     errors.push('OPENAI_API_KEY is required when using LLM judge');
   }

Optionally warn if no provider creds exist (CONFIG.providers.*) even when judge not used.

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (2)

212-218: Global LLM override + maxConcurrency=3 risks cross‑evaluation credential bleed. Set to 1 or make overrides per‑evaluation.

Because LLMConfigurationManager override is global, concurrent evaluations can stomp each other’s API key/provider. Until per‑call config is plumbed, reduce concurrency here.

Apply this diff:

-        maxConcurrency: 3,
+        // NOTE: Global LLM override is not concurrency-safe. Keep at 1 until per-eval config is implemented.
+        maxConcurrency: 1,

If you prefer to keep 3, we need to make AgentService/LLM client accept per-call config and eliminate the global override path.


335-342: Redact sensitive payloads from logs (models, keys, message text).

Avoid logging raw params.model, merged inputs, or full user message. Keep booleans/lengths only.

Apply this diff:

-    logger.info('Received evaluation request', {
+    logger.info('Received evaluation request', {
       evaluationId: params.evaluationId,
       tool: params.tool,
       url: params.url,
       isChat: params.tool === 'chat',
-      modelOverride: params.input?.ai_chat_model,
-      modelConfig: params.model
+      modelOverride: !!params.input?.ai_chat_model,
+      hasModelConfig: !!params.model
     });
-    logger.info('DEBUG: Checking for model configuration override', {
-      evaluationId: params.evaluationId,
-      hasModel: !!params.model,
-      model: params.model
-    });
+    logger.info('DEBUG: Checking for model configuration override', {
+      evaluationId: params.evaluationId,
+      hasModel: !!params.model
+    });
-        logger.info('DEBUG: Created merged input for chat evaluation', {
-          evaluationId: params.evaluationId,
-          hasOverrideConfig: !!overrideConfig,
-          overrideConfig,
-          mergedInput: {
-            ...mergedInput,
-            api_key: mergedInput.api_key ? `${mergedInput.api_key.substring(0, 10)}...` : undefined
-          }
-        });
+        logger.info('DEBUG: Created merged input for chat evaluation', {
+          evaluationId: params.evaluationId,
+          hasOverride: configManager.hasOverride?.() || !!overrideConfig,
+          mergedInputPreview: {
+            provider: mergedInput.provider,
+            main_model: mergedInput.main_model,
+            mini_model: mergedInput.mini_model,
+            nano_model: mergedInput.nano_model,
+            has_api_key: !!mergedInput.api_key
+          }
+        });
-    logger.info('Starting chat evaluation', {
-      message: input.message,
+    logger.info('Starting chat evaluation', {
+      messageLength: String(input.message || '').length,
       timeout,
       hasTracingContext: !!tracingContext
     });

Also applies to: 346-351, 367-375, 561-569, 821-826

🧹 Nitpick comments (26)
eval-server/nodejs/.env.example (1)

31-34: Satisfy dotenv-linter ordering and trailing newline

Pure formatting, but easy to fix to keep linters green and examples clean.

Apply:

-PORT=8080
-HOST=127.0.0.1
+HOST=127.0.0.1
+PORT=8080
@@
-LITELLM_ENDPOINT=http://localhost:4000
-LITELLM_API_KEY=your-litellm-api-key-here
+LITELLM_API_KEY=your-litellm-api-key-here
+LITELLM_ENDPOINT=http://localhost:4000
@@
-DEFAULT_PROVIDER=openai
-DEFAULT_MAIN_MODEL=gpt-4
-DEFAULT_MINI_MODEL=gpt-4-mini
-DEFAULT_NANO_MODEL=gpt-3.5-turbo
+DEFAULT_MAIN_MODEL=
+DEFAULT_MINI_MODEL=
+DEFAULT_NANO_MODEL=
+DEFAULT_PROVIDER=openai
@@
-LOG_LEVEL=info
-LOG_DIR=./logs
+LOG_DIR=./logs
+LOG_LEVEL=info
@@
-AUTH_SECRET_KEY=
+AUTH_SECRET_KEY=
+

Also applies to: 39-41, 42-45

eval-server/nodejs/CLAUDE.md (2)

110-131: Add security note for runtime LLM config (api keys over RPC)

Recommend calling out that apiKey in configure_llm must only traverse trusted, local, encrypted channels, and keys must be redacted in logs.

Patch suggestion:

   "id": "config-request-id"
 }

+#### Security considerations
+- Only use configure_llm over trusted local connections (wss:// or ws://localhost).
+- Never log raw API keys; always redact in server/client logs.
+- Consider scoping keys per‑client session and expiring overrides on disconnect.
+


133-163: Clarify naming conventions between protocol and UI

Protocol example uses snake_case (main_model, mini_model, nano_model) while frontend uses camelCase in LLMConfigurationManager. Add a note that the server/bridge maps these fields.

front_end/panels/ai_chat/core/AgentNodes.ts (1)

595-605: Validate config and fall back to node defaults; pass endpoint when present

If override/localStorage is incomplete, tool.execute may get undefined apiKey/model. Validate and fall back to constructor defaults. Also pass endpoint for LiteLLM.



-          const configManager = LLMConfigurationManager.getInstance();
-          const config = configManager.getConfiguration();
-
-          return await selectedTool.execute(toolArgs as any, {
-            apiKey: config.apiKey,
-            provider: config.provider,
-            model: config.mainModel,
-            miniModel: config.miniModel,
-            nanoModel: config.nanoModel
-          });
+          const mgr = LLMConfigurationManager.getInstance();
+          const cfg = mgr.getConfiguration();
+          const fallback = {
+            provider: this.provider,
+            model: this.modelName,
+            miniModel: this.miniModel,
+            nanoModel: this.nanoModel,
+          };
+          // Minimal validation: require provider+model; keep apiKey optional to support OAuth/OpenRouter flows
+          const provider = cfg.provider || (fallback.provider as any);
+          const model = cfg.mainModel || (fallback.model as any);
+          return await selectedTool.execute(toolArgs as any, {
+            apiKey: cfg.apiKey,
+            provider,
+            model,
+            miniModel: cfg.miniModel || fallback.miniModel,
+            nanoModel: cfg.nanoModel || fallback.nanoModel,
+            ...(cfg.endpoint ? { endpoint: cfg.endpoint } : {}),
+          });
eval-server/nodejs/examples/with-http-wrapper.js (1)

14-17: Example ports diverge from .env; consider env-driven defaults

The example binds EvalServer to 8082 and HTTP to 8080 while .env defaults to 8080. Consider reading from env with safe defaults to prevent confusion.

-const evalServer = new EvalServer({
-  // No authKey - authentication disabled for automated mode
-  host: '127.0.0.1',
-  port: 8082
-});
+const evalServer = new EvalServer({
+  // No authKey - authentication disabled for automated mode (local only)
+  host: process.env.HOST || '127.0.0.1',
+  port: Number(process.env.PORT || 8082),
+});
@@
-const httpWrapper = new HTTPWrapper(evalServer, {
-  port: 8080,
-  host: '127.0.0.1'
-});
+const httpWrapper = new HTTPWrapper(evalServer, {
+  port: Number(process.env.HTTP_PORT || 8080),
+  host: process.env.HOST || '127.0.0.1',
+});

Bind only to loopback in examples to avoid exposing no‑auth endpoints.

Also applies to: 21-23, 28-38

eval-server/nodejs/examples/multiple-evals.js (5)

29-34: Prefer CONFIG.defaults for model names.
Hardcoding 'gpt-4' drifts from server defaults; read defaults so examples follow env.

-      main_model: {
-        provider: 'openai',
-        model: 'gpt-4',
-        api_key: CONFIG.providers.openai.apiKey
-      }
+      main_model: {
+        provider: 'openai',
+        model: CONFIG.defaults.mainModel,
+        api_key: CONFIG.providers.openai.apiKey
+      }

45-51: Same: use CONFIG.defaults or provider‑specific defaults.

-        model: 'llama-3.1-8b-instant',
+        model: CONFIG.defaults.mainModel, // or a GROQ default read from CONFIG if added

62-68: Same: avoid hardcoding OpenRouter model in example.

-        model: 'anthropic/claude-3-sonnet',
+        model: CONFIG.defaults.mainModel,

79-86: Same: LiteLLM model constant may not exist at endpoint.
Use CONFIG.defaults or make the model user‑provided to avoid 404s at proxies.

-        model: 'claude-3-haiku-20240307',
+        model: CONFIG.defaults.mainModel,

100-105: Minor: read host/port from CONFIG for consistency.

-const server = new EvalServer({
-  authKey: 'hello',
-  host: '127.0.0.1',
-  port: 8080
-});
+const server = new EvalServer({
+  authKey: 'hello',
+  host: CONFIG.server.host || '127.0.0.1',
+  port: CONFIG.server.port || 8080
+});
front_end/panels/ai_chat/ui/AIChatPanel.ts (2)

361-376: Use LLMConfigurationManager for provider fallback.
This reads from localStorage; prefer config manager to keep a single source of truth.

-    const currentProvider = localStorage.getItem(PROVIDER_SELECTION_KEY) || 'openai';
+    const currentProvider = LLMConfigurationManager.getInstance().getProvider();

1971-1981: Restore StorageMonitor hooks on panel hide.
To avoid global side effects across DevTools sessions, call restore().

   override willHide(): void {
     // Explicitly remove any event listeners to prevent memory leaks
+    try { StorageMonitor.getInstance().restore(); } catch {}
     if (this.#boundOnMessagesChanged) {
       this.#agentService.removeEventListener(AgentEvents.MESSAGES_CHANGED, this.#boundOnMessagesChanged);
     }
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

92-95: Tighten typing and align casing for provider fields.
Use a union type for provider in EvaluationParams.model to match LLMConfigurationParams; keep casing consistent across protocol objects to reduce translation glue.

-    provider?: string;
+    provider?: 'openai' | 'litellm' | 'groq' | 'openrouter';
-    api_key?: string;    // New: per-request API key
+    api_key?: string;    // Note: snake_case for wire protocol; avoid logging
eval-server/nodejs/src/api-server.js (2)

37-43: Defaults drift from CONFIG.defaults; unify source of truth.
Hardcoded 'gpt-4.1*' differs from src/config.js defaults. Load defaults from CONFIG to prevent divergence.

+import { CONFIG } from './config.js';
@@
-        this.configDefaults = {
-          model: {
-            main_model: 'gpt-4.1',
-            mini_model: 'gpt-4.1-mini',
-            nano_model: 'gpt-4.1-nano',
-            provider: 'openai'
-          }
-        };
+        this.configDefaults = {
+          model: {
+            main_model: CONFIG.defaults.mainModel,
+            mini_model: CONFIG.defaults.miniModel,
+            nano_model: CONFIG.defaults.nanoModel,
+            provider: CONFIG.defaults.provider
+          }
+        };
@@
-      this.configDefaults = {
-        model: {
-          main_model: 'gpt-4.1',
-          mini_model: 'gpt-4.1-mini',
-          nano_model: 'gpt-4.1-nano',
-          provider: 'openai'
-        }
-      };
+      this.configDefaults = {
+        model: {
+          main_model: CONFIG.defaults.mainModel,
+          mini_model: CONFIG.defaults.miniModel,
+          nano_model: CONFIG.defaults.nanoModel,
+          provider: CONFIG.defaults.provider
+        }
+      };

Also applies to: 48-55


58-66: CORS wildcard on local API.
Fine for local dev; consider restricting or making it configurable before shipping.

front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)

64-70: Defensive provider parsing

getProvider casts arbitrary storage values to LLMProvider. Guard against invalid strings.

   getProvider(): LLMProvider {
     if (this.overrideConfig?.provider) {
       return this.overrideConfig.provider;
     }
     const stored = localStorage.getItem(STORAGE_KEYS.PROVIDER);
-    return (stored as LLMProvider) || 'openai';
+    const allowed: LLMProvider[] = ['openai','litellm','groq','openrouter'];
+    return allowed.includes(stored as LLMProvider) ? (stored as LLMProvider) : 'openai';
   }
front_end/panels/ai_chat/core/AgentService.ts (2)

647-681: Use config manager for endpoint detection; avoid direct localStorage

#doesCurrentConfigRequireApiKey reads localStorage directly. Prefer the centralized manager to avoid key drift.

   #doesCurrentConfigRequireApiKey(): boolean {
     try {
       // Check the selected provider
-      const selectedProvider = this.#configManager.getProvider();
+      const selectedProvider = this.#configManager.getProvider();
       
       // OpenAI provider always requires an API key
       if (selectedProvider === 'openai') {
         return true;
       }
       // Groq provider always requires an API key
       if (selectedProvider === 'groq') {
         return true;
       }
       // OpenRouter provider always requires an API key
       if (selectedProvider === 'openrouter') {
         return true;
       }
       // For LiteLLM, only require API key if no endpoint is configured
       if (selectedProvider === 'litellm') {
-        const hasLiteLLMEndpoint = Boolean(localStorage.getItem('ai_chat_litellm_endpoint'));
+        const hasLiteLLMEndpoint = !!this.#configManager.getEndpoint();
         // If we have an endpoint, API key is optional
         return !hasLiteLLMEndpoint;
       }
       // Default to requiring API key for any unknown provider
       return true;

4-4: “Cache break” comments

Harmless, but consider removing before merge to keep history noise down.

Also applies to: 26-27

MODEL-CONFIGS.md (1)

56-62: Fix markdownlint issues (MD036, MD024)

Replace emphasis-styled headings with proper “### …” headings and dedupe repeated headings.

Also applies to: 62-62, 384-384

eval-server/nodejs/examples/library-usage.js (1)

50-53: LiteLLM should be considered configured with endpoint only

Examples mark LiteLLM configured only if apiKey AND endpoint are present. The rest of the system allows endpoint‑only LiteLLM.

-  if (CONFIG.providers.litellm.apiKey && CONFIG.providers.litellm.endpoint) {
+  if (CONFIG.providers.litellm.endpoint) {
     availableProviders.push('litellm');
     console.log('   ✅ LiteLLM configured');
   }
eval-server/nodejs/src/lib/EvalServer.js (1)

662-678: Minor logging nit: messageType for JSON‑RPC

sendMessage logs data.type which is undefined for JSON‑RPC responses/requests; consider guarding to avoid “undefined”.

-          messageType: data.type
+          messageType: (data && data.type) ? data.type : (data && data.jsonrpc ? 'jsonrpc' : 'unknown')
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (5)

352-365: Include endpoint in early override extraction; else LiteLLM/OpenRouter endpoint overrides are ignored.

You parse provider/model/apiKey but omit endpoint; add it to extraction and override.

Apply this diff:

-      const provider = mainModel?.provider || params.model.provider || 'openai';
-      const apiKey = mainModel?.api_key || params.model.api_key;
+      const provider = mainModel?.provider || params.model.provider || 'openai';
+      const apiKey = mainModel?.api_key || params.model.api_key;
       const modelName = mainModel?.model || mainModel || params.model.main_model;
       const miniModel = (params.model.mini_model as any)?.model || params.model.mini_model;
       const nanoModel = (params.model.nano_model as any)?.model || params.model.nano_model;
+      const endpoint = (params.model as any)?.endpoint || (mainModel as any)?.endpoint;
@@
-        configManager.setOverride({
+        configManager.setOverride({
           provider,
           apiKey,
           mainModel: modelName,
           miniModel: miniModel || modelName,
-          nanoModel: nanoModel || miniModel || modelName
+          nanoModel: nanoModel || miniModel || modelName,
+          endpoint
         });

Also applies to: 385-393


545-551: Also pass through endpoint when building merged chat input.

This ensures the chat path can use LiteLLM/OpenRouter endpoints provided in the request.

Apply this diff:

           ...(params.model && {
             // Extract nested model configuration properly
             main_model: (params.model.main_model as any)?.model || params.model.main_model,
             mini_model: (params.model.mini_model as any)?.model || params.model.mini_model,
             nano_model: (params.model.nano_model as any)?.model || params.model.nano_model,
             provider: (params.model.main_model as any)?.provider || params.model.provider,
-            api_key: (params.model.main_model as any)?.api_key || (params.model as any).api_key
+            api_key: (params.model.main_model as any)?.api_key || (params.model as any).api_key,
+            endpoint: (params.model as any)?.endpoint || (params.model.main_model as any)?.endpoint
           }),

300-307: Auth failure path: short‑circuit earlier on known failure.

You send the verify message and later reject the auth promise. Consider rejecting immediately after sending the negative verification to avoid a dangling connection.

Also applies to: 306-308


4-4: Nit: “Cache break” comment timestamp is great; consider a link to the change or commit for traceability.


978-986: Minor: Don’t log whether an API key exists when processing configuration.

Even boolean presence can be sensitive in shared logs. Safe to keep in debug builds only.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74c9f40 and 78b020b.

⛔ Files ignored due to path filters (1)
  • eval-server/nodejs/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (19)
  • MODEL-CONFIGS.md (1 hunks)
  • eval-server/nodejs/.env.example (1 hunks)
  • eval-server/nodejs/CLAUDE.md (4 hunks)
  • eval-server/nodejs/examples/library-usage.js (3 hunks)
  • eval-server/nodejs/examples/multiple-evals.js (4 hunks)
  • eval-server/nodejs/examples/with-http-wrapper.js (1 hunks)
  • eval-server/nodejs/src/api-server.js (6 hunks)
  • eval-server/nodejs/src/config.js (1 hunks)
  • eval-server/nodejs/src/lib/EvalServer.js (5 hunks)
  • front_end/panels/ai_chat/BUILD.gn (2 hunks)
  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2 hunks)
  • front_end/panels/ai_chat/common/EvaluationConfig.ts (3 hunks)
  • front_end/panels/ai_chat/core/AgentNodes.ts (2 hunks)
  • front_end/panels/ai_chat/core/AgentService.ts (9 hunks)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1 hunks)
  • front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (3 hunks)
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (11 hunks)
  • front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (2 hunks)
  • front_end/panels/ai_chat/ui/AIChatPanel.ts (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
eval-server/nodejs/src/config.js

📄 CodeRabbit inference engine (eval-server/nodejs/CLAUDE.md)

eval-server/nodejs/src/config.js: Centralize configuration via environment variables in src/config.js
Require OPENAI_API_KEY and PORT (default 8080) from environment configuration

Files:

  • eval-server/nodejs/src/config.js
🧠 Learnings (10)
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/config.js : Require OPENAI_API_KEY and PORT (default 8080) from environment configuration

Applied to files:

  • eval-server/nodejs/.env.example
  • front_end/panels/ai_chat/core/AgentService.ts
  • eval-server/nodejs/examples/with-http-wrapper.js
  • eval-server/nodejs/src/config.js
  • eval-server/nodejs/examples/library-usage.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/config.js : Centralize configuration via environment variables in src/config.js

Applied to files:

  • eval-server/nodejs/.env.example
  • front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
  • front_end/panels/ai_chat/common/EvaluationConfig.ts
  • eval-server/nodejs/examples/multiple-evals.js
  • eval-server/nodejs/src/config.js
  • eval-server/nodejs/examples/library-usage.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/evaluator.js : LLM evaluator must integrate with the OpenAI API to judge agent responses

Applied to files:

  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts
  • eval-server/nodejs/examples/multiple-evals.js
  • eval-server/nodejs/CLAUDE.md
  • front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
  • eval-server/nodejs/examples/library-usage.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/server.js : WebSocket server must accept agent connections, manage agent lifecycle (connect/ready/disconnect), orchestrate evaluation sessions, and handle bidirectional RPC communication

Applied to files:

  • eval-server/nodejs/examples/with-http-wrapper.js
  • front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
  • eval-server/nodejs/src/lib/EvalServer.js
  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must call Evaluate(request: string) -> string on connected agents

Applied to files:

  • eval-server/nodejs/src/api-server.js
  • eval-server/nodejs/src/lib/EvalServer.js
  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/evaluator.js : LLM evaluator must return structured JSON with scores and reasoning

Applied to files:

  • eval-server/nodejs/examples/multiple-evals.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must implement JSON-RPC 2.0 protocol

Applied to files:

  • eval-server/nodejs/src/lib/EvalServer.js
  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/logger.js : Log all RPC calls, evaluations, and connection events

Applied to files:

  • eval-server/nodejs/src/lib/EvalServer.js
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must correlate requests and responses using unique IDs

Applied to files:

  • eval-server/nodejs/CLAUDE.md
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/rpc-client.js : RPC client must handle timeouts and error conditions

Applied to files:

  • eval-server/nodejs/CLAUDE.md
🧬 Code graph analysis (11)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (5)
front_end/panels/ai_chat/core/AgentService.ts (2)
  • logger (295-306)
  • event (686-722)
front_end/panels/ai_chat/ui/AIChatPanel.ts (14)
  • logger (957-1050)
  • logger (1358-1428)
  • logger (1578-1591)
  • logger (1635-1638)
  • window (1055-1102)
  • localStorage (822-845)
  • provider (1478-1541)
  • event (1689-1696)
  • event (1701-1705)
  • event (1710-1714)
  • event (1719-1723)
  • event (1728-1732)
  • event (1737-1741)
  • error (1948-1961)
front_end/panels/ai_chat/core/Logger.ts (1)
  • createLogger (316-318)
front_end/panels/ai_chat/LLM/LLMTypes.ts (1)
  • LLMProvider (142-142)
front_end/panels/ai_chat/core/Constants.ts (1)
  • STORAGE_KEYS (39-45)
front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/ui/AIChatPanel.ts (4)
  • apiKey (1191-1222)
  • apiKey (1264-1283)
  • apiKey (1596-1630)
  • localStorage (822-845)
front_end/panels/ai_chat/core/AgentService.ts (3)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/Graph.ts (1)
  • createAgentGraph (24-42)
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (1)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/common/EvaluationConfig.ts (1)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
eval-server/nodejs/examples/multiple-evals.js (2)
eval-server/nodejs/src/config.js (2)
  • CONFIG (5-63)
  • CONFIG (5-63)
eval-server/nodejs/examples/library-usage.js (3)
  • response (63-71)
  • response (121-135)
  • response (195-223)
front_end/panels/ai_chat/core/AgentNodes.ts (1)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
eval-server/nodejs/src/lib/EvalServer.js (2)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • logger (295-306)
eval-server/nodejs/src/config.js (2)
  • CONFIG (5-63)
  • CONFIG (5-63)
front_end/panels/ai_chat/ui/AIChatPanel.ts (2)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • logger (295-306)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (5)
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (5)
  • isLLMConfigurationRequest (297-299)
  • LLMConfigurationRequest (259-264)
  • createLLMConfigurationResponse (314-334)
  • createErrorResponse (240-255)
  • ErrorCodes (141-155)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • AgentService (44-786)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (41-368)
eval-server/nodejs/examples/library-usage.js (3)
  • response (63-71)
  • response (121-135)
  • response (195-223)
eval-server/nodejs/examples/library-usage.js (1)
eval-server/nodejs/src/config.js (2)
  • CONFIG (5-63)
  • CONFIG (5-63)
🪛 dotenv-linter (3.3.0)
eval-server/nodejs/.env.example

[warning] 6-6: [UnorderedKey] The HOST key should go before the PORT key

(UnorderedKey)


[warning] 16-16: [UnorderedKey] The LITELLM_API_KEY key should go before the LITELLM_ENDPOINT key

(UnorderedKey)


[warning] 27-27: [UnorderedKey] The DEFAULT_MAIN_MODEL key should go before the DEFAULT_PROVIDER key

(UnorderedKey)


[warning] 28-28: [UnorderedKey] The DEFAULT_MINI_MODEL key should go before the DEFAULT_PROVIDER key

(UnorderedKey)


[warning] 29-29: [UnorderedKey] The DEFAULT_NANO_MODEL key should go before the DEFAULT_PROVIDER key

(UnorderedKey)


[warning] 33-33: [UnorderedKey] The LOG_DIR key should go before the LOG_LEVEL key

(UnorderedKey)


[warning] 45-45: [EndingBlankLine] No blank line at the end of the file

(EndingBlankLine)

🪛 markdownlint-cli2 (0.17.2)
MODEL-CONFIGS.md

56-56: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


62-62: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


384-384: Multiple headings with the same content

(MD024, no-duplicate-heading)

🔇 Additional comments (33)
front_end/panels/ai_chat/BUILD.gn (1)

59-60: LGTM: Manager included in both sources and metadata

The new LLMConfigurationManager.ts is correctly added to build and metadata lists.

Also applies to: 196-197

front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (1)

277-281: Good: no secret leakage in logs

You're logging only boolean presence and verification result, not the secrets themselves.

front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2)

4-4: Cache-break comment is fine

Doc-only; non-functional.


11-11: AUTOMATED_MODE credential bypass — override wiring verified

validateCredentials() intentionally skips checks in AUTOMATED_MODE; the per-request override path supplies the API key into LLMClient → OpenAIProvider at runtime. Key pointers: front_end/panels/ai_chat/LLM/OpenAIProvider.ts (526–532), front_end/panels/ai_chat/core/LLMConfigurationManager.ts (103–107), front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (874–879), front_end/panels/ai_chat/core/AgentService.ts (116–137), front_end/panels/ai_chat/LLM/LLMClient.ts (78–82). Optional: guard verbose logs that expose API keys/payloads behind debug.

front_end/panels/ai_chat/core/AgentNodes.ts (1)

18-18: Central config import looks good

Using the singleton avoids drift across nodes.

front_end/panels/ai_chat/common/EvaluationConfig.ts (2)

33-36: Default ws://localhost:8082 matches examples

Nice alignment with the example start script.


53-64: Automated mode: good defaults; avoid persisting secrets

Behavior matches intent — automated mode defaults evaluation to enabled and endpoint to ws://localhost:8082 and does not persist the secret key. I attempted a repo-wide search for other hardcoded 8080/8082 references, but ripgrep returned "No files were searched" in this environment, so I could not verify. Manually run a repo-wide search for "localhost:8080", "localhost:8082", "ws://localhost:8082", and the ai_chat localStorage keys (ai_chat_evaluation_*) to avoid docs/UI confusion. Also applies to: 68–70, 108–111.

eval-server/nodejs/CLAUDE.md (1)

45-51: configure_llm implemented and wired in EvalServer.js

Switch case 'configure_llm' (eval-server/nodejs/src/lib/EvalServer.js:346) calls this.handleConfigureLLM; async handleConfigureLLM is implemented at eval-server/nodejs/src/lib/EvalServer.js:382.

eval-server/nodejs/.env.example (1)

5-6: Align default ports and pin model IDs — verification required

Ripgrep returned "No files were searched", so repo-wide verification couldn't complete.

  • Port: eval-server/nodejs/.env.example (lines 5–6) sets PORT=8080 while examples/server/frontend use 8082 — unify to 8082 (recommended) or update examples to 8080.
  • Models: replace placeholders (gpt-4, gpt-4-mini, gpt-3.5-turbo) with pinned, supported IDs for stability (e.g., gpt-4-0613, gpt-3.5-turbo-0613) or leave blank with a comment referencing OpenAI's canonical model list.
  • Action: run a repo-wide search for ports and model placeholders (e.g., rg -n --hidden -S 'localhost:808[0-9]' && rg -n --hidden -S 'gpt-4|gpt-4-mini|gpt-3.5-turbo') or allow re-run; verification is incomplete until that search succeeds.
eval-server/nodejs/examples/multiple-evals.js (3)

12-12: Good: centralized config import.
Importing CONFIG keeps example wiring in sync with server defaults.


94-96: Nice touch: provider/default label in stack log.
Clearer operator feedback without leaking secrets.


132-141: Good: runtime log mentions provider and model only.
No API keys in logs. Keep it that way.

front_end/panels/ai_chat/ui/AIChatPanel.ts (6)

321-327: Getter fallback logic LGTM.
Using config manager + fallback to main is correct.


330-337: Nano fallback chain LGTM.
[nano -> mini -> main] is sensible.


339-347: Provider resolution via config manager improves cohesion.


350-358: Same for mini+provider.


727-742: Config manager initialization at construction: good timing.


1108-1123: Update tests/e2e to call setLLMConfiguration instead of editing localStorage directly.

Search for setLLMConfiguration returned only its definition at front_end/panels/ai_chat/ui/AIChatPanel.ts:1111 — no call sites detected; update any tests/e2e that previously mutated localStorage to use this API. Run: rg -n "setLLMConfiguration\(" --type=ts -C2 to re-check.

eval-server/nodejs/src/config.js (1)

7-9: Server port defaults OK.
parseInt fallback to 8080 covers unset/invalid env.

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

259-334: LLM configuration RPC: API shape looks good.
Clear request/response types and guards.

front_end/panels/ai_chat/core/AgentService.ts (4)

192-205: Redundant API‑key gate after prior validation

Following the fix above, this gate remains useful for non‑AUTOMATED_MODE. No change needed, just confirming behavior.

If you’d like to keep one source of truth, you can drop the apiKey portion from validateConfiguration entirely and rely on this gate for runtime enforcement in non‑AUTOMATED_MODE.


344-349: AUTOMATED_MODE cold‑start still hit by validation

sendMessage initializes the graph in AUTOMATED_MODE when #graph is absent, but initialize() → #initializeLLMClient would previously fail validation. The earlier fix resolves this path.

After applying the validation bypass, please test an AUTOMATED_MODE run with no stored API key to confirm first message succeeds.


292-306: Config change handler looks good

Marking the service uninitialized and clearing the graph is the right minimal action on config changes.


171-177: Log detail is appropriate

Provider list and counts are useful; no secrets are logged.

eval-server/nodejs/examples/library-usage.js (1)

76-85: Good progressive demo flow

Basic eval, then model selection, then provider config demos — solid developer UX.

eval-server/nodejs/src/lib/EvalServer.js (3)

329-379: JSON‑RPC handler: solid structure

Routing requests and returning JSON‑RPC compliant responses is correct.


701-729: Model config layering looks good

Nested main/mini/nano with evaluation overrides is sensible.

Also applies to: 730-748


199-205: Exposing getClientManager() is fine

Public accessor is a useful addition.

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (5)

193-195: Good: routes configure_llm messages to a dedicated handler.

This integrates the new JSON‑RPC method cleanly.


689-762: Tool timeout handling looks solid.

Clear timeouts, trace updates, and error propagation are handled correctly.


782-793: Heartbeat is fine.

Interval cleared on disconnect; low risk.


1-1048: Repo‑wide verification helpers

Run these to confirm assumptions and spot remaining hazards.


278-284: Restrict automated-mode auth bypass to local endpoints and make it explicit opt‑in.

AUTOMATED_MODE currently skips auth for any endpoint — gate the bypass so it only applies to loopback/private hosts (localhost, 127.0.0.1, ::1, *.local) and require an explicit allow flag (e.g., AUTOMATED_MODE_AUTH_BYPASS) in addition to AUTOMATED_MODE; refuse bypass and log a warning for non-local endpoints.
Location: front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (around lines 278–284).

Comment on lines 270 to 273

      logger.info('Processing responses request:', {
        input: requestBody.input,
-       modelConfig
+       modelConfig: nestedModelConfig
      });

⚠️ Potential issue

Do not log full model config (leaks API keys).
This prints nestedModelConfig containing api_key. Remove or redact.

-      logger.info('Processing responses request:', {
-        input: requestBody.input,
-        modelConfig: nestedModelConfig
-      });
+      logger.info('Processing responses request', { input: requestBody.input });

If you need visibility, log only provider/model names without secrets.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      logger.info('Processing responses request:', {
-        input: requestBody.input,
-        modelConfig: nestedModelConfig
-      });
+      logger.info('Processing responses request', { input: requestBody.input });
🤖 Prompt for AI Agents
In eval-server/nodejs/src/api-server.js around lines 270 to 273, the current
logger.info call prints nestedModelConfig which may contain secrets (api_key);
remove or redact any sensitive fields before logging. Replace logging of the
full object with a sanitized object that only includes non-sensitive metadata
(e.g., provider name and model name) or explicitly remove api_key/credentials
fields, and ensure nested structures are traversed to strip any secret keys
before passing data to logger.info.

Comment on lines +384 to +450
  async handleConfigureLLM(connection, params) {
    if (!connection.registered) {
      throw new Error('Client must be registered before configuring LLM');
    }

    const { provider, apiKey, endpoint, models, partial = false } = params;

    // Validate provider
    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
    if (!supportedProviders.includes(provider)) {
      throw new Error(`Unsupported provider: ${provider}. Supported providers: ${supportedProviders.join(', ')}`);
    }

    // Validate models
    if (!models || !models.main) {
      throw new Error('Main model is required');
    }

    // Store configuration for this client connection
    if (!connection.llmConfig) {
      connection.llmConfig = {};
    }

    // Apply configuration (full or partial update)
    if (partial && connection.llmConfig) {
      // Partial update - merge with existing config
      connection.llmConfig = {
        ...connection.llmConfig,
        provider: provider || connection.llmConfig.provider,
        apiKey: apiKey || connection.llmConfig.apiKey,
        endpoint: endpoint || connection.llmConfig.endpoint,
        models: {
          ...connection.llmConfig.models,
          ...models
        }
      };
    } else {
      // Full update - replace entire config
      connection.llmConfig = {
        provider,
        apiKey: apiKey || CONFIG.providers[provider]?.apiKey,
        endpoint: endpoint || CONFIG.providers[provider]?.endpoint,
        models: {
          main: models.main,
          mini: models.mini || models.main,
          nano: models.nano || models.mini || models.main
        }
      };
    }

    logger.info('LLM configuration updated', {
      clientId: connection.clientId,
      provider: connection.llmConfig.provider,
      models: connection.llmConfig.models,
      hasApiKey: !!connection.llmConfig.apiKey,
      hasEndpoint: !!connection.llmConfig.endpoint
    });

    return {
      status: 'success',
      message: 'LLM configuration updated successfully',
      appliedConfig: {
        provider: connection.llmConfig.provider,
        models: connection.llmConfig.models
      }
    };
  }

⚠️ Potential issue

configure_llm: partial updates require provider/models even when not changing them

For partial=true, the code still validates provider and models.main unconditionally. This rejects legitimate partial updates.

Use an effectiveProvider (from params or existing/default), and only require models.main for full updates.

   async handleConfigureLLM(connection, params) {
     if (!connection.registered) {
       throw new Error('Client must be registered before configuring LLM');
     }

-    const { provider, apiKey, endpoint, models, partial = false } = params;
+    const { provider, apiKey, endpoint, models, partial = false } = params;

-    // Validate provider
-    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
-    if (!supportedProviders.includes(provider)) {
-      throw new Error(`Unsupported provider: ${provider}. Supported providers: ${supportedProviders.join(', ')}`);
-    }
+    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
+    const existing = connection.llmConfig || {};
+    const effectiveProvider = provider || existing.provider || CONFIG.defaults.provider;
+    if (!supportedProviders.includes(effectiveProvider)) {
+      throw new Error(`Unsupported provider: ${effectiveProvider}. Supported providers: ${supportedProviders.join(', ')}`);
+    }

-    // Validate models
-    if (!models || !models.main) {
-      throw new Error('Main model is required');
-    }
+    // Validate models: required only for full update
+    if (!partial && (!models || !models.main)) {
+      throw new Error('Main model is required');
+    }

     // Store configuration for this client connection
     if (!connection.llmConfig) {
       connection.llmConfig = {};
     }

     // Apply configuration (full or partial update)
     if (partial && connection.llmConfig) {
       // Partial update - merge with existing config
       connection.llmConfig = {
         ...connection.llmConfig,
-        provider: provider || connection.llmConfig.provider,
-        apiKey: apiKey || connection.llmConfig.apiKey,
-        endpoint: endpoint || connection.llmConfig.endpoint,
+        provider: effectiveProvider,
+        apiKey: apiKey ?? connection.llmConfig.apiKey ?? CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint ?? connection.llmConfig.endpoint ?? CONFIG.providers[effectiveProvider]?.endpoint,
         models: {
           ...connection.llmConfig.models,
           ...models
         }
       };
     } else {
       // Full update - replace entire config
       connection.llmConfig = {
-        provider,
-        apiKey: apiKey || CONFIG.providers[provider]?.apiKey,
-        endpoint: endpoint || CONFIG.providers[provider]?.endpoint,
+        provider: effectiveProvider,
+        apiKey: apiKey || CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint || CONFIG.providers[effectiveProvider]?.endpoint,
         models: {
-          main: models.main,
-          mini: models.mini || models.main,
-          nano: models.nano || models.mini || models.main
+          main: models.main,
+          mini: models.mini || models.main,
+          nano: models.nano || models.mini || models.main
         }
       };
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  async handleConfigureLLM(connection, params) {
-    if (!connection.registered) {
-      throw new Error('Client must be registered before configuring LLM');
-    }
-
-    const { provider, apiKey, endpoint, models, partial = false } = params;
-
-    // Validate provider
-    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
-    if (!supportedProviders.includes(provider)) {
-      throw new Error(`Unsupported provider: ${provider}. Supported providers: ${supportedProviders.join(', ')}`);
-    }
-
-    // Validate models
-    if (!models || !models.main) {
-      throw new Error('Main model is required');
-    }
-
-    // Store configuration for this client connection
-    if (!connection.llmConfig) {
-      connection.llmConfig = {};
-    }
-
-    // Apply configuration (full or partial update)
-    if (partial && connection.llmConfig) {
-      // Partial update - merge with existing config
-      connection.llmConfig = {
-        ...connection.llmConfig,
-        provider: provider || connection.llmConfig.provider,
-        apiKey: apiKey || connection.llmConfig.apiKey,
-        endpoint: endpoint || connection.llmConfig.endpoint,
-        models: {
-          ...connection.llmConfig.models,
-          ...models
-        }
-      };
-    } else {
-      // Full update - replace entire config
-      connection.llmConfig = {
-        provider,
-        apiKey: apiKey || CONFIG.providers[provider]?.apiKey,
-        endpoint: endpoint || CONFIG.providers[provider]?.endpoint,
-        models: {
-          main: models.main,
-          mini: models.mini || models.main,
-          nano: models.nano || models.mini || models.main
-        }
-      };
-    }
-
-    logger.info('LLM configuration updated', {
-      clientId: connection.clientId,
-      provider: connection.llmConfig.provider,
-      models: connection.llmConfig.models,
-      hasApiKey: !!connection.llmConfig.apiKey,
-      hasEndpoint: !!connection.llmConfig.endpoint
-    });
-
-    return {
-      status: 'success',
-      message: 'LLM configuration updated successfully',
-      appliedConfig: {
-        provider: connection.llmConfig.provider,
-        models: connection.llmConfig.models
-      }
-    };
-  }
+  async handleConfigureLLM(connection, params) {
+    if (!connection.registered) {
+      throw new Error('Client must be registered before configuring LLM');
+    }
+
+    const { provider, apiKey, endpoint, models, partial = false } = params;
+
+    const supportedProviders = ['openai', 'litellm', 'groq', 'openrouter'];
+    const existing = connection.llmConfig || {};
+    const effectiveProvider = provider || existing.provider || CONFIG.defaults.provider;
+    if (!supportedProviders.includes(effectiveProvider)) {
+      throw new Error(`Unsupported provider: ${effectiveProvider}. Supported providers: ${supportedProviders.join(', ')}`);
+    }
+
+    // Validate models: required only for full update
+    if (!partial && (!models || !models.main)) {
+      throw new Error('Main model is required');
+    }
+
+    // Store configuration for this client connection
+    if (!connection.llmConfig) {
+      connection.llmConfig = {};
+    }
+
+    // Apply configuration (full or partial update)
+    if (partial && connection.llmConfig) {
+      // Partial update - merge with existing config
+      connection.llmConfig = {
+        ...connection.llmConfig,
+        provider: effectiveProvider,
+        apiKey: apiKey ?? connection.llmConfig.apiKey ?? CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint ?? connection.llmConfig.endpoint ?? CONFIG.providers[effectiveProvider]?.endpoint,
+        models: {
+          ...connection.llmConfig.models,
+          ...models
+        }
+      };
+    } else {
+      // Full update - replace entire config
+      connection.llmConfig = {
+        provider: effectiveProvider,
+        apiKey: apiKey || CONFIG.providers[effectiveProvider]?.apiKey,
+        endpoint: endpoint || CONFIG.providers[effectiveProvider]?.endpoint,
+        models: {
+          main: models.main,
+          mini: models.mini || models.main,
+          nano: models.nano || models.mini || models.main
+        }
+      };
+    }
+
+    logger.info('LLM configuration updated', {
+      clientId: connection.clientId,
+      provider: connection.llmConfig.provider,
+      models: connection.llmConfig.models,
+      hasApiKey: !!connection.llmConfig.apiKey,
+      hasEndpoint: !!connection.llmConfig.endpoint
+    });
+
+    return {
+      status: 'success',
+      message: 'LLM configuration updated successfully',
+      appliedConfig: {
+        provider: connection.llmConfig.provider,
+        models: connection.llmConfig.models
+      }
+    };
+  }
🤖 Prompt for AI Agents
In eval-server/nodejs/src/lib/EvalServer.js around lines 384–450, change the
configure LLM logic so partial updates don't force provider/models validation:
compute an effectiveProvider = params.provider || connection.llmConfig?.provider
|| CONFIG.defaultProvider; only validate supportedProviders when params.provider
is present (or when there is no existing provider at all), and only require
models.main for full updates (partial === false or no existing models). For
partial updates, merge models permissively (do not throw if params.models or
params.models.main is missing) and when filling apiKey/endpoint for the applied
config use params values or fall back to connection.llmConfig values or
CONFIG.providers[effectiveProvider] defaults. Ensure logging and returned
appliedConfig reflect the final effectiveProvider and merged models.

Comment on lines 124 to 169
// Validate and add the selected provider
const validation = this.#configManager.validateConfiguration();
if (!validation.isValid) {
throw new Error(`Configuration validation failed: ${validation.errors.join(', ')}`);
}

if (provider === 'openrouter' && openrouterKey) {
providers.push({
provider: 'openrouter' as const,
apiKey: openrouterKey
});

// Only add the selected provider if it has valid configuration
switch (provider) {
case 'openai':
if (apiKey) {
providers.push({
provider: 'openai' as const,
apiKey
});
}
break;
case 'litellm':
if (endpoint) {
providers.push({
provider: 'litellm' as const,
apiKey: apiKey || '', // Can be empty for some LiteLLM endpoints
providerURL: endpoint
});
}
break;
case 'groq':
if (apiKey) {
providers.push({
provider: 'groq' as const,
apiKey
});
}
break;
case 'openrouter':
if (apiKey) {
providers.push({
provider: 'openrouter' as const,
apiKey
});
}
break;
}

if (providers.length === 0) {
let errorMessage = 'OpenAI API key is required for this configuration';
if (provider === 'litellm') {
errorMessage = 'LiteLLM endpoint is required for this configuration';
} else if (provider === 'groq') {
errorMessage = 'Groq API key is required for this configuration';
} else if (provider === 'openrouter') {
errorMessage = 'OpenRouter API key is required for this configuration';
}
throw new Error(errorMessage);
throw new Error(`No valid configuration found for provider ${provider}`);
}

⚠️ Potential issue

AUTOMATED_MODE blocked by strict validation; skip credential checks when enabled

#initializeLLMClient calls validateConfiguration unconditionally. In AUTOMATED_MODE this rejects configs that intentionally omit API keys (key provided per‑request). Pass an “ignore credentials” flag.

-    const validation = this.#configManager.validateConfiguration();
+    const validation = this.#configManager.validateConfiguration({
+      ignoreCredentialRequirements: BUILD_CONFIG.AUTOMATED_MODE
+    });
     if (!validation.isValid) {
       throw new Error(`Configuration validation failed: ${validation.errors.join(', ')}`);
     }

Note: This change depends on LLMConfigurationManager.validateConfiguration supporting the new option. See suggested diff there.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/core/AgentService.ts around lines 124 to 169, the
unconditional call to this.#configManager.validateConfiguration() causes failure
when AUTOMATED_MODE intentionally omits API keys; update the call to pass an
option like { ignoreCredentials: true } when AUTOMATED_MODE is enabled (e.g.,
detect the mode before validation) so credential presence is not enforced, and
relax the subsequent provider checks to allow adding providers when
AUTOMATED_MODE is enabled (do not require apiKey for providers in that mode, but
still include providerURL for litellm); ensure this change aligns with the
updated LLMConfigurationManager.validateConfiguration signature.

Comment on lines +675 to +684
    // Clean up early override if we set one
    if (hasSetEarlyOverride) {
      try {
        const configManager = LLMConfigurationManager.getInstance();
        configManager.clearOverride();
        logger.info('Cleared early configuration override', { evaluationId: params.evaluationId });
      } catch (clearError) {
        logger.warn('Failed to clear early configuration override:', clearError);
      }
    }

⚠️ Potential issue

Global override cleanup race across finally blocks.

handleEvaluationRequest and executeChatEvaluation both clear the same global override. With concurrency>1, overrides for other evaluations may be cleared unexpectedly.

Short‑term: set maxConcurrency=1 (see prior comment). Long‑term: remove global override; pass config to AgentService per call (e.g., AgentService.sendMessage(input, { provider, apiKey, models… })) and make AgentService use that context instead of global state.
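A rough sketch of the long-term shape; the options parameter and PerCallLLMConfig type are hypothetical and do not exist in the current AgentService API:

```typescript
// Sketch: pass the LLM config explicitly per call instead of via a global override.
interface PerCallLLMConfig {
  provider: 'openai' | 'litellm' | 'groq' | 'openrouter';
  apiKey?: string;
  endpoint?: string;
  models: { main: string, mini?: string, nano?: string };
}

interface SendMessageOptions {
  llmConfig?: PerCallLLMConfig;  // falls back to persisted configuration when omitted
  signal?: AbortSignal;
}

class AgentServiceSketch {
  async sendMessage(input: string, options: SendMessageOptions = {}): Promise<string> {
    const config = options.llmConfig ?? this.loadPersistedConfig();
    // The LLM client for this call is built from `config` alone, so concurrent
    // evaluations with different overrides cannot clobber each other.
    return this.runGraph(input, config, options.signal);
  }

  private loadPersistedConfig(): PerCallLLMConfig {
    return { provider: 'openai', models: { main: 'main-model-placeholder' } };  // placeholder values
  }

  private async runGraph(_input: string, _config: PerCallLLMConfig, _signal?: AbortSignal): Promise<string> {
    return '';  // placeholder for the real graph execution
  }
}
```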

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts around lines
675-684, the code clears a global LLMConfigurationManager override
(clearOverride) which races with other evaluations and can wipe overrides for
concurrent runs; short-term, set maxConcurrency=1 to avoid concurrent clears,
and remove the clearOverride call here; long-term, stop using global overrides:
change AgentService APIs to accept an explicit per-call config object (provider,
apiKey, models, etc.), thread that config through EvaluationAgent ->
AgentService.sendMessage(..., config) and make AgentService use that passed
context instead of reading/writing LLMConfigurationManager global state, then
delete code paths that set or clear global overrides.

Comment on lines +335 to +356
### ✅ Phase 1: Core Infrastructure (COMPLETED)
1. **Create LLMConfigurationManager** singleton class - `/front_end/panels/ai_chat/core/LLMConfigurationManager.ts`
2. **Update AgentService** to use manager instead of localStorage
3. **Update AIChatPanel** model selection methods
4. **Add configuration refresh mechanisms**

### ✅ Phase 2: Per-Request Override Support (COMPLETED)
5. **Update EvaluationAgent** per-request override logic
6. **Update Graph/AgentNodes** configuration passing via LLMConfigurationManager
7. **Per-request override functionality** implemented

### ✅ Phase 3: Persistent Configuration API (COMPLETED)
8. **Extend EvaluationProtocol** with `configure_llm` method
9. **Implement `configure_llm` handler** in EvaluationAgent
10. 🔄 **Add eval-server support** for persistent configuration (server-side implementation needed)
11. **Add configuration validation** and error handling

### 🔄 Phase 4: Testing & Documentation (NEXT)
12. **Maintain backwards compatibility** during transition
13. 🔄 **Add comprehensive tests** for both modes
14. **Update documentation** and usage examples
15. 🔄 **Performance testing** with multiple tabs and configurations

⚠️ Potential issue

🧩 Analysis chain

Doc/code mismatch: partial configure_llm vs server enforcement

The doc says partial updates are supported, but server handleConfigureLLM currently requires provider and models.main in all cases. See EvalServer.js suggestions to align behavior.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- MODEL-CONFIGS.md lines 335-356 ----"
if [ -f MODEL-CONFIGS.md ]; then
  sed -n '335,356p' MODEL-CONFIGS.md || true
else
  echo "MODEL-CONFIGS.md not found in repo root"
fi

echo
echo "---- Searching for configure_llm / handleConfigureLLM / EvalServer ----"
rg -n --hidden --no-ignore -S 'configure_llm|configure-llm|handleConfigureLLM|configureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

echo
echo "---- Showing context for matches (±5 lines) ----"
rg -n -C5 --hidden --no-ignore -S 'configure_llm|configure-llm|handleConfigureLLM|configureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

echo
echo "---- Searching for EvalServer filename / class / eval-server directory ----"
rg -n --hidden --no-ignore -S 'class\s+EvalServer|EvalServer.js|EvalServer.ts|eval-server' -g '!**/node_modules/**' || true
find . -type f \( -iname 'evalserver*' -o -iname '*eval-server*' -o -iname 'EvalServer.js' -o -iname 'EvalServer.ts' \) -print || true

echo
echo "---- Done ----"

Length of output: 1950


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- File: eval-server/nodejs/src/lib/EvalServer.js (full) ----"
if [ -f eval-server/nodejs/src/lib/EvalServer.js ]; then
  nl -ba eval-server/nodejs/src/lib/EvalServer.js | sed -n '1,240p' || true
else
  echo "File not found: eval-server/nodejs/src/lib/EvalServer.js"
fi

echo
echo "---- Search for configure/handleConfigure keywords across repo ----"
rg -n --hidden --no-ignore -S 'configure_llm|configure-llm|configureLLM|handleConfigureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

echo
echo "---- Show matches with context (±10 lines) ----"
rg -n -C10 --hidden --no-ignore -S 'configure_llm|configure-llm|configureLLM|handleConfigureLLM|handle_configure_llm' -g '!**/node_modules/**' || true

Length of output: 520


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="eval-server/nodejs/src/lib/EvalServer.js"
echo "---- File: $FILE ----"
if [ -f "$FILE" ]; then
  awk '{printf "%5d: %s\n", NR, $0}' "$FILE" || true
else
  echo "File not found: $FILE"
fi

echo
echo "---- Repo-wide search for configure/handleConfigure patterns ----"
grep -RIn --exclude-dir=node_modules -C5 -E 'configure(_|-)?llm|configureLLM|handleConfigureLLM|handle_configure_llm' . || true

echo
echo "---- Done ----"

Length of output: 44342


EvalServer unconditionally rejects partial configure_llm updates (provider + models.main required)

  • Evidence: eval-server/nodejs/src/lib/EvalServer.js — handleConfigureLLM (lines 384–400) destructures params and then unconditionally validates provider and models.main, so calls with params.partial === true but missing provider/models will throw.
  • Action: Make validation conditional — require provider/models.main only when partial is false (or validate only the fields present when partial is true). Ensure partial merges only when connection.llmConfig exists; otherwise reject with a clear error. (Change validation at eval-server/nodejs/src/lib/EvalServer.js lines ~391–400 and adjust the partial-merge logic at ~408–432.)
  • Docs: Reconcile MODEL-CONFIGS.md / eval-server/nodejs/CLAUDE.md examples with the server behavior or update docs to reflect stricter requirements.
🤖 Prompt for AI Agents
In MODEL-CONFIGS.md around lines 335 to 356: the comment notes eval-server
currently rejects partial configure_llm updates because EvalServer.js
unconditionally validates provider and models.main; update
eval-server/nodejs/src/lib/EvalServer.js (around lines ~391–400 and ~408–432) so
validation of provider and models.main only occurs when params.partial is false,
and when params.partial is true validate only the fields present (and perform a
partial merge only if connection.llmConfig exists, otherwise return a clear
error indicating no existing config to patch); also adjust the partial-merge
logic to merge safely and persist the result, and update MODEL-CONFIGS.md /
eval-server/nodejs/CLAUDE.md examples to reflect the new behavior or stricter
requirements.


claude bot commented Sep 19, 2025

Claude finished @olesho's task —— View job


PR Review: Local Dockerised Eval Server

I've completed a comprehensive review of this substantial PR implementing a local dockerized evaluation server with advanced LLM configuration management. This is an impressive addition with solid architecture, though there are several areas for improvement.

Review Progress:

  • Analyze core eval server changes
  • Review LLM configuration management
  • Check security implementations
  • Evaluate error handling and validation
  • Assess performance considerations
  • Review test coverage needs
  • Provide comprehensive feedback

🎯 Overall Assessment

This PR introduces a sophisticated evaluation server system with excellent separation of concerns and flexible configuration management. The implementation demonstrates strong architectural decisions with room for security and robustness improvements.

✅ Strengths

Architecture & Design

  • Excellent modular architecture with clear separation between eval server, agent service, and LLM management
  • Flexible LLM configuration system supporting both manual and automated modes with override capabilities
  • Comprehensive protocol design with proper JSON-RPC implementation and type safety
  • Well-structured tracing integration for observability

Code Quality

  • Consistent error handling patterns across most components
  • Good logging implementation with appropriate detail levels
  • Proper WebSocket connection management with heartbeat and reconnection logic
  • Comprehensive model configuration documentation (MODEL-CONFIGS.md)

🔍 Areas for Improvement

Critical Security Issues

1. API Key Exposure Risk (api-server.js:349, EvaluationAgent.ts:377-392)

// ISSUE: API keys logged at info level
logger.info('Setting early configuration override', {
  hasApiKey: !!apiKey,
  apiKeyLength: apiKey?.length  // Should not log length
});

Recommendation: Never log API key lengths or related metadata. Use debug level for configuration details.

2. Insufficient Input Validation (api-server.js:263-264)

if (!requestBody.input || typeof requestBody.input !== 'string') {
  throw new Error('Missing or invalid "input" field. Expected a string.');
}

Recommendation: Add comprehensive validation for all request parameters, including size limits and sanitization.

3. CORS Security (api-server.js:61)

res.setHeader('Access-Control-Allow-Origin', '*');

Recommendation: Replace wildcard CORS with specific allowed origins for production use.
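A minimal sketch of an origin allowlist for the Node HTTP handler; the ALLOWED_ORIGINS variable is an assumed configuration knob, not an existing one:

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';

// Sketch: reflect only known origins instead of '*'.
const allowedOrigins = new Set(
  (process.env.ALLOWED_ORIGINS ?? 'http://localhost:8080').split(',').map(origin => origin.trim()),
);

function applyCors(req: IncomingMessage, res: ServerResponse): void {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.has(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Vary', 'Origin');  // keep caches from mixing responses across origins
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
    res.setHeader('Access-Control-Allow-Headers', 'Content-Type, Authorization');
  }
  // Unlisted origins receive no CORS headers and are rejected by the browser.
}
```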

Memory & Performance Concerns

4. Potential Memory Leaks (EvalServer.js:117-123)

for (const [clientId, connection] of this.connectedClients) {
  connection.rpcClient.cleanup();
  // Missing: Remove event listeners and clear timers
}

Recommendation: Ensure all event listeners, intervals, and timeouts are properly cleaned up.
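One common pattern, sketched here with illustrative names, is to register every timer and listener with a per-connection disposer so shutdown can release them in one pass:

```typescript
import type { EventEmitter } from 'node:events';

// Sketch: track timers and listeners per connection so dispose() releases them all.
type Cleanup = () => void;

class ConnectionCleanup {
  private cleanups: Cleanup[] = [];

  trackInterval(handle: ReturnType<typeof setInterval>): void {
    this.cleanups.push(() => clearInterval(handle));
  }

  trackListener(target: EventEmitter, event: string, listener: (...args: unknown[]) => void): void {
    target.on(event, listener);
    this.cleanups.push(() => target.off(event, listener));
  }

  dispose(): void {
    for (const cleanup of this.cleanups.splice(0)) {
      cleanup();  // clears registered intervals and removes registered listeners
    }
  }
}
```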

5. Missing Request Size Limits (api-server.js:88-91)

for await (const chunk of req) {
  body += chunk; // No size limit
}

Recommendation: Implement request size limits to prevent DoS attacks.
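A sketch of a size-capped body reader; the 1 MB limit is an arbitrary example value:

```typescript
import type { IncomingMessage } from 'node:http';

// Sketch: stop reading the request body once it exceeds a configurable cap.
const MAX_BODY_BYTES = 1024 * 1024;  // 1 MB, example only

async function readBodyWithLimit(req: IncomingMessage, maxBytes = MAX_BODY_BYTES): Promise<string> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of req) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    total += buf.length;
    if (total > maxBytes) {
      req.destroy();  // stop consuming the socket
      throw new Error(`Request body exceeds ${maxBytes} bytes`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks).toString('utf8');
}
```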

Error Handling & Reliability

6. Incomplete Error Context (EvaluationAgent.ts:633-670)

} catch (error) {
  const errorMessage = error instanceof Error ? error.message : 'Unknown error';
  // Missing: Error stack traces, evaluation context
}

Recommendation: Include evaluation context and stack traces in error responses for better debugging.

7. Inconsistent Timeout Handling (api-server.js:415, EvaluationAgent.ts:573)

timeout: 1500000, // 25 minutes - very long

Recommendation: Implement configurable timeouts with reasonable defaults and proper cleanup on timeout.
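One way to do this, sketched with an assumed EVAL_TIMEOUT_MS environment variable, is a race that always clears its timer:

```typescript
// Sketch: configurable timeout with guaranteed timer cleanup.
const DEFAULT_EVAL_TIMEOUT_MS = Number(process.env.EVAL_TIMEOUT_MS ?? 5 * 60 * 1000);  // 5 min example default

async function withTimeout<T>(work: Promise<T>, ms = DEFAULT_EVAL_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Evaluation timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) {
      clearTimeout(timer);  // never leave the timer pending once the work settles
    }
  }
}
```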

Code Quality Issues

8. Complex Configuration Logic (EvaluationAgent.ts:352-412)
The model configuration parsing has nested conditionals and multiple fallbacks that are hard to follow.
Recommendation: Extract to dedicated configuration parser with unit tests.

9. Missing Validation (LLMConfigurationManager.ts:157-167)

setOverride(config: Partial<LLMConfig>): void {
  this.overrideConfig = { ...config }; // No validation
}

Recommendation: Add validation for provider compatibility and required fields.
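A sketch of what such validation could look like before the override is stored; the specific rules and messages are illustrative:

```typescript
// Sketch: reject obviously invalid overrides before storing them.
type LLMProvider = 'openai' | 'litellm' | 'groq' | 'openrouter';

interface LLMConfigOverride {
  provider?: LLMProvider;
  apiKey?: string;
  endpoint?: string;
  mainModel?: string;
}

function validateOverride(config: LLMConfigOverride): string[] {
  const errors: string[] = [];
  const providers: LLMProvider[] = ['openai', 'litellm', 'groq', 'openrouter'];
  if (config.provider && !providers.includes(config.provider)) {
    errors.push(`Unsupported provider: ${config.provider}`);
  }
  if (config.provider === 'litellm' && !config.endpoint) {
    errors.push('LiteLLM overrides require an endpoint');
  }
  if (config.mainModel !== undefined && config.mainModel.trim() === '') {
    errors.push('mainModel must be a non-empty string');
  }
  return errors;
}

// setOverride(config) would call validateOverride(config) and throw if any errors are returned.
```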

🧪 Test Coverage Gaps

The PR lacks comprehensive test coverage for:

  • Configuration override scenarios
  • Error handling edge cases
  • WebSocket connection failure scenarios
  • Multi-provider model configuration
  • Tracing integration

Recommendation: Add unit tests for core components and integration tests for the evaluation flow.

🚀 Performance Recommendations

1. Connection Pooling

Implement connection pooling for HTTP requests to external LLM providers to reduce latency.

2. Configuration Caching

Cache parsed configurations to avoid repeated localStorage access.

3. Request Debouncing

Add debouncing for rapid configuration changes to prevent unnecessary reinitializations.

🔧 Implementation Suggestions

1. Enhanced Error Types

export class EvaluationError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly evaluationId?: string,
    public readonly context?: Record<string, any>
  ) {
    super(message);
  }
}

2. Configuration Validation Schema

const CONFIG_SCHEMA = {
  provider: { type: 'string', enum: ['openai', 'litellm', 'groq', 'openrouter'] },
  apiKey: { type: 'string', minLength: 10 },
  models: { type: 'object', required: ['main'] }
};

3. Request Rate Limiting

Implement rate limiting for evaluation requests to prevent abuse.
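A minimal in-memory sliding-window limiter sketch; the window, limit, and per-client key are example choices:

```typescript
// Sketch: per-client sliding-window rate limiter for evaluation requests.
const WINDOW_MS = 60_000;    // 1-minute window (example)
const MAX_REQUESTS = 30;     // max evaluations per window per client (example)

const requestLog = new Map<string, number[]>();

function allowRequest(clientKey: string, now = Date.now()): boolean {
  const cutoff = now - WINDOW_MS;
  const recent = (requestLog.get(clientKey) ?? []).filter(timestamp => timestamp > cutoff);
  if (recent.length >= MAX_REQUESTS) {
    requestLog.set(clientKey, recent);
    return false;  // caller should respond with HTTP 429
  }
  recent.push(now);
  requestLog.set(clientKey, recent);
  return true;
}
```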

📋 Immediate Action Items

High Priority

  1. Fix API key logging - Remove all API key metadata from logs
  2. Add input validation - Implement comprehensive request validation
  3. Configure CORS properly - Replace wildcard with specific origins
  4. Add request size limits - Prevent DoS via large payloads

Medium Priority

  1. Improve error handling - Add context and stack traces
  2. Add configuration validation - Validate provider compatibility
  3. Implement cleanup - Ensure proper resource cleanup on shutdown
  4. Add unit tests - Test core configuration and evaluation logic

Low Priority

  1. Optimize performance - Add connection pooling and caching
  2. Enhanced documentation - Add API documentation and examples

🏆 Conclusion

This is a well-architected and feature-rich addition that significantly enhances the evaluation capabilities. The core design is solid with excellent separation of concerns. The main concerns are around security hardening, comprehensive testing, and production readiness.

Recommendation: Address the security issues and add basic test coverage before merging. The architecture is sound and will provide a strong foundation for evaluation workflows.

Overall Score: 7.5/10 - Excellent architecture with important security/robustness improvements needed.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
front_end/panels/ai_chat/core/AgentService.ts (1)

651-651: Replace direct localStorage access with configuration manager

The method uses this.#configManager.getProvider() for the provider but still directly accesses localStorage for the endpoint check at line 670. This inconsistency could lead to mismatched state.

🧹 Nitpick comments (3)
front_end/panels/ai_chat/core/AgentService.ts (3)

113-178: Consider adding provider-specific error messages

The initialization logic is solid with AUTOMATED_MODE support. However, when validation fails, the error message could be more specific about which provider failed and why.

Consider enhancing the error message to include the provider name:

     if (!validation.isValid) {
-      throw new Error(`Configuration validation failed: ${validation.errors.join(', ')}`);
+      throw new Error(`Configuration validation failed for ${provider}: ${validation.errors.join(', ')}`);
     }

296-307: Consider handling ongoing operations during configuration changes

The configuration change handler properly marks the service for reinitialization. However, if there's an ongoing graph operation when the configuration changes, it might lead to undefined behavior.

Consider canceling any ongoing operations before clearing the graph:

   #handleConfigurationChange(): void {
     logger.info('LLM configuration changed, reinitializing if needed');
 
     // If we're initialized, we need to reinitialize with new configuration
     if (this.#isInitialized) {
+      // Cancel any ongoing graph operations
+      if (this.#runningGraphStatePromise) {
+        logger.info('Canceling ongoing graph operation due to config change');
+        this.#runningGraphStatePromise = undefined;
+      }
+
       // Mark as uninitialized to force reinit on next use
       this.#isInitialized = false;
       this.#graph = undefined;
 
       logger.info('Marked agent service for reinitialization due to config change');
     }
   }

313-328: Add error recovery mechanism for failed reinitialization

The refresh method properly reinitializes the service, but if reinitialization fails, the service is left in an unusable state.

Consider adding a fallback to restore the previous state if reinitialization fails:

   async refreshCredentials(): Promise<void> {
     logger.info('Refreshing credentials and reinitializing agent service');
 
+    // Store current state in case we need to rollback
+    const previousInitialized = this.#isInitialized;
+    const previousGraph = this.#graph;
+
     this.#isInitialized = false;
     this.#graph = undefined;
 
     // Force reinitialization on next use
     try {
       const config = this.#configManager.getConfiguration();
       await this.initialize(config.apiKey || null, config.mainModel, config.miniModel || '', config.nanoModel || '');
       logger.info('Agent service reinitialized successfully');
     } catch (error) {
       logger.error('Failed to reinitialize agent service:', error);
+      // Restore previous state if we were previously initialized
+      if (previousInitialized && previousGraph) {
+        this.#isInitialized = previousInitialized;
+        this.#graph = previousGraph;
+        logger.warn('Restored previous state after failed reinitialization');
+      }
       throw error;
     }
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78b020b and 1ab0393.

📒 Files selected for processing (3)
  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts (2 hunks)
  • front_end/panels/ai_chat/core/AgentService.ts (9 hunks)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts
  • front_end/panels/ai_chat/LLM/OpenAIProvider.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/config.js : Require OPENAI_API_KEY and PORT (default 8080) from environment configuration

Applied to files:

  • front_end/panels/ai_chat/core/AgentService.ts
🧬 Code graph analysis (1)
front_end/panels/ai_chat/core/AgentService.ts (3)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)
  • LLMConfigurationManager (42-372)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/Graph.ts (1)
  • createAgentGraph (24-42)
🔇 Additional comments (4)
front_end/panels/ai_chat/core/AgentService.ts (4)

17-18: LGTM! Clean centralized configuration management setup

Good architectural decision to use a centralized configuration manager. The imports are properly organized and the singleton pattern ensures consistent state management across the application.

Also applies to: 24-24


62-90: LGTM! Proper initialization with configuration change handling

Good initialization of the configuration manager and subscription to configuration changes. This enables real-time config updates without requiring a restart.


193-205: LGTM! Proper AUTOMATED_MODE bypass for API key validation

The implementation correctly handles AUTOMATED_MODE by bypassing API key requirements when appropriate. The provider-specific error messages are clear and helpful.


345-350: LGTM! Proper AUTOMATED_MODE initialization in sendMessage

Good implementation of AUTOMATED_MODE support that ensures the graph is initialized even without an API key. This allows automated testing and evaluation flows to work correctly.

2. ✓ Fixed multi-provider credential preservation in saveProviderSpecificSettings()
3. ✓ Secured logging in both handleStorageChange() and getDebugInfo()

claude bot commented Sep 19, 2025

Claude finished @olesho's task —— View job


I'll analyze this and get back to you.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (3)

253-293: Skip‑credential checks: addressed and looks good.

The AUTOMATED_MODE bypass via skipCredentialChecks is correctly implemented.


295-336: Per‑provider credential persistence: addressed and looks good.

Only the active provider’s keys are updated; others are preserved.


373-389: Debug info redaction: looks good.

apiKey is masked in both override and current configs.

🧹 Nitpick comments (7)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (7)

65-71: Harden provider parsing; avoid unsafe cast.

Validate stored provider against the allowed set instead of casting arbitrary strings.

Apply this diff:

   getProvider(): LLMProvider {
     if (this.overrideConfig?.provider) {
       return this.overrideConfig.provider;
     }
-    const stored = localStorage.getItem(STORAGE_KEYS.PROVIDER);
-    return (stored as LLMProvider) || 'openai';
+    const stored = localStorage.getItem(STORAGE_KEYS.PROVIDER);
+    switch (stored) {
+      case 'openai':
+      case 'litellm':
+      case 'groq':
+      case 'openrouter':
+        return stored;
+      default:
+        return 'openai';
+    }
   }

145-153: Don’t populate optional fields with empty strings in getConfiguration.

Map empty strings to undefined for optional fields to align with LLMConfig and avoid downstream truthiness bugs.

Apply this diff:

   getConfiguration(): LLMConfig {
     return {
       provider: this.getProvider(),
-      apiKey: this.getApiKey(),
+      apiKey: this.getApiKey() || undefined,
       endpoint: this.getEndpoint(),
       mainModel: this.getMainModel(),
-      miniModel: this.getMiniModel(),
-      nanoModel: this.getNanoModel(),
+      miniModel: this.getMiniModel() || undefined,
+      nanoModel: this.getNanoModel() || undefined,
     };
   }

266-269: Trim whitespace when validating mainModel.

Prevent accepting a string of spaces.

Apply this diff:

-    if (!config.mainModel) {
+    if (!config.mainModel?.trim()) {
       errors.push('Main model is required');
     }

341-358: Redact endpoint and tighten event‑key typing in storage logs.

Endpoints can reveal internal hosts or bearer tokens in URLs; redact them too. Also avoid the any cast by using a Set check.

Apply this diff:

-  private handleStorageChange(event: StorageEvent): void {
-    if (event.key && Object.values(STORAGE_KEYS).includes(event.key as any)) {
-      const sensitiveKeys = new Set([
+  private handleStorageChange(event: StorageEvent): void {
+    const allKeys = new Set(Object.values(STORAGE_KEYS));
+    if (event.key && allKeys.has(event.key)) {
+      const sensitiveKeys = new Set<string>([
         STORAGE_KEYS.OPENAI_API_KEY,
         STORAGE_KEYS.LITELLM_API_KEY,
         STORAGE_KEYS.GROQ_API_KEY,
         STORAGE_KEYS.OPENROUTER_API_KEY,
+        STORAGE_KEYS.LITELLM_ENDPOINT, // endpoints may be sensitive
       ]);
-      const redacted =
-        sensitiveKeys.has(event.key as any) ? '(redacted)' :
+      const redacted =
+        sensitiveKeys.has(event.key) ? '(redacted)' :
         (event.newValue ? `${event.newValue.slice(0, 8)}…` : null);
       logger.debug('Configuration changed in another tab', {
         key: event.key,
         newValue: redacted
       });
       this.notifyListeners();
     }
   }

363-371: Debounce change notifications to avoid listener storms.

Saving updates multiple keys, which can spam listeners across tabs. Batch via microtask.

Apply this diff:

 export class LLMConfigurationManager {
   private static instance: LLMConfigurationManager;
   private overrideConfig?: Partial<LLMConfig>; // Override for automated mode
   private changeListeners: Array<() => void> = [];
+  private notifyQueued = false;
   private notifyListeners(): void {
-    this.changeListeners.forEach(listener => {
-      try {
-        listener();
-      } catch (error) {
-        logger.error('Error in configuration change listener:', error);
-      }
-    });
+    if (this.notifyQueued) return;
+    this.notifyQueued = true;
+    queueMicrotask(() => {
+      this.notifyQueued = false;
+      for (const listener of this.changeListeners) {
+        try {
+          listener();
+        } catch (error) {
+          logger.error('Error in configuration change listener:', error);
+        }
+      }
+    });
   }

Also applies to: 43-50


239-251: Return an unsubscribe and prevent duplicate listeners.

Improves ergonomics and avoids accidental duplicate registrations.

Apply this diff:

-  addChangeListener(listener: () => void): void {
-    this.changeListeners.push(listener);
-  }
+  addChangeListener(listener: () => void): () => void {
+    if (!this.changeListeners.includes(listener)) {
+      this.changeListeners.push(listener);
+    }
+    return () => this.removeChangeListener(listener);
+  }

199-220: Minor: consider grouping saves to reduce cross‑tab churn.

If feasible, persist fewer keys (e.g., skip writing unchanged values) to minimize storage events.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ab0393 and 855fb9a.

📒 Files selected for processing (1)
  • front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (3)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • logger (296-307)
front_end/panels/ai_chat/core/Logger.ts (1)
  • createLogger (316-318)
front_end/panels/ai_chat/LLM/LLMTypes.ts (1)
  • LLMProvider (142-142)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (3)

537-560: Don't merge persisted config into chat input unless an override is actually set.

Using getConfiguration() unconditionally can clobber per-request values with localStorage. Check hasOverride() first.


677-686: Global override cleanup race across finally blocks.

handleEvaluationRequest and executeChatEvaluation both clear the same global override. With concurrency>1, overrides for other evaluations may be cleared unexpectedly.


978-1023: Support partial LLM configuration updates but rollback on validation failure needs improvement.

Current handler correctly validates after saving but the rollback mechanism (restoring currentConfig) has issues: saveConfiguration might fail silently or partially apply changes. Instead, validate the merged config first, then save only if valid.

Apply this diff to validate before persisting:

       // Merge configurations based on partial flag
       const mergedConfig = {
         provider: params.partial ? (params.provider ?? currentConfig.provider) : params.provider,
         apiKey: params.partial ? (params.apiKey ?? currentConfig.apiKey) : params.apiKey,
         endpoint: params.partial ? (params.endpoint ?? currentConfig.endpoint) : params.endpoint,
         mainModel: params.partial ? (params.models?.main ?? currentConfig.mainModel) : params.models.main,
         miniModel: params.partial ? (params.models?.mini ?? currentConfig.miniModel) : params.models?.mini,
         nanoModel: params.partial ? (params.models?.nano ?? currentConfig.nanoModel) : params.models?.nano
       };

-      // Validate the merged configuration
-      const validation = configManager.validateConfiguration();
-
-      // Check validation after setting the merged config temporarily
-      configManager.saveConfiguration(mergedConfig);
-      const postSaveValidation = configManager.validateConfiguration();
-
-      if (!postSaveValidation.isValid) {
-        // Restore the original config if validation fails
-        configManager.saveConfiguration(currentConfig);
+      // Temporarily set override to validate the merged config
+      configManager.setOverride(mergedConfig);
+      const validation = configManager.validateConfiguration();
+      configManager.clearOverride();
+
+      if (!validation.isValid) {
         // Send error response with validation errors
         const errorResponse = createErrorResponse(
           id,
           ErrorCodes.INVALID_PARAMS,
           'Invalid configuration',
-          { errors: postSaveValidation.errors }
+          { errors: validation.errors }
         );

         if (this.client) {
           this.client.send(errorResponse);
         }

         logger.error('Configuration validation failed', {
-          errors: postSaveValidation.errors
+          errors: validation.errors
         });

         return;
       }
+
+      // Save configuration only after validation passes
+      configManager.saveConfiguration(mergedConfig);
🧹 Nitpick comments (1)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (1)

4-4: Remove or update the obsolete cache break comment.

The comment references a date from September 2025, which appears to be in the future. This should either be removed or updated with an accurate date.

-// Cache break: 2025-09-17T17:38:00Z - Fixed API key validation
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 855fb9a and d588841.

📒 Files selected for processing (1)
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-29T23:08:07.342Z
Learnt from: CR
PR: BrowserOperator/browser-operator-core#0
File: eval-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-08-29T23:08:07.342Z
Learning: Applies to eval-server/nodejs/src/evaluator.js : LLM evaluator must integrate with the OpenAI API to judge agent responses

Applied to files:

  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
🧬 Code graph analysis (1)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (6)
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (6)
  • isLLMConfigurationRequest (297-299)
  • createAuthVerifyMessage (197-203)
  • LLMConfigurationRequest (259-264)
  • createErrorResponse (240-255)
  • ErrorCodes (141-155)
  • createLLMConfigurationResponse (314-334)
front_end/panels/ai_chat/core/BuildConfig.ts (1)
  • BUILD_CONFIG (11-20)
front_end/panels/ai_chat/core/AgentService.ts (1)
  • AgentService (44-787)
front_end/panels/ai_chat/core/LLMConfigurationManager.ts (2)
  • LLMConfigurationManager (42-390)
  • hasOverride (184-186)
front_end/panels/ai_chat/models/ChatTypes.ts (1)
  • ChatMessage (74-75)
front_end/panels/ai_chat/tracing/TracingConfig.ts (1)
  • withTracingContext (313-315)
🔇 Additional comments (3)
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (3)

278-284: LGTM! Clean automated-mode authentication bypass.

The implementation correctly checks BUILD_CONFIG.AUTOMATED_MODE and sends a verified auth response early, bypassing the secret key verification flow for Docker/CI deployments.


352-413: Early override configuration extraction handles both flat and nested formats well.

The debug logging is comprehensive and the extraction logic correctly handles both nested (main_model: { provider, model, api_key }) and flat formats with proper fallbacks.


958-961: Override cleanup OK — maxConcurrency = 3; confirm concurrency strategy.

Finally block clears the config override. The register message sets maxConcurrency: 3 in front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts; since concurrency >1 a race condition is still possible — either reduce to 1 or add per-evaluation locking/scoped overrides.

Comment on lines +848 to +871
    const provider = input.provider || config.provider || 'openai';
    const mainModel = input.main_model || config.mainModel || this.judgeModel;
    const miniModel = input.mini_model ?? config.miniModel ?? this.miniModel;
    const nanoModel = input.nano_model ?? config.nanoModel ?? this.nanoModel;
    const apiKey = input.api_key ?? config.apiKey ?? agentService.getApiKey();
    const endpoint = input.endpoint ?? config.endpoint;

    logger.info('Setting consolidated configuration override for chat evaluation', {
      provider: provider,
      mainModel: mainModel,
      hasApiKey: !!apiKey,
      hasEndpoint: !!endpoint,
      isInitialized: agentService.isInitialized()
    });

    // Always reinitialize with the current model and explicit mini/nano
    await agentService.initialize(agentService.getApiKey(), modelName, miniModel, nanoModel);

    // Set single consolidated override with all fallback values
    configManager.setOverride({
      provider: provider,
      mainModel: mainModel,
      miniModel: miniModel,
      nanoModel: nanoModel,
      apiKey: apiKey || undefined,
      endpoint: endpoint || undefined
    });

🛠️ Refactor suggestion

🧩 Analysis chain

Single override point with merged fallback values (looks good but ensure no duplicates).

The consolidation looks correct: you set a single override with proper fallback values (input → config → defaults), avoiding the duplicate override issue. Just ensure this is the only setOverride call in this flow.


🏁 Script executed:

#!/bin/bash
# Verify there's only one setOverride call in the executeChatEvaluation flow
rg -A3 -B3 'configManager\.setOverride' front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts | grep -n 'setOverride'

Length of output: 76


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts"
echo "File: $file"
echo
echo "Search for executeChatEvaluation definition:"
rg -n --line-number 'executeChatEvaluation' "$file" || true
echo
echo "Matches for configManager.setOverride with context:"
rg -n -C5 'configManager\.setOverride' "$file" || true

Length of output: 1023


Consolidate duplicate configManager.setOverride calls — keep a single consolidated override or explicitly justify both.

  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts — configManager.setOverride at line 385.
  • front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts — configManager.setOverride at line 864.
🤖 Prompt for AI Agents
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts lines ~385 and
848-871: there are duplicate configManager.setOverride calls; keep a single
consolidated override (the one at 848-871 which already collects provider,
mainModel, miniModel, nanoModel, apiKey and endpoint) or, if both are required
for different lifecycles, add a clear comment justifying the second call and
ensure the earlier call only sets the minimal/intentional fields. Fix by
removing the redundant call at line 385 (or merging its unique fields into the
consolidated call at 848-871), and run a quick search for usages relying on the
earlier override to confirm no behavioral regression.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (17)
eval-server/nodejs/schemas/client.schema.json (1)

156-158: Add "llm_judge" to client.schema.json enum for backward compatibility

Many eval YAMLs under eval-server/nodejs/evals use type: "llm_judge" (underscore). Expand the enum in eval-server/nodejs/schemas/client.schema.json to accept both forms to avoid validation failures.

           "type": "string",
-          "enum": ["llm-judge", "snapshot", "hybrid"]
+          "enum": ["llm-judge", "llm_judge", "snapshot", "hybrid"]
front_end/panels/ai_chat/tools/FetcherTool.ts (1)

99-116: Abort handling: stop processing on AbortError instead of accumulating failures.

Current loop catches AbortError per URL and continues, producing many failed entries post‑abort. Short‑circuit and return partial results.

   for (const url of urlsToProcess) {
     throwIfAborted();
     try {
       logger.info('Processing URL', { url });
       const fetchedContent = await this.fetchContentFromUrl(url, reasoning, ctx);
       results.push(fetchedContent);
     } catch (error: any) {
-      logger.error('Error processing URL', { url, error: error.message, stack: error.stack });
-      results.push({
-        url,
-        title: '',
-        markdownContent: '',
-        success: false,
-        error: `Failed to process URL: ${error.message}`
-      });
+      if (error instanceof DOMException && error.name === 'AbortError') {
+        logger.info('Fetch aborted; stopping further URL processing.');
+        return {
+          sources: results,
+          success: results.some(r => r.success),
+          error: 'aborted'
+        };
+      }
+      logger.error('Error processing URL', { url, error: error.message, stack: error.stack });
+      results.push({
+        url,
+        title: '',
+        markdownContent: '',
+        success: false,
+        error: `Failed to process URL: ${error.message}`
+      });
     }
   }
front_end/panels/ai_chat/tools/Tools.ts (3)

1129-1159: Escape selector to avoid JS eval breakage/injection

selector is embedded unescaped into an evaluated string. Quotes or backticks in selector can break the expression or be abused.

Use JSON.stringify for safe injection:

-      const result = await target.runtimeAgent().invoke_evaluate({
-        expression: `(() => {
-          const element = document.querySelector("${selector}");
+      const result = await target.runtimeAgent().invoke_evaluate({
+        expression: `(() => {
+          const sel = ${JSON.stringify(selector)};
+          const element = document.querySelector(sel);
           if (!element) {
             return {
               success: false,
-              message: "Element not found with selector: ${selector}"
+              message: "Element not found with selector: " + sel
             };
           }

1204-1213: Escape query string in evaluated code

query is directly interpolated; escape to prevent syntax errors/injection.

-          const query = "${query}";
+          const query = ${JSON.stringify(query)};
           const limit = ${limit};

3181-3199: Escape XPath in evaluated code

xpath is injected into document.evaluate without escaping; can break the expression.

-          expression: `
-            (function() {
-              const element = document.evaluate("${xpath}", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
+          expression: `
+            (function() {
+              const xpath = ${JSON.stringify(xpath)};
+              const element = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
               if (!element) return { found: false };
front_end/panels/ai_chat/evaluation/test-cases/schema-extractor-tests.ts (2)

120-126: Replace remaining 'extract_schema' UI references with 'extract_data'

TOOL_TEST_MAPPING and the default toolType still reference 'extract_schema' which doesn't match ToolRegistry (it registers 'extract_data'); update these to 'extract_data'.

  • front_end/panels/ai_chat/ui/EvaluationDialog.ts:28 — TOOL_TEST_MAPPING key 'extract_schema'
  • front_end/panels/ai_chat/ui/EvaluationDialog.ts:175 — initial state toolType: 'extract_schema'

24-31: Action required: 'extract_data' routes to SchemaBasedExtractorTool (not CombinedExtractionTool)

createSchemaTest uses 'extract_data' but ToolRegistry.registerToolFactory('extract_data', () => new SchemaBasedExtractorTool()) — CombinedExtractionTool is named 'navigate_url_and_extraction' and is not bound to 'extract_data'.

Files to inspect/fix:

  • front_end/panels/ai_chat/evaluation/test-cases/schema-extractor-tests.ts (uses 'extract_data')
  • front_end/panels/ai_chat/agent_framework/implementation/ConfiguredAgents.ts (registerToolFactory('extract_data', ...))
  • front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts (name = 'extract_data')
  • front_end/panels/ai_chat/tools/CombinedExtractionTool.ts (name = 'navigate_url_and_extraction')

Resolution options:

  • Update tests to use 'navigate_url_and_extraction' if CombinedExtractionTool is intended, or
  • Change ConfiguredAgents.ts to register CombinedExtractionTool under 'extract_data' and verify CombinedExtractionTool's argument/response contract matches the test expectations.
config/gni/devtools_grd_files.gni (1)

661-676: Missing LLMConfigurationManager in GRD may break imports at runtime

AI summary and PR context reference LLMConfigurationManager, but it isn't listed here. If any frontend code imports it, the GRD omission will 404 in DevTools.

Add it alongside the other ai_chat core files:

   "front_end/panels/ai_chat/core/AgentErrorHandler.js",
+  "front_end/panels/ai_chat/core/LLMConfigurationManager.js",
   "front_end/panels/ai_chat/core/AgentDescriptorRegistry.js",
front_end/panels/ai_chat/ui/ChatView.ts (1)

1015-1032: Avoid navigating DevTools UI with window.location; open in new tab via host API

Using window.location.href navigates the DevTools front-end away from DevTools. Use InspectorFrontendHost to open external links safely.

Apply:

+import * as Host from '../../../core/host/host.js';
@@
-    // Navigate to OpenAI API keys page in current window
-    window.location.href = 'https://platform.openai.com/settings/organization/api-keys';
+    // Open in a new tab to avoid navigating DevTools UI
+    Host.InspectorFrontendHost.InspectorFrontendHostInstance.openInNewTab(
+      'https://platform.openai.com/settings/organization/api-keys'
+    );

Optional: prefer DevTools settings storage over raw localStorage for provider selection.

front_end/panels/ai_chat/ui/__tests__/AIChatPanel.test.ts (1)

194-209: Fix assertion: Map.get returns undefined, not null.

Your mock localStorage uses a Map; after delete, Map.get(...) returns undefined. The current strictEqual(null) will fail.

Apply:

-      // Should clear mini model (falls back to main model)
-      assert.strictEqual(mockLocalStorage.get('ai_chat_mini_model'), null);
+      // Should clear mini model (falls back to main model)
+      assert.strictEqual(mockLocalStorage.get('ai_chat_mini_model'), undefined);
+      // Or, if you want to assert via the Storage API semantics:
+      assert.strictEqual(window.localStorage.getItem('ai_chat_mini_model'), null);
front_end/panels/ai_chat/core/ToolNameMap.ts (1)

46-57: Potential spec violation: collision resolution can exceed 64 chars.

After appending - and _, names can exceed 64 even if sanitize() truncated. This violates the provider regex and may be rejected.

Apply:

   let candidate = sanitize(original);
   if (sanitizedToOriginal.has(candidate) && sanitizedToOriginal.get(candidate) !== original) {
     const suffix = shortHash(original);
-    const base = candidate.replace(/_+$/g, '');
-    candidate = base + '-' + suffix;
+    const base = candidate.replace(/_+$/g, '');
+    // Ensure total length <= 64 after adding "-<suffix>"
+    const maxBase = Math.max(1, 64 - 1 - suffix.length);
+    candidate = (base.length > maxBase ? base.slice(0, maxBase) : base) + '-' + suffix;
   }
   let unique = candidate;
   let counter = 1;
   while (sanitizedToOriginal.has(unique) && sanitizedToOriginal.get(unique) !== original) {
-    const add = `_${counter++}`;
-    unique = candidate + add;
+    const add = `_${counter++}`;
+    // Ensure total length <= 64 when adding numeric suffix
+    const maxCandidate = Math.max(1, 64 - add.length);
+    unique = candidate.length > maxCandidate ? candidate.slice(0, maxCandidate) + add : candidate + add;
   }

Also consider logging when post-collision trimming occurs for observability.

front_end/panels/ai_chat/ui/AIChatPanel.ts (5)

53-64: Stop logging secret previews in StorageMonitor (severe secret leakage).

Current logs include value previews/lengths for keys that may contain API secrets. Remove them.

-    localStorage.setItem = (key: string, value: string) => {
-      if (key.includes('openrouter') || key.includes('ai_chat')) {
-        logger.debug(`=== LOCALSTORAGE SET ===`);
-        logger.debug(`Key: ${key}`);
-        logger.debug(`Value exists: ${!!value}`);
-        logger.debug(`Value length: ${value?.length || 0}`);
-        logger.debug(`Value preview: ${value?.substring(0, 50) + (value?.length > 50 ? '...' : '') || 'null'}`);
-        logger.debug(`Timestamp: ${new Date().toISOString()}`);
-      }
-      return this.originalSetItem(key, value);
-    };
+    localStorage.setItem = (key: string, value: string) => {
+      if (key.includes('openrouter') || key.includes('ai_chat')) {
+        logger.debug(`=== LOCALSTORAGE SET ===`);
+        logger.debug(`Key: ${key}`);
+        logger.debug(`Timestamp: ${new Date().toISOString()}`);
+      }
+      return this.originalSetItem(key, value);
+    };

1071-1091: Remove API key presence/length logging on OAuth success.

Avoid logging any characteristics of secrets.

-    window.addEventListener('openrouter-oauth-success', async () => {
+    window.addEventListener('openrouter-oauth-success', async () => {
       logger.info('=== OAUTH SUCCESS EVENT RECEIVED IN AICHATPANEL ===');
-      logger.info('Timestamp:', new Date().toISOString());
-      logger.info('Current localStorage state for OpenRouter:');
-      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
-      const authMethod = localStorage.getItem('openrouter_auth_method');
-      logger.info('- API key exists:', !!apiKey);
-      logger.info('- API key length:', apiKey?.length || 0);
-      logger.info('- Auth method:', authMethod);
+      logger.info('Timestamp:', new Date().toISOString());
+      const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');

1104-1115: Avoid logging secret lengths on storage change events.

-      if (event.key === 'ai_chat_openrouter_api_key' || 
+      if (event.key === 'ai_chat_openrouter_api_key' || 
           event.key === 'openrouter_auth_method') {
         logger.info('=== STORAGE CHANGE EVENT FOR OPENROUTER ===');
         logger.info('Changed key:', event.key);
-        logger.info('Old value exists:', !!event.oldValue);
-        logger.info('New value exists:', !!event.newValue);
-        logger.info('New value length:', event.newValue?.length || 0);
         logger.info('Re-initializing agent service after storage change...');
         this.#initializeAgentService();
       }

1144-1154: Remove API key presence/length logs from refreshCredentials.

-    logger.info('Current OpenRouter storage state:');
-    const apiKey = localStorage.getItem('ai_chat_openrouter_api_key');
-    const authMethod = localStorage.getItem('openrouter_auth_method');
-    logger.info('- API key exists:', !!apiKey);
-    logger.info('- API key length:', apiKey?.length || 0);
-    logger.info('- Auth method:', authMethod);
+    // Redacted OpenRouter storage state (do not log secrets)

1532-1541: Do not log API key prefixes.

Even partial keys are sensitive.

-        logger.info('Retrieved API key:');
-        logger.info('- Exists:', !!apiKey);
-        logger.info('- Length:', apiKey?.length || 0);
-        logger.info('- Prefix:', apiKey?.substring(0, 8) + '...' || 'none');
+        logger.info('Retrieved API key (redacted):', { exists: !!apiKey });
front_end/panels/ai_chat/agent_framework/AgentRunner.ts (1)

889-902: Complex handoff logic with nested session handling could lead to lost sessions

The handoff execution for LLM-triggered tools has complex session management that could result in orphaned or incorrectly linked sessions if an error occurs during handoff.

Consider extracting the handoff session management into a separate, well-tested utility function to reduce complexity and potential for errors.
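A rough sketch of such a utility, with hypothetical names and session fields (not taken from AgentRunner.ts), assuming the parent session should always reach a terminal state when a handoff fails:

// Hypothetical helper: links a child (handoff) session to its parent and
// guarantees the parent reaches a terminal state even if the handoff throws.
interface AgentSessionLike {
  status: 'running' | 'completed' | 'error';
  endTime?: Date;
  terminationReason?: string;
  childSessionIds: string[];
}

async function runHandoff<T>(
  parent: AgentSessionLike,
  childSessionId: string,
  handoff: () => Promise<T>,
): Promise<T> {
  parent.childSessionIds.push(childSessionId);
  try {
    return await handoff();
  } catch (error) {
    // Surface the failure on the parent so the child session is never orphaned.
    parent.status = 'error';
    parent.terminationReason = 'handoff_failed';
    throw error;
  } finally {
    parent.endTime = parent.endTime ?? new Date();
  }
}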

Comment on lines +12 to 14
tool: "extract_data"
timeout: 45000


🛠️ Refactor suggestion

🧩 Analysis chain

Tool rename to "extract_data" and longer timeout look good; verify end‑to‑end support.

Confirm server/tool registry, schemas, and tests accept "extract_data" (no lingering "extract_schema_data" usage).


🏁 Script executed:

#!/bin/bash
# Check for any remaining references to the old tool name
rg -nP '\bextract_schema_data\b' -C2

# Spot YAML evals still using the old tool field
rg -nP --type=yaml -n -C2 '\btool:\s*"?extract_schema_data"?'

# Sanity-check that the new name is present in configs and schema enums
rg -nP -n -C2 '\btool:\s*"?extract_data"?'
rg -nP -n -C3 '"evaluation"\s*:.*\{[^}]*"tool"[^}]*\benum\b[^}]*\bextract_data\b'

Length of output: 9746


Replace remaining "extract_schema_data" references with "extract_data" and confirm server/tool registry + tests accept the new name.

Docs/examples still use the old tool name while eval YAMLs/templates use "extract_data" — update the docs/snippets and ensure any server enums/switches or tests accept "extract_data".

Affected files (examples found):

  • eval-server/nodejs/docs/PROTOCOL.md
  • eval-server/nodejs/docs/YAML_SCHEMA.md
  • eval-server/nodejs/docs/CLIENT_SETUP.md
  • eval-server/nodejs/docs/TRIGGERING_EVALUATIONS.md
🤖 Prompt for AI Agents
In eval-server/nodejs/evals/schema-extractor/google-search-001.yaml around lines
12 to 14, the YAML uses the new tool name "extract_data" but other docs and code
still reference "extract_schema_data"; search and replace all remaining
"extract_schema_data" occurrences with "extract_data" in the listed docs
(eval-server/nodejs/docs/PROTOCOL.md, YAML_SCHEMA.md, CLIENT_SETUP.md,
TRIGGERING_EVALUATIONS.md) and update any server-side registries/enums/switch
statements and unit/integration tests that validate tool names to accept
"extract_data" (or map the old name to the new one). After changes, run the test
suite and CI to ensure no references remain and the new name is accepted.

- "Coordinated property type selection through action_agent"
- "Applied search filters through proper action_agent calls"
- "Extracted property listings with schema_based_extractor"
- "Extracted property listings with extract_data"

🛠️ Refactor suggestion

🧩 Analysis chain

Ensure repo-wide extractor rename alignment.

Double-check that client schema/tool enums and templates now use extract_data.

Run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Find lingering old extractor names..."
rg -n -C1 -g '!**/node_modules/**' -e 'schema_based_extractor' -e 'extract_schema_data' || true

echo
echo "Validate schema/tool enums reference extract_data:"
fd default-client.yaml | xargs -I{} rg -n -C1 'extract_data' {}
fd client.schema.json | xargs -I{} rg -n -C2 'extract_data|extract_schema_data|schema_based_extractor' {}

Length of output: 465


Ensure repo-wide extractor rename alignment

default-client.yaml already uses "extract_data" (lines 29–31) but client.schema.json still enumerates "extract_schema_streamlined" alongside "extract_data" (lines 131–135); update/remove the legacy enum value and align any templates/libs to "extract_data", then run a repo-wide search for other old extractor names.

🤖 Prompt for AI Agents
In eval-server/nodejs/evals/web-task-agent/realestate-001.yaml around line 52
the extractor name is "extract_data" but client.schema.json (around lines
131–135) still includes the legacy enum value "extract_schema_streamlined";
remove that legacy enum entry from client.schema.json (leaving only
"extract_data"), update any templates or library references that still use
"extract_schema_streamlined" to "extract_data", and run a repo-wide
search/replace to ensure all extractor names are aligned and update tests or
docs if required.

- "Coordinated property type selection through action_agent"
- "Applied search filters through proper action_agent calls"
- "Extracted property listings with schema_based_extractor"
- "Extracted property listings with extract_data"

🛠️ Refactor suggestion

🧩 Analysis chain

Consistency check: fully migrate to extract_data across configs/schemas.

Ensure no lingering references to schema_based_extractor or extract_schema_data remain in evals, templates, or schemas.

Run to verify:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for old extractor names..."
rg -n -C1 -g '!**/node_modules/**' -e 'schema_based_extractor' -e 'extract_schema_data' || true

echo
echo "Confirming extract_data is used for evaluation.tool values..."
rg -n -C1 -g '!**/node_modules/**' -e 'tool:\s*"extract_data"' -e '"tool"\s*:\s*"extract_data"' || true

echo
echo "Check client schema enum still doesn't reference old names..."
fd client.schema.json | xargs -I{} rg -n -C2 -e 'extract_schema_data|schema_based_extractor|extract_data' {}

Length of output: 629


Consistency check — fully migrate to extract_data across configs/schemas.
No matches for schema_based_extractor or extract_schema_data, but client.schema.json's enum still contains "extract_schema_streamlined" alongside "extract_data" — remove/rename that legacy entry so only "extract_data" remains.
client.schema.json (enum, ~lines 131–135)

🤖 Prompt for AI Agents
In eval-server/nodejs/evals/web-task-agent/web-task-agent-realestate-001.yaml
around line 52 and in client.schema.json around lines 131–135, the enum still
contains the legacy value "extract_schema_streamlined" alongside "extract_data";
remove (or rename) that legacy enum entry so the enum only includes
"extract_data", and scan the repo/configs for any remaining references to
"extract_schema_streamlined" (rename them to "extract_data" or delete) to
complete the migration to extract_data.

wait_timeout: 5000

tool: "extract_schema_data"
tool: "extract_data"

🛠️ Refactor suggestion

🧩 Analysis chain

Potential schema key mismatch with new tool name

You switched to tool: "extract_data" but input still uses “schema” (not “extraction_schema” like other configs). If the new tool expects extraction_schema, this example will fail.

  • Confirm the expected input key for extract_data.
  • If needed, update this template to:
    • input.extraction_schema: <schema…>
    • Or ensure the tool supports both keys.

Also check validator type naming consistency (“llm-judge” vs “llm_judge” used elsewhere).

Run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Search extract_data handlers:"
rg -n -C3 -S 'extract_data'
echo
echo "Check llm[-_]judge variants:"
rg -n -C1 -S '\bllm[_-]judge\b' eval-server

Length of output: 101830


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== SchemaBasedExtractorTool top ==="
sed -n '1,260p' front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts || true

echo
echo "=== Search for related arg/interface names and any 'extraction_schema' usage ==="
rg -n --hidden --no-ignore 'SchemaExtractionArgs|SchemaExtractionResult|extraction_schema|extractionSchema' || true

echo
echo "=== client.schema.json (start) ==="
sed -n '1,260p' eval-server/nodejs/schemas/client.schema.json || true

echo
echo "=== Search for 'schema' keys in eval-server templates/evals to confirm usage ==="
rg -n --hidden -S --glob 'eval-server/**' '\binput:\s*$' -n -C1 || true
rg -n --hidden -S --glob 'eval-server/**' '\bschema\b' -n -C1 || true

Length of output: 87952


No change for extract_data input; fix inconsistent llm-judge naming

  • extract_data expects input.schema (SchemaBasedExtractorTool requires a "schema") — default-client.yaml's input.schema is correct (see front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts and eval-server/nodejs/templates/default-client.yaml).
  • Validation type is inconsistent: client.schema.json and EvalServer.js expect validation.type == "llm-judge" (hyphen) but many eval YAMLs use "llm_judge" (underscore) — e.g., files under eval-server/nodejs/evals/** (streamlined-schema-extractor/*.yaml). Normalize validation.type to "llm-judge" across YAMLs or update eval-server/nodejs/schemas/client.schema.json and eval-server/nodejs/src/lib/EvalServer.js to accept "llm_judge" as well.
🤖 Prompt for AI Agents
In eval-server/nodejs/templates/default-client.yaml around line 30, there's a
mismatch in validation.type naming: some YAMLs use "llm_judge" (underscore)
while client.schema.json and EvalServer.js expect "llm-judge" (hyphen). Fix by
normalizing to "llm-judge" across all eval YAMLs under
eval-server/nodejs/evals/** (replace "llm_judge" with "llm-judge");
alternatively, if you prefer to accept the underscore, update
eval-server/nodejs/schemas/client.schema.json and
eval-server/nodejs/src/lib/EvalServer.js to allow both "llm-judge" and
"llm_judge" (e.g., accept either value in validation.type) and then run tests to
ensure validation passes.
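If the second option is preferred, a small normalization step before validation would accept both spellings; a sketch (the function name and placement are illustrative, and the real EvalServer is plain JavaScript):

// Maps the legacy "llm_judge" spelling to the canonical "llm-judge" value
// before the YAML is validated against client.schema.json.
type ValidationType = 'llm-judge' | 'snapshot' | 'hybrid';

function normalizeValidationType(type: string): ValidationType {
  const canonical = type === 'llm_judge' ? 'llm-judge' : type;
  if (canonical === 'llm-judge' || canonical === 'snapshot' || canonical === 'hybrid') {
    return canonical;
  }
  throw new Error(`Unsupported validation.type: ${type}`);
}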

Comment on lines +556 to +560
if (abortSignal?.aborted) {
  logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
  const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
  return { ...abortResult, agentSession: currentSession };
}

⚠️ Potential issue

Early abort check doesn't clean up properly

When execution is aborted, the function returns immediately without proper cleanup or status updates for the agent session.

 // Check if execution has been aborted
 if (abortSignal?.aborted) {
   logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
+  // Update session status before returning
+  currentSession.status = 'error';
+  currentSession.endTime = new Date();
+  currentSession.terminationReason = 'cancelled';
   const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
   return { ...abortResult, agentSession: currentSession };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (abortSignal?.aborted) {
  logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
  const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
  return { ...abortResult, agentSession: currentSession };
}
if (abortSignal?.aborted) {
  logger.info(`${agentName} execution aborted at iteration ${iteration + 1}/${maxIterations}`);
  // Update session status before returning
  currentSession.status = 'error';
  currentSession.endTime = new Date();
  currentSession.terminationReason = 'cancelled';
  const abortResult = createErrorResult('Execution was cancelled', messages, 'error');
  return { ...abortResult, agentSession: currentSession };
}

*/
export class SchemaBasedExtractorTool implements Tool<SchemaExtractionArgs, SchemaExtractionResult> {
name = 'extract_schema_data';
name = 'extract_data';

⚠️ Potential issue

🧩 Analysis chain

Rename to 'extract_data' is consistent with config changes.

Good alignment with YAML/tooling.


🏁 Script executed:

#!/bin/bash
# Ensure no references to the old name linger in code/JSON/YAML/tests
rg -nP '\bextract_schema_data\b' -C2

Length of output: 5869


Rename to 'extract_data' — update remaining docs referencing 'extract_schema_data'

Change in SchemaBasedExtractorTool.ts is correct, but search shows many occurrences of "extract_schema_data" in eval-server/nodejs/docs (YAML_SCHEMA.md, CLIENT_SETUP.md, PROTOCOL.md, TRIGGERING_EVALUATIONS.md). Update those docs or add a compatibility alias so clients aren’t broken.

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/tools/SchemaBasedExtractorTool.ts around line 39,
the tool name was changed to 'extract_data' but many docs and clients still
reference 'extract_schema_data'; update all occurrences in
eval-server/nodejs/docs (YAML_SCHEMA.md, CLIENT_SETUP.md, PROTOCOL.md,
TRIGGERING_EVALUATIONS.md) to 'extract_data' and/or add a compatibility alias so
old clients continue working: either update documentation examples and protocol
specs to the new name, and add server-side handling that accepts
'extract_schema_data' and maps it to 'extract_data' (or vice versa) to avoid
breaking clients while rolling out the rename.
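A compatibility alias could be as small as a lookup applied wherever tool names are resolved; a sketch with a hypothetical helper name:

// Hypothetical alias table: resolves legacy tool names to the current one so
// evals and docs written against "extract_schema_data" keep working while the
// rename rolls out.
const TOOL_NAME_ALIASES: Record<string, string> = {
  extract_schema_data: 'extract_data',
  schema_based_extractor: 'extract_data',
};

function resolveToolName(requested: string): string {
  return TOOL_NAME_ALIASES[requested] ?? requested;
}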

Comment on lines +183 to 186
// Add delay before retry to prevent overloading the LLM (abortable)
if (attempt > 1) {
  await new Promise(resolve => setTimeout(resolve, this.RETRY_DELAY_MS));
  await this.sleep(this.RETRY_DELAY_MS, ctx?.abortSignal);
}

🛠️ Refactor suggestion

🧩 Analysis chain

LGTM: retry delay is now abortable; consider propagating abort to LLM calls too

Nice improvement. Also pass ctx.abortSignal into callLLMWithTracing (if supported) so in‑flight calls cancel promptly.

// If callLLMWithTracing supports it, add:
// options: { retryConfig: {...}, signal: ctx?.abortSignal }

Run to verify types and support:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Check LLMContext includes abortSignal
rg -n -C3 -P '(interface|type)\s+LLMContext\b'
rg -n -C2 '\babortSignal\b' | head -n 50
# Inspect callLLMWithTracing signature/options for 'signal' support
rg -n -C5 -P 'export\s+(async\s+)?function\s+callLLMWithTracing\b|callLLMWithTracing\('

Length of output: 20076


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- front_end/panels/ai_chat/tools/Tools.ts (around LLMContext) ----"
sed -n '1,240p' front_end/panels/ai_chat/tools/Tools.ts || true

echo
echo "---- front_end/panels/ai_chat/tools/LLMTracingWrapper.ts (full) ----"
sed -n '1,400p' front_end/panels/ai_chat/tools/LLMTracingWrapper.ts || true

echo
echo "---- front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts (around call sites) ----"
sed -n '1,520p' front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts || true

Length of output: 29124


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for LLMCallOptions and LLMClient..."
rg -n --hidden -S "LLMCallOptions" || true
rg -n --hidden -S "interface LLMClient\b|class LLMClient\b|export class LLMClient" || true

echo
echo "---- front_end/panels/ai_chat/LLM/LLMTypes.ts (if exists) ----"
sed -n '1,240p' front_end/panels/ai_chat/LLM/LLMTypes.ts || true

echo
echo "---- front_end/panels/ai_chat/LLM/LLMClient.ts (if exists) ----"
sed -n '1,400p' front_end/panels/ai_chat/LLM/LLMClient.ts || true

Length of output: 19739


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  "front_end/panels/ai_chat/LLM/LLMProvider.ts"
  "front_end/panels/ai_chat/LLM/OpenAIProvider.ts"
  "front_end/panels/ai_chat/LLM/OpenRouterProvider.ts"
  "front_end/panels/ai_chat/LLM/LiteLLMProvider.ts"
  "front_end/panels/ai_chat/LLM/GroqProvider.ts"
)

for f in "${files[@]}"; do
  echo "---- $f ----"
  sed -n '1,420p' "$f" || true
  echo
done

echo "---- Search for callWithMessages definitions/usages ----"
rg -n --hidden "callWithMessages\(" || true

Length of output: 56564


Propagate AbortSignal through LLM call stack

sleep(...) is already abortable via ctx.abortSignal, but callLLMWithTracing won't cancel in‑flight LLM requests: LLMCallOptions (LLMTypes.ts) has no signal and providers don't pass an AbortSignal to fetch. Action: add signal?: AbortSignal to LLMCallOptions, accept/forward options.signal in callLLMWithTracing -> LLMClient.call -> provider.callWithMessages, and pass { signal: options.signal } to fetch in provider implementations (OpenAIProvider.ts, LiteLLMProvider.ts, OpenRouterProvider.ts, GroqProvider.ts). Then update call sites (e.g. front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts) to include options: { retryConfig: {...}, signal: ctx?.abortSignal } so LLM calls abort promptly.

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/tools/StreamlinedSchemaExtractorTool.ts around lines
183-186, the code awaits an abortable sleep but does not propagate
ctx.abortSignal into the LLM call stack so in‑flight LLM requests cannot be
cancelled; add support for AbortSignal by adding signal?: AbortSignal to
LLMCallOptions (LLMTypes.ts), update callLLMWithTracing to accept and forward
options.signal into LLMClient.call, modify LLMClient.call to forward
options.signal to provider.callWithMessages, update each provider implementation
(OpenAIProvider.ts, LiteLLMProvider.ts, OpenRouterProvider.ts, GroqProvider.ts)
to accept an optional signal and pass { signal: options.signal } into fetch/use
of HTTP clients, and finally update call sites (including this file) to include
options: { retryConfig: {...}, signal: ctx?.abortSignal } so LLM requests are
aborted promptly when the context is cancelled.
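A minimal sketch of the signal threading described above (the option shape and provider method signature are assumptions based on the names in this comment, not the actual interfaces):

// Assumed shapes; the real LLMCallOptions and provider interfaces live under
// front_end/panels/ai_chat/LLM/.
interface LLMCallOptions {
  retryConfig?: { maxRetries: number };
  signal?: AbortSignal;  // new optional field threaded down from ctx.abortSignal
}

interface LLMMessage { role: string; content: string; }

// Provider side: forwarding the signal to fetch lets in-flight requests abort.
async function callWithMessages(
  endpoint: string,
  messages: LLMMessage[],
  options?: LLMCallOptions,
): Promise<unknown> {
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
    signal: options?.signal,  // cancellation propagates to the HTTP request
  });
  return response.json();
}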

Comment on lines +2124 to +2136
#onMCPConnectorsClick(): void {
  MCPConnectorsCatalogDialog.show({
    onClose: async () => {
      // Refresh MCP registry when catalog is closed in case new connectors were added
      try {
        await MCPRegistry.init(true);
        logger.info('MCP registry refreshed after catalog closed');
      } catch (error) {
        logger.error('Failed to refresh MCP registry after catalog closed', error);
      }
    }
  });
}

⚠️ Potential issue

🧩 Analysis chain

Confirm MCP reinit semantics on catalog close.

You call MCPRegistry.init(true) but not refresh(); other places do both. Ensure init(true) covers refresh or call refresh() here too.


🏁 Script executed:

#!/bin/bash
# Inspect MCPRegistry.init to verify whether it performs refresh and what it returns.
rg -n -C3 --type=ts 'class\s+MCPRegistry|static\s+init\s*\(|init\s*\('

Length of output: 23632


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="front_end/panels/ai_chat/mcp/MCPRegistry.ts"
echo "== File and lines around init =="
nl -ba "$file" | sed -n '420,520p'
echo
echo "== Search for refresh definition/calls in file =="
rg -n --hidden --no-ignore -C2 '\brefresh\s*\(' "$file" || true
echo
echo "== Show surrounding lines for any refresh definition (if found) =="
rg -n --hidden --no-ignore '\basync\s+refresh\s*\(|\brefresh\s*:\s*async\b' "$file" -n -C3 || true

Length of output: 73


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="front_end/panels/ai_chat/mcp/MCPRegistry.ts"

echo "== init region (420-500) =="
awk 'NR>=420 && NR<=500{printf "%6d: %s\n",NR,$0}' "$file" || true
echo
echo "== lines with 'refresh(' (with context) =="
rg -n -C3 '\brefresh\s*\(' "$file" || true
echo
echo "== possible 'async refresh' definitions =="
rg -n '\basync\s+refresh\s*\(' "$file" || true
echo
echo "== fallback: full file with numbers (first 900 lines) =="
awk '{printf "%6d: %s\n",NR,$0}' "$file" > /tmp/mcprestxt || true
sed -n '1,900p' /tmp/mcprestxt || true

Length of output: 31635


Call MCPRegistry.refresh() after init(true) in the catalog onClose handler.

init(...) only connects and runs refreshToolsForServer (it lists tools) — refresh() is required to register tools with ToolRegistry. Update front_end/panels/ai_chat/ui/AIChatPanel.ts (onClose handler ~lines 2126–2132) to call await MCPRegistry.refresh() after await MCPRegistry.init(true).

🤖 Prompt for AI Agents
In front_end/panels/ai_chat/ui/AIChatPanel.ts around lines 2124 to 2136, the
onClose handler currently calls await MCPRegistry.init(true) but does not call
MCPRegistry.refresh(), so tools are not registered with the ToolRegistry; update
the try block to await MCPRegistry.init(true) followed immediately by await
MCPRegistry.refresh(), and then log the success (e.g., "MCP registry refreshed
after catalog closed") in the same try; keep the existing catch to log errors.

Comment on lines +1 to 4
The MIT License (MIT)

Copyright (c) 2024 Anthropic, PBC
Copyright (c) 2015-2017 Evgeny Poberezkin


⚠️ Potential issue

🧩 Analysis chain

License mismatch: content appears to be Ajv’s MIT license, not MCP SDK’s.

This is a compliance risk. Restore the correct MCP SDK license here and place Ajv’s license under the Ajv vendored path (or avoid vendoring Ajv docs). Mixing third‑party licenses under a misleading path can cause attribution errors.


🏁 Script executed:

#!/bin/bash
# Show indicators that this directory contains Ajv content rather than MCP SDK
head -n 3 front_end/third_party/mcp-sdk/README.md || true
rg -n 'Ajv' front_end/third_party/mcp-sdk -C2 || true
rg -n 'Anthropic|Model Context Protocol|MCP' front_end/third_party/mcp-sdk -C2 || true

Length of output: 284608


License mismatch — restore MCP SDK license and relocate Ajv's MIT

front_end/third_party/mcp-sdk contains Ajv sources (README.md, lib/, dist/, ajv/) but front_end/third_party/mcp-sdk/LICENSE is Ajv’s MIT (Evgeny Poberezkin). Replace that file with the correct MCP SDK license, and move Ajv’s MIT into the Ajv vendored path (e.g. front_end/third_party/mcp-sdk/ajv/LICENSE or front_end/third_party/ajv/LICENSE) — or stop vendoring Ajv docs.

Comment on lines +136 to +143
const { code, state } = this.parseAuthCallback(url);
const expected = sessionStorage.getItem(this.STATE_KEY);
if (expected && state && expected !== state) {
  throw new Error('oauth_invalid_state');
}
// Cleanup state now that we used it
sessionStorage.removeItem(this.STATE_KEY);
// Return user to original page

⚠️ Potential issue

Enforce OAuth state validation when expected is set.

If a stored state exists, require an exact match; missing state should be rejected. Prevents CSRF on the redirect.

-          const expected = sessionStorage.getItem(this.STATE_KEY);
-          if (expected && state && expected !== state) {
-            throw new Error('oauth_invalid_state');
-          }
-          // Cleanup state now that we used it
-          sessionStorage.removeItem(this.STATE_KEY);
+          const expected = sessionStorage.getItem(this.STATE_KEY);
+          if (expected) {
+            if (!state || expected !== state) {
+              throw new Error('oauth_invalid_state');
+            }
+            // Cleanup state only after successful validation
+            sessionStorage.removeItem(this.STATE_KEY);
+          }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { code, state } = this.parseAuthCallback(url);
const expected = sessionStorage.getItem(this.STATE_KEY);
if (expected && state && expected !== state) {
  throw new Error('oauth_invalid_state');
}
// Cleanup state now that we used it
sessionStorage.removeItem(this.STATE_KEY);
// Return user to original page
const { code, state } = this.parseAuthCallback(url);
const expected = sessionStorage.getItem(this.STATE_KEY);
if (expected) {
  if (!state || expected !== state) {
    throw new Error('oauth_invalid_state');
  }
  // Cleanup state only after successful validation
  sessionStorage.removeItem(this.STATE_KEY);
}
// Return user to original page
🤖 Prompt for AI Agents
In front_end/third_party/mcp-sdk/MCPOAuthProvider.ts around lines 136 to 143,
the OAuth state validation only rejects when both expected and state exist but
differ; update the logic so that if a stored expected state exists it must be
present in the callback and must match exactly. Replace the current conditional
with a check that if (expected) then if (!state || expected !== state) throw new
Error('oauth_invalid_state');, then proceed to remove the STATE_KEY and continue
returning the user.

@olesho olesho closed this Sep 23, 2025