
feat: implement server-grouped model list with health monitoring and resource conflict detection#9

Merged
bjeans merged 6 commits into main from claude/organize-model-list-ZwkBE on Dec 20, 2025

Conversation

bjeans (Owner) commented Dec 18, 2025

Backend changes:

  • Add comprehensive schemas for model health, size, server info, and selection analysis
  • Implement LiteLLM client methods for /v1/model/info and /health/latest endpoints
  • Create model processor service with intelligent grouping, size parsing, and provider inference
  • Add new API endpoints: /models/by-server and /models/analyze-selection
  • Support for duplicate detection across multiple Ollama servers

Frontend changes:

  • Create ServerGroup component to display models grouped by Ollama host
  • Enhance ModelCard with health status, size badges, and performance metrics
  • Add SelectionAnalysis component with real-time conflict warnings and recommendations
  • Update useLLMModels hook to support server grouping and selection analysis
  • Create utility functions for model selection and formatting
  • Add comprehensive CSS styling with color-coded health/size indicators

Key features:

  • Group models by Ollama server (10.0.1.14, 10.0.1.150, etc.)
  • Display health status with response times from LiteLLM health checks
  • Show model sizes (tiny/small/medium/large) with memory estimates
  • Detect and warn about resource conflicts (multiple large models on same host)
  • Identify duplicate models across servers with performance recommendations
  • Resolve :latest tags to actual versions (e.g., llama3.2:latest → 3b)
  • Backward compatible with simple flat model list view

This helps users avoid memory contention by preventing selection of multiple
large models on the same Ollama server, improving council debate performance.

github-actions bot commented Dec 18, 2025

🔍 Vulnerabilities of bjeans/multi-ai-chat:pr-9

📦 Image Reference: bjeans/multi-ai-chat:pr-9
digest: sha256:d51bb603b20ae3ad860d2e83868dc4187b5b938df5cc13dd45cd622d35aaa9c6
vulnerabilities: critical: 0, high: 0, medium: 0, low: 0
size: 48 MB
packages: 74
📦 Base Image python:3-alpine
also known as
  • 3-alpine3.23
  • 3.14-alpine
  • 3.14-alpine3.23
  • 3.14.2-alpine
  • 3.14.2-alpine3.23
  • 897f7d46efcffcccfc367bab2142c1e247247caaaa53b9065240cad2531ee121
  • alpine
  • alpine3.23
digest: sha256:079b889b270e730f38bdbabf505362538db449a216895c2c45664b1538bd34d5
vulnerabilities: critical: 0, high: 0, medium: 0, low: 0

Copilot AI (Contributor) left a comment

Pull request overview

This PR implements a comprehensive server-grouped model list with health monitoring and resource conflict detection for LiteLLM models across multiple Ollama servers. The feature helps users avoid memory contention by detecting and warning about resource conflicts when selecting multiple large models on the same host.

Key Changes:

  • Added backend services for model processing with intelligent grouping, size parsing, provider inference, and health status integration
  • Created new API endpoints for server-grouped models and selection analysis with conflict detection
  • Built frontend components for server grouping visualization, health status display, and real-time selection analysis with warnings

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 24 comments.

Show a summary per file
File Description
frontend/src/utils/modelSelection.js New utility functions for model lookup, diversity analysis, server statistics, and formatting helpers
frontend/src/styles/index.css Comprehensive styling for server groups, health badges, size tiers, warnings, and selection analysis UI
frontend/src/hooks/useLLMModels.js Enhanced hook with server grouping support, selection tracking, and analysis integration
frontend/src/components/ServerGroup.jsx New component to display models grouped by Ollama server with collapsible sections and resource warnings
frontend/src/components/SelectionAnalysis.jsx Component showing diversity scores, warnings, and recommendations for model selections
frontend/src/components/ModelCard.jsx Enhanced card with health status, size badges, performance metrics, and duplicate detection
frontend/src/components/QueryInput.jsx Updated to support both server-grouped and flat model views with selection analysis integration
frontend/src/App.jsx Integration of new hooks and props for server grouping functionality
backend/services/model_processor.py Core service for parsing model metadata, grouping by server, detecting duplicates, and analyzing selections
backend/services/litellm_client.py Added methods to fetch model info and health status from LiteLLM endpoints
backend/models/schemas.py New schemas for model health, size, server info, warnings, and selection analysis
backend/api/config.py New endpoints for server-grouped models and selection analysis with caching

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +65 to +66
{rec.from_server.split('//')[1]} to{' '}
{rec.to_server.split('//')[1]}
Copilot AI, Dec 18, 2025:

The server URL parsing using split('//')[1] will throw an error if the URL doesn't contain '//'. Add defensive checks or use the URL API for more robust parsing to handle edge cases where the server URL format might be unexpected.

Comment on lines +229 to +257
all_models = []
for server_data in server_map.values():
    all_models.extend(server_data["models"])

# Group by base_model:actual_tag to find duplicates
model_groups = {}
for model in all_models:
    key = f"{model.base_model}:{model.actual_tag}"
    if key not in model_groups:
        model_groups[key] = []
    model_groups[key].append(model)

# Mark duplicates and set better_server
for key, models in model_groups.items():
    if len(models) > 1:
        # Sort by TPM (descending) to find best server
        models_sorted = sorted(
            models,
            key=lambda m: m.server_tpm,
            reverse=True
        )
        best_server = models_sorted[0].api_base

        for model in models:
            model.is_duplicate = True
            model.duplicate_count = len(models)
            if model.api_base != best_server:
                model.better_server = best_server

Copilot AI, Dec 18, 2025:

The duplicate detection iterates through all_models multiple times. With many models across servers, this could become inefficient. Consider collecting models into model_groups dictionary in a single pass during the initial model creation loop (lines 159-226) rather than iterating again through all_models.
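The single-pass grouping the reviewer describes could look like this. A sketch under assumed data shapes: `base_model`, `actual_tag`, and the `"models"` key come from the snippet above; the function name and everything else is illustrative, not code from the PR.

```python
from collections import defaultdict

def group_models(server_map):
    """Collect duplicate groups in one pass while walking the server map.

    Hypothetical helper: builds the same base_model:actual_tag groups as the
    original code, but without first flattening into an all_models list.
    """
    model_groups = defaultdict(list)
    for server_data in server_map.values():
        for model in server_data["models"]:
            # Same composite key the original loop builds
            model_groups[f"{model.base_model}:{model.actual_tag}"].append(model)
    return model_groups
```

The duplicate-marking pass over `model_groups` would then stay as-is; only the group construction is folded into the existing walk.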

Comment thread backend/api/config.py Outdated
Comment on lines +53 to +63
global _server_groups_cache

# Use cache if available (TODO: add expiration logic)
if _server_groups_cache is not None:
    return _server_groups_cache

client = LiteLLMClient()
raw_data = await client.get_model_info()

server_groups = await process_models_with_health(raw_data)
_server_groups_cache = server_groups
Copilot AI, Dec 18, 2025:

The global _server_groups_cache is accessed and modified without any locking mechanism in an async context. This can lead to race conditions where multiple concurrent requests might all see the cache as None and trigger redundant expensive processing operations. Consider using asyncio.Lock to protect cache access or implement a proper async caching solution.

Comment thread backend/api/config.py Outdated
"""
global _server_groups_cache

# Use cache if available (TODO: add expiration logic)
Copilot AI, Dec 18, 2025:

The TODO comment mentions adding expiration logic but provides no details about the intended cache duration or invalidation strategy. Consider documenting the expected cache behavior or creating a proper ticket to track this work. Without expiration, the cache will become stale indefinitely after the first request.

Comment thread frontend/src/components/ServerGroup.jsx Outdated
Comment on lines +18 to +21
<div
  className="server-header"
  onClick={() => setExpanded(!expanded)}
>
Copilot AI, Dec 18, 2025:

The server header is clickable to expand/collapse but has no ARIA attributes to indicate its role or state. Add role="button", aria-expanded={expanded}, and aria-controls attributes to improve accessibility for screen reader users who need to understand that this is an expandable section.

Comment thread backend/services/model_processor.py Outdated
warnings.append(SelectionWarning(
severity="high",
server=api_base,
message=f"⚠️ {len(large_models)} large models selected on {api_base.split('//')[1]}. Expect significant delays during model swapping.",
Copilot AI, Dec 18, 2025:

The string formatting split('//')[1] will throw an IndexError if the api_base doesn't contain '//'. This same issue appears multiple times in the codebase. Add defensive checks to handle malformed URLs gracefully, or use urllib.parse to properly parse URLs.
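The urllib.parse approach the comment suggests might look like this (a sketch; the function name and fallback behavior are assumptions, not code from the PR):

```python
from urllib.parse import urlparse

def display_host(api_base: str) -> str:
    """Return a short host label, falling back to the raw string for odd inputs."""
    parsed = urlparse(api_base)
    if parsed.netloc:            # present for URLs like http://10.0.1.14:11434
        return parsed.netloc
    return api_base              # no scheme or '//': show the value unmodified
```

Unlike `split('//')[1]`, this never raises on a bare hostname; it simply echoes the input back.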

Comment on lines +79 to +87
<button
  className="details-toggle"
  onClick={(e) => {
    e.stopPropagation();
    setShowDetails(!showDetails);
  }}
>
  {showDetails ? '▲' : '▼'}
</button>
Copilot AI, Dec 18, 2025:

The details toggle button should have an aria-label or aria-expanded attribute to indicate its purpose and state to screen reader users. Consider adding aria-expanded={showDetails} and aria-label="Toggle model details" for better accessibility.

Comment thread frontend/src/App.jsx Outdated
selectionAnalysis,
loading: modelsLoading,
handleModelSelect,
clearSelection
Copilot AI, Dec 18, 2025:

Unused variable clearSelection.

Suggested change: remove the unused clearSelection binding.

Comment thread backend/services/model_processor.py Outdated
Model processing service for organizing and analyzing LiteLLM models
"""
import re
from typing import Dict, List, Tuple
Copilot AI, Dec 18, 2025:

Import of 'Tuple' is not used.

Suggested change
from typing import Dict, List, Tuple
from typing import Dict, List

Comment thread backend/services/model_processor.py Outdated
"""
import re
from typing import Dict, List, Tuple
from datetime import datetime
Copilot AI, Dec 18, 2025:

Import of 'datetime' is not used.

Suggested change: remove the unused `from datetime import datetime` line.

…sibility improvements

  - Fix URL parsing vulnerabilities using urlparse/URL API with fallbacks
  - Fix cache race condition with asyncio.Lock
  - Implement configurable cache expiration (default 120s via CACHE_TTL_SECONDS)
  - Remove unused imports (Tuple, datetime) and variables (clearSelection)
  - Add accessibility attributes to ServerGroup and ModelCard components
  - Replace non-semantic div with button for ServerGroup expandable header
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 20 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +5 to +11
export function findModelById(modelId, serverGroups) {
  for (const group of serverGroups) {
    const model = group.models.find(m => m.id === modelId);
    if (model) return model;
  }
  return null;
}
Copilot AI, Dec 20, 2025:

The function returns null when a model isn't found instead of a more descriptive result. While this works, the function name 'findModelById' suggests it should return the model or throw an error. Consider renaming to 'findModelByIdOrNull' or documenting the null return behavior clearly to avoid confusion.

case 'unhealthy':
return '✗';
default:
return '?';
Copilot AI, Dec 20, 2025:

The default case returns '?' for unknown health status, but this icon may not be intuitive to users. Consider using a more descriptive icon or text label like 'Unknown' or '⚪' (hollow circle) to better indicate uncertain status.

Suggested change
return '?';
return 'Unknown';

Comment thread frontend/src/components/ModelCard.jsx Outdated
Comment on lines +6 to +24
  getSizeClassName
} from '../utils/modelSelection';

/**
 * Safely extract hostname from URL string
 * @param {string} url - URL to parse (e.g., "http://192.168.1.100:11434")
 * @returns {string} hostname or full URL if parsing fails
 */
function extractHostname(url) {
  try {
    const parsed = new URL(url);
    return parsed.hostname;
  } catch {
    // Fallback: try to extract after // and before :
    const match = url.match(/\/\/([^:\/]+)/);
    return match ? match[1] : url;
  }
}

Copilot AI, Dec 20, 2025:

The same extractHostname function is duplicated here as well. This is a third occurrence of identical code across the codebase. Consolidate all instances into a single utility function in modelSelection.js.

Suggested change: delete the local JSDoc and extractHostname definition shown above and import the shared helper instead:

  getSizeClassName,
  extractHostname,
} from '../utils/modelSelection';

Comment on lines +101 to +105
function getDiversityLevel(score) {
  if (score >= 70) return 'good';
  if (score >= 40) return 'fair';
  return 'poor';
}
Copilot AI, Dec 20, 2025:

The getDiversityLevel function is defined at the bottom of the file and only used internally. Following React conventions, helper functions should typically be defined outside the component (at the top of the file) or explicitly marked as internal. Consider moving it above the component definition for better code organization.

Comment thread backend/api/config.py Outdated
Comment on lines 18 to 20
CACHE_TTL_SECONDS = int(os.getenv("CACHE_TTL_SECONDS", "120")) # Default 2 minutes


Copilot AI, Dec 20, 2025:

The CACHE_TTL_SECONDS environment variable is read with a string default but immediately cast to int. If the environment variable contains an invalid value, this will raise a ValueError. Consider adding error handling or validation for the environment variable value.

Suggested change
CACHE_TTL_SECONDS = int(os.getenv("CACHE_TTL_SECONDS", "120"))  # Default 2 minutes
def _get_cache_ttl(default: int = 120) -> int:
    """
    Retrieve cache TTL from environment, falling back to a safe default if invalid.
    """
    raw_value = os.getenv("CACHE_TTL_SECONDS")
    if raw_value is None:
        return default
    try:
        ttl = int(raw_value)
    except ValueError:
        return default
    # Ensure TTL is positive; otherwise, fall back to default.
    if ttl <= 0:
        return default
    return ttl

CACHE_TTL_SECONDS = _get_cache_ttl()  # Default 2 minutes


return (
<div
className={`model-card ${isSelected ? 'selected' : ''} ${getSizeClassName(model.size.size_tier)}`}
Copilot AI, Dec 20, 2025:

The CSS class 'selected' is applied to model cards, but this class is not defined in the stylesheet. The code should use 'model-card-selected' which is defined at line 36, or the missing 'selected' class should be added to the stylesheet.

Suggested change
className={`model-card ${isSelected ? 'selected' : ''} ${getSizeClassName(model.size.size_tier)}`}
className={`model-card ${isSelected ? 'model-card-selected' : ''} ${getSizeClassName(model.size.size_tier)}`}

Comment on lines +50 to +51
hasConflict: largeModels.length > 1 || (largeModels.length >= 1 && mediumModels.length >= 1)
};
Copilot AI, Dec 20, 2025:

The conflict detection logic uses a hardcoded threshold (hasConflict: largeModels.length > 1 || ...). This business logic is duplicated in multiple places (here and in model_processor.py). Consider extracting this to a shared constant or configuration to ensure consistency across the codebase.
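On the Python side, one way to centralize the rule (illustrative only: `conflict_rules` is not a module in this PR, and the function name is assumed):

```python
# Hypothetical shared module, e.g. backend/config/conflict_rules.py
MAX_LARGE_MODELS_PER_HOST = 1

def has_conflict(large_count: int, medium_count: int) -> bool:
    """Mirror of the frontend rule: more than one large model, or a large
    model combined with a medium one, on the same host is a conflict."""
    return (
        large_count > MAX_LARGE_MODELS_PER_HOST
        or (large_count >= 1 and medium_count >= 1)
    )
```

The frontend could then consume the same thresholds via an API response or a generated constants file, so the two implementations cannot drift apart.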

Comment on lines +62 to +68
export function formatMemory(gb) {
  if (gb < 1) {
    return `${Math.round(gb * 1024)}MB`;
  } else {
    return `${gb}GB`;
  }
}
Copilot AI, Dec 20, 2025:

The formatMemory function converts values less than 1 GB to MB but doesn't handle the edge case where gb is 0 or negative. While unlikely with the current data, adding validation would make the function more robust.

checked={isSelected}
onChange={onSelect}
onClick={(e) => e.stopPropagation()}
tabIndex={-1}
Copilot AI, Dec 20, 2025:

The checkbox inside the model card has tabIndex={-1} and aria-hidden="true", which removes it from keyboard navigation. However, the parent div handles keyboard events. This creates a confusing experience where the visual checkbox can't be directly focused. Consider either making the checkbox focusable or using a custom styled element that better reflects the actual interaction model.

Suggested change: remove tabIndex={-1} so the checkbox stays focusable.

Comment thread backend/api/config.py
Comment on lines +87 to +98
@router.post("/models/analyze-selection", response_model=SelectionAnalysis)
async def analyze_model_selection(request: AnalyzeSelectionRequest):
    """
    Analyze selected models for resource conflicts and provide recommendations
    """
    # Get current server groups
    server_groups = await get_models_by_server()

    # Analyze selection
    analysis = await analyze_selection(request.model_ids, server_groups)

    return analysis
Copilot AI, Dec 20, 2025:

The analyze_selection endpoint doesn't validate that the model_ids in the request actually exist in the current server groups before processing. While the code handles missing models gracefully by filtering them out, this could hide issues where clients send stale or incorrect model IDs. Consider adding validation or returning information about invalid IDs.
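A sketch of the validation the reviewer asks for, with simplified dict shapes standing in for the real schema objects (the function name and return shape are assumptions):

```python
def validate_selection(model_ids, server_groups):
    """Split requested IDs into known and unknown against the current groups.

    The endpoint could reject the request, or include the `invalid` list in
    the response, instead of silently filtering stale IDs out.
    """
    known = {m["id"] for g in server_groups for m in g["models"]}
    valid = [mid for mid in model_ids if mid in known]
    invalid = [mid for mid in model_ids if mid not in known]
    return valid, invalid
```

Surfacing the `invalid` list would let clients notice when their cached model catalog has gone stale.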

  - Consolidate duplicate extractHostname() function into shared modelSelection.js utility
  - Add error handling for CACHE_TTL_SECONDS parsing with validation and logging
  - Update CLAUDE.md with Docker build validation requirements
  - Addresses Copilot review feedback: DRY principle and crash prevention
@bjeans bjeans merged commit 9d114da into main Dec 20, 2025
2 checks passed
@bjeans bjeans deleted the claude/organize-model-list-ZwkBE branch December 20, 2025 21:14
