A comprehensive Model Context Protocol (MCP) server for tool registration, discovery, and management with semantic search capabilities and seamless LiteLLM integration. Built with Python 3.11+, FastAPI, FastMCP, and PostgreSQL + pgvector for production-ready deployment.
⚠️ Important Migration Notice: If you're upgrading from a previous version, please read the Embedding Dimension Migration Guide before running migrations. The embedding dimension has been updated to match the application configuration.
Note: This project provides a centralized tool registry that syncs with external MCP servers and integrates with LiteLLM for unified tool access across multiple LLM providers.
- Features
- Architecture
- Quick Start
- Upgrading
- Kubernetes Deployment
- API Documentation
- Configuration
- Code Quality
- Advanced tool search using vector embeddings and similarity matching
- Support for natural language queries to find relevant tools (e.g., "calculate addition" finds calculator)
- Hybrid search combining keyword and semantic matching
- Fuzzy search with typo tolerance
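The hybrid search described above can be pictured as blending a keyword score with a vector-similarity score. Below is a minimal illustrative sketch; the function names and 50/50 weighting are assumptions for illustration, not the registry's actual implementation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear in the tool's description."""
    words = query.lower().split()
    text = text.lower()
    return sum(w in text for w in words) / len(words) if words else 0.0

def hybrid_score(query: str, text: str,
                 query_vec: list[float], tool_vec: list[float],
                 alpha: float = 0.5) -> float:
    """Blend keyword and semantic scores; alpha weights the semantic part."""
    return alpha * cosine_similarity(query_vec, tool_vec) + (1 - alpha) * keyword_score(query, text)

# Embeddings here are toy 2-d vectors standing in for real model output
print(round(hybrid_score("calculate addition", "calculator: add numbers",
                         [1.0, 0.0], [1.0, 0.0]), 2))
```

In the real system the semantic part comes from pgvector similarity queries rather than in-process math, but the blending idea is the same.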
- FastMCP Server: Built on the fastmcp framework for optimal performance
- MCP Resources: Exposes registry data via MCP resources (`toolbox://categories`, `toolbox://stats`, `toolbox://tools/{category}`)
- MCP Prompts: Reusable prompt templates for tool discovery, execution, and workflow planning
- Automatic External MCP Server Discovery: Connects to external MCP servers and syncs their tools
- Bidirectional Sync:
- Syncs tools FROM external MCP servers
- Syncs tools TO/FROM LiteLLM gateway
- Namespacing: Tools are namespaced by server (e.g., `server_name:tool_name`)
- Tool Execution: Execute tools directly via REST API or MCP protocol
- Two-way Integration:
- Tools can be discovered from LiteLLM's MCP registry
- Tools can be executed via LiteLLM's MCP REST API
- Automatic Sync: Syncs tools from LiteLLM MCP servers on startup
- Tool Deletion: Deactivates tools that no longer exist in LiteLLM
- Token-Efficient Tool Execution: `call_tool_summarized` automatically summarizes large tool outputs to reduce token usage by 80-90%
- Configurable Thresholds: Set `max_tokens` to control when summarization triggers (default: 2000 tokens)
- Context-Aware Summaries: Provide `summarization_context` hints to focus summaries on relevant information (e.g., "Focus on error messages")
- Graceful Fallback: Falls back to truncation if LLM summarization fails
- Transparency: Response includes `was_summarized` flag and token estimates
- Complete Kubernetes deployment with production-grade manifests
- PostgreSQL with pgvector extension for vector storage
- All configuration externalized via environment variables with validation
- Health checks (liveness, readiness with proper HTTP 503 responses), metrics, and OpenTelemetry observability support
- Modern Python 3.11+ with type hints (`X | None`, `list[str]`, `dict[str, Any]`)
- Comprehensive input validation and SQL injection protection
- Python 3.11 slim-based Docker containers
```
┌────────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────┐
│   MCP Client   │────▶│ LiteLLM Gateway  │────▶│     Toolbox     │◀────│  MCP Servers │
│ (Claude, etc.) │     │   (Port 4000)    │     │   (Port 8000)   │     │  (External)  │
└────────────────┘     └──────────────────┘     └─────────────────┘     └──────────────┘
                                                         │
                       ┌─────────────────────────────────┼────────────────────────┐
                       │                                 │                        │
                       ▼                                 ▼                        ▼
              ┌─────────────────┐              ┌─────────────────┐     ┌───────────────────┐
              │ MCP HTTP Server │              │  PostgreSQL +   │     │ Embedding Service │
              │   (Port 8080)   │              │    pgvector     │     │  (LM Studio/API)  │
              └─────────────────┘              └─────────────────┘     └───────────────────┘
```
| Component | Port | Description |
|---|---|---|
| Toolbox REST API | 8000 | FastAPI REST endpoints for tool management |
| Toolbox MCP Server | 8080 | FastMCP server for MCP Inspector/clients |
| LiteLLM | 4000 | MCP gateway with multiple MCP servers |
| PostgreSQL | 5432 | Database with pgvector for embeddings |
- LiteLLM Sync: Toolbox syncs tools from LiteLLM's MCP servers
- Tool Registration: Tools are stored with vector embeddings for semantic search
- Tool Execution: `call_tool` routes to the appropriate executor (LiteLLM, MCP, Python, HTTP)
- Semantic Search: Natural language queries find relevant tools using similarity search
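The routing step can be pictured as a dispatch on the tool's implementation type. This is a simplified sketch; the executor callables here are stand-ins for the real executors:

```python
from typing import Any, Callable

# Hypothetical executors keyed by implementation type; each stand-in just
# reports which backend would have handled the call.
EXECUTORS: dict[str, Callable[[str, dict[str, Any]], str]] = {
    "LITELLM": lambda name, args: f"litellm:{name}",
    "MCP_SERVER": lambda name, args: f"mcp:{name}",
    "PYTHON_CODE": lambda name, args: f"python:{name}",
    "HTTP": lambda name, args: f"http:{name}",
}

def call_tool(name: str, args: dict[str, Any], implementation_type: str) -> str:
    """Route a tool call to the executor registered for its implementation type."""
    try:
        executor = EXECUTORS[implementation_type]
    except KeyError:
        raise ValueError(f"unknown implementation type: {implementation_type}")
    return executor(name, args)

print(call_tool("converter-convert_temperature", {"value": 100}, "LITELLM"))
```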
- Python 3.11+
- PostgreSQL 16 with pgvector extension
- Kubernetes (Docker Desktop, minikube, or cloud provider)
- Local embedding service (e.g., LM Studio) or OpenAI API key
- Clone the repository

```bash
git clone <repository-url>
cd Toolbox
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up PostgreSQL with pgvector

```bash
docker run -d \
  --name postgres-pgvector \
  -e POSTGRES_DB=toolregistry \
  -e POSTGRES_USER=toolregistry \
  -e POSTGRES_PASSWORD=devpassword \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```

- Configure environment

```bash
export DATABASE_URL="postgresql+asyncpg://toolregistry:devpassword@localhost:5432/toolregistry"
export EMBEDDING_ENDPOINT_URL="http://localhost:1234/v1/embeddings"
export EMBEDDING_API_KEY="dummy-key"
export EMBEDDING_DIMENSION="768"
```

- Run migrations

```bash
alembic upgrade head
```

- Start the application

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

- Test the endpoints

```bash
# Health check
curl http://localhost:8000/health

# List tools
curl -X POST http://localhost:8000/mcp/list_tools \
  -H "Content-Type: application/json" -d '{}'

# Semantic search
curl -X POST http://localhost:8000/mcp/find_tool \
  -H "Content-Type: application/json" \
  -d '{"query": "temperature conversion", "limit": 5}'

# Execute a tool
curl -X POST http://localhost:8000/mcp/call_tool \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "converter-convert_temperature", "arguments": {"value": 100, "from_unit": "celsius", "to_unit": "fahrenheit"}}'
```

The embedding dimension is now driven by the application configuration (`EMBEDDING_DIMENSION`).
Before upgrading:
- Set your embedding dimension based on your embedding model:

```bash
# For OpenAI text-embedding-ada-002 (default)
export EMBEDDING_DIMENSION=1536

# For Nomic embed-text-v1.5
export EMBEDDING_DIMENSION=768

# For other models, check their documentation
```

- Run the migration:

```bash
alembic upgrade head
```
This will:
- Update the vector column dimension
- Clear existing embeddings (incompatible dimensions)
- Recreate the vector index
- Regenerate embeddings:

```bash
# Regenerate all tool embeddings
python scripts/regenerate_embeddings.py

# Or with concurrent workers for faster processing
python scripts/regenerate_embeddings.py --concurrent 10

# Or for a specific category
python scripts/regenerate_embeddings.py --category math
```
For detailed migration instructions, see the Embedding Dimension Migration Guide.
For fresh installations, this migration is handled automatically - just set `EMBEDDING_DIMENSION` before running `alembic upgrade head`.
- Build the Docker image

```bash
docker build -f Dockerfile.otel -t toolbox:latest .
```

- Deploy to Kubernetes

```bash
# Create namespace and deploy all components
kubectl apply -f k8s/namespace/
kubectl apply -f k8s/postgres/
kubectl apply -f k8s/toolbox/
```

- Verify deployment

```bash
kubectl get pods -n toolbox
kubectl get services -n toolbox
```

- Access services

```bash
# REST API
kubectl port-forward svc/toolbox 8000:8000 -n toolbox

# MCP Server (for MCP Inspector)
kubectl port-forward svc/toolbox-mcp-http 8080:8080 -n toolbox
```

```
Toolbox/
├── app/                        # Main application code
│   ├── api/                    # FastAPI endpoints
│   │   ├── admin.py            # Admin API endpoints
│   │   └── mcp.py              # MCP protocol endpoints
│   ├── models/                 # Database models
│   ├── registry/               # Tool registry and search
│   ├── services/               # MCP discovery service
│   ├── adapters/               # LiteLLM adapter
│   ├── execution/              # Tool execution engine
│   ├── observability/          # OpenTelemetry instrumentation
│   ├── main.py                 # Application entry point
│   └── mcp_fastmcp_server.py   # FastMCP server module
├── k8s/                        # Kubernetes manifests
│   ├── namespace/              # Namespace definition
│   ├── postgres/               # PostgreSQL deployment
│   └── toolbox/                # Toolbox deployments
├── helm/                       # Helm chart
├── alembic/                    # Database migrations
├── scripts/                    # Runtime scripts
├── tests/                      # Test suite
├── examples/                   # Example scripts
├── Dockerfile.otel             # Production Dockerfile
└── otel-collector-config.yaml  # OpenTelemetry config
```
```http
POST /mcp/list_tools
Content-Type: application/json

{"limit": 50}
```

```http
POST /mcp/find_tool
Content-Type: application/json

{
  "query": "calculator for basic math",
  "limit": 10,
  "threshold": 0.7
}
```

```http
POST /mcp/call_tool
Content-Type: application/json

{
  "tool_name": "converter-convert_temperature",
  "arguments": {
    "value": 100,
    "from_unit": "celsius",
    "to_unit": "fahrenheit"
  }
}
```

```http
POST /admin/mcp/sync-from-liteLLM
Content-Type: application/json
X-API-Key: dev-api-key
```

```http
POST /admin/mcp/sync
Content-Type: application/json
X-API-Key: dev-api-key
```

- `GET /health` - Application health status
- `GET /ready` - Readiness probe
- `GET /live` - Liveness probe
Access Swagger UI at http://localhost:8000/docs when running locally.
| Variable | Description | Required | Default |
|---|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | Yes | - |
| `SECRET_KEY` | Application secret key | Yes | - |
| `LOG_LEVEL` | Logging level | No | INFO |
| `CORS_ORIGINS` | Allowed CORS origins | No | * |
| Variable | Description | Required | Default |
|---|---|---|---|
| `EMBEDDING_ENDPOINT_URL` | Embedding service URL | Yes | - |
| `EMBEDDING_API_KEY` | Embedding service API key | Yes | - |
| `EMBEDDING_DIMENSION` | Embedding vector dimension | No | 768 |
| Variable | Description | Required | Default |
|---|---|---|---|
| `LITELLM_SYNC_ENABLED` | Enable LiteLLM sync | No | true |
| `LITELLM_MCP_SERVER_URL` | LiteLLM server URL | No | - |
| `LITELLM_MCP_API_KEY` | LiteLLM API key | No | - |
| Variable | Description | Required | Default |
|---|---|---|---|
| `MCP_SERVERS` | JSON array of MCP servers | No | [] |
| `MCP_AUTO_SYNC_ON_STARTUP` | Auto-sync on startup | No | true |
| `MCP_REQUEST_TIMEOUT` | Request timeout (seconds) | No | 30.0 |
| Variable | Description | Required | Default |
|---|---|---|---|
| `OTEL_ENABLED` | Enable OpenTelemetry | No | false |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint | No | - |
| `OTEL_SERVICE_NAME` | Service name | No | toolbox |
Connect MCP Inspector to the FastMCP server:
- Start the MCP HTTP deployment:

```bash
kubectl apply -f k8s/toolbox/mcp-http-deployment.yaml
```

- Port-forward:

```bash
kubectl port-forward svc/toolbox-mcp-http 8080:8080 -n toolbox
```

- Connect MCP Inspector to: `http://localhost:8080/mcp`
| Tool | Description |
|---|---|
| `find_tools` | Search for tools using natural language |
| `call_tool` | Execute a tool by name |
| `list_tools` | List all available tools |
| `get_tool_schema` | Get schema for a specific tool |
| Resource URI | Description |
|---|---|
| `toolbox://categories` | List all tool categories |
| `toolbox://stats` | Registry statistics (counts by category, implementation type) |
| `toolbox://tools/{category}` | List tools in a specific category |
| Prompt | Description |
|---|---|
| `tool_discovery_prompt` | Generate a prompt for discovering tools for a task |
| `tool_execution_prompt` | Generate a prompt for executing a specific tool |
| `workflow_planning_prompt` | Generate a prompt for planning multi-tool workflows |
- Tools not syncing from LiteLLM
  - Verify LiteLLM is running and accessible
  - Check `LITELLM_MCP_SERVER_URL` and `LITELLM_MCP_API_KEY`
  - Ensure LiteLLM has MCP servers configured
- Semantic search not working
  - Verify the embedding service is running
  - Check `EMBEDDING_ENDPOINT_URL` connectivity
  - Ensure vectors are generated in the database
- Tool execution failing
  - Check the tool's `implementation_type` (LITELLM, MCP_SERVER, PYTHON_CODE, etc.)
  - Verify LiteLLM connectivity for LITELLM type tools
  - Check logs: `kubectl logs -l app=toolbox -n toolbox`
```bash
# Check Toolbox logs
kubectl logs deployment/toolbox -n toolbox

# Check MCP HTTP server logs
kubectl logs deployment/toolbox-mcp-http -n toolbox

# Verify database connection
kubectl exec -it deployment/toolbox -n toolbox -- python3 -c \
  "from app.db.session import engine; print('DB OK')"

# Count tools in database
kubectl exec -it deployment/postgres -n toolbox -- psql -U toolregistry \
  -d toolregistry -c "SELECT COUNT(*) FROM tools;"
```

This project follows FastAPI, FastMCP, and Python best practices:
- Lifespan Context Manager: Uses `@asynccontextmanager` for startup/shutdown instead of deprecated `@app.on_event()`
- Annotated Dependencies: Uses `Annotated[Type, Depends()]` for cleaner dependency injection
- Response Models: All endpoints define response models including error cases (400, 404, 500)
- Proper HTTP Status Codes: Readiness probe returns HTTP 503 when not ready
- Modern Type Hints: Uses Python 3.11+ syntax (`str | None`, `list[str]`, `dict[str, Any]`)
- Pydantic v2: Uses `model_config = ConfigDict()` instead of nested `Config` class
- Module Exports: All modules define `__all__` for explicit public API
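The lifespan pattern mentioned above can be demonstrated with the standard library alone. FastAPI's `lifespan=` parameter accepts exactly this shape of async context manager; the event names below are illustrative:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def lifespan(app=None):
    # Startup: open connection pools, kick off sync tasks, etc.
    events.append("startup")
    try:
        yield
    finally:
        # Shutdown: close pools, flush telemetry, etc.
        events.append("shutdown")

async def main() -> None:
    # FastAPI would drive this for you: FastAPI(lifespan=lifespan)
    async with lifespan():
        events.append("serving")

asyncio.run(main())
print(events)
```

The `try`/`finally` around `yield` guarantees shutdown logic runs even if the application body raises, which is the main advantage over paired `@app.on_event("startup")`/`@app.on_event("shutdown")` handlers.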
- Input Validation: Comprehensive validation for all user inputs
- SQL Injection Protection: Parameterized queries and identifier validation
- XSS Prevention: Input sanitization for string values
- Configuration Validation: URL format, positive integers, reasonable ranges
- OpenTelemetry: Full tracing and metrics with noop fallbacks when disabled
- Structured Logging: Exception logging with `logger.exception()` for stack traces
- Health Checks: Detailed health endpoints with component-level status
- Connection Pooling: Configurable pool size, overflow, timeout, and connection recycling
- Async Operations: Full async/await support with SQLAlchemy 2.0
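Identifier validation of the sort described under the security features can be as simple as an allow-list pattern. This is a sketch, not the project's actual validator:

```python
import re

# Allow only letters, digits, and underscores, starting with a letter or underscore
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def validate_identifier(name: str) -> str:
    """Accept only safe SQL identifiers; reject anything else.

    Used alongside parameterized queries: parameters protect values,
    while identifier validation protects table/column names, which
    cannot be parameterized.
    """
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name

print(validate_identifier("tools"))
try:
    validate_identifier("tools; DROP TABLE users")
except ValueError:
    print("rejected")
```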
For detailed improvement history, see tickets.md.
Built for the MCP ecosystem with LiteLLM integration.