A production-ready Model Context Protocol (MCP) server deployed to Cloudflare Workers, providing HTTP-based semantic search with Workers AI and Vectorize.
```
Any Client ──HTTP──> Workers MCP Server ──> Workers AI + Vectorize
```
This is a fully remote MCP server - no local dependencies required. Accessible from anywhere via HTTP.
- ✅ HTTP-to-MCP Adapter: Custom implementation (the MCP SDK expects stdio; we use HTTP)
- ✅ Semantic Search: Natural language queries with vector similarity
- ✅ Edge Deployment: Runs globally on Cloudflare's network
- ✅ Workers AI Integration: `bge-small-en-v1.5` embeddings (384 dimensions)
- ✅ Vectorize Search: HNSW indexing for fast similarity search
- ✅ CORS Enabled: Works with web apps and API clients
- ✅ Production Ready: Includes error handling and proper responses
The official MCP SDK uses stdio transport (standard input/output), which works for local processes but not for serverless Workers. We built a custom HTTP adapter that implements the MCP protocol over HTTP.
- Cloudflare account with Workers enabled
- Wrangler CLI installed
- Vectorize index created and populated
1. Clone and install:
```bash
git clone https://github.com/dannwaneri/mcp-server-worker.git
cd mcp-server-worker
npm install
```

2. Create Vectorize index:

```bash
wrangler vectorize create mcp-knowledge-base --dimensions=384 --metric=cosine
```

3. Configure `wrangler.jsonc`:
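```jsonc
{
  "name": "mcp-server-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-12-02",
  "compatibility_flags": ["nodejs_compat"],
  "observability": { "enabled": true },
  "ai": { "binding": "AI" },
  "vectorize": [
    { "binding": "VECTORIZE", "index_name": "mcp-knowledge-base" }
  ]
}
```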
4. Deploy:
```bash
wrangler deploy
```

Your MCP server will be available at: `https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev`
You need to populate your Vectorize index first. Use the vectorize-mcp-worker to do this:
```bash
curl -X POST https://vectorize-mcp-worker.YOUR-SUBDOMAIN.workers.dev/populate
```
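Under the hood, that populate step reduces to embedding each document with Workers AI and upserting the vectors into the index. A minimal sketch of such a handler, with hypothetical document texts/IDs and simplified binding types (not the companion repo's actual code):

```typescript
// Sketch only: illustrative documents; Env is a simplified stand-in
// for the generated @cloudflare/workers-types bindings.
interface Env {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  VECTORIZE: {
    upsert(vectors: { id: string; values: number[]; metadata?: object }[]): Promise<unknown>;
  };
}

async function populate(env: Env): Promise<Response> {
  const docs = [
    { id: "doc-1", text: "Vectorize is Cloudflare's vector database." },
    { id: "doc-2", text: "Workers AI runs inference on Cloudflare's edge network." },
  ];

  // One Workers AI call embeds all texts (one 384-dim vector per input)
  const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: docs.map((d) => d.text),
  });

  // Upsert vectors, keeping the source text as metadata for search results
  await env.VECTORIZE.upsert(
    docs.map((d, i) => ({ id: d.id, values: data[i], metadata: { text: d.text } }))
  );

  return Response.json({ inserted: docs.length });
}
```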
Health check endpoint.

Response:

```json
{
  "status": "ok",
  "server": "mcp-server-worker",
  "version": "1.0.0"
}
```

MCP protocol endpoint (`POST /mcp`). Accepts JSON-RPC-style requests.
Request:

```json
{
  "method": "tools/list",
  "params": {}
}
```

Response:
```json
{
  "tools": [
    {
      "name": "semantic_search",
      "description": "Search the knowledge base using semantic similarity...",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "topK": { "type": "number", "default": 5 }
        },
        "required": ["query"]
      }
    }
  ]
}
```

Request:
```json
{
  "method": "tools/call",
  "params": {
    "name": "semantic_search",
    "arguments": {
      "query": "vector databases",
      "topK": 3
    }
  }
}
```

Response:
```json
{
  "content": [
    {
      "type": "text",
      "text": "{\"query\":\"vector databases\",\"resultsCount\":3,\"results\":[...]}"
    }
  ]
}
```

List tools:
```bash
curl -X POST https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp \
  -H "Content-Type: application/json" \
  -d '{"method":"tools/list","params":{}}'
```

Semantic search:
```bash
curl -X POST https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "semantic_search",
      "arguments": {"query": "AI embeddings", "topK": 5}
    }
  }'
```

JavaScript:

```javascript
const response = await fetch('https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    method: 'tools/call',
    params: {
      name: 'semantic_search',
      arguments: { query: 'vector databases', topK: 3 }
    }
  })
});
const data = await response.json();
const results = JSON.parse(data.content[0].text);
console.log(results);
```

Python:

```python
import requests
response = requests.post(
    'https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp',
    json={
        'method': 'tools/call',
        'params': {
            'name': 'semantic_search',
            'arguments': {'query': 'vector databases', 'topK': 3}
        }
    }
)
data = response.json()
print(data['content'][0]['text'])
```

The key innovation is mapping HTTP requests to the MCP protocol:

```javascript
// HTTP POST /mcp
{
  "method": "tools/list",
  "params": {}
}
// Maps to MCP ListToolsRequestSchema
// Returns tools array

// HTTP POST /mcp
{
  "method": "tools/call",
  "params": {
    "name": "semantic_search",
    "arguments": {...}
  }
}
// Maps to MCP CallToolRequestSchema
// Executes tool, returns result
```
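For illustration, here is a minimal sketch of that adapter as a Worker `fetch` handler. The binding names match the `wrangler.jsonc` in this README, but the types are simplified stand-ins for `@cloudflare/workers-types`, and this repo's actual source may differ:

```typescript
// Sketch only: simplified binding types, CORS headers omitted for brevity.
interface Env {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  VECTORIZE: {
    query(
      vector: number[],
      options: { topK: number; returnMetadata?: "all" | "indexed" | "none" }
    ): Promise<{ matches: { id: string; score: number; metadata?: Record<string, unknown> }[] }>;
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST" || new URL(request.url).pathname !== "/mcp") {
      return new Response("Not found", { status: 404 });
    }

    const { method, params } = (await request.json()) as {
      method: string;
      params?: { name?: string; arguments?: { query: string; topK?: number } };
    };

    // "tools/list" -> return the static tool catalog
    if (method === "tools/list") {
      return Response.json({ tools: [/* semantic_search definition as shown above */] });
    }

    // "tools/call" -> execute the named tool
    if (method === "tools/call" && params?.name === "semantic_search") {
      const { query, topK = 5 } = params.arguments!;

      // 1. Embed the query with Workers AI
      const embedding = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [query] });

      // 2. Run a similarity search against Vectorize
      const { matches } = await env.VECTORIZE.query(embedding.data[0], {
        topK,
        returnMetadata: "all",
      });

      // 3. Wrap the results in MCP content format
      return Response.json({
        content: [
          {
            type: "text",
            text: JSON.stringify({ query, resultsCount: matches.length, results: matches }),
          },
        ],
      });
    }

    return Response.json({ error: `Unknown method: ${method}` }, { status: 400 });
  },
};
```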
Global edge deployment provides:

- 47ms average query latency (Lagos to SF)
- 23ms from London
- 31ms from San Francisco
- 52ms from Sydney
Breakdown:
- Generate query embedding: ~18ms
- Vectorize similarity search: ~8ms
- Format and return: ~21ms
API key authentication:

```javascript
const apiKey = request.headers.get("Authorization");
if (apiKey !== env.API_KEY) {
  return new Response("Unauthorized", { status: 401 });
}
```

Store the API key as a secret:

```bash
wrangler secret put API_KEY
```

Rate limiting: use Durable Objects, or track requests in KV:

```javascript
const clientId = request.headers.get("CF-Connecting-IP");
const rateLimitKey = `ratelimit:${clientId}`;

// KV is eventually consistent, so treat this as an approximate limit
const count = await env.KV.get(rateLimitKey);
if (parseInt(count || "0") > 100) {
  return new Response("Rate limit exceeded", { status: 429 });
}
await env.KV.put(rateLimitKey, String(parseInt(count || "0") + 1), {
  expirationTtl: 3600
});
```

Monitoring: use Workers Analytics Engine:

```javascript
ctx.waitUntil(
  env.ANALYTICS.writeDataPoint({
    blobs: ["semantic_search", clientId],
    doubles: [latency, score],
    indexes: [String(Date.now())] // Analytics Engine indexes must be strings
  })
);
```

This requires an `analytics_engine_datasets` binding, which is not in the `wrangler.jsonc` shown above.

Local development:

```bash
wrangler dev
```

Access at `http://localhost:8787`.
"Not connected" errors:
- Ensure
nodejs_compatflag is inwrangler.jsonc - Check AI and Vectorize bindings are configured
- Verify index exists:
wrangler vectorize list
No search results:
- Populate the index first (see "Populating Data")
- Check that the index has vectors in the Cloudflare dashboard
Slow responses:
- Check Workers Analytics for bottlenecks
- Consider caching embeddings in KV (see the sketch below)
- Verify requests are being served from the nearest Cloudflare datacenter
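For the caching idea above, a minimal sketch that keys cached vectors by query string. It assumes a hypothetical `EMBED_CACHE` KV binding that is not part of this project's `wrangler.jsonc`:

```typescript
// Sketch only: EMBED_CACHE is a hypothetical KV namespace binding;
// types are simplified stand-ins for @cloudflare/workers-types.
interface Env {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  EMBED_CACHE: {
    get(key: string, type: "json"): Promise<unknown>;
    put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  };
}

async function getQueryEmbedding(env: Env, query: string): Promise<number[]> {
  const key = `embed:${query}`;

  // Reuse a cached vector if this exact query has been embedded before
  const cached = await env.EMBED_CACHE.get(key, "json");
  if (cached) return cached as number[];

  // Otherwise embed the query and cache the vector for 24 hours
  const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [query] });
  await env.EMBED_CACHE.put(key, JSON.stringify(data[0]), { expirationTtl: 86400 });
  return data[0];
}
```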
- Cloudflare Workers: Serverless execution
- Workers AI: `@cf/baai/bge-small-en-v1.5` (384-dim embeddings)
- Vectorize: HNSW indexing, cosine similarity
- TypeScript: Type-safe development
For 100,000 searches/month:
- Workers AI embeddings: $0.40
- Vectorize: Included in Workers plan ($5/month)
- Workers requests: Free (under 10M)
Total: ~$5.40/month
| Architecture | Accessibility | Latency | Setup Complexity |
|---|---|---|---|
| Local (stdio) | Claude Desktop only | Instant | Easy |
| Hybrid (bridge) | Claude Desktop only | ~100ms | Medium |
| Workers (HTTP) | Anywhere | 20-50ms | Medium |
This Workers approach is best for:
- Production applications
- Web/mobile apps
- Team collaboration
- API integrations
- SaaS products
- vectorize-mcp-worker - Standalone Worker for embeddings/search
- vectorize-mcp-server - Local bridge to Workers backend
Read the full tutorial: Building an MCP Server on Cloudflare Workers with Semantic Search
MIT
{ "name": "mcp-server-worker", "main": "src/index.ts", "compatibility_date": "2025-12-02", "compatibility_flags": ["nodejs_compat"], "observability": { "enabled": true }, "ai": { "binding": "AI" }, "vectorize": [ { "binding": "VECTORIZE", "index_name": "mcp-knowledge-base" } ] }