A production-ready Model Context Protocol (MCP) server deployed to Cloudflare Workers, providing HTTP-based semantic search with Workers AI and Vectorize.
```
Any Client ──HTTP──> Workers MCP Server ──> Workers AI + Vectorize
```
This is a fully remote MCP server - no local dependencies required. Accessible from anywhere via HTTP.
- ✅ HTTP-to-MCP Adapter: Custom implementation (the MCP SDK expects stdio; we use HTTP)
- ✅ Semantic Search: Natural language queries with vector similarity
- ✅ Edge Deployment: Runs globally on Cloudflare's network
- ✅ Workers AI Integration: `bge-small-en-v1.5` embeddings (384 dimensions)
- ✅ Vectorize Search: HNSW indexing for fast similarity search
- ✅ CORS Enabled: Works with web apps and API clients
- ✅ Production Ready: Includes error handling and proper responses
The official MCP SDK uses stdio transport (standard input/output), which works for local processes but not for serverless Workers. We built a custom HTTP adapter that implements the MCP protocol over HTTP.
- Cloudflare account with Workers enabled
- Wrangler CLI installed
- Vectorize index created and populated
1. Clone and install:
```bash
git clone https://github.com/dannwaneri/mcp-server-worker.git
cd mcp-server-worker
npm install
```

2. Create Vectorize index:

```bash
wrangler vectorize create mcp-knowledge-base --dimensions=384 --metric=cosine
```

3. Configure `wrangler.jsonc`:
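```jsonc
{
  "name": "mcp-server-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-12-02",
  "compatibility_flags": ["nodejs_compat"],
  "observability": { "enabled": true },
  "ai": { "binding": "AI" },
  "vectorize": [
    { "binding": "VECTORIZE", "index_name": "mcp-knowledge-base" }
  ]
}
```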
4. Deploy:
```bash
wrangler deploy
```

Your MCP server will be available at: `https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev`
You need to populate your Vectorize index first. Use the vectorize-mcp-worker to do this:
```bash
curl -X POST https://vectorize-mcp-worker.YOUR-SUBDOMAIN.workers.dev/populate
```
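Under the hood, that populate step reduces to embedding each document with Workers AI and upserting the vectors into the index. A minimal sketch of such a handler, with hypothetical document texts/IDs and simplified binding types (not the companion repo's actual code):

```typescript
// Sketch only: illustrative documents; Env is a simplified stand-in
// for the generated @cloudflare/workers-types bindings.
interface Env {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  VECTORIZE: {
    upsert(vectors: { id: string; values: number[]; metadata?: object }[]): Promise<unknown>;
  };
}

async function populate(env: Env): Promise<Response> {
  const docs = [
    { id: "doc-1", text: "Vectorize is Cloudflare's vector database." },
    { id: "doc-2", text: "Workers AI runs inference on Cloudflare's edge network." },
  ];

  // One Workers AI call embeds all texts (one 384-dim vector per input)
  const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: docs.map((d) => d.text),
  });

  // Upsert vectors, keeping the source text as metadata for search results
  await env.VECTORIZE.upsert(
    docs.map((d, i) => ({ id: d.id, values: data[i], metadata: { text: d.text } }))
  );

  return Response.json({ inserted: docs.length });
}
```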
Health check endpoint.

Response:

```json
{
  "status": "ok",
  "server": "mcp-server-worker",
  "version": "1.0.0"
}
```

MCP protocol endpoint (`POST /mcp`). Accepts JSON-RPC-style requests.
Request:

```json
{
  "method": "tools/list",
  "params": {}
}
```

Response:
```json
{
  "tools": [
    {
      "name": "semantic_search",
      "description": "Search the knowledge base using semantic similarity...",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "topK": { "type": "number", "default": 5 }
        },
        "required": ["query"]
      }
    }
  ]
}
```

Request:
```json
{
  "method": "tools/call",
  "params": {
    "name": "semantic_search",
    "arguments": {
      "query": "vector databases",
      "topK": 3
    }
  }
}
```

Response:
```json
{
  "content": [
    {
      "type": "text",
      "text": "{\"query\":\"vector databases\",\"resultsCount\":3,\"results\":[...]}"
    }
  ]
}
```

List tools:
```bash
curl -X POST https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp \
  -H "Content-Type: application/json" \
  -d '{"method":"tools/list","params":{}}'
```

Semantic search:
```bash
curl -X POST https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "semantic_search",
      "arguments": {"query": "AI embeddings", "topK": 5}
    }
  }'
```

JavaScript:

```javascript
const response = await fetch('https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    method: 'tools/call',
    params: {
      name: 'semantic_search',
      arguments: { query: 'vector databases', topK: 3 }
    }
  })
});
const data = await response.json();
const results = JSON.parse(data.content[0].text);
console.log(results);
```

Python:

```python
import requests
response = requests.post(
    'https://mcp-server-worker.YOUR-SUBDOMAIN.workers.dev/mcp',
    json={
        'method': 'tools/call',
        'params': {
            'name': 'semantic_search',
            'arguments': {'query': 'vector databases', 'topK': 3}
        }
    }
)
data = response.json()
print(data['content'][0]['text'])
```

The key innovation is mapping HTTP requests to the MCP protocol:

```javascript
// HTTP POST /mcp
{
  "method": "tools/list",
  "params": {}
}
// Maps to MCP ListToolsRequestSchema
// Returns tools array

// HTTP POST /mcp
{
  "method": "tools/call",
  "params": {
    "name": "semantic_search",
    "arguments": {...}
  }
}
// Maps to MCP CallToolRequestSchema
// Executes tool, returns result
```
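For illustration, here is a minimal sketch of that adapter as a Worker `fetch` handler. The binding names match the `wrangler.jsonc` in this README, but the types are simplified stand-ins for `@cloudflare/workers-types`, and this repo's actual source may differ:

```typescript
// Sketch only: simplified binding types, CORS headers omitted for brevity.
interface Env {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  VECTORIZE: {
    query(
      vector: number[],
      options: { topK: number; returnMetadata?: "all" | "indexed" | "none" }
    ): Promise<{ matches: { id: string; score: number; metadata?: Record<string, unknown> }[] }>;
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST" || new URL(request.url).pathname !== "/mcp") {
      return new Response("Not found", { status: 404 });
    }

    const { method, params } = (await request.json()) as {
      method: string;
      params?: { name?: string; arguments?: { query: string; topK?: number } };
    };

    // "tools/list" -> return the static tool catalog
    if (method === "tools/list") {
      return Response.json({ tools: [/* semantic_search definition as shown above */] });
    }

    // "tools/call" -> execute the named tool
    if (method === "tools/call" && params?.name === "semantic_search") {
      const { query, topK = 5 } = params.arguments!;

      // 1. Embed the query with Workers AI
      const embedding = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [query] });

      // 2. Run a similarity search against Vectorize
      const { matches } = await env.VECTORIZE.query(embedding.data[0], {
        topK,
        returnMetadata: "all",
      });

      // 3. Wrap the results in MCP content format
      return Response.json({
        content: [
          {
            type: "text",
            text: JSON.stringify({ query, resultsCount: matches.length, results: matches }),
          },
        ],
      });
    }

    return Response.json({ error: `Unknown method: ${method}` }, { status: 400 });
  },
};
```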
Global edge deployment provides:

- 47ms average query latency (Lagos to SF)
- 23ms from London
- 31ms from San Francisco
- 52ms from Sydney
Breakdown:
- Generate query embedding: ~18ms
- Vectorize similarity search: ~8ms
- Format and return: ~21ms
API key authentication:

```javascript
const apiKey = request.headers.get("Authorization");
if (apiKey !== env.API_KEY) {
  return new Response("Unauthorized", { status: 401 });
}
```

Store the API key as a secret:

```bash
wrangler secret put API_KEY
```

Rate limiting: use Durable Objects, or track requests in KV:

```javascript
const clientId = request.headers.get("CF-Connecting-IP");
const rateLimitKey = `ratelimit:${clientId}`;

// KV is eventually consistent, so treat this as an approximate limit
const count = await env.KV.get(rateLimitKey);
if (parseInt(count || "0") > 100) {
  return new Response("Rate limit exceeded", { status: 429 });
}
await env.KV.put(rateLimitKey, String(parseInt(count || "0") + 1), {
  expirationTtl: 3600
});
```

Monitoring: use Workers Analytics Engine:

```javascript
ctx.waitUntil(
  env.ANALYTICS.writeDataPoint({
    blobs: ["semantic_search", clientId],
    doubles: [latency, score],
    indexes: [String(Date.now())] // Analytics Engine indexes must be strings
  })
);
```

This requires an `analytics_engine_datasets` binding, which is not in the `wrangler.jsonc` shown above.

Local development:

```bash
wrangler dev
```

Access at `http://localhost:8787`.
"Not connected" errors:
- Ensure
nodejs_compatflag is inwrangler.jsonc - Check AI and Vectorize bindings are configured
- Verify index exists:
wrangler vectorize list
No search results:
- Populate the index first (see "Populating Data")
- Check that the index has vectors in the Cloudflare dashboard
Slow responses:
- Check Workers Analytics for bottlenecks
- Consider caching embeddings in KV (see the sketch below)
- Verify requests are being served from the nearest Cloudflare datacenter
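For the caching idea above, a minimal sketch that keys cached vectors by query string. It assumes a hypothetical `EMBED_CACHE` KV binding that is not part of this project's `wrangler.jsonc`:

```typescript
// Sketch only: EMBED_CACHE is a hypothetical KV namespace binding;
// types are simplified stand-ins for @cloudflare/workers-types.
interface Env {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  EMBED_CACHE: {
    get(key: string, type: "json"): Promise<unknown>;
    put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  };
}

async function getQueryEmbedding(env: Env, query: string): Promise<number[]> {
  const key = `embed:${query}`;

  // Reuse a cached vector if this exact query has been embedded before
  const cached = await env.EMBED_CACHE.get(key, "json");
  if (cached) return cached as number[];

  // Otherwise embed the query and cache the vector for 24 hours
  const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [query] });
  await env.EMBED_CACHE.put(key, JSON.stringify(data[0]), { expirationTtl: 86400 });
  return data[0];
}
```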
- Cloudflare Workers: Serverless execution
- Workers AI: `@cf/baai/bge-small-en-v1.5` (384-dim embeddings)
- Vectorize: HNSW indexing, cosine similarity
- TypeScript: Type-safe development
For 100,000 searches/month:
- Workers AI embeddings: $0.40
- Vectorize: Included in Workers plan ($5/month)
- Workers requests: Free (under 10M)
Total: ~$5.40/month
| Architecture | Accessibility | Latency | Setup Complexity |
|---|---|---|---|
| Local (stdio) | Claude Desktop only | Instant | Easy |
| Hybrid (bridge) | Claude Desktop only | ~100ms | Medium |
| Workers (HTTP) | Anywhere | 20-50ms | Medium |
This Workers approach is best for:
- Production applications
- Web/mobile apps
- Team collaboration
- API integrations
- SaaS products
- vectorize-mcp-worker - Standalone Worker for embeddings/search
- vectorize-mcp-server - Local bridge to Workers backend
Read the full tutorial: Building an MCP Server on Cloudflare Workers with Semantic Search
MIT
{ "name": "mcp-server-worker", "main": "src/index.ts", "compatibility_date": "2025-12-02", "compatibility_flags": ["nodejs_compat"], "observability": { "enabled": true }, "ai": { "binding": "AI" }, "vectorize": [ { "binding": "VECTORIZE", "index_name": "mcp-knowledge-base" } ] }