
MCP SQL Server

A Model Context Protocol (MCP) server that lets AI assistants query any PostgreSQL database using natural language. Ask questions in plain English, get SQL and results back.

It learns over time — every query is logged with embeddings so similar future questions get better answers. Optionally connects to a Neo4j knowledge graph to store business term definitions and corrections.

How It Works

Query Pipeline

When you ask a question like "How many users signed up last month?", this is what happens:

  1. Context Gathering (parallel) — Searches the knowledge graph for domain terms, finds similar past queries from memory, and lists all available tables.
  2. Table Selection — A fast heuristic picks which tables are relevant based on keywords in your question, past query patterns, and knowledge graph facts; no LLM call is needed.
  3. Schema Discovery (parallel) — For each candidate table, discovers columns, types, foreign keys, sample rows, distinct values, and NULL rates from pg_stats. All cached for 5 minutes.
  4. SQL Generation — Sends the question + full schema context + domain knowledge + proven query patterns to OpenAI, which generates a PostgreSQL SELECT query.
  5. Validation & Execution — Validates that the SQL is read-only (no INSERT/UPDATE/DELETE/DROP), runs EXPLAIN, then executes. A validation sketch follows this list.
  6. Memory Logging — Logs the question, SQL, tables used, and result count with an embedding vector for future similarity search.
  7. Retry on Error — If the SQL fails (wrong column name, type mismatch, etc.), it discovers any missing tables mentioned in the error and retries up to 3 times.
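
The read-only check in step 5 is the main safety gate. Here is a minimal sketch of what such a validator might look like; the function names are illustrative, not the server's actual internals, and the cursor is any DB-API cursor (e.g. psycopg2):

import re

FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def validate_read_only(sql: str) -> None:
    # Reject anything that is not a single SELECT (or WITH ... SELECT) statement.
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("Forbidden keyword detected")

def explain_then_execute(cursor, sql: str):
    validate_read_only(sql)
    cursor.execute("EXPLAIN " + sql)  # the planner catches bad columns/tables cheaply
    cursor.execute(sql)
    return cursor.fetchall()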

Learning Memory

Every successful query is stored in mcp_query_memory with an OpenAI embedding. When a new question comes in, the server finds the most similar past queries and includes their SQL as proven patterns for the LLM. This means repeated or similar questions get faster, more accurate answers over time.
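
Conceptually, the lookup is a cosine-similarity scan over stored embeddings. A minimal sketch, assuming rows of (question, sql, embedding) have already been fetched from mcp_query_memory (the helper names are illustrative):

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_similar(query_emb, memory_rows, k=30):
    # memory_rows: iterable of (question, sql, embedding) tuples.
    # k=30 mirrors the RAG_MEMORY_TOP_K default below.
    scored = [(cosine(query_emb, emb), q, sql) for q, sql, emb in memory_rows]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]  # top matches become "proven patterns" in the LLM prompt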

Knowledge Graph (Optional)

If you connect Neo4j, you can:

  • Teach terms — Tell the system that "contractor" means the candidate table, or that "active" means status = 'Active'.
  • Correct queries — Record what was wrong about a past query so the system learns from mistakes.
  • Seed knowledge — Pre-load domain definitions via seed_knowledge.json.

The knowledge graph is optional. Without it, teach_term and correct_query return errors, but query_database and search_context still work using query memory.
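
For a feel of what a term mapping amounts to, here is a sketch of persisting one with the official neo4j Python driver. The Term label and its properties are assumptions for illustration, not the server's actual graph schema:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def teach_term(term: str, meaning: str) -> None:
    # MERGE keeps each term unique; re-teaching just updates the meaning.
    with driver.session() as session:
        session.run(
            "MERGE (t:Term {name: $term}) SET t.meaning = $meaning",
            term=term,
            meaning=meaning,
        )

teach_term("contractor", "Maps to the candidate table")
teach_term("active", "status = 'Active'")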

Tools

Tool             Description
query_database   Ask a question in natural language, get SQL + results
teach_term       Teach a business term mapping (requires Neo4j)
correct_query    Record a correction for a past query (requires Neo4j)
search_context   Look up what business terms mean in the knowledge graph and query memory

Setup

Prerequisites

  • Python 3.11+
  • PostgreSQL database (any provider — local, AWS RDS, Azure, etc.)
  • OpenAI API key (for embeddings and SQL generation)

1. Clone & Install

git clone https://github.com/JeffreyBennet/mcp-sql-server.git
cd mcp-sql-server
chmod +x install.sh
./install.sh

Or manually:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Configure Environment

Copy the example and fill in your credentials:

cp .env.example .env

Edit .env:

# Required
POSTGRES_HOST=your-host.com
POSTGRES_PORT=5432
POSTGRES_DB=your_database
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password
OPENAI_API_KEY=sk-your-openai-key

# Optional
ANTHROPIC_API_KEY=sk-ant-your-key
EMBEDDING_MODEL=text-embedding-3-small
LOG_LEVEL=INFO

3. (Optional) Restrict Tables

Edit tables.conf to list only the tables you want the AI to access. One table per line. Leave empty or delete the file to allow all tables.

users
orders
products
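
Parsing this file is straightforward. A sketch of how an allowlist check might work (illustrative, not the server's exact code):

from pathlib import Path

def load_allowlist(path: str = "tables.conf") -> set[str] | None:
    # Returns None when every table is allowed (missing or empty file).
    p = Path(path)
    if not p.exists():
        return None
    tables = {line.strip() for line in p.read_text().splitlines() if line.strip()}
    return tables or None

def is_allowed(table: str, allowlist: set[str] | None) -> bool:
    return allowlist is None or table in allowlist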

4. (Optional) Seed Knowledge

Edit seed_knowledge.json to pre-load domain definitions:

[
  ["users", "user = account = member. Table: users."],
  ["orders", "order = purchase = transaction. Table: orders. Links to users via user_id."],
  ["products", "product = item = SKU. Table: products."]
]

Each entry is a [name, description] pair. These are loaded into the Neo4j knowledge graph on first startup.
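
Reading the seed file is plain JSON parsing; a minimal sketch (the print is a stand-in for whatever the server actually writes to Neo4j):

import json

with open("seed_knowledge.json") as f:
    entries = json.load(f)

for name, description in entries:
    # Each entry is a [name, description] pair destined for the knowledge graph.
    print(f"seeding {name}: {description}")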

5. (Optional) Custom Instructions

Set MCP_INSTRUCTIONS in .env to customize what the AI sees:

MCP_INSTRUCTIONS=You are a database assistant for our e-commerce platform. Use query_database for all data questions. Key tables: users, orders, products.

Using with Claude

Claude Desktop

Add to your Claude Desktop config file:

macOS:   ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "sql-server": {
      "command": "/path/to/mcp-sql-server/venv/bin/python",
      "args": ["/path/to/mcp-sql-server/server.py"],
      "env": {
        "POSTGRES_HOST": "your-host.com",
        "POSTGRES_PORT": "5432",
        "POSTGRES_DB": "your_database",
        "POSTGRES_USER": "your_user",
        "POSTGRES_PASSWORD": "your_password",
        "OPENAI_API_KEY": "sk-your-key",
        "ANTHROPIC_API_KEY": "sk-ant-your-key"
      }
    }
  }
}

Replace /path/to/mcp-sql-server with the actual path (e.g., /Users/you/mcp-sql-server).

You can either put credentials in the env block (shown above) or rely on the .env file in the project directory.

Claude Code (CLI)

Add to your Claude Code MCP settings:

Project-level: .claude/settings.json in your project root
User-level:    ~/.claude/settings.json

{
  "mcpServers": {
    "sql-server": {
      "command": "/path/to/mcp-sql-server/venv/bin/python",
      "args": ["/path/to/mcp-sql-server/server.py"]
    }
  }
}

Or use the Claude Code CLI:

claude mcp add sql-server /path/to/mcp-sql-server/venv/bin/python /path/to/mcp-sql-server/server.py

Remote / Docker

Build and run with Docker:

docker build -t mcp-sql-server .
docker run -p 8000:8000 \
  -e POSTGRES_HOST=your-host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=your_db \
  -e POSTGRES_USER=your_user \
  -e POSTGRES_PASSWORD=your_pass \
  -e OPENAI_API_KEY=sk-your-key \
  mcp-sql-server

The SSE server exposes:

  • GET /health — health check
  • GET /sse — SSE connection for MCP clients
  • POST /messages/ — MCP message handler
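
A quick way to verify the container is up, using only the Python standard library (the exact response body is an assumption; check your deployment's actual output):

from urllib.request import urlopen

with urlopen("http://localhost:8000/health", timeout=5) as resp:
    print(resp.status, resp.read().decode())  # expect 200 if the server is healthy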

Configuration Reference

Variable                  Required  Default                 Description
POSTGRES_HOST             Yes       -                       PostgreSQL host
POSTGRES_PORT             Yes       5432                    PostgreSQL port
POSTGRES_DB               Yes       -                       Database name
POSTGRES_USER             Yes       -                       Database user
POSTGRES_PASSWORD         Yes       -                       Database password
OPENAI_API_KEY            Yes       -                       OpenAI API key (embeddings + SQL gen)
ANTHROPIC_API_KEY         No        -                       Anthropic API key
EMBEDDING_MODEL           No        text-embedding-3-small  OpenAI embedding model
CLAUDE_MODEL              No        claude-opus-4-6         Claude model name
MCP_INSTRUCTIONS          No        (built-in)              Custom instructions shown to the AI
DEFAULT_TABLES            No        -                       Comma-separated fallback tables
NEO4J_URI                 No        bolt://localhost:7687   Neo4j connection URI
NEO4J_USER                No        neo4j                   Neo4j username
NEO4J_PASSWORD            No        compass2025             Neo4j password
GRAPHITI_GROUP_ID         No        compass                 Knowledge graph group ID
RAG_MEMORY_TOP_K          No        30                      Max similar queries to retrieve
EMBEDDING_BACKFILL_LIMIT  No        200                     Rows to backfill on startup
LOG_LEVEL                 No        INFO                    Logging level

Files

mcp-sql-server/
  server.py            # Main MCP server (stdio transport)
  server_sse.py        # HTTP/SSE transport for remote deployment
  requirements.txt     # Python dependencies
  .env.example         # Environment template
  tables.conf          # Table allowlist (empty = allow all)
  seed_knowledge.json  # Pre-loaded knowledge graph entries
  install.sh           # One-click setup script
  Dockerfile           # Container deployment

Security

  • Only SELECT queries are allowed. INSERT, UPDATE, DELETE, DROP, and other DDL/DML are blocked.
  • SQL is validated before execution — forbidden keywords are rejected, and EXPLAIN is run first.
  • Use tables.conf to restrict which tables the AI can access.
  • Database credentials are stored in .env (gitignored) or passed via environment variables.
