"The only secure AI is one that cannot see what it doesn't need to see."
This project is a reference implementation of a Zero Trust Data Gateway for Large Language Models (LLMs). It bridges the gap between the chaotic, non-deterministic nature of AI agents and the rigid, security-critical requirements of enterprise databases.
It is not a wrapper. It is an airgap.
Most "AI Database" tools are reckless. They give the LLM a connection string and hope for the best. This project takes the opposite approach: Paranoia by Design.
We implement a strict Middleware Layer (`src/middleware.ts`) that intercepts every single request from the LLM before it ever touches the database connection pool (a condensed sketch of the pipeline follows the feature list):
- AST Parsing (Abstract Syntax Tree): We don't use Regex. We parse the raw SQL into an AST using `pgsql-ast-parser`. If the AST reveals any mutation (`INSERT`, `UPDATE`, `DROP`, `GRANT`), the query is rejected instantly. The database never even sees it.
- Schema Allowlisting (The "Need-to-Know" Principle): The agent does not have `SELECT *` permissions.
  - Config-Driven: `config.yaml` defines exactly which tables and which specific columns are visible.
  - Row Filtering: Even if the agent tries `SELECT *`, our middleware intercepts the result set and strips out unauthorized columns (like PII, internal margins, or trap columns) before passing the JSON back to the agent.
  - Multi-Table Support: We recursively traverse JOINs in the AST to ensure every table involved is authorized.
- Dynamic Token Limits: LLMs have context windows. Dumping 10,000 rows will crash your agent or cost you a fortune.
  - We calculate the token density of the result set in real-time.
  - We enforce a dynamic `LIMIT` based on the model's remaining context window.
  - If the result is too large, we truncate it and return a "Delivery Slip" (a pointer to a temporary file) instead of the raw data.
- Audit Logging: Every single tool call, successful or failed, is logged to `audit.log` with a timestamp, actor, and justification. This is an append-only record of what the AI did.
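To make the flow concrete, here is a condensed sketch of those four gates in order: mutation rejection, column allowlisting, token budgeting, and the audit trail. It is illustrative, not the actual `src/middleware.ts`: the allowlist shape, function names, and spill path are assumptions, and the JOIN traversal is omitted for brevity.

```typescript
import { parse } from "pgsql-ast-parser";

// Hypothetical allowlist as it might be loaded from config.yaml:
// table name -> columns the agent is allowed to see.
const allowlist: Record<string, string[]> = {
  products: ["id", "name", "price"], // internal_margin is simply never listed
};

// Gate 1: reject anything that is not a single plain SELECT.
export function assertReadOnly(sql: string) {
  const statements = parse(sql);
  if (statements.length !== 1) throw new Error("One statement per request");
  const stmt = statements[0];
  if (stmt.type !== "select") {
    // INSERT, UPDATE, DROP, GRANT... all die here; the DB never sees them.
    throw new Error(`Mutation rejected: ${stmt.type}`);
  }
  return stmt;
}

// Gate 2: strip unauthorized columns from the rows, even after SELECT *.
export function filterRows(table: string, rows: Record<string, unknown>[]) {
  const visible = allowlist[table];
  if (!visible) throw new Error(`Table not allowlisted: ${table}`);
  return rows.map((row) =>
    Object.fromEntries(
      Object.entries(row).filter(([column]) => visible.includes(column)),
    )
  );
}

// Gate 3: rough token budget (~4 chars per token); spill oversized results.
export function enforceBudget(json: string, maxTokens: number): string {
  const approxTokens = Math.ceil(json.length / 4);
  if (approxTokens <= maxTokens) return json;
  const file = `./tmp/result-${Date.now()}.json`; // hypothetical spill path
  Deno.writeTextFileSync(file, json);
  // The "Delivery Slip": a pointer instead of the payload.
  return JSON.stringify({ truncated: true, approxTokens, file });
}

// Gate 4: append-only audit trail for every call, pass or fail.
export async function logAudit(actor: string, sql: string, outcome: string) {
  const entry = { ts: new Date().toISOString(), actor, sql, outcome };
  await Deno.writeTextFile("audit.log", JSON.stringify(entry) + "\n", {
    append: true,
  });
}
```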
- Runtime: Deno (chosen for its secure-by-default permission model; Node.js is too permissive for this).
- Protocol: Model Context Protocol (MCP) (the emerging standard for AI connectivity).
- Database: PostgreSQL 15 (the industry standard).
- Validation: `zod` (runtime type safety) + `pgsql-ast-parser` (SQL safety).
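As a flavor of the validation layer, here is a minimal sketch of how `zod` can guard a tool call's input at runtime. The schema shape (including the `justification` field) is an assumption for illustration, not the project's actual contract:

```typescript
import { z } from "npm:zod";

// Hypothetical input contract for the query tool; malformed agent payloads
// fail loudly here, before any SQL parsing or DB work happens.
const QueryInput = z.object({
  sql: z.string().min(1),
  justification: z.string().min(1), // feeds the audit log
});

const input = QueryInput.parse({
  sql: "SELECT id, name FROM products LIMIT 5",
  justification: "User asked for the top products",
});
```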
Let's be honest about the engineering costs of this approach:
- Latency: Parsing the AST and filtering rows in JavaScript adds overhead. This is slower than a direct DB connection. We trade milliseconds for safety.
- Complexity: Supporting `JOIN`s and aliases required implementing a custom AST traverser. It is complex. A native SQL engine would be more robust, but we needed the logic outside the DB to stay database-agnostic.
- The "BigInt" Problem: JSON doesn't support 64-bit integers, but PostgreSQL's `COUNT(*)` returns one. We had to implement a global serializer to convert all `BigInt`s to strings at the API boundary (see the sketch after this list). It's a hack, but a necessary one.
- Zero Trust is Hard: You will find yourself fighting the middleware. "Why can't I query this?" Because you didn't allowlist it. Security is friction.
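The serializer is a one-liner at heart; here is a sketch of the idea (the function name is illustrative):

```typescript
// JSON.stringify throws "Do not know how to serialize a BigInt", so every
// result set passes through a replacer that stringifies 64-bit integers.
function toSafeJson(data: unknown): string {
  return JSON.stringify(data, (_key, value) =>
    typeof value === "bigint" ? value.toString() : value
  );
}

// A COUNT(*) arrives from the driver as a BigInt; the agent sees a string.
console.log(toSafeJson({ count: 10000n })); // {"count":"10000"}
```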
We publish to the GitHub Container Registry. This is the cleanest way to run it.
```bash
# 1. Pull the image
docker pull ghcr.io/ahammednibras8/secure-mcp-db:latest

# 2. Run with the MCP Inspector (the "Browser" for Agents)
npx @modelcontextprotocol/inspector \
  docker run \
    -i \
    --rm \
    -e DB_HOST=host.docker.internal \
    -e DB_NAME=postgres \
    -e DB_USER=mcp_agent \
    -e DB_PASSWORD=agent123 \
    ghcr.io/ahammednibras8/secure-mcp-db:latest
```

If you want to hack on the middleware itself:
- Start the Database:

  ```bash
  docker-compose up -d
  ```

- Provision Data (The "Nuke & Pave"): We use `faker-js` to generate a realistic E-Commerce dataset (10k+ rows).

  ```bash
  # This connects as Admin, drops the schema, and rebuilds it.
  deno run --allow-net --allow-read --allow-env setup_db.ts
  ```

- Run the Server:

  ```bash
  deno run --allow-net --allow-read --allow-env --allow-write src/server.ts
  ```
- RLS Integration: Move some row-level filtering to PostgreSQL Row Level Security for performance.
- Vector Search: Add `pgvector` support to allow semantic search over product descriptions.
- Kubernetes Operator: Deploy this as a sidecar to your DB pods.
Ahammed Nibras
Building the system for the AI era.