LLMShield is an open-source AI gateway that acts as transparent middleware between your application and LLM providers. Think of it as a security and optimization layer — every LLM call passes through LLMShield, which automatically blocks PII leaks, caches repeated prompts, routes to the cheapest model that fits, and logs costs in real time. It was built because, as LLM-powered apps scale, teams lose visibility into costs, accidentally leak sensitive data to third-party APIs, and waste money re-running identical prompts. LLMShield solves all three with a single-line integration.
Every request goes directly from your app to the LLM provider. There is no caching, no cost tracking, no PII protection, and no visibility into what is being sent.
┌─────────────┐ ┌──────────────────┐
│ Your App │────────►│ LLM Provider │
│ │◄────────│ (Groq/OpenAI) │
└─────────────┘ └──────────────────┘
Problems:
✗ PII (emails, cards, phone numbers) sent directly to third-party APIs
✗ Identical prompts re-processed — wasted tokens and money
✗ No idea which model is optimal for a given prompt
✗ No cost visibility until the monthly bill arrives
✗ No audit trail of what was sent or blocked
LLMShield sits in between. Your app talks to LLMShield (same OpenAI-compatible API), and LLMShield handles everything else before the request ever reaches the LLM.
┌─────────────┐ ┌──────────────────────────────────┐ ┌──────────────────┐
│ Your App │────────►│ LLMShield Gateway │────────►│ LLM Provider │
│ (or SDK) │◄────────│ │◄────────│ (Groq/OpenAI) │
└─────────────┘ │ ┌───────────┐ ┌─────────────┐ │ └──────────────────┘
│ │ Guardrails│ │ Smart Cache │ │
│ │ (PII/PCI) │ │ (Redis) │ │
│ └───────────┘ └─────────────┘ │
│ ┌───────────┐ ┌─────────────┐ │
│ │ Model │ │ Cost │ │
│ │ Router │ │ Tracker │ │
│ └───────────┘ └─────────────┘ │
└──────────────────┬───────────────┘
│
▼
┌─────────────────┐
│ Live Dashboard │
│ (Socket.io) │
└─────────────────┘
Benefits:
✓ PII detected and blocked before reaching the LLM
✓ Duplicate prompts served from Redis cache at $0 cost
✓ Short prompts auto-routed to fast/cheap models
✓ Every request logged with token count, cost, and latency
✓ Real-time dashboard with live analytics via Socket.io
| Feature | Description |
|---|---|
| PII Guardrails | Regex-based detection of credit card numbers, Aadhaar numbers, email addresses, and phone numbers. Blocked requests never reach the LLM — they return a 403 immediately. |
| Smart Cache | Prompts are normalized (lowercased, whitespace-collapsed, punctuation-stripped) and hashed with SHA-256. Identical prompts return cached responses at zero cost and sub-50ms latency. TTL: 1 hour. |
| Smart Routing | Short prompts (< 100 characters) go to the fast/cheap model (llama-3.1-8b-instant). Longer prompts go to the powerful model (llama-3.3-70b-versatile). You can also specify a model explicitly per request. |
| Cost Tracking | Every request is logged with input/output token counts, calculated cost (based on Groq pricing), latency, cache hit status, and the model used. Stored in PostgreSQL. |
| Real-time Dashboard | Next.js 14 UI with live cost ticker, request timeline, cache hit analytics, cost-by-model charts, and guardrail alert feed — all updated in real time via Socket.io. |
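The guardrail check in the table above can be sketched in a few lines. The patterns below are illustrative approximations of the regexes described, not the exact ones shipped in `gateway/src/services/guardrail.js`:

```javascript
// Illustrative PII patterns -- approximations of the checks described above,
// not the exact regexes in gateway/src/services/guardrail.js.
const PII_PATTERNS = [
  { type: 'creditCard', message: 'Credit card number detected', regex: /\b(?:\d[ -]?){15}\d\b/ },
  { type: 'aadhaar',    message: 'Aadhaar number detected',     regex: /\b\d{4}[ -]?\d{4}[ -]?\d{4}\b/ },
  { type: 'email',      message: 'Email address detected',      regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { type: 'phone',      message: 'Phone number detected',       regex: /\b(?:\+91[ -]?)?[6-9]\d{9}\b/ },
];

// Returns the first matching pattern, or null if the prompt is clean.
function checkGuardrails(prompt) {
  for (const { type, message, regex } of PII_PATTERNS) {
    if (regex.test(prompt)) return { blocked: true, type, message };
  }
  return null;
}

console.log(checkGuardrails('Send me info at test@example.com'));
// → { blocked: true, type: 'email', message: 'Email address detected' }
console.log(checkGuardrails('What is caching?')); // → null
```

A blocked result like this is what the gateway turns into the 403 response shown in the API section below.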
HackVision/
├── gateway/ Fastify API — the core proxy server
│ ├── src/
│ │ ├── server.js Server entry point (Fastify + Socket.io + Redis pub/sub)
│ │ ├── config/ Environment configs (dev.js, prod.js)
│ │ ├── routes/
│ │ │ ├── chat.js POST /v1/chat/completions — main pipeline
│ │ │ └── analytics.js GET /api/analytics/* — dashboard data
│ │ ├── services/
│ │ │ ├── guardrail.js PII pattern matching (credit card, aadhaar, email, phone)
│ │ │ ├── cache.js Redis cache — normalize, hash, get/set with TTL
│ │ │ ├── router.js Smart model selection by prompt length
│ │ │ ├── proxy.js Forward requests to Groq API
│ │ │ └── cost.js Token estimation and cost calculation
│ │ └── lib/
│ │ ├── prisma.js Prisma client instance
│ │ └── redis.js Redis client instance
│ └── prisma/
│ └── schema.prisma RequestLog + GuardrailEvent models
│
├── dashboard/ Next.js 14 real-time analytics UI
│ └── src/
│ ├── app/
│ │ ├── page.js Landing page
│ │ ├── docs/page.js API documentation page
│ │ └── dashboard/page.js Live analytics dashboard
│ ├── components/ CostTicker, RequestTimeline, CacheAnalytics, etc.
│ └── lib/
│ ├── api.js REST client for gateway analytics endpoints
│ └── utils.js Helpers
│
├── sdk/ Lightweight Node.js client for the gateway
│ ├── index.js createLLMClient() — chat() and chatRaw()
│ └── package.json Package: llmshield-sdk
│
├── demo-app/ Interactive CLI chatbot that routes through the gateway
│ ├── index.js readline REPL using the SDK
│ └── package.json Package: llmshield-demo
│
├── docker-compose.yml One-command setup: Postgres + Redis + Gateway + Dashboard
├── render.yaml Render.com deployment blueprint
├── .env.example Template for environment variables
└── README.md You are here
- Node.js 20+ — download
- PostgreSQL running on `localhost:5432` — install guide
- Redis running on `localhost:6379` — install guide
- Groq API key (free) — sign up at https://console.groq.com
Tip: If you don't want to install Postgres and Redis locally, skip to the Docker setup below.
git clone <your-repo-url>
cd HackVision
# Copy the example env file into the gateway folder
cp .env.example gateway/.env

Open gateway/.env and set your Groq API key:
GROQ_API_KEY=gsk_your_actual_key_here
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/llmshield
REDIS_URL=redis://localhost:6379
PORT=3000
SMALL_MODEL=llama-3.1-8b-instant
LARGE_MODEL=llama-3.3-70b-versatile
EMBEDDINGS_ENABLED=false

cd gateway
npm install
npx prisma generate # Generate the Prisma client
npx prisma db push # Create tables in PostgreSQL
npm run dev          # Starts on http://localhost:3000 with --watch

Verify the gateway is running:
curl http://localhost:3000/health
# → { "status": "ok", "env": "dev", "ts": ... }

Open a new terminal:
cd dashboard
npm install
npm run dev          # Starts on http://localhost:3001

Open http://localhost:3001 in your browser. The landing page describes LLMShield. Navigate to /dashboard for live analytics.
Open another terminal:
cd demo-app
node index.js

This starts an interactive chatbot in your terminal. Type any prompt and it will route through the LLMShield gateway. Try sending a prompt with an email address (e.g., `Send me info at test@example.com`) to see guardrails in action.
If you have Docker and Docker Compose installed, you can spin up the entire stack (Postgres, Redis, Gateway, Dashboard) with one command:
cp .env.example .env
# Edit .env and set your GROQ_API_KEY
docker compose up --build

| Service | URL |
|---|---|
| Gateway | http://localhost:3000 |
| Dashboard | http://localhost:3001 |
| Postgres | localhost:5432 |
| Redis | localhost:6379 |
To stop everything: `docker compose down`. To also remove the database volume: `docker compose down -v`.
The gateway exposes an OpenAI-compatible endpoint. Any tool or library that speaks the OpenAI chat format works out of the box.
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "What is LLMShield?"}]
}'

You can optionally specify a model (otherwise smart routing picks one):
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"model": "llama-3.3-70b-versatile"
}'

The response includes standard OpenAI-format fields plus an `_llmshield` metadata block:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"choices": [
{
"message": { "role": "assistant", "content": "LLMShield is..." },
"finish_reason": "stop",
"index": 0
}
],
"usage": { "prompt_tokens": 12, "completion_tokens": 85, "total_tokens": 97 },
"_llmshield": {
"model": "llama-3.1-8b-instant",
"cost": 0.0000082,
"latency": 842,
"cacheHit": false
}
}

If a guardrail is triggered (e.g., PII detected), the gateway returns a 403:
{
"error": "Request blocked by guardrail",
"details": {
"blocked": true,
"type": "email",
"message": "Email address detected"
}
}

The `llmshield-sdk` package provides a lightweight wrapper around the gateway API.
The demo-app already uses the SDK via a relative import (require('../sdk')). You can do the same from any script inside this repo:
const { createLLMClient } = require('./sdk'); // adjust path as needed
const client = createLLMClient({ baseURL: 'http://localhost:3000' });
// Simple: send a prompt, get back a string
const answer = await client.chat('What is Node.js?');
console.log(answer);
// Advanced: full control over the request body
const raw = await client.chatRaw({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Explain caching in one sentence.' },
],
model: 'llama-3.3-70b-versatile',
});
console.log(raw.choices[0].message.content);
console.log(raw._llmshield); // { model, cost, latency, cacheHit }

Since the SDK is not published to npm, use npm link to make it available globally on your machine:
Step 1 — Register the SDK globally:
cd /path/to/HackVision/sdk
npm link

This creates a global symlink for llmshield-sdk.
Step 2 — Link it into your project:
cd /path/to/your-other-project
npm link llmshield-sdk

Step 3 — Use it in your code:
const { createLLMClient } = require('llmshield-sdk');
const client = createLLMClient({ baseURL: 'http://localhost:3000' });
async function main() {
try {
const answer = await client.chat('Hello from my project!');
console.log('Response:', answer);
} catch (err) {
if (err.status === 403) {
console.log('Blocked by guardrail:', err.details?.message);
} else {
console.error('Error:', err.message);
}
}
}
main();

Note: `npm link` creates a symlink, so any changes you make to `HackVision/sdk/index.js` are immediately reflected in linked projects. To unlink later, run `npm unlink llmshield-sdk` in your project.
| Method | Signature | Returns |
|---|---|---|
| `chat` | `chat(prompt: string, options?: object)` | `Promise<string>` — the assistant's reply text |
| `chatRaw` | `chatRaw(body: object)` | `Promise<object>` — full OpenAI-format response with `_llmshield` metadata |
The `options` parameter in `chat()` is spread into the request body, so you can pass `{ model: 'llama-3.3-70b-versatile' }` to override smart routing.
The demo-app is an interactive terminal chatbot that uses the SDK under the hood. It is useful for testing the gateway manually.
cd demo-app
node index.js

╔═══════════════════════════════════════╗
║ LLMShield Demo Chatbot ║
╠═══════════════════════════════════════╣
║ All requests route through the ║
║ LLMShield gateway (cache, guard, ║
║ cost tracking, smart routing). ║
║ ║
║ Type "exit" to quit. ║
╚═══════════════════════════════════════╝
You > What is caching?
Assistant (312ms) > Caching is the process of storing frequently accessed data...
You > My email is john@example.com
[BLOCKED] Email address detected
You > exit
Goodbye!
You can point it at a different gateway by setting the GATEWAY_URL environment variable:
GATEWAY_URL=https://your-deployed-gateway.com node index.js

Every request to POST /v1/chat/completions goes through this pipeline inside the gateway:
Client (SDK / cURL / any HTTP client)
│
│ POST /v1/chat/completions
│ { "messages": [{ "role": "user", "content": "..." }] }
│
▼
┌──────────────────────────────────────────────────────────────┐
│ LLMShield Gateway │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 1. GUARDRAIL CHECK │ │
│ │ Scan prompt against PII regex patterns: │ │
│ │ • Credit card numbers (16-digit patterns) │ │
│ │ • Aadhaar numbers (12-digit formatted) │ │
│ │ • Email addresses │ │
│ │ • Phone numbers (Indian format) │ │
│ │ If matched → 403 + log to DB + emit Socket.io event │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 2. CACHE LOOKUP │ │
│ │ Normalize prompt → SHA-256 hash → Redis GET │ │
│ │ If HIT → return cached response (cost $0, ~1ms) │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 3. SMART ROUTING │ │
│ │ If no model specified in request: │ │
│ │ • prompt.length < 100 → llama-3.1-8b-instant (fast) │ │
│ │ • prompt.length >= 100 → llama-3.3-70b-versatile │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 4. LLM PROXY │ │
│ │ Forward to Groq API (OpenAI-compatible endpoint) │ │
│ │ https://api.groq.com/openai/v1/chat/completions │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 5. POST-PROCESSING │ │
│ │ • Calculate cost (tokens × per-model pricing) │ │
│ │ • Store response in Redis cache (TTL 1 hour) │ │
│ │ • Log to PostgreSQL (RequestLog table) │ │
│ │ • Publish event to Redis pub/sub channel │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ │ │
└───────────────────────────┼──────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼
Response to Client Redis pub/sub channel
(with _llmshield "llmshield:events"
metadata) │
▼
Socket.io
│
▼
┌───────────────┐
│ Dashboard │
│ (Next.js) │
│ Live updates │
└───────────────┘
| Layer | Technology |
|---|---|
| Gateway server | Fastify 4 |
| Database | PostgreSQL 16 + Prisma ORM |
| Cache & pub/sub | Redis 7 + ioredis |
| Real-time | Socket.io 4 |
| Dashboard | Next.js 14 + Tailwind CSS + Recharts |
| LLM provider | Groq API (OpenAI-compatible) |
| SDK | Vanilla Node.js (fetch) |
| Containerization | Docker + Docker Compose |
RequestLog
├── id (UUID, primary key)
├── prompt (String, truncated to 500 chars)
├── response (String, truncated to 500 chars)
├── tokens (Int, total input + output)
├── cost (Float, USD)
├── latency (Int, milliseconds)
├── cacheHit (Boolean)
├── model (String)
└── createdAt (DateTime)
GuardrailEvent
├── id (UUID, primary key)
├── type (String — creditCard, aadhaar, email, phone)
├── message (String — human-readable description)
├── prompt (String, truncated to 500 chars)
└── createdAt (DateTime)
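Reconstructed from the field lists above, the models in `gateway/prisma/schema.prisma` plausibly look like this sketch (the `@default` attributes are assumptions, not the shipped file):

```prisma
// Sketch reconstructed from the documented fields; attribute details assumed.
model RequestLog {
  id        String   @id @default(uuid())
  prompt    String   // truncated to 500 chars before insert
  response  String   // truncated to 500 chars before insert
  tokens    Int      // total input + output
  cost      Float    // USD
  latency   Int      // milliseconds
  cacheHit  Boolean
  model     String
  createdAt DateTime @default(now())
}

model GuardrailEvent {
  id        String   @id @default(uuid())
  type      String   // creditCard, aadhaar, email, phone
  message   String   // human-readable description
  prompt    String   // truncated to 500 chars before insert
  createdAt DateTime @default(now())
}
```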
All configuration is via environment variables in gateway/.env:
| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | — | **Required.** Your Groq API key |
| `DATABASE_URL` | `postgresql://postgres:postgres@localhost:5432/llmshield` | PostgreSQL connection string |
| `REDIS_URL` | `redis://localhost:6379` | Redis connection string |
| `PORT` | `3000` | Port the gateway listens on |
| `SMALL_MODEL` | `llama-3.1-8b-instant` | Model used for short prompts (< 100 chars) |
| `LARGE_MODEL` | `llama-3.3-70b-versatile` | Model used for long prompts (>= 100 chars) |
| `EMBEDDINGS_ENABLED` | `false` | Future: toggle for vector-based semantic cache |
For the dashboard, set this in dashboard/.env.production (or as an environment variable):
| Variable | Default | Description |
|---|---|---|
| `NEXT_PUBLIC_GATEWAY_URL` | `http://localhost:3000` | Gateway URL for the dashboard UI |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | OpenAI-compatible chat endpoint (main pipeline) |
| GET | `/health` | Health check — returns `{ status, env, ts }` |
| GET | `/api/analytics/overview` | Aggregate stats: total requests, cost, avg latency |
| GET | `/api/analytics/requests` | Recent request logs |
| GET | `/api/analytics/cost-by-model` | Cost breakdown by model |
| GET | `/api/analytics/guardrail-events` | Recent guardrail block events |
| Route | Description |
|---|---|
| `/` | Landing page |
| `/docs` | Integration documentation |
| `/dashboard` | Live analytics (cost ticker, timeline, charts, guardrail alerts) |
MIT