
LLMShield — Cloudflare for AI APIs

LLMShield is an open-source AI gateway that acts as transparent middleware between your application and LLM providers. Think of it as a security and optimization layer: every LLM call passes through LLMShield, which automatically blocks PII leaks, caches repeated prompts, routes to the cheapest model that fits, and logs costs in real time. It was built because, as LLM-powered apps scale, teams lose visibility into costs, accidentally leak sensitive data to third-party APIs, and waste money re-running identical prompts. LLMShield solves all three with a single-line integration.
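
Because the gateway speaks the OpenAI chat format, the "single line" is usually just a base-URL change. A minimal sketch using the official openai Node client (requires npm install openai; the apiKey value is a placeholder, since the gateway itself does not check it):

// Point an existing OpenAI client at the LLMShield gateway instead of api.openai.com.
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: 'http://localhost:3000/v1',  // the one-line change
  apiKey: 'unused-placeholder',         // not validated by the gateway
});

async function main() {
  const completion = await client.chat.completions.create({
    // model omitted on purpose: LLMShield's smart router picks one
    messages: [{ role: 'user', content: 'Hello through the gateway!' }],
  });
  console.log(completion.choices[0].message.content);
}

main();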


How LLM Calls Work — Before vs After LLMShield

Without LLMShield

Every request goes directly from your app to the LLM provider. There is no caching, no cost tracking, no PII protection, and no visibility into what is being sent.

┌─────────────┐         ┌──────────────────┐
│   Your App  │────────►│  LLM Provider    │
│             │◄────────│  (Groq/OpenAI)   │
└─────────────┘         └──────────────────┘

  Problems:
  ✗ PII (emails, cards, phone numbers) sent directly to third-party APIs
  ✗ Identical prompts re-processed — wasted tokens and money
  ✗ No idea which model is optimal for a given prompt
  ✗ No cost visibility until the monthly bill arrives
  ✗ No audit trail of what was sent or blocked

With LLMShield

LLMShield sits in between. Your app talks to LLMShield (same OpenAI-compatible API), and LLMShield handles everything else before the request ever reaches the LLM.

┌─────────────┐         ┌──────────────────────────────────┐         ┌──────────────────┐
│   Your App  │────────►│         LLMShield Gateway        │────────►│  LLM Provider    │
│  (or SDK)   │◄────────│                                  │◄────────│  (Groq/OpenAI)   │
└─────────────┘         │  ┌───────────┐  ┌─────────────┐  │         └──────────────────┘
                        │  │ Guardrails│  │ Smart Cache │  │
                        │  │ (PII/PCI) │  │   (Redis)   │  │
                        │  └───────────┘  └─────────────┘  │
                        │  ┌───────────┐  ┌─────────────┐  │
                        │  │  Model    │  │    Cost     │  │
                        │  │  Router   │  │   Tracker   │  │
                        │  └───────────┘  └─────────────┘  │
                        └──────────────────┬───────────────┘
                                           │
                                           ▼
                                  ┌─────────────────┐
                                  │  Live Dashboard │
                                  │   (Socket.io)   │
                                  └─────────────────┘

  Benefits:
  ✓ PII detected and blocked before reaching the LLM
  ✓ Duplicate prompts served from Redis cache at $0 cost
  ✓ Short prompts auto-routed to fast/cheap models
  ✓ Every request logged with token count, cost, and latency
  ✓ Real-time dashboard with live analytics via Socket.io

Features

| Feature | Description |
| --- | --- |
| PII Guardrails | Regex-based detection of credit card numbers, Aadhaar numbers, email addresses, and phone numbers. Blocked requests never reach the LLM — they return a 403 immediately. |
| Smart Cache | Prompts are normalized (lowercased, whitespace-collapsed, punctuation-stripped) and hashed with SHA-256. Identical prompts return cached responses at zero cost and sub-50ms latency. TTL: 1 hour. |
| Smart Routing | Short prompts (< 100 characters) go to the fast/cheap model (llama-3.1-8b-instant). Longer prompts go to the powerful model (llama-3.3-70b-versatile). You can also specify a model explicitly per request. |
| Cost Tracking | Every request is logged with input/output token counts, calculated cost (based on Groq pricing), latency, cache hit status, and the model used. Stored in PostgreSQL. |
| Real-time Dashboard | Next.js 14 UI with live cost ticker, request timeline, cache hit analytics, cost-by-model charts, and guardrail alert feed — all updated in real time via Socket.io. |
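
The Smart Cache key derivation is simple enough to sketch. The following is illustrative only (the function name is mine, not an actual export of cache.js):

// Sketch: normalize a prompt, then hash it with SHA-256 to form the cache key.
const crypto = require('crypto');

function cacheKey(prompt) {
  const normalized = prompt
    .toLowerCase()
    .replace(/[^\w\s]/g, '')  // strip punctuation
    .replace(/\s+/g, ' ')     // collapse whitespace
    .trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// "What is caching?" and "what  is CACHING" produce the same key,
// so the second request is served from Redis at $0.
console.log(cacheKey('What is caching?') === cacheKey('what  is CACHING')); // true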

Project Structure

HackVision/
├── gateway/              Fastify API — the core proxy server
│   ├── src/
│   │   ├── server.js         Server entry point (Fastify + Socket.io + Redis pub/sub)
│   │   ├── config/           Environment configs (dev.js, prod.js)
│   │   ├── routes/
│   │   │   ├── chat.js       POST /v1/chat/completions — main pipeline
│   │   │   └── analytics.js  GET /api/analytics/* — dashboard data
│   │   ├── services/
│   │   │   ├── guardrail.js  PII pattern matching (credit card, aadhaar, email, phone)
│   │   │   ├── cache.js      Redis cache — normalize, hash, get/set with TTL
│   │   │   ├── router.js     Smart model selection by prompt length
│   │   │   ├── proxy.js      Forward requests to Groq API
│   │   │   └── cost.js       Token estimation and cost calculation
│   │   └── lib/
│   │       ├── prisma.js     Prisma client instance
│   │       └── redis.js      Redis client instance
│   └── prisma/
│       └── schema.prisma     RequestLog + GuardrailEvent models
│
├── dashboard/            Next.js 14 real-time analytics UI
│   └── src/
│       ├── app/
│       │   ├── page.js           Landing page
│       │   ├── docs/page.js      API documentation page
│       │   └── dashboard/page.js Live analytics dashboard
│       ├── components/           CostTicker, RequestTimeline, CacheAnalytics, etc.
│       └── lib/
│           ├── api.js            REST client for gateway analytics endpoints
│           └── utils.js          Helpers
│
├── sdk/                  Lightweight Node.js client for the gateway
│   ├── index.js              createLLMClient() — chat() and chatRaw()
│   └── package.json          Package: llmshield-sdk
│
├── demo-app/             Interactive CLI chatbot that routes through the gateway
│   ├── index.js              readline REPL using the SDK
│   └── package.json          Package: llmshield-demo
│
├── docker-compose.yml    One-command setup: Postgres + Redis + Gateway + Dashboard
├── render.yaml           Render.com deployment blueprint
├── .env.example          Template for environment variables
└── README.md             You are here

Quick Start (Local)

Prerequisites

- Node.js 18 or newer (the gateway's dev script relies on node --watch)
- PostgreSQL and Redis running locally
- A Groq API key

Tip: If you don't want to install Postgres and Redis locally, skip to the Docker setup below.

1. Clone and Configure Environment

git clone <your-repo-url>
cd HackVision

# Copy the example env file into the gateway folder
cp .env.example gateway/.env

Open gateway/.env and set your Groq API key:

GROQ_API_KEY=gsk_your_actual_key_here
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/llmshield
REDIS_URL=redis://localhost:6379
PORT=3000
SMALL_MODEL=llama-3.1-8b-instant
LARGE_MODEL=llama-3.3-70b-versatile
EMBEDDINGS_ENABLED=false

2. Start the Gateway

cd gateway
npm install
npx prisma generate    # Generate the Prisma client
npx prisma db push     # Create tables in PostgreSQL
npm run dev            # Starts on http://localhost:3000 with --watch

Verify the gateway is running:

curl http://localhost:3000/health
# → { "status": "ok", "env": "dev", "ts": ... }

3. Start the Dashboard

Open a new terminal:

cd dashboard
npm install
npm run dev            # Starts on http://localhost:3001

Open http://localhost:3001 in your browser. The landing page describes LLMShield. Navigate to /dashboard for live analytics.

4. Try the Demo CLI Chatbot

Open another terminal:

cd demo-app
node index.js

This starts an interactive chatbot in your terminal. Type any prompt and it will be routed through the LLMShield gateway. Try sending a prompt containing an email address (e.g., "Send me info at test@example.com") to see the guardrails in action.


Quick Start (Docker)

If you have Docker and Docker Compose installed, you can spin up the entire stack (Postgres, Redis, Gateway, Dashboard) with one command:

cp .env.example .env
# Edit .env and set your GROQ_API_KEY
docker compose up --build

| Service | URL |
| --- | --- |
| Gateway | http://localhost:3000 |
| Dashboard | http://localhost:3001 |
| Postgres | localhost:5432 |
| Redis | localhost:6379 |

To stop everything: docker compose down. To also remove the database volume: docker compose down -v.


Usage

Option 1: Direct API (cURL / any HTTP client)

The gateway exposes an OpenAI-compatible endpoint. Any tool or library that speaks the OpenAI chat format works out of the box.

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is LLMShield?"}]
  }'

You can optionally specify a model (otherwise smart routing picks one):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "model": "llama-3.3-70b-versatile"
  }'

The response includes standard OpenAI-format fields plus an _llmshield metadata block:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "message": { "role": "assistant", "content": "LLMShield is..." },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 85, "total_tokens": 97 },
  "_llmshield": {
    "model": "llama-3.1-8b-instant",
    "cost": 0.0000082,
    "latency": 842,
    "cacheHit": false
  }
}

If a guardrail is triggered (e.g., PII detected), the gateway returns a 403:

{
  "error": "Request blocked by guardrail",
  "details": {
    "blocked": true,
    "type": "email",
    "message": "Email address detected"
  }
}
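
You can trigger this locally by including something that looks like PII in the prompt:

curl -i -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "My email is john@example.com"}]
  }'
# → HTTP 403 with the guardrail error body shown above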

Option 2: Using the SDK (Node.js)

The llmshield-sdk package provides a lightweight wrapper around the gateway API.

Within this monorepo

The demo-app already uses the SDK via a relative import (require('../sdk')). You can do the same from any script inside this repo:

const { createLLMClient } = require('./sdk');  // adjust path as needed

const client = createLLMClient({ baseURL: 'http://localhost:3000' });

// Simple: send a prompt, get back a string
const answer = await client.chat('What is Node.js?');
console.log(answer);

// Advanced: full control over the request body
const raw = await client.chatRaw({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain caching in one sentence.' },
  ],
  model: 'llama-3.3-70b-versatile',
});
console.log(raw.choices[0].message.content);
console.log(raw._llmshield); // { model, cost, latency, cacheHit }

In an external project (using npm link)

Since the SDK is not published to npm, use npm link to make it available globally on your machine:

Step 1 — Register the SDK globally:

cd /path/to/HackVision/sdk
npm link

This creates a global symlink for llmshield-sdk.

Step 2 — Link it into your project:

cd /path/to/your-other-project
npm link llmshield-sdk

Step 3 — Use it in your code:

const { createLLMClient } = require('llmshield-sdk');

const client = createLLMClient({ baseURL: 'http://localhost:3000' });

async function main() {
  try {
    const answer = await client.chat('Hello from my project!');
    console.log('Response:', answer);
  } catch (err) {
    if (err.status === 403) {
      console.log('Blocked by guardrail:', err.details?.message);
    } else {
      console.error('Error:', err.message);
    }
  }
}

main();

Note: npm link creates a symlink, so any changes you make to HackVision/sdk/index.js are immediately reflected in linked projects. To unlink later, run npm unlink llmshield-sdk in your project.

SDK API Reference

| Method | Signature | Returns |
| --- | --- | --- |
| chat | chat(prompt: string, options?: object) | Promise<string> — the assistant's reply text |
| chatRaw | chatRaw(body: object) | Promise<object> — full OpenAI-format response with _llmshield metadata |

The options parameter in chat() is spread into the request body, so you can pass { model: 'llama-3.3-70b-versatile' } to override smart routing.
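
For example:

// Force the large model for one call; everything else stays on smart routing.
const reply = await client.chat('Summarize this repository in one sentence.', {
  model: 'llama-3.3-70b-versatile',
});
console.log(reply);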


Option 3: Using the CLI Demo

The demo-app is an interactive terminal chatbot that uses the SDK under the hood. It is useful for testing the gateway manually.

cd demo-app
node index.js
  ╔═══════════════════════════════════════╗
  ║        LLMShield Demo Chatbot         ║
  ╠═══════════════════════════════════════╣
  ║  All requests route through the       ║
  ║  LLMShield gateway (cache, guard,     ║
  ║  cost tracking, smart routing).       ║
  ║                                       ║
  ║  Type "exit" to quit.                 ║
  ╚═══════════════════════════════════════╝

You > What is caching?

Assistant (312ms) > Caching is the process of storing frequently accessed data...

You > My email is john@example.com

[BLOCKED] Email address detected

You > exit

Goodbye!

You can point it at a different gateway by setting the GATEWAY_URL environment variable:

GATEWAY_URL=https://your-deployed-gateway.com node index.js

Architecture

Request Pipeline (detailed)

Every request to POST /v1/chat/completions goes through this pipeline inside the gateway:

Client (SDK / cURL / any HTTP client)
  │
  │  POST /v1/chat/completions
  │  { "messages": [{ "role": "user", "content": "..." }] }
  │
  ▼
┌──────────────────────────────────────────────────────────────┐
│                     LLMShield Gateway                        │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 1. GUARDRAIL CHECK                                      │ │
│  │    Scan prompt against PII regex patterns:              │ │
│  │    • Credit card numbers (16-digit patterns)            │ │
│  │    • Aadhaar numbers (12-digit formatted)               │ │
│  │    • Email addresses                                    │ │
│  │    • Phone numbers (Indian format)                      │ │
│  │    If matched → 403 + log to DB + emit Socket.io event  │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 2. CACHE LOOKUP                                         │ │
│  │    Normalize prompt → SHA-256 hash → Redis GET          │ │
│  │    If HIT → return cached response (cost $0, ~1ms)      │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 3. SMART ROUTING                                        │ │
│  │    If no model specified in request:                    │ │
│  │    • prompt.length < 100 → llama-3.1-8b-instant (fast)  │ │
│  │    • prompt.length >= 100 → llama-3.3-70b-versatile     │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 4. LLM PROXY                                            │ │
│  │    Forward to Groq API (OpenAI-compatible endpoint)     │ │
│  │    https://api.groq.com/openai/v1/chat/completions      │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 5. POST-PROCESSING                                      │ │
│  │    • Calculate cost (tokens × per-model pricing)        │ │
│  │    • Store response in Redis cache (TTL 1 hour)         │ │
│  │    • Log to PostgreSQL (RequestLog table)               │ │
│  │    • Publish event to Redis pub/sub channel             │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           │                                  │
└───────────────────────────┼──────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼                           ▼
     Response to Client          Redis pub/sub channel
     (with _llmshield              "llmshield:events"
      metadata)                         │
                                        ▼
                                   Socket.io
                                        │
                                        ▼
                                 ┌────────────────┐
                                 │   Dashboard    │
                                 │   (Next.js)    │
                                 │  Live updates  │
                                 └────────────────┘
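
The routing rule in step 3 is small enough to restate in code. A sketch (the real router.js may differ in details, e.g. how the prompt length is measured):

// Sketch of length-based model selection, mirroring the SMALL_MODEL /
// LARGE_MODEL environment variables from the Configuration section.
const SMALL_MODEL = process.env.SMALL_MODEL || 'llama-3.1-8b-instant';
const LARGE_MODEL = process.env.LARGE_MODEL || 'llama-3.3-70b-versatile';

function pickModel(body) {
  if (body.model) return body.model;  // an explicit model always wins
  // Illustrative: treat the concatenated message contents as "the prompt".
  const prompt = body.messages.map((m) => m.content).join(' ');
  return prompt.length < 100 ? SMALL_MODEL : LARGE_MODEL;
}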

Tech Stack

| Layer | Technology |
| --- | --- |
| Gateway server | Fastify 4 |
| Database | PostgreSQL 16 + Prisma ORM |
| Cache & pub/sub | Redis 7 + ioredis |
| Real-time | Socket.io 4 |
| Dashboard | Next.js 14 + Tailwind CSS + Recharts |
| LLM provider | Groq API (OpenAI-compatible) |
| SDK | Vanilla Node.js (fetch) |
| Containerization | Docker + Docker Compose |

Database Schema

RequestLog
├── id         (UUID, primary key)
├── prompt     (String, truncated to 500 chars)
├── response   (String, truncated to 500 chars)
├── tokens     (Int, total input + output)
├── cost       (Float, USD)
├── latency    (Int, milliseconds)
├── cacheHit   (Boolean)
├── model      (String)
└── createdAt  (DateTime)

GuardrailEvent
├── id         (UUID, primary key)
├── type       (String — creditCard, aadhaar, email, phone)
├── message    (String — human-readable description)
├── prompt     (String, truncated to 500 chars)
└── createdAt  (DateTime)
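
Expressed as a Prisma schema, the two models look roughly like this (a sketch; gateway/prisma/schema.prisma is the authoritative version):

model RequestLog {
  id        String   @id @default(uuid())
  prompt    String
  response  String
  tokens    Int
  cost      Float
  latency   Int
  cacheHit  Boolean
  model     String
  createdAt DateTime @default(now())
}

model GuardrailEvent {
  id        String   @id @default(uuid())
  type      String
  message   String
  prompt    String
  createdAt DateTime @default(now())
}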

Configuration

All configuration is via environment variables in gateway/.env:

| Variable | Default | Description |
| --- | --- | --- |
| GROQ_API_KEY | (none) | Required. Your Groq API key |
| DATABASE_URL | postgresql://postgres:postgres@localhost:5432/llmshield | PostgreSQL connection string |
| REDIS_URL | redis://localhost:6379 | Redis connection string |
| PORT | 3000 | Port the gateway listens on |
| SMALL_MODEL | llama-3.1-8b-instant | Model used for short prompts (< 100 chars) |
| LARGE_MODEL | llama-3.3-70b-versatile | Model used for long prompts (>= 100 chars) |
| EMBEDDINGS_ENABLED | false | Future: toggle for vector-based semantic cache |

For the dashboard, set this in dashboard/.env.production (or as an environment variable):

| Variable | Default | Description |
| --- | --- | --- |
| NEXT_PUBLIC_GATEWAY_URL | http://localhost:3000 | Gateway URL for the dashboard UI |

API Endpoints

Gateway

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI-compatible chat endpoint (main pipeline) |
| GET | /health | Health check — returns { status, env, ts } |
| GET | /api/analytics/overview | Aggregate stats: total requests, cost, avg latency |
| GET | /api/analytics/requests | Recent request logs |
| GET | /api/analytics/cost-by-model | Cost breakdown by model |
| GET | /api/analytics/guardrail-events | Recent guardrail block events |
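
For example, to pull aggregate stats from a running gateway:

curl http://localhost:3000/api/analytics/overview
# → JSON with totals (request count, cost, average latency); the exact
#   field names are defined in gateway/src/routes/analytics.js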

Dashboard

| Route | Description |
| --- | --- |
| / | Landing page |
| /docs | Integration documentation |
| /dashboard | Live analytics (cost ticker, timeline, charts, guardrail alerts) |

License

MIT
