
LLMShield — Cloudflare for AI APIs

LLMShield is an open-source AI gateway that acts as transparent middleware between your application and LLM providers. Think of it as a security and optimization layer: every LLM call passes through LLMShield, which automatically blocks PII leaks, caches repeated prompts, routes to the cheapest model that fits, and logs costs in real time. It was built because, as LLM-powered apps scale, teams lose visibility into costs, accidentally leak sensitive data to third-party APIs, and waste money re-running identical prompts. LLMShield solves all three with a single-line integration.
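
Because the gateway speaks the OpenAI chat format, the "single line" is usually just a base-URL change. A minimal sketch using the official openai Node client (requires npm install openai; the apiKey value is a placeholder, since the gateway itself does not check it):

// Point an existing OpenAI client at the LLMShield gateway instead of api.openai.com.
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: 'http://localhost:3000/v1',  // the one-line change
  apiKey: 'unused-placeholder',         // not validated by the gateway
});

async function main() {
  const completion = await client.chat.completions.create({
    // model omitted on purpose: LLMShield's smart router picks one
    messages: [{ role: 'user', content: 'Hello through the gateway!' }],
  });
  console.log(completion.choices[0].message.content);
}

main();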


How LLM Calls Work — Before vs After LLMShield

Without LLMShield

Every request goes directly from your app to the LLM provider. There is no caching, no cost tracking, no PII protection, and no visibility into what is being sent.

┌─────────────┐         ┌──────────────────┐
│   Your App  │────────►│  LLM Provider    │
│             │◄────────│  (Groq/OpenAI)   │
└─────────────┘         └──────────────────┘

  Problems:
  ✗ PII (emails, cards, phone numbers) sent directly to third-party APIs
  ✗ Identical prompts re-processed — wasted tokens and money
  ✗ No idea which model is optimal for a given prompt
  ✗ No cost visibility until the monthly bill arrives
  ✗ No audit trail of what was sent or blocked

With LLMShield

LLMShield sits in between. Your app talks to LLMShield (same OpenAI-compatible API), and LLMShield handles everything else before the request ever reaches the LLM.

┌─────────────┐         ┌──────────────────────────────────┐         ┌──────────────────┐
│   Your App  │────────►│         LLMShield Gateway        │────────►│  LLM Provider    │
│  (or SDK)   │◄────────│                                  │◄────────│  (Groq/OpenAI)   │
└─────────────┘         │  ┌───────────┐  ┌─────────────┐  │         └──────────────────┘
                        │  │ Guardrails│  │ Smart Cache │  │
                        │  │ (PII/PCI) │  │   (Redis)   │  │
                        │  └───────────┘  └─────────────┘  │
                        │  ┌───────────┐  ┌─────────────┐  │
                        │  │  Model    │  │    Cost     │  │
                        │  │  Router   │  │   Tracker   │  │
                        │  └───────────┘  └─────────────┘  │
                        └──────────────────┬───────────────┘
                                           │
                                           ▼
                                  ┌─────────────────┐
                                  │  Live Dashboard │
                                  │   (Socket.io)   │
                                  └─────────────────┘

  Benefits:
  ✓ PII detected and blocked before reaching the LLM
  ✓ Duplicate prompts served from Redis cache at $0 cost
  ✓ Short prompts auto-routed to fast/cheap models
  ✓ Every request logged with token count, cost, and latency
  ✓ Real-time dashboard with live analytics via Socket.io

Features

| Feature | Description |
| --- | --- |
| PII Guardrails | Regex-based detection of credit card numbers, Aadhaar numbers, email addresses, and phone numbers. Blocked requests never reach the LLM — they return a 403 immediately. |
| Smart Cache | Prompts are normalized (lowercased, whitespace-collapsed, punctuation-stripped) and hashed with SHA-256. Identical prompts return cached responses at zero cost and sub-50ms latency. TTL: 1 hour. |
| Smart Routing | Short prompts (< 100 characters) go to the fast/cheap model (llama-3.1-8b-instant). Longer prompts go to the powerful model (llama-3.3-70b-versatile). You can also specify a model explicitly per request. |
| Cost Tracking | Every request is logged with input/output token counts, calculated cost (based on Groq pricing), latency, cache hit status, and the model used. Stored in PostgreSQL. |
| Real-time Dashboard | Next.js 14 UI with live cost ticker, request timeline, cache hit analytics, cost-by-model charts, and guardrail alert feed — all updated in real time via Socket.io. |
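
The Smart Cache key derivation is simple enough to sketch. The following is illustrative only (the function name is mine, not an actual export of cache.js):

// Sketch: normalize a prompt, then hash it with SHA-256 to form the cache key.
const crypto = require('crypto');

function cacheKey(prompt) {
  const normalized = prompt
    .toLowerCase()
    .replace(/[^\w\s]/g, '')  // strip punctuation
    .replace(/\s+/g, ' ')     // collapse whitespace
    .trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// "What is caching?" and "what  is CACHING" produce the same key,
// so the second request is served from Redis at $0.
console.log(cacheKey('What is caching?') === cacheKey('what  is CACHING')); // true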

Project Structure

HackVision/
├── gateway/              Fastify API — the core proxy server
│   ├── src/
│   │   ├── server.js         Server entry point (Fastify + Socket.io + Redis pub/sub)
│   │   ├── config/           Environment configs (dev.js, prod.js)
│   │   ├── routes/
│   │   │   ├── chat.js       POST /v1/chat/completions — main pipeline
│   │   │   └── analytics.js  GET /api/analytics/* — dashboard data
│   │   ├── services/
│   │   │   ├── guardrail.js  PII pattern matching (credit card, aadhaar, email, phone)
│   │   │   ├── cache.js      Redis cache — normalize, hash, get/set with TTL
│   │   │   ├── router.js     Smart model selection by prompt length
│   │   │   ├── proxy.js      Forward requests to Groq API
│   │   │   └── cost.js       Token estimation and cost calculation
│   │   └── lib/
│   │       ├── prisma.js     Prisma client instance
│   │       └── redis.js      Redis client instance
│   └── prisma/
│       └── schema.prisma     RequestLog + GuardrailEvent models
│
├── dashboard/            Next.js 14 real-time analytics UI
│   └── src/
│       ├── app/
│       │   ├── page.js           Landing page
│       │   ├── docs/page.js      API documentation page
│       │   └── dashboard/page.js Live analytics dashboard
│       ├── components/           CostTicker, RequestTimeline, CacheAnalytics, etc.
│       └── lib/
│           ├── api.js            REST client for gateway analytics endpoints
│           └── utils.js          Helpers
│
├── sdk/                  Lightweight Node.js client for the gateway
│   ├── index.js              createLLMClient() — chat() and chatRaw()
│   └── package.json          Package: llmshield-sdk
│
├── demo-app/             Interactive CLI chatbot that routes through the gateway
│   ├── index.js              readline REPL using the SDK
│   └── package.json          Package: llmshield-demo
│
├── docker-compose.yml    One-command setup: Postgres + Redis + Gateway + Dashboard
├── render.yaml           Render.com deployment blueprint
├── .env.example          Template for environment variables
└── README.md             You are here

Quick Start (Local)

Prerequisites

- Node.js 18 or newer (the gateway's dev script relies on node --watch)
- PostgreSQL and Redis running locally
- A Groq API key

Tip: If you don't want to install Postgres and Redis locally, skip to the Docker setup below.

1. Clone and Configure Environment

git clone <your-repo-url>
cd HackVision

# Copy the example env file into the gateway folder
cp .env.example gateway/.env

Open gateway/.env and set your Groq API key:

GROQ_API_KEY=gsk_your_actual_key_here
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/llmshield
REDIS_URL=redis://localhost:6379
PORT=3000
SMALL_MODEL=llama-3.1-8b-instant
LARGE_MODEL=llama-3.3-70b-versatile
EMBEDDINGS_ENABLED=false

2. Start the Gateway

cd gateway
npm install
npx prisma generate    # Generate the Prisma client
npx prisma db push     # Create tables in PostgreSQL
npm run dev            # Starts on http://localhost:3000 with --watch

Verify the gateway is running:

curl http://localhost:3000/health
# → { "status": "ok", "env": "dev", "ts": ... }

3. Start the Dashboard

Open a new terminal:

cd dashboard
npm install
npm run dev            # Starts on http://localhost:3001

Open http://localhost:3001 in your browser. The landing page describes LLMShield. Navigate to /dashboard for live analytics.

4. Try the Demo CLI Chatbot

Open another terminal:

cd demo-app
node index.js

This starts an interactive chatbot in your terminal. Type any prompt and it will be routed through the LLMShield gateway. Try sending a prompt containing an email address (e.g., "Send me info at test@example.com") to see the guardrails in action.


Quick Start (Docker)

If you have Docker and Docker Compose installed, you can spin up the entire stack (Postgres, Redis, Gateway, Dashboard) with one command:

cp .env.example .env
# Edit .env and set your GROQ_API_KEY
docker compose up --build

| Service | URL |
| --- | --- |
| Gateway | http://localhost:3000 |
| Dashboard | http://localhost:3001 |
| Postgres | localhost:5432 |
| Redis | localhost:6379 |

To stop everything: docker compose down. To also remove the database volume: docker compose down -v.


Usage

Option 1: Direct API (cURL / any HTTP client)

The gateway exposes an OpenAI-compatible endpoint. Any tool or library that speaks the OpenAI chat format works out of the box.

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is LLMShield?"}]
  }'

You can optionally specify a model (otherwise smart routing picks one):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "model": "llama-3.3-70b-versatile"
  }'

The response includes standard OpenAI-format fields plus an _llmshield metadata block:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "message": { "role": "assistant", "content": "LLMShield is..." },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 85, "total_tokens": 97 },
  "_llmshield": {
    "model": "llama-3.1-8b-instant",
    "cost": 0.0000082,
    "latency": 842,
    "cacheHit": false
  }
}

If a guardrail is triggered (e.g., PII detected), the gateway returns a 403:

{
  "error": "Request blocked by guardrail",
  "details": {
    "blocked": true,
    "type": "email",
    "message": "Email address detected"
  }
}
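
You can trigger this locally by including something that looks like PII in the prompt:

curl -i -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "My email is john@example.com"}]
  }'
# → HTTP 403 with the guardrail error body shown above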

Option 2: Using the SDK (Node.js)

The llmshield-sdk package provides a lightweight wrapper around the gateway API.

Within this monorepo

The demo-app already uses the SDK via a relative import (require('../sdk')). You can do the same from any script inside this repo:

const { createLLMClient } = require('./sdk');  // adjust path as needed

const client = createLLMClient({ baseURL: 'http://localhost:3000' });

// Simple: send a prompt, get back a string
const answer = await client.chat('What is Node.js?');
console.log(answer);

// Advanced: full control over the request body
const raw = await client.chatRaw({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain caching in one sentence.' },
  ],
  model: 'llama-3.3-70b-versatile',
});
console.log(raw.choices[0].message.content);
console.log(raw._llmshield); // { model, cost, latency, cacheHit }

In an external project (using npm link)

Since the SDK is not published to npm, use npm link to make it available globally on your machine:

Step 1 — Register the SDK globally:

cd /path/to/HackVision/sdk
npm link

This creates a global symlink for llmshield-sdk.

Step 2 — Link it into your project:

cd /path/to/your-other-project
npm link llmshield-sdk

Step 3 — Use it in your code:

const { createLLMClient } = require('llmshield-sdk');

const client = createLLMClient({ baseURL: 'http://localhost:3000' });

async function main() {
  try {
    const answer = await client.chat('Hello from my project!');
    console.log('Response:', answer);
  } catch (err) {
    if (err.status === 403) {
      console.log('Blocked by guardrail:', err.details?.message);
    } else {
      console.error('Error:', err.message);
    }
  }
}

main();

Note: npm link creates a symlink, so any changes you make to HackVision/sdk/index.js are immediately reflected in linked projects. To unlink later, run npm unlink llmshield-sdk in your project.

SDK API Reference

| Method | Signature | Returns |
| --- | --- | --- |
| chat | chat(prompt: string, options?: object) | Promise<string> — the assistant's reply text |
| chatRaw | chatRaw(body: object) | Promise<object> — full OpenAI-format response with _llmshield metadata |

The options parameter in chat() is spread into the request body, so you can pass { model: 'llama-3.3-70b-versatile' } to override smart routing.
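
For example:

// Force the large model for one call; everything else stays on smart routing.
const reply = await client.chat('Summarize this repository in one sentence.', {
  model: 'llama-3.3-70b-versatile',
});
console.log(reply);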


Option 3: Using the CLI Demo

The demo-app is an interactive terminal chatbot that uses the SDK under the hood. It is useful for testing the gateway manually.

cd demo-app
node index.js
  ╔═══════════════════════════════════════╗
  ║        LLMShield Demo Chatbot         ║
  ╠═══════════════════════════════════════╣
  ║  All requests route through the       ║
  ║  LLMShield gateway (cache, guard,     ║
  ║  cost tracking, smart routing).       ║
  ║                                       ║
  ║  Type "exit" to quit.                 ║
  ╚═══════════════════════════════════════╝

You > What is caching?

Assistant (312ms) > Caching is the process of storing frequently accessed data...

You > My email is john@example.com

[BLOCKED] Email address detected

You > exit

Goodbye!

You can point it at a different gateway by setting the GATEWAY_URL environment variable:

GATEWAY_URL=https://your-deployed-gateway.com node index.js

Architecture

Request Pipeline (detailed)

Every request to POST /v1/chat/completions goes through this pipeline inside the gateway:

Client (SDK / cURL / any HTTP client)
  │
  │  POST /v1/chat/completions
  │  { "messages": [{ "role": "user", "content": "..." }] }
  │
  ▼
┌──────────────────────────────────────────────────────────────┐
│                     LLMShield Gateway                        │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 1. GUARDRAIL CHECK                                      │ │
│  │    Scan prompt against PII regex patterns:              │ │
│  │    • Credit card numbers (16-digit patterns)            │ │
│  │    • Aadhaar numbers (12-digit formatted)               │ │
│  │    • Email addresses                                    │ │
│  │    • Phone numbers (Indian format)                      │ │
│  │    If matched → 403 + log to DB + emit Socket.io event  │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 2. CACHE LOOKUP                                         │ │
│  │    Normalize prompt → SHA-256 hash → Redis GET          │ │
│  │    If HIT → return cached response (cost $0, ~1ms)      │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 3. SMART ROUTING                                        │ │
│  │    If no model specified in request:                    │ │
│  │    • prompt.length < 100 → llama-3.1-8b-instant (fast)  │ │
│  │    • prompt.length >= 100 → llama-3.3-70b-versatile     │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 4. LLM PROXY                                            │ │
│  │    Forward to Groq API (OpenAI-compatible endpoint)     │ │
│  │    https://api.groq.com/openai/v1/chat/completions      │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 5. POST-PROCESSING                                      │ │
│  │    • Calculate cost (tokens × per-model pricing)        │ │
│  │    • Store response in Redis cache (TTL 1 hour)         │ │
│  │    • Log to PostgreSQL (RequestLog table)               │ │
│  │    • Publish event to Redis pub/sub channel             │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           │                                  │
└───────────────────────────┼──────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼                           ▼
     Response to Client          Redis pub/sub channel
     (with _llmshield              "llmshield:events"
      metadata)                         │
                                        ▼
                                   Socket.io
                                        │
                                        ▼
                                 ┌────────────────┐
                                 │   Dashboard    │
                                 │   (Next.js)    │
                                 │  Live updates  │
                                 └────────────────┘
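
The routing rule in step 3 is small enough to restate in code. A sketch (the real router.js may differ in details, e.g. how the prompt length is measured):

// Sketch of length-based model selection, mirroring the SMALL_MODEL /
// LARGE_MODEL environment variables from the Configuration section.
const SMALL_MODEL = process.env.SMALL_MODEL || 'llama-3.1-8b-instant';
const LARGE_MODEL = process.env.LARGE_MODEL || 'llama-3.3-70b-versatile';

function pickModel(body) {
  if (body.model) return body.model;  // an explicit model always wins
  // Illustrative: treat the concatenated message contents as "the prompt".
  const prompt = body.messages.map((m) => m.content).join(' ');
  return prompt.length < 100 ? SMALL_MODEL : LARGE_MODEL;
}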

Tech Stack

| Layer | Technology |
| --- | --- |
| Gateway server | Fastify 4 |
| Database | PostgreSQL 16 + Prisma ORM |
| Cache & pub/sub | Redis 7 + ioredis |
| Real-time | Socket.io 4 |
| Dashboard | Next.js 14 + Tailwind CSS + Recharts |
| LLM provider | Groq API (OpenAI-compatible) |
| SDK | Vanilla Node.js (fetch) |
| Containerization | Docker + Docker Compose |

Database Schema

RequestLog
├── id         (UUID, primary key)
├── prompt     (String, truncated to 500 chars)
├── response   (String, truncated to 500 chars)
├── tokens     (Int, total input + output)
├── cost       (Float, USD)
├── latency    (Int, milliseconds)
├── cacheHit   (Boolean)
├── model      (String)
└── createdAt  (DateTime)

GuardrailEvent
├── id         (UUID, primary key)
├── type       (String — creditCard, aadhaar, email, phone)
├── message    (String — human-readable description)
├── prompt     (String, truncated to 500 chars)
└── createdAt  (DateTime)
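
Expressed as a Prisma schema, the two models look roughly like this (a sketch; gateway/prisma/schema.prisma is the authoritative version):

model RequestLog {
  id        String   @id @default(uuid())
  prompt    String
  response  String
  tokens    Int
  cost      Float
  latency   Int
  cacheHit  Boolean
  model     String
  createdAt DateTime @default(now())
}

model GuardrailEvent {
  id        String   @id @default(uuid())
  type      String
  message   String
  prompt    String
  createdAt DateTime @default(now())
}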

Configuration

All configuration is via environment variables in gateway/.env:

| Variable | Default | Description |
| --- | --- | --- |
| GROQ_API_KEY | (none) | Required. Your Groq API key |
| DATABASE_URL | postgresql://postgres:postgres@localhost:5432/llmshield | PostgreSQL connection string |
| REDIS_URL | redis://localhost:6379 | Redis connection string |
| PORT | 3000 | Port the gateway listens on |
| SMALL_MODEL | llama-3.1-8b-instant | Model used for short prompts (< 100 chars) |
| LARGE_MODEL | llama-3.3-70b-versatile | Model used for long prompts (>= 100 chars) |
| EMBEDDINGS_ENABLED | false | Future: toggle for vector-based semantic cache |

For the dashboard, set this in dashboard/.env.production (or as an environment variable):

| Variable | Default | Description |
| --- | --- | --- |
| NEXT_PUBLIC_GATEWAY_URL | http://localhost:3000 | Gateway URL for the dashboard UI |

API Endpoints

Gateway

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI-compatible chat endpoint (main pipeline) |
| GET | /health | Health check — returns { status, env, ts } |
| GET | /api/analytics/overview | Aggregate stats: total requests, cost, avg latency |
| GET | /api/analytics/requests | Recent request logs |
| GET | /api/analytics/cost-by-model | Cost breakdown by model |
| GET | /api/analytics/guardrail-events | Recent guardrail block events |
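
For example, to pull aggregate stats from a running gateway:

curl http://localhost:3000/api/analytics/overview
# → JSON with totals (request count, cost, average latency); the exact
#   field names are defined in gateway/src/routes/analytics.js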

Dashboard

| Route | Description |
| --- | --- |
| / | Landing page |
| /docs | Integration documentation |
| /dashboard | Live analytics (cost ticker, timeline, charts, guardrail alerts) |

License

MIT
