
πŸ›‘οΈ QueryGuard β€” AI-Powered Query Cost Prediction

An intelligent RAG-based system that analyzes and predicts the cost, performance impact, and safety of database queries, API requests, GraphQL queries, and vector searches before they execute.


🎯 Features

  • Multi-Query Type Support — SQL, GraphQL, REST API, Vector Search
  • Cost Prediction — Estimate CPU, memory, and disk I/O before execution
  • User Tier Management — Free, Pro, and Enterprise tier limits
  • RAG-Powered Analysis — Learns from historical query patterns
  • LLM Reasoning — Uses Llama 3.2 for intelligent cost prediction
  • Vector Store Flexibility — SQLite (local) and Pinecone (cloud) support
  • Beautiful Dashboard — Modern animated web dashboard
  • REST API — Easy integration with any application
  • Persistent Metrics — Track queries across restarts

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.12+ (for local development)
  • Ollama with Llama 3.2:3b model (or use Docker setup)
  • Pinecone account (required for cloud vector store β€” free tier available)

1. Clone the repository

git clone https://github.com/dikshith/QueryGuard.git
cd QueryGuard

2. Configure API Keys

Important: You must configure your own API keys before starting.

Option A — Environment variables (recommended):

# Copy the environment template
cp .env.example .env

# Edit .env and add your keys
nano .env

Add the following to your .env file:

PINECONE_API_KEY=your_actual_pinecone_api_key_here

Option B — config.yaml (alternative):

Edit config.yaml and update the vector_db section:

vector_db:
  provider: "pinecone"   # or "sqlite" for local testing
  pinecone:
    api_key: "your_actual_pinecone_api_key_here"   # CHANGE THIS
    environment: "us-east1-gcp"
    index_name: "query-guardrail"
    dimension: 384
    metric: "cosine"

Get a Pinecone API key:

  1. Sign up at https://www.pinecone.io (free tier available)
  2. Go to API Keys section
  3. Copy your key and paste it into .env or config.yaml

3. Start with Docker

# Start all services (API + Ollama)
docker-compose up -d

# Pull the Llama model into the running Ollama container (one-time setup)
docker exec -it ollama ollama pull llama3.2:3b

# Check logs
docker-compose logs -f api

4. Access services

Service               URL
Dashboard             http://localhost:8000/
API Docs (Swagger)    http://localhost:8000/docs
Health Check          http://localhost:8000/api/v1/health
Debug Vector Store    http://localhost:8000/api/v1/debug/vector-store

5. Verify setup

curl http://localhost:8000/api/v1/debug/vector-store

Expected response:

{
  "provider": "pinecone",
  "total_vectors": 0,
  "index_name": "query-guardrail"
}

6. Load sample data (optional)

docker-compose exec api python scripts/ingest_sample_data.py

📊 Usage Examples

Python client

from examples.python_client import QueryGuardrailClient

client = QueryGuardrailClient()

result = client.analyze_query(
    query="SELECT * FROM users WHERE created_at > '2024-01-01' LIMIT 100",
    query_type="sql",
    user_tier="pro"
)

print(f"Decision:  {result['decision']}")
print(f"Cost:      {result['cost_class']}")
print(f"Reasoning: {result['reasoning']}")

cURL — SQL query

curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT * FROM users LIMIT 100",
    "query_type": "sql",
    "user_tier": "free"
  }'
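
A typical response carries the fields the Python client reads (decision, cost_class, reasoning); the values below are illustrative, not actual output:

```json
{
  "decision": "ALLOW",
  "cost_class": "low",
  "reasoning": "Bounded by LIMIT 100; similar historical queries stayed well under the free-tier CPU budget."
}
```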

cURL — GraphQL query

curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ users(first: 100) { id name orders { id total } } }",
    "query_type": "graphql",
    "user_tier": "pro"
  }'

cURL — API request analysis

curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{\"method\": \"POST\", \"endpoint\": \"/api/v1/export\", \"payload_size_kb\": 200}",
    "query_type": "api",
    "user_tier": "enterprise"
  }'

cURL — Vector search analysis

curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{\"vector_dim\": 384, \"top_k\": 10, \"metric\": \"cosine\"}",
    "query_type": "vector_search",
    "user_tier": "pro"
  }'

🔧 Configuration

Vector store providers

SQLite (local testing):

vector_db:
  provider: "sqlite"
  sqlite:
    path: "./data/vector_db/queries.db"
    dimension: 384
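
SQLite has no native vector type, so the local provider must store embeddings itself. A minimal brute-force sketch of that idea (table, column, and function names here are illustrative, not QueryGuard's actual schema):

```python
import math
import sqlite3
import struct

def pack(vec):
    # Serialize a list of floats to bytes for BLOB storage
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")
conn.execute("INSERT INTO queries (text, embedding) VALUES (?, ?)",
             ("SELECT * FROM users", pack([1.0, 0.0, 0.0])))
conn.execute("INSERT INTO queries (text, embedding) VALUES (?, ?)",
             ("DELETE FROM logs", pack([0.0, 1.0, 0.0])))

def nearest(query_vec, k=1):
    # Brute-force scan: score every stored embedding by cosine similarity
    rows = conn.execute("SELECT text, embedding FROM queries").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), text) for text, blob in rows]
    return sorted(scored, reverse=True)[:k]

best_score, best_text = nearest([0.9, 0.1, 0.0])[0]
```

A brute-force scan is fine for small local datasets; Pinecone handles indexing at scale.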

Pinecone (production — recommended):

vector_db:
  provider: "pinecone"
  pinecone:
    api_key: "${PINECONE_API_KEY}"   # use environment variable
    environment: "us-east1-gcp"
    index_name: "query-guardrail"
    dimension: 384

LLM configuration

Ollama (local — default):

llm:
  provider: "ollama"
  model: "llama3.2:3b"
  base_url: "http://ollama:11434"   # or http://localhost:11434 for local dev
  temperature: 0.2
  max_tokens: 1000

OpenAI (alternative):

llm:
  provider: "openai"
  model: "gpt-4"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.2
  max_tokens: 1000

Tier limits

policies:
  tier_limits:
    free:
      max_cpu_ms: 100
      max_memory_kb: 10240      # 10 MB
      max_rows: 1000
      rate_limit_per_minute: 10
    pro:
      max_cpu_ms: 500
      max_memory_kb: 51200      # 50 MB
      max_rows: 10000
      rate_limit_per_minute: 60
    enterprise:
      max_cpu_ms: 5000
      max_memory_kb: 512000     # 500 MB
      max_rows: 100000
      rate_limit_per_minute: 300
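
The limits above drive the decision step: a predicted cost is compared against the user's tier, and the worst ratio determines the outcome. A minimal sketch of that comparison (function and threshold names are illustrative, not the actual policies module):

```python
# Mirror of the tier_limits section of config.yaml (rate limits omitted here)
TIER_LIMITS = {
    "free":       {"max_cpu_ms": 100,  "max_memory_kb": 10240,  "max_rows": 1000},
    "pro":        {"max_cpu_ms": 500,  "max_memory_kb": 51200,  "max_rows": 10000},
    "enterprise": {"max_cpu_ms": 5000, "max_memory_kb": 512000, "max_rows": 100000},
}

def decide(predicted, tier, warn_ratio=0.8):
    """Return ALLOW, WARN, or BLOCK for a predicted cost dict."""
    limits = TIER_LIMITS[tier]
    # Worst utilisation across all limited dimensions
    worst = max(predicted[k] / limits[k] for k in limits)
    if worst > 1.0:
        return "BLOCK"   # exceeds at least one hard limit
    if worst >= warn_ratio:
        return "WARN"    # close to a limit
    return "ALLOW"

cheap = {"max_cpu_ms": 20,  "max_memory_kb": 1024,  "max_rows": 50}
heavy = {"max_cpu_ms": 900, "max_memory_kb": 20480, "max_rows": 5000}
```

For example, the `heavy` prediction blocks on the free tier (900 ms against a 100 ms budget) but passes on enterprise.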

πŸ“ Project Structure

QueryGuard/
├── src/
│   ├── api/                  # FastAPI routes & Pydantic models
│   ├── core/
│   │   └── parsers/          # SQL, GraphQL, API, Vector parsers
│   ├── rag/                  # Vector store, LLM client, retriever
│   ├── policies/             # Rules engine & decision maker
│   ├── data/                 # Data ingestion & schema store
│   └── utils/                # Config, logging, metrics
├── dashboard/
│   ├── dashboard.html        # Animated metrics dashboard
│   └── index.html            # Query analyzer interface
├── examples/                 # Python client examples
├── scripts/                  # Utility scripts (data ingestion etc.)
├── config.yaml               # Main configuration file
├── docker-compose.yml        # Docker service orchestration
├── Dockerfile                # API container definition
├── requirements.txt          # Python dependencies
├── init_system.py            # One-time system initialisation
└── README.md

🎓 How It Works

  1. Query parsing — the appropriate parser extracts features (complexity, joins, nesting depth, etc.) based on query type
  2. Embedding generation — the query is converted to a 384-dimensional vector using all-MiniLM-L6-v2
  3. Vector search — similar historical queries are retrieved from SQLite or Pinecone using cosine similarity
  4. LLM reasoning — Llama 3.2 predicts CPU, memory, and cost class using the retrieved examples as context (RAG)
  5. Policy evaluation — the rules engine checks predicted cost against the user's tier limits
  6. Decision — returns ALLOW, WARN, or BLOCK with a reasoning string
  7. Storage — the query and its result are stored in the vector DB for future similarity matching
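
Step 1 can be sketched for SQL with a few regex heuristics; the real parsers are more thorough, and the feature names below are illustrative only:

```python
import re

def sql_features(query: str) -> dict:
    """Extract rough complexity signals from a SQL string."""
    q = query.upper()
    limit = re.search(r"\bLIMIT\s+(\d+)", q)
    return {
        "select_star": "SELECT *" in q,                    # unbounded column fetch
        "join_count": len(re.findall(r"\bJOIN\b", q)),     # each join adds cost
        "has_where": bool(re.search(r"\bWHERE\b", q)),     # filtered vs full scan
        "limit": int(limit.group(1)) if limit else None,   # row cap, if any
    }

f = sql_features(
    "SELECT * FROM users u JOIN orders o ON u.id = o.user_id "
    "WHERE u.active = 1 LIMIT 100"
)
```

Features like these, rather than raw SQL text, are what get summarised for the LLM and compared against tier limits.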

🧪 Testing

# Run all tests
pytest

# Run parser unit tests only (no API or LLM required)
pytest tests/test_parsers.py

# Run with coverage report
pytest --cov=src tests/

# Run end-to-end tests (requires running API)
pytest tests/test_e2e.py -v

📊 Monitoring

# Health check
curl http://localhost:8000/api/v1/health

# Query statistics
curl http://localhost:8000/api/v1/stats

# Vector store status
curl http://localhost:8000/api/v1/debug/vector-store

🚀 Local Development (without Docker)

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export PINECONE_API_KEY=your_key_here

# Start Ollama in a separate terminal (it runs in the foreground), then pull the model
ollama serve
ollama pull llama3.2:3b

# Initialise the system (run once)
python init_system.py

# Start the API
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000

Then open http://localhost:8000/ in your browser.


πŸ” Security & Best Practices

  1. API keys β€” never commit .env files or hardcode keys in config.yaml. Use environment variables.
  2. Network isolation β€” run services in an isolated Docker network so Ollama is not publicly accessible.
  3. Tier limits β€” review and tighten limits in config.yaml before going to production.
  4. CORS β€” configure allowed origins in src/api/main.py to match your frontend domain.
  5. Rate limiting β€” add rate limiting middleware before exposing the API publicly.
  6. Monitoring β€” persistent metrics are already configured in ./data/metrics.db.
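
For point 5, the classic approach is a token bucket per client: tokens refill at the tier's per-minute rate, and each request spends one. A stdlib-only sketch of the idea (not tied to any particular middleware framework):

```python
import time

class TokenBucket:
    """Allow bursts up to `rate` requests, refilling over `per` seconds."""

    def __init__(self, rate: int, per: float = 60.0):
        self.capacity = rate
        self.tokens = float(rate)
        self.fill_rate = rate / per
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens for the time elapsed since the last call, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# e.g. the free tier's 10 requests per minute
bucket = TokenBucket(rate=10, per=60.0)
results = [bucket.allow() for _ in range(12)]
```

In a real deployment you would keep one bucket per API key (keyed by user tier) and return HTTP 429 when `allow()` is False.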

πŸ› Troubleshooting

"Vector store not initialized"

  • Check that your Pinecone API key is set correctly in config.yaml or .env
  • Verify network connectivity to Pinecone
  • Check logs: docker-compose logs -f api

"total_vectors: 0" after analyzing queries

Dashboard shows 0 queries after restart

  • This is expected on first run β€” metrics are persisted in ./data/metrics.db and will accumulate over time

Ollama connection issues

  • Confirm the Ollama container is running: docker ps
  • Pull the model manually: docker exec -it ollama ollama pull llama3.2:3b
  • Check Ollama logs: docker logs ollama

🌟 Acknowledgments


Remember: Configure your Pinecone API key before first use. See the Configuration section above.