An intelligent RAG-based system that analyzes and predicts the cost, performance impact, and safety of database queries, API requests, GraphQL queries, and vector searches before they execute.
- Multi-Query Type Support: SQL, GraphQL, REST API, Vector Search
- Cost Prediction: estimate CPU, memory, and disk I/O before execution
- User Tier Management: Free, Pro, and Enterprise tier limits
- RAG-Powered Analysis: learns from historical query patterns
- LLM Reasoning: uses Llama 3.2 for intelligent cost prediction
- Vector Store Flexibility: SQLite (local) and Pinecone (cloud) support
- Beautiful Dashboard: modern animated web dashboard
- REST API: easy integration with any application
- Persistent Metrics: track queries across restarts
- Docker & Docker Compose
- Python 3.12+ (for local development)
- Ollama with the `llama3.2:3b` model (or use the Docker setup)
- Pinecone account (required for the cloud vector store; free tier available)
```bash
git clone https://github.com/dikshith/QueryGuard.git
cd QueryGuard
```

Important: you must configure your own API keys before starting.

Option A: environment variables (recommended)

```bash
# Copy the environment template
cp .env.example .env

# Edit .env and add your keys
nano .env
```

Add the following to your `.env` file:

```bash
PINECONE_API_KEY=your_actual_pinecone_api_key_here
```

Option B: config.yaml (alternative)
Edit `config.yaml` and update the `vector_db` section:

```yaml
vector_db:
  provider: "pinecone"   # or "sqlite" for local testing
  pinecone:
    api_key: "your_actual_pinecone_api_key_here"  # CHANGE THIS
    environment: "us-east1-gcp"
    index_name: "query-guardrail"
    dimension: 384
    metric: "cosine"
```

Get a Pinecone API key:
- Sign up at https://www.pinecone.io (free tier available)
- Go to API Keys section
- Copy your key and paste it into `.env` or `config.yaml`
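If you reference the key from `config.yaml` as `${PINECONE_API_KEY}` (as in the Pinecone example in the Configuration section), the placeholder has to be resolved from the environment at load time. A minimal sketch of that resolution, assuming a plain `${VAR}` syntax; the function name is illustrative and QueryGuard's actual config loader may differ:

```python
import os
import re

def resolve_env_placeholders(text: str) -> str:
    """Replace ${VAR} placeholders with values from the environment.

    Unknown variables are left as-is so missing keys are easy to spot.
    """
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        text,
    )

os.environ["PINECONE_API_KEY"] = "pc-demo-key"      # stand-in value
print(resolve_env_placeholders('api_key: ${PINECONE_API_KEY}'))  # api_key: pc-demo-key
```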
```bash
# Pull the Ollama model first (one-time setup)
docker run -d --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3.2:3b

# Start all services
docker-compose up -d

# Check logs
docker-compose logs -f api
```

| Service | URL |
|---|---|
| Dashboard | http://localhost:8000/ |
| API Docs (Swagger) | http://localhost:8000/docs |
| Health Check | http://localhost:8000/api/v1/health |
| Debug Vector Store | http://localhost:8000/api/v1/debug/vector-store |
```bash
curl http://localhost:8000/api/v1/debug/vector-store
```

Expected response:

```json
{
  "provider": "pinecone",
  "total_vectors": 0,
  "index_name": "query-guardrail"
}
```

Ingest the sample data:

```bash
docker-compose exec api python scripts/ingest_sample_data.py
```

```python
from examples.python_client import QueryGuardrailClient

client = QueryGuardrailClient()
result = client.analyze_query(
    query="SELECT * FROM users WHERE created_at > '2024-01-01' LIMIT 100",
    query_type="sql",
    user_tier="pro"
)

print(f"Decision: {result['decision']}")
print(f"Cost: {result['cost_class']}")
print(f"Reasoning: {result['reasoning']}")
```

SQL:

```bash
curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT * FROM users LIMIT 100",
    "query_type": "sql",
    "user_tier": "free"
  }'
```

GraphQL:

```bash
curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ users(first: 100) { id name orders { id total } } }",
    "query_type": "graphql",
    "user_tier": "pro"
  }'
```

REST API:

```bash
curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{\"method\": \"POST\", \"endpoint\": \"/api/v1/export\", \"payload_size_kb\": 200}",
    "query_type": "api",
    "user_tier": "enterprise"
  }'
```

Vector search:

```bash
curl -X POST "http://localhost:8000/api/v1/analyze-query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{\"vector_dim\": 384, \"top_k\": 10, \"metric\": \"cosine\"}",
    "query_type": "vector_search",
    "user_tier": "pro"
  }'
```

SQLite (local testing):
```yaml
vector_db:
  provider: "sqlite"
  sqlite:
    path: "./data/vector_db/queries.db"
    dimension: 384
```

Pinecone (production, recommended):
```yaml
vector_db:
  provider: "pinecone"
  pinecone:
    api_key: ${PINECONE_API_KEY}   # use environment variable
    environment: "us-east1-gcp"
    index_name: "query-guardrail"
    dimension: 384
```

Ollama (local, default):
```yaml
llm:
  provider: "ollama"
  model: "llama3.2:3b"
  base_url: "http://ollama:11434"   # or http://localhost:11434 for local dev
  temperature: 0.2
  max_tokens: 1000
```

OpenAI (alternative):
```yaml
llm:
  provider: "openai"
  model: "gpt-4"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.2
  max_tokens: 1000
```

```yaml
policies:
  tier_limits:
    free:
      max_cpu_ms: 100
      max_memory_kb: 10240    # 10 MB
      max_rows: 1000
      rate_limit_per_minute: 10
    pro:
      max_cpu_ms: 500
      max_memory_kb: 51200    # 50 MB
      max_rows: 10000
      rate_limit_per_minute: 60
    enterprise:
      max_cpu_ms: 5000
      max_memory_kb: 512000   # 500 MB
      max_rows: 100000
      rate_limit_per_minute: 300
```

```text
QueryGuard/
├── src/
│   ├── api/                 # FastAPI routes & Pydantic models
│   ├── core/
│   │   └── parsers/         # SQL, GraphQL, API, Vector parsers
│   ├── rag/                 # Vector store, LLM client, retriever
│   ├── policies/            # Rules engine & decision maker
│   ├── data/                # Data ingestion & schema store
│   └── utils/               # Config, logging, metrics
├── dashboard/
│   ├── dashboard.html       # Animated metrics dashboard
│   └── index.html           # Query analyzer interface
├── examples/                # Python client examples
├── scripts/                 # Utility scripts (data ingestion etc.)
├── config.yaml              # Main configuration file
├── docker-compose.yml       # Docker service orchestration
├── Dockerfile               # API container definition
├── requirements.txt         # Python dependencies
├── init_system.py           # One-time system initialisation
└── README.md
```
- Query parsing: the appropriate parser extracts features (complexity, joins, nesting depth, etc.) based on query type
- Embedding generation: the query is converted to a 384-dimensional vector using `all-MiniLM-L6-v2`
- Vector search: similar historical queries are retrieved from SQLite or Pinecone using cosine similarity
- LLM reasoning: Llama 3.2 predicts CPU, memory, and cost class using the retrieved examples as context (RAG)
- Policy evaluation: the rules engine checks predicted cost against the user's tier limits
- Decision: returns `ALLOW`, `WARN`, or `BLOCK` with a reasoning string
- Storage: the query and its result are stored in the vector DB for future similarity matching
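The retrieval and policy-evaluation steps above can be sketched in plain Python. This is a simplified illustration, not the project's implementation: the hash-derived `embed` stands in for `all-MiniLM-L6-v2`, only the CPU and memory limits from `config.yaml` are checked, and the 80% warning threshold is an assumption.

```python
import hashlib
import math

# Tier limits copied from config.yaml (CPU and memory only, for brevity)
TIER_LIMITS = {
    "free":       {"max_cpu_ms": 100,  "max_memory_kb": 10240},
    "pro":        {"max_cpu_ms": 500,  "max_memory_kb": 51200},
    "enterprise": {"max_cpu_ms": 5000, "max_memory_kb": 512000},
}

def embed(text: str, dim: int = 8) -> list:
    """Stand-in for all-MiniLM-L6-v2: a deterministic hash-derived vector.
    The real system produces 384 dimensions."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def cosine(a, b) -> float:
    """Cosine similarity, as used in the vector-search step."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def decide(predicted_cpu_ms: float, predicted_memory_kb: float, tier: str) -> str:
    """Policy evaluation: compare predicted cost against the user's tier limits."""
    limits = TIER_LIMITS[tier]
    if (predicted_cpu_ms > limits["max_cpu_ms"]
            or predicted_memory_kb > limits["max_memory_kb"]):
        return "BLOCK"
    if predicted_cpu_ms > 0.8 * limits["max_cpu_ms"]:  # assumed warning threshold
        return "WARN"
    return "ALLOW"

# Retrieval: rank historical queries by similarity to the incoming one
history = ["SELECT * FROM users LIMIT 100", "SELECT id FROM orders"]
query = "SELECT * FROM users LIMIT 50"
neighbours = sorted(history, key=lambda q: cosine(embed(q), embed(query)), reverse=True)

# Decision for a hypothetical prediction (in QueryGuard the figures come from the LLM step)
print(decide(predicted_cpu_ms=450, predicted_memory_kb=2048, tier="pro"))  # WARN
```

In the real pipeline the predicted CPU and memory figures come from Llama 3.2, with the retrieved neighbours supplied as RAG context rather than used directly.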
```bash
# Run all tests
pytest

# Run parser unit tests only (no API or LLM required)
pytest tests/test_parsers.py

# Run with coverage report
pytest --cov=src tests/

# Run end-to-end tests (requires running API)
pytest tests/test_e2e.py -v
```

```bash
# Health check
curl http://localhost:8000/api/v1/health

# Query statistics
curl http://localhost:8000/api/v1/stats

# Vector store status
curl http://localhost:8000/api/v1/debug/vector-store
```

```bash
# Install dependencies
pip install -r requirements.txt

# Set environment variables
export PINECONE_API_KEY=your_key_here

# Start Ollama and pull the model
ollama serve
ollama pull llama3.2:3b

# Initialise the system (run once)
python init_system.py

# Start the API
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
```

Then open http://localhost:8000/ in your browser.
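If you prefer Python over curl for smoke-testing the endpoint, here is a minimal client using only the standard library. The field names mirror the curl examples above; the helper names themselves are illustrative, not part of QueryGuard.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/analyze-query"

def build_payload(query: str, query_type: str, user_tier: str) -> dict:
    """Request body with the same fields as the curl examples."""
    return {"query": query, "query_type": query_type, "user_tier": user_tier}

def analyze(query: str, query_type: str = "sql", user_tier: str = "free") -> dict:
    """POST the query to the running API and return the decoded JSON response."""
    data = json.dumps(build_payload(query, query_type, user_tier)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the API running:
#   result = analyze("SELECT * FROM users LIMIT 100", "sql", "free")
#   print(result["decision"], result["reasoning"])
```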
- API keys: never commit `.env` files or hardcode keys in `config.yaml`. Use environment variables.
- Network isolation: run services in an isolated Docker network so Ollama is not publicly accessible.
- Tier limits: review and tighten limits in `config.yaml` before going to production.
- CORS: configure allowed origins in `src/api/main.py` to match your frontend domain.
- Rate limiting: add rate-limiting middleware before exposing the API publicly.
- Monitoring: persistent metrics are already configured in `./data/metrics.db`.
"Vector store not initialized"

- Check that your Pinecone API key is set correctly in `config.yaml` or `.env`
- Verify network connectivity to Pinecone
- Check logs: `docker-compose logs -f api`

"total_vectors: 0" after analyzing queries

- Restart the API: `docker-compose restart api`
- Check logs for storage errors
- Visit the debug endpoint: http://localhost:8000/api/v1/debug/vector-store

Dashboard shows 0 queries after restart

- This is expected on first run: metrics are persisted in `./data/metrics.db` and will accumulate over time

Ollama connection issues

- Confirm the Ollama container is running: `docker ps`
- Pull the model manually: `docker exec -it ollama ollama pull llama3.2:3b`
- Check Ollama logs: `docker logs ollama`
- Sentence Transformers: query embeddings
- Pinecone: cloud vector database
- Ollama: local LLM inference
- FastAPI: API framework
- Llama 3.2 by Meta AI: cost reasoning model
Remember: Configure your Pinecone API key before first use. See the Configuration section above.