
 ██████╗ ██╗   ██╗███████╗██████╗ ██╗   ██╗
██╔═══██╗██║   ██║██╔════╝██╔══██╗╚██╗ ██╔╝
██║   ██║██║   ██║█████╗  ██████╔╝ ╚████╔╝ 
██║▄▄ ██║██║   ██║██╔══╝  ██╔═══╝   ╚██╔╝  
╚██████╔╝╚██████╔╝███████╗██║        ██║   
 ╚══▀▀═╝  ╚═════╝ ╚══════╝╚═╝        ╚═╝   

Query Guardian — ask in plain language. SQL that survives the plan.

Natural language → pruned schema → SQL → EXPLAIN → (optional) rewrite → guarded execution → formatted output.



Why Query Guardian exists

Most “text-to-SQL” demos stop at the first SELECT. Query Guardian treats that as the beginning: it inspects the query plan (via dialect-aware EXPLAIN), loops with an optimizer node when the plan looks risky (e.g. full table scans), and caps execution so huge result sets do not silently blow memory—optionally pausing for a human export choice before the final report.

Think of it as a small safety rail around LLM-generated SQL, not a black box that talks to your database.
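The guarded-execution step can be sketched with the stdlib sqlite3 module (the name guarded_execute and the row_cap threshold are illustrative, not taken from the codebase):

```python
import sqlite3

def guarded_execute(conn, sql, row_cap=10_000):
    """Count rows first; only fetch when under the cap, else signal an export choice."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()
    if count > row_cap:
        # This is where the graph would interrupt and ask the user:
        # render an inline table, or export to CSV?
        return {"status": "needs_choice", "rows": count}
    return {"status": "ok", "rows": count, "data": conn.execute(sql).fetchall()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
print(guarded_execute(conn, "SELECT x FROM t", row_cap=3)["status"])  # needs_choice
```

The cheap COUNT(*) wrapper is the whole trick: the expensive fetch never happens until someone (or the graph) has decided the result size is acceptable.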


Architecture

flowchart LR
  A[Pruner] --> B[Draftsman]
  B --> C[Planner]
  C -->|plan OK| E[Executor]
  C -->|needs work| D[Optimizer]
  D --> C
  E --> F[Reporter]
Stage      Role
Pruner     Picks relevant tables from the live catalog and pulls schema, row stats, and tiny samples for context.
Draftsman  Turns the question into SQL (local coder model via Ollama).
Planner    Runs EXPLAIN / EXPLAIN QUERY PLAN, flags full scans and missing LIMIT, and sets is_optimized.
Optimizer  Rewrites the SQL when the plan is still noisy, then re-plans until it looks acceptable.
Executor   Counts rows first; large results trigger an interrupt path so you choose table vs CSV export.
Reporter   Applies the chosen format (e.g. CSV export to export_result.csv).
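A minimal sketch of the kind of heuristic the Planner applies, using SQLite's EXPLAIN QUERY PLAN (plan_looks_risky is a made-up name; the real node's checks may differ):

```python
import sqlite3

def plan_looks_risky(conn, sql):
    """Flag full-table scans and missing LIMIT, in the spirit of the Planner node."""
    plan = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    details = " ".join(row[-1] for row in plan)  # last column holds the detail text
    full_scan = "SCAN" in details and "USING INDEX" not in details
    no_limit = "LIMIT" not in sql.upper()
    return full_scan or no_limit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
print(plan_looks_risky(conn, "SELECT * FROM users"))                      # True: SCAN, no LIMIT
print(plan_looks_risky(conn, "SELECT name FROM users WHERE id = 1 LIMIT 1"))  # False: keyed search
```

When this returns True, the graph routes to the Optimizer instead of the Executor; the rewritten SQL then goes back through the same check.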

The graph is compiled with checkpointing and interrupt_before=["reporter"] so you can resume after user input—see tests/run_scenarios.py for a streaming + resume example.


Stack

  • LangGraph — state machine for the pipeline
  • LangChain Ollama — ChatOllama for local models
  • SQLAlchemy — connections, inspection, EXPLAIN
  • pytest — node and integration-style tests (make test)

Prerequisites

  • Python 3.10+ recommended
  • Ollama with models referenced in the nodes (e.g. llama3.2, deepseek-coder:6.7b)—adjust names in nodes/*.py to match your install
  • A database URL (or use the default SQLite fallback)

Configuration

Create a .env in the project root (see .gitignore; do not commit secrets):

DATABASE_URL=sqlite:///path/to/your.db
# Or, for example:
# DATABASE_URL=postgresql+psycopg2://user:pass@localhost:5432/dbname

If DATABASE_URL is unset, connection.py falls back to sqlite:///chinook.db.
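The fallback likely amounts to something like this (a sketch; the actual connection.py may differ):

```python
import os

# Assumed default, matching the fallback described above.
DEFAULT_URL = "sqlite:///chinook.db"

def get_database_url():
    """Read DATABASE_URL from the environment, falling back to the bundled SQLite db."""
    return os.getenv("DATABASE_URL", DEFAULT_URL)

print(get_database_url())
```

python-dotenv loads .env into the environment before this lookup runs, so the .env file and a real environment variable behave identically.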


Setup

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install langgraph langchain-ollama sqlalchemy python-dotenv pytest pytest-cov

Ensure Ollama is running and the models you reference are pulled.


Run tests

make test

For watch mode while developing:

make watch

Project layout

Query-Guardian/
├── main.py           # LangGraph definition + compile + interrupt
├── state.py          # Shared TypedDict state
├── connection.py     # DatabaseManager + global db_manager
├── nodes/            # Pruner, draftsman, planner, optimizer, executor, reporter
├── utils/            # Formatting, DB helpers
└── tests/            # pytest + scenario runner

Philosophy

Guard the query, not just the prompt. Plans and row counts are cheap; wrong answers and runaway reads are expensive. Query Guardian keeps the LLM in the loop for rewriting—but lets the database tell the truth about what will actually run.


Built for teams who want NL→SQL without pretending the database is optional.
