# Query Guardian

> Ask in plain language. SQL that survives the plan.
Natural language → pruned schema → SQL → EXPLAIN → (optional) rewrite → guarded execution → formatted output.
Most “text-to-SQL” demos stop at the first SELECT. Query Guardian treats that as the beginning: it inspects the query plan (via dialect-aware EXPLAIN), loops with an optimizer node when the plan looks risky (e.g. full table scans), and caps execution so huge result sets do not silently blow memory—optionally pausing for a human export choice before the final report.
Think of it as a small safety rail around LLM-generated SQL, not a black box that talks to your database.
```mermaid
flowchart LR
    A[Pruner] --> B[Draftsman]
    B --> C[Planner]
    C -->|plan OK| E[Executor]
    C -->|needs work| D[Optimizer]
    D --> C
    E --> F[Reporter]
```
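The "plan OK / needs work" branch reduces to a conditional edge keyed on a flag in the shared state. A minimal sketch (the function name is illustrative; `is_optimized` is the flag the Planner sets):

```python
def route_after_planner(state: dict) -> str:
    """Conditional edge out of the Planner: run the query when the plan
    looks clean, otherwise send the SQL back through the Optimizer."""
    # is_optimized is set by the Planner after inspecting EXPLAIN output
    return "executor" if state.get("is_optimized") else "optimizer"
```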
| Stage | Role |
|---|---|
| Pruner | Picks relevant tables from the live catalog and pulls schema, row stats, and tiny samples for context. |
| Draftsman | Turns the question into SQL (local coder model via Ollama). |
| Planner | Runs `EXPLAIN` / `EXPLAIN QUERY PLAN`, flags scans and missing `LIMIT`, sets `is_optimized`. |
| Optimizer | Rewrites SQL when the plan is still noisy, then re-plans until things look acceptable. |
| Executor | Counts rows first; large results trigger an interrupt path so you choose table vs CSV export. |
| Reporter | Applies the chosen format (e.g. CSV export to `export_result.csv`). |
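For SQLite, the Planner's checks can be approximated with `EXPLAIN QUERY PLAN`. This is a toy sketch, not the project's actual node code (the function name and return shape are assumptions):

```python
import sqlite3

def plan_flags(conn: sqlite3.Connection, sql: str) -> dict:
    """Toy version of the Planner's checks: flag full-table scans in
    EXPLAIN QUERY PLAN output and a missing LIMIT clause."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    details = [str(r[-1]) for r in rows]  # last column is the detail text
    has_scan = any(d.startswith("SCAN") for d in details)
    has_limit = "limit" in sql.lower()
    return {
        "full_scan": has_scan,
        "missing_limit": not has_limit,
        # mirror of the pipeline's flag: the plan is acceptable as-is
        "is_optimized": not has_scan and has_limit,
    }
```

A query hitting the primary key with a `LIMIT` passes; a bare `SELECT *` gets flagged on both counts.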
The graph is compiled with checkpointing and `interrupt_before=["reporter"]` so you can resume after user input; see `tests/run_scenarios.py` for a streaming + resume example.
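The Executor's count-before-fetch guard can be sketched in plain `sqlite3`; the function name, threshold, and return shape here are illustrative, not the project's API:

```python
import sqlite3

ROW_LIMIT = 1_000  # illustrative threshold; the real cap lives in the executor node

def guarded_execute(conn: sqlite3.Connection, sql: str, limit: int = ROW_LIMIT):
    """Toy version of the Executor's guard: count rows first, and hand
    control back (for the table-vs-CSV choice) instead of fetching a
    result set that would blow memory."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()
    if count > limit:
        return {"status": "needs_user_choice", "row_count": count}
    return {"status": "done", "rows": conn.execute(sql).fetchall()}
```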
## Stack

- LangGraph — state machine for the pipeline
- LangChain Ollama — `ChatOllama` for local models
- SQLAlchemy — connections, inspection, `EXPLAIN`
- pytest — node and integration-style tests (`make test`)

## Prerequisites

- Python 3.10+ recommended
- Ollama with the models referenced in the nodes (e.g. `llama3.2`, `deepseek-coder:6.7b`); adjust names in `nodes/*.py` to match your install
- A database URL (or use the default SQLite fallback)
Create a `.env` in the project root (see `.gitignore`; do not commit secrets):

```
DATABASE_URL=sqlite:///path/to/your.db
# Or, for example:
# DATABASE_URL=postgresql+psycopg2://user:pass@localhost:5432/dbname
```

If `DATABASE_URL` is unset, `connection.py` falls back to `sqlite:///chinook.db`.
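The fallback amounts to a one-line lookup; a minimal sketch (the function name is illustrative, while the variable name and default match the behavior described above):

```python
import os

def resolve_database_url() -> str:
    """Use DATABASE_URL when set, else fall back to the bundled
    Chinook SQLite file."""
    return os.getenv("DATABASE_URL", "sqlite:///chinook.db")
```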
```sh
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install langgraph langchain-ollama sqlalchemy python-dotenv pytest pytest-cov
```

Ensure Ollama is running and the models you reference are pulled.
```sh
make test
```

For watch mode while developing:

```sh
make watch
```

Project layout:

```
Query-Guardian/
├── main.py        # LangGraph definition + compile + interrupt
├── state.py       # Shared TypedDict state
├── connection.py  # DatabaseManager + global db_manager
├── nodes/         # Pruner, draftsman, planner, optimizer, executor, reporter
├── utils/         # Formatting, DB helpers
└── tests/         # pytest + scenario runner
```
Guard the query, not just the prompt. Plans and row counts are cheap; wrong answers and runaway reads are expensive. Query Guardian keeps the LLM in the loop for rewriting—but lets the database tell the truth about what will actually run.
Built for teams who want NL→SQL without pretending the database is optional.