SQL Agent Security & Analytics Workshop
- An educational codebase walking you through the journey from simple SQL agents to enterprise-grade, secure analytics systems built with LangChain and OpenAI. Step by step, youβll see how to evolve from unsafe prototypes to production-ready solutions.
π― Learning Goals
- This repository demonstrates how to design SQL agents with progressively stronger safeguards. Youβll begin with no restrictions (unsafe) and progress toward fully hardened, analytics-focused implementations. Each script extends the previous one, showcasing real security and reliability improvements.
π Curriculum Flow
-
Foundational LLM β raw language model interaction, no tools
-
Intro SQL Agent β basic database querying with LangChainβs utilities
-
Insecure Agent β examples of unsafe patterns and risks
-
Hardened Agent β applying robust guardrails and validations
-
Analytics Agent β advanced, business-oriented data analysis
π Security Techniques Explored
-
Input validation & defense against SQL injection
-
Whitelist enforcement (only SELECT allowed)
-
Automatic limits to control dataset size
-
Blocking multi-statement chains to reduce exploit risks
-
Error handling with safe, user-friendly messages
-
Schema-based rules to restrict table access
π Analytics Demonstrations
-
Multi-table JOINs for richer business insights
-
Revenue and profitability analysis
-
Cohort & customer lifetime value calculations
-
Time-series exploration for growth trends
-
Product/category performance comparisons
-
Iterative, conversational data exploration
π Getting Started
Requirements
-
Python 3.8+
-
OpenAI API key
-
Basic knowledge of SQL & Python
Setup
``` # 1. Create and activate a virtual environment python -m venv .venv && source .venv/bin/activate # or .venv\Scripts\activate on Windows # 2. Install dependencies pip install -r requirements.txt # 3. Setup environment variables cp .env.example .env # Edit .env and insert your API key: OPENAI_API_KEY=sk-... # 4. Move into the SQLAgent folder cd SQLAgent # 5. (Optional) Re-initialize the database python scripts/reset_db.py # 6. Run the scripts sequentially python scripts/01_simple_agent.py python scripts/02_risky_delete_demo.py python scripts/03_guardrailed_agent.py python scripts/04_complex_queries.py ```
π Project Layout
``` Project Root/ βββ requirements.txt # Shared dependencies βββ .env.example # Template env vars βββ .env # Local secrets (ignored by git) βββ .venv/ # Virtual environment βββ SQLAgent/ βββ sql_agent_class.db # SQLite sample database βββ sql_agent_seed.sql # Schema & seed data βββ README.md # Main guide βββ scripts/ βββ reset_db.py βββ 00_simple_llm.py βββ 01_simple_agent.py βββ 02_risky_delete_demo.py βββ 03_guardrailed_agent.py βββ 04_complex_queries.py ```
π Script Highlights
-
01 β Simple Agent
-
Purpose: first steps with LangChain SQL agents
-
Security: none
-
Risks: can execute any SQL, destructive included
-
Use case: introductory demo
-
02 β Risky Agent Demo
-
Purpose: intentionally insecure for education
-
Features: executes arbitrary SQL, including deletes/drops
-
Lesson: illustrates why guardrails are critical
-
03 β Guardrailed Agent
-
Purpose: secure, production-style agent
-
Features:
-
Regex-based query validation
-
Only allows SELECT statements
-
Auto-injects LIMIT
-
Rejects multi-queries
-
Structured error feedback
-
Outcome: -
-
safe for analytics & BI
-
04 β Advanced Analytics
-
Purpose: business intelligence queries
-
Adds:
-
Revenue tracking
-
Customer segmentation
-
Time-based trend analysis
-
Comparative product analysis
-
Security: inherits safeguards from script 03
ποΈ Database Overview
Tables:
-
customers β customer profiles
-
orders β order history
-
order_items β line-level details
-
products β product catalog
-
payments β payment records
-
refunds β refund tracking
-
Metrics supported:
-
Revenue = (quantity Γ unit price) β refunds
-
CLV = customer total spend β refunds
-
Product & category performance
-
Weekly/monthly trends
π‘οΈ Security Lessons
-
Validate input β reject dangerous keywords & patterns
-
Force read-only β never permit write operations
-
Enforce limits β prevent resource abuse
-
Handle errors gracefully β reveal nothing sensitive
-
Adopt least privilege β analytics DB should be read-only
β Safe for development & training
β Do not use risky agent patterns in real systems
β Use read-only DB credentials in production
β Add monitoring, logging, and regular audits
π§ Troubleshooting
-
Missing packages β activate venv & run pip install -r requirements.txt
-
No API key β edit .env and add OPENAI_API_KEY=...
-
DB missing β run python scripts/reset_db.py
-
Pydantic errors β check Python version (3.8+) and dependency compatibility
π€ Contributing
-
Contributions that improve learning are welcome:
-
Better documentation/examples
-
More analytics queries
-
Additional security techniques
-
Enhanced error-handling patterns
π License
- This project is for educational purposes. Youβre free to reference these ideas in your own work with proper attribution.
π₯ Remember: the key takeaway is understanding the journey β from unsafe to secure SQL agents. Start small, learn the pitfalls, then build systems that are safe, scalable, and insightful.