A comprehensive educational repository demonstrating the progression from basic SQL agents to production-ready, secure analytics systems using LangChain and OpenAI. Learn by example how to build safe, powerful SQL agents for business intelligence.
This repository teaches you to build SQL agents with progressive security implementations, from completely unrestricted (dangerous) to production-ready (secure). Each script builds upon the previous one, showing the evolution of security practices.
- Simple LLM β Pure language model usage without agents or tools
- Basic SQL Agent β Simple database querying with built-in tools
- Dangerous Agent β What NOT to do (security vulnerabilities)
- Secure Agent β Production-ready guardrails and validation
- Analytics Agent β Advanced business intelligence capabilities
- Input validation and SQL injection prevention
- Whitelist-based security (only allow SELECT statements)
- Result set limiting to prevent resource exhaustion
- Multi-statement prevention to block SQL injection attacks
- Error handling and graceful failure modes
- Schema-based restrictions for table access control
- Complex multi-table JOINs for business intelligence
- Revenue analysis and customer lifetime value calculations
- Time-series analysis with date functions and aggregations
- Product performance rankings and trend analysis
- Multi-turn conversations for iterative data exploration
- Python 3.8+
- OpenAI API key
- Basic understanding of SQL and Python
# 1. Create and activate virtual environment (from project root)
python -m venv .venv && source .venv/bin/activate # or .venv\Scripts\activate on Windows
# 2. Install dependencies (from project root)
pip install -r requirements.txt
# 3. Configure environment variables (from project root, if .env doesn't exist)
cp .env.example .env
# Edit .env and add your OpenAI API key: OPENAI_API_KEY=sk-...
# 4. Navigate to the SQLAgent folder
cd SQLAgent
# 5. (Optional) Reset database to initial state
python scripts/reset_db.py
# 6. Run the tutorial scripts in order
python scripts/01_simple_agent.py # Basic SQL agent
python scripts/02_risky_delete_demo.py # β οΈ Dangerous patterns (educational)
python scripts/03_guardrailed_agent.py # Secure implementation
python scripts/04_complex_queries.py # Advanced analytics
Project Root/
βββ βοΈ requirements.txt # Python dependencies (shared by all projects)
βββ π .env.example # Environment variables template
βββ π .env # Your environment variables (with API key)
βββ ποΈ .venv/ # Virtual environment (shared)
βββ π SQLAgent/ # SQL Agent Educational Package
βββ π sql_agent_class.db # Pre-built SQLite database with sample data
βββ π sql_agent_seed.sql # Database schema and seed data (idempotent)
βββ π README.md # This comprehensive guide
βββ π scripts/ # Progressive tutorial scripts
βββ π reset_db.py # Database reset utility
βββ 0οΈβ£ 00_simple_llm.py # Simple LLM usage (no agents/tools)
βββ 1οΈβ£ 01_simple_agent.py # Basic SQL agent implementation
βββ β οΈ 02_risky_delete_demo.py # Dangerous patterns (educational only)
βββ π‘οΈ 03_guardrailed_agent.py # Secure SQL agent with guardrails
βββ π 04_complex_queries.py # Advanced analytics capabilities
Purpose: Introduction to LangChain SQL agents Security Level: β None (unrestricted access) Use Case: Learning basic agent concepts
Key Features:
- Uses LangChain's built-in
create_sql_agent()
- No security restrictions whatsoever
- Can execute ANY SQL including DELETE, DROP, etc.
- Simple demonstration of agent capabilities
- No input validation
- No operation restrictions
- No result set limits
- Not suitable for any production use
Purpose: Educational example of what NOT to do Security Level: β Actively dangerous Use Case: Understanding security vulnerabilities
Key Features:
- Custom tool that executes ANY SQL
- Explicitly allows destructive operations
- Demonstrates actual data deletion
- Shows transaction commit behavior
- Zero input validation or sanitization
- Allows DELETE, DROP, TRUNCATE operations
- No user permissions or access controls
- Direct SQL execution without safety checks
- Automatic transaction commits make changes permanent
Educational Value:
- Shows exactly why guardrails are needed
- Demonstrates real consequences of unrestricted access
- Provides basis for understanding security improvements
Purpose: Production-ready secure SQL agent Security Level: β High (multiple guardrails) Use Case: Safe analytics and reporting
Security Features:
- β Input validation using regex patterns
- β Whitelist approach - only SELECT statements allowed
- β Automatic LIMIT injection to prevent large result sets
- β SQL injection protection through pattern matching
- β Multiple statement prevention to block chained attacks
- β Comprehensive error handling with informative messages
- β Read-only operations only - no data modification possible
Technical Implementation:
- Custom
SafeSQLTool
class with validation layers - Regex-based dangerous operation detection
- Performance optimization through result limiting
- Structured error handling and reporting
Purpose: Business intelligence and complex analytics Security Level: β High (inherits all guardrails from script 03) Use Case: Production analytics and reporting systems
Advanced Features:
- π Complex multi-table JOINs for comprehensive analysis
- π Revenue analysis with sophisticated business logic
- π Time-series analysis for trend identification
- π₯ Customer segmentation and lifetime value calculations
- π Performance rankings and comparative analysis
- π¬ Multi-turn conversations for iterative data exploration
Analytics Capabilities Demonstrated:
-
Product revenue analysis with cross-table aggregations
-
Weekly trend analysis using date functions
-
Customer lifecycle analysis with segmentation
-
Lifetime value rankings with complex calculations
-
Category performance analysis with drill-down capabilities

The included SQLite database contains realistic e-commerce data:
customers
- Customer information (id, name, email, created_at, region)orders
- Order records (id, customer_id, order_date, status)order_items
- Line items (id, order_id, product_id, quantity, unit_price_cents)products
- Product catalog (id, name, category, description)payments
- Payment records (id, order_id, amount_cents, method)refunds
- Refund tracking (id, order_id, amount_cents, reason)
- Revenue calculation:
sum(quantity * unit_price_cents) - refunds.amount_cents
- Customer lifetime value: Total revenue per customer minus refunds
- Product performance: Revenue and units sold by product/category
- Time-series analysis: Weekly/monthly revenue trends
# Multi-layer validation approach
if re.search(r"\b(INSERT|UPDATE|DELETE|DROP|TRUNCATE|ALTER|CREATE|REPLACE)\b", s, re.I):
return "ERROR: write operations are not allowed."
if ";" in s:
return "ERROR: multiple statements are not allowed."
if not re.match(r"(?is)^\s*select\b", s):
return "ERROR: only SELECT statements are allowed."
# Automatic LIMIT injection for performance
if not re.search(r"\blimit\s+\d+\b", s, re.I):
s += " LIMIT 200" # Conservative limit
# Comprehensive error catching and reporting
try:
result = conn.exec_driver_sql(s)
# ... process results
except Exception as e:
return f"ERROR: {e}" # Safe error reporting
-
Start with
01_simple_agent.py
- Understand basic LangChain SQL agent concepts
- See how easy it is to create an unrestricted agent
- Note the security implications
-
Run
02_risky_delete_demo.py
(carefully!)- Observe how dangerous unrestricted access can be
- See actual data deletion in action
- Understand why security measures are essential
-
Study
03_guardrailed_agent.py
- Learn comprehensive security implementation
- Understand each validation layer
- See how to maintain functionality while adding security
-
Explore
04_complex_queries.py
- Discover advanced analytics capabilities
- Learn business intelligence patterns
- Practice with multi-turn conversations
- Security mindset: Always validate and restrict agent capabilities
- Progressive enhancement: Build security incrementally
- Error handling: Provide helpful feedback without exposing vulnerabilities
- Performance consideration: Limit result sets and prevent resource exhaustion
- Business context: Understand the domain for better analytics
- β Use the provided SQLite database for learning
- β Test all security features thoroughly
- β Understand each validation layer before proceeding
- β NEVER use the dangerous patterns from script 02 in production
- β Always use read-only database users for analytics agents
- β Implement comprehensive logging and monitoring
- β Regular security audits and testing
- β Schema-based access controls and table restrictions
- Database isolation: Use dedicated analytics databases
- User permissions: Implement least-privilege access
- Monitoring: Log all agent queries and results
- Rate limiting: Prevent resource exhaustion attacks
- Regular audits: Review agent behavior and query patterns
"ModuleNotFoundError" for LangChain packages
# Ensure virtual environment is activated and dependencies installed
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
"OpenAI API key not found"
# Verify .env file configuration
cp .env.example .env
# Edit .env and add: OPENAI_API_KEY=sk-your-key-here
"Database file not found"
# Reset the database
python scripts/reset_db.py
Pydantic validation errors
- Ensure you're using Python 3.8+ with compatible package versions
- Check that type annotations are properly formatted
- Check the comprehensive inline documentation in each script
- Review error messages carefully - they provide specific guidance
- Test with simple queries first before attempting complex analytics
- Use the Jupyter notebook for interactive exploration
This is an educational repository. Contributions that enhance the learning experience are welcome:
- Documentation improvements: Clearer explanations or additional examples
- Security enhancements: Additional validation patterns or safety measures
- Analytics examples: More sophisticated business intelligence queries
- Error handling: Better user experience and debugging information
Educational use encouraged. Please reference this repository when using the patterns in your own projects.
π― Remember: The goal is to understand the progression from dangerous to secure SQL agent implementations. Start with the basics, understand the risks, then build robust, production-ready solutions.