A modern, production-ready Text-to-SQL system built with LangChain, LangGraph, and FastAPI. Designed for social science research with NORP (https://norpanel.org/) datasets including crime, homelessness, population, and economic data. It converts natural language questions into SQL queries, with conversational context and session management.
- LangGraph-powered SQL Agent - Intelligent workflow for reliable SQL generation
- Async-First Architecture - Built for performance with async/await throughout
- SQLite Integration - Lightweight, fast database operations with aiosqlite
- NORP Social Science Data - Pre-loaded with US shootings, NYC crime, homelessness, economic, and population datasets
- Conversational Interface - Maintains context across questions with Redis sessions
- Multi-LLM Support - Works with OpenAI, Anthropic, and Together AI
- Type Safety - Full type hints and Pydantic validation
- Health Monitoring - Built-in health checks and structured logging
- Comprehensive Testing - Unit and integration tests with pytest
- REST API - Complete FastAPI application with automatic docs
- Python 3.11 or higher
- uv package manager
- Redis server (optional, for session management)
# Clone the repository
git clone https://github.com/bharatr21/text2sql.git
cd text2sql
# Install dependencies with uv
uv sync
# Activate the virtual environment
source .venv/bin/activate # Linux/macOS
# or
.venv\Scripts\activate # Windows
- Copy the example environment file:
cp .env.example .env
- Edit .env with your settings:
# Database
DB_URL=sqlite:///data/sample.db
# LLM Configuration
LLM_PROVIDER=openai
LLM_MODEL=gpt-4
OPENAI_API_KEY=your-api-key-here
# Optional: Redis for sessions
REDIS_HOST=localhost
REDIS_PORT=6379
# Create a sample SQLite database with NORP social science data
uv run text2sql create-db --db-path data/sample.db
# View available sample questions
uv run text2sql sample-questions
# Development mode with auto-reload
uv run text2sql serve --reload
# Production mode
uv run text2sql serve --host 0.0.0.0 --port 8000 --workers 4
The API will be available at http://localhost:8000 with interactive docs at http://localhost:8000/docs.
# Send a query via CLI
uv run text2sql query --question "How many shooting incidents occurred in New York?"
# View sample questions
uv run text2sql sample-questions --limit 5
# Check system health
uv run text2sql health
# List active sessions
uv run text2sql sessions
# Get session details
uv run text2sql sessions --session-id your-session-id
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{
"question": "Which state had the highest number of victims killed in shootings?",
"session_id": "user-123",
"message_type": "human"
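}'
import asyncio

import httpx


async def main() -> None:
    # Post a question to the running API and print the JSON response.
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/query",
            json={
                "question": "What is the number of individuals experiencing homelessness in California?",
                "session_id": "user-123",
                "message_type": "human",
            },
        )
        # Fail loudly on 4xx/5xx responses instead of printing an error body.
        response.raise_for_status()
        print(response.json())


# "async with" is only valid inside a coroutine, so run it through an event loop.
asyncio.run(main())
Try these questions with the NORP sample database: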
Population & Demographics:
- "What was the population of California in 2020?"
- "Which state had the highest population in 2020?"
- "What is the population of Los Angeles County?"
Crime & Public Safety:
- "How many shooting incidents occurred in New York?"
- "Which state had the highest number of victims killed in shootings?"
- "Get all criminal records from NYC where the crime classification is 'Felony'"
- "How many incidents of Assault occurred in Manhattan?"
Social Issues:
- "What is the number of individuals experiencing homelessness in California?"
- "Which counties have the highest poverty rates?"
- "List the top 5 locations with the highest number of homeless individuals"
Correlational Analysis:
- "Compare the population of New York and Texas over the last three census years"
- "Get the number of shooting incidents per 1 million residents in each state"
- "Which areas with high homelessness rates also have high crime rates?"
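As a rough illustration of what a correlational question such as "Get the number of shooting incidents per 1 million residents in each state" could translate to, the sketch below runs one possible query against an in-memory SQLite database. The table names come from this README, but the column names (state, year, population) and the inserted rows are assumptions made up for the demo; the real NORP schema and the SQL the agent actually generates may differ.

```python
import sqlite3

# Hypothetical, simplified schemas: the real NORP tables may use different columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE us_shootings (incident_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE us_population (state TEXT, year INTEGER, population INTEGER);
    -- Toy rows for illustration only, not real statistics.
    INSERT INTO us_shootings (state) VALUES ('New York'), ('New York'), ('Texas');
    INSERT INTO us_population VALUES ('New York', 2020, 20000000), ('Texas', 2020, 30000000);
    """
)

# Count incidents per state and scale by that state's population for one year.
PER_MILLION_SQL = """
    SELECT s.state,
           COUNT(*) * 1000000.0 / p.population AS incidents_per_million
    FROM us_shootings AS s
    JOIN us_population AS p
      ON p.state = s.state AND p.year = 2020
    GROUP BY s.state, p.population
    ORDER BY incidents_per_million DESC
"""

rows = conn.execute(PER_MILLION_SQL).fetchall()
for state, rate in rows:
    print(f"{state}: {rate:.3f} incidents per 1M residents")
```

Pinning the population to a single year avoids multiplying incident counts by the number of census years in the join, which is a common pitfall for this kind of rate query.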
src/text2sql/
├── core/ # App configuration, logging, FastAPI setup
├── agents/ # LangGraph SQL agent
├── services/ # Database, LLM, Redis, Session services
├── models/ # Pydantic schemas
├── utils/ # Database utilities and helpers
└── cli.py # Command-line interface
tests/
├── unit/ # Unit tests
├── integration/ # Integration tests
└── conftest.py # Test configuration
- SQLAgent: LangGraph state machine for SQL generation workflow
- DatabaseService: Async SQLAlchemy 2.0 database operations
- LLMService: Multi-provider LLM interface (OpenAI, Anthropic, Together)
- SessionService: Redis-based conversation management
- FastAPI App: Modern async REST API with dependency injection
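To make the "state machine for SQL generation" idea more concrete, here is a minimal plain-Python sketch of a generate, validate, execute loop. It does not use LangGraph and is not the project's SQLAgent: the AgentState fields, the step names, and the hard-coded query standing in for the LLM call are all assumptions chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AgentState:
    """Data passed between workflow steps (hypothetical shape)."""

    question: str
    sql: Optional[str] = None
    rows: List[Tuple] = field(default_factory=list)
    error: Optional[str] = None


def generate_sql(state: AgentState) -> str:
    # Stand-in for the LLM call: returns a fixed query instead of calling a model.
    state.sql = "SELECT COUNT(*) FROM us_shootings WHERE state = 'New York'"
    return "validate"


def validate_sql(state: AgentState) -> str:
    # Allow read-only statements only; anything else ends the run with an error.
    if state.sql is None or not state.sql.lstrip().upper().startswith("SELECT"):
        state.error = "only SELECT statements are allowed"
        return "end"
    return "execute"


def make_execute_step(conn: sqlite3.Connection) -> Callable[[AgentState], str]:
    def execute_sql(state: AgentState) -> str:
        # Run the validated query and record either the rows or the error.
        try:
            state.rows = conn.execute(state.sql).fetchall()
        except sqlite3.Error as exc:
            state.error = str(exc)
        return "end"

    return execute_sql


def run_workflow(question: str, steps: Dict[str, Callable[[AgentState], str]]) -> AgentState:
    # Each step mutates the state and returns the name of the next step.
    state = AgentState(question=question)
    current = "generate"
    while current != "end":
        current = steps[current](state)
    return state


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE us_shootings (state TEXT);
    INSERT INTO us_shootings VALUES ('New York'), ('New York'), ('Texas');
    """
)
steps = {
    "generate": generate_sql,
    "validate": validate_sql,
    "execute": make_execute_step(conn),
}
result = run_workflow("How many shooting incidents occurred in New York?", steps)
print(result.sql, result.rows, result.error)
```

Keeping validation as its own step means a rejected query never reaches the database, and each step can be tested in isolation.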
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=src/text2sql --cov-report=html
# Run only unit tests
uv run pytest tests/unit/ -m unit
# Run only integration tests
uv run pytest tests/integration/ -m integration
# Linting
uv run ruff check src/ tests/
# Auto-fix issues
uv run ruff check --fix src/ tests/
# Type checking
uv run mypy src/
# Format code
uv run ruff format src/ tests/
To add a new LLM provider:
- Install the provider's LangChain integration
- Add provider configuration to LLMSettings
- Implement provider initialization in LLMService._create_*_llm()
- Update tests and documentation
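The provider steps above suggest a dispatch from a provider name to a creation function. The sketch below shows one way such a dispatch could look, using a small registry. Everything in it is hypothetical: register_provider, create_llm, the "example" provider, and the dictionary returned in place of a real LangChain chat model are illustrative names, not the project's LLMService API.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a provider name to a factory function.
_PROVIDERS: Dict[str, Callable[[str], object]] = {}


def register_provider(name: str) -> Callable[[Callable[[str], object]], Callable[[str], object]]:
    """Decorator that records a factory under a lowercase provider name."""

    def decorator(factory: Callable[[str], object]) -> Callable[[str], object]:
        _PROVIDERS[name.lower()] = factory
        return factory

    return decorator


@register_provider("example")
def _create_example_llm(model: str) -> object:
    # A real implementation would build and return the provider's
    # LangChain chat model here; a dict keeps the sketch dependency-free.
    return {"provider": "example", "model": model}


def create_llm(provider: str, model: str) -> object:
    """Look up the factory for the provider and build the client."""
    try:
        factory = _PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(f"Unsupported LLM provider: {provider!r}") from None
    return factory(model)


llm = create_llm("Example", "example-model-1")
print(llm)
```

With this shape, supporting another provider means adding one decorated factory, and an unknown LLM_PROVIDER value fails early with a clear error.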
The system includes comprehensive social science datasets from NORP:
Available Tables:
- us_shootings - Gun violence incidents across the US with casualty data
- nyc_crime_data - New York City crime incidents with classifications and locations
- homelessness_demographics - Homelessness counts by state, year, and age group
- economic_income_and_benefits - Household income data by zipcode and year
- us_population - State population data from US Census
- us_population_county - County-level population data
- food_access - Food security and access metrics by census tract
While optimized for SQLite, the system can work with other databases:
- Install appropriate async driver (e.g., asyncpg for PostgreSQL)
- Update DB_URL in configuration
- Modify DatabaseService.async_engine for database-specific settings
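When switching databases, the DB_URL typically has to name an async driver before SQLAlchemy's async engine will accept it. The helper below is a sketch of that mapping, assuming SQLAlchemy-style URLs of the form dialect+driver://...; to_async_url and ASYNC_DRIVERS are hypothetical names, and whether the project performs this conversion itself is not stated in this README.

```python
# Default async driver for each dialect mentioned in this README.
ASYNC_DRIVERS = {
    "sqlite": "aiosqlite",
    "postgresql": "asyncpg",
}


def to_async_url(db_url: str) -> str:
    """Return db_url with an explicit async driver in its scheme."""
    scheme, sep, rest = db_url.partition("://")
    if not sep:
        raise ValueError(f"Not a database URL: {db_url!r}")

    dialect, plus, _driver = scheme.partition("+")
    if plus:
        # A driver is already specified, so leave the URL untouched.
        return db_url
    if dialect not in ASYNC_DRIVERS:
        raise ValueError(f"No async driver known for dialect {dialect!r}")
    return f"{dialect}+{ASYNC_DRIVERS[dialect]}://{rest}"


print(to_async_url("sqlite:///data/sample.db"))
print(to_async_url("postgresql://user:secret@localhost:5432/norp"))
```

The PostgreSQL URL in the usage lines is a placeholder; substitute your own host, credentials, and database name.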
Once the server is running, visit:
- Interactive API Docs: http://localhost:8000/docs
- ReDoc Documentation: http://localhost:8000/redoc
- OpenAPI Schema: http://localhost:8000/openapi.json
- POST /query - Execute natural language query
- GET /health - System health check
- GET /sessions - List active sessions
- GET /sessions/{id}/history - Get conversation history
- GET /database/tables - List database tables
- GET /database/schema - Get database schema
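For scripts that should not depend on httpx, the endpoints above can also be reached with the standard library. The sketch below builds requests for POST /query and GET /sessions/{id}/history and keeps sending separate so the request construction can be checked without a running server. The helper names are hypothetical, the payload fields mirror the curl example in this README, and the shape of the JSON responses is not documented here, so the code returns them undecoded beyond JSON parsing.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # default address from this README


def build_query_request(question: str, session_id: str, base_url: str = BASE_URL) -> Request:
    """Build a POST /query request carrying the question as JSON."""
    payload = {
        "question": question,
        "session_id": session_id,
        "message_type": "human",
    }
    return Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_history_request(session_id: str, base_url: str = BASE_URL) -> Request:
    """Build a GET /sessions/{id}/history request with the id URL-escaped."""
    return Request(f"{base_url}/sessions/{quote(session_id, safe='')}/history", method="GET")


def send(request: Request, timeout: float = 30.0) -> object:
    """Send a request to a running server and parse the JSON body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


query_request = build_query_request("How many shooting incidents occurred in New York?", "user-123")
history_request = build_history_request("user-123")
print(query_request.get_method(), query_request.full_url)
print(history_request.get_method(), history_request.full_url)
```

With the server started, send(query_request) followed by send(history_request) would submit a question and then fetch that session's conversation history.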
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests (uv run pytest)
- Run code quality checks (uv run ruff check src/)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the excellent LLM framework
- LangGraph for agent workflow orchestration
- FastAPI for the modern web framework
- uv for fast Python package management
Bharat Raghunathan - bharatraghunthan9767@gmail.com
Project Link: https://github.com/bharatr21/text2sql