A hands-on RAG (Retrieval-Augmented Generation) application that transforms natural language into SQL queries.
Perfect for learning RAG, LangChain, and LLM-powered database interactions.
This is a complete RAG application that demonstrates how to build an intelligent system that:
- Understands natural language questions
- Generates SQL queries automatically
- Executes queries on a real estate database
- Returns human-friendly answers
Perfect for students learning:
- 🤖 Retrieval-Augmented Generation (RAG)
- 🔗 LangChain framework
- 💬 LLM prompt engineering
- 🗄️ SQL generation from natural language
- 🌐 Building full-stack AI applications
| Feature | Description |
|---|---|
| 🧠 Intelligent Query Generation | Converts natural language to SQL using GPT-4o-mini |
| 💾 SQLite Database | Embedded database with real estate data (properties, agents, clients) |
| 🎨 Integrated Frontend & Backend | Streamlit UI communicates with FastAPI REST API |
| 🔄 RAG Pipeline | Complete RAG implementation with LangChain |
| 📊 Real Data | Pre-seeded with sample real estate listings |
| 🚀 Production-Style Structure | Modular architecture and error handling; authentication, rate limiting, logging, and monitoring are not included |
┌─────────────────┐
│   User Query    │  "Show me houses with 3 bedrooms"
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Streamlit Frontend  │  Web UI (http://localhost:8501)
└──────────┬───────────┘
           │ HTTP POST /api/query
           ▼
┌──────────────────────┐
│   FastAPI Backend    │  REST API (http://localhost:8000)
└──────────┬───────────┘
           │
           ▼
┌─────────────────────────────────────┐
│       LangChain RAG Pipeline        │
│  ┌───────────────────────────────┐  │
│  │ 1. LLM generates SQL query    │  │
│  │    from natural language      │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 2. Execute SQL on database    │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 3. LLM formats results into   │  │
│  │    natural language answer    │  │
│  └──────────────┬────────────────┘  │
└─────────────────┼───────────────────┘
                  │
                  ▼
          ┌────────────────┐
          │ User-friendly  │
          │     Answer     │
          └────────────────┘
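The three pipeline stages in the diagram can be sketched as a single function that takes the two LLM steps as plain callables. This is a minimal illustration, not the project's `app/rag/chain.py`: the names `answer_question`, `fake_generate_sql`, and `fake_format_answer` are hypothetical, the LLM is replaced by hard-coded stand-ins so the code runs offline, and the table is a simplified in-memory version of the real one with one made-up placeholder row.

```python
import sqlite3
from typing import Callable


def answer_question(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str], str],
    format_answer: Callable[[str, list], str],
) -> str:
    """Run the three stages from the diagram in order."""
    sql = generate_sql(question)          # 1. LLM generates SQL (stubbed here)
    rows = conn.execute(sql).fetchall()   # 2. Execute SQL on the database
    return format_answer(question, rows)  # 3. LLM formats the rows (stubbed here)


def fake_generate_sql(question: str) -> str:
    # Stand-in for the LLM call: always returns the same query.
    return (
        "SELECT address, price FROM properties "
        "WHERE property_type = 'house' AND bedrooms = 3 ORDER BY price"
    )


def fake_format_answer(question: str, rows: list) -> str:
    # Stand-in for the LLM call: plain string formatting.
    listing = " and ".join(f"{address} (${price:,})" for address, price in rows)
    return f"I found {len(rows)} houses with 3 bedrooms: {listing}"


# Simplified in-memory table; the last row is a made-up non-matching placeholder.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE properties (address TEXT, property_type TEXT, bedrooms INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?, ?)",
    [
        ("123 Oak Street", "house", 3, 250_000),
        ("987 Birch Boulevard", "house", 3, 280_000),
        ("1 Placeholder Lane", "apartment", 2, 150_000),
    ],
)

print(
    answer_question(
        "Show me houses with 3 bedrooms", conn, fake_generate_sql, fake_format_answer
    )
)
```

Passing the LLM steps in as callables keeps the orchestration testable without an API key; in the real application those two roles would be filled by model calls.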
| Category | Technology | Purpose |
|---|---|---|
| 🤖 AI/ML | LangChain | RAG pipeline orchestration |
| | OpenAI GPT-4o-mini | LLM for SQL generation & formatting |
| 🌐 Backend | FastAPI | REST API server |
| 💻 Frontend | Streamlit | Interactive web interface |
| | httpx | HTTP client for API communication |
| 🗄️ Database | SQLite | Embedded database |
| | SQLAlchemy | ORM and database management |
| ⚙️ Tools | uv | Fast Python package manager |
| | Python 3.11+ | Programming language |
rag-database-chat/
├── app/
│ ├── __init__.py
│ ├── config.py # ⚙️ Configuration & environment variables
│ ├── streamlit_app.py # 🎨 Streamlit web interface
│ │
│ ├── api/
│ │ ├── __init__.py
│ │ └── main.py # 🚀 FastAPI REST endpoints
│ │
│ ├── database/
│ │ ├── __init__.py
│ │ ├── models.py # 📊 SQLAlchemy models (Property, Agent, Client)
│ │ ├── session.py # 🔌 Database connection management
│ │ └── seed.py # 🌱 Sample data seeding script
│ │
│ └── rag/
│ ├── __init__.py
│ └── chain.py # 🧠 RAG pipeline implementation
│
├── pyproject.toml # 📋 Dependencies & project config
├── .gitignore
└── README.md
- Python 3.11+ installed
- OpenAI API key
- uv package manager (we'll install it if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
cd rag-database-chat
uv sync

This will create a virtual environment and install all required packages.
Create a .env file in the root directory:
OPENAI_API_KEY=sk-your-api-key-here
DATABASE_URL=sqlite:///./real_estate.db
API_BASE_URL=http://localhost:8000

💡 Note: API_BASE_URL is optional and defaults to http://localhost:8000 if not specified.

💡 Tip: Never commit your .env file! It's already in .gitignore.
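To illustrate how these three variables might be consumed, here is a minimal sketch of a settings loader. It is not the project's `app/config.py`: the `Settings` class and `load_settings` function are hypothetical names, the sketch assumes the `.env` values have already been exported into the process environment (it does not parse the file itself), and the `DATABASE_URL` fallback is an assumption that simply mirrors the example value above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    database_url: str
    api_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style variables."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early with a clear message instead of at the first LLM call.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        # Assumed fallback, copied from the example .env above.
        database_url=env.get("DATABASE_URL", "sqlite:///./real_estate.db"),
        # Documented default when API_BASE_URL is omitted.
        api_base_url=env.get("API_BASE_URL", "http://localhost:8000"),
    )


# Passing a plain dict makes the loader easy to try without touching the real environment.
settings = load_settings({"OPENAI_API_KEY": "sk-your-api-key-here"})
print(settings.api_base_url)
```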
Seed the database with sample real estate data:
uv run python -m app.database.seed

You should see:
Database seeded successfully!
Created 3 agents, 6 properties, and 3 clients.
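For readers curious what a seeding step involves, the following is a rough sketch using the standard library `sqlite3` module rather than the project's SQLAlchemy models. The `seed` function, the reduced column set, and every row value are placeholders invented for illustration; the real `app/database/seed.py` and its sample listings may look quite different.

```python
import sqlite3


def seed(conn: sqlite3.Connection) -> str:
    """Create simplified tables and insert placeholder rows."""
    conn.executescript(
        """
        CREATE TABLE agents (
            id INTEGER PRIMARY KEY, name TEXT, license_number TEXT UNIQUE
        );
        CREATE TABLE properties (
            id INTEGER PRIMARY KEY, address TEXT, property_type TEXT,
            bedrooms INTEGER, price REAL, status TEXT,
            agent_id INTEGER REFERENCES agents(id)
        );
        CREATE TABLE clients (
            id INTEGER PRIMARY KEY, name TEXT,
            agent_id INTEGER REFERENCES agents(id)
        );
        """
    )
    # Placeholder rows, not the project's real sample data.
    agents = [(i, f"Agent {i}", f"LIC-{i:03d}") for i in range(1, 4)]
    properties = [
        (i, f"{i} Example Street", "house", 3, 100_000.0 * i, "available", (i % 3) + 1)
        for i in range(1, 7)
    ]
    clients = [(i, f"Client {i}", i) for i in range(1, 4)]

    conn.executemany("INSERT INTO agents VALUES (?, ?, ?)", agents)
    conn.executemany("INSERT INTO properties VALUES (?, ?, ?, ?, ?, ?, ?)", properties)
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", clients)
    conn.commit()
    return (
        f"Created {len(agents)} agents, {len(properties)} properties, "
        f"and {len(clients)} clients."
    )


conn = sqlite3.connect(":memory:")
print(seed(conn))
```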
First, start the FastAPI backend server:
uv run uvicorn app.api.main:app --reload

The API will be available at http://localhost:8000
⚠️ Important: Keep this terminal running. The Streamlit frontend requires the backend to be running to function properly.
In a new terminal, launch the interactive web interface:
uv run streamlit run app/streamlit_app.py

Then open your browser to http://localhost:8501
💡 Note: The Streamlit frontend communicates with the FastAPI backend via HTTP requests. Make sure the backend is running first, or you'll see connection errors in the UI.
Features:
- 💬 Chat interface for natural language queries
- 🔌 Backend connection status indicator
- ⚙️ Configurable API URL in sidebar
- 📊 Database initialization button
- 🎨 Clean, modern UI
- 📝 Chat history
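A backend connection status indicator like the one listed above could be built on the `/health` endpoint. The sketch below is one possible approach, not the code in `app/streamlit_app.py`: the function name `backend_is_up` is hypothetical, and it uses the standard library `urllib` instead of the project's `httpx` so that it runs without extra dependencies.

```python
import urllib.request


def backend_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, DNS failures, timeouts, and HTTP error
        # statuses (urllib.error.URLError and HTTPError are OSError subclasses).
        return False


status = "connected" if backend_is_up("http://localhost:8000") else "backend not reachable"
print(status)
```

Checking the backend before sending a query lets the UI show a clear status message instead of surfacing a raw connection error.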
If you prefer to use the API directly without the Streamlit interface, you can interact with the FastAPI endpoints:
Interactive API Docs: Visit http://localhost:8000/docs for Swagger UI
| Method | Endpoint | Description |
|---|---|---|
| GET | / | API information |
| GET | /health | Health check |
| POST | /api/query | Query the database |
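The `POST /api/query` endpoint in the table above can also be called from Python. The following sketch assumes the request body is a JSON object with a `question` field and the response carries an `answer` field, matching the example request and response in this section. The helper names `build_query_request` and `ask` are hypothetical, and standard library `urllib` stands in for `httpx`.

```python
import json
import urllib.request


def build_query_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /api/query request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000") -> str:
    """Send the question and return the "answer" field of the JSON response."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read())["answer"]


req = build_query_request("Which agent has the most properties?")
print(req.get_method(), req.full_url)
# With the backend running: print(ask("Which agent has the most properties?"))
```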
curl -X POST "http://localhost:8000/api/query" \
-H "Content-Type: application/json" \
-d '{"question": "What properties are available?"}'

Response:
{
  "answer": "There are 4 available properties in the database..."
}

Try asking these questions to see RAG in action:

- "What properties are available?"
- "Show me all houses"
- "List all agents"
- "How many properties do we have?"

- "Show me houses with 3 bedrooms"
- "Find properties under $300,000"
- "What properties are in Springfield?"
- "Show me available apartments"

- "What's the average price of properties?"
- "What's the most expensive property?"
- "How many properties does each agent have?"
- "What's the total value of all properties?"

- "Show me properties with more than 2 bedrooms and price less than $300,000"
- "Which agent has the most properties?"
- "What's the price range of houses in Springfield?"
- User Input → Natural language question
  "Show me houses with 3 bedrooms"
- SQL Generation → LLM converts question to SQL
  SELECT * FROM properties WHERE property_type = 'house' AND bedrooms = 3;
- Query Execution → SQL runs on SQLite database
  Returns: 2 properties matching criteria
- Result Formatting → LLM formats results naturally
  "I found 2 houses with 3 bedrooms: 123 Oak Street ($250,000) and 987 Birch Boulevard ($280,000)..."
- app/streamlit_app.py: Streamlit frontend
  - Interactive chat interface
  - HTTP client for backend communication
  - Connection status monitoring
- app/api/main.py: FastAPI backend
  - REST API endpoints (/api/query, /health)
  - Request/response handling
  - CORS middleware configuration
- app/rag/chain.py: Core RAG implementation
  - SQL query generation using LLM
  - Database schema awareness
  - Natural language result formatting
- app/database/models.py: Database schema
  - Properties, Agents, Clients tables
  - Relationships and constraints
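"Database schema awareness" generally means the LLM sees the table definitions before it writes SQL. One way this could work is sketched below: read the `CREATE TABLE` statements from SQLite's catalog and embed them in the prompt. The function names and the prompt wording are hypothetical, and the two tables are simplified stand-ins; the project's `app/rag/chain.py` may take a different route, for example through LangChain utilities.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in SQLite's catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_sql_prompt(schema: str, question: str) -> str:
    """Hypothetical prompt template; the project's actual wording may differ."""
    return (
        "You write SQLite queries.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "Reply with a single SELECT statement and nothing else."
    )


# Simplified stand-in tables for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE properties (id INTEGER PRIMARY KEY, price REAL, agent_id INTEGER)"
)

prompt = build_sql_prompt(
    describe_schema(conn), "How many properties does each agent have?"
)
print(prompt)
```

Reading the schema at request time keeps the prompt in sync with the database, at the cost of one extra catalog query per call.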
Properties:
- id (Primary Key)
- address, city, state, zip_code
- property_type (house, apartment, condo)
- bedrooms, bathrooms, square_feet
- price, status (available, sold, pending)
- description, year_built, lot_size
- agent_id (Foreign Key)
- created_at, updated_at

Agents:
- id (Primary Key)
- name, email, phone
- license_number (Unique)
- created_at

Clients:
- id (Primary Key)
- name, email, phone
- budget_min, budget_max
- preferred_location
- agent_id (Foreign Key)
- created_at

By exploring this project, you'll learn:
✅ RAG Fundamentals
- How to combine retrieval (database queries) with generation (LLM)
- Building end-to-end RAG pipelines
✅ LangChain Patterns
- Creating custom chains
- Prompt engineering
- LLM integration
✅ SQL Generation
- Converting natural language to SQL
- Handling database schemas
- Error handling in query generation
✅ Full-Stack AI Apps
- Building APIs for AI services
- Creating interactive UIs
- Frontend-backend communication
- Managing state and sessions
✅ Best Practices
- Modular code organization
- Environment configuration
- Error handling
- Type hints and documentation
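As one illustration of the "error handling in query generation" item above, generated SQL can be checked before it reaches the database. The sketch below is a deliberately naive guard, not the project's implementation: `run_readonly` is a hypothetical name, the check is a simple string test (it would wrongly reject a query containing `;` inside a string literal), and it is no substitute for database-level permissions.

```python
import sqlite3


def run_readonly(conn: sqlite3.Connection, sql: str) -> list:
    """Execute sql only if it looks like a single SELECT; raise ValueError otherwise."""
    statement = sql.strip().rstrip(";").strip()
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    try:
        return conn.execute(statement).fetchall()
    except sqlite3.Error as exc:
        # Surface bad SQL (unknown table, syntax error) as one error type
        # that an API layer could turn into a friendly message.
        raise ValueError(f"generated SQL failed: {exc}") from exc


# Simplified stand-in table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (price REAL)")
conn.execute("INSERT INTO properties VALUES (250000)")

print(run_readonly(conn, "SELECT COUNT(*) FROM properties;"))
```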
uv run pytest

uv run ruff format .
uv run ruff check .

# Run Streamlit app
./run_streamlit.sh
# Run FastAPI server
./run_api.sh

Q: Why SQLite instead of PostgreSQL/MySQL?
A: SQLite fits a learning project: it is embedded, needs no setup, and works well for small to medium datasets.
Q: Can I use a different LLM?
A: Yes! LangChain supports many providers. Just change the LLM initialization in app/rag/chain.py.
Q: How do I add more data?
A: Modify app/database/seed.py or use SQLAlchemy to insert data programmatically.
Q: Is this production-ready?
A: This is a learning project. For production, add authentication, rate limiting, logging, and monitoring.
MIT License - see LICENSE file for details
Built with ❤️ for students learning AI and generative models.
Happy Learning! 🚀
⭐ Star this repo if you found it helpful!
