🏠 Real Estate RAG Database Chat

A hands-on RAG (Retrieval-Augmented Generation) application that transforms natural language into SQL queries

Perfect for learning RAG, LangChain, and LLM-powered database interactions

🎯 What is This Project?

This is a complete RAG application that demonstrates how to build an intelligent system that:

Understands natural language questions
Generates SQL queries automatically
Executes queries on a real estate database
Returns human-friendly answers

Perfect for students learning:

🤖 Retrieval-Augmented Generation (RAG)
🔗 LangChain framework
💬 LLM prompt engineering
🗄️ SQL generation from natural language
🌐 Building full-stack AI applications

✨ Key Features

Feature	Description
🧠 Intelligent Query Generation	Converts natural language to SQL using GPT-4o-mini
💾 SQLite Database	Embedded database with real estate data (properties, agents, clients)
🎨 Integrated Frontend & Backend	Streamlit UI communicates with FastAPI REST API
🔄 RAG Pipeline	Complete RAG implementation with LangChain
📊 Real Data	Pre-seeded with sample real estate listings
🚀 Production Ready	Modular architecture, error handling, and best practices

🏗️ Architecture Overview

┌─────────────────┐
│   User Query    │  "Show me houses with 3 bedrooms"
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Streamlit Frontend  │  Web UI (http://localhost:8501)
└──────────┬───────────┘
           │ HTTP POST /api/query
           ▼
┌──────────────────────┐
│  FastAPI Backend     │  REST API (http://localhost:8000)
└──────────┬───────────┘
           │
           ▼
┌─────────────────────────────────────┐
│      LangChain RAG Pipeline         │
│  ┌───────────────────────────────┐  │
│  │  1. LLM generates SQL query   │  │
│  │     from natural language     │  │
│  └──────────────┬────────────────┘  │
│                 │                    │
│  ┌──────────────▼────────────────┐  │
│  │  2. Execute SQL on database   │  │
│  └──────────────┬────────────────┘  │
│                 │                    │
│  ┌──────────────▼────────────────┐  │
│  │  3. LLM formats results into  │  │
│  │     natural language answer   │  │
│  └──────────────┬────────────────┘  │
└─────────────────┼────────────────────┘
                  │
                  ▼
         ┌────────────────┐
         │  User-friendly │
         │     Answer     │
         └────────────────┘

🛠️ Tech Stack

Category	Technology	Purpose
🤖 AI/ML	LangChain	RAG pipeline orchestration
	OpenAI GPT-4o-mini	LLM for SQL generation & formatting
🌐 Backend	FastAPI	REST API server
💻 Frontend	Streamlit	Interactive web interface
	httpx	HTTP client for API communication
🗄️ Database	SQLite	Embedded database
	SQLAlchemy	ORM and database management
⚙️ Tools	uv	Fast Python package manager
	Python 3.11+	Programming language

📦 Project Structure

rag-database-chat/
├── app/
│   ├── __init__.py
│   ├── config.py              # ⚙️ Configuration & environment variables
│   ├── streamlit_app.py       # 🎨 Streamlit web interface
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   └── main.py            # 🚀 FastAPI REST endpoints
│   │
│   ├── database/
│   │   ├── __init__.py
│   │   ├── models.py          # 📊 SQLAlchemy models (Property, Agent, Client)
│   │   ├── session.py         # 🔌 Database connection management
│   │   └── seed.py            # 🌱 Sample data seeding script
│   │
│   └── rag/
│       ├── __init__.py
│       └── chain.py           # 🧠 RAG pipeline implementation
│
├── pyproject.toml             # 📋 Dependencies & project config
├── .gitignore
└── README.md

🚀 Quick Start

Prerequisites

Python 3.11+ installed
OpenAI API Key (Get one here)
uv package manager (we'll install it if needed)

Installation Steps

1️⃣ Install uv (if needed)

curl -LsSf https://astral.sh/uv/install.sh | sh

2️⃣ Clone and Navigate

cd rag-database-chat

3️⃣ Install Dependencies

uv sync

This will create a virtual environment and install all required packages.

4️⃣ Configure Environment

Create a .env file in the root directory:

OPENAI_API_KEY=sk-your-api-key-here
DATABASE_URL=sqlite:///./real_estate.db
API_BASE_URL=http://localhost:8000

💡 Note: API_BASE_URL is optional and defaults to http://localhost:8000 if not specified.

💡 Tip: Never commit your .env file! It's already in .gitignore

5️⃣ Initialize Database

Seed the database with sample real estate data:

uv run python -m app.database.seed

You should see:

Database seeded successfully!
Created 3 agents, 6 properties, and 3 clients.

🎮 Usage

Step 1: Start the Backend (FastAPI)

First, start the FastAPI backend server:

uv run uvicorn app.api.main:app --reload

The API will be available at http://localhost:8000

⚠️ Important: Keep this terminal running. The Streamlit frontend requires the backend to be running to function properly.

Step 2: Start the Frontend (Streamlit)

In a new terminal, launch the interactive web interface:

uv run streamlit run app/streamlit_app.py

Then open your browser to http://localhost:8501

💡 Note: The Streamlit frontend communicates with the FastAPI backend via HTTP requests. Make sure the backend is running first, or you'll see connection errors in the UI.

Features:

💬 Chat interface for natural language queries
🔌 Backend connection status indicator
⚙️ Configurable API URL in sidebar
📊 Database initialization button
🎨 Clean, modern UI
📝 Chat history

Option: Using the REST API Directly

If you prefer to use the API directly without the Streamlit interface, you can interact with the FastAPI endpoints:

Interactive API Docs: Visit http://localhost:8000/docs for Swagger UI

API Endpoints

Method	Endpoint	Description
`GET`	`/`	API information
`GET`	`/health`	Health check
`POST`	`/api/query`	Query the database

Example API Request

curl -X POST "http://localhost:8000/api/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What properties are available?"}'

Response:

{
  "answer": "There are 4 available properties in the database..."
}

💡 Example Questions

Try asking these questions to see RAG in action:

Basic Queries

"What properties are available?"
"Show me all houses"
"List all agents"
"How many properties do we have?"

Filtered Queries

"Show me houses with 3 bedrooms"
"Find properties under $300,000"
"What properties are in Springfield?"
"Show me available apartments"

Aggregations

"What's the average price of properties?"
"What's the most expensive property?"
"How many properties does each agent have?"
"What's the total value of all properties?"

Complex Queries

"Show me properties with more than 2 bedrooms and price less than $300,000"
"Which agent has the most properties?"
"What's the price range of houses in Springfield?"

🧠 How RAG Works Here

Step-by-Step Process

User Input → Natural language question
```
"Show me houses with 3 bedrooms"
```

SQL Generation → LLM converts question to SQL

SELECT * FROM properties 
WHERE property_type = 'house' 
AND bedrooms = 3;

Query Execution → SQL runs on SQLite database
```
Returns: 2 properties matching criteria
```

Result Formatting → LLM formats results naturally

"I found 2 houses with 3 bedrooms: 
123 Oak Street ($250,000) and 987 Birch Boulevard ($280,000)..."

Key Components

app/streamlit_app.py: Streamlit frontend
- Interactive chat interface
- HTTP client for backend communication
- Connection status monitoring
app/api/main.py: FastAPI backend
- REST API endpoints (/api/query, /health)
- Request/response handling
- CORS middleware configuration
app/rag/chain.py: Core RAG implementation
- SQL query generation using LLM
- Database schema awareness
- Natural language result formatting
app/database/models.py: Database schema
- Properties, Agents, Clients tables
- Relationships and constraints

📊 Database Schema

Properties Table

- id (Primary Key)
- address, city, state, zip_code
- property_type (house, apartment, condo)
- bedrooms, bathrooms, square_feet
- price, status (available, sold, pending)
- description, year_built, lot_size
- agent_id (Foreign Key)
- created_at, updated_at

Agents Table

- id (Primary Key)
- name, email, phone
- license_number (Unique)
- created_at

Clients Table

- id (Primary Key)
- name, email, phone
- budget_min, budget_max
- preferred_location
- agent_id (Foreign Key)
- created_at

🎓 Learning Objectives

By exploring this project, you'll learn:

✅ RAG Fundamentals

How to combine retrieval (database queries) with generation (LLM)
Building end-to-end RAG pipelines

✅ LangChain Patterns

Creating custom chains
Prompt engineering
LLM integration

✅ SQL Generation

Converting natural language to SQL
Handling database schemas
Error handling in query generation

✅ Full-Stack AI Apps

Building APIs for AI services
Creating interactive UIs
Frontend-backend communication
Managing state and sessions

✅ Best Practices

Modular code organization
Environment configuration
Error handling
Type hints and documentation

🔧 Development

Running Tests

uv run pytest

Code Formatting

uv run ruff format .
uv run ruff check .

Project Scripts

# Run Streamlit app
./run_streamlit.sh

# Run FastAPI server
./run_api.sh

🤔 Common Questions

Q: Why SQLite instead of PostgreSQL/MySQL?
A: SQLite is perfect for learning - no setup required, embedded, and works great for small-medium datasets.

Q: Can I use a different LLM?
A: Yes! LangChain supports many providers. Just change the LLM initialization in app/rag/chain.py.

Q: How do I add more data?
A: Modify app/database/seed.py or use SQLAlchemy to insert data programmatically.

Q: Is this production-ready?
A: This is a learning project. For production, add authentication, rate limiting, logging, and monitoring.

📚 Additional Resources

📝 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with ❤️ for students learning AI and generative models.

Happy Learning! 🚀

Made with ❤️ for the AI learning community

⭐ Star this repo if you found it helpful!

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
app		app
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
ragsql.png		ragsql.png
run_api.sh		run_api.sh
run_streamlit.sh		run_streamlit.sh
uv.lock		uv.lock

License

JaimeLucena/rag-database-chat

Folders and files

Latest commit

History

Repository files navigation