Skip to content

A hands-on RAG application that converts natural language questions into SQL queries for a real estate database. Built with LangChain, FastAPI, Streamlit, and OpenAI GPT-4o-mini. Perfect for learning Retrieval-Augmented Generation and LLM-powered database interactions.

License

Notifications You must be signed in to change notification settings

JaimeLucena/rag-database-chat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Banner

🏠 Real Estate RAG Database Chat

A hands-on RAG (Retrieval-Augmented Generation) application that transforms natural language into SQL queries

Python LangChain OpenAI FastAPI Streamlit

Perfect for learning RAG, LangChain, and LLM-powered database interactions


🎯 What is This Project?

This is a complete RAG application that demonstrates how to build an intelligent system that:

  • Understands natural language questions
  • Generates SQL queries automatically
  • Executes queries on a real estate database
  • Returns human-friendly answers

Perfect for students learning:

  • 🤖 Retrieval-Augmented Generation (RAG)
  • 🔗 LangChain framework
  • 💬 LLM prompt engineering
  • 🗄️ SQL generation from natural language
  • 🌐 Building full-stack AI applications

✨ Key Features

Feature Description
🧠 Intelligent Query Generation Converts natural language to SQL using GPT-4o-mini
💾 SQLite Database Embedded database with real estate data (properties, agents, clients)
🎨 Integrated Frontend & Backend Streamlit UI communicates with FastAPI REST API
🔄 RAG Pipeline Complete RAG implementation with LangChain
📊 Real Data Pre-seeded with sample real estate listings
🚀 Production Ready Modular architecture, error handling, and best practices

🏗️ Architecture Overview

┌─────────────────┐
│   User Query    │  "Show me houses with 3 bedrooms"
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Streamlit Frontend  │  Web UI (http://localhost:8501)
└──────────┬───────────┘
           │ HTTP POST /api/query
           ▼
┌──────────────────────┐
│  FastAPI Backend     │  REST API (http://localhost:8000)
└──────────┬───────────┘
           │
           ▼
┌─────────────────────────────────────┐
│      LangChain RAG Pipeline         │
│  ┌───────────────────────────────┐  │
│  │  1. LLM generates SQL query   │  │
│  │     from natural language     │  │
│  └──────────────┬────────────────┘  │
│                 │                    │
│  ┌──────────────▼────────────────┐  │
│  │  2. Execute SQL on database   │  │
│  └──────────────┬────────────────┘  │
│                 │                    │
│  ┌──────────────▼────────────────┐  │
│  │  3. LLM formats results into  │  │
│  │     natural language answer   │  │
│  └──────────────┬────────────────┘  │
└─────────────────┼────────────────────┘
                  │
                  ▼
         ┌────────────────┐
         │  User-friendly │
         │     Answer     │
         └────────────────┘

🛠️ Tech Stack

Category Technology Purpose
🤖 AI/ML LangChain RAG pipeline orchestration
OpenAI GPT-4o-mini LLM for SQL generation & formatting
🌐 Backend FastAPI REST API server
💻 Frontend Streamlit Interactive web interface
httpx HTTP client for API communication
🗄️ Database SQLite Embedded database
SQLAlchemy ORM and database management
⚙️ Tools uv Fast Python package manager
Python 3.11+ Programming language

📦 Project Structure

rag-database-chat/
├── app/
│   ├── __init__.py
│   ├── config.py              # ⚙️ Configuration & environment variables
│   ├── streamlit_app.py       # 🎨 Streamlit web interface
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   └── main.py            # 🚀 FastAPI REST endpoints
│   │
│   ├── database/
│   │   ├── __init__.py
│   │   ├── models.py          # 📊 SQLAlchemy models (Property, Agent, Client)
│   │   ├── session.py         # 🔌 Database connection management
│   │   └── seed.py            # 🌱 Sample data seeding script
│   │
│   └── rag/
│       ├── __init__.py
│       └── chain.py           # 🧠 RAG pipeline implementation
│
├── pyproject.toml             # 📋 Dependencies & project config
├── .gitignore
└── README.md

🚀 Quick Start

Prerequisites

  • Python 3.11+ installed
  • OpenAI API Key (Get one here)
  • uv package manager (we'll install it if needed)

Installation Steps

1️⃣ Install uv (if needed)

curl -LsSf https://astral.sh/uv/install.sh | sh

2️⃣ Clone and Navigate

cd rag-database-chat

3️⃣ Install Dependencies

uv sync

This will create a virtual environment and install all required packages.

4️⃣ Configure Environment

Create a .env file in the root directory:

OPENAI_API_KEY=sk-your-api-key-here
DATABASE_URL=sqlite:///./real_estate.db
API_BASE_URL=http://localhost:8000

💡 Note: API_BASE_URL is optional and defaults to http://localhost:8000 if not specified.

💡 Tip: Never commit your .env file! It's already in .gitignore

5️⃣ Initialize Database

Seed the database with sample real estate data:

uv run python -m app.database.seed

You should see:

Database seeded successfully!
Created 3 agents, 6 properties, and 3 clients.

🎮 Usage

Step 1: Start the Backend (FastAPI)

First, start the FastAPI backend server:

uv run uvicorn app.api.main:app --reload

The API will be available at http://localhost:8000

⚠️ Important: Keep this terminal running. The Streamlit frontend requires the backend to be running to function properly.

Step 2: Start the Frontend (Streamlit)

In a new terminal, launch the interactive web interface:

uv run streamlit run app/streamlit_app.py

Then open your browser to http://localhost:8501

💡 Note: The Streamlit frontend communicates with the FastAPI backend via HTTP requests. Make sure the backend is running first, or you'll see connection errors in the UI.

Features:

  • 💬 Chat interface for natural language queries
  • 🔌 Backend connection status indicator
  • ⚙️ Configurable API URL in sidebar
  • 📊 Database initialization button
  • 🎨 Clean, modern UI
  • 📝 Chat history

Option: Using the REST API Directly

If you prefer to use the API directly without the Streamlit interface, you can interact with the FastAPI endpoints:

Interactive API Docs: Visit http://localhost:8000/docs for Swagger UI

API Endpoints

Method Endpoint Description
GET / API information
GET /health Health check
POST /api/query Query the database

Example API Request

curl -X POST "http://localhost:8000/api/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What properties are available?"}'

Response:

{
  "answer": "There are 4 available properties in the database..."
}

💡 Example Questions

Try asking these questions to see RAG in action:

Basic Queries

  • "What properties are available?"
  • "Show me all houses"
  • "List all agents"
  • "How many properties do we have?"

Filtered Queries

  • "Show me houses with 3 bedrooms"
  • "Find properties under $300,000"
  • "What properties are in Springfield?"
  • "Show me available apartments"

Aggregations

  • "What's the average price of properties?"
  • "What's the most expensive property?"
  • "How many properties does each agent have?"
  • "What's the total value of all properties?"

Complex Queries

  • "Show me properties with more than 2 bedrooms and price less than $300,000"
  • "Which agent has the most properties?"
  • "What's the price range of houses in Springfield?"

🧠 How RAG Works Here

Step-by-Step Process

  1. User Input → Natural language question

    "Show me houses with 3 bedrooms"
    
  2. SQL Generation → LLM converts question to SQL

    SELECT * FROM properties 
    WHERE property_type = 'house' 
    AND bedrooms = 3;
  3. Query Execution → SQL runs on SQLite database

    Returns: 2 properties matching criteria
    
  4. Result Formatting → LLM formats results naturally

    "I found 2 houses with 3 bedrooms: 
    123 Oak Street ($250,000) and 987 Birch Boulevard ($280,000)..."
    

Key Components

  • app/streamlit_app.py: Streamlit frontend

    • Interactive chat interface
    • HTTP client for backend communication
    • Connection status monitoring
  • app/api/main.py: FastAPI backend

    • REST API endpoints (/api/query, /health)
    • Request/response handling
    • CORS middleware configuration
  • app/rag/chain.py: Core RAG implementation

    • SQL query generation using LLM
    • Database schema awareness
    • Natural language result formatting
  • app/database/models.py: Database schema

    • Properties, Agents, Clients tables
    • Relationships and constraints

📊 Database Schema

Properties Table

- id (Primary Key)
- address, city, state, zip_code
- property_type (house, apartment, condo)
- bedrooms, bathrooms, square_feet
- price, status (available, sold, pending)
- description, year_built, lot_size
- agent_id (Foreign Key)
- created_at, updated_at

Agents Table

- id (Primary Key)
- name, email, phone
- license_number (Unique)
- created_at

Clients Table

- id (Primary Key)
- name, email, phone
- budget_min, budget_max
- preferred_location
- agent_id (Foreign Key)
- created_at

🎓 Learning Objectives

By exploring this project, you'll learn:

RAG Fundamentals

  • How to combine retrieval (database queries) with generation (LLM)
  • Building end-to-end RAG pipelines

LangChain Patterns

  • Creating custom chains
  • Prompt engineering
  • LLM integration

SQL Generation

  • Converting natural language to SQL
  • Handling database schemas
  • Error handling in query generation

Full-Stack AI Apps

  • Building APIs for AI services
  • Creating interactive UIs
  • Frontend-backend communication
  • Managing state and sessions

Best Practices

  • Modular code organization
  • Environment configuration
  • Error handling
  • Type hints and documentation

🔧 Development

Running Tests

uv run pytest

Code Formatting

uv run ruff format .
uv run ruff check .

Project Scripts

# Run Streamlit app
./run_streamlit.sh

# Run FastAPI server
./run_api.sh

🤔 Common Questions

Q: Why SQLite instead of PostgreSQL/MySQL?
A: SQLite is perfect for learning - no setup required, embedded, and works great for small-medium datasets.

Q: Can I use a different LLM?
A: Yes! LangChain supports many providers. Just change the LLM initialization in app/rag/chain.py.

Q: How do I add more data?
A: Modify app/database/seed.py or use SQLAlchemy to insert data programmatically.

Q: Is this production-ready?
A: This is a learning project. For production, add authentication, rate limiting, logging, and monitoring.


📚 Additional Resources


📝 License

MIT License - see LICENSE file for details


🙏 Acknowledgments

Built with ❤️ for students learning AI and generative models.

Happy Learning! 🚀


Made with ❤️ for the AI learning community

Star this repo if you found it helpful!

About

A hands-on RAG application that converts natural language questions into SQL queries for a real estate database. Built with LangChain, FastAPI, Streamlit, and OpenAI GPT-4o-mini. Perfect for learning Retrieval-Augmented Generation and LLM-powered database interactions.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published