This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in `requirements.txt`.

# Building Generative AI Solutions
## 🏠 A Personalized HomeMatch Real Estate Q&A System

This notebook loads real estate listings, converts them into vector embeddings with OpenAI, and lets you query them in natural language.

It uses:
- `real_estate_listings.csv` as the data source.
- OpenAI `text-embedding-ada-002` model.
- `ChromaDB` for vector search.

We'll answer questions like:
- "Which neighborhoods have 4-bedroom homes under $600,000?"
- "Tell me about homes in Park Slope."

In [1]:
import os
import pandas as pd
import openai
import logging
import chromadb
from chromadb.utils import embedding_functions

## 🔐 Step 1: Set up OpenAI API & Logging

We define the API key and base URL (Vocareum-specific) and configure logging to track progress.

In [None]:
# Vocareum OpenAI setup
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
os.environ["OPENAI_API_BASE"] = "YOUR OPENAI BASE"
openai.api_key = os.environ["OPENAI_API_KEY"]
openai.api_base = os.environ["OPENAI_API_BASE"]

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info("Starting the script.")

INFO:__main__:Starting the script.
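
The setup cell above writes placeholder strings directly into `os.environ`. As an alternative, here is a minimal sketch that reads variables already exported in the shell and fails early when one is missing. `require_env` is a hypothetical helper introduced for illustration; it is not part of `openai` or any other library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    `require_env` is a hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In this notebook it could be used as `openai.api_key = require_env("OPENAI_API_KEY")`, which keeps the real key out of the saved notebook.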


## Step 2: Load and Prepare Real Estate Listings

We load the CSV and combine relevant columns into a searchable text format (`full_text`), which will be used for embedding.

In [3]:
df = pd.read_csv("./real_estate_listings.csv")

# Combine key columns into a single document
df["full_text"] = (
    "Neighborhood: " + df["Neighborhood"] + "\n" +
    "Price: " + df["Price"].astype(str) + "\n" +
    "Bedrooms: " + df["Bedrooms"].astype(str) + "\n" +
    "Bathrooms: " + df["Bathrooms"].astype(str) + "\n" +
    "Size: " + df["House Size"] + "\n" +
    df["Description"] + "\n" +
    df["Neighborhood Description"]
)

df[["Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size", "full_text"]].head()

| | Neighborhood | Price | Bedrooms | Bathrooms | House Size | full_text |
|---|---|---|---|---|---|---|
| 0 | Pinecrest Estates | $550,000 | 4 | 3.0 | 2,500 sqft | Neighborhood: Pinecrest Estates\nPrice: $550,0... |
| 1 | Westchester Estates | $750,000 | 4 | 3.0 | 2,500 sqft | Neighborhood: Westchester Estates\nPrice: $750... |
| 2 | Park Slope | $1,200,000 | 3 | 2.5 | 2,000 sqft | Neighborhood: Park Slope\nPrice: $1,200,000\nB... |
| 3 | Green Hills | $800,000 | 4 | 3.0 | 2,500 sqft | Neighborhood: Green Hills\nPrice: $800,000\nBe... |
| 4 | Willow Creek | $550,000 | 4 | 3.0 | 2,500 sqft | Neighborhood: Willow Creek\nPrice: $550,000\nB... |
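
The preview shows that `Price` and `House Size` are stored as formatted strings such as `$550,000` and `2,500 sqft`, so they cannot be compared numerically as they are. The sketch below shows one way to turn such strings into integers. `parse_number` is a hypothetical helper, and it assumes each value contains a single number with optional comma separators.

```python
import re


def parse_number(text: str) -> int:
    """Extract the first integer from a formatted string such as "$550,000" or "2,500 sqft"."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"No number found in {text!r}")
    return int(match.group().replace(",", ""))


print(parse_number("$550,000"))    # 550000
print(parse_number("2,500 sqft"))  # 2500
```

With pandas, the helper could be applied column-wise, for example `df["Price"].map(parse_number)`, to get a numeric column next to the original text.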


## 🔍 Step 3: Initialize ChromaDB and Embedding Function

We use OpenAI’s `text-embedding-ada-002` model via ChromaDB for fast vector-based search.

In [4]:
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    api_base=os.environ["OPENAI_API_BASE"],
    model_name="text-embedding-ada-002"
)

chroma_client = chromadb.Client()
# get_or_create_collection avoids an error if this cell is run more than once
collection = chroma_client.get_or_create_collection(name="real_estate", embedding_function=embedding_fn)

INFO:chromadb.telemetry.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


## 🧠 Step 4: Add Documents to Vector Store

We embed each listing’s full description and store it in ChromaDB for similarity-based retrieval.

In [5]:
for idx, row in df.iterrows():
    collection.add(
        documents=[row["full_text"]],
        metadatas=[{
            "Neighborhood": row["Neighborhood"],
            "Price": row["Price"],
            "Bedrooms": row["Bedrooms"],
            "Bathrooms": row["Bathrooms"],
            "Size": row["House Size"]
        }],
        ids=[str(idx)]
    )

logger.info("Embeddings added to ChromaDB.")

INFO:__main__:Embeddings added to ChromaDB.
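
The loop above calls `collection.add` once per listing. Since `add` takes lists for `documents`, `metadatas`, and `ids`, the rows could also be sent in chunks, which would likely reduce the number of embedding requests. Below is a minimal sketch of the chunking step only. `batched` is a hypothetical helper and the batch size of 2 is arbitrary.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Example: five string ids split into chunks of two
ids = [str(i) for i in range(5)]
for chunk in batched(ids, 2):
    print(chunk)
```

In the notebook, matching chunks of documents, metadata dicts, and ids would each go into a single `collection.add(...)` call.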


## ❓ Step 5: Ask Natural Language Questions

We'll now query ChromaDB using natural language questions.

In [6]:
questions = [
    "Which neighborhoods have 4-bedroom homes under $600,000?",
    "Tell me about homes in Park Slope.",
    "Are there any listings with at least 3 bathrooms?",
    "Which neighborhood is best for families?"
]

for question in questions:
    logger.info(f"\nQUESTION: {question}")
    results = collection.query(query_texts=[question], n_results=3)
    for i, doc in enumerate(results["documents"][0]):
        meta = results["metadatas"][0][i]
        logger.info(f"Answer {i+1}:\n{doc}\nMetadata: {meta}\n")

INFO:__main__:
QUESTION: Which neighborhoods have 4-bedroom homes under $600,000?
INFO:__main__:Answer 1:
Neighborhood: Green Hills
Price: $800,000
Bedrooms: 4
Bathrooms: 3.0
Size: 2,500 sqft
Welcome to this beautiful 4-bedroom, 3-bathroom home located in the sought-after Green Hills neighborhood. This updated property features a spacious living area with hardwood floors, a modern kitchen with stainless steel appliances, and a cozy fireplace in the family room. The master suite offers a large walk-in closet and a luxurious en-suite bathroom. Outside, the backyard is perfect for entertaining with a patio and lush landscaping. 
Green Hills is known for its upscale residential homes, top-rated schools, and proximity to trendy shopping, dining, and entertainment options. Residents enjoy the convenience of the nearby Green Hills Mall, restaurants, and parks. This family-friendly neighborhood also boasts tree-lined streets and a sense of community, making it a desirable place to call home.
M

INFO:__main__:Answer 2:
Neighborhood: Oak Park
Price: $650,000
Bedrooms: 3
Bathrooms: 2.0
Size: 2,000 sqft
Welcome to this charming 3-bedroom, 2-bathroom home located in the desirable Oak Park neighborhood. This meticulously maintained property features a spacious living room with a cozy fireplace, a sunlit dining area, and a beautifully updated kitchen with stainless steel appliances and granite countertops. The master bedroom offers a walk-in closet and an en-suite bathroom with a dual vanity. The backyard is perfect for entertaining with a patio area and lush landscaping. Additionally, this home includes a 2-car garage and a separate laundry room. 
Oak Park is known for its family-friendly atmosphere, top-rated schools, and proximity to parks and hiking trails. Residents enjoy easy access to shopping centers, restaurants, and entertainment options. With tree-lined streets and a strong sense of community, Oak Park is a perfect place to call home.
Metadata: {'Bathrooms': 2.0, 'Bedroom
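
Note that the first answer to the "under $600,000" question is an $800,000 listing. One possible remedy is to post-filter the retrieved results on the `Price` metadata. The sketch below assumes the result layout used in the query loop (`results["documents"][0]` and `results["metadatas"][0]`). The `sample` dict is a stand-in built from values shown in the outputs above, and `price_to_int` and `filter_by_max_price` are hypothetical helpers.

```python
from typing import Any, Dict, List, Tuple


def price_to_int(price: str) -> int:
    """Convert a string such as "$550,000" to the integer 550000."""
    return int(price.replace("$", "").replace(",", ""))


def filter_by_max_price(results: Dict[str, Any], max_price: int) -> List[Tuple[str, Dict[str, Any]]]:
    """Keep (document, metadata) pairs of the first query whose Price is below max_price."""
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    return [(d, m) for d, m in zip(docs, metas) if price_to_int(m["Price"]) < max_price]


# Stand-in for a collection.query(...) result; documents are shortened placeholders
sample = {
    "documents": [["Green Hills listing", "Oak Park listing", "Pinecrest Estates listing"]],
    "metadatas": [[
        {"Neighborhood": "Green Hills", "Price": "$800,000"},
        {"Neighborhood": "Oak Park", "Price": "$650,000"},
        {"Neighborhood": "Pinecrest Estates", "Price": "$550,000"},
    ]],
}

for doc, meta in filter_by_max_price(sample, 600_000):
    print(meta["Neighborhood"], meta["Price"])
```

Chroma's `query` also accepts a `where` metadata filter, but comparison operators there would need the price stored as a number rather than as a formatted string, so the metadata would have to be converted before it is added.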

## ✅ Summary

- Loaded listings from CSV
- Embedded using OpenAI Embeddings
- Stored in ChromaDB
- Queried using natural language

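You can extend this with metadata filters (useful for numeric constraints such as price, which similarity search alone does not enforce), a chatbot UI (e.g. Streamlit), or ranking of results by user preferences.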