This is a starter notebook for the project. You'll need to import the libraries you use; a list of the ones available in this workspace is in the requirements.txt file.

In [1]:
import os

os.environ["OPENAI_API_KEY"] = "YOUR API KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

# Step 1: Setting Up the Python Application

**Initialize a Python Project**: Create a new Python project, setting up a virtual environment and installing necessary packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB).
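Before running the rest of the notebook, it can help to confirm that the packages are actually installed. The following is a minimal sketch using only the standard library; the package list is an assumption based on the imports used later in this notebook, and `missing_packages` is a hypothetical helper name. The authoritative list remains requirements.txt.

```python
from importlib.util import find_spec

# Assumed package list, inferred from the imports used later in this notebook;
# the authoritative list is requirements.txt in the workspace.
REQUIRED_PACKAGES = ["langchain", "openai", "chromadb", "pydantic", "pandas"]

def missing_packages(names):
    """Return the top-level packages from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any name reported as missing can then be installed inside the virtual environment with pip.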

In [2]:
from langchain.llms import OpenAI
model_name = 'gpt-3.5-turbo'
llm = OpenAI(model_name=model_name, temperature=0)



# Step 2: Generating Real Estate Listings

Generate real estate listings using a Large Language Model. Generate at least 10 listings. This can involve creating prompts for the LLM to produce descriptions of various properties.
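One possible way to get variety across listings is to vary the instruction text itself rather than asking for every listing in a single prompt. The sketch below only builds instruction strings; the attribute lists and the `build_instructions` helper are hypothetical and are not part of this notebook, which uses a single instruction.

```python
from itertools import product

# Hypothetical attribute lists, used only to illustrate prompt variation.
STYLES = ["eco-friendly", "historic", "waterfront"]
SIZES = ["2-bedroom starter home", "4-bedroom family home"]

def build_instructions(styles, sizes):
    """Return one instruction string per (style, size) combination."""
    return [
        f"Generate 1 real estate listing. Style: {style}. Size: {size}."
        for style, size in product(styles, sizes)
    ]

instructions = build_instructions(STYLES, SIZES)
for text in instructions:
    print(text)
```

Each string could then take the place of the single instruction passed to the prompt template, at the cost of one LLM call per listing.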

In [3]:
INSTRUCTION = "Generate 13 real estate listings."
Listing1 = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [4]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List

class RealEstateListing(BaseModel):
    neighborhood: str = Field(description="Name of the neighborhood")
    price: NonNegativeInt = Field(description="Price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="Number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="Size of the property in square feet")
    description: str = Field(description="Description of the property.")   
    neighborhood_description: str = Field(description="Description of the neighborhood.")  

class ListingCollection(BaseModel):
    listing: List[RealEstateListing] = Field(description="List of available real estate")
        
parser = PydanticOutputParser(pydantic_object=ListingCollection)

Prompt Creation

In [5]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)
query = prompt.format(instruction = INSTRUCTION, sample = Listing1)
print(query)

Generate 13 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a 

In [6]:
output = llm(query)

In [7]:
output

'{\n  "listing": [\n    {\n      "neighborhood": "Green Oaks",\n      "price": 800000,\n      "bedrooms": 3,\n      "bathrooms": 2,\n      "house_size": 2000,\n      "description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.",\n      "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access t

In [8]:
listings = parser.parse(output)
listings

ListingCollection(listing=[RealEstateListing(neighborhood='Green Oaks', price=800000, bedrooms=3, bathrooms=2, house_size=2000, description='Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.', neighborhood_description='Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting i

Converting to DataFrame

In [10]:
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Installing collected packages: tzdata, pandas
Successfully installed pandas-2.3.0 tzdata-2025.2


In [10]:
import pandas as pd
df = pd.DataFrame([listing.dict() for listing in listings.listing])

In [11]:
df

Unnamed: 0,neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description
0,Green Oaks,800000,3,2,2000,Welcome to this eco-friendly oasis nestled in ...,"Green Oaks is a close-knit, environmentally-co..."
1,Sunnyvale,950000,4,3,2500,"Beautiful 4-bedroom, 3-bathroom home in the de...","Sunnyvale is known for its top-rated schools, ..."
2,Lakeview,700000,2,1,1500,"Cozy 2-bedroom, 1-bathroom home in the peacefu...",Lakeview offers a tranquil setting with scenic...
3,Downtown,1200000,5,4,3500,"Luxurious 5-bedroom, 4-bathroom home in the he...",Downtown is a bustling urban neighborhood with...
4,Hillside,850000,3,2,2200,"Inviting 3-bedroom, 2-bathroom home in the sce...",Hillside offers stunning views of the mountain...
5,Waterfront,1500000,4,3,2800,"Stunning 4-bedroom, 3-bathroom waterfront home...",Waterfront is a prestigious neighborhood known...
6,Mountain View,1000000,3,2,2000,"Charming 3-bedroom, 2-bathroom home in the sou...",Mountain View is a picturesque neighborhood wi...
7,Oceanfront,2000000,5,4,4000,"Magnificent 5-bedroom, 4-bathroom oceanfront e...",Oceanfront is a prestigious neighborhood known...
8,Historic District,750000,4,3,2300,"Historic 4-bedroom, 3-bathroom home in the cha...",Historic District is a quaint neighborhood wit...
9,Riverside,900000,3,2,2100,"Riverside 3-bedroom, 2-bathroom home with a pr...",Riverside offers a tranquil setting with sceni...


In [12]:
df.to_csv('listings.csv',index_label = 'id')

# Step 3: Storing Listings in a Vector Database

Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
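To make the idea of semantic closeness concrete, here is a toy sketch of ranking by cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings have many more dimensions, and the distance metric the vector store actually uses may differ from cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for listing embeddings, keyed by listing id.
listing_vectors = {
    "0": [0.9, 0.1, 0.0],
    "1": [0.1, 0.8, 0.3],
    "2": [0.7, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]

# Rank listing ids from most to least similar to the query vector.
ranked = sorted(
    listing_vectors,
    key=lambda listing_id: cosine_similarity(query_vector, listing_vectors[listing_id]),
    reverse=True,
)
print(ranked)
```

A vector database performs the same kind of ranking, but over stored embeddings and with indexing that avoids comparing the query against every vector.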

In [22]:
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [17]:
CHROMA_PATH = "chroma"

In [18]:
df = pd.read_csv('listings.csv')
df


Unnamed: 0,id,neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description
0,0,Green Oaks,800000,3,2,2000,Welcome to this eco-friendly oasis nestled in ...,"Green Oaks is a close-knit, environmentally-co..."
1,1,Sunnyvale,950000,4,3,2500,"Beautiful 4-bedroom, 3-bathroom home in the de...","Sunnyvale is known for its top-rated schools, ..."
2,2,Lakeview,700000,2,1,1500,"Cozy 2-bedroom, 1-bathroom home in the peacefu...",Lakeview offers a tranquil setting with scenic...
3,3,Downtown,1200000,5,4,3500,"Luxurious 5-bedroom, 4-bathroom home in the he...",Downtown is a bustling urban neighborhood with...
4,4,Hillside,850000,3,2,2200,"Inviting 3-bedroom, 2-bathroom home in the sce...",Hillside offers stunning views of the mountain...
5,5,Waterfront,1500000,4,3,2800,"Stunning 4-bedroom, 3-bathroom waterfront home...",Waterfront is a prestigious neighborhood known...
6,6,Mountain View,1000000,3,2,2000,"Charming 3-bedroom, 2-bathroom home in the sou...",Mountain View is a picturesque neighborhood wi...
7,7,Oceanfront,2000000,5,4,4000,"Magnificent 5-bedroom, 4-bathroom oceanfront e...",Oceanfront is a prestigious neighborhood known...
8,8,Historic District,750000,4,3,2300,"Historic 4-bedroom, 3-bathroom home in the cha...",Historic District is a quaint neighborhood wit...
9,9,Riverside,900000,3,2,2100,"Riverside 3-bedroom, 2-bathroom home with a pr...",Riverside offers a tranquil setting with sceni...


In [20]:
from langchain.schema import Document
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['description'], metadata={'id': str(index)}))

In [28]:
splitter = CharacterTextSplitter(chunk_size=300,chunk_overlap=100)
split_docs = splitter.split_documents(documents)

In [29]:
print(f"Split {len(documents)} documents into {len(split_docs)} chunks.")

Split 13 documents into 13 chunks.


In [31]:
print(split_docs[3])

page_content='Luxurious 5-bedroom, 4-bathroom home in the heart of Downtown. This elegant property features high-end finishes, a gourmet kitchen, and a rooftop terrace with city views. Live in style in this sophisticated Downtown residence.' metadata={'id': '3'}


In [32]:
print(split_docs[3].page_content)
print(split_docs[3].metadata)

Luxurious 5-bedroom, 4-bathroom home in the heart of Downtown. This elegant property features high-end finishes, a gourmet kitchen, and a rooftop terrace with city views. Live in style in this sophisticated Downtown residence.
{'id': '3'}


In [34]:
# Rebuild the Chroma store from scratch: delete any existing directory first
import shutil
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

# Embed the chunks and write them to the persistent store
db = Chroma.from_documents(
    split_docs, OpenAIEmbeddings(), persist_directory=CHROMA_PATH
)
db.persist()
print("Saved split docs.")

Saved split docs.


# Step 4: Building the User Preference Interface

Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to enter their preferences in natural language.

Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
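As a rough illustration of preference parsing, the sketch below pulls bedroom and bathroom counts out of free text and keeps the combined text as the search query. The helper names (`extract_count`, `structure_preferences`) and the small number-word table are hypothetical; this notebook itself passes the raw preference text straight to the search.

```python
import re

# Hypothetical lookup for spelled-out counts; extend as needed.
WORD_TO_NUMBER = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def extract_count(text, noun):
    """Find a count such as 'three-bedroom' or '3 bedrooms'; None if absent."""
    number_words = "|".join(WORD_TO_NUMBER)
    pattern = rf"\b(\d+|{number_words})[\s-]+{noun}"
    match = re.search(pattern, text.lower())
    if match is None:
        return None
    token = match.group(1)
    return int(token) if token.isdigit() else WORD_TO_NUMBER[token]

def structure_preferences(answers):
    """Combine free-text answers into a dict with counts and a query string."""
    combined = " ".join(answers)
    return {
        "bedrooms": extract_count(combined, "bedroom"),
        "bathrooms": extract_count(combined, "bathroom"),
        "query": combined,
    }

structured = structure_preferences([
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
])
print(structured)
```

The numeric fields could be used to filter listings on metadata, while the `query` string drives the semantic search.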

In [40]:
preferences = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

# Step 5: Searching Based on Preferences

Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
    

Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
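One simple form of retrieval tuning is to drop weak matches and collapse multiple chunks from the same listing. The sketch below works on plain `(listing_id, score)` pairs with made-up scores; `select_listings` is a hypothetical helper, and the 0.7 threshold mirrors the cutoff used in this notebook. It applies the threshold to every hit, which is an assumption and is stricter than checking only the top result.

```python
def select_listings(scored_ids, min_score=0.7, max_results=3):
    """Keep the best score per listing id, drop weak matches, return top ids."""
    best = {}
    for listing_id, score in scored_ids:
        if score >= min_score and score > best.get(listing_id, float("-inf")):
            best[listing_id] = score
    # Sort the surviving listings from highest to lowest score.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [listing_id for listing_id, _ in ranked[:max_results]]

# Made-up (listing id, relevance score) pairs; id "3" appears twice.
hits = [("3", 0.82), ("0", 0.91), ("3", 0.78), ("7", 0.65), ("5", 0.74)]
print(select_listings(hits))
```

The same filtering could be applied to the document and score pairs returned by the vector store by reading each document's `id` metadata.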

In [37]:
PROMPT_TEMPLATE = \
"""
Context: 

{context}

----

Provide the answer for the question: {question}
"""

In [49]:
from langchain.chat_models import ChatOpenAI

def get_top_listings(query, template) -> None:
    """Retrieve the listings closest to `query` and print an LLM answer with their source ids."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Fetch the five closest chunks along with their relevance scores.
    results = db.similarity_search_with_relevance_scores(query, k=5)
    # Give up if nothing came back or the first result scores below 0.7.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
    else:
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = PromptTemplate.from_template(template)
        prompt = prompt_template.format(context=context_text, question=query)
        print(f"Generated Prompt:\n{prompt}")

        # Send the filled-in prompt to the chat model.
        model = ChatOpenAI()
        response_text = model.predict(prompt)
        sources = [doc.metadata.get("id", None) for doc, _score in results]
        formatted_response = f"Response: {response_text}\nSources: {sources}"
        print(formatted_response)

In [50]:
get_top_listings(preferences[2], PROMPT_TEMPLATE)

Generated Prompt:

Context: 

Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

---

Spacious 4-bedroom, 3-bathroom home in the lush Garden District. This beautiful property features a gourmet kitchen, formal dining room, and a landscaped backyard. Embrace the beauty of nature in this Garden District oasis.

---

Cozy 2-bedroom, 1-bathroom home in the peaceful neighborhood of Lakeview. This charming property features a fireplace, hardwood floors, and a private backyard. Perfect for first-time homebuyers or those looking t

# Step 6: Personalizing Listing Descriptions

LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
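A lightweight guard for factual integrity is to check that the augmented text does not introduce numbers that were absent from the original listing. This is a minimal sketch with a hypothetical helper name (`introduces_new_numbers`); it only looks at digits, so it would not catch spelled-out numbers or non-numeric claims.

```python
import re

def numbers_in(text):
    """Return the set of numeric tokens in `text`, ignoring thousands separators."""
    return {token.replace(",", "") for token in re.findall(r"\d[\d,]*", text)}

def introduces_new_numbers(original, augmented):
    """List numbers that appear in `augmented` but not in `original`."""
    return sorted(numbers_in(augmented) - numbers_in(original))

original = "Luxurious 5-bedroom, 4-bathroom home in the heart of Downtown."
faithful = "This 5-bedroom Downtown home has 4 bathrooms and city views."
altered = "This 6-bedroom Downtown home has 4 bathrooms."

print(introduces_new_numbers(original, faithful))
print(introduces_new_numbers(original, altered))
```

An empty list suggests the numeric facts were preserved; a non-empty list flags a description that should be regenerated or reviewed.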

In [51]:
PERSONALIZED_PROMPT = \
"""
Context:

{context}

------
Provide a response that not only answers the buyer's question but also speaks to the buyer's specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Question: {question}
"""

In [52]:
get_top_listings(preferences[2], PERSONALIZED_PROMPT)

Generated Prompt:

Context:

Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

---

Spacious 4-bedroom, 3-bathroom home in the lush Garden District. This beautiful property features a gourmet kitchen, formal dining room, and a landscaped backyard. Embrace the beauty of nature in this Garden District oasis.

---

Cozy 2-bedroom, 1-bathroom home in the peaceful neighborhood of Lakeview. This charming property features a fireplace, hardwood floors, and a private backyard. Perfect for first-time homebuyers or those looking to

# Step 7: Deliverables and Testing

In [62]:
def chat(query, template) -> None:
    """Answer a single buyer query from the stored listings and print the response."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Fetch the five closest chunks along with their relevance scores.
    results = db.similarity_search_with_relevance_scores(query, k=5)
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
    else:
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = PromptTemplate.from_template(template)
        prompt = prompt_template.format(context=context_text, question=query)

        # Send the filled-in prompt to the chat model.
        model = ChatOpenAI()
        response_text = model.predict(prompt)
        print(f"\n\n*Response*: {response_text}\n---------\n")

In [63]:
print("I am a personalized real estate chat agent. Please enter your query, or type 'exit' to end the chat.\n")
while True:
    user_input = input("*User Query*: ")
    if user_input.lower() == "exit": 
        print("Thank you!")
        break
    chat(user_input, PERSONALIZED_PROMPT)

I am a personalized real estate chat agent. Please enter your query, or type 'exit' to end the chat.

*User Query*: A comfortable three-bedroom house with a spacious kitchen and a cozy living room.


*Response*: The charming 3-bedroom, 2-bathroom home in the sought-after neighborhood of Mountain View would be perfect for you. This cozy property features an updated kitchen that is perfect for cooking and entertaining, as well as a cozy living room with hardwood floors for a warm and inviting atmosphere. Don't miss out on experiencing the beauty of Mountain View in this lovely home.
---------

*User Query*: A quiet neighborhood, good local schools, and convenient shopping options.


*Response*: Based on your preferences for a quiet neighborhood, good local schools, and convenient shopping options, I would highly recommend the charming 3-bedroom, 2-bathroom home in the peaceful neighborhood of Lakeview. This cozy property not only offers a tranquil setting but also features hardwood floor