# HomeMatch: The AI Personalized Real Estate Agent 


## *About*

This application leverages large language models (LLMs) and vector databases to transform standard real estate listings into personalized narratives that resonate with potential buyers' unique preferences and needs.

## *High level process description*


#### <u>Step 1: Setting Up the Python Application</u>

Install necessary packages: 
- LLM library: OpenAI's GPT
- LangChain
- Vector database package: ChromaDB

#### <u>Step 2: Generating Real Estate Listings</u>

Generate real estate listings using a Large Language Model through creating prompts for the LLM to produce descriptions of various properties. 

These listings will be used to populate the database for testing and development of the HomeMatch application.

#### <u>Step 3: Storing Listings in a Vector Database</u>

Initialize and configure ChromaDB vector database to store the real estate listings, the convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.

#### <u>Step 4: Building the User Preference Interface</u>

Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements from a set of questions. The questions/answers can be hard-coded in the buyer preferences.

Implement logic to interpret and structure these preferences for querying the vector database.

#### <u>Step 5: Searching Based on Preferences</u>

Implement Semantic Search: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.

#### <u>Step 6: Personalizing Listing Descriptions</u>

LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.


# Imports

In [1]:
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory
from langchain.chains import ConversationChain

import pandas as pd
import json
import os
from typing import Any, Dict



# 1) Generate Real Estate Listings with a LLM

In [None]:
credentials = {}
try:
    with open('credentials.json') as file:
        credentials = json.load(file)
except FileNotFoundError:
    print("Error: file credentials.json was not found.")

api_key = credentials['OpenAIAPIKey']
print(api_key)

os.environ['OPENAI_API_KEY'] = api_key


In [3]:
model="gpt-3.5-turbo"
temperature = 0.0

llm = OpenAI(
    model_name=model, 
    temperature=temperature, 
    max_tokens=4000, 
)

prompt_template='''
Generate a CSV file that contains {num_listings} unique property listings with each listing tabulating the following attributes:

1- Neighborhood: Specify the name of the neighborhood where the property is located.
2- Price: Specify the property's price.
3- Bedrooms: Specify the number bedrooms.
4- Bathrooms: Specify the property's bathrooms.
5- House Size: Specify the property's square footage.
6- Description: Craft a distinguished description of the property that showcases its appeal and charm, and lists features such as: a new roof, an upgraded kitchen, energy efficient appliances, solar roof, water or mountain views, car garage, fireplace, patio, deck, large backyard, garden.
7- Neighborhood Description: Craft a description of the neighborhood and what it offers in terms of amenities and community such as: bike-friendly roads, parks, public gardens, restaurants, organic stores, easy access to highways, bus or train transporation, low noise levels.

Here is a sample listing entry format with the header:
[Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description],
[Green Oaks,"$800,000",3,2,"2,000 sqft","Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.","Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."],
'''

prompt = PromptTemplate.from_template(prompt_template)

listings = llm(prompt.format(num_listings = 15))
print(listings)



  warn_deprecated(


Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description
Willow Creek,"$750,000",4,3,"2,500 sqft","Step into this spacious 4-bedroom, 3-bathroom home located in the serene neighborhood of Willow Creek. This property features a newly renovated kitchen with stainless steel appliances, a cozy fireplace in the living room, and a large deck overlooking the lush backyard. With ample natural light and a functional layout, this home is perfect for families looking for comfort and style.","Willow Creek offers a peaceful setting with tree-lined streets and easy access to parks and walking trails. Enjoy the convenience of nearby shopping centers and restaurants, as well as top-rated schools in the area."
Sunset Hills,"$900,000",5,4,"3,000 sqft","Welcome to this stunning 5-bedroom, 4-bathroom home in the prestigious neighborhood of Sunset Hills. This property boasts a grand entrance, high ceilings, and a gourmet kitchen with granite countertops and top-of-the-line appl

In [4]:
# Save the resulting listings in a csv

with open('listings.csv','w') as file:
    for line in listings:
        file.write(line)


In [5]:
df=pd.read_csv('listings.csv')
df.head()

Unnamed: 0,Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description
0,Willow Creek,"$750,000",4,3,"2,500 sqft","Step into this spacious 4-bedroom, 3-bathroom ...",Willow Creek offers a peaceful setting with tr...
1,Sunset Hills,"$900,000",5,4,"3,000 sqft","Welcome to this stunning 5-bedroom, 4-bathroom...",Sunset Hills is known for its upscale living a...
2,Riverfront Estates,"$1,200,000",6,5,"4,500 sqft","Luxury awaits in this 6-bedroom, 5-bathroom ho...",Riverfront Estates offers a luxurious lifestyl...
3,Mountain View Terrace,"$650,000",3,2,"1,800 sqft","Discover this charming 3-bedroom, 2-bathroom h...",Mountain View Terrace is a picturesque neighbo...
4,Downtown Loft District,"$500,000",2,1,"1,200 sqft",Live in the heart of the city in this stylish ...,Downtown Loft District is a bustling urban nei...


# 2) Semantic Search

## a) Create a Vector Database and Store the Listings

In [6]:
# Load the CSV document
file_path = "listings.csv"
loader = CSVLoader(file_path=file_path)
docs = loader.load()
# print docs

In [7]:
# Use a Text Splitter to split the documents into chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(docs)

In [8]:
# Initialize the embeddings model
embeddings = OpenAIEmbeddings()

  warn_deprecated(


In [9]:
# Populate the vector database with the chunks
db = Chroma.from_documents(split_docs, embeddings)

In [10]:
# Define the LLM
model_name = "gpt-3.5-turbo"
llm = OpenAI(model_name=model_name, temperature=0, max_tokens=2000)



## b) Build the Semantic Search of Listings Based on Buyer's Preferences

In [19]:
# Simulate a buyer's questions and answers

questions = [
    "How big do you want your house to be?"
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
]
    

In [20]:
history = ChatMessageHistory()
history.add_user_message(f"""You are AI that will recommend user a new home based on their answers to questions about their home preferences. Ask the user {len(questions)} questions""")
for i in range(len(questions)):
    history.add_ai_message(questions[i])
    history.add_user_message(answers[i])

In [21]:
max_rating = 100

summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="recommendation_summary",
    input_key="input",
    buffer=f"The human answered {len(questions)} personal questions. Use them to rate, from 1 to {max_rating}, how much they like a home recommendation.",
    return_messages=True
)

class MementoBufferMemory(ConversationBufferMemory):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        input_str, output_str = self._get_input_output(inputs, outputs)
        self.chat_memory.add_ai_message(output_str)
        
conversational_memory = MementoBufferMemory(
    chat_memory=history,
    memory_key="questions_and_answers",
    input_key="input"
)

memory = CombinedMemory(memories=[conversational_memory, summary_memory])

In [26]:
# Retrieve the buyer's responses from the conversation buffer
# instead of the harcoded 'answers' list

user_responses = []

for message in conversational_memory.buffer_as_messages:
    if message.type == "human":
        user_responses.append(message.content)

user_preferences = " ".join(user_responses)

similar_docs = db.similarity_search(user_preferences, k=5)

recommended_listings = "\n\n---------------------\n\n".join([f"{doc.page_content}" for doc in similar_docs])


# 3) Generate the Augmented Response

## a) Run the Search and Augment the Listings' Descriptions

In [35]:
RECOMMENDER_TEMPLATE = """The following is a friendly conversation between a human and an AI Real Estate Agent. The AI follows human instructions and provides home ratings for a human based on the home preferences. 

Summary of Recommendations:
{recommendation_summary}
Buyer's Preferences Q&A:
{questions_and_answers}
Recommended Listings:
{recommended_listings}
Human: {input}
AI:"""

PROMPT = PromptTemplate.from_template(RECOMMENDER_TEMPLATE).partial(recommended_listings=recommended_listings)

# PROMPT = PromptTemplate(
#     template=RECOMMENDER_TEMPLATE,
#     input_variables=["recommended_listings", "input"],
#     partial_variables={"recommendation_summary": "recommendation_summary", "questions_and_answers": "questions_and_answers"}
# )


recommender = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

## b) Generate Personalized Descriptions

In [36]:
augmented_query = """
Now score (0-100) each of the 5 listings based on the buyer's preferences. Format the output as follows:

Home Match Score: [Score]
Neighborhood: [Neighborhood]
Price: [Price]
Bedrooms: [Bedrooms]
Bathrooms: [Bathrooms]
Size sqft: [Size sqft]
Description: [Personalize both the description and the neighborhood description of the listing based on buyer's preferences. Make sure the modified description is unique, appealing, and tailored to the buyer's provided preferences but keep the modified description factual]
"""

In [37]:
personalized_recommendation = recommender.predict(input=augmented_query)
print(personalized_recommendation)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI Real Estate Agent. The AI follows human instructions and provides home ratings for a human based on the home preferences. 

Summary of Recommendations:
[SystemMessage(content='The human answered 4 personal questions. Use them to rate, from 1 to 100, how much they like a home recommendation.')]
Buyer's Preferences Q&A:
Human: You are AI that will recommend user a new home based on their answers to questions about their home preferences. Ask user 4 questions
AI: How big do you want your house to be?What are 3 most important things for you in choosing this property?
Human: A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
AI: Which amenities would you like?
Human: A quiet neighborhood, good local schools, and convenient shopping options.
AI: Which transportation options are important to you?
Human: A ba