This is a starter notebook for the project, you'll have to import the libraries you'll need, you can find a list of the ones available in this workspace in the requirements.txt file in this workspace. 

# HomeMatch: The AI-Driven Real Estate Personalization Agent
## *Synopsis*
HomeMatch utilizes advanced large language models (LLMs) and vector databases to convert typical property listings into personalized recommendations that meet each buyer's unique preferences and needs.
## *Overview of the Process*
####  <u>1. Python Application Setup</u>


Install the required packages:

- LLM library: OpenAI's GPT

- LangChain

- Vector database package: ChromaDB

#### <u>2. Creating Property Listings</u>
Generate property descriptions using a Large Language Model by crafting prompts for the LLM to produce various property descriptions.These genearted listings will populate the vector database for HomeMatch's testing and development.

#### <u>3. Saving Listings in a Vector Database</u>
Set up and configure the ChromaDB vector database to store the property listings. Transform the LLM-generated listings into appropriate embeddings that reflect each listing's semantic content, and save these embeddings into the vector database.

#### <u>4. Developing the User Preference Interface</u>
Gather buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific criteria through a set of questions. The questions and answers can be hardcoded in the buyer preferences.

#### <u>5. Conducting Preference-Based Searches</u>
Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.

#### <u>6. Personalizing Listing Descriptions</u>
LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.

In [1]:
#installing necessary packages 
!pip install openai
!pip install langchain
!pip install numpy
!pip install chromadb
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.1/13.1 MB[0m [31m59.7 MB/s[0m eta [36m0:00:00[0m00:01[0m0:01[0m
Collecting tzdata>=2022.7
  Downloading tzdata-2025.1-py2.py3-none-any.whl (346 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m346.8/346.8 kB[0m [31m37.2 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tzdata, pandas
Successfully installed pandas-2.2.3 tzdata-2025.1


In [2]:
!pip install -U langchain-openai
!pip install pydantic
!pip install shutil

Defaulting to user installation because normal site-packages is not writeable
Collecting langchain-openai
  Downloading langchain_openai-0.3.8-py3-none-any.whl (55 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m55.4/55.4 kB[0m [31m311.6 kB/s[0m eta [36m0:00:00[0m [36m0:00:01[0m
[?25hCollecting tiktoken<1,>=0.7
  Downloading tiktoken-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m19.3 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hCollecting openai<2.0.0,>=1.58.1
  Downloading openai-1.65.5-py3-none-any.whl (474 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m474.5/474.5 kB[0m [31m43.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting langchain-core<1.0.0,>=0.3.42
  Downloading langchain_core-0.3.43-py3-none-any.whl (415 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m415.4/415.4 kB[0m [31m41.0 MB/s[0m eta

Defaulting to user installation because normal site-packages is not writeable
[31mERROR: Could not find a version that satisfies the requirement shutil (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for shutil[0m[31m
[0m

In [2]:
import os

os.environ["OPENAI_API_KEY"] = "voc-865037966126677383792667add26794b526.68427818"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [3]:
from langchain import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain import hub
from langchain.memory import ConversationSummaryMemory, ChatMessageHistory
from langchain.chains.conversational_retrieval.base import ConversationalRetrievalChain


# Step 1 : Generating House Listings
### gpt-3.5-turbo

In [9]:
#model_name = 'gpt-4-turbo-preview'
#from langchain.chat_models import ChatOpenAI
#llm = OpenAI(model_name=model_name, temperature=0.0, max_tokens=2000)
# I tried out GPT 4 and was getting errors so I used 3.5 with optimal settings maybe the account restrictions 

In [12]:
model="gpt-3.5-turbo"
temperature = 0.0

llm = OpenAI(
    model_name=model, 
    temperature=temperature, 
    max_tokens=4000, 
)

In [10]:
# Generate House listings
listing_gen_template = '''
Generate a CSV file that contains {num_listings} unique property listings with each listing tabulating the following attributes:

1- Neighborhood: Specify the name of the neighborhood where the property is located.
2- Price: Specify the property's price.
3- Bedrooms: Specify the number bedrooms.
4- Bathrooms: Specify the property's bathrooms.
5- House Size: Specify the property's square footage.
6- Description: Craft a distinguished description of the property that showcases its appeal and charm, and lists features such as: a new roof, an upgraded kitchen, energy efficient appliances, solar roof, water or mountain views, car garage, fireplace, patio, deck, large backyard, garden.
7- Neighborhood Description: Craft a description of the neighborhood and what it offers in terms of amenities and community such as: bike-friendly roads, parks, public gardens, restaurants, organic stores, easy access to highways, bus or train transporation, low noise levels.

Here is a sample listing entry format with the header:
[Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description],
[Green Oaks,"$800,000",3,2,"2,000 sqft","Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.","Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."],
'''

In [13]:
prompt = PromptTemplate.from_template(listing_gen_template)

listings = llm(prompt.format(num_listings = 15))
print(listings)


Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description
Willow Creek,"$750,000",4,3,"2,500 sqft","Step into this spacious 4-bedroom, 3-bathroom home located in the serene neighborhood of Willow Creek. This property features a newly renovated kitchen with stainless steel appliances, a cozy fireplace in the living room, and a large deck overlooking the lush backyard. With ample natural light and a functional layout, this home is perfect for families looking for comfort and style.","Willow Creek offers a peaceful setting with tree-lined streets and easy access to parks and walking trails. Enjoy the convenience of nearby shopping centers and restaurants, as well as top-rated schools in the area. With a strong sense of community and a variety of amenities, Willow Creek is the ideal place to call home."
Sunset Ridge,"$900,000",5,4,"3,000 sqft","Welcome to this stunning 5-bedroom, 4-bathroom home in the prestigious neighborhood of Sunset Ridge. This property boas

In [14]:
with open('listings.csv', 'w') as f:
    f.write(listings)

# 2 Semantic Search
## Create a Vector Database and Store the Listings

In [4]:
# Load the CSV document
file_path = "listings.csv"
loader = CSVLoader(file_path=file_path)
docs = loader.load()

In [5]:
# Use a Text Splitter to split the documents into chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(docs)

In [6]:
# Initialize the embeddings model
embeddings = OpenAIEmbeddings()

In [7]:
# Populate the vector database with the chunks
db = Chroma.from_documents(split_docs, embeddings)

In [8]:
# Define the LLM
model_name = "gpt-3.5-turbo"
llm = OpenAI(model_name=model_name, temperature=0, max_tokens=2000)




#### Build the Semantic Search of Listings Based on Buyer's Preferences

In [9]:
# Simulate a buyer's questions and answers

questions = [
    "How big do you want your house to be?"
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
]
    

In [10]:
# create a chat with the customer and summarize it
history = ChatMessageHistory()
history.add_user_message(f"""You are AI sales assisstant that will recommend user a home based on their answers to personal questions. Ask user {len(questions)} questions""")
for i in range(len(questions)):
    history.add_ai_message(questions[i])
    history.add_user_message(answers[i])
    

In [13]:
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory
from langchain.chains import ConversationChain
from typing import Any, Dict

In [14]:
max_rating = 100

summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="recommendation_summary",
    input_key="input",
    buffer=f"The human answered {len(questions)} personal questions. Use them to rate, from 1 to {max_rating}, how much they like a home recommendation.",
    return_messages=True
)

class MementoBufferMemory(ConversationBufferMemory):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        input_str, output_str = self._get_input_output(inputs, outputs)
        self.chat_memory.add_ai_message(output_str)
        
conversational_memory = MementoBufferMemory(
    chat_memory=history,
    memory_key="questions_and_answers",
    input_key="input"
)

memory = CombinedMemory(memories=[conversational_memory, summary_memory])

In [16]:
user_responses = []

for message in conversational_memory.buffer_as_messages:
    if message.type == "human":
        user_responses.append(message.content)

user_preferences = " ".join(user_responses)

similar_docs = db.similarity_search(user_preferences, k=5)

recommended_listings = "\n\n---------------------\n\n".join([f"{doc.page_content}" for doc in similar_docs])

# 3. Generate the Augmented Response
## Run the Search and Augment the Listings with Descriptions

In [21]:
template = """The following is a friendly conversation between a human and an AI Real Estate Agent. The AI follows human instructions and provides home ratings for a human based on the home preferences. 

Summary of Recommendations:
{recommendation_summary}
Buyer's Preferences Q&A:
{questions_and_answers}
Recommended Listings:
{recommended_listings}
Human: {input}
AI:"""

PROMPT = PromptTemplate.from_template(template).partial(recommended_listings=recommended_listings)




recommender = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

In [22]:
augmented_query = """
Now score (0-100) each of the 5 listings based on the buyer's preferences. Format the output as follows:

Home Match Score: [Score]
Neighborhood: [Neighborhood]
Price: [Price]
Bedrooms: [Bedrooms]
Bathrooms: [Bathrooms]
Size sqft: [Size sqft]
Description: [Personalize both the description and the neighborhood description of the listing based on buyer's preferences. Make sure the modified description is unique, appealing, and tailored to the buyer's provided preferences but keep the modified description factual]
"""

In [23]:
personalized_recommendation = recommender.predict(input=augmented_query)
print(personalized_recommendation)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI Real Estate Agent. The AI follows human instructions and provides home ratings for a human based on the home preferences. 

Summary of Recommendations:
[SystemMessage(content="The human answered 4 personal questions and used them to rate 5 home listings based on their preferences. The AI provided personalized descriptions for each listing, tailored to the buyer's preferences, and rated them accordingly. The listings ranged from a cozy mountain retreat in Mountain View to an elegant waterfront property in Riverside Gardens, with scores ranging from 75 to 90.")]
Buyer's Preferences Q&A:
Human: You are AI sales assisstant that will recommend user a home based on their answers to personal questions. Ask user 4 questions
AI: How big do you want your house to be?What are 3 most important things for you in choosing this property?
Human: A


[1m> Finished chain.[0m
Home Match Score: 90
Neighborhood: Mountain View
Price: $700,000
Bedrooms: 3
Bathrooms: 2
Size sqft: 2,000 sqft
Description: Immerse yourself in the tranquility of mountain living in this 3-bedroom, 2-bathroom home nestled in the scenic neighborhood of Mountain View. The cozy living room with a wood-burning stove, bright kitchen with mountain views, and spacious deck for outdoor dining make this property a perfect retreat for nature lovers. The master suite offers a private balcony and a renovated bathroom with a walk-in shower, while the large backyard with a garden and fruit trees provides ample space for gardening and relaxation.
Neighborhood Description: Mountain View offers a serene setting with panoramic mountain views and easy access to hiking trails and ski resorts. Residents can enjoy the beauty of nature year-round, from skiing in the winter to hiking and biking in the summer. With a strong sense of community and a variety of amenities, Mountain Vie