This notebook performs the following tasks to help a prospective buyer find homes that match a given set of preferences:

1. loads real estate listings from a CSV file and removes duplicate records,
2. collects the buyer's budget, minimum numbers of bedrooms and bathrooms, and free-text home and neighborhood preferences through interactive widgets,
3. filters the listings by budget and room counts to build a shortlist,
4. creates embeddings for the combined house and neighborhood descriptions of the shortlisted listings and stores them in a Chroma vector store,
5. uses a LangChain retriever to find the shortlisted listings whose descriptions best match the buyer's preferences, and
6. uses the gpt-3.5-turbo-0125 language model to write a summary of each matching home that resonates with the buyer.

In [1]:
import pandas as pd
from dotenv import load_dotenv, find_dotenv

# LangChain components: vector store, document type, prompt, model, and output parsing
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# interactive input; slider, textML and RB come from a local helper module (widgets.py),
# while the name `widgets` itself refers to ipywidgets
import ipywidgets as widgets
from IPython.display import display
from widgets import slider, textML, RB


In [2]:
#
# look up API key from the .env file
#
_ = load_dotenv(find_dotenv()) # read local .env file
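`load_dotenv` copies the entries of `.env` into `os.environ`, which is where `ChatOpenAI` and `OpenAIEmbeddings` look for `OPENAI_API_KEY` by default. A small guard can surface a missing key right away instead of at the first API call. This is a minimal sketch; `require_env` is a hypothetical helper, not part of the notebook or of `python-dotenv`.

```python
import os
from collections.abc import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a non-blank environment variable or fail with a clear message.

    Hypothetical helper: call it after load_dotenv() so a missing key is
    reported before any OpenAI request is made.
    """
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# usage, after load_dotenv(find_dotenv()):
# require_env("OPENAI_API_KEY")
```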


In [3]:
#
# load the listing database from a csv file
#

df = pd.read_csv("./home_listing.csv")

df.head()

|   | Listing Number | Neighborhood | Price | Bedrooms | Bathrooms | House Size | House Description | Neighborhood Description |
|---|---|---|---|---|---|---|---|---|
| 0 | L000001 | West Lake Village | $800,000 | 3 | 2.0 | 2,500 sqft | Welcome to this charming home nestled in the h... | This home is situated in a quiet and friendly ... |
| 1 | L000002 | Agoura Hills | $900,000 | 3 | 2.5 | 2,200 sqft | This stunning home in Agoura Hills offers a pe... | Located in the desirable Agoura Hills communit... |
| 2 | L000003 | Newbury Park | $850,000 | 4 | 2.5 | 2,100 sqft | Welcome to this beautifully updated home in Ne... | This home is located in a friendly and family-... |
| 3 | L000004 | Oak Park | $750,000 | 3 | 2.0 | 2,100 sqft | This charming home in Oak Park offers a perfec... | Located in a peaceful and picturesque neighbor... |
| 4 | L000005 | Dos Vientos | $950,000 | 4 | 3.0 | 2,400 sqft | This stunning home in Dos Vientos offers luxur... | Located in the highly desirable Dos Vientos co... |


In [4]:
print(f"There are {df.shape[0]} listings")

There are 70 listings


In [5]:
df.shape

(70, 8)

In [6]:
#
# remove duplicate records
#

df = df.drop_duplicates(subset=['Listing Number'])


In [7]:
df.shape

(51, 8)
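Dropping repeated `Listing Number` values shrinks the table from 70 to 51 rows. `drop_duplicates` keeps the first occurrence of each key by default (`keep="first"`), and `duplicated(keep=False)` can flag every copy so the repeats can be inspected before they are discarded. A sketch on made-up data:

```python
import pandas as pd

# made-up listings; L000002 appears twice
toy = pd.DataFrame({
    "Listing Number": ["L000001", "L000002", "L000002", "L000003"],
    "Price": ["$800,000", "$900,000", "$900,000", "$850,000"],
})

# every row whose Listing Number occurs more than once (keep=False flags all copies)
dupes = toy[toy.duplicated(subset=["Listing Number"], keep=False)]

# same call as the notebook, with the default keep="first" spelled out
deduped = toy.drop_duplicates(subset=["Listing Number"], keep="first")

print(len(toy), len(dupes), len(deduped))  # 4 2 3
```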

In [8]:
# strip the "$" and thousands separators so Price can be compared numerically
df["Price"] = df["Price"].str.replace(r"[$,]", "", regex=True).astype(int)
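The `Price` column arrives as text such as "$800,000", so it must become an integer before the budget comparison can work. The plain-Python sketch below performs the same conversion. `parse_price` and `parse_sqft` are hypothetical helpers, and `parse_sqft` goes beyond the notebook, which keeps `House Size` as text and therefore never filters on size numerically.

```python
import re


def parse_price(text: str) -> int:
    """Convert a whole-dollar price such as "$800,000" to 800000."""
    # remove "$", thousands separators, and whitespace; int() rejects anything else
    return int(re.sub(r"[$,\s]", "", text))


def parse_sqft(text: str) -> int:
    """Hypothetical helper: convert "2,500 sqft" to 2500."""
    match = re.fullmatch(r"\s*([\d,]+)\s*sqft\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1).replace(",", ""))


print(parse_price("$800,000"), parse_sqft("2,500 sqft"))  # 800000 2500
```

Failing loudly on unexpected formats (for example a price with cents) is deliberate: a silently mis-parsed price would quietly drop or admit listings at the budget filter.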

In [9]:
# one Document per listing: both descriptions are embedded, the metadata rides along for the prompt
df["Features"] = df.apply(
    lambda row: Document(
        page_content=row["House Description"] + " " + row["Neighborhood Description"],
        metadata={"LN": row["Listing Number"], "Price": row["Price"], "Size": row["House Size"],
                  "Bedrooms": row["Bedrooms"], "Bathrooms": row["Bathrooms"]},
    ),
    axis=1,
)
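Each row becomes one document: the two descriptions are concatenated into the text that gets embedded, and the structured fields the prompt later asks the model to echo ("LN", "Price", "Size", "Bedrooms", "Bathrooms") travel as metadata. The sketch below shows that mapping without LangChain; `ListingDoc` is a hypothetical stand-in with the same two fields as `Document`, and the row values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ListingDoc:
    """Hypothetical stand-in with the same two fields as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# metadata key expected by the prompt -> source column in the listings table
METADATA_KEYS = {"LN": "Listing Number", "Price": "Price", "Size": "House Size",
                 "Bedrooms": "Bedrooms", "Bathrooms": "Bathrooms"}


def row_to_doc(row: dict) -> ListingDoc:
    """Embed both descriptions together; carry the structured fields as metadata."""
    text = row["House Description"] + " " + row["Neighborhood Description"]
    meta = {key: row[column] for key, column in METADATA_KEYS.items()}
    return ListingDoc(page_content=text, metadata=meta)


row = {"Listing Number": "L000001", "Price": 800000, "House Size": "2,500 sqft",
       "Bedrooms": 3, "Bathrooms": 2.0,
       "House Description": "Welcome to this charming home.",
       "Neighborhood Description": "Quiet and friendly street."}
doc = row_to_doc(row)
print(doc.metadata["LN"], doc.page_content)
```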

In [10]:
df.head()
    

|   | Listing Number | Neighborhood | Price | Bedrooms | Bathrooms | House Size | House Description | Neighborhood Description | Features |
|---|---|---|---|---|---|---|---|---|---|
| 0 | L000001 | West Lake Village | 800000 | 3 | 2.0 | 2,500 sqft | Welcome to this charming home nestled in the h... | This home is situated in a quiet and friendly ... | page_content="Welcome to this charming home ne... |
| 1 | L000002 | Agoura Hills | 900000 | 3 | 2.5 | 2,200 sqft | This stunning home in Agoura Hills offers a pe... | Located in the desirable Agoura Hills communit... | page_content="This stunning home in Agoura Hil... |
| 2 | L000003 | Newbury Park | 850000 | 4 | 2.5 | 2,100 sqft | Welcome to this beautifully updated home in Ne... | This home is located in a friendly and family-... | page_content="Welcome to this beautifully upda... |
| 3 | L000004 | Oak Park | 750000 | 3 | 2.0 | 2,100 sqft | This charming home in Oak Park offers a perfec... | Located in a peaceful and picturesque neighbor... | page_content="This charming home in Oak Park o... |
| 4 | L000005 | Dos Vientos | 950000 | 4 | 3.0 | 2,400 sqft | This stunning home in Dos Vientos offers luxur... | Located in the highly desirable Dos Vientos co... | page_content="This stunning home in Dos Viento... |


In [11]:
#df.dtypes

In [12]:
#
# set up the chat prompt template with system instructions
#
template = """You are a professional real estate agent assisting home buyers. 
Use the retrieved Listings below to identify which ones can best match the given Preferences. 
You can select up to 3 items from the Listings for each Answer. 
The Answer should start in a new line with the message: 
"## Thank you for your interest, home(s) that best meet your preferences are: ##"
Each offered item must start in a separate line with all the metadata that include "LN", "Price", "Size", "Bedrooms", "Bathrooms" ** no exceptions **. 
They are then followed by a tailored description of the listing that resonates with buyer's preferences, 
try to subtly emphasize aspects of the property that align with what the buyer is looking for, however, ** they MUST be factual and you cannot make things up **.
You must strictly adhere to these instructions. Do not provide any other information not asked for.
Preferences: {question} 
Listings: {context} 
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

print(prompt)

input_variables=['context', 'question'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='You are a professional real estate agent assisting home buyers. \nUse the retrieved Listings below to identify which ones can best match the given Preferences. \nYou can select up to 3 items from the Listings for each Answer. \nThe Answer should start in a new line with the message: \n"## Thank you for your interest, home(s) that best meet your preferences are: ##"\nEach offered item must start in a separate line with all the metadata that include "LN", "Price", "Size", "Bedrooms", "Bathrooms" ** no exceptions **. \nThey are then followed by a tailored description of the listing that resonates with buyer\'s preferences, \ntry to subtly emphasize aspects of the property that align with what the buyer is looking for, however, ** they MUST be factual and you cannot make things up **.\nYou must strictly adhere to these instructions. Do not pr
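As the printed repr shows, `ChatPromptTemplate.from_template` wraps the whole template in a single human message with two input variables, `question` and `context`. With the default f-string format, filling them works like Python's `str.format`, so a literal brace anywhere in the template would have to be written as `{{` or `}}`. A sketch using an abridged copy of the template:

```python
# abridged copy of the notebook's template; only the placeholders matter here
TEMPLATE = """You are a professional real estate agent assisting home buyers.
Preferences: {question}
Listings: {context}
Answer:
"""


def fill(template: str, question: str, context: str) -> str:
    # str.format substitutes each {name}; literal braces must be doubled as {{ }}
    return template.format(question=question, context=context)


msg = fill(TEMPLATE, "a gourmet kitchen", "[listing 1] [listing 2]")
print(msg)
```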

In [13]:
#
# set up widgets to collect user input interactively
#
preferences = textML("preferences:","e.g. house amenities, style, neighborhood and transportation")
budget = slider("Budget in $:", 800000, 700000, 1000000, 50000)
bedroom = RB([2,3,4,5],3, "number of bedrooms:")
bathroom = RB([2,2.5,3],2, "number of bathrooms:")
rooms = widgets.HBox([bedroom, bathroom])

box = widgets.VBox([budget, rooms, preferences])
display(box)

VBox(children=(IntSlider(value=800000, description='Budget in $:', max=1000000, min=700000, step=50000), HBox(…

In [57]:
print(f"Budget is ${budget.value}")
print(f"Minimum number of bedrooms is {bedroom.value}")
print(f"Minimum number of bathrooms is {bathroom.value}")
print(f"Preferences listed: {preferences.value}")


Budget is $1000000
Minimum number of bedrooms is 3
Minimum number of bathrooms is 2.5
Preferences listed: a gourmet kitchen, large family room with lots of windows, fenced backyard with a BBQ grill, near top-rated schools and recreational areas, lots of tree and away from city center, low crime rate.



In [58]:
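# shortlist: within budget and at least the requested numbers of bedrooms and bathrooms
mask = (df["Price"] <= budget.value) & (df["Bedrooms"] >= bedroom.value) & (df["Bathrooms"] >= bathroom.value)
d_list = df.loc[mask, "Features"]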

In [59]:
data = list(d_list)  # plain list of Documents for Chroma.from_documents

In [60]:
print(f"{len(d_list)} on the shortlist")

11 on the shortlist
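The shortlist is a boolean mask: the budget is an inclusive upper bound and the room counts are inclusive lower bounds. Each comparison needs its own parentheses because `&` binds more tightly than `<=` and `>=`. The sketch below applies the same rule to made-up listings; `shortlist` is a hypothetical helper.

```python
import pandas as pd

# made-up listings; Price already parsed to int as in the notebook
toy = pd.DataFrame({
    "Listing Number": ["L1", "L2", "L3", "L4"],
    "Price": [800000, 950000, 1050000, 900000],
    "Bedrooms": [3, 4, 4, 2],
    "Bathrooms": [2.0, 3.0, 3.0, 2.5],
})


def shortlist(frame: pd.DataFrame, budget: int, min_bed: int, min_bath: float) -> list:
    """Listing Numbers within budget with at least min_bed bedrooms and min_bath bathrooms."""
    mask = (
        (frame["Price"] <= budget)          # upper bound, inclusive
        & (frame["Bedrooms"] >= min_bed)    # lower bound, inclusive
        & (frame["Bathrooms"] >= min_bath)  # lower bound, inclusive
    )
    return frame.loc[mask, "Listing Number"].tolist()


print(shortlist(toy, 1_000_000, 3, 2.5))  # ['L2']
```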


In [61]:
question = f"Find me home listings with {preferences.value}"

print(question)

Find me home listings with a gourmet kitchen, large family room with lots of windows, fenced backyard with a BBQ grill, near top-rated schools and recreational areas, lots of tree and away from city center, low crime rate.



In [62]:
if len(data) == 0:
    print("** Sorry! No listing in the inventory meets your preferences; please modify your inputs. **")

In [63]:
#
# set up embeddings and Chroma vectorstore
#
docsearch = Chroma.from_documents(data, OpenAIEmbeddings())
retriever = docsearch.as_retriever()

In [64]:
docs = docsearch.similarity_search_with_score(question)

In [65]:
docs

[(Document(page_content="Nestled in the highly sought-after Oak Park community, this stunning home offers the perfect blend of luxury and comfort. The open and bright floor plan features a gourmet kitchen with granite countertops, high-end appliances, and a large island. The spacious living and dining area feature soaring ceilings, a cozy fireplace, and large windows that look out onto the backyard. The master suite is a true oasis, with a luxurious en-suite bathroom and a private balcony overlooking the mountains. The backyard features a built-in BBQ, multiple seating areas, and plenty of space for outdoor entertaining. This home also offers a three-car garage and solar panels. Oak Park is known for its beautiful parks, top-rated schools, and close-knit community. Residents can enjoy miles of hiking and biking trails, as well as nearby shopping and dining options. The neighborhood also offers easy access to major highways and is just a short drive from nearby beaches and the city. Don
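In `rag_chain`, the `{context}` slot receives the retriever's list of `Document` objects, and the prompt renders the string form of that list, which is how the metadata reaches the model. A common alternative is a formatting function that writes each listing's metadata as `key: value` lines followed by its description; in LangChain it could be chained as `retriever | format_docs`, since plain functions piped into a runnable are wrapped automatically. The sketch below runs without LangChain; `Doc` is a hypothetical stand-in for `Document` and `format_docs` is a hypothetical helper.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    """Hypothetical stand-in exposing the same attributes as LangChain's Document."""
    page_content: str
    metadata: dict


FIELDS = ("LN", "Price", "Size", "Bedrooms", "Bathrooms")


def format_docs(docs) -> str:
    """Render each listing as 'key: value' metadata lines followed by its description."""
    blocks = []
    for doc in docs:
        header = "\n".join(f"{key}: {doc.metadata.get(key, 'n/a')}" for key in FIELDS)
        blocks.append(f"{header}\n{doc.page_content}")
    return "\n\n".join(blocks)  # blank line between listings


sample = [Doc("Gourmet kitchen and fenced yard.",
              {"LN": "L1", "Price": 900000, "Size": "2,500 sqft", "Bedrooms": 4, "Bathrooms": 3.0})]
print(format_docs(sample))
```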

In [66]:
#
# set up LLM and RAG chain
#

llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125", temperature=0.5)


rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
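The chain reads left to right. The dict literal sends the same input, the question string, to two branches at once: the retriever fills `context` and `RunnablePassthrough()` copies the question into `question`. The resulting dict is formatted by the prompt, sent to the model, and reduced to plain text by `StrOutputParser`. The sketch below mimics that data flow with ordinary functions; `pipeline`, `parallel`, and every stand-in step are hypothetical and make no API calls.

```python
from typing import Callable


def pipeline(*steps: Callable) -> Callable:
    """Compose steps left to right, like the | operator in a LangChain chain."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def parallel(**branches: Callable) -> Callable:
    """Feed the same input to every branch and collect the results in a dict."""
    return lambda value: {name: branch(value) for name, branch in branches.items()}


# hypothetical stand-ins for the retriever, passthrough, prompt, model, and parser
def fake_retriever(question: str) -> list:
    return ["listing A", "listing B"]


def passthrough(question: str) -> str:
    return question


def to_prompt(inputs: dict) -> str:
    return f"Preferences: {inputs['question']}\nListings: {inputs['context']}\nAnswer:"


def fake_llm(prompt_text: str) -> dict:
    return {"content": "## Thank you for your interest ##"}


def to_str(message: dict) -> str:
    return message["content"]


chain = pipeline(parallel(context=fake_retriever, question=passthrough), to_prompt, fake_llm, to_str)
print(chain("a gourmet kitchen"))
```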




### Top listings (up to 3) that best match the buyer's preferences

In [68]:
output = rag_chain.invoke(question)

In [69]:
print(output)

## Thank you for your interest, home(s) that best meet your preferences are: ##
LN: L056789
Price: 900000
Size: 2,500 sqft
Bedrooms: 4
Bathrooms: 3.0
This stunning home in Oak Park features a gourmet kitchen with granite countertops, high-end appliances, and a large island. The spacious living and dining area have large windows that provide lots of natural light. The backyard includes a built-in BBQ, multiple seating areas, and ample space for outdoor entertaining. Located near top-rated schools and recreational areas, this home offers a peaceful setting with plenty of trees away from the city center.

LN: L038634
Price: 950000
Size: 2,200 sqft
Bedrooms: 4
Bathrooms: 3.0
Nestled in Moorpark, this beautiful home boasts a gourmet kitchen with granite countertops and stainless steel appliances. The large family room has high ceilings and plenty of natural light. The fenced backyard features a BBQ grill and a fire pit, perfect for outdoor gatherings. Situated near top-rated schools and rec