# Step 1: Setting Up the Python Application

Initialize a Python Project: Create a new Python project, setting up a virtual environment and installing necessary packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB). If you don't wish to create your files from scratch, starter files are available in the workspace on the next page as an application skeleton.

In [4]:
os.environ["OPENAI_API_KEY"] = ''
OPENAI_API_KEY=''
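The cell above leaves the key as an empty string to be filled in by hand. As a minimal sketch of an alternative, the key could be read from the environment and requested interactively only when it is missing. The helper name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Prompt without echoing the key into the notebook output.
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key
    return key
```

Under this assumption, `OPENAI_API_KEY = load_api_key()` would replace the two hard-coded assignments and keep the key out of the saved notebook.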

In [None]:
!pip install -U pandas
!pip install tiktoken
!pip install pytest
!pip install sentence-transformers
!pip install transformers
!pip install jupyter
!pip install -U openai
!pip install chromadb
!pip install langchain
!pip install numpy
!pip install -U langchain-openai
!pip install pydantic

Defaulting to user installation because normal site-packages is not writeable
Collecting langchain-openai
  Downloading langchain_openai-0.1.8-py3-none-any.whl (38 kB)
Collecting tiktoken<1,>=0.7
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting langchain-core<0.3,>=0.2.2
  Downloading langchain_core-0.2.7-py3-none-any.whl (315 kB)
Collecting langsmith<0.2.0,>=0.1.75
  Downloading langsmith-0.1.77-py3-none-any.whl (125 kB)
Collecting packaging<25,>=23.2
  Downloading packaging-24.1

In [1]:
import os
import shutil
import numpy as np
import langchain
import openai
import chromadb
import pandas as pd
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


# Step 2: Generating Real Estate Listings 

Generate real estate listings using a Large Language Model. Generate at least 10 listings. This can involve creating prompts for the LLM to produce descriptions of various properties. An example of a listing might be:
    Neighborhood: Green Oaks
    Price: $800,000
    Bedrooms: 3
    Bathrooms: 2
    House Size: 2,000 sqft

    Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

    Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
    
You'll use these listings to populate the database for testing and development of "HomeMatch".

In [83]:
prompt='''
Create a CSV file of 15 real estate listings. Generate representative data with the attributes as columns:

1. Neighborhood: Identify the neighborhood location. Example: "Green Oaks."
2. Price: The property's market price in U.S. dollars. The format should be "$xxx,xxx".
3. Bedrooms: Number of bedrooms in the property. Example: "2".
4. Bathrooms: Number of bathrooms in the property.  There can be half-bathrooms that don't have a shower. Example: "2.5".
5. House Size: The property's square footage.  Example: "2,000 sqft".

Write a description of the property that is consistent with the data above.  Highlight important features or unique characteristics of the property at hand.
Write a description of the surrounding neighborhood that is consistent with the data above. Highlight important features or unique characteristics of the area in which the property is located, such as schools, town characteristics, and surrounding businesses.

Here is an example data point:
Neighborhood,Price,Bedrooms,Bathrooms,House_Size,House_Description,Neighborhood_Description
"Green Oaks","$800,000",3,2,"2,000 sqft","Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.","Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."


Make sure the CSV file has headers for each column. Follow the example provided to format each row with information specific to a unique property listing.
Make sure you generate 15 unique listings.
'''

# Using GPT-4o ("omni") for cheaper and better results
import openai

client = openai.OpenAI(api_key = OPENAI_API_KEY)





# Step 3: Storing Listings in a Vector Database

Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.

In [None]:
messages=[{"role": "system", "content": "You are a generator of synthetic data."},
    {"role": "user", "content":  prompt}]

print(messages)


In [None]:
response = client.chat.completions.create(
    model="gpt-4o",
    temperature = 0.0,
    messages=messages
)
print(response.choices[0].message.content)
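The notebook reads `home_reviews.txt` in the next cell but does not show how the model's reply was turned into that file. One possible bridge, sketched here with only the standard library, is to parse the reply text as CSV. The helper name `parse_listings_csv` and the sample text are assumptions for illustration, not part of the original notebook.

```python
import csv
import io


def parse_listings_csv(text: str) -> list[dict]:
    """Parse CSV text returned by the model into a list of row dictionaries."""
    # Drop Markdown code-fence lines, which chat models sometimes wrap around CSV output.
    lines = [ln for ln in text.strip().splitlines() if not ln.strip().startswith("```")]
    # skipinitialspace tolerates a space between a comma and an opening quote.
    reader = csv.DictReader(io.StringIO("\n".join(lines)), skipinitialspace=True)
    return [dict(row) for row in reader]


# Made-up sample reply, including a code fence and a stray space before a quoted field.
sample = '''```csv
Neighborhood,Price,Bedrooms,Bathrooms,House_Size
"Green Oaks","$800,000",3,2, "2,000 sqft"
```'''
rows = parse_listings_csv(sample)
print(rows[0]["Price"], rows[0]["House_Size"])
```

The parsed rows could then be written to disk, for example with `pandas.DataFrame(rows).to_csv("home_reviews.txt")`. Saving with the default index would be consistent with the `Unnamed: 0` column that appears when the file is read back.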

In [2]:
df=pd.read_csv('home_reviews.txt')
df.head()

Unnamed: 0,Neighborhood,Price,Bedrooms,Bathrooms,House_Size,House_Description,Neighborhood_Description
0,Maplewood,"$450,000",3,2.0,"1,800 sqft","This delightful 3-bedroom, 2-bathroom home in ...",Maplewood is a family-friendly neighborhood kn...
1,Sunnyvale,"$600,000",4,3.0,"2,500 sqft","This stunning 4-bedroom, 3-bathroom home in Su...",Sunnyvale is a vibrant community with top-rate...
2,Riverside,"$525,000",3,2.5,"2,100 sqft","Welcome to this charming 3-bedroom, 2.5-bathro...",Riverside is a picturesque neighborhood with t...
3,Brookside,"$700,000",4,3.5,"3,000 sqft","This elegant 4-bedroom, 3.5-bathroom home in B...",Brookside is an upscale neighborhood known for...
4,Lakeview,"$480,000",3,2.0,"1,900 sqft","This lovely 3-bedroom, 2-bathroom home in Lake...",Lakeview is a serene neighborhood with a stron...


In [5]:
# Configuration
CHROMA_PATH = "/Users/chrismarkson/Downloads/project/chroma/"
DATA_PATH = "home_reviews.txt" 

# Create the chromaDB
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embeddings)

df = pd.read_csv(DATA_PATH)
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['House_Description'], metadata={'id': str(index), 'type': 'House'}))


# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)

print(f"Number of chunks: {len(chunks)}")


Number of chunks: 15


In [6]:
# Save to Chroma
db = Chroma.from_documents(
    chunks, OpenAIEmbeddings(model="text-embedding-3-small"), persist_directory=CHROMA_PATH)

print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")


Saved 15 chunks to /Users/chrismarkson/Downloads/project/chroma/.
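The indexing cell above stores only `House_Description`, while the search output later in this notebook contains documents tagged `'type': 'Neighborhood'`. A minimal sketch of how both descriptions might be prepared for indexing is shown below. It works on plain dictionaries (for example, `df.to_dict("records")`), and the helper name `listing_records` is hypothetical.

```python
def listing_records(rows):
    """Yield (text, metadata) pairs: one per non-empty house or neighborhood description."""
    fields = [("House_Description", "House"), ("Neighborhood_Description", "Neighborhood")]
    for index, row in enumerate(rows):
        for column, kind in fields:
            text = (row.get(column) or "").strip()
            if text:  # skip missing or empty descriptions
                yield text, {"id": str(index), "type": kind}


# Made-up rows for illustration only.
rows = [
    {"House_Description": "A 3-bedroom home with a garden.",
     "Neighborhood_Description": "Quiet streets near good schools."},
    {"House_Description": "A downtown loft.", "Neighborhood_Description": ""},
]
records = list(listing_records(rows))
print(len(records))  # 3: two house descriptions and one neighborhood description
```

Each pair could then be wrapped as `Document(page_content=text, metadata=metadata)` before splitting and saving to Chroma, keeping the shared `id` so a neighborhood hit can be traced back to its listing row.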


# Step 4: Building the User Preference Interface

Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to enter their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively.

Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
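As a rough sketch of the parsing step, question and answer pairs could be combined into one block of text that is later used as prompt context or as a search query. The function name `build_preference_text` and the label wording are assumptions. The length check guards against the two lists drifting out of step, which can happen silently when a comma is missing between adjacent string literals.

```python
def build_preference_text(questions, answers):
    """Pair each question with its answer and join them into one text block."""
    if len(questions) != len(answers):
        raise ValueError(
            f"Expected one answer per question, got {len(questions)} questions "
            f"and {len(answers)} answers."
        )
    parts = []
    for number, (question, answer) in enumerate(zip(questions, answers), start=1):
        parts.append(
            f"Customer Preference Question {number}: {question}\n"
            f"Customer Preference Response: {answer}"
        )
    return "\n\n".join(parts)


# Made-up inputs for illustration only.
demo_text = build_preference_text(
    ["How big do you want your house to be?", "Which amenities would you like?"],
    ["Three bedrooms.", "A backyard for gardening."],
)
print(demo_text)
```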


In [15]:
questions = ["How big do you want your house to be?",
"What are the 3 most important things for you in choosing this property?", 
"Which amenities would you like?", 
"Which transportation options are important to you?",
"How urban do you want your neighborhood to be?"]
answers = ["A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
"A quiet neighborhood, good local schools, and convenient shopping options.",
"A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
"Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
"A balance between suburban tranquility and access to urban amenities like restaurants and theaters."]

# Step 5: Searching Based on Preferences

Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
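To illustrate what "semantic closeness" means, the sketch below ranks toy vectors by cosine similarity. Vector stores rank by some distance or similarity measure between embeddings; cosine similarity is one common choice and is used here for illustration only, not as a statement of the metric this notebook's Chroma collection uses. The three-dimensional vectors and the function names are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, listing_vecs, k=3):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in listing_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones come from the embedding model and have many more dimensions.
toy_vectors = {
    "Greenfield": [0.9, 0.1, 0.3],
    "Parkside": [0.7, 0.2, 0.6],
    "Lakeview": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.2]
for name, score in top_k(query, toy_vectors, k=2):
    print(f"{name}: {score:.3f}")
```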

In [88]:

CHROMA_PATH = "/Users/chrismarkson/Downloads/project/chroma"

PROMPT_TEMPLATE = """
You are a helpful real estate chat bot.
Answer the question based only on the following context:

{context}

####

Answer the question based on the above context: {question}
"""




In [89]:
query_text = input("Enter a description of the property you're looking for:")
#query_text = "Find me a house close to walking trails" 

Enter a description of the property you're looking for: good schools


In [73]:
print(query_text)

good schools


In [90]:
# Search the DB
results = db.similarity_search_with_relevance_scores(query_text, k=3)
print(results)

[(Document(page_content='Greenfield is a family-friendly neighborhood with excellent schools and a variety of recreational activities. The area is home to several parks, including Greenfield Park, which offers playgrounds, sports fields, and walking trails. The neighborhood also has a range of local shops, cafes, and restaurants. Greenfield is well-connected with public transportation options, making commuting easy.', metadata={'id': '12', 'start_index': 0, 'type': 'Neighborhood'}), 0.08690181047310996), (Document(page_content='Parkside is a vibrant community with top-rated schools and a variety of recreational activities. The neighborhood is home to several parks, including Parkside Central Park, which offers playgrounds, sports fields, and walking trails. The area also has a bustling downtown with shops, restaurants, and entertainment options.', metadata={'id': '8', 'start_index': 0, 'type': 'Neighborhood'}), 0.06704407899104259), (Document(page_content='Fairview is a vibrant communi
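The relevance scores in the output above are low in absolute terms (roughly 0.09 and 0.07), so one way to tune retrieval is to drop results below a cutoff before building the prompt. The sketch below assumes `(item, score)` pairs shaped like the search results. The helper name `filter_by_score` and the 0.05 threshold are arbitrary placeholders that would need tuning on real queries.

```python
def filter_by_score(results, min_score=0.05):
    """Keep (item, score) pairs at or above min_score, highest score first."""
    kept = [(item, score) for item, score in results if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Plain strings stand in for Document objects; the scores are made up.
toy_results = [("listing B", 0.067), ("listing A", 0.087), ("listing C", 0.01)]
print(filter_by_score(toy_results))
```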

In [91]:
# Start from an empty context so this cell does not depend on earlier notebook state
context_text = ""
for i in range(len(questions)):
    context_text += f"\n\n---Customer Preference Question: {i+1}\n\n" + questions[i]
    context_text += "\nCustomer Preference Response: " + answers[i]

# Append the retrieved listings, separated from the preferences and from each other
context_text2 = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
context_text = context_text + "\n\n---\n\n" + context_text2
prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(context=context_text, question=query_text)

model = ChatOpenAI()
# invoke() returns a message object; .content holds the reply text
response_text = model.invoke(prompt).content

sources = [doc.metadata.get("id", None) for doc, _score in results]
formatted_response = f"I was able to find the following results: {response_text}\n\n"
print(prompt+"\n\n\n------")
print(formatted_response)

for i in range(0,len(sources)):
    print(f"Option {i+1}:")
    print(df.iloc[int(sources[i]),0:4])
    print("\n\n")

Human: 
You are a helpful real estate chat bot.
Answer the question based only on the following context:

Greenfield is a family-friendly neighborhood with excellent schools and a variety of recreational activities. The area is home to several parks, including Greenfield Park, which offers playgrounds, sports fields, and walking trails. The neighborhood also has a range of local shops, cafes, and restaurants. Greenfield is well-connected with public transportation options, making commuting easy.

---

Parkside is a vibrant community with top-rated schools and a variety of recreational activities. The neighborhood is home to several parks, including Parkside Central Park, which offers playgrounds, sports fields, and walking trails. The area also has a bustling downtown with shops, restaurants, and entertainment options.

---

Fairview is a vibrant community with top-rated schools and a variety of recreational activities. The neighborhood is home to several parks, including Fairview Cent

In [None]:
# Inspect the first listing row (df[0] would look for a column named 0)
df.iloc[0]

# Step 6: Personalizing Listing Descriptions

LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
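One lightweight way to check factual integrity is to compare hard numbers before and after rewriting. The sketch below extracts bedroom and bathroom counts written in the `N-bedroom` / `N-bathroom` style used by the listings and confirms that the rewritten text still states them. The function names are hypothetical, and this covers only one narrow kind of fact; it does not replace a human review.

```python
import re


def extract_room_facts(text):
    """Return a set of (count, room_type) pairs such as ('4', 'bedroom')."""
    pattern = r"(\d+(?:\.\d+)?)-(bedroom|bathroom)"
    return set(re.findall(pattern, text.lower()))


def facts_preserved(original, rewritten):
    """True if every room count in the original also appears in the rewrite."""
    return extract_room_facts(original) <= extract_room_facts(rewritten)


original = "This spacious 4-bedroom, 3-bathroom home in Greenfield has a covered patio."
good = "Love the outdoors? This 4-bedroom, 3-bathroom Greenfield home has a covered patio."
bad = "Love the outdoors? This 5-bedroom, 3-bathroom Greenfield home has a covered patio."
print(facts_preserved(original, good), facts_preserved(original, bad))
```

A rewrite that fails the check could be regenerated or flagged rather than shown to the buyer.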

In [17]:
PROMPT_TEMPLATE = """
You are a helpful real estate chat bot.
Answer the question based only on the following context:

{context}

####

Using the information above, modify the descriptions of each property to highlight the components most important to the customer. Emphasize the aspects of the property that make it a compelling sell. Important! Maintain factual information according to the listing but feel free to use creative language.

"""


In [12]:
query_text = input("Enter a description of the property you're looking for:")
#query_text = "Find me a house close to walking trails" 

Enter a description of the property you're looking for: I'm looking for a house with a great backyard


In [13]:
# Search the DB
results = db.similarity_search_with_relevance_scores(query_text, k=3)
print(results)

[(Document(page_content='This spacious 4-bedroom, 3-bathroom home in Greenfield offers a perfect blend of comfort and style. The open floor plan includes a large living room with a fireplace, a formal dining area, and a modern kitchen with stainless steel appliances and granite countertops. The master suite features a walk-in closet and a luxurious bathroom with a soaking tub and separate shower. The backyard is perfect for entertaining, with a covered patio and a beautifully landscaped garden.', metadata={'id': '12', 'start_index': 0, 'type': 'House'}), 0.24285284727932088), (Document(page_content='This delightful 3-bedroom, 2-bathroom home in Meadowbrook offers a cozy and inviting atmosphere. The spacious living room features a beautiful fireplace, perfect for chilly evenings. The modern kitchen is equipped with stainless steel appliances and granite countertops. The master bedroom includes an en-suite bathroom with a luxurious soaking tub. Enjoy the large backyard, ideal for family 

In [20]:
# Rebuild the context from scratch so preferences from the previous step are not duplicated
context_text = ""
for i in range(len(questions)):
    context_text += f"\n\n---Customer Preference Question: {i+1}\n\n" + questions[i]
    context_text += "\nCustomer Preference Response: " + answers[i]
context_text += "\n\n"
# Add each retrieved listing, labeled with its row id
for doc, _score in results:
    context_text += f"\n\n---Property: {doc.metadata['id']}\n"
    context_text += "\nDescription: " + doc.page_content

prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(context=context_text, question=query_text)

model = ChatOpenAI()
# invoke() returns a message object; .content holds the reply text
response_text = model.invoke(prompt).content
og_listing=''
for i in range(0,len(results)):
    og_listing=og_listing+(f"\n\n---Property: {results[i][0].metadata['id']}\n")
    og_listing=og_listing+("\nDescription: ")+(results[i][0].page_content)

formatted_response = f"Modified Listings: {response_text}\n\n"
print(og_listing+"\n\n\n####")
print(formatted_response)



---Property: 12

Description: This spacious 4-bedroom, 3-bathroom home in Greenfield offers a perfect blend of comfort and style. The open floor plan includes a large living room with a fireplace, a formal dining area, and a modern kitchen with stainless steel appliances and granite countertops. The master suite features a walk-in closet and a luxurious bathroom with a soaking tub and separate shower. The backyard is perfect for entertaining, with a covered patio and a beautifully landscaped garden.

---Property: 7

Description: This delightful 3-bedroom, 2-bathroom home in Meadowbrook offers a cozy and inviting atmosphere. The spacious living room features a beautiful fireplace, perfect for chilly evenings. The modern kitchen is equipped with stainless steel appliances and granite countertops. The master bedroom includes an en-suite bathroom with a luxurious soaking tub. Enjoy the large backyard, ideal for family gatherings and outdoor activities.

---Property: 6

Description: This 