This is a starter notebook for the project. You'll have to import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

In [1]:
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"  # placeholder: never commit a real key to a notebook
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"


from langchain.llms import OpenAI


# Step 1: Setting Up the Python Application
Initialize a Python Project: Create a new Python project, setting up a virtual environment and installing necessary packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB).
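A quick pre-flight check can confirm the environment is ready before any API calls are made. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and the package and variable names in the usage line are assumptions based on what this notebook imports and sets.

```python
import importlib.util
import os


def check_environment(packages, env_vars):
    """Return (missing_packages, missing_env_vars) for a quick pre-flight check.

    Hypothetical helper: `packages` are top-level import names, `env_vars` are
    environment variable names that must be set to a non-empty value.
    """
    # find_spec returns None when a top-level module cannot be located
    missing_packages = [name for name in packages if importlib.util.find_spec(name) is None]
    # Unset and empty variables both count as missing
    missing_env_vars = [name for name in env_vars if not os.environ.get(name)]
    return missing_packages, missing_env_vars
```

For example, `check_environment(["langchain", "chromadb", "pandas"], ["OPENAI_API_KEY", "OPENAI_API_BASE"])` returns two lists that should both be empty before moving on.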

In [2]:
from langchain.llms import OpenAI
model_name = 'gpt-3.5-turbo'
llm = OpenAI(model_name=model_name, temperature=0)



# Step 2: Generating Real Estate Listings
Generate real estate listings using a Large Language Model (LLM). Generate at least 10 listings. This can involve creating prompts that ask the LLM to produce descriptions of various properties.
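The sample listing used in this step has a simple layout: a "Key: value" header followed by description paragraphs. If such text ever needs to be turned into structured fields without an LLM call (for example, to sanity-check a hand-written sample), a small parser is enough. This is a sketch under that assumption; `parse_listing_text` is a hypothetical helper, and it assumes whole-number values, matching the integer fields of the schema used in this notebook.

```python
import re

NUMERIC_FIELDS = ("price", "bedrooms", "bathrooms", "house_size")


def parse_listing_text(text):
    """Parse a 'Key: value' listing block into a dict keyed like the schema fields.

    Hypothetical helper. Assumes each field sits on one line and numeric values
    are whole numbers (e.g. "$800,000", "2,000 sqft").
    """
    fields = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # blank separator lines carry no field
        key, value = line.split(":", 1)
        # "House Size" -> "house_size", "Neighborhood Description" -> "neighborhood_description"
        fields[key.strip().lower().replace(" ", "_")] = value.strip()
    for key in NUMERIC_FIELDS:
        if key in fields:
            # Keep only the digits: "$800,000" -> 800000, "2,000 sqft" -> 2000
            fields[key] = int(re.sub(r"\D", "", fields[key]))
    return fields
```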

In [3]:
INSTRUCTION = "Generate 13 real estate listings."
Listing1 = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [4]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List

class RealEstateListing(BaseModel):
    neighborhood: str = Field(description="Name of the neighborhood")
    price: NonNegativeInt = Field(description="Price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="Number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="Size of the property in square feet")
    description: str = Field(description="Description of the property.")   
    neighborhood_description: str = Field(description="Description of the neighborhood.")  

class ListingCollection(BaseModel):
    listing: List[RealEstateListing] = Field(description="List of available real estate")
        
parser = PydanticOutputParser(pydantic_object=ListingCollection)
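`PydanticOutputParser` validates the model's JSON against `ListingCollection`. To see what that check amounts to, here is a standard-library approximation of the same schema. It is a sketch, not how LangChain or Pydantic implement validation; `validate_listings_json` is hypothetical, and it is stricter than Pydantic's default mode, which would coerce a string such as "3" into an integer.

```python
import json

STRING_FIELDS = ("neighborhood", "description", "neighborhood_description")
INT_FIELDS = ("price", "bedrooms", "bathrooms", "house_size")


def validate_listings_json(raw):
    """Parse raw JSON and check it against the ListingCollection shape.

    Hypothetical helper: returns the list of listing dicts or raises ValueError
    (json.JSONDecodeError is itself a subclass of ValueError).
    """
    data = json.loads(raw)
    listings = data.get("listing") if isinstance(data, dict) else None
    if not isinstance(listings, list):
        raise ValueError("expected an object with a 'listing' array")
    for i, item in enumerate(listings):
        if not isinstance(item, dict):
            raise ValueError(f"listing {i} is not an object")
        for key in STRING_FIELDS:
            if not isinstance(item.get(key), str):
                raise ValueError(f"listing {i}: '{key}' must be a string")
        for key in INT_FIELDS:
            value = item.get(key)
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"listing {i}: '{key}' must be a non-negative integer")
    return listings
```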

Prompt Creation

In [5]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)
query = prompt.format(instruction = INSTRUCTION, sample = Listing1)
print(query)

Generate 13 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a 

In [6]:
output = llm(query)
output

'{\n  "listing": [\n    {\n      "neighborhood": "Green Oaks",\n      "price": 800000,\n      "bedrooms": 3,\n      "bathrooms": 2,\n      "house_size": 2000,\n      "description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.",\n      "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access t

In [7]:
listings = parser.parse(output)
listings

ListingCollection(listing=[RealEstateListing(neighborhood='Green Oaks', price=800000, bedrooms=3, bathrooms=2, house_size=2000, description='Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.', neighborhood_description='Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting i

Converting to DataFrame

In [None]:
!pip install pandas

In [8]:
import pandas as pd
df = pd.DataFrame([listing.dict() for listing in listings.listing])
df

|   | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
|---|---|---|---|---|---|---|---|
| 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | Sunnyvale | 950000 | 4 | 3 | 2500 | Located in the desirable neighborhood of Sunny... | Sunnyvale is known for its top-rated schools, ... |
| 2 | Lakeview | 700000 | 2 | 1 | 1500 | Welcome to this charming bungalow in the heart... | Lakeview is a family-friendly neighborhood wit... |
| 3 | Downtown | 1200000 | 3 | 2 | 1800 | Luxury living awaits in this modern condo in t... | Downtown is a bustling urban neighborhood with... |
| 4 | Hillside | 850000 | 4 | 3 | 2200 | Perched on a hillside with panoramic views, th... | Hillside is a quiet residential neighborhood w... |
| 5 | Waterfront | 1600000 | 5 | 4 | 3500 | Experience waterfront living at its finest in ... | Waterfront is a prestigious neighborhood with ... |
| 6 | Mountain View | 1100000 | 4 | 3 | 2400 | Nestled in the scenic hills of Mountain View, ... | Mountain View is a picturesque neighborhood wi... |
| 7 | Downtown Loft District | 750000 | 2 | 2 | 1800 | Live in style in this chic loft in the heart o... | Downtown Loft District is a trendy neighborhoo... |
| 8 | Historic District | 900000 | 3 | 2 | 2000 | Step back in time in this historic 3-bedroom, ... | Historic District is a quaint neighborhood wit... |
| 9 | Beachfront | 2000000 | 6 | 5 | 4000 | Live the ultimate beachfront lifestyle in this... | Beachfront is a sought-after neighborhood with... |


In [9]:
df.to_csv('listings.csv',index_label = 'id')
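`index_label='id'` writes the DataFrame index as a named first column, which is why the file read back in the next step has an `id` column next to pandas' own unnamed index. The same file layout can be produced and read with the standard `csv` module. This sketch assumes listings are plain dicts with exactly the schema's keys; `listings_to_csv` and `csv_to_listings` are hypothetical helpers.

```python
import csv
import io

FIELDNAMES = ["id", "neighborhood", "price", "bedrooms", "bathrooms",
              "house_size", "description", "neighborhood_description"]


def listings_to_csv(listings):
    """Serialize listing dicts to CSV text with a leading 'id' column (row position)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for idx, listing in enumerate(listings):
        # Fields containing commas are quoted automatically by the csv writer
        writer.writerow({"id": idx, **listing})
    return buffer.getvalue()


def csv_to_listings(text):
    """Parse the CSV text back into dicts; note that csv returns every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))
```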

# Step 3: Storing Listings in a Vector Database
Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.

Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
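Conceptually, the vector database stores one embedding per chunk and, at query time, ranks the stored vectors by their similarity to the query's embedding. This toy, in-memory sketch uses cosine similarity and hand-written 2-D vectors in place of real OpenAI embeddings. `InMemoryVectorStore` is hypothetical: it ignores the indexing and persistence Chroma provides, and Chroma's own distance metric is configurable and may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy stand-in for a vector database: stores (id, vector, text) rows."""

    def __init__(self):
        self._rows = []

    def add(self, doc_id, vector, text):
        self._rows.append((doc_id, list(vector), text))

    def search(self, query_vector, k=5):
        """Return the k most similar rows as (score, doc_id, text), best first."""
        scored = [(cosine_similarity(query_vector, vec), doc_id, text)
                  for doc_id, vec, text in self._rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]
```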

In [10]:
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
CHROMA_PATH = "chroma"

In [11]:
df = pd.read_csv('listings.csv')
df

|   | id | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | 1 | Sunnyvale | 950000 | 4 | 3 | 2500 | Located in the desirable neighborhood of Sunny... | Sunnyvale is known for its top-rated schools, ... |
| 2 | 2 | Lakeview | 700000 | 2 | 1 | 1500 | Welcome to this charming bungalow in the heart... | Lakeview is a family-friendly neighborhood wit... |
| 3 | 3 | Downtown | 1200000 | 3 | 2 | 1800 | Luxury living awaits in this modern condo in t... | Downtown is a bustling urban neighborhood with... |
| 4 | 4 | Hillside | 850000 | 4 | 3 | 2200 | Perched on a hillside with panoramic views, th... | Hillside is a quiet residential neighborhood w... |
| 5 | 5 | Waterfront | 1600000 | 5 | 4 | 3500 | Experience waterfront living at its finest in ... | Waterfront is a prestigious neighborhood with ... |
| 6 | 6 | Mountain View | 1100000 | 4 | 3 | 2400 | Nestled in the scenic hills of Mountain View, ... | Mountain View is a picturesque neighborhood wi... |
| 7 | 7 | Downtown Loft District | 750000 | 2 | 2 | 1800 | Live in style in this chic loft in the heart o... | Downtown Loft District is a trendy neighborhoo... |
| 8 | 8 | Historic District | 900000 | 3 | 2 | 2000 | Step back in time in this historic 3-bedroom, ... | Historic District is a quaint neighborhood wit... |
| 9 | 9 | Beachfront | 2000000 | 6 | 5 | 4000 | Live the ultimate beachfront lifestyle in this... | Beachfront is a sought-after neighborhood with... |


In [12]:
from langchain.schema import Document
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['description'], metadata={'id': str(index)}))
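As written, only `description` is embedded, so details that live in `neighborhood_description` (schools, parks, transit) and the structured fields cannot influence the search. One option, sketched here with a hypothetical `listing_to_page_content` helper, is to fold them into the text before embedding. It works on anything indexable by column name, such as the pandas rows yielded by `df.iterrows()`.

```python
def listing_to_page_content(row):
    """Combine structured fields and both descriptions into one searchable text.

    Hypothetical helper; `row` only needs to support row["column_name"] lookups.
    """
    header = (
        f"Neighborhood: {row['neighborhood']}. "
        f"Price: ${int(row['price']):,}. "
        f"{row['bedrooms']} bedrooms, {row['bathrooms']} bathrooms, "
        f"{int(row['house_size']):,} sqft."
    )
    # One line for the facts, then the two free-text descriptions
    return "\n".join([header, row["description"], row["neighborhood_description"]])
```

Inside the loop above, each document would then be built as `Document(page_content=listing_to_page_content(row), metadata={'id': str(index)})`.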

In [13]:
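# CharacterTextSplitter splits on "\n\n" by default; the descriptions contain no blank
# lines, so each one stays a single chunk even when it is longer than chunk_size.
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=100)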
split_docs = splitter.split_documents(documents)

In [14]:
print(f"Split {len(documents)} documents into {len(split_docs)} chunks.")

Split 13 documents into 13 chunks.
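A separator-based splitter only cuts where its separator occurs, so text without that separator passes through as one chunk regardless of `chunk_size`. A fixed-size sliding window, by contrast, always enforces the size limit, with consecutive chunks sharing `chunk_overlap` characters. This is a minimal sketch of that alternative technique, not LangChain's algorithm; `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=300, chunk_overlap=100):
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous
    one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)` returns `["abcd", "cdef", "efgh", "ghij"]`. Cutting at fixed offsets can split words mid-way, which is the trade-off a separator-based splitter avoids.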


In [15]:
print(split_docs[3])

page_content='Luxury living awaits in this modern condo in the heart of Downtown. This 3-bedroom, 2-bathroom unit features floor-to-ceiling windows with stunning city views. The gourmet kitchen is equipped with high-end appliances and sleek finishes. Relax in the spa-like master suite with a soaking tub and walk-in closet. Enjoy the convenience of living in the vibrant Downtown area with access to top restaurants, shopping, and entertainment.' metadata={'id': '3'}


In [16]:
print(split_docs[3].page_content)
print(split_docs[3].metadata)

Luxury living awaits in this modern condo in the heart of Downtown. This 3-bedroom, 2-bathroom unit features floor-to-ceiling windows with stunning city views. The gourmet kitchen is equipped with high-end appliances and sleek finishes. Relax in the spa-like master suite with a soaking tub and walk-in closet. Enjoy the convenience of living in the vibrant Downtown area with access to top restaurants, shopping, and entertainment.
{'id': '3'}


In [17]:
# Rebuild the Chroma store from scratch: delete any previously persisted copy first
import shutil
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

db = Chroma.from_documents(
    split_docs, OpenAIEmbeddings(), persist_directory=CHROMA_PATH
)
db.persist()
print("Saved split docs.")

Saved split docs.


# Step 4: Building the User Preference Interface
Collect buyer preferences, such as the number of bedrooms and bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to describe their preferences in natural language.

Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.

In [18]:
preferences = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]
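These preferences are passed to the search as free text. Structured fields can also be pulled out of them to filter or re-rank results. This regex-based sketch extracts bedroom and bathroom counts only; `parse_preference` and its small number-word table are assumptions, and anything more flexible (locations, budgets) would need more rules or an LLM call.

```python
import re

# Small, assumed vocabulary of spelled-out numbers
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
ROOM_PATTERN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")[\s-]*(bedroom|bathroom|bath)s?\b"
)


def parse_preference(text):
    """Extract bedroom/bathroom counts from free text, e.g. 'three-bedroom' -> 3."""
    prefs = {}
    for number, kind in ROOM_PATTERN.findall(text.lower()):
        count = int(number) if number.isdigit() else NUMBER_WORDS[number]
        prefs["bedrooms" if kind == "bedroom" else "bathrooms"] = count
    return prefs
```

For instance, `parse_preference(preferences[0])` yields `{"bedrooms": 3}`, while the other four preferences yield an empty dict because they mention no room counts.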

# Step 5: Searching Based on Preferences
Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.

Listing Retrieval Logic: Fine-tune the retrieval algorithm so that the listings selected are those semantically closest to the buyer’s preferences.
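One way to refine retrieval is to apply the relevance cutoff to every result rather than only to the top one, and to keep a single entry per listing when several chunks of the same listing match. This sketch works on `(listing_id, score)` pairs such as those built from `similarity_search_with_relevance_scores`; `select_relevant` is hypothetical, and its defaults `k=5` and `min_score=0.7` simply mirror the values used in this notebook.

```python
def select_relevant(results, k=5, min_score=0.7):
    """Return up to k (listing_id, score) pairs at or above min_score, one per listing, best first."""
    best = {}
    for listing_id, score in results:
        # Keep only the highest-scoring chunk for each listing
        if score >= min_score and score > best.get(listing_id, float("-inf")):
            best[listing_id] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

With LangChain results this could be called as `select_relevant([(doc.metadata["id"], score) for doc, score in results])`.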

In [19]:
PROMPT_TEMPLATE = \
"""
Context: 

{context}

----

Provide the answer for the question: {question}
"""

In [20]:
from langchain.chat_models import ChatOpenAI
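def get_top_listings(query, PROMPT_TEMPLATE) -> None:
    """Retrieve the chunks closest to `query`, ask the chat model, and print the answer with its source ids."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Take the 5 most similar chunks; give up if even the best one scores below 0.7
    results = db.similarity_search_with_relevance_scores(query, k=5)
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")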
    else:
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = PromptTemplate.from_template(PROMPT_TEMPLATE)
        prompt = prompt_template.format(context=context_text, question=query)
        print(f"Generated Prompt:\n{prompt}")
        
        model = ChatOpenAI()
        response_text = model.predict(prompt)
        sources = [doc.metadata.get("id", None) for doc, _score in results]
        formatted_response = f"Response: {response_text}\nSources: {sources}"
        print(formatted_response)
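`get_top_listings` joins the retrieved chunks with a `---` separator and fills `{context}` and `{question}`. With more or longer chunks, the context could outgrow the model's context window, so a character budget is a simple safeguard. This sketch uses plain `str.format` on a copy of the template; `build_prompt` and its 6,000-character default are assumptions, not values taken from LangChain or OpenAI.

```python
SAMPLE_TEMPLATE = """
Context:

{context}

----

Provide the answer for the question: {question}
"""


def build_prompt(chunks, question, template=SAMPLE_TEMPLATE,
                 separator="\n\n---\n\n", max_context_chars=6000):
    """Join best-first chunks into {context} until the character budget is reached."""
    kept = []
    used = 0
    for chunk in chunks:
        extra = len(chunk) + (len(separator) if kept else 0)
        if used + extra > max_context_chars:
            break  # chunks arrive best-first, so the lower-ranked tail is dropped
        kept.append(chunk)
        used += extra
    # str.format treats braces inside the chunk text as plain data, not placeholders
    return template.format(context=separator.join(kept), question=question)
```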

In [21]:
get_top_listings(preferences[2], PROMPT_TEMPLATE)

Generated Prompt:

Context: 

Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

---

Live in harmony with nature in this eco-friendly home in the serene Parkside neighborhood. This 4-bedroom, 3-bathroom home features energy-efficient features and sustainable materials. The open-concept living area is flooded with natural light and offers views of the landscaped garden. The gourmet kitchen is equipped with modern appliances and a large island for entertaining. Step outside to the outdoor deck to enjoy the peaceful surround

# Step 6: Personalizing Listing Descriptions
LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
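A cheap automated guard for factual integrity is to compare the numbers in the original and augmented descriptions: any figure that appears only in the rewrite (a changed bedroom count or price, say) is suspect. This is a sketch of that heuristic, not a complete fact checker; `extract_numbers` and `introduced_numbers` are hypothetical helpers.

```python
import re

# Digits with optional thousands separators and an optional decimal part
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text):
    """Collect numeric tokens such as '3', '800,000' or '2.5', with commas removed."""
    return {token.replace(",", "") for token in NUMBER_PATTERN.findall(text)}


def introduced_numbers(original, augmented):
    """Return numbers that appear in the augmented text but nowhere in the original."""
    return extract_numbers(augmented) - extract_numbers(original)
```

A non-empty result flags the rewrite for review. It only catches changed figures; a swapped feature word (for example, "hardwood" becoming "marble") would need a separate check.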

In [22]:
PERSONALIZED_PROMPT = \
"""
Context:

{context}

------
Provide a response that not only answers the buyer's question but also addresses the buyer's specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

Question: {question}
"""

In [23]:
get_top_listings(preferences[2], PERSONALIZED_PROMPT)

Generated Prompt:

Context:

Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

---

Live in harmony with nature in this eco-friendly home in the serene Parkside neighborhood. This 4-bedroom, 3-bathroom home features energy-efficient features and sustainable materials. The open-concept living area is flooded with natural light and offers views of the landscaped garden. The gourmet kitchen is equipped with modern appliances and a large island for entertaining. Step outside to the outdoor deck to enjoy the peaceful surroundi

# Step 7: Deliverables and Testing
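For testing, the chat loop is easier to exercise when input and output are injectable, so a scripted list of queries can stand in for a person at the keyboard. This is a sketch of that refactor; `run_chat` and its parameters are hypothetical, and `handle_query` would be something like `lambda q: chat(q, PERSONALIZED_PROMPT)`.

```python
def run_chat(handle_query, read_input=input, write=print, exit_word="exit"):
    """Prompt until the user types exit_word; returns the queries that were handled."""
    handled = []
    write("I am a personalized real estate chat agent. "
          "Please enter your query, or type 'exit' to end the chat.\n")
    while True:
        user_input = read_input("*User Query*: ").strip()
        if user_input.lower() == exit_word:
            write("Thank you!")
            return handled
        if not user_input:
            continue  # ignore empty lines instead of sending them to the LLM
        handle_query(user_input)
        handled.append(user_input)
```

In a test, `read_input` can pull from an iterator of canned queries and `write` can append to a list, so the loop runs without a keyboard or an API call.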

In [24]:
def chat(query, PROMPT_TEMPLATE) -> None:
    """Answer one user query using the retrieved listing chunks as context; prints the response."""
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    results = db.similarity_search_with_relevance_scores(query, k=5)
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
    else:
        context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
        prompt_template = PromptTemplate.from_template(PROMPT_TEMPLATE)
        prompt = prompt_template.format(context=context_text, question=query)
        
        model = ChatOpenAI()
        response_text = model.predict(prompt)
        print(f"\n\n*Response*: {response_text}\n---------\n")

In [25]:
print("I am a personalized real estate chat agent. Please enter your query, or type 'exit' to end the chat.\n")
while True:
    user_input = input("*User Query*: ")
    if user_input.lower() == "exit": 
        print("Thank you!")
        break
    chat(user_input, PERSONALIZED_PROMPT)

I am a personalized real estate chat agent. Please enter your query, or type 'exit' to end the chat.

*User Query*: I'm looking for a quiet neighborhood with good schools and parks nearby for my kids.


*Response*: Based on your preferences, I would highly recommend the eco-friendly oasis in Green Oaks. This home not only offers a peaceful and quiet neighborhood but also features energy-efficient features and a spacious backyard with a vegetable garden, perfect for your eco-conscious family. Additionally, with its close proximity to parks and good schools, this charming home in Green Oaks would be an ideal fit for you and your kids.
---------

*User Query*: I want an eco-friendly home with solar panels and a vegetable garden.


*Response*: Response: 

I am thrilled to present to you this stunning eco-friendly oasis in Green Oaks that perfectly aligns with your preferences. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulate