This is a starter notebook for the project. You'll have to import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

In [45]:
import os

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # replace with your own key; avoid committing real keys to a notebook
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

In [46]:
model_name = 'gpt-3.5-turbo'
llm = OpenAI(model_name=model_name, temperature=0)

# Generate Synthetic Property Listings with LLM

### Create an output parser to ensure LLM outputs are structured

In [47]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

class PropertyListing(BaseModel):
    PropertyID: int = Field(description="The ID of the property.")
    Neighborhood: str = Field(description="The name of the neighborhood.")
    Price: str = Field(description="The price of the property.")
    Bedrooms: int = Field(description="The number of bedrooms.")
    Bathrooms: int = Field(description="The number of bathrooms.")
    HouseSize: str = Field(description="The size of the house.")
    Description: str = Field(description="The description of the property.")
    NeighborhoodDescription: str = Field(description="The description of the neighborhood.")

class ListingsOutput(BaseModel):
    listings: List[PropertyListing] = Field(description="A list of property listings.")

# Initialize the parser
output_parser = PydanticOutputParser(pydantic_object=ListingsOutput)
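Conceptually, the parser's job is to take the raw LLM text, decode it as JSON, and reject anything that does not match the schema. The stdlib-only sketch below illustrates that idea with a dataclass mirroring `PropertyListing`. It is not how `PydanticOutputParser` is implemented, and the `Listing` class and `parse_listings` helper are hypothetical. Note also that pydantic coerces compatible values by default (e.g. the string `"3"` into an `int`), whereas this sketch is deliberately strict.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Listing:
    # Mirrors the PropertyListing fields defined above.
    PropertyID: int
    Neighborhood: str
    Price: str
    Bedrooms: int
    Bathrooms: int
    HouseSize: str
    Description: str
    NeighborhoodDescription: str


def parse_listings(text: str) -> List[Listing]:
    """Hypothetical helper: decode raw LLM text and validate it against Listing."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("listings"), list):
        raise ValueError("expected an object with a 'listings' array")
    listings = []
    for item in data["listings"]:
        if not isinstance(item, dict):
            raise ValueError("each listing must be a JSON object")
        kwargs = {}
        for f in fields(Listing):
            if f.name not in item:
                raise ValueError(f"missing field {f.name!r}")
            value = item[f.name]
            # Strict type check; bool is rejected for int fields because bool subclasses int.
            if not isinstance(value, f.type) or (f.type is int and isinstance(value, bool)):
                raise ValueError(f"field {f.name!r} should be {f.type.__name__}")
            kwargs[f.name] = value
        listings.append(Listing(**kwargs))
    return listings
```

Any `ValueError` raised here is the point where a retry or a fixing prompt could be triggered.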

### Create Prompt

In [48]:
# Format your query to include parsing instructions
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="{question}\nInstructions:{format_instructions}\nExample: {example}",
    input_variables=["question", "example"],
    partial_variables = {"format_instructions": format_instructions}
)

total_listings = 15
question = f"""
Generate {total_listings} property listings with IDs, based on the below instructions and example. 
"""

example = """
{
  "listings": [
    {
      "PropertyID": 1,
      "Neighborhood": "Green Oaks",
      "Price": "$800,000",
      "Bedrooms": 3,
      "Bathrooms": 2,
      "HouseSize": "2,000 sqft",
      "Description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.",
      "NeighborhoodDescription": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."
    }
  ]
}
"""

query = prompt.format(question = question, format_instructions = format_instructions, example = example)
print(query)


Generate 15 property listings with IDs, based on the below instructions and example. 

Instructions:The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"listings": {"title": "Listings", "description": "A list of property listings.", "type": "array", "items": {"$ref": "#/definitions/PropertyListing"}}}, "required": ["listings"], "definitions": {"PropertyListing": {"title": "PropertyListing", "type": "object", "properties": {"PropertyID": {"title": "Propertyid", "description": "The ID of the property.", "type": "integer"}, "Neighborhood": {"title": "Neighborhood", "description": 
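The template pre-fills `format_instructions` through `partial_variables` and leaves `question` and `example` to be supplied later. The plain `str.format` sketch below mimics that behaviour (an illustration, not LangChain's implementation; `make_partial` is a hypothetical helper). It also shows why the JSON example can safely be passed as a variable: braces inside a substituted value are not re-parsed, whereas literal braces written into the template string itself would have to be doubled as `{{` and `}}`.

```python
# Same layout as the notebook's template string.
TEMPLATE = "{question}\nInstructions:{format_instructions}\nExample: {example}"


def make_partial(template: str, **partials: str):
    """Hypothetical helper: return a formatter with some variables pre-filled."""
    def fill(**kwargs: str) -> str:
        # Values passed at call time override the pre-filled ones.
        return template.format(**{**partials, **kwargs})
    return fill


fill_prompt = make_partial(TEMPLATE, format_instructions="Return JSON.")
toy_example = '{"listings": [{"PropertyID": 1}]}'  # braces inside a value are not re-parsed
prompt_text = fill_prompt(question="Generate 2 listings.", example=toy_example)
print(prompt_text)

# Literal braces written into a template itself must be doubled.
print('Schema: {{"listings": []}}'.format())  # Schema: {"listings": []}
```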

In [49]:
# Get the output and parse it
output = llm.predict(query)
parsed_output = output_parser.parse(output)

# Access listings as a structured object
print(parsed_output.listings)

[PropertyListing(PropertyID=1, Neighborhood='Downtown Loft District', Price='$500,000', Bedrooms=2, Bathrooms=1, HouseSize='1,200 sqft', Description='Experience urban living at its finest in this stylish loft located in the heart of the Downtown Loft District. This 2-bedroom, 1-bathroom loft features high ceilings, exposed brick walls, and large windows offering stunning city views. The open floor plan is perfect for entertaining, and the modern kitchen is equipped with stainless steel appliances. Enjoy the convenience of living within walking distance to trendy restaurants, art galleries, and nightlife hotspots.', NeighborhoodDescription='The Downtown Loft District is a vibrant neighborhood known for its eclectic mix of art galleries, boutiques, and restaurants. Residents can enjoy easy access to public transportation and cultural events happening throughout the year.'), PropertyListing(PropertyID=2, Neighborhood='Suburban Paradise', Price='$700,000', Bedrooms=4, Bathrooms=3, HouseSiz

In [50]:
# Check the total number of tokens used (the original gpt-3.5-turbo context window is 4,096 tokens, shared by the prompt and the completion)
import tiktoken
import json

# Initialize tokenizer (for OpenAI models like gpt-3.5-turbo or gpt-4)
tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")
# Convert listings to a string representation for token counting
listings_text = json.dumps(parsed_output.dict())

# Count tokens
listing_tokens = len(tokenizer.encode(listings_text))
query_tokens = len(tokenizer.encode(query))
print(f"Total Number of tokens in query + listing outputs: {listing_tokens + query_tokens}")

Total Number of tokens in query + listing outputs: 3401
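Because the prompt and the completion share one context window, the count above tells us how much headroom is left. A minimal budget check, assuming the 4,096-token window from the comment (newer gpt-3.5-turbo variants have larger windows), could look like this; `completion_budget` and `fits` are hypothetical helpers.

```python
def completion_budget(prompt_tokens: int, context_window: int = 4096) -> int:
    """Hypothetical helper: tokens left for the completion after the prompt."""
    if prompt_tokens < 0 or context_window <= 0:
        raise ValueError("token counts must be non-negative and the window positive")
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens: int, max_tokens: int, context_window: int = 4096) -> bool:
    """True if the prompt plus the requested max_tokens fits in the window."""
    return prompt_tokens + max_tokens <= context_window


# 3401 is the query + listings total printed above.
print(completion_budget(3401))  # 695
print(fits(prompt_tokens=3401, max_tokens=2000))  # False
```

With roughly 695 tokens to spare, asking for many more listings in a single call would likely hit the limit. The same arithmetic applies later, where `max_tokens=2000` leaves at most 2,096 tokens for the augmented RAG prompt.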


# Generate embeddings of property listings

In [51]:
from langchain.embeddings.openai import OpenAIEmbeddings
# Initialize the embeddings model
embedding_model = OpenAIEmbeddings()

In [52]:
# Use a combined "FullDescription" (house description + neighborhood description) for embedding
property_descriptions = parsed_output.dict()['listings'] 
for property_description in property_descriptions:
    property_description['FullDescription'] = property_description['Description'] + "\n" +  property_description['NeighborhoodDescription']

In [53]:
property_text_to_embed = [property_description['FullDescription'] for property_description in property_descriptions]

In [54]:
# Generate embeddings for a list of strings
listing_embeddings = embedding_model.embed_documents(property_text_to_embed)

# Initialise Vector DB and store listing embeddings

Format data

In [55]:
# Function to drop keys
def drop_keys_from_dict(d, keys_to_remove):
    return {k: v for k, v in d.items() if k not in keys_to_remove}
# Keys to remove
keys_to_remove = ['PropertyID', 'Description', 'FullDescription', 'NeighborhoodDescription']
metadatas = [drop_keys_from_dict(property_description, keys_to_remove) for property_description in property_descriptions]
propertyIDs = [str(property_description['PropertyID']) for property_description in property_descriptions]

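# Attach each embedding to its listing dict (used by the optional LanceDB table below)
for property_description, embedding in zip(property_descriptions, listing_embeddings):
    property_description['embedding'] = embedding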

Chroma DB

In [56]:
import chromadb
from chromadb.config import Settings

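# Initialize the ChromaDB client.
# Note: depending on the chromadb version, Settings(persist_directory=...) alone may not enable
# persistence; in chromadb >= 0.4 the persistent client is chromadb.PersistentClient(path="db/").
client = chromadb.Client(Settings(persist_directory="db/"))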
collection_name = "property_listings"
#client.delete_collection(collection_name)
collection = client.get_or_create_collection(name=collection_name)

In [66]:
# Add data to ChromaDB
collection.add(
    embeddings = listing_embeddings,
    documents = property_text_to_embed,
    metadatas = metadatas,
    ids = propertyIDs
)

In [107]:
data = collection.get(include = ["metadatas", "documents", "embeddings"])

Lance DB

In [22]:
#!pip install lancedb

In [59]:
# from lancedb.pydantic import vector, LanceModel
# class property_listings(LanceModel):
#     embedding: vector(1536)
#     PropertyID: float
#     Neighborhood: str
#     Price: str
#     Bedrooms: float
#     Bathrooms: float
#     HouseSize: str
#     Description: str
#     NeighborhoodDescription: str
#     FullDescription: str

# # Now connect to a local db at ~/.lancedb and create an empty LanceDB table called "property_listings"
# import lancedb

# db = lancedb.connect("~/.lancedb")
# table_name = "property_listings"
# db.drop_table(table_name, ignore_missing=True)
# table = db.create_table(table_name, schema=property_listings)

#table.add(property_descriptions)

#table.head(5).to_pandas()

# Buyer's Preferences

In [110]:
questions = [
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."]

buyer_preferences = ''
for Q, A in zip(questions, answers):
    buyer_preferences += f"Q:{Q}\nA:{A}\n"

print(buyer_preferences)

Q:How big do you want your house to be?
A:A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
Q:What are 3 most important things for you in choosing this property?
A:A quiet neighborhood, good local schools, and convenient shopping options.
Q:Which amenities would you like?
A:A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
Q:Which transportation options are important to you?
A:Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
Q:How urban do you want your neighborhood to be?
A:A balance between suburban tranquility and access to urban amenities like restaurants and theaters.



# Retrieval (Semantic Search) for relevant property listings

In [111]:
# Phase 2 (future work): implement an agent that first maps each answer/preference to the house description,
# the neighborhood description, or both, and then runs the similarity search against the relevant embedding.
# The distances for each question could then be averaged per property.
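The averaging idea from the Phase 2 note can be sketched without a vector DB: embed each answer separately, compute its distance to every listing embedding, and average those distances per property so that no single answer dominates. The toy 2-D vectors and the `rank_by_average_distance` helper are hypothetical stand-ins for the OpenAI embeddings, and cosine distance is an assumption (the note does not fix a metric).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (assumes non-zero vectors); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_by_average_distance(
    answer_vecs: List[Sequence[float]],
    listing_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: average each listing's distance over all answers, best first."""
    scores = {}
    for listing_id, listing_vec in listing_vecs.items():
        distances = [cosine_distance(ans, listing_vec) for ans in answer_vecs]
        scores[listing_id] = sum(distances) / len(distances)
    return sorted(scores.items(), key=lambda item: item[1])


# Toy 2-D "embeddings": one per buyer answer, one per listing.
toy_answer_vecs = [[1.0, 0.0], [0.8, 0.6]]
toy_listing_vecs = {"1": [1.0, 0.0], "2": [0.0, 1.0]}
print(rank_by_average_distance(toy_answer_vecs, toy_listing_vecs))
```

Averaging per question treats every preference as equally important; weighting the distances would be a natural variation.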

Chroma DB

In [112]:
# Generate embeddings for the query
# Note: for retrieval we embed only the buyer's answers, joined with spaces so the sentences don't run together
query_embeddings = embedding_model.embed_documents([" ".join(answers)])

In [113]:
# Search with ChromaDB
results = collection.query(
    query_embeddings=query_embeddings,
    n_results=2
)
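`collection.query` returns the `n_results` stored embeddings closest to the query embedding. Chroma's default distance for a collection is squared L2 (configurable through the `hnsw:space` collection metadata), and it searches an approximate HNSW index. The brute-force sketch below computes the exact equivalent for small data; the toy vectors and the `top_k` helper are hypothetical. Because OpenAI embeddings are normalized to unit length, ranking by squared L2 gives the same order as ranking by cosine similarity, since ||a - b||² = 2 - 2·cos(a, b) for unit vectors.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed to match Chroma's default 'l2' space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def top_k(query_vec: Sequence[float], items: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exact k nearest neighbours by squared L2; an HNSW index approximates this."""
    scored = ((item_id, squared_l2(query_vec, vec)) for item_id, vec in items.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy unit-length vectors standing in for listing embeddings.
toy_items = {"1": normalize([3.0, 4.0]), "2": normalize([1.0, 0.0]), "3": normalize([-1.0, 0.0])}
query_vec = normalize([1.0, 1.0])
print(top_k(query_vec, toy_items, k=2))
```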

# Property Recommendations and Personalized Descriptions through RAG and User Preferences

Prompt template

In [114]:
# Define the system prompt
system_prompt = "You are a real estate agent that provides personalised property descriptions based on the buyer's preferences."


instructions = """
Based on the property listings in the Context, generate personalized descriptions for the listings.
The descriptions should be unique, appealing, and tailored to resonate with the buyer’s specific preferences. 
This involves subtly emphasizing aspects of the property that align with what the buyer is looking for. 
Ensure that the context provided enhances the appeal of the listing without altering factual information.
"""

query = f"""
{system_prompt}

{instructions} 

Buyer preferences are provided in a Q&A format.
Buyer Preferences:
{buyer_preferences}
"""
print(query)


You are a real estate agent that provides personalised property descriptions based on the buyer's preferences.


Based on the property listings in the Context, generate personalized descriptions for the listings.
The descriptions should be unique, appealing, and tailored to resonate with the buyer’s specific preferences. 
This involves subtly emphasizing aspects of the property that align with what the buyer is looking for. 
Ensure that the context provided enhances the appeal of the listing without altering factual information.
 

Buyer preferences are provided in a Q&A format.
Buyer Preferences:
Q:How big do you want your house to be?
A:A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
Q:What are 3 most important things for you in choosing this property?
A:A quiet neighborhood, good local schools, and convenient shopping options.
Q:Which amenities would you like?
A:A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.

Augmented prompt

In [115]:
# Number each retrieved document; results['documents'][0] holds the hits for the first (and only) query embedding
context = "".join(f"\nProperty {i}:\n{doc}" for i, doc in enumerate(results['documents'][0], start=1))

prompt_augmented = query + f"\nContext: {context}"
print(prompt_augmented)


You are a real estate agent that provides personalised property descriptions based on the buyer's preferences.


Based on the property listings in the Context, generate personalized descriptions for the listings.
The descriptions should be unique, appealing, and tailored to resonate with the buyer’s specific preferences. 
This involves subtly emphasizing aspects of the property that align with what the buyer is looking for. 
Ensure that the context provided enhances the appeal of the listing without altering factual information.
 

Buyer preferences are provided in a Q&A format.
Buyer Preferences:
Q:How big do you want your house to be?
A:A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
Q:What are 3 most important things for you in choosing this property?
A:A quiet neighborhood, good local schools, and convenient shopping options.
Q:Which amenities would you like?
A:A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.

Generation (Property Recommendations & Personalized Descriptions)

In [116]:
model_name = 'gpt-3.5-turbo'
llm = OpenAI(model_name=model_name, temperature=0, max_tokens=2000)
output = llm.predict(prompt_augmented)
print("----LLM Generated Personalized Descriptions of Two Recommended Properties------")
print(output)

----LLM Generated Personalized Descriptions of Two Recommended Properties------
Property 1:
Welcome to your suburban oasis in this spacious 4-bedroom, 3-bathroom home. Nestled in the family-friendly Suburban Paradise neighborhood, this property offers a large backyard perfect for gardening and outdoor gatherings. The gourmet kitchen is ideal for preparing family meals, while the cozy fireplace in the living room provides a warm and inviting atmosphere. The luxurious master suite is a peaceful retreat after a long day. With top-rated schools and shopping centers nearby, this home combines suburban tranquility with convenience.

Property 2:
Step into the past with this charming 3-bedroom, 2-bathroom home located in the Historic District. The original hardwood floors and vintage fixtures exude character and charm, while the updated kitchen and bathrooms offer modern comfort. Relax on the cozy front porch and soak in the historic ambiance of the neighborhood. The Historic District is a tre

Comparison with initial property description

In [117]:
print(context)


Property 1:
Escape to your own suburban paradise in this spacious 4-bedroom, 3-bathroom home. The property features a large backyard with a pool, perfect for outdoor entertaining. Inside, the home boasts a gourmet kitchen, a cozy fireplace in the living room, and a luxurious master suite. Enjoy the peace and tranquility of suburban living while still being close to shopping centers and top-rated schools.
Suburban Paradise is a family-friendly neighborhood with tree-lined streets and parks. Residents can enjoy community events, farmers markets, and easy access to hiking trails and outdoor activities.
Property 2:
Step back in time with this charming 3-bedroom, 2-bathroom home located in the Historic District. The property features original hardwood floors, vintage fixtures, and a cozy front porch. The updated kitchen and bathrooms offer modern convenience while preserving the historic charm of the home. Enjoy living in a neighborhood rich in history and architectural character.
The Hist
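The instructions ask the LLM not to alter factual information, and the comparison above checks that by eye. For facts written in a fixed textual pattern, a small automated check is possible. The regex-based sketch below assumes room counts are written like "4-bedroom" and "3-bathroom", as in these listings; `extract_room_counts` and `facts_preserved` are hypothetical helpers, and a mismatch only flags a description for review.

```python
import re
from typing import Dict

# Matches counts written like "4-bedroom" or "3-bathroom" (case-insensitive).
ROOM_PATTERN = re.compile(r"(\d+)-(bedroom|bathroom)", re.IGNORECASE)


def extract_room_counts(text: str) -> Dict[str, int]:
    """Hypothetical helper: map 'bedroom'/'bathroom' to the first count found."""
    counts: Dict[str, int] = {}
    for number, kind in ROOM_PATTERN.findall(text):
        counts.setdefault(kind.lower(), int(number))
    return counts


def facts_preserved(original: str, generated: str) -> bool:
    """True if every room count stated in the generated text matches the original."""
    source = extract_room_counts(original)
    return all(source.get(kind) == n for kind, n in extract_room_counts(generated).items())


source_text = "Escape to your own suburban paradise in this spacious 4-bedroom, 3-bathroom home."
generated_text = "Welcome to your suburban oasis in this spacious 4-bedroom, 3-bathroom home."
print(facts_preserved(source_text, generated_text))  # True
```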