This is a starter notebook for the project. You will need to import the libraries you use; the ones available in this workspace are listed in the `requirements.txt` file.

## Import Libraries

In [1]:
!python3 -m venv home_matching

In [2]:
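# Each "!" command runs in its own subshell, so this activation does not persist
# to later cells and does not change the Python interpreter used by the kernel.
!. home_matching/bin/activate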

In [3]:
!python3 -m pip install -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable

In [4]:
import os
import pandas as pd
import torch

In [5]:
# pd.set_option('display.max_colwidth', None)  

In [6]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  
print(f"Using device: {device}") 

Using device: cpu


## Set Up the OpenAI Key

In [7]:
os.environ["OPENAI_API_KEY"] = "YOUR API KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

## Generate Real Estate Listings

Generate real estate listings using a Large Language Model (LLM). Generate at least 10 listings; this can involve creating prompts that ask the LLM to produce descriptions of various properties.

### Prompt
> You are a property guru in singapore. I need 15 listing of house around singapore. Can you generate them with the following fields: Neighborhood, Price, Bedrooms, Bathroom, House Size, Description, Neigborhood Description

*The results are saved to `listings_<YYYYMMDD>.csv`; the rest of this demo uses the prepared `listings.csv`.*

In [8]:
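# This notebook uses the legacy openai SDK interface (openai<1.0);
# openai.ChatCompletion was removed in openai>=1.0.
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]
openai.api_base = os.environ["OPENAI_API_BASE"]  # route requests through the Vocareum endpoint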

In [9]:
def generate_current_timestamp(time_format='%Y%m%d'):
    from datetime import datetime
    now = datetime.now()
    current_timestamp = now.strftime(time_format)
    return current_timestamp

In [10]:
def save_to_file(content, filename):
    with open(filename, 'w') as file:
        file.write(content)
        
    print(f'Saved listing successfully to {filename}')

In [11]:
def generate_listing_prompt_template(country = "Singapore", listing_count = "10"):
    return f"""
        You are a property guru in singapore. 
        I need {listing_count} listing of house around {country}.
        
        Can you generate them with the following fields in csv format: 
        Neighborhood, Price, Bedrooms, Bathroom, House Size, Description, Neigborhood Description
    """

In [12]:
def generate_listing_prompt_response(user_prompt, model_name = 'gpt-3.5-turbo'):
    return openai.ChatCompletion.create(  
        model=model_name,  
        messages=[
                {"role": "system", "content": "You are a helpful property guru assistant that provides accurate and detailed information based on the user's queries. Avoid making assumptions and focus on the provided data."},
                {"role": "user", "content": user_prompt}  
            ]  
        )  

#### Generate the Prompt for the Property Listings

In [13]:
# Prepare the prompt for OpenAI
listing_prompt = generate_listing_prompt_template(country = "Singapore", listing_count = "15")
listing_prompt

'\n        You are a property guru in singapore. \n        I need 15 listing of house around Singapore.\n        \n        Can you generate them with the following fields in csv format: \n        Neighborhood, Price, Bedrooms, Bathroom, House Size, Description, Neigborhood Description\n    '

#### Send Prompt to OpenAI Model

In [14]:
listing_response = generate_listing_prompt_response(listing_prompt)

#### Show the Prompt Response from OpenAI

In [15]:
generated_property_listing = listing_response['choices'][0]['message']['content']  
print(generated_property_listing)  

Sure! Here are 15 property listings of houses around Singapore in CSV format:

Neighborhood, Price, Bedrooms, Bathroom, House Size, Description, Neighborhood Description
Orchard, 5000000, 4, 3, 2500 sqft, Beautiful house in prime location, Upscale area with shopping and dining options
Changi, 3000000, 5, 4, 3200 sqft, Spacious family home near airport, Quiet neighborhood with good amenities
Sentosa, 10000000, 6, 5, 5000 sqft, Luxurious waterfront property with stunning views, Exclusive island living with private beach access
Bukit Timah, 3500000, 3, 2, 2000 sqft, Cosy house surrounded by greenery, Prestigious area known for its schools and nature reserves
Marine Parade, 2200000, 4, 3, 1800 sqft, Modern house with sea view, Vibrant neighborhood with beach and shopping mall
Tanjong Pagar, 4000000, 2, 2, 1500 sqft, Stylish urban living in the city center, Trendy area with great restaurants and nightlife
Sengkang, 1200000, 5, 3, 2400 sqft, Spacious home in family-friendly neighborhood, Clo

#### Save the Prompt Response to a file 

In [16]:
current_timestamp = generate_current_timestamp()
demo_listing_csv = f"listings_{current_timestamp}.csv"
save_to_file(generated_property_listing, demo_listing_csv)

Saved listing successfully to listings_20250106.csv


#### Viewing the listings from the file
Note: For this demo, we use the prepared file `listings.csv` instead.

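The raw response above opens with a conversational line ("Sure! Here are 15 property listings of houses around Singapore in CSV format:"), and free-text fields can themselves contain commas, so saving the response verbatim may not produce a clean CSV. A minimal sketch for cleaning it, assuming the header row starts with `Neighborhood` and fields are comma-separated; `extract_csv_rows` is a hypothetical helper, not part of the project code:

```python
import csv
import io


def extract_csv_rows(response_text, header_start="Neighborhood"):
    """Parse the CSV part of an LLM response, skipping any text before the header.

    Returns (header, rows), where rows is a list of dicts keyed by header names.
    Assumes the header line starts with `header_start`.
    """
    lines = response_text.strip().splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(header_start)),
        None,
    )
    if start is None:
        raise ValueError(f"No line starting with {header_start!r} found")

    # skipinitialspace drops the blank after each comma ("a, b" -> ["a", "b"])
    reader = csv.reader(io.StringIO("\n".join(lines[start:])), skipinitialspace=True)
    parsed = [row for row in reader if row]  # ignore blank lines
    header, data = parsed[0], parsed[1:]

    # Rows with the wrong field count usually contain an unquoted comma; drop them
    rows = [dict(zip(header, row)) for row in data if len(row) == len(header)]
    return header, rows
```

Rows whose field count differs from the header are dropped rather than repaired; the cleaned dictionaries could then be written out with `csv.DictWriter` and loaded with `pd.read_csv`.
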
In [17]:
demo_listing_csv = "listings.csv"
df_listings = pd.read_csv(demo_listing_csv)
df_listings

,Neighborhood,Price (SGD),Bedrooms,Bathrooms,House Size (sq ft),Description,Neighborhood Description
0,Orchard Road,3500000,3,2,1200,Luxurious apartment with modern amenities and ...,Orchard Road is a vibrant shopping district kn...
1,Bukit Timah,2800000,4,3,2500,Spacious family home with a large garden and p...,Bukit Timah is a serene residential area with ...
2,Sentosa Cove,5200000,5,5,4000,Exclusive waterfront villa with private pool a...,Sentosa Cove is a luxurious resort-style commu...
3,Tiong Bahru,1600000,2,1,800,Charming heritage apartment with a cozy balcon...,Tiong Bahru is a trendy neighborhood with a mi...
4,East Coast,2200000,3,2,1800,Modern townhouse with easy access to the beach...,East Coast is known for its beautiful coastlin...
5,Holland Village,2500000,4,3,2200,Stylish home with an open-concept layout and v...,Holland Village is a lively area with a mix of...
6,Marina Bay,4000000,3,2,1500,Contemporary apartment with panoramic views of...,Marina Bay is a bustling financial district wi...
7,Jurong East,1200000,3,2,1000,Affordable family home close to shopping malls...,Jurong East is a growing commercial hub with v...
8,Punggol,1000000,4,3,1600,Modern HDB flat with spacious living areas and...,Punggol is a waterfront town known for its fam...
9,Bukit Batok,1500000,3,2,1200,Cozy home with a backyard and nearby nature re...,Bukit Batok is a peaceful residential area wit...


## Store Listings in a Vector Database

- Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.
- Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.

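To make the storage step concrete, here is a toy in-memory store that keeps each listing's id, text, and embedding and answers a query by exact, brute-force nearest-neighbor search. This is an illustration under simplifying assumptions: `TinyVectorStore` is hypothetical, its method names only loosely mirror ChromaDB's `add`/`count`/`query`, and ChromaDB itself uses an approximate HNSW index rather than a linear scan. Distances here are squared Euclidean (smaller means more similar), which matches ChromaDB's default `l2` space.

```python
class TinyVectorStore:
    """Brute-force, in-memory stand-in for a vector database (illustration only)."""

    def __init__(self):
        self._ids, self._documents, self._embeddings = [], [], []

    def add(self, ids, documents, embeddings):
        # Store each (id, document, embedding) triple side by side
        for item_id, document, embedding in zip(ids, documents, embeddings):
            self._ids.append(item_id)
            self._documents.append(document)
            self._embeddings.append(list(embedding))

    def count(self):
        return len(self._ids)

    def query(self, query_embedding, n_results=1):
        """Return (id, document, distance) tuples for the n closest embeddings."""
        def squared_l2(embedding):
            return sum((a - b) ** 2 for a, b in zip(embedding, query_embedding))

        scored = [
            (item_id, document, squared_l2(embedding))
            for item_id, document, embedding in zip(self._ids, self._documents, self._embeddings)
        ]
        scored.sort(key=lambda item: item[2])  # smallest distance first
        return scored[:n_results]
```

A linear scan is fine for 15 listings; an approximate index only starts to pay off with much larger collections.
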
In [18]:
# from chromadb.utils import embedding_functions
# default_embedding_function = embedding_functions.DefaultEmbeddingFunction()

In [19]:
# import chromadb.utils.embedding_functions as embedding_functions
# openai_embedding_function = embedding_functions.OpenAIEmbeddingFunction(
#                 api_key=os.environ["OPENAI_API_KEY"],
#                 model_name="text-embedding-3-small"
#             )

In [20]:
from transformers import BertTokenizer, BertModel
import torch

# Load a pretrained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

# Generate a single 768-dimensional embedding for a piece of text
def bert_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over tokens, using the attention mask so padding tokens are ignored
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    embedding = summed / mask.sum(dim=1).clamp(min=1e-9)
    return embedding.squeeze().numpy()  # convert to a NumPy array


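Mean pooling turns BERT's per-token vectors into one fixed-size vector by averaging them. When several texts are tokenized together with `padding=True`, shorter texts receive padding tokens, and a plain `.mean(dim=1)` would average those in too; for a single text there is no padding, so the two approaches agree. A pure-Python sketch with made-up two-dimensional vectors (`masked_mean_pool` is a hypothetical name):

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1; mask value 0 marks padding."""
    totals = [0.0] * len(token_vectors[0])
    kept = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            totals = [t + x for t, x in zip(totals, vector)]
            kept += 1
    if kept == 0:
        raise ValueError("attention mask has no real tokens")
    return [t / kept for t in totals]


# Two real tokens plus one padding token (all-zero vector)
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
print(masked_mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
print(masked_mean_pool(tokens, [1, 1, 1]))  # plain mean, padding included
```

With the mask, the padding row is skipped; without it, the first component is pulled down from 2.0 to about 1.33.
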

#### Create ChromaDB Client

In [21]:
import chromadb
chroma_client = chromadb.Client()

#### Create ChromaDB Collection

In [22]:
collection_name = "home_listings"

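# chroma_client.delete_collection(name=collection_name)
# Note: embeddings are computed explicitly and passed to add()/query() below, so
# Chroma does not need to call this function. Newer chromadb releases may expect an
# EmbeddingFunction object whose __call__ takes a list of strings named `input`.
collection = chroma_client.get_or_create_collection(name=collection_name, embedding_function=bert_embedding)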

In [23]:
print(f"There are [{collection.count()}] documents in [{collection_name}] collection")
# collection.peek()

There are [0] documents in [home_listings] collection


#### Load Home Listing CSV

In [24]:
df_listings.head(1)

,Neighborhood,Price (SGD),Bedrooms,Bathrooms,House Size (sq ft),Description,Neighborhood Description
0,Orchard Road,3500000,3,2,1200,Luxurious apartment with modern amenities and ...,Orchard Road is a vibrant shopping district kn...


In [25]:
df_listings.columns

Index(['Neighborhood', 'Price (SGD)', 'Bedrooms', 'Bathrooms',
       'House Size (sq ft)', 'Description', 'Neighborhood Description'],
      dtype='object')

In [26]:
# Combine every field into one "Label: value" string per listing for embedding
df_listings['text'] = (
    "Area: " + df_listings['Neighborhood'] + " "
    + "Price: " + df_listings['Price (SGD)'].astype(str) + " "
    + "Bedrooms: " + df_listings['Bedrooms'].astype(str) + " "
    + "Bathrooms: " + df_listings['Bathrooms'].astype(str) + " "
    + "House Size (sq ft): " + df_listings['House Size (sq ft)'].astype(str) + " "
    + "Description: " + df_listings['Description'].astype(str) + " "
    + "Neighborhood Description: " + df_listings['Neighborhood Description']
)

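The pandas expression above builds one "Label: value" string per row. The same format can be produced for a single listing held in a plain dict, for example when adding one new listing later. A minimal sketch; `listing_to_text` is a hypothetical helper that assumes the dict uses the CSV column names shown by `df_listings.columns`:

```python
def listing_to_text(listing):
    """Serialize one listing dict (keyed by CSV column names) into the embedding text."""
    # (label used in the text, column name in the CSV)
    labels = [
        ("Area", "Neighborhood"),
        ("Price", "Price (SGD)"),
        ("Bedrooms", "Bedrooms"),
        ("Bathrooms", "Bathrooms"),
        ("House Size (sq ft)", "House Size (sq ft)"),
        ("Description", "Description"),
        ("Neighborhood Description", "Neighborhood Description"),
    ]
    return " ".join(f"{label}: {listing[column]}" for label, column in labels)
```

Keeping the serialization in one function makes it harder for stored documents and any later additions to drift apart in format.
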
In [27]:
df_listings['text']

0     Area: Orchard Road Price: 3500000 Bedrooms: 3 ...
1     Area: Bukit Timah Price: 2800000 Bedrooms: 4 B...
2     Area: Sentosa Cove Price: 5200000 Bedrooms: 5 ...
3     Area: Tiong Bahru Price: 1600000 Bedrooms: 2 B...
4     Area: East Coast Price: 2200000 Bedrooms: 3 Ba...
5     Area: Holland Village Price: 2500000 Bedrooms:...
6     Area: Marina Bay Price: 4000000 Bedrooms: 3 Ba...
7     Area: Jurong East Price: 1200000 Bedrooms: 3 B...
8     Area: Punggol Price: 1000000 Bedrooms: 4 Bathr...
9     Area: Bukit Batok Price: 1500000 Bedrooms: 3 B...
10    Area: Novena Price: 2000000 Bedrooms: 2 Bathro...
11    Area: Woodlands Price: 900000 Bedrooms: 3 Bath...
12    Area: Serangoon Price: 1800000 Bedrooms: 4 Bat...
13    Area: Yishun Price: 1100000 Bedrooms: 3 Bathro...
14    Area: Clementi Price: 1400000 Bedrooms: 3 Bath...
Name: text, dtype: object

#### Generate Embeddings

In [28]:
embedding_function = bert_embedding
df_listings['embeddings'] = df_listings['text'].apply(embedding_function)

#### Checking Size of Embeddings

In [29]:
for idx, items in df_listings.iterrows():
    print(f"{idx} - {len(items['embeddings'])}")

0 - 768
1 - 768
2 - 768
3 - 768
4 - 768
5 - 768
6 - 768
7 - 768
8 - 768
9 - 768
10 - 768
11 - 768
12 - 768
13 - 768
14 - 768


#### Save Data into ChromaDB

In [30]:
# Add all listings in one batch; ids are the DataFrame row labels as strings
collection.add(
    documents=df_listings['text'].tolist(),
    embeddings=[embedding.tolist() for embedding in df_listings['embeddings']],
    ids=[str(idx) for idx in df_listings.index],
)

In [31]:
print(f"There are [{collection.count()}] documents in [{collection_name}] collection")
# collection.peek()

There are [15] documents in [home_listings] collection


## Searching Based on Preferences

- `Semantic Search Implementation`: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
- `Listing Retrieval Logic`: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.

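BERT embeddings capture topical similarity but do not reliably enforce numeric limits such as a minimum house size. In the demo run further down, for example, the top match (Clementi, 1,500 sq ft) is smaller than the "around 1,800 square feet or more" asked for in the first answer. One common refinement is to apply hard numeric filters first and rank only the remaining listings by embedding distance. The sketch below does this in plain Python over already-retrieved results; `filter_then_rank` and its parameters are hypothetical. In ChromaDB, a similar effect could be achieved by storing numeric fields as metadata and passing a `where` filter to `collection.query` (e.g. `{"size": {"$gte": 1800}}`, with `size` as an assumed metadata key).

```python
def filter_then_rank(listings, distances, min_size=None, max_price=None):
    """Keep listings that satisfy the hard limits, then order them by distance.

    `listings` are dicts keyed by the CSV column names; `distances` are the
    matching vector-search distances (smaller means closer).
    """
    kept = []
    for listing, distance in zip(listings, distances):
        if min_size is not None and listing["House Size (sq ft)"] < min_size:
            continue  # too small for the buyer
        if max_price is not None and listing["Price (SGD)"] > max_price:
            continue  # over budget
        kept.append((distance, listing))
    kept.sort(key=lambda pair: pair[0])
    return [listing for _, listing in kept]
```

Filtering after retrieval only works if enough candidates are retrieved, so a larger `n_results` (or a metadata filter inside the query itself) would be needed in practice.
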
In [32]:
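def generate_search_query(answers):
    # Label the five answers (in question order) and flatten them into one line.
    # The leftover indentation spaces are harmless: BERT's tokenizer splits on whitespace.
    return f"""
        House Size: {answers[0]} 
        Top 3 Considerations: {answers[1]} 
        Amenities: {answers[2]} 
        Transportation: {answers[3]} 
        Neighborhood: {answers[4]} 
    """.strip().replace('\n', '')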

In [33]:
def generate_search_embeddings(search_text, embedding_function):
    return embedding_function(search_text)

In [34]:
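def generate_search_result(chroma_collection, embeddings, top_n = 1):
    # query_embeddings takes a list of query vectors; we send a single query
    search_results = chroma_collection.query(
        query_embeddings=[embeddings.tolist()],
        n_results = top_n,
        include=["documents", "distances"]
    )
    # Both fields hold one inner list per query, ordered from closest to farthest
    return search_results['documents'], search_results['distances']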

#### Search Result for Demo User Input

In [35]:

demo_answer_set_1 = [
    r"a house that is spacious enough to accommodate my family comfortably, ideally around 1,800 square feet or more",
    r"a good balance of space and comfort, proximity to recreational areas, and a vibrant community atmosphere",
    r"access to parks, recreational facilities, and modern conveniences that enhance daily living",
    r"convenient public transportation options available, including nearby bus stops and train stations for easy commuting",
    r"neighborhood that strikes a balance between urban vibrancy and a community-oriented atmosphere, with access to shops and cafes while still having green spaces"
]

In [36]:

demo_answer_set_2 = [
    r"a house size around 1,200 to 1,400 square feet, as it provides a comfortable living space without being too large to maintain",
    r"affordability, modern amenities, and a family-friendly environment",
    r"parks for outdoor activities, community facilities for social gatherings, and modern conveniences within the property itself",
    r"access to public transport options is crucial, as it allows for easy commuting to work and other areas. Proximity to major roads would also be beneficial for driving",
    r"neighborhood that strikes a balance between urban and suburban, offering essential services and amenities while maintaining a community feel and access to green spaces"
]

In [37]:
demo_answers = demo_answer_set_1

In [38]:
demo_text = generate_search_query(demo_answers)
demo_text

'House Size: a house that is spacious enough to accommodate my family comfortably, ideally around 1,800 square feet or more         Top 3 Considerations: a good balance of space and comfort, proximity to recreational areas, and a vibrant community atmosphere         Amenities: access to parks, recreational facilities, and modern conveniences that enhance daily living         Transportation: convenient public transportation options available, including nearby bus stops and train stations for easy commuting         Neighborhood: neighborhood that strikes a balance between urban vibrancy and a community-oriented atmosphere, with access to shops and cafes while still having green spaces'

In [39]:
demo_embeddings = generate_search_embeddings(demo_text, embedding_function)
# demo_embeddings

In [40]:
search_recommendations, search_scores = generate_search_result(collection, demo_embeddings, 5)

In [41]:
search_result = zip(search_recommendations[0], search_scores[0])
for recommendation, score in search_result:
    print(f"{score} - {recommendation}")

19.37670135498047 - Area: Clementi Price: 1400000 Bedrooms: 3 Bathrooms: 2 House Size (sq ft): 1500 Description: Well-maintained apartment with easy access to public transport. Neighborhood Description: Clementi is a bustling area with shopping malls, schools, and recreational spaces.
19.42504119873047 - Area: Woodlands Price: 900000 Bedrooms: 3 Bathrooms: 2 House Size (sq ft): 1400 Description: Affordable home with a community feel and access to amenities. Neighborhood Description: Woodlands is a suburban area with a mix of residential and commercial developments.  
21.678491592407227 - Area: East Coast Price: 2200000 Bedrooms: 3 Bathrooms: 2 House Size (sq ft): 1800 Description: Modern townhouse with easy access to the beach and parks. Neighborhood Description: East Coast is known for its beautiful coastline, recreational activities, and eateries.  
21.867773056030273 - Area: Holland Village Price: 2500000 Bedrooms: 4 Bathrooms: 3 House Size (sq ft): 2200 Description: Stylish home wi

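The numbers printed above are distances, not similarity scores: ChromaDB's default `l2` space returns squared Euclidean distance, so smaller means closer. In many chromadb versions, cosine distance can be selected when a collection is created, for example with `metadata={"hnsw:space": "cosine"}` (check the documentation for your installed version). For unit-length vectors the two give the same ranking, because squared L2 distance equals twice the cosine distance; raw mean-pooled BERT vectors are not unit-length, so the rankings can differ. A pure-Python check of that identity:

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_l2(a, b):
    """Squared Euclidean distance, as in ChromaDB's default "l2" space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For `a = [3.0, 4.0]` and `b = [4.0, 3.0]`, the cosine distance is about 0.04 and the squared L2 distance between the normalized vectors is about 0.08.
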
In [42]:
search_recommendation = search_recommendations[0][0]
search_recommendation

'Area: Clementi Price: 1400000 Bedrooms: 3 Bathrooms: 2 House Size (sq ft): 1500 Description: Well-maintained apartment with easy access to public transport. Neighborhood Description: Clementi is a bustling area with shopping malls, schools, and recreational spaces.'

## Personalizing Listing Descriptions

- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.

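One way to check the "Maintaining Factual Integrity" requirement is to parse the structured fields from both the original listing and the LLM's rewrite and compare them. A minimal sketch, assuming both texts use the `Label: value` format seen in this notebook (single-line for the stored document, one field per line in the LLM output); `parse_listing` and `changed_facts` are hypothetical helpers, and only the location and numeric fields are compared because the descriptions are expected to change.

```python
import re

FIELDS = ["Neighborhood Description", "House Size (sq ft)", "Description",
          "Bathrooms", "Bedrooms", "Price", "Area"]
FACT_FIELDS = ["Area", "Price", "Bedrooms", "Bathrooms", "House Size (sq ft)"]
# The left-to-right scan matches "Neighborhood Description:" before the
# "Description:" inside it, so the two labels do not collide
LABEL_RE = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + "):")


def parse_listing(text):
    """Split 'Label: value' text (single-line or multi-line) into a dict."""
    matches = list(LABEL_RE.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields


def changed_facts(original, personalized):
    """Return the factual fields whose values differ between the two texts."""
    before, after = parse_listing(original), parse_listing(personalized)
    return [field for field in FACT_FIELDS if before.get(field) != after.get(field)]
```

An empty list means the rewrite kept Area, Price, Bedrooms, Bathrooms, and House Size unchanged; a non-empty list names the fields to review before showing the result to the buyer.
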
In [43]:
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]

In [54]:
def generate_personalized_prompt_template(user_input, search_recommendation):
    return f"""
        The user input for the search is: {user_input}.
        The recommendation from the search is: {search_recommendation}. 

        Return the result with the other fields unchanged, replacing the description and neighborhood description with personalized versions based on the user input so that they appeal to the user.
        Use only data from the recommendation and the user input, and do not add any information that is not in them.
    """

In [55]:
def generate_personalized_prompt_response(user_prompt, model_name = 'gpt-3.5-turbo'):
    return openai.ChatCompletion.create(  
        model=model_name,  
        messages=[
                {"role": "system", "content": "You are a helpful property guru assistant that provides accurate and detailed information based on the user's queries. Avoid making assumptions and focus on the provided data."},
                {"role": "user", "content": user_prompt}  
            ]  
        )  

#### Generate the Prompt for the Personalized Description

In [56]:
# Prepare the prompt for OpenAI
personalized_prompt = generate_personalized_prompt_template(demo_answers, search_recommendation)

#### Send Prompt to OpenAI Model

In [57]:
personalized_response = generate_personalized_prompt_response(personalized_prompt)

#### Show the Prompt Response from OpenAI

In [58]:
print("Original")
print(search_recommendation)
print("--")

Original
Area: Clementi Price: 1400000 Bedrooms: 3 Bathrooms: 2 House Size (sq ft): 1500 Description: Well-maintained apartment with easy access to public transport. Neighborhood Description: Clementi is a bustling area with shopping malls, schools, and recreational spaces.
--


In [59]:
personalized_description = personalized_response['choices'][0]['message']['content']  
print("Personalized")
print(personalized_description)
print("--")

Personalized
Area: Clementi
Price: 1400000
Bedrooms: 3
Bathrooms: 2
House Size (sq ft): 1500
Description: This spacious 3-bedroom apartment in Clementi offers a perfect balance of space and comfort, ideal for accommodating your family comfortably. With a size of 1500 square feet, it provides the room you need for a vibrant and active household.
Neighborhood Description: Clementi is not only a bustling area with shopping malls, schools, and recreational spaces but also a community-oriented neighborhood that strikes a balance between urban vibrancy and a laid-back atmosphere. You'll find yourself within easy reach of parks, recreational facilities, modern conveniences, shops, cafes, and green spaces, making everyday living convenient and enjoyable. Plus, with convenient public transportation options nearby, including bus stops and train stations, commuting to work or exploring the city is a breeze.
--


## Building the User Preference Interface

- Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either from a set of questions or by asking the buyer to enter their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively however you'd like.

`example`

```
questions = [
                "How big do you want your house to be?",
                "What are 3 most important things for you in choosing this property?", 
                "Which amenities would you like?", 
                "Which transportation options are important to you?",
                "How urban do you want your neighborhood to be?",   
            ]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]
```

- Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.

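Before querying, the answers can be paired with their questions and checked for a count mismatch; in a Python list of strings, a missing comma silently joins two adjacent string literals into one, leaving fewer questions than answers. Hard constraints such as a square-footage figure can also be pulled out of the free-text answer so they can be used as a numeric filter. A minimal sketch; `structure_preferences`, `extract_size_range`, and the phrases the regular expression covers ("1,800 square feet", "1,200 to 1,400 sq ft", "1500-1800 sqft") are assumptions, not a complete parser.

```python
import re

# Matches "1,800 square feet", "1,200 to 1,400 sq ft", "1500-1800 sqft", ...
SQFT_RE = re.compile(
    r"(\d[\d,]*)(?:\s*(?:to|-)\s*(\d[\d,]*))?\s*(?:square feet|sq\.?\s?ft|sqft)",
    re.IGNORECASE,
)


def structure_preferences(questions, answers):
    """Pair each question with its answer, failing loudly on a count mismatch."""
    if len(questions) != len(answers):
        raise ValueError(f"{len(questions)} questions but {len(answers)} answers")
    return dict(zip(questions, answers))


def extract_size_range(text):
    """Return (low, high) square footage from text, high=None for one figure, or None."""
    match = SQFT_RE.search(text)
    if match is None:
        return None
    low = int(match.group(1).replace(",", ""))
    high = int(match.group(2).replace(",", "")) if match.group(2) else None
    return low, high
```

For the first demo answer set, `extract_size_range` would return `(1800, None)`; for answers that give no figure, such as "A comfortable living space that accommodates family needs", it returns `None` and only the semantic search applies.
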
In [60]:
def generate_personalized_recommendation(user_answers):
    search_query = generate_search_query(user_answers)
    print(f"Summary: {search_query}")
    print("--")
    
    search_embeddings = generate_search_embeddings(search_query, embedding_function)
    search_recommendations, search_scores = generate_search_result(collection, search_embeddings)    
    selected_recommendation = search_recommendations[0][0]
    
    prompt_template = generate_personalized_prompt_template(search_query, selected_recommendation)
    prompt_response = generate_personalized_prompt_response(prompt_template)
    
    return selected_recommendation, prompt_response['choices'][0]['message']['content'] 

#### Prompt User for Answers

In [61]:
questions = ['How big do you want your house to be?', 'What are 3 most important things for you in choosing this property?', 'Which amenities would you like?', 'Which transportation options are important to you?', 'How urban do you want your neighborhood to be?']

You may use the following answers as input when testing the application:
```python
questions = [
                "How big do you want your house to be?",
                "What are 3 most important things for you in choosing this property?", 
                "Which amenities would you like?", 
                "Which transportation options are important to you?",
                "How urban do you want your neighborhood to be?",   
            ]
answers = [
    "A comfortable living space that accommodates family needs is ideal, providing enough room for everyone.",
    "Proximity to medical facilities, a strong sense of community, and access to essential amenities are key factors in making a choice.",
    "Access to nearby medical facilities, parks, and recreational areas is important for ensuring health and well-being while enjoying an active lifestyle.",
    "Convenient access to public transport is essential for easy commuting, especially to medical facilities and other important locations.",
    "A suburban setting that offers a peaceful atmosphere while being close to medical services and essential amenities is preferred for a balanced lifestyle."
]
```

In [None]:
while True:
    user_answers = []

    # Ask each question in turn and echo the answer
    for idx, question in enumerate(questions, start=1):
        answer = input(f"Q{idx}: {question} ")
        user_answers.append(answer)
        print(f"A{idx}: {answer}")
        print("--")

    original_recommendation, personalized_recommendation = generate_personalized_recommendation(user_answers)

    print(f"Original Description: \n{original_recommendation}")
    print("--")

    print(f"Personalized Description: \n{personalized_recommendation}")
    print("--")

    # Any reply other than "y"/"yes" ends the session
    check_continue = input("Any other questions? ")
    if check_continue.lower() not in ("y", "yes"):
        print("Ok - Byebye!")
        break
    print("--")

Q1: How big do you want your house to be? A comfortable living space that accommodates family needs is ideal, providing enough room for everyone.
A1: A comfortable living space that accommodates family needs is ideal, providing enough room for everyone.
--
Q2: What are 3 most important things for you in choosing this property? Proximity to medical facilities, a strong sense of community, and access to essential amenities are key factors in making a choice.
A2: Proximity to medical facilities, a strong sense of community, and access to essential amenities are key factors in making a choice.
--
Q3: Which amenities would you like? Access to nearby medical facilities, parks, and recreational areas is important for ensuring health and well-being while enjoying an active lifestyle.
A3: Access to nearby medical facilities, parks, and recreational areas is important for ensuring health and well-being while enjoying an active lifestyle.
--
Q4: Which transportation options are important to you? 

## Resources

1. https://docs.trychroma.com/docs/overview/getting-started
2. https://docs.trychroma.com/integrations/embedding-models/openai
3. https://platform.openai.com/docs/guides/embeddings
4. https://docs.trychroma.com/docs/embeddings/embedding-functions
5. https://docs.trychroma.com/docs/collections/create-get-delete
6. https://docs.trychroma.com/docs/collections/add-data
7. https://platform.openai.com/docs/guides/text-generation
8. https://huggingface.co/google-bert/bert-base-uncased