# Home Match AI
### Created by:
Vo Nguyen

### Description:
Personalization is key to customer satisfaction! At our company, we want to revolutionize how clients interact with real-estate listings.

### Goal
Create a personalized experience for each buyer, making the property search process more engaging and tailored to individual preferences.

### Core Components of "HomeMatchAI"
**Understanding Buyer Preferences:**
- Buyers will input their requirements and preferences, such as location, property type, budget, amenities, and lifestyle choices.
- The application uses LLMs to interpret these inputs in natural language, understanding nuanced requests beyond basic filters.

**Integrating with a Vector Database:**
- Connect "HomeMatchAI" with a vector database, where all available property listings are stored.
- Utilize vector embeddings to match properties with buyer preferences, focusing on aspects like neighborhood vibes, architectural styles, and proximity to specific amenities.
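
As a minimal sketch of what embedding-based matching does, the example below ranks listings by cosine similarity to a buyer vector. The three-dimensional vectors and the listing labels are hypothetical, made up for illustration only. Real embeddings come from the embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_listings(query_vec, listings):
    """Return listing names ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in listings.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy "embeddings" (assumption: not produced by any real model)
toy_listings = {
    "quiet suburb, large garden": [0.9, 0.1, 0.2],
    "downtown loft, nightlife": [0.1, 0.9, 0.3],
    "lakeside cabin, hiking trails": [0.6, 0.2, 0.8],
}
buyer_vector = [0.8, 0.2, 0.3]

ranking = rank_listings(buyer_vector, toy_listings)
print(ranking)
```

A vector database performs the same kind of nearest-neighbor ranking, but with an index so that it scales beyond a handful of listings.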

**Personalized Listing Description Generation:**
- For each matched listing, use an LLM to rewrite the description in a way that highlights aspects most relevant to the buyer’s preferences.
- Ensure personalization emphasizes characteristics appealing to the buyer without altering factual information about the property.


## Synthetic Data Generation
- Generate Real Estate Listings with an LLM

### Load Libraries

In [4]:
import openai
from openai import OpenAI
import os 

import json

In [1]:
from dotenv import load_dotenv
load_dotenv("dev.env")

True

In [3]:
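# OpenAI() does not pick up the module-level openai.api_key / openai.api_base
# settings; by default it reads OPENAI_API_KEY and OPENAI_BASE_URL from the
# environment. Pass the values loaded from dev.env to the client explicitly.
client = OpenAI(
    api_key=os.getenv("api_key"),
    base_url=os.getenv("api_base"),
)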

In [5]:
def create_embeddings(text):
    """Return the embedding vector for `text` as a list of floats."""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    # The response holds one item per input; a single string yields one item
    return response.data[0].embedding

In [39]:
system_prompt_listings = """
### ROLE ####
You are a Real Estate AI Assistant that specializes in generating listings.

### TASK ####
Your task is to generate a random real estate listing similar to the example. It must contain EXACTLY the following fields:
- Neighborhood
- Price
- Bedrooms
- Bathrooms
- House Size
- Description
- Neighborhood Description

### Requirements ###
- The listing should be realistic and appealing to potential buyers.
- Use descriptive language to highlight the features of the property and the neighborhood.
- Keep it in the same format as the example.
- Vary the neighborhood names, prices, and features to create a unique listing.
- Try to use names of fantasy places or characters as the Neighborhood name.
- Do not output any headers or titles; output only the listing in the format below so that it can be loaded into a dataframe.

### EXAMPLE ###

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft
Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.
Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.


"""


### Generate Embeddings and Listings

In [40]:
from tqdm import tqdm

def generate_listings(system_prompt):
    prompt = system_prompt
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=1500
    )
    return response.choices[0].text

listings = []
listings_embeddings = []

for _ in tqdm(range(11), desc="Generating Listings"):
    listing = generate_listings(system_prompt_listings)
    listing_embeddings = create_embeddings(listing)
    listings.append(listing)
    listings_embeddings.append(listing_embeddings)


Generating Listings: 100%|██████████| 11/11 [00:31<00:00,  2.89s/it]


In [41]:
import pandas as pd

df = pd.DataFrame(
    {
        "listing": listings,
        "embedding": listings_embeddings
    }
)

display(df)

Unnamed: 0,listing,embedding
0,"Neighborhood: Golden Grove\nPrice: $750,000\nB...","[0.017127079889178276, 0.023387322202324867, -..."
1,"Neighborhood: Golden Peak\nPrice: $1,200,000\n...","[0.021166713908314705, 0.016980759799480438, -..."
2,"Neighborhood: Riverdale\nPrice: $650,000\nBedr...","[0.007767536677420139, 0.007729398086667061, -..."
3,"Neighborhood: Hidden Vale\nPrice: $1,200,000\n...","[0.008742164820432663, 0.01611197367310524, -0..."
4,"Neighborhood: Hidden Valley\nPrice: $950,000\n...","[0.01955648511648178, 0.015668749809265137, -0..."
5,"Neighborhood: Mystic Glen\nPrice: $1,200,000\n...","[0.01475507952272892, 0.01735503599047661, -0...."
6,"Neighborhood: Heavenly Haven\nPrice: $1,200,00...","[0.013147459365427494, 0.01406047772616148, -0..."
7,"Neighborhood: Rivendell\nPrice: $1,200,000\nBe...","[0.009661990217864513, 0.006560327485203743, -..."
8,### FANTASTIC LISTING ###\n\nNeighborhood: Mys...,"[0.02168569527566433, 0.013535277917981148, -0..."
9,"Neighborhood:Silver City\nPrice:$1,200,000\nBe...","[0.029040899127721786, 0.019910717383027077, -..."
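
The output above shows that the model does not always follow the requested format: row 8 starts with a `### FANTASTIC LISTING ###` header, and row 9 omits the space after the colon. One possible way to catch this before embedding is to parse each listing and check for the required fields. The sketch below is an assumption about how that could be done, not part of the original notebook. `parse_listing` and `REQUIRED_FIELDS` are hypothetical names, and apart from the `Neighborhood` and `Price` lines, the sample text is invented for the example.

```python
import re

# Field names requested in the system prompt
REQUIRED_FIELDS = [
    "Neighborhood", "Price", "Bedrooms", "Bathrooms",
    "House Size", "Description", "Neighborhood Description",
]


def parse_listing(text):
    """Return a dict of field -> value, or None if a required field is missing."""
    fields = {}
    for line in text.splitlines():
        # Accept "Field: value" and "Field:value"; header lines such as
        # "### FANTASTIC LISTING ###" do not match and are skipped.
        match = re.match(r"^\s*([A-Za-z ]+?)\s*:\s*(.+)$", line)
        if match and match.group(1) in REQUIRED_FIELDS:
            fields[match.group(1)] = match.group(2).strip()
    if all(name in fields for name in REQUIRED_FIELDS):
        return fields
    return None


# Hypothetical sample combining the two formatting problems seen above
sample = """### FANTASTIC LISTING ###

Neighborhood:Silver City
Price:$1,200,000
Bedrooms: 4
Bathrooms: 3
House Size: 3,000 sqft
Description: A placeholder description used only for this example.
Neighborhood Description: A placeholder neighborhood blurb."""

parsed = parse_listing(sample)
print(parsed["Neighborhood"], parsed["Price"])
```

A listing for which `parse_listing` returns `None` could be regenerated instead of being stored.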


In [52]:
df.dtypes

listing      object
embedding    object
dtype: object

In [42]:
df.to_csv("./data/output/listings.csv", index=False)
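
Note that CSV stores every cell as text, so the `embedding` column is written as the string form of a list and comes back as a string when the file is read again. The sketch below illustrates the round trip with the standard `csv` and `json` modules rather than pandas, to keep it dependency-free. The row content is a made-up placeholder.

```python
import csv
import io
import json

# Placeholder row (assumption: values invented for the example)
rows = [{"listing": "Neighborhood: Example\nPrice: $1", "embedding": [0.1, -0.2, 0.3]}]

# Write: serialize each embedding list to a JSON string
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["listing", "embedding"])
writer.writeheader()
for row in rows:
    writer.writerow({"listing": row["listing"], "embedding": json.dumps(row["embedding"])})

# Read: the embedding arrives as text and must be decoded back into a list
buffer.seek(0)
restored = [
    {"listing": r["listing"], "embedding": json.loads(r["embedding"])}
    for r in csv.DictReader(buffer)
]
print(type(restored[0]["embedding"]), restored[0]["embedding"])
```

The same decoding step applies if `listings.csv` is later loaded back into a dataframe before being added to a vector table.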

## Semantic Search
- Creating a Vector Database and Storing Listings
- Semantic Search of Listings Based on Buyer Preferences

### Creating a Vector Database and Storing Listings

In [53]:
from lancedb.pydantic import vector, LanceModel
import lancedb

class RealEstateListing(LanceModel):
    listing: str
    embedding: vector(1536)  # Update the dimension to match your embedding model

db = lancedb.connect("~/.lancedb")
table_name = "RealEstateListings"
db.drop_table(table_name, ignore_missing=True)
table = db.create_table(table_name, schema=RealEstateListing)


In [54]:
table.add(df)


In [55]:
table.head().to_pandas()


Unnamed: 0,listing,embedding
0,"Neighborhood: Golden Grove\nPrice: $750,000\nB...","[0.01712708, 0.023387322, -0.0046525286, -0.00..."
1,"Neighborhood: Golden Peak\nPrice: $1,200,000\n...","[0.021166714, 0.01698076, -0.019468637, -0.009..."
2,"Neighborhood: Riverdale\nPrice: $650,000\nBedr...","[0.0077675367, 0.007729398, -0.0015883086, 0.0..."
3,"Neighborhood: Hidden Vale\nPrice: $1,200,000\n...","[0.008742165, 0.016111974, -0.003249446, 0.007..."
4,"Neighborhood: Hidden Valley\nPrice: $950,000\n...","[0.019556485, 0.01566875, -0.002686726, 0.0030..."


In [59]:
query = "I want to live in Rivendell with the price of $1,200,000 and 3 bedrooms"
query_embedding = create_embeddings(query)
# to_pandas() replaces the deprecated to_df()
table.search(query_embedding).limit(10).to_pandas()


Unnamed: 0,listing,embedding,_distance
0,"Neighborhood: Rivendell\nPrice: $1,200,000\nBe...","[0.00966199, 0.0065603275, -0.021341534, 0.001...",0.23582
1,"Neighborhood: Golden Peak\nPrice: $1,200,000\n...","[0.021166714, 0.01698076, -0.019468637, -0.009...",0.390006
2,"Neighborhood: Riverdale\nPrice: $650,000\nBedr...","[0.0077675367, 0.007729398, -0.0015883086, 0.0...",0.390469
3,"Neighborhood: Heavenly Haven\nPrice: $1,200,00...","[0.013147459, 0.014060478, -0.009606254, 0.008...",0.407394
4,"Neighborhood: Mystic Glen\nPrice: $1,200,000\n...","[0.0147550795, 0.017355036, -0.0065592797, -0....",0.407678
5,"Neighborhood: Hidden Vale\nPrice: $1,200,000\n...","[0.008742165, 0.016111974, -0.003249446, 0.007...",0.409586
6,"Neighborhood:Silver City\nPrice:$1,200,000\nBe...","[0.0290409, 0.019910717, -0.017901013, -0.0216...",0.423689
7,### FANTASTIC LISTING ###\n\nNeighborhood: Mys...,"[0.021685695, 0.013535278, -0.017298032, 0.000...",0.425014
8,"Neighborhood: Hidden Valley\nPrice: $950,000\n...","[0.019556485, 0.01566875, -0.002686726, 0.0030...",0.425532
9,"Neighborhood: Golden Grove\nPrice: $750,000\nB...","[0.01712708, 0.023387322, -0.0046525286, -0.00...",0.448683
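
To help interpret the `_distance` column, the sketch below works through the link between distance and cosine similarity. It rests on three assumptions that the notebook does not confirm: the table uses LanceDB's default L2 metric, `_distance` is the squared L2 distance, and the embeddings have unit length. Under those assumptions, squared distance `d2` and cosine similarity `cos` satisfy `d2 = 2 - 2 * cos`, so a smaller distance means a closer match.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary toy vectors, normalized so the identity applies
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 3.0])

d2 = squared_l2(a, b)
cos = dot(a, b)
print(math.isclose(d2, 2 - 2 * cos))

# Under the stated assumptions, the top hit's distance of 0.23582 would
# correspond to this cosine similarity:
implied_cosine = 1 - 0.23582 / 2
print(round(implied_cosine, 5))
```

If the table were created with a different metric, the numbers in `_distance` would need a different reading.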


### Semantic Search of Listings Based on Buyer Preferences

Buyer's Preference

In [62]:
questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

## Augmented Response Generation
- Logic for Searching and Augmenting Listing Descriptions
- Use of an LLM for Generating Personalized Descriptions

### Logic for Searching and Augmenting Listing Descriptions

In [74]:
compiled_answer = " ".join(answers)
answer_embeddings = create_embeddings(compiled_answer)
results = table.search(answer_embeddings).limit(10).to_pandas()
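
The cell above embeds only the answers, so the `questions` list goes unused. One alternative, sketched here as an assumption rather than the notebook's method, is to pair each question with its answer before embedding or prompting, so that the context of each answer is kept. `compile_preferences` is a hypothetical helper, and the example uses only the first two question/answer pairs.

```python
def compile_preferences(questions, answers):
    """Join question/answer pairs into one text block, one pair per entry."""
    if len(questions) != len(answers):
        # A mismatch usually means an entry is missing or two entries were merged
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers"
        )
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))


sample_questions = [
    "How big do you want your house to be?",
    "Which amenities would you like?",
]
sample_answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
]

compiled = compile_preferences(sample_questions, sample_answers)
print(compiled)
```

Whether paired text retrieves better matches than answers alone would need to be checked against the stored listings.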

In [71]:
display(results)

Unnamed: 0,listing,embedding,_distance
0,"Neighborhood: Riverdale\nPrice: $650,000\nBedr...","[0.0077675367, 0.007729398, -0.0015883086, 0.0...",0.300832
1,"Neighborhood:Silver City\nPrice:$1,200,000\nBe...","[0.0290409, 0.019910717, -0.017901013, -0.0216...",0.32948
2,"Neighborhood: Golden Grove\nPrice: $750,000\nB...","[0.01712708, 0.023387322, -0.0046525286, -0.00...",0.337139
3,"Neighborhood: Rivendell\nPrice: $1,200,000\nBe...","[0.00966199, 0.0065603275, -0.021341534, 0.001...",0.350273
4,### FANTASTIC LISTING ###\n\nNeighborhood: Mys...,"[0.021685695, 0.013535278, -0.017298032, 0.000...",0.353152
5,"Neighborhood: Mystic Glen\nPrice: $1,200,000\n...","[0.0147550795, 0.017355036, -0.0065592797, -0....",0.35356
6,"Neighborhood: Hidden Valley\nPrice: $950,000\n...","[0.019556485, 0.01566875, -0.002686726, 0.0030...",0.363746
7,"Neighborhood: Heavenly Haven\nPrice: $1,200,00...","[0.013147459, 0.014060478, -0.009606254, 0.008...",0.372708
8,"Neighborhood: Hidden Vale\nPrice: $1,200,000\n...","[0.008742165, 0.016111974, -0.003249446, 0.007...",0.373175
9,"Neighborhood: Golden Peak\nPrice: $1,200,000\n...","[0.021166714, 0.01698076, -0.019468637, -0.009...",0.374176


In [75]:
top_answer = results.iloc[0]["listing"]
print(top_answer)

Neighborhood: Riverdale
Price: $650,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft
Description: Welcome to this spacious 4-bedroom, 3-bathroom home in the charming neighborhood of Riverdale. The elegant brick exterior exudes sophistication and the expansive backyard is perfect for hosting summer BBQs. Inside, the hardwood floors and vaulted ceilings create a grand living space, perfect for entertaining guests. The recently renovated kitchen boasts modern appliances and granite countertops. Retreat to the luxurious master suite, complete with a walk-in closet and spa-like bathroom. Make this Riverdale gem your dream home.
Neighborhood Description:Escape the hustle and bustle of the city in the tranquil neighborhood of Riverdale. Surrounded by lush greenery, this community offers a peaceful and scenic environment. Enjoy a cup of coffee at the nearby Riverdale Cafe or take a leisurely stroll along the river. With easy access to local schools and parks, Riverdale is the perfect place 

### Use of an LLM for Generating Personalized Descriptions

In [77]:
personalize_prompt = """
### ROLE ####
You are an AI Assistant that takes a real estate listing and a buyer's preferences and generates a personalized description of the property.
### TASK ####
Your task is to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.

### REQUIREMENTS ###
Maintain factual integrity: Enhance the appeal of the listing without altering any factual information.
"""

def generate_personal_description(listing, buyer_preferences):
    user_prompt = f"""
Listing:
{listing}

Buyer Preferences:
{buyer_preferences}
"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": personalize_prompt},
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=2000,
        temperature=0.4
    )
    return response.choices[0].message.content.strip()
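
Because the prompt asks the model to keep factual information intact, it can be useful to spot-check the output. The sketch below is a crude string check, offered as an assumption about one possible safeguard. `extract_facts` and `missing_facts` are hypothetical helpers that verify the listing's price and bedroom and bathroom counts still appear in the generated text. The check flags facts that were dropped or changed; it cannot detect features the model invented.

```python
import re


def extract_facts(listing):
    """Pull the price and the bedroom/bathroom counts out of a listing."""
    facts = {}
    for field in ("Price", "Bedrooms", "Bathrooms"):
        match = re.search(rf"^{field}\s*:\s*(.+)$", listing, flags=re.MULTILINE)
        if match:
            facts[field] = match.group(1).strip()
    return facts


def missing_facts(listing, description):
    """Return the fields whose values do not appear in the description."""
    # How each fact is expected to be phrased in the generated text
    templates = {"Price": "{}", "Bedrooms": "{}-bedroom", "Bathrooms": "{}-bathroom"}
    return [
        field
        for field, value in extract_facts(listing).items()
        if templates[field].format(value) not in description
    ]


listing = "Neighborhood: Riverdale\nPrice: $650,000\nBedrooms: 4\nBathrooms: 3"
faithful = "A spacious 4-bedroom, 3-bathroom house in Riverdale, priced at $650,000."
altered = "A spacious 3-bedroom, 3-bathroom house in Riverdale, priced at $650,000."

print(missing_facts(listing, faithful))
print(missing_facts(listing, altered))
```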


### Test Case 1

In [78]:
questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

compiled_answer = " ".join(answers)
answer_embeddings = create_embeddings(compiled_answer)
results = table.search(answer_embeddings).limit(10).to_pandas()
top_answer = results.iloc[0]["listing"]


personalized_description = generate_personal_description(top_answer, compiled_answer)
print(personalized_description)

Welcome to your dream home - a comfortable and spacious 4-bedroom, 3-bathroom house nestled in the tranquil neighborhood of Riverdale, priced at $650,000. The house spans a generous 2,500 sqft, giving you more than enough room to create your cozy living space. 

The exterior of this gem is adorned with elegant brick, lending an air of sophistication. The expansive backyard is not just perfect for hosting summer BBQs, but also provides ample space for your gardening pursuits. 

Step inside to find hardwood floors and vaulted ceilings that create a grand living space, perfect for entertaining guests or enjoying a quiet evening with family. The recently renovated kitchen, which boasts modern appliances and granite countertops, is spacious enough for all your culinary adventures. 

The luxurious master suite is a retreat within your home, complete with a walk-in closet and a spa-like bathroom. The house also includes a two-car garage and a modern, energy-efficient heating system, aligning 

### Test Case 2

In [None]:
questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]

compiled_answer = " ".join(answers)
answer_embeddings = create_embeddings(compiled_answer)
results = table.search(answer_embeddings).limit(10).to_pandas()
top_answer = results.iloc[0]["listing"]


personalized_description = generate_personal_description(top_answer, compiled_answer)
print(personalized_description)