# HomeMatch
Core Components of "HomeMatch"

Understanding Buyer Preferences:
- Buyers will input their requirements and preferences, such as location, property type, budget, amenities, and lifestyle choices.
- The application uses LLMs to interpret these inputs in natural language, understanding nuanced requests beyond basic filters.

Integrating with a Vector Database:
- Connect "HomeMatch" with a vector database, where all available property listings are stored.
- Utilize vector embeddings to match properties with buyer preferences, focusing on aspects like neighborhood vibes, architectural styles, and proximity to specific amenities.
- Personalized Listing Description Generation:

Matching
- For each matched listing, use an LLM to rewrite the description in a way that highlights aspects most relevant to the buyer’s preferences.
- Ensure personalization emphasizes characteristics appealing to the buyer without altering factual information about the property.

Listing Presentation:
- Output the personalized listing(s) as a text description of the listing.

## Step 1: Setting Up the Python Application
Import Libraries

In [1]:
# import libraries
import chromadb
import clip
import json
from IPython.display import display, HTML
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma
import numpy as np
import os
import pandas as pd
from PIL import Image
import random
import re
import torch
from typing import Any, Dict, Optional, Tuple

In [2]:
# Create a Conda Environment YAML File
!conda env export > environment.yml

In [3]:
# Generate requirements.txt file using pip freeze
!pip freeze > requirements.txt



### Global Variables

In [4]:
num = 15
location = 'Los Angeles, California, USA'

In [5]:
personalities = ['single, arrogant, rich, and female', 
                 'male, married, 2 kids, and middle class', 
                 'divorced middle aged woman, working 2 jobs, with dependent adult children', 
                 'young, nice, poor, male', 
                 'upper middle class married woman, husband works overseas, with no children']
personalities

['single, arrogant, rich, and female',
 'male, married, 2 kids, and middle class',
 'divorced middle aged woman, working 2 jobs, with dependent adult children',
 'young, nice, poor, male',
 'upper middle class married woman, husband works overseas, with no children']

In [6]:
questions = ['Which houses are in Venice?',
             'What is the cheapest house that is available in your listings?',
             'Only provide me listings if the house has a pool']
questions

['Which houses are in Venice?',
 'What is the cheapest house that is available in your listings?',
 'Only provide me listings if the house has a pool']

### Common Functions

Due to the stochastic nature of LLMs there are a lot of situatons whereby the JSON returned had errrors in it and did not work. As a result, I used ChatGPT and CoPilot to build regex's to deal with the situations. I still had to put try and except blocks in on the json.loads() code. It is just VERY difficult to anticipate every error condition that will potentially happen and build clean up code that will deal with all instances.

In [7]:
# Function to clean the price values
def clean_price(price):
    # Convert to string, remove spaces, commas, and dollar signs, then convert to integer
    clean_price = str(price).replace(' ', '').replace(',', '').replace('$', '')
    return int(clean_price)

In [8]:
def cleanup(response):
    
    # Fix spacing issues after commas and periods
    cleaned_response = re.sub(r'(?<=,)(\w)', r' \1', response)
    cleaned_response = re.sub(r'(?<=\.)(\w)', r' \1', cleaned_response)
    
    # Correct sequences of triple quotes
    cleaned_response = cleaned_response.replace('"   "', '", "')
    
    # We have a problem with contractions and possesives.
    # Replace single quotes with backticks (`) using regex
    pattern = r"\b\w+'\w*\b|\b'\w*\b"
    
    # Function to replace matched apostrophes with backticks
    def replace_apostrophe(match):
        return match.group().replace("'", "`")

    # Apply the regex pattern using sub with the replacement function
    cleaned_response = re.sub(pattern, replace_apostrophe, response)
    
    # Replace single quotes with double quotes for keys and values
    cleaned_response = cleaned_response.replace("'", '"')
    
    # Regex pattern to insert spaces after commas and periods
    pattern = r'([,.])(?=[^\s])'

    # Function to insert space after matched commas and periods
    def insert_space(match):
        return match.group(1) + ' '

    # Apply the regex pattern using sub with a function
    cleaned_response = re.sub(pattern, insert_space, cleaned_response)
    
    # Remove non-numeric characters except comma and then remove the commas
    cleaned_string = re.sub(r'[^\d,]', '', cleaned_response).replace(',', '')
    
    # Remove leading/trailing whitespace and newlines
    cleaned_response = cleaned_response.strip()
    
    # Remove newline characters from the JSON string
    cleaned_response = cleaned_response.replace('\n', ' ')
    
    return cleaned_response

In [9]:
def concat_row(row):
    
    preferences = (str(row.Q0) if row.Q0 is not None else '') + ' \n' + \
                  (str(row.Q1) if row.Q1 is not None else '') + ' \n' + \
                  (str(row.Q2) if row.Q2 is not None else '') + ' \n' + \
                  (str(row.Q3) if row.Q3 is not None else '') + ' \n' + \
                  (str(row.Q4) if row.Q4 is not None else '') + ' \n' + \
                  (str(row.Q5) if row.Q5 is not None else '') + ' \n' + \
                  (str(row.Q6) if row.Q6 is not None else '')
    
    return preferences

In [10]:
def create_df(input_list, output_file_name):
    
    temp_list = []
    
    # Load the data as a list of dictionaries
    for element in input_list:
        try:
            temp = json.loads(element)
            temp_list.append(temp)
        except Exception as e:
            print(f"Error occurred: {str(e)}")
            print('Non fatal error. We just get one fewer "preference" or "judgement".')
    
    if len(temp_list) == 5:
        index_values = personalities
    else:
        index_values = range(len(temp_list))

    # Create a DataFrame from the list of dictionaries
    output_df = pd.DataFrame(temp_list, index = index_values)

    # Write it to disk
    output_df.to_csv(output_file_name)

    # Display the DataFrame
    display(output_df)
    
    return output_df

In [11]:
def encode_image(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
    return image_features.cpu().numpy()

In [12]:
def encode_text(text):
    text = clip.tokenize([text]).to(device)
    with torch.no_grad():
        text_features = model.encode_text(text)
    return text_features.cpu().numpy()

In [13]:
def format_print_listing(result, listing_dict):
    
    html_content_formatted = html_content.format(
                        image_path = result['metadatas'][0][0]['image_path'],
                        Neighborhood_Name = listing_dict['Neighborhood_Name'],
                        Neighborhood_Description = listing_dict['Neighborhood_Description'],
                        House_Description = listing_dict['House_Description'],
                        Square_Meters = listing_dict['Square_Meters'],
                        Bathrooms = listing_dict['Bathrooms'],
                        Bedrooms = listing_dict['Bedrooms'],
                        Price = listing_dict['Price'])

    # Display the HTML content
    display(HTML(html_content_formatted))

In [14]:
# I am probably never going to use this one. This simply does a print output of one result at a time.
def format_print_results(result, idx):
        
    print('\n###\n', idx, 'result')

    # Path to the image
    image_path = results['metadatas'][0][0]['image_path']
    # Open and display the image
    image = Image.open(image_path)
    display(image)

    print('This is the id', results['ids'][0][0])
    print('This is the image path:', results['metadatas'][0][0]['image_path'])

    # This is how you get the listing
    listing_dict = results['metadatas'][0][0]['listing']

    # Now that you have the listing, you can extract all of the items since this is a dictionary. 
    listing_dict = listing_dict.replace("'", '"')
    listing_dict = json.loads(listing_dict)
    # Print all key-value pairs
    for key, value in listing_dict.items():
        print(f"{key}: {value}")

In [15]:
# HTML template to display image and text side by side
html_content = """
<div style="display: flex; align-items: flex-start;">
    <div style="flex: 1;">
        <img src="{image_path}" style="max-width: 100%;">
    </div>
    <div style="flex: 1; padding-left: 20px;">
        <p><strong>Neighborhood Name:</strong> {Neighborhood_Name}</p>
        <p><strong>Neighborhood Description:</strong> {Neighborhood_Description}</p>
        <p><strong>House Description:</strong> {House_Description}</p>
        <p><strong>Square Meters:</strong> {Square_Meters}</p>
        <p><strong>Bathrooms:</strong> {Bathrooms}</p>
        <p><strong>Bedrooms:</strong> {Bedrooms}</p>
        <p><strong>Price:</strong> {Price}</p>
    </div>
</div>
"""

In [16]:
def search(query, top_k=1):
    text_features = encode_text(query)[0].tolist()
    query_embeddings = np.concatenate((text_features, np.zeros(512))).tolist()
    
    # Query the collection
    results = collection.query(query_embeddings=[query_embeddings], n_results=top_k)
    
    # This is how you get the listing
    listing_dict = results['metadatas'][0][0]['listing']

    # Now that you have the listing, you can extract all of the items since this is a dictionary. 
    listing_dict = listing_dict.replace("'", '"')
    listing_dict = json.loads(listing_dict)
    
    return results, listing_dict

## Step 2: Generating Real Estate Listings

### Create Prompt for Listings

In [17]:
listings_prompt = """
You are an experienced real estate professional in {location} with a flair for marketing. You have been asked to produce 
{num} new listings. Each listing is unique. Each listing has this information.

###
Neighborhood_Name - Generate {num} names of well known districts for {location}.
Neighborhood_Description - Generate an 50 - 80 word description for each Neighborhood_Description. Use the 
Neighborhood_Name for each listing as context for the description of the Neighborhood_Description
House_Description - For each House_Description generate an 50 - 80 word description of the House that is for sale.
Each house is unique. They vary in quality from low, medium, to high regardless of Neighborhood_Name.
Square_Meters - Generate a RANDOM number from 100 to 500.
Bathrooms - Using the Square_Meters number for each listing, generate the number of bathrooms that the house will have
based on this formula. For every 100 square meters, the house should have at least 1 bathroom. This can vary slightly.
Bedrooms - Each house will have a minimum of 2 bedrooms. For every 100 Square_Meters > 200 Square_Meters, add 1 bedroom. 
This can vary slightly.
Price - The price will be based on this formula. 
1. Choose a number randomly between 4000 and 8000. Call this random_number.
2. Multiply random_number by Square_Meters for each listing. This gives you the Price.
###
Examples for Price Calculation
random_number between 4000 and 8000 = 4300
Square_Meters = 184
Price = 4350 * 184
Price = 800400

random_number between 4000 and 8000 = 5100
Square_Meters = 115
Price = 5100 * 115
Price = 586500

random_number between 4000 and 8000 = 7850
Square_Meters = 255
Price = 7850 * 255
Price = 2001750
"""
print(listings_prompt)


You are an experienced real estate professional in {location} with a flair for marketing. You have been asked to produce 
{num} new listings. Each listing is unique. Each listing has this information.

###
Neighborhood_Name - Generate {num} names of well known districts for {location}.
Neighborhood_Description - Generate an 50 - 80 word description for each Neighborhood_Description. Use the 
Neighborhood_Name for each listing as context for the description of the Neighborhood_Description
House_Description - For each House_Description generate an 50 - 80 word description of the House that is for sale.
Each house is unique. They vary in quality from low, medium, to high regardless of Neighborhood_Name.
Square_Meters - Generate a RANDOM number from 100 to 500.
Bathrooms - Using the Square_Meters number for each listing, generate the number of bathrooms that the house will have
based on this formula. For every 100 square meters, the house should have at least 1 bathroom. This can vary sli

In [18]:
json_spec = """
###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 

[{'Neighborhood_Name': 'Neighborhood_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhood_Name': 'Neighborhood_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhood_Name': 'Neighborhood_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
 ...}]
###
"""
print(json_spec)


###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 

[{'Neighborhood_Name': 'Neighborhood_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhood_Name': 'Neighborhood_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhood_Name': 'Neighborhood_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
 ...}]
###



In [19]:
listings_prompt_formatted = listings_prompt.format(
    num = num,
    location = location
)
listings_prompt_formatted = listings_prompt_formatted + json_spec
print(listings_prompt_formatted)


You are an experienced real estate professional in Los Angeles, California, USA with a flair for marketing. You have been asked to produce 
15 new listings. Each listing is unique. Each listing has this information.

###
Neighborhood_Name - Generate 15 names of well known districts for Los Angeles, California, USA.
Neighborhood_Description - Generate an 50 - 80 word description for each Neighborhood_Description. Use the 
Neighborhood_Name for each listing as context for the description of the Neighborhood_Description
House_Description - For each House_Description generate an 50 - 80 word description of the House that is for sale.
Each house is unique. They vary in quality from low, medium, to high regardless of Neighborhood_Name.
Square_Meters - Generate a RANDOM number from 100 to 500.
Bathrooms - Using the Square_Meters number for each listing, generate the number of bathrooms that the house will have
based on this formula. For every 100 square meters, the house should have at least

### Completion Model
Instantiate the model and run it with the prompt.

In [20]:
# Get key
open_ai_key = pd.read_csv('D:\OneDrive\Security\keys.csv')
open_ai_key = open_ai_key[open_ai_key['Organization'] == 'Open_AI']['Key'].iloc[0]

# Set the environment variable
os.environ["OPENAI_API_KEY"] = open_ai_key

# Create the model
descriptions_model = OpenAI(model = 'gpt-3.5-turbo-instruct', 
                            temperature = .3,
                            max_tokens = 3000)

In [21]:
# Call the model with the formatted prompt. 
response = descriptions_model.predict(listings_prompt_formatted)

# Clean it up and prepare it for a dataframe or viewing as a list
cleaned_response = cleanup(response)

In [26]:
# Load the data as a list of dictionaries
dict_list = json.loads(cleaned_response)

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(dict_list)

# Write the dataframe to disk
df.to_csv('listings.csv')

# Display the DataFrame
df.sort_values('Price', inplace = True)
df.reset_index(drop = True, inplace = True)

df

Unnamed: 0,Neighborhood_Name,Neighborhood_Description,House_Description,Square_Meters,Bathrooms,Bedrooms,Price
0,Koreatown,Koreatown is a vibrant and diverse neighborhoo...,This modern and stylish apartment in Koreatown...,150,2,2,430000
1,Silver Lake,Silver Lake is a trendy and hip neighborhood i...,This charming cottage in Silver Lake offers 20...,200,2,2,560000
2,Venice,Venice is a vibrant and eclectic neighborhood ...,This charming bungalow in Venice offers 150 sq...,150,2,2,645000
3,Hollywood,Hollywood is the iconic neighborhood in Los An...,This modern and stylish condo in Hollywood off...,300,3,3,765000
4,West Hollywood,West Hollywood is a trendy and lively neighbor...,This modern and stylish condo in West Hollywoo...,300,3,3,765000
5,Echo Park,Echo Park is a trendy and diverse neighborhood...,This modern and stylish townhouse in Echo Park...,250,3,3,800000
6,Downtown,Downtown Los Angeles is the bustling and vibra...,This modern and spacious loft in Downtown offe...,200,2,2,800000
7,Culver City,Culver City is a trendy and up-and-coming neig...,This modern and stylish townhouse in Culver Ci...,250,3,3,800000
8,Beverly Hills,Beverly Hills is a luxurious neighborhood in L...,This stunning mansion in Beverly Hills boasts ...,400,4,4,4300000
9,Santa Monica,Santa Monica is a beautiful and upscale beachf...,This stunning beachfront home in Santa Monica ...,450,5,5,6000000


## Step 3: Storing Listings in a Vector Database

In [27]:
# Convert DataFrame to a list of dictionaries
dict_list = df.to_dict(orient="records")

# Convert each dictionary to a string
dict_list_string = [str(d) for d in dict_list]

dict_list_string[:2]

["{'Neighborhood_Name': 'Koreatown', 'Neighborhood_Description': 'Koreatown is a vibrant and diverse neighborhood in Los Angeles known for its rich culture, delicious food, and bustling nightlife. It is a popular spot for young professionals and students, with a mix of affordable housing and trendy hotspots. ', 'House_Description': 'This modern and stylish apartment in Koreatown offers 150 square meters of living space, with sleek finishes and convenient amenities. The apartment features 2 bedrooms and 2 bathrooms, perfect for someone looking for a trendy and affordable lifestyle in the heart of Koreatown. Don`t miss out on this rare opportunity. ', 'Square_Meters': 150, 'Bathrooms': 2, 'Bedrooms': 2, 'Price': 430000}",
 "{'Neighborhood_Name': 'Silver Lake', 'Neighborhood_Description': 'Silver Lake is a trendy and hip neighborhood in Los Angeles known for its artsy vibe, diverse community, and beautiful reservoir. It is a popular spot for young professionals and families, with a mix of

In [28]:
# Initialize your embedding function (e.g., OpenAIEmbeddings)
embeddings = OpenAIEmbeddings()

# Create the vector store
chroma_langchain = Chroma.from_texts(dict_list_string, embedding=embeddings)

# Persist the vector store to a directory
chroma_langchain.persist()

### Query the Vector Database

In [29]:
# Query chroma_langchain
question = 'Which houses are in Venice'

# Get the most similar documents
results = chroma_langchain.similarity_search(question)

print(len(results))
results[0]

4


Document(page_content="{'Neighborhood_Name': 'Venice', 'Neighborhood_Description': 'Venice is a vibrant and eclectic neighborhood in Los Angeles, known for its artistic community, beautiful canals, and lively boardwalk. It is a popular spot for tourists and locals alike, with a mix of upscale restaurants, trendy bars, and unique shops. ', 'House_Description': 'This charming bungalow in Venice offers 150 square meters of living space, with a cozy and inviting atmosphere. The house features 2 bedrooms and 2 bathrooms, perfect for a couple or small family looking for a peaceful retreat in the heart of Venice. Don`t miss out on the opportunity to live in this vibrant and sought-after neighborhood. ', 'Square_Meters': 150, 'Bathrooms': 2, 'Bedrooms': 2, 'Price': 645000}")

## Step 4: Building the User Preference Interface

I am going to pose these questions to OpenAI and see what the LLMs responses are. I will do it for 5 different people and collect the answers to each one of the questions to build up a user profile for each "person"

In [30]:
questions = [   
"Q0: How big do you want your house to be? ",
"Q1: What are the 3 most important things for you in choosing this property? ", 
"Q2: Which amenities would you like? ", 
"Q3: Which transportation options are important to you? ",
"Q4: Who will be living in the home: their names, ages, sex, and relationship to you? ",
"Q5: How urban do you want your neighborhood to be? ",   
"Q6: Is there anything else that you think I shoud know? "
]
questions

['Q0: How big do you want your house to be? ',
 'Q1: What are the 3 most important things for you in choosing this property? ',
 'Q2: Which amenities would you like? ',
 'Q3: Which transportation options are important to you? ',
 'Q4: Who will be living in the home: their names, ages, sex, and relationship to you? ',
 'Q5: How urban do you want your neighborhood to be? ',
 'Q6: Is there anything else that you think I shoud know? ']

In [31]:
prompt_for_asking_questions = """
You are a(n) {personality} looking for a house in {location}. Please pay attention to the type of buyer you are.
I am a professional real estate agent helping you purchase a home. Answer each of these questions. 
{questions}
With this information I will help you find the perfect home for you. Please limit your answers to 100 words.
"""
print(prompt_for_asking_questions)


You are a(n) {personality} looking for a house in {location}. Please pay attention to the type of buyer you are.
I am a professional real estate agent helping you purchase a home. Answer each of these questions. 
{questions}
With this information I will help you find the perfect home for you. Please limit your answers to 100 words.



In [32]:
json_spec = """
###
Return the answers in this JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 
All commas('), square brackets([], colons(:), and ellipsis({}) must be exactly as you see here. Check your work. 
This is important. 


{'Q0': 'Answer to Q0',
 'Q1': 'Answer to Q1',
 'Q2': 'Answer to Q2',
 'Q3': 'Answer to Q3',
 'Q4': 'Answer to Q4',
 'Q5': 'Answer to Q5',
 'Q6': 'Answer to Q6'}
###
"""
print(json_spec)


###
Return the answers in this JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 
All commas('), square brackets([], colons(:), and ellipsis({}) must be exactly as you see here. Check your work. 
This is important. 


{'Q0': 'Answer to Q0',
 'Q1': 'Answer to Q1',
 'Q2': 'Answer to Q2',
 'Q3': 'Answer to Q3',
 'Q4': 'Answer to Q4',
 'Q5': 'Answer to Q5',
 'Q6': 'Answer to Q6'}
###



In [33]:
# Change the modeltemperature to make it more creative.
answers_model = OpenAI(model = 'gpt-3.5-turbo-instruct', 
                            temperature = .5,
                            max_tokens = 3000)

In [34]:
personal_answers = []
for personality in personalities:
    formatted_prompt = prompt_for_asking_questions.format(
        personality = personality,
        location = location,
        questions = questions)
    
    # Combine the prompts
    formatted_prompt = formatted_prompt + json_spec
    
    # Call the model with the formatted prompt 
    response = answers_model.predict(formatted_prompt)
    
    cleaned_response = cleanup(response)
    
    # Append to personal_answers
    personal_answers.append(cleaned_response)
    
personal_answers[0]

'{"Q0": "I am looking for a spacious house with at least 3 bedrooms and 2 bathrooms. ",  "Q1": "The most important things for me are location, security, and a modern design. ",  "Q2": "I would like a house with a pool, a gym, and a home office. ",  "Q3": "Easy access to public transportation and proximity to major highways are important to me. ",  "Q4": "I will be living in the home with my partner, who is also a single female, and our two cats. ",  "Q5": "I would like to live in a neighborhood that is urban, but not too busy or noisy. ",  "Q6": "I prefer a house with a large backyard and a garage for my luxury car. "}'

In [35]:
# Create df
personal_df = create_df(personal_answers, 'personal_df.csv')

Unnamed: 0,Q0,Q1,Q2,Q3,Q4,Q5,Q6
"single, arrogant, rich, and female",I am looking for a spacious house with at leas...,"The most important things for me are location,...","I would like a house with a pool, a gym, and a...",Easy access to public transportation and proxi...,"I will be living in the home with my partner, ...",I would like to live in a neighborhood that is...,I prefer a house with a large backyard and a g...
"male, married, 2 kids, and middle class",Answer to Q0,"Location, size, and price","Backyard, garage, and updated appliances",Proximity to public transportation and major h...,"Wife, 35, female, spouse; children, 8 and 10, ...",Moderately urban,I would prefer a safe and family-friendly neig...
"divorced middle aged woman, working 2 jobs, with dependent adult children","I would like my house to be at least 2, 000 sq...",The 3 most important things for me in choosing...,I would like amenities such as a spacious back...,Having easy access to public transportation an...,"My dependent adult children, Sarah (25, female...",I would prefer a moderately urban neighborhood...,I would also like to have a home office space ...
"young, nice, poor, male","I would like my house to be at least 1, 000 sq...",The three most important things for me in choo...,"I would like amenities such as a backyard, par...",Having access to public transportation and maj...,"I will be living in the home with my partner, ...",I prefer a neighborhood that is moderately urb...,I also prefer a home with natural lighting and...
"upper middle class married woman, husband works overseas, with no children","I would like my house to be around 2, 000 to 3...",The three most important things for me in choo...,"I would like amenities such as a pool, a gym, ...",Having access to public transportation and bei...,My husband and I will be living in the home. W...,"I would like my neighborhood to be urban, but ...",I would prefer a home that is move-in ready an...


### Buyer Preference Parsing: 
- Implement logic to interpret and structure these preferences for querying the vector database. 
- Ask the first question (Q0) of each of the buyers.

In [36]:
for row in personal_df.itertuples():
    print(row.Q0)
    results = chroma_langchain.similarity_search(row.Q0)[0]
    print(row.Index, '\n', results, '\n###\n')

I am looking for a spacious house with at least 3 bedrooms and 2 bathrooms. 
single, arrogant, rich, and female 
 page_content="{'Neighborhood_Name': 'Bel Air', 'Neighborhood_Description': 'Bel Air is an exclusive and upscale neighborhood in Los Angeles known for its sprawling estates, stunning views, and privacy. It is home to many celebrities and wealthy residents, and is known for its luxurious lifestyle. ', 'House_Description': 'This magnificent mansion in Bel Air offers 500 square meters of living space, with opulent finishes and breathtaking views of the city. The house features 5 bedrooms and 5 bathrooms, perfect for someone looking for a grand and luxurious home in one of the most prestigious neighborhoods in Los Angeles. Don`t miss out on this rare opportunity. ', 'Square_Meters': 500, 'Bathrooms': 5, 'Bedrooms': 5, 'Price': 7000000}" 
###

Answer to Q0
male, married, 2 kids, and middle class 
 page_content="{'Neighborhood_Name': 'Downtown', 'Neighborhood_Description': 'Downto

The code works, but ... you can see that some of the answers while they may be the best in the database are certainly off the mark. 

## Step 5: Searching Based on Preferences

- Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
- Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.

- I will loop thru each buyers answers to each question and provide that answer to chroma_langchain and see what the search results are. I will put those answers in a df called answers_df. Since there are only 15 entries, I will limit the answers to just the top 1. The answers_df, will be structured the same as the personaL_df.

In [37]:
# Results list for entire dataframe
df_results_list = []

# Exterior loop to use answers to search chroma_langchain
for row in personal_df.itertuples():
    
    # Create the empty list
    row_results_list = []
    
    # Interior loop to acroww each row
    for preference in row:
        
        # Do the search
        results = chroma_langchain.similarity_search(preference)[0]
        
        # So that it is obvious, include the preference that was submitted to chroma_langchain
        # Make this a tuple
        pair = preference, results
        
        # Append search results in the row results list
        row_results_list.append(pair)
        
    # Append each row_results_list to the entire df_results_list
    df_results_list.append(row_results_list)

In [45]:
df_results_list[0][0]

('single, arrogant, rich, and female',
 Document(page_content="{'Neighborhood_Name': 'Bel Air', 'Neighborhood_Description': 'Bel Air is an exclusive and upscale neighborhood in Los Angeles known for its sprawling estates, stunning views, and privacy. It is home to many celebrities and wealthy residents, and is known for its luxurious lifestyle. ', 'House_Description': 'This magnificent mansion in Bel Air offers 500 square meters of living space, with opulent finishes and breathtaking views of the city. The house features 5 bedrooms and 5 bathrooms, perfect for someone looking for a grand and luxurious home in one of the most prestigious neighborhoods in Los Angeles. Don`t miss out on this rare opportunity. ', 'Square_Meters': 500, 'Bathrooms': 5, 'Bedrooms': 5, 'Price': 7000000}"))

### Judging Quality of Results

There are some specific preferences being stated and the system is returning a lot of text. Difficult to say if this is working unless you painstakingly went thru each preference, results pair and made a call on that. 

However, this is something that LLMs typically do quite well. So, lets provide 3 pieces of information to the LLM. They are the profile of the buyer, the preference, and the results. Then ask the LLM based on what the result was, is the preference stated satisfied by that result? This will require some prompt engineering and then simply a score of 0 or 1 that means NO or YES.

There are 35 preferences being posed here. Lets see how it does. Since there is a lot of text going back and forth here, what I will do is put this in a loop that calls the LLM for each buyer and ONLY gives them 1 buyer_profile, preference, result tuple to work on. This will take awhile to do this. Generally speaking OpenAI does not like this many calls in. 

In [46]:
buyer_prompt = """
You are a(n) {personality} BUYER. You have stated this PREFERENCE:
###
{preference} 
###
to your real estate agent here in {location}. 
Please answer if this LISTING:
###
{listing} 
###
satisfies your Preference.
###
If it DOES NOT stafisfy your Preference return a 0 as NUMBER or if it DOES satisfy your Preference return a 1 as NUMBER.
Next provide your REASON.
###
"""
print(buyer_prompt)


You are a(n) {personality} BUYER. You have stated this PREFERENCE:
###
{preference} 
###
to your real estate agent here in {location}. 
Please answer if this LISTING:
###
{listing} 
###
satisfies your Preference.
###
If it DOES NOT stafisfy your Preference return a 0 as NUMBER or if it DOES satisfy your Preference return a 1 as NUMBER.
Next provide your REASON.
###



In [47]:
json_spec = """
###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it.

{'Buyer': "BUYER",
 'Preference': "PREFERENCE",
 'Number': NUMBER,
 'Reason': "REASON"}

###
"""
print(json_spec)


###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it.

{'Buyer': "BUYER",
 'Preference': "PREFERENCE",
 'Number': NUMBER,
 'Reason': "REASON"}

###



In [48]:
judging_results = []
for idx, buyer in enumerate(df_results_list):
    
    # Outside loops gets the buyer froom the original personalities list
    personality_type = personalities[idx]
    print(personality_type)
    
    # Goes thru each buyer's preferences and asks them if the listing satisfies their itch
    for int_idx, temp in enumerate(buyer):
        
        # The first tuple is a throw away.
        if int_idx ==0:
            pass
        
        else:
            # Split the list into its component parts.
            preference, listing = temp
            
            # Format the prompt
            preference_prompt_formatted = buyer_prompt.format(
                personality = personality_type,
                preference = preference,
                location = location,
                listing = listing)

            judging_prompt = preference_prompt_formatted + json_spec

            # Call the model with the formatted prompt. 
            response = answers_model.predict(judging_prompt)

            # Clean up the string
            cleaned_response = cleanup(response)

            # Append to personal_answers
            judging_results.append(cleaned_response)

len(judging_results)

single, arrogant, rich, and female
male, married, 2 kids, and middle class
divorced middle aged woman, working 2 jobs, with dependent adult children
young, nice, poor, male
upper middle class married woman, husband works overseas, with no children


35

In [49]:
# Create the df
judging_df = create_df(judging_results, 'judging_df.csv')

Unnamed: 0,Buyer,Preference,Number,Reason
0,"Single, Arrogant, Rich, Female Buyer",I am looking for a spacious house with at leas...,1,This listing satisfies all of my preferences f...
1,"Single, arrogant, rich, female BUYER","The most important things for me are location,...",1,This listing satisfies my preference because i...
2,"Single, arrogant, rich, female BUYER","I would like a house with a pool, a gym, and a...",1,"This listing satisfies all of the buyer""s pref..."
3,"Single, arrogant, rich, female",Easy access to public transportation and proxi...,1,This listing is located in the heart of Downto...
4,"Single, arrogant, rich, female BUYER",Living with partner and cats,1,This listing offers 4 bedrooms and 4 bathrooms...
5,"Single, arrogant, rich, female BUYER",I would like to live in a neighborhood that is...,1,This listing is for a modern and spacious loft...
6,"Single, arrogant, rich, female BUYER",I prefer a house with a large backyard and a g...,1,This listing satisfies my preference as it is ...
7,male,"married, 2 kids, middle class",1,This listing is a perfect fit for a middle cla...
8,Male,"Location, size, and price",1,"This listing is located in Downtown, which is ..."
9,male,"Backyard, garage, and updated appliances",1,"The listing offers a backyard and garage, as w..."


In [50]:
accuracy_score = judging_df['Number'].sum() / judging_df.shape[0]
print('It has an accuracy score of:', round(accuracy_score, 2)*100, '%')

It has an accuracy score of: 89.0 %


However, when I look at what the "judge" is complaining about, I am not sure that I would not have said the listing was fine. A panel of "3 judges" is what is needed here. Get them to vote and we simply take the majority opinion each time. For a production application this would be a good idea. However, for this one, it is not necessary.

In [51]:
judging_results[:5]

['{"Buyer": "Single, Arrogant, Rich, Female Buyer",  "Preference": "I am looking for a spacious house with at least 3 bedrooms and 2 bathrooms. ",  "Number": 1,  "Reason": "This listing satisfies all of my preferences for a spacious house with at least 3 bedrooms and 2 bathrooms. It is located in the exclusive and upscale neighborhood of Bel Air and offers luxurious features and stunning views. The 500 square meters of living space and 5 bedrooms and 5 bathrooms make it the perfect grand and luxurious home for someone like me. "}',
 '{"Buyer": "Single, arrogant, rich, female BUYER",  "Preference": "The most important things for me are location, security, and a modern design. ",  "Number": 1,  "Reason": "This listing satisfies my preference because it is located in Downtown, which is a bustling and vibrant center of the city. It also mentions security, which is important to me, and has a modern design. Additionally, the loft is spacious and offers stunning views of the city, making it a

### Fine-tune the Retrieval Algorithm
Lets see if this changes if we use euclidean distance as opposed to the default cosine distance. I don't want to do this on large number of queries. Lets just do it on some new questions that I have dreamed up here and see if the results change.

In [52]:
for question in questions:
    
    print('###\n', question)

    # Get the most similar documents using Euclidean distance
    results_euclidean = chroma_langchain.similarity_search(question, k=3, distance_metric='euclidean')

    print(f"Results (Euclidean Distance): {len(results_euclidean)}")
    print(results_euclidean[0])

    # Get the most similar documents using Cosine distance
    results_cosine = chroma_langchain.similarity_search(question, k=3, distance_metric='cosine')

    print(f"Results (Cosine Distance): {len(results_cosine)}")
    print(results_cosine[0])
    
    if results_euclidean == results_cosine:
        print('\nResults for both Euclidean and Cosine distance searches are identical\n')
    else:
        print('\nResults are different!\n')


###
 Q0: How big do you want your house to be? 
Results (Euclidean Distance): 3
page_content="{'Neighborhood_Name': 'Beverly Hills', 'Neighborhood_Description': 'Beverly Hills is a luxurious neighborhood in Los Angeles known for its extravagant mansions, high-end shopping, and celebrity sightings. It is home to the famous Rodeo Drive and is a popular destination for tourists and wealthy residents alike. ', 'House_Description': 'This stunning mansion in Beverly Hills boasts 400 square meters of living space, with high-end finishes and luxurious amenities. The house features 4 bedrooms and 4 bathrooms, perfect for a family looking for a spacious and elegant home. Don`t miss out on the opportunity to live in one of the most prestigious neighborhoods in Los Angeles. ', 'Square_Meters': 400, 'Bathrooms': 4, 'Bedrooms': 4, 'Price': 4300000}"
Results (Cosine Distance): 3
page_content="{'Neighborhood_Name': 'Beverly Hills', 'Neighborhood_Description': 'Beverly Hills is a luxurious neighborhood

Results (Euclidean Distance): 3
page_content="{'Neighborhood_Name': 'Downtown', 'Neighborhood_Description': 'Downtown Los Angeles is the bustling and vibrant center of the city, known for its iconic skyscrapers, cultural institutions, and diverse population. It is a popular spot for young professionals and tourists, with a mix of historic and modern architecture. ', 'House_Description': 'This modern and spacious loft in Downtown offers 200 square meters of living space, with sleek finishes and stunning views of the city. The loft features 2 bedrooms and 2 bathrooms, perfect for someone looking for a convenient and trendy lifestyle in the heart of Downtown. Don`t miss out on this rare opportunity. ', 'Square_Meters': 200, 'Bathrooms': 2, 'Bedrooms': 2, 'Price': 800000}"
Results (Cosine Distance): 3
page_content="{'Neighborhood_Name': 'Downtown', 'Neighborhood_Description': 'Downtown Los Angeles is the bustling and vibrant center of the city, known for its iconic skyscrapers, cultural in

In this case, the use of euclidean vs cosine distance does not matter. It is interesting to note that this vector store does really poorly on retrieving a numbers question. It says the Malibu house is the cheapest. That makes no sense since it is 1,800,000. The least expensive is 500,000. 

## Step 6: Personalizing Listing Descriptions

- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.

### Process will be:

- Select a buyer and a preference. 
- The preferences are in personal_df. 
- Use Q1 to search chroma_langchain and come up with the top result only.
- Lets use Q1 as the preference for enhancing the listing. 
- The prompt will give the personality, the retrieved listing from chroma_langchain, and the location (again).
- The prompt will include instructions for being creative to enhance the listing.
- Instructions need to be given in the prompt that the LLM must only only use what is in the prompt for context.

In [53]:
enhanced_listing_prompt = """
You are an accomplished and creative real estate marketing executive who works in {location}. Your buyer is 
{personality}. The buyer will be interested in the following listing. 
###
{listing}
###
Please create a new listing that encapsulate the information contained in this listing and takes into account 
the buyer's preferences as follows.
###
{preferences}
###
When creating the new listing take poetic license but remain factual. Restrict yourself to the content in the listing. 
If the Square_Meters are 150, then in your poetic listing you must state 150 Square_Meters. 
If it says there are 3 bedrooms, you must state there are 3 bedrooms. 
If it says 1 bathroom in the listing, then your poetic listing should say 1 bathroom. 
Please reference price, number of bedrooms, and size of the home (Square_Meters) in your poetic listing. Also, 
wordsmith their preferences into the listing. 
"""
print(enhanced_listing_prompt)


You are an accomplished and creative real estate marketing executive who works in {location}. Your buyer is 
{personality}. The buyer will be interested in the following listing. 
###
{listing}
###
Please create a new listing that encapsulate the information contained in this listing and takes into account 
the buyer's preferences as follows.
###
{preferences}
###
When creating the new listing take poetic license but remain factual. Restrict yourself to the content in the listing. 
If the Square_Meters are 150, then in your poetic listing you must state 150 Square_Meters. 
If it says there are 3 bedrooms, you must state there are 3 bedrooms. 
If it says 1 bathroom in the listing, then your poetic listing should say 1 bathroom. 
Please reference price, number of bedrooms, and size of the home (Square_Meters) in your poetic listing. Also, 
wordsmith their preferences into the listing. 



In [54]:
json_spec = """
###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput.

{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description with Preferences',
 'House_Description': 'House_Description with Preferences',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price}
###
Examples of JSON output:
{'Neighborhoold_Name': 'Beverly Hills',
 'Neighborhood_Description': 'Neighborhood Description: Beverly Hills is a prestigious neighborhood in Los Angeles 
  known for its luxurious homes, high-end shopping, and celebrity residents. It is a highly sought after location for 
  those looking for a glamorous and upscale lifestyle.',
 'House_Description': 'Live like a celebrity in this stunning Beverly Hills mansion. With 500 square meters of living 
  space, this house boasts 5 bedrooms and 4 bathrooms, making it the perfect home for a large family or those who love 
  to entertain. The high-end finishes and top-of-the-line appliances make this house a true luxury retreat. Don`t miss 
  your chance to live in one of the most exclusive neighborhoods in Los Angeles! ',
 'Square_Meters': 217,
 'Bathrooms': 4,
 'Bedrooms': 3,
 'Price': '995,000'}
 
 {'Neighborhoold_Name': 'West Hollywood',
 'Neighborhood_Description': 'Known for its vibrant nightlife and trendy restaurants, West Hollywood is a popular 
  neighborhood for young professionals and those in the entertainment industry. With its central location and walkable 
  streets, West Hollywood offers a lively and convenient lifestyle.',
 'House_Description': 'This stylish condo in West Hollywood is perfect for those who want to be in the heart of the 
  action. With 100 square meters, this house features 2 bedrooms and 1 bathroom, making it the ideal size for a young 
  couple or single individual. The open floor plan and modern finishes create a chic and contemporary living space. ',
 'Square_Meters': 313,
 'Bathrooms': 5,
 'Bedrooms': 4,
 'Price': 2495000}
###
"""
print(json_spec)


###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput.

{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description with Preferences',
 'House_Description': 'House_Description with Preferences',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price}
###
Examples of JSON output:
{'Neighborhoold_Name': 'Beverly Hills',
 'Neighborhood_Description': 'Neighborhood Description: Beverly Hills is a prestigious neighborhood in Los Angeles 
  known for its luxurious homes, high-end shopping, and celebrity residents. It is a highly sought after location for 
  those looking for a glamorous and upscale lifestyle.',
 'House_Description': 'Live like a celebrity in this stunning Beverly Hills mansion. With 500 square meters of living 
  space, this house boasts 5 bedrooms and 4 bathrooms, making it the perfect home for a large family or those who love 
  to entertain. The high-end finish

In [55]:
# We are only gonig to get 5 poetic listings. Just one for each buyer to see how the LLM does.
data_list = []
for idx, row in enumerate(personal_df.itertuples()):
    
    # Define personality of buyer
    personality = personalities[idx]
    print(personality)
    
    # Search chroma_langchain
    results = chroma_langchain.similarity_search(row.Q2)[0]
    
    # Create the preferences for this buyer
    preferences = concat_row(row)
    
    formatted_prompt = enhanced_listing_prompt.format(
        personality = personality,
        listing = results,
        preferences = preferences,
        location = location)
    
    # Create the formatted prompt
    formatted_prompt = formatted_prompt + json_spec
    
    # Get the response from the LLM
    try:
        response = answers_model.predict(formatted_prompt)
    except Exception as e:
        print(f"Error occurred: {str(e)}")
        print('Non fatal error. We just get one fewer personalized listing.')
    
    # Clean the output
    cleaned_response = cleanup(response)
    
    # Parse the response to structured JSON
    try:
        structured_response = json.loads(cleaned_response)
        data_list.append(structured_response)
    except Exception as e:
        print(f"Error occurred: {str(e)}")
        print('Non fatal error. We just get one fewer personalized listing.')
    
len(data_list)

single, arrogant, rich, and female
male, married, 2 kids, and middle class
divorced middle aged woman, working 2 jobs, with dependent adult children
young, nice, poor, male
upper middle class married woman, husband works overseas, with no children


5

In [56]:
# Make a dataframe so that you can look at it.
poetic_df = pd.DataFrame(data_list)

# Sometimes the price comes back funny
poetic_df['Price'] = poetic_df['Price'].apply(clean_price)
poetic_df

Unnamed: 0,Neighborhoold_Name,Neighborhood_Description,House_Description,Square_Meters,Bathrooms,Bedrooms,Price
0,Pacific Palisades,"Welcome to Pacific Palisades, where luxury and...",Indulge in the ultimate luxury with this magni...,500,5,5,7000000
1,Bel Air,Experience the epitome of luxury in the exclus...,Live like royalty in this magnificent Bel Air ...,500,5,5,7000000
2,Pasadena,"Find your dream home in Pasadena, a charming a...",Welcome to your new home in Pasadena! This ele...,350,4,4,7500000
3,Pasadena,Experience the charm and history of Pasadena i...,Welcome to your dream home in Pasadena! This e...,100,1,3,500000
4,Bel Air,Experience the epitome of luxury in this exqui...,Welcome to your dream home in Bel Air. This ma...,2500,5,5,7000000


In [57]:
for idx, row in enumerate(poetic_df.itertuples()):
    
    # Get the Buyer
    print('Buyer is:', personalities[idx])
            
    # Get the Preferences for the Buyer
    print('Preferences are:')
    for col_num in range(len(personal_df)):
        print(col_num, personal_df.iloc[idx, col_num], '\n')
    
    # Get the original listing
    results = chroma_langchain.similarity_search(row.Neighborhood_Description)[0]
    print('This is the original listing')
    print(results, '\n')
    
    print('This is the poetic license version')
    print(row.Neighborhood_Description)
    print(row.House_Description)
    print('###\n')

Buyer is: single, arrogant, rich, and female
Preferences are:
0 I am looking for a spacious house with at least 3 bedrooms and 2 bathrooms.  

1 The most important things for me are location, security, and a modern design.  

2 I would like a house with a pool, a gym, and a home office.  

3 Easy access to public transportation and proximity to major highways are important to me.  

4 I will be living in the home with my partner, who is also a single female, and our two cats.  

This is the original listing
page_content="{'Neighborhood_Name': 'Pacific Palisades', 'Neighborhood_Description': 'Pacific Palisades is a beautiful and upscale neighborhood in Los Angeles known for its stunning homes, breathtaking views, and upscale shopping and dining. It is a popular spot for wealthy residents and celebrities, with a mix of luxury and relaxation. ', 'House_Description': 'This magnificent mansion in Pacific Palisades offers 500 square meters of living space, with opulent finishes and breathtak

This is the original listing
page_content="{'Neighborhood_Name': 'Bel Air', 'Neighborhood_Description': 'Bel Air is an exclusive and upscale neighborhood in Los Angeles known for its sprawling estates, stunning views, and privacy. It is home to many celebrities and wealthy residents, and is known for its luxurious lifestyle. ', 'House_Description': 'This magnificent mansion in Bel Air offers 500 square meters of living space, with opulent finishes and breathtaking views of the city. The house features 5 bedrooms and 5 bathrooms, perfect for someone looking for a grand and luxurious home in one of the most prestigious neighborhoods in Los Angeles. Don`t miss out on this rare opportunity. ', 'Square_Meters': 500, 'Bathrooms': 5, 'Bedrooms': 5, 'Price': 7000000}" 

This is the poetic license version
Experience the epitome of luxury in this exquisite Bel Air mansion. With its sprawling    500 square meters of living space, this house offers 5 bedrooms and 5 bathrooms, making it the perfect

It is factual, reflects the buyers preferences, and creative as requested!!

## Step 7: Multi Modal Search Using CLIP

I am going to use Chromadb (not just chroma_langchain) to have a multi model search capability to this vector store. 

In [58]:
# Load the CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
device

'cuda'

In [71]:
# Initialize Chroma
client = chromadb.Client()

# Assuming client.list_collections() returns a list of Collection objects
collections = client.list_collections()

# Set the collection name you want to check
collection_name = "real_estate_collection"

# Check if the collection exists and delete it if it does
collection_exists = any(collection.name == collection_name for collection in collections)

if collection_exists:
    client.delete_collection(collection_name)
    print(f"Collection '{collection_name}' deleted.")

# Create a new collection
collection = client.create_collection(collection_name)
print(f"Collection '{collection_name}' created.")

Collection 'real_estate_collection' deleted.
Collection 'real_estate_collection' created.


In [72]:
# We may have a problem with how long the text can be for the embeddings.
print(len(dict_list_string[0]))

725


In [73]:
df.sort_values('Neighborhood_Name', inplace = True)
df.reset_index(drop = True, inplace = True)
df

Unnamed: 0,Neighborhood_Name,Neighborhood_Description,House_Description,Square_Meters,Bathrooms,Bedrooms,Price
0,Bel Air,Bel Air is an exclusive and upscale neighborho...,This magnificent mansion in Bel Air offers 500...,500,5,5,7000000
1,Beverly Hills,Beverly Hills is a luxurious neighborhood in L...,This stunning mansion in Beverly Hills boasts ...,400,4,4,4300000
2,Culver City,Culver City is a trendy and up-and-coming neig...,This modern and stylish townhouse in Culver Ci...,250,3,3,800000
3,Downtown,Downtown Los Angeles is the bustling and vibra...,This modern and spacious loft in Downtown offe...,200,2,2,800000
4,Echo Park,Echo Park is a trendy and diverse neighborhood...,This modern and stylish townhouse in Echo Park...,250,3,3,800000
5,Hollywood,Hollywood is the iconic neighborhood in Los An...,This modern and stylish condo in Hollywood off...,300,3,3,765000
6,Koreatown,Koreatown is a vibrant and diverse neighborhoo...,This modern and stylish apartment in Koreatown...,150,2,2,430000
7,Malibu,Malibu is a luxurious and picturesque beachfro...,This stunning beachfront villa in Malibu offer...,500,5,5,7000000
8,Marina del Rey,Marina del Rey is a beautiful and upscale wate...,This stunning waterfront home in Marina del Re...,450,5,5,6000000
9,Pacific Palisades,Pacific Palisades is a beautiful and upscale n...,This magnificent mansion in Pacific Palisades ...,500,5,5,7000000


In [74]:
# Put the image paths into image_paths
directory = 'images'
image_paths = []
for root, _, files in os.walk(directory):
    for file in files:
        image_paths.append(os.path.join(root, file))
print(image_paths)
len(image_paths)

['images\\bel_air_house.png', 'images\\beverly_hills_mansion.png', 'images\\downtown_residence.png', 'images\\echo_park_house.png', 'images\\hollywood_house.png', 'images\\los_feliz_house.png', 'images\\malibu_house.png', 'images\\manhattan_beach_house.png', 'images\\marian_del_rey_house.png', 'images\\pacific_palisades_house.png', 'images\\pasadena_house.png', 'images\\santa_monica_residence.png', 'images\\silver_lake_house.png', 'images\\venice_beach_house.png', 'images\\westwood_house.png', 'images\\west_hollywood_residence.png']


16

In [75]:
# Add data to the collection
for idx, entry_str in enumerate(dict_list_string):
    
    text_features = encode_text(entry_str[:256])[0]
    image_features = encode_image(image_paths[idx])[0]
    
    combined_features = np.concatenate((text_features, image_features))
    
    collection.add(
        ids=str([idx]),
        embeddings=combined_features.tolist(),
        metadatas={"id": idx, "listing": entry_str, "image_path": image_paths[idx]}
    )
    print(f"Added entry {idx} to collection.")
collection.count(), combined_features.shape

Added entry 0 to collection.
Added entry 1 to collection.
Added entry 2 to collection.
Added entry 3 to collection.
Added entry 4 to collection.
Added entry 5 to collection.
Added entry 6 to collection.
Added entry 7 to collection.
Added entry 8 to collection.
Added entry 9 to collection.
Added entry 10 to collection.
Added entry 11 to collection.
Added entry 12 to collection.
Added entry 13 to collection.
Added entry 14 to collection.
Added entry 15 to collection.


(16, (1024,))

In [85]:
# Example query
query = "Pacific Palisades"
results, listing_dict = search(query, top_k=3)

# The json is not that easy to work with
nof_listings = len([results['ids'][0][0]])

if nof_listings == 1:
    format_print_listing(results, listing_dict)
else:
    print('Need to work on the logic here so that it handles multiple results')
    print('But the dataframe has a small number of rows in it, so, fine fow now')