# HomeMatch
Core Components of "HomeMatch"

Understanding Buyer Preferences:
- Buyers will input their requirements and preferences, such as location, property type, budget, amenities, and lifestyle choices.
- The application uses LLMs to interpret these inputs in natural language, understanding nuanced requests beyond basic filters.

Integrating with a Vector Database:
- Connect "HomeMatch" with a vector database, where all available property listings are stored.
- Utilize vector embeddings to match properties with buyer preferences, focusing on aspects like neighborhood vibes, architectural styles, and proximity to specific amenities.

Personalized Listing Description Generation:
- For each matched listing, use an LLM to rewrite the description in a way that highlights aspects most relevant to the buyer’s preferences.
- Ensure personalization emphasizes characteristics appealing to the buyer without altering factual information about the property.

Listing Presentation:
- Output the personalized listing(s) as a text description of the listing.

## Step 1: Setting Up the Python Application
Import Libraries

In [105]:
# import libraries
import json
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma
import numpy as np
import os
import pandas as pd
import random
import re
from typing import Any, Dict, Optional, Tuple

In [148]:
# Create a Conda Environment YAML File
!conda env export > environment.yml

In [149]:
# Generate requirements.txt file using pip freeze
!pip freeze > requirements.txt



### Global Variables

In [106]:
num = 15
location = 'Los Angeles, California, USA'

In [107]:
personalities = ['single, arrogant, rich, and female', 
                 'male, married, 2 kids, and middle class', 
                 'divorced middle aged woman, working 2 jobs, with dependent adult children', 
                 'young, nice, poor, male', 
                 'upper middle class married woman, husband works overseas, with no children']
personalities

['single, arrogant, rich, and female',
 'male, married, 2 kids, and middle class',
 'divorced middle aged woman, working 2 jobs, with dependent adult children',
 'young, nice, poor, male',
 'upper middle class married woman, husband works overseas, with no children']

In [108]:
questions = ['Which houses are in Venice?',
             'What is the cheapest house that is available in your listings?',
             'Only provide me listings if the house has a pool']
questions

['Which houses are in Venice?',
 'What is the cheapest house that is available in your listings?',
 'Only provide me listings if the house has a pool']

### Common Functions

Because LLM output is stochastic, the returned JSON often contained errors and failed to parse. I used ChatGPT and Copilot to build regexes that deal with these situations, but I still had to wrap the json.loads() calls in try/except blocks. It is very difficult to anticipate every error condition and write cleanup code that handles all of them.

In [109]:
def cleanup(response):
    
    # Fix spacing issues after commas and periods
    cleaned_response = re.sub(r'(?<=,)(\w)', r' \1', response)
    cleaned_response = re.sub(r'(?<=\.)(\w)', r' \1', cleaned_response)
    
    # Repair adjacent quoted strings that lost their separating comma
    cleaned_response = cleaned_response.replace('"   "', '", "')
    
    # Contractions and possessives contain apostrophes that would be turned
    # into double quotes below, so replace those apostrophes with backticks (`)
    pattern = r"\b\w+'\w*\b|\b'\w*\b"
    
    # Function to replace matched apostrophes with backticks
    def replace_apostrophe(match):
        return match.group().replace("'", "`")

    # Apply the pattern to the text cleaned so far (not the raw response,
    # which would discard the fixes above)
    cleaned_response = re.sub(pattern, replace_apostrophe, cleaned_response)
    
    # Replace single quotes with double quotes for keys and values
    cleaned_response = cleaned_response.replace("'", '"')
    
    # Regex pattern to insert spaces after commas and periods.
    # Note: this also splits numbers such as "3,000" into "3, 000".
    pattern = r'([,.])(?=[^\s])'

    # Function to insert space after matched commas and periods
    def insert_space(match):
        return match.group(1) + ' '

    # Apply the regex pattern using sub with a function
    cleaned_response = re.sub(pattern, insert_space, cleaned_response)
    
    # Remove leading/trailing whitespace
    cleaned_response = cleaned_response.strip()
    
    # Remove newline characters from the JSON string
    cleaned_response = cleaned_response.replace('\n', ' ')
    
    return cleaned_response
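
The prompts in this notebook ask the model for single-quoted, Python-dict-style output, which strict JSON parsers reject. As a possible complement to the regex cleanup, here is a minimal sketch of a two-stage parser. The name `parse_llm_json` is hypothetical and not part of the notebook; it tries `json.loads` first and then falls back to `ast.literal_eval`, which accepts single-quoted strings and leaves apostrophes inside double-quoted values alone.

```python
import ast
import json


def parse_llm_json(text):
    """Parse model output as JSON, falling back to a Python-literal parse.

    Returns the parsed object, or None when neither parser accepts the text.
    """
    text = text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # Accepts single-quoted keys and values, as requested by the prompts
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


print(parse_llm_json("{'Q0': \"I don't need much space\", 'Q1': 'Location'}"))
```

Returning None keeps the failure non-fatal, in the same spirit as the try/except handling used elsewhere in the notebook.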

In [110]:
def concat_row(row):
    
    preferences = (str(row.Q0) if row.Q0 is not None else '') + ' \n' + \
                  (str(row.Q1) if row.Q1 is not None else '') + ' \n' + \
                  (str(row.Q2) if row.Q2 is not None else '') + ' \n' + \
                  (str(row.Q3) if row.Q3 is not None else '') + ' \n' + \
                  (str(row.Q4) if row.Q4 is not None else '') + ' \n' + \
                  (str(row.Q5) if row.Q5 is not None else '') + ' \n' + \
                  (str(row.Q6) if row.Q6 is not None else '')
    
    return preferences

In [111]:
def create_df(input_list, output_file_name):
    
    temp_list = []
    
    # Load the data as a list of dictionaries
    for element in input_list:
        try:
            temp = json.loads(element)
            temp_list.append(temp)
        except Exception as e:
            print(f"Error occurred: {str(e)}")
            print('Non fatal error. We just get one fewer "preference" or "judgement".')
    
    if len(temp_list) == 5:
        index_values = personalities
    else:
        index_values = range(len(temp_list))

    # Create a DataFrame from the list of dictionaries
    output_df = pd.DataFrame(temp_list, index = index_values)

    # Write it to disk
    output_df.to_csv(output_file_name)

    # Display the DataFrame
    display(output_df)
    
    return output_df

## Step 2: Generating Real Estate Listings

### Create Prompt for Listings

In [112]:
listings_prompt = """
You are an experienced real estate professional in {location} with a flair for marketing. You have been asked to produce 
{num} new listings. Each listing is unique. Each listing has this information.

###
Neighborhoold_Name - Generate {num} names of well known districts for {location}.
Neighborhood_Description - Generate an 50 - 80 word description for each Neighborhood_Description. Use the 
Neighborhoold_Name for each listing as context for the description of the Neighborhood_Description
House_Description - For each House_Description generate an 50 - 80 word description of the House that is for sale.
Each house is unique. They vary in quality from low, medium, to high regardless of Neighborhoold_Name.
Square_Meters - Generate a RANDOM number from 100 to 500.
Bathrooms - Using the Square_Meters number for each listing, generate the number of bathrooms that the house will have
based on this formula. For every 100 square meters, the house should have at least 1 bathroom. This can vary slightly.
Bedrooms - Each house will have a minimum of 2 bedrooms. For every 100 Square_Meters > 200 Square_Meters, add 1 bedroom. 
This can vary slightly.
Price - The price will be based on this formula. 
1. Choose a number randomly between 500 and 2000. Call this random_number.
2. Multiply random_number by Square_Meters for each listing. This gives you the Price.

"""
print(listings_prompt)


You are an experienced real estate professional in {location} with a flair for marketing. You have been asked to produce 
{num} new listings. Each listing is unique. Each listing has this information.

###
Neighborhoold_Name - Generate {num} names of well known districts for {location}.
Neighborhood_Description - Generate an 50 - 80 word description for each Neighborhood_Description. Use the 
Neighborhoold_Name for each listing as context for the description of the Neighborhood_Description
House_Description - For each House_Description generate an 50 - 80 word description of the House that is for sale.
Each house is unique. They vary in quality from low, medium, to high regardless of Neighborhoold_Name.
Square_Meters - Generate a RANDOM number from 100 to 500.
Bathrooms - Using the Square_Meters number for each listing, generate the number of bathrooms that the house will have
based on this formula. For every 100 square meters, the house should have at least 1 bathroom. This can vary 
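
The prompt asks the model to draw random numbers and apply arithmetic rules, which LLMs do not do reliably: in the listings table produced later in this notebook, every Price works out to 5,000 times Square_Meters, outside the requested 500 to 2,000 multiplier. A minimal sketch of an alternative, assuming one is willing to compute the numeric fields in Python and let the model write only the descriptions. The function name `numeric_fields` is hypothetical, and the "can vary slightly" allowance from the prompt is not modeled.

```python
import random


def numeric_fields(rng):
    """Draw the numeric listing fields using the rules stated in the prompt."""
    square_meters = rng.randint(100, 500)
    # At least 1 bathroom for every full 100 square meters
    bathrooms = max(1, square_meters // 100)
    # Minimum of 2 bedrooms, plus 1 for every full 100 square meters above 200
    bedrooms = 2 + max(0, (square_meters - 200) // 100)
    # Price = random per-square-meter rate (500 to 2000) times the area
    rate = rng.randint(500, 2000)
    return {
        'Square_Meters': square_meters,
        'Bathrooms': bathrooms,
        'Bedrooms': bedrooms,
        'Price': rate * square_meters,
    }


rng = random.Random(0)  # seeded so the draw is repeatable
for _ in range(3):
    print(numeric_fields(rng))
```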

In [113]:
json_spec = """
###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 

[{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
 ...}]
###
"""
print(json_spec)


###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 

[{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description',
 'House_Description': 'House_Description',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price},
 ...}]
###



In [114]:
listings_prompt_formatted = listings_prompt.format(
    num = num,
    location = location
)
listings_prompt_formatted = listings_prompt_formatted + json_spec
print(listings_prompt_formatted)


You are an experienced real estate professional in Los Angeles, California, USA with a flair for marketing. You have been asked to produce 
15 new listings. Each listing is unique. Each listing has this information.

###
Neighborhoold_Name - Generate 15 names of well known districts for Los Angeles, California, USA.
Neighborhood_Description - Generate an 50 - 80 word description for each Neighborhood_Description. Use the 
Neighborhoold_Name for each listing as context for the description of the Neighborhood_Description
House_Description - For each House_Description generate an 50 - 80 word description of the House that is for sale.
Each house is unique. They vary in quality from low, medium, to high regardless of Neighborhoold_Name.
Square_Meters - Generate a RANDOM number from 100 to 500.
Bathrooms - Using the Square_Meters number for each listing, generate the number of bathrooms that the house will have
based on this formula. For every 100 square meters, the house should have at le

### Completion Model
Instantiate the model and run it with the prompt.

In [115]:
# Get key
open_ai_key = pd.read_csv(r'D:\OneDrive\Security\keys.csv')
open_ai_key = open_ai_key[open_ai_key['Organization'] == 'Open_AI']['Key'].iloc[0]

# Set the environment variable
os.environ["OPENAI_API_KEY"] = open_ai_key

# Create the model
descriptions_model = OpenAI(model = 'gpt-3.5-turbo-instruct', 
                            temperature = .3,
                            max_tokens = 3500)

In [116]:
# Call the model with the formatted prompt. 
response = descriptions_model.predict(listings_prompt_formatted)

# Clean it up and prepare it for a dataframe or viewing as a list
cleaned_response = cleanup(response)

In [117]:
# Load the data as a list of dictionaries
dict_list = json.loads(cleaned_response)

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(dict_list)

# Write the dataframe to disk
df.to_csv('listings.csv')

# Display the DataFrame
df

| | Neighborhoold_Name | Neighborhood_Description | House_Description | Square_Meters | Bathrooms | Bedrooms | Price |
|---|---|---|---|---|---|---|---|
| 0 | Hollywood | Hollywood is known as the entertainment capita... | This beautiful home in Hollywood offers the pe... | 300 | 3 | 3 | 1500000 |
| 1 | Beverly Hills | Beverly Hills is known for its luxurious lifes... | Live like a celebrity in this stunning Beverly... | 400 | 4 | 4 | 2000000 |
| 2 | Venice Beach | Venice Beach is a vibrant and eclectic neighbo... | Experience the laid-back lifestyle of Venice B... | 200 | 2 | 2 | 1000000 |
| 3 | Bel Air | Bel Air is a prestigious and exclusive neighbo... | Live like royalty in this magnificent Bel Air ... | 500 | 5 | 5 | 2500000 |
| 4 | Silver Lake | Silver Lake is a hip and trendy neighborhood k... | This modern and stylish home in Silver Lake is... | 300 | 3 | 3 | 1500000 |
| 5 | Santa Monica | Santa Monica is a beachfront neighborhood know... | Live just steps away from the beach in this ch... | 200 | 2 | 2 | 1000000 |
| 6 | Westwood | Westwood is a bustling neighborhood known for ... | This spacious home in Westwood is perfect for ... | 400 | 4 | 4 | 2000000 |
| 7 | Downtown | Downtown is the heart of Los Angeles, known fo... | Live in the heart of the city in this stunning... | 200 | 2 | 2 | 1000000 |
| 8 | Echo Park | Echo Park is a trendy and up-and-coming neighb... | This charming home in Echo Park offers the per... | 300 | 3 | 3 | 1500000 |
| 9 | Marina Del Rey | Marina Del Rey is a waterfront neighborhood kn... | Live the beach life in this stunning Marina De... | 400 | 4 | 4 | 2000000 |


## Step 3: Storing Listings in a Vector Database

In [118]:
# Convert DataFrame to a list of dictionaries
dict_list = df.to_dict(orient="records")

# Convert each dictionary to a string
dict_list_string = [str(d) for d in dict_list]

dict_list_string[0]

"{'Neighborhoold_Name': 'Hollywood', 'Neighborhood_Description': 'Hollywood is known as the entertainment capital of the world, with its iconic Walk of Fame and the famous Hollywood sign. It is a bustling neighborhood with a mix of historic landmarks, trendy restaurants, and luxurious homes. ', 'House_Description': 'This beautiful home in Hollywood offers the perfect blend of modern amenities and classic charm. With 300 square meters of living space, this home features 3 bedrooms and 2 bathrooms, making it the perfect space for a growing family. The spacious backyard is perfect for entertaining guests or simply enjoying a quiet evening at home. ', 'Square_Meters': 300, 'Bathrooms': 3, 'Bedrooms': 3, 'Price': 1500000}"

In [119]:
# Initialize your embedding function (e.g., OpenAIEmbeddings)
embeddings = OpenAIEmbeddings()

# Create the vector store
chromadb = Chroma.from_texts(dict_list_string, embedding=embeddings)

# Persist the vector store to a directory
chromadb.persist()
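
The cell above embeds each listing as one stringified dictionary, numbers included. A possible alternative, sketched here as an assumption rather than what the notebook does, is to embed only the descriptive text and carry the numeric fields as metadata. The helper `split_listing` and the shortened example descriptions are hypothetical; the commented-out call relies on the `metadatas` argument of `Chroma.from_texts`.

```python
TEXT_KEYS = ('Neighborhoold_Name', 'Neighborhood_Description', 'House_Description')


def split_listing(record):
    """Split one listing dict into (text to embed, metadata dict)."""
    text = '\n'.join(f'{key}: {record[key]}' for key in TEXT_KEYS)
    metadata = {key: value for key, value in record.items() if key not in TEXT_KEYS}
    return text, metadata


# Shortened, illustrative record in the same shape as the generated listings
example = {
    'Neighborhoold_Name': 'Hollywood',
    'Neighborhood_Description': 'Entertainment capital of the world.',
    'House_Description': 'A home with modern amenities and classic charm.',
    'Square_Meters': 300, 'Bathrooms': 3, 'Bedrooms': 3, 'Price': 1500000,
}
text, metadata = split_listing(example)
print(text)
print(metadata)

# With the objects from the cells above, this could feed the store (not run here):
# pairs = [split_listing(d) for d in dict_list]
# chromadb = Chroma.from_texts([t for t, _ in pairs], embedding=embeddings,
#                              metadatas=[m for _, m in pairs])
```

Keeping the numbers out of the embedded text leaves them available for exact filtering, which similarity search alone does not provide.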

### Query the Vector Database

In [120]:
# Query chromadb
question = 'Which houses are in Venice'

# Get the most similar documents
results = chromadb.similarity_search(question)

print(len(results))
results[0]

4


Document(page_content="{'Neighborhoold_Name': 'Venice', 'Neighborhood_Description': 'Venice is a vibrant and eclectic neighborhood in Los Angeles, known for its bohemian vibe and beautiful beach. It is a popular spot for artists and creatives, with colorful murals and street performers lining the famous Venice Boardwalk. The neighborhood also offers a variety of trendy restaurants, boutique shops, and a lively nightlife. ', 'House_Description': 'This charming house in Venice is the perfect beach retreat. With a spacious rooftop deck, you can enjoy stunning ocean views and beautiful sunsets. The house features an open floor plan with plenty of natural light and a cozy fireplace. The backyard is a private oasis with a hot tub and outdoor shower. Don`t miss your chance to live in this unique and vibrant neighborhood. ', 'Square_Meters': 200, 'Bathrooms': 2, 'Bedrooms': 3, 'Price': 1000000}")

## Step 4: Building the User Preference Interface

I am going to pose these questions to OpenAI and see what the LLM's responses are. I will do it for 5 different people and collect the answers to each question to build up a user profile for each "person".

In [121]:
questions = [   
"Q0: How big do you want your house to be? ",
"Q1: What are the 3 most important things for you in choosing this property? ", 
"Q2: Which amenities would you like? ", 
"Q3: Which transportation options are important to you? ",
"Q4: Who will be living in the home: their names, ages, sex, and relationship to you? ",
"Q5: How urban do you want your neighborhood to be? ",   
"Q6: Is there anything else that you think I shoud know? "
]
questions

['Q0: How big do you want your house to be? ',
 'Q1: What are the 3 most important things for you in choosing this property? ',
 'Q2: Which amenities would you like? ',
 'Q3: Which transportation options are important to you? ',
 'Q4: Who will be living in the home: their names, ages, sex, and relationship to you? ',
 'Q5: How urban do you want your neighborhood to be? ',
 'Q6: Is there anything else that you think I shoud know? ']

In [122]:
prompt_for_asking_questions = """
You are a(n) {personality} looking for a house in {location}. Please pay attention to the type of buyer you are.
I am a professional real estate agent helping you purchase a home. Answer each of these questions. 
{questions}
With this information I will help you find the perfect home for you. Please limit your answers to 100 words.
"""
print(prompt_for_asking_questions)


You are a(n) {personality} looking for a house in {location}. Please pay attention to the type of buyer you are.
I am a professional real estate agent helping you purchase a home. Answer each of these questions. 
{questions}
With this information I will help you find the perfect home for you. Please limit your answers to 100 words.



In [123]:
json_spec = """
###
Return the answers in this JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 
All commas('), square brackets([], colons(:), and ellipsis({}) must be exactly as you see here. Check your work. 
This is important. 


{'Q0': 'Answer to Q0',
 'Q1': 'Answer to Q1',
 'Q2': 'Answer to Q2',
 'Q3': 'Answer to Q3',
 'Q4': 'Answer to Q4',
 'Q5': 'Answer to Q5',
 'Q6': 'Answer to Q6'}
###
"""
print(json_spec)


###
Return the answers in this JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it. 
All commas('), square brackets([], colons(:), and ellipsis({}) must be exactly as you see here. Check your work. 
This is important. 


{'Q0': 'Answer to Q0',
 'Q1': 'Answer to Q1',
 'Q2': 'Answer to Q2',
 'Q3': 'Answer to Q3',
 'Q4': 'Answer to Q4',
 'Q5': 'Answer to Q5',
 'Q6': 'Answer to Q6'}
###



In [124]:
# Raise the model temperature to make it more creative.
answers_model = OpenAI(model = 'gpt-3.5-turbo-instruct', 
                            temperature = .8,
                            max_tokens = 3300)

In [125]:
personal_answers = []
for personality in personalities:
    formatted_prompt = prompt_for_asking_questions.format(
        personality = personality,
        location = location,
        questions = questions)
    
    # Combine the prompts
    formatted_prompt = formatted_prompt + json_spec
    
    # Call the model with the formatted prompt. 
    response = answers_model.predict(formatted_prompt)
    
    cleaned_response = cleanup(response)
    
    # Append to personal_answers
    personal_answers.append(cleaned_response)
    
personal_answers[0]

'{"Q0": "I would like my house to be at least 3, 000 square feet. ",  "Q1": "The most important things for me in choosing this property are location, security, and luxury. ",  "Q2": "I would like amenities such as a pool, gym, and home theater. ",  "Q3": "Having easy access to public transportation and being close to major highways is important to me. ",  "Q4": "I will be living in the home alone. ",  "Q5": "I would like a neighborhood that is urban but still has some peace and quiet. ",  "Q6": "I would like the house to have a modern design and be located in a prestigious neighborhood. "}'

In [126]:
# Create df
personal_df = create_df(personal_answers, 'personal_df.csv')

| | Q0 | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 |
|---|---|---|---|---|---|---|---|
| single, arrogant, rich, and female | I would like my house to be at least 3, 000 sq... | The most important things for me in choosing t... | I would like amenities such as a pool, gym, an... | Having easy access to public transportation an... | I will be living in the home alone. | I would like a neighborhood that is urban but ... | I would like the house to have a modern design... |
| male, married, 2 kids, and middle class | I am looking for a house that is around 2, 000... | The most important things for me in choosing t... | I would like amenities such as a backyard, a g... | Having access to public transportation and maj... | My wife and I, both in our 30s, and our 2 youn... | I prefer a suburban neighborhood with a good s... | I would like to live in a family-friendly comm... |
| divorced middle aged woman, working 2 jobs, with dependent adult children | I would like my house to be at least 1500 squa... | The 3 most important things for me in choosing... | I would like amenities such as a backyard, gar... | Proximity to public transportation and easy ac... | My dependent adult children, ages 20 and 23, w... | I would like a moderately urban neighborhood, ... | I would prefer a newer or recently renovated h... |
| young, nice, poor, male | I am looking for a house that is around 1000-1... | The 3 most important things for me in choosing... | I would like amenities such as a washer/dryer,... | I would prefer a neighborhood with reliable pu... | I will be living in the home with my partner, ... | I would like a neighborhood that is urban, but... | As a young professional on a tight budget, I a... |
| upper middle class married woman, husband works overseas, with no children | Answer to Q0: I am looking for a house that is... | Answer to Q1: The 3 most important things for ... | Answer to Q2: I would like a house with a back... | Answer to Q3: The most important transportatio... | Answer to Q4: My husband and I will be living ... | Answer to Q5: I prefer a neighborhood that is ... | Answer to Q6: I would also like to have a gara... |


### Buyer Preference Parsing: 
- Implement logic to interpret and structure these preferences for querying the vector database. 
- Ask the first question (Q0) of each of the buyers.

In [127]:
for row in personal_df.itertuples():
    print(row.Q0)
    results = chromadb.similarity_search(row.Q0)[0]
    print(row.Index, '\n', results, '\n###\n')

I would like my house to be at least 3, 000 square feet. 
single, arrogant, rich, and female 
 page_content="{'Neighborhoold_Name': 'Beverly Hills', 'Neighborhood_Description': 'Beverly Hills is a luxurious neighborhood known for its extravagant mansions, high-end shopping, and celebrity sightings. It is located in the heart of Los Angeles and offers stunning views of the city. Residents enjoy the finest dining and entertainment options, as well as top-rated schools and a safe community. ', 'House_Description': 'This stunning mansion in Beverly Hills boasts 400 square meters of living space and features high-end finishes and luxurious amenities. With 4 bedrooms and 4 bathrooms, this house is perfect for a family looking for a spacious and elegant home. Enjoy the beautiful California weather in the private backyard with a pool and outdoor kitchen. Don`t miss the opportunity to live in the prestigious neighborhood of Beverly Hills. ', 'Square_Meters': 400, 'Bathrooms': 4, 'Bedrooms': 4, 

The code works, but some of the matches, while they may be the best available in the database, are clearly off the mark.
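
One likely source of mismatch is units: the buyers answer in square feet while the listings store Square_Meters, and similarity search does not convert between them. A minimal sketch of extracting the stated size and converting it, assuming answers phrase the size as "N square feet" or "N sq ft". The helper names `square_feet_in` and `min_square_meters` are hypothetical, and when a range is given the pattern picks up the figure directly before the unit.

```python
import re

SQ_FT_TO_SQ_M = 0.09290304  # exact: (0.3048 m) ** 2

# Matches "3,000", "3, 000" (as produced by cleanup) or "1500" before the unit
SQ_FT_PATTERN = re.compile(
    r'(\d{1,3}(?:,\s?\d{3})+|\d+)\s*(?:square feet|sq\.?\s?ft)',
    re.IGNORECASE,
)


def square_feet_in(text):
    """Return the first square-foot figure mentioned in text, or None."""
    match = SQ_FT_PATTERN.search(text)
    if match is None:
        return None
    return int(re.sub(r'[,\s]', '', match.group(1)))


def min_square_meters(text):
    """Convert the stated square-foot figure to square meters, or None."""
    sq_ft = square_feet_in(text)
    return None if sq_ft is None else sq_ft * SQ_FT_TO_SQ_M


answer = 'I would like my house to be at least 3, 000 square feet. '
print(round(min_square_meters(answer), 1))
```

For the first buyer this gives roughly 278.7 square meters, a threshold that could then be compared directly against the Square_Meters value of each listing.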

## Step 5: Searching Based on Preferences

- Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
- Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.

- I will loop through each buyer's answers to each question, pass each answer to chromadb, and see what the search results are. I will put those results in a df called answers_df. Since there are only 15 entries, I will limit the results to just the top 1. The answers_df will be structured the same as personal_df.

In [128]:
# Results list for entire dataframe
df_results_list = []

# Exterior loop to use answers to search chromadb
for row in personal_df.itertuples():
    
    # Create the empty list
    row_results_list = []
    
    # Interior loop across each row (the first element is the row index)
    for preference in row:
        
        # Do the search
        results = chromadb.similarity_search(preference)[0]
        
        # So that it is obvious, include the preference that was submitted to chromadb
        # Make this a tuple
        pair = preference, results
        
        # Append search results in the row results list
        row_results_list.append(pair)
        
    # Append each row_results_list to the overall df_results_list
    df_results_list.append(row_results_list)

### Judging Quality of Results

The preferences being stated are quite specific, and the system returns a lot of text. It is difficult to say whether this is working without painstakingly going through each (preference, result) pair and making a call on it.

However, this is something that LLMs typically do quite well. So, let's provide three pieces of information to the LLM: the profile of the buyer, the preference, and the result. Then ask the LLM whether the stated preference is satisfied by that result. This will require some prompt engineering, and the answer is simply a score of 0 or 1, meaning NO or YES.

There are 35 preferences being posed here (5 buyers with 7 questions each). Let's see how it does. Since a lot of text goes back and forth, I will put this in a loop that calls the LLM for each buyer and gives it ONLY one (buyer profile, preference, result) tuple to work on at a time. This will take a while, and OpenAI generally does not like this many calls in a row.

In [129]:
buyer_prompt = """
You are a(n) {personality} BUYER. You have stated this PREFERENCE:
###
{preference} 
###
to your real estate agent here in {location}. 
Please answer if this LISTING:
###
{listing} 
###
satisfies your Preference.
###
If it DOES NOT stafisfy your Preference return a 0 as NUMBER or if it DOES satisfy your Preference return a 1 as NUMBER.
Next provide your REASON.
###
"""
print(buyer_prompt)


You are a(n) {personality} BUYER. You have stated this PREFERENCE:
###
{preference} 
###
to your real estate agent here in {location}. 
Please answer if this LISTING:
###
{listing} 
###
satisfies your Preference.
###
If it DOES NOT stafisfy your Preference return a 0 as NUMBER or if it DOES satisfy your Preference return a 1 as NUMBER.
Next provide your REASON.
###



In [130]:
json_spec = """
###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it.

{'Buyer': "BUYER",
 'Preference': "PREFERENCE",
 'Number': NUMBER,
 'Reason': "REASON"}

###
"""
print(json_spec)


###
Return the output in JSON format as follows. Please adhere EXACTLY to this ouput. Do not deviate from it.

{'Buyer': "BUYER",
 'Preference': "PREFERENCE",
 'Number': NUMBER,
 'Reason': "REASON"}

###



In [131]:
judging_results = []
for idx, buyer in enumerate(df_results_list):
    
    # Outer loop gets the buyer from the original personalities list
    personality_type = personalities[idx]
    print(personality_type)
    
    # Go through each buyer's preferences and ask whether the listing satisfies them
    for int_idx, temp in enumerate(buyer):
        
        # The first tuple holds the row index (the personality), not a preference, so skip it.
        if int_idx ==0:
            pass
        
        else:
            # Split the list into its component parts.
            preference, listing = temp
            
            # Format the prompt
            preference_prompt_formatted = buyer_prompt.format(
                personality = personality_type,
                preference = preference,
                location = location,
                listing = listing)

            judging_prompt = preference_prompt_formatted + json_spec

            # Call the model with the formatted prompt. 
            response = answers_model.predict(judging_prompt)

            # Clean up the string
            cleaned_response = cleanup(response)

            # Append to judging_results
            judging_results.append(cleaned_response)

len(judging_results)

single, arrogant, rich, and female
male, married, 2 kids, and middle class
divorced middle aged woman, working 2 jobs, with dependent adult children
young, nice, poor, male
upper middle class married woman, husband works overseas, with no children


35

In [132]:
# Create the df
judging_df = create_df(judging_results, 'judging_df.csv')

Error occurred: Expecting ',' delimiter: line 1 column 652 (char 651)
Non fatal error. We just get one fewer "preference" or "judgement".
Error occurred: Expecting ',' delimiter: line 1 column 64 (char 63)
Non fatal error. We just get one fewer "preference" or "judgement".


| | Buyer | Preference | Number | Reason |
|---|---|---|---|---|
| 0 | Single, arrogant, rich, female BUYER | My house should be at least 3, 000 square feet | 1 | This listing has 400 square meters or approxim... |
| 1 | Single, arrogant, rich, female BUYER | Location, security, luxury | 1 | This listing in Beverly Hills, a luxurious and... |
| 2 | Single, arrogant, rich, female BUYER | Amenities such as a pool, gym, and home theater | 1 | This listing satisfies all of the buyer"s stat... |
| 3 | single, arrogant, rich, and female BUYER | Having easy access to public transportation an... | 1 | The house is located in Downtown Los Angeles, ... |
| 4 | Single, Arrogant, Rich, Female Buyer | Living alone | 1 | The house has 5 bedrooms and 5 bathrooms, prov... |
| 5 | Single, arrogant, rich, female buyer | I would like a neighborhood that is urban but ... | 1 | This listing in Brentwood satisfies my prefere... |
| 6 | Single, arrogant, rich, female BUYER | I would like the house to have a modern design... | 1 | The house is located in Beverly Hills, which i... |
| 7 | male, married, 2 kids, and middle class BUYER | I am looking for a house that is around 2, 000... | 0 | The listed house has 400 square meters, which ... |
| 8 | male | The most important things for me in choosing t... | 0 | This property is located in Pasadena, which is... |
| 9 | male | Amenities such as a backyard, a garage, and a ... | 1 | This listing offers a backyard, a garage, and ... |


In [133]:
accuracy_score = judging_df['Number'].sum() / judging_df.shape[0]
print('It has an accuracy score of:', round(accuracy_score, 2)*100, '%')

It has an accuracy score of: 82.0 %
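
Note that this score is computed only over the judgments that parsed: create_df dropped two responses above, so the denominator is smaller than the 35 judgments requested. A minimal sketch of reporting both views, using a hypothetical helper `satisfaction_rates` and made-up verdicts rather than the notebook's data:

```python
def satisfaction_rates(numbers, attempted):
    """Return (rate over parsed verdicts, rate over all attempted judgments).

    numbers: list of parsed 0/1 verdicts
    attempted: how many judgments were requested, including unparseable ones
    """
    if not numbers or attempted < len(numbers):
        raise ValueError('need at least one verdict and attempted >= len(numbers)')
    satisfied = sum(numbers)
    return satisfied / len(numbers), satisfied / attempted


# Illustrative values only: 4 parsed verdicts out of 5 requested
parsed_rate, overall_rate = satisfaction_rates([1, 1, 0, 1], attempted=5)
print(f'parsed only: {parsed_rate:.0%}, counting failures as misses: {overall_rate:.0%}')
```

Counting unparseable responses as misses gives a more conservative figure; which one to report is a judgment call.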


However, when I look at what the "judge" is complaining about, I suspect I would have said the listing was fine. What is really needed here is a panel of three judges: have them vote and take the majority opinion each time. For a production application this would be a good idea. However, for this one, it is not necessary.
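
The voting step of such a panel is small. A minimal sketch, assuming each judge's verdict has already been obtained from a separate call to the judging prompt and parsed to 0 or 1; the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the verdict (0 or 1) chosen by most judges.

    Requires an odd number of votes so that a tie cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError('use an odd number of judges to avoid ties')
    return Counter(votes).most_common(1)[0][0]


# Three judges, two of whom say the listing satisfies the preference
print(majority_vote([1, 0, 1]))
```

The trade-off is cost: three judges means three times as many model calls per preference.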

In [134]:
judging_results[:5]

['{"Buyer": "Single, arrogant, rich, female BUYER",  "Preference": "My house should be at least 3, 000 square feet",  "Number": 1,  "Reason": "This listing has 400 square meters or approximately 4, 300 square feet, which satisfies my preference for a large living space. "}',
 '{"Buyer": "Single, arrogant, rich, female BUYER",  "Preference": "Location, security, luxury",  "Number": 1,  "Reason": "This listing in Beverly Hills, a luxurious and safe neighborhood, offers high-end finishes and luxurious amenities, as well as stunning views of the city. With 4 bedrooms and 4 bathrooms, it provides plenty of space for a family. Additionally, the private backyard with a pool and outdoor kitchen allows for enjoying the beautiful California weather. Overall, this property satisfies all of the stated preferences of the buyer. "}',
 '{"Buyer": "Single, arrogant, rich, female BUYER",  "Preference": "Amenities such as a pool, gym, and home theater",  "Number": 1,  "Reason": "This listing satisfies a

### Fine-tune the Retrieval Algorithm
Let's see whether the results change if we use Euclidean distance as opposed to cosine distance. I don't want to do this on a large number of queries. Let's just do it on some new questions that I have dreamed up here and see if the results change.

In [135]:
for question in questions:
    
    print('###\n', question)

    # Get the most similar documents using Euclidean distance
    results_euclidean = chromadb.similarity_search(question, k=3, distance_metric='euclidean')

    print(f"Results (Euclidean Distance): {len(results_euclidean)}")
    print(results_euclidean[0])

    # Get the most similar documents using Cosine distance
    results_cosine = chromadb.similarity_search(question, k=3, distance_metric='cosine')

    print(f"Results (Cosine Distance): {len(results_cosine)}")
    print(results_cosine[0])
    
    if results_euclidean == results_cosine:
        print('\nResults for both Euclidean and Cosine distance searches are identical\n')
    else:
        print('\nResults are different!\n')


###
 Q0: How big do you want your house to be? 
Results (Euclidean Distance): 3
page_content="{'Neighborhoold_Name': 'Hollywood', 'Neighborhood_Description': 'Hollywood is known as the entertainment capital of the world, with its iconic Walk of Fame and the famous Hollywood sign. It is a bustling neighborhood with a mix of historic landmarks, trendy restaurants, and luxurious homes. ', 'House_Description': 'This beautiful home in Hollywood offers the perfect blend of modern amenities and classic charm. With 300 square meters of living space, this home features 3 bedrooms and 2 bathrooms, making it the perfect space for a growing family. The spacious backyard is perfect for entertaining guests or simply enjoying a quiet evening at home. ', 'Square_Meters': 300, 'Bathrooms': 3, 'Bedrooms': 3, 'Price': 1500000}"
Results (Cosine Distance): 3
page_content="{'Neighborhoold_Name': 'Hollywood', 'Neighborhood_Description': 'Hollywood is known as the entertainment capital of the world, with its 

Results (Euclidean Distance): 3
page_content="{'Neighborhoold_Name': 'Venice Beach', 'Neighborhood_Description': 'Venice Beach is a unique and eclectic neighborhood known for its bohemian vibe and artistic community. It is located right on the beach and offers a laid-back lifestyle with plenty of outdoor activities, such as surfing and biking. Residents also enjoy a variety of trendy restaurants, boutique shops, and street performers. ', 'House_Description': 'This modern beach house in Venice Beach offers 300 square meters of living space and features sleek and stylish design. With 3 bedrooms and 3 bathrooms, this house is perfect for those who love to entertain. The rooftop deck offers stunning ocean views and is perfect for hosting summer parties. Don`t miss the opportunity to live in this vibrant and dynamic neighborhood. ', 'Square_Meters': 300, 'Bathrooms': 3, 'Bedrooms': 3, 'Price': 1500000}"
Results (Cosine Distance): 3
page_content="{'Neighborhoold_Name': 'Venice Beach', 'Neigh

In this case, the choice of Euclidean vs cosine distance does not matter. It is interesting to note that this vector store does really poorly at answering a question about numbers. It says the Malibu house is the cheapest, which makes no sense: that house is priced at 1,800,000, while the least expensive listing is 500,000.
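One possible explanation for the identical results: if the embeddings are unit length (which OpenAI documents for its embedding models), the squared Euclidean distance equals 2 minus twice the cosine similarity, so both metrics rank documents in the same order. The sketch below checks this on toy vectors; the vectors and helper names are made up for illustration. It is also worth verifying that `similarity_search` actually acts on a `distance_metric` keyword; as far as I know, Chroma fixes the metric when the collection is created (the `hnsw:space` collection metadata), in which case both calls above would run the same search.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query: Vector, docs: List[Vector], distance: Callable[[Vector, Vector], float]) -> List[int]:
    """Return document indices ordered from nearest to farthest."""
    return sorted(range(len(docs)), key=lambda i: distance(query, docs[i]))


# Toy vectors standing in for unit-length embeddings.
query = normalize([0.9, 0.1, 0.3])
docs = [normalize(v) for v in ([1.0, 0.0, 0.2], [0.1, 1.0, 0.0], [0.5, 0.5, 0.5], [0.0, 0.2, 1.0])]

euclid_rank = rank(query, docs, euclidean_distance)
cosine_rank = rank(query, docs, cosine_distance)
print("Same ranking:", euclid_rank == cosine_rank)
```

If the vectors were not normalized, the two rankings could differ, so the metric would matter more for other embedding models.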

## Step 6: Personalizing Listing Descriptions

- LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
- Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
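Factual integrity can also be checked after the fact rather than trusted to the prompt alone. The sketch below compares the numeric fields of an original listing with those of an enhanced one and reports any that changed. The function names and sample values are illustrative assumptions, not part of the notebook's code.

```python
import re
from typing import Any, Dict, List, Optional

# Fields that the enhanced listing must not change.
FACT_FIELDS = ("Square_Meters", "Bathrooms", "Bedrooms", "Price")


def to_number(value: Any) -> Optional[float]:
    """Strip '$', commas, and spaces ('$2, 500, 000' -> 2500000.0); None if not a plain number."""
    cleaned = re.sub(r"[$,\s]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None


def changed_facts(original: Dict[str, Any], enhanced: Dict[str, Any]) -> List[str]:
    """Return the factual fields whose numeric value differs between the two listings."""
    # A field missing from both listings compares equal and is not reported.
    return [
        field
        for field in FACT_FIELDS
        if to_number(original.get(field)) != to_number(enhanced.get(field))
    ]


# Illustrative sample values.
original = {"Square_Meters": 500, "Bathrooms": 5, "Bedrooms": 5, "Price": 2500000}
enhanced = {"Square_Meters": 2500, "Bathrooms": 5, "Bedrooms": 5, "Price": "$2, 500, 000"}

print("Changed fields:", changed_facts(original, enhanced))
```

A listing that fails this check could be regenerated, or its numeric fields could simply be overwritten with the originals.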

### Process will be:

- Select a buyer and a preference. 
- The preferences are in personal_df. 
- Use Q1 to search Chroma and come up with the top result only.
- Let's use Q1 as the preference for enhancing the listing. 
- The prompt will give the personality, the retrieved listing from Chroma, and the location (again).
- The prompt will include instructions for being creative to enhance the listing.
- Instructions need to be given in the prompt that the LLM must only use what is in the prompt for context.

In [136]:
enhanced_listing_prompt = """
You are an accomplished and creative real estate marketing executive who works in {location}. Your buyer is 
{personality}. The buyer will be interested in the following listing. 
###
{listing}
###
Please create a new listing that encapsulates the information contained in this listing and takes into account 
the buyer's preferences as follows.
###
{preferences}
###
When creating the new listing take poetic license but remain factual. Please reference price, number of bedrooms, 
and size of the home in your poetic listing. Also, wordsmith their preferences
into the listing. 
"""
print(enhanced_listing_prompt)


You are an accomplished and creative real estate marketing executive who works in {location}. Your buyer is 
{personality}. The buyer will be interested in the following listing. 
###
{listing}
###
Please create a new listing that encapsulates the information contained in this listing and takes into account 
the buyer's preferences as follows.
###
{preferences}
###
When creating the new listing take poetic license but remain factual. Please reference price, number of bedrooms, 
and size of the home in your poetic listing. Also, wordsmith their preferences
into the listing. 



In [137]:
json_spec = """
###
Return the output in JSON format as follows. Please adhere EXACTLY to this output.

{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description with Preferences',
 'House_Description': 'House_Description with Preferences',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price}
###
"""
print(json_spec)


###
Return the output in JSON format as follows. Please adhere EXACTLY to this output.

{'Neighborhoold_Name': 'Neighborhoold_Name',
 'Neighborhood_Description': 'Neighborhood_Description with Preferences',
 'House_Description': 'House_Description with Preferences',
 'Square_Meters': Square_Meters,
 'Bathrooms': Bathrooms,
 'Bedrooms': Bedrooms,
 'Price': Price}
###



In [141]:
data_list = []
for idx, row in enumerate(personal_df.itertuples()):
    
    # Define personality of buyer
    personality = personalities[idx]
    print(personality)
    
    # Search Chroma and keep only the top result
    results = chromadb.similarity_search(row.Q2)[0]
    
    # Create the preferences for this buyer
    preferences = concat_row(row)
    
    # Create the formatted prompt and append the JSON output spec
    formatted_prompt = enhanced_listing_prompt.format(
        personality = personality,
        listing = results,
        preferences = preferences,
        location = location)
    formatted_prompt = formatted_prompt + json_spec
    
    # Get the response from the LLM
    try:
        response = answers_model.predict(formatted_prompt)
    except Exception as e:
        print(f"Error occurred: {str(e)}")
        print('Non fatal error. We just get one fewer personalized listing.')
        # Skip this buyer; otherwise `response` would be undefined or left over
        # from the previous iteration
        continue
    
    # Clean the output
    cleaned_response = cleanup(response)
    
    # Parse the response to structured JSON
    try:
        structured_response = json.loads(cleaned_response)
        data_list.append(structured_response)
    except Exception as e:
        print(f"Error occurred: {str(e)}")
        print('Non fatal error. We just get one fewer personalized listing.')
    
len(data_list)

single, arrogant, rich, and female
male, married, 2 kids, and middle class
divorced middle aged woman, working 2 jobs, with dependent adult children
young, nice, poor, male
upper middle class married woman, husband works overseas, with no children


5

In [142]:
poetic_df = pd.DataFrame(data_list)
poetic_df

Unnamed: 0,Neighborhoold_Name,Neighborhood_Description,House_Description,Square_Meters,Bathrooms,Bedrooms,Price,Neighborhood_Name
0,Hollywood,"Welcome to Hollywood, the perfect blend of gli...",Indulge in the ultimate Hollywood dream with t...,">3, 000",2,3,"$1, 500, 000",
1,,Welcome to the upscale and picturesque neighbo...,Step into your stunning new villa in Brentwood...,2500,5,5,"$2, 500, 000",Brentwood
2,Brentwood,Nestled in the heart of the charming and upsca...,Welcome to your dream home in Brentwood - a sp...,500,5,5,2500000,
3,West Hollywood,"Nestled in the heart of the city, West Hollywo...",This charming bungalow in West Hollywood offer...,1000,2,2,750000,
4,Brentwood,Experience luxury living in this stunning vill...,Welcome to your dream home in Brentwood. This ...,500,5,5,2500000,
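The table shows that the model did not always follow the output spec: one row uses the key `Neighborhood_Name` instead of `Neighborhoold_Name`, and some numeric fields came back as formatted strings. A minimal sketch of normalizing each parsed dict before building the DataFrame follows; `parse_int` and `normalize_listing` are hypothetical helpers, not functions defined in this notebook.

```python
import re
from typing import Any, Dict, Optional

NUMERIC_FIELDS = ("Square_Meters", "Bathrooms", "Bedrooms", "Price")


def parse_int(value: Any) -> Optional[int]:
    """'$1, 500, 000' -> 1500000; anything that is not a plain number (e.g. '>3, 000') -> None."""
    cleaned = re.sub(r"[$,\s]", "", str(value))
    return int(cleaned) if cleaned.isdigit() else None


def normalize_listing(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with one neighborhood-name key and integer numeric fields."""
    listing = dict(raw)
    # Fold the alternative key into the key used everywhere else in the data.
    if not listing.get("Neighborhoold_Name") and listing.get("Neighborhood_Name"):
        listing["Neighborhoold_Name"] = listing["Neighborhood_Name"]
    listing.pop("Neighborhood_Name", None)
    for field in NUMERIC_FIELDS:
        listing[field] = parse_int(listing.get(field))
    return listing


# Sample rows shaped like the first two rows of the table above.
rows = [
    {"Neighborhoold_Name": "Hollywood", "Square_Meters": ">3, 000",
     "Bathrooms": 2, "Bedrooms": 3, "Price": "$1, 500, 000"},
    {"Neighborhood_Name": "Brentwood", "Square_Meters": 2500,
     "Bathrooms": 5, "Bedrooms": 5, "Price": "$2, 500, 000"},
]
normalized = [normalize_listing(row) for row in rows]
for listing in normalized:
    print(listing)
```

Values that cannot be parsed become `None` rather than being guessed, so they stand out for manual review.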


In [143]:
for idx, row in enumerate(poetic_df.itertuples()):
    
    # Get the Buyer
    print('Buyer is:', personalities[idx])
            
    # Get the Preferences for the Buyer (one preference per column of personal_df)
    print('Preferences are:')
    for col_num in range(personal_df.shape[1]):
        print(col_num, personal_df.iloc[idx, col_num], '\n')
    
    # Get the original listing by searching with the enhanced neighborhood description
    results = chromadb.similarity_search(row.Neighborhood_Description)[0]
    print('This is the original listing')
    print(results, '\n')
    
    print('This is the poetic license version')
    print(row.Neighborhood_Description)
    print(row.House_Description)
    print('###\n')

Buyer is: single, arrogant, rich, and female
Preferences are:
0 I would like my house to be at least 3, 000 square feet.  

1 The most important things for me in choosing this property are location, security, and luxury.  

2 I would like amenities such as a pool, gym, and home theater.  

3 Having easy access to public transportation and being close to major highways is important to me.  

4 I will be living in the home alone.  

This is the original listing
page_content="{'Neighborhoold_Name': 'Hollywood', 'Neighborhood_Description': 'Hollywood is the iconic neighborhood in Los Angeles, known for its glitz and glamour. It is home to the famous Hollywood Walk of Fame, where you can find the stars of your favorite celebrities. The neighborhood also offers a variety of entertainment options, including theaters, museums, and live music venues. With its central location, Hollywood is the perfect place to experience the best of LA. ', 'House_Description': 'This modern house in Hollywood is

This is the original listing
page_content="{'Neighborhoold_Name': 'Brentwood', 'Neighborhood_Description': 'Brentwood is an upscale and picturesque neighborhood known for its beautiful homes and tree-lined streets. It is located near the coast of Los Angeles and offers a peaceful and luxurious lifestyle. Residents enjoy a variety of outdoor activities, as well as top-rated schools and a strong sense of community. ', 'House_Description': 'This stunning villa in Brentwood offers 500 square meters of living space and features breathtaking views of the city and ocean. With 5 bedrooms and 5 bathrooms, this house is perfect for a large family or those who love to entertain. The backyard is a private oasis with a pool, spa, and outdoor kitchen. Don`t miss the chance to live in this exclusive and sought-after neighborhood. ', 'Square_Meters': 500, 'Bathrooms': 5, 'Bedrooms': 5, 'Price': 2500000}" 

This is the poetic license version
Experience luxury living in this stunning villa located in th

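It is factual, reflects the buyer's preferences, and is creative, as requested! The other rows deserve a check, though: in the `poetic_df` table above, row 0 gives `Square_Meters` as ">3, 000", which looks like the buyer's preference rather than a measurement, and row 1 gives 2500 for what appears to be the same Brentwood villa that the original listing puts at 500 square meters.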