# HomeMatch: Personalized Real Estate Finder

## Project Overview

This notebook is part of my complete HomeMatch project, which is designed to create a personalized real estate listing experience. The application uses large language models (LLMs), vector databases, and dynamic user inputs to recommend property listings tailored to user preferences.

Users can input specific features such as location, number of bedrooms, bathrooms, house size, and desired amenities. The system semantically searches a database of generated real estate listings and personalizes the descriptions to match the user's profile, making the search experience engaging and relevant.

## Deployed Live Application

In addition to this notebook, I have developed and deployed a fully working live version of this project on Hugging Face Spaces.

**You can access the deployed application here:**

➡️ **[Live HomeMatch App on Hugging Face](https://huggingface.co/spaces/Joe-ElM/HomeMatch)**

Direct link (copyable):  
https://huggingface.co/spaces/Joe-ElM/HomeMatch

This live version allows users to interact with the model via a user-friendly interface and see the personalized property listings generated in real time.


## Objectives

- Collect user preferences through natural language input.
- Generate and personalize property listings using an LLM.
- Use vector similarity search to match user preferences with available properties.
- Present personalized, engaging property descriptions to users.

## Technologies Used

- Python
- LangChain
- Chroma (vector store used in this notebook)
- OpenAI API (LLM and embeddings)
- Pandas
- Gradio (Interface)

## Workflow Summary

- Generate a set of synthetic property listings using an LLM.
- Embed property descriptions into vector space for similarity search.
- Collect buyer preferences via natural language input.
- Perform semantic search to retrieve the most relevant properties.
- Personalize the retrieved property descriptions to highlight buyer preferences.
- Present results in a user-friendly web interface.

## Submission Note

Please note that this notebook complements the live application and demonstrates the core components of the system such as:
- Listing generation
- Vector store preparation
- User profile creation
- Semantic search
- Personalized output generation

For the full experience, I invite you to try the deployed application using the link above.


In [1]:
import os
import pandas as pd
os.environ["OPENAI_API_KEY"]  = "API KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

from langchain.llms import OpenAI
pd.set_option('display.max_colwidth', None)

In [2]:
from   langchain.chat_models                 import ChatOpenAI 
from   langchain.prompts                     import PromptTemplate
from   langchain.output_parsers              import PydanticOutputParser
from   langchain.document_loaders.csv_loader import CSVLoader
from   langchain.chains                      import LLMChain
from   langchain.vectorstores                import Chroma
from   langchain.embeddings.openai           import OpenAIEmbeddings
from   langchain.docstore.document           import Document
from   pydantic                              import BaseModel, Field, validator
from   typing                                import List
from   random                                import sample 
from   fastapi.encoders                      import jsonable_encoder
import pandas                                as     pd

## Step 1: Generate Real Estate Listings

This step generates synthetic property listings with a language model. These listings form the property database for the HomeMatch application. Each listing includes essential attributes such as neighborhood, price, property type, number of bedrooms and bathrooms, and descriptive details about the property and its surroundings.

In [3]:

class RealEstateListing(BaseModel):
    neighborhood            : str   = Field(description="Neighborhood name")
    price                   : str   = Field(description="Property price (formatted as $X,XXX,XXX)")
    bedrooms                : int   = Field(description="Number of bedrooms")
    bathrooms               : float = Field(description="Number of bathrooms")
    house_size              : str   = Field(description="House size in square feet")
    property_type           : str   = Field(description="Type of property (house, condo, etc.)")
    description             : str   = Field(description="Detailed property description")
    neighborhood_description: str   = Field(description="Description of the neighborhood")

    # Validator for price format
    @validator('price')
    def price_must_start_with_dollar(cls, v):
        if not isinstance(v, str) or not v.startswith('$'):
            raise ValueError('Price must be a string starting with $')
        return v

class ListingCollection(BaseModel):
    listings: List[RealEstateListing] = Field(description="Collection of real estate listings")
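
The schema above stores `price` and `house_size` as formatted strings. If numeric comparisons are needed later (for example, filtering by budget), the strings have to be converted back to numbers. The following is a minimal sketch of that conversion; `parse_price` and `parse_house_size` are hypothetical helpers that are not part of the notebook, and they assume the formats shown in the example listings (`$X,XXX,XXX` and `N,NNN sqft`).

```python
import re

# Hypothetical helpers (not part of the notebook), assuming the formats
# "$1,200,000" for prices and "3,500 sqft" for house sizes.
PRICE_PATTERN = re.compile(r"^\$\d{1,3}(,\d{3})*$")
SIZE_PATTERN = re.compile(r"^([\d,]+)\s*sqft$", re.IGNORECASE)


def parse_price(price: str) -> int:
    """Convert a price string such as "$1,200,000" to whole dollars."""
    if not PRICE_PATTERN.match(price):
        raise ValueError(f"Unexpected price format: {price!r}")
    return int(price[1:].replace(",", ""))


def parse_house_size(size: str) -> int:
    """Convert a size string such as "3,500 sqft" to square feet."""
    match = SIZE_PATTERN.match(size.strip())
    if match is None:
        raise ValueError(f"Unexpected size format: {size!r}")
    return int(match.group(1).replace(",", ""))


print(parse_price("$1,200,000"))       # 1200000
print(parse_house_size("3,500 sqft"))  # 3500
```

Raising `ValueError` on unexpected input mirrors the behavior of the price validator above, so malformed values are caught early rather than silently converted.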


## Step 2: Create Listing Generation Template

This template instructs the LLM to generate diverse, detailed property listings following our defined schema.

In [4]:
prompt_template = """
Generate {num_listings} unique real estate listings in JSON format that match the following Pydantic model schema:

{format_instructions}

Each listing must include:
- A detailed, engaging property description.
- A vivid neighborhood description, mentioning local amenities and community character.
- Prices formatted as "$X,XXX,XXX".

Use this example as inspiration for the style and format of descriptions:

Example Listing:
Neighborhood           : Green Oaks
Price                  : $800,000
Bedrooms               : 3
Bathrooms              : 2
House Size             : 2,000 sqft
Property Type          : Single Family Home

Description:
Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, \
2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. \
Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. \
The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, \
perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description:
Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, \
community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of \
coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.

Ensure each listing has unique features, neighborhood characteristics, and varied architectural styles. Avoid repetition.

Number of listings: {num_listings}
"""
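
To illustrate how the two placeholders in the template are filled, here is a small sketch that uses plain `str.format` on a shortened stand-in template. This is only an approximation of what `PromptTemplate` does with its default f-string style formatting; `short_template` and `fake_format_instructions` are made-up names and values for illustration, and the real format instructions produced by the parser are much longer.

```python
# Shortened stand-in for the notebook's prompt_template
# (assumption: it uses the same two placeholders).
short_template = (
    "Generate {num_listings} unique real estate listings in JSON format "
    "that match the following schema:\n\n"
    "{format_instructions}\n"
)

# Stand-in for parser.get_format_instructions(); the real text is longer.
fake_format_instructions = 'Return a JSON object with a "listings" array.'

filled = short_template.format(
    num_listings=10,
    format_instructions=fake_format_instructions,
)
print(filled)
```

One consequence of this formatting style is that any literal curly braces in the template text itself would need to be doubled (`{{` and `}}`) so they are not mistaken for placeholders.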


## Step 3: Generate Listings

Execute the chain to generate a collection of property listings and save them to a CSV file.

In [7]:

# Create the output parser for the listing collection schema
parser = PydanticOutputParser(pydantic_object=ListingCollection)

# Define the prompt template
prompt = PromptTemplate(
                        template          = prompt_template,
                        input_variables   = ["num_listings"],
                        partial_variables = {"format_instructions": parser.get_format_instructions()}
                       )

# Initialize the LLM
llm   = ChatOpenAI(temperature = 0.7)

# Create chain
chain = LLMChain(llm    = llm, 
                 prompt = prompt)

# Run the chain to generate listings
response = chain.run(num_listings = 10)

# Parse response
result   = parser.parse(response)

# Convert to DataFrame
df = pd.DataFrame(jsonable_encoder(result.listings))
df.to_csv('listings.csv', index=False)  # Save to CSV file
print("Listings saved to listings.csv")
df.head(2)


Listings saved to listings.csv


| | neighborhood | price | bedrooms | bathrooms | house_size | property_type | description | neighborhood_description |
|---|---|---|---|---|---|---|---|---|
| 0 | Sunset Heights | $1,200,000 | 4 | 3.5 | 3,500 sqft | Luxury Home | Experience luxury living in this modern masterpiece in Sunset Heights. This 4-bedroom, 3.5-bathroom home features high-end finishes, a gourmet kitchen, and a stunning rooftop deck with panoramic city views. The spacious layout and designer touches make this home perfect for entertaining. | Sunset Heights is a vibrant neighborhood known for its trendy restaurants, boutique shops, and lively nightlife. Enjoy easy access to hiking trails, parks, and cultural attractions, making it an ideal location for those seeking an active lifestyle. |
| 1 | Lakeview Terrace | $950,000 | 3 | 2.0 | 2,200 sqft | Single Family Home | Welcome to this charming 3-bedroom, 2-bathroom home in Lakeview Terrace. The open floor plan, vaulted ceilings, and cozy fireplace create a warm and inviting atmosphere. The backyard oasis with a pool and patio is perfect for relaxing or entertaining guests. | Lakeview Terrace offers a peaceful retreat with scenic lake views, walking trails, and a sense of tranquility. Residents enjoy easy access to local cafes, farmers markets, and community events, making it a desirable place to call home. |
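
LLM responses sometimes wrap the requested JSON in extra prose, which can make strict parsing fail. The sketch below shows one simple way to recover the JSON object before validating it. It is not how `PydanticOutputParser` is implemented; `extract_json` is a hypothetical helper, and the sample response is invented for illustration.

```python
import json


def extract_json(text: str) -> dict:
    """Return the JSON object spanning the first '{' to the last '}' in text.

    Hypothetical helper: a crude heuristic that assumes the response
    contains exactly one top-level JSON object.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start:end + 1])


# Invented example of a response with prose around the JSON payload.
sample_response = (
    "Here are the listings you asked for:\n"
    '{"listings": [{"neighborhood": "Green Oaks", '
    '"price": "$800,000", "bedrooms": 3}]}\n'
    "Let me know if you need more."
)

data = extract_json(sample_response)
print(data["listings"][0]["neighborhood"])  # Green Oaks
```

The extracted dictionary could then be passed to the Pydantic model for validation, so schema errors are still caught after the surrounding text is removed.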


In [9]:
# Reset the Chroma database directory so each run starts from a clean store
import shutil

# Path to the ChromaDB directory (also used when building the vector store)
chroma_db_path = "./chroma_db"

# Remove the directory if it already exists
if os.path.exists(chroma_db_path):
    shutil.rmtree(chroma_db_path)
    print("Old ChromaDB folder cleared.")
else:
    print("No existing ChromaDB folder found, proceeding.")


Old ChromaDB folder cleared.


## Step 4: Prepare Vector Database

The generated listings are converted into vector embeddings using OpenAI's embedding models and stored in a Chroma vector store for efficient similarity search.
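
To make the idea of similarity search concrete, here is a minimal, self-contained sketch that ranks a few documents against a query. It replaces the real embedding model with a toy bag-of-words vector and uses cosine similarity as one common choice of metric; the metric actually used depends on how the vector store is configured. The names `toy_embed`, `cosine_similarity`, and `top_k`, and the sample documents, are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(word.strip(".,").lower() for word in text.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = toy_embed(query)
    scored = [(cosine_similarity(q, toy_embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


docs = [
    "Quiet suburban home with a backyard garden near parks",
    "Downtown loft close to nightlife and restaurants",
    "Lakeside cabin with scenic views and walking trails",
]
best = top_k("quiet home near parks with a garden", docs, k=1)
print(best[0][1])  # Quiet suburban home with a backyard garden near parks
```

A real embedding model captures meaning rather than exact word overlap, so it can match "green spaces" with "parks" even when no words are shared; the ranking step, however, works the same way.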


In [10]:


# Prepare documents from DataFrame
documents = []
for idx, row in df.iterrows():
    content = f"""
    Neighborhood            : {row['neighborhood']}
    Price                   : {row['price']}
    Bedrooms                : {row['bedrooms']}
    Bathrooms               : {row['bathrooms']}
    House Size              : {row['house_size']}
    Property Type           : {row['property_type']}
    Description             : {row['description']}
    Neighborhood Description: {row['neighborhood_description']}
    """
    documents.append(Document(page_content=content.strip(), metadata={"id": str(idx)}))

print(f" Prepared {len(documents)} documents.")

# Split the documents into batches (optional; useful for larger datasets)
chunk_size = 10  # Adjust as needed
chunks     = [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]

# Initialize the embedding model
embedding  = OpenAIEmbeddings()

# Process the batches and build the Chroma vector store
vectorstore = None

for i, chunk in enumerate(chunks):
    if i == 0:
        vectorstore = Chroma.from_documents(chunk, embedding, persist_directory=chroma_db_path)
    else:
        vectorstore.add_documents(chunk)
    print(f"Processed chunk {i + 1}/{len(chunks)}")

# Persist the vector database
vectorstore.persist()
print("ChromaDB vectorstore created and saved successfully!")


 Prepared 20 documents.
Processed chunk 1/2
Processed chunk 2/2
ChromaDB vectorstore created and saved successfully!
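
The batching step in the cell above can be looked at in isolation. The sketch below wraps the same slicing pattern in a small function and shows how the number of batches follows from the document count; `batched` is a hypothetical helper name, and the document IDs are placeholders.

```python
from typing import List


def batched(items: List[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder document IDs standing in for the prepared documents.
docs = [f"doc-{i}" for i in range(25)]
batches = batched(docs, 10)
print([len(b) for b in batches])  # [10, 10, 5]
```

With 20 documents and a batch size of 10, the same logic yields the two batches reported in the output above.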


## Step 5: Capture Buyer Preferences

The code below creates a sample buyer profile to demonstrate the personalization capabilities.

In [11]:
# Buyer preference interface

# Example: predefined buyer preferences (this could be made interactive later)
questions = [   
    "How big do you want your house to be?", 
    "What are the 3 most important things for you in choosing this property?", 
    "Which amenities would you like?", 
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
    "What architectural style do you prefer?",  
    "Do you prefer newer constructions or older, historic homes?",  
    "Is proximity to parks or green spaces important to you?",  
]

answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
    "Contemporary or modern architectural style with clean lines.",
    "Prefer newer constructions with modern amenities.",
    "Yes, living near parks and green spaces is very important.",
]


# Combine preferences into a single query string
buyer_profile = "\n".join([f"{q} {a}" for q, a in zip(questions, answers)])

print("Buyer profile created:\n")
print(buyer_profile)


Buyer profile created:

How big do you want your house to be? A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
What are the 3 most important things for you in choosing this property? A quiet neighborhood, good local schools, and convenient shopping options.
Which amenities would you like? A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
Which transportation options are important to you? Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
How urban do you want your neighborhood to be? A balance between suburban tranquility and access to urban amenities like restaurants and theaters.
What architectural style do you prefer? Contemporary or modern architectural style with clean lines.
Do you prefer newer constructions or older, historic homes? Prefer newer constructions with modern amenities.
Is proximity to parks or green spaces important to you? Yes, living near parks and green spaces is very important.
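
If the answers come from real user input rather than a predefined list, some may be empty or the two lists may get out of sync. The sketch below is one possible way to build the profile string more defensively; `build_buyer_profile` is a hypothetical helper and the sample questions and answers are shortened placeholders.

```python
from typing import List


def build_buyer_profile(questions: List[str], answers: List[str]) -> str:
    """Join question/answer pairs into one query string, skipping blanks."""
    if len(questions) != len(answers):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("questions and answers must have the same length")
    lines = [
        f"{q} {a.strip()}"
        for q, a in zip(questions, answers)
        if a.strip()
    ]
    return "\n".join(lines)


sample_questions = [
    "How big do you want your house to be?",
    "Which amenities would you like?",
    "What architectural style do you prefer?",
]
sample_answers = ["Three bedrooms.", "   ", "Modern with clean lines."]

profile = build_buyer_profile(sample_questions, sample_answers)
print(profile)
```

Skipping unanswered questions keeps them from adding words to the search query that say nothing about the buyer.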

## Step 6: Perform Semantic Search

Using the buyer's preferences, this code finds the most relevant property listings.

In [12]:
# Load the existing vectorstore
embedding   = OpenAIEmbeddings()
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embedding)

# Define your query (buyer profile from Step 5)
query = buyer_profile

# Search the vectorstore
search_results = vectorstore.similarity_search(query, k = 3)  # k = number of results

# Show the top results
for idx, result in enumerate(search_results):
    print(f"\nMatch {idx + 1}:")
    print(result.page_content)



Match 1:
Neighborhood            : Historic Old Town
    Price                   : $850,000
    Bedrooms                : 3
    Bathrooms               : 2.5
    House Size              : 2,400 sqft
    Property Type           : Historic Home
    Description             : Step back in time in this charming historic home in Historic Old Town. This 3-bedroom, 2.5-bathroom home features original architectural details, a cozy fireplace, and a beautifully landscaped backyard. The character and charm of this home make it a true gem.
    Neighborhood Description: Historic Old Town is a quaint and picturesque neighborhood filled with historic homes, cobblestone streets, and boutique shops. Stroll through the historic district, visit local museums, and dine at charming cafes, immersing yourself in the rich history and culture of the area.

Match 2:
Neighborhood            : Meadowbrook Heights
    Price                   : $850,000
    Bedrooms                : 3
    Bathrooms               : 
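
Each search result is returned as labeled text, one `Label : value` pair per line. If the individual fields are needed again (for display in a table, say), they can be recovered by splitting each line on its first colon. The following is a minimal sketch under that assumption; `parse_listing_text` is a hypothetical helper, and it assumes every field fits on a single line.

```python
from typing import Dict


def parse_listing_text(page_content: str) -> Dict[str, str]:
    """Parse 'Label : value' lines into a dictionary of fields."""
    fields: Dict[str, str] = {}
    for line in page_content.splitlines():
        # partition() splits on the first colon only, so colons that
        # appear later inside a value are preserved.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample_content = (
    "Neighborhood            : Historic Old Town\n"
    "    Price                   : $850,000\n"
    "    Bedrooms                : 3\n"
    "    Property Type           : Historic Home"
)

fields = parse_listing_text(sample_content)
print(fields["Neighborhood"])  # Historic Old Town
print(fields["Price"])         # $850,000
```

An alternative design would be to store these fields in each document's `metadata` when the vector store is built, which avoids re-parsing the text afterwards.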

## Step 7: Personalize Listings

The final step creates personalized descriptions that highlight features matching the buyer's preferences.

In [13]:
personalization_template = """
        You are a top-performing, empathetic real estate agent who crafts engaging, personalized property descriptions.
        Your task is to rewrite the following property listing to deeply resonate with the buyer’s preferences.
        Buyer’s profile and preferences:
        {buyer_profile}
        Original property listing:
        {listing_description}

        Guidelines:
        - Highlight specific features that match the buyer's preferences with enthusiasm.
        - Create a warm, inviting narrative that helps the buyer emotionally connect with the property.
        - Emphasize lifestyle benefits and community aspects, not just the physical features.
        - Maintain factual accuracy — do not invent details.
        - Use descriptive, vivid language to bring the property to life.
        - If certain buyer preferences aren't covered by the listing, gracefully omit them (do not fabricate).
        - Aim for a friendly and professional tone, like a personal recommendation.
        Provide the fully rewritten, personalized property description below:
        """

personalization_prompt = PromptTemplate(
                                        input_variables = ["buyer_profile", "listing_description"],
                                        template        = personalization_template,
                                       )

# Initialize LLM chain
personalization_chain = LLMChain(llm    = llm, 
                                 prompt = personalization_prompt)


# Process each search result individually
for idx, result in enumerate(search_results):
    personalized = personalization_chain.run(
                                            buyer_profile       = buyer_profile,
                                            listing_description = result.page_content
                                        )
    print(f"\nPersonalized Match {idx + 1}:\n")
    print(personalized)
    print("\n" + "-" * 80 + "\n")



Personalized Match 1:

Step into your dream home in the serene neighborhood of Historic Old Town! Priced at $850,000, this charming 3-bedroom, 2.5-bathroom historic home offers 2,400 sqft of cozy living space that perfectly fits your preferences.

As you step inside, you'll be enchanted by the original architectural details that adorn this beautiful home. Imagine cozying up by the fireplace in the spacious living room or preparing a delicious meal in the well-equipped kitchen with ample space for your culinary creations.

The backyard of this home is a gardener's paradise, perfect for cultivating your own green oasis. And with a two-car garage, you'll have plenty of space for your vehicles and storage needs.

Located in the quiet and picturesque Historic Old Town neighborhood, you'll be surrounded by cobblestone streets and boutique shops, creating a tranquil atmosphere that is perfect for unwinding after a long day. Plus, the proximity to good local schools and convenient shopping op
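
In the sample output above, the personalized text mentions a two-car garage, which does not appear in the original Historic Old Town listing, even though the prompt asks the model not to invent details. A simple post-check can help surface such cases. The sketch below is a crude substring comparison, not a reliable fact checker; `unsupported_phrases` is a hypothetical helper, and the phrase list is an assumption chosen from the buyer's stated preferences.

```python
from typing import List


def unsupported_phrases(personalized: str, listing: str, phrases: List[str]) -> List[str]:
    """Return phrases that appear in the personalized text but not in the listing."""
    personalized_lower = personalized.lower()
    listing_lower = listing.lower()
    return [
        phrase
        for phrase in phrases
        if phrase.lower() in personalized_lower
        and phrase.lower() not in listing_lower
    ]


# Excerpts taken from the original listing and the personalized output.
listing = (
    "This 3-bedroom, 2.5-bathroom home features original architectural "
    "details, a cozy fireplace, and a beautifully landscaped backyard."
)
personalized = (
    "The backyard of this home is a gardener's paradise, perfect for "
    "cultivating your own green oasis. And with a two-car garage, you'll "
    "have plenty of space for your vehicles and storage needs."
)

# Assumed list of preference phrases to check.
phrases = ["two-car garage", "backyard", "bus line"]

print(unsupported_phrases(personalized, listing, phrases))  # ['two-car garage']
```

Flagged phrases could be used to trigger a regeneration or a manual review before the description is shown to the buyer.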

## Conclusion

This notebook has demonstrated a complete AI-powered real estate personalization system that:
1. Generates diverse property listings
2. Creates a searchable vector database 
3. Captures buyer preferences
4. Finds semantically matching properties
5. Personalizes property descriptions

The implementation shows how language models can transform standard property listings into personalized narratives that better connect with potential buyers.