## HomeMatch: Personalized Real Estate Listing Application
Developed by: ShutterStack [Arya Patil]



Organization: Future Homes Realty (Simulated)

Overview:

HomeMatch is a Python application that produces personalized property listings tailored to a buyer's stated preferences. It uses a large language model (LLM) served through the Groq API, LangChain for prompt management, and ChromaDB for vector-based semantic search to rewrite standard listings as buyer-specific descriptions. The notebook generates a dataset of real estate listings, stores them in a vector database, processes buyer preferences, runs a semantic search, and outputs personalized descriptions, one step per part.

#### Part 1: Setup and Dependencies
**Purpose:** This cell prepares the environment for the HomeMatch application by installing necessary Python packages and configuring core components.  
**What Happens:**  
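- Installs dependencies with pip: LangChain, the LangChain integrations for Groq and Chroma, ChromaDB, `huggingface-hub`, and `sentence-transformers`. A second cell upgrades `langchain-community`.  
- Logs in to the Hugging Face Hub with `huggingface-cli login`.  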
- Imports required Python libraries for file handling (json, csv), regular expressions (re), UUID generation (uuid), and the core AI tools (LangChain, Groq, ChromaDB).  
- Configures the Groq LLM with the `llama3-70b-8192` model, an API key placeholder (to be replaced by the user), a temperature of 0.7 for response variety, and a 2,000-token output limit.  
- Initializes the HuggingFace embedding model (`all-MiniLM-L6-v2`) for vectorizing listing data.  
- Sets up a persistent ChromaDB client at `/tmp/chroma` for storing listing embeddings, with a collection name `real_estate_listings`.  
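
Hardcoding the key in the notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead follows; the variable name `GROQ_API_KEY` and the helper `load_groq_api_key` are assumptions for illustration, not part of the original notebook.

```python
import os


def load_groq_api_key(env=None, var_name="GROQ_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    The variable name is an assumption, not something the notebook defines.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    # Reject both a missing value and the untouched placeholder string.
    if not key or key == "YOUR_GROQ_API_KEY":
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```

The returned value could then be passed as `api_key=` to `ChatGroq` in place of the hardcoded constant.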


In [14]:
!pip install -q langchain langchain-groq langchain-chroma chromadb huggingface-hub sentence-transformers

In [3]:
!pip install -U langchain-community


In [1]:
!huggingface-cli login



    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
The token `lang` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `lang`


In [27]:
# Import necessary libraries
import json
import csv
import uuid
import re
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_chroma import Chroma
from langchain_core.documents import Document
# Community integrations live in langchain-community (installed above)
from langchain_community.embeddings import HuggingFaceEmbeddings
import chromadb

# Configure LLM with API key (replace with your actual Groq API key)
GROQ_API_KEY = "YOUR_GROQ_API_KEY"  # Replace this!
llm = ChatGroq(
    api_key=GROQ_API_KEY,
    model_name="llama3-70b-8192",
    temperature=0.7,
    max_tokens=2000
)

# Configure embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Initialize ChromaDB
client = chromadb.PersistentClient(path="/tmp/chroma")
collection_name = "real_estate_listings"

print("Dependencies installed and configured.")

Dependencies installed and configured.


#### Part 2: Generate Listings
**Purpose:** This cell generates a dataset of real estate listings and saves it in both JSON and CSV formats for later use. `generate_listings` defaults to 10 listings; the run shown below requests 100.  
**What Happens:**  
- Defines a `ListingGenerator` class with methods to generate, parse, and save listings.  
- Uses the LLM to generate the requested number of listings from a structured prompt that specifies neighborhood, price, bedrooms, bathrooms, house size, description, and neighborhood description.  
- Parses the LLM output with regular expressions into a list of dictionaries; a listing that is missing any field is reported and skipped.  
- If fewer listings than requested are parsed (due to parsing issues or LLM failure), tops up the list with fallback samples derived from a single template, varying only the ID, neighborhood suffix, and price. In the run shown below, no listings were parsed, so all 100 entries are fallback samples.  
- Saves the listings to `listings.json` (human-readable format) and `listings.csv` (tabular format) in the Colab file system.  
- Prints progress messages, including the number of listings generated and confirmation of file saves.  
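
The delimiter-based parsing described above can be tried on a small hardcoded sample without calling the LLM. The sample text and the `parse_listings` helper below are illustrative assumptions; the notebook's own `_parse_listings` follows the same idea but uses one regex per field, whereas this sketch assumes each field sits on a single `Label: value` line.

```python
import re

# Hypothetical LLM output: one complete listing and one incomplete listing.
SAMPLE = """
--- LISTING START ---
Neighborhood: Maple Heights
Price: $450,000
Bedrooms: 3
Bathrooms: 2
House Size: 1,800 sqft

Description: Bright craftsman home with a renovated kitchen.

Neighborhood Description: Tree-lined streets near parks and schools.
--- LISTING END ---
--- LISTING START ---
Neighborhood: Harbor Point
Price: $900,000
--- LISTING END ---
"""

BLOCK = re.compile(r"--- LISTING START ---\s*(.*?)\s*--- LISTING END ---", re.DOTALL)
# The longer label comes first so "Neighborhood Description" is not read as "Neighborhood".
FIELD = re.compile(
    r"^\s*(Neighborhood Description|Neighborhood|Price|Bedrooms|Bathrooms|House Size|Description):\s*(.+)$",
    re.MULTILINE,
)
REQUIRED = {
    "neighborhood", "price", "bedrooms", "bathrooms",
    "house_size", "description", "neighborhood_description",
}


def parse_listings(text):
    """Return (parsed listings, number of blocks skipped for missing fields)."""
    parsed, skipped = [], 0
    for block in BLOCK.findall(text):
        fields = {
            label.lower().replace(" ", "_"): value.strip()
            for label, value in FIELD.findall(block)
        }
        if REQUIRED.issubset(fields):
            parsed.append(fields)
        else:
            skipped += 1
    return parsed, skipped


listings, skipped = parse_listings(SAMPLE)
print(f"parsed={len(listings)} skipped={skipped}")
```

Running a parser like this against a saved LLM response is a quick way to see why a run yields fewer listings than requested, for example when the model omits the delimiters or a field.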


In [28]:
class ListingGenerator:
    def __init__(self):
        self.listings = []

    def generate_listings(self, num_listings=10):
        """Generate diverse real estate listings"""
        prompt = PromptTemplate.from_template(
            """Generate {num_listings} unique real estate listings. For each:

            --- LISTING START ---
            Neighborhood: [Unique name]
            Price: [$200,000-$2,000,000]
            Bedrooms: [2-6]
            Bathrooms: [1-5]
            House Size: [1,000-5,000 sqft]

            Description: [200-300 character detailed description of unique features]

            Neighborhood Description: [150-200 character description of community and amenities]
            --- LISTING END ---

            Ensure diversity in price, location type (urban/suburban/rural), style, and amenities."""
        )

        chain = prompt | llm | StrOutputParser()
        listings_text = chain.invoke({"num_listings": num_listings})

        self.listings = self._parse_listings(listings_text)
        if len(self.listings) < num_listings:
            print(f"Generated only {len(self.listings)} listings, adding fallback samples...")
            self.listings.extend(self._generate_sample_listings(num_listings - len(self.listings)))

        print(f"Total listings generated: {len(self.listings)}")

    def _parse_listings(self, text):
        """Parse LLM-generated listings"""
        pattern = r"--- LISTING START ---\s*(.*?)\s*--- LISTING END ---"
        raw_listings = re.findall(pattern, text, re.DOTALL)
        parsed = []

        for listing in raw_listings:
            try:
                fields = {
                    "neighborhood": r"Neighborhood:\s*(.*?)(?=\s*\n)",
                    "price": r"Price:\s*(.*?)(?=\s*\n)",
                    "bedrooms": r"Bedrooms:\s*(.*?)(?=\s*\n)",
                    "bathrooms": r"Bathrooms:\s*(.*?)(?=\s*\n)",
                    "house_size": r"House Size:\s*(.*?)(?=\s*\n)",
                    "description": r"Description:\s*(.*?)(?=\s*\n\s*Neighborhood Description:)",
                    "neighborhood_description": r"Neighborhood Description:\s*(.*)$"
                }
                listing_dict = {"id": str(uuid.uuid4())}

                for key, regex in fields.items():
                    match = re.search(regex, listing, re.DOTALL)
                    if match:
                        listing_dict[key] = match.group(1).strip()
                    else:
                        raise ValueError(f"Missing {key}")

                parsed.append(listing_dict)
            except Exception as e:
                print(f"Parsing error: {e}")
                continue

        return parsed

    def _generate_sample_listings(self, num):
        """Fallback sample listings"""
        sample = {
            "id": str(uuid.uuid4()),
            "neighborhood": "Pine Valley",
            "price": "$650,000",
            "bedrooms": "4",
            "bathrooms": "3",
            "house_size": "2,800 sqft",
            "description": "Modern 4-bed home with open floor plan, gourmet kitchen with island, and master suite with spa-like bath. Features smart home tech and a landscaped backyard.",
            "neighborhood_description": "Pine Valley offers suburban peace with top schools, parks, and easy highway access. Close to shops and dining."
        }
        samples = []
        for i in range(num):
            new_sample = sample.copy()
            new_sample["id"] = str(uuid.uuid4())
            new_sample["neighborhood"] = f"{sample['neighborhood']} {i+1}"
            # Keep thousands separators so prices match the "$650,000" format
            new_sample["price"] = f"${650000 + i * 50000:,}"
            samples.append(new_sample)
        return samples

    def save_listings(self):
        """Save listings to JSON and CSV"""
        # Save as JSON
        with open("listings.json", "w") as f:
            json.dump(self.listings, f, indent=2)
        print("Saved listings to listings.json")

        # Save as CSV
        with open("listings.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=self.listings[0].keys())
            writer.writeheader()
            writer.writerows(self.listings)
        print("Saved listings to listings.csv")

# Generate and save listings
print("Generating listings...")
generator = ListingGenerator()
generator.generate_listings(100)
generator.save_listings()

Generating listings...
Generated only 0 listings, adding fallback samples...
Total listings generated: 100
Saved listings to listings.json
Saved listings to listings.csv


#### Part 3: Initialize Vector Database and Load Listings
**Purpose:** This cell sets up the vector database and loads the generated listings for semantic search functionality.  
**What Happens:**  
- Defines a `HomeMatch` class to manage the application's core operations.  
- Initializes the class by loading the listings from `listings.json` into memory (100 in the run shown below).  
- Resets any existing ChromaDB collection named `real_estate_listings` to ensure a clean slate.  
- Creates a LangChain `Chroma` vector store on top of the ChromaDB client, with the embedding function used to vectorize data.  
- Converts each listing into a `Document` object, storing the full listing as JSON in `page_content` and key metadata (excluding descriptions) separately.  
- Adds these documents to the ChromaDB collection, enabling semantic search capabilities based on embeddings.  
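
How each listing is split into embedded text and filterable metadata can be sketched with plain dictionaries. The helper name `to_document_parts` and the sample listing are assumptions for illustration; in the notebook the two parts become the `page_content` and `metadata` of a LangChain `Document`.

```python
import json

# Long free-text fields stay out of the metadata.
TEXT_FIELDS = ("description", "neighborhood_description")


def to_document_parts(listing):
    """Return (page_content, metadata) for one listing dictionary."""
    page_content = json.dumps(listing)  # full listing, serialized as JSON text
    metadata = {k: v for k, v in listing.items() if k not in TEXT_FIELDS}
    return page_content, metadata


# Hypothetical listing in the same shape as the generated data.
listing = {
    "id": "example-id",
    "neighborhood": "Pine Valley",
    "price": "$650,000",
    "bedrooms": "4",
    "bathrooms": "3",
    "house_size": "2,800 sqft",
    "description": "Modern 4-bed home with open floor plan.",
    "neighborhood_description": "Suburban peace with top schools and parks.",
}

page_content, metadata = to_document_parts(listing)
print(sorted(metadata))
```

Because `page_content` holds the whole JSON string, the embedding is computed over field names and structured values as well as the two descriptions.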


In [29]:
class HomeMatch:
    def __init__(self, listings_file="listings.json"):
        self.listings = self._load_listings(listings_file)
        try:
            # Start from a clean collection; it may not exist on the first run
            client.delete_collection(collection_name)
        except Exception:
            pass
        self.db = Chroma(
            client=client,
            collection_name=collection_name,
            embedding_function=embeddings
        )

    def _load_listings(self, filename):
        """Load listings from JSON file"""
        with open(filename, "r") as f:
            listings = json.load(f)
        print(f"Loaded {len(listings)} listings from {filename}")
        return listings

    def store_listings_in_vector_db(self):
        """Store listings in ChromaDB"""
        if not self.listings:
            print("No listings to store!")
            return

        documents = [Document(
            page_content=json.dumps(listing),
            metadata={k: v for k, v in listing.items() if k != "description" and k != "neighborhood_description"}
        ) for listing in self.listings]

        # Pass the Document objects directly, not their page_content
        self.db.add_documents(documents=documents)
        print(f"Stored {len(documents)} listings in vector DB")

# Initialize HomeMatch and store listings
print("\nInitializing HomeMatch and storing listings...")
app = HomeMatch("listings.json")
app.store_listings_in_vector_db()


Initializing HomeMatch and storing listings...
Loaded 100 listings from listings.json
Stored 100 listings in vector DB


#### Part 4: Collect and Process Buyer Preferences
**Purpose:** This cell collects buyer preferences and transforms them into a semantic search query.  
**What Happens:**  
- Defines `get_buyer_preferences` to simulate user input with hardcoded preferences (e.g., 3 bedrooms, quiet neighborhood, specific amenities) as a dictionary of question-answer pairs.  
- Prints the collected preferences for visibility.  
- Defines `process_buyer_preferences`, which asks the LLM to condense these preferences into a concise search query (under 100 words).  
- Passes the preferences to the LLM via a prompt, parses the output as a string, and prints the resulting query. In the run shown below, the response opens with an introductory sentence, which becomes part of the query string.  
- Executes both functions to prepare the query for the next step.  
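
Chat models often wrap the requested text in an introductory sentence, which would then be embedded along with the query. A minimal sketch of post-processing the response before searching follows; `extract_query` and the sample response are hypothetical, and the heuristic (prefer a double-quoted span, otherwise drop a first line that ends in a colon) is an assumption rather than the notebook's behavior.

```python
import re


def extract_query(response):
    """Strip a leading preamble from an LLM response (heuristic sketch)."""
    text = response.strip()
    # Case 1: the model put the query in double quotes.
    quoted = re.search(r'"([^"]+)"', text)
    if quoted:
        return quoted.group(1).strip()
    # Case 2: the first line is an introduction ending in a colon.
    lines = text.splitlines()
    if len(lines) > 1 and lines[0].rstrip().endswith(":"):
        return "\n".join(lines[1:]).strip()
    # Otherwise return the response unchanged.
    return text


# Hypothetical response in the style of the output shown in this part.
sample = 'Here is a concise search query:\n\n"3 bedroom house in a quiet neighborhood with good schools"'
print(extract_query(sample))
```

An alternative is to tighten the prompt so the model returns only the query; the heuristic above would misfire on a query that itself contains a quoted phrase.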


In [30]:
def get_buyer_preferences():
    """Collect and return buyer preferences"""
    questions = [
        "How big do you want your house to be?",
        "What are 3 most important things for you in choosing this property?",
        "Which amenities would you like?",
        "Which transportation options are important to you?",
        "How urban do you want your neighborhood to be?"
    ]

    answers = [
        "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
        "A quiet neighborhood, good local schools, and convenient shopping options.",
        "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
        "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
        "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
    ]

    preferences = dict(zip(questions, answers))
    print("Buyer preferences collected:")
    for q, a in preferences.items():
        print(f"{q}: {a}")
    return preferences

def process_buyer_preferences(preferences):
    """Convert preferences to a search query"""
    prompt = PromptTemplate.from_template(
        """Convert these preferences into a concise search query (under 100 words):
        {preferences}"""
    )
    chain = prompt | llm | StrOutputParser()
    query = chain.invoke({"preferences": "\n".join(f"{q}: {a}" for q, a in preferences.items())})
    print(f"Generated search query: {query}")
    return query

# Collect and process preferences
print("\nCollecting buyer preferences...")
preferences = get_buyer_preferences()
print("\nProcessing preferences...")
query = process_buyer_preferences(preferences)


Collecting buyer preferences...
Buyer preferences collected:
How big do you want your house to be?: A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
What are 3 most important things for you in choosing this property?: A quiet neighborhood, good local schools, and convenient shopping options.
Which amenities would you like?: A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
Which transportation options are important to you?: Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
How urban do you want your neighborhood to be?: A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

Processing preferences...
Generated search query: Here is a concise search query based on the provided preferences:

"3 bedroom house for sale in quiet neighborhood with good schools, convenient shopping, and bus line access. Must have backyard, 2-car garag

#### Part 5: Search and Personalize Listings
**Purpose:** This cell performs a semantic search on the vector database and generates personalized listing descriptions based on buyer preferences.  
**What Happens:**  
- Defines `search_listings` to query the ChromaDB collection using the search query from Part 4, retrieving the top 3 matching listings based on semantic similarity.  
- Extracts the listing data from the search results by parsing the JSON `page_content` of each `Document`.  
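- Defines `personalize_listing` to use the LLM to rewrite each matched listing's description, emphasizing features that align with the buyer's preferences. The prompt does not instruct the model to keep to the listing's facts, so results should be checked against the original: in Match 1 below, the two-car garage and energy efficiency come from the buyer preferences, not from the listing.  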
- Formats the personalized output with a creative title and a "Why This Is Your Perfect Match" section with 3 reasons.  
- Executes the search and personalization, printing the results for each of the 3 matches.  
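
Semantic search ranks stored vectors by their similarity to the query vector. The toy sketch below shows top-k retrieval with cosine similarity, assuming hand-written 3-dimensional vectors in place of real `all-MiniLM-L6-v2` embeddings. The `top_k` helper and the listing names are illustrative, and ChromaDB's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=3):
    """Return the names of the k items most similar to the query vector."""
    ranked = sorted(
        items.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up vectors standing in for listing embeddings.
vectors = {
    "quiet suburban home": [0.9, 0.1, 0.0],
    "downtown loft": [0.1, 0.9, 0.1],
    "rural farmhouse": [0.6, 0.0, 0.8],
    "suburban townhouse": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded buyer query
print(top_k(query, vectors, k=2))
```

The ranking depends only on vector direction, which is why listings with near-identical text, such as the fallback samples, receive near-identical scores.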


In [31]:
def search_listings(query, k=3):
    """Search for matching listings"""
    results = app.db.similarity_search(query, k=k)
    matches = [json.loads(r.page_content) for r in results]
    print(f"Found {len(matches)} matching listings")
    return matches

def personalize_listing(listing, preferences):
    """Generate personalized description"""
    prompt = PromptTemplate.from_template(
        """Original listing:
        {listing}

        Buyer preferences:
        {preferences}

        Create a personalized description:
        # [Creative Title]
        [300-400 character description emphasizing buyer preferences]
        ## Why This Is Your Perfect Match
        - [Reason 1]
        - [Reason 2]
        - [Reason 3]"""
    )

    chain = prompt | llm | StrOutputParser()
    return chain.invoke({
        "listing": json.dumps(listing),
        "preferences": "\n".join(f"{q}: {a}" for q, a in preferences.items())
    })

# Search and personalize listings
print("\nSearching listings...")
matches = search_listings(query)

print("\nPersonalizing results...")
for i, match in enumerate(matches, 1):
    print(f"\nMatch {i}:")
    print(personalize_listing(match, preferences))
    print("-" * 50)


Searching listings...
Found 3 matching listings

Personalizing results...

Match 1:
Here is a personalized description based on the buyer's preferences:

# **Suburban Serenity with Modern Convenience**

Escape to this stunning 4-bedroom retreat in Pine Valley, boasting a spacious kitchen, cozy living room, and master suite with spa-like bath. Enjoy a quiet neighborhood with top schools, parks, and easy highway access.

## Why This Is Your Perfect Match
- **Quiet Neighborhood**: Pine Valley offers suburban peace, perfect for a comfortable and relaxing lifestyle.
- **Convenient Shopping**: Easy access to shops and dining, ensuring you're never far from what you need.
- **Modern Amenities**: This energy-efficient home features smart home tech, a landscaped backyard for gardening, and a two-car garage for added convenience.
--------------------------------------------------

Match 2:
Here is a personalized description tailored to the buyer's preferences:

# Serene Suburban Oasis

Escape t