# HomeMatch: Personalized Real Estate Matching Application

This notebook demonstrates the HomeMatch application, which uses large language models (LLMs) and a vector database to match home buyers with real estate listings and to personalize the listing descriptions. The application:

1. Generates synthetic real estate listings using LLMs
2. Stores these listings in a vector database
3. Collects buyer preferences
4. Performs semantic search to find matching properties
5. Personalizes property descriptions based on buyer preferences

## Project Structure

The project uses a modular architecture with the following components:

- **models/listing_generator.py**: Class for generating real estate listings
- **models/vector_db.py**: Class for managing the vector database
- **models/preference_manager.py**: Class for handling buyer preferences
- **models/listing_personalizer.py**: Class for personalizing listing descriptions
- **models/home_match.py**: Main application class that coordinates all components
- **utils/helpers.py**: Utility functions
- **main.py**: Script to run the application

This notebook will demonstrate how to use these components together to create a complete real estate matching system.

## 1. Setup and Dependencies

First, we'll install and import all necessary libraries:

In [1]:
# Install required packages, including ipykernel
!pip install langchain langchain-community langchain-openai openai chromadb python-dotenv pandas ipykernel jupyter


In [1]:
# Fix for "requires the ipykernel package" error
# Run this if you still have kernel issues after selecting the "HomeMatch" kernel
!pip install ipykernel --upgrade
!python -m ipykernel install --user --name=python3
!python -m ipykernel install --user --name=homematch --display-name="HomeMatch"
print("Restart Jupyter after running this cell, then select the 'HomeMatch' kernel")

Installed kernelspec python3 in /Users/pawel.ladyzynski/Library/Jupyter/kernels/python3
Installed kernelspec homematch in /Users/pawel.ladyzynski/Library/Jupyter/kernels/homematch
Restart Jupyter after running this cell, then select the 'HomeMatch' kernel


### Setting Up a Virtual Environment

#### For macOS/Linux:
```bash
chmod +x setup.sh
./setup.sh
```


After running the setup script:
1. Activate the virtual environment (if not already activated):
   ```bash
   # On macOS/Linux:
   source venv/bin/activate
   ```

2. Set your OpenAI API key in the `.env` file.

3. Start Jupyter and select the "HomeMatch" kernel (or another venv kernel if you have already created one):
   ```bash
   jupyter notebook
   ```

### IMPORTANT: Selecting the Correct Kernel

If you encounter the error "Running cells requires the ipykernel package," you need to select the correct kernel (the "HomeMatch" kernel registered in the cell above).
For detailed setup instructions, see the `SETUP.md` file in the project directory.

In [2]:
# Import necessary libraries and modules
import os
import sys
from pathlib import Path

# Add project root to path to allow imports from other directories
project_root = Path.cwd()
sys.path.append(str(project_root))

# Import modules from our project structure
from models.listing_generator import ListingGenerator
from models.vector_db import VectorDBManager
from models.preference_manager import PreferenceManager
from models.listing_personalizer import ListingPersonalizer
from models.home_match import HomeMatch
from utils.helpers import setup_environment


# Setup environment (load API keys from .env)
setup_environment()

# Check if API key is set
if not os.environ.get("OPENAI_API_KEY"):
    print("WARNING: OpenAI API key not found. Please set it in .env file or add it below:")
    # os.environ["OPENAI_API_KEY"] = "your-api-key-here"
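
The cell above relies on `setup_environment()` to load API keys from `.env`, but that helper's source is not shown in this notebook. As a rough illustration of the idea only, here is a minimal standard-library sketch that reads `KEY=VALUE` lines into the environment. The function name `load_env_file` and its behavior are assumptions for illustration; the project itself installs `python-dotenv`, which is likely what the real helper uses.

```python
import os
from pathlib import Path


def load_env_file(path=".env", environ=None):
    """Load KEY=VALUE pairs from a .env-style file into `environ`.

    Hypothetical helper, not the project's setup_environment().
    Variables that are already set are left untouched, so values exported
    in the shell take precedence over the file.
    Returns the list of variable names that were added.
    """
    environ = os.environ if environ is None else environ
    env_path = Path(path)
    added = []
    if not env_path.is_file():
        return added
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip("'\"")
        if key and key not in environ:
            environ[key] = value
            added.append(key)
    return added
```

Passing a plain dictionary as `environ` makes the function easy to try without touching the real process environment.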

In [15]:
# Set up OpenAI API specifically for Vocareum
import os
import openai

# Vocareum-specific OpenAI API configuration
openai.api_base = "https://openai.vocareum.com/v1"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

# Option 1: set your Vocareum API key directly (replace the placeholder with your actual key)
# vocareum_api_key = "your-vocareum-key"
# os.environ["OPENAI_API_KEY"] = vocareum_api_key
# Option 2 (used here): read the API key that was loaded from the .env file
vocareum_api_key = os.getenv("OPENAI_API_KEY")
if not vocareum_api_key:
    print("WARNING: No API key found in .env file. Please add your Vocareum API key to the .env file.")
  
openai.api_key = vocareum_api_key

# Print configuration for verification
print(f"OpenAI API Base URL: {openai.api_base}")
print(f"OpenAI API Key set: {'Yes' if openai.api_key else 'No'}")
print(f"OPENAI_API_KEY environment variable set: {'Yes' if os.environ.get('OPENAI_API_KEY') else 'No'}")
print(f"OPENAI_API_BASE environment variable set: {'Yes' if os.environ.get('OPENAI_API_BASE') else 'No'}")

# For newer OpenAI client
try:
    from openai import OpenAI
    client = OpenAI(
        base_url = "https://openai.vocareum.com/v1",
        api_key = vocareum_api_key
    )
    print("Successfully created OpenAI client with Vocareum configuration")
except Exception as e:
    print(f"Error creating OpenAI client: {str(e)}")

OpenAI API Base URL: https://openai.vocareum.com/v1
OpenAI API Key set: Yes
OPENAI_API_KEY environment variable set: Yes
OPENAI_API_BASE environment variable set: Yes
Successfully created OpenAI client with Vocareum configuration


In [4]:
# Test LangChain integration with Vocareum OpenAI API
import os
import sys
from pathlib import Path

# Ensure Vocareum configuration is applied to LangChain
try:
    # Import from langchain-openai; the langchain_community OpenAI classes are deprecated
    from langchain_openai import OpenAI, OpenAIEmbeddings
    
    # Create a test OpenAI LLM instance with explicit configuration
    llm = OpenAI(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        openai_api_base=os.environ["OPENAI_API_BASE"],
        temperature=0.7
    )
    
    # Test embeddings
    embeddings = OpenAIEmbeddings(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        openai_api_base=os.environ["OPENAI_API_BASE"]
    )
    
    # Try a simple completion
    try:
        print("Testing simple LLM completion...")
        response = llm.invoke("Hello, this is a test for Vocareum OpenAI API. Please respond with 'API is working!'")
        print(f"LLM Response: {response}")
        print("✅ LangChain OpenAI LLM is configured correctly")
    except Exception as e:
        print(f"❌ Error using LLM: {str(e)}")
    
    # Try a simple embedding
    try:
        print("\nTesting embeddings...")
        test_embedding = embeddings.embed_query("This is a test")
        print(f"Embedding dimensions: {len(test_embedding)}")
        print("✅ LangChain OpenAI Embeddings are configured correctly")
    except Exception as e:
        print(f"❌ Error using embeddings: {str(e)}")
        
except ImportError as e:
    print(f"❌ Error importing LangChain modules: {str(e)}")
except Exception as e:
    print(f"❌ Unexpected error: {str(e)}")

print("\nIf you see any errors above, check that:")
print("1. You have entered the correct Vocareum API key")
print("2. You have installed all required packages: langchain, langchain-community, langchain-openai")
print("3. Your Vocareum API is active and has access to OpenAI services")


Testing simple LLM completion...
LLM Response: 

API is working!
✅ LangChain OpenAI LLM is configured correctly

Testing embeddings...
Embedding dimensions: 1536
✅ LangChain OpenAI Embeddings are configured correctly

If you see any errors above, check that:
1. You have entered the correct Vocareum API key
2. You have installed all required packages: langchain, langchain-community, langchain-openai
3. Your Vocareum API is active and has access to OpenAI services


## 2. Generate Real Estate Listings

In this section, we'll use OpenAI's LLM to generate synthetic real estate listings. We'll create at least 10 diverse listings with:
- Neighborhood information
- Price
- Number of bedrooms and bathrooms
- House size
- Detailed property description
- Neighborhood description

These listings will be used to populate our vector database.
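
The `ListingGenerator` source is not shown here, so the following is only a sketch of one way to turn labeled LLM output into the dictionary keys this notebook reads (`neighborhood`, `price`, and so on). It assumes the model answers with one `Label: value` entry per field; the names `FIELD_LABELS`, `parse_listing`, and `missing_fields` are hypothetical. Checking for empty fields right after parsing makes incomplete generations visible before they are stored.

```python
import re

# Labels follow the field list above; values are the dictionary keys used
# elsewhere in this notebook. The Label: value layout is an assumption.
FIELD_LABELS = {
    "Neighborhood": "neighborhood",
    "Price": "price",
    "Bedrooms": "bedrooms",
    "Bathrooms": "bathrooms",
    "House Size": "house_size",
    "Description": "description",
    "Neighborhood Description": "neighborhood_description",
}
_KEY_BY_LABEL = {label.lower(): key for label, key in FIELD_LABELS.items()}

# Match a known label at the start of a line, followed by a colon.
# Longer labels are tried first so "Neighborhood Description" wins
# over "Neighborhood".
_LABEL_RE = re.compile(
    r"^\s*("
    + "|".join(re.escape(label) for label in sorted(FIELD_LABELS, key=len, reverse=True))
    + r")\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)


def parse_listing(text):
    """Split labeled text into a listing dict; absent fields stay empty."""
    listing = {key: "" for key in FIELD_LABELS.values()}
    matches = list(_LABEL_RE.finditer(text))
    for index, match in enumerate(matches):
        # A field's value runs until the next label (or the end of the text).
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        key = _KEY_BY_LABEL[match.group(1).lower()]
        listing[key] = " ".join(text[match.end():end].split())
    return listing


def missing_fields(listing):
    """Return the keys whose values are empty, in field order."""
    return [key for key, value in listing.items() if not value]
```

A generator loop could call `missing_fields` on each parsed listing and regenerate or skip any listing for which the list is not empty.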

In [5]:
# Initialize the ListingGenerator
from models.listing_generator import ListingGenerator
listing_generator = ListingGenerator(temperature=0.7)

# Generate listings
print("Generating real estate listings...")
listings = listing_generator.generate_listings(num_listings=10)

# Display first few listings
for i, listing in enumerate(listings[:3]):
    print(f"Listing {i+1}:")
    print(f"Neighborhood: {listing['neighborhood']}")
    print(f"Price: {listing['price']}")
    print(f"Bedrooms: {listing['bedrooms']}")
    print(f"Bathrooms: {listing['bathrooms']}")
    print(f"House Size: {listing['house_size']}")
    print(f"Description: {listing['description']}")
    print(f"Neighborhood Description: {listing['neighborhood_description']}")
    print("-" * 80)

# Save listings to a JSON file
listing_generator.save_listings_to_file(listings, 'data/listings.json')

print(f"All {len(listings)} listings saved to 'data/listings.json'")


Generating real estate listings...
Generating listing 1/10...
Generating listing 2/10...
Generating listing 3/10...
Generating listing 4/10...
Generating listing 5/10...
Generating listing 6/10...
Generating listing 7/10...
Generating listing 8/10...
Generating listing 9/10...
Generating listing 10/10...
Listing 1:
Neighborhood: Oakhurst Park
Price: $850,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft
Description: 
Neighborhood Description: 
--------------------------------------------------------------------------------
Listing 2:
Neighborhood: Oakwood Heights
Price: 
Bedrooms: 
Bathrooms: 
House Size: 
Description: This charming colonial-style home in the coveted neighborhood of Oakwood Heights is the perfect blend of traditional and modern living. As you enter the home, you are greeted by a spacious living room with a cozy fireplace and large windows that fill the room with natural light. The recently renovated kitchen features granite countertops, stainless steel appliances, an

## 3. Store Listings in Vector Database

Now, we'll store our generated listings in a vector database (ChromaDB). We'll:
1. Initialize the vector database
2. Convert each listing into embeddings that capture semantic meaning
3. Store these embeddings in the database for later retrieval

The vector database will allow us to perform semantic searches based on buyer preferences.
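
Before a listing can be embedded, it has to be turned into a piece of text plus some metadata. The sketch below shows one plausible split: free-text fields are embedded, structured fields are kept as metadata, and a content hash serves as a stable ID so that re-running the notebook does not add the same listing twice. The helper names and the choice of fields are assumptions; `VectorDBManager` may do this differently.

```python
import hashlib

# Assumed split: descriptive fields are embedded, structured fields are
# stored as metadata for display and filtering.
TEXT_FIELDS = ("neighborhood", "description", "neighborhood_description")
METADATA_FIELDS = ("neighborhood", "price", "bedrooms", "bathrooms", "house_size")


def listing_to_record(listing):
    """Return (doc_id, text, metadata) for one listing dictionary."""
    parts = [str(listing.get(field, "")).strip() for field in TEXT_FIELDS]
    text = "\n".join(part for part in parts if part)
    metadata = {field: str(listing.get(field, "")) for field in METADATA_FIELDS}
    # Identical text always yields the same ID, which allows deduplication.
    doc_id = hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]
    return doc_id, text, metadata


def build_records(listings):
    """Convert listings to parallel lists, skipping empty and duplicate ones."""
    ids, texts, metadatas = [], [], []
    seen = set()
    for listing in listings:
        doc_id, text, metadata = listing_to_record(listing)
        if not text or doc_id in seen:
            continue
        seen.add(doc_id)
        ids.append(doc_id)
        texts.append(text)
        metadatas.append(metadata)
    return ids, texts, metadatas
```

The three parallel lists have the shape that LangChain's `Chroma.from_texts` accepts through its `texts`, `metadatas`, and `ids` arguments.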

In [6]:
# Initialize the VectorDBManager
from models.vector_db import VectorDBManager
vector_db = VectorDBManager(persist_directory="data/vectordb")

# Initialize vector database with listings
vector_db.initialize_with_listings(listings)

print("Vector database initialized successfully")


Loaded existing vector database from data/vectordb
Vector database initialized with 10 listings
Vector database initialized successfully




## 4. Build User Preference Interface

In this section, we'll create a function to collect buyer preferences. We'll:
1. Define a set of questions about desired home features
2. Collect responses (either interactively or using predefined answers)
3. Process these preferences to prepare them for searching
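
The three steps above can be sketched with two small functions. These are standalone illustrations that borrow the method names used in this notebook (`collect_preferences`, `combine_preferences`); the actual `PreferenceManager` implementation is not shown and may differ. The `ask` parameter is an assumption added so the interactive path can be exercised without typing.

```python
def collect_preferences(questions, default_answers, interactive=False, ask=input):
    """Return one answer per question.

    In non-interactive mode the defaults are returned unchanged. In
    interactive mode an empty reply falls back to the default answer.
    """
    if len(questions) != len(default_answers):
        raise ValueError("each question needs a default answer")
    if not interactive:
        return list(default_answers)
    answers = []
    for question, default in zip(questions, default_answers):
        reply = ask(f"{question} ").strip()
        answers.append(reply or default)
    return answers


def combine_preferences(answers):
    """Join the answers into a single query string for semantic search."""
    cleaned = []
    for answer in answers:
        text = " ".join(str(answer).split())  # collapse stray whitespace
        if not text:
            continue
        if text[-1] not in ".!?":
            text += "."
        cleaned.append(text)
    return " ".join(cleaned)
```

Joining every answer into one string keeps the search step simple, at the cost of weighting all preferences equally.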

In [7]:
# Initialize the PreferenceManager
from models.preference_manager import PreferenceManager
preference_manager = PreferenceManager()

# Collect buyer preferences (using defaults for now)
buyer_preferences = preference_manager.collect_preferences(interactive=False)

# Display the collected preferences
preference_manager.display_preferences(
    preference_manager.default_questions,
    buyer_preferences
)

# Combine preferences into a single query
preference_query = preference_manager.combine_preferences(buyer_preferences)
print("\nCombined Preference Query:")
print(preference_query)

Buyer Preferences:
1. How big do you want your house to be?
   A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
2. What are 3 most important things for you in choosing this property?
   A quiet neighborhood, good local schools, and convenient shopping options.
3. Which amenities would you like?
   A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
4. Which transportation options are important to you?
   Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
5. How urban do you want your neighborhood to be?
   A balance between suburban tranquility and access to urban amenities like restaurants and theaters.


Combined Preference Query:
A comfortable three-bedroom house with a spacious kitchen and a cozy living room. A quiet neighborhood, good local schools, and convenient shopping options. A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.

## 5. Implement Semantic Search

Now we'll implement the semantic search functionality:
1. Process the buyer preferences
2. Convert preferences to embeddings
3. Query the vector database to find matching listings
4. Rank results by relevance
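
To make the ranking step concrete, here is a brute-force sketch of similarity search over embedding vectors using cosine similarity. It is an illustration of the idea only: a vector database such as ChromaDB uses its own index and configured distance metric, and how this project derives `similarity_score` is not shown. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(query_vector, document_vectors, k=3):
    """Return [(index, score), ...] for the k most similar documents."""
    scored = [
        (cosine_similarity(query_vector, vector), index)
        for index, vector in enumerate(document_vectors)
    ]
    # Highest score first; ties are broken by the original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(index, score) for score, index in scored[:k]]
```

In practice the query vector would come from embedding the combined preference string, and the document vectors from embedding each listing.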

In [8]:
# Search for matching listings using vector database
matching_listings = vector_db.search(preference_query, num_results=3)

# Display matching listings
print("Top Matching Listings:")
for i, listing in enumerate(matching_listings):
    print(f"\nMatch {i+1} (Similarity: {listing['similarity_score']:.4f}):")
    print(f"Neighborhood: {listing['neighborhood']}")
    print(f"Price: {listing['price']}")
    print(f"Bedrooms: {listing['bedrooms']}")
    print(f"Bathrooms: {listing['bathrooms']}")
    print(f"House Size: {listing['house_size']}")
    print(f"Description: {listing['description']}")
    print(f"Neighborhood Description: {listing['neighborhood_description']}")
    print("-" * 80)

Top Matching Listings:

Match 1 (Similarity: 0.6899):
Neighborhood: West Haven
Price: 
Bedrooms: 
Bathrooms: 
House Size: 
Description: This stunning 3 bedroom, 2.5 bathroom home in the vibrant neighborhood of West Haven is the perfect blend of modern luxury and historic charm. As you enter through the grand front porch, you are greeted by a spacious living room with soaring ceilings and large windows that flood the space with natural light. The recently renovated kitchen boasts high-end stainless steel appliances, quartz countertops, and a large island, making it a chef's dream. The primary bedroom features a luxurious en-suite bathroom and a walk-in closet. The backyard oasis is complete with a deck, perfect for outdoor entertaining, and a detached studio, perfect for a home office or gym. This home also has a two-car garage and a long driveway, providing ample parking space. Located just steps away from trendy restaurants, cafes, and shops, and only a short commute to downtown, this

## 6. Personalize Listing Descriptions

In this section, we'll use the LLM to personalize the descriptions of the matching listings based on buyer preferences:
1. For each matched listing, identify aspects that align with buyer preferences
2. Use the LLM to rewrite the description emphasizing these aspects
3. Ensure factual integrity is maintained
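
One way to approach step 3 is to state the factual constraints explicitly in the prompt and pass the structured fields separately from the prose. The sketch below builds such a prompt as a plain string. The template wording and the names `PERSONALIZE_TEMPLATE` and `build_personalization_prompt` are assumptions, not the prompt used by `ListingPersonalizer`.

```python
# Hypothetical prompt template; the project's actual prompt is not shown.
PERSONALIZE_TEMPLATE = """You are a real estate copywriter.
Rewrite the listing description below for a specific buyer.

Rules:
- Emphasize features that match the buyer's preferences.
- Do not add, remove, or change any facts about the property.
- If the listing does not address a preference, do not mention it.

Buyer preferences:
{preferences}

Listing facts:
{facts}

Original description:
{description}

Personalized description:"""

FACT_LABELS = (
    ("neighborhood", "Neighborhood"),
    ("price", "Price"),
    ("bedrooms", "Bedrooms"),
    ("bathrooms", "Bathrooms"),
    ("house_size", "House Size"),
)


def build_personalization_prompt(listing, preferences):
    """Fill the template, listing only the facts that are actually present."""
    facts = "\n".join(
        f"- {label}: {listing[key]}"
        for key, label in FACT_LABELS
        if str(listing.get(key, "")).strip()
    )
    wishes = "\n".join(f"- {item}" for item in preferences if str(item).strip())
    return PERSONALIZE_TEMPLATE.format(
        preferences=wishes,
        facts=facts or "- (none provided)",
        description=str(listing.get("description", "")).strip(),
    )
```

Omitting empty fields from the facts section avoids inviting the model to fill in values that the listing never stated.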

In [9]:
# Initialize the ListingPersonalizer
from models.listing_personalizer import ListingPersonalizer
listing_personalizer = ListingPersonalizer(temperature=0.5)

# Personalize descriptions for matching listings
personalized_listings = listing_personalizer.personalize_listings(matching_listings, buyer_preferences)

# Display personalized listings
listing_personalizer.display_personalized_listings(personalized_listings)

Personalizing description for listing in West Haven...
Personalizing description for listing in West Haven...
Personalizing description for listing in West Haven...

=== PERSONALIZED LISTINGS ===

Personalized Match 1:
Neighborhood: West Haven
Price: 
Bedrooms: 
Bathrooms: 
House Size: 

ORIGINAL Description: This stunning 3 bedroom, 2.5 bathroom home in the vibrant neighborhood of West Haven is the perfect blend of modern luxury and historic charm. As you enter through the grand front porch, you are greeted by a spacious living room with soaring ceilings and large windows that flood the space with natural light. The recently renovated kitchen boasts high-end stainless steel appliances, quartz countertops, and a large island, making it a chef's dream. The primary bedroom features a luxurious en-suite bathroom and a walk-in closet. The backyard oasis is complete with a deck, perfect for outdoor entertaining, and a detached studio, perfect for a home office or gym. This home also has a t

## 7. Test the Complete Application

Now we'll put everything together and test the complete HomeMatch application:
1. Define a function to run the entire pipeline
2. Test with different buyer preferences
3. Analyze the results
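
For the analysis step, a few small helpers can make two runs easier to compare. This sketch assumes each result is a dictionary with a `neighborhood` key, like the entries of `matching_listings` earlier in the notebook; the return type of `HomeMatch.run` is not shown, so treat that as an assumption. The helper names are hypothetical.

```python
from collections import Counter


def neighborhoods(matches):
    """List the non-empty neighborhood names in result order."""
    return [match["neighborhood"] for match in matches if match.get("neighborhood")]


def repeated_neighborhoods(matches):
    """Names that occur more than once within a single result list."""
    counts = Counter(neighborhoods(matches))
    return sorted(name for name, count in counts.items() if count > 1)


def compare_runs(first, second):
    """Show which neighborhoods two result lists share or do not share."""
    a, b = set(neighborhoods(first)), set(neighborhoods(second))
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }
```

A name reported by `repeated_neighborhoods` is worth a closer look, since repeats within one result list can indicate that the same listing was stored in the vector database more than once.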

In [10]:
# Initialize the HomeMatch application
home_match = HomeMatch()

# Run the complete application
results = home_match.run(num_listings=10, num_results=3, interactive=False)

# The run method handles:
# 1. Loading or generating listings
# 2. Setting up the vector database
# 3. Collecting buyer preferences
# 4. Searching for matching listings
# 5. Personalizing descriptions
# 6. Displaying results

Loaded existing vector database from /Users/pawel.ladyzynski/Desktop/Udacity/udacity-project-rag-realestate/data/vectordb
=== RUNNING HOMEMATCH APPLICATION ===

Loaded 10 listings from '/Users/pawel.ladyzynski/Desktop/Udacity/udacity-project-rag-realestate/data/listings.json'
Vector database initialized with 10 listings
Buyer Preferences:
1. How big do you want your house to be?
   A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
2. What are 3 most important things for you in choosing this property?
   A quiet neighborhood, good local schools, and convenient shopping options.
3. Which amenities would you like?
   A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
4. Which transportation options are important to you?
   Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
5. How urban do you want your neighborhood to be?
   A balance between suburban tranquility and access to u

In [12]:
# Test with a different set of buyer preferences
alternative_preferences = [
    "I'm looking for a luxury condo with at least 2 bedrooms in an urban setting.",
    "Modern design, high-end finishes, and good security are my priorities.",
    "I'd like a fitness center, rooftop terrace, and concierge service if possible.",
    "I need to be within walking distance of public transit and close to downtown.",
    "I prefer a vibrant urban neighborhood with restaurants, shopping, and nightlife."
]

# Override default preferences in HomeMatch
home_match.preference_manager.default_answers = alternative_preferences

# Run HomeMatch with alternative preferences
alternative_results = home_match.run(num_results=3, interactive=False)

# Note: The listings and vector database are reused from the previous run

=== RUNNING HOMEMATCH APPLICATION ===

Loaded 10 listings from '/Users/pawel.ladyzynski/Desktop/Udacity/udacity-project-rag-realestate/data/listings.json'
Vector database initialized with 10 listings
Buyer Preferences:
1. How big do you want your house to be?
   I'm looking for a luxury condo with at least 2 bedrooms in an urban setting.
2. What are 3 most important things for you in choosing this property?
   Modern design, high-end finishes, and good security are my priorities.
3. Which amenities would you like?
   I'd like a fitness center, rooftop terrace, and concierge service if possible.
4. Which transportation options are important to you?
   I need to be within walking distance of public transit and close to downtown.
5. How urban do you want your neighborhood to be?
   I prefer a vibrant urban neighborhood with restaurants, shopping, and nightlife.


Searching for matching listings...

Personalizing descriptions...
Personalizing description for listing in Northside Heights...

## Summary and Conclusion

We've successfully built the HomeMatch application, which:

1. Generates synthetic real estate listings using LLMs
2. Stores these listings in a vector database with semantic embeddings
3. Collects and processes buyer preferences
4. Performs semantic search to find matching properties
5. Personalizes property descriptions based on buyer preferences


## Next Steps and Improvements

Some potential enhancements for the application include:
- Implementing a web interface for better user interaction
- Adding more sophisticated preference parsing
- Expanding the listing database with more properties
- Adding filters for specific criteria like price range or number of bedrooms
- Implementing user feedback to improve matching over time
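
As an illustration of the filtering idea, the sketch below applies a price ceiling and a bedroom minimum to listing dictionaries after semantic search. It assumes prices and sizes are stored as display strings such as `$850,000` or `2,500 sqft`, as in the sample output earlier in this notebook. The function names are hypothetical, and dropping listings with missing values is a design choice rather than a requirement.

```python
import re


def parse_number(value):
    """Extract the first number from a display string, or None if absent."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(value))
    if not match:
        return None
    return float(match.group(0).replace(",", ""))


def filter_listings(listings, max_price=None, min_bedrooms=None):
    """Keep listings that satisfy every filter that was provided.

    A listing whose value cannot be parsed is dropped when the matching
    filter is active, because it cannot be shown to satisfy the filter.
    """
    kept = []
    for listing in listings:
        price = parse_number(listing.get("price", ""))
        bedrooms = parse_number(listing.get("bedrooms", ""))
        if max_price is not None and (price is None or price > max_price):
            continue
        if min_bedrooms is not None and (bedrooms is None or bedrooms < min_bedrooms):
            continue
        kept.append(listing)
    return kept
```

Storing numeric values as numbers in the vector database metadata would allow the same constraints to be applied at query time instead of after retrieval.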