## HomeMatch

**Purpose**: AI-powered property matching application that finds homes based on natural language user preferences.

**Main Components**:
- Natural language preference collection
- AI-powered preference parsing (GPT-3.5)
- Vector database search (ChromaDB with OpenAI embeddings)
- Property recommendation generation

**Database & Embeddings**:
- Property listings from `house_listings.csv` are loaded and split into chunks
- Each listing is converted to embeddings using OpenAI's embedding model
- Embeddings are stored in ChromaDB vector database for semantic similarity search
- User preferences are parsed and converted to a query embedding
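- System finds matching properties by nearest-neighbor search in the embedding space (Chroma's default distance is squared L2, which ranks unit-length embeddings such as OpenAI's in the same order as cosine similarity)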
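
The retrieval step above can be illustrated with a small, self-contained sketch of top-k search scored by cosine similarity. The three-dimensional vectors and listing names are made up for illustration (real OpenAI embeddings have far more dimensions), and `cosine_similarity` and `top_k` are hypothetical helpers, not part of ChromaDB or LangChain.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, listing_vecs, k=2):
    """Return the k listing ids most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), listing_id)
              for listing_id, vec in listing_vecs.items()]
    scored.sort(reverse=True)
    return [listing_id for _, listing_id in scored[:k]]

# Toy "embeddings" (assumed values, for illustration only).
listing_vecs = {
    "central_apartment": [0.9, 0.1, 0.2],
    "suburban_house": [0.1, 0.9, 0.3],
    "rural_villa": [0.0, 0.3, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

print(top_k(query_vec, listing_vecs, k=2))
```

In the notebook itself this work is delegated to the vector store; the sketch only shows what "closest in embedding space" means.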

**Final Selection Chain**:
- A QA (Question-Answering) chain is used to analyze the top matching properties
- The chain evaluates all candidate listings against user preferences
- GPT-3.5 generates the final recommendation, selecting the best listing for the user
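
As a rough illustration of what a "stuff"-style QA chain does before calling the model, the sketch below joins all candidate listings into one context block and places it in a single prompt. `build_stuff_prompt` is a hypothetical helper, the blank-line separator is an assumption, and the two shortened listings are made-up examples; the real chain in this notebook is built with LangChain's `load_qa_chain`.

```python
def build_stuff_prompt(listings, query):
    """Concatenate all candidate listings into one context block ("stuffing")."""
    context = "\n\n".join(listings)  # assumed separator between listings
    return (
        "Based on the house listings in the context, answer the question.\n\n"
        f"Question: {query}\n\n"
        f"Context: {context}\n\n"
        "Answer:"
    )

# Shortened, made-up listings for illustration only.
listings = [
    "Neighborhood: Baixa\nProperty Type: Apartment\nBedrooms: 2",
    "Neighborhood: Alfama\nProperty Type: House\nBedrooms: 3",
]
prompt_text = build_stuff_prompt(listings, "Which listing suits a small household?")
print(prompt_text)
```

Because every candidate is sent in one request, this approach only works while the combined listings fit in the model's context window.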

**Input**: 
- The user is prompted to enter home preferences in natural language
- If the user enters nothing (empty input), the system falls back to the default input:


  > I'm looking for a small appartment near easy transportation to access OLX office. Should be affordable, I need a garage for storage my bikes, access to public transportation, and that has lots of open space for walking a dog and exercise. Prefer a more city central next to live activities and restaurants.


**Output**: 
- Matching property recommendations from `house_listings.csv`


In [1]:
import os
import warnings
warnings.filterwarnings('ignore')  # Suppress telemetry warnings
import uuid
import pandas as pd

from langchain.document_loaders.csv_loader import CSVLoader
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain



# Replace the placeholder with your own key; never commit a real key to a notebook.
os.environ["OPENAI_API_KEY"] = "YOUR API KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

listings_file_name = "house_listings.csv"


# init llm
model_name = "gpt-3.5-turbo"
llm = ChatOpenAI(model_name=model_name, temperature=0, max_tokens=2000)



In [2]:
# Step 4: Building the User Preference Interface
# Example questions and answers (hardcoded for demonstration)
questions = [
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]

answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]


In [3]:
# ============================================================
# Default input: used if the user passes an empty string.
# ============================================================
default_input = """
I'm looking for a small appartment near easy transportation to access OLX office. 
Should be affordable, I need a garage for storage my bikes, 
access to public transportation, 
and that has lots of open space for walking a dog and exercise. 
Prefer a more city central next to live activities and restaurants.
"""

In [4]:
# Collect buyer preferences as natural language input.
# Returns a single string with all preferences.
def collect_preferences_natural_language():
    print("Please describe your home preferences in natural language.")
    print("Include details about: size, bedrooms, bathrooms, location, amenities, transportation, neighborhood type, etc.\n")
    print("æÆÆæ >>> there should be some input box arround, look for it <<< \n")
    preferences = input("Your preferences: ").strip()

    if preferences == "":
        print("User optout, using default input...")
        preferences = default_input

    return preferences

In [5]:
# Interpret and structure the user's preferences into a query for the vector database.
def parse_preferences_from_text(preferences_text):
    
    parse_prompt = f"""
    Based on the following user preferences described in natural language, extract and structure the key information for a home search.
    
    User preferences:
    {preferences_text}
    
    Extract the following information:
    1. Price range (if mentioned), also try to deduce from the preferences if the user is looking for a cheap or expensive house
    2. Number of bedrooms (if mentioned), also try to deduce from the preferences if the user is looking for a single or multiple bedrooms
    3. Number of bathrooms (if mentioned), also try to deduce from the preferences if the user is looking for a single or multiple bathrooms
    4. House size preferences (sqm, square meters, square footage, etc.)
    5. Location/neighborhood preferences
    6. Amenities (garage, backyard, etc.)
    7. Transportation needs
    8. Neighborhood characteristics (urban, suburban, quiet, etc.)
    9. Any other specific requirements
    
    Create a comprehensive, natural language query string that captures all these preferences.
    This query will be used to search a vector database of house listings.
    Make it detailed and specific to help find the best matching properties.
    
    Return ONLY the query string, no additional text or formatting.
    """
    
    response = llm([HumanMessage(content=parse_prompt)])
    return response.content.strip()


In [6]:
# prepare the database
def prepare_vector_db():

    # open the csv file and load the listings
    loader = CSVLoader(file_path='./' + listings_file_name)
    docs = loader.load()

    # Load CSV with pandas to extract metadata
    df = pd.read_csv(listings_file_name)
    
    # Add metadata to each document
    for i, doc in enumerate(docs):
        if i < len(df):
            row = df.iloc[i]
            # Add metadata from CSV columns
            doc.metadata.update({
                'listing_id': i,
                'neighborhood': str(row['Neighborhood']) if pd.notna(row['Neighborhood']) else '',
                'property_type': str(row['Property Type']) if pd.notna(row['Property Type']) else '',
                'price': str(row['Price']) if pd.notna(row['Price']) else '',
                'bedrooms': int(row['Bedrooms']) if pd.notna(row['Bedrooms']) else None,
                'bathrooms': int(row['Bathrooms']) if pd.notna(row['Bathrooms']) else None,
                'house_size': str(row['House Size']) if pd.notna(row['House Size']) else '',
            })

    # split the listings into chunks
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=1)
    split_docs = splitter.split_documents(docs)

    # create the embeddings
    embeddings = OpenAIEmbeddings()

    # create the database
    unique_collection_name = f"listings_{uuid.uuid4().hex[:8]}"
    db = Chroma.from_documents(split_docs, embeddings, collection_name=unique_collection_name)
    print(f"Total docs in collection: {len(db._collection.get()['ids'])}")
    return db
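
The metadata attached in `prepare_vector_db` is not used by the search later in this notebook. One way it could be put to work is to post-filter the retrieved results, sketched below with plain dicts standing in for each document's `metadata`. `parse_price_eur` and `filter_by_metadata` are hypothetical helpers, the sample values mirror the price format seen in the listings, and whole-euro prices are assumed.

```python
def parse_price_eur(price_text):
    """Convert a price string such as '€350,000' to an integer; None if no digits."""
    digits = "".join(ch for ch in price_text if ch.isdigit())
    return int(digits) if digits else None

def filter_by_metadata(results, property_type=None, max_price=None):
    """Keep results whose metadata satisfies the given constraints."""
    kept = []
    for meta in results:
        if property_type and meta.get("property_type") != property_type:
            continue
        price = parse_price_eur(meta.get("price", ""))
        if max_price is not None and (price is None or price > max_price):
            continue
        kept.append(meta)
    return kept

# Sample metadata dicts (illustrative values).
results = [
    {"listing_id": 20, "property_type": "Apartment", "price": "€350,000"},
    {"listing_id": 44, "property_type": "Apartment", "price": "€400,000"},
]
affordable = filter_by_metadata(results, property_type="Apartment", max_price=375_000)
print([meta["listing_id"] for meta in affordable])
```

Filtering after retrieval keeps the semantic search unchanged while still enforcing hard constraints such as a budget ceiling.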


In [7]:
# Run the program
print("=" * 60)
print("starting home match ai #>")


# read the user preferences
print("Collecting user preferences...")
user_preferences = collect_preferences_natural_language()

# parse user preferences into a query string for the vector database
print("Parsing user preferences...")
user_query = parse_preferences_from_text(user_preferences)

print("\n" + "-" * 60)
print("\nUser query:")
print(f"\n{user_query}")
print("\n" + "-" * 60)

# prepare the vector database
print("Preparing DB...")
db = prepare_vector_db()

# Search the vector database using the parsed buyer preferences.
# Parameters:
#   query: parsed preference query string
#   k: number of results to return (10 here)
# Returns:
#   similar_docs: list of Document objects matching the preferences
similar_docs = db.similarity_search(user_query, k=10)

# Also show the raw listings for reference
print("\n" + "=" * 60)
print(f"\nFound {len(similar_docs)} similar listings (raw results):\n")
for i, listing in enumerate(similar_docs, 1):
    print(f"Listing {i}:")
    print(listing.page_content)
    print("\n" + "-" * 60 + "\n")


# Create a prompt template for the QA chain
prompt = PromptTemplate(
    template="""
    Based on the house listings in the context, answer the following question about properties that match the user's preferences.
    Make sure you do not paraphrase the listings, and only use the information provided in the listings.

    Question: {query}

    Context: {context}

    Answer:""",
    input_variables=["query", "context"],
)

# Load the QA chain with the prompt and run it to get a refined answer.
# The "stuff" chain fills {context} from input_documents, so it is not passed separately.
chain = load_qa_chain(llm, prompt=prompt, chain_type="stuff")
answer = chain.run(input_documents=similar_docs, query=user_query)
print("\n" + "=" * 60)
print("AI-Generated Property Recommendations:")
print("=" * 60 + "\n")
print(answer)




starting home match ai #>
Collecting user preferences...
Please describe your home preferences in natural language.
Include details about: size, bedrooms, bathrooms, location, amenities, transportation, neighborhood type, etc.

æÆÆæ >>> there should be some input box arround, look for it <<< 

User optout, using default input...
Parsing user preferences...

------------------------------------------------------------

User query:

I am looking for a small apartment in a city central location near easy transportation to access OLX office. The apartment should be affordable and have a garage for storing my bikes. I also need access to public transportation and lots of open space for walking a dog and exercise. I prefer a location next to live activities and restaurants.

------------------------------------------------------------
Preparing DB...


Failed to send telemetry event client_start: capture() takes 1 positional argument but 3 were given


Total docs in collection: 100


Found 10 similar listings (raw results):

Listing 1:
: 20
Neighborhood: Baixa
Property Type: Apartment
Price: €350,000
Bedrooms: 2
Bathrooms: 1
House Size: 85 sqm
Description: Stunning 2 bedroom apartment in the heart of Baixa, featuring modern finishes and a spacious layout. Ideal for city living with easy access to shops and restaurants.
Neighborhood Description: Baixa is known for its charming cobblestone streets, historic buildings, and vibrant atmosphere.

------------------------------------------------------------

Listing 2:
: 44
Neighborhood: Baixa
Property Type: Apartment
Price: €400,000
Bedrooms: 2
Bathrooms: 2
House Size: 90 sqm
Description: Modern apartment in the bustling Baixa district, filled with shops, restaurants, and cultural attractions. Ideal for a young couple or professionals looking to live in the heart of the city.
Neighborhood Description: Baixa is the commercial and financial center of Lisbon, with lively pedestrian streets an