## Real Estate Agent using LLM

#### Imports

In [2]:
import pandas as pd
from langchain.chat_models import ChatOpenAI
import chromadb
from chromadb.utils import embedding_functions

#### Select and create LLM model

In [4]:
# Use GPT-3.5 Turbo as the LLM
model_name = 'gpt-3.5-turbo'
llm = ChatOpenAI(model_name=model_name, temperature=0)

### Generate Synthetic data using LLM

In [5]:
prompt_to_synthetic_data = """Create a detailed CSV file that captures the details of 20 unique real estate properties. Each property should be organized into columns with the following attributes:

- Neighborhood: area where the property is located
- Price: property's market value in USD, formatted
- Bedrooms: number of bedrooms in the property
- Bathrooms: number of bathrooms in the property
- House Size: property's size in square feet

Enhance each property's entry with a detailed paragraph that highlights its unique features, amenities, and sustainable aspects. Mention details such as energy-efficient appliances, use of eco-friendly materials, and gardens.

Sample Entry Format:
Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description
"Green Oaks","$800,000",3,2,"2,000 sqft","Located in Green Oaks, this eco-friendly property includes a 3-bedroom, 2-bathroom setup with solar panels and efficient insulation. The home offers ample natural light, hardwood floors, and an open kitchen that leads to a lush backyard, making it an ideal retreat for those who value sustainability. Green Oaks is known for its vibrant and eco-conscious community, featuring organic shops, community gardens, and excellent transit options, making it perfect for individuals who prioritize environmental responsibility and community involvement.","Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."

Format the CSV with clear headers for each column. Follow the sample provided to format each subsequent row with specific information for each property listing. Generate 20 distinct listings."""

In [88]:
real_estate_llm_data = llm.predict(prompt_to_synthetic_data)

In [89]:
print(real_estate_llm_data)

Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description
"Willow Creek","$650,000",4,3,"2,500 sqft","Nestled in the serene neighborhood of Willow Creek, this spacious property boasts 4 bedrooms and 3 bathrooms. The house features energy-efficient windows, a modern kitchen with stainless steel appliances, and a backyard perfect for gardening. With a focus on sustainability, this home is equipped with a rainwater harvesting system and LED lighting throughout. Willow Creek offers a peaceful environment with tree-lined streets, parks, and a strong sense of community.","Willow Creek is a tranquil neighborhood with tree-lined streets and parks, ideal for families and nature lovers. Residents can enjoy the nearby Willow Creek Community Center, hiking trails, and local farmers' markets. The neighborhood is known for its strong community spirit and commitment to sustainability."
"Sunset Heights","$900,000",5,4,"3,000 sqft","Situated in the prestigious Sunset Heights

#### The generated data needed some formatting; the cleaned-up result was saved as a CSV file, which is used for the listings in the following steps.
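
The formatting step itself is not shown in the notebook. As a minimal sketch of one way it could be done, assuming the raw LLM output is quoted CSV text like the sample above, the hypothetical helper `clean_llm_csv` (a name introduced here only for illustration) keeps complete rows and drops blank lines, repeated headers, and listings that were cut off:

```python
import csv
import io

# Column names taken from the header row in the prompt above.
EXPECTED_COLUMNS = [
    "Neighborhood", "Price", "Bedrooms", "Bathrooms",
    "House Size", "Description", "Neighborhood Description",
]


def clean_llm_csv(raw_text):
    """Parse LLM-generated CSV text and keep only complete rows.

    Hypothetical helper, not part of the original notebook. Rows whose
    field count does not match the expected columns (for example a
    listing truncated mid-sentence) are skipped rather than repaired.
    """
    reader = csv.reader(io.StringIO(raw_text.strip()))
    rows = []
    for fields in reader:
        fields = [field.strip() for field in fields]
        if not fields or fields == EXPECTED_COLUMNS:
            continue  # blank line or header row
        if len(fields) != len(EXPECTED_COLUMNS):
            continue  # incomplete or malformed listing
        rows.append(dict(zip(EXPECTED_COLUMNS, fields)))
    return rows
```

The resulting list of dictionaries can then be loaded with `pd.DataFrame(rows)` and saved with `to_csv`, or written directly with `csv.DictWriter`.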

### Store real estate listings in ChromaDB with embeddings

In [90]:
re_listings_df = pd.read_csv('real-estate-listings-generated.csv')

In [91]:
re_listings_df.head()

Unnamed: 0,Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description,Neighborhood Description
0,Willow Creek,"$650,000",4,3,"2,500 sqft",Nestled in the serene neighborhood of Willow C...,Willow Creek is a family-friendly neighborhood...
1,Sunset Hills,"$900,000",5,4,"3,000 sqft",Situated in the prestigious Sunset Hills neigh...,Sunset Hills is an upscale neighborhood known ...
2,Riverfront Estates,"$1,200,000",6,5,"4,500 sqft","Located in the exclusive Riverfront Estates, t...",Riverfront Estates is a prestigious waterfront...
3,Maple Grove,"$500,000",3,2,"1,800 sqft",Set in the charming neighborhood of Maple Grov...,Maple Grove is a quaint neighborhood with tree...
4,Oakwood Heights,"$750,000",4,3,"2,300 sqft",Situated in the desirable neighborhood of Oakw...,Oakwood Heights is a trendy neighborhood with ...


In [92]:
CHROMA_DATA_PATH = "chroma_data/"
EMBED_MODEL = "text-embedding-ada-002"
COLLECTION_NAME = "property_Description"

In [95]:
# Create the embedding function used to embed the property descriptions
embedding_func = embedding_functions.OpenAIEmbeddingFunction(model_name=EMBED_MODEL)

In [96]:
# Initialize the ChromaDB client (persisted under CHROMA_DATA_PATH) and set metadata options
chroma_client = chromadb.PersistentClient(path=CHROMA_DATA_PATH)

metadata_options = {
    "hnsw:space": "cosine" 
}

In [97]:
# Create a collection in ChromaDB for further use
collection = chroma_client.get_or_create_collection(
    name=COLLECTION_NAME, 
    metadata=metadata_options, 
    embedding_function=embedding_func
)

In [99]:
# Create lists of all data to be stored in ChromaDB, including text, metadata, and ids
all_documents = []
all_metadata = []
all_ids = []
for i, row in re_listings_df.iterrows():
    metadata = {
        'Neighborhood': row['Neighborhood'],
        'Price': row['Price'],
        'Bedrooms': row['Bedrooms'],
        'Bathrooms': row['Bathrooms'],
        'House Size': row['House Size'],
    }
    document = row['Description']
    all_ids.append(str(i))
    all_documents.append(document)
    all_metadata.append(metadata)

In [103]:
print(all_documents[0])

Nestled in the serene neighborhood of Willow Creek, this spacious property boasts 4 bedrooms and 3 bathrooms. The house features energy-efficient appliances, a solar water heater, and a backyard garden with fruit trees and a composting area. With a large living room and a cozy fireplace, this home is perfect for families looking for a sustainable and comfortable living space.


In [104]:
collection.add(
    documents=all_documents,
    metadatas=all_metadata,
    ids=all_ids
)

### Query ChromaDB for context related to customer preferences

In [105]:
# Function to get related property details from ChromaDB
def get_listings_from_prefrences(collection, query, max_results):
    results = collection.query(query_texts=query, n_results=max_results)
    property_descriptions = results['documents'][0]
    property_metadata = results['metadatas'][0]
    result_sources = results['ids'][0]
    return property_descriptions, property_metadata, result_sources

In [109]:
query = "Would like to buy home with 3 bedrooms"

In [110]:
description_context, metadata_context, matching_sources = get_listings_from_prefrences(
    collection=collection,
    query=query,
    max_results=5,
)

In [111]:
print(description_context)

['Set in the charming neighborhood of Forest Hills, this cozy property features 3 bedrooms and 2 bathrooms. The house includes a renovated kitchen with energy-efficient appliances, a sunroom overlooking the backyard garden, and a rainwater collection system. With hardwood floors and a fireplace, this home offers a warm and inviting atmosphere for those seeking a sustainable lifestyle.', 'Located in the peaceful neighborhood of Mountain View, this charming property offers 3 bedrooms and 2 bathrooms. The house features a renovated kitchen with energy-efficient appliances, a cozy fireplace, and a backyard garden with a composting area. With hardwood floors and a sunroom, this home provides a tranquil retreat for those seeking a sustainable and serene living space.', 'Set in the charming neighborhood of Forest Park, this cozy property features 3 bedrooms and 2 bathrooms. The house includes a renovated kitchen with energy-efficient appliances, a sunroom overlooking the backyard garden, and 

In [127]:
def create_prompt_with_context(description_context, metadata_context, query):
    # description_context = "\n\n---\n\n".join(description_context)
    customer_query = query
    prompt_with_context = f"""
    Answer the question only using the following context:

    {description_context}

    and additional metadata for context - 
    
    {metadata_context}

    ---

    From the given context, suggest properties closely matching the query "{customer_query}" from the customer. Descriptions of the properties should be distinct and blended well into a paragraph with noticeable separation, aligned with the preferences given in the customer query. The response should contain at least two suggestions that are most closely related to the customer query.
    """
    return prompt_with_context
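
The commented-out `join` above hints at formatting the context as text before it goes into the prompt, instead of passing the raw Python lists. A minimal sketch of that idea, assuming the descriptions and metadata arrive as parallel lists as returned by `get_listings_from_prefrences`; the helper name `format_context` is hypothetical:

```python
def format_context(descriptions, metadatas):
    """Pair each description with its metadata as a readable text block.

    Hypothetical helper, not part of the original notebook.
    """
    blocks = []
    for number, (description, metadata) in enumerate(zip(descriptions, metadatas), start=1):
        # Render the metadata dict as "key: value" pairs on one line.
        details = ", ".join(f"{key}: {value}" for key, value in metadata.items())
        blocks.append(f"Property {number} ({details})\n{description}")
    # Separate properties with the same divider used in the commented-out join.
    return "\n\n---\n\n".join(blocks)
```

Keeping each property's metadata next to its description may make it easier for the LLM to attribute price and size to the right listing, though that is not verified here.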

### Get user query, search for related properties and augment property description as per user preference

In [140]:
def generate_response_for_customer(customer_query, database_collection):
    description_context, metadata_context, matching_sources = get_listings_from_prefrences(
        collection=database_collection,
        query=customer_query,
        max_results=5,
    )

    # Build the prompt from the function argument, not the global `query`
    prompt_for_listing = create_prompt_with_context(description_context, metadata_context, query=customer_query)
    llm_response = llm.predict(prompt_for_listing)

    print(f'User Query - \n\n{customer_query}')
    print('--------')
    print(f'Related Property Descriptions - \n\n{description_context}')
    print('--------')
    print(f'Related Property Metadata - \n\n{metadata_context}')
    print('--------')
    print(f'Prompt to LLM with Context - \n\n{prompt_for_listing}')
    print('--------')
    return llm_response
    

### Generate responses with user query

In [141]:
query = "I need a 3 bedroom home near waterfront."
llm_response = generate_response_for_customer(query, collection)
print(f'LLM Response - {llm_response}')

User Query - 

I need a 3 bedroom home near waterfront.
--------
Related Property Descriptions - 

['Perched in the scenic neighborhood of Harbor View, this stunning property offers 4 bedrooms and 3 bathrooms. The house features a gourmet kitchen with energy-efficient appliances, a wine cellar, and a backyard garden with a vegetable patch and a rain garden. With panoramic views of the harbor and a spacious deck, this home is perfect for those who value sustainable living and waterfront living.', 'Situated in the desirable neighborhood of Riverfront Terrace, this modern property offers 4 bedrooms and 3 bathrooms. The house features a sleek design, energy-efficient appliances, and a rooftop deck with solar panels. With a spacious living room and a gourmet kitchen, this home is perfect for those who appreciate contemporary style and sustainable living.', 'Located in the exclusive Riverfront Estates, this grand property boasts 6 bedrooms and 5 bathrooms. The house features a gourmet kitche

In [142]:
query = "I need a home with neighbourhood surrounded by greenary and should be within $50,000."
llm_response = generate_response_for_customer(query, collection)
print(f'LLM Response - {llm_response}')

User Query - 

I need a home with neighbourhood surrounded by greenary and should be within $50,000.
--------
Related Property Descriptions - 

['Set in the charming neighborhood of Forest Park, this cozy property features 3 bedrooms and 2 bathrooms. The house includes a renovated kitchen with energy-efficient appliances, a sunroom overlooking the backyard garden, and a rainwater collection system. With hardwood floors and a fireplace, this home offers a warm and inviting atmosphere for those seeking a sustainable lifestyle.', 'Set in the charming neighborhood of Forest Hills, this cozy property features 3 bedrooms and 2 bathrooms. The house includes a renovated kitchen with energy-efficient appliances, a sunroom overlooking the backyard garden, and a rainwater collection system. With hardwood floors and a fireplace, this home offers a warm and inviting atmosphere for those seeking a sustainable lifestyle.', 'Nestled in the serene neighborhood of Willow Creek, this spacious property bo

## Comments

### - From the queries above we can see that the customer's preference is taken into consideration when sending the prompt to the LLM. The LLM then uses the context to generate a personalized response for the customer.
### - I could make better use of the metadata in the prompt and also apply additional filtering to the ChromaDB query response.
### - With a better embedding model and LLM the responses could improve, but what the current LLM does is already very nice from my perspective.
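
One way the metadata filtering mentioned above might look, as a sketch under stated assumptions: prices would have to be stored as numbers rather than strings like "$650,000", and the filter would be passed through the `where` argument of `collection.query`, which accepts operators such as `$lte`, `$eq`, and `$and`. The helper names and the `Price_USD` metadata key are hypothetical.

```python
import re


def parse_number(text):
    """Turn strings like '$650,000' or '2,500 sqft' into an int.

    Assumes whole numbers only; a value with cents would be misread.
    """
    digits = re.sub(r"[^\d]", "", str(text))
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


def build_where_filter(max_price=None, bedrooms=None):
    """Build a Chroma-style metadata filter, or None if nothing is requested.

    'Price_USD' is a hypothetical numeric metadata key that would need to be
    stored at insert time, e.g. parse_number(row['Price']).
    """
    conditions = []
    if max_price is not None:
        conditions.append({"Price_USD": {"$lte": max_price}})
    if bedrooms is not None:
        conditions.append({"Bedrooms": {"$eq": bedrooms}})
    if not conditions:
        return None
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}


# Possible use with the collection from this notebook (not executed here):
# collection.query(query_texts=[query], n_results=5,
#                  where=build_where_filter(max_price=500000, bedrooms=3))
```

With such a filter, a query asking for three bedrooms would only return listings whose metadata actually says three bedrooms, instead of relying on embedding similarity alone.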

## My thoughts on a multimodal approach
### - Including images along with the description using a CLIP embedding model, and reproducing them in the response, can run into a capacity issue: to keep performance good, only a few images can be stored per property.
### - Property listings generally have multiple photos from various angles and of various areas.
### - To support this additional capability we can create a folder of images in cloud storage and put a link to the corresponding folder in the metadata for the property description.
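
A small sketch of the last idea, assuming one image folder per listing id. The base URL `https://storage.example.com/listings`, the `Image Folder` metadata key, and both helper names are placeholders for illustration, not real resources:

```python
def image_folder_url(listing_id, base_url="https://storage.example.com/listings"):
    """Build the (hypothetical) cloud-storage folder URL for one listing."""
    return f"{base_url.rstrip('/')}/{listing_id}/images/"


def with_image_folder(metadata, listing_id):
    """Return a copy of the metadata with a link to the listing's image folder."""
    enriched = dict(metadata)  # copy so the caller's dict is not modified
    enriched["Image Folder"] = image_folder_url(listing_id)
    return enriched
```

The enriched dictionary could be stored as the metadata for each listing, so that a response can point to the full photo set without embedding every image.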