<a href="https://colab.research.google.com/github/brenoakihiromorimoto/portf-lio/blob/main/HomeMatch_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a starter notebook for the project. You'll need to import the libraries you use; the ones available in this workspace are listed in its requirements.txt file.

# Step 1: Setting Up the Python Application
* Initialize a Python Project: Create a new Python project, setting up a virtual environment and installing necessary packages like LangChain, a suitable LLM library (e.g., OpenAI's GPT), and a vector database package compatible with Python (e.g., ChromaDB or LanceDB). If you don't wish to create your files from scratch, starter files are available in the workspace on the next page as an application skeleton.

In [None]:
!pip install langchain==0.0.305
!pip install openai==0.28.1
# Quote specifiers that contain '>' so the shell does not treat it as output redirection
!pip install "pydantic>=1.10.12"
!pip install "pytest>=7.4.0"
!pip install "sentence-transformers>=2.2.0"
!pip install "transformers>=4.31.0"
!pip install chromadb==0.4.12
!pip install jupyter==1.0.0
!pip install tiktoken==0.4.0
!pip install gradio==3.9
!pip install pydantic==1.9

Collecting fastapi<0.100.0,>=0.95.2 (from chromadb==0.4.12)
  Using cached fastapi-0.99.1-py3-none-any.whl (58 kB)
Collecting starlette<0.28.0,>=0.27.0 (from fastapi<0.100.0,>=0.95.2->chromadb==0.4.12)
  Using cached starlette-0.27.0-py3-none-any.whl (66 kB)
Installing collected packages: starlette, fastapi
  Attempting uninstall: starlette
    Found existing installation: starlette 0.37.2
    Uninstalling starlette-0.37.2:
      Successfully uninstalled starlette-0.37.2
  Attempting uninstall: fastapi
    Found existing installation: fastapi 0.110.3
    Uninstalling fastapi-0.110.3:
      Successfully uninstalled fastapi-0.110.3
Successfully installed fastapi-0.99.1 starlette-0.27.0
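
The log above shows pip replacing already-installed packages, so it can be worth confirming which versions ended up in the environment. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name, not part of any library used in this notebook.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The result depends on the environment, so no expected output is shown.
print(installed_versions(["langchain", "openai", "chromadb", "pydantic"]))
```

A `None` value means the package is not installed under that name.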


In [None]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate, ChatPromptTemplate
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory
from langchain.chains import ConversationChain
from typing import Any, Dict, Optional, Tuple
from transformers import CLIPModel, CLIPProcessor
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
import io
import PIL
import os
import pandas as pd

In [None]:
os.environ['OPENAI_API_KEY'] = "OPENAI_API_KEY"  # placeholder: replace with your own key
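
Writing the key directly into a notebook cell makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is either already exported in the environment or can be typed at a prompt; `load_api_key` is a hypothetical helper, not part of LangChain or the OpenAI library.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def load_api_key(
    env: MutableMapping[str, str] = os.environ,
    ask: Callable[[str], str] = getpass,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key from `env`, prompting for it only when it is missing."""
    key = env.get(name, "").strip()
    if not key:
        key = ask(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} was not provided")
        # Store it so libraries that read the environment can find it.
        env[name] = key
    return key
```

In the notebook, calling `load_api_key()` once before creating the `ChatOpenAI` object would replace the hard-coded assignment. The `env` and `ask` parameters exist only so the helper can be exercised without a real prompt.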

# Step 2: Generating Real Estate Listings
* Generate real estate listings using a Large Language Model. Generate at least 10 listings. This can involve creating prompts for the LLM to produce descriptions of various properties. The `example` string in the next cell shows what a listing might look like.


In [None]:
# Generate at least 10 examples
# Context: instructions sent to the LLM ahead of the example listing
context = """
You are a row generator.
Generate random house descriptions that follow the format of the example below:
"""

#Example of format
example = """
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks.
This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure.
Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes.
The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores,
community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe.
With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [None]:
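# Initialize the chat model; the API key is read from the OPENAI_API_KEY environment variable
llm = ChatOpenAI(
    model="gpt-3.5-turbo-0125",
    temperature=1  # a higher temperature gives more varied output
)
# Generate 10 listings, one LLM call per listing
list_examples = []
for _ in range(10):
    row_gen = llm.predict(context + example, max_tokens=500)
    list_examples.append(row_gen)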

In [None]:
df = pd.DataFrame({'text': list_examples})

In [None]:
df.to_csv('df_home_match.csv')
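
The generated listings are stored as free text, and a model sampled at `temperature=1` may not always follow the example format. A minimal sketch of parsing the header fields so each listing can be checked before it is stored; `parse_listing` and `FIELDS` are hypothetical names, and the field labels are taken from the example listing in the prompt.

```python
import re
from typing import Dict, Optional

# Field labels taken from the example listing used in the prompt.
FIELDS = ["Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size"]


def parse_listing(text: str) -> Dict[str, Optional[str]]:
    """Extract the 'Label: value' header fields from one generated listing."""
    parsed: Dict[str, Optional[str]] = {}
    for field in FIELDS:
        # Anchor at the start of a line and require the colon right after the
        # label, so 'Neighborhood' does not match 'Neighborhood Description'.
        match = re.search(
            rf"^{re.escape(field)}:[ \t]*(.+)$", text, flags=re.MULTILINE
        )
        parsed[field] = match.group(1).strip() if match else None
    return parsed


sample = """Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis.

Neighborhood Description: Green Oaks is a close-knit community.
"""
print(parse_listing(sample))
```

A listing with any `None` field could be regenerated or dropped before the CSV is written.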

# Step 3: Storing Listings in a Vector Database
* Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.
* Generating and Storing Embeddings: Convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.
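
To make the idea of semantic closeness concrete, here is a conceptual sketch of a nearest-neighbor lookup using cosine similarity, one common way to compare embeddings. It is not how Chroma is implemented, and the distance metric Chroma actually uses is not reproduced here. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" invented for this example (assumption, not real model output).
toy_store = {
    "Riverfront Estates": [0.9, 0.1, 0.2],
    "Maple Grove": [0.2, 0.9, 0.3],
    "Green Oaks": [0.1, 0.4, 0.9],
}
luxury_query = [1.0, 0.0, 0.1]

for name, score in top_k(luxury_query, toy_store, k=2):
    print(f"{name}: {score:.3f}")
```

With these toy vectors, "Riverfront Estates" ranks first because its vector points in nearly the same direction as the query.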

In [None]:
loader = CSVLoader('df_home_match.csv')
documents = loader.load()

In [None]:
# split it into chunks
text_splitter = CharacterTextSplitter()
docs = text_splitter.split_documents(documents)
embedding_function = OpenAIEmbeddings()
# load it into Chroma
db = Chroma.from_documents(docs, embedding_function)

In [None]:
# query it
query = "Please, I need to find a luxury house"
docs = db.similarity_search(query)

# print results
print(docs[0].page_content)

: 7
text: Neighborhood: Riverfront Estates
Price: $1,200,000
Bedrooms: 4
Bathrooms: 3.5
House Size: 3,500 sqft

Description: Step into luxury in this stunning 4-bedroom, 3.5-bathroom home located in the prestigious Riverfront Estates. 
With high-end finishes and designer touches throughout, this spacious home offers panoramic views of the river from every room. 
The gourmet kitchen features top-of-the-line appliances and a large island, perfect for entertaining. 
Relax in the master suite complete with a spa-like en suite bathroom and private balcony. 
Enjoy outdoor living at its finest in the expansive backyard with a pool, hot tub, and outdoor kitchen. 
Experience waterfront living at its best in this exquisite Riverfront Estates home.

Neighborhood Description: Riverfront Estates is an exclusive waterfront community lined with luxury homes and upscale amenities. 
Residents can enjoy private access to the river for boating and water activities. 
Explore the nearby Riverfront Plaza fo

# Step 4: Building the User Preference Interface
* Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to enter their preferences in natural language. You can hard-code the preferences as questions and answers, or collect them interactively however you'd like. For example:

In [None]:
questions = [
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?"
]

answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]
# Store each question/answer pair in the chat history
history = ChatMessageHistory()

for question, answer in zip(questions, answers):
    history.add_ai_message(question)
    history.add_user_message(answer)


In [None]:
conversational_memory = ConversationBufferMemory(
    chat_memory = history,
    memory_key='questions_and_answers',
    input_key='input'
)

In [None]:
# Summary memory keeps a running summary of the live conversation as long-term context
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key='recommendation_summary',
    input_key='input',
    buffer=f'The human answered {len(questions)} personal questions. Use them to rate from 1 to 10, how much their buyer preferences.'
)

In [None]:
#Combine memories
memory = CombinedMemory(memories=[conversational_memory, summary_memory])

# Step 5: Searching Based on Preferences
* Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
* Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on the semantic closeness to the buyer’s preferences.
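
The search cells in this step pass `str(memory.memories[0].chat_memory.messages)` as the query, which embeds the Python representation of the message objects along with the buyer's words. One possible refinement, sketched under the assumption that the plain question and answer text is a cleaner search query; `build_preference_query` is a hypothetical helper, not a LangChain function.

```python
from typing import List


def build_preference_query(questions: List[str], answers: List[str]) -> str:
    """Join each question with its answer into one plain-text search query."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    lines = [f"{q.strip()} {a.strip()}" for q, a in zip(questions, answers)]
    return "\n".join(lines)


demo_questions = ["How big do you want your house to be?"]
demo_answers = ["A comfortable three-bedroom house."]
print(build_preference_query(demo_questions, demo_answers))
```

The returned string could be passed to `db.similarity_search` in place of the stringified message list. Whether it retrieves better matches would need to be checked against the stored listings.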


In [None]:
# docs[3].page_content is not a good suggestion because the listing has no school nearby.
query = str(memory.memories[0].chat_memory.messages)
docs = db.similarity_search(query)
print(docs[0].page_content)
print(docs[1].page_content)
print(docs[2].page_content)
print(docs[3].page_content)

: 6
text: Neighborhood: Maple Grove
Price: $500,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft

Description: Step into this spacious 4-bedroom, 3-bathroom home located in the sought-after Maple Grove neighborhood. 
This well-maintained house features a large master suite with a walk-in closet and en-suite bathroom. 
The open floor plan includes a modern kitchen with stainless steel appliances and a cozy living room with a fireplace. 
Outside, you'll find a beautifully landscaped backyard with a patio, perfect for outdoor entertaining. 
With plenty of natural light and ample storage space, this home is perfect for growing families looking for comfort and convenience.

Neighborhood Description: Maple Grove is a family-friendly neighborhood known for its tree-lined streets and community events. 
Residents enjoy easy access to parks, playgrounds, and top-rated schools. 
With a variety of dining and shopping options nearby, there is always something to do in Maple Grove. 
Experience th

In [None]:
# We will focus on just the top 3 suggestions.
query = str(memory.memories[0].chat_memory.messages)
docs = db.similarity_search(query, k=3)
print(docs[0].page_content)
print(docs[1].page_content)
print(docs[2].page_content)

: 6
text: Neighborhood: Maple Grove
Price: $500,000
Bedrooms: 4
Bathrooms: 3
House Size: 2,500 sqft

Description: Step into this spacious 4-bedroom, 3-bathroom home located in the sought-after Maple Grove neighborhood. 
This well-maintained house features a large master suite with a walk-in closet and en-suite bathroom. 
The open floor plan includes a modern kitchen with stainless steel appliances and a cozy living room with a fireplace. 
Outside, you'll find a beautifully landscaped backyard with a patio, perfect for outdoor entertaining. 
With plenty of natural light and ample storage space, this home is perfect for growing families looking for comfort and convenience.

Neighborhood Description: Maple Grove is a family-friendly neighborhood known for its tree-lined streets and community events. 
Residents enjoy easy access to parks, playgrounds, and top-rated schools. 
With a variety of dining and shopping options nearby, there is always something to do in Maple Grove. 
Experience th

# Step 6: Personalizing Listing Descriptions
* LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
* Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
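
One way to check factual integrity after augmentation is to compare the structured header fields of the original listing with those in the LLM's rewritten version. A minimal sketch, assuming both texts keep the `Label: value` header format from the example listing; `extract_facts` and `changed_facts` are hypothetical helpers. It only catches changes in the header fields, not facts restated inside the prose.

```python
import re
from typing import Dict, List

# Header fields treated as facts that the augmentation must not alter.
FACT_FIELDS = ["Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size"]


def extract_facts(text: str) -> Dict[str, str]:
    """Collect the header fields that are present in a listing."""
    facts: Dict[str, str] = {}
    for field in FACT_FIELDS:
        match = re.search(
            rf"^[ \t]*{re.escape(field)}:[ \t]*(.+)$", text, flags=re.MULTILINE
        )
        if match:
            facts[field] = match.group(1).strip()
    return facts


def changed_facts(original: str, augmented: str) -> List[str]:
    """Return fields whose value in `augmented` is missing or differs."""
    before, after = extract_facts(original), extract_facts(augmented)
    return [field for field, value in before.items() if after.get(field) != value]


original = "Neighborhood: Maple Grove\nPrice: $500,000\nBedrooms: 4\n"
augmented_ok = original + "\nDescription: Close to top-rated schools.\n"
augmented_bad = "Neighborhood: Maple Grove\nPrice: $450,000\nBedrooms: 4\n"

print(changed_facts(original, augmented_ok))
print(changed_facts(original, augmented_bad))
```

An empty list means every header field survived unchanged. A non-empty list names the fields to review before showing the recommendation to the buyer.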

In [None]:
RECOMMENDER_TEMPLATE = """
The following is a friendly conversation between a human and an AI Home Recommender.
The AI is follows human instructions and provides personalized home recommendation following preferences of user.

Summary of Recommendations:
{recommendation_summary}
Personal Questions and Answers:
{questions_and_answers}
Human: {input}
AI:"""

PROMPT = PromptTemplate(
    input_variables=["recommendation_summary", "input", "questions_and_answers"],
    template=RECOMMENDER_TEMPLATE
)
# create a recommendation conversation chain that will let us ask AI for recommendations on top 3 homes
recommender = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

In [None]:
def top_k_home_db(i=0, k=3, query = str(memory.memories[0].chat_memory.messages)):
    docs = db.similarity_search(query, k=k)
    return docs[i].page_content

In [None]:
for i in range(3):
    docs_suggestion = top_k_home_db(i=i)

    final_recommendation = f"""
     Emphasize the aspects of the description below that are aligned with the buyer's preferences,
     but do not modify any factual information in the home description.
     Follow the example format:
     ###EXAMPLE FORMAT###
     Neighborhood:
     Price:
     Bedrooms:
     Bathrooms:
     House Size:

     Description:

     Neighborhood Description:

     ###HOME DESCRIPTION###
    {docs_suggestion}
    """
    recommender.predict(input=final_recommendation)



> Entering new ConversationChain chain...
Prompt after formatting:
[32;1m[1;3m
The following is a friendly conversation between a human and an AI Home Recommender. 
The AI is follows human instructions and provides personalized home recommendation following preferences of user. 

Summary of Recommendations:
The human answered 5 personal questions. Use them to rate from 1 to 10, how much their buyer preferences.
Personal Questions and Answers:
AI: How big do you want your house to be?
Human: A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
AI: What are 3 most important things for you in choosing this property?
Human: A quiet neighborhood, good local schools, and convenient shopping options.
AI: Which amenities would you like?
Human: A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
AI: Which transportation options are important to you?
Human: Easy access to a reliable bus line, proximity to a major hig