# Personalized Real-Estate Agent

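In this notebook, we build a personalized real-estate agent: an LLM generates listings, the listings are stored in a vector database, and the listings that best match a buyer's stated preferences are retrieved and personalized.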


## Import statements and configuration

In [1]:
import os
from dotenv import load_dotenv # To keep private keys private
load_dotenv()
# To store the generated listings in a csv file
import csv
import io

openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")

if openai_api_key is not None:
    print("Using API key from the `.env` file.")
else:
    print("OPENAI_API_KEY not found in the environment. - Please set it up in the `.env` file.")


if openai_api_base is not None:
    print("Using API base URL from the `.env` file. - You're all set.")
else:
    print("OPENAI_API_BASE not found in the environment. - Please set it up in the `.env` file.")

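# Aliased because langchain.llms.OpenAI is imported under the same name further down
from openai import OpenAI as OpenAIClient
client = OpenAIClient(
    base_url=openai_api_base,
    api_key=openai_api_key
)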

MODEL_NAME = "gpt-3.5-turbo"
version = "v1"      # To make file names etc. unique
LISTINGS_FILE = f"listings_{version}.csv"

# LangChain components we are going to use
from langchain.llms import OpenAI
from langchain.document_loaders.csv_loader import CSVLoader # To load the CSV file
from langchain.vectorstores import Chroma                   # For vector database
import tiktoken                                             # For token counting, used by OpenAIEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings
EMBEDDINGS_MODEL_NAME = "text-embedding-ada-002"            # OpenAI's embedding model
from langchain.text_splitter import CharacterTextSplitter   # To make embeddings more efficient
from langchain.chains import RetrievalQA                    # To perform Retrieval-Augmented Generation (RAG)

import pandas as pd
import numpy as np

COLLECT_USER_PREFERENCES = True        # Set to True to collect user preferences interactively


Using API key from the `.env` file.
Using API base URL from the `.env` file. - You're all set.


## Generating Real Estate Listings

First, we generate fictional listings using an LLM. The prompt includes one example listing to fix the output format:

In [None]:
listing_elements = ["Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size (in sqft)", "Description", "Neighborhood Description"]

listing_elements_text = ", ".join(listing_elements)

example_listing = """
"Green Oaks",800000,3,2,2000,"Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.","Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."
"""

num_listings = 30
system_prompt = f"""
You are a real-estate listing generator. Your task is to create realistic and diverse real estate listings based on the provided example. Each listing should include the following fields: Neighborhood, Price, Bedrooms, Bathrooms, House Size (in sqft), Description, and Neighborhood Description. The listings should be varied in terms of price, size, and neighborhood features.
"""

user_prompt = f"""
Please generate {num_listings} real estate listings in the same format as the example below. The listings should be diverse and include various neighborhoods, prices, and features. Each listing should have a unique neighborhood description that highlights local amenities and attractions. The format should be csv-compatible, i.e., numbers should not contain commas, and text should be enclosed in double quotes if it contains commas. The listings should be realistic and reflect current market trends. Dollar values should not contain the $ sign. The listings should be in the following format: {listing_elements_text}
Example Listing:
{example_listing}
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

try:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages
    )

    # To access the actual response text:
    raw_listings = response.choices[0].message.content
    print(raw_listings)
except Exception as e:
    print(f"An error occurred: {e}")





"Seaside Retreat",1250000,4,3,2800,"Step into luxury with this stunning 4-bedroom, 3-bathroom home located in the exclusive Seaside Retreat neighborhood. Boasting panoramic ocean views, this spacious residence features high-end finishes, a gourmet kitchen, a private deck perfect for entertaining, and a master suite with a spa-like bathroom. Enjoy coastal living at its finest in this Seaside Retreat gem.","Seaside Retreat offers residents access to private beachfront, upscale dining options, boutique shops, and scenic walking trails along the coastline. Explore the nearby marina for water activities or unwind at the seaside spa. With top-rated schools and a vibrant community atmosphere, Seaside Retreat is the epitome of coastal luxury living."

"Oakridge Heights",650000,3,2,1800,"Welcome home to this elegant 3-bedroom, 2-bathroom residence in the desirable Oakridge Heights neighborhood. This meticulously maintained property features a gourmet kitchen, a cozy fireplace in the living room

Next, we store the listings as a `.csv` file in order to retrieve them later.

In [41]:
# Use io.StringIO to treat the header row plus the generated text like a file.
# The header is joined with "," (no space) so that column names carry no leading blanks.
string_io = io.StringIO(",".join(listing_elements) + "\n" + raw_listings)

# Open the output file for writing
with open(LISTINGS_FILE, 'w', newline='', encoding='utf-8') as outfile:
    # Create a CSV reader to read the string data
    reader = csv.reader(string_io)

    # Create a CSV writer to write to the file
    writer = csv.writer(outfile)

    # Copy each non-empty row to the file (the model separates listings with blank lines)
    for row in reader:
        if row:
            writer.writerow(row)
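
Because the rows come from an LLM, some of them may be incomplete or contain non-numeric values in numeric columns. The following is a minimal sketch of a sanity check that could run before writing the file. The helper `validate_listing_rows` and the sample text are hypothetical, and the check assumes whole-number values for price, bedrooms, bathrooms, and size, as in the example listing.

```python
import csv
import io

# Same field names as listing_elements in the notebook
EXPECTED_FIELDS = ["Neighborhood", "Price", "Bedrooms", "Bathrooms",
                   "House Size (in sqft)", "Description", "Neighborhood Description"]
NUMERIC_COLUMNS = (1, 2, 3, 4)  # Price, Bedrooms, Bathrooms, House Size


def validate_listing_rows(raw_text):
    """Hypothetical helper: split raw CSV text into valid rows and rejected rows."""
    valid, rejected = [], []
    for row in csv.reader(io.StringIO(raw_text)):
        if not row:
            continue  # blank separator line between listings
        has_all_fields = len(row) == len(EXPECTED_FIELDS)
        numbers_ok = has_all_fields and all(row[i].strip().isdigit() for i in NUMERIC_COLUMNS)
        (valid if numbers_ok else rejected).append(row)
    return valid, rejected


# Made-up sample: one complete row, one blank line, one row with a missing field
sample = '''"Green Oaks",800000,3,2,2000,"Eco-friendly home, with solar panels.","Close-knit community."

"Oakridge Heights",650000,3,2,1800,"Description without a neighborhood text"
'''
valid_rows, rejected_rows = validate_listing_rows(sample)
print(len(valid_rows), "valid,", len(rejected_rows), "rejected")
```

Rejected rows could be logged or regenerated rather than silently written to the file.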

If the listings have already been generated, we can just read them from the CSV file (code adapted from the course exercises on LangChain).

In [2]:
# If using a pandas DataFrame, this would do it.
# df = pd.read_csv(LISTINGS_FILE)

# We are going to use LangChain, so we do this:
loader = CSVLoader(file_path=LISTINGS_FILE, encoding="utf-8", csv_args={"delimiter": ","})
docs = loader.load()
print(docs)


[Document(metadata={'source': 'listings_v1.csv', 'row': 0}, page_content='Neighborhood: Seaside Retreat\nPrice: 1250000\nBedrooms: 4\nBathrooms: 3\nHouse Size (in sqft): 2800\nDescription: Step into luxury with this stunning 4-bedroom, 3-bathroom home located in the exclusive Seaside Retreat neighborhood. Boasting panoramic ocean views, this spacious residence features high-end finishes, a gourmet kitchen, a private deck perfect for entertaining, and a master suite with a spa-like bathroom. Enjoy coastal living at its finest in this Seaside Retreat gem.\nNeighborhood Description: Seaside Retreat offers residents access to private beachfront, upscale dining options, boutique shops, and scenic walking trails along the coastline. Explore the nearby marina for water activities or unwind at the seaside spa. With top-rated schools and a vibrant community atmosphere, Seaside Retreat is the epitome of coastal luxury living.'), Document(metadata={'source': 'listings_v1.csv', 'row': 1}, page_con
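
As the output shows, each CSV row becomes one document whose text is a list of `column: value` lines. The sketch below approximates that transformation with the standard library only. It is not LangChain's implementation; `rows_to_page_content` and the sample data are hypothetical and serve only to illustrate the layout.

```python
import csv
import io


def rows_to_page_content(csv_text):
    """Hypothetical helper mimicking the 'column: value' layout seen in the loader output."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
            for row in reader]


# Made-up sample with only three of the seven columns
sample_csv = 'Neighborhood,Price,Bedrooms\n"Seaside Retreat",1250000,4\n"Green Oaks",800000,3\n'
pages = rows_to_page_content(sample_csv)
print(pages[0])
```

Keeping the column names in the text means the embedding of each listing also encodes which value is the price, the size, and so on.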

## Storing Listings in a Vector Database
We have now loaded the generated listings and want to store them in a vector database.

In [None]:
embeddings = OpenAIEmbeddings(
    openai_api_key=openai_api_key,
    openai_api_base=openai_api_base,
    model=EMBEDDINGS_MODEL_NAME,
    #chunk_size=1,  # This is important for Chroma
    max_retries=3, # Number of retries for embedding requests
    request_timeout=60, # Timeout for embedding requests
)
# Splitting the data to make embeddings more efficient
splitter = CharacterTextSplitter(
                chunk_size=1000,
                chunk_overlap=0
            )
split_docs = splitter.split_documents(docs)
db = Chroma.from_documents(split_docs, embeddings)
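
Conceptually, the vector database keeps one embedding vector per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below illustrates this idea in pure Python with made-up three-dimensional vectors. Real embeddings from `text-embedding-ada-002` have 1536 dimensions, and Chroma uses its own index and distance metric, so `cosine_similarity`, `top_k`, and the toy data are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the names of the k stored items most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vector, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up vectors standing in for listing embeddings
toy_store = [
    ("Seaside Retreat", [0.9, 0.1, 0.0]),
    ("Green Oaks", [0.1, 0.9, 0.2]),
    ("Oakridge Heights", [0.2, 0.7, 0.6]),
]
toy_query = [0.0, 1.0, 0.3]  # stands in for the embedded user preferences
best_matches = top_k(toy_query, toy_store, k=2)
print(best_matches)
```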


## Building the User Preference Interface
We now collect user preferences. The questions are fixed; the answers are either hard-coded or collected interactively at runtime, depending on `COLLECT_USER_PREFERENCES`.

In [3]:
questions = [
    "How big do you want your house to be?",
    "What are the 3 most important things for you in choosing this property?",
    "Which amenities would you like?", 
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",   
    "Do you have an upper price target?",   
]

answers = []
if COLLECT_USER_PREFERENCES:
    # Collect user preferences interactively
    for question in questions:
        answer = input(question + " ")
        answers.append(answer)
else:
    answers = [
        "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
        "A quiet neighborhood, good local schools, and convenient shopping options.",
        "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
        "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
        "A balance between suburban tranquility and access to urban amenities like restaurants and theaters.",
        "It should be under $1,000,000."
    ]

## Searching based on preferences

['At least 4 bedrooms, a nice living room with an open kitchen, and some storage areas', 'It should be within walking distance to a supermarket, it should be nice and cosy, and it should be not too old', 'A garden and a good insulation or new heating system', 'Easy access to a bus line.', 'It should be possible to reach downtown in 10 minutes', '1 million dollars']
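
One way to search with these preferences is to fold the questions and answers into a single query string for the vector store. The sketch below shows this step only; `build_preference_query` and the sample data are hypothetical, and blank answers are skipped so that unanswered questions do not dilute the query.

```python
def build_preference_query(questions, answers):
    """Hypothetical helper: pair each question with its answer, skipping blank answers."""
    lines = []
    for question, answer in zip(questions, answers):
        answer = answer.strip()
        if answer:
            lines.append(f"{question} {answer}")
    return "\n".join(lines)


# Made-up sample; the third answer is blank and is therefore left out
sample_questions = [
    "How big do you want your house to be?",
    "Do you have an upper price target?",
    "Which amenities would you like?",
]
sample_answers = ["At least 4 bedrooms", "1 million dollars", "   "]
query = build_preference_query(sample_questions, sample_answers)
print(query)
```

The resulting string could then be passed to the vector store, for example with `db.similarity_search(query, k=3)`. Note that a semantic search does not enforce numeric limits such as the price target, so a separate filter may be needed for those.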


## Personalize Listing Descriptions