This is a starter notebook for the project, you'll have to import the libraries you'll need, you can find a list of the ones available in this workspace in the requirements.txt file in this workspace. 

### Step 1: Setting Up the Python Application

In [1]:
# %pip install -U --quiet lancedb pandas pydantic

In [54]:
import os
import openai
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory
from langchain.chains import ConversationChain
from typing import Any, Dict, Optional, Tuple
import json

# Importing the necessary library for OpenAI API
openai.api_base = "https://openai.vocareum.com/v1"

# Define your OpenAI API key 
api_key = "voc-929917910126677362447467214b40b11f71.47286702" #"YOUR API KEY"
openai.api_key = api_key

os.environ["OPENAI_API_KEY"] = api_key
os.environ["OPENAI_API_BASE"] = openai.api_base

### Step 2: Generating Real Estate Listings

In [3]:
# Function to call the OpenAI GPT-3.5 API
def realstate_agent(user_prompt):
    try:
        # Calling the OpenAI API with a system message and our prompt in the user message content
        # Use openai.ChatCompletion.create for openai < 1.0
        # openai.chat.completions.create for openai > 1.0
        response = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",
          messages=[
          {
            "role": "system",
            "content": """You are a real state agent. Your goal is to assist the user by generating a given number of 
            real estate listings provided by the user. The listing should include the following information (Neighborhood, 
            Price in USD currency, number of bedrooms, number of bathrooms, hous size in sqft, description, and neighborhood 
            description.)
            
            
            As an example, if the user asked for one listing, below is an output format that can be returned:
                Neighborhood: Green Oaks
                Price: $800,000
                Bedrooms: 3
                Bathrooms: 2
                House Size: 2,000 sqft
                Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.
                Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
              
              
            If the user specified the output to be in a specific format, for instance a list of dictionaries, 
            return it as following with all of the keys. Don't forget to generate as much as requested as well. 
            If no instructions were given, return them as previously mentioned.
                [
                   {
                        "neighborhood": "Green Oaks"
                        "price": "$800,000"
                        "bedrooms": "3"
                        "bathrooms": "2"
                        "house_size": "2,000 sqft"
                        "description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem."
                        "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."

                   }
                ]
              """
          },
          {
            "role": "user",
            "content": user_prompt
          }
          ],
        temperature=1,
        max_tokens=4096, #512, # we increased the max tokens to be able to generate 10 listings at once.
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
        )
        # The response is a JSON object containing more information than the response. We want to return only the message content
        return response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"
    
user_prompt = f"Generate 10 real state listing in a list of dictionary."

# Running the wellness agent, this returns a string
listing_str = realstate_agent(user_prompt)

# convert to an object
listing_list = json.loads(listing_str)

# Printing the output. 
print("Generated Response: ")
print(listing_list)

Generated Response: 
[{'neighborhood': 'Willow Creek', 'price': '$900,000', 'bedrooms': '4', 'bathrooms': '3', 'house_size': '2,500 sqft', 'description': "Discover tranquility in this modern 4-bedroom, 3-bathroom home located in the picturesque Willow Creek neighborhood. The sleek design and high-end finishes create a luxurious space for both relaxation and entertainment. Enjoy the chef's kitchen with top-of-the-line appliances and unwind in the spacious master suite. The lush backyard offers a peaceful retreat with a sparkling pool and outdoor dining area, perfect for gatherings with friends and family.", 'neighborhood_description': 'Willow Creek is known for its scenic views, tree-lined streets, and family-friendly atmosphere. Residents can explore the nearby hiking trails, parks, and upscale shopping destinations. With top-rated schools and community events, Willow Creek is a sought-after neighborhood for those seeking a blend of nature and convenience.'}, {'neighborhood': 'Sunset H

In [4]:
# we have successfully generated 10 items and stored them in a list of dictionary, we can use it easly with the database.
# this is useful also to have access to each listing separately 
print(listing_list[9])

print(listing_list[9].keys())

{'neighborhood': 'Lakeside Retreat', 'price': '$800,000', 'bedrooms': '3', 'bathrooms': '2.5', 'house_size': '2,100 sqft', 'description': 'Experience lakefront living in this 3-bedroom, 2.5-bathroom home in the serene Lakeside Retreat neighborhood. The light-filled living spaces and stunning lake views create a tranquil escape. The updated kitchen features granite countertops and stainless steel appliances, ideal for entertaining. Relax in the master suite with a private balcony overlooking the water or unwind on the spacious deck by the lake.', 'neighborhood_description': 'Lakeside Retreat offers a peaceful setting with private lake access, walking paths, and community amenities. Residents can enjoy boating, fishing, and lakeside picnics surrounded by natural beauty. With a welcoming atmosphere and close-knit community, Lakeside Retreat is the perfect place to enjoy lakefront living.'}
dict_keys(['neighborhood', 'price', 'bedrooms', 'bathrooms', 'house_size', 'description', 'neighborh

### Step 3: Storing Listings in a Vector Database

This step includes convert the LLM-generated listings into suitable embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.

We will use BERT for processing the generated text.

In [5]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")



In [6]:
def preprocess_text(text):
    """Preprocess the generated text by returning tokens."""
    return tokenizer(text, return_tensors="pt", padding='max_length')['input_ids']

def format_text(response):
    """Formats the response into a string. The response here is a dictionary."""
    return f"""Neighborhood: {response['neighborhood']}. \nPrice: {response['price']}. \nBedrooms: {response['bedrooms']}. \nBathrooms: {response['bathrooms']}. \nHouse Size: {response['house_size']}. 
    \nDescription: {response['description']}
    \nNeighborhood {response['neighborhood_description']}
    """

In [7]:
print(format_text(listing_list[9]))

print(preprocess_text(format_text(listing_list[9])))

Neighborhood: Lakeside Retreat. 
Price: $800,000. 
Bedrooms: 3. 
Bathrooms: 2.5. 
House Size: 2,100 sqft. 
    
Description: Experience lakefront living in this 3-bedroom, 2.5-bathroom home in the serene Lakeside Retreat neighborhood. The light-filled living spaces and stunning lake views create a tranquil escape. The updated kitchen features granite countertops and stainless steel appliances, ideal for entertaining. Relax in the master suite with a private balcony overlooking the water or unwind on the spacious deck by the lake.
    
Neighborhood Lakeside Retreat offers a peaceful setting with private lake access, walking paths, and community amenities. Residents can enjoy boating, fishing, and lakeside picnics surrounded by natural beauty. With a welcoming atmosphere and close-knit community, Lakeside Retreat is the perfect place to enjoy lakefront living.
    
tensor([[  101,  5101,  1024, 28701,  7822,  1012,  3976,  1024,  1002,  5385,
          1010,  2199,  1012, 18390,  1024,  

Now as we prepared the functions for processing the text, let's handle the database tasks

In [8]:
print(len(preprocess_text(format_text(listing_list[9]))[0]))

512


In [9]:
tokenizer

DistilBertTokenizerFast(name_or_path='distilbert-base-uncased', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True)

In [10]:
import lancedb
from lancedb.pydantic import vector, LanceModel

class RealStateListings(LanceModel):
    # embeddings are required, we set the length to the max length of the tokenizer, as we made the padding to max length to
    # handle different strings sizes     
    embeddings: vector(tokenizer.model_max_length)
    
    # we store the formatted text
    formatted_text: str
        
#     we store the rest of listing details incase we need them later
#     neighborhood: str
#     price: str
#     bedrooms: str 
#     bathrooms: str 
#     house_size: str 
#     description: str 
#     neighborhood_description: str

db = lancedb.connect("~/lancedb")
table_name = 'realstate_listings'
db.drop_table(table_name, ignore_missing=True)
table = db.create_table(table_name, schema=RealStateListings)

In [11]:
# prepare the data
data  = []

for item in listing_list:
    item_dict = item
    item_str = format_text(item_dict)
    item_embedding = preprocess_text(item_str)[0]
    
    data.append(
        RealStateListings(
            embeddings = item_embedding,
            formatted_text = item_str
        )
    )

In [12]:
# print one data item
print(data[9])

embeddings=FixedSizeList(dim=512) formatted_text='Neighborhood: Lakeside Retreat. \nPrice: $800,000. \nBedrooms: 3. \nBathrooms: 2.5. \nHouse Size: 2,100 sqft. \n    \nDescription: Experience lakefront living in this 3-bedroom, 2.5-bathroom home in the serene Lakeside Retreat neighborhood. The light-filled living spaces and stunning lake views create a tranquil escape. The updated kitchen features granite countertops and stainless steel appliances, ideal for entertaining. Relax in the master suite with a private balcony overlooking the water or unwind on the spacious deck by the lake.\n    \nNeighborhood Lakeside Retreat offers a peaceful setting with private lake access, walking paths, and community amenities. Residents can enjoy boating, fishing, and lakeside picnics surrounded by natural beauty. With a welcoming atmosphere and close-knit community, Lakeside Retreat is the perfect place to enjoy lakefront living.\n    '


In [13]:
# add the data to the table
table.add([dict(d) for d in data])

In [14]:
# display the table
table.to_pandas()

Unnamed: 0,embeddings,formatted_text
0,"[101.0, 5101.0, 1024.0, 11940.0, 3636.0, 1012....","Neighborhood: Willow Creek. \nPrice: $900,000...."
1,"[101.0, 5101.0, 1024.0, 10434.0, 4564.0, 1012....","Neighborhood: Sunset Hills. \nPrice: $750,000...."
2,"[101.0, 5101.0, 1024.0, 7222.0, 25313.0, 8707....","Neighborhood: Pinecrest Estates. \nPrice: $1,2..."
3,"[101.0, 5101.0, 1024.0, 11035.0, 7676.0, 1012....","Neighborhood: Maple Grove. \nPrice: $600,000. ..."
4,"[101.0, 5101.0, 1024.0, 4153.0, 3193.0, 7535.0...","Neighborhood: Ocean View Heights. \nPrice: $1,..."
5,"[101.0, 5101.0, 1024.0, 2314.0, 12792.0, 4899....",Neighborhood: Riverfront Landing. \nPrice: $85...
6,"[101.0, 5101.0, 1024.0, 3137.0, 3193.0, 8707.0...",Neighborhood: Mountain View Estates. \nPrice: ...
7,"[101.0, 5101.0, 1024.0, 6496.0, 26119.0, 1012....","Neighborhood: Harbor Pointe. \nPrice: $700,000..."
8,"[101.0, 5101.0, 1024.0, 3585.0, 4564.0, 1012.0...","Neighborhood: Golden Hills. \nPrice: $950,000...."
9,"[101.0, 5101.0, 1024.0, 28701.0, 7822.0, 1012....","Neighborhood: Lakeside Retreat. \nPrice: $800,..."


In [15]:
# experimenting searching through the table, note that we tokenize the search first (user input for instance), then search
table.search(preprocess_text("A 4-bedroom and 3-bathroom coastal home in Harbor Point neighborhood")[0].tolist()).metric('cosine').limit(2).to_pandas()

Unnamed: 0,embeddings,formatted_text,_distance
0,"[101.0, 5101.0, 1024.0, 2314.0, 12792.0, 4899....",Neighborhood: Riverfront Landing. \nPrice: $85...,0.816686
1,"[101.0, 5101.0, 1024.0, 6496.0, 26119.0, 1012....","Neighborhood: Harbor Pointe. \nPrice: $700,000...",0.838808


Now we implemented a database with text tokenizing and tested the search mechanism.

### Step 4, 5, and 6: 
- Building the User Preference Interface 
- Searching Based on Preferences
- Deliverables and Testing

In [69]:
# Given
questions = [   
    "How big do you want your house to be?" 
    "What are 3 most important things for you in choosing this property?", 
    "Which amenities would you like?", 
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?"]

answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."]

# as the answers are given, we can use them directly in searching through the database
full_answer = " ".join(answers)
full_answer_tokenized = preprocess_text(full_answer)[0].tolist()

# closest item in the database
closest_item = table.search(full_answer_tokenized).metric('cosine').limit(1).to_list()[0]['formatted_text'].replace('\n', '')

# llm
model_name = "gpt-3.5-turbo"
temperature = 0.0
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens = 1000)

# formatting the messages
history = ChatMessageHistory()
history.add_user_message(f"""You are AI that will recommend user a real state listing based on their answers to personal questions. Ask user {len(questions)} questions""")
for i in range(len(questions)):
    history.add_ai_message(questions[i])
    history.add_user_message(answers[i])
    
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="recommendation_summary", 
    input_key="input",
    buffer=f"The human answered ({len(questions)} housing questions). The closest listing found in the database based on the answers was {closest_item}.",
    return_messages=True)

class MementoBufferMemory(ConversationBufferMemory):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        input_str, output_str = self._get_input_output(inputs, outputs)
        self.chat_memory.add_ai_message(output_str)
    
conversational_memory = MementoBufferMemory(
    chat_memory=history,
    memory_key="questions_and_answers", 
    input_key="input"
)

# Combined
memory = CombinedMemory(memories=[conversational_memory, summary_memory])

RECOMMENDER_TEMPLATE = """The following is a friendly conversation between a human and an AI Real State agent Recommender. 
The AI is follows human instructions and provides a housing listing for a human based on the questions and human's answers. 

Summary of Recommendations:
{recommendation_summary[0]}
Personal Questions and Answers:
{questions_and_answers}
Human: {input}
AI:"""
PROMPT = PromptTemplate(
    input_variables=["recommendation_summary", "input", "questions_and_answers"],
    template=RECOMMENDER_TEMPLATE
)
recommender = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

plot_rating_instructions = f"""
    Use the closest listing to augment the description, tailoring it to resonate with the buyer’s specific preferences. 
    This involves subtly emphasizing aspects of the property that align with what the buyer is looking for. 
    Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
"""

recommender.predict(input = plot_rating_instructions)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI Real State agent Recommender. 
The AI is follows human instructions and provides a housing listing for a human based on the questions and human's answers. 

Summary of Recommendations:
content='The human answered 4 housing questions). The closest listing found in the database based on the answers was Neighborhood: Sunset Hills. Price: $750,000. Bedrooms: 3. Bathrooms: 2. House Size: 1,800 sqft.     Description: Indulge in the charm of Sunset Hills with this beautifully renovated 3-bedroom, 2-bathroom home. The open floor plan and natural light create a warm and inviting atmosphere. The gourmet kitchen features quartz countertops and stainless steel appliances, perfect for culinary enthusiasts. Relax in the serene backyard oasis with a covered patio and mature landscaping. Experience modern living in the heart of Sunset Hills.    Ne

'Indulge in the charm of Sunset Hills with this beautifully renovated 3-bedroom, 2-bathroom home. The open floor plan and natural light create a warm and inviting atmosphere, perfect for relaxing with your family. The gourmet kitchen features quartz countertops and stainless steel appliances, ideal for preparing delicious meals and entertaining guests. Enjoy the serene backyard oasis with a covered patio, perfect for gardening and outdoor gatherings. With a two-car garage and modern, energy-efficient heating system, this home offers both convenience and sustainability. Experience modern living in the heart of Sunset Hills, a quiet neighborhood with good local schools and convenient shopping options. Easy access to a reliable bus line and proximity to a major highway make commuting a breeze, while bike-friendly roads encourage a healthy and active lifestyle. Sunset Hills truly offers the ideal place to call home.'