This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

### Import API config and credentials from the credential file

In [1]:
import os

import openai
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = "YOUR_API_KEY"

### Install dependencies and libraries

In [2]:
!pip install -r requirements.txt




[notice] A new release of pip available: 22.3.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip


### Check the OpenAI version

In [3]:
import openai
print(openai.__version__)

0.28.1


### Import Libraries

In [4]:
import os
import json
import re
import pandas as pd
import numpy as np
import openai

from bs4 import BeautifulSoup
from markdown import markdown

from langchain.schema import HumanMessage, SystemMessage
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import LanceDB

from lancedb.rerankers import LinearCombinationReranker
import inflect
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector

import ipywidgets as widgets
from IPython.display import display
from ipywidgets import Layout, Button, Box, FloatText, Textarea, Dropdown, Label, IntSlider, FloatSlider

### Define the listing file name

In [5]:
db_file = "home_match.json"

### Define the system prompt for the AI model

In [6]:
system_prompt = """
You are an expert real estate agent in Los Angeles in the USA.
"""

### Define the human prompt to generate real estate listings

In [7]:
human_prompt= """
      Generate at least 10 real estate listings, mixing creative listings with real properties.
      Ensure the listings are distributed across all areas and regions of the City of Los Angeles.
      Return the listings as a JSON array of dictionaries, each following the structure below:

        {
        "location": "Downtown Los Angeles",
        "list_price": 1800000,
        "bedrooms": 3,
        "bathrooms": 2,
        "square_feet": 1800,
        "school_rating": 4.5,
        "description": "Luxurious penthouse with floor-to-ceiling windows, boasting stunning city views. Features a gourmet kitchen, spacious open layout, and private balcony. Building amenities include a rooftop pool, gym, and 24-hour concierge."
       }
    """

### Check if the database file exists

In [8]:
if os.path.isfile(path=db_file):
    # If it exists, read the existing real estate listings
    with open(db_file, "r") as f:
        real_estate_listings = f.read()
else:
    # If it does not exist, invoke the AI model to generate new listings
    chat = ChatOpenAI(temperature=1)  # Set temperature for creativity in responses
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=human_prompt),
    ]
    
    # Get the AI-generated message
    ai_message = chat.invoke(messages)
    # Take the text content of the response (expected to contain the JSON listings)
    real_estate_listings = ai_message.content
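
A model response often wraps the JSON in a Markdown code fence or adds a sentence before it. The following is a minimal sketch of pulling the array out with the standard library only. `extract_json_array` is a hypothetical helper, not part of this notebook, and it assumes the response contains exactly one top-level JSON array with no stray square brackets in the surrounding prose.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built here to keep this example readable


def extract_json_array(text: str) -> list:
    """Return the JSON array embedded in a model response (hypothetical helper)."""
    # Remove lines that consist only of a code-fence marker, with or without a language tag.
    cleaned = re.sub(r"^`{3}[a-zA-Z]*\s*$", "", text, flags=re.MULTILINE)
    # Keep the span from the first '[' to the last ']'.
    start, end = cleaned.find("["), cleaned.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in the response")
    return json.loads(cleaned[start:end + 1])


sample = (
    "Here are the listings:\n"
    + FENCE + "json\n"
    + '[{"location": "Hollywood", "bedrooms": 4}]'
    + "\n" + FENCE
)
listings = extract_json_array(sample)
print(listings[0]["location"])
```

Parsing with `json.loads` at this point also surfaces a truncated or malformed response early, before anything is written to disk.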

In [9]:
# Render the response (treated as Markdown) to HTML
md_text = markdown(real_estate_listings)

# Extract the plain text from the rendered HTML using BeautifulSoup
real_estate_listings_text = ''.join(BeautifulSoup(md_text, 'html.parser').find_all(string=True))

# Drop the leftover 'json' language tag from the code fence
real_estate_listings_json = real_estate_listings_text.replace('json\n', '')

# Output the JSON string
print(real_estate_listings_json)

[
  {
    "location": "Downtown Los Angeles",
    "list_price": 1800000,
    "bedrooms": 3,
    "bathrooms": 2,
    "square_feet": 1800,
    "school_rating": 4.5,
    "description": "Luxurious penthouse with floor-to-ceiling windows, boasting stunning city views. Features a gourmet kitchen, spacious open layout, and private balcony. Building amenities include a rooftop pool, gym, and 24-hour concierge."
  },
  {
    "location": "Hollywood",
    "list_price": 2500000,
    "bedrooms": 4,
    "bathrooms": 4,
    "square_feet": 2400,
    "school_rating": 4.6,
    "description": "Modern villa nestled in the Hollywood Hills, featuring a home theater, infinity pool, and panoramic views of the city. This home is a true entertainer's dream, complete with high-end finishes and smart home technology."
  },
  {
    "location": "Santa Monica",
    "list_price": 3200000,
    "bedrooms": 5,
    "bathrooms": 4,
    "square_feet": 3000,
    "school_rating": 4.8,
    "description": "Stunning beachfront 

### Write the new listings to the database file

In [10]:
with open(db_file, "w") as f:
    f.write(real_estate_listings_json)
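
Because the text written here is read back later as a table, it can help to check it first. The sketch below assumes the seven field names from the prompt's example listing; `validate_listings` and `REQUIRED_KEYS` are hypothetical names introduced for illustration.

```python
import json

# Field names taken from the example listing in the prompt.
REQUIRED_KEYS = {
    "location", "list_price", "bedrooms", "bathrooms",
    "square_feet", "school_rating", "description",
}


def validate_listings(json_text: str) -> list:
    """Parse json_text and check that every listing has the expected fields."""
    listings = json.loads(json_text)  # raises json.JSONDecodeError on malformed text
    if not isinstance(listings, list):
        raise ValueError("expected a JSON array of listings")
    for i, listing in enumerate(listings):
        if not isinstance(listing, dict):
            raise ValueError(f"listing {i} is not a JSON object")
        missing = REQUIRED_KEYS - set(listing)
        if missing:
            raise ValueError(f"listing {i} is missing fields: {sorted(missing)}")
    return listings


good = json.dumps([{
    "location": "Echo Park", "list_price": 1200000, "bedrooms": 3,
    "bathrooms": 2, "square_feet": 1600, "school_rating": 4.3,
    "description": "Mid-century modern home.",
}])
print(len(validate_listings(good)))
```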

### Load real estate listings from the JSON file

In [11]:
df = pd.read_json(db_file)

In [12]:
df

| | location | list_price | bedrooms | bathrooms | square_feet | school_rating | description |
|---|---|---|---|---|---|---|---|
| 0 | Downtown Los Angeles | 1800000 | 3 | 2 | 1800 | 4.5 | Luxurious penthouse with floor-to-ceiling wind... |
| 1 | Hollywood | 2500000 | 4 | 4 | 2400 | 4.6 | Modern villa nestled in the Hollywood Hills, f... |
| 2 | Santa Monica | 3200000 | 5 | 4 | 3000 | 4.8 | Stunning beachfront property with private acce... |
| 3 | Sherman Oaks | 1450000 | 3 | 2 | 2000 | 4.7 | Charming ranch-style home in the heart of the ... |
| 4 | Silver Lake | 1700000 | 4 | 3 | 2200 | 4.4 | Architectural gem with modern design and eco-f... |
| 5 | Venice | 2700000 | 3 | 3 | 2000 | 4.5 | Contemporary beachfront condo just steps from ... |
| 6 | Beverly Hills | 5200000 | 6 | 4 | 6000 | 4.9 | Palatial estate in the heart of Beverly Hills,... |
| 7 | Brentwood | 4100000 | 5 | 3 | 4500 | 4.8 | Elegant family home with a spacious layout, go... |
| 8 | Echo Park | 1200000 | 3 | 2 | 1600 | 4.3 | Mid-century modern home with open spaces, vaul... |
| 9 | San Pedro | 950000 | 3 | 2 | 1800 | 4.1 | Charming coastal home with harbor views and a ... |


### Connect to LanceDB

In [13]:
db = lancedb.connect("real-estate-embeddings-db")

### Define the schema for the real estate listings

In [14]:
class RealEstateListings(LanceModel):
    location: str
    list_price: float
    bedrooms: float
    bathrooms: float
    square_feet: float
    school_rating: float
    description: str
    description_vector: Vector(1536)  # Vector for storing embeddings

### Check if the listings table exists in the database

In [15]:
if 'listings' in db.table_names():
    # Open the existing listings table
    table = db.open_table("listings")
else:
    # Create a new listings table with the defined schema
    table = db.create_table("listings", schema=RealEstateListings)
    
    # Prepare data for insertion into the database
    data = []
    for _, row in df.iterrows():
        # Call OpenAI API to get embeddings for the property description
        response = openai.Embedding.create(
            model="text-embedding-ada-002",  # Specify the model for embeddings
            input=row["description"]
        )
        
        # Extract the embedding vector from the API response
        embedding_vector = response['data'][0]['embedding']
        
        # Append the data for the table
        data.append({
            "location": row["location"],
            "list_price": row["list_price"],
            "bedrooms": row["bedrooms"],
            "bathrooms": row["bathrooms"],
            "square_feet": row["square_feet"],
            "school_rating": row["school_rating"],
            "description": row["description"],
            "description_vector": embedding_vector  # Include the embedding vector
        })
    
    # Add the prepared data to the table
    table.add(data)
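
The loop above requests one embedding per listing. A possible variation is to embed descriptions in batches. The sketch below keeps the embedding call behind a callable so it can run without network access. `build_records`, `embed_batch`, and `fake_embed_batch` are hypothetical names. In this notebook, `embed_batch` could wrap a single `openai.Embedding.create` call that passes a list of strings as `input`, but that wiring is an assumption and is not shown or tested here.

```python
from typing import Callable, Dict, List, Sequence


def build_records(
    rows: Sequence[Dict],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[Dict]:
    """Attach a description_vector to each row, embedding batch_size rows per call."""
    records = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_batch([row["description"] for row in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedder returned a different number of vectors than inputs")
        for row, vector in zip(batch, vectors):
            # Copy the row and add the vector under the column name used by the schema.
            records.append({**row, "description_vector": vector})
    return records


def fake_embed_batch(texts: List[str]) -> List[List[float]]:
    """Stand-in embedder for illustration: a 3-dimensional vector per text."""
    return [[float(len(text)), 0.0, 0.0] for text in texts]


rows = [
    {"location": "Hollywood", "description": "Modern villa"},
    {"location": "Venice", "description": "Beachfront condo"},
    {"location": "San Pedro", "description": "Coastal home"},
]
records = build_records(rows, fake_embed_batch, batch_size=2)
print(len(records), records[0]["description_vector"])
```

With a real embedder the vectors would have 1536 dimensions to match the `Vector(1536)` field in the schema.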

### Print the first few rows of the table as a Pandas DataFrame

In [16]:
print(table.to_pandas().head())

               location  list_price  bedrooms  bathrooms  square_feet  \
0  Downtown Los Angeles   1800000.0       3.0        2.0       1800.0   
1             Hollywood   2500000.0       4.0        4.0       2400.0   
2          Santa Monica   3200000.0       5.0        4.0       3000.0   
3          Sherman Oaks   1450000.0       3.0        2.0       2000.0   
4           Silver Lake   1700000.0       4.0        3.0       2200.0   

   school_rating                                        description  \
0            4.5  Luxurious penthouse with floor-to-ceiling wind...   
1            4.6  Modern villa nestled in the Hollywood Hills, f...   
2            4.8  Stunning beachfront property with private acce...   
3            4.7  Charming ranch-style home in the heart of the ...   
4            4.4  Architectural gem with modern design and eco-f...   

                                  description_vector  
0  [0.0045068543, 0.0021039487, 0.008447941, -0.0...  
1  [-0.008112394, -0.000

### An inline user interface designed to capture buyer preferences

In [17]:
# Define the layout for each form item (label and input field)
form_item_layout = Layout(
    display='flex',         # Flexbox layout for row alignment
    flex_flow='row',        # Align the label and slider in a row
    justify_content='space-between'  # Space the items evenly
)

# Create a list of form items (each with a label and corresponding input)
form_items = [
    Box([Label(value='Max Price'), FloatSlider(min=1000000, max=5000000, step=10000, value=5000000)], layout=form_item_layout),
    Box([Label(value='Bedrooms Minimum'), FloatSlider(min=1, max=10, step=1)], layout=form_item_layout),
    Box([Label(value='Bathrooms Minimum'), FloatSlider(min=1, max=10, step=1)], layout=form_item_layout),
    Box([Label(value='School Ratings'), FloatSlider(min=1, max=5, step=1)], layout=form_item_layout),
    Box([Label(value='Square Footage'), FloatSlider(min=1000, max=5000, step=500)], layout=form_item_layout),
    Box([Label(value='Preferences'), Textarea(value="Modern villa nestled in the Hollywood Hills")], layout=form_item_layout)
]

# Define the overall layout for the form, aligning items vertically
form = Box(form_items, layout=Layout(
    display='flex',          # Flexbox layout for the form
    flex_flow='column',      # Arrange the form items in a column
    border='solid 2px',      # Add a border around the form
    align_items='stretch',   # Stretch form items to fill the width
    width='50%'              # Set the form width to 50% of the available space
))

In [18]:
# Display the form
form

Box(children=(Box(children=(Label(value='Max Price'), FloatSlider(value=5000000.0, max=5000000.0, min=1000000.…

In [19]:
max_price = form_items[0].children[1].value
bedrooms = form_items[1].children[1].value
bathrooms = form_items[2].children[1].value
school_rating = form_items[3].children[1].value
square_feet = form_items[4].children[1].value
preferences = form_items[5].children[1].value
print(max_price)
print(bedrooms)
print(bathrooms)
print(school_rating)
print(square_feet)
print(preferences)

5000000.0
1.0
1.0
1.0
1000.0
Modern villa nestled in the Hollywood Hills


### Apply a prefilter using numeric preferences, followed by a vector search for textual preferences

#### 1. Connect to the LanceDB real estate embeddings database

In [20]:
db = lancedb.connect("real-estate-embeddings-db")

#### 2. Open the 'listings' table

In [21]:
table = db.open_table("listings")

In [22]:
print(table.to_pandas().head())

               location  list_price  bedrooms  bathrooms  square_feet  \
0  Downtown Los Angeles   1800000.0       3.0        2.0       1800.0   
1             Hollywood   2500000.0       4.0        4.0       2400.0   
2          Santa Monica   3200000.0       5.0        4.0       3000.0   
3          Sherman Oaks   1450000.0       3.0        2.0       2000.0   
4           Silver Lake   1700000.0       4.0        3.0       2200.0   

   school_rating                                        description  \
0            4.5  Luxurious penthouse with floor-to-ceiling wind...   
1            4.6  Modern villa nestled in the Hollywood Hills, f...   
2            4.8  Stunning beachfront property with private acce...   
3            4.7  Charming ranch-style home in the heart of the ...   
4            4.4  Architectural gem with modern design and eco-f...   

                                  description_vector  
0  [0.0045068543, 0.0021039487, 0.008447941, -0.0...  
1  [-0.008112394, -0.000

#### 3. Define the filter expression for querying based on user preferences

In [23]:
# Max Price is an upper bound and the other values are minimums, so compare inclusively
filter_expr = f"list_price <= {max_price} and bedrooms >= {bedrooms} and bathrooms >= {bathrooms} and school_rating >= {school_rating} and square_feet >= {square_feet}"
filter_expr

'list_price <= 5000000.0 and bedrooms >= 1.0 and bathrooms >= 1.0 and school_rating >= 1.0 and square_feet >= 1000.0'
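
The filter string can also be assembled by a small function that rejects non-numeric input before it reaches the query. This is a sketch under two assumptions: the sliders labelled as minimums are meant to be inclusive, and the column names match the table schema. `build_filter` is a hypothetical helper.

```python
def build_filter(max_price, min_bedrooms, min_bathrooms, min_school_rating, min_square_feet):
    """Build a SQL-style filter string from numeric buyer preferences."""
    values = {
        "max_price": max_price,
        "min_bedrooms": min_bedrooms,
        "min_bathrooms": min_bathrooms,
        "min_school_rating": min_school_rating,
        "min_square_feet": min_square_feet,
    }
    for name, value in values.items():
        # Accept only real numbers, so arbitrary text never ends up in the expression.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
    clauses = [
        f"list_price <= {float(max_price)}",
        f"bedrooms >= {float(min_bedrooms)}",
        f"bathrooms >= {float(min_bathrooms)}",
        f"school_rating >= {float(min_school_rating)}",
        f"square_feet >= {float(min_square_feet)}",
    ]
    return " and ".join(clauses)


print(build_filter(5000000, 1, 1, 1, 1000))
```

Keeping each clause on its own line makes it easy to switch an individual comparison between inclusive and strict.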

#### 4. Create a full-text search index on the 'description' column

In [24]:
table.create_fts_index("description", replace=True)

#### 5. Perform the search on the table using the user's property preferences and filter the results

In [25]:
filtered_df = (
    table.search(query=preferences, vector_column_name="description_vector", fts_columns=["description"])
    .where(filter_expr, prefilter=True)
    .limit(5)
    .to_pandas()
)

# Display the first few rows of the filtered DataFrame
filtered_df.head()

| | location | list_price | bedrooms | bathrooms | square_feet | school_rating | description | description_vector | _score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hollywood | 2500000.0 | 4.0 | 4.0 | 2400.0 | 4.6 | Modern villa nestled in the Hollywood Hills, f... | [-0.008112394, -0.0009720663, -0.0023914124, -... | 9.608468 |
| 1 | Silver Lake | 1700000.0 | 4.0 | 3.0 | 2200.0 | 4.4 | Architectural gem with modern design and eco-f... | [0.000758627, 0.012596925, -0.0009355592, 0.00... | 1.818644 |
| 2 | Sherman Oaks | 1450000.0 | 3.0 | 2.0 | 2000.0 | 4.7 | Charming ranch-style home in the heart of the ... | [-0.0015524372, 0.030469513, -0.005012455, -0.... | 1.184155 |
| 3 | Echo Park | 1200000.0 | 3.0 | 2.0 | 1600.0 | 4.3 | Mid-century modern home with open spaces, vaul... | [-0.0070071816, 0.00036274552, 0.00920135, -0.... | 1.1329 |
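
The idea named in this section's heading (prefilter on numeric preferences, then rank by similarity) can be illustrated with a toy example in plain Python. This is a conceptual sketch only, not how LanceDB implements search. The three-dimensional vectors are made up, and `cosine_similarity` and `prefilter_then_rank` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prefilter_then_rank(listings, query_vector, max_price, limit=5):
    """Drop listings over budget first, then sort the rest by similarity to the query."""
    candidates = [item for item in listings if item["list_price"] <= max_price]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(item["vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


toy_listings = [
    {"location": "A", "list_price": 2_500_000, "vector": [1.0, 0.0, 0.0]},
    {"location": "B", "list_price": 1_700_000, "vector": [0.6, 0.8, 0.0]},
    {"location": "C", "list_price": 5_200_000, "vector": [1.0, 0.0, 0.0]},
]
results = prefilter_then_rank(toy_listings, [1.0, 0.0, 0.0], max_price=5_000_000)
print([item["location"] for item in results])  # → ['A', 'B']
```

Listing C matches the query vector exactly but never reaches the ranking step, because the price filter is applied first.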


In [26]:
# Check the table schema
print("Table schema:")
print(table.schema)

Table schema:
location: string not null
list_price: double not null
bedrooms: double not null
bathrooms: double not null
square_feet: double not null
school_rating: double not null
description: string not null
description_vector: fixed_size_list<item: float>[1536] not null
  child 0, item: float


### Customize listings for personalization

In [27]:
def generate_real_estate_response(query, df):
    """
    Generates a response to a real estate query based on provided property listings.

    Args:
    query (str): User's query regarding real estate properties.
    df (DataFrame): DataFrame containing real estate listings with columns like location, list_price, bedrooms, bathrooms, square_feet, school_rating, and description.

    Returns:
    str: AI-generated response using the listings context and the user query.
    """
    
    # Initialize inflect engine to convert numbers to words
    p = inflect.engine()
    
    # Build the context string for the properties
    context_list = []

    for index, row in df.iterrows():
        listing_context = (
            f"Located in {row['location']} with a list price of {p.number_to_words(int(row['list_price']))}, "
            f"that has {p.number_to_words(int(row['bedrooms']))} bedrooms, "
            f"{p.number_to_words(int(row['bathrooms']))} bathrooms, "
            f"{int(row['square_feet'])} square feet, and a school rating of {row['school_rating']}. "
            f"{row['description']}.\n\n"
        )
        context_list.append(listing_context)
    
    # Combine all property contexts into one string
    context = ''.join(context_list)

    # Send context and user query to OpenAI API for generating the response
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert real estate agent that answers user's questions based on the context provided.\n"
                    "Do not make up an answer if you do not know it, stay within the bounds of the context provided.\n"
                    "If you don't know the answer, say that you don't have enough information on the topic!"
                ),
            },
            {"role": "user", "content": f"CONTEXT: {context}\nQUERY: {query}"},
            {"role": "user", "content": "ANSWER:"},
        ],
    )

    # Extract and return the AI-generated response
    return response.choices[0].message['content'].strip()
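
One way to check the context string without calling the API is to build it in a separate pure function. The sketch below is a variation on the loop above, not the notebook's code: it formats numbers as digits instead of words, avoids a doubled full stop after the description, and uses the hypothetical names `format_listing` and `build_context`.

```python
def format_listing(listing: dict) -> str:
    """Render one listing as a sentence-style context entry."""
    return (
        f"Located in {listing['location']} with a list price of "
        f"${int(listing['list_price']):,}, "
        f"{int(listing['bedrooms'])} bedrooms, "
        f"{int(listing['bathrooms'])} bathrooms, "
        f"{int(listing['square_feet'])} square feet, "
        f"and a school rating of {listing['school_rating']}. "
        # Strip a trailing full stop so the entry ends with exactly one.
        f"{listing['description'].rstrip('.')}."
    )


def build_context(listings) -> str:
    """Join the formatted listings with blank lines between them."""
    return "\n\n".join(format_listing(item) for item in listings)


example = {
    "location": "Hollywood", "list_price": 2500000.0, "bedrooms": 4.0,
    "bathrooms": 4.0, "square_feet": 2400.0, "school_rating": 4.6,
    "description": "Modern villa nestled in the Hollywood Hills.",
}
print(build_context([example]))
```

With a DataFrame such as `filtered_df`, the rows could be passed as `df.to_dict("records")`.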


In [28]:
response1 = generate_real_estate_response("Provide a factual summary of the top 2 listings.", filtered_df)
print(response1)

The top 2 listings based on the information provided are:

1. Hollywood Hills: 
- List price: $2,500,000
- Bedrooms: 4
- Bathrooms: 4
- Square footage: 2400
- School rating: 4.6
- Features: Modern villa with home theater, infinity pool, panoramic views, high-end finishes, smart home technology.

2. Silver Lake: 
- List price: $1,700,000
- Bedrooms: 4
- Bathrooms: 3
- Square footage: 2200
- School rating: 4.4
- Features: Architectural gem with modern design, eco-friendly features such as solar panels, open floor plan, rooftop garden. Trendy neighborhood known for its vibrant art scene and cafes.


In [29]:
response2 = generate_real_estate_response("Provide a factual summary of the top 2 listings with 3 bedrooms", filtered_df)
print(response2)

The top 2 listings with 3 bedrooms are:

1. Located in Sherman Oaks with a list price of one million, four hundred and fifty thousand. It features three bedrooms, two bathrooms, 2000 square feet, and a school rating of 4.7. This charming ranch-style home has an updated kitchen, large backyard with a pool, and a cozy fireplace. It is situated on a quiet street with top-rated schools and easy access to Ventura Blvd.

2. Located in Echo Park with a list price of one million, two hundred thousand. This property offers three bedrooms, two bathrooms, 1600 square feet, and a school rating of 4.3. It is a mid-century modern home with open spaces, vaulted ceilings, and large windows providing ample natural light. The house also features a private garden and is conveniently located near Echo Park Lake and local dining hotspots.
