This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

In [90]:
import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"

from langchain.llms import OpenAI


In [13]:
MODEL_NAME = "gpt-3.5-turbo"
llm = OpenAI(model_name=MODEL_NAME, temperature=0.5, max_tokens=4000)

INSTRUCTION = '''
Generate 15 real estate listings in different property types (i.e. Condo, Single Family and etc).
'''
SAMPLE_LISTING = '''
    Price: 865000,
    Property Type: "Single-Family Home",
    Bedrooms: 4,
    Bathrooms: 3,
    Year Built: 2018,
    House Size: 2350,
    Garage Space: 2,
    HOA Fees ($/mo): 75,
    Description: Step into this stunning 4-bedroom, 3-bathroom single-family home, offering 2,350 square feet of thoughtfully designed living space. Built in 2018, this home balances modern style, function, and comfort in one inviting package.
    Inside, you'll find a bright open floor plan ideal for entertaining and daily living. The gourmet kitchen seamlessly connects to the dining and living areas, while the two-car garage provides ample space for vehicles and storage.
'''

In [31]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List

class RealEstateListing(BaseModel):
    price: NonNegativeInt = Field(description="Price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="Number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="Size of the property in square feet")
    hoa_fees: NonNegativeInt = Field(description="Monthly HOA fees in USD")
    year_built: NonNegativeInt = Field(description="Year the house was built")
    garage_space: NonNegativeInt = Field(description="Number of garage spaces")
    property_type: str = Field(description="Property type, e.g. Condo or Single-Family Home")
    description: str = Field(description="Description of the property")

class ListingCollection(BaseModel):
    listing: List[RealEstateListing] = Field(description="List of available real estate")
        
parser = PydanticOutputParser(pydantic_object=ListingCollection)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"listing": {"title": "Listing", "description": "List of available real estate", "type": "array", "items": {"$ref": "#/definitions/RealEstateListing"}}}, "required": ["listing"], "definitions": {"RealEstateListing": {"title": "RealEstateListing", "type": "object", "properties": {"price": {"title": "Price", "description": "Price of the property in USD", "minimum": 0, "type": "integer"}, "bedrooms": {"title": "Bedrooms", "description": "Number of bedrooms in the property", "minimum": 0, "type": "integer"}, "bathrooms": {"title": "

In [15]:
from langchain.prompts import PromptTemplate

# Build the query: the instruction, the parser's format instructions, then one example listing
prompt = PromptTemplate(
    template="{question}\n{format_instructions}\nExample: {context}",
    input_variables=["question", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
question = INSTRUCTION
context = SAMPLE_LISTING

query = prompt.format(context=context, question=question)
print(query)


Generate 15 real estate listings in different property types (i.e. Condo, Single Family and etc).

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"listing": {"title": "Listing", "description": "List of available real estate", "type": "array", "items": {"$ref": "#/definitions/RealEstateListing"}}}, "required": ["listing"], "definitions": {"RealEstateListing": {"title": "RealEstateListing", "type": "object", "properties": {"price": {"title": "Price", "description": "Price of the property in USD", "minimum": 0, "type": "integer"}, "bedrooms": {"title": "Bedrooms", "descriptio

#### Generate listings using a prompt template

In [16]:
# Now prompt to get 15 listings.
output = llm.predict(query)
print(output)

{
  "listing": [
    {
      "price": 865000,
      "property_type": "Single-Family Home",
      "bedrooms": 4,
      "bathrooms": 3,
      "year_built": 2018,
      "house_size": 2350,
      "garage_space": 2,
      "hoa_fees": 75,
      "description": "Step into this stunning 4-bedroom, 3-bathroom single-family home, offering 2,350 square feet of thoughtfully designed living space. Built in 2018, this home balances modern style, function, and comfort in one inviting package. Inside, you'll find a bright open floor plan ideal for entertaining and daily living. The gourmet kitchen seamlessly connects to the dining and living areas, while the two-car garage provides ample space for vehicles and storage."
    },
    {
      "price": 550000,
      "property_type": "Condo",
      "bedrooms": 2,
      "bathrooms": 2,
      "year_built": 2005,
      "house_size": 1200,
      "garage_space": 1,
      "hoa_fees": 300,
      "description": "Luxurious 2-bedroom, 2-bathroom condo with 1,200 squar

In [32]:
# Parse the raw LLM output into validated Pydantic objects
result = parser.parse(output)
print(result)

listing=[RealEstateListing(price=865000, bedrooms=4, bathrooms=3, house_size=2350, hoa_fees=75, year_built=2018, garage_space=2, property_type='Single-Family Home', description="Step into this stunning 4-bedroom, 3-bathroom single-family home, offering 2,350 square feet of thoughtfully designed living space. Built in 2018, this home balances modern style, function, and comfort in one inviting package. Inside, you'll find a bright open floor plan ideal for entertaining and daily living. The gourmet kitchen seamlessly connects to the dining and living areas, while the two-car garage provides ample space for vehicles and storage."), RealEstateListing(price=550000, bedrooms=2, bathrooms=2, house_size=1200, hoa_fees=300, year_built=2005, garage_space=1, property_type='Condo', description='Luxurious 2-bedroom, 2-bathroom condo with 1,200 square feet of living space. This modern unit features high-end finishes, a spacious layout, and a private balcony with stunning views. The building ameniti
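
Parsing succeeds here because the model returned bare JSON. Replies sometimes come with a sentence before or after the JSON, which makes strict parsing fail. The following is a minimal sketch of a pre-cleaning step, assuming the reply holds exactly one top-level JSON object; `extract_json_object` is a hypothetical helper and not part of LangChain.

```python
import json


def extract_json_object(raw_text: str) -> dict:
    """Return the JSON object embedded in raw_text.

    Hypothetical helper: it slices from the first '{' to the last '}',
    so it assumes the reply contains a single top-level JSON object.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw_text[start : end + 1])


# Illustrative reply with extra text around the JSON payload.
sample_reply = (
    "Sure! Here are the listings:\n"
    '{"listing": [{"price": 550000, "property_type": "Condo"}]}\n'
    "Let me know if you need more."
)
cleaned = extract_json_object(sample_reply)
print(cleaned["listing"][0]["property_type"])
```

The cleaned dictionary could then be serialized again with `json.dumps` and handed to the parser, or validated directly against the Pydantic model.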

#### Pandas issues
I could not import pandas in my environment; restarting the kernel and reinstalling it with pip did not help. As a workaround, I store the LLM-generated data in a CSV file using Python's built-in csv module.

In [56]:
import json
import csv

# Step 1: Parse the JSON string
data = json.loads(output)['listing']

# Step 2: Write to CSV using csv.DictWriter
with open("real_estate_listing.csv", "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = data[0].keys()  # Get column names from first dict
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for row in data:
        writer.writerow(row)
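
One thing to keep in mind with this approach is that values read back from a CSV file are always strings. The sketch below shows a round trip through an in-memory buffer that restores the integer columns. `INT_FIELDS`, `read_listings`, and the two sample rows are illustrative assumptions based on the listing schema above.

```python
import csv
import io
from typing import Dict, List

# Assumed integer columns, taken from the RealEstateListing schema.
INT_FIELDS = ("price", "bedrooms", "bathrooms", "house_size",
              "hoa_fees", "year_built", "garage_space")


def read_listings(csv_text: str) -> List[Dict]:
    """Parse CSV text and convert the integer columns back to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in INT_FIELDS:
            if field in row:
                row[field] = int(row[field])
        rows.append(row)
    return rows


# Write two illustrative rows to an in-memory buffer, then read them back.
sample_rows = [
    {"price": 865000, "property_type": "Single-Family Home", "house_size": 2350},
    {"price": 550000, "property_type": "Condo", "house_size": 1200},
]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(sample_rows[0].keys()))
writer.writeheader()
writer.writerows(sample_rows)

restored = read_listings(buffer.getvalue())
print(restored[0]["price"] + restored[1]["price"])
```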

#### Load data from disk

Load the data from the saved CSV file. The file contains all listing attributes, so they can be used for vector queries.

In [58]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./real_estate_listing.csv')
loaded_data = loader.load()
print(loaded_data)

[Document(page_content="price: 865000\nproperty_type: Single-Family Home\nbedrooms: 4\nbathrooms: 3\nyear_built: 2018\nhouse_size: 2350\ngarage_space: 2\nhoa_fees: 75\ndescription: Step into this stunning 4-bedroom, 3-bathroom single-family home, offering 2,350 square feet of thoughtfully designed living space. Built in 2018, this home balances modern style, function, and comfort in one inviting package. Inside, you'll find a bright open floor plan ideal for entertaining and daily living. The gourmet kitchen seamlessly connects to the dining and living areas, while the two-car garage provides ample space for vehicles and storage.", metadata={'source': './real_estate_listing.csv', 'row': 0}), Document(page_content='price: 550000\nproperty_type: Condo\nbedrooms: 2\nbathrooms: 2\nyear_built: 2005\nhouse_size: 1200\ngarage_space: 1\nhoa_fees: 300\ndescription: Luxurious 2-bedroom, 2-bathroom condo with 1,200 square feet of living space. This modern unit features high-end finishes, a spacio
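
Judging from the output above, each CSV row becomes one document whose text is a `column: value` line per field. The following stdlib sketch approximates that text format. It is not LangChain's actual implementation, and `row_to_page_content` is a hypothetical name.

```python
import csv
import io


def row_to_page_content(row: dict) -> str:
    """Join a CSV row into 'column: value' lines, mirroring the printed documents."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# Illustrative CSV text with a subset of the listing columns.
csv_text = "price,property_type,bedrooms\n550000,Condo,2\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
page_content = row_to_page_content(rows[0])
print(page_content)
```

Because the whole row ends up in one text blob, numeric fields such as the price are embedded as text along with the description.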

#### Storing listings in a vector database
Use LangChain's OpenAIEmbeddings and Chroma to create the vector database.

In [61]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
embedding_function = OpenAIEmbeddings()
db = Chroma.from_documents(loaded_data, embedding_function)
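
Conceptually, the vector store embeds every document and later returns the ones closest to the embedded query. The toy sketch below illustrates the idea only: word counts stand in for OpenAI embeddings, and cosine similarity is used as one common metric. The metric Chroma actually applies depends on its configuration, and `embed`, `cosine_similarity`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


# Illustrative documents, not the generated listings.
docs = [
    "modern condo with balcony and city views",
    "single-family home with two-car garage",
    "cozy bungalow with large backyard",
]
best = top_k("condo with city views", docs, k=1)
print(best)
```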

#### Collect user 1's questions and answers to determine preferences

In [72]:
def build_homebuyer_prompt(questions, answers):
    """
    Constructs a prompt that summarizes a home buyer's preferences.

    Parameters:
        questions (list of str): List of preference questions.
        answers (list of str): Corresponding answers to the questions.

    Returns:
        str: A formatted prompt string for use in LLMs or filtering engines.
    """
    preference_details = "\n".join(
        f"- {q.rstrip('?')}: {a}" for q, a in zip(questions, answers)
    )

    # Assemble the prompt from flush-left pieces so every bullet shares the same indentation.
    prompt = (
        "The following is a summary of a home buyer's personal preferences "
        "for purchasing a house:\n\n"
        f"{preference_details}\n\n"
        "Use these preferences to recommend or filter properties that best match "
        "the buyer's needs.\n"
        "If you cannot find a match or a recommendation, say "
        "'There is no listing that can fit your need.'\n"
    )
    return prompt

In [73]:
questions = [
    "What is your budget in USD?",
    "What is your desired house size?",
    "What is your year built requirement?",
    "How is your family size? How many people need to live in the house?"
]

answers = [
    "1,000,000 USD",
    "2000+ sqft",
    "1991",
    "4 people are living together"
]

# Final prompt
user_query = build_homebuyer_prompt(questions, answers)

print(user_query)

The following is a summary of a home buyer's personal preferences for purchasing a house:

- What is your budget in USD: 1,000,000 USD
- What is your desired house size: 2000+ sqft
- What is your year built requirement: 1991
- How is your family size? How many people need to live in the house: 4 people are living together

Use these preferences to recommend or filter properties that best match the buyer's needs.
If you cannot find a match or a recommendation, say 'There is no listing that can fit your need.'


In [74]:
from langchain.chains import RetrievalQA

rag = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
print(rag.run(user_query))

Based on the buyer's preferences, the Single-Family Home priced at 865,000 USD with 2350 sqft of living space and built in 2018 would be the closest match.


#### Collect user 2's questions and answers to determine preferences

In [81]:
questions_2 = [
    "What is your budget in USD?",
    "What is your desired house size?",
    "What is your year built requirement?",
    "How is your family size? How many people need to live in the house?"
]

answers_2 = [
    "600,000 USD",
    "1000+ sqft",
    "No preference",
    "2 people are living together"
]

user_query_2 = build_homebuyer_prompt(questions_2, answers_2)
print(user_query_2)

The following is a summary of a home buyer's personal preferences for purchasing a house:

- What is your budget in USD: 600,000 USD
- What is your desired house size: 1000+ sqft
- What is your year built requirement: No preference
- How is your family size? How many people need to live in the house: 2 people are living together

Use these preferences to recommend or filter properties that best match the buyer's needs.
If you cannot find a match or a recommendation, say 'There is no listing that can fit your need.'



In [82]:
rag = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
print(rag.run(user_query_2))

Based on the buyer's preferences, the best match would be the cozy 3-bedroom, 2-bathroom bungalow priced at $600,000 with 1,800 square feet of living space.
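
Similarity search ranks documents by how close their text is to the query; it does not enforce numeric limits such as a budget. One option is to apply hard constraints in plain Python before or after retrieval. The sketch below assumes listings are available as dictionaries (for example from the parsed JSON) and reads the "1991" answer as "built in 1991 or later", which is an interpretation. `filter_listings` is a hypothetical helper, and the two rows reuse values visible in the generated output above.

```python
def filter_listings(listings, max_price=None, min_size=None, min_year=None):
    """Keep only listings that satisfy every constraint that is set."""
    matches = []
    for item in listings:
        if max_price is not None and item["price"] > max_price:
            continue
        if min_size is not None and item["house_size"] < min_size:
            continue
        if min_year is not None and item["year_built"] < min_year:
            continue
        matches.append(item)
    return matches


listings = [
    {"price": 865000, "house_size": 2350, "year_built": 2018,
     "property_type": "Single-Family Home"},
    {"price": 550000, "house_size": 1200, "year_built": 2005,
     "property_type": "Condo"},
]

# First buyer: budget 1,000,000 USD, 2000+ sqft, built in 1991 or later (assumed reading).
first_buyer = filter_listings(listings, max_price=1_000_000, min_size=2000, min_year=1991)
# Second buyer: budget 600,000 USD, 1000+ sqft, no year preference.
second_buyer = filter_listings(listings, max_price=600_000, min_size=1000)
print([m["property_type"] for m in first_buyer])
print([m["property_type"] for m in second_buyer])
```

Filtering first and passing only the surviving listings to the LLM keeps the model from recommending a property that breaks a stated limit.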


#### Augmented Response Generation

In this section, I set up the application so that the LLM can use RAG to respond to user inquiries.

In [88]:
from langchain.prompts import ChatPromptTemplate

PROMPT_TEMPLATE = """
You are a real estate agent. Answer the question based only on the
following questions and answers collected from the user. Use these
questions and answers to determine the buyer's preferences.

{questions_and_answers}

---

Given the context provided above, craft a response that not only
answers the question but also ensures that your reasoning aligns
with the user's preferences. For example, make the listing
description more appealing to the user.

Keep in mind that your response aims to persuade the user to make a
purchase.

Question: {question}
"""

query_text = "I want to buy a house and I need your help."
questions_and_answers = "\n".join([f"{q}: {a}" for q, a in zip(questions, answers)])
prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(questions_and_answers=questions_and_answers, question=query_text)
results = db.similarity_search_with_relevance_scores(query_text, k=3)
final_prompt = f"{prompt}\n\n{results}"
print(final_prompt)



H: 
You are a real estate agent. Answer the question based only on the
following questions and answers collected from the user. Use these
questions and answers to determine the buyer's preferences.

What is your budget in USD?: 1,000,000 USD
What is your desired house size?: 2000+ sqft
What is your year built requirement?: 1991
How is your family size? How many people need to live in the house?: 4 people are living together

---

Given the context provided above, craft a response that not only
answers the question but also ensures that your reasoning aligns
with the user's preferences. For example, make the listing
description more appealing to the user.

Keep in mind that your response aims to persuade the user to make a
purchase.

Question: I want to buy a house and I need your help.


[(Document(page_content="price: 865000\nproperty_type: Single-Family Home\nbedrooms: 4\nbathrooms: 3\nyear_built: 2018\nhouse_size: 2350\ngarage_space: 2\nhoa_fees: 75\ndescription: S
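
As the output shows, interpolating `results` directly places the Python repr of a list of `(Document, score)` tuples into the prompt. A possible alternative is to render the hits as a readable context block first. In the sketch below, `Doc` is a stand-in dataclass so the example runs without LangChain, `format_context` is a hypothetical helper, and the scores are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (only the attributes used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(results, min_score=0.0):
    """Render (document, score) pairs as numbered listing blocks."""
    blocks = []
    for rank, (doc, score) in enumerate(results, start=1):
        if score < min_score:
            continue  # skip weak matches
        blocks.append(f"Listing {rank} (relevance {score:.2f}):\n{doc.page_content}")
    return "\n\n".join(blocks)


# Illustrative results; real scores come from the vector store.
results = [
    (Doc("price: 865000\nproperty_type: Single-Family Home"), 0.81),
    (Doc("price: 550000\nproperty_type: Condo"), 0.42),
]
context = format_context(results, min_score=0.5)
print(context)
```

The resulting string could be appended to the prompt in place of the raw `results` list.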

In [89]:
response = llm.predict(final_prompt)
print(response)

Based on your preferences, I have found a perfect match for you! 

I recommend considering this stunning 4-bedroom, 3-bathroom single-family home priced at $865,000. This home offers 2,350 square feet of thoughtfully designed living space, meeting your desired size of 2000+ sqft. Built in 2018, this modern home balances style, function, and comfort seamlessly. 

With a spacious open floor plan ideal for entertaining and daily living, this home is perfect for your family of 4. The gourmet kitchen connects to the dining and living areas, creating a warm and inviting atmosphere. Additionally, the two-car garage provides ample space for your vehicles and storage needs.

This property aligns with your budget and size requirements, making it an ideal choice for your new home. Let's schedule a visit to see this dream home in person!
