This is a starter notebook for the project. Import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

In [4]:
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-2.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.0 MB)
Collecting tzdata>=2022.7
  Downloading tzdata-2024.1-py2.py3-none-any.whl (345 kB)
Installing collected packages: tzdata, pandas
Successfully installed pandas-2.2.2 tzdata-2024.1


In [25]:
import os
import random
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain import PromptTemplate
from langchain.schema import Document
from fastapi.encoders import jsonable_encoder
from langchain.vectorstores.chroma import Chroma
import pandas as pd
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.evaluation import load_evaluator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

import shutil

os.environ["OPENAI_API_KEY"] = "voc-88312162112667733803536698b103749f15.60195553"
os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"


In [2]:
MODEL_NAME = 'gpt-3.5-turbo'

In [3]:
llm = OpenAI(model_name=MODEL_NAME,
            temperature=0)



## 1. Synthetic Data Generation - Generating Real Estate Listings with an LLM


In [4]:


# Define the PropertyListing and ListingCollection schemas used to parse the LLM output
class PropertyListing(BaseModel):
    """
    A real estate listing.
    """
    neighborhood: str = Field(description="The neighborhood where the property is located")
    price: NonNegativeInt = Field(description="The price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The size of the house in square feet")
    description: str = Field(description="A description of the property")
    neighborhood_description: str = Field(description="A description of the neighborhood.")  

class ListingCollection(BaseModel):
    """
    A collection of real estate listings.
    """
    listings: List[PropertyListing] = Field(description="A list of real estate listings")


In [5]:
parser = PydanticOutputParser(pydantic_object=ListingCollection)

In [7]:
generate_data = PromptTemplate(
    template="{instruction}\nSample: {sample}\nFormat: {formatting}",
    input_variables=["instruction","sample"],
    partial_variables={"formatting": parser.get_format_instructions}
)

In [8]:
instruction = "Generate about 20 real estate listings using the sample listing, in the given format."

sample_listing = """Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."""

query = generate_data.format(
            instruction = instruction,
            sample = sample_listing)

print(query)


Generate about 20 real estate listings using the sample listing, in the given format.
Sample: Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy acce

In [9]:
data = llm(query)

In [10]:
data_list = parser.parse(data)
listings = pd.DataFrame(jsonable_encoder(data_list.listings))

In [12]:
listings.head()

| | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
|---|---|---|---|---|---|---|---|
| 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | Maple Grove | 650000 | 4 | 3 | 2500 | Step into this spacious 4-bedroom, 3-bathroom ... | Maple Grove is known for its top-rated schools... |
| 2 | Sunset Hills | 900000 | 5 | 4 | 3500 | Luxury awaits in this stunning 5-bedroom, 4-ba... | Sunset Hills is an upscale community known for... |
| 3 | Riverfront Estates | 1200000 | 6 | 5 | 4500 | Welcome to this waterfront paradise in the pre... | Riverfront Estates is an exclusive community w... |
| 4 | Pine Ridge | 750000 | 4 | 3 | 2800 | Step into this beautifully renovated 4-bedroom... | Pine Ridge is a peaceful community with tree-l... |


In [13]:
listings.to_csv("real_estate_listings.csv", index_label="id")

## 2. Implementing Semantic Search
**Setting Up the Vector Database:** We begin by initializing and configuring ChromaDB, or an equivalent vector database, to store real estate listings.

**Creating and Storing Embeddings:** Transform the listings generated by the LLM into embeddings that accurately represent the semantic content of each listing, then store these embeddings in the vector database.
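
To make the idea of "semantic content as a vector" concrete, here is a minimal sketch of how two embeddings can be compared with cosine similarity. The three-dimensional vectors and the names `toy_embeddings` and `query_vector` are made up for illustration; real OpenAI embeddings have far more dimensions, and the vector database performs its own similarity computation internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", not produced by any real model.
toy_embeddings = {
    "listing_a": [0.9, 0.1, 0.0],
    "listing_b": [0.1, 0.9, 0.2],
}
query_vector = [1.0, 0.0, 0.0]

# Rank listings from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_similarity(query_vector, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

Listings whose vectors point in nearly the same direction as the query vector rank first, which is the intuition behind retrieving "semantically close" listings.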

In [6]:
# Initialize and configure ChromaDB or a similar vector database to store real estate listings
CHROMA_PATH = "chroma"
CSV_PATH = "real_estate_listings.csv"

In [7]:
embeddings = OpenAIEmbeddings()

In [27]:
# Load CSV data
data_frame = pd.read_csv(CSV_PATH)
doc_list = []
for idx, record in data_frame.iterrows():
    doc_list.append(Document(page_content=record['description'], metadata={'doc_id': str(idx)}))

# Text segmentation
segmenter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
segmented_docs = segmenter.split_documents(doc_list)
print(f"Processed {len(doc_list)} documents into {len(segmented_docs)} segments.")

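# Show one sample segment; check the length first so a short list cannot raise IndexError
if len(segmented_docs) > 10:
    sample_doc = segmented_docs[10]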
    print(sample_doc.page_content)
    print(sample_doc.metadata)

# Store in Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

chroma_db = Chroma.from_documents(
    segmented_docs, OpenAIEmbeddings(), persist_directory=CHROMA_PATH
)
chroma_db.persist()
print(f"Stored {len(segmented_docs)} segments at {CHROMA_PATH}.")


Processed 19 documents into 39 segments.
The gourmet kitchen boasts quartz countertops and stainless steel appliances. Retreat to the master suite with a spa-like bathroom and walk-in closet.
{'doc_id': '4', 'start_index': 198}
Stored 39 segments at chroma.
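
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window chunker. This is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on natural separators such as newlines and spaces, so its segments will not match this output exactly. The function name `sliding_chunks` is hypothetical, and the returned start offsets play the same role as the `start_index` metadata shown above.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Returns a list of (start_index, chunk) tuples.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
for start, chunk in chunks:
    print(start, chunk)
```

Each window repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears in full in at least one neighboring segment when it is short enough.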


## Context: Semantic Search for Listings Based on Buyer Preferences

- The code below runs a semantic search for listings that match a buyer's preferences.

- Buyer preferences, such as the number of bedrooms and bathrooms, the location, and other specific requirements, can be gathered through predefined questions or entered in natural language. In this notebook, a single natural-language query stands in for those answers.

- The preferences are then used as the query against the vector database, so that the search results closely match the buyer's stated needs.
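
As one possible way to structure answers to predefined questions, the sketch below joins the non-empty answers into a single query string. The function `build_query`, the dictionary keys, and the sample answers are all hypothetical; the notebook itself uses a hand-written query instead.

```python
def build_query(preferences):
    """Join non-empty answers into one query string, keeping the question labels."""
    parts = [
        f"{label}: {answer.strip()}"
        for label, answer in preferences.items()
        if answer and answer.strip()
    ]
    return "; ".join(parts)


# Hypothetical answers to predefined questions; blank answers are skipped.
buyer_preferences = {
    "Bedrooms": "4",
    "Bathrooms": "",
    "Location": "quiet neighborhood with mountain views",
    "Must-haves": "spacious kitchen, balcony",
}

sketch_query = build_query(buyer_preferences)
print(sketch_query)
```

The resulting string can be passed to the similarity search in the same way as a free-text query.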

In [40]:
query_text = "A cozy 4-bedroom apartment with modern amenities, a spacious kitchen, and a balcony with a great view."

In [41]:
PROMPT_TEMPLATE = """
Based on the following context only:

{context}

---

Respond to this requirement : {question}
"""


### Preference-Based Search

**Semantic Search Implementation:** Leverage the structured buyer preferences to conduct a semantic search within the vector database, retrieving listings that align most closely with the user's specified criteria.

**Listing Retrieval Logic:** Optimize the retrieval algorithm to prioritize and select the listings that are most relevant, based on their semantic similarity to the buyer's preferences.
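
One way to prioritize results, sketched here under stated assumptions, is to keep only the best-scoring segment per listing and drop anything below a relevance cutoff. The function `select_listings` and the sample scores are hypothetical. The 0.7 cutoff mirrors the threshold in this notebook, but note that the notebook checks only the top result against it, whereas this sketch filters every hit.

```python
def select_listings(scored_chunks, min_score=0.7, top_n=3):
    """Keep the best score per listing, drop weak matches, return the top_n listings.

    scored_chunks is a list of (doc_id, relevance_score) pairs.
    """
    best = {}
    for doc_id, score in scored_chunks:
        if score < min_score:
            continue  # discard segments below the relevance cutoff
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical search hits: two segments come from the same listing ("4").
sample_hits = [("4", 0.82), ("7", 0.79), ("4", 0.75), ("2", 0.55)]
selected = select_listings(sample_hits)
print(selected)
```

Deduplicating by `doc_id` matters because each listing is split into several segments, so the top hits can otherwise all point at the same property.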

In [38]:
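def generate_response(user_query, prompt_template):
    """Retrieve the segments most relevant to user_query and print an LLM answer built from them."""
    embedding_func = OpenAIEmbeddings()
    chroma_db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_func)

    # Retrieve the 3 most similar segments together with their relevance scores.
    search_results = chroma_db.similarity_search_with_relevance_scores(user_query, k=3)
    # Stop early when there are no results or the first result scores below 0.7.
    if not search_results or search_results[0][1] < 0.7: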
        print("No suitable matches found.")
    else:
        combined_context = "\n\n---\n\n".join([result.page_content for result, _ in search_results])
        prompt_builder = ChatPromptTemplate.from_template(prompt_template)
        generated_prompt = prompt_builder.format(context=combined_context, question=user_query)
        print(f"Constructed Prompt:\n{generated_prompt}")
        
        ai_model = ChatOpenAI()
        generated_response = ai_model.predict(generated_prompt)
        source_ids = [result.metadata.get("doc_id", None) for result, _score in search_results]
        final_response = f"Response: {generated_response}\nSources: {source_ids}"
        print(final_response)


In [42]:
generate_response(query_text, PROMPT_TEMPLATE)

Constructed Prompt:
Human: 
Based on the following context only:

Welcome to this charming 4-bedroom, 3-bathroom home in the peaceful Mountain View Heights neighborhood. The open-concept living area features hardwood floors, a cozy fireplace, and a gourmet kitchen with granite countertops. The master suite offers a private balcony with mountain views. Enjoy the

---

appliances. The expansive master suite features a spa-like bathroom and a private balcony with panoramic views. Entertain guests in the backyard oasis with a pool and outdoor kitchen.

---

The gourmet kitchen features high-end appliances and a breakfast nook. Relax in the backyard oasis with a covered patio and lush landscaping.

---

Respond to this requirement : A cozy 4-bedroom apartment with modern amenities, a spacious kitchen, and a balcony with a great view.

Response: Based on the context provided, it seems like the charming 4-bedroom, 3-bathroom home in Mountain View Heights neighborhood would meet your requireme

## 3. Generating Augmented Response

### Personalizing Property Descriptions

**LLM Enhancement:** For every retrieved listing, we utilize the LLM to refine the description, customizing it to align with the buyer's specific preferences. This includes highlighting features of the property that match what the buyer is seeking.

**Preserving Accuracy:** We also ensure that the enhancement process increases the listing's attractiveness without changing any factual details.
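
A lightweight way to check that a rewritten description has not changed factual details is to compare the numbers it contains against the original. The sketch below is an assumption about how such a check could look, not part of the notebook's pipeline; the helper names and sample sentences are hypothetical, and it catches only numeric changes (bedroom counts, sizes, prices), not altered wording.

```python
import re


def extract_numbers(text):
    """Return the set of numeric tokens in text, with thousands separators removed."""
    return {token.replace(",", "") for token in re.findall(r"\d[\d,]*", text)}


def introduced_numbers(original, rewritten):
    """Numbers that appear in the rewritten text but not in the original."""
    return extract_numbers(rewritten) - extract_numbers(original)


# Hypothetical sample texts for illustration.
original = "This charming 3-bedroom, 2-bathroom home offers 2,000 sqft."
faithful = "A bright 3-bedroom home with 2 bathrooms across 2000 sqft."
altered = "A bright 4-bedroom home with 2 bathrooms across 2,000 sqft."

print(introduced_numbers(original, faithful))
print(introduced_numbers(original, altered))
```

An empty result means the rewrite introduced no new numbers; a non-empty result flags a description that should be reviewed or regenerated.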

In [43]:
PROMPT_TEMPLATE_AUGMENTED ="""
Given the context below:

{context}

---

Formulate a response that not only addresses the question {question}, but also ensures your explanation is unique, engaging, and tailored to the outlined preferences. Focus on subtly highlighting features of the property that match the buyer's specific interests.
"""


In [44]:
generate_response(query_text, PROMPT_TEMPLATE_AUGMENTED)

Constructed Prompt:
Human: 
Given the context below:

Welcome to this charming 4-bedroom, 3-bathroom home in the peaceful Mountain View Heights neighborhood. The open-concept living area features hardwood floors, a cozy fireplace, and a gourmet kitchen with granite countertops. The master suite offers a private balcony with mountain views. Enjoy the

---

appliances. The expansive master suite features a spa-like bathroom and a private balcony with panoramic views. Entertain guests in the backyard oasis with a pool and outdoor kitchen.

---

The gourmet kitchen features high-end appliances and a breakfast nook. Relax in the backyard oasis with a covered patio and lush landscaping.

---

Formulate a response that not only addresses the question A cozy 4-bedroom apartment with modern amenities, a spacious kitchen, and a balcony with a great view., but also ensures your explanation is unique, engaging, and tailored to the outlined preferences. Focus on subtly highlighting features of the 