This is a starter notebook for the project. You will need to import the libraries you use; the ones available in this workspace are listed in its requirements.txt file.

# Step 1: Setting Up the Python Application

In [1]:
!pip install pandas
!pip install chromadb
!pip install langchain
!pip install numpy
!pip install -U langchain-openai
!pip install pydantic
# shutil ships with the Python standard library, so it is not installed with pip
!pip install openai==0.28


In [1]:
import os
import pandas as pd
import shutil
from dataclasses import dataclass

from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.evaluation import load_evaluator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain.vectorstores.chroma import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field, NonNegativeInt
from langchain.prompts import PromptTemplate
from fastapi.encoders import jsonable_encoder

# Step 2: Generating Real Estate Listings

## Define OpenAI model and API Key

In [2]:
# Environment variables
OPENAI_API_KEY = 'YOUR_OPENAI_KEY'
MODEL_NAME = 'gpt-3.5-turbo'

## Load LLM

In [3]:
# load the model
llm = OpenAI(model_name=MODEL_NAME, temperature=0, api_key=OPENAI_API_KEY)

INSTRUCTION = "Generate a CSV file with at least 10 real estate listings."
SAMPLE_LISTING = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""



In [4]:
class RealEstateListing(BaseModel):
    """
    A real estate listing.
    
    Attributes:
    - neighborhood: str
    - price: NonNegativeInt
    - bedrooms: NonNegativeInt
    - bathrooms: NonNegativeInt
    - house_size: NonNegativeInt
    - description: str
    - neighborhood_description: str
    """
    neighborhood: str = Field(description="The neighborhood where the property is located")
    price: NonNegativeInt = Field(description="The price of the property in USD")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property")
    house_size: NonNegativeInt = Field(description="The size of the house in square feet")
    description: str = Field(description="A description of the property")
    neighborhood_description: str = Field(description="A description of the neighborhood.")  

class ListingCollection(BaseModel):
    """
    A collection of real estate listings.
    
    Attributes:
    - listings: List[RealEstateListing]
    """
    listings: List[RealEstateListing] = Field(description="A list of real estate listings")

In [5]:
# create a parser that turns the LLM output into a ListingCollection
parser = PydanticOutputParser(pydantic_object=ListingCollection)

In [6]:
# build and print the prompt
prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions},
)

query = prompt.format(
    instruction=INSTRUCTION,
    sample=SAMPLE_LISTING,
)
print(query)

Generate a CSV file with at least 10 real estate listings.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bi

In [7]:
# get the response
response = llm.invoke(query)


In [9]:
# create a dataframe from the response
result = parser.parse(response)
df = pd.DataFrame(jsonable_encoder(result.listings))
df.head()

| | neighborhood | price | bedrooms | bathrooms | house_size | description | neighborhood_description |
|---|---|---|---|---|---|---|---|
| 0 | Green Oaks | 800000 | 3 | 2 | 2000 | Welcome to this eco-friendly oasis nestled in ... | Green Oaks is a close-knit, environmentally-co... |
| 1 | Sunnyvale | 950000 | 4 | 3 | 2500 | Located in the desirable neighborhood of Sunny... | Sunnyvale is known for its top-rated schools, ... |
| 2 | Brooklyn Heights | 1200000 | 5 | 4 | 3500 | Welcome to this stunning 5-bedroom, 4-bathroom... | Brooklyn Heights is a charming neighborhood kn... |
| 3 | Pacific Palisades | 3500000 | 6 | 5 | 5000 | Experience luxury living in this 6-bedroom, 5-... | Pacific Palisades is a sought-after community ... |
| 4 | Georgetown | 1800000 | 4 | 3 | 3000 | Situated in the historic neighborhood of Georg... | Georgetown is known for its cobblestone street... |


In [10]:
# save the dataframe to a csv file
df.to_csv('real_estate_listings.csv', index_label = 'id')

# Step 3: Storing Listings in a Vector Database
* Vector Database Setup: Initialize and configure ChromaDB or a similar vector database to store real estate listings.
* Generating and Storing Embeddings: Convert the LLM-generated listings into embeddings that capture the semantic content of each listing, and store these embeddings in the vector database.

In [None]:
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY  # reuse the key defined in Step 2

# Get embedding for a word.
embedding_function = OpenAIEmbeddings()
vector = embedding_function.embed_query("new york")
print(f"Vector for 'new york': {vector}")
print(f"Vector length: {len(vector)}")

# Compare vector of two words
evaluator = load_evaluator('pairwise_embedding_distance')
words = ("new york", "nyc")
x = evaluator.evaluate_string_pairs(prediction=words[0], prediction_b=words[1])
print(f"Comparing ({words[0]}, {words[1]}): {x}")
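
The cell above only demonstrates embeddings; it does not yet store the listings. The following is a minimal sketch of the "generate and store embeddings" step, under stated assumptions: `toy_embed` (a hashed bag of words) and `InMemoryVectorStore` are hypothetical stand-ins so the example runs without an API key, and `listing_to_text` and `store_listings` are illustrative helper names. In the notebook, `OpenAIEmbeddings` and `Chroma` would presumably take their place.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


def listing_to_text(listing: Dict) -> str:
    """Flatten one listing row into the text that gets embedded."""
    return (
        f"Neighborhood: {listing['neighborhood']}\n"
        f"Price: ${listing['price']:,}\n"
        f"Bedrooms: {listing['bedrooms']}\n"
        f"Bathrooms: {listing['bathrooms']}\n"
        f"House Size: {listing['house_size']:,} sqft\n"
        f"Description: {listing['description']}\n"
        f"Neighborhood Description: {listing['neighborhood_description']}"
    )


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Stand-in embedding: hashed bag of words, L2-normalised.

    Hypothetical helper so the sketch runs offline; it is NOT semantic.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database collection."""
    embed: Callable[[str], Vector]
    ids: List[str] = field(default_factory=list)
    texts: List[str] = field(default_factory=list)
    metadatas: List[Dict] = field(default_factory=list)
    vectors: List[Vector] = field(default_factory=list)

    def add(self, listing_id: str, text: str, metadata: Dict) -> None:
        """Embed the text and keep it alongside its id and metadata."""
        self.ids.append(listing_id)
        self.texts.append(text)
        self.metadatas.append(metadata)
        self.vectors.append(self.embed(text))


METADATA_KEYS = ("neighborhood", "price", "bedrooms", "bathrooms", "house_size")


def store_listings(listings: List[Dict], store: InMemoryVectorStore) -> None:
    """Embed every listing and keep structured fields as filterable metadata."""
    for i, listing in enumerate(listings):
        metadata = {key: listing[key] for key in METADATA_KEYS}
        store.add(str(i), listing_to_text(listing), metadata)


sample = [{
    "neighborhood": "Green Oaks", "price": 800000, "bedrooms": 3,
    "bathrooms": 2, "house_size": 2000,
    "description": "Eco-friendly home with solar panels.",
    "neighborhood_description": "Close-knit community with bike paths.",
}]
store = InMemoryVectorStore(embed=toy_embed)
store_listings(sample, store)
print(store.texts[0])
```

Keeping price, bedrooms, and the other structured fields as metadata next to each vector is a design choice that lets later steps apply hard constraints before ranking by similarity.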

# Step 4: Building the User Preference Interface
* Collect buyer preferences, such as the number of bedrooms, bathrooms, location, and other specific requirements, either through a set of questions or by asking the buyer to enter their preferences in natural language. You can hard-code the buyer preferences as questions and answers, or collect them interactively however you'd like.

* Buyer Preference Parsing: Implement logic to interpret and structure these preferences for querying the vector database.
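
One way the two bullets above could fit together is sketched here. The questions, the answers, the `BuyerPreferences` dataclass, and the regular expressions are all illustrative assumptions rather than part of the project starter: hard constraints (bedrooms, bathrooms, budget) are pulled out with simple patterns, and the full question-and-answer text is kept for the semantic search.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Illustrative questions and answers; replace with real buyer input.
questions = [
    "How many bedrooms and bathrooms do you need?",
    "What is your maximum budget?",
    "What matters most to you in a neighborhood?",
]
answers = [
    "At least 3 bedrooms and 2 bathrooms.",
    "Up to $900,000.",
    "A quiet area with bike paths and good public transportation.",
]


@dataclass
class BuyerPreferences:
    min_bedrooms: Optional[int]
    min_bathrooms: Optional[int]
    max_price: Optional[int]
    query_text: str  # free text used for the semantic search


def _first_int(pattern: str, text: str) -> Optional[int]:
    """Return the first captured integer for `pattern`, ignoring commas."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return int(match.group(1).replace(",", "")) if match else None


def parse_preferences(questions: List[str], answers: List[str]) -> BuyerPreferences:
    """Pull hard constraints out of the answers and keep everything as text."""
    combined = " ".join(answers)
    return BuyerPreferences(
        min_bedrooms=_first_int(r"(\d+)\s*bed", combined),
        min_bathrooms=_first_int(r"(\d+)\s*bath", combined),
        max_price=_first_int(r"\$\s*([\d,]+)", combined),
        query_text="\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers)),
    )


prefs = parse_preferences(questions, answers)
print(prefs)
```

Regular expressions are brittle for free-form answers; asking the LLM to fill a Pydantic model, as Step 2 does for listings, would be a reasonable alternative.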

# Step 5: Searching Based on Preferences

* Semantic Search Implementation: Use the structured buyer preferences to perform a semantic search on the vector database, retrieving listings that most closely match the user's requirements.
* Listing Retrieval Logic: Fine-tune the retrieval algorithm to ensure that the most relevant listings are selected based on their semantic closeness to the buyer’s preferences.
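
The retrieval logic described above can be sketched as a hard filter on structured fields followed by a cosine-similarity ranking. Everything here is an assumption for illustration: the 3-dimensional vectors are hand-made placeholders (real ones would come from the embedding model), and `search` is a hypothetical helper; a vector database such as ChromaDB would normally perform the ranking itself.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    records: List[Dict],
    query_vector: Sequence[float],
    min_bedrooms: int = 0,
    max_price: Optional[int] = None,
    k: int = 3,
) -> List[Tuple[float, Dict]]:
    """Filter on hard constraints, then rank the survivors by similarity."""
    candidates = [
        r for r in records
        if r["bedrooms"] >= min_bedrooms
        and (max_price is None or r["price"] <= max_price)
    ]
    scored = [(cosine_similarity(r["vector"], query_vector), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return scored[:k]


# Placeholder vectors, NOT real embeddings.
records = [
    {"neighborhood": "Green Oaks", "price": 800000, "bedrooms": 3, "vector": [0.9, 0.1, 0.0]},
    {"neighborhood": "Sunnyvale", "price": 950000, "bedrooms": 4, "vector": [0.2, 0.9, 0.1]},
    {"neighborhood": "Georgetown", "price": 1800000, "bedrooms": 4, "vector": [0.1, 0.2, 0.9]},
]

results = search(records, query_vector=[1.0, 0.2, 0.0], min_bedrooms=3, max_price=1_000_000)
for score, record in results:
    print(f"{record['neighborhood']}: {score:.3f}")
```

Filtering before ranking keeps listings that violate a firm requirement, such as the budget, out of the results no matter how similar their descriptions are.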


# Step 6: Personalizing Listing Descriptions

* LLM Augmentation: For each retrieved listing, use the LLM to augment the description, tailoring it to resonate with the buyer’s specific preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
* Maintaining Factual Integrity: Ensure that the augmentation process enhances the appeal of the listing without altering factual information.
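
A possible shape for this step is sketched below. The prompt wording, `build_augmentation_prompt`, and the `facts_preserved` heuristic are assumptions for illustration; the LLM call itself is left out, and the check is applied to hand-written rewrites. The heuristic only flags numbers that appear in the rewrite but not in the original listing, so it is a rough guard rather than a full factual check.

```python
import re
from typing import Dict, Set

AUGMENT_TEMPLATE = """You are a real estate copywriter.
Rewrite the listing description so that it highlights what matters to this buyer.
Do not change, add, or remove any facts (neighborhood, price, bedrooms, bathrooms, size).

Buyer preferences:
{preferences}

Listing facts:
Neighborhood: {neighborhood}
Price: ${price:,}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {house_size:,} sqft

Original description:
{description}
"""


def build_augmentation_prompt(listing: Dict, preferences: str) -> str:
    """Fill the template with the listing facts and the buyer's preferences."""
    return AUGMENT_TEMPLATE.format(preferences=preferences, **listing)


def numbers_in(text: str) -> Set[int]:
    """Extract every integer in the text, ignoring thousands separators."""
    return {int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text)}


def facts_preserved(listing: Dict, rewritten: str) -> bool:
    """True if the rewrite mentions no number absent from the original listing."""
    allowed = numbers_in(listing["description"]) | {
        listing["price"], listing["bedrooms"],
        listing["bathrooms"], listing["house_size"],
    }
    return numbers_in(rewritten) <= allowed


listing = {
    "neighborhood": "Green Oaks", "price": 800000, "bedrooms": 3,
    "bathrooms": 2, "house_size": 2000,
    "description": "This charming 3-bedroom, 2-bathroom home boasts solar panels.",
}
prompt = build_augmentation_prompt(listing, "Wants an energy-efficient home.")

# Hand-written rewrites standing in for LLM output.
good = "A 3-bedroom, 2-bathroom home whose solar panels suit an eco-conscious buyer."
bad = "A spacious 4-bedroom home with solar panels."
print(facts_preserved(listing, good), facts_preserved(listing, bad))
```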


# Step 7: Deliverables and Testing

Test your "HomeMatch" application and make sure it meets all of the requirements in the rubric. Your project code will be run when it's assessed. Enter different "buyer preferences" and ensure it works.
* Jupyter Notebook/Python Program: Compile the application code in a Jupyter notebook or a standalone Python program. Ensure the code is well-commented and logically structured.
* Example Outputs: Include example outputs showcasing how user preferences are processed and how the application generates personalized listing descriptions. You can include these in comments in your application or in a Jupyter notebook that's saved with outputs.

# Step 8: Project Submission