This is a starter notebook for the project. You'll have to import the libraries you need; the ones available in this workspace are listed in the requirements.txt file.

Step 1: Setup
Install the dependencies listed in requirements.txt by running the following command:
```bash
pip install -r requirements.txt
```

In [1]:
import os
def set_open_ai_api():
    os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # placeholder: the key is elided in the source, supply your own
    os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"
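
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key is either already exported in the environment or can be typed in interactively; the helper name `set_open_ai_api_from_env` and its parameters are hypothetical and not part of the project:

```python
import os
from getpass import getpass

def set_open_ai_api_from_env(api_base="https://openai.vocareum.com/v1", prompt_fn=getpass):
    """Hypothetical helper: reuse OPENAI_API_KEY if it is set, otherwise ask for it."""
    if not os.environ.get("OPENAI_API_KEY"):
        # prompt_fn is injectable so the helper can be exercised without a terminal
        os.environ["OPENAI_API_KEY"] = prompt_fn("OpenAI API key: ")
    os.environ["OPENAI_API_BASE"] = api_base
```

Calling `set_open_ai_api_from_env()` would then prompt only when no key is present, and the key never appears in the saved notebook.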

Step 2 & 3: Generating Fake Listings Using LLM and Saving Them

In [None]:
import openai
import os

from langchain.memory import ConversationBufferMemory
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

from langchain.llms import OpenAI


In [None]:
from langchain.prompts import PromptTemplate

listing_template = PromptTemplate(template=
"""Neighborhood: {neighborhood}
Price: ${price}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {house_size} sqft

Description: {description}

Neighborhood Description: {neighborhood_description}
""",
                                  input_variables=["neighborhood", "price", "bedrooms", "bathrooms",
                                                   "house_size", "description", "neighborhood_description"]
                                  )
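
To see what one rendered example looks like, the template can be approximated with plain `str.format`, which behaves much like the default f-string style of `PromptTemplate` for simple placeholders. This is only a sketch: `LISTING_FORMAT` and the sample values are illustrative stand-ins, not project code.

```python
# Stand-in for the listing template, using the same placeholders
LISTING_FORMAT = """Neighborhood: {neighborhood}
Price: ${price}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {house_size} sqft

Description: {description}

Neighborhood Description: {neighborhood_description}
"""

# Illustrative values only
sample = {
    "neighborhood": "City Heights",
    "price": 500000,
    "bedrooms": 4,
    "bathrooms": 3,
    "house_size": 3000,
    "description": "A cozy house with a wonderful garden.",
    "neighborhood_description": "A quiet, family-friendly hillside neighborhood.",
}

rendered = LISTING_FORMAT.format(**sample)
print(rendered)
```

The rendered labels (`Neighborhood:`, `Price: $`, and so on) are the same ones the parsing step later searches for, so the label text and the parser's patterns have to stay in sync.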



In [None]:
example_inputs = [
    {
        "neighborhood": "Downtown Abbey",
        "price": 300000,
        "bedrooms": 3,
        "bathrooms": 2,
        "house_size": 2000,
        "description": "A beautiful apartment with a large terrace overlooking the city's skyscrapers. With 3 well-lit bedrooms and 2 bathrooms, you have enough room even for the occasional guest coming over to celebrate New Year's with you :)",
        "neighborhood_description": "A vibrant, centrally located neighborhood with especially good public transport connections, so you are close to everything you need for everyday life, from groceries to cafes, restaurants, and shopping."
    },
    {
        "neighborhood": "City Heights",
        "price": 500000,
        "bedrooms": 4,
        "bathrooms": 3,
        "house_size": 3000,
        "description": "A cozy house with a wonderful garden overlooking the city from its hillside location",
        "neighborhood_description": "Located in the city outskirts, this neighborhood is known for its great views and quiet streets with a very family-friendly atmosphere. In no time you are in the midst of a beautiful forest where you can calm all your senses and breathe in the cool fresh air. Sightings of deer and other wildlife are common."
    },
    {
        "neighborhood": "Green Oaks",
        "price": 800000,
        "bedrooms": 3,
        "bathrooms": 2,
        "house_size": 2000,
        "description": "A luxurious villa with a large garden and a swimming pool with a winter garden, 2 parking spots, and a garage as well as a large terrace and wonderful finishing.",
        "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."
    }
]

In [None]:
from langchain.prompts import FewShotPromptTemplate

few_shot_template = FewShotPromptTemplate(
    example_prompt=listing_template,
    input_variables=["input"],
    examples=example_inputs,
    suffix="Use the examples above to generate the following: {input}",
)

listing_seperator = "===="
instruction = "Generate {} listings of houses. Be creative with neighborhood_description and description, and keep each confined to three sentences. Keep the prices between $125000 and $1500000, the number of bedrooms under 6, the number of bathrooms under 3, and the total area under 3500sqft. VERY IMPORTANT: Output the results in the same format as the examples, keeping even the order of properties the same as in the examples. Add the characters {} before each new listing."
num_listings = 30
prompt_to_use = few_shot_template.format(input=instruction.format(num_listings, listing_seperator))



In [None]:
print(prompt_to_use)
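
Roughly speaking, a few-shot prompt is the rendered examples joined by a separator, followed by the formatted suffix. The sketch below mimics that assembly with the standard library only. `build_few_shot_prompt` is a hypothetical helper, the blank-line separator is an assumption, and the shortened example format is for illustration.

```python
def build_few_shot_prompt(examples, example_format, suffix, separator="\n\n", **kwargs):
    """Rough stand-in for few-shot assembly: rendered examples first, then the suffix."""
    rendered_examples = [example_format.format(**example) for example in examples]
    return separator.join(rendered_examples + [suffix.format(**kwargs)])

# Shortened examples, for illustration only
examples = [
    {"neighborhood": "City Heights", "price": 500000},
    {"neighborhood": "Green Oaks", "price": 800000},
]

prompt = build_few_shot_prompt(
    examples,
    example_format="Neighborhood: {neighborhood}\nPrice: ${price}",
    suffix="Use the examples above to generate the following: {input}",
    input="Generate 2 listings of houses.",
)
print(prompt)
```

Printing the assembled prompt before sending it, as the cell above does, is a cheap way to confirm that the examples and the instruction ended up where expected.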

In [None]:
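from langchain.llms import OpenAIChat

set_open_ai_api()  # the API key and base URL must be set before the client is created

model_name = "gpt-3.5-turbo"
# gpt-3.5-turbo is a chat model, so use the same chat wrapper class as in Step 4
llm = OpenAIChat(model_name=model_name, temperature=0.0, max_tokens=3500)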

In [None]:
result = llm(prompt=prompt_to_use)

In [None]:
print(f"LLM result:\n{result}")

In [None]:
import pandas as pd
import locale
import re

# Set the locale to 'en_US.UTF-8' for parsing numbers with commas
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

# Split the result into individual listings
listings = result.split(listing_seperator)[1:]
# print(f"Here are the listings in a list:\n{listings}")

# Define the columns based on input_variables
columns = ["neighborhood", "price", "bedrooms", "bathrooms", "house_size", "description", "neighborhood_description"]

# Parse the listings into a list of dictionaries
data = []
for listing in listings:
    print(f"Parsing listing: {listing[1:150]} ...\n")
    try:
        entry = {
            "neighborhood": re.search(r"^Neighborhood: (.+)$", listing, re.MULTILINE).group(1),
            "price": locale.atoi(re.search(r"^Price: \$(.+)$", listing, re.MULTILINE).group(1)),
            "bedrooms": int(re.search(r"^Bedrooms: (.+)$", listing, re.MULTILINE).group(1)),
            "bathrooms": int(re.search(r"^Bathrooms: (.+)$", listing, re.MULTILINE).group(1)),
            "house_size": locale.atoi(re.search(r"^House Size: (.+) sqft$", listing, re.MULTILINE).group(1)),
            "description": re.search(r"^Description: (.+)$", listing, re.MULTILINE).group(1),
            "neighborhood_description": re.search(r"^Neighborhood Description: (.+)$", listing, re.MULTILINE).group(1)
        }

        data.append(entry)
    except Exception as e:
        print("Error parsing listing")
        print(listing)
        print(f"Because of \n{e}")
        print("\n")
        continue

# Create a DataFrame
df = pd.DataFrame(data, columns=columns)
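
The parsing logic can be tried out on a single hard-coded listing before running it on the model output. The sketch below is a locale-free variant that strips thousands separators by hand instead of calling `locale.atoi`, which avoids depending on `en_US.UTF-8` being installed. `parse_listing`, `FIELD_PATTERNS`, and the sample text are illustrative assumptions.

```python
import re

FIELD_PATTERNS = {
    "neighborhood": r"^Neighborhood: (.+)$",
    "price": r"^Price: \$(.+)$",
    "bedrooms": r"^Bedrooms: (.+)$",
    "bathrooms": r"^Bathrooms: (.+)$",
    "house_size": r"^House Size: (.+) sqft$",
    "description": r"^Description: (.+)$",
    "neighborhood_description": r"^Neighborhood Description: (.+)$",
}
NUMERIC_FIELDS = {"price", "bedrooms", "bathrooms", "house_size"}

def parse_listing(text):
    """Return a dict of fields, or raise ValueError naming the first missing field."""
    entry = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing field: {field}")
        value = match.group(1).strip()
        # Drop thousands separators instead of relying on the system locale
        entry[field] = int(value.replace(",", "")) if field in NUMERIC_FIELDS else value
    return entry

# Illustrative listing in the same layout as the prompt examples
sample = """
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: A luxurious villa with a large garden.

Neighborhood Description: A close-knit, environmentally-conscious community.
"""

parsed = parse_listing(sample)
print(parsed)
```

Raising an error that names the missing field makes it easier to see why a listing was skipped than a generic `AttributeError` from calling `.group(1)` on `None`.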

Let's view the DataFrame, check that the generated data makes sense, and finally save it to a CSV file.

In [None]:
df.head(n=num_listings)

In [None]:
df.to_csv("listings.csv", index=False)
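
Checking that the data "makes sense" can be partly automated by testing each row against the limits stated in the generation instruction. A minimal sketch, assuming rows are available as plain dictionaries (for example via `df.to_dict("records")`); `check_listing` is a hypothetical helper:

```python
def check_listing(entry):
    """Return a list of human-readable problems; an empty list means the row looks sane."""
    problems = []
    if not 125_000 <= entry["price"] <= 1_500_000:
        problems.append(f"price out of range: {entry['price']}")
    if not 0 < entry["bedrooms"] < 6:
        problems.append(f"unexpected bedroom count: {entry['bedrooms']}")
    if not 0 < entry["bathrooms"] < 3:
        problems.append(f"unexpected bathroom count: {entry['bathrooms']}")
    if not 0 < entry["house_size"] < 3500:
        problems.append(f"house size out of range: {entry['house_size']}")
    return problems

# Illustrative rows: one within the limits, one violating two of them
good = {"price": 400000, "bedrooms": 3, "bathrooms": 2, "house_size": 1800}
bad = {"price": 90000, "bedrooms": 7, "bathrooms": 2, "house_size": 1800}

print(check_listing(good))
print(check_listing(bad))
```

Language models do not always honor numeric limits in a prompt, so a check like this is a reasonable guard before the data is embedded and searched.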


Step 4: Building the User Preference Interface

In [185]:
from langchain.llms import OpenAIChat
set_open_ai_api()
llm = OpenAIChat(temperature=0.0)

In [186]:
# Hard-coding the user preferences using the ones provided in the project - with minor modifications - for reproducibility purposes wit

answers = [
    "A comfortable house with at least 3 bedrooms with a spacious kitchen and a cozy living room with at least 2000sqft",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "$500,000",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
    ]
questions = [
    "How big should your house be?",
    "What are 3 most important things for you in choosing this property?",
    "What is the maximum price you would be willing to pay?",
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]
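
Because `zip` stops at the shorter input, a missing comma between two adjacent string literals would silently merge two questions and shift every later answer. A small guard, sketched here with a hypothetical `paired` helper, makes such a mismatch fail loudly:

```python
def paired(questions, answers):
    """Pair questions with answers, failing loudly if the lists differ in length."""
    if len(questions) != len(answers):
        raise ValueError(f"{len(questions)} questions but {len(answers)} answers")
    return list(zip(questions, answers))

# Adjacent string literals without a comma are concatenated into a single element
broken = [
    "What is the maximum price you would be willing to pay?"
    "Which amenities would you like?",
]
print(len(broken))  # 1
```

Calling `paired(questions, answers)` before building the chat history would catch this kind of mistake early.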

We create a conversation chain with memory and simulate the question-and-answer interaction by preloading its memory with the questions and answers.

In [187]:
from langchain.memory import ChatMessageHistory
def create_chat_message_history(questions, answers):
    chat_message_history=ChatMessageHistory()
    for question, answer in zip(questions, answers):
        chat_message_history.add_ai_message(question)
        chat_message_history.add_user_message(answer)
    return chat_message_history
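
When the memory is later injected into the prompt as text, each question appears on an `AI:` line and each answer on a `Human:` line. The sketch below reproduces that layout with plain strings. `render_history` is a hypothetical helper and the prefixes are assumptions chosen for illustration.

```python
def render_history(pairs, ai_prefix="AI", human_prefix="Human"):
    """Render (question, answer) pairs as alternating prefixed lines."""
    lines = []
    for question, answer in pairs:
        lines.append(f"{ai_prefix}: {question}")
        lines.append(f"{human_prefix}: {answer}")
    return "\n".join(lines)

history_text = render_history([
    ("How big should your house be?", "At least 2000sqft."),
    ("What is the maximum price you would be willing to pay?", "$500,000"),
])
print(history_text)
```

Seeing the history as flat text makes it clear that the model receives the preferences as a transcript, not as structured fields.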

In [191]:
base_system_prompt = """You are a real estate agent. You have a new client who is looking for a new home.

The client has provided answers to important questions in the context.

Proceed as follows:
- 1. First ask the user if he/she is ready to receive recommendations.
    -2. If he/she enters says 'yes', summarize your understanding of the user preferences and ask the user if he/she is ready to receive recommendations telling the user to either type 'yes' or 'no'
        -3. If he types 'yes' you match the user preferences with listings and make suggestions."""

In [192]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# The preloaded history is passed as chat_memory and exposed to the prompt under the "context" key
memory=ConversationBufferMemory(chat_memory=create_chat_message_history(questions,answers), memory_key="context")

qa_template = PromptTemplate(template=
                             ("""%s

Context: {context}

Question: {question}
""" % base_system_prompt),
                             input_variables=["context", "question"]
                             )
qa_chain = ConversationChain(llm=llm,  memory=memory, prompt=qa_template,input_key="question",output_key="answer", verbose=False)


In [193]:


question = "I am waiting for your recommendations"
print(f"The user asked: {question}")
response=qa_chain(inputs={"question": question})
print(f"The smart agent answered: {response['answer']}")


The user asked: I am waiting for your recommendations
The smart agent answered: 1. Are you ready to receive recommendations based on your preferences?
2. Based on your preferences, I understand that you are looking for a comfortable house with at least 3 bedrooms, a spacious kitchen, and a cozy living room with at least 2000sqft. You value a quiet neighborhood, good local schools, and convenient shopping options. Your maximum budget is $500,000 and you are looking for a property with a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system. You also prefer easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads. Are you ready to receive recommendations? Please type 'yes' or 'no'.


In [194]:
question = "yes"
print(f"The user asked: {question}")
response=qa_chain(inputs={"question": question})
print(f"The smart agent answered: {response['answer']}")

The user asked: yes
The smart agent answered: Great! Based on your preferences, I have found a few listings that match your criteria. Here are some recommendations:

1. A beautiful 3-bedroom house in a quiet neighborhood with good schools nearby. This property features a spacious kitchen, a cozy living room, and a backyard for gardening. It also includes a two-car garage and a modern, energy-efficient heating system. Convenient shopping options are just a short drive away. The price of this property is within your budget at $480,000.

2. A charming 4-bedroom home with a large kitchen and a comfortable living room. Located in a peaceful neighborhood with easy access to a reliable bus line and bike-friendly roads. This property also offers a two-car garage, a backyard for gardening, and is close to major highways for convenient commuting. Priced at $490,000, this property is a great fit for your preferences.

Please let me know if you would like more information on any of these listings 

The recommendations above are fabricated: the model has no access to real listings, so it invented them. Let's see what we can do about this using our generated listings and a vector database.

Step 5: Using the Generated Listings to Provide Real Context

In [195]:
from typing import List, Dict, Any
from langchain.memory import ConversationBufferMemory

# Thanks to https://github.com/langchain-ai/langchain/issues/1800#issuecomment-1598128244
class ExtendedConversationBufferMemory(ConversationBufferMemory):
    extra_variables:List[str] = []

    @property
    def memory_variables(self) -> List[str]:
        """Will always return list of memory variables."""
        return [self.memory_key] + self.extra_variables

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Return buffer with history and extra variables"""
        d = super().load_memory_variables(inputs)
        d.update({k:inputs.get(k) for k in self.extra_variables})
        return d
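
The idea behind the subclass is that `load_memory_variables` returns the stored history plus any extra keys copied from the chain inputs. The sketch below isolates that merging logic with plain Python classes. `BufferMemorySketch` and `ExtendedMemorySketch` are illustrative stand-ins, not LangChain classes.

```python
class BufferMemorySketch:
    """Stand-in for a buffer memory that exposes one history string."""

    def __init__(self, buffer, memory_key="context"):
        self.buffer = buffer
        self.memory_key = memory_key

    def load_memory_variables(self, inputs):
        return {self.memory_key: self.buffer}


class ExtendedMemorySketch(BufferMemorySketch):
    """Adds pass-through of selected input keys, mirroring the subclass above."""

    def __init__(self, buffer, memory_key="context", extra_variables=()):
        super().__init__(buffer, memory_key)
        self.extra_variables = list(extra_variables)

    def load_memory_variables(self, inputs):
        variables = super().load_memory_variables(inputs)
        # Missing keys come back as None because dict.get is used
        variables.update({key: inputs.get(key) for key in self.extra_variables})
        return variables


memory_sketch = ExtendedMemorySketch("AI: ...\nHuman: ...", extra_variables=["listings"])
with_listings = memory_sketch.load_memory_variables({"question": "yes", "listings": "Listing A"})
without_listings = memory_sketch.load_memory_variables({"question": "yes"})
print(with_listings)
print(without_listings)
```

When the caller omits `listings`, the value is `None`, so a prompt that contains a `{listings}` placeholder would render the literal text `None` in that slot.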

In [196]:
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate


memory_with_listings=ExtendedConversationBufferMemory(
    chat_memory=create_chat_message_history(questions,answers),
    memory_key="context",
    extra_variables=["listings"])


qa_template_with_listings=PromptTemplate(template=
                          (("""%s Only provide suggestions from the available listings.

Context: {context}

Listings: {listings}

Question: {question}
""") % base_system_prompt),
                          input_variables=["context", "listings", "question"]
                          )



qa_chain_with_listings = ConversationChain(llm=llm,  memory=memory_with_listings, prompt=qa_template_with_listings,input_key="question" ,output_key="answer", verbose=True)



In [197]:
response=qa_chain_with_listings(inputs={"question":"I am waiting for your recommendations"})



> Entering new ConversationChain chain...
Prompt after formatting:
You are a real estate agent. You have a new client who is looking for a new home.

The client has provided answers to important questions in the context.

Proceed as follows:
- 1. First ask the user if he/she is ready to receive recommendations.
    -2. If he/she enters says 'yes', summarize your understanding of the user preferences and ask the user if he/she is ready to receive recommendations telling the user to either type 'yes' or 'no'
        -3. If he types 'yes' you match the user preferences with listings and make suggestions.. Only provide suggestions from the available listings. '

Context: AI: How big should your house be?
Human: A comfortable house with at least 3 bedrooms with a spacious kitchen and a cozy living room with at least 2000sqft
AI: What are 3 most important things for you in choosing this property?
Human: A quiet neighborhood, good local schools, and convenient shopping 

In [198]:
print(f"The smart assistant answered: {response['answer']}")

The smart assistant answered: 1. Are you ready to receive recommendations based on your preferences?
2. Based on your preferences for a comfortable house with at least 3 bedrooms, a spacious kitchen, and a cozy living room with at least 2000sqft, in a quiet neighborhood with good local schools and convenient shopping options, with a maximum budget of $500,000, amenities such as a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system, and transportation options including easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads, are you ready to receive recommendations? Please type 'yes' or 'no'.


In [199]:
preference_summary_by_smart_assistant=response['answer']

Now we use the preference summary produced by the smart assistant to retrieve suitable listings from the vector database.

In [200]:
from langchain.document_loaders import CSVLoader
import random

set_open_ai_api()
listings_loader = CSVLoader("listings.csv")
listing_documents=listings_loader.load()
print(f"There are {len(listing_documents)} listings loaded")
sample_listings = random.sample(listing_documents, 2)
print(f"Here are two sample loaded listings:\n{sample_listings}")

There are 28 listings loaded
Here are two sample loaded listings:
[Document(page_content='neighborhood: Pinecrest Meadows\nprice: 400000\nbedrooms: 3\nbathrooms: 2\nhouse_size: 1800\ndescription: Newly renovated home with modern finishes and a spacious backyard for outdoor entertaining. The open floor plan and large windows create a bright and inviting atmosphere throughout the house.\nneighborhood_description: Pinecrest Meadows is a family-friendly neighborhood with well-maintained parks and playgrounds for children to enjoy. Conveniently located near shopping centers, restaurants, and schools, making it a desirable place to live for families and young professionals.', metadata={'source': 'listings.csv', 'row': 4}), Document(page_content='neighborhood: Cedarwood Village\nprice: 225000\nbedrooms: 3\nbathrooms: 2\nhouse_size: 1600\ndescription: Cozy home with a spacious backyard and deck for outdoor gatherings and relaxation. The open floor plan and large windows create a bright and inv

In [201]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings


embeddings_model=OpenAIEmbeddings()
listings_db=Chroma.from_documents(documents=listing_documents,embedding=embeddings_model)
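
Conceptually, a vector store embeds every document as a vector and returns the ones closest to the embedded query. The toy sketch below ranks three hand-written vectors by cosine similarity. The metric, the vectors, and the helper names are assumptions for illustration and do not describe how Chroma is configured here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, vectors_by_id, k):
    """Return the ids of the k vectors most similar to the query."""
    scored = sorted(
        vectors_by_id.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hand-written toy "embeddings"
toy_vectors = {
    "listing_a": [0.9, 0.1, 0.0],
    "listing_b": [0.1, 0.9, 0.0],
    "listing_c": [0.7, 0.3, 0.1],
}
closest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(closest)  # ['listing_a', 'listing_c']
```

Real embeddings have far more dimensions, but the ranking step is the same idea: similarity of meaning is approximated by closeness of vectors, which is why budget limits and other hard constraints are not enforced by the search itself.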


In [202]:
most_similar_documents=listings_db.similarity_search(query=preference_summary_by_smart_assistant,k=5)
print(f"Most similar documents to the user preferences: {most_similar_documents}")

Most similar documents to the user preferences: [Document(page_content='neighborhood: Meadowview Village\nprice: 700000\nbedrooms: 5\nbathrooms: 2\nhouse_size: 2500\ndescription: Spacious two-story home with a large backyard and deck for outdoor gatherings and relaxation. The updated kitchen and cozy living room provide a comfortable and inviting space for families to enjoy.\nneighborhood_description: Meadowview Village is a family-friendly neighborhood with top-rated schools and community parks for children to play and explore. Close to shopping centers, restaurants, and recreational facilities, making it a convenient and desirable place to live for families and young professionals.', metadata={'row': 20, 'source': 'listings.csv'}), Document(page_content='neighborhood: Meadowview Village\nprice: 700000\nbedrooms: 5\nbathrooms: 2\nhouse_size: 2500\ndescription: Spacious two-story home with a large backyard and deck for outdoor gatherings and relaxation. The updated kitchen and cozy liv

In [205]:
most_similar_documents_string="\n\n---\n".join([most_similar_document.page_content for most_similar_document in most_similar_documents])
print(most_similar_documents_string)

neighborhood: Meadowview Village
price: 700000
bedrooms: 5
bathrooms: 2
house_size: 2500
description: Spacious two-story home with a large backyard and deck for outdoor gatherings and relaxation. The updated kitchen and cozy living room provide a comfortable and inviting space for families to enjoy.
neighborhood_description: Meadowview Village is a family-friendly neighborhood with top-rated schools and community parks for children to play and explore. Close to shopping centers, restaurants, and recreational facilities, making it a convenient and desirable place to live for families and young professionals.

---
neighborhood: Meadowview Village
price: 700000
bedrooms: 5
bathrooms: 2
house_size: 2500
description: Spacious two-story home with a large backyard and deck for outdoor gatherings and relaxation. The updated kitchen and cozy living room provide a comfortable and inviting space for families to enjoy.
neighborhood_description: Meadowview Village is a family-friendly neighborhood 
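
The output above lists the same Meadowview Village listing more than once. Whatever the cause, duplicates waste prompt space, so one option is to drop repeated texts while keeping the ranking order before joining them. A minimal sketch with a hypothetical `unique_in_order` helper and placeholder strings:

```python
def unique_in_order(texts):
    """Drop repeated strings while preserving first-seen order."""
    seen = set()
    unique = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique

# Placeholder strings standing in for page_content values
contents = ["listing A", "listing A", "listing B"]
deduplicated = "\n\n---\n".join(unique_in_order(contents))
print(deduplicated)
```

In the notebook, the input would be the `page_content` of each retrieved document, in the order returned by the search.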

In [206]:
response=qa_chain_with_listings(inputs={"question":"yes","listings":most_similar_documents_string})



> Entering new ConversationChain chain...
Prompt after formatting:
You are a real estate agent. You have a new client who is looking for a new home.

The client has provided answers to important questions in the context.

Proceed as follows:
- 1. First ask the user if he/she is ready to receive recommendations.
    -2. If he/she enters says 'yes', summarize your understanding of the user preferences and ask the user if he/she is ready to receive recommendations telling the user to either type 'yes' or 'no'
        -3. If he types 'yes' you match the user preferences with listings and make suggestions.. Only provide suggestions from the available listings. '

Context: AI: How big should your house be?
Human: A comfortable house with at least 3 bedrooms with a spacious kitchen and a cozy living room with at least 2000sqft
AI: What are 3 most important things for you in choosing this property?
Human: A quiet neighborhood, good local schools, and convenient shopping 

In [207]:
print(f"The smart assistant answered: {response['answer']}")

The smart assistant answered: Based on your preferences for a comfortable house with at least 3 bedrooms, a spacious kitchen, and a cozy living room with at least 2000sqft, in a quiet neighborhood with good local schools and convenient shopping options, with a maximum budget of $500,000, amenities such as a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system, and transportation options including easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads, I recommend the following listing:

Listing in Meadowview Village:
- Price: $700,000
- Bedrooms: 5
- Bathrooms: 2
- House Size: 2500 sqft
- Description: Spacious two-story home with a large backyard and deck for outdoor gatherings and relaxation. The updated kitchen and cozy living room provide a comfortable and inviting space for families to enjoy.
- Neighborhood Description: Meadowview Village is a family-friendly neighborhood with top-rated schools and community park

Let's verify that the recommended listing actually exists in the CSV file.

In [214]:
import pandas as pd
df=pd.read_csv("listings.csv")
filtered_df=df[df["neighborhood"]=="Meadowview Village"]
filtered_df.head()

Unnamed: 0,neighborhood,price,bedrooms,bathrooms,house_size,description,neighborhood_description
20,Meadowview Village,700000,5,2,2500,Spacious two-story home with a large backyard ...,Meadowview Village is a family-friendly neighb...
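
Existence is one check; another is whether a recommendation respects hard constraints such as the stated maximum price. The sketch below filters rows by budget using only the standard library. The inline CSV reuses two listings shown earlier in this notebook, and `within_budget` is a hypothetical helper.

```python
import csv
import io

# Two rows copied from the listings printed earlier, reduced to numeric columns
SAMPLE_CSV = """neighborhood,price,bedrooms,bathrooms,house_size
Meadowview Village,700000,5,2,2500
Pinecrest Meadows,400000,3,2,1800
"""

def within_budget(rows, max_price):
    """Keep only the rows whose price does not exceed max_price."""
    return [row for row in rows if int(row["price"]) <= max_price]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
affordable = within_budget(rows, max_price=500_000)
print([row["neighborhood"] for row in affordable])  # ['Pinecrest Meadows']
```

Applying a filter like this before or after the similarity search is one possible way to keep recommendations inside the client's budget.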


In [None]:
Good job :)