This is a starter notebook for the project. You'll have to import the libraries you need; the ones available in this workspace are listed in its requirements.txt file.

Step 1: Setup
Simply install the provided requirements.txt file by running the following command:
```bash
pip install -r requirements.txt
```

In [1]:
import os
def set_open_ai_api():
    os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # placeholder: the key is not included in this notebook
    os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"
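
Hard-coding the key in a notebook makes it easy to leak. Here is a minimal sketch of an alternative, assuming the key is either already exported in the environment or typed in interactively. `configure_openai` is a hypothetical helper name, not part of the project template; the base URL is the one used in the cell above.

```python
import os
from getpass import getpass

API_BASE = "https://openai.vocareum.com/v1"  # base URL taken from the cell above


def configure_openai(api_key=None):
    """Set the OpenAI environment variables without hard-coding the key."""
    # Hypothetical helper: prefer an explicit argument, then an existing
    # environment variable, and only prompt the user as a last resort.
    key = api_key or os.environ.get("OPENAI_API_KEY") or getpass("OpenAI API key: ")
    os.environ["OPENAI_API_KEY"] = key
    os.environ["OPENAI_API_BASE"] = API_BASE
    return key
```

Calling `configure_openai()` with no argument reuses an exported `OPENAI_API_KEY` if there is one, so the notebook can be shared without editing the cell.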

Step 2 & 3: Generating Fake Listings Using LLM and Saving Them

In [None]:
import openai
import os

from langchain.memory import ConversationBufferMemory
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

from langchain.llms import OpenAI


In [None]:
from langchain.prompts import PromptTemplate

listing_template = PromptTemplate(template=
"""Neighborhood: {neighborhood}
Price: ${price}
Bedrooms: {bedrooms}
Bathrooms: {bathrooms}
House Size: {house_size} sqft

Description: {description}

Neighborhood Description: {neighborhood_description}
""",
                                  input_variables=["neighborhood", "price", "bedrooms", "bathrooms",
                                                   "house_size", "description", "neighborhood_description"]
                                  )



In [None]:
example_inputs = [
    {
        "neighborhood": "Downtown Abbey",
        "price": 300000,
        "bedrooms": 3,
        "bathrooms": 2,
        "house_size": 2000,
        "description": "A beautiful apartment with a large terrace overlooking the city's skyscrapers. With 3 well-lit bedrooms and 2 bathrooms, you have enough room even for the occasional guest coming over to celebrate New Year's with you :)",
        "neighborhood_description": "A vibrant, centrally located neighborhood with excellent public transport connections, so you are close to everything you need for everyday life, from groceries to cafes, restaurants, and shopping."
    },
    {
        "neighborhood": "City Heights",
        "price": 500000,
        "bedrooms": 4,
        "bathrooms": 3,
        "house_size": 3000,
        "description": "A cozy house with a wonderful garden overlooking the city from its hillside location",
        "neighborhood_description": "Located in the city outskirts, this neighborhood is known for its great views and quiet streets with a very family-friendly atmosphere. In no time you are in the midst of a beautiful forest where you can calm all your senses and breathe in the cool fresh air. Sightings of deer and other wildlife are common."
    },
    {
        "neighborhood": "Green Oaks",
        "price": 800000,
        "bedrooms": 3,
        "bathrooms": 2,
        "house_size": 2000,
        "description": "A luxurious villa with a large garden and a swimming pool with a winter garden, 2 parking spots, and a garage as well as a large terrace and wonderful finishing.",
        "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."
    }
]

In [None]:
from langchain.prompts import FewShotPromptTemplate

few_shot_template = FewShotPromptTemplate(
    example_prompt=listing_template,
    input_variables=["input"],
    examples=example_inputs,
    suffix="Use the examples above to generate the following: {input}",
)

listing_seperator = "===="
instruction = "Generate {} listings of houses. Be creative with the description and neighborhood_description, and keep each confined to three sentences. Keep the prices between $125000 and $1500000, the number of bedrooms under 6, the number of bathrooms under 3, and the total area under 3500 sqft. VERY IMPORTANT: Output the results in the same format as the examples, keeping even the order of properties the same as in the examples. Add the characters {} before each new listing."
num_listings = 30
prompt_to_use = few_shot_template.format(input=instruction.format(num_listings, listing_seperator))



In [None]:
print(prompt_to_use)

In [None]:
from langchain.llms import OpenAI  # the class that is actually instantiated below

model_name = "gpt-3.5-turbo"
llm = OpenAI(model_name=model_name, temperature=0.0, max_tokens=3500)

In [None]:
result = llm(prompt=prompt_to_use)

In [None]:
print(f"LLM result:\n{result}")

In [None]:
import pandas as pd
import locale
import re

# Set the locale to 'en_US.UTF-8' for parsing numbers with commas
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

# Split the result into individual listings
listings = result.split(listing_seperator)[1:]
# print(f"Here are the listings in a list:\n{listings}")

# Define the columns based on input_variables
columns = ["neighborhood", "price", "bedrooms", "bathrooms", "house_size", "description", "neighborhood_description"]

# Parse the listings into a list of dictionaries
data = []
for listing in listings:
    print(f"Parsing listing: {listing[1:150]} ...\n")
    try:
        entry = {
            "neighborhood": re.search(r"^Neighborhood: (.+)$", listing, re.MULTILINE).group(1),
            "price": locale.atoi(re.search(r"^Price: \$(.+)$", listing, re.MULTILINE).group(1)),
            "bedrooms": int(re.search(r"^Bedrooms: (.+)$", listing, re.MULTILINE).group(1)),
            "bathrooms": int(re.search(r"^Bathrooms: (.+)$", listing, re.MULTILINE).group(1)),
            "house_size": locale.atoi(re.search(r"^House Size: (.+) sqft$", listing, re.MULTILINE).group(1)),
            "description": re.search(r"^Description: (.+)$", listing, re.MULTILINE).group(1),
            "neighborhood_description": re.search(r"^Neighborhood Description: (.+)$", listing, re.MULTILINE).group(1)
        }

        data.append(entry)
    except Exception as e:
        print("Error parsing listing")
        print(listing)
        print(f"Because of \n{e}")
        print("\n")
        continue

# Create a DataFrame
df = pd.DataFrame(data, columns=columns)
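
The parsing above depends on the `en_US.UTF-8` locale being installed, and `locale.setlocale` raises an error on systems where it is missing. As a hedged alternative, here is a small sketch that uses the same regular expressions but strips thousands separators by hand. `parse_listing`, `FIELD_PATTERNS` and `NUMERIC_FIELDS` are hypothetical names introduced only for this illustration.

```python
import re

# Same patterns as in the cell above, keyed by column name.
FIELD_PATTERNS = {
    "neighborhood": r"^Neighborhood: (.+)$",
    "price": r"^Price: \$(.+)$",
    "bedrooms": r"^Bedrooms: (.+)$",
    "bathrooms": r"^Bathrooms: (.+)$",
    "house_size": r"^House Size: (.+) sqft$",
    "description": r"^Description: (.+)$",
    "neighborhood_description": r"^Neighborhood Description: (.+)$",
}
NUMERIC_FIELDS = {"price", "bedrooms", "bathrooms", "house_size"}


def parse_listing(text):
    """Return a dict for one listing, or raise ValueError naming the missing field."""
    entry = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing field: {field}")
        value = match.group(1).strip()
        if field in NUMERIC_FIELDS:
            # Drop thousands separators instead of relying on the system locale.
            value = int(value.replace(",", ""))
        entry[field] = value
    return entry
```

Because the error names the missing field, a malformed listing is easier to diagnose than with a generic `AttributeError` from calling `.group` on `None`.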

Let's view the DataFrame, check that the generated listings make sense, and finally save them to a CSV file.

In [None]:
df.head(n=num_listings)

In [None]:
df.to_csv("listings.csv", index=False)
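
Eyeballing the table is one way to check the listings; another is to test each row against the limits stated in the generation prompt (price between $125000 and $1500000, fewer than 6 bedrooms, fewer than 3 bathrooms, under 3500 sqft). The sketch below assumes each listing is a plain dict with the column names used above. `find_constraint_violations` is a hypothetical helper, not part of the project.

```python
def find_constraint_violations(listing):
    """Return a list of human-readable problems for one listing dict."""
    problems = []
    # Limits copied from the instruction given to the LLM.
    if not 125_000 <= listing["price"] <= 1_500_000:
        problems.append(f"price out of range: {listing['price']}")
    if not 0 < listing["bedrooms"] < 6:
        problems.append(f"bedrooms out of range: {listing['bedrooms']}")
    if not 0 < listing["bathrooms"] < 3:
        problems.append(f"bathrooms out of range: {listing['bathrooms']}")
    if not 0 < listing["house_size"] < 3500:
        problems.append(f"house_size out of range: {listing['house_size']}")
    return problems
```

With the DataFrame above, something like `df.to_dict("records")` should yield dicts of this shape, so the check can be run in a loop over all rows.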


Step 4: Building the User Preference Interface

In [2]:
from langchain.llms import OpenAIChat
set_open_ai_api()
llm = OpenAIChat(temperature=0)



In [3]:
# Hard-code the user preferences, using the ones provided in the project with minor modifications, for reproducibility.

answers = [
    "A comfortable house with at least 3 bedrooms with a spacious kitchen and a cozy living room with at least 2000sqft",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "$500,000",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
    ]
questions = [
    "How big should your house be?",
    "What are 3 most important things for you in choosing this property?",
    "What is the maximum price you would be willing to pay?",  # comma added: without it this string is merged with the next one
    "Which amenities would you like?",
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",
]

We create a conversation chain with memory and simulate the question-and-answer interaction by feeding the questions and answers into its memory.

In [4]:
from langchain.memory import ChatMessageHistory
def create_chat_message_history(questions, answers):
    chat_message_history=ChatMessageHistory()
    for question, answer in zip(questions, answers):
        chat_message_history.add_ai_message(question)
        chat_message_history.add_user_message(answer)
    return chat_message_history
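
One caveat with `zip`: it silently stops at the shorter list, so if the two lists ever differ in length (for example, a missing comma between two string literals merges them into one item), every later question is paired with the wrong answer and no error is raised. A minimal sketch of a guard, using a hypothetical helper name:

```python
def pair_questions_with_answers(questions, answers):
    """Pair each question with its answer, failing loudly on a length mismatch."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; "
            "check for a missing comma between string literals"
        )
    return list(zip(questions, answers))
```

The returned pairs could then be looped over in place of the bare `zip` call in the function above.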

In [29]:
base_system_prompt = """You are a real estate agent. You have a new client who is looking for a new home.

The client has provided answers to important questions in the context. First ask the user if he/she is ready to receive recommendations.

If he/she says 'yes', summarize your understanding of the user's preferences and ask again whether he/she is ready to receive recommendations, telling the user to type either 'yes' or 'no'. Upon another 'yes', make your recommendations."""

In [30]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# chat_memory seeds the buffer with the simulated Q&A; memory_key must match the {context} placeholder in the prompt
memory = ConversationBufferMemory(chat_memory=create_chat_message_history(questions, answers), memory_key="context")

qa_template = PromptTemplate(template=
                             ("""%s

Context: {context}

Question: {question}
""" % base_system_prompt),
                             input_variables=["context", "question"]
                             )
qa_chain = ConversationChain(llm=llm,  memory=memory, prompt=qa_template,input_key="question",output_key="answer", verbose=False)


In [31]:


question = "I am waiting for your recommendations"
print(f"The user asked: {question}")
response=qa_chain(inputs={"question": question})
print(f"The smart agent answered: {response['answer']}")


The user asked: I am waiting for your recommendations
The smart agent answered: Are you ready to receive recommendations? Please type 'yes' or 'no'.


In [32]:
question = "yes"
print(f"The user asked: {question}")
response=qa_chain(inputs={"question": question})
print(f"The smart agent answered: {response['answer']}")

The user asked: yes
The smart agent answered: Great! Based on your preferences, it sounds like you are looking for a comfortable home with at least 3 bedrooms, a spacious kitchen, and a cozy living room in a quiet neighborhood with good local schools and convenient shopping options. You are looking to spend up to $500,000 and would like a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system. You also value easy access to transportation options such as a reliable bus line, proximity to a major highway, and bike-friendly roads.

Are you ready to receive recommendations? Please type 'yes' or 'no'.


In [17]:
question = "yes"
print(f"The user asked: {question}")
response=qa_chain(inputs={"question": question})
print(f"The smart agent answered: {response['answer']}")

The user asked: yes
The smart agent answered: Based on your preferences, I recommend the following properties:

1. A 3-bedroom house with a spacious kitchen, cozy living room, and backyard for gardening in a quiet neighborhood with good local schools and convenient shopping options. This property is within your budget of $500,000 and includes a two-car garage and modern, energy-efficient heating system. It is also located near a reliable bus line, major highway, and bike-friendly roads.

2. A similar property with additional features such as a deck for outdoor entertaining and a home office space.

3. A townhouse with 3 bedrooms, a modern kitchen, and a garage in a family-friendly neighborhood with access to schools and shopping. This property also includes a backyard and energy-efficient heating system.

Let me know if you would like more details on any of these recommendations!


We can see that the result above is entirely made up: the model has no real listings to draw on. Let's see what we can do about this using our generated listings and a vector database.

Step 5: Using the Generated Listings to Provide Real Context

In [65]:
# This cell raises the ValidationError shown below the code
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

memory_with_listings = ConversationBufferMemory(chat_memory=create_chat_message_history(questions, answers), memory_key="context")


def create_prompt_template_with_listings(most_similar_documents_string):
    return PromptTemplate(template=
                          (("""%s. ONLY PROVIDE RECOMME. IF NO LISTINGS ANSWER 'I DON'T HAVE SUITABLE LISTINGS'

Context: {context}

Listings: {listings}

Question: {question}
""") % base_system_prompt),
                          input_variables=["context", "listings", "question"]
                          )


qa_template_with_listings = create_prompt_template_with_listings("")

qa_chain_with_listings = ConversationChain(llm=llm,  memory=memory_with_listings, prompt=qa_template_with_listings,input_key="question" ,output_key="answer", verbose=False)



ValidationError: 1 validation error for ConversationChain
__root__
  Got unexpected prompt input variables. The prompt expects ['context', 'listings', 'question'], but got ['context'] as inputs from memory, and question as the normal input key. (type=value_error)
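
The error says the chain can supply only two variables: `context` (from memory) and `question` (the input key). The prompt therefore must not declare `listings` as a third input. One possible workaround, sketched here under the assumption that the prompt uses LangChain's default f-string style formatting, is to substitute the listings into the template text before building the `PromptTemplate`, then declare only `["context", "question"]` as input variables. `fill_listings` is a hypothetical helper, not a LangChain function.

```python
def fill_listings(template_text, listings_text):
    """Substitute the listings into the template, leaving other placeholders intact."""
    # Escape braces so listing text is not read as a format placeholder later.
    escaped = listings_text.replace("{", "{{").replace("}", "}}")
    return template_text.replace("{listings}", escaped)


template_text = "Context: {context}\n\nListings: {listings}\n\nQuestion: {question}\n"
filled = fill_listings(template_text, "neighborhood: Cedarwood Village\nprice: 225000")
# Only {context} and {question} remain to be filled in.
print(filled.format(context="(chat history)", question="yes"))
```

LangChain's prompt templates also offer a `partial` method that may serve the same purpose; whether it behaves as expected depends on the installed version, so it is worth checking against the documentation.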

In [51]:
response=qa_chain_with_listings(inputs={"question":"I am waiting for your recommendations"})

In [52]:
print(f"The smart assistant answered: {response['answer']}")

The smart assistant summarized the preferences as: Are you ready to receive recommendations? Please type 'yes' or 'no'.


In [53]:
response=qa_chain_with_listings(inputs={"question":"yes"})
print(f"The smart assistant answered: {response['answer']}")

The smart assistant answered: Great! Based on your preferences, it sounds like you are looking for a comfortable house with at least 3 bedrooms, a spacious kitchen, and a cozy living room with at least 2000sqft. You value a quiet neighborhood, good local schools, and convenient shopping options. Your maximum budget is $500,000 and you are looking for a property with a backyard for gardening, a two-car garage, and a modern, energy-efficient heating system. You also prefer easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.

Are you ready to receive recommendations? Please type 'yes' or 'no'.


In [55]:
preference_summary_by_smart_assistant=response['answer']

Now we use the preference summary produced by the smart assistant to retrieve suitable listings from the vector database.

In [56]:
preference_summary

"Are you ready to receive recommendations? Please type 'yes' or 'no'."

In [57]:
from langchain.document_loaders import CSVLoader
import random

set_open_ai_api()
listings_loader = CSVLoader("listings.csv")
listing_documents=listings_loader.load()
print(f"There are {len(listing_documents)} listings loaded")
sample_listings = random.sample(listing_documents, 2)
print(f"Here are two sample loaded listings:\n{sample_listings}")

There are 28 listings loaded
Here are two sample loaded listings:
[Document(page_content='neighborhood: Riverfront Estates\nprice: 900000\nbedrooms: 5\nbathrooms: 2\nhouse_size: 3200\ndescription: Elegant waterfront property with a private dock and panoramic views of the river. The spacious bedrooms and gourmet kitchen make this home perfect for hosting gatherings and creating lasting memories.\nneighborhood_description: Riverfront Estates is a prestigious neighborhood known for its luxury homes and serene waterfront living. Residents enjoy access to boating, fishing, and other water activities, as well as proximity to upscale dining and shopping options.', metadata={'source': 'listings.csv', 'row': 2}), Document(page_content='neighborhood: Meadowview Park\nprice: 700000\nbedrooms: 5\nbathrooms: 2\nhouse_size: 2500\ndescription: Spacious two-story home with a large backyard and deck for outdoor gatherings and relaxation. The updated kitchen and cozy living room provide a comfortable an

In [58]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings


embeddings_model=OpenAIEmbeddings()
listings_db=Chroma.from_documents(documents=listing_documents,embedding=embeddings_model)


In [59]:
most_similar_documents=listings_db.similarity_search(query=preference_summary_by_smart_assistant,k=5)
print(f"Most similar documents to the user preferences: {most_similar_documents}")

Most similar documents to the user preferences: [Document(page_content='neighborhood: Cedarwood Village\nprice: 225000\nbedrooms: 3\nbathrooms: 2\nhouse_size: 1600\ndescription: Cozy home with a spacious backyard and deck for outdoor gatherings and relaxation. The open floor plan and large windows create a bright and inviting atmosphere throughout the house.\nneighborhood_description: Cedarwood Village is a family-friendly neighborhood with tree-lined streets and nearby parks for outdoor activities. Close to schools, shopping centers, and restaurants, making it a convenient and welcoming place to call home.', metadata={'row': 24, 'source': 'listings.csv'}), Document(page_content='neighborhood: Cedarwood Village\nprice: 225000\nbedrooms: 3\nbathrooms: 2\nhouse_size: 1600\ndescription: Cozy home with a spacious backyard and deck for outdoor gatherings and relaxation. The open floor plan and large windows create a bright and inviting atmosphere throughout the house.\nneighborhood_descript
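
To build some intuition for what a similarity search does, here is a toy sketch that ranks texts by cosine similarity over simple word counts. This is an illustration only: the vector store above compares learned embeddings from `OpenAIEmbeddings`, not word counts, and all function names below are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query, documents, k=5):
    """Return the k documents most similar to the query, best first."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]
```

Word counts only match exact tokens, which is why embeddings are used in practice: they can place "quiet street" and "peaceful neighborhood" close together even though the two phrases share no words.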

In [61]:
most_similar_documents_string="\n\n---\n".join([most_similar_document.page_content for most_similar_document in most_similar_documents])

SyntaxError: invalid syntax (4019695576.py, line 1)

In [62]:
qa_template_with_listings_filled=create_prompt_template_with_listings(most_similar_documents_string=most_similar_documents_string)

In [63]:
qa_chain_with_listings.template=qa_template_with_listings_filled

ValueError: "ConversationChain" object has no field "template"