# Using OpenAI with Udacity

```
from openai import OpenAI
client = OpenAI(
    base_url = "https://openai.vocareum.com/v1",
    api_key = "voc-00000000000000000000000000000000abcd.12345678"
)
```

**My API Key is in Bitwarden**



# 1. Criteria

The submission must demonstrate using a Large Language Model (LLM) to generate at least 10 diverse and realistic real estate listings containing facts about the real estate.

**Example:**
```
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description:
Welcome to this eco-friendly oasis nestled in the heart of Green Oaks.
This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure.
Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes.
The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family.
Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description:
Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths.
Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe.
With easy access to public transportation and bike lanes, commuting is a breeze.
```
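
The example above writes its facts in a human-readable form ("$800,000", "2,000 sqft"), while the JSON format requested later in this notebook stores plain integers. As a minimal sketch of how the two relate, the hypothetical helper `parse_listing_facts` (not part of the project) converts the `Key: value` lines into snake_case fields. It assumes one fact per line and only handles a dollar sign, thousands separators, and a trailing `sqft`.

```python
import re

def parse_listing_facts(text):
    """Parse 'Key: value' fact lines into a dict with snake_case keys."""
    facts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skip blank lines and section headers such as "Description:"
        field = key.strip().lower().replace(" ", "_")
        # Numeric facts: drop "$", thousands separators and a trailing "sqft".
        match = re.fullmatch(r"\$?(\d[\d,]*)(?:\s*sqft)?", value)
        facts[field] = int(match.group(1).replace(",", "")) if match else value
    return facts

example = """Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft"""

facts = parse_listing_facts(example)
print(facts)
```

The free-text sections (description and neighborhood description) are deliberately left out here, since they span several lines and need no conversion.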


# Solve 1. criteria

In [1]:
!pip install -q openai

In [2]:
# TODO: Replace/Remove me
api_key = "TODO"

In [3]:
# Setup openai
import openai

# Configure OpenAI client
client = openai.OpenAI(
    api_key=api_key,
    base_url="https://openai.vocareum.com/v1",
)

openai_model = "gpt-4.1-mini"

In [4]:
# Ask the AI model for a nice prompt to generate 10 diverse and realistic
# real estate listings

prompt = """
### Content
I should leverage AI to generate at least 10 diverse and realistic real estate listings.
They should be diverse and realistic!

Here is an example of what such a listing can look like.
### Example
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description:
Welcome to this eco-friendly oasis nestled in the heart of Green Oaks.
This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure.
Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes.
The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family.
Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description:
Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths.
Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe.
With easy access to public transportation and bike lanes, commuting is a breeze.

### System prompt
Furthermore, I need a system prompt so that the system knows exactly what it has to generate.
Maybe something like "You're a real estate agent and...."

### Help

Help me craft a good prompt for an AI model that generates at least 10 such listings.
I guess it also makes sense to include my example in the prompt.
Additionally, the AI model should return those in json format, so that I can parse them nicely with python.
The output should be in the following format:
```
{
  "system_prompt": "Your prompt here",
  "user_prompt": "Prompt for the AI model here"
}
```

I also think it makes sense to include the example in the user_prompt!

The output of the listings example should also be in json:
```
[
  {
    "neighborhood": "Green Oaks",
    "price": 800000,
    "bedrooms": 3,
    "bathrooms": 2,
    "house_size": 2000,
    "description": "...",
    "neighborhood_description": "..."
  },
  ...
]
```

Do not include anything else in the output.
Just the json I described above.
I want to assign the output to a variable in python to move forward.
"""

response = client.chat.completions.create(
    model=openai_model,
    messages=[
        {"role": "user", "content": prompt}
    ],
    response_format={ "type": "json_object" }
)

prompts_for_listings = response.choices[0].message.content
print(prompts_for_listings)

{
  "system_prompt": "You are a professional real estate agent tasked with generating realistic and diverse real estate listings. Each listing must include detailed property information, a compelling description of the home, and a neighborhood description that captures the unique atmosphere and amenities of the area. Your goal is to create at least 10 diverse listings that vary in price, size, style, and location, ensuring they are believable and engaging for potential buyers. Return the listings in a structured JSON format as specified.",
  "user_prompt": "Generate at least 10 diverse and realistic real estate listings in JSON format. Each listing should include the following fields: neighborhood, price (integer), bedrooms (integer), bathrooms (integer), house_size (integer, in sqft), description (a detailed paragraph describing the property), and neighborhood_description (a paragraph describing the neighborhood and its attractions). Use varied property types, price ranges, neighborho

In [5]:
# Take the prompts generated by the AI and use them to generate the listings
import json
prompts_for_listings_json = json.loads(prompts_for_listings)

response = client.chat.completions.create(
    model=openai_model,
    messages=[
        {"role": "system", "content": prompts_for_listings_json["system_prompt"]},
        {"role": "user", "content": prompts_for_listings_json["user_prompt"]}
    ]
)

listings_in_json = response.choices[0].message.content
print(listings_in_json)

[
  {
    "neighborhood": "Maplewood Heights",
    "price": 425000,
    "bedrooms": 3,
    "bathrooms": 2,
    "house_size": 1700,
    "description": "This beautifully updated 3-bedroom, 2-bathroom bungalow offers modern comfort with a classic touch. The spacious living room features a gas fireplace and large windows that flood the home with natural light. The kitchen is equipped with stainless steel appliances, granite countertops, and a breakfast nook. Outside, a fenced backyard and deck provide a perfect space for entertaining or relaxing with family and pets. The master suite includes a private bath and ample closet space.",
    "neighborhood_description": "Maplewood Heights is a friendly suburban enclave known for its tree-lined streets, excellent schools, and abundant parks. Residents enjoy community farmers’ markets, nearby hiking trails, and family-friendly events at the local community center. Convenient shopping centers and dining options are within a short drive, making it i

# 2. Criteria

The project must demonstrate the creation of a vector database and successfully storing real estate listing embeddings within it. The database should effectively store and organize the embeddings generated from the LLM-created listings.

# Solve 2. criteria

> **Note:** The following code uses the listings generated by the AI model in the previous cells.
> I also prepared a `listings.json` file that can be used instead.
> It was likewise generated by an LLM, and I used it for testing.
> To use it, run the cell marked "HEADS UP" below, which loads it in place of the **newly** generated listings.
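
Whichever source is used, the listings arrive as a JSON string, and a malformed entry would only surface later as a database or search error. The following is a minimal sketch of an up-front check. The helper name `find_listing_problems` is hypothetical, and the field names and types are taken from the JSON format requested in the prompt above.

```python
import json

# Field names and types from the JSON format requested in the prompt.
REQUIRED_FIELDS = {
    "neighborhood": str,
    "price": int,
    "bedrooms": int,
    "bathrooms": int,
    "house_size": int,
    "description": str,
    "neighborhood_description": str,
}

def find_listing_problems(raw_json):
    """Return a list of readable problems; an empty list means the data looks usable."""
    listings = json.loads(raw_json)
    if not isinstance(listings, list):
        return ["top-level JSON value is not a list"]
    problems = []
    for index, listing in enumerate(listings):
        if not isinstance(listing, dict):
            problems.append(f"listing {index}: not a JSON object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in listing:
                problems.append(f"listing {index}: missing '{field}'")
            elif not isinstance(listing[field], expected_type):
                problems.append(f"listing {index}: '{field}' is not {expected_type.__name__}")
    return problems

# Made-up sample data: the second listing has a string price and no bathrooms.
sample = json.dumps([
    {"neighborhood": "Green Oaks", "price": 800000, "bedrooms": 3, "bathrooms": 2,
     "house_size": 2000, "description": "...", "neighborhood_description": "..."},
    {"neighborhood": "Green Oaks", "price": "800,000", "bedrooms": 3,
     "house_size": 2000, "description": "...", "neighborhood_description": "..."},
])
problems = find_listing_problems(sample)
print(problems)
```

Returning a list of problems instead of raising on the first one makes it easier to see everything that is wrong with a generated batch at once.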

In [6]:
!pip install -q chromadb


In [7]:
# Load the chroma client and set it up
import chromadb

chroma_client = chromadb.PersistentClient(path="./chroma_db/")
collection_property = chroma_client.get_or_create_collection(name="real_estate_listings_property_desc")
collection_neighborhood = chroma_client.get_or_create_collection(name="real_estate_listings_neighborhood_desc")

In [8]:
# HEADS UP!!
# Run this cell only if you want to override the AI-generated listings
# with the prepared data. Otherwise, skip it.

# To make it work, place the listings.json file
# next to this notebook so it can be loaded correctly.

# The context manager closes the file even if reading fails
with open('listings.json', 'r', encoding='utf-8') as listings_file:
  listings_in_json = listings_file.read()

In [9]:
# Prepare the listings to make it easier to save in the DB
import json

listings = json.loads(listings_in_json)

# Print the first listing to demonstrate that it worked
print(json.dumps(listings[0], indent=2))

{
  "neighborhood": "Maplewood Heights",
  "price": 425000,
  "bedrooms": 3,
  "bathrooms": 2,
  "house_size": 1500,
  "description": "This cozy 3-bedroom, 2-bathroom ranch home in Maplewood Heights offers a spacious open floor plan with remodeled kitchen featuring granite countertops and stainless steel appliances. The large backyard with deck is perfect for entertaining or quiet evenings with family. Natural light fills every room thanks to abundant windows, and the finished basement adds extra living space or storage.",
  "neighborhood_description": "Maplewood Heights is a family-friendly suburban neighborhood known for its excellent public schools, community parks, and tree-lined streets. Residents enjoy local farmers\u2019 markets and easy access to downtown shops and restaurants within a 15-minute drive."
}


In [10]:
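# Save the data into the DB
for i, listing in enumerate(listings):
  # Everything except the two text fields is stored as metadata
  metadatas = listing.copy()
  del metadatas["description"]
  del metadatas["neighborhood_description"]

  # upsert (instead of add) so that re-running this cell against the
  # persistent DB replaces entries with the same id rather than keeping stale ones
  collection_property.upsert(
    documents=[listing["description"]],
    ids=[str(i)],
    metadatas=[metadatas]
  )
  collection_neighborhood.upsert(
    documents=[listing["neighborhood_description"]],
    ids=[str(i)],
    metadatas=[metadatas]
  )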

/root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:02<00:00, 35.5MiB/s]
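
The loop above builds the metadata by copying the listing and deleting the two text fields. A non-mutating variant is sketched below. The helper name `to_metadata` is hypothetical, and the scalar-only filter rests on the assumption that the vector database accepts only simple values (strings, numbers, booleans) as metadata, so anything else an LLM might add, such as a list, is dropped.

```python
TEXT_FIELDS = ("description", "neighborhood_description")

def to_metadata(listing):
    """Return the scalar facts of a listing without modifying the input dict."""
    return {
        key: value
        for key, value in listing.items()
        if key not in TEXT_FIELDS and isinstance(value, (str, int, float, bool))
    }

# Made-up listing with an unexpected list-valued field.
listing = {
    "neighborhood": "Maplewood Heights",
    "price": 425000,
    "bedrooms": 3,
    "bathrooms": 2,
    "house_size": 1500,
    "description": "...",
    "neighborhood_description": "...",
    "tags": ["ranch", "basement"],
}
metadata = to_metadata(listing)
print(metadata)
```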


In [11]:
# Print the first listing to demonstrate that it worked
data = collection_property.get(ids=["0"], include=["documents", "metadatas"])
print(f"Document:\n{data['documents'][0]}")
print(f"Metadata:\n{data['metadatas'][0]}")

print("---")

data = collection_neighborhood.get(ids=["0"], include=["documents", "metadatas"])
print(f"Document:\n{data['documents'][0]}")
print(f"Metadata:\n{data['metadatas'][0]}")

Document:
This cozy 3-bedroom, 2-bathroom ranch home in Maplewood Heights offers a spacious open floor plan with remodeled kitchen featuring granite countertops and stainless steel appliances. The large backyard with deck is perfect for entertaining or quiet evenings with family. Natural light fills every room thanks to abundant windows, and the finished basement adds extra living space or storage.
Metadata:
{'bathrooms': 2, 'house_size': 1500, 'price': 425000, 'bedrooms': 3, 'neighborhood': 'Maplewood Heights'}
---
Document:
Maplewood Heights is a family-friendly suburban neighborhood known for its excellent public schools, community parks, and tree-lined streets. Residents enjoy local farmers’ markets and easy access to downtown shops and restaurants within a 15-minute drive.
Metadata:
{'price': 425000, 'house_size': 1500, 'neighborhood': 'Maplewood Heights', 'bedrooms': 3, 'bathrooms': 2}


# 3. Criteria

The application must include a functionality where listings are semantically searched based on given buyer preferences. The search should return listings that closely match the input preferences.
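
To make "semantically searched" concrete: each text is turned into a vector (an embedding), and the search returns the stored vectors closest to the query vector. The sketch below illustrates only the ranking step, with made-up 3-dimensional vectors and cosine distance. Real embeddings come from an embedding model and have hundreds of dimensions, and the distance metric actually used by the vector database may differ.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", invented purely for illustration.
toy_embeddings = {
    "luxury villa with pool": [0.9, 0.1, 0.2],
    "cozy downtown studio": [0.1, 0.9, 0.3],
    "family home near playground": [0.5, 0.2, 0.8],
}
query_embedding = [0.8, 0.1, 0.4]  # stands in for the embedded buyer preferences

# Rank the stored texts from closest to farthest.
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_distance(query_embedding, toy_embeddings[name]),
)
print(ranked)
```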

# Solve 3. criteria

In [12]:
# Here we prepare a function for the similarity search
def search_property_similarity(query, pricing, bedrooms, bathrooms, size):
  return collection_property.query(
      query_texts=[query],
      where={
        "$and": [
          {"price": {"$gte": pricing}},
          {"bedrooms": {"$gte": bedrooms}},
          {"bathrooms": {"$gte": bathrooms}},
          {"house_size": {"$gte": size}}
        ]
      }
  )

def search_neighborhood_similarity(query, pricing, bedrooms, bathrooms, size):
  return collection_neighborhood.query(
      query_texts=[query],
      where={
        "$and": [
          {"price": {"$gte": pricing}},
          {"bedrooms": {"$gte": bedrooms}},
          {"bathrooms": {"$gte": bathrooms}},
          {"house_size": {"$gte": size}}
        ]
      }
  )
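
Both search functions above repeat the same metadata filter. One possible way to keep it in a single place is sketched here. The helper name `build_where_filter` is hypothetical, and the returned dict is simply the same `$and` / `$gte` structure already used above.

```python
def build_where_filter(pricing, bedrooms, bathrooms, size):
    """Build the metadata filter: each field must be >= the buyer's minimum."""
    minimums = {
        "price": pricing,
        "bedrooms": bedrooms,
        "bathrooms": bathrooms,
        "house_size": size,
    }
    return {"$and": [{field: {"$gte": minimum}} for field, minimum in minimums.items()]}

where = build_where_filter(3000, 3, 2, 2000)
print(where)
```

The result could then be passed as `where=build_where_filter(...)` in both `query` calls, so a change to the filter logic only has to be made once.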

In [13]:
# Test our defined functions
buyer_notes = "I need a luxury villa. Playgrounds should be nearby."
result = search_property_similarity(buyer_notes, 3000, 3, 2, 2000)
print(result)
result = search_neighborhood_similarity(buyer_notes, 3000, 3, 2, 2000)
print(result)

{'ids': [['6', '2', '4']], 'embeddings': None, 'documents': [['Luxurious estate home in Hilltop Estates with spectacular mountain views. Featuring 5 bedrooms, each with en-suite baths, high ceilings, designer finishes, a gourmet kitchen with professional-grade appliances, and a home theater. Expansive patios and an infinity-edge pool make this property an entertainer’s dream. The gated community offers privacy and exclusivity.', 'Stunning waterfront home in Seaside Cove featuring panoramic ocean views from every room. This elegant 4-bedroom, 3-bathroom property offers a gourmet kitchen with custom cabinetry, a sunroom, and an expansive deck perfect for watching sunsets. The master suite includes a spa-like bathroom and walk-in closet. Enjoy private beach access and a two-car garage.', "Modern family home in Crestview Meadows with 4 spacious bedrooms, 3 baths, and an open-concept living room that flows seamlessly into a chef's kitchen with stainless steel appliances and center island. T

In [14]:
# Since we search both the property descriptions and the neighborhood
# descriptions, we average the distances of both results
# to find the most relevant property (lower distance = better match)
def find_best_matching_properties(query, pricing, bedrooms, bathrooms, size):
  result_property = search_property_similarity(query, pricing, bedrooms, bathrooms, size)
  result_neighborhood = search_neighborhood_similarity(query, pricing, bedrooms, bathrooms, size)

  combined_properties = {}

  for i, prop_id in enumerate(result_property['ids'][0]):
      combined_properties[prop_id] = {
          'id': prop_id,
          'description': result_property['documents'][0][i],
          'distance_1': result_property['distances'][0][i],
          'metadata': result_property['metadatas'][0][i]
      }

  for i, prop_id in enumerate(result_neighborhood['ids'][0]):
      if prop_id in combined_properties:
          combined_properties[prop_id]['neighborhood_description'] = result_neighborhood['documents'][0][i]
          combined_properties[prop_id]['distance_2'] = result_neighborhood['distances'][0][i]
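          # Calculate average distance
          d1 = combined_properties[prop_id]['distance_1']
          d2 = result_neighborhood['distances'][0][i]
          combined_properties[prop_id]['average_distance'] = (d1 + d2) / 2

  # Keep only properties returned by both searches, so that
  # every entry is guaranteed to have an 'average_distance'
  return {
      prop_id: prop
      for prop_id, prop in combined_properties.items()
      if 'average_distance' in prop
  }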

def find_best_matching_property(best_matching_properties):
  return min(best_matching_properties.values(), key=lambda x: x['average_distance'])
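
`find_best_matching_property` returns only the single closest match. If a buyer should see a short list instead, the same combined dictionary can be sorted. This is a sketch under assumptions: the helper name `rank_by_average_distance` and the `top_k` parameter are hypothetical, and the sample values are invented.

```python
def rank_by_average_distance(combined_properties, top_k=3):
    """Return up to top_k properties, closest (lowest average distance) first."""
    # Defensive: skip entries that carry no average_distance.
    scored = [p for p in combined_properties.values() if "average_distance" in p]
    return sorted(scored, key=lambda p: p["average_distance"])[:top_k]

# Invented sample in the same shape as the combined dictionary.
sample = {
    "2": {"id": "2", "average_distance": 1.10},
    "6": {"id": "6", "average_distance": 0.97},
    "4": {"id": "4", "distance_1": 1.05},  # no average_distance
}
top = rank_by_average_distance(sample)
print([p["id"] for p in top])
```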

In [15]:
# Test our defined functions
best_matching_properties = find_best_matching_properties(buyer_notes, 3000, 3, 2, 2000)
print(f"Best matching properties:\n{best_matching_properties}")

best_matching_property = find_best_matching_property(best_matching_properties)
print(f"Best matching property:\n{best_matching_property}")

Best matching properties:
{'6': {'id': '6', 'description': 'Luxurious estate home in Hilltop Estates with spectacular mountain views. Featuring 5 bedrooms, each with en-suite baths, high ceilings, designer finishes, a gourmet kitchen with professional-grade appliances, and a home theater. Expansive patios and an infinity-edge pool make this property an entertainer’s dream. The gated community offers privacy and exclusivity.', 'distance_1': 0.869187593460083, 'metadata': {'bedrooms': 5, 'price': 2450000, 'neighborhood': 'Hilltop Estates', 'bathrooms': 5, 'house_size': 4300}, 'neighborhood_description': 'Hilltop Estates is an upscale, serene neighborhood perched on rolling hills with large lots and stunning vistas. Residents enjoy hiking trails, equestrian facilities, and close proximity to renowned private schools and fine dining.', 'distance_2': 1.0739542245864868, 'average_distance': 0.9715709090232849}, '2': {'id': '2', 'description': 'Stunning waterfront home in Seaside Cove featuri

# 4. & 5. Criteria

4. The project must demonstrate a logical flow where buyer preferences are used to search and then augment the description of real estate listings. The augmentation should personalize the listing without changing factual information.

5. The submission must utilize an LLM to generate personalized descriptions for the real estate listings based on buyer preferences. The descriptions should be unique, appealing, and tailored to the preferences provided.

# Solve 4. & 5. criteria

In [16]:
!pip install -q ipywidgets


In [17]:
# Create a UI with ipywidgets
import ipywidgets as widgets

def create_range_dropdown(start, stop, step, label):
  items = list(range(start, stop + 1, step))
  options = [str(i) for i in items]
  options.append(f"{stop}+")
  return widgets.Dropdown(options=options, description=label)

pricing_dropdown = create_range_dropdown(300000, 1200000, 100000, "Pricing:")

bedroom_dropdown = widgets.Dropdown(options=["No preference", "1", "2", "3", "4", "5", "5+"], description="Bedrooms:")
bathroom_dropdown = widgets.Dropdown(options=["No preference", "1", "2", "3", "4", "5", "5+"], description="Bathrooms:")

size_dropdown = create_range_dropdown(800, 2000, 200, "Size (sqft):")

notes_textarea = widgets.Textarea(
  value='',
  placeholder='Tell us anything else that matters to you — neighborhood vibes, home style, must-have amenities, or even dealbreakers!',
  description='Notes:',
  layout=widgets.Layout(width='500px', height='100px')
)

submit_button = widgets.Button(
  description='Submit',
  button_style='success',
)

output = widgets.Output()

In [18]:
# Create a function that reacts to the submit button
from IPython.display import clear_output

def on_submit_clicked(modify_listing):
  if pricing_dropdown.value == "1200000+":
    pricing = 1200001
  else:
    pricing = int(pricing_dropdown.value)

  if bedroom_dropdown.value == "No preference":
    bedrooms = 0
  elif bedroom_dropdown.value == "5+":
    bedrooms = 6
  else:
    bedrooms = int(bedroom_dropdown.value)

  if bathroom_dropdown.value == "No preference":
    bathrooms = 0
  elif bathroom_dropdown.value == "5+":
    bathrooms = 6
  else:
    bathrooms = int(bathroom_dropdown.value)

  if size_dropdown.value == "2000+":
    size = 2001
  else:
    size = int(size_dropdown.value)

  best_matching_properties = find_best_matching_properties(
      query=notes_textarea.value,
      pricing=pricing,
      bedrooms=bedrooms,
      bathrooms=bathrooms,
      size=size
  )
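  # Without this guard, an empty result would make min() raise a ValueError
  if not best_matching_properties:
    with output:
      clear_output()
      print("No listing matches these filters. Try relaxing them.")
    return

  best_matching_property = find_best_matching_property(best_matching_properties)

  modified_listing = modify_listing(notes_textarea.value, best_matching_property)

  with output:
    clear_output()
    print(modified_listing)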
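
The four `if`/`else` blocks in `on_submit_clicked` all follow the same rule: "No preference" becomes 0, an option ending in "+" becomes one more than its number, and everything else is converted directly. A compact sketch of that rule follows. The helper name `parse_dropdown_value` is hypothetical.

```python
def parse_dropdown_value(value, no_preference_label="No preference"):
    """Convert a dropdown option into the minimum value used for filtering."""
    if value == no_preference_label:
        return 0  # no lower bound
    if value.endswith("+"):
        return int(value[:-1]) + 1  # "5+" means more than 5
    return int(value)

for option in ["No preference", "3", "5+", "1200000+"]:
    print(option, "->", parse_dropdown_value(option))
```

With such a helper, each dropdown could be read with a single call, for example `parse_dropdown_value(bedroom_dropdown.value)`.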

In [22]:
# Create a function that takes the best matching property
# and returns a better description for the matching query

def modify_listing(query, best_matching_property):
  system_prompt = """
You are an experienced real estate agent with deep knowledge of property marketing and buyer psychology.
Your role is to personalize real estate listings to better appeal to specific buyers based on their unique preferences and needs.

When given a buyer query and a property listing (in JSON), modify the (property) **description** and the **neighborhood description**
to better reflect what the buyer is looking for, while staying truthful to the facts.
Highlight aspects of the listing that match the buyer's interests, downplay less relevant features,
and adjust the tone and emphasis to suit the buyer’s style and priorities.

- Focus on aligning features (e.g., layout, finishes, nearby amenities) with the buyer’s stated preferences.
- If the buyer mentions lifestyle goals (e.g., “quiet place to work from home,” “kid-friendly,” “walkable”), reflect these in both the property and neighborhood descriptions.
- Maintain a professional and engaging tone, as if writing the listing to personally attract the specific buyer.
- Do **not** fabricate or exaggerate property features that are not mentioned in the original listing.
- You are marketing the property through the lens of what matters most to the buyer.
  """

  prompt = f"""
**Buyer Query:**
{query}

**Original Listing (in JSON):**
{json.dumps(best_matching_property, indent=2)}

Please modify the (property) **description** and the **neighborhood description**
to better reflect what the buyer is looking for, while staying truthful to the facts.

Return the result as Markdown, not inside a code block. Use natural language.
  """

  response = client.chat.completions.create(
    model=openai_model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ],
  )

  return response.choices[0].message.content
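
Criterion 4 requires that the personalization leaves the factual information unchanged, but the prompt alone cannot guarantee that. One rough way to spot-check the result is sketched below: it reports which metadata values no longer appear in the generated text. The helper name `missing_facts` is hypothetical, and the check is only a heuristic. A fact the model simply chose not to mention is also reported, and a small number such as a bedroom count may be "found" in an unrelated place.

```python
import re

def missing_facts(text, metadata):
    """Return the metadata keys whose values do not appear in the text."""
    # Remove thousands separators so "2,450,000" matches 2450000.
    normalized = text.replace(",", "").lower()
    missing = []
    for key, value in metadata.items():
        if isinstance(value, (int, float)):
            # Match the number as a whole token, so 5 does not match inside 2500.
            pattern = rf"(?<!\d){re.escape(str(value))}(?!\d)"
            found = re.search(pattern, normalized) is not None
        else:
            found = str(value).lower() in normalized
        if not found:
            missing.append(key)
    return missing

# Invented example text and metadata.
text = "This 5-bedroom estate in Hilltop Estates is offered at $2,450,000."
metadata = {
    "neighborhood": "Hilltop Estates",
    "price": 2450000,
    "bedrooms": 5,
    "house_size": 4300,
}
print(missing_facts(text, metadata))
```

A non-empty result is a prompt for manual review rather than proof of an error.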

In [20]:
# Connect the button click
submit_button.on_click(lambda _: on_submit_clicked(modify_listing))

In [21]:
# Display the UI :)
from IPython.display import display

display(
    pricing_dropdown,
    bedroom_dropdown,
    bathroom_dropdown,
    size_dropdown,
    notes_textarea,
    submit_button,
    output
)
