# nd608 - Project Personalized Real Estate Agent

## Generate Synthetic Real Estate Listings

This notebook generates synthetic real estate listings using OpenAI's generative AI APIs. We'll also create a [LanceDB](https://lancedb.com/) database that stores the generated content together with its embeddings.

In [None]:
# Load environment variables from a .env file. Alternatively, you can
# manually set the value of OPENAI_API_KEY in this cell.

from io import BytesIO
from os import environ
from pathlib import Path

try:
    from dotenv import load_dotenv
    load_dotenv()
except ModuleNotFoundError:
    pass

if "OPENAI_API_KEY" not in environ:
    environ["OPENAI_API_KEY"] = "your-openai-api-key"

In [None]:
import pickle

from textwrap import dedent

import openai
import requests

from IPython.display import display, Markdown
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain.text_splitter import CharacterTextSplitter
from PIL import Image
from pydantic import BaseModel, Field, NonNegativeFloat, NonNegativeInt
from transformers import CLIPProcessor, CLIPModel

We'll use [LangChain](https://www.langchain.com/)'s `PromptTemplate` and `PydanticOutputParser` to generate the synthetic real estate listings in a structured format, which makes it easier to store the information in a table. We'll follow the format suggested in the project's instructions:

```
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
```
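
To make the target layout concrete, here is a minimal sketch that renders a structured record back into the text format above. `ListingSketch` and `format_listing` are hypothetical names used only for illustration, and the `house_size` field is an assumption that mirrors the "House Size" line of the sample.

```python
from dataclasses import dataclass


@dataclass
class ListingSketch:
    """Hypothetical stand-in for a structured listing record."""

    neighborhood: str
    price: int
    bedrooms: int
    bathrooms: float
    house_size: int  # square feet; assumed field mirroring "House Size"
    description: str
    neighborhood_description: str


def format_listing(listing: ListingSketch) -> str:
    """Render a listing in the text layout suggested by the project."""
    return "\n".join(
        [
            f"Neighborhood: {listing.neighborhood}",
            f"Price: ${listing.price:,}",
            f"Bedrooms: {listing.bedrooms}",
            # The "g" format drops a trailing ".0" but keeps half baths like 2.5.
            f"Bathrooms: {listing.bathrooms:g}",
            f"House Size: {listing.house_size:,} sqft",
            "",
            f"Description: {listing.description}",
            "",
            f"Neighborhood Description: {listing.neighborhood_description}",
        ]
    )


example = ListingSketch(
    neighborhood="Green Oaks",
    price=800_000,
    bedrooms=3,
    bathrooms=2,
    house_size=2_000,
    description="An eco-friendly home with solar panels.",
    neighborhood_description="A close-knit community with bike paths.",
)
print(format_listing(example))
```

Keeping the values as typed fields and formatting them only for display means prices and room counts stay numeric when they are stored in a table.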

In [None]:
class RealEstateListing(BaseModel):
    neighborhood: str = Field(description="Name of the neighborhood")
    price: NonNegativeInt = Field(description="List price of the property")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms")
    bathrooms: NonNegativeFloat | NonNegativeInt = Field(description="Number of bathrooms")
    description: str = Field(description="Brief description of the property")
    neighborhood_description: str = Field(description="Brief description of the neighborhood")


class RealEstateListingWithImage(RealEstateListing):
    image: bytes | None = Field(description="Contents of the generated image", default=None)
    image_filename: str | None = Field(description="Filename of the generated image", default=None)


class RealEstateListings(BaseModel):
    listings: list[RealEstateListing]

In [None]:
parser = PydanticOutputParser(pydantic_object=RealEstateListings)
print(parser.get_format_instructions())

In [None]:
prompt = PromptTemplate(
    template=dedent("""\
        You are a writer and a real estate expert with extensive
        knowledge of the terminology, capable of writing lengthy,
        easy to read and factual descriptions of properties.

        Generate {num_listings} listings of imaginary real estate
        properties. The description of the property should include detailed
        mentions of the property's features like the number of bedrooms and
        bathrooms. The description of the property should describe the exterior.
        The description of the property should contain at least 2 sentences.
        Include both upper-middle class and lower income neighborhoods.
    """) + "\n{format_instructions}",
    input_variables=["num_listings"],
    partial_variables={
        "format_instructions": parser.get_format_instructions
    },
)

In [None]:
print(prompt.format(num_listings=15))

We'll use OpenAI's `gpt-4-turbo` model, as it is more likely to follow the formatting instructions.

In [None]:
llm = ChatOpenAI(
    model_name="gpt-4-turbo",
    temperature=0.2,  # Sacrificing reproducibility to give the model some leeway
    max_tokens=4000
)

In [None]:
model_response = llm.invoke(prompt.format(num_listings=15))

In [None]:
parsed_model_response = parser.parse(model_response.content)
parsed_model_response.listings
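
Even a capable model can occasionally return output that fails validation. The following is a minimal sketch of a retry loop around the generate-then-parse step. `generate_with_retries` is a hypothetical helper, `generate` and `parse` stand in for calls such as `llm.invoke(...)` and `parser.parse(...)`, and the default of retrying on `ValueError` is an assumption about how parse failures surface.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


def generate_with_retries(
    generate: Callable[[], str],
    parse: Callable[[str], T],
    attempts: int = 3,
    retry_on: tuple[type[Exception], ...] = (ValueError,),
) -> T:
    """Call generate() and parse its output, retrying when parsing fails."""
    last_error: Exception | None = None
    for _ in range(attempts):
        raw = generate()
        try:
            return parse(raw)
        except retry_on as error:
            # Remember the failure and ask for a fresh response.
            last_error = error
    raise RuntimeError(f"No valid response after {attempts} attempts") from last_error


# Stand-in demo: the first "response" is not valid JSON, the second one is.
responses = iter(["not json", '{"listings": []}'])
result = generate_with_retries(lambda: next(responses), json.loads)
print(result)
```

Each retry is a new, billable model call, so a small `attempts` value is a reasonable default.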

Let's save the generated real estate listings to avoid hitting the model multiple times.
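
The save-then-reload pattern can also be wrapped in a single helper, sketched below. `load_or_create` is a hypothetical name, and the sketch assumes the cached value can be pickled and that the file comes from a trusted source, since unpickling untrusted data is unsafe.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_create(path: Path, create: Callable[[], T]) -> T:
    """Return the pickled value at `path`, creating and saving it if missing."""
    if path.exists():
        with open(path, "rb") as f:
            return pickle.load(f)

    # Cache miss: run the expensive step once, then persist its result.
    value = create()
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value
```

With this helper, the model is only called when `data/listings.pickle` does not exist yet; deleting the file forces a fresh generation.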

In [None]:
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

In [None]:
with open(data_dir / "listings.pickle", "wb") as f:
    pickle.dump(parsed_model_response.listings, f)

In [None]:
with open(data_dir / "listings.pickle", "rb") as f:
    listings = pickle.load(f)

## Generate Images for the Synthetic Real Estate Listings

To make our recommendation app more usable, we'll generate an image for each listing with OpenAI's DALL-E. We add photographic hints to the prompt (shutter speed, ISO, lighting) to steer the model toward photorealistic images.
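
Because the listing description is interpolated into the image prompt, a long description can push the prompt past the model's length limit. Below is a minimal sketch of a prompt builder that trims the description at a word boundary. `build_image_prompt` is a hypothetical helper, and the 1000-character limit for `dall-e-2` is an assumption that should be checked against OpenAI's current API reference.

```python
PROMPT_PREFIX = "Photo of "
PROMPT_SUFFIX = ". 1/100s, ISO 100, Daylight."
MAX_PROMPT_CHARS = 1000  # assumed prompt limit for dall-e-2


def build_image_prompt(description: str, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Wrap a description in photographic hints, keeping the prompt within max_chars."""
    budget = max_chars - len(PROMPT_PREFIX) - len(PROMPT_SUFFIX)
    if budget <= 0:
        raise ValueError("max_chars is too small for the prompt template")

    # Drop the trailing period so the suffix does not produce "..".
    text = description.strip().rstrip(".")
    if len(text) > budget:
        # Cut at the last space inside the budget to avoid splitting a word.
        text = text[:budget].rsplit(" ", 1)[0].rstrip(" ,;.")
    return f"{PROMPT_PREFIX}{text}{PROMPT_SUFFIX}"


print(build_image_prompt("A cozy brick house with a wide front porch."))
```

Trimming from the end keeps the opening of the description, which usually names the property type and its most visible features.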

In [None]:
images_dir = Path("images")
images_dir.mkdir(exist_ok=True)

In [None]:
client = openai.OpenAI()

In [None]:
listings_with_image = []

for i, listing in enumerate(listings):
    display(Markdown(f"Generating image for listing with description: _'{listing.description}'_...."))

    dalle2_response = client.images.generate(
        model="dall-e-2",
        prompt=f"Photo of {listing.description}. 1/100s, ISO 100, Daylight.",
        size="512x512",
        quality="standard",
        n=1,
    )

    image_filename = f"listing_{i}.jpg"

    # Download the image first so a failed request doesn't leave an empty file behind.
    response = requests.get(dalle2_response.data[0].url, timeout=60)
    response.raise_for_status()
    (images_dir / image_filename).write_bytes(response.content)

    listings_with_image.append(
        RealEstateListingWithImage(
            **listing.model_dump(),
            image=response.content,
            image_filename=image_filename
        )
    )

    image = Image.open(BytesIO(response.content))

    display(image)


We save the results again, this time to avoid regenerating the images.

In [None]:
with open(data_dir / "listings_with_image.pickle", "wb") as f:
    pickle.dump(listings_with_image, f)

In [None]:
with open(data_dir / "listings_with_image.pickle", "rb") as f:
    listings_with_image = pickle.load(f)

## Generate Embeddings for Listings/Images

We'll use a [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) model from Hugging Face Transformers to generate embeddings for each listing's text and image.
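
CLIP maps text and images into the same vector space, so a listing's text embedding and image embedding can be merged into one vector. The text above does not specify how the two are combined, so the following is only one possible approach, sketched in plain Python: L2-normalize both vectors, average them, and normalize the result. `l2_normalize` and `combine_embeddings` are hypothetical helpers.

```python
import math
from typing import Sequence


def l2_normalize(vector: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def combine_embeddings(
    text_vec: Sequence[float], image_vec: Sequence[float]
) -> list[float]:
    """Average two unit-normalized embeddings into a single unit vector."""
    if len(text_vec) != len(image_vec):
        raise ValueError("embeddings must have the same dimension")
    text_unit = l2_normalize(text_vec)
    image_unit = l2_normalize(image_vec)
    # Normalizing first gives both modalities equal weight in the average.
    mean = [(t + i) / 2 for t, i in zip(text_unit, image_unit)]
    return l2_normalize(mean)


combined = combine_embeddings([3.0, 0.0], [0.0, 4.0])
print(combined)
```

Averaging keeps the vector dimension unchanged, so the same query embedding can be compared against text-only, image-only, or combined vectors.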

In [None]:
clip_model = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(clip_model)
processor = CLIPProcessor.from_pretrained(clip_model)