In [7]:
!pip install langchain-openai




[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [10]:
!pip install langchain-text-splitters

Collecting langchain-text-splitters
  Downloading langchain_text_splitters-1.0.0-py3-none-any.whl.metadata (2.6 kB)
Downloading langchain_text_splitters-1.0.0-py3-none-any.whl (33 kB)
Installing collected packages: langchain-text-splitters
Successfully installed langchain-text-splitters-1.0.0





In [None]:
!pip install langchain-community lancedb

In [11]:
# All import statements in one cell

# Standard library
import os
import random
import shutil
from typing import List

# Third-party
import lancedb
import pandas as pd
from langchain_community.vectorstores import LanceDB
from langchain_core.documents import Document
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pydantic import BaseModel, Field, NonNegativeInt

ModuleNotFoundError: No module named 'langchain_community'

# SETTING UP THE ENVIRONMENT

In [4]:
# Initializing the API base and key
# Replace the placeholder with your own key; avoid committing real credentials to a shared notebook.

os.environ["OPENAI_API_BASE"] = "https://openai.vocareum.com/v1"
os.environ["OPENAI_API_KEY"] = "<YOUR_VOCAREUM_API_KEY>"
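
A key assigned in a cell is easy to leak when the notebook is shared. One possible alternative, sketched here under assumptions, is to read the key from the environment and fail early when it is missing. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and are not part of LangChain or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Demo with a throwaway variable (hypothetical name) so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "dummy-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called with `"OPENAI_API_KEY"` before `ChatOpenAI` is constructed, so a missing key surfaces immediately rather than on the first request.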

In [73]:
# Choosing the LLM Model and Temperature

MODEL_NAME = "gpt-3.5-turbo"
MODEL_TEMPERATURE = 0.0

In [74]:
# Initializing the LLM

llm = ChatOpenAI(model=MODEL_NAME, temperature=MODEL_TEMPERATURE)

In [75]:
# Verify that the LLM is set up correctly and responds to queries.

verify = llm.invoke("Hello!")
print(verify.content)

Hello! How can I assist you today?


# CREATING THE CSV DATA USING OUR LLM

In [76]:
# Example Instruction and Sample

INSTRUCTION = "Generate a CSV file with at least 10 real estate listing."
SAMPLE_LISTING = \
"""
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
"""

In [77]:
# Creating the RealEstate class

class RealEstate(BaseModel):
    neighborhood: str = Field(description="The neighborhood in which the property is located.")
    price: NonNegativeInt = Field(description="The price of the property.")
    bedrooms: NonNegativeInt = Field(description="The number of bedrooms in the property. Should be a whole number.")
    bathrooms: NonNegativeInt = Field(description="The number of bathrooms in the property. Should be a whole number.")
    house_size: NonNegativeInt = Field(description="The size of the property in square feet. Should be a whole number.")
    description: str = Field(description="A detailed, evocative description of the property.")
    neighborhood_description: str = Field(description="A detailed, evocative description of the neighborhood in which the property is located.")

In [78]:
# Creating the Listings class

class Listings(BaseModel):
    listings: List[RealEstate] = Field(description="A list containing Real Estate details.")

In [79]:
# Setting up the parser

parser = PydanticOutputParser(pydantic_object=Listings)

In [80]:
# Setting up the prompt

prompt = PromptTemplate(
    template="{instruction}\n{sample}\n{format_instructions}\n",
    input_variables=["instruction", "sample"],
    partial_variables={"format_instructions": parser.get_format_instructions}
)
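
The prompt above passes `parser.get_format_instructions` without calling it. LangChain accepts a callable as a partial variable and evaluates it when the prompt is formatted, which is why the printed prompt shows a bound method. The snippet below is a simplified plain-Python stand-in for that behavior, not LangChain's implementation; `format_with_partials` and `fake_format_instructions` are hypothetical names.

```python
from typing import Any, Callable, Dict, Union

Partial = Union[str, Callable[[], str]]


def format_with_partials(template: str, partials: Dict[str, Partial], **kwargs: Any) -> str:
    """Fill `template`, resolving callable partials at format time."""
    resolved = {key: (value() if callable(value) else value) for key, value in partials.items()}
    return template.format(**resolved, **kwargs)


def fake_format_instructions() -> str:
    # Stand-in for parser.get_format_instructions (hypothetical text).
    return "Return JSON matching the schema."


text = format_with_partials(
    "{instruction}\n{sample}\n{format_instructions}\n",
    {"format_instructions": fake_format_instructions},
    instruction="List homes.",
    sample="Neighborhood: Green Oaks",
)
print(text)
```

Deferring the call means the format instructions are generated each time the prompt is rendered, rather than once when the template is defined.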

In [81]:
# Validate the prompt
print(prompt)

input_variables=['instruction', 'sample'] input_types={} partial_variables={'format_instructions': <bound method PydanticOutputParser.get_format_instructions of PydanticOutputParser(pydantic_object=<class '__main__.Listings'>)>} template='{instruction}\n{sample}\n{format_instructions}\n'


In [82]:
# Create the query

query = prompt.format(instruction=INSTRUCTION, sample=SAMPLE_LISTING)

In [83]:
# Validate the query
print(query)

Generate a CSV file with at least 10 real estate listing.

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bik

In [84]:
# Get the LLM response
llm_response = llm.invoke(query)

In [85]:
# Print the LLM Response
print(llm_response.content)

{
  "listings": [
    {
      "neighborhood": "Green Oaks",
      "price": 800000,
      "bedrooms": 3,
      "bathrooms": 2,
      "house_size": 2000,
      "description": "Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.",
      "neighborhood_description": "Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public 

In [86]:
# Create the DataFrame
result = parser.parse(llm_response.content)
listings = result.listings
df = pd.DataFrame([listing.model_dump() for listing in listings])
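
Conceptually, `parser.parse` decodes the model's JSON text and validates every field against the `Listings` schema. The following standard-library sketch shows that idea only; the real validation is done by Pydantic, which is more lenient in places (for example, it may coerce numeric strings). The function name `parse_listings` and the sample values are hypothetical.

```python
import json
from typing import Any, Dict, List

REQUIRED_INT_FIELDS = ("price", "bedrooms", "bathrooms", "house_size")
REQUIRED_STR_FIELDS = ("neighborhood", "description", "neighborhood_description")


def parse_listings(raw: str) -> List[Dict[str, Any]]:
    """Decode the model's JSON text and check each listing's fields."""
    listings = json.loads(raw)["listings"]
    for item in listings:
        for name in REQUIRED_INT_FIELDS:
            value = item[name]
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value < 0:
                raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
        for name in REQUIRED_STR_FIELDS:
            if not isinstance(item[name], str):
                raise ValueError(f"{name} must be a string")
    return listings


raw = json.dumps({"listings": [{
    "neighborhood": "Green Oaks", "price": 800000, "bedrooms": 3,
    "bathrooms": 2, "house_size": 2000,
    "description": "Eco-friendly home.",
    "neighborhood_description": "Close-knit community.",
}]})
rows = parse_listings(raw)
print(rows[0]["price"])
```

Validating before building the DataFrame keeps malformed model output (a negative price, a missing field) from reaching the CSV.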

In [87]:
# Verify the created DataFrame
df.head()

   neighborhood          price    bedrooms  bathrooms  house_size  description                                        neighborhood_description
0  Green Oaks            800000   3         2          2000        Welcome to this eco-friendly oasis nestled in ...  Green Oaks is a close-knit, environmentally-co...
1  Sunnyvale             950000   4         3          2500        Located in the desirable neighborhood of Sunny...  Sunnyvale is known for its top-rated schools, ...
2  Downtown Los Angeles  1200000  2         2          1800        Experience luxury living in the heart of Downt...  Downtown Los Angeles is a vibrant neighborhood...
3  Brooklyn Heights      1500000  5         4          3000        Step into this elegant 5-bedroom, 4-bathroom t...  Brooklyn Heights is a charming neighborhood wi...
4  Pacific Palisades     3500000  6         5          5000        Live the California dream in this luxurious 6-...  Pacific Palisades is a prestigious neighborhoo...


In [88]:
# Save the DataFrame to a CSV
df.to_csv("Real_Estates.csv", index_label="id")

# CREATING THE VECTOR DATABASE

In [89]:
# Defining paths and embedding functions

LANCEDB_PATH = "content/lancedb-1"
CSV_PATH = "Real_Estates.csv"
TABLE_NAME = "real_estates"

embedding_function = OpenAIEmbeddings()

In [90]:
# Reading the CSV data
try:
    df = pd.read_csv(CSV_PATH)
except FileNotFoundError as e:
    print(f"Caught a FileNotFoundError: {e}")
    raise  # stop here so later cells do not silently reuse the in-memory DataFrame

In [91]:
# Creating documents from the CSV
try:
    documents = []
    for index, row in df.iterrows():
        documents.append(Document(page_content=row["description"], metadata={"id": str(index)}))
    print(f"Successfully Created {len(documents)} documents.")
except Exception as ex:
    print(f"Caught an unexpected error:\n{ex}")

Successfully Created 10 documents.


In [92]:
# Initialize the text splitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True
)

In [93]:
# Create the chunks
chunks = text_splitter.split_documents(documents)
print(f"{len(chunks)} chunks successfully created from {len(documents)} documents.")

21 chunks successfully created from 10 documents.
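
With `chunk_size=300` and `chunk_overlap=100`, consecutive chunks of the same description can share up to 100 characters, which is why 10 documents yield 21 chunks. `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraphs and spaces, so the sketch below is not its algorithm; it is a simplified fixed-size sliding window that only illustrates how size and overlap interact. `sliding_chunks` is a hypothetical helper.

```python
from typing import List, Tuple


def sliding_chunks(text: str, chunk_size: int = 300, chunk_overlap: int = 100) -> List[Tuple[int, str]]:
    """Return (start_index, chunk) pairs using a fixed-size sliding window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "x" * 650
pieces = sliding_chunks(sample)
print([(start, len(chunk)) for start, chunk in pieces])
# → [(0, 300), (200, 300), (400, 250)]
```

The start offsets play the same role as the `start_index` metadata that `add_start_index=True` attaches to each chunk.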


In [94]:
# Verify a chunk
if chunks:
    random_index = random.randrange(0, len(chunks))
    document = chunks[random_index]
    print(f"Page Content: {document.page_content}")
    print(f"Metadata: {document.metadata}")

Page Content: appliances and granite countertops, perfect for entertaining guests. Relax in the luxurious master suite with a spa-like bathroom and walk-in closet. Enjoy the California sunshine in the private backyard oasis with a pool and patio area.
Metadata: {'id': '1', 'start_index': 197}


In [97]:
# Populate the Vector Database (LanceDB)
try:
    if os.path.exists(LANCEDB_PATH):
        shutil.rmtree(LANCEDB_PATH)
    print("Populating LanceDB...")
    db = LanceDB.from_documents(
        documents=chunks,
        embedding=embedding_function,
        uri=LANCEDB_PATH,
        table_name=TABLE_NAME
    )
    print(f"Successfully populated LanceDB at {LANCEDB_PATH} with {len(chunks)} chunks in table {TABLE_NAME}")
except Exception as ex:
    print(f"Caught an unexpected exception:\n{ex}")

Populating LanceDB...
Successfully populated LanceDB at content/lancedb-1 with 21 chunks in table real_estates


# FETCHING DATA FROM THE VECTOR DATABASE

In [98]:
query_text = "A comfortable three-bedroom house with a spacious kitchen and a cozy living room."
    
BASIC_PROMPT_TEMPLATE = \
"""
Based on the following context:
    
{context}
    
    
-----
    
Answer the question : {question}
"""

In [99]:
# Creating the function to predict responses.

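def predict_response(query, template):
    """Retrieve the chunks most relevant to `query` and ask the LLM to answer using `template`.

    Returns a string with the LLM response and the source IDs, or None if the
    connection fails or no sufficiently relevant chunk is found.
    """
    # Connect to the LanceDB database populated in the previous section.
    try:
        db_connection = lancedb.connect(LANCEDB_PATH)
    except Exception as ex:
        print(f"Error establishing connection to LanceDB at {LANCEDB_PATH}. Did you run the ingestion cells correctly?")
        print(f"Details:\n{ex}")
        return
    db = LanceDB(
        connection=db_connection,
        table_name=TABLE_NAME,
        embedding=embedding_function
    )

    # Fetch the three closest chunks together with their relevance scores.
    results = db.similarity_search_with_relevance_scores(query, k=3)

    # Give up if nothing was found or the best match scores below 0.7.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find relevant matches for your query")
        return

    # Join the retrieved chunks into one context block and fill the template.
    context_text = "\n\n----\n\n".join([doc.page_content for doc, _ in results])
    prompt_template = ChatPromptTemplate.from_template(template)
    prompt = prompt_template.format(context=context_text, question=query)
    print(f"Generated Prompt:\n{prompt}")

    # Ask the LLM and report which listing IDs the context came from.
    llm_response = llm.invoke(prompt).content
    sources = [doc.metadata.get("id", None) for doc, _ in results]
    return f"Response: {llm_response}\nSources: {sources}"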
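
The retrieval step in `predict_response` ranks chunks by relevance and refuses to answer when the best score falls below 0.7. A minimal sketch of that ranking-and-threshold logic follows, using cosine similarity over toy two-dimensional vectors. This is an assumption for illustration: how LanceDB and LangChain turn distances into relevance scores may differ, and `top_k` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 3,
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, or [] if the best match is below threshold."""
    scored = sorted(
        ((doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]
    if not scored or scored[0][1] < threshold:
        return []
    return scored


docs = [("0", [1.0, 0.0]), ("1", [0.0, 1.0]), ("2", [1.0, 1.0])]
hits = top_k([1.0, 0.0], docs, k=2)
print([doc_id for doc_id, _ in hits])
# → ['0', '2']
```

Note that only the best score is compared with the threshold, mirroring the notebook's check on `results[0][1]`; lower-ranked results are kept even if they individually score below it.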

In [100]:
response = predict_response(query=query_text, template=BASIC_PROMPT_TEMPLATE)
print(response)

Generated Prompt:
Human: 
Based on the following context:

and a decorative fireplace. The renovated kitchen boasts high-end appliances and custom cabinetry, perfect for the home chef. Relax in the private backyard garden or entertain guests in the formal dining room. Live in luxury in this Brooklyn Heights gem.

----

Step into this luxurious 4-bedroom, 3-bathroom loft in the trendy neighborhood of Tribeca. The open floor plan features exposed brick walls, high ceilings, and oversized windows with city views. The gourmet kitchen boasts top-of-the-line appliances and a breakfast bar, perfect for entertaining

----

floors, exposed brick walls, and a gourmet kitchen. The master suite offers a spa-like bathroom and a private balcony overlooking the landscaped courtyard. Entertain guests in the formal dining room or relax in the cozy living room with a fireplace.


-----

Answer the question : A comfortable three-bedroom house with a spacious kitchen and a cozy living room.

Response: Bas

In [101]:
AUGMENT_PROMPT_TEMPLATE =\
"""
Based on the following context:

{context}

---

craft a response that not only answers the question {question}, but also ensures that your explanation is distinct, captivating, and customized to align with the specified preferences. This involves subtly emphasizing aspects of the property that align with what the buyer is looking for.
"""

In [106]:
augmented_response = predict_response(query=query_text, template=AUGMENT_PROMPT_TEMPLATE)
print(augmented_response)

Generated Prompt:
Human: 
Based on the following context:

and a decorative fireplace. The renovated kitchen boasts high-end appliances and custom cabinetry, perfect for the home chef. Relax in the private backyard garden or entertain guests in the formal dining room. Live in luxury in this Brooklyn Heights gem.

----

Step into this luxurious 4-bedroom, 3-bathroom loft in the trendy neighborhood of Tribeca. The open floor plan features exposed brick walls, high ceilings, and oversized windows with city views. The gourmet kitchen boasts top-of-the-line appliances and a breakfast bar, perfect for entertaining

----

floors, exposed brick walls, and a gourmet kitchen. The master suite offers a spa-like bathroom and a private balcony overlooking the landscaped courtyard. Entertain guests in the formal dining room or relax in the cozy living room with a fireplace.

---

craft a response that not only answers the question A comfortable three-bedroom house with a spacious kitchen and a cozy 