## Getting started with the housing chatbot


In [None]:
%pip install chromadb
%pip install tiktoken

In [None]:
%pip show chromadb

### Create the Chroma DB client


In [3]:
import chromadb

# Chroma 0.4+ rejects the old Settings(chroma_db_impl="duckdb+parquet", ...) configuration
# with a ValueError. PersistentClient is the replacement for an on-disk database.
# See the "New Clients" section of https://docs.trychroma.com/deployment/migration.
# If the db/ directory holds data from an older Chroma version, migrate it first with
# `pip install chroma-migrate` followed by `chroma-migrate`.
client = chromadb.PersistentClient(path="db/")

In [3]:
%pip install langchain





In [45]:
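# Loaders and vector stores now live in langchain_community; the old langchain.* paths are deprecated
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter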
from langchain_community.embeddings import (
    HuggingFaceInferenceAPIEmbeddings,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import os



#### Load PDFs

In [9]:
%pip install pypdf



In [18]:
loader = PyPDFDirectoryLoader('pdfs')

In [19]:
data=loader.load()

In [20]:
data[0]

Document(metadata={'source': 'pdfs\\Angel_House_Listing.pdf', 'page': 0, 'page_label': '1'}, page_content='Angel House Listing Platform\nAngel House Listing is a comprehensive housing platform designed to connect property owners with\npotential tenants seamlessly. The platform, deployed on angelhouslistingwebsite.vercel.app,\nprovides an intuitive interface for property listings, bookings, and user profile management. It offers\na professional and secure environment for users to browse available houses, manage bookings,\nand update their profiles with ease.\nPlatform Features:\n1. User-Friendly Interface: A well-designed, intuitive layout for easy navigation with filtering and\nsearch functionalities.\n2. Comprehensive House Listings: Dedicated page (/houses) for browsing available properties with\ndetailed descriptions.\n3. Seamless Booking System: Users can book houses and manage reservations through a\nstep-by-step process.\n4. Profile & Account Management: Users can update profile 

## Text Splitting 

In [38]:
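# chunk_size is the maximum chunk length in characters; chunk_overlap is how many
# characters neighbouring chunks share so context is not cut off at a boundary
text_splitter = RecursiveCharacterTextSplitter(chunk_size=900, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)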

In [39]:
text_chunks

[Document(metadata={'source': 'pdfs\\Angel_House_Listing.pdf', 'page': 0, 'page_label': '1'}, page_content='Angel House Listing Platform\nAngel House Listing is a comprehensive housing platform designed to connect property owners with\npotential tenants seamlessly. The platform, deployed on angelhouslistingwebsite.vercel.app,\nprovides an intuitive interface for property listings, bookings, and user profile management. It offers\na professional and secure environment for users to browse available houses, manage bookings,\nand update their profiles with ease.\nPlatform Features:\n1. User-Friendly Interface: A well-designed, intuitive layout for easy navigation with filtering and\nsearch functionalities.\n2. Comprehensive House Listings: Dedicated page (/houses) for browsing available properties with\ndetailed descriptions.\n3. Seamless Booking System: Users can book houses and manage reservations through a\nstep-by-step process.'),
 Document(metadata={'source': 'pdfs\\Angel_House_Listin

In [40]:
len(text_chunks)

814
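
To make the two splitter parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and lines before falling back to characters). `sliding_chunks` is a hypothetical helper name used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
chunks = sliding_chunks(sample, chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk starts with the last character of the previous one, which is the same idea as `chunk_overlap=20` above, only at a smaller scale.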

In [42]:
print(text_chunks[102].page_content)

c02.qxp  2/26/04  10:44 AM  Page 28
28 MINDSET + KNOWLEDGE = WEALTH 
Neighborhood Changes 
For the past two decades, many close-in neighborhoods with obsolete 
factories and warehouses have experienced a renaissance as real estate 
investors bought cheap, old industrial buildings and converted them to 
loft apartment buildings. New York City’s Soho District, San Francisco’s 
South of Market (SOMA) area, and Chicago’s near north neighborhood 
stand out as prime examples of this ongoing trend. 
changing. 
Watch for 
neighborhoods 
where uses are 
Likewise, as commercial areas sometimes en -
croach into residential neighborhoods, smart prop -
erty investors have successfully sought zoning 
changes and then converted large houses into pro -
fessional ofﬁces aimed at those ever-expanding le -
gions of lawyers, accountants, dentists, real estate 
brokers, and insurance agents.


### Loading my Environment Variables

In [73]:
from dotenv import load_dotenv
load_dotenv()
gemini_api_key=os.getenv('GEMINI_API_KEY')
huggingface_api_key=os.getenv('HUGGINGFACE_API_KEY')
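
`os.getenv` returns `None` when a variable is missing, and that only surfaces later as a confusing authentication error. A possible guard is sketched below; `require_env` and `mask` are hypothetical helpers, and `DEMO_API_KEY` is a made-up variable set in-process purely for the demonstration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Keep the first few characters of a secret and hide the rest."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo only: a fake variable, so no real credential is involved
os.environ["DEMO_API_KEY"] = "demo-secret-value"
print(mask(require_env("DEMO_API_KEY")))
```

Printing a masked value is enough to confirm that a key loaded, and it keeps the real credential out of saved notebook outputs.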

In [74]:
# Confirm the key loaded without echoing the secret into the notebook output
huggingface_api_key is not None

True

### My embedder

In [75]:
hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=huggingface_api_key,
    model_name="sentence-transformers/all-MiniLM-l6-v2"
)

### Create my Chroma DB

In [36]:
persist_directory='db'


In [46]:
vectordb=Chroma.from_documents(documents=text_chunks,
                               embedding=hf_embeddings,
                               persist_directory=persist_directory,
                               )

In [47]:
# Since Chroma 0.4.x, documents are persisted automatically, so the manual
# persistence call is no longer supported and stays commented out:
# vectordb.persist()


In [50]:
vectordb=Chroma(persist_directory=persist_directory,embedding_function=hf_embeddings)

In [51]:
vectordb

<langchain_community.vectorstores.chroma.Chroma at 0x14a7ced67e0>

### Make Retriever

In [52]:
retriever=vectordb.as_retriever()

In [57]:
# get_relevant_documents is deprecated in LangChain; use invoke instead
# docs=retriever.get_relevant_documents("What is the website about?")
docs=retriever.invoke("What is the website about?")

In [59]:
docs[0]

Document(metadata={'page': 0, 'page_label': '1', 'source': 'pdfs\\Angel_House_Listing.pdf'}, page_content='Angel House Listing Platform\nAngel House Listing is a comprehensive housing platform designed to connect property owners with\npotential tenants seamlessly. The platform, deployed on angelhouslistingwebsite.vercel.app,\nprovides an intuitive interface for property listings, bookings, and user profile management. It offers\na professional and secure environment for users to browse available houses, manage bookings,\nand update their profiles with ease.\nPlatform Features:\n1. User-Friendly Interface: A well-designed, intuitive layout for easy navigation with filtering and\nsearch functionalities.\n2. Comprehensive House Listings: Dedicated page (/houses) for browsing available properties with\ndetailed descriptions.\n3. Seamless Booking System: Users can book houses and manage reservations through a\nstep-by-step process.')

In [63]:
retriever=vectordb.as_retriever(search_kwargs={"k":2})

In [66]:
docs2=retriever.invoke("What is the website about?")

In [67]:
docs2

[Document(metadata={'page': 0, 'page_label': '1', 'source': 'pdfs\\Angel_House_Listing.pdf'}, page_content='Angel House Listing Platform\nAngel House Listing is a comprehensive housing platform designed to connect property owners with\npotential tenants seamlessly. The platform, deployed on angelhouslistingwebsite.vercel.app,\nprovides an intuitive interface for property listings, bookings, and user profile management. It offers\na professional and secure environment for users to browse available houses, manage bookings,\nand update their profiles with ease.\nPlatform Features:\n1. User-Friendly Interface: A well-designed, intuitive layout for easy navigation with filtering and\nsearch functionalities.\n2. Comprehensive House Listings: Dedicated page (/houses) for browsing available properties with\ndetailed descriptions.\n3. Seamless Booking System: Users can book houses and manage reservations through a\nstep-by-step process.'),
 Document(metadata={'page': 2, 'page_label': '3', 'sour
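
Conceptually, a retriever with `search_kwargs={"k": 2}` embeds the query and returns the two stored chunks whose vectors are closest to it. The sketch below illustrates that idea with brute-force cosine similarity on toy two-dimensional vectors. It is an assumption-laden simplification: Chroma uses an approximate nearest-neighbour index and a configurable distance metric, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy vectors standing in for real 384-dimensional sentence embeddings
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
toy_query = [1.0, 0.0]
print(top_k(toy_query, toy_docs, k=2))
```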

## Make a chain


In [85]:
from langchain.chains import RetrievalQA

In [71]:
# Confirm the key loaded without echoing the secret into the notebook output
gemini_api_key is not None

True

In [72]:
llm=ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    api_key=gemini_api_key,
    temperature=0.5
)

### Create a chain

In [83]:
qa_chain=RetrievalQA.from_chain_type(llm=llm,
                                     chain_type='stuff',
                                     retriever=retriever,
                                     return_source_documents=True)
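
The `'stuff'` chain type places every retrieved chunk into a single prompt and sends that prompt to the LLM in one call. A rough sketch of the idea follows. The prompt wording is an assumption and not LangChain's actual template, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question: str, passages: list[str]) -> str:
    """Concatenate all retrieved passages into one context block ahead of the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is the website about?",
    ["Angel House Listing is a housing platform.", "Users can book houses."],
)
print(prompt)
```

Because everything goes into one prompt, this approach suits a small `k`; a large number of long chunks could exceed the model's context window.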

In [80]:
def process_llm_response(llm_response):
    print(llm_response['result'])
    print('\n\nSource:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])

In [86]:
query='What is real estate?'
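# Calling the chain object directly is deprecated; invoke takes the question under the "query" key
llm_response=qa_chain.invoke({"query": query})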
process_llm_response(llm_response=llm_response)

Based on the provided text, real estate refers to land and buildings,  and investments involving them (such as REITs, home builders, and mortgage lenders).


Source:
pdfs\Guide to REAL ESTATE - PDF Room.pdf
pdfs\Guide to REAL ESTATE - PDF Room.pdf
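
The same PDF is listed twice because both retrieved chunks came from it. If one line per file is preferred, the source paths could be de-duplicated while keeping their order, as in this small sketch (`unique_sources` is a hypothetical helper):

```python
def unique_sources(source_paths: list[str]) -> list[str]:
    """Drop repeated paths while keeping first-seen order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(source_paths))


paths = [
    "pdfs\\Guide to REAL ESTATE - PDF Room.pdf",
    "pdfs\\Guide to REAL ESTATE - PDF Room.pdf",
]
print(unique_sources(paths))
```

Inside `process_llm_response`, the input list would be built from `source.metadata['source']` for each returned document.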


### Performing Embedding


In [76]:


hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=huggingface_api_key,
    model_name="sentence-transformers/all-MiniLM-l6-v2"
)
texts = ["Hello, world!"]
embeddings=hf_embeddings.embed_documents(texts)

In [77]:
len(embeddings)

1

In [78]:
embeddings

[[-0.038177166134119034,
  0.03291111811995506,
  -0.005459434352815151,
  0.014369940385222435,
  -0.04029102623462677,
  -0.11653245240449905,
  0.031687721610069275,
  0.0019117096671834588,
  -0.04262232407927513,
  0.029168102890253067,
  0.04242667928338051,
  0.03204173222184181,
  0.029844751581549644,
  0.010980352759361267,
  -0.05393955856561661,
  -0.05027727037668228,
  -0.02350788004696369,
  0.010779389180243015,
  -0.13770794868469238,
  0.0041149696335196495,
  0.029333120211958885,
  0.06684111803770065,
  -0.015389388427138329,
  0.04843766614794731,
  -0.08814975619316101,
  -0.012726838700473309,
  0.041409023106098175,
  0.040831558406353,
  -0.05015585198998451,
  -0.058125026524066925,
  0.048801567405462265,
  0.06889018416404724,
  0.05874694511294365,
  0.008730964735150337,
  -0.015918219462037086,
  0.08514193445444107,
  -0.07814742624759674,
  -0.0775168165564537,
  0.020723750814795494,
  0.016194269061088562,
  0.03251056373119354,
  -0.0534888766705989