In [1]:
import ollama
ollama.list()

{'models': [{'name': 'llama3.1:latest',
   'model': 'llama3.1:latest',
   'modified_at': '2024-11-18T14:33:25.305661201+01:00',
   'size': 4661230766,
   'digest': '42182419e9508c30c4b1fe55015f06b65f4ca4b9e28a744be55008d21998a093',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama'],
    'parameter_size': '8.0B',
    'quantization_level': 'Q4_0'}}]}
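`ollama.list()` reports the locally pulled models; here the only one is `llama3.1:latest`, an 8.0B-parameter model quantized to Q4_0, and its `size` field is in bytes. As a hedged sketch, the hypothetical helpers below check that a model is available before building the pipeline and convert the size to GiB. They assume the plain-dict shape printed above (newer versions of the `ollama` package may return a typed response object instead), and rely on Ollama resolving an untagged name such as `llama3.1` to `llama3.1:latest`.

```python
def normalize_model_name(name: str) -> str:
    # Ollama resolves an untagged model name to the ":latest" tag
    return name if ":" in name else f"{name}:latest"

def model_available(listing: dict, name: str) -> bool:
    # Assumes the dict shape printed by ollama.list() above (hypothetical helper)
    wanted = normalize_model_name(name)
    return any(
        normalize_model_name(m.get("model") or m.get("name", "")) == wanted
        for m in listing.get("models", [])
    )

def size_gib(size_bytes: int) -> float:
    # 1 GiB = 1024**3 bytes
    return size_bytes / 1024**3

listing = {"models": [{"name": "llama3.1:latest", "model": "llama3.1:latest", "size": 4661230766}]}
print(model_available(listing, "llama3.1"))              # True
print(round(size_gib(listing["models"][0]["size"]), 2))  # 4.34
```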

In [2]:
import os

# Set USER_AGENT *before* importing langchain_community: the web loader module reads it
# at import time and logs "USER_AGENT environment variable not set" if it is missing.
os.environ['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'

from langchain_community.document_loaders import WebBaseLoader


In [3]:
loader_multiple_pages = WebBaseLoader(["https://www.home0001.com/how-it-works", "https://www.home0001.com/legal"])
docs = loader_multiple_pages.load()
print(docs[0].page_content[:128])

Learn how to own your home and live anywhere | Home0001HOME0001MenuHOME0001MenuHomes:0001: Lower East SideStudioStudio Max1 Bedr
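The extracted text runs navigation labels together (`Home0001HOME0001MenuHOME0001Menu...`) because adjacent HTML text nodes are concatenated with no separator. The standard-library sketch below illustrates the effect and the usual fix of joining text nodes with a space; it is not what `WebBaseLoader` does internally. With BeautifulSoup, which the loader uses for parsing, the equivalent is `soup.get_text(separator=" ", strip=True)`; whether your installed loader version lets you pass such options through is something to check in its documentation. `TextCollector` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for each run of text between tags; skip whitespace-only runs
        if data.strip():
            self.parts.append(data.strip())

def extract_text(html: str, separator: str = "") -> str:
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return separator.join(parser.parts)

sample_html = "<nav><a>HOME0001</a><a>Menu</a></nav><h1>How It Works</h1>"
print(extract_text(sample_html))       # HOME0001MenuHow It Works
print(extract_text(sample_html, " "))  # HOME0001 Menu How It Works
```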


In [4]:
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# set up the splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
# split the docs
splits = text_splitter.split_documents(docs)
# create a vector database with the splits
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OllamaEmbeddings(model="llama3.1"),
    # persist_directory="./chroma_langchain_db",  # uncomment to save the index to disk; otherwise it is kept in memory only
)

print(len(splits))
print(len(splits[0].page_content))
print(splits[0].metadata)

2
903
{'source': 'https://www.home0001.com/how-it-works', 'title': 'Learn how to own your home and live anywhere | Home0001', 'description': 'Flexible Living Fully Furnished Homes For Sale | Home0001', 'language': 'en'}
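Only two chunks are produced, most likely because each page's extracted text is already under `chunk_size=1000` characters (the first chunk is 903), so there is nothing to cut. The sketch below is a simplified fixed-window splitter, not `RecursiveCharacterTextSplitter` itself, which first tries to break on separators such as paragraph breaks, newlines and spaces before merging pieces back up to the size limit. It only shows how the two parameters interact: consecutive windows start `chunk_size - chunk_overlap` characters apart, so neighbouring chunks share `chunk_overlap` characters.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive chunks
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks

page = "x" * 903  # same length as the first chunk above
print(len(fixed_window_chunks(page)))  # 1

long_text = "".join(chr(65 + i % 26) for i in range(2500))
windows = fixed_window_chunks(long_text)
print([len(w) for w in windows])  # [1000, 1000, 900]
```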


In [5]:
# Build a retriever that returns the k most similar chunks for a query.
# The index holds only 2 chunks, so Chroma lowers k=6 to 2 and logs a warning.
# retriever = vectorstore.as_retriever()

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("What is home0001?")

print(len(retrieved_docs))
print(retrieved_docs[0].page_content)

Number of requested results 6 is greater than number of elements in index 2, updating n_results = 2


2
Learn how to own your home and live anywhere | Home0001HOME0001MenuHOME0001MenuHomes:0001: Lower East SideStudioStudio Max1 Bedroom0001: Bed-Stuy1 Bedroom2 Bedroom0001: Echo Park TownhousesTownhouse Type ATownhouse Type B0001: Echo Park BungalowsBungalow 1Bungalow 20001: Peckham1 BEDROOM0001: Hackney2 BEDROOM0001: Schöneberg1 BEDROOM0001 GDSHow It WorksContact UsLegalPrices:Fiat CryptoHomes:0001: Lower East SideStudioStudio Max1 Bedroom0001: Bed-Stuy1 Bedroom2 Bedroom0001: Echo Park TownhousesTownhouse Type ATownhouse Type B0001: Echo Park BungalowsBungalow 1Bungalow 20001: Peckham1 BEDROOM0001: Hackney2 BEDROOM0001: Schöneberg1 BEDROOM© 2024 HOME0001New york · los angeles · berlin · london · paris · mexico city0001 GDShow it workscontact usInstagramlegalCookies Settings© 2024 HOME0001NY DRE #10351211814CA DRE #01427385NY DRE #10351211814CA DRE #02236922WE USE COOKIES.Accept/Reject/Settings
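The warning shows what happens when `k` exceeds the index size: Chroma lowers `n_results` to the number of stored chunks. The sketch below mimics that ranking-and-capping logic with a brute-force search over toy two-dimensional vectors. It uses cosine similarity for illustration only; Chroma's default distance metric is L2 and it searches an approximate (HNSW) index, and the vectors and `top_k` helper here are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    # Cap k at the number of stored vectors, as Chroma does (it also logs a warning)
    n = min(k, len(index))
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:n]

index = {"how-it-works": [0.9, 0.1], "legal": [0.2, 0.8]}
print(top_k([1.0, 0.0], index, k=6))  # ['how-it-works', 'legal']
```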


In [6]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")
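`rlm/rag-prompt` is a LangChain Hub prompt template with two input variables, `context` and `question`, which is why the chain built further down feeds it a dict with exactly those keys. It also instructs the model to say it does not know when the context lacks the answer, consistent with the reply to the rental question at the end. The sketch below fills a template with `str.format`; the template text is an approximation written for illustration, not a verbatim copy of the hub prompt (print `prompt` to see the real one), and `RAG_TEMPLATE` and `fill_prompt` are hypothetical names.

```python
# Approximation of a RAG prompt (not the verbatim rlm/rag-prompt text)
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, say that you don't know.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)

def fill_prompt(context: str, question: str) -> str:
    # The template needs exactly these two variables, matching the chain's input dict
    return RAG_TEMPLATE.format(context=context, question=question)

print(fill_prompt("Home0001 sells fully furnished homes.", "Can I rent an apartment?"))
```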



In [None]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1",
    temperature=0.8,
    num_predict=256,  # maximum number of tokens to generate per response
    # other params ...
)
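`temperature` rescales the model's next-token scores before sampling (lower values give more deterministic output, higher values more varied output), and `num_predict` caps how many tokens are generated per response. For question answering over retrieved context a lower temperature is often preferred; 0.8 allows noticeably different phrasing between runs. The sketch below shows temperature-scaled softmax on made-up logits; it illustrates the general mechanism, not Ollama's exact sampling pipeline, which also applies options such as `top_k` and `top_p`.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token; higher ones flatten the distribution.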

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# print(format_docs(docs))

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is Home0001?"):
    print(chunk, end="", flush=True)

print("\n")
print(rag_chain.invoke("can i rent an apartment?"))

Number of requested results 6 is greater than number of elements in index 2, updating n_results = 2


Home0001 appears to be a real estate website or platform, likely based in multiple cities including New York, Los Angeles, Berlin, London, Paris, and Mexico City. It offers various home listings, such as studios, one-bedroom, two-bedroom homes, townhouses, and bungalows. The website also provides information on how it works, contact details, and legal notices.



Number of requested results 6 is greater than number of elements in index 2, updating n_results = 2


I don't know. The context provided seems to be about buying or owning a home, rather than renting an apartment. There is no information about renting apartments in this context.
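In `rag_chain`, the leading dict is coerced into a `RunnableParallel`: the same question goes to the retriever (whose documents `format_docs` joins into one string) to fill `context`, while `RunnablePassthrough()` forwards it unchanged as `question`. The resulting dict is then piped through the prompt, the model and `StrOutputParser`. The plain-Python sketch below spells out that data flow with hypothetical stand-ins (`fake_retrieve`, `fake_llm`) so it runs without Ollama; it mirrors the structure of `rag_chain`, not LangChain's implementation, and uses new names so it does not overwrite the notebook's `format_docs` or `rag_chain`.

```python
def join_texts(texts: list[str]) -> str:
    # Same joining rule as format_docs, but on plain strings instead of Documents
    return "\n\n".join(texts)

def fake_retrieve(question: str) -> list[str]:
    # Hypothetical stand-in for the Chroma retriever
    return ["Home0001 sells fully furnished homes.", "Cities: New York, Berlin, London."]

def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for ChatOllama; reports the prompt length instead of answering
    return f"(answer based on a {len(prompt_text)}-character prompt)"

def sketch_rag_chain(question: str) -> str:
    # 1. Build the input dict, like {"context": retriever | format_docs, "question": RunnablePassthrough()}
    inputs = {"context": join_texts(fake_retrieve(question)), "question": question}
    # 2. Fill the prompt template with both keys
    prompt_text = f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"
    # 3. Call the model; StrOutputParser would then return the reply text as a str
    return fake_llm(prompt_text)

print(sketch_rag_chain("What is Home0001?"))
```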
