### Dependencies
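`langchain-community`  
`langchain-huggingface`  
`langchain-openai`  

The later sections additionally rely on `langchain-chroma`, `langchain-text-splitters`, `langchain-ollama`, `langchain-experimental`, `faiss-cpu`, `firecrawl-py`, `python-dotenv` and `gpt4all`.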



## Super Basic Embedding Example

#### Instantiate

In [116]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1024  # size of the embeddings you want returned.
)

#### Embed single query

In [97]:
test_query = embeddings.embed_query("HOME0001 is a peer-to-peer housing collective.")

In [None]:
print(type(test_query))
print(len(test_query))
print(test_query[:3])

#### Embed list of texts

In [105]:
raw_documents = [
    "0001 HOMES ARE FULLY EQUIPPED, PART OF A GLOBAL NETWORK, AND UNIQUELY SIMPLE TO BUY AND OWN.",
    "WE WORK WITH WORLD RENOWNED ARCHITECTS TO DESIGN FULLY FURNISHED HOMES THAT ARE READY FROM DAY ONE.",
    "EACH 0001 HOME IS PART OF OUR GLOBAL PEER-TO-PEER HOUSING COLLECTIVE.",
    "0001 MEMBERS HELP SHAPE OUR COLLECTIVE AND CAN STAY FOR FREE IN ANY OF OUR LOCATIONS AROUND THE WORLD.",
    "WE’VE REINVENTED THE HOME BUYING EXPERIENCE SO YOU CAN PURCHASE OUR HOMES SECURELY, ONLINE, IN MINUTES.",
]


In [106]:
embedded_docs = embeddings.embed_documents(raw_documents)

In [None]:
print(len(embedded_docs), len(embedded_docs[0])) 

#### Create simple Vector Database

In [110]:
# Create Document Objects
from langchain_core.documents import Document
prepped_documents = [Document(page_content=text) for text in raw_documents]

In [111]:
from langchain_core.vectorstores import InMemoryVectorStore

# reuse the `embeddings` instance created above so documents and queries share one model
vector_store = InMemoryVectorStore.from_documents(prepped_documents, embeddings)

#### Do a similarity search with a query

In [None]:
query = "is furniture included?"

results = vector_store.similarity_search(query=query, k=1)

for result in results:
    print(f"* {result.page_content}")

## Slightly advanced example with LLM integration

### Load data from a URL  
https://python.langchain.com/docs/integrations/document_loaders/web_base/

DocumentLoaders are objects that load in data from a source and return a list of Documents.  
A Document is an object with some page_content (str) and metadata (dict).  
https://python.langchain.com/docs/how_to/#document-loaders

In [None]:
import os

# Set USER_AGENT before importing the loader so that langchain_community picks it up.
os.environ['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'

from langchain_community.document_loaders import WebBaseLoader

#### Load a single page  
NOTE: our website currently does not seem to work well with `WebBaseLoader`.

In [None]:
loader = WebBaseLoader("https://www.home0001.com/how-it-works")
doc = loader.load()
print(doc[0].page_content[:128])

#### Load multiple pages

In [None]:
loader_multiple_pages = WebBaseLoader(["https://www.home0001.com/how-it-works", "https://www.home0001.com/legal"])
docs = loader_multiple_pages.load()
print(docs[1].page_content[:128])

### Pre-process data


#### Chunk, split and store the data

Note: the right chunk size matters for retrieval quality and still needs to be tuned later on.

We use RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size.  
This is the recommended text splitter for generic text use cases.

We set add_start_index=True so that the character index where each split Document starts within the initial Document is preserved as metadata attribute “start_index”.  
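
To make `chunk_size`, `chunk_overlap` and `start_index` concrete, here is a simplified sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break at separators); it only cuts fixed-size windows. The helper name `chunk_with_overlap` and the sample text are made up for illustration.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that overlap, recording each start index."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(
            {
                "page_content": text[start:start + chunk_size],
                "metadata": {"start_index": start},
            }
        )
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"  # hypothetical input
toy_chunks = chunk_with_overlap(sample_text, chunk_size=10, chunk_overlap=4)
for chunk in toy_chunks:
    print(chunk["metadata"]["start_index"], chunk["page_content"])
```

Each chunk repeats the last four characters of the previous one, and `start_index` lets you map a chunk back to its position in the source text.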

Next we need to index our text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors).
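
As a rough illustration of the similarity search described above, the following sketch computes cosine similarity by hand and ranks a few stored vectors against a query vector. The three-dimensional vectors, the texts and the helper names (`cosine_similarity`, `top_k`) are invented for the example; real embeddings have hundreds or thousands of dimensions, and a vector store does this far more efficiently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": (text, embedding) pairs with made-up embeddings.
toy_store = [
    ("furnished homes", [0.9, 0.1, 0.0]),
    ("housing collective", [0.1, 0.9, 0.1]),
    ("buying online", [0.0, 0.2, 0.9]),
]
toy_query_vec = [0.8, 0.2, 0.1]  # pretend this is the embedded query

toy_hits = top_k(toy_query_vec, toy_store, k=2)
for score, text in toy_hits:
    print(f"{score:.3f}  {text}")
```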

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# set up the splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,  # store each chunk's start offset as metadata["start_index"]
)
# split the docs
splits = text_splitter.split_documents(docs)

print(len(splits))
print(len(splits[1].page_content))
print(splits[1].metadata)

### Create a Vector database

In this example we use `OpenAIEmbeddings` and a `Chroma` database.

In [8]:
# create a vector database with the splits
vectorstore = Chroma.from_documents(
    documents=splits, 
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    # persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

### Retrieve

A Retriever is an interface that returns relevant Documents from an index based on a string query.  

A vector store retriever is a retriever that uses a vector store to retrieve documents.  

Any VectorStore can easily be turned into a Retriever with `VectorStore.as_retriever()`

In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("What is home0001?")

print(len(retrieved_docs))
print(retrieved_docs[0].page_content[:128])

Other retrieval techniques include:  
- MultiQueryRetriever generates variants of the input question to improve retrieval hit rate.
- MultiVectorRetriever instead generates variants of the embeddings, also in order to improve retrieval hit rate.
- Maximal marginal relevance selects for relevance and diversity among the retrieved documents to avoid passing in duplicate context.
- Documents can be filtered during vector store retrieval using metadata filters, such as with a Self Query Retriever.
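
To show what maximal marginal relevance does, here is a minimal plain-Python sketch of the selection loop. It is an illustration under assumptions, not LangChain's implementation: the two-dimensional vectors are made up, and the `lambda_mult` name is borrowed from the search kwarg of the same name (1.0 means pure relevance, lower values favour diversity).

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of up to k documents, trading relevance against redundancy."""
    remaining = list(range(len(doc_vecs)))
    selected = []
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up embeddings: documents 0 and 1 are near-duplicates, document 2 differs.
mmr_docs = [[1.0, 0.0], [0.98, 0.2], [0.3, 0.95]]
mmr_query = [1.0, 0.3]

print(mmr_select(mmr_query, mmr_docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(mmr_query, mmr_docs, k=2, lambda_mult=0.5))  # relevance + diversity
```

With relevance only, the two near-duplicates are returned; with `lambda_mult=0.5`, the second pick is the dissimilar document instead.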

### Generate 

In [None]:
from langchain import hub

# use default prompt template
prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()

# print(example_messages)
print(example_messages[0].content)

In [11]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# print(format_docs(docs))

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is Home0001?"):
    print(chunk, end="", flush=True)

print('\n')
print(rag_chain.invoke("can i rent an apartment?"))

In [14]:
# cleanup
vectorstore.delete_collection()

## More detailed examples

#### Load documents with FireCrawl

In [38]:
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()
fc_api_key = os.getenv('FIRECRAWL_API_KEY')

from langchain_community.document_loaders.firecrawl import FireCrawlLoader

loader = FireCrawlLoader(
    api_key=fc_api_key, url="https://www.home0001.com/", mode="crawl"
)

pages = loader.load()

In [None]:
print(pages[0].page_content[:128])

#### Embedding Models

Besides `OpenAIEmbeddings`, there are various other embedding models to choose from.

In [None]:
# pip install "gpt4all[cuda]"
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_ollama import OllamaEmbeddings


hf_embd = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
gpt4all_embd = GPT4AllEmbeddings()
# Questionable choice: these Ollama embeddings are huge, slow, and not great.
ollama_embd = OllamaEmbeddings(model="llama3.1")

The BGE models on Hugging Face are among the best open-source embedding models. They are created by the Beijing Academy of Artificial Intelligence (BAAI).

In [119]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cuda"}  # requires a CUDA-capable GPU; use "cpu" otherwise
encode_kwargs = {"normalize_embeddings": True}
bge_embd = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

Note that you need to pass `query_instruction=""` for `model_name="BAAI/bge-m3"`.

#### Text Splitting

Set up various text splitters

In [42]:
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

char_text_splitter = CharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=0
    )

rec_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=200, 
    # separators are tried in order, so list the coarsest one first
    separators=["\n", ",", " "]
    )

Instead of the character-based or recursive splitters, we can also use a Semantic Chunker with one of our Embedding Models.  

https://python.langchain.com/docs/how_to/semantic-chunker/  

The default way to split is based on `percentile`. In this method, the differences between the embeddings of consecutive sentences are calculated, and a split is made wherever a difference is greater than the X-th percentile.  
Other options are `standard_deviation`, `interquartile` and `gradient`.
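
The percentile idea can be sketched in plain Python as follows. This is a toy illustration rather than the `SemanticChunker` code: the sentences and the distances between them are made up (in practice they come from the embedding model), and the helper names `percentile` and `split_at_breakpoints` are hypothetical.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_at_breakpoints(sentences, distances, pct):
    """Start a new chunk wherever the distance to the previous sentence exceeds the percentile."""
    threshold = percentile(distances, pct)
    groups, current = [], [sentences[0]]
    # distances[i] is the distance between sentences[i] and sentences[i + 1]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            groups.append(" ".join(current))
            current = []
        current.append(sentence)
    groups.append(" ".join(current))
    return groups


toy_sentences = [
    "Homes are furnished.",
    "Furniture is included.",
    "Members can travel.",
    "Stays are free.",
    "Buying is online.",
]
toy_distances = [0.10, 0.70, 0.15, 0.60]  # made-up embedding distances

for group in split_at_breakpoints(toy_sentences, toy_distances, pct=50):
    print(group)
```

A higher percentile produces fewer, larger chunks, because fewer distances exceed the threshold.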

In [None]:
from langchain_experimental.text_splitter import SemanticChunker

hf_sem_text_splitter = SemanticChunker(hf_embd, breakpoint_threshold_type="percentile")
llama_sem_text_splitter = SemanticChunker(ollama_embd)

In [None]:
char_splits = char_text_splitter.split_documents(pages)
rec_splits = rec_text_splitter.split_documents(pages)
hf_sem_splits = hf_sem_text_splitter.split_documents(pages)
llama_sem_splits = llama_sem_text_splitter.split_documents(pages)

In [None]:
print(len(char_splits), len(char_splits[0].page_content))
print(len(rec_splits), len(rec_splits[0].page_content))
print(len(hf_sem_splits), len(hf_sem_splits[0].page_content))
print(len(llama_sem_splits),len(llama_sem_splits[0].page_content))

#### Create Vector Databases from the splits

In [63]:
# pip install -qU langchain_community faiss-cpu
from langchain_community.vectorstores import FAISS

faiss_vectorstore_hf = FAISS.from_documents(hf_sem_splits, hf_embd)

In [70]:
from langchain_community.vectorstores.utils import filter_complex_metadata

# Chroma is being a bit picky about metadata formats, this should solve it
cleaned_splits = filter_complex_metadata(llama_sem_splits)

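# note: the splits come from the Ollama-based semantic chunker,
# but they are embedded with hf_embd here
chroma_vectorstore_llama = Chroma.from_documents(
    cleaned_splits,
    hf_embd,
)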


Side note: FAISS vector stores can be merged (see the multi-bot example in rag-bots):


`db1 = FAISS.from_texts(["foo"], embeddings)`  
`db2 = FAISS.from_texts(["bar"], embeddings)`   
`db1.merge_from(db2)`

It is also possible to combine several retrievers:  
https://python.langchain.com/docs/how_to/ensemble_retriever/

#### Invoke retrievers

In [None]:
faiss_hf_retriever = faiss_vectorstore_hf.as_retriever()

chroma_llama_retriever = chroma_vectorstore_llama.as_retriever()

In [None]:
test_query = "Where is home0001 available?"

print(faiss_hf_retriever.invoke(test_query))
print(chroma_llama_retriever.invoke(test_query))

By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal relevance search (`mmr`), you can specify that as the search type.  

We can also set a similarity `score_threshold` so that only documents scoring above it are returned, and limit the result to the top `k` documents.  

`MultiQueryRetriever` generates variants of the input question to improve retrieval hit rate.  
`MultiVectorRetriever` instead generates variants of the embeddings, also in order to improve retrieval hit rate.  
Maximal marginal relevance (`mmr`) selects for relevance and diversity among the retrieved documents to avoid passing in duplicate context.
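
The effect of `score_threshold` and `k` can be pictured with a small plain-Python sketch. The scored documents are invented, `filter_by_score` is a hypothetical helper, and treating the threshold as inclusive is an assumption; the sketch only shows the combined filtering, not how the vector store computes scores.

```python
def filter_by_score(scored_docs, score_threshold, k):
    """Keep documents scoring at least score_threshold, then return the best k."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Made-up (document, relevance score) pairs, scores between 0 and 1.
toy_scored = [("how it works", 0.82), ("legal notice", 0.31), ("locations", 0.64)]

print(filter_by_score(toy_scored, score_threshold=0.5, k=1))
print(filter_by_score(toy_scored, score_threshold=0.5, k=5))
```

Note that a strict threshold can return fewer than `k` documents, or none at all.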

In [None]:
# there are various ways of creating the retriever:
sim_retriever = chroma_vectorstore_llama.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5, "k": 1}
)
mmr_retriever = chroma_vectorstore_llama.as_retriever(search_type="mmr")

#### Create prompt templates

In [None]:
# simple version
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

In [None]:
# slightly more verbose
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        # the chain below supplies "context" and "question", so the placeholder must match
        ("human", "{question}"),
    ]
)

In [None]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1",
    temperature=0.8,
    num_predict=256,
    # other params ...
)

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": faiss_hf_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
print(rag_chain.invoke("How do I book a 0001 home somewhere else?"))