# Retrievers

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of `Document` objects as output.

Retrievers implement LangChain's `Runnable` interface, so they can be composed into chains with the `|` operator.
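To make the interface concrete, here is a minimal, dependency-free sketch of a retriever that keeps no vector index at all: it ranks a fixed list of texts by word overlap with the query. The `Document` dataclass and the `KeywordRetriever` class are hypothetical stand-ins for illustration, not LangChain classes; in LangChain you would subclass `BaseRetriever` instead.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: string query in, list of Documents out.

    It stores no embeddings; it scores each text by how many words it
    shares with the query (a hypothetical scoring rule).
    """

    def __init__(self, texts: list[str], k: int = 2):
        self.docs = [Document(t) for t in texts]
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        query_words = set(query.lower().split())

        def score(doc: Document) -> int:
            return len(query_words & set(doc.page_content.lower().split()))

        # Rank by overlap, drop documents with no overlap, keep the top k
        ranked = sorted(self.docs, key=score, reverse=True)
        return [d for d in ranked if score(d) > 0][: self.k]


retriever = KeywordRetriever([
    "the founders started the company in a garage",
    "quarterly revenue grew by ten percent",
    "two founders wrote the first version",
])
for doc in retriever.invoke("who are the founders"):
    print(doc.page_content)
```

The caller only sees "query in, documents out", which is why a vector store is just one possible backbone.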

In [5]:
from langchain_community.chat_models.ollama import ChatOllama
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain_core.documents import Document
from langchain_experimental.text_splitter import SemanticChunker

#local_llm = ChatOllama(model="phi3", base_url="http://127.0.0.1:11434", temperature=0)
local_llm = ChatOpenAI(model="gpt-4o", temperature=0)
embedding_function = OpenAIEmbeddings(model="text-embedding-3-large")
#embedding_function = OllamaEmbeddings(model="nomic-embed-text")

def semantic_chunking(documents: list[Document], threshold=25) -> list[Document]:
    text_splitter = SemanticChunker(
        embeddings=embedding_function,
        breakpoint_threshold_type="percentile",  # or "standard_deviation", "interquartile"
        breakpoint_threshold_amount=threshold,
    )
    return text_splitter.split_documents(documents)

# FAISS is an in-memory vector store
def create_in_memory_vector_store(chunks: list[Document]) -> FAISS:
    # First ensure we have valid documents
    if not chunks:
        raise ValueError("No documents provided")
    
    # Convert documents to texts and metadatas
    texts = [doc.page_content for doc in chunks]
    metadatas = [doc.metadata for doc in chunks]
    
    # Create FAISS index from texts
    return FAISS.from_texts(
        texts=texts,
        embedding=embedding_function,
        metadatas=metadatas
    )    

def load_pdf(file_path:str) -> list[Document]:
    from langchain_community.document_loaders.pdf import PyPDFLoader
    loader_PDF = PyPDFLoader(file_path=file_path, extract_images=True)
    docsPdf = loader_PDF.load_and_split()
    return docsPdf
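The `breakpoint_threshold_type="percentile"` setting can be understood with a simplified, dependency-free sketch: embed each sentence, measure the cosine distance between neighbouring sentences, and start a new chunk wherever that distance is above the chosen percentile of all distances. The 2-D vectors are made-up stand-ins for real embeddings, and this sketch omits details of the real `SemanticChunker` (for example, it combines each sentence with its neighbours before embedding).

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile (the same rule as NumPy's default)."""
    s = sorted(values)
    rank = p / 100 * (len(s) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def semantic_split(sentences: list[str], vectors: list[list[float]], threshold: float = 95) -> list[str]:
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Distance between each sentence and the next one
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    cutoff = percentile(distances, threshold)
    chunks, start = [], 0
    for i, d in enumerate(distances):
        if d > cutoff:  # large jump in meaning: close the chunk after sentence i
            chunks.append(" ".join(sentences[start:i + 1]))
            start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


sentences = ["A cat sat.", "The cat slept.", "Stocks rose.", "Markets rallied."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # hypothetical embeddings
print(semantic_split(sentences, vectors, threshold=50))
```

A lower percentile means more gaps exceed the cutoff, so a value like the 20 to 25 used in this notebook yields many small chunks; the library's default for `"percentile"` is 95.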

## The simple retriever

In [3]:
from rich import print
file_path = "../Resources/The Code Whisperer.pdf"
documents = load_pdf(file_path)
print(documents)

In [6]:
chunks = semantic_chunking(documents, threshold=20)
print(chunks)
#Expected execution time: 8s

In [7]:

vectorstore = create_in_memory_vector_store(chunks)

In [8]:
# Recall the similarity search:
results = vectorstore.similarity_search_with_relevance_scores("Who are the founders?", k=3)
# No LLM is involved in this step: the query is embedded and compared with the stored
# chunk embeddings by vector distance, so chunks with similar meaning score highest
print(results)
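Under the hood, a similarity search is plain vector arithmetic: embed the query, score every stored vector against it, and return the best `k`. This is a minimal sketch with made-up 3-D vectors standing in for real embeddings; FAISS uses optimised index structures, and `similarity_search_with_relevance_scores` converts distances into a normalised relevance score, so its numbers will differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_search(query_vec, store, k=3):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-computed embeddings (real models return hundreds of dimensions)
store = [
    ("Alice and Bob founded the company.", [0.9, 0.1, 0.0]),
    ("The office is in Berlin.", [0.1, 0.9, 0.1]),
    ("Bob was the first CTO.", [0.7, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "Who are the founders?"
for text, score in similarity_search(query_vec, store, k=2):
    print(f"{score:.3f}  {text}")
```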



In [10]:
# A retriever runs the same similarity search, but as a Runnable it can be used in a chain.
from langchain_core.runnables import RunnablePassthrough  # passes its input through unchanged
from langchain_core.output_parsers import StrOutputParser  # extracts the text content from the LLM's message
from langchain_core.prompts import (ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate)

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("""
    You are a helpful assistant that answers based only on the provided contexts:
    ============================================================================
    Contexts: \n -> {context}
    ============================================================================"""),
    HumanMessagePromptTemplate.from_template("Question: {question}")
]) # yes, you don't need to remember this syntax anymore: [{'role':'system','content':system}, {'role':'user','content...}

simple_retriever = vectorstore.as_retriever(
    search_type="similarity",  # can be "similarity" (default), "mmr", or "similarity_score_threshold"
    search_kwargs={"k": 5})  # k=5 returns the 5 most relevant chunks from the vector store

def format_document(documents:list[Document])->str:
    return "\n -> ".join([f"{i}:{doc.page_content}" for i,doc in enumerate(documents)])

chain = (
    {"context":simple_retriever | format_document, "question": RunnablePassthrough()}
    | prompt 
    #| local_llm
    #| StrOutputParser()
)

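result = chain.invoke("Who are the founders?")  # there is only one RunnablePassthrough(), so pass the question string directly
print(result)  # with the LLM commented out, the chain returns the filled-in prompt (a ChatPromptValue)
print(result.messages[0].content)  # the system message, including the retrieved context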
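The dictionary at the head of the chain sends the same input to each of its entries and collects the results under the variable names the prompt expects (LangChain coerces such a dict into a `RunnableParallel`). Here is a plain-Python sketch of that data flow; `fake_retriever` and `fill_prompt` are hypothetical stand-ins for the retriever and the prompt template, and this `format_document` takes plain strings rather than `Document` objects.

```python
def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for simple_retriever: always returns two chunks
    return ["Alice founded the company.", "Bob joined as co-founder."]


def format_document(documents: list[str]) -> str:
    return "\n -> ".join(f"{i}:{doc}" for i, doc in enumerate(documents))


def passthrough(x):
    # Like RunnablePassthrough: return the input unchanged
    return x


def run_parallel(branches: dict, value) -> dict:
    # Like the {"context": ..., "question": ...} step: same input to every branch
    return {name: fn(value) for name, fn in branches.items()}


def fill_prompt(variables: dict) -> str:
    # Hypothetical stand-in for the ChatPromptTemplate
    return f"Contexts:\n -> {variables['context']}\nQuestion: {variables['question']}"


branches = {
    "context": lambda q: format_document(fake_retriever(q)),  # retriever | format_document
    "question": passthrough,
}
print(fill_prompt(run_parallel(branches, "Who are the founders?")))
```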

## MultiQuery Retriever

Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on “distance”. But, retrieval may produce different results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.

The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results.
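The core of the idea, minus the LLM, fits in a few lines: run the retriever once per query variant and keep the unique union of the results. In this sketch the variants are hard-coded (in `MultiQueryRetriever` an LLM generates them), the original question is included among them for illustration, and `toy_retrieve` is a hypothetical lookup table standing in for the vector-store retriever.

```python
def toy_retrieve(query: str) -> list[str]:
    # Hypothetical retriever: each phrasing surfaces a slightly different set of chunks
    index = {
        "Who are the founders?": ["chunk A", "chunk B"],
        "Who established the organization?": ["chunk B", "chunk C"],
        "Who were the original creators?": ["chunk A", "chunk D"],
    }
    return index.get(query, [])


def multi_query_retrieve(queries: list[str]) -> list[str]:
    """Retrieve for every query variant; return the unique union in first-seen order."""
    seen, union = set(), []
    for q in queries:
        for doc in toy_retrieve(q):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union


variants = [
    "Who are the founders?",
    "Who established the organization?",
    "Who were the original creators?",
]
print(multi_query_retrieve(variants))
```

No single phrasing finds all four chunks, but the union does, which is the effect the paragraph above describes.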

In [11]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=simple_retriever,
    llm=local_llm
)

import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

results = await multiquery_retriever.ainvoke("Who are the founders?")
print([f"\n - {i}: {doc.page_content}" for i,doc in enumerate(results)])

INFO:langchain.retrievers.multi_query:Generated queries: ['Who established the organization?  ', 'Who initiated the founding of the company?  ', 'Who were the original creators?']


## Filters

### LLMChainFilter

Uses an LLM chain to decide which of the initially retrieved documents to filter out and which to return, without modifying the document contents.
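Conceptually, the filter asks a yes/no relevance question about each retrieved document and keeps only the documents that get a "yes", returning them unchanged. In this sketch `judge` is a hypothetical keyword-based stand-in for the LLM call so that it runs without a model; `LLMChainFilter` itself sends the query and each document to the LLM with a relevance prompt.

```python
def judge(query: str, document: str) -> bool:
    # Hypothetical stand-in for the LLM: "relevant" if any longer query word appears
    keywords = {w.strip("?.,").lower() for w in query.split() if len(w) > 3}
    return any(k in document.lower() for k in keywords)


def llm_filter(query: str, documents: list[str]) -> list[str]:
    # One judgement per document (one LLM call each in the real filter);
    # kept documents are returned as-is, never rewritten
    return [doc for doc in documents if judge(query, doc)]


docs = [
    "The founders met at university.",
    "The cafeteria opens at 8am.",
    "Both founders still sit on the board.",
]
print(llm_filter("Who are the founders?", docs))
```

Because there is one judgement per document, the cost grows with the number of retrieved documents, which is why the cell below is slow.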

In [12]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(
    local_llm
)  # one LLM call per retrieved document, which slows the process down
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=multiquery_retriever
)

results = compression_retriever.invoke("Who are the founders?")
print([f"\n - {i}: {doc.page_content}" for i, doc in enumerate(results)])

# Expected execution time: 42s

INFO:langchain.retrievers.multi_query:Generated queries: ['Who established the organization?  ', 'Who initiated the founding of the company?  ', 'Who were the original creators?']


## Multiple strategies and Document Transforms together

Using the `DocumentCompressorPipeline` we can also easily combine multiple strategies in sequence. Along with different compressors and filters, we can add document transformers to our pipeline, which don’t perform any contextual compression but simply perform some transformation on a set of documents. For example, text splitters can be used as document transformers to split documents into smaller pieces, and the `EmbeddingsRedundantFilter` can be used to filter out redundant documents based on the embedding similarity between documents.

Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents, and then filtering based on relevance to the query.
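A compressor pipeline behaves like a list of steps where each step's output feeds the next. This dependency-free sketch mimics the split, then deduplicate, then relevance-filter order used in the next cell. The bag-of-words `embed` function and both thresholds are hypothetical choices for illustration; `EmbeddingsRedundantFilter` and `EmbeddingsFilter` use a real embedding model instead.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy "embedding": a bag-of-words count vector
    return Counter(w.strip(".?,").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_sentences(docs, query):
    # Transformer: split each document on ". " (a much simplified text splitter)
    return [s.strip() for d in docs for s in d.split(". ") if s.strip()]


def drop_redundant(docs, query, threshold=0.95):
    # Transformer: skip a document if it is nearly identical to one already kept
    kept = []
    for d in docs:
        if all(cosine(embed(d), embed(k)) < threshold for k in kept):
            kept.append(d)
    return kept


def keep_relevant(docs, query, threshold=0.2):
    # Filter: keep only documents similar enough to the query
    q = embed(query)
    return [d for d in docs if cosine(embed(d), q) >= threshold]


def run_pipeline(steps, docs, query):
    # Apply each step in order, feeding its output into the next step
    for step in steps:
        docs = step(docs, query)
    return docs


docs = [
    "Alice and Bob are the founders. Our office has a nice view",
    "Alice and Bob are the founders",
]
print(run_pipeline([split_sentences, drop_redundant, keep_relevant], docs, "Who are the founders?"))
```

Ordering matters: deduplicating after splitting catches sentences repeated across documents, and the cheap embedding-based steps shrink the list before any expensive LLM-based step runs.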

In [13]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import (DocumentCompressorPipeline, EmbeddingsFilter, LLMChainFilter)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_transformers import (EmbeddingsRedundantFilter, LongContextReorder)

# Split the chunks even more
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0, separators=[". "])

#Remove the redundant documents by using an EmbeddingsRedundantFilter
redundant_filter = EmbeddingsRedundantFilter(embeddings=embedding_function)

# The EmbeddingsFilter provides a cheaper and faster option: it embeds the documents and
# returns only the ones whose embeddings are sufficiently similar to the query.
relevant_filter = EmbeddingsFilter(embeddings= embedding_function, similarity_threshold=0.2)

reorder_filter = LongContextReorder()  # moves the most relevant documents to the start and end of the list

llm_filter = LLMChainFilter.from_llm(local_llm)  # uses the LLM to judge each document's relevance; this slows the process down

pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter, reorder_filter, llm_filter]
)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, base_retriever=multiquery_retriever
)

compressed_docs = compression_retriever.invoke(
    "Who are the founders?"
)

for i, doc in enumerate(compressed_docs):
    print(f"- {i}: {doc.page_content}")

# Expected execution time: 24s without the LLM filter, 41s with it

INFO:langchain.retrievers.multi_query:Generated queries: ['Who established the organization?  ', 'Who initiated the founding of the company?  ', 'Who were the original creators?']
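`LongContextReorder` addresses the observation that models tend to overlook information placed in the middle of a long context: given documents sorted from most to least relevant, it puts the most relevant ones at the beginning and end and the least relevant in the middle. The function below is a stand-alone re-implementation of that ordering idea for illustration only; check the library source for its exact behaviour.

```python
def long_context_reorder(docs_by_relevance: list[str]) -> list[str]:
    """Put the strongest documents at both ends and the weakest in the middle.

    Expects docs sorted from most to least relevant.
    """
    reordered: list[str] = []
    # Walk from least to most relevant, alternately prepending and appending
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


print(long_context_reorder(["d1", "d2", "d3", "d4", "d5"]))
```

With five documents, `d1` (most relevant) ends up first, `d2` last, and `d5` (least relevant) in the middle.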
