## Hybrid Search & Rerank

Reference:
* [Advanced RAG Implementation using Hybrid Search and Reranking](https://medium.com/@nadikapoudel16/advanced-rag-implementation-using-hybrid-search-reranking-with-zephyr-alpha-llm-4340b55fef22)

In [1]:
# pip install -U faiss-cpu rank_bm25
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

# Load the PDF as one Document per page, rendering tables as Markdown
loader = PyMuPDFLoader(
    file_path="./docs/2005.11401.pdf",
    mode="page",
    extract_tables="markdown",
)
documents = loader.load()

# Split the pages into overlapping chunks
recursive_text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
recursive_chunks = recursive_text_splitter.split_documents(documents)

# Embed the chunks with bge-m3 (served by Ollama) and index them in FAISS
vectorstore = FAISS.from_documents(
    documents=recursive_chunks,
    embedding=OllamaEmbeddings(model="bge-m3"),
)

In [2]:
query = "what's key concepts of RAG"

In [3]:
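# https://python.langchain.com/docs/how_to/ensemble_retriever
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Dense (embedding) retriever: top 5 chunks by vector similarity
vectorstore_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# Sparse (keyword) retriever: top 5 chunks by BM25 score
keyword_retriever = BM25Retriever.from_documents(recursive_chunks)
keyword_retriever.k = 5
# Hybrid search: fuse both ranked lists with equal weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[vectorstore_retriever, keyword_retriever],
    weights=[0.5, 0.5],
)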

relevance_docs = ensemble_retriever.invoke(query)

for d in relevance_docs:
    print(d.page_content)

in 71% of cases, and a gold article is present in the top 10 retrieved articles in 90% of cases.
4.5
Additional Results
Generation Diversity
Section 4.3 shows that RAG models are more factual and speciﬁc than
BART for Jeopardy question generation. Following recent work on diversity-promoting decoding
[33, 59, 39], we also investigate generation diversity by calculating the ratio of distinct ngrams to
total ngrams generated by different models. Table 5 shows that RAG-Sequence’s generations are
more diverse than RAG-Token’s, and both are signiﬁcantly more diverse than BART without needing
any diversity-promoting decoding.
Retrieval Ablations
A key feature of RAG is learning to retrieve relevant information for the task.
To assess the effectiveness of the retrieval mechanism, we run ablations where we freeze the retriever
during training. As shown in Table 6, learned retrieval improves results for all tasks.
two variants: the standard 3-way classiﬁcation task (supports/refutes/not enough 
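
The hybrid step above merges two ranked lists (dense and BM25) into one. A minimal sketch of weighted Reciprocal Rank Fusion, which is, as far as I know, the scheme `EnsembleRetriever` applies with a default constant of 60 (treat both as assumptions and check the LangChain docs). The helper `weighted_rrf` and the `chunk_*` IDs are hypothetical and only illustrate the arithmetic:

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document receives weight / (rank + c) from every list it appears in;
    c=60 is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


dense_hits = ["chunk_a", "chunk_b", "chunk_c"]   # hypothetical vector-search order
sparse_hits = ["chunk_b", "chunk_d", "chunk_a"]  # hypothetical BM25 order
fused = weighted_rrf([dense_hits, sparse_hits], weights=[0.5, 0.5])
print(fused)
```

A chunk that ranks well in both lists (`chunk_b` here) ends up ahead of one that tops only a single list, which is the point of combining keyword and embedding search.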

In [4]:
# https://python.langchain.com/docs/integrations/document_transformers/cross_encoder_reranker/
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Cross-encoder that scores each (query, chunk) pair
rerank_model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-v2-m3")
# Keep only the 3 highest-scoring chunks
compressor = CrossEncoderReranker(model=rerank_model, top_n=3)
# Rerank the hybrid-search results
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=ensemble_retriever,
)

compressed_docs = compression_retriever.invoke(query)

for d in compressed_docs:
    print(d.page_content)

in 71% of cases, and a gold article is present in the top 10 retrieved articles in 90% of cases.
4.5
Additional Results
Generation Diversity
Section 4.3 shows that RAG models are more factual and speciﬁc than
BART for Jeopardy question generation. Following recent work on diversity-promoting decoding
[33, 59, 39], we also investigate generation diversity by calculating the ratio of distinct ngrams to
total ngrams generated by different models. Table 5 shows that RAG-Sequence’s generations are
more diverse than RAG-Token’s, and both are signiﬁcantly more diverse than BART without needing
any diversity-promoting decoding.
Retrieval Ablations
A key feature of RAG is learning to retrieve relevant information for the task.
To assess the effectiveness of the retrieval mechanism, we run ablations where we freeze the retriever
during training. As shown in Table 6, learned retrieval improves results for all tasks.
decisions and updating their world knowledge remain open research problems. Pre-
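
Conceptually, the rerank step scores every (query, chunk) pair and keeps the `top_n` best. A rough sketch of that logic, assuming a pluggable scoring function; `rerank` and `token_overlap` are hypothetical helpers, and the toy overlap score is only a stand-in for the `BAAI/bge-reranker-v2-m3` cross-encoder used in the cell above, so the example runs without downloading a model:

```python
def rerank(query, docs, score_fn, top_n=3):
    """Score every (query, doc) pair and return the top_n highest-scoring docs."""
    scored = [(score_fn(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def token_overlap(query, doc):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


candidates = [
    "BART is a sequence-to-sequence model",
    "RAG combines a retriever with a generator",
    "key concepts of RAG include retrieval and generation",
]
top = rerank("key concepts of RAG", candidates, token_overlap, top_n=2)
print(top)
```

A real cross-encoder reads the query and the chunk together, so it can judge relevance more precisely than either BM25 or embedding similarity, at the cost of one model forward pass per candidate. That cost is why it is applied only to the short list returned by the hybrid retriever.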
