**Sometimes your existing search is not enough, or it does not prioritize the right chunk or context, especially when the chunks are long. *FlashRank ReRanker* provides an effective mechanism to re-rank your existing search results and prioritize the relevant context.**


This script demonstrates the approach.

Courtesy: LangChain

# FlashRank Reranker

FlashRank is an ultra-lite and super-fast Python library for adding re-ranking to your existing search and retrieval pipelines.
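To make the idea of re-ranking concrete, here is a minimal sketch in plain Python. It is not FlashRank's implementation: the `rerank` and `word_overlap` helpers are hypothetical names, and the word-overlap score is only a stand-in for the learned relevance model a real reranker would use. The shape of the step is the same, though: take the candidates a first-stage search returned, score each one against the query, and keep the best few.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query and keep the best top_n."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Highest score first; the sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def word_overlap(query: str, text: str) -> float:
    """Toy relevance score (assumption): share of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


# Candidates in the order a first-stage search might have returned them.
candidates = [
    "Enter Hamlet and Horatio",
    "Enter Ghost and Hamlet",
    "Enter two Sentinels Francisco and Bernardo",
]
reranked = rerank("who were the two sentinels", candidates, word_overlap, top_n=2)
for score, text in reranked:
    print(f"{score:.2f}  {text}")
```

The scoring function is passed in as an argument so that the toy score can be swapped for a real model without touching the sorting and truncation logic.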

In [12]:
!pip install --upgrade --quiet beautifulsoup4 langchain langchain_community langchain_core langchain_groq langchain_text_splitters faiss-cpu sentence_transformers flashrank

In [3]:
from google.colab import userdata
import os

groq_key = userdata.get('groq-key')
smith_key = userdata.get('smith-key')
pine_key = userdata.get('pine-key')
openai_key = userdata.get('openai-key')

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_PROJECT'] = 'rag-query-routing'
os.environ['LANGCHAIN_API_KEY'] = smith_key
os.environ['GROQ_API_KEY'] = groq_key

In [10]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.load import dumps, loads
from langchain_core.documents import Document
from typing import Literal
from pprint import pprint
import uuid

In [15]:
# Helper function for printing docs


def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [
                f"Document {i+1}:\n\n{d.page_content}\nMetadata: {d.metadata}"
                for i, d in enumerate(docs)
            ]
        )
    )

Let's create a base vector retriever and store a large document as chunks.

In [7]:
# I have added the text of Hamlet as a text file.
!ls

hamlet.txt  sample_data


In [8]:
documents = TextLoader("hamlet.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
for idx, text in enumerate(texts):
    text.metadata["id"] = idx
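The `chunk_size` and `chunk_overlap` settings above are easier to picture with a small example. The sketch below is a simplification, not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraph and line breaks). The hypothetical `sliding_chunks` helper uses a fixed character window, which is enough to show how consecutive chunks share text.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

With `chunk_size=10` and `chunk_overlap=4`, the last four characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole in at least one chunk.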

In [13]:
model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}
hf_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)
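Setting `normalize_embeddings` to `True` scales each embedding to unit length. A small plain-Python sketch of why that matters: for unit-length vectors, the inner product equals the cosine similarity. The two-dimensional vectors and the helper names below are made up for illustration; real embeddings have hundreds of dimensions.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors only; not real embedding output.
a = [3.0, 4.0]
b = [4.0, 3.0]

cosine_raw = cosine(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))
print(cosine_raw, dot_normalized)
```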


In [14]:
retriever = FAISS.from_documents(texts, hf_embeddings).as_retriever(search_kwargs={"k": 10})

In [16]:
query = "Who were the two sentinels in the first scene?"
docs = retriever.invoke(query)
pretty_print_docs(docs)

Document 1:

Getrude, Queen of Denmark, mother to Hamlet.
  Ophelia, daughter to Polonius.

  Ghost of Hamlet's Father.

  Lords, ladies, Officers, Soldiers, Sailors, Messengers, Attendants.





SCENE.- Elsinore.


ACT I. Scene I.
Elsinore. A platform before the Castle.

Enter two Sentinels-[first,] Francisco, [who paces up and down
at his post; then] Bernardo, [who approaches him].
Metadata: {'source': 'hamlet.txt', 'id': 3}
----------------------------------------------------------------------------------------------------
Document 2:

Scene II.
Elsinore. A hall in the Castle.

Enter Hamlet and Horatio.
Metadata: {'source': 'hamlet.txt', 'id': 494}
----------------------------------------------------------------------------------------------------
Document 3:

Scene V.
Elsinore. The Castle. Another part of the fortifications.

Enter Ghost and Hamlet.
Metadata: {'source': 'hamlet.txt', 'id': 97}
-----------------------------------------------------------------------------------------

# Let's apply the FlashRank Reranker

In [17]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

In [18]:
llm = ChatGroq()

compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

ms-marco-MultiBERT-L-12.zip: 100%|██████████| 98.7M/98.7M [00:00<00:00, 131MiB/s]


In [19]:
compressed_docs = compression_retriever.invoke(
    "Who were the two sentinels in the first scene?"
)
print([doc.metadata["id"] for doc in compressed_docs])

[364, 3, 388]
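One way to see what the reranker changed is to look up where each reranked id sat in the base retriever's ordering. The `rank_changes` helper below is a hypothetical sketch. Note that the printed base results in this notebook show only the first three of the ten retrieved documents (ids 3, 494 and 97), so `None` here means "not among the three displayed", not "absent from the base results".

```python
from typing import Dict, List, Optional


def rank_changes(before: List[int], after: List[int]) -> Dict[int, Optional[int]]:
    """Map each reranked id to its 1-based position in `before` (None if not listed)."""
    position = {doc_id: i + 1 for i, doc_id in enumerate(before)}
    return {doc_id: position.get(doc_id) for doc_id in after}


base_ids_shown = [3, 494, 97]   # first three base results printed earlier
reranked_ids = [364, 3, 388]    # ids returned by the compression retriever

changes = rank_changes(base_ids_shown, reranked_ids)
for new_rank, (doc_id, old_rank) in enumerate(changes.items(), start=1):
    print(f"id {doc_id}: new rank {new_rank}, old rank {old_rank}")
```

In a live session, passing `[d.metadata["id"] for d in docs]` as `before` would cover all ten base results instead of only the displayed three.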


In [20]:
pretty_print_docs(compressed_docs)

Document 1:

Scene II.
Elsinore. A passage in the Castle.

Enter Hamlet.

  Ham. Safely stow'd.
  Gentlemen. (within) Hamlet! Lord Hamlet!
  Ham. But soft! What noise? Who calls on Hamlet? O, here they come.

               Enter Rosencrantz and Guildenstern.
Metadata: {'id': 364, 'relevance_score': 0.99873847, 'source': 'hamlet.txt'}
----------------------------------------------------------------------------------------------------
Document 2:

Getrude, Queen of Denmark, mother to Hamlet.
  Ophelia, daughter to Polonius.

  Ghost of Hamlet's Father.

  Lords, ladies, Officers, Soldiers, Sailors, Messengers, Attendants.





SCENE.- Elsinore.


ACT I. Scene I.
Elsinore. A platform before the Castle.

Enter two Sentinels-[first,] Francisco, [who paces up and down
at his post; then] Bernardo, [who approaches him].
Metadata: {'id': 3, 'relevance_score': 0.99686563, 'source': 'hamlet.txt'}
----------------------------------------------------------------------------------------------------

Using FlashRank in a question-answering chain

In [21]:
from langchain.chains import RetrievalQA

chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

chain.invoke("Who were the two sentinels in the first scene?")

{'query': 'Who were the two sentinels in the first scene?',
 'result': 'The two sentinels in the first scene are Francisco and Bernardo.'}
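`RetrievalQA.from_chain_type` defaults to the `stuff` chain type, which places the retrieved documents into a single prompt together with the question. The sketch below shows that idea in plain Python. The `build_stuffed_prompt` helper and its prompt wording are assumptions for illustration, not LangChain's actual template.

```python
from typing import List


def build_stuffed_prompt(question: str, contexts: List[str]) -> str:
    """Join all retrieved chunks into one context block and append the question."""
    context_block = "\n\n".join(contexts)
    # Hypothetical wording; the real chain uses its own prompt template.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Excerpts taken from the reranked chunks shown earlier in this notebook.
contexts = [
    "Enter Hamlet.\n\n  Ham. Safely stow'd.",
    "Enter two Sentinels-[first,] Francisco, [who paces up and down\n"
    "at his post; then] Bernardo, [who approaches him].",
]
prompt = build_stuffed_prompt(
    "Who were the two sentinels in the first scene?", contexts
)
print(prompt)
```

Because every retrieved chunk ends up in the prompt, the model can still answer correctly when the top-ranked chunk is not the relevant one, as long as the relevant chunk is among those returned.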