Using a rerank model in a Retrieval-Augmented Generation (RAG) pipeline can noticeably improve the quality of the context passed to the language model. Here’s why it helps, particularly when reranking results from a vector database:
1. Improved Relevance

    Vector databases rank documents by embedding similarity (e.g., cosine similarity between the query embedding and each document embedding).

    This is fast, but a high similarity score does not always mean the document actually answers the query.

    A reranker applies a stronger model that scores each query-document pair directly, which usually produces a more accurate ordering.

2. Contextual Understanding

    Rerank models, especially transformer-based Cross-Encoders (e.g., BERT-style models), read the query and a document together in a single forward pass.

    Vector search, by contrast, uses a bi-encoder: the query and the document are embedded independently and compared only through one similarity score. A cross-encoder can attend across both texts at once.

    This gives better judgments of nuanced, context-dependent relevance. The trade-off is cost: a cross-encoder needs one model pass per query-document pair, so it is applied only to the shortlist returned by the vector search, not to the whole collection.

3. Handling Ambiguity

    If the query is ambiguous, a vector database may return a broad set of results of mixed relevance.

    A rerank model can move the results that best match the most likely reading of the query to the top.

4. Filtering Noise

    Vector search can surface irrelevant documents because of semantic drift or embedding inaccuracies.

    A reranker acts as a second filter: low-scoring results are pushed down the list, and keeping only the top few drops them entirely.
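The two-stage idea above can be sketched in plain Python. This is a toy, not the notebook's pipeline: the 2-D vectors and `keyword_overlap_score` are hypothetical stand-ins for real embeddings and a cross-encoder. Stage 1 keeps the `k` documents whose vectors are most cosine-similar to the query; stage 2 rescores only that shortlist with a pairwise scorer that sees query and document together.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_overlap_score(query, doc):
    # Hypothetical stand-in for a cross-encoder: it looks at the query and the document together
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def retrieve_then_rerank(query, query_vec, corpus, k=3, final_k=2):
    # corpus is a list of (text, vector) pairs
    # Stage 1: cheap vector similarity over the whole corpus, keep the k best
    shortlist = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
    # Stage 2: pairwise scoring, applied only to the shortlist
    reranked = sorted(shortlist, key=lambda item: keyword_overlap_score(query, item[0]), reverse=True)
    return [text for text, _ in reranked[:final_k]]

corpus = [
    ("umask sets default file permissions", [0.9, 0.1]),
    ("chmod changes file permissions", [0.95, 0.05]),
    ("Hungary history overview", [0.1, 0.9]),
    ("umask with no arguments prints the current mask", [0.8, 0.3]),
]
print(retrieve_then_rerank("what does umask print with no arguments", [1.0, 0.0], corpus))
# → ['umask with no arguments prints the current mask', 'umask sets default file permissions']
```

Vector similarity alone ranks the `chmod` document first here; the pairwise stage promotes the document that actually answers the question and drops the rest.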

## Install all required packages

In [138]:
!pip install qdrant-client langchain langchain_community pypdf openai lmstudio sentence-transformers duckduckgo-search --quiet


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


## Setup models

In [384]:
import lmstudio as lms
# local embedding model served by LM Studio
embedding_model = lms.embedding_model("nomic-embed-text-v1.5")
# chat model: with no argument, lms.llm() uses the LLM currently loaded in LM Studio
model = lms.llm()

from sentence_transformers import CrossEncoder
# rerank model (cross-encoder); the xsmall variant is a lighter alternative
# rank_model = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")
rank_model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")


## Chunk the documents

In [385]:
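import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_pdf(file_path, chunk_size=10000, overlap=500):
    # Load the PDF (one Document per page); expand "~" so home-relative paths work
    loader = PyPDFLoader(os.path.expanduser(file_path))
    docs = loader.load()

    # Split the pages into chunks; chunk_size and chunk_overlap are measured in characters, not tokens
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    return splitter.split_documents(docs)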
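`RecursiveCharacterTextSplitter` tries to split on paragraph, line, and word boundaries before falling back to raw characters. The sketch below ignores those separators and only illustrates what `chunk_size` and `chunk_overlap` mean: windows of at most `chunk_size` characters, each starting `chunk_size - overlap` characters after the previous one. `fixed_size_chunks` is a hypothetical helper, not part of LangChain.

```python
def fixed_size_chunks(text, chunk_size=10000, overlap=500):
    # Hypothetical helper: fixed-size character windows, no separator handling
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Consecutive chunks share `overlap` characters, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.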

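## Let's use a local Qdrant vector store
For installation, refer to:
https://hub.docker.com/r/qdrant/qdrant
For example, `docker run -p 6333:6333 qdrant/qdrant` starts a server on the port the client below connects to.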

In [386]:
import os
import uuid
from qdrant_client import QdrantClient, models
from tqdm import tqdm  # progress bar

client = QdrantClient("localhost", port=6333)

def pdf2rag_store(pdf_file, collection_name=None, batch_size=100):
    if collection_name is None:
        # Derive a valid collection name from the file name (e.g. "dos_commands.pdf" -> "dos_commands")
        collection_name = os.path.splitext(os.path.basename(pdf_file))[0]

    # Split the PDF into chunks
    chunks = chunk_pdf(pdf_file)
    if not chunks:
        print(f"No text extracted from '{pdf_file}', nothing to store.")
        return

    # Embed the first chunk to determine the vector size of the embedding model
    vector_size = len(embedding_model.embed(chunks[0].page_content))

    # Create the collection only if it does not exist yet
    if client.collection_exists(collection_name=collection_name):
        print(f"Collection '{collection_name}' already exists.")
    else:
        print(f"Collection '{collection_name}' does not exist. Creating...")
        client.create_collection(
            collection_name=collection_name,
            vectors_config=models.VectorParams(
                size=vector_size,
                distance=models.Distance.COSINE,  # alternatives: DOT, EUCLID, MANHATTAN
            ),
        )
        print(f"Collection '{collection_name}' created.")

    points = []
    for chunk in tqdm(chunks, desc="Processing Chunks"):
        # Embed the chunk text (optionally, summarize the chunk with the LLM and embed the summary instead)
        embedding = embedding_model.embed(chunk.page_content)

        # Qdrant point IDs must be unsigned integers or UUIDs.
        # Note: uuid4 is random, so re-running this on the same PDF adds duplicate points.
        points.append(models.PointStruct(
            id=str(uuid.uuid4()),
            vector=embedding,
            payload={"text": chunk.page_content},
        ))

        # Upsert in batches to keep request sizes bounded
        if len(points) >= batch_size:
            client.upsert(collection_name=collection_name, points=points)
            print(f"Stored {len(points)} points in collection '{collection_name}'")
            points = []

    # Upsert the final, partial batch
    if points:
        client.upsert(collection_name=collection_name, points=points)
        print(f"Stored {len(points)} remaining points in collection '{collection_name}'")
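The upsert loop flushes a batch every `batch_size` points and then stores whatever is left. The hypothetical helper `batch_sizes` below reproduces that grouping without Qdrant, which makes the log lines easy to check: 134 chunks give one batch of 100 plus 34 remaining, and 809 chunks give eight batches of 100 plus 9. The second helper, `deterministic_point_id`, is an assumption of ours rather than part of the notebook: a name-based `uuid5` ID makes re-ingestion overwrite existing points instead of duplicating them, because Qdrant's upsert replaces a point that has the same ID.

```python
import uuid

def batch_sizes(n_items, batch_size=100):
    # Mirrors the upsert loop: flush whenever the buffer reaches batch_size, then flush the remainder
    sizes, pending = [], 0
    for _ in range(n_items):
        pending += 1
        if pending >= batch_size:
            sizes.append(pending)
            pending = 0
    if pending:
        sizes.append(pending)
    return sizes

def deterministic_point_id(collection_name, text):
    # Hypothetical alternative to uuid4: the same collection and chunk text always map to the same UUID.
    # Caveat: identical chunks in one collection would share an ID, so only one of them would be kept.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{collection_name}:{text}"))

print(batch_sizes(134))  # [100, 34]
print(batch_sizes(3))    # [3]
```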



## Chunk and store the PDF data sources

In [464]:
# https://bjpcjp.github.io/pdfs/devops/linux-commands-handbook.pdf
pdf_file = '~/Downloads/linux-commands-handbook.pdf'
pdf2rag_store(pdf_file, "linux-commands-handbook")

Collection 'linux-commands-handbook' does not exist. Creating...
Collection 'linux-commands-handbook' created.
Stored 100 points in collection 'linux-commands-handbook'
Stored 34 remaining points in collection 'linux-commands-handbook'


In [465]:
# https://www.polygwalior.ac.in/file/20181115101103600592.pdf
pdf_file = '~/Downloads/dos_commands.pdf'
pdf2rag_store(pdf_file, "dos_commands")

Collection 'dos_commands' does not exist. Creating...
Collection 'dos_commands' created.
Stored 3 remaining points in collection 'dos_commands'





In [343]:
# http://ufdcimages.uflib.ufl.edu/AA/00/01/16/99/00001/WorldHistory.pdf
pdf_file = '~/Downloads/WorldHistory.pdf'
pdf2rag_store(pdf_file, "world-history")

Collection 'world-history' does not exist. Creating...
Collection 'world-history' created.
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 100 points in collection 'world-history'
Stored 9 remaining points in collection 'world-history'





In [344]:
# https://www.uhd.edu/documents/provost/us-history.pdf
pdf_file = '~/Downloads/us-history.pdf'
pdf2rag_store(pdf_file, "us-history")

Collection 'us-history' does not exist. Creating...
Collection 'us-history' created.
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 100 points in collection 'us-history'
Stored 73 remaining points in collection 'us-history'


In [256]:
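def search(collection_name, query_text, top_k=5):
    # Embed the query and return the top_k nearest points (with payloads) from the collection.
    # client.search() is deprecated in recent qdrant-client releases; query_points() replaces it.
    response = client.query_points(
        collection_name=collection_name,
        query=embedding_model.embed(query_text),
        limit=top_k,
        with_payload=True,
    )
    return response.points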
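Conceptually, a cosine search over one collection ranks stored vectors by their cosine similarity to the query vector. The brute-force sketch below is only an illustration under stated assumptions: Qdrant's documentation describes cosine distance as a dot product over vectors it normalizes at upload time, and real searches use an approximate HNSW index instead of scanning every point. `brute_force_search` and the toy vectors are hypothetical.

```python
import heapq
import math

def normalize(vec):
    # Scale a vector to unit length
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def brute_force_search(query_vec, points, top_k=5):
    # points maps a point ID to its vector; store unit vectors, as a COSINE collection does
    stored = {pid: normalize(vec) for pid, vec in points.items()}
    q = normalize(query_vec)
    # For unit vectors, the dot product equals the cosine similarity
    scored = ((sum(a * b for a, b in zip(q, vec)), pid) for pid, vec in stored.items())
    return [(pid, score) for score, pid in heapq.nlargest(top_k, scored)]

points = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
print(brute_force_search([1.0, 2.0], points, top_k=2))  # 'a' ranks first, then 'c'
```

Normalizing once at insert time means each query only needs dot products, which is why cosine and dot-product search cost the same.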

In [478]:
from duckduckgo_search import DDGS

def web_search(query, top_k=10):
    results = []
    try:
        with DDGS() as ddgs:
            search_results = ddgs.text(query, max_results=top_k)
            results = [
                f">>>>SOURCE WEB<<<: {r['title']} - {r['href']}\n{r['body']}\n\n"
                for r in search_results
            ]
    except Exception as e:
        print(f"Web search failed: {e}")
    return results

In [482]:
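def rag(query, top_k=40, rerank_top_k=10):
    # Stage 1a: retrieve up to top_k chunks from every Qdrant collection, tagged with their source
    collections = client.get_collections()
    text_list = [
        f">>>>SOURCE QDRANT/{c.name}<<<: {hit.payload['text']}\n\n"
        for c in collections.collections
        for hit in search(c.name, query, top_k=top_k)
    ]
    # Stage 1b: add up to top_k web search results
    web_results = web_search(query, top_k=top_k)
    print(f'Web results: {len(web_results)}')

    # Combine all candidates
    text_list.extend(web_results)
    print(f'Total text_list: {len(text_list)}')

    # Stage 2: score every (query, candidate) pair with the cross-encoder and keep the best rerank_top_k.
    # Cross-encoders typically truncate inputs to the model's maximum sequence length,
    # so only the beginning of a long (10,000-character) chunk influences its score.
    rerank_results = rank_model.rank(query, text_list, return_documents=True, top_k=rerank_top_k)
    concatenated_text = "\n".join(r['text'] for r in rerank_results)
    print(f'context: {len(rerank_results)}')
    prompt = f"""
    {query}

    Provide a clear and well-structured answer using **Markdown formatting**.
    Only use the provided context.

    ### Context:
    {concatenated_text}
    """

    # print(prompt)
    # Return the LLM response, the context it saw, and all candidates before reranking
    return model.respond(prompt), concatenated_text, text_list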
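With `top_k=40`, the candidate list holds up to 40 hits per collection plus up to 40 web results; `dos_commands` stores only 3 points, which is why the runs below report 3 + 3 × 40 + 40 = 163 candidates. `CrossEncoder.rank` scores every (query, candidate) pair, sorts by score, and with `return_documents=True` returns dicts holding `corpus_id`, `score`, and `text`. The stand-in below mimics that output shape with a hypothetical `toy_score` function so the context-assembly step can be followed without downloading a model; it is not the sentence-transformers implementation.

```python
def rank_sketch(query, documents, score_fn, top_k=None):
    # Score each (query, document) pair; a real cross-encoder would run the model here
    results = [
        {"corpus_id": i, "score": score_fn(query, doc), "text": doc}
        for i, doc in enumerate(documents)
    ]
    # Highest score first, then keep the top_k best
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k] if top_k is not None else results

def toy_score(query, doc):
    # Hypothetical relevance score: fraction of query words that occur in the document
    words = query.lower().split()
    return sum(w in doc.lower() for w in words) / len(words)

docs = [
    ">>>>SOURCE WEB<<<: umask without arguments displays the current mask",
    ">>>>SOURCE QDRANT/world-history<<<: table of contents",
    ">>>>SOURCE QDRANT/linux-commands-handbook<<<: umask sets the file mode creation mask",
]
top = rank_sketch("umask without arguments", docs, toy_score, top_k=2)
context = "\n".join(r["text"] for r in top)  # same assembly step as in rag()
print([r["corpus_id"] for r in top])  # [0, 2]
```

Because every candidate carries its source tag, the reranker can mix Qdrant and web results freely, and the tags stay visible in the final context.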


In [483]:
from IPython.display import display, Markdown

def rag_formatted(query):
    res, ctx, text_list = rag(query)
    display(Markdown(res.content))
    return ctx, text_list


In [498]:
ctx, text_list = rag_formatted("When was Hunagy founded?")



Web results: 40
Total text_list: 163
context: 10


Hungary was founded in 896.

The Kingdom of Hungary was formed in the 10th century under King Stephen I, also known as Saint Stephen, who accepted Catholicism as its official religion and established a feudal system.

In the 14th century, Charles Robert of Anjou was brought in to rule Hungary and introduced French and Italian ideas, leading to industrial immigration from Germany, Flanders, and Italy. The country became a Western state with a systematic fiscal policy based on gold production.

Louis I, also known as Louis the Great, expanded Hungarian territory and assumed the Polish throne in 1370. He allied with Genoa and had a long struggle with Venice, which ended in the Peace of 1381.

After Louis' death, his daughter Maria became queen, but her marriage to Sigismund of Luxemburg was soon challenged by Charles of Durazzo, who became king but was assassinated in 1385. This led to a Croatian revolt and Sigismund's return to power.

However, Sigismund was often absent from the country and there was a decline in royal power. In 1396, Hungary suffered a disastrous loss in battle with the Turks, Dalmatia was taken by Venice again, and the Hussite invasions from Bohemia led to further instability.

In the 19th century, Hungary's chief demand was for wider use of the Magyar language in administration, courts, and education. Lexicographers re-fashioned and enriched the native tongue, leading to a cult of language that also emphasized national costumes, dances, and other aspects of Hungarian culture.

Count Istvan Szechenyi was a key figure in promoting Hungarian nationalism and reform. He advocated for Magyarism and reform under the blessings of the emperor, but his contemporary and eventual opponent, Lajos Kossuth, wanted complete liberty from Austria.

In 1840, the government passed a law making Magyar the official language of all institutions in Great Hungary, giving the Croats six years to conform. The reaction was strong among non-Magyars, including Saxons, Slavs, and Romanians in Transylvania.

The Revolution of 1848 led to Kossuth's reform of Hungary as a limited monarchy subject to the Austrian monarch but almost free internally. However, the Serbs and Croats rebelled against Hungary under Jellacic, who initially had Austrian help but later lost it as further revolt occurred in Austria.

By 1867, Hungary recovered her integrity under Deak through the Ausgleich (compromise) of 1867, which gave Hungarians equal status with the German-speaking population. This began the "Age of Dualism" and marked the beginning of "Austria-Hungary."

Magyarization followed, with changes in individual names and places to conform with the Magyar language.

By 1900, Hungary had over a million workers in mining and industry, two universities, and several colleges of law, theology, and other disciplines. The country was led by figures such as Lorand Eotvds, a physicist, and Semmelweiss, a physician who worked in Vienna.

In [499]:
# text_list

In [500]:
ctx

'>>>>SOURCE WEB<<<: When was Hungary founded? - Answers - https://www.answers.com/travel-destinations/When_was_Hungary_founded\nHungary was founded in 896.\n\n\n>>>>SOURCE WEB<<<: History of Hungary | Embassy of Hungary Washington - gov.hu - https://washington.mfa.gov.hu/eng/page/history-of-hungary\nHUNGARY\'S HISTORY IN A NUTSHELL. This nation has more than a thousand years of history, full of great events, kings, battles, allies, enemies, intrigue and sometimes, peaceful years. ... In 1000, King Stephen I (St. Stephen) founded the state of Hungary, and accepted the Catholic religion as its standard. Stephen was crowned with the Holy Crown ...\n\n\n>>>>SOURCE WEB<<<: Hungary - New World Encyclopedia - https://www.newworldencyclopedia.org/entry/Hungary\nWestern Hungary (Pannonia) was initially tributary to the Franks, but in 839 the Slavic Balaton Principality was founded in southwestern Hungary, and in 883/884 the whole of western Hungary was conquered by Great Moravia. Origin of the 

In [501]:
ctx, pre_ctx = rag_formatted("What does umask command do with no arguments?")



Web results: 40
Total text_list: 163
context: 10


When executed with no arguments, the `umask` command displays the current file mode creation mask of the shell execution environment in octal form. The output will include the default permissions that will be applied to newly created files and directories. This value can be modified by specifying a new permission mask as an argument to the `umask` command.
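The mask is applied bitwise: most programs request mode `0o666` for new files and `0o777` for new directories, and every bit set in the mask is cleared. With the `0022` value shown in the retrieved context, that gives `0644` for files and `0755` for directories. The small sketch below shows that arithmetic; the helper names are hypothetical. Python's `os.umask` sets a new mask and returns the previous one, so reading the mask means setting it and restoring it immediately.

```python
import os

def apply_umask(base_mode, mask):
    # Each bit set in the mask is removed from the requested mode
    return base_mode & ~mask

def current_umask():
    # os.umask() sets a new mask and returns the old one, so restore it right away
    old = os.umask(0)
    os.umask(old)
    return old

print(oct(apply_umask(0o666, 0o022)))  # 0o644
print(oct(apply_umask(0o777, 0o022)))  # 0o755
print(f"{current_umask():04o}")  # depends on the environment, commonly 0022
```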

In [502]:
ctx

'>>>>SOURCE WEB<<<: What is Umask in Linux and how to use it effectively? - https://www.rosehosting.com/blog/what-is-umask-in-linux/\nThe bits in the umask command can be changed by invoking the umask command. The syntax of the umask command is the following one: umask [OPTION]... [MODE] Executing this command without arguments or options will return the current value. Let\'s implement it: umask. You should get output with bits like this: root@host:~# umask 0022\n\n\n>>>>SOURCE WEB<<<: umask Cheat Sheet - umask Command Line Guide - https://www.commandinline.com/cheat-sheet/umask/\nThe umask command sets a mask that restricts these default permissions. Basic Syntax: umask [MASK] [MASK]: The permission mask to apply (as an octal value). Without any arguments, umask displays the current mask. How umask Works. Permissions for files: Files cannot have execute permissions by default.\n\n\n>>>>SOURCE WEB<<<: What is umask command for? - Unix & Linux Stack Exchange - https://unix.stackexchange

In [490]:
pre_ctx

['>>>>SOURCE QDRANT/world-history<<<: vii\n18.7 The Far East: A.D. 301 to 400 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278\n18.8 The Paciﬁc: A.D. 301 to 400 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . 279\n18.9 America: A.D. 301 to 400 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281\nSolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .??\n19 A.D. 401 to 500\n19.1 A.D. 401 to 500 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283\n19.2 Africa: A.D. 401 to 500 . . . . . . . . . . . . . . . . . . .