### Reranking
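Rerankers are models that score how relevant each retrieved document is to a query, so that the candidates returned by a first-stage retriever can be reordered.
Pointwise, pairwise, and listwise are different strategies for training these models to produce good rankings.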

**Techniques**
- Pointwise reranking
- Pairwise reranking
- Listwise reranking
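
The three strategies differ mainly in what one training example looks like: a single document, a pair of documents, or the whole candidate list. The sketch below is only illustrative. The function names, the scores, and the specific loss choices (binary cross-entropy, a RankNet-style logistic loss, a ListNet-style softmax cross-entropy) are assumptions picked as common representatives, not something prescribed by the references in this section.

```python
import math

def pointwise_loss(score, label):
    """Binary cross-entropy for one (query, document) pair; label is 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def pairwise_loss(score_pos, score_neg):
    """RankNet-style logistic loss: small when the relevant document outscores the irrelevant one."""
    return math.log(1.0 + math.exp(-(score_pos - score_neg)))

def listwise_loss(scores, labels):
    """ListNet-style cross-entropy between softmax(labels) and softmax(scores)."""
    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

scores = [2.0, 0.5, -1.0]   # hypothetical model scores for three candidates
labels = [1.0, 0.0, 0.0]    # hypothetical relevance labels (first candidate is relevant)

print("pointwise:", pointwise_loss(scores[0], 1))
print("pairwise: ", pairwise_loss(scores[0], scores[1]))
print("listwise: ", listwise_loss(scores, labels))
```

In all three cases the loss shrinks as the relevant document's score rises relative to the others. What changes is how much of the candidate list a single loss term sees.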

**Rerankers**
- Cross encoder ([BGE](https://github.com/FlagOpen/FlagEmbedding))
- ColBERT
- LLM
  - Format the documents and query into a QA pair, then apply a logits processor from [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo)
- ...
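
A cross encoder reads the query and document together and emits one score. ColBERT instead embeds query tokens and document tokens separately and combines them with a late-interaction "MaxSim" step. A minimal sketch of that scoring step, assuming tiny made-up 2-d token embeddings (a real model produces them with a BERT-style encoder); `maxsim_score` is a hypothetical helper name:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    """For each query token, take its best-matching document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Hypothetical token embeddings: one row per token
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # covers both query tokens
doc_b = [[0.9, 0.1], [0.8, 0.2]]               # covers only the first query token

print("doc_a:", maxsim_score(query, doc_a))
print("doc_b:", maxsim_score(query, doc_b))
```

Because document token embeddings do not depend on the query, they can be computed ahead of time, which is the main practical difference from a cross encoder.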

![rerankers](img/rerankers.png)

**Reference:**
- https://towardsdatascience.com/ranking-basics-pointwise-pairwise-listwise-cd5318f86e1b/
- https://medium.com/@adnanmasood/re-ranking-mechanisms-in-retrieval-augmented-generation-pipelines-an-overview-8e24303ee789
- https://galileo.ai/blog/mastering-rag-how-to-select-a-reranking-model

**Rerankers and embedding models can both be trained or fine-tuned on a specific dataset to achieve better results.** BGE is the most commonly used.

- Reranker training: https://huggingface.co/blog/train-reranker
- Embedding model training: https://huggingface.co/blog/train-sentence-transformers

### Kaggle competition reference
- https://www.kaggle.com/competitions/eedi-mining-misconceptions-in-mathematics
- [First-place solution](https://www.kaggle.com/competitions/eedi-mining-misconceptions-in-mathematics/discussion/551688)

In [3]:
import os
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

import torch
import numpy as np

In [4]:
os.environ['SENTENCE_TRANSFORMERS_HOME'] = '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/huggingface-models'
cache_dir = '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/huggingface-models'

In [5]:
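# Left-pad so that position -1 is the last real token of every sequence in a batch;
# the reranker below reads its score from logits[:, -1, :]
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side="left", cache_dir=cache_dir)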
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B", cache_dir=cache_dir)

In [6]:
# Document loading and chunking
def load_and_split_documents(file_path):
    """Load PDF document and split into chunks"""
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    
    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    return chunks

chunks = load_and_split_documents("/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/data/boardgames_rulebooks/DnD_LOW_Rulebook_EN.pdf")

In [7]:
# Initialize BGE embeddings model
embeddings = HuggingFaceEmbeddings(
    model_name='BAAI/bge-small-en-v1.5', 
    model_kwargs={'device': 'cpu'},
    show_progress=True
)

In [8]:
# Create vector store from documents
db = FAISS.load_local('/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/rag-optimizations/faiss_rerank', embeddings, allow_dangerous_deserialization=True)
# db = FAISS.from_documents(chunks, embeddings)
# db.save_local('/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/rag-optimizations/faiss_rerank')

In [9]:
# Implement pointwise reranking with Qwen3-Reranker
class Qwen3Reranker:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        
        # Set up tokens for scoring
        self.token_false_id = tokenizer.convert_tokens_to_ids("no")
        self.token_true_id = tokenizer.convert_tokens_to_ids("yes")
        
        # Set up prefix and suffix tokens for the reranker
        self.prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
        self.suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
        self.prefix_tokens = tokenizer.encode(self.prefix, add_special_tokens=False)
        self.suffix_tokens = tokenizer.encode(self.suffix, add_special_tokens=False)
        self.max_length = 8192  # Maximum sequence length
        
        # Default instruction for reranking
        self.default_instruction = 'Given a web search query, retrieve relevant passages that answer the query'
        
    def format_instruction(self, instruction, query, doc_text):
        """Format the input for the reranker with instruction, query and document"""
        if instruction is None:
            instruction = self.default_instruction
        
        output = f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc_text}"
        return output
        
    def process_inputs(self, pairs):
        """Process and tokenize inputs for the reranker"""
        inputs = self.tokenizer(
            pairs, padding=False, truncation='longest_first',
            return_attention_mask=False, 
            max_length=self.max_length - len(self.prefix_tokens) - len(self.suffix_tokens)
        )
        
        # Add prefix and suffix tokens to each input
        for i, ele in enumerate(inputs['input_ids']):
            inputs['input_ids'][i] = self.prefix_tokens + ele + self.suffix_tokens
            
        # Pad the inputs
        inputs = self.tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=self.max_length)
        
        # Move to device
        for key in inputs:
            inputs[key] = inputs[key].to(self.device)
            
        return inputs
    
    def compute_logits(self, inputs):
        """Compute logits and extract scores"""
        with torch.no_grad():
            # Get the raw logits from the model's final token position
            # [:, -1, :] means: all batch items, last token position, all vocabulary tokens
            batch_scores = self.model(**inputs).logits[:, -1, :]
            true_vector = batch_scores[:, self.token_true_id]
            false_vector = batch_scores[:, self.token_false_id]
            batch_scores = torch.stack([false_vector, true_vector], dim=1)
            batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
            # Extract the exponentiated log probability for the "yes" class (index 1)
            scores = batch_scores[:, 1].exp().tolist()  # Probability of "yes"
            
        return scores
        
    def rerank(self, query, documents, top_k=5, instruction=None):
        """
        Rerank documents based on relevance to query
        
        Args:
            query: User query string
            documents: List of document objects with page_content attribute
            top_k: Number of top documents to return
            instruction: Custom instruction for the reranker
            
        Returns:
            List of reranked documents (top_k)
        """
        if not documents:
            return []
        
        # Format inputs for reranker
        pairs = [self.format_instruction(instruction, query, doc.page_content) for doc in documents]
        
        # Process and tokenize inputs
        inputs = self.process_inputs(pairs)
        
        # Get reranker scores
        scores = self.compute_logits(inputs)
        
        # Sort documents by score
        reranked_indices = np.argsort(scores)[::-1][:top_k].tolist()
        
        # Return reranked documents
        return [documents[i] for i in reranked_indices]

# Initialize reranker
reranker = Qwen3Reranker(model, tokenizer)
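
The scoring step in `compute_logits` boils down to a two-way softmax over the "yes" and "no" logits, which is equivalent to a sigmoid of their difference. The standalone sketch below shows this with made-up logit values. The document names and numbers are hypothetical and the model itself is not called.

```python
import math

def yes_probability(logit_yes, logit_no):
    """Two-way softmax over the "yes"/"no" logits, computed in a numerically stable way."""
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical (yes, no) logits for three candidate documents
candidates = {"doc_a": (4.2, 1.0), "doc_b": (0.3, 0.3), "doc_c": (-1.5, 2.5)}

scores = {name: yes_probability(y, n) for name, (y, n) in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    y, n = candidates[name]
    # The softmax score and sigmoid(yes - no) agree up to floating-point error
    print(name, round(scores[name], 4), round(sigmoid(y - n), 4))
```

Only the gap between the two logits matters, so a document whose "yes" and "no" logits are equal always scores 0.5 regardless of their absolute size.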

In [10]:
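# Initialize LLM for response generation
# Note: this rebinds `model`; the reranker created above keeps its own reference to the Qwen3 reranker model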
model = ChatOpenAI(model='/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/huggingface-models/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775', base_url='http://0.0.0.0:8000/v1', api_key='n')

In [11]:
prompt_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Keep the answer as concise as possible.

Context: {context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

In [12]:
query = "How to Win in DnD?"

In [13]:
vector_docs = db.similarity_search(query, k=10)
reranked_docs = reranker.rerank(query, vector_docs, top_k=5)

You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


In [14]:
context = "\n\n".join([doc.page_content for doc in reranked_docs])

rag_chain = (prompt | model | StrOutputParser())
# Pass the joined text, not the list of Document objects
res = rag_chain.invoke({"context": context, "question": query})
print(res)

To win in Dungeons & Dragons, you'll need to follow these steps:

1. Spend some time setting up the game.
2. Before you can start playing, you'll need to spend a bit of time setting up the game board.
3. After you've set everything up, you'll need to spend a bit of time making sure you have enough resources to play.
4. Once you have the right supplies, you can begin the game.
5. During the game, focus on getting rid of opponents' cards and trying to get them to reveal their identities.
6. Keep track of your victories and defeats, and use them to decide who gets to play next.
7. If you're still tied after several rounds, you may need to consider changing roles or finding other ways to win.
8. Remember, winning doesn't mean being the strongest; it means having a good strategy and staying focused on the game.


In [15]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

In [16]:
pretty_print_docs(reranked_docs)

Document 1:

Keep this card face down—your identity is a secret to your  
opponents.
Return the remaining Lord cards to the box, face down. They 
won’t be used for the rest of the game.
Whenever a player must draw a card from an empty deck, shuffle 
all the cards in the respective discard pile to form a new deck and 
place them face down in the appropriate space of the game board. 
Do not shuffle any completed Quest cards back into the Quest deck.
QuesT CaRds
As a Lord of Waterdeep, you advance your interests by completing 
Quests, represented by Quest cards. See “Complete Quest” on 
page 10 for more about acquiring and completing Quests.
Shuffle the Quest cards and deal 2 cards face up to each player. 
These cards form each player’s active Quests.
Next, place 1 face-up Quest card in each of the 4 spaces of  
Cliffwatch Inn. The rest of the face-down Quest cards form the 
Quest deck. Place the deck in the labeled space on the game 
board near Cliffwatch Inn.
---------------------------

In [17]:
# evaluate and compare the results
def compare_retrieval_methods(vectorstore, reranker, query, top_k=5):
    """Compare standard retrieval vs. retrieval with reranking"""
    # Standard retrieval
    standard_docs = vectorstore.similarity_search(query, k=top_k)
    
    # Retrieval with reranking (get more docs initially, then rerank)
    initial_docs = vectorstore.similarity_search(query, k=top_k*2)
    reranked_docs = reranker.rerank(query, initial_docs, top_k=top_k)
    
    # Display results
    print(f"Query: {query}\n")
    
    print("="*80)
    print("STANDARD RETRIEVAL RESULTS:")
    print("="*80)
    for i, doc in enumerate(standard_docs):
        print(f"\nDocument {i+1}:")
        print(f"Page: {doc.metadata.get('page', 'N/A')}")
        print(f"Text: {doc.page_content[:200]}...")
    
    print("\n\n" + "="*80)
    print("RERANKED RESULTS:")
    print("="*80)
    for i, doc in enumerate(reranked_docs):
        print(f"\nDocument {i+1}:")
        print(f"Page: {doc.metadata.get('page', 'N/A')}")
        print(f"Text: {doc.page_content[:200]}...")
    
    # Calculate overlap by chunk text; page numbers are not unique because several chunks share a page
    standard_ids = {doc.page_content for doc in standard_docs}
    reranked_ids = {doc.page_content for doc in reranked_docs}
    overlap = standard_ids & reranked_ids
    
    print("\n\n" + "="*80)
    print(f"Overlap between methods: {len(overlap)}/{top_k} documents")
    print("="*80)
    
    return {
        "standard_docs": standard_docs,
        "reranked_docs": reranked_docs
    }
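
Set overlap alone does not show how far reranking moved each chunk. One possible complement, sketched here with hypothetical chunk ids rather than the actual retrieval results, is to record each chunk's position before and after reranking. `rank_changes` is an illustrative helper, not part of LangChain.

```python
def rank_changes(initial_ids, reranked_ids):
    """Map each reranked id to (old_position, new_position), both 1-based."""
    old_pos = {doc_id: i + 1 for i, doc_id in enumerate(initial_ids)}
    return {doc_id: (old_pos[doc_id], i + 1) for i, doc_id in enumerate(reranked_ids)}

initial = ["c1", "c2", "c3", "c4", "c5", "c6"]   # hypothetical ids in vector-search order
reranked = ["c4", "c1", "c6"]                     # hypothetical top-3 after reranking

changes = rank_changes(initial, reranked)
for doc_id, (old, new) in changes.items():
    print(f"{doc_id}: {old} -> {new}")
```

With real results, any stable identifier would work as the id, for example the chunk text or a (page, start offset) pair.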

In [18]:
compare_retrieval_methods(db, reranker, query, top_k=5)



Query: How to Win in DnD?

STANDARD RETRIEVAL RESULTS:

Document 1:
Page: 12
Text: seTup
Lay out the game board.
Each player chooses a color and takes a number of that 
color’s Agents that depends on the number of players.
 N umber of Players  A gents per Player
 
2
 4
 
3
 3
 
4
 2...

Document 2:
Page: 1
Text: 2 3
Table of ConTenTs
Introduction 3
Ho
w to Win
 4
Setup
  4
 
 Game Boar
d
 4
 
 
Agents
 4
 
 Sc
ore Markers
 5
 
 Pla
yer Mats
 5
 
 
Adventurers
 6
 
 Other Piec
es
 6
 
 
Buildings
 6
 
 
Cards
...

Document 3:
Page: 0
Text: Rulebook
BOARD GAME TM
AGE 12+...

Document 4:
Page: 2
Text: Knights of the Shield City Guard
Silverstars Harpers Red Sashes
Ambassador Lieutenant
Victory Points Back
Silverstars
100 Victory Points
Victory Points Back
Harpers
Victory Points Back
Knights of the ...

Document 5:
Page: 12
Text: Play proceeds to the next player until all Agents have been 
assigned.
Reassign Agent: After all Agents have been assigned in  
the round, each player with an Age

{'standard_docs': [Document(id='2e719027-8977-4b99-91e6-fe920fafa086', metadata={'producer': 'Adobe PDF Library 9.9', 'creator': 'Adobe InDesign CS5.5 (7.5.2)', 'creationdate': '2012-02-20T09:42:23-08:00', 'moddate': '2012-02-20T11:43:06-08:00', 'trapped': '/False', 'source': '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/data/boardgames_rulebooks/DnD_LOW_Rulebook_EN.pdf', 'total_pages': 13, 'page': 12, 'page_label': '13'}, page_content='seTup\nLay out the game board.\nEach player chooses a color and takes a number of that \ncolor’s Agents that depends on the number of players.\n N umber of Players  A gents per Player\n \n2\n 4\n \n3\n 3\n \n4\n 2\n \n5\n 2\nEach player also places 1 more Agent of his or her color near \nthe Round 5 space of the rounds track.\nEach player places his or her score marker on the scoring \ntrack at the position labeled “0.”\nPlace the Adventurer cubes and Gold within easy reach of \nall players.\nPlace the Building stack, Quest deck, and Intrigue