# 🔄 RAG Re-Ranking Techniques

**Improving Retrieval Quality with Re-Ranking Models**

This notebook demonstrates how re-ranking models can improve the quality of retrieved documents by scoring and re-ordering them based on semantic relevance.

## 🎯 What You'll Learn

1. **Re-Ranking Fundamentals**: Understanding how re-ranking improves retrieval
2. **Jina Re-Ranker**: Using multilingual re-ranking models
3. **Scoring Documents**: Computing relevance scores for query-document pairs
4. **Top-N Selection**: Selecting the most relevant documents after re-ranking

---

## 📊 Project Overview

**Goal:** Implement a re-ranking system to improve document retrieval quality.

**Key Concepts:**
- 🎯 **Initial Retrieval**: First-pass document retrieval
- 📊 **Re-Ranking**: Second-pass relevance scoring
- 🏆 **Top-N Selection**: Selecting the best documents
- ⚡ **Performance**: Balancing quality and speed
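
The concepts above can be sketched as a two-stage pipeline. This is a minimal illustration, not the notebook's method: `first_pass`, `rerank`, and `toy_score` are hypothetical names, and the word-overlap scorer is only a stand-in for a neural re-ranker such as the one loaded in Section 1.

```python
from typing import Callable


def first_pass(query: str, docs: list[str], k: int) -> list[int]:
    """Cheap candidate retrieval: rank documents by number of shared words."""
    q_terms = set(query.lower().split())
    overlap = [(len(q_terms & set(d.lower().split())), i) for i, d in enumerate(docs)]
    # Highest overlap first; ties keep the original document order.
    overlap.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in overlap[:k]]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a re-ranker: Jaccard similarity of word sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)


def rerank(
    query: str,
    docs: list[str],
    candidates: list[int],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> list[dict]:
    """Second pass: score each candidate against the query and keep the top_n."""
    scored = [
        {"document": docs[i], "relevance_score": score_fn(query, docs[i]), "index": i}
        for i in candidates
    ]
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return scored[:top_n]


query = "Best methods for brewing coffee at home"
docs = [
    "The French press is a simple and popular method for brewing coffee at home.",
    "Modern smartphones now feature advanced AI-powered cameras.",
    "Cold brew coffee is made by steeping coarse coffee grounds in cold water for several hours.",
    "Proper home lighting can improve mood and productivity.",
]

candidates = first_pass(query, docs, k=3)  # initial retrieval
top = rerank(query, docs, candidates, toy_score, top_n=2)  # re-ranking + top-N
for r in top:
    print(r["index"], round(r["relevance_score"], 3), r["document"])
```

The first pass is deliberately cheap so that the more expensive scorer only runs on `k` candidates. In practice, that is where the trade-off between quality and speed is tuned.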

---

## ⚙️ Section 1: Setup and Model Initialization


In [1]:
import os
import warnings
import torch
# Silence library warnings before transformers is imported.
warnings.simplefilter('ignore')
from transformers import AutoModelForSequenceClassification

In [2]:
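# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = AutoModelForSequenceClassification.from_pretrained(
    'jinaai/jina-reranker-v2-base-multilingual',
    torch_dtype="auto",  # newer transformers releases expect `dtype` instead
    trust_remote_code=True,  # the model ships custom modeling code on the Hub
)

model.to(device)  # move the weights to the selected device
model.eval()  # inference mode: disables dropout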

A new version of the following files was downloaded from https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual:
- configuration_xlm_roberta.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`torch_dtype` is deprecated! Use `dtype` instead!

XLMRobertaForSequenceClassification(
  (roberta): XLMRobertaModel(
    (embeddings): XLMRobertaEmbeddings(
      (word_embeddings): Embedding(250002, 768, padding_idx=1)
      (position_embeddings): Embedding(1026, 768)
      (token_type_embeddings): Embedding(1, 768)
    )
    (emb_drop): Dropout(p=0.1, inplace=False)
    (emb_ln): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (encoder): XLMRobertaEncoder(
      (layers): ModuleList(
        (0-11): 12 x Block(
          (mixer): MHA(
            (Wqkv): LinearResidual(in_features=768, out_features=2304, bias=True)
            (inner_attn): SelfAttention(
              (drop): Dropout(p=0.1, inplace=False)
            )
            (inner_cross_attn): CrossAttention(
              (drop): Dropout(p=0.1, inplace=False)
            )
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (dropout1): Dropout(p=0.1, inplace=False)
          (drop_path1): StochasticDepth(p=0.0, mode=r

In [3]:
query = "Best methods for brewing coffee at home"

documents = [
    # Highly relevant
    "Pour-over coffee allows precise control over brewing time and water temperature.",
    "Cold brew coffee is made by steeping coarse coffee grounds in cold water for several hours.",
    "The French press is a simple and popular method for brewing coffee at home.",
    "Espresso machines can produce rich, concentrated coffee, but require more maintenance.",
    "Using freshly ground beans can greatly improve coffee flavor and aroma.",
    # Somewhat relevant
    "Drip coffee makers are ideal for making multiple cups quickly in the morning.",
    "Coffee grinders come in blade and burr styles, which affect grind consistency.",
    # Unrelated (for reranker contrast)
    "Regular exercise and a balanced diet contribute to better overall health.",
    "Modern smartphones now feature advanced AI-powered cameras.",
    "Electric cars are becoming more common as battery technology improves.",
    "Learning a musical instrument can boost creativity and cognitive skills.",
    "Proper home lighting can improve mood and productivity.",
]

In [4]:
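# Score every document against the query and keep the three best matches.
result = model.rerank(
    query,
    documents,
    max_query_length=512,  # token budget for the query
    max_length=1024,  # token budget for each query-document pair
    top_n=3  # number of results to return
)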


In [5]:
result

[{'document': 'The French press is a simple and popular method for brewing coffee at home.',
  'relevance_score': 0.7662936449050903,
  'index': np.int64(2)},
 {'document': 'Cold brew coffee is made by steeping coarse coffee grounds in cold water for several hours.',
  'relevance_score': 0.34775859117507935,
  'index': np.int64(1)},
 {'document': 'Using freshly ground beans can greatly improve coffee flavor and aroma.',
  'relevance_score': 0.30569732189178467,
  'index': np.int64(4)}]
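
One way to sanity-check and post-process this output is sketched below. It makes three assumptions. The relevance labels come from the comments in the `documents` list (indices 0 to 4 are "highly relevant"). The 0.5 score threshold is arbitrary. `precision_at_n` and `filter_by_score` are hypothetical helper names, not part of the model's API. The results are copied from the output above, with indices written as plain `int` values so that the example runs without NumPy.

```python
# Results copied from the rerank output above (np.int64 indices written as int).
result = [
    {"document": "The French press is a simple and popular method for brewing coffee at home.",
     "relevance_score": 0.7662936449050903, "index": 2},
    {"document": "Cold brew coffee is made by steeping coarse coffee grounds in cold water for several hours.",
     "relevance_score": 0.34775859117507935, "index": 1},
    {"document": "Using freshly ground beans can greatly improve coffee flavor and aroma.",
     "relevance_score": 0.30569732189178467, "index": 4},
]

# Assumption: documents 0-4 are the ones commented as "Highly relevant".
HIGHLY_RELEVANT = set(range(5))


def precision_at_n(results: list[dict], relevant: set[int]) -> float:
    """Fraction of returned documents whose index is in the relevant set."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if int(r["index"]) in relevant)
    return hits / len(results)


def filter_by_score(results: list[dict], min_score: float) -> list[dict]:
    """Keep only results at or above a score threshold (order is preserved)."""
    return [r for r in results if r["relevance_score"] >= min_score]


precision = precision_at_n(result, HIGHLY_RELEVANT)
confident = filter_by_score(result, min_score=0.5)  # 0.5 is an arbitrary cut-off

print(f"precision@{len(result)} = {precision:.2f}")
print("above threshold:", [r["index"] for r in confident])
```

All three returned documents come from the "highly relevant" group, but only the French press document clears the 0.5 threshold. A fixed cut-off should therefore be tuned on real data rather than assumed.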