# Technique 4: Reranking

## The Problem
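Initial retrieval isn't perfect. The top-5 docs might not be in the best order, and the most relevant doc can sit just outside the top-5.

**Why?** Embedding models encode the query and each document separately so that search stays fast. The resulting similarity score is only an approximation of true relevance.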

## The Solution
**Cross-Encoder Reranking:**
1. Retrieve top-K docs (e.g., 20)
2. Use powerful cross-encoder to rerank
3. Take top-N after reranking (e.g., 5)

Slower but more accurate!

**Difficulty:** ⭐⭐⭐⭐☆

## 🎯 How Reranking Works: Two-Stage Retrieval

**Stage 1: Fast Retrieval (Bi-Encoder)**
- Retrieve MORE documents (e.g., top-10 or top-20)
- Uses fast vector similarity (embeddings computed separately)
- Goal: Cast a wide net - don't miss relevant docs

**Stage 2: Accurate Reranking (Cross-Encoder)**
- Take those candidates from Stage 1
- Use powerful cross-encoder to score each query-doc pair
- Keep only the BEST N (e.g., top-5)
- Goal: Optimize ranking for maximum relevance
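
The two stages above can be sketched in plain Python. This is a toy illustration only: `embed`, `pair_score`, `retrieve`, and `rerank` are hypothetical helpers, the bag-of-words vector stands in for a real embedding model, and the term-overlap scorer stands in for a real cross-encoder. The sample docs are made up for the demo.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly."""
    q_terms = query.lower().split()
    d_terms = set(doc.lower().split())
    coverage = sum(t in d_terms for t in q_terms) / len(q_terms)
    phrase_bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return coverage + phrase_bonus


def retrieve(query, docs, k):
    """Stage 1: rank every doc by vector similarity and keep the top k."""
    q_vec = embed(query)
    return sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:k]


def rerank(query, candidates, n):
    """Stage 2: rescore only the candidates as (query, doc) pairs and keep the top n."""
    return sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)[:n]


# Made-up mini corpus for the demo
docs = [
    "msme msme overview",
    "a long guide that eventually lists msme financing options for small business owners in detail",
    "tax registration process",
    "financing tips",
]
query = "msme financing options"

candidates = retrieve(query, docs, k=3)    # wide net
top_docs = rerank(query, candidates, n=2)  # precise ordering
print("Stage 1 top doc:", candidates[0])
print("Stage 2 top doc:", top_docs[0])
```

In this toy setup the short, keyword-heavy doc wins Stage 1, while the doc that actually contains the full query phrase wins Stage 2. The same pattern is what a real cross-encoder is meant to deliver, with a learned model in place of the overlap heuristic.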

### Example: Before vs After Reranking

**Before Reranking (Vector Similarity):**
```
Query: "MSME financing options"

Top-5 by cosine similarity:
1. [SMEDAN overview] ─────────── 0.82
2. [Financing options] ───────── 0.81  ← Should be #1!
3. [Registration process] ────── 0.79
4. [Tax benefits] ────────────── 0.77
5. [Loan requirements] ───────── 0.76  ← Should be #2!
```

**After Reranking (Cross-Encoder):**
```
Retrieved 10, reranked, kept top-5:

1. [Financing options] ───────── 0.94  ✅ NOW #1!
2. [Loan requirements] ───────── 0.89  ✅ NOW #2!
3. [Development Bank info] ───── 0.85
4. [SMEDAN financing] ────────── 0.82
5. [BOI programs] ────────────── 0.78
```

**Result:** The two most relevant docs move to #1 and #2, and three docs from outside the original top-5 are surfaced.
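
To make the movement in the example explicit, here is a small sketch that compares the two rankings. `rank_changes` is a hypothetical helper, not part of LangChain; the titles are copied from the example above.

```python
def rank_changes(before, after):
    """Map each title in `after` to (old_rank, new_rank).

    old_rank is None when the title was not in the `before` list.
    """
    old = {title: i for i, title in enumerate(before, start=1)}
    return {title: (old.get(title), i) for i, title in enumerate(after, start=1)}


before = [
    "SMEDAN overview",
    "Financing options",
    "Registration process",
    "Tax benefits",
    "Loan requirements",
]
after = [
    "Financing options",
    "Loan requirements",
    "Development Bank info",
    "SMEDAN financing",
    "BOI programs",
]

for title, (old_rank, new_rank) in rank_changes(before, after).items():
    note = "not in original top-5" if old_rank is None else f"was #{old_rank}"
    print(f"#{new_rank} {title} ({note})")
```

The same helper should work on real results if you pass it lists of identifying strings, for example the first 60 characters of each document's `page_content`.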

## Step 1: Imports

In [1]:
from utils_openai import setup_openai_api, create_embeddings, create_llm, load_msme_data, create_vectorstore, get_baseline_prompt, load_existing_vectorstore
from langchain_classic.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain_classic.retrievers import ContextualCompressionRetriever
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print('[OK] Imports done!')


[OK] Imports done!


## Step 2: Setup

In [2]:
api_key = setup_openai_api()
embeddings = create_embeddings(api_key)
llm = create_llm(api_key)
docs, metas, ids = load_msme_data('msme.csv')
vectorstore = create_vectorstore(docs, metas, ids, embeddings, 'msme_t7', './chroma_db_t7')
base_retriever = vectorstore.as_retriever(search_kwargs={'k': 10})  # Retrieve MORE for reranking
print('[OK] Base retriever ready (k=10)!')

[OK] Initialized embeddings: text-embedding-3-small
[OK] Initialized LLM: gpt-4o-mini (temp=0)
[OK] Loaded 14 documents from msme.csv
[OK] Created vector store: msme_t7 (14 docs)
[OK] Base retriever ready (k=10)!


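In [3]:
# Alternative to the cell above: on later runs, reuse the persisted vector store instead of rebuilding it
api_key = setup_openai_api()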
embeddings = create_embeddings(api_key)
llm = create_llm(api_key)
vectorstore = load_existing_vectorstore(embeddings, 'msme_t7', './chroma_db_t7')
base_retriever = vectorstore.as_retriever(search_kwargs={'k': 10})  # Retrieve MORE for reranking
print('[OK] Base retriever ready (k=10)!')

[OK] Initialized embeddings: text-embedding-3-small
[OK] Initialized LLM: gpt-4o-mini (temp=0)
[OK] Loaded existing vector store: msme_t7
[OK] Base retriever ready (k=10)!


## Step 3: Create Cross-Encoder Reranker

## 🔍 Bi-Encoder vs Cross-Encoder: The Key Difference

### Bi-Encoder (Used in Vector Search - Stage 1)
```
Query → Embed → Vector_Q  ─┐
                           ├→ cosine_similarity(Vector_Q, Vector_D)
Doc → Embed → Vector_D    ─┘
```
- **Process:** Query and document embedded **separately**
- **Comparison:** Simple cosine similarity between vectors
- **Speed:** FAST
- **Accuracy:** Good (but misses nuanced relevance)

### Cross-Encoder (Used in Reranking - Stage 2)
```
[Query + Doc] → Feed TOGETHER into model → Relevance Score
```
- **Process:** Query and document processed **together** as a pair
- **Comparison:** Model sees both at once, captures interaction
- **Speed:** SLOWER (must run model for each query-doc pair)
- **Accuracy:** Typically higher (captures fine-grained query-document relationships)

**Why Cross-Encoder is More Accurate:**
- Sees query and doc together (not in isolation)
- Can capture word interactions and context
- Optimized specifically for ranking tasks
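
The speed difference comes down to how many model forward passes each approach needs. The sketch below counts them under simple assumptions: one forward pass per embedded text for the bi-encoder, one per (query, doc) pair for the cross-encoder. Both function names and the corpus and query sizes are hypothetical, chosen only to illustrate the scaling.

```python
def bi_encoder_calls(num_docs, num_queries):
    """Docs are embedded once at index time; each query is embedded once at search time."""
    return num_docs + num_queries


def cross_encoder_calls(num_queries, candidates_per_query):
    """Every (query, candidate) pair needs its own forward pass; nothing is precomputed."""
    return num_queries * candidates_per_query


num_docs = 10_000   # hypothetical corpus size
num_queries = 100   # hypothetical query volume

print("Bi-encoder only:            ", bi_encoder_calls(num_docs, num_queries))
print("Cross-encoder on everything:", cross_encoder_calls(num_queries, num_docs))
print("Cross-encoder on top-10:    ", cross_encoder_calls(num_queries, 10))
```

Scoring the whole corpus with a cross-encoder grows with queries times documents, which is why it is applied only to the small candidate set from Stage 1.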

In [4]:
# Using HuggingFace cross-encoder model:
model = HuggingFaceCrossEncoder(model_name='cross-encoder/ms-marco-MiniLM-L-6-v2')
reranker = CrossEncoderReranker(model=model, top_n=5)  # Rerank and keep top 5
print('[OK] Reranker created!')


[OK] Reranker created!


## Step 4: Wrap with ContextualCompressionRetriever

In [5]:
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)
print('[OK] Reranking retriever ready!')

[OK] Reranking retriever ready!


## Step 5: Build RAG Chain

In [6]:
prompt = get_baseline_prompt()

reranking_rag_chain = (
    {'context': reranking_retriever, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print('[OK] Reranking RAG chain ready!')

[OK] Reranking RAG chain ready!


## Step 6: Test

In [7]:
question = 'What are the challenges faced by MSMEs in Nigeria?'

# Compare base retrieval order vs reranked order
base_docs = base_retriever.invoke(question)
reranked_docs = reranking_retriever.invoke(question)

print(f'Base retrieval: {len(base_docs)} docs')
print(f'After reranking: {len(reranked_docs)} docs')
print(f'\nTop doc after reranking:')
print(reranked_docs[0].page_content[:300])

answer = reranking_rag_chain.invoke(question)
print(f'\nAnswer:\n{answer}')

Base retrieval: 10 docs
After reranking: 5 docs

Top doc after reranking:
CHALLENGES CONFRONTING MSMEs IN NIGERIA The Micro, Small and Medium Enterprises (MSMEs) have been known, in both developed and developing nations, to be incontrovertible contributors to employment generation, wealth creation and poverty alleviation. It is on this premise that several efforts are gea

Answer:
Micro, Small, and Medium Enterprises (MSMEs) in Nigeria face several significant challenges that hinder their growth and sustainability. Key issues include limited access to financing due to high-interest rates and stringent collateral requirements, inadequate infrastructure such as unstable power supply and poor transportation networks, and complex regulatory frameworks that create bureaucratic hurdles (SMEDAN). Additionally, MSMEs struggle with market access and competition from larger firms, as well as a lack of skilled labor and technological adoption (World Bank). These challenges collectively impede the

## ⚖️ Performance Trade-offs

| Aspect | Vector Search Only | With Reranking |
|--------|-------------------|----------------|
| **Speed** (rough, hardware-dependent) | Very fast (~10ms) | Slower (~100-500ms) |
| **Accuracy** (illustrative, not measured here) | Good (70-80%) | Excellent (85-95%) |
| **Cost** | Cheap | More expensive |
| **Computation** | Pre-computed embeddings | Must score each pair |
| **Use Case** | Speed-critical apps | Quality-critical tasks |
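
Latency depends heavily on hardware, model size, and the number of candidates, so it is worth measuring on your own setup. A minimal timing sketch follows; `median_latency_ms` is a hypothetical helper, and the `sorted` call is only a stand-in workload so the block runs on its own.

```python
import statistics
import time


def median_latency_ms(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times and return the median wall-clock time in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Stand-in workload so the example is self-contained
latency = median_latency_ms(sorted, list(range(1000)))
print(f"Median latency: {latency:.3f} ms")
```

In this notebook you could pass `base_retriever.invoke` and `reranking_retriever.invoke` with the same `question` to compare the two retrievers directly.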

## When to Use
**Use when:**
- Accuracy is critical
- The best docs land in the top-K but not in the top-N
- You can afford the extra computation

**Avoid when:**
- Speed is the priority
- Initial retrieval is already good enough
- The extra cost isn't justified

## Exercise
1. Compare answers with/without reranking
2. Test different reranker models
3. Measure quality improvement

Time: 15 min
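
For item 3, one possible starting point is to score each ranking against a set of docs you judge relevant. The sketch below uses two standard metrics, precision@k and reciprocal rank. The doc IDs and relevance labels are hypothetical placeholders; replace them with your own judgments.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    return sum(doc_id in relevant_ids for doc_id in ranked_ids[:k]) / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical labels and rankings for illustration only
relevant = {"financing", "loans"}
base_ranking = ["overview", "financing", "registration", "tax", "loans"]
reranked = ["financing", "loans", "bank", "smedan", "boi"]

for name, ranking in [("base", base_ranking), ("reranked", reranked)]:
    p2 = precision_at_k(ranking, relevant, k=2)
    rr = reciprocal_rank(ranking, relevant)
    print(f"{name}: precision@2={p2:.2f}, reciprocal rank={rr:.2f}")
```

Averaging these scores over several questions gives a more reliable comparison than a single query.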

In [8]:
# Your code here

**Next:** Technique 5 - HyDE