# Setup Prerequisites

## 🔗 Complete Setup Required

Before running this notebook, you **must** complete the workshop setup process:

📖 **Please follow the complete setup guide here: [`SETUP.md`](../SETUP.md)**

This notebook requires:
- ✅ **Qdrant database** setup (Cloud or Docker)
- ✅ **Data ingestion** completed 
- ✅ **OPENAI_API_KEY** configured
- ✅ **COHERE_API_KEY** configured (for reranking features)

**🚫 Do not proceed** until setup is complete!


# RAG Workshop Notebook - Naive RAG with Cohere Reranking


In [1]:
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

True

### 1.1. Initialize Clients

In [2]:

import os
from openai import OpenAI
from qdrant_client import QdrantClient

# Initialize OpenAI client
openai_client = OpenAI()

# Initialize Qdrant Cloud client
qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY")
)

# Collection configuration
collection_name = "workshop_wikipedia_extended"
embedding_model = "text-embedding-3-small"

print(f"✅ Connected to Qdrant Cloud")
print(f"📚 Collection: {collection_name}")
print(f"🤖 Embedding model: {embedding_model}")

✅ Connected to Qdrant Cloud
📚 Collection: workshop_wikipedia_extended
🤖 Embedding model: text-embedding-3-small


### 1.2. Verify Collection and Dataset

In [3]:

# Get collection information
collection_info = qdrant_client.get_collection(collection_name)
point_count = collection_info.points_count

print(f"📊 Collection Statistics:")
print(f"   Total chunks: {point_count:,}")
print(f"   Vector dimension: {collection_info.config.params.vectors.size}")
print(f"   Distance metric: {collection_info.config.params.vectors.distance}")

# Sample a few points to see the data structure
sample_points = qdrant_client.scroll(
    collection_name=collection_name,
    limit=3,
    with_payload=True,
    with_vectors=False
)[0]

print(f"\n📝 Sample data structure:")
for i, point in enumerate(sample_points):
    payload = point.payload
    print(f"\nChunk {i+1}:")
    print(f"   Title: {payload.get('title', 'Unknown')}")
    print(f"   Text preview: {payload.get('text', '')[:100]}...")
    print(f"   Chunk {payload.get('chunk_index', 0)+1} of {payload.get('total_chunks', 0)}")

📊 Collection Statistics:
   Total chunks: 1,210
   Vector dimension: 1536
   Distance metric: Cosine

📝 Sample data structure:

Chunk 1:
   Title: BERT (language model)
   Text preview: Bidirectional encoder representations from transformers (BERT) is a language model introduced in Oct...
   Chunk 1 of 10

Chunk 2:
   Title: BERT (language model)
   Text preview: Euclidean space. Encoder: a stack of Transformer blocks with self-attention, but without causal mask...
   Chunk 2 of 10

Chunk 3:
   Title: BERT (language model)
   Text preview: consists of a sinusoidal function that takes the position in the sequence as input. Segment type: Us...
   Chunk 3 of 10


## 2. Build the Q/A Chatbot

![../imgs/naive-rag.png](../imgs/naive-rag.png)


### 2.1. Retrieval - Search the database for the chunks most similar to the query.

In [4]:
# Function to search the database
def vector_search(query, top_k=1):
    # Embed the query (embedding_model is defined in section 1.1)
    response = openai_client.embeddings.create(
        input=query,
        model=embedding_model
    )
    query_embedding = response.data[0].embedding
    # Similarity search: return the top_k points closest to the query embedding
    search_result = qdrant_client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        with_payload=True,
        limit=top_k,
    ).points
    return [result.payload for result in search_result]


search_result = vector_search("What does the word 'deep' in 'deep learning' refer")

from pprint import pprint

pprint(search_result[0])

{'chunk_index': 0,
 'text': 'In machine learning, deep learning focuses on utilizing multilayered '
         'neural networks to perform tasks such as classification, regression, '
         'and representation learning. The field takes inspiration from '
         'biological neuroscience and is centered around stacking artificial '
         'neurons into layers and "training" them to process data. The '
         'adjective "deep" refers to the use of multiple layers (ranging from '
         'three to several hundred or thousands) in the network. Methods used '
         'can be supervised, semi-supervised or unsupervised. Some common deep '
         'learning network architectures include fully connected networks, '
         'deep belief networks, recurrent neural networks, convolutional '
         'neural networks, generative adversarial networks, transformers, and '
         'neural radiance fields. These architectures have been applied to '
         'fields including computer vision,
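
To make the retrieval step less of a black box, here is a minimal sketch of what a cosine-similarity search does conceptually. It is not how Qdrant is implemented (Qdrant uses an index instead of scanning every vector), and the 3-dimensional vectors, `toy_index`, and `brute_force_search` are made-up names for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query_vec, indexed_chunks, top_k=1):
    """Score every chunk against the query and return the top_k payloads."""
    ranked = sorted(
        indexed_chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk["vector"]),
        reverse=True,
    )
    return [chunk["payload"] for chunk in ranked[:top_k]]


# Hypothetical toy "embeddings"; the real collection uses 1536 dimensions
toy_index = [
    {"vector": [1.0, 0.0, 0.0], "payload": {"title": "Deep learning"}},
    {"vector": [0.0, 1.0, 0.0], "payload": {"title": "BERT (language model)"}},
    {"vector": [0.7, 0.7, 0.0], "payload": {"title": "Overfitting"}},
]

print(brute_force_search([0.9, 0.1, 0.0], toy_index, top_k=2))
```

The query vector points mostly along the first axis, so the "Deep learning" payload comes first and the diagonal "Overfitting" vector second.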

### 2.2. Generation - Use the retrieved chunks to generate the answer.

In [5]:
def model_generate(prompt, model="gpt-4o"):
    messages = [{"role": "user", "content": prompt}]
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # this is the degree of randomness of the model's output
    )
    return response.choices[0].message.content

In [6]:
import json


def prompt_template(question, context):
    return """You are an AI assistant that answers the question at the end using only the following
  pieces of context. Keep the wording very close to the context.
  Explicitly say you DON'T KNOW if the answer is not in the context. Answering questions whose answers are not in the context is NOT allowed.
  context:
  ```
  """ + json.dumps(context) + """
  ```
  User question: """ + question + """
  Answer in markdown:"""


In [7]:
def generate_answer(question):
    # Retrieval: search the knowledge base for the most similar chunk(s)
    search_result = vector_search(question)

    prompt = prompt_template(question, search_result)
    # Generation: have the LLM answer using only the retrieved context
    return model_generate(prompt)

In [8]:
question = "Who introduced the time delay neural network (TDNN), and when?"
answer = generate_answer(question)
print("Answer:", answer)

Answer: ```markdown
The time delay neural network (TDNN) was introduced in 1987 by Alex Waibel.
```


## 3. RAG Evaluation with RAGAS

Before proceeding with improvements, let's establish baseline scores using **RAGAS** (Retrieval Augmented Generation Assessment), a framework designed specifically for evaluating RAG systems.

### Context-Focused Metrics:

1. **Context Precision**: How well are relevant chunks ranked at the top?
2. **Context Recall**: How much of the necessary information was retrieved?
3. **Context Relevancy**: How relevant is the retrieved context to the question?

We're using **RAGAS** because it's purpose-built for RAG evaluation and focuses on context quality, which largely determines RAG performance. The cells below report **Context Recall** only. Running the evaluation takes a single function call.
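
To build intuition for Context Recall, here is a deliberately simplified sketch. RAGAS itself relies on an LLM to judge whether each statement in the reference answer is supported by the retrieved context; the substring check below is only a stand-in for that judgment. `toy_context_recall` and the example strings are hypothetical.

```python
def toy_context_recall(reference_claims, retrieved_contexts):
    """Fraction of reference claims found in at least one retrieved chunk."""
    if not reference_claims:
        return 0.0
    joined = " ".join(retrieved_contexts).lower()
    supported = sum(1 for claim in reference_claims if claim.lower() in joined)
    return supported / len(reference_claims)


# Claims a reference answer would need the context to support (made up for this demo)
claims = ["introduced in 1987", "alex waibel"]

full_context = ["The TDNN was introduced in 1987 by Alex Waibel."]
partial_context = ["Alex Waibel worked on speech recognition."]

print(toy_context_recall(claims, full_context))     # both claims supported
print(toy_context_recall(claims, partial_context))  # only one claim supported
```

A score of 1.0 means every claim needed for the answer was present in the retrieved chunks. A score of 0.5 means half of them were missing, however fluent the generated answer may sound.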


In [9]:
# Import the RAGAS evaluation utility
import sys
import os
sys.path.append('../naive-rag')
from rag_evaluator_v2 import evaluate_naive_rag_v2

# Run evaluation on the current RAG system using RAGAS
print("🔍 Evaluating your Naive RAG system with RAGAS...")
print("This will evaluate context quality metrics on 14 questions...\n")

baseline_results = evaluate_naive_rag_v2(
    vector_search_func=vector_search,
    generate_answer_func=generate_answer
)


🔍 Evaluating your Naive RAG system with RAGAS...
This will evaluate context quality metrics on 14 questions...

✅ Loaded 14 questions from evaluation dataset

Evaluating 14 questions...

Question 1/14: Who introduced the ReLU (rectified linear unit) ac...
Question 2/14: What was the first working deep learning algorithm...
Question 3/14: Which CNN achieved superhuman performance in a vis...
Question 4/14: When was BERT introduced and by which organization...
Question 5/14: What are the two model sizes BERT was originally i...
Question 6/14: What percentage of tokens are randomly selected fo...
Question 7/14: Who introduced the term 'deep learning' to the mac...
Question 8/14: Which three researchers were awarded the 2018 Turi...
Question 9/14: When was the first GPT introduced and by which org...
Question 10/14: What were the three parameter sizes of the first v...
Question 11/14: What is the 'one in ten rule' in regression analys...
Question 12/14: What is the essence of overfitting a



RAGAS EVALUATION RESULTS

📋 INDIVIDUAL QUESTION SCORES:
------------------------------------------------------------
 1. 🟢 1.000 - Who introduced the ReLU (rectified linear unit) activation f...
 2. 🔴 0.000 - What was the first working deep learning algorithm and who p...
 3. 🟢 1.000 - Which CNN achieved superhuman performance in a visual patter...
 4. 🔴 0.000 - When was BERT introduced and by which organization?
 5. 🟢 1.000 - What are the two model sizes BERT was originally implemented...
 6. 🟢 1.000 - What percentage of tokens are randomly selected for the mask...
 7. 🔴 0.000 - Who introduced the term 'deep learning' to the machine learn...
 8. 🔴 0.000 - Which three researchers were awarded the 2018 Turing Award f...
 9. 🟢 1.000 - When was the first GPT introduced and by which organization?
10. 🟢 1.000 - What were the three parameter sizes of the first versions of...
11. 🟢 1.000 - What is the 'one in ten rule' in regression analysis?
12. 🟢 1.000 - What is the essence of overfitting 

In [14]:
# Store baseline scores for comparison later
baseline_scores = baseline_results.get('aggregate_scores', {})

print("📊 BASELINE CONTEXT METRICS SUMMARY:")
if 'context_recall' in baseline_scores:
    print(f"Context Recall: {baseline_scores['context_recall']:.3f}")

print("\n💡 What these RAGAS metrics mean:")
print("• Context Recall: How much of the necessary information was retrieved")

print("\n🎯 Score Interpretation:")
print("• 0.8+ = Excellent")
print("• 0.6-0.8 = Good") 
print("• 0.4-0.6 = Needs Improvement")
print("• <0.4 = Poor")


📊 BASELINE CONTEXT METRICS SUMMARY:
Context Recall: 0.714

💡 What these RAGAS metrics mean:
• Context Recall: How much of the necessary information was retrieved

🎯 Score Interpretation:
• 0.8+ = Excellent
• 0.6-0.8 = Good
• 0.4-0.6 = Needs Improvement
• <0.4 = Poor


### 📋 Why We Need These Baseline Scores

These **RAGAS-powered** baseline scores are crucial because:

1. **Context Quality Focus**: RAGAS specifically measures how well your retrieval system finds and ranks relevant information
2. **Purpose-Built for RAG**: Unlike general evaluation tools, RAGAS is designed specifically for RAG systems
3. **Objective Measurement**: Quantitative metrics that measure actual retrieval performance
4. **Debugging Aid**: Low context scores immediately tell you where your RAG is failing
5. **Optimization Guide**: Use these metrics to systematically improve your retrieval strategy

🔬 **What makes RAGAS special**: 
- **Context Precision** helps ensure the most relevant information appears first
- **Context Recall** ensures you're not missing important information
- **Context Relevancy** validates that retrieved chunks actually help answer the question

**Next Steps**: Now that we have our baseline context metrics, let's improve our RAG system with Cohere's reranking!


## 4. Improving RAG with Cohere Reranking

Now let's see how adding a reranking step can improve our context selection and overall RAG performance.

### Why Reranking?

Vector similarity search is good at finding semantically related content, but it has limitations:
- **Bi-encoder limitation**: Each text is compressed into a single fixed-size vector
- **Lost nuances**: Subtle relevance signals can be lost in that compression
- **No query-document interaction**: Query and document embeddings are computed independently, so the model never sees them together

Reranking addresses these limitations with:
- **Cross-encoder architecture**: The model processes the query and each document together
- **Fine-grained relevance**: Joint processing captures subtle semantic relationships
- **Better precision**: Less relevant results are pushed down even if they have high vector similarity

### 4.1. Initialize Cohere Client

In [16]:
import cohere
import os

# Initialize Cohere client
cohere_client = cohere.Client(os.getenv("COHERE_API_KEY"))

print("✅ Cohere client initialized successfully!")

✅ Cohere client initialized successfully!


### 4.1.1. Simple Reranking Demo

Before diving into the full RAG implementation, let's see a **simple example** of how Cohere reranking works:

**The Challenge**: You have 10 documents about meditation, but only some of them directly address its health benefits

**The Solution**: Cohere's rerank model can identify which documents are most relevant to your specific query

This example will show:
- 📝 **Input**: A query + 10 documents (mixed relevance)
- 🧠 **Processing**: Cohere rerank model scores each document
- 🏆 **Output**: Top 3 most relevant documents with scores

**Key Insight**: Notice how the reranker identifies documents that specifically mention health benefits (stress, blood pressure, heart disease) rather than just general meditation topics!


In [17]:
# 🎯 Simple Reranking Example: See how Cohere rerank works!

query = "What are the health benefits of meditation?"

# Sample documents - some are highly relevant to health benefits, others less so
documents = [
    "Several clinical studies have shown that regular meditation can help reduce stress and anxiety levels.",
    "Meditation involves focusing the mind and eliminating distractions, often through breathing techniques or guided imagery.",
    "A daily meditation practice has been associated with lower blood pressure and improved sleep quality in adults.",
    "The city of Kyoto is famous for its Zen temples, where meditation has been practiced for centuries.",
    "People who meditate frequently often report feeling calmer and more focused throughout the day.",
    "Research suggests meditation may lower the risk of heart disease by reducing inflammation and improving heart rate variability.",
    "Meditation apps have become increasingly popular, offering guided sessions on mindfulness and relaxation.",
    "A 2021 meta-analysis found that meditation can reduce symptoms of depression when used alongside other treatments.",
    "Some forms of meditation emphasize compassion and kindness, aiming to improve emotional well-being.",
    "Athletes sometimes use meditation techniques to enhance concentration and mental resilience during competition.",
]

print("🔍 Query:", query)
print("\n📚 Documents to rank (10 total):")
for i, doc in enumerate(documents):
    print(f"{i+1}. {doc}")

print("\n" + "="*80)

# Use Cohere's rerank API directly
try:
    rerank_response = cohere_client.rerank(
        query=query,
        documents=documents,
        model='rerank-english-v3.0',
        top_n=3  # Get top 3 most relevant
    )
    
    print("\n🏆 TOP 3 RERANKED RESULTS:")
    print("(Notice how the reranker identifies the most health-focused documents!)")
    print("-" * 80)
    
    for i, result in enumerate(rerank_response.results):
        doc_index = result.index + 1  # +1 for human-readable numbering
        relevance_score = result.relevance_score
        document_text = documents[result.index]
        
        print(f"\n🥇 Rank #{i+1}")
        print(f"   📊 Relevance Score: {relevance_score:.4f}")
        print(f"   📝 Original Position: #{doc_index}")
        print(f"   📄 Text: {document_text}")
        
    print("\n" + "="*80)
    print("💡 Key Insight: Notice how documents specifically about health benefits")
    print("   (stress reduction, blood pressure, heart disease) rank highest!")
    
except Exception as e:
    print(f"❌ Error during reranking: {str(e)}")
    print("⚠️  Make sure you have set COHERE_API_KEY in your environment")

🔍 Query: What are the health benefits of meditation?

📚 Documents to rank (10 total):
1. Several clinical studies have shown that regular meditation can help reduce stress and anxiety levels.
2. Meditation involves focusing the mind and eliminating distractions, often through breathing techniques or guided imagery.
3. A daily meditation practice has been associated with lower blood pressure and improved sleep quality in adults.
4. The city of Kyoto is famous for its Zen temples, where meditation has been practiced for centuries.
5. People who meditate frequently often report feeling calmer and more focused throughout the day.
6. Research suggests meditation may lower the risk of heart disease by reducing inflammation and improving heart rate variability.
7. Meditation apps have become increasingly popular, offering guided sessions on mindfulness and relaxation.
8. A 2021 meta-analysis found that meditation can reduce symptoms of depression when used alongside other treatments.
9. Some 

### 4.2. Create Reranking Function

In [18]:
def rerank_results(query, search_results, top_k=3, max_retries=5, initial_backoff=10):
    """
    Rerank search results using Cohere's rerank model with rate limit handling.
    
    Args:
        query: The user's question
        search_results: List of search results from vector search
        top_k: Number of top results to return after reranking
        max_retries: Maximum number of retry attempts for rate-limited requests
        initial_backoff: Initial backoff time in seconds (will increase exponentially)
    
    Returns:
        List of reranked results
    """
    import time
    
    # Extract texts from search results
    documents = [result.get('text', '') for result in search_results]
    
    # Implement retry with exponential backoff for rate limit handling
    retry_count = 0
    backoff_time = initial_backoff
    
    while retry_count <= max_retries:
        try:
            # Call Cohere rerank
            rerank_response = cohere_client.rerank(
                query=query,
                documents=documents,
                model='rerank-english-v3.0',  # Cohere English rerank model
                top_n=top_k
            )
            
            # Return reranked results maintaining original structure
            reranked_results = []
            for result in rerank_response.results:
                original_result = search_results[result.index].copy()
                original_result['rerank_score'] = result.relevance_score
                reranked_results.append(original_result)
            
            return reranked_results
            
        except Exception as e:
            # Check if it's a rate limit error (429)
            if hasattr(e, 'status_code') and e.status_code == 429:
                if retry_count < max_retries:
                    print(f"⚠️ Rate limit reached. Waiting for {backoff_time} seconds before retrying...")
                    time.sleep(backoff_time)
                    # Exponential backoff
                    backoff_time *= 2
                    retry_count += 1
                    continue
                else:
                    print(f"❌ Maximum retries ({max_retries}) reached. Falling back to vector search results.")
            else:
                print(f"❌ Error during reranking: {str(e)}")
            
            # Fallback: return the top_k results from the original search
            print("⚠️ Using original vector search results without reranking.")
            return search_results[:top_k]
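
The retry logic above can be looked at in isolation. The sketch below is a generic exponential-backoff helper, not part of the Cohere SDK: `RateLimitError`, `call_with_backoff`, and `flaky_call` are hypothetical names. Unlike `rerank_results`, it re-raises after the last attempt instead of falling back to the vector search results. The `sleep` argument is injectable so the demo records the waits instead of actually sleeping.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error raised by an API client."""
    status_code = 429


def call_with_backoff(func, max_retries=5, initial_backoff=10, sleep=time.sleep):
    """Call func(); on a 429 error wait, double the wait, and try again."""
    backoff = initial_backoff
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            is_rate_limit = getattr(exc, "status_code", None) == 429
            # Give up on non-rate-limit errors or when retries are exhausted
            if not is_rate_limit or attempt == max_retries:
                raise
            sleep(backoff)
            backoff *= 2


# Demo: a fake call that is rate-limited twice, then succeeds
attempts = {"count": 0}
waits = []


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("rate limited")
    return "ok"


result = call_with_backoff(flaky_call, sleep=waits.append)
print(result, waits)
```

With the defaults, the waits double from 10 to 20 seconds and so on, which is the same schedule `rerank_results` follows.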

### 4.3. Create Enhanced Answer Generation Function

In [19]:
def generate_answer_with_rerank(question, initial_top_k=10, final_top_k=1):
    """
    Generate answer using reranked results.
    
    Args:
        question: The user's question
        initial_top_k: Number of candidates to retrieve from vector search
        final_top_k: Number of results to keep after reranking
    
    Returns:
        Generated answer
    """
    # Step 1: Retrieval - get more candidates
    search_results = vector_search(question, top_k=initial_top_k)
    
    # Step 2: Reranking - select best results
    reranked_results = rerank_results(question, search_results, top_k=final_top_k)
    
    # Step 3: Generation - use reranked results
    prompt = prompt_template(question, reranked_results)
    return model_generate(prompt)
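
The retrieve-then-rerank pattern does not depend on any particular API. Here is a minimal, offline sketch of the same two-stage idea, with toy scorers standing in for vector search and the Cohere reranker. `retrieve_then_rerank`, `word_count_score`, and `phrase_match_score` are assumptions made for illustration and are far cruder than the real models.

```python
def retrieve_then_rerank(query, corpus, first_stage_score, rerank_score,
                         initial_top_k=10, final_top_k=1):
    """Shortlist with a cheap scorer, then reorder the shortlist with a costlier one."""
    shortlist = sorted(corpus, key=lambda doc: first_stage_score(query, doc),
                       reverse=True)[:initial_top_k]
    reranked = sorted(shortlist, key=lambda doc: rerank_score(query, doc),
                      reverse=True)
    return reranked[:final_top_k]


def word_count_score(query, doc):
    """Toy first stage: how often the query words occur in the document."""
    doc_words = doc.lower().split()
    return sum(doc_words.count(word) for word in query.lower().split())


def phrase_match_score(query, doc):
    """Toy second stage: looks at query and document together as one phrase."""
    return 1.0 if query.lower() in doc.lower() else 0.0


corpus = [
    "Benefits of health apps for health tracking",
    "Meditation has health benefits such as lower stress",
    "Kyoto is famous for its temples",
]
query = "health benefits"

first_stage_only = retrieve_then_rerank(query, corpus, word_count_score,
                                        word_count_score, initial_top_k=1)
two_stage = retrieve_then_rerank(query, corpus, word_count_score,
                                 phrase_match_score, initial_top_k=2)
print(first_stage_only)
print(two_stage)
```

The first stage alone prefers the document that merely repeats the query words. Widening the shortlist and reranking it surfaces the document that actually talks about health benefits.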

### 4.4. Compare Results: Naive RAG vs. Reranked RAG

In [24]:
# Let's compare results on a challenging question
test_question = "What was the first working deep learning algorithm and who published it?"
print("=" * 80)
print("COMPARISON: Naive RAG vs. Reranked RAG")
print("=" * 80)
print(f"\nQuestion: {test_question}\n")

print("=== Without Reranking (Top 3 by vector similarity) ===")
search_results = vector_search(test_question, top_k=3)
for i, result in enumerate(search_results):
    print(f"\n{i+1}. {result['text'][:200]}...")

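# Note: generate_answer() uses the default top_k=1, so this answer is based on the first chunk only
answer_naive = generate_answer(test_question)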
print(f"\n📝 Answer (Naive RAG):\n{answer_naive}")

print("\n" + "=" * 80 + "\n")

print("=== With Cohere Reranking (Top 3 from 10 candidates) ===")
search_results_extended = vector_search(test_question, top_k=10)
reranked_results = rerank_results(test_question, search_results_extended, top_k=3)
for i, result in enumerate(reranked_results):
    title = result.get('title', 'Unknown')
    print(f"\n{i+1}. [Rerank Score: {result['rerank_score']:.3f}] (Title: {title}) {result['text'][:200]}...")

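# Note: generate_answer_with_rerank() defaults to final_top_k=1, so only the top reranked chunk reaches the LLM
answer_reranked = generate_answer_with_rerank(test_question)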
print(f"\n📝 Answer (Reranked RAG):\n{answer_reranked}")

COMPARISON: Naive RAG vs. Reranked RAG

Question: What was the first working deep learning algorithm and who published it?

=== Without Reranking (Top 3 by vector similarity) ===

1. A 1971 paper described a deep network with eight layers trained by this method, which is based on layer by layer training through regression analysis. Superfluous hidden units are pruned using a separ...

2. first perceptrons did not have adaptive hidden units. However, Joseph (1960) also discussed multilayer perceptrons with an adaptive hidden layer. Rosenblatt (1962): section 16 cited and adopted these ...

3. the 1920s, Wilhelm Lenz and Ernst Ising created the Ising model which is essentially a non-learning RNN architecture consisting of neuron-like threshold elements. In 1972, Shun'ichi Amari made this ar...

📝 Answer (Naive RAG):
The first deep learning multilayer perceptron trained by stochastic gradient descent was published in 1967 by Shun'ichi Amari.


=== With Cohere Reranking (Top 3 from 10 cand

### 4.5. Evaluate Improvement with RAGAS

In [25]:
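# Run evaluation with reranking
print("🔍 Evaluating RAG system with Cohere Reranking...")
print("This will evaluate the improved system on the same 14 questions...\n")

# Note: vector_search_func returns the 15 vector-search candidates, not the reranked subset.
# If the evaluator scores these contexts, the Context Recall below reflects the wider
# retrieval (top_k=15 vs. 1 in the baseline) rather than the reranker's selection.
reranked_results = evaluate_naive_rag_v2(
    vector_search_func=lambda q: vector_search(q, top_k=15),
    generate_answer_func=generate_answer_with_rerank
)

# Store reranked scores
reranked_scores = reranked_results.get('aggregate_scores', {})

🔍 Evaluating RAG system with Cohere Reranking...
This will evaluate the improved system on the same 14 questions...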

✅ Loaded 14 questions from evaluation dataset

Evaluating 14 questions...

Question 1/14: Who introduced the ReLU (rectified linear unit) ac...
Question 2/14: What was the first working deep learning algorithm...
Question 3/14: Which CNN achieved superhuman performance in a vis...
Question 4/14: When was BERT introduced and by which organization...
Question 5/14: What are the two model sizes BERT was originally i...
Question 6/14: What percentage of tokens are randomly selected fo...
Question 7/14: Who introduced the term 'deep learning' to the mac...
Question 8/14: Which three researchers were awarded the 2018 Turi...
Question 9/14: When was the first GPT introduced and by which org...
Question 10/14: What were the three parameter sizes of the first v...
Question 11/14: What is the 'one in ten rule' in regression analys...
Question 12/14: What is the essence of overfitt



RAGAS EVALUATION RESULTS

📋 INDIVIDUAL QUESTION SCORES:
------------------------------------------------------------
 1. 🟢 1.000 - Who introduced the ReLU (rectified linear unit) activation f...
 2. 🟢 1.000 - What was the first working deep learning algorithm and who p...
 3. 🟢 1.000 - Which CNN achieved superhuman performance in a visual patter...
 4. 🟢 1.000 - When was BERT introduced and by which organization?
 5. 🟢 1.000 - What are the two model sizes BERT was originally implemented...
 6. 🟢 1.000 - What percentage of tokens are randomly selected for the mask...
 7. 🟢 1.000 - Who introduced the term 'deep learning' to the machine learn...
 8. 🟢 1.000 - Which three researchers were awarded the 2018 Turing Award f...
 9. 🟢 1.000 - When was the first GPT introduced and by which organization?
10. 🟢 1.000 - What were the three parameter sizes of the first versions of...
11. 🟢 1.000 - What is the 'one in ten rule' in regression analysis?
12. 🟢 1.000 - What is the essence of overfitting 

In [26]:
# Compare improvements
print("\n" + "=" * 60)
print("📊 IMPROVEMENT WITH RERANKING")
print("=" * 60)

for metric in ['context_recall']:
    if metric in baseline_scores and metric in reranked_scores:
        baseline = baseline_scores[metric]
        reranked = reranked_scores[metric]
        improvement = reranked - baseline
        improvement_pct = (improvement / baseline) * 100 if baseline > 0 else 0
        
        print(f"\n{metric.replace('_', ' ').title()}:")
        print(f"  Baseline: {baseline:.3f}")
        print(f"  With Reranking: {reranked:.3f}")
        print(f"  Improvement: {improvement:+.3f} ({improvement_pct:+.1f}%)")

print("\n" + "=" * 60)
print("\n🎉 Key Insights:")
print("• Reranking typically improves context precision significantly")
print("• Better context selection leads to more accurate answers")
print("• The cross-encoder architecture of rerankers captures nuanced relevance")
print("• This is especially valuable for complex or ambiguous queries")


📊 IMPROVEMENT WITH RERANKING

Context Recall:
  Baseline: 0.714
  With Reranking: 1.000
  Improvement: +0.286 (+40.0%)


🎉 Key Insights:
• Reranking typically improves context precision significantly
• Better context selection leads to more accurate answers
• The cross-encoder architecture of rerankers captures nuanced relevance
• This is especially valuable for complex or ambiguous queries
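
The numbers above can be reproduced by hand. The sketch below assumes each question scores either 0 or 1 (as in the per-question listings) and that the aggregate is the mean over 14 questions. The "10 of 14" count is inferred from the 0.714 baseline, not stated by the evaluator.

```python
# Assumed counts: 10 of 14 questions fully covered at baseline, 14 of 14 after the change
baseline = 10 / 14
reranked = 14 / 14

absolute_gain = reranked - baseline
relative_gain_pct = absolute_gain / baseline * 100

print(f"Baseline: {baseline:.3f}")
print(f"With Reranking: {reranked:.3f}")
print(f"Improvement: {absolute_gain:+.3f} ({relative_gain_pct:+.1f}%)")
```

The absolute gain is measured in score points, while the percentage is relative to the baseline, which is why a gain of about 0.29 points shows up as 40%.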


## 5. Summary and Next Steps

### What We've Learned

1. **Naive RAG Limitations**: While vector search is effective, it can miss nuanced relevance
2. **Reranking Benefits**: Cross-encoder models like Cohere's reranker significantly improve context selection
3. **Measurable Improvements**: Context Recall rose from 0.714 to 1.000 on the 14-question evaluation set

### Architecture Comparison

**Naive RAG:**
```
Query → Embedding → Vector Search (Top 1) → Generate Answer
```

**Reranked RAG:**
```
Query → Embedding → Vector Search (Top 10) → Rerank (Top 1) → Generate Answer
```

### When to Use Reranking

✅ **Use reranking when:**
- Answer quality is critical
- You have complex, nuanced queries
- Your corpus contains similar but subtly different content
- You can afford the additional API call latency

❌ **Skip reranking when:**
- Speed is more important than accuracy
- Queries are simple and unambiguous
- Your corpus has clearly distinct topics

### Further Improvements

1. **Hybrid Search**: Combine vector search with keyword search
2. **Query Expansion**: Generate multiple query variations
3. **Document Expansion**: Add metadata and summaries to chunks
4. **Fine-tuning**: Train custom rerankers on your domain
5. **Caching**: Store reranked results for common queries
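
As a starting point for the hybrid search idea, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a vector ranking with a keyword ranking. It is not something this workshop implements. The chunk IDs are hypothetical, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's fused score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a vector search and a keyword search
vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]
keyword_ranking = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Chunks that appear near the top of both lists rise to the top of the fused list, and the fused candidates could then be passed to `rerank_results` just like plain vector search results.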

### Try It Yourself!

Experiment with:
- Different `initial_top_k` values (try 20, 50)
- Different `final_top_k` values (try 5, 7)
- Different reranking models
- Your own questions, to see where reranking helps most