# Cross-Encoder Reranking in RAG Systems with Gemini

This notebook demonstrates advanced cross-encoder reranking in Retrieval-Augmented Generation (RAG) systems using:
- **Cross-Encoder Models** for sophisticated query-document relevance scoring
- **Sentence Transformers** for initial retrieval
- **FAISS** for efficient vector search
- **Google Gemini** for intelligent explanations and final answer generation

## Cross-Encoder vs Bi-Encoder (Semantic) Reranking

**Bi-Encoder (Semantic Reranking):**
- Encodes query and documents separately
- Computes similarity using vector operations
- Fast, because document embeddings can be computed once and cached, but query and document interact only through a single similarity score

**Cross-Encoder Reranking:**
- Processes query-document pairs jointly
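- Full attention across the concatenated query and document tokens, so every query token can attend to every document token
- Usually more accurate, but computationally expensive: every query-document pair needs its own forward pass, and nothing can be precomputed
- Captures fine-grained relevance signals (word order, negation, which entity a question is about) that a single fixed-size vector can blur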
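To make the cost difference concrete, here is a toy cost model that counts transformer forward passes. It is an illustrative assumption, not a benchmark: it treats one text or one query-document pair as one pass and ignores batching and sequence length. It shows why cross-encoders are normally applied only to a short candidate list produced by a bi-encoder, which is the design this notebook follows.

```python
def forward_passes(num_docs: int, num_queries: int, rerank_k: int) -> dict:
    """Toy cost model: count forward passes for three retrieval strategies.

    Assumes one pass per encoded text (bi-encoder) or per query-document pair
    (cross-encoder); batching and sequence length are ignored.
    """
    # Bi-encoder: encode every document once (cacheable), plus each query once
    bi_encoder_only = num_docs + num_queries
    # Cross-encoder over the whole corpus: every (query, document) pair is scored
    cross_encoder_full = num_docs * num_queries
    # Retrieve-then-rerank: bi-encoder retrieval, then one cross-encoder pass
    # for each of the rerank_k candidates of each query
    retrieve_then_rerank = bi_encoder_only + rerank_k * num_queries
    return {
        "bi_encoder_only": bi_encoder_only,
        "cross_encoder_full": cross_encoder_full,
        "retrieve_then_rerank": retrieve_then_rerank,
    }


costs = forward_passes(num_docs=1_000_000, num_queries=1_000, rerank_k=20)
for strategy, passes in costs.items():
    print(f"{strategy:<22} {passes:>15,}")
```

With a million documents, reranking 20 candidates per query adds only about 2% to the bi-encoder cost, whereas cross-encoding the full corpus would need roughly a thousand times more passes.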

## Project Overview
We'll build a question-answering system about space exploration that:
1. Retrieves candidate documents using bi-encoder embeddings and FAISS
2. Reranks the candidates with a cross-encoder so the most relevant documents come first
3. Uses Gemini to generate a comprehensive answer with detailed explanations from the top-ranked documents
4. Compares cross-encoder and bi-encoder rankings and latency

## 1. Setup and Installation

In [None]:
# Install required packages
!pip install sentence-transformers faiss-cpu numpy pandas google-generativeai python-dotenv scikit-learn transformers torch matplotlib seaborn

In [None]:
import os
import numpy as np
import pandas as pd
import faiss
import time
import warnings
from sentence_transformers import SentenceTransformer, CrossEncoder
from sklearn.metrics.pairwise import cosine_similarity
import google.generativeai as genai
from typing import List, Tuple, Dict, Optional
import json
from dotenv import load_dotenv
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Load environment variables
load_dotenv()

print("✅ All packages imported successfully!")

## 2. Configuration and API Setup

In [None]:
# Configure Gemini API
# Get your API key from https://makersuite.google.com/app/apikey
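GEMINI_API_KEY = os.getenv('GEMINI_API_KEY') or 'your-gemini-api-key-here'
if GEMINI_API_KEY == 'your-gemini-api-key-here':
    print("⚠️ GEMINI_API_KEY is not set; answer generation will fail until you provide a valid key.")
genai.configure(api_key=GEMINI_API_KEY)

# Initialize Gemini model.
# 'gemini-pro' is no longer served; model names change over time, so run
# [m.name for m in genai.list_models()] to see which models your key can access.
gemini_model = genai.GenerativeModel('gemini-2.0-flash')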

print("✅ Gemini API configured!")

## 3. Enhanced Dataset: Space Exploration Knowledge Base

In [None]:
# Extended knowledge base about space exploration for better cross-encoder demonstration
documents = [
    {
        "id": 1,
        "title": "Mars Exploration and Life Detection",
        "content": "Mars exploration has been a key focus of space agencies worldwide, particularly in the search for signs of past or present life. NASA's Mars rovers, including Perseverance and Curiosity, have provided invaluable data about the Red Planet's geology, climate, and potential for past life. Perseverance carries sophisticated instruments like MOXIE for oxygen production and the SuperCam laser spectrometer. The planet's thin atmosphere, composed mainly of carbon dioxide, presents unique challenges for exploration missions. Recent discoveries of seasonal methane emissions and ancient riverbeds suggest Mars may have once harbored microbial life."
    },
    {
        "id": 2,
        "title": "International Space Station Scientific Research",
        "content": "The International Space Station (ISS) serves as a premier microgravity laboratory where astronauts conduct cutting-edge scientific experiments across multiple disciplines. Located approximately 408 kilometers above Earth, the ISS completes an orbit around our planet every 90 minutes, providing a unique platform for research. The station supports investigations in biology, physics, astronomy, materials science, and human physiology. Experiments include studying protein crystallization, plant growth in microgravity, fluid physics, and the effects of long-duration spaceflight on the human body. The ISS has been continuously inhabited since November 2000, representing unprecedented international cooperation in space exploration."
    },
    {
        "id": 3,
        "title": "Apollo Program and Lunar Exploration Legacy",
        "content": "The Apollo program represents humanity's greatest achievement in space exploration, successfully landing twelve astronauts on the Moon between 1969 and 1972. Six successful Moon landings were completed, with Apollo 11 being the historic first on July 20, 1969. Neil Armstrong and Buzz Aldrin were the first humans to walk on the lunar surface, while Michael Collins orbited above in the command module. The program involved over 400,000 people and cost $25 billion (1970s dollars). Apollo missions collected 842 pounds of lunar samples, conducted extensive geological surveys, and proved that large-scale human space exploration was possible. The legacy continues with NASA's Artemis program aiming to return humans to the Moon by 2026."
    },
    {
        "id": 4,
        "title": "Exoplanet Discovery and Characterization Methods",
        "content": "Astronomers employ sophisticated methods to discover and characterize exoplanets, with the transit method and radial velocity method being the most successful. The Kepler Space Telescope revolutionized exoplanet science by discovering over 2,600 confirmed planets, while TESS (Transiting Exoplanet Survey Satellite) continues this work. The transit method detects the slight dimming when a planet passes in front of its star, while radial velocity measures the gravitational wobble of stars caused by orbiting planets. Many exoplanets orbit in their star's habitable zone, where liquid water could potentially exist. Advanced techniques like direct imaging and gravitational microlensing are revealing diverse planetary systems, including super-Earths, hot Jupiters, and potentially habitable rocky worlds."
    },
    {
        "id": 5,
        "title": "SpaceX Innovation and Reusable Rocket Technology",
        "content": "SpaceX has fundamentally transformed the space industry through breakthrough reusable rocket technology and ambitious mission goals. The Falcon 9 rocket features a first stage that can autonomously land back on Earth, either on drone ships or landing pads, dramatically reducing launch costs from tens of thousands of dollars per kilogram to under $3,000. The company's Dragon spacecraft regularly transports crew and cargo to the ISS, ending America's dependence on Russian Soyuz vehicles. SpaceX's Starship project represents the next leap forward, designed as a fully reusable super heavy-lift vehicle capable of carrying 100+ tons to low Earth orbit. The ultimate goal is enabling human missions to Mars and establishing a self-sustaining colony on the Red Planet, with Elon Musk targeting the 2030s for crewed Mars missions."
    },
    {
        "id": 6,
        "title": "Jupiter's Moons and Astrobiology Potential",
        "content": "Jupiter's four largest moons - Io, Europa, Ganymede, and Callisto - represent some of the most fascinating targets for astrobiology research in our solar system. Europa is particularly compelling due to its subsurface ocean beneath an icy crust, containing more water than all of Earth's oceans combined. Its ocean floor may host hydrothermal activity, similar to environments on Earth where life thrives around deep-sea vents. Ganymede, the largest moon in the solar system, also harbors a subsurface ocean and has its own magnetic field. NASA's Europa Clipper mission, launched in October 2024, will conduct detailed reconnaissance of Europa's ice shell and subsurface ocean. The European Space Agency's JUICE mission will study Jupiter's icy moons, focusing on Ganymede's potential habitability."
    },
    {
        "id": 7,
        "title": "Solar System Formation and Planetary Science",
        "content": "The solar system formed approximately 4.6 billion years ago from the gravitational collapse of a giant molecular cloud called the solar nebula. This process concentrated most mass at the center, forming the proto-Sun, while the remaining material formed a rotating protoplanetary disk. Through accretion, dust grains stuck together to form planetesimals, which eventually grew into planets. The inner rocky planets (Mercury, Venus, Earth, Mars) formed in the hot inner region where only refractory materials could condense, while the outer gas giants (Jupiter, Saturn, Uranus, Neptune) formed beyond the snow line where volatile compounds could freeze. Jupiter's early formation significantly influenced the architecture of the entire solar system, preventing Mars from growing larger and scattering asteroids throughout the system."
    },
    {
        "id": 8,
        "title": "Space Telescopes and Deep Universe Observations",
        "content": "Space telescopes represent humanity's most powerful tools for understanding the cosmos, operating beyond Earth's atmospheric interference to capture unprecedented views of the universe. The Hubble Space Telescope, operational since 1990, has revolutionized astronomy with discoveries including the accelerating expansion of the universe, the age of the cosmos (13.8 billion years), and detailed images of distant galaxies. The James Webb Space Telescope, Hubble's successor, observes in infrared wavelengths and can peer back to the universe's first galaxies formed just 400 million years after the Big Bang. Other specialized missions include the Spitzer Space Telescope (infrared), Chandra X-ray Observatory (X-rays), and the upcoming Nancy Grace Roman Space Telescope (wide-field surveys). These instruments have revealed exoplanets, black holes, dark matter structures, and the cosmic web."
    },
    {
        "id": 9,
        "title": "Asteroid Mining and Space Resources",
        "content": "Asteroid mining represents the next frontier in space exploration and resource utilization, with the potential to provide virtually unlimited raw materials for space-based industries and Earth's growing population. Near-Earth asteroids contain vast quantities of precious metals, rare earth elements, and water ice. A single metallic asteroid could contain more platinum than has ever been mined on Earth. Water extracted from asteroids can be split into hydrogen and oxygen for rocket fuel, enabling sustainable space exploration. Companies like Planetary Resources and Deep Space Industries are developing the technology for robotic asteroid prospecting and mining operations. NASA's OSIRIS-REx mission successfully collected samples from asteroid Bennu, demonstrating the feasibility of asteroid resource extraction. The economic potential is enormous, with some estimates suggesting asteroid mining could create the world's first trillionaire."
    },
    {
        "id": 10,
        "title": "Human Spaceflight and Mars Colonization",
        "content": "Human spaceflight to Mars represents the ultimate challenge in space exploration, requiring solutions to numerous technical, biological, and psychological obstacles. The journey to Mars takes 6-9 months each way, exposing astronauts to dangerous levels of cosmic radiation and prolonged microgravity effects including bone loss, muscle atrophy, and cardiovascular deconditioning. Psychological challenges include isolation, confinement, and delayed communication with Earth. Establishing a sustainable Mars colony requires local resource utilization (ISRU) for producing water, oxygen, and fuel from the Martian atmosphere and subsurface ice. NASA's Artemis program serves as a stepping stone, testing technologies and procedures on the Moon that will be essential for Mars missions. Private companies like SpaceX are developing the heavy-lift capabilities needed for Mars transportation, while space agencies worldwide collaborate on the technical challenges of keeping humans alive and productive on another planet."
    }
]

print(f"📚 Created enhanced knowledge base with {len(documents)} documents")
print("Sample document:")
print(f"Title: {documents[0]['title']}")
print(f"Content: {documents[0]['content'][:150]}...")

## 4. Cross-Encoder Reranking System Implementation

In [None]:
class CrossEncoderReranker:
    def __init__(self, 
                 bi_encoder_model: str = 'all-MiniLM-L6-v2',
                 cross_encoder_model: str = 'cross-encoder/ms-marco-MiniLM-L-6-v2'):
        """
        Initialize the cross-encoder reranker with both bi-encoder and cross-encoder models.
        
        Args:
            bi_encoder_model: Name of the bi-encoder model for initial retrieval
            cross_encoder_model: Name of the cross-encoder model for reranking
        """
        print(f"🔄 Loading bi-encoder model: {bi_encoder_model}")
        self.bi_encoder = SentenceTransformer(bi_encoder_model)
        
        print(f"🔄 Loading cross-encoder model: {cross_encoder_model}")
        self.cross_encoder = CrossEncoder(cross_encoder_model)
        
        self.embeddings = None
        self.index = None
        self.documents = None
        
    def build_index(self, documents: List[Dict]):
        """
        Build FAISS index from documents using bi-encoder.
        
        Args:
            documents: List of document dictionaries
        """
        print("🔧 Building document embeddings and FAISS index...")
        self.documents = documents
        
        # Combine title and content for better semantic representation
        texts = [f"{doc['title']}. {doc['content']}" for doc in documents]
        
        # Generate embeddings using bi-encoder
        self.embeddings = self.bi_encoder.encode(texts, show_progress_bar=True)
        
        # Build FAISS index
        dimension = self.embeddings.shape[1]
        self.index = faiss.IndexFlatIP(dimension)  # Inner product for cosine similarity
        
        # Normalize embeddings for cosine similarity
        normalized_embeddings = self.embeddings / np.linalg.norm(self.embeddings, axis=1, keepdims=True)
        self.index.add(normalized_embeddings.astype('float32'))
        
        print(f"✅ Index built with {len(documents)} documents (dimension: {dimension})")
    
    def initial_retrieval(self, query: str, k: int = 20) -> List[Tuple[Dict, float]]:
        """
        Perform initial retrieval using bi-encoder and FAISS.
        
        Args:
            query: Search query
            k: Number of documents to retrieve (higher for cross-encoder reranking)
            
        Returns:
            List of (document, score) tuples
        """
        # Encode query using bi-encoder
        query_embedding = self.bi_encoder.encode([query])
        query_embedding = query_embedding / np.linalg.norm(query_embedding, axis=1, keepdims=True)
        
        # Search; FAISS pads results with index -1 when k exceeds the number of indexed documents
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        
        # Return documents with scores, skipping the -1 padding
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx != -1:
                results.append((self.documents[idx], float(score)))
        
        return results
    
    def cross_encoder_rerank(self, query: str, retrieved_docs: List[Tuple[Dict, float]], 
                            top_k: int = 5) -> List[Tuple[Dict, float, float]]:
        """
        Rerank retrieved documents using cross-encoder model.
        
        Args:
            query: Original search query
            retrieved_docs: List of (document, initial_score) tuples
            top_k: Number of top documents to return after reranking
            
        Returns:
            List of (document, initial_score, cross_encoder_score) tuples
        """
        if not retrieved_docs:
            return []
        
        print(f"🔄 Cross-encoder reranking {len(retrieved_docs)} documents...")
        
        # Prepare query-document pairs for cross-encoder
        query_doc_pairs = []
        for doc, _ in retrieved_docs:
            # Use both title and content for cross-encoder evaluation
            doc_text = f"{doc['title']}. {doc['content']}"
            query_doc_pairs.append([query, doc_text])
        
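        # Get cross-encoder scores (for the ms-marco models these are unbounded
        # relevance logits rather than probabilities; higher means more relevant)
        cross_encoder_scores = self.cross_encoder.predict(query_doc_pairs)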
        
        # Combine with original documents and scores
        reranked_results = []
        for i, (doc, initial_score) in enumerate(retrieved_docs):
            cross_score = float(cross_encoder_scores[i])
            reranked_results.append((doc, initial_score, cross_score))
        
        # Sort by cross-encoder score (descending)
        reranked_results.sort(key=lambda x: x[2], reverse=True)
        
        print(f"✅ Reranking complete, returning top {top_k} documents")
        return reranked_results[:top_k]
    
    def bi_encoder_rerank(self, query: str, retrieved_docs: List[Tuple[Dict, float]], 
                         top_k: int = 5) -> List[Tuple[Dict, float, float]]:
        """
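        Rerank using the bi-encoder for comparison purposes.
        
        Because this uses the same model and cosine similarity as the FAISS
        retrieval step, it essentially reproduces the initial retrieval order;
        it serves as a baseline that shows what the cross-encoder changes.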
        
        Args:
            query: Original search query
            retrieved_docs: List of (document, initial_score) tuples
            top_k: Number of top documents to return after reranking
            
        Returns:
            List of (document, initial_score, bi_encoder_score) tuples
        """
        if not retrieved_docs:
            return []
        
        # Encode query
        query_embedding = self.bi_encoder.encode([query])
        
        # Encode retrieved documents
        doc_texts = [f"{doc['title']}. {doc['content']}" for doc, _ in retrieved_docs]
        doc_embeddings = self.bi_encoder.encode(doc_texts)
        
        # Compute semantic similarity scores
        similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
        
        # Combine with original documents and scores
        reranked_results = []
        for i, (doc, initial_score) in enumerate(retrieved_docs):
            bi_encoder_score = float(similarities[i])  # plain Python float, like the cross-encoder path
            reranked_results.append((doc, initial_score, bi_encoder_score))
        
        # Sort by bi-encoder score (descending)
        reranked_results.sort(key=lambda x: x[2], reverse=True)
        
        return reranked_results[:top_k]

print("✅ CrossEncoderReranker class defined!")
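`build_index` relies on the identity that the inner product of two L2-normalized vectors equals their cosine similarity, which is why `faiss.IndexFlatIP` over normalized embeddings performs cosine search. The numpy-only sketch below checks this on random vectors (stand-ins for real embeddings, an assumption for illustration) and shows the consequence for `bi_encoder_rerank`: re-scoring the candidates with cosine similarity from the same model cannot change their order.

```python
import numpy as np

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(10, 384))   # stand-ins for 384-dim MiniLM document embeddings
query_vec = rng.normal(size=(1, 384))   # stand-in for a query embedding


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# What IndexFlatIP computes on normalized inputs: a plain inner product
ip_scores = (l2_normalize(query_vec) @ l2_normalize(doc_vecs).T)[0]

# Cosine similarity computed directly from the raw (unnormalized) vectors
cos_scores = (doc_vecs @ query_vec[0]) / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec[0])
)

print(np.allclose(ip_scores, cos_scores))  # True
# Same scores means the same ranking: sorting by either gives identical order
print(np.argsort(-ip_scores), np.argsort(-cos_scores))
```

In the notebook this means the bi-encoder column in the comparisons is effectively the initial FAISS order, so ranking differences come from the cross-encoder.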

## 5. Advanced RAG System with Cross-Encoder Integration

In [None]:
class AdvancedRAGSystem:
    def __init__(self, reranker: CrossEncoderReranker, gemini_model):
        self.reranker = reranker
        self.gemini_model = gemini_model
    
    def generate_comprehensive_answer(self, query: str, context_docs: List[Dict], 
                                    ranking_method: str = "cross-encoder") -> Dict:
        """
        Generate answer using Gemini with retrieved context and ranking analysis.
        
        Args:
            query: User question
            context_docs: List of relevant documents
            ranking_method: Method used for ranking (for explanation)
            
        Returns:
            Dictionary containing answer and analysis
        """
        # Prepare context
        context_text = "\n\n".join([
            f"Document {i+1}: {doc['title']}\n{doc['content']}"
            for i, doc in enumerate(context_docs)
        ])
        
        # Create enhanced prompt for Gemini
        prompt = f"""
You are an expert space exploration assistant with access to a carefully curated knowledge base. The documents provided have been ranked using {ranking_method} reranking to ensure maximum relevance to the user's question.

Context Documents (ranked by relevance):
{context_text}

User Question: {query}

Please provide a comprehensive response with:

1. **Direct Answer:** A clear, concise answer to the user's specific question

2. **Detailed Explanation:** Expand on your answer with relevant details from the provided documents

3. **Source Analysis:** Explain which documents were most valuable for answering this question and why the {ranking_method} method likely ranked them highly

4. **Cross-References:** Identify connections between different documents that support or enhance your answer

5. **Knowledge Gaps:** Note any aspects of the question that aren't fully covered by the available documents

6. **Additional Context:** Provide any relevant background information that helps understand the broader implications of your answer

Format your response clearly with the section headers above.
"""
        
        try:
            response = self.gemini_model.generate_content(prompt)
            return {
                'answer': response.text,
                'context_docs': context_docs,
                'query': query,
                'ranking_method': ranking_method
            }
        except Exception as e:
            return {
                'answer': f"Error generating response: {str(e)}",
                'context_docs': context_docs,
                'query': query,
                'ranking_method': ranking_method
            }
    
    def process_query_with_comparison(self, query: str, retrieve_k: int = 15, rerank_k: int = 5) -> Dict:
        """
        Complete RAG pipeline with both cross-encoder and bi-encoder comparison.
        
        Args:
            query: User question
            retrieve_k: Number of documents to initially retrieve
            rerank_k: Number of documents to keep after reranking
            
        Returns:
            Complete response with cross-encoder results and comparison
        """
        print(f"🔍 Processing query: '{query}'")
        
        # Step 1: Initial retrieval
        print(f"📥 Retrieving top {retrieve_k} documents...")
        start_time = time.time()
        retrieved_docs = self.reranker.initial_retrieval(query, k=retrieve_k)
        retrieval_time = time.time() - start_time
        
        # Step 2: Cross-encoder reranking
        print(f"🎯 Cross-encoder reranking to top {rerank_k} documents...")
        start_time = time.time()
        cross_encoder_results = self.reranker.cross_encoder_rerank(query, retrieved_docs, top_k=rerank_k)
        cross_encoder_time = time.time() - start_time
        
        # Step 3: Bi-encoder reranking for comparison
        print(f"📊 Bi-encoder reranking for comparison...")
        start_time = time.time()
        bi_encoder_results = self.reranker.bi_encoder_rerank(query, retrieved_docs, top_k=rerank_k)
        bi_encoder_time = time.time() - start_time
        
        # Extract documents for context (using cross-encoder results)
        context_docs = [doc for doc, _, _ in cross_encoder_results]
        
        # Step 4: Generate answer with Gemini
        print("🤖 Generating comprehensive answer with Gemini...")
        start_time = time.time()
        result = self.generate_comprehensive_answer(query, context_docs, "cross-encoder")
        generation_time = time.time() - start_time
        
        # Add comprehensive analysis
        result['analysis'] = {
            'initial_retrieval_count': len(retrieved_docs),
            'cross_encoder_results': [{
                'title': doc['title'],
                'initial_score': initial_score,
                'cross_encoder_score': cross_score,
                'rank': i + 1
            } for i, (doc, initial_score, cross_score) in enumerate(cross_encoder_results)],
            'bi_encoder_results': [{
                'title': doc['title'],
                'initial_score': initial_score,
                'bi_encoder_score': bi_score,
                'rank': i + 1
            } for i, (doc, initial_score, bi_score) in enumerate(bi_encoder_results)],
            'timing': {
                'retrieval_time': retrieval_time,
                'cross_encoder_time': cross_encoder_time,
                'bi_encoder_time': bi_encoder_time,
                'generation_time': generation_time,
                'total_time': retrieval_time + cross_encoder_time + generation_time  # excludes the bi-encoder comparison pass
            }
        }
        
        print("✅ Query processing complete!")
        return result

print("✅ AdvancedRAGSystem class defined!")
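`process_query_with_comparison` times each stage with `time.time()`. For measuring short intervals, Python's `time.perf_counter()` is usually the better fit, since it is a monotonic, high-resolution clock meant for durations. A small context manager (`stage_timer` is a hypothetical helper, not part of the notebook) can keep the timing code out of the pipeline logic:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def stage_timer(timings: Dict[str, float], name: str) -> Iterator[None]:
    """Record the elapsed time of the enclosed block in timings[name] (seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    try:
        yield
    finally:
        # Stored even if the block raises, so partial timings stay visible
        timings[name] = time.perf_counter() - start


timings: Dict[str, float] = {}
with stage_timer(timings, "retrieval_time"):
    time.sleep(0.01)  # placeholder for reranker.initial_retrieval(...)
with stage_timer(timings, "generation_time"):
    sum(range(100_000))  # placeholder for the Gemini call
print({name: round(seconds, 4) for name, seconds in timings.items()})
```

The keys mirror the `timing` dictionary built in `process_query_with_comparison`, so the helper could replace the paired `start_time` / `time.time() - start_time` lines without changing the output format.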

## 6. Initialize and Build the Advanced System

In [None]:
# Initialize the cross-encoder reranker
print("🚀 Initializing Cross-Encoder Reranking System...")
cross_encoder_reranker = CrossEncoderReranker(
    bi_encoder_model='all-MiniLM-L6-v2',
    cross_encoder_model='cross-encoder/ms-marco-MiniLM-L-6-v2'
)

# Build the index with our enhanced documents
cross_encoder_reranker.build_index(documents)

# Initialize the complete advanced RAG system
advanced_rag_system = AdvancedRAGSystem(cross_encoder_reranker, gemini_model)

print("🚀 Advanced RAG System with Cross-Encoder Reranking is ready!")

## 7. Demo: Cross-Encoder vs Bi-Encoder Comparison

In [None]:
# Test queries that draw on several documents, where cross-encoder and bi-encoder rankings are more likely to differ
test_queries = [
    "How do space agencies search for signs of life beyond Earth?",
    "What are the main challenges of establishing a human presence on Mars?",
    "Compare the scientific value of space telescopes versus robotic missions to other planets"
]

def display_comprehensive_results(result: Dict):
    """
    Display results with detailed analysis and comparison.
    """
    print("=" * 90)
    print(f"🔍 QUERY: {result['query']}")
    print("=" * 90)
    
    # Show timing information
    timing = result['analysis']['timing']
    print(f"\n⏱️ PERFORMANCE METRICS:")
    print(f"   Retrieval Time:     {timing['retrieval_time']:.3f}s")
    print(f"   Cross-Encoder Time: {timing['cross_encoder_time']:.3f}s")
    print(f"   Bi-Encoder Time:    {timing['bi_encoder_time']:.3f}s")
    print(f"   Generation Time:    {timing['generation_time']:.3f}s")
    print(f"   Total Time:         {timing['total_time']:.3f}s")
    
    # Show ranking comparison
    print(f"\n📊 RANKING COMPARISON (Top {len(result['analysis']['cross_encoder_results'])}):")
    print("-" * 70)
    
    cross_results = result['analysis']['cross_encoder_results']
    bi_results = result['analysis']['bi_encoder_results']
    
    # Create comparison table
    print(f"{'Rank':<4} {'Cross-Encoder Results':<35} {'Bi-Encoder Results':<35}")
    print("-" * 70)
    
    for i in range(len(cross_results)):
        cross_title = cross_results[i]['title'][:30] + "..." if len(cross_results[i]['title']) > 30 else cross_results[i]['title']
        bi_title = bi_results[i]['title'][:30] + "..." if len(bi_results[i]['title']) > 30 else bi_results[i]['title']
        
        print(f"{i+1:<4} {cross_title:<35} {bi_title:<35}")
        print(f"     Score: {cross_results[i]['cross_encoder_score']:.4f}             Score: {bi_results[i]['bi_encoder_score']:.4f}")
        print()
    
    # Analyze ranking differences
    cross_titles = [r['title'] for r in cross_results]
    bi_titles = [r['title'] for r in bi_results]
    
    if cross_titles != bi_titles:
        print("✨ RANKING DIFFERENCES DETECTED!")
        print("The cross-encoder produced a different ranking from the bi-encoder.")
    else:
        print("➡️ Both methods produced the same ranking.")
    
    # Show Gemini's response
    print(f"\n🤖 GEMINI RESPONSE ({result['ranking_method'].upper()}):")
    print("-" * 70)
    print(result['answer'])
    print("\n" + "=" * 90 + "\n")

# Run demo with first query
demo_query = test_queries[0]
result = advanced_rag_system.process_query_with_comparison(demo_query, retrieve_k=12, rerank_k=4)
display_comprehensive_results(result)
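The side-by-side table shows *which* document each method puts at each rank. To see *how far* each document moved, a small helper (`rank_shifts`, hypothetical and not part of the notebook) can diff two ranked lists; a positive shift means the cross-encoder promoted the document:

```python
from typing import Dict, Hashable, Sequence


def rank_shifts(baseline: Sequence[Hashable], reranked: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each item present in both rankings to its position change.

    A positive value means the item moved up (closer to rank 1) in `reranked`
    compared with `baseline`; items missing from either list are skipped.
    """
    baseline_pos = {item: pos for pos, item in enumerate(baseline)}
    return {
        item: baseline_pos[item] - pos
        for pos, item in enumerate(reranked)
        if item in baseline_pos
    }


# Example with made-up document ids (not real notebook output)
bi_ids = [4, 1, 6, 8]
cross_ids = [1, 6, 4, 10]
print(rank_shifts(bi_ids, cross_ids))
```

Since `result['analysis']` stores titles rather than ids, the same helper works on title lists, e.g. `rank_shifts([r['title'] for r in result['analysis']['bi_encoder_results']], [r['title'] for r in result['analysis']['cross_encoder_results']])`.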

## 8. Detailed Cross-Encoder vs Bi-Encoder Analysis

In [None]:
def analyze_ranking_differences(query: str, k: int = 8):
    """
    Detailed analysis of how cross-encoder and bi-encoder rankings differ.
    """
    print(f"🔬 DETAILED RANKING ANALYSIS")
    print(f"Query: '{query}'\n")
    
    # Get initial retrieval results
    initial_results = cross_encoder_reranker.initial_retrieval(query, k=k*2)
    
    # Get both reranking results
    cross_results = cross_encoder_reranker.cross_encoder_rerank(query, initial_results, top_k=k)
    bi_results = cross_encoder_reranker.bi_encoder_rerank(query, initial_results, top_k=k)
    
    # Create detailed comparison
    print("📋 DOCUMENT-BY-DOCUMENT COMPARISON:")
    print("=" * 80)
    
    for i in range(min(len(cross_results), len(bi_results))):  # the corpus may hold fewer than k documents
        cross_doc, cross_initial, cross_score = cross_results[i]
        bi_doc, bi_initial, bi_score = bi_results[i]
        
        print(f"\n📄 RANK {i+1}:")
        print(f"Cross-Encoder: {cross_doc['title']}")
        print(f"   Score: {cross_score:.4f} (Initial: {cross_initial:.4f})")
        
        print(f"Bi-Encoder:    {bi_doc['title']}")
        print(f"   Score: {bi_score:.4f} (Initial: {bi_initial:.4f})")
        
        if cross_doc['id'] == bi_doc['id']:
            print("   ✅ Same document selected by both methods")
        else:
            print("   🔄 Different documents selected")
    
    # Calculate ranking similarity metrics
    cross_ids = [doc['id'] for doc, _, _ in cross_results]
    bi_ids = [doc['id'] for doc, _, _ in bi_results]
    
    # Overlap at different positions
    overlap_metrics = {}
    for pos in [1, 3, 5]:
        if pos <= len(cross_ids):
            cross_top = set(cross_ids[:pos])
            bi_top = set(bi_ids[:pos])
            overlap = len(cross_top.intersection(bi_top)) / pos
            overlap_metrics[f'top_{pos}'] = overlap
    
    print(f"\n📊 RANKING SIMILARITY METRICS:")
    for metric, value in overlap_metrics.items():
        print(f"   {metric.replace('_', ' ').title()} Overlap: {value:.2%}")
    
    # Score distribution analysis
    cross_scores = [score for _, _, score in cross_results]
    bi_scores = [score for _, _, score in bi_results]
    
    print(f"\n📈 SCORE DISTRIBUTION:")
    print(f"   Cross-Encoder: Mean={np.mean(cross_scores):.4f}, Std={np.std(cross_scores):.4f}")
    print(f"   Bi-Encoder:    Mean={np.mean(bi_scores):.4f}, Std={np.std(bi_scores):.4f}")
    
    return {
        'cross_results': cross_results,
        'bi_results': bi_results,
        'overlap_metrics': overlap_metrics
    }

# Analyze ranking differences for different query types
for query in test_queries[:2]:
    analysis = analyze_ranking_differences(query)
    print("\n" + "="*80 + "\n")
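`analyze_ranking_differences` reports top-k set overlap, which ignores order within the top k: two rankings with the same documents in reverse order still score 100%. A rank correlation such as Kendall's tau captures ordering. The sketch below is a plain-Python tau-a computed only over documents that appear in both lists (a simplifying assumption; documents unique to one list are ignored):

```python
from itertools import combinations
from typing import Hashable, Optional, Sequence


def kendall_tau(ranking_a: Sequence[Hashable], ranking_b: Sequence[Hashable]) -> Optional[float]:
    """Kendall's tau-a over items shared by both rankings (no ties assumed).

    Returns 1.0 for identical order, -1.0 for reversed order, and None when
    fewer than two items are shared.
    """
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    shared = [item for item in ranking_a if item in pos_b]  # kept in ranking_a order
    n = len(shared)
    if n < 2:
        return None
    concordant = discordant = 0
    # combinations() preserves order, so x precedes y in ranking_a for every pair
    for x, y in combinations(shared, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

Inside `analyze_ranking_differences`, calling `kendall_tau(bi_ids, cross_ids)` next to the overlap metrics would report how consistently the two methods order the documents they agree on.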

## 9. Performance and Quality Metrics
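This section compares cross-encoder and bi-encoder scores directly, but they live on different scales: bi-encoder scores are cosine similarities in [-1, 1], while the ms-marco cross-encoders typically return unbounded logits. If you want comparable numbers, one option (an assumption about what you want to compare, not something the notebook does) is to squash the logits with a sigmoid, or to min-max scale each method's scores per query. Both transforms are monotonic, so they leave the ranking unchanged:

```python
import numpy as np


def sigmoid(logits: np.ndarray) -> np.ndarray:
    """Map unbounded logits to (0, 1); order-preserving."""
    return 1.0 / (1.0 + np.exp(-logits))


def min_max(scores: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] within one result list; constant input maps to 0."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores, dtype=float)
    return (scores - lo) / (hi - lo)


# Illustrative values only, not real model output
cross_logits = np.array([7.9, 2.3, -4.1, -10.6])
bi_cosines = np.array([0.62, 0.55, 0.41, 0.38])

print(sigmoid(cross_logits).round(3))
print(min_max(cross_logits).round(3), min_max(bi_cosines).round(3))
```

Min-max scaling is relative to each query's candidates, so it suits comparing score *spread* between methods; the sigmoid keeps an absolute reading of each cross-encoder logit.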

In [None]:
def comprehensive_performance_analysis(queries: List[str]):
    """
    Comprehensive analysis of cross-encoder vs bi-encoder performance.
    """
    print("🔬 COMPREHENSIVE PERFORMANCE ANALYSIS")
    print("=" * 60)
    
    results = {
        'queries': queries,
        'cross_encoder_times': [],
        'bi_encoder_times': [],
        'retrieval_times': [],
        'overlap_scores': [],
        'score_differences': []
    }
    
    for i, query in enumerate(queries):
        print(f"\nProcessing query {i+1}/{len(queries)}: '{query[:60]}...'")
        
        # Measure retrieval time
        start_time = time.time()
        retrieved_docs = cross_encoder_reranker.initial_retrieval(query, k=15)
        retrieval_time = time.time() - start_time
        results['retrieval_times'].append(retrieval_time)
        
        # Measure cross-encoder time
        start_time = time.time()
        cross_results = cross_encoder_reranker.cross_encoder_rerank(query, retrieved_docs, top_k=5)
        cross_encoder_time = time.time() - start_time
        results['cross_encoder_times'].append(cross_encoder_time)
        
        # Measure bi-encoder time
        start_time = time.time()
        bi_results = cross_encoder_reranker.bi_encoder_rerank(query, retrieved_docs, top_k=5)
        bi_encoder_time = time.time() - start_time
        results['bi_encoder_times'].append(bi_encoder_time)
        
        # Calculate overlap
        cross_ids = set([doc['id'] for doc, _, _ in cross_results])
        bi_ids = set([doc['id'] for doc, _, _ in bi_results])
        overlap = len(cross_ids.intersection(bi_ids)) / len(cross_ids)
        results['overlap_scores'].append(overlap)
        
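        # Score differences. Note: cross-encoder scores are raw logits while
        # bi-encoder scores are cosine similarities, so this mixes two scales
        # and is only a rough indicator, not a quality measure.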
        cross_scores = [score for _, _, score in cross_results]
        bi_scores = [score for _, _, score in bi_results]
        score_diff = np.mean(cross_scores) - np.mean(bi_scores)
        results['score_differences'].append(score_diff)
        
        print(f"  Retrieval: {retrieval_time:.3f}s | Cross-Encoder: {cross_encoder_time:.3f}s | Bi-Encoder: {bi_encoder_time:.3f}s")
    
    # Summary statistics
    print(f"\n📊 SUMMARY STATISTICS:")
    print("-" * 40)
    print(f"Queries processed: {len(queries)}")
    print(f"\n⏱️ TIMING ANALYSIS:")
    print(f"   Avg Retrieval Time:     {np.mean(results['retrieval_times']):.3f}s")
    print(f"   Avg Cross-Encoder Time: {np.mean(results['cross_encoder_times']):.3f}s")
    print(f"   Avg Bi-Encoder Time:    {np.mean(results['bi_encoder_times']):.3f}s")
    print(f"   Cross-Encoder Overhead: {np.mean(results['cross_encoder_times']) / np.mean(results['bi_encoder_times']):.1f}x")
    
    print(f"\n🎯 RANKING COMPARISON:")
    print(f"   Avg Ranking Overlap:    {np.mean(results['overlap_scores']):.2%}")
    print(f"   Avg Score Difference:   {np.mean(results['score_differences']):.4f}")
    print(f"   Score Std Deviation:    {np.std(results['score_differences']):.4f}")
    
    # Recommendations
    print(f"\n💡 RECOMMENDATIONS:")
    avg_overlap = np.mean(results['overlap_scores'])
    avg_overhead = np.mean(results['cross_encoder_times']) / np.mean(results['bi_encoder_times'])
    
    if avg_overlap < 0.7:
        print(f"   ✅ Cross-encoder reorders results substantially ({(1-avg_overlap):.1%} of top-k documents differ); confirm gains with relevance labels")
    else:
        print(f"   ⚠️ High overlap ({avg_overlap:.1%}) - consider if cross-encoder overhead is justified")
    
    if avg_overhead > 10:
        print(f"   ⚠️ High computational overhead ({avg_overhead:.1f}x) - consider for high-value queries only")
    else:
        print(f"   ✅ Reasonable computational overhead ({avg_overhead:.1f}x)")
    
    return results

# Extended test queries for comprehensive analysis
extended_queries = [
    "How do space agencies search for signs of life beyond Earth?",
    "What are the main challenges of establishing a human presence on Mars?",
    "Compare the scientific value of space telescopes versus robotic missions",
    "What role does the International Space Station play in preparing for Mars missions?",
    "How has SpaceX changed the economics of space exploration?"
]

# Run comprehensive analysis
performance_results = comprehensive_performance_analysis(extended_queries)
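Each timing in `comprehensive_performance_analysis` comes from a single call. This means the first query can absorb one-off warm-up costs, and the averages are sensitive to noise. A common remedy is to discard a few warm-up runs and report the median of several repeats. The helper below is a hypothetical sketch that uses only the standard library. The workload is a stand-in, and wrapping the notebook's reranker calls in a lambda is an assumption about how you would use it.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 1, repeats: int = 5) -> Dict[str, float]:
    """Time a zero-argument callable with time.perf_counter.

    Warm-up calls run first and are discarded, so one-off costs (for example
    lazy initialization on the first inference) do not inflate the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


# Stand-in workload; in the notebook this might be, e.g. (assumption):
# lambda: cross_encoder_reranker.cross_encoder_rerank(query, retrieved_docs, top_k=5)
stats = benchmark(lambda: sum(i * i for i in range(100_000)), warmup=1, repeats=5)
print(stats)
```

Reporting the median rather than the mean keeps a single slow outlier (garbage collection, a background process) from dominating the comparison.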

## 10. Visualization of Results

In [None]:
# Create visualizations to compare cross-encoder vs bi-encoder performance
plt.style.use('default')
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Cross-Encoder vs Bi-Encoder Reranking Analysis', fontsize=16, fontweight='bold')

# 1. Timing Comparison
ax1 = axes[0, 0]
methods = ['Retrieval', 'Bi-Encoder', 'Cross-Encoder']
times = [
    np.mean(performance_results['retrieval_times']),
    np.mean(performance_results['bi_encoder_times']),
    np.mean(performance_results['cross_encoder_times'])
]
colors = ['#3498db', '#2ecc71', '#e74c3c']
bars = ax1.bar(methods, times, color=colors, alpha=0.7)
ax1.set_ylabel('Time (seconds)')
ax1.set_title('Average Processing Time by Method')
ax1.grid(True, alpha=0.3)

# Add value labels on bars
for bar, t in zip(bars, times):  # 't' avoids shadowing the imported time module
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.001,
             f'{t:.3f}s', ha='center', va='bottom')

# 2. Ranking Overlap Distribution
ax2 = axes[0, 1]
ax2.hist(performance_results['overlap_scores'], bins=10, color='#9b59b6', alpha=0.7, edgecolor='black')
ax2.set_xlabel('Ranking Overlap (0-1)')
ax2.set_ylabel('Frequency')
ax2.set_title('Distribution of Ranking Overlap Scores')
ax2.axvline(np.mean(performance_results['overlap_scores']), color='red', linestyle='--', 
           label=f'Mean: {np.mean(performance_results["overlap_scores"]):.2f}')
ax2.legend()
ax2.grid(True, alpha=0.3)

# 3. Score Differences
ax3 = axes[1, 0]
query_nums = range(1, len(performance_results['score_differences']) + 1)
ax3.plot(query_nums, performance_results['score_differences'], 'o-', color='#f39c12', linewidth=2, markersize=8)
ax3.set_xlabel('Query Number')
ax3.set_ylabel('Score Difference (Cross - Bi)')
ax3.set_title('Score Differences Across Queries')
ax3.grid(True, alpha=0.3)
ax3.axhline(0, color='black', linestyle='-', alpha=0.5)

# 4. Efficiency vs Ranking Divergence
ax4 = axes[1, 1]
efficiency = [1/t for t in performance_results['cross_encoder_times']]  # Reranks per second
divergence = [(1 - overlap) for overlap in performance_results['overlap_scores']]  # Fraction of top-k that differs

scatter = ax4.scatter(efficiency, divergence, c=performance_results['score_differences'], 
                     cmap='viridis', s=100, alpha=0.7)
ax4.set_xlabel('Efficiency (1/time, reranks per second)')
ax4.set_ylabel('Ranking Divergence (1 - overlap)')
ax4.set_title('Efficiency vs Ranking Divergence')
ax4.grid(True, alpha=0.3)
plt.colorbar(scatter, ax=ax4, label='Score Difference')

plt.tight_layout()
plt.show()

print("📊 Visualization complete! Key insights:")
print(f"   • Cross-encoder is {np.mean(performance_results['cross_encoder_times'])/np.mean(performance_results['bi_encoder_times']):.1f}x slower but provides different rankings")
print(f"   • Average ranking overlap: {np.mean(performance_results['overlap_scores']):.1%}")
print(f"   • Mean cross-encoder scores are {'higher' if np.mean(performance_results['score_differences']) > 0 else 'lower'} than mean bi-encoder scores (different scales, so not directly comparable)")
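The "Score Difference" panel and summary subtract the mean bi-encoder similarity from the mean cross-encoder score. However, the two scores live on different scales. Bi-encoder cosine similarity is bounded, while cross-encoder outputs may be raw logits or sigmoid probabilities, depending on the model and library version. A scale-free way to compare the two methods is rank correlation over the same candidates. The sketch below computes Spearman's rho in pure Python, assuming no tied scores. The scores are made up for illustration.

```python
from typing import Dict, Hashable


def ranks(scores: Dict[Hashable, float]) -> Dict[Hashable, int]:
    """Map each document ID to its rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: position + 1 for position, doc_id in enumerate(ordered)}


def spearman_rho(a: Dict[Hashable, float], b: Dict[Hashable, float]) -> float:
    """Spearman rank correlation over the shared document IDs (assumes no ties)."""
    shared = set(a) & set(b)
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared documents")
    ra = ranks({d: a[d] for d in shared})
    rb = ranks({d: b[d] for d in shared})
    d_squared = sum((ra[d] - rb[d]) ** 2 for d in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for the same four candidates
cross_scores = {"doc1": 8.2, "doc2": -1.5, "doc3": 3.1, "doc4": 0.4}  # logit-like scale
bi_scores = {"doc1": 0.71, "doc2": 0.52, "doc3": 0.66, "doc4": 0.60}  # cosine scale

print(spearman_rho(cross_scores, bi_scores))  # 1.0
```

Here the raw means differ by almost 2 points, yet the rankings are identical (rho = 1.0). In the notebook, scoring the full `retrieved_docs` list with both methods, rather than comparing two separate top-5 lists, would give the shared candidate set this function expects.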

## 11. Interactive Cross-Encoder Demo

In [None]:
def interactive_cross_encoder_demo():
    """
    Interactive interface for testing cross-encoder reranking.
    """
    print("🚀 Interactive Cross-Encoder RAG System")
    print("Ask questions about space exploration and see cross-encoder reranking in action!")
    print("\nFeatures:")
    print("  • Real-time cross-encoder vs bi-encoder comparison")
    print("  • Detailed scoring and timing analysis")
    print("  • Comprehensive answers from Gemini")
    print("\nType 'quit' to exit, 'help' for commands\n")
    
    while True:
        try:
            query = input("🔍 Your question: ").strip()
            
            if query.lower() in ['quit', 'exit', 'q']:
                print("👋 Thanks for exploring cross-encoder reranking!")
                break
            
            if query.lower() == 'help':
                print("\n📖 Available commands:")
                print("  • Just type your question to get a full analysis")
                print("  • 'quit' or 'exit' to stop")
                print("  • 'help' to see this message\n")
                continue
            
            if not query:
                continue
            
            # Process the query with full comparison
            result = advanced_rag_system.process_query_with_comparison(query, retrieve_k=12, rerank_k=4)
            display_comprehensive_results(result)
            
        except KeyboardInterrupt:
            print("\n👋 Thanks for exploring cross-encoder reranking!")
            break
        except Exception as e:
            print(f"❌ Error: {str(e)}")

# Uncomment the next line to run the interactive interface
# interactive_cross_encoder_demo()

## 12. Final Demonstration: Complex Multi-Aspect Query

In [None]:
# Final comprehensive demonstration with a complex query
complex_query = """Analyze the technological and scientific progression from Apollo missions to current Mars 
exploration efforts, highlighting how lessons learned from lunar exploration inform current strategies 
for finding life on Mars and establishing sustainable human presence on other worlds."""

print("🎯 FINAL DEMONSTRATION: Complex Multi-Aspect Query")
print("=" * 80)
print(f"Query: {complex_query}")
print("=" * 80)

# Process with maximum context for comprehensive analysis
final_result = advanced_rag_system.process_query_with_comparison(
    complex_query, 
    retrieve_k=len(documents), 
    rerank_k=6
)

display_comprehensive_results(final_result)

print("🎉 CROSS-ENCODER RERANKING DEMO COMPLETE!")
print("\n📋 What we've demonstrated:")
print("✅ Cross-encoder vs bi-encoder reranking comparison")
print("✅ Joint query-document relevance scoring with a cross-encoder")
print("✅ Latency vs ranking-divergence trade-off analysis")
print("✅ Real-world computational cost considerations")
print("✅ Integration with Gemini for answer generation")
print("✅ Evaluation metrics and visualizations")

print("\n🔍 Key Insights:")
print("• Cross-encoders score each query-document pair jointly, allowing finer-grained relevance judgments")
print("• They cost more compute per candidate, so they suit applications where ranking accuracy matters most")
print("• Ranking differences show where the cross-encoder changes results; relevance labels are needed to confirm improvements")
print("• Best suited for high-value queries where accuracy is paramount")

print("\n🚀 Ready for production considerations:")
print("• Use cross-encoders for final reranking of top candidates")
print("• Implement caching for repeated queries")
print("• Consider hybrid approaches based on query complexity")
print("• Monitor user satisfaction to validate ranking improvements")
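The production checklist above suggests caching results for repeated queries. One minimal way to do this for a cross-encoder is to cache scores per (query, document ID) pair and send only the uncached pairs to the model in a single batch. The class below is a hypothetical sketch. It assumes each document dict has `id` and `text` keys, so adjust these to the notebook's field names. `toy_scorer` stands in for a real scorer, such as a `CrossEncoder` instance's `predict` method, which accepts a list of (query, text) pairs.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


class CachedPairScorer:
    """Caches relevance scores per (query, document ID) so repeated queries skip the model."""

    def __init__(self, score_pairs: PairScorer):
        self.score_pairs = score_pairs
        self.cache: Dict[Tuple[str, Hashable], float] = {}
        self.model_calls = 0  # number of batched calls actually sent to the scorer

    def score(self, query: str, docs: List[dict]) -> List[float]:
        # Only documents not yet scored for this exact query go to the model
        missing = [d for d in docs if (query, d["id"]) not in self.cache]
        if missing:
            self.model_calls += 1
            new_scores = self.score_pairs([(query, d["text"]) for d in missing])
            for d, s in zip(missing, new_scores):
                self.cache[(query, d["id"])] = float(s)
        return [self.cache[(query, d["id"])] for d in docs]


def toy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    # Stand-in for a cross-encoder: counts lowercase words shared by query and text
    return [float(len(set(q.lower().split()) & set(t.lower().split()))) for q, t in pairs]


docs = [
    {"id": 1, "text": "Mars rovers search for life"},
    {"id": 2, "text": "The ISS orbits Earth"},
]
scorer = CachedPairScorer(toy_scorer)
first = scorer.score("life on Mars", docs)
second = scorer.score("life on Mars", docs)  # served entirely from the cache
print(first, second, scorer.model_calls)     # [2.0, 0.0] [2.0, 0.0] 1
```

Keying on the exact query string keeps the sketch simple. In practice you may also want an eviction policy (for example, a size-bounded LRU) so the cache cannot grow without limit.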

In [None]:
# Additional demonstration: a shorter comparative query
complex_query = "Compare the exploration strategies for Mars and Europa, focusing on the search for life"

print("🎯 ADDITIONAL DEMONSTRATION: Comparative Query")
print("=" * 60)

final_result = advanced_rag_system.process_query_with_comparison(complex_query, retrieve_k=8, rerank_k=4)
display_comprehensive_results(final_result)
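The production checklist also mentions hybrid approaches based on query complexity. A simple version routes each query through the bi-encoder first. It then calls the cross-encoder only when the query looks complex or the bi-encoder's ranking is ambiguous. The rule below is a hypothetical heuristic, not part of this notebook's system. The cue words, the 12-word cutoff, and the 0.05 margin are illustrative assumptions that would need tuning against real queries.

```python
from typing import Sequence

# Hypothetical cue words suggesting a comparative or analytical query
COMPARISON_CUES = {"compare", "comparison", "versus", "vs", "analyze", "difference", "differences"}


def should_use_cross_encoder(query: str, bi_scores: Sequence[float],
                             min_words: int = 12, margin_threshold: float = 0.05) -> bool:
    """Hypothetical routing rule for a hybrid reranker.

    Returns True when the query looks complex (long or comparative) or when the
    bi-encoder's top two candidates are nearly tied, i.e. the cheap ranking is ambiguous.
    """
    words = [w.strip("?,.!;:") for w in query.lower().split()]
    if len(words) >= min_words:
        return True
    if any(w in COMPARISON_CUES for w in words):
        return True
    top = sorted(bi_scores, reverse=True)
    return len(top) >= 2 and top[0] - top[1] < margin_threshold


print(should_use_cross_encoder("What is the ISS?", [0.82, 0.55]))  # False
print(should_use_cross_encoder("What is the ISS?", [0.61, 0.60]))  # True
```

The margin check reflects the reasoning in this notebook: the cross-encoder is most useful where the bi-encoder's ordering is uncertain, and least useful where one candidate already stands out clearly.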