[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextualAI/examples/blob/main/18-contextualai-chroma/02-contextual-ai-reranker-chroma.ipynb)

# Using Contextual AI Reranker with Chroma

**Last updated:** October 2025

Contextual AI describes its reranker as the first with instruction-following capabilities, which let it resolve conflicts between retrieved documents, and reports state-of-the-art accuracy on benchmarks such as BEIR. This notebook demonstrates how to integrate the reranker with Chroma for enhanced RAG pipelines.

**Key Features:**
- **Instruction-following reranking**: Handle complex retrieval scenarios with custom instructions
- **BEIR benchmark-leading accuracy**: State-of-the-art reranking performance
- **Multi-lingual support**: Handle documents in multiple languages
- **Chroma integration**: Seamless vector database integration for retrieval + reranking

The current reranker models include:
- `ctxl-rerank-v2-instruct-multilingual`
- `ctxl-rerank-v2-instruct-multilingual-mini`
- `ctxl-rerank-v1-instruct`

## Installation and Setup

First, let's install the required packages and set up our environment.


In [None]:
%%capture
%pip install --upgrade chromadb contextual-client openai requests rich

import warnings
warnings.filterwarnings("ignore")

import logging
# Suppress Chroma client logs
logging.getLogger("chromadb").setLevel(logging.ERROR)


### API Keys Setup 🔑

We'll be using the Contextual AI API for reranking and OpenAI API for embeddings. The code below dynamically fetches your API keys based on whether you're running this notebook in Google Colab or as a regular Jupyter notebook.


In [None]:
# API key variable names
contextual_api_key_var = "CONTEXTUAL_API_KEY"  # Replace with the name of your secret/env var
openai_api_key_var = "OPENAI_API_KEY"  # Replace with the name of your secret/env var

# Fetch API keys
try:
    # If running in Colab, fetch API keys from Secrets
    import google.colab
    from google.colab import userdata
    contextual_api_key = userdata.get(contextual_api_key_var)
    openai_api_key = userdata.get(openai_api_key_var)
    
    if not contextual_api_key:
        raise ValueError(f"Secret '{contextual_api_key_var}' not found in Colab secrets.")
    if not openai_api_key:
        raise ValueError(f"Secret '{openai_api_key_var}' not found in Colab secrets.")
except ImportError:
    # If not running in Colab, fetch API keys from environment variables
    import os
    contextual_api_key = os.getenv(contextual_api_key_var)
    openai_api_key = os.getenv(openai_api_key_var)
    
    if not contextual_api_key:
        raise EnvironmentError(
            f"Environment variable '{contextual_api_key_var}' is not set. "
            "Please define it before running this script."
        )
    if not openai_api_key:
        raise EnvironmentError(
            f"Environment variable '{openai_api_key_var}' is not set. "
            "Please define it before running this script."
        )

print("API keys configured successfully!")


## Part 1: Setup Chroma with Sample Data

Let's create a Chroma collection with sample enterprise documents to demonstrate the reranking capabilities.


In [None]:
import chromadb
from chromadb.utils import embedding_functions
from contextual import ContextualAI
from rich.console import Console
from rich.panel import Panel
from rich.table import Table

# Initialize clients
contextual_client = ContextualAI(api_key=contextual_api_key)
chroma_client = chromadb.Client()

# Use OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=openai_api_key,
    model_name="text-embedding-3-small"
)

# Create the collection (get_or_create keeps this cell safe to re-run)
collection_name = "enterprise_documents"
collection = chroma_client.get_or_create_collection(
    name=collection_name,
    embedding_function=openai_ef
)

print(f"Collection '{collection_name}' is ready with OpenAI embeddings")


In [None]:
# Sample enterprise documents with different types and dates
sample_documents = [
    {
        "content": "Following detailed cost analysis and market research, we have implemented the following changes: AI training clusters will see a 15% uplift in raw compute performance, enterprise support packages are being restructured, and bulk procurement programs (100+ units) for the RTX 5090 Enterprise series will operate on a $2,899 baseline.",
        "metadata": {
            "title": "Enterprise GPU Pricing Update",
            "date": "2025-01-15",
            "source": "NVIDIA Enterprise Sales Portal",
            "classification": "Internal Use Only",
            "department": "Sales"
        }
    },
    {
        "content": "Enterprise pricing for the RTX 5090 GPU bulk orders (100+ units) is currently set at $3,100-$3,300 per unit. This pricing for RTX 5090 enterprise bulk orders has been confirmed across all major distribution channels.",
        "metadata": {
            "title": "Market Analysis Report",
            "date": "2023-11-30",
            "source": "TechAnalytics Research Group",
            "classification": "Public",
            "department": "Research"
        }
    },
    {
        "content": "RTX 5090 Enterprise GPU requires 450W TDP and 20% cooling overhead. Power consumption analysis shows optimal performance at 85% utilization with enterprise-grade cooling solutions.",
        "metadata": {
            "title": "Technical Specifications",
            "date": "2025-01-25",
            "source": "NVIDIA Enterprise Sales Portal",
            "classification": "Internal Use Only",
            "department": "Engineering"
        }
    },
    {
        "content": "Our enterprise customers have reported significant performance improvements with the RTX 5090 in AI workloads. Training times reduced by 40% compared to previous generation GPUs.",
        "metadata": {
            "title": "Customer Performance Report",
            "date": "2025-01-10",
            "source": "Customer Success Team",
            "classification": "Confidential",
            "department": "Customer Success"
        }
    },
    {
        "content": "The RTX 5090 represents a breakthrough in enterprise AI computing. With 128GB of HBM3e memory and 2.5x faster training performance, it's designed for the most demanding AI workloads.",
        "metadata": {
            "title": "Product Launch Announcement",
            "date": "2024-12-01",
            "source": "Marketing Department",
            "classification": "Public",
            "department": "Marketing"
        }
    },
    {
        "content": "Internal memo: RTX 5090 enterprise pricing strategy has been revised. New baseline pricing effective January 15, 2025: $2,899 for bulk orders (100+ units), $3,200 for standard enterprise orders.",
        "metadata": {
            "title": "Internal Pricing Memo",
            "date": "2025-01-12",
            "source": "Executive Team",
            "classification": "Internal Use Only",
            "department": "Executive"
        }
    }
]

# Add documents to Chroma
documents = [doc["content"] for doc in sample_documents]
metadatas = [doc["metadata"] for doc in sample_documents]
ids = [f"doc_{i}" for i in range(len(sample_documents))]

# upsert (rather than add) keeps this cell safe to re-run with the same ids
collection.upsert(
    documents=documents,
    metadatas=metadatas,
    ids=ids
)

print(f"Added {len(sample_documents)} documents to Chroma collection")


## Part 2: Basic Retrieval vs. Reranked Retrieval

Let's demonstrate the difference between basic Chroma retrieval and Contextual AI's instruction-following reranking.


In [None]:
# Query and instruction for reranking
query = "What is the current enterprise pricing for the RTX 5090 GPU for bulk orders?"

instruction = "Prioritize internal sales documents over market analysis reports. More recent documents should be weighted higher. Enterprise portal content supersedes distributor communications."

print(f"Query: {query}")
print(f"Instruction: {instruction}")
print("\n" + "="*80)


In [None]:
# Step 1: Basic Chroma retrieval
print("🔍 BASIC CHROMA RETRIEVAL")
print("="*50)

# Retrieve more documents than we need for reranking
chroma_results = collection.query(
    query_texts=[query],
    n_results=6,  # Get all documents for reranking
    include=["documents", "metadatas", "distances"]
)

print(f"Retrieved {len(chroma_results['documents'][0])} documents from Chroma")
print("\nChroma Results (ordered by similarity):")
for i, (doc, metadata, distance) in enumerate(zip(
    chroma_results['documents'][0], 
    chroma_results['metadatas'][0], 
    chroma_results['distances'][0]
)):
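    # Chroma's default space is squared L2, so 1 - distance is not a similarity;
    # report the raw distance instead (lower means closer)
    print(f"\n{i+1}. {metadata['title']} (Distance: {distance:.3f})")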
    print(f"   Source: {metadata['source']} | Date: {metadata['date']}")
    print(f"   Classification: {metadata['classification']}")
    print(f"   Content: {doc[:100]}...")
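
A note on reading the distances returned by the query: assuming the collection uses Chroma's default `l2` space (squared Euclidean distance) and the embeddings are unit-length, as OpenAI's embedding vectors are, a distance `d` corresponds to a cosine similarity of `1 - d / 2`. The sketch below checks that identity on toy vectors. The helper names are illustrative and not part of Chroma.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity computed directly from the definition."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2_distance_to_cosine(distance):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - distance / 2.0

# Toy unit vectors standing in for real embeddings
vec_a = normalize([1.0, 2.0, 3.0])
vec_b = normalize([2.0, 1.0, 0.5])

d = squared_l2(vec_a, vec_b)
print(f"squared L2 distance:    {d:.4f}")
print(f"cosine (direct):        {cosine_similarity(vec_a, vec_b):.4f}")
print(f"cosine (from distance): {l2_distance_to_cosine(d):.4f}")
```

If the collection is created with a different distance setting, the conversion no longer applies.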


In [None]:
# Step 2: Contextual AI Reranking
print("\n\n🎯 CONTEXTUAL AI RERANKING")
print("="*50)

# Prepare documents and metadata for reranking
documents_to_rerank = chroma_results['documents'][0]
metadata_for_rerank = [str(meta) for meta in chroma_results['metadatas'][0]]

# Apply Contextual AI reranking with instruction
rerank_response = contextual_client.rerank.create(
    query=query,
    instruction=instruction,
    documents=documents_to_rerank,
    metadata=metadata_for_rerank,
    model="ctxl-rerank-v2-instruct-multilingual"
)

print(f"Reranked {len(rerank_response.results)} documents using instruction-following reranking")
print("\nReranked Results (ordered by relevance + instruction):")
for i, result in enumerate(rerank_response.results):
    original_index = result.index
    original_metadata = chroma_results['metadatas'][0][original_index]
    original_doc = chroma_results['documents'][0][original_index]
    
    print(f"\n{i+1}. {original_metadata['title']} (Score: {result.relevance_score:.3f})")
    print(f"   Source: {original_metadata['source']} | Date: {original_metadata['date']}")
    print(f"   Classification: {original_metadata['classification']}")
    print(f"   Content: {original_doc[:100]}...")
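
Each reranker result refers to its input document by position, so every result has to be joined back to the Chroma output before it can be displayed or passed on. A minimal sketch of that join as a reusable helper follows. `RerankResult` is a stand-in dataclass so the example runs offline; it mirrors only the two attributes this notebook reads (`index` and `relevance_score`). `attach_metadata` and the toy data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RerankResult:
    """Stand-in for one reranker result (only the fields this notebook reads)."""
    index: int
    relevance_score: float

def attach_metadata(results, documents, metadatas, top_k=None):
    """Join reranker results back to the retrieved documents and metadata."""
    ranked = []
    for result in results[:top_k]:  # top_k=None keeps every result
        ranked.append({
            "document": documents[result.index],
            "metadata": metadatas[result.index],
            "score": result.relevance_score,
        })
    return ranked

# Toy data in the same shape as chroma_results['documents'][0] / ['metadatas'][0]
toy_documents = ["old market report", "new internal memo", "tech specs"]
toy_metadatas = [
    {"title": "Market Analysis Report"},
    {"title": "Internal Pricing Memo"},
    {"title": "Technical Specifications"},
]
toy_results = [RerankResult(1, 0.92), RerankResult(0, 0.41), RerankResult(2, 0.08)]

for rank, item in enumerate(attach_metadata(toy_results, toy_documents, toy_metadatas, top_k=2), start=1):
    print(f"{rank}. {item['metadata']['title']} (Score: {item['score']:.3f})")
```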


## Part 3: Complete RAG Pipeline with Reranking

Now let's demonstrate a complete RAG pipeline that combines Chroma retrieval, Contextual AI reranking, and LLM generation.


In [None]:
from openai import OpenAI

# Initialize OpenAI client
openai_client = OpenAI(api_key=openai_api_key)

def complete_rag_pipeline(query, instruction, top_k=3):
    """
    Complete RAG pipeline: Chroma retrieval + Contextual AI reranking + LLM generation
    """
    console = Console()
    
    # Step 1: Retrieve from Chroma
    console.print(Panel("Step 1: Retrieving from Chroma", style="bold blue"))
    chroma_results = collection.query(
        query_texts=[query],
        n_results=6,  # Get more for reranking
        include=["documents", "metadatas", "distances"]
    )
    
    # Step 2: Rerank with Contextual AI
    console.print(Panel("Step 2: Reranking with Contextual AI", style="bold green"))
    documents_to_rerank = chroma_results['documents'][0]
    metadata_for_rerank = [str(meta) for meta in chroma_results['metadatas'][0]]
    
    rerank_response = contextual_client.rerank.create(
        query=query,
        instruction=instruction,
        documents=documents_to_rerank,
        metadata=metadata_for_rerank,
        model="ctxl-rerank-v2-instruct-multilingual"
    )
    
    # Keep the top-k reranked documents, mapped back to the Chroma results
    top_results = rerank_response.results[:top_k]
    top_docs = [chroma_results['documents'][0][r.index] for r in top_results]
    top_metadata = [chroma_results['metadatas'][0][r.index] for r in top_results]
    
    # Step 3: Generate response with LLM
    console.print(Panel("Step 3: Generating response with LLM", style="bold yellow"))
    # Label each passage with its title and date so the model can cite its sources
    context = "\n\n".join(
        f"[{meta['title']}, {meta['date']}] {doc}"
        for doc, meta in zip(top_docs, top_metadata)
    )
    
    response = openai_client.chat.completions.create(
        model="gpt-5-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided context. Use only the information from the context and cite your sources."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}
        ],
        temperature=1
    )
    
    return {
        "response": response.choices[0].message.content,
        "sources": top_metadata,
        "rerank_scores": [result.relevance_score for result in rerank_response.results[:top_k]]
    }

# Example 1: Enterprise pricing query
console = Console()
console.print(Panel("🚀 COMPLETE RAG PIPELINE DEMO", style="bold magenta"))

result = complete_rag_pipeline(
    query="What is the current enterprise pricing for the RTX 5090 GPU for bulk orders?",
    instruction="Prioritize internal sales documents over market analysis reports. More recent documents should be weighted higher. Enterprise portal content supersedes distributor communications.",
    top_k=3
)

console.print(Panel(result["response"], title="Generated Response", border_style="bold green"))
console.print(Panel(f"Sources used: {[meta['title'] for meta in result['sources']]}", title="Sources", border_style="bold blue"))


In [None]:
# Example 2: Technical specifications query with different instruction
console.print(Panel("🔧 TECHNICAL SPECIFICATIONS QUERY", style="bold cyan"))

result2 = complete_rag_pipeline(
    query="What are the technical specifications and power requirements for the RTX 5090?",
    instruction="Prioritize technical documentation and engineering specifications. Internal technical documents should rank higher than marketing materials. Focus on detailed specifications and performance metrics.",
    top_k=3
)

console.print(Panel(result2["response"], title="Generated Response", border_style="bold green"))
console.print(Panel(f"Sources used: {[meta['title'] for meta in result2['sources']]}", title="Sources", border_style="bold blue"))


## Part 4: Advanced Reranking Scenarios

Let's demonstrate different reranking scenarios to show the flexibility of instruction-following reranking.


In [None]:
def compare_reranking_strategies(query, strategies):
    """
    Compare different reranking strategies for the same query
    """
    console = Console()
    
    # Get initial results from Chroma
    chroma_results = collection.query(
        query_texts=[query],
        n_results=6,
        include=["documents", "metadatas", "distances"]
    )
    
    documents_to_rerank = chroma_results['documents'][0]
    metadata_for_rerank = [str(meta) for meta in chroma_results['metadatas'][0]]
    
    for strategy_name, instruction in strategies.items():
        console.print(Panel(f"Strategy: {strategy_name}", style="bold magenta"))
        console.print(f"Instruction: {instruction}")
        
        # Apply reranking
        rerank_response = contextual_client.rerank.create(
            query=query,
            instruction=instruction,
            documents=documents_to_rerank,
            metadata=metadata_for_rerank,
            model="ctxl-rerank-v2-instruct-multilingual"
        )
        
        # Show top 3 results
        console.print("Top 3 Results:")
        for i in range(min(3, len(rerank_response.results))):
            result = rerank_response.results[i]
            original_index = result.index
            original_metadata = chroma_results['metadatas'][0][original_index]
            console.print(f"  {i+1}. {original_metadata['title']} (Score: {result.relevance_score:.3f})")
        
        console.print("\n" + "="*60 + "\n")

# Define different reranking strategies
strategies = {
    "Recent Documents First": "Prioritize the most recent documents. Documents from 2025 should rank higher than older documents.",
    "Internal Documents Priority": "Prioritize internal and confidential documents over public documents. Internal Use Only and Confidential documents should rank highest.",
    "Department-Specific": "Prioritize documents from Sales and Engineering departments. Customer Success and Marketing documents should rank lower.",
    "Source Authority": "Prioritize documents from NVIDIA Enterprise Sales Portal and Executive Team. External sources like TechAnalytics should rank lower."
}

# Compare strategies for the same query
query = "What is the current status and pricing for RTX 5090 enterprise GPUs?"

console = Console()
console.print(Panel("🔄 COMPARING RERANKING STRATEGIES", style="bold magenta"))
console.print(f"Query: {query}\n")

compare_reranking_strategies(query, strategies)
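
To see how much an instruction changes the ordering, it can help to compare two rankings directly instead of reading top-3 lists side by side. The sketch below is one possible approach: it reports how many positions each document moved between two orderings. `rank_shifts` is a hypothetical helper, and the two title lists are made-up orderings for illustration, not actual reranker output.

```python
def rank_shifts(baseline, reranked):
    """Return {item: positions moved}, positive when the item moved up."""
    baseline_pos = {item: i for i, item in enumerate(baseline)}
    return {
        item: baseline_pos[item] - i
        for i, item in enumerate(reranked)
        if item in baseline_pos
    }

# Illustrative orderings only (not real results)
baseline_order = [
    "Market Analysis Report",
    "Internal Pricing Memo",
    "Enterprise GPU Pricing Update",
]
strategy_order = [
    "Enterprise GPU Pricing Update",
    "Internal Pricing Memo",
    "Market Analysis Report",
]

for title, shift in rank_shifts(baseline_order, strategy_order).items():
    print(f"{title}: {shift:+d}")
```

In the notebook, the baseline could be the titles in Chroma's order and the second list the titles in a strategy's reranked order.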


## Summary

This notebook demonstrates the powerful combination of Chroma and Contextual AI's instruction-following reranker for enhanced RAG pipelines.

### What We Demonstrated:

1. **Basic Chroma Retrieval**: Standard vector similarity search
2. **Contextual AI Reranking**: Instruction-following reranking with custom business logic
3. **Complete RAG Pipeline**: Chroma → Reranking → LLM Generation
4. **Advanced Reranking Strategies**: Multiple instruction-based ranking approaches

### Key Benefits of Contextual AI Reranker:

- **Instruction-Following**: Handle complex business logic through natural language instructions
- **BEIR Benchmark Leading**: State-of-the-art accuracy on industry benchmarks
- **Multi-lingual Support**: Handle documents in multiple languages
- **Metadata-Aware**: Leverage document metadata for intelligent ranking
- **Conflict Resolution**: Handle conflicting information in retrieval results
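
On the metadata point: the notebook passes each document's metadata to the reranker as a string built with `str(meta)`, which is the Python `dict` representation. A plain labeled format is an alternative that may be easier to refer to from an instruction. A minimal sketch, where `format_metadata`, its field order, and the `Key: value` layout are assumptions and not something the API requires:

```python
def format_metadata(meta, fields=("title", "date", "source", "classification", "department")):
    """Render selected metadata fields as 'Key: value' pairs joined by '; '."""
    parts = [f"{field.capitalize()}: {meta[field]}" for field in fields if field in meta]
    return "; ".join(parts)

example_meta = {
    "title": "Internal Pricing Memo",
    "date": "2025-01-12",
    "source": "Executive Team",
    "classification": "Internal Use Only",
    "department": "Executive",
}

print(format_metadata(example_meta))
```

Fields missing from a document's metadata are skipped, so the helper also works for partially annotated documents.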

### Chroma Integration Advantages:

- **Seamless Integration**: Easy to add reranking to existing Chroma workflows
- **Metadata Preservation**: Maintain document metadata through the reranking process
- **Flexible Retrieval**: Retrieve more documents than needed for optimal reranking
- **Production Ready**: Scalable solution for enterprise applications

### Use Cases Demonstrated:

1. **Enterprise Document Search**: Prioritize internal documents over external sources
2. **Technical Documentation**: Focus on engineering specifications over marketing materials
3. **Temporal Relevance**: Weight recent documents higher than older ones
4. **Authority-Based Ranking**: Prioritize authoritative sources and departments

### Next Steps for Enhancement:

- **Hybrid Search**: Combine keyword and semantic search with reranking
- **Custom Instructions**: Develop domain-specific reranking instructions
- **Performance Optimization**: Batch processing for large document collections
- **Evaluation Metrics**: Measure reranking effectiveness with custom metrics
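
As a starting point for the evaluation item, two common ranking metrics can be computed from a ranked list of ids and a hand-labeled set of relevant ids. The sketch below is a minimal illustration: the function names, the relevance labels, and the two orderings are hypothetical, not measurements from this notebook.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0

# Hypothetical labels and orderings for a single query
relevant = {"doc_0", "doc_5"}
before_rerank = ["doc_1", "doc_0", "doc_5", "doc_2"]
after_rerank = ["doc_0", "doc_5", "doc_1", "doc_2"]

for name, ranking in [("before", before_rerank), ("after", after_rerank)]:
    p2 = precision_at_k(ranking, relevant, k=2)
    rr = reciprocal_rank(ranking, relevant)
    print(f"{name}: precision@2={p2:.2f}, reciprocal rank={rr:.2f}")
```

Averaging these values over a set of labeled queries gives a simple before-and-after comparison for a given instruction.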

---

**Ready to get started?** This notebook provides a complete, end-to-end example of integrating Contextual AI's instruction-following reranker with Chroma for RAG applications. The combination ranks documents by the business context and requirements expressed in the instruction, not by embedding similarity alone.
