# Cross-Encoders Basics

Learn the fundamentals of cross-encoders for reranking and relevance scoring.

In [None]:
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np
import time

## Bi-Encoder vs Cross-Encoder

### Bi-Encoder (What we learned in Module 2)

In [None]:
# Load bi-encoder
bi_encoder = SentenceTransformer('all-MiniLM-L6-v2')

query = "How to reset password?"
documents = [
    "To reset your password, go to Settings and click Reset Password.",
    "Python tutorial for beginners: learn basic syntax.",
    "Contact customer support at support@example.com"
]

# Encode separately
query_emb = bi_encoder.encode(query)
doc_embs = bi_encoder.encode(documents)

# Calculate cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([query_emb], doc_embs)[0]

print("Bi-Encoder Scores:")
for i, (doc, score) in enumerate(zip(documents, similarities)):
    print(f"{i+1}. [{score:.3f}] {doc[:50]}...")

### Cross-Encoder (More Accurate)

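A cross-encoder passes the query and the document through the model **together**, so attention can relate every query token to every document token. This usually gives more accurate relevance scores than comparing two independently computed embeddings.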

In [None]:
# Load cross-encoder
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Create query-document pairs
pairs = [[query, doc] for doc in documents]

# Get relevance scores
scores = cross_encoder.predict(pairs)

print("\nCross-Encoder Scores:")
for i, (doc, score) in enumerate(zip(documents, scores)):
    print(f"{i+1}. [{score:.3f}] {doc[:50]}...")

### Key Differences

| Aspect | Bi-Encoder | Cross-Encoder |
|--------|------------|---------------|
| **Encoding** | Separate | Together |
| **Accuracy** | Good | Typically higher on ranking tasks |
| **Speed** | Fast | Slow |
| **Caching** | Document embeddings can be precomputed | Each (query, document) pair is scored at query time |
| **Use Case** | Retrieval | Reranking |

## Architecture Comparison

In [None]:
print("Bi-Encoder:")
print("  [CLS] query [SEP] → Embedding A (384 dims)")
print("  [CLS] document [SEP] → Embedding B (384 dims)")
print("  Score = cosine_sim(A, B)")
print("  No interaction between query and document tokens\n")

print("Cross-Encoder:")
print("  [CLS] query [SEP] document [SEP] → Score")
print("  Full attention between all tokens")
print("  Direct relevance prediction")
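
The bi-encoder line `Score = cosine_sim(A, B)` can be made concrete with a few lines of plain Python. This is a minimal sketch on made-up 3-dimensional vectors rather than real 384-dimensional model output; the helper name `cosine_sim` and the toy values are illustrative assumptions.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (made-up values, not produced by any model)
query_vec = [1.0, 0.0, 1.0]
doc_same_direction = [2.0, 0.0, 2.0]   # same direction, different length
doc_orthogonal = [0.0, 3.0, 0.0]       # shares no direction with the query

print(cosine_sim(query_vec, doc_same_direction))  # close to 1.0
print(cosine_sim(query_vec, doc_orthogonal))      # 0.0
```

The score depends only on the two finished vectors, which is why document embeddings can be computed once and reused, and also why no query token ever "sees" a document token.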

## Speed Comparison

In [None]:
# Test with more documents
test_docs = documents * 50  # 150 documents
query = "password reset"

# Bi-encoder (documents encoded at query time, no caching)
start = time.time()
query_emb = bi_encoder.encode(query)
doc_embs = bi_encoder.encode(test_docs, show_progress_bar=False)
similarities = cosine_similarity([query_emb], doc_embs)[0]
bi_time = time.time() - start

# Cross-encoder
start = time.time()
pairs = [[query, doc] for doc in test_docs]
scores = cross_encoder.predict(pairs, show_progress_bar=False)
cross_time = time.time() - start

print(f"Bi-Encoder: {bi_time:.3f}s")
print(f"Cross-Encoder: {cross_time:.3f}s")
print(f"Cross-encoder is {cross_time/bi_time:.1f}x slower")

# But if doc embeddings are precomputed:
start = time.time()
query_emb = bi_encoder.encode(query)
similarities = cosine_similarity([query_emb], doc_embs)[0]
bi_time_cached = time.time() - start

print(f"\nBi-Encoder (cached docs): {bi_time_cached*1000:.1f}ms")
print(f"Cached bi-encoder is {cross_time/bi_time_cached:.0f}x faster than the cross-encoder")
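
The timings above follow from how much text each approach has to encode per query. The sketch below only counts inputs; it ignores batching, sequence length, and hardware, so treat it as a rough cost model. The function name `sequences_to_encode` is a hypothetical helper, not part of any library.

```python
def sequences_to_encode(num_docs, model, docs_cached=False):
    """Count the texts (or text pairs) a model must encode to score one query."""
    if model == "cross":
        # One joint (query, document) input per candidate; nothing carries over between queries
        return num_docs
    if model == "bi":
        # The query is always encoded; documents only if their embeddings are not cached
        return 1 if docs_cached else 1 + num_docs
    raise ValueError(f"unknown model type: {model!r}")

n = 150  # same collection size as the timing cell
print("bi-encoder, no cache:", sequences_to_encode(n, "bi"))
print("bi-encoder, cached:  ", sequences_to_encode(n, "bi", docs_cached=True))
print("cross-encoder:       ", sequences_to_encode(n, "cross"))
```

With cached document embeddings the bi-encoder's per-query cost stays constant as the collection grows, while the cross-encoder's cost grows linearly with the number of candidates.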

## Accuracy Comparison

Let's test on tricky examples where bi-encoders struggle.

In [None]:
# Tricky example: semantic vs lexical match
query = "change password"
docs = [
    "To modify your account credentials, visit the settings page.",  # Semantic match
    "Change password update reset modify credentials password.",  # Keyword stuffing
    "The weather might change tomorrow. Remember your umbrella."   # False positive
]

# Bi-encoder
query_emb = bi_encoder.encode(query)
doc_embs = bi_encoder.encode(docs)
bi_scores = cosine_similarity([query_emb], doc_embs)[0]

# Cross-encoder
pairs = [[query, doc] for doc in docs]
cross_scores = cross_encoder.predict(pairs)

print("Query:", query)
print("\nBi-Encoder Rankings:")
for idx in np.argsort(bi_scores)[::-1]:
    print(f"  [{bi_scores[idx]:.3f}] {docs[idx][:60]}...")

print("\nCross-Encoder Rankings:")
for idx in np.argsort(cross_scores)[::-1]:
    print(f"  [{cross_scores[idx]:.3f}] {docs[idx][:60]}...")

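# Report which document each model actually ranked first instead of assuming the outcome
bi_top = int(np.argmax(bi_scores))
cross_top = int(np.argmax(cross_scores))
print(f"\nBi-encoder top result:    {docs[bi_top][:60]}...")
print(f"Cross-encoder top result: {docs[cross_top][:60]}...")
print("(The first document is the semantic match; the second is keyword stuffing.)")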

## When to Use Each

### Use Bi-Encoder for:
- First-stage retrieval (millions of candidates)
- Real-time search (<50ms latency)
- Similarity search with caching
- Semantic search, duplicate detection

### Use Cross-Encoder for:
- Reranking top candidates (10-100)
- High accuracy requirements
- Question answering
- When latency allows (>100ms)
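
The two lists above combine into a two-stage pattern: a cheap scorer narrows the collection, then an expensive scorer reorders the survivors. The sketch below shows only the control flow. The scorers `word_overlap` and `jaccard` are toy stand-ins chosen so the example runs without downloading models; they are assumptions for illustration, not how a bi-encoder or cross-encoder actually scores text.

```python
def retrieve_then_rerank(query, docs, first_stage, second_stage, top_k):
    """Keep the top_k documents by first_stage, then reorder them by second_stage."""
    # Stage 1: cheap score over every document
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:top_k]
    # Stage 2: expensive score over the short list only
    return sorted(candidates, key=lambda d: second_stage(query, d), reverse=True)

calls = {"second_stage": 0}  # tracks how often the expensive scorer runs

def word_overlap(query, doc):
    """Toy stand-in for a fast first-stage scorer."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def jaccard(query, doc):
    """Toy stand-in for a slower, more careful second-stage scorer."""
    calls["second_stage"] += 1
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

docs = [
    "reset your password in settings",
    "password tips and tricks for every manager",
    "weather forecast for tomorrow",
    "payment troubleshooting guide",
]
ranked = retrieve_then_rerank("reset password", docs, word_overlap, jaccard, top_k=2)
print(ranked)
print("second-stage calls:", calls["second_stage"])
```

The expensive scorer runs only `top_k` times regardless of collection size, which is the property that makes cross-encoder reranking affordable.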

## Available Models

### MS MARCO Models (Passage Ranking)

In [None]:
models = [
    "cross-encoder/ms-marco-TinyBERT-L-2-v2",     # Fastest (~4M params)
    "cross-encoder/ms-marco-MiniLM-L-6-v2",       # Balanced (~22M params)
    "cross-encoder/ms-marco-MiniLM-L-12-v2",      # Accurate (~33M params)
]

query = "python tutorial"
doc = "Learn Python programming from scratch with examples"

for model_name in models:
    model = CrossEncoder(model_name)
    score = model.predict([(query, doc)])[0]
    print(f"{model_name.split('/')[-1]}: {score:.3f}")

## Batch Processing

In [None]:
# Process multiple queries efficiently
queries = ["password reset", "payment failed", "slow loading"]
docs = [
    "Reset your password in settings",
    "Payment troubleshooting guide",
    "Performance optimization tips"
]

# Create all pairs
all_pairs = []
for query in queries:
    for doc in docs:
        all_pairs.append([query, doc])

# Batch predict (much faster than individual calls)
scores = cross_encoder.predict(all_pairs, batch_size=32)

# Reshape results
scores_matrix = np.array(scores).reshape(len(queries), len(docs))

print("Relevance Matrix:")
print(f"{'Query':<20} | {' | '.join([f'Doc{i+1}' for i in range(len(docs))])}")
print("-" * 60)
for i, query in enumerate(queries):
    scores_str = ' | '.join([f'{s:.2f}' for s in scores_matrix[i]])
    print(f"{query:<20} | {scores_str}")
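
Once the flat score list is arranged one row per query, picking the best document for each query is a per-row argmax. The sketch below uses plain lists and made-up scores (not real model output) so it runs on its own; `best_doc_per_query` is a hypothetical helper name.

```python
def best_doc_per_query(queries, docs, flat_scores):
    """Map each query to its highest-scoring document.

    flat_scores must be ordered query by query, then document by document,
    matching the nested loop that built the pairs.
    """
    n_docs = len(docs)
    best = {}
    for i, query in enumerate(queries):
        row = flat_scores[i * n_docs:(i + 1) * n_docs]
        top = max(range(n_docs), key=lambda j: row[j])
        best[query] = (docs[top], row[top])
    return best

queries = ["password reset", "payment failed", "slow loading"]
docs = [
    "Reset your password in settings",
    "Payment troubleshooting guide",
    "Performance optimization tips",
]
# Made-up scores for illustration only
flat_scores = [8.1, -9.5, -10.2,
               -10.8, 7.4, -6.3,
               -11.0, -8.7, 2.9]

for query, (doc, score) in best_doc_per_query(queries, docs, flat_scores).items():
    print(f"{query:<16} -> {doc} ({score:.2f})")
```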

## Score Interpretation

These MS MARCO cross-encoders output unbounded relevance scores (logits), not probabilities:
- Scores range from negative to positive
- Higher = more relevant
- No fixed threshold (relative ranking matters)

In [None]:
# Check score ranges
test_pairs = [
    ["weather", "Today's weather forecast: sunny and warm"],  # Highly relevant
    ["weather", "Climate change impacts global temperatures"],  # Somewhat relevant
    ["weather", "Python programming tutorial for beginners"],  # Not relevant
]

scores = cross_encoder.predict(test_pairs)

print("Score Ranges:")
for pair, score in zip(test_pairs, scores):
    # Illustrative cutoffs only: 5 and 0 are arbitrary and should be tuned per model and dataset
    relevance = "High" if score > 5 else "Medium" if score > 0 else "Low"
    print(f"Score: {score:6.2f} ({relevance:6s}) | {pair[1][:50]}...")
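
If a bounded value is more convenient for display or logging, one common option is to pass the raw scores through a logistic sigmoid. The sketch below uses made-up logits rather than real model output. Because the sigmoid is monotonic, the ranking is unchanged, and the results should not be read as calibrated probabilities.

```python
import math

def sigmoid(x):
    """Map any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

raw_scores = [7.2, 0.0, -10.5]  # made-up logits for illustration
squashed = [sigmoid(s) for s in raw_scores]

for raw, s in zip(raw_scores, squashed):
    print(f"raw {raw:6.2f} -> sigmoid {s:.4f}")

# Ranking by raw score and by squashed score gives the same order
order_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
order_squashed = sorted(range(len(squashed)), key=lambda i: squashed[i], reverse=True)
print(order_raw == order_squashed)  # True
```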

## Summary

✅ Cross-encoders are typically more accurate than bi-encoders  
✅ Use for reranking, not initial retrieval  
✅ Process query and document together  
✅ Slower but worth it for final ranking  
✅ Choose model based on speed/accuracy tradeoff

**Next:** Learn about retrieve-rerank pipelines in the next notebook!