# What is Hybrid Search?
Hybrid search combines multiple search methodologies to retrieve more comprehensive and relevant results. It typically merges:

- Dense Retrieval (Vector/Semantic Search) - understands meaning and context
- Sparse Retrieval (Keyword/Lexical Search) - matches exact terms and phrases
- Optional: Graph-based search, metadata filtering, or other specialized methods

The results from the different methods are then combined using fusion techniques to create a unified ranking.
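
To make the dense/sparse distinction concrete, here is a minimal sketch on a toy corpus. The three-dimensional "embeddings" are hand-made placeholders rather than output from a real model, and the sparse scorer is a plain term count standing in for something like BM25. The function names (`dense_scores`, `sparse_scores`) are illustrative.

```python
import math
from collections import Counter

# Toy corpus with hand-made 3-dimensional "embeddings" (illustrative only).
DOCS = {
    "d1": ("resetting a forgotten password", [0.9, 0.1, 0.0]),
    "d2": ("error code E1042 on login", [0.6, 0.2, 0.2]),
    "d3": ("quarterly sales report", [0.0, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dense_scores(query_vec):
    """Dense retrieval: score every document by embedding similarity."""
    return {doc_id: cosine(query_vec, vec) for doc_id, (_, vec) in DOCS.items()}

def sparse_scores(query):
    """Sparse retrieval: count how often query terms occur in each document."""
    terms = query.lower().split()
    scores = {}
    for doc_id, (text, _) in DOCS.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = sum(counts[t] for t in terms)
    return scores

# A query for "E1042" whose (assumed) embedding sits nearest the password doc.
dense = dense_scores([1.0, 0.0, 0.0])
sparse = sparse_scores("E1042")
best_dense = max(dense, key=dense.get)
best_sparse = max(sparse, key=sparse.get)
```

With these made-up values the dense scorer prefers `d1` while the sparse scorer picks `d2`, the only document containing the exact code. That disagreement is what a fusion step has to reconcile.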
### When is Hybrid Search Done?
Hybrid search runs during the retrieval phase of a retrieval-augmented generation (RAG) pipeline, between query processing and generation:

- Query Processing → User submits a question
- 🔄 Hybrid Retrieval → Multiple search methods run in parallel
- Score Fusion → Normalize and combine scores from the different methods
- Unified Ranking → Single ranked list of documents
- Optional Reranking → Further refinement
- Generation → LLM uses retrieved documents
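
The retrieval part of this flow could be sketched as follows. Both retrievers are hypothetical stubs that return hard-coded scores. A real system would call a vector index and a keyword index instead. The equal 0.5/0.5 weighting and the choice to give a missing document a score of 0 are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def vector_retriever(query):
    """Hypothetical dense retriever returning {doc_id: similarity}."""
    return {"d1": 0.91, "d2": 0.85, "d3": 0.40}

def keyword_retriever(query):
    """Hypothetical sparse retriever returning {doc_id: BM25-like score}."""
    return {"d2": 9.0, "d4": 6.0, "d1": 3.0}

def min_max(scores):
    """Rescale scores to [0, 1] so different retrievers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_retrieve(query, top_k=3):
    # Hybrid retrieval: run both retrievers in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec_future = pool.submit(vector_retriever, query)
        key_future = pool.submit(keyword_retriever, query)
        vec = min_max(vec_future.result())
        key = min_max(key_future.result())
    # Score fusion: average the normalized scores; a missing doc contributes 0.
    fused = {d: 0.5 * vec.get(d, 0.0) + 0.5 * key.get(d, 0.0)
             for d in set(vec) | set(key)}
    # Unified ranking: one list sorted by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

ranking = hybrid_retrieve("how do I fix login error E1042?")
```

Here `d2` ends up first because it scores well in both lists, even though neither retriever alone agrees on the rest of the order. Reranking and generation would consume `ranking` downstream.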

### Why Hybrid Search Matters
Different search methods have complementary strengths:

- Vector search excels at semantic similarity and context
- Keyword search excels at exact matches, proper nouns, and technical terms
- A combined approach offsets the weaknesses of each individual method

### Key Differences in Output Quality
#### Vector Search Only:

- Strengths: Understands semantic meaning, context, and synonyms
- Weaknesses: May miss exact terms, proper nouns, technical specifications
- Best for: Conceptual queries, when users describe what they want rather than using exact terms

#### Keyword Search Only:

- Strengths: Precise term matching, excellent for specific names/versions/codes
- Weaknesses: Misses semantically related content, struggles with synonyms
- Best for: Technical searches, specific product versions, exact terminology

#### Hybrid Search:

- Strengths: Covers both semantic matches and exact-term matches, so it is more robust across query types
- Weaknesses: More complex to implement and tune
- Best for: Most production RAG systems, where both recall and precision matter

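# Technical Implementation Details
### Step 1. Score Fusion Methods

```python
# Example inputs for one document (illustrative values, not from a real index)
vector_score, keyword_score = 0.82, 7.4   # e.g. cosine similarity, BM25 score
vector_rank, keyword_rank = 2, 5          # 1-based rank in each result list

# Weighted Sum (only meaningful when both scores share a scale)
alpha, beta = 0.7, 0.3
weighted_score = alpha * vector_score + beta * keyword_score

# Reciprocal Rank Fusion (RRF); k = 60 is a common default
k = 60
rrf_score = 1 / (k + vector_rank) + 1 / (k + keyword_rank)

# Normalized Combination (min-max scale each score to [0, 1] first)
min_vec, max_vec = 0.10, 0.95   # score range over the vector result list
min_key, max_key = 0.0, 12.0    # score range over the keyword result list
normalized_vector = (vector_score - min_vec) / (max_vec - min_vec)
normalized_keyword = (keyword_score - min_key) / (max_key - min_key)
weight_v, weight_k = 0.7, 0.3
normalized_score = weight_v * normalized_vector + weight_k * normalized_keyword
```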
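
The RRF formula above scores a single document. Applied to whole result lists, it might look like the sketch below. The function name `reciprocal_rank_fusion` is illustrative, and treating a document that is missing from one list as contributing nothing from that list is an assumption, since the formula itself presumes a rank in both.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; ranks are 1-based, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document absent from a list simply gets no term from it.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_ranking = ["d1", "d2", "d3"]
keyword_ranking = ["d2", "d4", "d1"]
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses ranks rather than raw scores, it needs no normalization step, which is one reason it is often a reasonable first choice before tuning a weighted combination.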

### Step 2. Popular Implementations

- Weaviate: Native hybrid search with configurable alpha parameter
- Elasticsearch: Vector search + BM25 with script scoring
- Pinecone: Sparse-dense vectors in single index
- Custom Solutions: Multiple retrievers with fusion logic

### Step 3. Tuning Considerations

- Weight Balance: Adjust vector vs keyword importance based on domain
- Normalization: Ensure scores are comparable across methods
- Query Analysis: Some queries benefit more from one method than another
- Evaluation: Use metrics like NDCG, MRR to optimize fusion parameters
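
As a rough illustration of the evaluation point, the sketch below computes MRR and NDCG@k with binary relevance and uses NDCG to compare a few values of a fusion weight. The scores, the relevance judgments, and the helper `fuse` are all hypothetical, and a single query is used only to keep the example short. Real tuning would average the metric over a labeled query set.

```python
import math

def mrr(ranking, relevant):
    """Reciprocal rank of the first relevant document (0 if none is found)."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranking, relevant, k=2):
    """NDCG@k with binary relevance."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranking[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

def fuse(vec, key, alpha):
    """Weighted sum of already-normalized scores; alpha=1 is vector-only."""
    docs = set(vec) | set(key)
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * key.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical normalized scores and relevance judgments for one query.
vec = {"d1": 1.0, "d3": 0.6, "d2": 0.2}
key = {"d2": 1.0, "d4": 0.7, "d1": 0.1}
relevant = {"d1", "d2"}

results = {alpha: ndcg_at_k(fuse(vec, key, alpha), relevant)
           for alpha in (0.0, 0.5, 1.0)}
best_alpha = max(results, key=results.get)
```

With these made-up numbers, each retriever alone places only one relevant document in the top two, while the balanced weight places both there, so `best_alpha` comes out as 0.5.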

The demo above shows how hybrid search can significantly improve retrieval quality by leveraging the complementary strengths of different search methodologies, leading to more relevant context for LLM generation.