🔧 **Setup Required**: Before running this notebook, please follow the [setup instructions](../README.md#setup-instructions) to configure your environment and API keys.

# Building a Hybrid RAG Pipeline with Haystack

Welcome to this notebook where we'll build an advanced Retrieval-Augmented Generation (RAG) pipeline using both dense and sparse retrieval methods. This hybrid approach combines the strengths of:

1. **Dense Retrieval**: Using semantic embeddings to capture meaning and context
2. **Sparse Retrieval**: Using BM25 algorithm for keyword matching
3. **Re-ranking**: Using a transformer model to improve result relevance

This combination is typically more robust than either method alone: dense retrieval handles paraphrases and synonyms, while BM25 catches exact terms that embeddings can miss.
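To make the keyword side concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization, the common `k1`/`b` defaults, and one widely used IDF variant; it is not necessarily the exact formula that `InMemoryBM25Retriever` applies, and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy corpus (hypothetical, for illustration only)
docs = [
    "haystack is a framework for building llm applications",
    "bm25 ranks documents by keyword overlap",
    "dense retrievers compare embedding vectors",
]
scores = bm25_scores("bm25 keyword", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the second document contains the query terms, so it is the only one with a non-zero score. A paraphrased query with no shared words would score zero everywhere, which is the gap dense retrieval is meant to cover.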

## What You'll Learn

- How to combine multiple retrieval methods in a single pipeline
- The benefits of hybrid search approaches
- How to use re-ranking to improve search results
- Advanced pipeline construction with multiple components

## 1. Required Components

Let's start by importing the specialized components needed for our hybrid pipeline:

- `InMemoryBM25Retriever`: Implements the BM25 algorithm for keyword-based search
- `DocumentJoiner`: Combines results from multiple retrievers
- `SentenceTransformersSimilarityRanker`: Re-ranks documents using a transformer cross-encoder model

We'll also import the basic RAG components we used in the previous notebook.

In [None]:
# Continue from the previous notebook: reuse its populated 'document_store'.
from scripts.indexing import document_store  # importing this module runs our indexing pipeline
# Import additional components for hybrid retrieval
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import SentenceTransformersSimilarityRanker

In [3]:
# Import necessary components for the query pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
from haystack import Pipeline

## 2. Component Initialization

Our hybrid pipeline requires several specialized components that work together:

### Dense Retrieval Components
- **Text Embedder**: Converts text into dense vector representations
- **Embedding Retriever**: Uses vector similarity to find relevant documents

### Sparse Retrieval Components
- **BM25 Retriever**: Uses keyword matching, which is effective for exact term matches

### Joining, Reranking, and Generation
- **Document Joiner**: Merges the results from both retrievers into a single candidate list
- **Ranker**: Uses the `BAAI/bge-reranker-base` model to re-score the candidates against the query
- **Prompt Builder & LLM**: Build the prompt from the top-ranked documents and generate the answer

The combination of these components allows us to leverage both semantic understanding and keyword matching.
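The dense side can be sketched the same way. The example below ranks documents by cosine similarity between a query vector and document vectors. The three-dimensional embeddings and document ids are hypothetical stand-ins (real sentence-transformer embeddings have hundreds of dimensions), and cosine similarity is an assumption for illustration; the document store may be configured with a different similarity function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_retrieve(query_vec, embeddings, top_k=2):
    """Return the ids of the top_k documents most similar to the query vector."""
    ranked = sorted(
        embeddings,
        key=lambda doc_id: cosine_similarity(query_vec, embeddings[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy embeddings, made up for this sketch
doc_embeddings = {
    "doc_llm_framework": [0.9, 0.1, 0.0],
    "doc_cooking": [0.0, 0.2, 0.9],
    "doc_search_pipeline": [0.7, 0.6, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

top_docs = dense_retrieve(query_embedding, doc_embeddings)
print(top_docs)
```

No keyword overlap is involved here: a document ranks highly because its vector points in a similar direction to the query vector.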

In [None]:
# --- 1. Initialize Query Pipeline Components ---

# Text Embedder: To embed the user's query. Must be compatible with the document embedder.
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

# Retriever: Fetches documents from the DocumentStore based on vector similarity.
retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=3)

# PromptBuilder: Creates a prompt using the retrieved documents and the query.
# The Jinja2 template iterates through the documents and adds their content to the prompt.
prompt_template_for_pipeline = """
Given the following information, answer the user's question.
If the information is not available in the provided documents, say that you don't have enough information to answer.

Context:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

Question: {{question}}
Answer:
"""
prompt_builder_inst = PromptBuilder(template=prompt_template_for_pipeline,
                                    required_variables="*")
llm_generator_inst = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o-mini")



# Sparse Retriever (BM25): For keyword-based search.
# It needs no embedding model: it scores documents using BM25 term statistics kept by the document store.
bm25_retriever = InMemoryBM25Retriever(document_store=document_store, top_k=3)

# DocumentJoiner: To merge the results from the two retrievers.
# The default 'concatenate' mode works well here as the ranker will handle final ordering.
document_joiner = DocumentJoiner()

# Ranker: A cross-encoder model to re-rank the combined results for higher precision.
# This model is highly effective at identifying the most relevant documents from a candidate set.
ranker = SentenceTransformersSimilarityRanker(model="BAAI/bge-reranker-base", top_k=3)



**Note**: `TransformersSimilarityRanker` is considered legacy and will no longer receive updates. It may be deprecated in a future release, with removal following after a deprecation period. This notebook therefore uses `SentenceTransformersSimilarityRanker`, which provides the same functionality along with additional features.


## 3. Pipeline Assembly

Now we'll assemble all components into a single pipeline. Data flows through them in the following order:

1. The query is processed by both dense and sparse retrievers in parallel
2. Results are combined by the document joiner
3. The ranker improves the relevance of the combined results
4. The most relevant documents are used to build the prompt
5. The LLM generates the final answer

This architecture allows us to benefit from both retrieval methods while using the ranker to select the best documents.
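Steps 2 to 4 can be sketched in plain Python, without Haystack. The `Doc` class, `join_concatenate`, and `toy_rerank` below are hypothetical names for this illustration. The join keeps one copy of each document id, and the re-ranker is a crude token-overlap stand-in for a cross-encoder such as `BAAI/bge-reranker-base`, so treat it as a sketch of the flow rather than of the models.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    content: str
    score: float = 0.0


def join_concatenate(*result_lists):
    """Merge result lists, keeping the higher-scored copy of each duplicate id."""
    best = {}
    for results in result_lists:
        for doc in results:
            if doc.id not in best or doc.score > best[doc.id].score:
                best[doc.id] = doc
    return list(best.values())


def toy_rerank(query, docs, top_k=3):
    """Stand-in for a cross-encoder: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())

    def relevance(doc):
        doc_tokens = set(doc.content.lower().split())
        return len(query_tokens & doc_tokens) / len(query_tokens)

    return sorted(docs, key=relevance, reverse=True)[:top_k]


# Retriever scores live on different scales (similarity vs. BM25),
# which is why the final ordering is left to the ranker.
dense_results = [
    Doc("1", "haystack builds llm pipelines", 0.82),
    Doc("2", "pipelines connect components", 0.75),
]
sparse_results = [
    Doc("1", "haystack builds llm pipelines", 4.1),
    Doc("3", "bm25 matches exact keywords", 3.2),
]

candidates = join_concatenate(dense_results, sparse_results)
top_docs = toy_rerank("what is haystack", candidates, top_k=2)
print([doc.id for doc in top_docs])
```

Document `1` is returned by both retrievers but appears only once among the candidates, and the re-ranker scores every candidate against the query on a single, comparable scale.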

In [5]:
# --- 2. Build the Hybrid RAG Pipeline ---

hybrid_rag_pipeline = Pipeline()

# Add all necessary components
hybrid_rag_pipeline.add_component("text_embedder", text_embedder)
hybrid_rag_pipeline.add_component("embedding_retriever", retriever) # Dense retriever
hybrid_rag_pipeline.add_component("bm25_retriever", bm25_retriever) # Sparse retriever
hybrid_rag_pipeline.add_component("document_joiner", document_joiner)
hybrid_rag_pipeline.add_component("ranker", ranker)
hybrid_rag_pipeline.add_component("prompt_builder", prompt_builder_inst)
hybrid_rag_pipeline.add_component("llm", llm_generator_inst)

## 4. Component Connections

The connections between components define how data flows through the pipeline. Understanding these connections is crucial:

1. **Query Processing**
   - The text embedder processes the query for dense retrieval
   - The raw query text is sent directly to the BM25 retriever

2. **Document Flow**
   - Both retrievers send their documents to the joiner
   - The joiner concatenates all documents
   - The ranker processes the combined set
   - Ranked documents flow to the prompt builder

3. **Final Steps**
   - The prompt builder creates the context
   - The LLM receives the final prompt for answer generation
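One way to check the data flow described above is to treat the connections as a dependency graph and ask for a valid execution order. The sketch below uses Python's standard `graphlib` module (Python 3.9+) with the component names from this notebook. It only illustrates the dependency structure and makes no claim about how Haystack schedules components internally.

```python
from graphlib import TopologicalSorter

# Each key depends on the components in its set (i.e. it receives their output).
dependencies = {
    "embedding_retriever": {"text_embedder"},
    "document_joiner": {"embedding_retriever", "bm25_retriever"},
    "ranker": {"document_joiner"},
    "prompt_builder": {"ranker"},
    "llm": {"prompt_builder"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The relative order of the two retrieval branches may vary between runs, since neither depends on the other. The joiner always comes after both retrievers, and the LLM always comes last.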

In [8]:
# --- 3. Connect the Components in a Graph ---

# The query is embedded for the dense retriever
hybrid_rag_pipeline.connect("text_embedder.embedding", "embedding_retriever.query_embedding")

# The raw query text for the BM25 retriever and the ranker is not connected here.
# Both components receive it directly through the inputs passed to run().

# The outputs of both retrievers are fed into the document joiner
hybrid_rag_pipeline.connect("embedding_retriever.documents", "document_joiner.documents")
hybrid_rag_pipeline.connect("bm25_retriever.documents", "document_joiner.documents")

# The joined documents are sent to the ranker
hybrid_rag_pipeline.connect("document_joiner.documents", "ranker.documents")

# The ranked documents are sent to the prompt builder
hybrid_rag_pipeline.connect("ranker.documents", "prompt_builder.documents")

# The final prompt is sent to the LLM
hybrid_rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")


<haystack.core.pipeline.pipeline.Pipeline object at 0x330ebb9b0>
🚅 Components
  - text_embedder: SentenceTransformersTextEmbedder
  - embedding_retriever: InMemoryEmbeddingRetriever
  - bm25_retriever: InMemoryBM25Retriever
  - document_joiner: DocumentJoiner
  - ranker: TransformersSimilarityRanker
  - prompt_builder: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - text_embedder.embedding -> embedding_retriever.query_embedding (list[float])
  - embedding_retriever.documents -> document_joiner.documents (list[Document])
  - bm25_retriever.documents -> document_joiner.documents (list[Document])
  - document_joiner.documents -> ranker.documents (list[Document])
  - ranker.documents -> prompt_builder.documents (list[Document])
  - prompt_builder.prompt -> llm.prompt (str)

## 5. Pipeline Visualization

Below, we'll generate a visual representation of our hybrid pipeline. This visualization helps us understand:

- The overall flow of information
- How components are connected
- The parallel nature of dense and sparse retrieval
- The convergence point at the document joiner
- The final processing stages through ranking and generation

Study this diagram to understand how the different retrieval methods work together.

In [9]:
# --- 4. Visualize the Pipeline (Optional) ---
try:
    hybrid_rag_pipeline.draw(path="./images/hybrid_rag_pipeline.png")
    print("Hybrid pipeline visualization saved to 'hybrid_rag_pipeline.png'")
except Exception as e:
    print(f"Could not draw hybrid pipeline: {e}")

Hybrid pipeline visualization saved to 'hybrid_rag_pipeline.png'


![](./images/hybrid_rag_pipeline.png)

## 6. Running the Hybrid Pipeline

Now let's test our hybrid pipeline with a question. Notice how we need to provide the query to multiple components:

- `text_embedder`: For creating query embeddings
- `bm25_retriever`: For keyword matching
- `ranker`: For comparing query with documents
- `prompt_builder`: For including the question in the prompt

This example demonstrates how the hybrid approach can leverage both semantic understanding and keyword matching to find the most relevant information.
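Since the same question string goes to four components, a small helper can reduce repetition and typos. The function name `build_hybrid_inputs` is hypothetical; the component and input names match the ones used in this notebook.

```python
def build_hybrid_inputs(question):
    """Build the run() inputs that send one question to every component needing it."""
    return {
        "text_embedder": {"text": question},      # embedded for dense retrieval
        "bm25_retriever": {"query": question},    # raw text for keyword matching
        "ranker": {"query": question},            # compared against each candidate
        "prompt_builder": {"question": question}, # inserted into the prompt template
    }


inputs = build_hybrid_inputs("What is the Haystack 2.0 framework?")
print(sorted(inputs))
```

With the pipeline from this notebook in scope, the call would then be `hybrid_rag_pipeline.run(inputs)`.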

In [11]:
# --- 5. Run the Pipeline ---

# A query that benefits from both semantic and keyword matching
hybrid_question = "What is the Haystack 2.0 framework?"

# The run dictionary must provide every required input that no connection supplies.
# Here the query text is passed to the text_embedder, bm25_retriever, ranker, and prompt_builder.
hybrid_result = hybrid_rag_pipeline.run({
    "text_embedder": {"text": hybrid_question},
    "bm25_retriever": {"query": hybrid_question},
    "ranker": {"query": hybrid_question},
    "prompt_builder": {"question": hybrid_question}
})



Batches: 100%|██████████| 1/1 [00:00<00:00,  8.32it/s]


In [12]:
print(f"\nQuestion: {hybrid_question}")
print(f"Answer: {hybrid_result['llm']['replies']}")


Question: What is the Haystack 2.0 framework?
Answer: ['Haystack 2.0 is an open-source Python framework developed by deepset for building production-ready large language model (LLM) applications. It allows developers to create retrieval-augmented generative pipelines and state-of-the-art search systems. The framework is designed to be flexible and customizable, making it easier to implement composable AI systems that can be extended, optimized, evaluated, and deployed to production. Haystack 2.0 includes integrations with major model providers and databases and comes with documentation, tutorials, and a guide for getting started.']


In [14]:
# A query that benefits from both semantic and keyword matching
hybrid_question = "What can I build with Haystack"

# Same input layout as before: the query text goes to the text_embedder,
# bm25_retriever, ranker, and prompt_builder.
hybrid_result = hybrid_rag_pipeline.run({
    "text_embedder": {"text": hybrid_question},
    "bm25_retriever": {"query": hybrid_question},
    "ranker": {"query": hybrid_question},
    "prompt_builder": {"question": hybrid_question}
})

Batches: 100%|██████████| 1/1 [00:01<00:00,  1.28s/it]



In [15]:
hybrid_result

{'llm': {'replies': ['With Haystack 2.0, you can build production-ready LLM (Large Language Model) applications. The framework allows for the creation of composable AI systems that are easy to use, customize, and extend. You can build various components that implement specific logic, such as embedders that convert text into vector representations or retrievers that take embeddings as input and return documents. Additionally, you have the flexibility to create custom components that incorporate unique behaviors and functionalities, fostering an open ecosystem around Haystack. The community and third parties also contribute components, expanding the possibilities of what you can develop.'],
  'meta': [{'model': 'gpt-4o-mini-2024-07-18',
    'index': 0,
    'finish_reason': 'stop',
    'usage': {'completion_tokens': 120,
     'prompt_tokens': 688,
     'total_tokens': 808,
     'completion_tokens_details': {'accepted_prediction_tokens': 0,
      'audio_tokens': 0,
      'reasoning_tokens'