# Creating a QA Pipeline with Retrieval Augmentation

### Install Libraries

In [17]:
!pip install -q haystack-ai google-ai-haystack datasets sentence-transformers

### Import Libraries

In [18]:
import os
from getpass import getpass
from datasets import load_dataset

from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.generators.google_ai import GoogleAIGeminiChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore

### Fetching and Indexing Documents
I’ll start by downloading my data and indexing it along with embeddings into a DocumentStore.

For this, I’ll keep it simple and use an InMemoryDocumentStore to store my documents and embeddings, which my QA system will later use to find answers.

In [19]:
document_store = InMemoryDocumentStore()

#### Fetch the Data

I’ll use the Wikipedia pages of the [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as my documents. Since the data is already preprocessed and available as a dataset on the Hugging Face Hub, I don’t need to worry about cleaning or splitting it.

I’ll just fetch the data and convert it into Haystack Documents.

In [20]:
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

### Initialize a Document Embedder

I’ll initialize a **SentenceTransformersDocumentEmbedder** with my chosen model to generate embeddings for the documents. Then, I’ll call `warm_up()` to download and prepare the embedding model.

In [21]:
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()

### Write Documents to the DocumentStore

I’ll run the **doc_embedder** on my documents to generate embeddings and store them in each document’s `embedding` field. After that, I’ll use `write_documents()` to save these documents (with embeddings) into the **DocumentStore**.

In [22]:
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

151

### Building the RAG Pipeline

Next, I’ll build a **Pipeline** to generate answers using the **RAG** approach.  
I’ll start by initializing each component, then add and connect them within the pipeline.

### Initialize a Text Embedder

I’ll initialize a text embedder to create embeddings for user queries. These embeddings will help the Retriever find relevant documents from the DocumentStore.

⚠️ Since I used the sentence-transformers/all-MiniLM-L6-v2 model for my documents, I’ll use the same model here to ensure consistency.

In [23]:
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

### Initialize the Retriever

I’ll initialize an **InMemoryEmbeddingRetriever** and connect it to the **InMemoryDocumentStore** I set up earlier. This retriever will handle fetching the most relevant documents for any given query.

In [24]:
retriever = InMemoryEmbeddingRetriever(document_store)
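
To make the retrieval step more concrete, here is a minimal pure-Python sketch of embedding-based retrieval: score every stored vector against the query vector and keep the best matches. This is only a conceptual illustration, not how `InMemoryEmbeddingRetriever` is implemented. The helper names (`cosine_similarity`, `retrieve`), the `top_k` parameter of the sketch, the choice of cosine similarity as the score, and the toy 3-dimensional vectors are all my own assumptions for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        # Vectors produced by different embedding models may not even share
        # a dimension, which is one reason to use the same model for
        # documents and queries.
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, store, top_k=2):
    """Return the top_k (score, text) pairs, best match first."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Toy 3-dimensional "embeddings", invented for illustration only.
toy_store = [
    ("The Temple of Artemis stood in Ephesus.", [0.9, 0.1, 0.0]),
    ("The Colossus of Rhodes was a bronze statue.", [0.1, 0.9, 0.0]),
    ("The Great Pyramid of Giza is in Egypt.", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]  # pretend embedding of a question about the temple

results = retrieve(query, toy_store, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In the real pipeline the vectors come from the sentence-transformers model, and the document store and retriever handle the scoring and ranking for me.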

### Define a Template Prompt

I’ll create a custom prompt for my RAG-based question answering. The prompt will take two inputs:

*   documents (retrieved from the DocumentStore)
*   question (from the user)


I’ll use Jinja2 looping syntax to combine the content of the retrieved documents into the prompt.

Then, I’ll initialize a ChatPromptBuilder with this template. When it receives the documents and the question, the ChatPromptBuilder fills in the template variables and generates the final prompt for the model, so the answer is grounded in the retrieved context.

In [25]:
template = [
    ChatMessage.from_user(
        """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""
    )
]

prompt_builder = ChatPromptBuilder(template=template)
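
To see roughly what the model receives once the template is filled in, here is a plain-Python approximation of the loop above. It does not use Jinja2 or the ChatPromptBuilder, so the exact whitespace will differ from the real rendered prompt. The `render_prompt` helper and the two sample snippets are stand-ins I made up for the illustration.

```python
def render_prompt(documents, question):
    """Build text similar to what the template's for-loop produces."""
    # One indented line per retrieved document, like the loop body.
    context = "\n".join(f"    {content}" for content in documents)
    return (
        "Given the following information, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Two short snippets standing in for retrieved document contents.
sample_documents = [
    "The Temple of Artemis was dedicated to the goddess Artemis.",
    "The temple offered sanctuary to those fleeing persecution.",
]

prompt = render_prompt(sample_documents, "Why did people visit the Temple of Artemis?")
print(prompt)
```

The point is that every retrieved document ends up in the `Context:` section, followed by the user’s question.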



### Initialize a ChatGenerator

**ChatGenerators** are the components that talk to large language models (LLMs). I’ll set the **GOOGLE_API_KEY** environment variable and initialize a **GoogleAIGeminiChatGenerator** to communicate with **Google Gemini** models. While initializing, I’ll specify the model name (here, `"gemini-2.0-flash"`).

In [26]:
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass("Enter Google API key:")

chat_generator = GoogleAIGeminiChatGenerator(model="gemini-2.0-flash")

### Build the Pipeline


To build my **pipeline**, I’ll add all the components and connect them step by step:

1. I’ll connect the **text_embedder’s "embedding" output** to the **retriever’s "query_embedding" input**.  
2. Then, I’ll link the **retriever’s output** to the **prompt_builder**.  
3. The **prompt_builder** has two inputs ("documents" and "question"), so this connection has to land on its **"documents" input**. Written explicitly, that is `connect("retriever.documents", "prompt_builder.documents")`; the shorter `connect("retriever", "prompt_builder")` used in the cell resolves to the same connection, as the pipeline summary printed after the cell shows.  
4. Finally, I’ll connect the **prompt_builder’s output** to the **LLM (chat_generator)**.

This setup ensures everything flows properly from the query to the final answer.

In [27]:
basic_rag_pipeline = Pipeline()

# Add components to the pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7e7da01e7210>
🚅 Components
  - text_embedder: SentenceTransformersTextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: ChatPromptBuilder
  - llm: GoogleAIGeminiChatGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.messages (List[ChatMessage])

### Asking a Question

To ask a question, I’ll call the **pipeline’s `run()` method** and pass the question to **both the `text_embedder` and the `prompt_builder`**.

The `text_embedder` needs it to create the query embedding, and the `prompt_builder` needs it to fill the **{{question}}** variable in my prompt template.

In [28]:
question = "Why did people visit the Temple of Artemis?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(response["llm"]["replies"][0].text)

People visited the Temple of Artemis for several reasons:

*   **To pay homage to Artemis:** The temple was dedicated to Artemis, and visitors came to honor her with jewelry and various goods.
*   **Sightseeing:** The Temple became an important attraction for merchants, kings, and sightseers.
*   **Sanctuary:** The temple offered sanctuary to those fleeing persecution or punishment.
*   **To attend the Artemis Procession:** Large numbers of people came to Ephesus in March and in the beginning of May to attend the main Artemis Procession.

