
# Build RAG with Haystack and watsonx.data Milvus

## Introduction

Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing large language models with external knowledge sources. By combining the strengths of vector search and generative AI, RAG systems can provide more accurate, reliable, and contextually relevant responses.

In this tutorial, we'll build a RAG system from three components:

- **Haystack**: An open-source framework for building production-ready LLM applications with modular components
- **Milvus**: A high-performance vector database built specifically for embedding similarity search
- **IBM watsonx.ai**: Enterprise-grade AI models with state-of-the-art capabilities for embedding and text generation

By integrating these three technologies, we'll create a system that can:
- Process and index documents
- Generate high-quality vector embeddings
- Store and efficiently retrieve similar documents
- Generate contextually relevant answers to user queries
  
## What We'll Accomplish

By the end of this tutorial, you'll have built a complete RAG system capable of:
1. Indexing a text document into a vector database
2. Converting user questions into semantic embeddings
3. Retrieving relevant contextual information
4. Generating accurate answers using retrieved context
5. Responding to open-ended questions about the indexed data

This combination offers several advantages for enterprise applications:

1. **Data privacy**: Keep your data within your control by using IBM watsonx.ai instead of sending it to public APIs
2. **Scalability**: Milvus provides production-grade vector search designed to handle millions of documents
3. **Flexibility**: Haystack's component-based architecture allows easy customization and extension
4. **Enterprise support**: IBM's watsonx.ai provides enterprise-grade models with proper support channels

Let's dive in and build a complete RAG system with these technologies!


## 1. Setup and Installation

### Install required packages

In [1]:
!pip install --upgrade --quiet pymilvus milvus-haystack haystack-ai ibm-watsonx-ai


## 2. Data Preparation

First, we'll acquire a sample document about Leonardo da Vinci from Project Gutenberg. This will serve as our knowledge base for answering questions.

In [None]:
# Download sample data for demonstration
import os
import urllib.request

url = "https://www.gutenberg.org/cache/epub/7785/pg7785.txt"
file_path = "./davinci.txt"

if not os.path.exists(file_path):
    urllib.request.urlretrieve(url, file_path)
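
Project Gutenberg files usually wrap the book text in a license header and footer, which would otherwise be indexed alongside the biography. The following is a minimal sketch of trimming them, assuming the downloaded file uses the usual `*** START OF ...` and `*** END OF ...` marker lines. `strip_gutenberg_boilerplate` is a hypothetical helper and is not part of the original notebook:

```python
def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the Gutenberg START and END marker lines.

    Assumes markers of the form '*** START OF ...' and '*** END OF ...'.
    If either marker is missing, the input is returned unchanged.
    """
    lines = text.splitlines()
    start = end = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if start is None and stripped.startswith("*** START OF"):
            start = i
        elif start is not None and stripped.startswith("*** END OF"):
            end = i
            break
    if start is None or end is None:
        return text
    return "\n".join(lines[start + 1:end]).strip()


# Tiny hand-written sample standing in for the downloaded file
sample = (
    "The Project Gutenberg eBook of Example\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Leonardo was born in 1452.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text follows.\n"
)
body = strip_gutenberg_boilerplate(sample)
print(body)
```

If you adopt something like this, apply it to the file contents before indexing so that the license text does not compete with the biography during retrieval.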

## 3. IBM watsonx.ai Configuration

Now, we'll set up the IBM watsonx.ai models that will power our RAG system. We'll need:
- An embedding model to convert text into vector representations
- A language model to generate human-like responses

watsonx.ai provides powerful foundation models that are pre-trained on vast amounts of data. For our RAG system, we'll use:
- **IBM Slate**: A powerful embedding model optimized for retrieval tasks
- **IBM Granite**: A state-of-the-art language model for high-quality text generation

In [3]:
# Set up IBM watsonx API credentials
watsonx_credentials = {
    "url": "<watsonx url>",  # Replace with your watsonx URL
    "apikey":  "<watsonx_api_key>",  # Replace with your watsonx API Key
}
project_id = "<project_id>"  # Replace with your project ID
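
Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. One possible alternative is to read the values from environment variables. This is a sketch only: the variable names `WATSONX_URL`, `WATSONX_APIKEY`, and `WATSONX_PROJECT_ID` and the helper `load_watsonx_settings` are assumptions made for illustration, not names defined by watsonx.ai:

```python
import os
from typing import Mapping


def load_watsonx_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build the credentials dict and project ID from environment variables.

    The variable names used here are assumptions for this sketch.
    """
    required = ("WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "credentials": {"url": env["WATSONX_URL"], "apikey": env["WATSONX_APIKEY"]},
        "project_id": env["WATSONX_PROJECT_ID"],
    }


# Demonstration with an in-memory mapping instead of the real environment
demo = load_watsonx_settings(
    {
        "WATSONX_URL": "https://example.invalid",
        "WATSONX_APIKEY": "dummy-key",
        "WATSONX_PROJECT_ID": "dummy-project",
    }
)
print(sorted(demo["credentials"]))
```

The returned `credentials` dictionary has the same `url` and `apikey` keys as `watsonx_credentials` above, so it could be passed to the same constructors.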

In [5]:
# Import watsonx libraries
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models.embeddings import Embeddings
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames as EmbedParams

# Initialize the IBM watsonx client
client = APIClient(watsonx_credentials)

# Configure embedding model
embedding_model_id = "ibm/slate-30m-english-rtrvr"  
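embedding_params = {
    # Inputs longer than 128 tokens are truncated before embedding
    EmbedParams.TRUNCATE_INPUT_TOKENS: 128,
    EmbedParams.RETURN_OPTIONS: {'input_text': True},
}

# Initialize the embedding model
watsonx_embeddings = Embeddings(
    model_id=embedding_model_id,
    credentials=watsonx_credentials,
    params=embedding_params,
    project_id=project_id,
    space_id=None,
    # verify=False skips TLS certificate verification; use True or a CA bundle path outside of demos
    verify=False,
)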

# Configure LLM generation model
generation_model_id = "ibm/granite-3-3-8b-instruct"  
generation_params = {
    "max_new_tokens": 1024,
    "temperature": 0,  
    "top_p": 0.9,
    "repetition_penalty": 1.05
}

# Initialize the LLM model
watsonx_llm = ModelInference(
    model_id=generation_model_id,
    credentials=watsonx_credentials,
    params=generation_params,
    project_id=project_id
)

## 4. Haystack Integration

Haystack provides a modular approach to building NLP pipelines through components that can be connected in various ways. To integrate IBM watsonx.ai models with Haystack, we need to create custom components that wrap around watsonx.ai's API calls.

First, we'll create wrapper functions to simplify the interaction with watsonx.ai:

In [6]:
def embed_documents(texts):
    """Wrapper function to embed documents using watsonx"""
    return watsonx_embeddings.embed_documents(texts=texts)

def embed_query(text):
    """Wrapper function to embed a single query using watsonx"""
    return watsonx_embeddings.embed_query(text=text)

def generate_text(prompt):
    """Wrapper function to generate text using watsonx LLM"""
    response = watsonx_llm.generate(prompt=prompt)
    return response['results'][0]['generated_text']
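
The `generate_text` wrapper indexes straight into `response['results'][0]['generated_text']`, which raises an unhelpful `IndexError` or `KeyError` if the response is empty. A slightly more defensive variant might look like the sketch below. `extract_generated_text` is a hypothetical helper, and the response shape is assumed to be the one used by the wrapper above:

```python
def extract_generated_text(response: dict) -> str:
    """Return the first generated text from a response shaped like
    {'results': [{'generated_text': ...}]}.
    """
    results = response.get("results") or []
    if not results:
        raise ValueError("Response contains no results")
    text = results[0].get("generated_text")
    if text is None:
        raise ValueError("First result has no 'generated_text' field")
    # Trim surrounding whitespace, which generated text often carries
    return text.strip()


# A hand-written response for illustration only, not real model output
fake_response = {"results": [{"generated_text": " The British Museum. "}]}
answer = extract_generated_text(fake_response)
print(answer)
```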


Now we'll create custom Haystack components that use these wrapper functions. These components slot into our pipelines, letting us use watsonx.ai within the Haystack framework:


In [7]:
from haystack import component
from haystack.dataclasses import Document
from typing import List

# Create minimal custom components that use our wrapper functions
# Haystack's @component decorator allows you to plug in your logic into pipelines.
@component
class watsonxDocumentEmbedder:
    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]):
        texts = [doc.content for doc in documents]
        embeddings = embed_documents(texts)
        
        for doc, embedding in zip(documents, embeddings):
            doc.embedding = embedding
            
        return {"documents": documents}

@component
class watsonxTextEmbedder:
    @component.output_types(embedding=List[float], text=str)
    def run(self, text: str):
        embedding = embed_query(text)
        return {"embedding": embedding, "text": text}

@component
class watsonxGenerator:
    @component.output_types(replies=List[str])
    def run(self, prompt: str):
        generated_text = generate_text(prompt)
        return {"replies": [generated_text]}
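
One subtlety in `watsonxDocumentEmbedder`: `zip` stops at the shorter of its inputs, so if the embedding call ever returned fewer vectors than documents, some documents would silently keep no embedding. A small check such as the following could be run on the embeddings before assigning them. This is a sketch in plain Python, and `check_embeddings` is a hypothetical helper rather than part of Haystack or watsonx.ai:

```python
from typing import Sequence


def check_embeddings(texts: Sequence[str], embeddings: Sequence[Sequence[float]]) -> int:
    """Verify there is one embedding per text and that all share one dimension.

    Returns the shared dimension (0 when there is nothing to embed).
    """
    if len(texts) != len(embeddings):
        raise ValueError(f"Expected {len(texts)} embeddings, got {len(embeddings)}")
    if not embeddings:
        return 0
    dims = {len(vec) for vec in embeddings}
    if len(dims) != 1:
        raise ValueError(f"Embeddings have inconsistent dimensions: {sorted(dims)}")
    return dims.pop()


# Toy vectors for illustration; real embeddings are much longer
dim = check_embeddings(["a", "b"], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(dim)
```

A consistent dimension also matters downstream, since a vector collection stores vectors of one fixed size.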


## 5. Milvus Vector Database Setup

In [8]:
# Import Haystack and Milvus components
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.components.builders import PromptBuilder
from milvus_haystack import MilvusDocumentStore
from milvus_haystack.milvus_embedding_retriever import MilvusEmbeddingRetriever

# Initialize Milvus document store
document_store = MilvusDocumentStore(
    connection_args={
        "uri": "https://<hostname>:<port>",  # Replace with your watsonx.data Milvus URI or IP
        "user": "<user>",
        "password": "<password>",
        "secure": True,  # Set True if TLS is enabled
        "server_pem_path": "/root/path of ca.cert",  # Replace with the path to your CA certificate
    },
    drop_old=True,  # Drops any existing collection of the same name, so earlier data is lost
)

print("Milvus document store initialized")


Milvus document store initialized


## 6. Indexing Pipeline

Now that we have our components set up, we'll create an indexing pipeline to process our documents and store them in the Milvus vector database.

This pipeline will:
1. Load the text file
2. Split it into smaller chunks for better retrieval
3. Generate embeddings for each chunk using watsonx.ai
4. Store both the text and embeddings in Milvus

In [9]:
# Create an indexing pipeline to process and store documents
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", TextFileToDocument())
indexing_pipeline.add_component(
    "splitter", DocumentSplitter(split_by="sentence", split_length=2)
)
indexing_pipeline.add_component("embedder",watsonxDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store))

# Connect indexing pipeline components
indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")

# Run the indexing pipeline
print("Running indexing pipeline...")
indexing_pipeline.run({"converter": {"sources": [file_path]}})
print(f"Number of documents indexed: {document_store.count_documents()}")


No abbreviations file found for en. Using default abbreviations.


Running indexing pipeline...
Number of documents indexed: 191
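
To make the splitting step more concrete, here is a rough plain-Python illustration of grouping sentences into chunks of two, the same chunk size configured for the splitter above. It is a naive sketch based on a regular expression and is not Haystack's implementation; `split_into_chunks` is a hypothetical name, and abbreviations such as "Mr." would be split incorrectly:

```python
import re
from typing import List


def split_into_chunks(text: str, sentences_per_chunk: int = 2) -> List[str]:
    """Naively split text into sentences and group them into fixed-size chunks."""
    # Break after '.', '!' or '?' followed by whitespace; abbreviations are not handled
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


sample = "Leonardo was born in 1452. He trained in Florence. He later moved to Milan."
chunks = split_into_chunks(sample)
for chunk in chunks:
    print(chunk)
```

Small chunks keep each embedding focused on a single idea, at the cost of sometimes separating a fact from the context needed to interpret it.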


## 7. Testing Retrieval Capabilities

Before building our complete RAG system, let's test the retrieval capabilities to ensure we can find relevant documents. We'll create a simple retrieval pipeline and test it with a question about the "Warrior" painting mentioned in our document.


In [10]:
# Define a test question
question = 'Where is the painting "Warrior" currently stored?'

# Create and run a simple retrieval pipeline
retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("embedder", watsonxTextEmbedder())
retrieval_pipeline.add_component(
    "retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3)
)
retrieval_pipeline.connect("embedder.embedding", "retriever.query_embedding")

print("\nTesting retrieval with question:", question)
retrieval_results = retrieval_pipeline.run({"embedder": {"text": question}})

# Display retrieved documents
print("\nRetrieved documents:")
print("===================")
for i, doc in enumerate(retrieval_results["retriever"]["documents"], 1):
    print(f"Document {i}:")
    print(doc.content)
    print("-" * 50)



Testing retrieval with question: Where is the painting "Warrior" currently stored?

Retrieved documents:
Document 1:
To about this period belongs the superb drawing of the "Warrior," now
in the Malcolm Collection in the British Museum. This drawing may have
been made while Leonardo still frequented the studio of Andrea del
Verrocchio, who in 1479 was commissioned to execute the equestrian
statue of Bartolommeo Colleoni, which was completed twenty years later
and still adorns the Campo di San Giovanni e Paolo in Venice.





--------------------------------------------------
Document 2:
Some of these in red and black chalk are now preserved
in the Royal Library at Windsor, where there are in all 145 drawings
by Leonardo.

Several other old copies of the fresco exist, notably the one in the
Louvre. 
--------------------------------------------------
Document 3:
1252). As a matter of course it is
unfinished, only the under-painting and the colouring of the figures
in green on a brown gro
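
Conceptually, the retriever ranks stored chunks by how close their embeddings are to the query embedding and returns the top `k`. The sketch below shows that idea as a brute-force search with cosine similarity over toy vectors. It is an illustration only: Milvus uses index structures for approximate search, the similarity metric it applies depends on how the collection is configured, and `cosine_similarity` and `top_k` are hypothetical helper names:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], docs: List[Tuple[str, Sequence[float]]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best matches."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy three-dimensional "embeddings" made up for illustration
toy_docs = [
    ("drawing in the British Museum", [0.9, 0.1, 0.0]),
    ("drawings at Windsor", [0.6, 0.4, 0.0]),
    ("unrelated passage", [0.0, 0.1, 0.9]),
]
ranked = top_k([1.0, 0.0, 0.0], toy_docs, k=2)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```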

## 8. Building the Complete RAG Pipeline

Now that we've confirmed our retrieval works, let's build the complete RAG pipeline. This pipeline will:
1. Convert the user query into an embedding
2. Retrieve relevant context from Milvus
3. Create a prompt that includes the query and retrieved context
4. Generate a response using the watsonx.ai language model

In [11]:
# Define a prompt template for RAG
prompt_template = """
Answer the following query based on the provided context. If the context does
not include an answer, reply with 'I don't know'.

Query: {{query}}

Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Answer:
"""

# Create the full RAG pipeline
rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder",watsonxTextEmbedder())
rag_pipeline.add_component(
    "retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3)
)
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag_pipeline.add_component("generator", watsonxGenerator())

# Connect RAG pipeline components
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "generator")

# Run the RAG pipeline
print("\nRunning RAG pipeline...")
rag_results = rag_pipeline.run(
    {
        "text_embedder": {"text": question},
        "prompt_builder": {"query": question},
    }
)

# Display the final answer
print("\nRAG Answer:")
print("==========")
print(rag_results["generator"]["replies"][0])


PromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.



Running RAG pipeline...

RAG Answer:
 The painting "Warrior" is currently stored in the Malcolm Collection in the British Museum.


The generated answer agrees with the first retrieved passage, which places the "Warrior" in the Malcolm Collection in the British Museum.
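
To see roughly what the generator receives, the prompt assembly can be imitated with plain string formatting. This sketch mirrors the layout of the template used in the RAG pipeline but does not use Jinja or Haystack's `PromptBuilder`; `build_prompt` is a hypothetical helper and the document strings are shortened stand-ins:

```python
from typing import List


def build_prompt(query: str, documents: List[str]) -> str:
    """Mimic the layout of the RAG prompt template with plain string formatting."""
    context = "\n".join(documents)
    return (
        "Answer the following query based on the provided context. If the context does\n"
        "not include an answer, reply with 'I don't know'.\n\n"
        f"Query: {query}\n\n"
        f"Documents:\n{context}\n\n"
        "Answer:\n"
    )


prompt = build_prompt(
    'Where is the painting "Warrior" currently stored?',
    [
        'The drawing of the "Warrior" is now in the British Museum.',
        "Other drawings are preserved in the Royal Library at Windsor.",
    ],
)
print(prompt)
```

Printing the assembled prompt like this is a quick way to confirm that the retrieved chunks, and not just the question, actually reach the model.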

## Conclusion

In this tutorial, we've built a complete Retrieval-Augmented Generation (RAG) system by integrating three powerful technologies:

1. **IBM watsonx.ai** provided the AI brains of our system with:
   - The Slate embedding model to create semantic representations of text
   - The Granite language model to generate natural language responses

2. **Milvus** served as our vector database, enabling:
   - Efficient storage of document embeddings
   - Fast similarity search to find relevant context

3. **Haystack** tied everything together with:
   - Modular pipeline components
   - Flexible document processing
   - Seamless integration of different technologies

This RAG system demonstrates how enterprises can leverage their private data to enhance AI capabilities. By retrieving relevant information and providing it as context to language models, we ensure more accurate, factual, and contextually appropriate responses.

By combining the strengths of IBM watsonx.ai, Milvus, and Haystack, you now have a powerful, flexible RAG system that can be customized for a wide range of enterprise applications.

