# Advanced methods for RAG applications

This notebook demonstrates advanced techniques for building a Retrieval-Augmented Generation (RAG) pipeline. It covers setting up a vector database, indexing documents, and generating responses using OpenAI's GPT models. Specifically, we are going to explore these three techniques:

1. Using a content extraction tool such as Docling to extract chunks and metadata for each document
2. Reranking Documents
3. Using LLMs to rephrase prompts


## Setup Environment

This section clones the data used in this notebook and installs all the required packages.


**NOTE: Make sure to change the notebook runtime to T4 GPU**

In [None]:
!git clone https://github.com/CaSToRC-CyI/AI-Agents-Training.git

In [None]:
%cd ./AI-Agents-Training

In [None]:
%%bash

uv pip install haystack-ai
uv pip install milvus_haystack
uv pip install pymilvus
uv pip install python-docx
uv pip install docling-haystack

### Import packages

In [4]:
# Standard library
import glob
import os
import textwrap
from getpass import getpass
from pathlib import Path
from typing import List

# Docling, Haystack, and Milvus integrations
from docling.chunking import HybridChunker
from docling_haystack.converter import DoclingConverter
from haystack import Pipeline
from haystack.components.builders import (
    AnswerBuilder,
    ChatPromptBuilder,
    PromptBuilder,
)
from haystack.components.converters import OutputAdapter
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.joiners import ListJoiner
from haystack.components.rankers import SentenceTransformersSimilarityRanker
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import ChatMessage
from haystack_experimental.chat_message_stores.in_memory import InMemoryChatMessageStore
from haystack_experimental.components.retrievers import ChatMessageRetriever
from haystack_experimental.components.writers import ChatMessageWriter
from milvus_haystack import MilvusDocumentStore, MilvusEmbeddingRetriever

### Set up the OpenAI API key

In [None]:
os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

### Extract chunks and metadata using Docling

In [6]:
DOCUMENTS_DIR = Path("./dummy_data/documents_dir")
FILES = [file.resolve() for file in DOCUMENTS_DIR.rglob("*") if file.is_file()]

Here we use the **DoclingConverter** with a **HybridChunker** that relies on the `"sentence-transformers/all-MiniLM-L6-v2"` tokenizer to split a PDF (arXiv ID 2408.09869) into semantically coherent chunks.

The chunker is layout-aware and aligns chunk boundaries with the tokenizer, which prepares the document for embedding with sentence-transformer models. This makes the output well suited to downstream tasks such as RAG pipelines, semantic search, or document understanding.

In [None]:
EMBED_MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
converter = DoclingConverter(chunker=HybridChunker(tokenizer=EMBED_MODEL_ID))
documents = converter.run(paths=["https://arxiv.org/pdf/2408.09869"])
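
To make the idea of layout-aware, tokenizer-aligned chunking more concrete, here is a toy sketch in plain Python. It is not Docling's algorithm: `toy_chunk`, the whitespace "tokenizer", and the `max_tokens` budget are all illustrative assumptions. It only demonstrates two properties: a chunk never crosses a heading boundary, and a chunk never exceeds the token budget.

```python
from typing import Dict, List


def toy_chunk(markdown_text: str, max_tokens: int = 8) -> List[Dict[str, str]]:
    """Split text into chunks that respect headings and a token budget.

    Toy stand-in: whitespace-separated words count as tokens, whereas a real
    chunker would count tokens with the embedding model's tokenizer.
    """
    chunks: List[Dict[str, str]] = []
    heading = ""
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered words as one chunk tagged with the current heading.
        if buffer:
            chunks.append({"heading": heading, "content": " ".join(buffer)})
            buffer.clear()

    for line in markdown_text.splitlines():
        if line.startswith("#"):
            flush()  # a new heading always starts a new chunk
            heading = line.lstrip("#").strip()
            continue
        for word in line.split():
            if len(buffer) == max_tokens:
                flush()  # budget reached: close the chunk before adding more
            buffer.append(word)
    flush()
    return chunks


sample = "# Intro\none two three four five\n# Method\nsix seven"
for chunk in toy_chunk(sample, max_tokens=3):
    print(chunk)
```

With `max_tokens=3`, the sample yields three chunks: two under `Intro` and one under `Method`. Keeping the heading next to each chunk is what later lets us inspect a chunk together with its section title.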


#### Inspect a chunk

In [None]:
CHUNK_ID = 2
print(f'Chunk Heading: {documents["documents"][CHUNK_ID].meta["dl_meta"]["meta"]["headings"]}\n')
print("Chunk content:")
print(f'{documents["documents"][CHUNK_ID].content}\n')
doc_items = documents["documents"][CHUNK_ID].meta["dl_meta"]["meta"]["doc_items"][0]
print("Document Items:")
for k, v in doc_items.items():
    if k == 'prov':
        print(v[0])
    else:
        print(f"{k}: {v}")

### Initialize the Vector Database

Set up the Milvus vector database that stores the document embeddings. Setting `drop_old=True` clears any data that already exists in the collection.

In [9]:
connection_args = {"uri": "./rag_advanced.db"}
document_store = MilvusDocumentStore(
    connection_args=connection_args,
    drop_old=True,
)
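
Conceptually, a vector database stores one embedding per chunk and returns the chunks whose embeddings are closest to a query embedding. The sketch below is a toy in-memory stand-in, not Milvus: `ToyVectorStore`, its methods, the module-level `_COLLECTIONS` dict, and the 3-dimensional vectors are all made up for illustration. Cosine similarity is used here as one common choice; the metric a real Milvus collection uses depends on its index configuration.

```python
import math
from typing import Dict, List, Tuple

# Module-level dict standing in for data persisted on disk, keyed by URI.
_COLLECTIONS: Dict[str, List[Tuple[str, List[float]]]] = {}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; only illustrates writing and top-k search."""

    def __init__(self, uri: str, drop_old: bool = False) -> None:
        if drop_old or uri not in _COLLECTIONS:
            _COLLECTIONS[uri] = []  # start from an empty collection
        self._rows = _COLLECTIONS[uri]

    def write(self, text: str, embedding: List[float]) -> None:
        self._rows.append((text, embedding))

    def count(self) -> int:
        return len(self._rows)

    def search(self, query_embedding: List[float], top_k: int = 5) -> List[str]:
        # Rank every stored chunk by similarity to the query and keep the best top_k.
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_embedding, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


store = ToyVectorStore("./toy.db", drop_old=True)
store.write("pricing", [1.0, 0.0, 0.0])
store.write("support", [0.0, 1.0, 0.0])
print(store.search([0.9, 0.1, 0.0], top_k=1))
```

Reopening the toy store with `drop_old=True` empties it, while reopening it without the flag keeps the rows written earlier.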

## Indexing Documents and performing RAG

Previously, our indexing pipeline needed five steps:

1. Turn the files into compatible Haystack *Documents*.
2. Clean each Document using Haystack's `DocumentCleaner`, which removes extra whitespace, empty lines, specified substrings, regex matches, and so on.
3. Split the documents into *smaller chunks*, using a configurable split method and length.
4. Turn the chunks into embeddings with an *embedder*.
5. Store them in a Haystack *Document Store* so they can be accessed later on.

---

This time we only need three components:

1. DoclingConverter
2. Embedder
3. Writer

This is because Docling will perform the following for us:

1. Document conversion
2. Document Chunking
3. Document tokenization

In [None]:
# Initialize the indexing pipeline
indexing_pipeline = Pipeline()

# Add each component to the pipeline
indexing_pipeline.add_component(
    "converter",
    DoclingConverter(chunker=HybridChunker(tokenizer=EMBED_MODEL_ID)),
)
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store))

# Connect each component
indexing_pipeline.connect("converter", "embedder")
indexing_pipeline.connect("embedder", "writer")

# Run the Pipeline
indexing_pipeline.run({"converter": {"paths": FILES}})

#### Basic arguments for our LLM

In [11]:
generation_kwargs = {
    "max_completion_tokens": 32768,
    "temperature": 0.5,
    "seed": 42,
}

#### Set up the re-ranker

In [None]:
ranker_query_prefix = "Given the following user query, determine if the information provided below addresses the inquiry accurately."
ranker = SentenceTransformersSimilarityRanker(query_prefix=ranker_query_prefix, top_k=5)
ranker.warm_up()
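
Re-ranking is a second stage on top of retrieval: the retriever returns a candidate set, and the ranker re-scores each (query, document) pair and keeps only the best `top_k`. The toy sketch below shows that control flow in plain Python. `overlap_score` is a deliberately crude stand-in for the similarity model, and all function names and sample strings are hypothetical.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def rerank(
    query: str,
    candidates: List[str],
    top_k: int = 5,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Re-score retrieved candidates against the query and keep the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [
    "Our office is closed on public holidays",
    "The refund policy covers all online orders",
    "Orders ship within two days",
]
for doc, score in rerank("refund policy for orders", candidates, top_k=2):
    print(f"{score:.2f}  {doc}")
```

The design point is that the second-stage score can be more expensive than the retrieval score, because it only runs on the handful of candidates the retriever already selected.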

#### Read the two templates

1. One for our RAG
2. One for our prompt-rephraser

In [12]:
with open("./dummy_data/RAG_TEMPLATE.txt") as rag_file:
    rag_template = rag_file.read()

with open("./dummy_data/QUERY_REPHRASE_TEMPLATE.txt") as query_rephrase_file:
    query_rephrase_template = query_rephrase_file.read()
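
The template files themselves are not shown in this notebook. As a rough idea of what a query-rephrasing template does, here is a hypothetical sketch: the wording of `TOY_REPHRASE_TEMPLATE` is invented and the real `QUERY_REPHRASE_TEMPLATE.txt` may look quite different. Haystack's `PromptBuilder` renders Jinja2 templates; the sketch uses the standard library's `string.Template` as a simplified stand-in, with the same two inputs the pipeline passes to the rephraser (`query` and `memories`).

```python
from string import Template
from typing import List

# Hypothetical template text; the real QUERY_REPHRASE_TEMPLATE.txt may differ.
TOY_REPHRASE_TEMPLATE = Template(
    "Rewrite the question so it can be understood without the conversation.\n"
    "Conversation history:\n$memories\n"
    "User question: $query\n"
    "Rewritten question:"
)


def render_rephrase_prompt(query: str, memories: List[str]) -> str:
    """Fill the toy template with the question and the prior conversation turns."""
    history = "\n".join(f"- {turn}" for turn in memories) or "- (none)"
    return TOY_REPHRASE_TEMPLATE.substitute(query=query, memories=history)


print(render_rephrase_prompt("How much does it cost?", ["user: Tell me about product X"]))
```

Including the history in the prompt is what would let a language model resolve a follow-up such as "it" into a standalone search query.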

In [13]:
user_prompt_template = ChatMessage.from_user(rag_template)

In [None]:
rag_pipeline = Pipeline()
memory_store = InMemoryChatMessageStore()

# components for query rephrasing
rag_pipeline.add_component("query_rephrase_prompt_builder", PromptBuilder(query_rephrase_template))
rag_pipeline.add_component("query_rephrase_llm", OpenAIGenerator())
rag_pipeline.add_component("list_to_str_adapter", OutputAdapter(template="{{ replies[0] }}", output_type=str))
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
# components for RAG
rag_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=5))
rag_pipeline.add_component(instance=ranker, name="ranker")
rag_pipeline.add_component("prompt_builder", ChatPromptBuilder(variables=["query", "documents", "memories"], required_variables=["query", "documents", "memories"]))
rag_pipeline.add_component("llm", OpenAIChatGenerator())

# components for memory
rag_pipeline.add_component("memory_retriever", ChatMessageRetriever(memory_store))
rag_pipeline.add_component("memory_writer", ChatMessageWriter(memory_store))
rag_pipeline.add_component("memory_joiner", ListJoiner(List[ChatMessage]))


# connections for query rephrasing
rag_pipeline.connect("memory_retriever", "query_rephrase_prompt_builder.memories")
rag_pipeline.connect("query_rephrase_prompt_builder.prompt", "query_rephrase_llm")
rag_pipeline.connect("query_rephrase_llm.replies", "list_to_str_adapter.replies")
rag_pipeline.connect("list_to_str_adapter.output", "text_embedder.text")
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# connections for RAG
rag_pipeline.connect("retriever.documents", "ranker.documents")
rag_pipeline.connect("ranker.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
rag_pipeline.connect("llm.replies", "memory_joiner")

# connections for memory
rag_pipeline.connect("memory_joiner", "memory_writer")
rag_pipeline.connect("memory_retriever", "prompt_builder.memories")
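
The connections above can be hard to follow as a list of edges. The sketch below replays the same order of operations for one conversational turn as plain Python. It is a simplified illustration, not how Haystack schedules components: `run_turn` and the stub callables are hypothetical, and memory is reduced to a list of strings.

```python
from typing import Callable, List


def run_turn(
    question: str,
    memory: List[str],
    rephrase: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str], List[str]], str],
) -> str:
    """One conversational turn, mirroring the order of the pipeline connections."""
    history = list(memory)                      # memory_retriever: past turns
    search_query = rephrase(question, history)  # query rephrasing components
    candidates = retrieve(search_query)         # text_embedder + retriever
    documents = rerank(question, candidates)    # ranker sees the original question
    answer = generate(question, documents, history)  # prompt_builder + llm
    # memory_joiner + memory_writer: store the user message and the reply
    memory.extend([f"user: {question}", f"assistant: {answer}"])
    return answer


memory: List[str] = []
answer = run_turn(
    "What does it cost?",
    memory,
    rephrase=lambda q, history: q,                   # stub for the rephrasing LLM
    retrieve=lambda q: ["doc a", "doc b", "doc c"],  # stub retriever
    rerank=lambda q, docs: docs[:2],                 # stub ranker keeps two documents
    generate=lambda q, docs, history: f"answer based on {len(docs)} documents",
)
print(answer)
print(memory)
```

Note that only the retriever works with the rephrased query, while the ranker and the final prompt receive the user's original question, as in the pipeline inputs used in this notebook.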

In [None]:
messages = [user_prompt_template]
question = "Tell me a bit about the Quantum Stream product."

result = rag_pipeline.run(
    data={
        "query_rephrase_prompt_builder": {"query": question},
        "prompt_builder": {"template": messages, "query": question},
        "ranker": {"query": question},
        "memory_joiner": {"values": [ChatMessage.from_user(question)]},
    },
    include_outputs_from=["llm", "query_rephrase_llm"],
)

search_query = result['query_rephrase_llm']['replies'][0]
print(f"Search Query: {search_query}")


formatted_text = result["llm"]["replies"][0].text
wrapped_text = "\n".join(
    textwrap.fill(line, width=120, subsequent_indent="  ") if line.strip() else line
    for line in formatted_text.splitlines()
)

print(wrapped_text)