<a href="https://www.nvidia.com/dli"> <img src="images/nvidia_header.png" style="margin-left: -30px; width: 300px; float: left;"> </a>

# Building a RAG Workflow with AgentIQ

## Introduction to RAG

**Retrieval Augmented Generation (RAG)** is a powerful technique that enhances Large Language Models (LLMs) by providing them with relevant information retrieved from external knowledge sources. RAG combines the strengths of retrieval-based systems with the generative capabilities of LLMs to produce more accurate, up-to-date, and contextually relevant responses.

### How RAG Works

RAG operates in three main steps:

1. **Retrieval**: When a user asks a question, the system searches through a knowledge base to find relevant documents or passages.
2. **Augmentation**: The retrieved information is added to the prompt sent to the LLM.
3. **Generation**: The LLM generates a response based on both its pre-trained knowledge and the retrieved information.
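To make the three steps concrete, here is a deliberately tiny, dependency-free sketch. It assumes a hypothetical in-memory list of passages and ranks them by simple word overlap. The workflow built later in this notebook uses vector search instead. The generation step is left as a comment because it needs a real LLM.

```python
import re

# Hypothetical three-passage knowledge base. Real RAG systems (including the one
# built in this notebook) rank passages with vector embeddings, not word overlap.
DOCUMENTS = [
    "Anne Shirley is an imaginative, talkative orphan girl.",
    "The Martians used black smoke to silence artillery.",
    "Captain Nemo commands the submarine Nautilus.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Step 1 (Retrieval): return the k passages sharing the most words with the question."""
    query_words = tokenize(question)
    ranked = sorted(docs, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def augment(question: str, passages: list[str]) -> str:
    """Step 2 (Augmentation): insert the retrieved passages into the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "What did the Martians use against artillery?"
prompt = augment(question, retrieve(question, DOCUMENTS))
# Step 3 (Generation) would send `prompt` to an LLM; that call is omitted here.
print(prompt)
```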

### Benefits of RAG

- **Reduced Hallucinations**: By grounding responses in retrieved facts, RAG helps minimize the LLM's tendency to generate plausible but incorrect information.
- **Up-to-date Knowledge**: RAG can access information that wasn't available during the LLM's training.
- **Domain Adaptation**: RAG allows LLMs to work with specialized knowledge without fine-tuning.
- **Transparency**: The retrieved documents provide a source for the information, making the system more explainable.

## RAG in AgentIQ

AgentIQ provides robust support for building RAG workflows through integration with popular frameworks like LlamaIndex and LangChain. In this notebook, we'll build a RAG workflow around a custom `book_knowledge_rag` tool, which allows us to:

1. Ingest and process documents
2. Create vector embeddings for efficient retrieval
3. Query the knowledge base with natural language
4. Evaluate and profile the RAG system's performance

By the end of this notebook, you'll understand how to:
- Configure a RAG workflow in AgentIQ
- Evaluate its performance using standard metrics
- Profile and optimize your RAG system
- Enhance performance by upgrading components like the LLM

## Setting Up Our Environment

Let's start by setting up the environment for the RAG workflow: the NIM models used below need an NVIDIA API key, and we need to create our project structure.

### 1. Opening Phoenix (optional)

If you want to inspect detailed traces of the RAG calls, you can open Phoenix; the workflow configs in this notebook are instrumented to send traces to it:

In [None]:
%%js
const href = window.location.hostname;
let a = document.createElement('a');
let link = document.createTextNode('Click here to open Phoenix!');
a.appendChild(link);
a.href = "http://" + href + "/phoenix";
a.style.color = "navy";
a.target = "_blank";
element.append(a);

### 2. Creating the AgentIQ workflow

Let's create a workflows directory (if it doesn't exist from previous notebooks):

In [None]:
!mkdir -p workflows

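Now let's use the AgentIQ CLI to create the new workflow. The `--no-install` flag skips installing the package for now; we'll install it ourselves after editing `pyproject.toml`: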

In [None]:
!aiq workflow create --no-install --workflow-dir workflows ragtime

Let's confirm that the project is properly in place:

In [None]:
!tree workflows/ragtime

We should see the following standard directory structure for our RAG project:
```
workflows/ragtime
├── pyproject.toml
└── src
    └── ragtime
        ├── __init__.py
        ├── configs
        │   └── config.yml
        ├── ragtime_function.py
        └── register.py
```

Let's create two additional directories: `configs` for our workflow configs and `data` for our evaluation dataset. (These don't need to live inside the workflow directory; we keep them there for convenience.)

In [None]:
!mkdir -p workflows/ragtime/configs
!mkdir -p workflows/ragtime/data

### 3. Creating the Package Configuration

Now we can update the `pyproject.toml` file with specifics about our workflow. This file specifies:

- Package metadata (name, version, description)
- Dependencies (we'll use `agentiq[llama-index,langchain]` for RAG support and `colorama` for colored terminal output)
- Entry point registration so AgentIQ can discover our components

The entry point maps the `ragtime.register` module into AgentIQ's `aiq.components` entry-point group.

In [None]:
%%writefile workflows/ragtime/pyproject.toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools >= 64"]

[project]
name = "ragtime"
version = "0.1.0"
dependencies = [
  "agentiq[llama-index,langchain]",
  "colorama"
]
requires-python = ">=3.12"
description = "RAG workflow for AgentIQ"
classifiers = ["Programming Language :: Python"]



[project.entry-points.'aiq.components']
ragtime = "ragtime.register"

## Creating a Basic Configuration Without RAG

Before implementing RAG, let's create a basic configuration that sends the question straight to the LLM, with no retrieval. This gives us a baseline against which to measure the impact of adding RAG later.

### 1. Creating the Initial Configuration File

Let's create a YAML configuration file that defines the core components of our workflow:

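- **general**: Basic settings, such as using uvloop for better performance and sending traces to Phoenix
- **llms**: Language model configuration (Llama 3.1 70B Instruct served through NIM)
- **workflow**: The workflow type (`simple_llm_call`, which passes the input to the LLM without any tools) and the LLM it uses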

At this stage, we're not including any RAG-specific components.

In [None]:
%%writefile workflows/ragtime/configs/basic_config.yml

general:
  use_uvloop: true
  telemetry:
    tracing:
      phoenix:
          _type: phoenix
          endpoint: http://phoenix:6006/v1/traces
          project: rag_example

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0

workflow:
  _type: simple_llm_call
  llm_name: nim_llm
  verbose: true


### 2. Installing the Workflow

Now let's install our workflow package using pip. This makes our custom components available to AgentIQ.

In [None]:
%pip install -e workflows/ragtime

### 3. Testing the Basic Configuration

Let's run the workflow with a question about literature to see how the model performs without RAG. We'll ask about "Anne of Green Gables", one of the books we'll use as a RAG knowledge source later.

In [None]:
!aiq run --config_file workflows/ragtime/configs/basic_config.yml --input "Who is Anne Shirley and what is her personality like?"

The model answers from its pre-trained knowledge alone, so it may lack detailed or accurate information about specific aspects of "Anne of Green Gables". This is where RAG can help, by supplying the model with relevant passages from the book.

## Documents for RAG

For our RAG system to be effective, we need documents that will serve as its knowledge base. This course includes a few public-domain books from Project Gutenberg, downloaded in advance so we don't waste Project Gutenberg's bandwidth.

We have the following titles:
- Twenty Thousand Leagues Under the Sea by Jules Verne
- The War of the Worlds by H. G. Wells
- Anne of Green Gables by Lucy Maud Montgomery

### 1. Listing Available Documents

Run the following command to ensure the documents are ready:

In [None]:
# list the files in the rag_data directory
!ls -al rag_data

### 2. Exploring the Document

Let's take a look at the beginning of one of the documents to understand its structure:

In [None]:
# Read the first 1000 characters to see the structure
with open("rag_data/anne_of_green_gables_project_gutenberg.txt", "r", encoding="utf-8") as f:
    preview = f.read(1000)
    
print(preview)

### 4. Creating Evaluation Questions

Now, let's create a set of evaluation questions that we'll use to test our RAG system. These questions are specific to the content of the three books (three about "Anne of Green Gables", two about "The War of the Worlds", and one about "Twenty Thousand Leagues Under the Sea"), so we can clearly see the difference between the model's pre-trained knowledge and information retrieved from the text.

In [None]:
%%writefile workflows/ragtime/data/rag_questions.json
[
  {
    "id": 1,
    "question": "What is the name of the book Anne is caught reading in class in Anne of Green Gables?",
    "answer": "Ben Hur"
  },
  {
    "id": 2,
    "question": "What type of flowers does Anne place on Matthew's grave?",
    "answer": "roses"
  },
  {
    "id": 3,
    "question": "What poem does Anne recite at the White Sands Hotel concert?",
    "answer": "The Maiden's Vow"
  },
  {
    "id": 4,
    "question": "What was the color of the smoke that squirted from the joints of the martian machines?",
    "answer": "green"
  },
  {
    "id": 5,
    "question": "What poisonous weapon did the aliens use to silence large amounts of artillery in The War of the Worlds?",
    "answer": "black smoke"
  },
  {
    "id": 6,
    "question": "What is the name of the inventor of the projectile weapon used by Captain Nemo in Twenty Thousand Leagues Under the Sea?",
    "answer": "Leniebroek"
  }
]
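Before running an evaluation, it is worth sanity-checking the dataset file. The sketch below is a hypothetical helper (not part of AgentIQ). It checks that each entry has the `id`, `question`, and `answer` fields used in this file, that ids are unique, and that no answer is empty.

```python
import json
from pathlib import Path

# Fields each entry in rag_questions.json uses in this notebook.
REQUIRED_KEYS = {"id", "question", "answer"}


def validate_questions(items: list[dict]) -> list[str]:
    """Return a list of problems found in the evaluation dataset (empty if none)."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append(f"item {index}: missing {sorted(missing)}")
        if "id" in item:
            if item["id"] in seen_ids:
                problems.append(f"item {index}: duplicate id {item['id']}")
            seen_ids.add(item["id"])
        if "answer" in item and not str(item["answer"]).strip():
            problems.append(f"item {index}: empty answer")
    return problems


path = Path("workflows/ragtime/data/rag_questions.json")
if path.exists():
    with path.open(encoding="utf-8") as f:
        print(validate_questions(json.load(f)) or "dataset looks OK")
```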



### 5. Creating an Evaluation Configuration

Now, let's create an evaluation configuration file that measures the baseline (non-RAG) workflow's performance on these questions.

Note that we use a larger model (Llama 3.1 405B) as the judge LLM for evaluation. A larger judge is not always necessary, but this shows that one configuration can define multiple LLM components for different purposes.

In [None]:
%%writefile workflows/ragtime/configs/eval_config.yml

general:
  use_uvloop: true
  telemetry:
    tracing:
      phoenix:
          _type: phoenix
          endpoint: http://phoenix:6006/v1/traces
          project: rag_example

functions:
  current_datetime:
    _type: current_datetime

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
  
  nim_rag_eval_llm:
    _type: nim
    model_name: meta/llama-3.1-405b-instruct
    temperature: 0.0

workflow:
  _type: simple_llm_call
  llm_name: nim_llm
  verbose: true

eval:
  general:
    output_dir: ./ragtime_eval/
    dataset:
      _type: json
      file_path: workflows/ragtime/data/rag_questions.json
  evaluators:
    rag_accuracy:
      _type: ragas
      metric: AnswerAccuracy
      llm_name: nim_rag_eval_llm
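Each evaluator writes a per-question score and an `average_score` over the dataset, and the steps below read both. As we understand the RAGAS documentation, `AnswerAccuracy` asks the judge LLM to compare the response with the reference answer using two different prompts. Each prompt returns a rating of 0, 2, or 4, and the two ratings are rescaled to [0, 1] and averaged. The sketch below mirrors only that arithmetic with made-up ratings. It does not call a judge model, and RAGAS's implementation may handle details (such as invalid judge output) differently.

```python
from statistics import mean

# Ratings a judge prompt may return, per our reading of the RAGAS docs.
VALID_RATINGS = {0, 2, 4}


def answer_accuracy(rating_a: int, rating_b: int) -> float:
    """Combine two judge ratings (each 0, 2 or 4) into one score in [0, 1]."""
    for rating in (rating_a, rating_b):
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating: {rating}")
    return (rating_a / 4 + rating_b / 4) / 2


def average_score(per_question_scores: list[float]) -> float:
    """Dataset-level score: the mean over all questions."""
    return mean(per_question_scores)


# Made-up judge ratings for three questions
scores = [answer_accuracy(4, 4), answer_accuracy(2, 4), answer_accuracy(0, 0)]
print(scores, average_score(scores))
```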

### 6. Running the Baseline Evaluation

Let's evaluate the basic workflow (without RAG) to establish a baseline:

In [None]:
!aiq eval --config_file workflows/ragtime/configs/eval_config.yml

### 7. Examining the Evaluation Results

Let's look at the evaluation results to see how well the workflow performed without RAG:

In [None]:
import json

# Load the workflow output
with open("./ragtime_eval/workflow_output.json", "r") as f:
    workflow_output = json.load(f)

# Load the evaluation results
with open("./ragtime_eval/rag_accuracy_output.json", "r") as f:
    eval_results = json.load(f)

print(f"Average accuracy score: {eval_results['average_score']}\n")

# Print questions, expected answers, and generated answers
for i, item in enumerate(workflow_output):
    print(f"Question {i+1}: {item['question']}")
    print(f"Expected: {item['answer']}")
    print(f"Generated: {item['generated_answer']}")
    print(f"Score: {eval_results['eval_output_items'][i]['score']}\n")

The model's performance on these book-specific questions is limited by its pre-trained knowledge, and the average accuracy score is very low. Next, let's implement RAG and see whether providing the model with relevant passages from the books improves accuracy.

## Implementing the RAG Workflow

Now that we've established a baseline, let's implement a RAG workflow using AgentIQ and LlamaIndex. This will allow our agent to retrieve relevant passages from the books in `rag_data` when answering questions.

### 1. Creating the LlamaIndex RAG Tool

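First, let's create the LlamaIndex RAG tool that handles document ingestion, embedding, and retrieval. This tool will:
1. Load the `.txt` documents from our data directory
2. Split them into overlapping chunks (controlled by `chunk_size` and `chunk_overlap`)
3. Create embeddings for each chunk
4. Build a FAISS vector index and persist it to `data_db_dir`, so later runs reload it instead of re-embedding the books
5. Retrieve `fetch_k` candidate chunks per query and rerank them down to the `top_k` most relevant
6. Provide a query interface for the agent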
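`SentenceSplitter` splits text into chunks of at most `chunk_size` tokens (200 by default in the config class below), repeating `chunk_overlap` tokens (50) between neighbouring chunks so that a passage cut at a boundary still appears whole in one of them, and it tries to keep sentences intact. The sketch below illustrates only the size/overlap idea with a simple word-level sliding window. It is not LlamaIndex's algorithm, which counts tokens and respects sentence boundaries.

```python
def sliding_window_chunks(words: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split a word list into windows of `chunk_size` words that overlap by `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in sliding_window_chunks(words, chunk_size=4, chunk_overlap=2):
    print(chunk)
```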

In [None]:
%%writefile workflows/ragtime/src/ragtime/ragtime_function.py
import logging

from pydantic import ConfigDict

from aiq.builder.builder import Builder
from aiq.builder.framework_enum import LLMFrameworkEnum
from aiq.builder.function_info import FunctionInfo
from aiq.cli.register_workflow import register_function
from aiq.data_models.component_ref import EmbedderRef
from aiq.data_models.component_ref import LLMRef
from aiq.data_models.function import FunctionBaseConfig

logger = logging.getLogger(__name__)


class BookKnowledgeRAGConfig(FunctionBaseConfig, name="book_knowledge_rag"):
    data_dir: str
    data_db_dir: str
    chunk_size: int = 200
    chunk_overlap: int = 50
    top_k: int = 10
    fetch_k: int = 20
    rerank_model: str = "nvidia/llama-3.2-nv-rerankqa-1b-v2"
    debug_mode: bool = False
    model_config = ConfigDict(protected_namespaces=())
    llm_name: LLMRef
    embedding_name: EmbedderRef
    query_response_mode: str = "simple_summarize"

@register_function(config_type=BookKnowledgeRAGConfig, framework_wrappers=[LLMFrameworkEnum.LLAMA_INDEX])
async def book_knowledge_rag_tool(tool_config: BookKnowledgeRAGConfig, builder: Builder):
    """
    A RAG system for querying a collection of books.
    """
    from pathlib import Path

    import faiss
    from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex, load_index_from_storage
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.postprocessor.nvidia_rerank import NVIDIARerank
    from llama_index.vector_stores.faiss import FaissVectorStore

    # Get the LLM and embedder from the builder
    llm = await builder.get_llm(tool_config.llm_name, wrapper_type=LLMFrameworkEnum.LLAMA_INDEX)
    embedder = await builder.get_embedder(tool_config.embedding_name, wrapper_type=LLMFrameworkEnum.LLAMA_INDEX)

    reranker = NVIDIARerank(model=tool_config.rerank_model, top_n=tool_config.top_k)
    # NOTE: this workshop environment sends API requests through a proxy, and for LlamaIndex's reranker
    # we need to set `base_url` explicitly. In your own environment, you would not need the following line.
    reranker.base_url = "http://proxy/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
    Settings.llm = llm
    Settings.embed_model = embedder

    # Validate data directory
    data_dir_path = Path(tool_config.data_dir)
    if not data_dir_path.exists() or not data_dir_path.is_dir():
        raise ValueError(f"Invalid or non-existent data directory: {tool_config.data_dir}")
    txt_files = list(data_dir_path.glob("*.txt"))
    if not txt_files:
        raise ValueError(f"No .txt files found in directory: {tool_config.data_dir}")

    # Prepare database directory
    db_dir_path = Path(tool_config.data_db_dir)
    if not db_dir_path.exists():
        db_dir_path.mkdir(parents=True, exist_ok=True)

    # Check if index exists
    index_persisted = all(
        db_dir_path.joinpath(f).exists() 
        for f in ["default__vector_store.json", "docstore.json", "index_store.json"]
    )

    if index_persisted:
        vector_store = FaissVectorStore.from_persist_dir(persist_dir=str(db_dir_path))
        storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir=str(db_dir_path))
        vector_index = load_index_from_storage(
            storage_context=storage_context,
            embed_model=embedder
        )
    else:
        documents = SimpleDirectoryReader(
            input_dir=str(data_dir_path),
            required_exts=[".txt"],
            recursive=True,
            filename_as_id=True
        ).load_data()

        node_parser = SentenceSplitter(chunk_size=tool_config.chunk_size, chunk_overlap=tool_config.chunk_overlap)
        nodes = node_parser.get_nodes_from_documents(documents)
        for node in nodes:
            node.metadata["book_title"] = Path(node.metadata.get("file_name", "Unknown")).name

        # Create FAISS index
        sample_embedding = embedder.get_text_embedding("test")
        embedding_dim = len(sample_embedding)
        faiss_index = faiss.IndexFlatL2(embedding_dim)  # Flat L2 index
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)

        # Build and populate index
        vector_index = VectorStoreIndex(
            nodes=nodes,
            storage_context=storage_context,
            embed_model=embedder,
            show_progress=True
        )

        # Persist the index (handled by llama_index)
        vector_index.storage_context.persist(persist_dir=str(db_dir_path))

    # Query engine setup
    node_postprocessors = [reranker]
    query_engine = vector_index.as_query_engine(
        similarity_top_k=tool_config.fetch_k,
        response_mode=tool_config.query_response_mode,
        verbose=tool_config.debug_mode,
        node_postprocessors=node_postprocessors
    )

    async def _arun(question: str) -> str:
        try:
            # Query processing
            response = await query_engine.aquery(question)
            answer = str(response.response).strip()
            
            return answer
        except Exception as e:
            return f"Error: Failed to process query - {str(e)}"

    yield FunctionInfo.from_fn(_arun, description="Extract relevant information from books to answer questions efficiently")
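At query time, the engine above first retrieves `fetch_k` (20) candidates from the FAISS `IndexFlatL2` index, which performs an exact, exhaustive search by Euclidean (L2) distance. The `NVIDIARerank` postprocessor then keeps the `top_k` (10) most relevant. The pure-Python sketch below mimics that two-stage flow with tiny made-up vectors. A hypothetical `toy_relevance` function stands in for the reranking model, so it neither uses FAISS nor calls NVIDIA's API.

```python
import math
from typing import Callable


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_l2_search(query: list[float], vectors: dict[str, list[float]], fetch_k: int) -> list[str]:
    """Stage 1: exact search that compares the query with every stored vector."""
    return sorted(vectors, key=lambda chunk_id: l2_distance(query, vectors[chunk_id]))[:fetch_k]


def rerank(question: str, candidates: list[str], texts: dict[str, str],
           score: Callable[[str, str], float], top_k: int) -> list[str]:
    """Stage 2: re-score the candidates with a relevance model and keep the best top_k."""
    return sorted(candidates, key=lambda chunk_id: score(question, texts[chunk_id]), reverse=True)[:top_k]


def toy_relevance(question: str, text: str) -> float:
    """Hypothetical stand-in for the reranking model: count shared words."""
    return float(len(set(question.lower().split()) & set(text.lower().split())))


# Tiny made-up 2-D "embeddings" and chunk texts
vectors = {"c1": [0.0, 1.0], "c2": [0.9, 0.1], "c3": [1.0, 0.0]}
texts = {"c1": "anne reads ben hur in class", "c2": "anne plants roses", "c3": "martian black smoke"}

candidates = flat_l2_search([1.0, 0.2], vectors, fetch_k=2)
top = rerank("what color was the martian smoke", candidates, texts, toy_relevance, top_k=1)
print(candidates, top)
```

The reranker can reorder the vector-search candidates, which is why the engine fetches more chunks (`fetch_k`) than it finally keeps (`top_k`).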

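### 2. Registering the Tool

Next, we need to import the new function in `register.py`, so that AgentIQ registers it when it loads the `ragtime.register` entry point: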

In [None]:
%%writefile workflows/ragtime/src/ragtime/register.py
# pylint: disable=unused-import
# flake8: noqa

# Import any tools which need to be automatically registered here
from ragtime.ragtime_function import book_knowledge_rag_tool
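Importing the module is what matters here: the `@register_function` decorator runs when `ragtime_function.py` is imported, and AgentIQ reaches that import through the `ragtime.register` entry point declared in `pyproject.toml`. The sketch below is a simplified, hypothetical illustration of that decorator-based registry pattern (`REGISTRY`, `register`, and `toy_rag_tool` are made-up names). AgentIQ's real implementation is more involved.

```python
from typing import Callable

# Hypothetical global registry, filled in as modules are imported.
REGISTRY: dict[str, Callable] = {}


def register(name: str) -> Callable[[Callable], Callable]:
    """Decorator factory: record the decorated function under `name` at import time."""
    def decorator(fn: Callable) -> Callable:
        REGISTRY[name] = fn
        return fn
    return decorator


# Defining (i.e. importing) the function is enough to register it; nothing calls it yet.
@register("book_knowledge_rag")
def toy_rag_tool(config: dict) -> str:
    return f"building tool with {config}"


print(sorted(REGISTRY))
```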

### 3. Reinstalling the Workflow

Now that we've added our RAG components, let's reinstall the workflow:

In [None]:
%pip install -e workflows/ragtime

### 4. Creating the RAG Configuration

Now, let's create a configuration file that includes our RAG components.

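Note that the workflow is now a `react_agent`, which can call tools; here its only tool is our `book_knowledge_rag` function. The config also adds an `embedders` section for the NIM embedding model used to index and query the books.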

In [None]:
%%writefile workflows/ragtime/configs/rag_config.yml

general:
  use_uvloop: true
  telemetry:
    tracing:
      phoenix:
          _type: phoenix
          endpoint: http://phoenix:6006/v1/traces
          project: rag_example
  
llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    base_url: $NVIDIA_BASE_URL

embedders:
  nim_embedder:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
    base_url: $NVIDIA_BASE_URL
    truncate: END

functions:
  book_knowledge_rag:
    _type: book_knowledge_rag
    llm_name: nim_llm
    embedding_name: nim_embedder
    data_dir: rag_data
    data_db_dir: rag_db

workflow:
  _type: react_agent
  tool_names:
    - book_knowledge_rag
  llm_name: nim_llm
  verbose: true
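Components in this file refer to each other by name. `book_knowledge_rag` points at `nim_llm` and `nim_embedder`, and the workflow lists `book_knowledge_rag` in `tool_names`, so a typo in any of these names breaks the workflow. The sketch below checks those cross-references on the config written as a plain Python dict (YAML parsing is left out to keep it dependency-free). `check_references` is a hypothetical helper, not part of AgentIQ.

```python
def check_references(config: dict) -> list[str]:
    """Return name references that do not match a defined component."""
    llms = set(config.get("llms", {}))
    embedders = set(config.get("embedders", {}))
    functions = set(config.get("functions", {}))
    errors = []
    for fn_name, fn_cfg in config.get("functions", {}).items():
        if "llm_name" in fn_cfg and fn_cfg["llm_name"] not in llms:
            errors.append(f"functions.{fn_name}.llm_name -> {fn_cfg['llm_name']}")
        if "embedding_name" in fn_cfg and fn_cfg["embedding_name"] not in embedders:
            errors.append(f"functions.{fn_name}.embedding_name -> {fn_cfg['embedding_name']}")
    workflow = config.get("workflow", {})
    if workflow.get("llm_name") not in llms:
        errors.append(f"workflow.llm_name -> {workflow.get('llm_name')}")
    for tool in workflow.get("tool_names", []):
        if tool not in functions:
            errors.append(f"workflow.tool_names -> {tool}")
    return errors


# The relevant parts of rag_config.yml, written as a dict
rag_config = {
    "llms": {"nim_llm": {"_type": "nim"}},
    "embedders": {"nim_embedder": {"_type": "nim"}},
    "functions": {"book_knowledge_rag": {"_type": "book_knowledge_rag",
                                         "llm_name": "nim_llm", "embedding_name": "nim_embedder"}},
    "workflow": {"_type": "react_agent", "tool_names": ["book_knowledge_rag"], "llm_name": "nim_llm"},
}
print(check_references(rag_config) or "all references resolve")
```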

### 5. Testing the RAG Workflow

Let's test our RAG workflow with a question similar to the one we asked the baseline:

In [None]:
!aiq run --config_file workflows/ragtime/configs/rag_config.yml --input "Tell me a little about the personality of Anne Shirley."

Compare this with the baseline response: the answer should now include specific details drawn from passages retrieved from the book, rather than relying solely on the model's pre-trained knowledge.
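The ReAct (reason + act) pattern behind `react_agent` lets the model decide, step by step, whether to call a tool or to answer. The sketch below is a heavily simplified, hypothetical version of that loop: `scripted_llm` stands in for the LLM and a lambda stands in for `book_knowledge_rag`, so nothing is actually called. AgentIQ's `react_agent` uses its own prompts and output parsing, which this does not reproduce.

```python
from typing import Callable


def react_loop(question: str,
               llm: Callable[[str], dict],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Alternate between model decisions and tool calls until the model gives a final answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        # The model returns either {"action": ..., "input": ...} or {"final_answer": ...}.
        decision = llm(transcript)
        if "final_answer" in decision:
            return decision["final_answer"]
        observation = tools[decision["action"]](decision["input"])
        transcript += (f"\nAction: {decision['action']}[{decision['input']}]"
                       f"\nObservation: {observation}")
    return "Stopped: no final answer within max_steps"


def scripted_llm(transcript: str) -> dict:
    """Hypothetical stand-in for the LLM: call the tool once, then answer from its observation."""
    if "Observation:" not in transcript:
        return {"action": "book_knowledge_rag", "input": "Anne Shirley personality"}
    return {"final_answer": "From the book: " + transcript.split("Observation: ")[-1]}


# Hypothetical stand-in for the book_knowledge_rag tool
tools = {"book_knowledge_rag": lambda query: "Anne is imaginative and talkative."}
print(react_loop("Tell me about Anne Shirley.", scripted_llm, tools))
```

The `max_steps` cap guards against a model that keeps calling tools without ever producing an answer.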

### 6. Creating a RAG Evaluation Configuration

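Now, let's create an evaluation configuration for our RAG workflow. It combines the RAG components from `rag_config.yml` with the 405B judge LLM and adds two RAGAS metrics alongside `AnswerAccuracy`: `ResponseGroundedness` (how well the response is supported by the retrieved context) and `ContextRelevance` (how relevant the retrieved context is to the question).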

In [None]:
%%writefile workflows/ragtime/configs/rag_eval_config.yml

general:
  use_uvloop: true
  telemetry:
    tracing:
      phoenix:
          _type: phoenix
          endpoint: http://phoenix:6006/v1/traces
          project: rag_example

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    base_url: $NVIDIA_BASE_URL
  
  nim_rag_eval_llm:
    _type: nim
    model_name: meta/llama-3.1-405b-instruct
    temperature: 0.0
    base_url: $NVIDIA_BASE_URL

embedders:
  nim_embedder:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
    base_url: $NVIDIA_BASE_URL
    truncate: END

functions:
  book_knowledge_rag:
    _type: book_knowledge_rag
    llm_name: nim_llm
    embedding_name: nim_embedder
    data_dir: rag_data
    data_db_dir: rag_db

workflow:
  _type: react_agent
  tool_names:
    - book_knowledge_rag
  llm_name: nim_llm
  verbose: true

eval:
  general:
    output_dir: ./ragtime_eval/
    dataset:
      _type: json
      file_path: workflows/ragtime/data/rag_questions.json
  evaluators:
    rag_accuracy:
      _type: ragas
      metric: AnswerAccuracy
      llm_name: nim_rag_eval_llm
    rag_groundedness:
      _type: ragas
      metric: ResponseGroundedness
      llm_name: nim_rag_eval_llm
    rag_relevance:
      _type: ragas
      metric: ContextRelevance
      llm_name: nim_rag_eval_llm

### 7. Running the RAG Evaluation

Let's run the evaluation on our RAG workflow:

In [None]:
!aiq eval --config_file workflows/ragtime/configs/rag_eval_config.yml

### 8. Comparing Evaluation Results

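Now, let's look at the RAG results. Both evaluations write to `./ragtime_eval/`, so the RAG run has overwritten the baseline files; we compare against the baseline accuracy printed earlier instead: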

In [None]:
import json

# Load the RAG evaluation results
with open("./ragtime_eval/rag_accuracy_output.json", "r") as f:
    rag_results = json.load(f)

print(f"RAG average accuracy: {rag_results['average_score']}\n")

# Load the groundedness and relevance results for RAG
with open("./ragtime_eval/rag_groundedness_output.json", "r") as f:
    groundedness_results = json.load(f)

with open("./ragtime_eval/rag_relevance_output.json", "r") as f:
    relevance_results = json.load(f)

print(f"RAG groundedness: {groundedness_results['average_score']}")
print(f"RAG relevance: {relevance_results['average_score']}\n")
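If you want a per-question comparison rather than just the averages, one option is to save the baseline's `rag_accuracy_output.json` under another name before running the RAG evaluation (or give the two configs different `output_dir` values), then compare the two result dicts. The sketch below assumes each result has the `average_score` and `eval_output_items` fields used above, with an `id` on every item; `compare_runs` is a hypothetical helper and the sample numbers are made up.

```python
def compare_runs(baseline: dict, rag: dict) -> dict:
    """Summarize the change in average score and per-question score between two eval results."""
    base_scores = {item["id"]: item["score"] for item in baseline["eval_output_items"]}
    rag_scores = {item["id"]: item["score"] for item in rag["eval_output_items"]}
    # Only compare questions that appear in both runs.
    per_question = {qid: rag_scores[qid] - base_scores[qid]
                    for qid in base_scores if qid in rag_scores}
    return {"average_delta": rag["average_score"] - baseline["average_score"],
            "per_question_delta": per_question}


# Made-up example results
baseline = {"average_score": 0.25, "eval_output_items": [{"id": 1, "score": 0.5}, {"id": 2, "score": 0.0}]}
rag = {"average_score": 0.75, "eval_output_items": [{"id": 1, "score": 1.0}, {"id": 2, "score": 0.5}]}
print(compare_runs(baseline, rag))
```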


The RAG workflow outperforms the non-RAG baseline on our evaluation questions; the baseline had an average accuracy of 0.20 when this notebook was prepared, and your exact scores may differ between runs.

## Conclusion

In this notebook, we've built and evaluated a RAG workflow using AgentIQ that retrieves and processes information from a library of books. We've covered:

1. Setting up the basic RAG infrastructure
2. Implementing document processing and retrieval
3. Creating an evaluation framework

This implementation serves as a foundation for building more sophisticated RAG systems with AgentIQ.