# Lab: Retrieval-Augmented Generation (RAG)

## Introduction

In this lab, we'll explore the concept of **Retrieval-Augmented Generation (RAG)** within the LLM Agentic Tool Mesh platform. RAG is a crucial process that enhances the capabilities of Large Language Models (LLMs) by providing them with access to external knowledge. Unlike traditional generation techniques, which rely solely on the model's pre-trained knowledge, RAG allows the model to retrieve relevant data from external sources, improving both the relevance and accuracy of the responses. LLM Agentic Tool Mesh provides all the necessary tools to build a powerful RAG system by handling:

- **Data Extraction**
- **Data Transformation**
- **Data Storage**
- **Data Load**
- **Data Retrieval**

## Objectives

By the end of this lab, you will:

- Understand the concept of **RAG** and its importance in improving LLM performance.
- Learn how LLM Agentic Tool Mesh manages the different stages of the RAG process, from data extraction to retrieval.
- Build a simple RAG application using LLM Agentic Tool Mesh, integrating data retrieval into the generation process.
- Implement best practices for organizing and retrieving data to ensure relevant and accurate generation of content.

## Getting Started

Complete the following steps before running the notebooks for this lab:
  1. Install uv, make it available on PATH, and create a Python 3.12 venv:

      - Linux/MacOS:
          - `curl -sSL https://get.uv.dev | sh`
          - `source ~/.bashrc`
          - `curl -LsSf https://astral.sh/uv/install.sh | sh`
          - `source $HOME/.local/bin/env`
          - `uv python install 3.12`
          - `uv venv -p 3.12`
          - `source .venv/bin/activate`
          - `uv pip install ipykernel`

      - Windows:
          - TODO

  - Select the venv as the notebook kernel
  <div align="left">
    <img src="pictures/kernel.png" alt="VSCode Jupyter UI hint" width="800">
  </div>

**You MUST restart the Jupyter kernel if the automated dependency-install cell is run**
<div align="left">
  <img src="pictures/restart.png" alt="VSCode Jupyter UI hint" width="800">
</div>

In [1]:
"Install rag dependencies"
!cd ../../.. && uv pip install --quiet 'llmesh[rag]' ipywidgets
# Shut down the kernel so user must restart it to apply new pip installations.
# This is a workaround for the fact that Jupyter does not automatically
# pick up new installations in the current kernel.
!echo "Kernel will shut down to apply new pip installations, manual restart required."
import os
os._exit(00)

Kernel will shut down to apply new pip installations, manual restart required.



In [1]:
"""Setup environment for the lab."""
from src.notebooks.platform_services.lablib.env_util import set_services_env
_, _, _ = set_services_env()

BackgroundProcessManager: atexit handler registered for automated cleanup on kernel exit.
Keeping existing .env file.


## Data Pipeline for RAG

In LLM Agentic Tool Mesh, the RAG process is divided into two main stages: **Data Injection** (where data is extracted, transformed, and loaded into storage) and **Data Retrieval** (where relevant information is retrieved to support generation).

### Injection Process

The **Injection Process** is a critical step in **RAG**, where data is prepared and integrated into a storage system for efficient retrieval during the generation process. This process ensures that the relevant data is available in a format that the LLM can easily access and use to enhance the quality and relevance of its outputs. In LLM Agentic Tool Mesh, the Injection Process is abstracted into several key steps, including data extraction, transformation, and storage injection.

#### 1. Extract

- **Data Gathering**: The first step in the injection process involves collecting raw data from various sources. This data can come in different formats such as DOCX, PDF, Excel, or even API responses. The goal is to bring together all the information that may be relevant for the tasks the LLM will perform.
  
- **Conversion**: Once the data is gathered, it needs to be converted into a common format—typically JSON—to ensure consistency across different data types and sources. This conversion process makes it easier to standardize the subsequent transformation and storage processes.
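
The conversion step can be pictured with a minimal sketch in plain Python. It is not the LLM Agentic Tool Mesh implementation: `to_common_record`, the file names, and the metadata keys are hypothetical, chosen only to show how text from different sources can share one `text` plus `metadata` shape before being serialized to JSON.

```python
import json


def to_common_record(text, source, filetype, **extra_metadata):
    """Wrap extracted text in a uniform {'text', 'metadata'} dictionary."""
    metadata = {"source": source, "filetype": filetype}
    metadata.update(extra_metadata)
    return {"text": text.strip(), "metadata": metadata}


# Two inputs from different (made-up) sources end up with the same shape.
records = [
    to_common_record("  5.2.8 SMF Services  ", "spec.docx", "docx", page_number=15),
    to_common_record("Quarterly totals", "report.xlsx", "xlsx", sheet="Q1"),
]

# Serialize to JSON so later stages only have to deal with one format.
as_json = json.dumps(records, indent=2)
print(as_json)
```

Keeping source-specific details inside `metadata` means the transformation and storage stages can treat every record the same way, whatever its origin.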

#### 2. Transform

- **Clean**: In this stage, irrelevant or redundant information is removed from the data. This might involve eliminating duplicates, irrelevant sections, or noisy data that doesn't contribute to the task at hand. The goal is to focus on the core, useful content.

- **Enrich Metadata**: Adding metadata to the data helps improve its searchability and contextual relevance during the retrieval process. Metadata can include information such as the source, keywords, timestamps, and other contextual markers that make it easier to retrieve and use the data efficiently.

- **Transform with LLM**: This is where the power of the LLM comes into play. The cleaned data can be transformed into useful formats such as summaries, question-and-answer pairs, or structured outputs. This transformation makes it easier for the LLM to access and use the data in specific ways during the generation process, ensuring that the most relevant information is readily available.
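
As a rough illustration of the clean and enrich steps, the sketch below normalizes whitespace, drops very short or duplicate sections, and merges extra metadata into each element. The helper names (`clean_elements`, `enrich_elements`) and the `min_length` threshold are assumptions made for this example; they are not part of the platform's API.

```python
import re


def clean_elements(elements, min_length=20):
    """Collapse whitespace, then drop short sections and exact duplicates."""
    seen = set()
    cleaned = []
    for element in elements:
        text = re.sub(r"\s+", " ", element["text"]).strip()
        if len(text) < min_length or text in seen:
            continue
        seen.add(text)
        cleaned.append({"text": text, "metadata": dict(element.get("metadata", {}))})
    return cleaned


def enrich_elements(elements, extra_metadata):
    """Return copies of the elements with extra metadata merged in."""
    return [
        {"text": e["text"], "metadata": {**e["metadata"], **extra_metadata}}
        for e in elements
    ]


raw = [
    {"text": "The  SMF\tprovides session management services.", "metadata": {"source": "a.docx"}},
    {"text": "The SMF provides session management services.", "metadata": {"source": "b.docx"}},
    {"text": "Page 3", "metadata": {}},
]

cleaned = clean_elements(raw)
enriched = enrich_elements(cleaned, {"processed_by": "WKSHP-LLM"})
print(enriched)
```

Here the second element is dropped as a duplicate of the first once whitespace is normalized, and the third is dropped for being too short to carry useful content.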

#### 3. Load

- **Storage Injection**: Once the data is transformed, it is injected into the chosen storage solution. This is typically a **vector database**, which allows for fast and efficient retrieval based on the content's semantic meaning. The vector database stores the transformed data in a format that the LLM can easily access during generation.

- **Adaptation**: To optimize data retrieval, the stored data may be further adapted by **chunking** it into smaller, manageable pieces. This process ensures that data is stored in a way that allows the LLM to retrieve only the relevant portions when needed, improving both speed and accuracy during the generation process.
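
Chunking can be sketched as a sliding window over the text. The function below is a simplified, character-based assumption of how such a splitter might behave; `chunk_text` is a hypothetical helper, and the platform may split on tokens or sentences instead.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=0):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

A non-zero overlap repeats a little text across neighbouring chunks, which can help when a relevant sentence straddles a chunk boundary, at the cost of storing slightly more data.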

<div align="center">
  <img src="pictures/ingestion.png" alt="RAG Ingestion" width="800">
</div>

### Data Extraction

This step involves gathering information from a variety of sources and document types, preparing it for further processing in the RAG pipeline. The goal of data extraction is to retrieve relevant information from diverse formats—such as PDFs, DOCX files, or web data—and convert it into a standardized format that can be used efficiently by the system in subsequent stages like transformation, storage, and retrieval.

#### Key Features

1. **Multi-Format Support**:
   - The Data Extraction service is designed to handle a wide variety of document types, such as **PDF**, **DOCX**, **HTML**, and **XLS** files. This flexibility allows LLM Agentic Tool Mesh to process data from different sources and industries.
   - It also supports extracting both text and metadata from these files, so that all relevant information is captured for downstream tasks.

2. **Standardization**:
   - Once the data is extracted from its original format, it is converted into a **standardized format**, typically **JSON**. This uniform format ensures that the data can be seamlessly integrated into the rest of the RAG pipeline, enabling consistency in the way the data is transformed, stored, and retrieved.
   - Extracted data can include both text and structural elements like tables, lists, and headers, which are preserved in the standardized format.

In [2]:
from src.lib.package.athon.rag import DataExtractor

# Example configuration for the Data Extractor
EXTRACTOR_CONFIG = {
    'type': 'UnstructuredForSections',
    'document_type': 'Docx',
    'extraction_type': 'SectionWithHeader'
}

# Initialize the Data Extractor with the provided configuration
data_extractor = DataExtractor.create(EXTRACTOR_CONFIG)

In [3]:
# Parse a document file
file_path = 'documents/23502-i60-smf.docx'
result = data_extractor.parse(file_path)

# Handle the extraction result
if result.status == "success":
    print(f"EXTRACTED # ELEMENTS:\n{len(result.elements)}")
    print(f"FIRST EXTRACTED ELEMENTS:\n{result.elements[:2]}")
    extracted_elements = result.elements
else:
    print(f"ERROR:\n{result.error_message}")

2025-05-11 16:05:44,981 - ATHON - DEBUG - Parsing document.
2025-05-11 16:05:44,982 - ATHON - DEBUG - Partitioning Docx document.
2025-05-11 16:05:46,463 - ATHON - DEBUG - Distribution of element types:
2025-05-11 16:05:46,464 - ATHON - DEBUG - Header: 1
2025-05-11 16:05:46,464 - ATHON - DEBUG - Title: 29
2025-05-11 16:05:46,465 - ATHON - DEBUG - Text: 82
2025-05-11 16:05:46,465 - ATHON - DEBUG - NarrativeText: 127
2025-05-11 16:05:46,465 - ATHON - DEBUG - Table: 3
2025-05-11 16:05:46,466 - ATHON - DEBUG - ListItem: 60
2025-05-11 16:05:46,466 - ATHON - DEBUG - PageBreak: 15
2025-05-11 16:05:46,467 - ATHON - DEBUG - Footer: 1
2025-05-11 16:05:46,468 - ATHON - INFO - Parsing all elements.
2025-05-11 16:05:46,610 - ATHON - DEBUG - Saving elements to cache.
EXTRACTED # ELEMENTS:
316
FIRST EXTRACTED ELEMENTS:
[{'text': '5.2.8\tSMF Services', 'metadata': {'category_depth': 2, 'file_directory': 'documents', 'filename': '23502-i60-smf.docx', 'filetype': 'application/vnd.openxmlformats-officedo

### Data Transformation

After the data has been extracted from various sources, the transformation phase ensures that the data is cleaned, optimized, and enriched, making it ready for efficient storage and retrieval. This stage prepares the data in such a way that it can be seamlessly integrated into the retrieval process, allowing language models to access, understand, and use the data effectively during generation.

#### Key Features

1. **Data Cleaning**:
   - **Remove Redundant Information**: The transformation process begins by cleaning the extracted data. This involves removing irrelevant, duplicate, or noisy content that could confuse or slow down the retrieval and generation processes.
   - **Focus on Core Content**: By focusing on essential content, the data is made leaner and more relevant, ensuring that only the most useful and accurate information is retained for use by the LLM.

2. **Metadata Enrichment**:
   - **Add Contextual Metadata**: Additional metadata, such as keywords, categories, timestamps, and author information, is added to the data during the transformation phase. This enrichment makes the data easier to search and retrieve by improving the system’s ability to match queries with relevant content.
   - **Enhance Searchability**: Metadata plays a vital role in making the retrieval process more efficient, ensuring that specific queries lead to precise results. The more enriched the metadata, the more accurate the retrieval, helping the LLM generate contextually relevant responses.

3. **Transformation**:
   - **Generate Summaries and Q&A Pairs**: The data can also be processed with the help of LLMs to create summaries, question-and-answer pairs, or other useful formats that facilitate quicker retrieval and understanding during the generation phase. This transformation allows the LLM to directly interact with summarized or structured data, improving the quality and relevance of the generated responses.
   - **Chunking for Optimization**: The data may also be broken into smaller chunks or sections that allow for quicker, more targeted retrieval during the generation process. Chunking ensures that the language model retrieves only the most relevant pieces of data, improving both response time and accuracy.

In [4]:
from src.lib.package.athon.rag import DataTransformer

# Example configuration for the Data Transformer
TRANSFORMER_CONFIG = {
    'type': 'CteActionRunner',
    'clean': {
        'min_section_length': 100
    },
    'transform': {
        'chunk_size': 1000,
        'chunk_overlap': 0,
        'token_chunk': 256
    },
    'enrich': {
        'metadata': {
            'processed_by': 'WKSHP-LLM'
        }
    }
}

# Initialize the Data Transformer with the provided configuration
data_transformer = DataTransformer.create(TRANSFORMER_CONFIG)

In [5]:
# Define the actions to be performed
actions = [
    'RemoveMultipleSpaces',
    'ReplaceTabsWithSpaces',
    'TransformInSectionByHeader',
    'RemoveTitleElementsOnly',
    'EnrichMetadata',
    'TransformInChunk',
]

# Process the elements
result = data_transformer.process(actions, extracted_elements)

# Handle the transformation result
if result.status == "success":
    print(f"TRANSFORMED # ELEMENTS:\n{len(result.elements)}")
    print(f"FIRST TRANSFORMED ELEMENTS:\n{result.elements[:2]}")
    trasformed_elements = result.elements
else:
    print(f"ERROR:\n{result.error_message}")

2025-05-11 16:06:11,743 - ATHON - DEBUG - Performing action: RemoveMultipleSpaces
2025-05-11 16:06:11,746 - ATHON - DEBUG - Performing action: ReplaceTabsWithSpaces
2025-05-11 16:06:11,746 - ATHON - DEBUG - Performing action: TransformInSectionByHeader
2025-05-11 16:06:11,747 - ATHON - DEBUG - Performing action: RemoveTitleElementsOnly
2025-05-11 16:06:11,748 - ATHON - DEBUG - Performing action: EnrichMetadata
2025-05-11 16:06:11,749 - ATHON - DEBUG - Performing action: TransformInChunk


TRANSFORMED # ELEMENTS:
88
FIRST TRANSFORMED ELEMENTS:
[{'text': '5. 2. 8. 1 general the following table shows the smf services and smf service operations. table 5. 2. 8. 1 - 1 : nf services provided by the smf service name service operations operation semantics example consumer ( s ) nsmf _ pdusession create request / response v - smf / i - smf update request / response v - smf / i - smf, h - smf release request / response v - smf / i - smf createsmcontext request / response amf updatesmcontext request / response amf releasesmcontext request / response amf smcontextstatusnotify subscribe / notify amf statusnotify subscribe / notify v - smf / i - smf contextrequest request / response amf, v - smf / i - smf, smf contextpush request / response smf sendmodata request / response amf transfermodata request / response v - smf / i - smf transfermtdata request / response smf, h - smf nsmf _ eventexposure subscribe subscribe / notify nef, amf, nwdaf unsubs', 'metadata': {'category_depth': 3, 'f

### Data Storage

Once data has been extracted and transformed, it must be stored in a way that allows for fast and accurate retrieval during the generation process. Effective data storage ensures that the system can quickly access the relevant data needed by the LLM to generate contextually appropriate and accurate responses. LLM Agentic Tool Mesh supports a variety of storage strategies, including the use of specialized databases like vector stores, which are optimized for handling semantic search and retrieval.

#### Key Features

1. **Efficient Storage Solutions**:
   - **Vector Stores**: LLM Agentic Tool Mesh utilizes vector databases to store data as numerical vectors. This storage method is ideal for handling semantic searches, where the meaning of the input query is more important than exact keyword matching. Vector stores enable the system to retrieve data that is most semantically relevant to the user’s query, improving both speed and accuracy.

2. **Structured Data Organization**:
   - **Metadata-Enhanced Storage**: During storage, the data is organized with enriched metadata, ensuring that specific filters and searches can be applied quickly. This structured organization helps LLM Agentic Tool Mesh narrow down large datasets to retrieve exactly what is needed for the query.
   - **Chunking and Indexing**: Data is stored in chunks, and these chunks are indexed for fast lookup. This chunking strategy ensures that the system retrieves only the most relevant portions of data, improving both the accuracy and efficiency of the retrieval process.

In [6]:
from src.lib.package.athon.rag import DataStorage

# Example configuration for the Data Storage
STORAGE_CONFIG = {
    'type': 'ChromaCollection',
    'path': 'data',
    'collection': 'WkshpLlm'
}

# Initialize the Data Storage with the provided configuration
data_storage = DataStorage.create(STORAGE_CONFIG)

2025-05-11 16:06:43,092 - ATHON - DEBUG - Attempting to get or create Chroma dB collection 'WkshpLlm'.
2025-05-11 16:06:45,000 - ATHON - DEBUG - Creating or retrieving collection with arguments: {'name': 'WkshpLlm', 'metadata': {'hnsw:space': 'cosine'}, 'embedding_function': <chromadb.utils.embedding_functions.sentence_transformer_embedding_function.SentenceTransformerEmbeddingFunction object at 0x7f8bb6426810>}


In [7]:
# Retrieve the data collection
result = data_storage.get_collection()

# Handle the retrieval result
if result.status == "success":
    print(f"COLLECTION RETRIEVED:\n{result.collection}")
    collection = result.collection
else:
    print(f"ERROR:\n{result.error_message}")

2025-05-11 16:06:49,032 - ATHON - DEBUG - Successfully retrieved the collection.
COLLECTION RETRIEVED:
Collection(name=WkshpLlm)


### Data Loader

Once data has been extracted, transformed, and prepared for storage, the final step is to load this data into the storage system, making it readily accessible for retrieval and use by LLMs. The Data Loader ensures that all the cleaned and structured data is properly indexed and optimized for retrieval, playing a vital role in the efficiency of the overall RAG pipeline.

#### Key Features

1. **Seamless Data Loading**:
   - The Data Loader is responsible for moving the processed data into the selected storage solution, such as a vector database or a traditional document store. This ensures that the data is loaded in a format optimized for fast access during the retrieval stage.

2. **Data Integrity and Validation**:
   - Validation processes ensure that the data is compliant with the expected structure and ready to be indexed for fast retrieval.
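
To make the validation idea concrete, here is a small sketch that checks each element for a non-empty `text` string and a `metadata` dictionary before it is loaded. The rules and the helper names (`validate_element`, `split_valid`) are assumptions for illustration; the actual checks performed by the Data Loader may differ.

```python
def validate_element(element):
    """Return a list of problems found in one element (empty list if valid)."""
    if not isinstance(element, dict):
        return ["element is not a dictionary"]
    problems = []
    text = element.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("missing or empty 'text'")
    if not isinstance(element.get("metadata"), dict):
        problems.append("'metadata' is not a dictionary")
    return problems


def split_valid(elements):
    """Separate loadable elements from rejected ones, keeping the reasons."""
    valid, rejected = [], []
    for element in elements:
        problems = validate_element(element)
        if problems:
            rejected.append((element, problems))
        else:
            valid.append(element)
    return valid, rejected


candidates = [
    {"text": "SMF service operations", "metadata": {"page_number": 15}},
    {"text": "   ", "metadata": {}},
    {"text": "No metadata here"},
]
valid, rejected = split_valid(candidates)
print(len(valid), "valid,", len(rejected), "rejected")
```

Collecting the reasons for rejection, rather than silently skipping elements, makes it easier to trace problems back to the extraction or transformation stage.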

In [8]:
from src.lib.package.athon.rag import DataLoader

# Example configuration for the Data Loader
LOADER_CONFIG = {
    'type': 'ChromaForSentences'
}

# Initialize the Data Loader with the provided configuration
data_loader = DataLoader.create(LOADER_CONFIG)

In [9]:
# Insert the elements into the collection
result = data_loader.insert(collection, trasformed_elements)

# Handle the insertion result
if result.status == "success":
    print("Data successfully inserted into the collection.")
else:
    print(f"ERROR:\n{result.error_message}")

Validating and processing elements: 100%|██████████| 88/88 [00:00<00:00, 318463.12it/s]


2025-05-11 16:07:02,722 - ATHON - DEBUG - Inserted 88 documents into the collection.
2025-05-11 16:07:02,723 - ATHON - DEBUG - Successfully inserted elements into the collection.
Data successfully inserted into the collection.


### Retrieval Process

Once the data has been injected and stored, the **Retrieval Process** comes into play, focusing on fetching the most relevant information based on a given input query. The retrieval process is essential for ensuring that the LLM can access the appropriate data to generate accurate and contextually relevant responses. In LLM Agentic Tool Mesh, the retrieval process utilizes advanced search techniques, metadata filtering, and chunk expansion to optimize the data retrieval for better output generation.

#### 1. Search Techniques

- **Data Retrieval**: The retrieval process begins by searching the stored data using advanced techniques like **dense retrieval** (leveraging semantic vector search). Dense retrieval methods use the semantic meaning of the query and stored data to find the most relevant results.
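
Dense retrieval can be sketched as ranking stored vectors by their cosine similarity to the query vector. The example below uses hand-written two-dimensional vectors and a brute-force scan purely for illustration; in practice the embeddings come from a model and a vector database uses an approximate index. The names `cosine_similarity`, `top_k`, and the chunk identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=3):
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), chunk_id)
        for chunk_id, vector in indexed_chunks.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy "embeddings": real ones have hundreds of dimensions.
index = {
    "smf": [1.0, 0.0],
    "amf": [0.0, 1.0],
    "mixed": [1.0, 1.0],
}
print(top_k([1.0, 0.1], index, k=2))
```

Because cosine similarity compares direction rather than magnitude, two texts can match closely even when they share no exact keywords, as long as their embeddings point the same way.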

#### 2. Metadata Filtering

- **Refinement**: After an initial retrieval, metadata filtering is applied to refine the search results. By utilizing the metadata added during the **Injection Process**, the system can narrow down the retrieved data to ensure it closely aligns with the specific needs of the query. For example, filters can be applied based on the document source, creation date, or topic tags to return only the most relevant sections of data.
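
A metadata filter can be as simple as keeping only the retrieved elements whose metadata matches every requested key and value. The sketch below assumes elements shaped as `{'text': ..., 'metadata': {...}}`; `filter_by_metadata` and the sample values are hypothetical and only illustrate the idea.

```python
def filter_by_metadata(elements, **required):
    """Keep elements whose metadata equals every required key/value pair."""
    return [
        element
        for element in elements
        if all(element["metadata"].get(key) == value for key, value in required.items())
    ]


retrieved = [
    {"text": "SMF services overview", "metadata": {"filename": "smf.docx", "type": "Title"}},
    {"text": "AMF services overview", "metadata": {"filename": "amf.docx", "type": "Title"}},
    {"text": "SMF service table", "metadata": {"filename": "smf.docx", "type": "Table"}},
]

only_smf = filter_by_metadata(retrieved, filename="smf.docx")
smf_tables = filter_by_metadata(retrieved, filename="smf.docx", type="Table")
print([e["text"] for e in only_smf])
print([e["text"] for e in smf_tables])
```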

#### 3. Chunk Expansion

- **Data Expansion**: In some cases, the initially retrieved data may include only small sections of text or partial results. To provide more comprehensive context, the system applies **chunk expansion**, which expands the data around the retrieved sections by adding surrounding paragraphs, sections, or related sentences. This ensures that the LLM has access to a more complete and contextually rich dataset when generating responses, leading to more accurate and nuanced outputs.
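
One simple way to picture chunk expansion is a neighbour window: for every chunk that matched the query, also return the chunks immediately before and after it. This is an assumed strategy for illustration only (`expand_hits` is a hypothetical helper); the platform may expand by section or by another unit instead.

```python
def expand_hits(chunks, hit_indices, window=1):
    """Return matched chunks plus `window` neighbours on each side, in order."""
    selected = set()
    for index in hit_indices:
        first = max(0, index - window)
        last = min(len(chunks), index + window + 1)
        selected.update(range(first, last))
    return [chunks[i] for i in sorted(selected)]


chunks = ["intro", "scope", "smf services", "smf operations", "annex"]

# Only chunk 2 matched the query; its neighbours add context.
print(expand_hits(chunks, [2]))
```

Using a set of indices means that overlapping windows from nearby hits are merged, so no chunk is returned twice.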

<div align="center">
  <img src="pictures/retrieve.png" alt="RAG Retrieval" width="800">
</div>

### Data Retriever

The Data Retriever ensures that the LLM has access to the most contextually appropriate and relevant information to generate high-quality outputs. It does this by not only searching for relevant data but also expanding and refining the results to provide a more comprehensive and accurate set of information for the language model to use.

#### Key Features

1. **Search for Relevant Data**:
   - The Data Retriever uses advanced search techniques, such as **semantic vector search** or traditional keyword-based search, to identify the most relevant data from the storage based on the user’s query.
   - **Semantic Search**: This method leverages vector embeddings to find data that is semantically related to the query, focusing on meaning rather than exact keyword matching. This ensures that the most contextually relevant data is retrieved, even if the exact terms differ.

2. **Chunk Expansion**:
   - Once the most relevant chunks of data are retrieved, the Data Retriever can **expand** the results by pulling in surrounding text or related sections of the data. This ensures that the language model has access to a comprehensive context, which is critical for generating accurate and meaningful responses.
   - Expansion techniques help provide a more complete picture by retrieving additional information beyond just the exact matching chunk, giving the LLM more context to work with when generating responses.

3. **Refinement of Results**:
   - After retrieving the initial set of data, the Data Retriever applies **refinement techniques** to filter and prioritize the information. This can include using metadata filters, such as document type, date, or author, to ensure that the results are highly relevant to the user’s specific needs.
   - Refinement ensures that irrelevant or outdated information is excluded, leaving only the most valuable and contextually appropriate data for the LLM to use during generation.

In [10]:
from src.lib.package.athon.rag import DataRetriever

# Example configuration for the Data Retriever
RETRIEVER_CONFIG = {
    'type': 'ChromaForSentences',
    'n_results': 3,
}

# Initialize the Data Retriever with the provided configuration
data_retriever = DataRetriever.create(RETRIEVER_CONFIG)

In [11]:
# Example query to search within the collection
query = "What are the services of SMF?"

# Retrieve the relevant data based on the query
result = data_retriever.select(collection, query)

# Handle the retrieval result
if result.status == "success":
    for element in result.elements:
        print(f"TEXT:\n{element['text']}\nMETADATA:\n{element['metadata']}\n")
else:
    print(f"ERROR:\n{result.error_message}")

2025-05-11 16:07:12,790 - ATHON - DEBUG - Expanding results by section.
2025-05-11 16:07:12,796 - ATHON - DEBUG - Successfully retrieved elements from the collection.
TEXT:
5. 2. 8. 4. 1 general this service is used for nidd transfer between smf and another nf. see clause 4. 25. 5.
METADATA:
{'category_depth': 4, 'file_directory': 'documents', 'filename': '23502-i60-smf.docx', 'filetype': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'header': '5.2.8.4.1 General', 'id': '88cbf2d0246bb602f5dd9f0d9a3b832c', 'languages': "['eng']", 'last_modified': '2025-05-09T23:58:02', 'page_number': 15, 'parent_id': '09358a13f349091f7135b4e62a1766cf', 'processed_by': 'WKSHP-LLM', 'type': 'Title'}

TEXT:
5. 2. 8. 1 general the following table shows the smf services and smf service operations. table 5. 2. 8. 1 - 1 : nf services provided by the smf service name service operations operation semantics example consumer ( s ) nsmf _ pdusession create request / response v - smf / i