# AgenticRAG: Full Customization Guide

## Introduction

So far, we've explored how to use RAGAgent's built-in capabilities to load datasets and work with multiple retrievers and tasks. But what if you want more control? Can you add your own logic for particular data types, retrievers, or tasks?

**Absolutely!** In this notebook, we'll explore how to customize each component of the AgenticRAG system.


## 1. Stores: The Foundation

The foundation of our agentic RAG system is a highly customizable storage system that lets us store various data types consistently. The library provides two built-in storage backends:

- **SQLBackend**
- **ChromaBackend**

You can also create your own storage backend by inheriting from the `BaseBackend` abstract class.
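
As a rough illustration of what a custom backend involves, here is a minimal in-memory sketch. The real `BaseBackend` interface is not shown in this notebook, so `SketchBackend` below is a hypothetical stand-in whose method names simply mirror the CRUD operations listed in this guide; check the library source for the actual abstract methods before subclassing.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


class SketchBackend(ABC, Generic[T]):
    """Hypothetical stand-in for BaseBackend; the real abstract methods may differ."""

    @abstractmethod
    def add(self, key: str, item: T) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[T]: ...

    @abstractmethod
    def get_all(self) -> List[T]: ...

    @abstractmethod
    def update(self, key: str, item: T) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryBackend(SketchBackend[T]):
    """Keeps records in a dict: convenient for tests, lost when the process exits."""

    def __init__(self) -> None:
        self._items: Dict[str, T] = {}

    def add(self, key: str, item: T) -> None:
        # Refuse to silently overwrite an existing record
        if key in self._items:
            raise KeyError(f"{key!r} already exists")
        self._items[key] = item

    def get(self, key: str) -> Optional[T]:
        return self._items.get(key)

    def get_all(self) -> List[T]:
        return list(self._items.values())

    def update(self, key: str, item: T) -> None:
        if key not in self._items:
            raise KeyError(f"{key!r} does not exist")
        self._items[key] = item

    def delete(self, key: str) -> None:
        # Deleting a missing key is treated as a no-op
        self._items.pop(key, None)
```

A backend like this is mainly useful for unit tests, where persistence is not needed.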

### Core Store Types

On top of these backends sit individual stores (essentially tables in the SQL backend), each with its own model and schema types:

- **MetaStore**: Stores metadata about all datasets (required)
- **TextStore**: Manages text-based data
- **TableStore**: Handles tabular data
- **ExternalDBStore**: Interfaces with external databases

Each store provides basic CRUD operations (add, update, delete, get, get_all, index) and may include additional methods like `search_similar` in specific implementations.

In [1]:
import sys
import os

sys.path.append(os.path.abspath(".."))

In [None]:
import os
from agenticrag.stores import MetaStore, TextStore, TableStore, ExternalDBStore

# Create a directory for data storage
os.makedirs(".agenticrag_data_custom", exist_ok=True)

# Define storage locations
SQL_DB_CONNECTION = "sqlite:///.agenticrag_data_custom/agenticrag.db"
CHROMA_PERSISTENT_FOLDER = ".agenticrag_data_custom"

# Initialize stores
meta_store = MetaStore(connection_url=SQL_DB_CONNECTION)
table_store = TableStore(connection_url=SQL_DB_CONNECTION)
external_db_store = ExternalDBStore(connection_url=SQL_DB_CONNECTION)
text_store = TextStore(persistent_dir=CHROMA_PERSISTENT_FOLDER)

### Adding Data Manually

You can add data directly to stores, but it requires handling several steps:


In [None]:
from agenticrag.types import TextData, TableData, MetaData, DataFormat

# Add text data
text_store.add(
    TextData(
        id="text_1", 
        name="random_fact",  
        text="The sun rises in the east"
    )
)

# Add table data
table_store.add(
    TableData(
        id=1,
        name="iris",
        path="data/iris.csv",
        structure_summary="""The Iris flowers dataset, with the following columns:
        Id(int): Id of the entry
        SepalLengthCm (float): Sepal length in cm
        SepalWidthCm (float): Sepal width in cm
        PetalLengthCm (float): Petal length in cm
        PetalWidthCm (float): Petal width in cm
        Species (str): Name of the iris species
        """,
    )
)

# Don't forget to add metadata entries
meta_store.add(
    MetaData(
        name="random_fact",
        format=DataFormat.TEXT,
        description="This data contains a random fact",
    )
)

meta_store.add(
    MetaData(
        name="iris",
        description="This data contains information on iris flowers and their different categories",
        format=DataFormat.TABLE,
    )
)


As you can see, adding data directly to stores can be complex, especially for structured data types. You must also remember to update the MetaStore accordingly. This is where Loaders come in handy.

## 2. Loaders: Simplifying Data Ingestion

Loaders make adding data to stores convenient by handling complex transformations from various sources (PDFs, databases, raw texts, web pages, etc.) to store-compatible formats.

You can create custom loader classes by inheriting from the `BaseLoader` abstract class.

### Example: Using Built-in Loaders

In [None]:
from agenticrag.loaders import TextLoader, TableLoader

# Initialize loaders
text_loader = TextLoader(text_store, meta_store)
table_loader = TableLoader(table_store, meta_store, persistence_dir=".agenticrag_data_custom/tables")

# Load a PDF file - parsing, chunking, description generation handled automatically
text_loader.load_pdf(path="data/attention.pdf")

The beauty of this design is its modularity. If you want custom logic for PDF processing or chunking, you can create a separate loader while keeping the rest of the system intact.
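
To make the chunking idea concrete, here is a minimal sketch of a fixed-size chunker with overlap, the kind of logic a custom loader might apply before adding text to a store. The function name `chunk_text` and its parameters are illustrative assumptions; this is not the strategy `TextLoader` uses internally.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk could then be wrapped in a `TextData` object and passed to `text_store.add`, with a matching `MetaData` entry added to the `MetaStore`, as in the manual example earlier.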

### Viewing Stored Data

In [None]:
import pandas as pd

# View all text data
all_text = text_store.get_all()
df = pd.DataFrame([text.model_dump() for text in all_text])
df

# View table data
pd.DataFrame([table.model_dump() for table in table_store.get_all()])

# View metadata
pd.DataFrame([meta.model_dump() for meta in meta_store.get_all()])

## 3. Connectors: Interfacing with External Sources

Connectors are similar to loaders, but they interface with external data sources instead of loading raw data. Currently, the library provides an `ExternalDBConnector` for connecting to external SQL databases.

In [None]:
from agenticrag.connectors import ExternalDBConnector

connector = ExternalDBConnector(external_db_store, meta_store)

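# Connect to an external database; judging by the parameter name, the connection URL
# is read from the DATABASE_URL environment variable, so set it before running this cell
connector.connect_db(name="movie_database", connection_url_env_var="DATABASE_URL")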

# View updated metadata
pd.DataFrame([meta.model_dump() for meta in meta_store.get_all()])

## 4. Retrievers: Finding Relevant Data

After loading data, the next task is retrieval. Retrievers are tools or agents that take specific input formats and retrieve relevant data. All retrievers implement the `BaseRetriever` abstract class.

Instead of returning data directly, retrievers typically save the relevant information to a file (.txt, .csv, etc.) and return a success message containing the file path. This keeps large payloads out of LLM calls that do not need to see them.
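
The save-to-file pattern can be sketched without any library code. The function below is a toy keyword matcher, not the `BaseRetriever` interface or the vector search that `VectorRetriever` performs; `retrieve_to_file`, the sample documents, and the output file name are assumptions made for illustration.

```python
import os
import tempfile
from typing import List


def retrieve_to_file(query: str, documents: List[str], out_dir: str) -> str:
    """Toy keyword retriever: write matching documents to a file, return a status message."""
    terms = query.lower().split()
    # Keep every document that contains at least one query term
    matches = [doc for doc in documents if any(term in doc.lower() for term in terms)]

    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, "text_data.txt")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(matches))

    # Only this short message, not the data itself, goes back to the caller
    return f"Saved {len(matches)} matching passage(s) to {out_path}"


docs = [
    "Self attention relates positions of a sequence.",
    "The sun rises in the east.",
]
message = retrieve_to_file("attention", docs, tempfile.mkdtemp())
print(message)
```

The caller receives a short string it can hand to a downstream task, while the potentially large retrieved content stays on disk.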


In [None]:
from agenticrag.retrievers import SQLRetriever, VectorRetriever, TableRetriever

# Initialize retrievers
sql_retriever = SQLRetriever(external_db_store, persistent_dir=".agenticrag_data_custom/retrieved_data")
vector_retriever = VectorRetriever(text_store, persistent_dir=".agenticrag_data_custom/retrieved_data")
table_retriever = TableRetriever(table_store, persistent_dir=".agenticrag_data_custom/retrieved_data")

# Retrieve text data
vector_retriever.retrieve(query="Self attention", document_name="attention")

# Read retrieved data
with open(".agenticrag_data_custom/retrieved_data/text_data.txt", "r") as f:
    print(f.read())

# Retrieve SQL data
sql_retriever.retrieve("List the name and running duration of all movies whose names start with 'Sa'", db_name="movie_database")

# Read retrieved data
with open(".agenticrag_data_custom/retrieved_data/table_data.csv", "r") as f:
    print(f.read())

## 5. Tasks: Acting on Retrieved Data

Just retrieving data isn't enough; we need downstream tasks like question answering, chart generation, report creation, or making predictions. Tasks utilize retrieved context to perform specific operations.
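
As a sketch of the shape a task can take, the class below reads a retrieved CSV file and computes a column average. It is a hypothetical, LLM-free example: the class name `ColumnAverageTask`, the sample rows, and the use of the first argument as a column name (rather than a natural-language query) are all assumptions, and the library's actual task base class is not shown here.

```python
import csv
import os
import tempfile


class ColumnAverageTask:
    """Hypothetical task: average one numeric column of a retrieved CSV file."""

    def execute(self, column: str, file_path: str) -> str:
        with open(file_path, newline="", encoding="utf-8") as f:
            # Skip rows where the column is missing or empty
            values = [float(row[column]) for row in csv.DictReader(f) if row.get(column)]
        if not values:
            return f"No numeric values found in column {column!r}"
        return f"Average {column}: {sum(values) / len(values):.2f}"


# Made-up sample data standing in for a retriever's output file
demo_path = os.path.join(tempfile.mkdtemp(), "table_data.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["name", "duration"], ["movie_a", "113"], ["movie_b", "102"]])

result = ColumnAverageTask().execute("duration", demo_path)
print(result)  # Average duration: 107.50
```

Like the built-in tasks used in the next cell, it takes an instruction and a file path, so it consumes whatever a retriever has written to disk.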


In [None]:
from agenticrag.tasks import QuestionAnsweringTask, ChartGenerationTask

# Initialize tasks
qa_task = QuestionAnsweringTask()
chart_generation = ChartGenerationTask()

# Execute tasks with retrieved data
qa_task.execute("Why is self attention important?", ".agenticrag_data_custom/retrieved_data/text_data.txt")
chart_generation.execute("Generate a bar plot showing duration of each movie", ".agenticrag_data_custom/retrieved_data/table_data.csv")


## Putting It All Together: RAGAgent

Combining all these components creates the complete RAGAgent. You can provide specific retrievers and tasks to the agent as needed. Based on the provided retrievers, the agent will have access to certain stores and can load data directly into them.

For larger projects, we recommend loading data with separate loaders and using the central RAGAgent primarily as a controller that manages retrievers and tasks.

In [None]:
from agenticrag import RAGAgent

# Create a complete RAG agent
agent = RAGAgent(
    persistent_dir=".agenticrag_data_custom",
    retrievers=[sql_retriever, vector_retriever, table_retriever],
    tasks=[qa_task, chart_generation]
)

# Execute a query
agent.invoke("What is multi-headed attention?")

This modular architecture allows you to customize any component while maintaining compatibility with the rest of the system, giving you the flexibility to adapt AgenticRAG to your specific needs.