# Agentic RAG with smolagents

This notebook demonstrates how to build agentic RAG systems using Hugging Face's smolagents library.

## Contents
1. Quickstart - Basic agent usage
2. Agentic RAG - Advanced retrieval with reasoning

## Installation

First, let's install the required packages:

In [None]:
!pip install 'smolagents[toolkit]' pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 --upgrade -q

## Part 1: Quickstart Examples

### Example 1: Basic Agent

Let's create a simple agent that can perform calculations:

In [None]:
from smolagents import CodeAgent, InferenceClientModel

# Initialize a model
model = InferenceClientModel()

# Create an agent with no tools
agent = CodeAgent(tools=[], model=model)

# Run the agent with a task
result = agent.run("Calculate the sum of numbers from 1 to 10")
print(result)

### Example 2: Agent with Web Search Tool

Now let's add web search capabilities to our agent:

In [None]:
from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

model = InferenceClientModel()
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
)

# Agent can now search the web
result = agent.run("What are the latest developments in LLM agents?")
print(result)

### Example 3: Using Different Models

smolagents supports various model backends. With `InferenceClientModel`, pass `model_id` to choose a specific hosted model:

In [None]:
# Using a specific Hugging Face model
from smolagents import CodeAgent, InferenceClientModel

# Pass the Hub ID of a model that the inference backend can serve
model = InferenceClientModel(model_id="Qwen/Qwen2.5-72B-Instruct")

agent = CodeAgent(tools=[], model=model)
result = agent.run("What is the Fibonacci sequence? Calculate the first 10 numbers.")
print(result)

## Part 2: Agentic RAG Implementation

Now let's build a more advanced system: an agentic RAG pipeline that can reason over documents.

### What is Agentic RAG?

Traditional RAG:
1. User query → Retrieve documents → Generate answer

Agentic RAG:
1. User query → Agent formulates optimal retrieval query
2. Agent retrieves documents
3. Agent reasons over results
4. Agent may retrieve again with refined queries
5. Agent synthesizes final answer
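
The agentic loop above can be sketched in plain Python. This is a toy illustration under stated assumptions, not how smolagents implements it: `tokenize`, `toy_retrieve`, and `agentic_answer` are hypothetical names, keyword overlap stands in for the retriever, and a fixed list of refinements stands in for the LLM rewriting its own query.

```python
# Toy sketch of the agentic retrieval loop; every name here is hypothetical.
CORPUS = [
    "from_pretrained loads model weights from the Hub or a local folder.",
    "AutoTokenizer converts raw text into input ids for a model.",
    "Trainer runs the fine-tuning loop for a model on a dataset.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase, strip punctuation, and drop very short words."""
    words = (w.strip(".,?!") for w in text.lower().split())
    return {w for w in words if len(w) > 3}


def toy_retrieve(query: str, k: int = 1) -> list[str]:
    """Rank corpus entries by the number of words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in CORPUS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def agentic_answer(question: str, refinements: list[str], max_steps: int = 4) -> dict:
    """Retrieve, inspect the results, and retry with a refined query if needed."""
    queries_tried = []
    query = question
    for step in range(max_steps):
        queries_tried.append(query)
        hits = toy_retrieve(query)
        if hits:  # stand-in for the agent judging the results sufficient
            return {"answer": hits[0], "queries": queries_tried}
        if step >= len(refinements):
            break  # nothing left to try
        query = refinements[step]  # stand-in for the LLM rewriting the query
    return {"answer": None, "queries": queries_tried}


result = agentic_answer(
    "How can I get a checkpoint?",
    refinements=["from_pretrained loads weights"],
)
print(result["queries"])
print(result["answer"])
```

The first query shares no keywords with the corpus, so the loop retries with the refined query. A traditional pipeline would have stopped after the first, empty retrieval.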

### Step 1: Load and Prepare Knowledge Base

In [None]:
import datasets
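# Import from the split-out packages; the legacy `langchain.docstore` and
# `langchain.text_splitter` paths are deprecated in recent LangChain releases
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter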

# Load Hugging Face documentation dataset
print("Loading knowledge base...")
knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")

# Filter for transformers documentation
knowledge_base = knowledge_base.filter(
    lambda row: row["source"].startswith("huggingface/transformers")
)

print(f"Loaded {len(knowledge_base)} documents")

# Convert to Document objects
source_docs = [
    Document(
        page_content=doc["text"],
        metadata={"source": doc["source"].split("/")[1]}
    )
    for doc in knowledge_base
]

# Split documents into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", " ", ""]
)

docs_processed = text_splitter.split_documents(source_docs)
print(f"Split into {len(docs_processed)} chunks")
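
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window splitter. It is only a sketch: `sliding_window_chunks` is a hypothetical helper, and it ignores the `separators` that `RecursiveCharacterTextSplitter` uses to prefer breaking at paragraph, line, and sentence boundaries.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary is still fully present in at least one chunk as long as it is shorter than the overlap.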

### Step 2: Create Custom Retriever Tool

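We'll create a custom tool that the agent can call to retrieve relevant documents. It wraps LangChain's `BM25Retriever`, which ranks chunks by keyword overlap with the query rather than by embedding similarity: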

In [None]:
from smolagents import Tool
from langchain_community.retrievers import BM25Retriever

class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses BM25 keyword search to retrieve the documents most relevant to the query. Returns the top k matches."
    inputs = {
        "query": {
            "type": "string",
            "description": "The search query. Use keywords likely to appear in the target documents."
        }
    }
    output_type = "string"

    def __init__(self, docs, k=10, **kwargs):
        super().__init__(**kwargs)
        self.retriever = BM25Retriever.from_documents(docs, k=k)

    def forward(self, query: str) -> str:
        """Retrieve documents based on the query."""
        docs = self.retriever.invoke(query)
        
        # Format retrieved documents
        result = "\n\n".join(
            [f"Document {i+1}:\n{doc.page_content}" for i, doc in enumerate(docs)]
        )
        return result

# Initialize the retriever tool
retriever_tool = RetrieverTool(docs_processed, k=7)
print("Retriever tool created successfully!")
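
For intuition about what the retriever computes, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration under assumptions, not the code `BM25Retriever` runs: `bm25_scores` is a hypothetical function, tokenization is a plain whitespace split, `k1=1.5` and `b=0.75` are common parameter choices, and the IDF uses a smoothed variant that stays non-negative.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with a basic Okapi BM25 formula."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    query_terms = list(dict.fromkeys(query.lower().split()))  # de-duplicated, order kept

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        # Longer-than-average documents are penalized, shorter ones boosted
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            doc_freq = sum(1 for other in tokenized if term in other)
            # Rare terms get a higher weight than terms found in most documents
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "load a pretrained model with from_pretrained",
    "tokenizers split text into tokens",
    "the trainer class handles fine-tuning",
]
scores = bm25_scores("load pretrained model", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score exactly zero, which is why phrasing the query with words that actually occur in the documentation matters for this retriever.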

### Step 3: Create the Agentic RAG System

Now we'll create an agent that can use the retriever tool to answer questions:

In [None]:
from smolagents import CodeAgent, InferenceClientModel

# Initialize the model
model = InferenceClientModel()

# Create agent with retriever tool
agent = CodeAgent(
    tools=[retriever_tool],
    model=model,
    max_steps=4,
    verbosity_level=2  # Set to 2 to see agent's reasoning process
)

print("Agentic RAG system ready!")

### Step 4: Test the Agentic RAG System

Let's ask some questions about the Transformers library:

In [None]:
# Example 1: Simple question
question = "How do I load a pretrained model in Transformers?"
print(f"Question: {question}\n")

answer = agent.run(question)
print(f"\nAnswer: {answer}")

In [None]:
# Example 2: More complex question requiring multi-step reasoning
question = "What are the different methods for fine-tuning models, and which one should I use for limited computational resources?"
print(f"Question: {question}\n")

answer = agent.run(question)
print(f"\nAnswer: {answer}")

In [None]:
# Example 3: Question that might require multiple retrievals
question = "Compare AutoModel and AutoTokenizer - how do they work together and what are their key differences?"
print(f"Question: {question}\n")

answer = agent.run(question)
print(f"\nAnswer: {answer}")

## Benefits of Agentic RAG

The agentic approach provides several advantages:

1. **Adaptive Retrieval**: The agent can formulate multiple queries if needed
2. **Reasoning**: The agent can analyze retrieved content and synthesize information
3. **Self-Correction**: If initial results aren't satisfactory, the agent can try different approaches
4. **Multi-Step Queries**: Complex questions can be broken down into sub-questions

By contrast, traditional RAG runs a fixed pipeline that cannot adapt to the quality of the retrieved results.

## Optional: Using Local Models

smolagents also works with local models. Here's an example using Ollama through `LiteLLMModel`, which requires the `litellm` package:

In [None]:
# Uncomment to use with Ollama (requires Ollama to be running)
# from smolagents import CodeAgent, LiteLLMModel
# 
# model = LiteLLMModel(model_id="ollama/llama3.1", api_base="http://localhost:11434")
# 
# agent = CodeAgent(
#     tools=[retriever_tool],
#     model=model,
#     max_steps=4,
#     verbosity_level=2
# )
# 
# answer = agent.run("How do I use pipeline in Transformers?")
# print(answer)

## Next Steps

Try extending this notebook by:
1. Adding your own documents to the knowledge base
2. Creating custom tools for specific tasks
3. Experimenting with different retrieval methods (dense embeddings, hybrid search)
4. Adding memory to the agent for multi-turn conversations
5. Implementing citation tracking to show which documents were used
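
As a starting point for citation tracking, one option is to include each chunk's `source` metadata in the string the tool returns, so the agent can quote it in its answer. The sketch below is an assumption about how this could look, not part of smolagents or LangChain: `Chunk` is a stand-in dataclass that mirrors the `page_content` and `metadata` attributes of a LangChain `Document`, and `format_with_sources` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a LangChain Document: same attribute names, no dependency."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_with_sources(chunks: list[Chunk]) -> str:
    """Format retrieved chunks so each one carries a citable source label."""
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        # Fall back to "unknown" when a chunk has no source metadata
        source = chunk.metadata.get("source", "unknown")
        blocks.append(f"Document {i} (source: {source}):\n{chunk.page_content}")
    return "\n\n".join(blocks)


retrieved = [
    Chunk("Use from_pretrained to load weights.", {"source": "transformers"}),
    Chunk("Tokenizers turn text into input ids."),
]
formatted = format_with_sources(retrieved)
print(formatted)
```

Inside `RetrieverTool.forward`, the same formatting could be applied to the list returned by `self.retriever.invoke(query)` in place of the current `"\n\n".join(...)` expression.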