# Multi-Agent RAG Pipeline
A RAG system contains three major layers:
1. Knowledge Layer (Vector Store)
    - Where documents live, chunked and embedded (for semantic search)

2. Retrieval Layer
    - Takes a user query
    - Finds relevant chunks
    - Returns them as context

3. Agent Layer
    A team of LLM-powered agents that:
    - Interpret the question
    - Retrieve context
    - Reason collaboratively
    - Produce an answer
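To make the hand-offs between the three layers concrete, here is a toy, dependency-free sketch. It assumes simple word-overlap scoring as a stand-in for embedding similarity, and a prompt-building function as a stand-in for what the agent layer receives; `KNOWLEDGE`, `retrieve`, and `build_prompt` are hypothetical names, not part of Chroma or AutoGen.

```python
# Knowledge layer (toy): pre-chunked documents held in memory.
KNOWLEDGE = [
    "AutoGen is a framework for building multi-agent systems.",
    "RAG retrieves context from a knowledge base to ground responses.",
    "ChromaDB is a vector database for storing embeddings.",
]


def _tokens(text: str) -> set[str]:
    # Lowercase and strip trailing punctuation so "responses?" matches "responses.".
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, k: int = 2) -> list[str]:
    # Retrieval layer (toy): rank chunks by how many query words they share.
    # The real pipeline compares embedding vectors in Chroma instead.
    q = _tokens(query)
    ranked = sorted(KNOWLEDGE, key=lambda chunk: len(q & _tokens(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Agent layer input: the question plus the retrieved context to reason over.
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


question = "How does RAG ground responses?"
print(build_prompt(question, retrieve(question, k=1)))
```

In the actual pipeline, Chroma replaces `KNOWLEDGE` and `retrieve`, and the AutoGen agents consume the assembled context.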

## Setup: Imports and Environment
To build the pipeline, we need:
- Vector Store (Chroma DB)
- Embedding Model (OpenAI)
- Chunking logic (break long docs into retrievable pieces)
- AutoGen Agents (to reason over retrieved context)

In [1]:
import os
from dotenv import load_dotenv

import chromadb  # vector store backend
from chromadb.config import Settings 

from openai import OpenAI   # for embeddings

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

import asyncio   # AutoGen 0.4 uses async execution
import textwrap  # dedent text before chunking

### Load environment and initialize OpenAI + Chroma

In [2]:
# Load env variables
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not found in environment variables.")

In [3]:
# Init OpenAI client (used to generate embeddings)
oa_client = OpenAI(api_key=api_key)

# Init ChromaDB (persistent, telemetry off)
# NOTE: Want local vector store that doesn't "phone home"
client_settings = Settings(anonymized_telemetry=False)
chroma_client = chromadb.PersistentClient(
    path="./chroma_db_multi_agent",
    settings=client_settings
)

### Define OpenAI embedding function for Chroma

In [10]:
EMBEDDING_MODEL = 'text-embedding-3-small'

# NOTE: Wrap the embedding call in a class, since Chroma expects the embedding function to be an object with a __call__ method
class OpenAIEmbeddingFunction:
    def __init__(self, client, model: str):
        self.client = client
        self.model = model

    def __call__(self, input: list[str]) -> list[list[float]]:
        response = self.client.embeddings.create(
            model=self.model,
            input=input,
        )
        return [item.embedding for item in response.data]
    
    def name(self) -> str:
        # Chroma requires this for conflict detection
        return f"openai={self.model}"
    
embedding_fn = OpenAIEmbeddingFunction(oa_client, EMBEDDING_MODEL)
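When experimenting without API calls (or in tests), it can help to swap in a deterministic stand-in with the same shape as `OpenAIEmbeddingFunction`: a callable taking `input: list[str]` and returning one vector per string, plus `name()`. The sketch below is a hypothetical hashed bag-of-words embedder; its vectors carry no real semantics and it only exercises the plumbing. Its 64-dimensional vectors differ from `text-embedding-3-small`'s, so it should be paired with a separate collection.

```python
import hashlib
import math


class HashingEmbeddingFunction:
    """Hypothetical offline stand-in with the same interface as OpenAIEmbeddingFunction."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for word in text.lower().split():
                # Stable hash (the built-in hash() is salted per process).
                bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % self.dim
                vec[bucket] += 1.0
            # L2-normalize; an empty string stays an all-zero vector.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors

    def name(self) -> str:
        return f"hashing-{self.dim}"


fake_fn = HashingEmbeddingFunction(dim=64)
vecs = fake_fn(["hello world", "hello world", ""])
print(len(vecs), len(vecs[0]))  # 3 64
```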

### Create Chroma Collection

In [13]:
# chroma_client.delete_collection("rag_collection_multi_agent")     # clean-up (if needed)
collection = chroma_client.get_or_create_collection(
    name="rag_collection_multi_agent",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}   # Use cosine as standard metric for semantic embeddings
)
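With `hnsw:space` set to `cosine`, the distances Chroma reports are cosine distances, i.e. 1 minus cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones, so smaller means more similar. A pure-Python sketch of that arithmetic (`cosine_distance` is a hypothetical helper, not a Chroma function):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - (a . b) / (|a| * |b|); vector length is ignored, only direction matters.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0 (same direction, different length)
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # 1.0 (orthogonal)
```

OpenAI embeddings are normalized to length 1, so for them cosine and dot-product rankings agree; cosine is still the safer default if the embedding model is ever swapped.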

### Chunking Helper and Indexing Sample Content
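Retrieval tends to work better when long documents are broken into small, overlapping chunks: each chunk's embedding stays focused on a narrow topic, and the overlap preserves some context across chunk boundaries.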

In [14]:
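def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    # overlap >= chunk_size would never advance the window (infinite loop)
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    text = textwrap.dedent(text).strip()
    words = text.split()
    chunks = []
    start = 0

    while start < len(words):
        end = start + chunk_size
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break  # this chunk already reaches the end; avoid a redundant tail chunk
        start = end - overlap  # slide forward, keeping `overlap` words of context

    return chunks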
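Each window advances by `chunk_size - overlap` words, so consecutive chunks share `overlap` words. The hypothetical helper below computes only the `(start, end)` word indices under that rule, stopping once a window reaches the end of the text, which makes the stride easy to check by hand. With `chunk_size=40, overlap=10`, the 77-word sample text indexed below gives three windows, matching the three chunks in that cell's output.

```python
def chunk_windows(n_words: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    # The stride must be positive, otherwise the window never advances.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    stride = chunk_size - overlap
    windows = []
    start = 0
    while start < n_words:
        end = min(start + chunk_size, n_words)
        windows.append((start, end))
        if end == n_words:
            break  # last window already covers the tail
        start += stride
    return windows


print(chunk_windows(10, chunk_size=4, overlap=1))
# [(0, 4), (3, 7), (6, 10)]
print(chunk_windows(77, chunk_size=40, overlap=10))
# [(0, 40), (30, 70), (60, 77)]
```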

In [16]:
# Index sample content
kb_text = """ AutoGen is a framework for building multi-agent systems that can collaborate to solve complex tasks. 
It supports tools, function calling, and orchestration patterns for LLM-based agents. 

Retrieval-Augmented Generation (RAG) is a technique where a model retrieves relevant context from an external 
knowledge base (like a vector database) and uses it to ground its responses. 

By combining AutoGen with a vector store like ChromaDB, you can build multi-agent systems that retrieve, 
reason, and respond with up-to-date, domain-specific knowledge. 
"""

# Chunk text
chunks = chunk_text(kb_text, chunk_size=40, overlap=10)

# Generate a unique ID for each chunk (Chroma requires every document to have a unique ID)
ids = [f"chunk-{i}" for i in range(len(chunks))]

# Insert chunked text into Chroma collection 
collection.add(
    documents=chunks,
    ids=ids,
)

# Output the total number of chunks and preview the first two
len(chunks), chunks[:2]

(3,
 ['AutoGen is a framework for building multi-agent systems that can collaborate to solve complex tasks. It supports tools, function calling, and orchestration patterns for LLM-based agents. Retrieval-Augmented Generation (RAG) is a technique where a model retrieves relevant context from an',
  'a technique where a model retrieves relevant context from an external knowledge base (like a vector database) and uses it to ground its responses. By combining AutoGen with a vector store like ChromaDB, you can build multi-agent systems that retrieve,'])

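Once chunks are indexed, the retrieval layer queries the collection, e.g. `collection.query(query_texts=[question], n_results=2)`, which returns a dict of nested lists with one inner list per query text (`results["documents"][0]` holds the chunks for the first query). The sketch below turns that shape into a numbered context block for the agents; `format_context` is a hypothetical helper, and the sample dict is hand-written, not real Chroma output.

```python
def format_context(results: dict, query_index: int = 0) -> str:
    # Chroma query results are nested: one inner list per query text.
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    lines = [
        f"[{rank}] (distance={dist:.3f}) {doc}"
        for rank, (doc, dist) in enumerate(zip(docs, dists), start=1)
    ]
    return "\n".join(lines)


# Hand-written example shaped like a collection.query(...) result.
sample_results = {
    "ids": [["chunk-1", "chunk-0"]],
    "documents": [["RAG grounds answers in retrieved context.", "AutoGen builds multi-agent systems."]],
    "distances": [[0.21, 0.58]],
}
print(format_context(sample_results))
```

Keeping the distance next to each chunk lets the critic agent judge how strongly an answer is actually supported by the retrieved text.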
## Build Multi-Agent Team
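This multi-agent team has three agents: a researcher (analyzes the query and plans the answer), a writer (drafts the answer from the retrieved context), and a critic (reviews the draft).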

### Create Model Client

In [17]:
# Chat model client shared by all three agents (separate from the embeddings client `oa_client`)
model_client = OpenAIChatCompletionClient(model='gpt-4o', api_key=api_key)

### Create the Agents
Key things to note about agents:
- Like actors in a film, agents need clearly defined roles, set here through each agent's `system_message`.

In [18]:
researcher = AssistantAgent(
    name="researcher",
    model_client=model_client,
    system_message=(
        "You are a researcher-focused agent. "
        "Your job is to analyze the user's query, retrieve relevant context, "
        "and propose a plan for answering the question. "
        "Be thorough, analytical, and explicit about what information is needed."
    )
)

writer = AssistantAgent(
    name="writer",
    model_client=model_client,
    system_message=(
        "You are a writing-focused agent. "
        "Your job is to synthesize a clear, accurate, grounded answer "
        "using the retrieved context and the researcher's plan. "
        "Write with clarity and precision."
    )
)

critic = AssistantAgent(
    name="critic",
    model_client=model_client,
    system_message=(
        "You are a critical evaluator. "
        "Your job is to review the writer's answer for correctness, "
        "grounding in retrieved context, and clarity. "
        "Suggest improvements when necessary."
    )
)

### Define Agent-to-Agent Messaging

In [None]:

# NOTE: In AutoGen 0.4, run() takes a task but has no `recipient`/`receiver`
# argument, so agent-to-agent messaging is done by passing one agent's reply
# to the next agent as its task.
'''
Chat helper function

Instead of wiring the hand-off by hand each time:
`plan = await researcher.run(task="Draft a plan for ...")`
`await writer.run(task=plan.messages[-1].content)`

We can NOW just say:
`await chat(researcher, writer, "Draft a plan for ...")`
'''
async def chat(sender, receiver, message: str):
    # The sender agent responds to the message first
    sender_result = await sender.run(task=message)
    sender_reply = sender_result.messages[-1].content

    # The receiver then gets the sender's reply as its task
    return await receiver.run(task=sender_reply)
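One way to chain the researcher, writer, and critic is to feed each agent's final reply to the next agent as its task. That hand-off can be checked offline with stub agents. The sketch assumes only what the sanity check below relies on: `run(task=...)` is awaitable and returns an object whose `messages[-1].content` is the reply. `StubAgent` and `relay` are hypothetical names; swapping in the real `researcher`, `writer`, and `critic` would call the model instead.

```python
import asyncio
from types import SimpleNamespace


class StubAgent:
    """Hypothetical stand-in exposing a run(task=...) shape like AssistantAgent's."""

    def __init__(self, name: str):
        self.name = name

    async def run(self, task: str):
        # Wrap the task in the agent's name so the hand-off order is visible.
        reply = f"{self.name}({task})"
        return SimpleNamespace(messages=[SimpleNamespace(content=reply)])


async def relay(agents, task: str) -> str:
    # Pass each agent's final message to the next agent as its task.
    message = task
    for agent in agents:
        result = await agent.run(task=message)
        message = result.messages[-1].content
    return message


team = [StubAgent("researcher"), StubAgent("writer"), StubAgent("critic")]
print(asyncio.run(relay(team, "What is RAG?")))
# critic(writer(researcher(What is RAG?)))
```

Inside a notebook, where an event loop is already running, `asyncio.run` raises a `RuntimeError`; use `await relay(team, "What is RAG?")` there instead.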

In [None]:
# Sanity Check 
result = await researcher.run(task="Explain your role in one sentence.")
print(result.messages[-1].content)

As a researcher-focused agent, my role is to analyze the user's query, retrieve relevant context and information, and propose a comprehensive plan for addressing their question or research needs.


## Build Multi-Agent RAG Pipeline (Retrieval + Collaboration + Revision)