## <b><center>Multi-Agent Retrieval-Augmented Generation Orchestration</center></b>

Welcome to the third phase of the InfoFusion Technologies Multi-Agent RAG project!  

In this notebook, we will build the orchestration layer that brings together our embedded knowledge base and internet search capabilities.  
Using multiple specialized agents, we will enable comprehensive retrieval, question answering, and intelligent response synthesis.


### 🧠 **Load Vector Store and Embedding Model**

Before building retrieval agents, we first reload the ChromaDB collection containing our embedded knowledge base, and initialize the embedding model for semantic queries.

This step ensures our agents operate on the latest indexed data.

In [1]:
import os
from dotenv import load_dotenv
import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, SerpAPIWrapper
from langchain_groq import ChatGroq
from langchain.prompts import PromptTemplate
from langchain.agents import create_openai_tools_agent, AgentExecutor


In [2]:
# Load environment variables
load_dotenv()

# Initialize embedding model
embedder = SentenceTransformer("all-MiniLM-L6-v2")
print("Embedding model loaded.")

# Connect to ChromaDB
client = chromadb.PersistentClient(path="db/chromadb_data")
collection = client.get_collection(name="infofusion_chunks")
print(f"Loaded ChromaDB collection with {collection.count()} embeddings.")

# Initialize Wikipedia and SerpAPI agents
wiki_api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=500)
wikipedia_agent = WikipediaQueryRun(api_wrapper=wiki_api_wrapper)

serp_api_key = os.environ.get("SERP_API_KEY")
serp_api_wrapper = SerpAPIWrapper(serpapi_api_key=serp_api_key)

# Agents keyed by name; the dict's insertion order gives the priority: Wikipedia first, then SerpAPI

internet_search_agents = {
    "Wikipedia": wikipedia_agent,
    "SerpAPI": serp_api_wrapper
}

Embedding model loaded.
Loaded ChromaDB collection with 8929 embeddings.


### 🔍 **Define Semantic Retrieval Agent**

This agent performs similarity search over our embedded documents using ChromaDB.

Given a natural language query, it retrieves relevant text chunks that will form the knowledge base context for downstream reasoning.
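
To make the idea of similarity search concrete, here is a minimal sketch that ranks a handful of chunks by cosine similarity in plain Python. The three-dimensional vectors, the `toy_index` list, and the helper names are all hypothetical stand-ins. ChromaDB uses its own index and configured distance metric, so this only illustrates the ranking idea, not how the library works internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, indexed_chunks, k=1):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" (hypothetical; real ones have hundreds of dimensions)
toy_index = [
    ("Gradient descent updates weights along the negative gradient.", [0.9, 0.1, 0.0]),
    ("PCA projects data onto directions of maximum variance.", [0.1, 0.9, 0.0]),
    ("An epoch is one full pass over the training set.", [0.6, 0.0, 0.8]),
]
toy_query = [1.0, 0.0, 0.1]
best = top_k_chunks(toy_query, toy_index, k=2)
print(best)
```

The retrieval agent in the next cell follows the same pattern: embed the query, then ask the store for the nearest stored vectors.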


In [3]:
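def semantic_retrieval(query, top_k=1):
    """Return the top_k document chunks most similar to the query."""
    # Encode the query and convert the NumPy vector to a plain list for ChromaDB
    query_emb = embedder.encode([query])[0].tolist()
    results = collection.query(
        query_embeddings=[query_emb],
        n_results=top_k
    )
    # results['documents'] holds one list of chunks per query; we sent a single query
    retrieved_chunks = results['documents'][0]
    return retrieved_chunks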

# Test retrieval agent
sample_query = "What is gradient descent?"
retrieved_docs = semantic_retrieval(sample_query)
print(f"Top {len(retrieved_docs)} docs retrieved:")
for doc in retrieved_docs:
    print(doc, "...\n")

Top 1 docs retrieved:
110 • Artificial Intelligence, Machine Learning, Deep Learning
Here is a key point: backward error propagation involves the calculation
of numbers that are used to update the weights of the edges in
the neural network. The update process is performed by means of a loss
function (and an optimizer and a learning rate), starting from the output
layer (the right-most layer) and then moving in a right-to-left fashion
in order to update the weights of the edges between consecutive layers.
This procedure trains the neural network, which involves reducing
the loss between the estimated values at the output layer and the true
values (in the case of supervised learning). This procedure is repeated
for each data point in the training portion of the dataset. Processing the
dataset is called an epoch, and many times a neural network is trained
via multiple epochs.
The previous paragraph did not explain what the loss function is or ...



### 🌐 Initialize Internet Search Agents

To complement the static documents, we add two external sources: Wikipedia and general web search through SerpAPI.  
The `internet_search` helper below queries each agent in turn and collects the responses by agent name, so a failure in one source does not block the others.


In [4]:
def internet_search(query):
    """
    Performs an internet search using multiple agents and aggregates the results.
    """
    results = {}
    for agent_name, agent_object in internet_search_agents.items():
        try:
            # We call the 'run' method on the agent object, not the agent name
            response = agent_object.run(query)
            results[agent_name] = response
        except Exception as e:
            print(f"Agent {agent_name} failed with error: {e}")
    return results

# Test internet search function
sample_query = "What is gradient descent?"
internet_search_results = internet_search(sample_query)

# Now, iterate over the returned dictionary to print the results
for agent_name, result in internet_search_results.items():
    print(f"Agent {agent_name} response:\n{result}\n")

Agent Wikipedia response:
Page: Gradient
Summary: In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field (or vector-valued function) $\nabla f$ whose value at a point $p$ gives the direction and the rate of fastest increase. The gradient transforms like a vector under c

Agent SerpAPI response:
['Gradient descent is an optimization algorithm used to train machine learning models by minimizing errors between predicted and actual results.', 'Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate ...', 'Gradient Descent is a fundamental algorithm in machine learning and optimization. It is used for tasks like training neural networks, 
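
Wikipedia summaries that contain rendered math can arrive with layout residue: stray newlines around each symbol and `{\displaystyle ...}` markers. Below is a small sketch of a cleanup step that could run before the text is passed to a model. The function name `clean_wiki_summary` is hypothetical, and the pattern only handles markers without nested braces.

```python
import re

# Matches un-nested markers such as {\displaystyle f}; nested braces are not handled.
_DISPLAYSTYLE = re.compile(r"\{\\displaystyle[^{}]*\}")

def clean_wiki_summary(text):
    """Drop {\\displaystyle ...} markers and collapse runs of whitespace."""
    without_markers = _DISPLAYSTYLE.sub(" ", text)
    return re.sub(r"\s+", " ", without_markers).strip()

# A fragment shaped like the residue described above
raw = "differentiable function \n  \n    \n        f\n      \n    {\\displaystyle f}\n  \n of several variables"
cleaned = clean_wiki_summary(raw)
print(cleaned)
```

The symbol itself survives because it appears once as plain text before the marker; only the duplicate inside the marker is removed.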

### 🛠️ Wrap Retrieval and Internet Agents as Tools

We transform each knowledge source into a LangChain Tool with clear names and descriptions.  
This modular tool design empowers the language model to flexibly call on the best resource for each query.
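
As a library-free illustration of the pattern, the sketch below pairs each callable with a name and a description, then dispatches by name. `SimpleTool`, `build_registry`, and the stub lambdas are hypothetical and are not part of LangChain; they only show why clear names and descriptions matter when something else has to choose a tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class SimpleTool:
    """A named callable with a description a model could read when choosing a tool."""
    name: str
    description: str
    func: Callable[[str], str]

def build_registry(tools):
    """Index tools by name, rejecting duplicates."""
    registry: Dict[str, SimpleTool] = {}
    for tool in tools:
        if tool.name in registry:
            raise ValueError(f"Duplicate tool name: {tool.name}")
        registry[tool.name] = tool
    return registry

def describe_tools(registry):
    """Render the name/description list that would be shown to the model."""
    return "\n".join(f"- {t.name}: {t.description}" for t in registry.values())

def call_tool(registry, name, query):
    """Dispatch a query to the tool with the given name."""
    if name not in registry:
        return f"Unknown tool: {name}"
    return registry[name].func(query)

# Stub functions stand in for the real retrieval and search calls.
registry = build_registry([
    SimpleTool("VectorDB", "Searches the embedded document database.", lambda q: f"[vector hits for {q}]"),
    SimpleTool("Wikipedia", "Searches Wikipedia.", lambda q: f"[wiki summary for {q}]"),
])
print(describe_tools(registry))
print(call_tool(registry, "VectorDB", "PCA"))
```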


In [5]:
from langchain.tools import BaseTool

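# Note: recent LangChain releases build BaseTool on Pydantic v2, which requires
# type annotations on overridden fields such as `name` and `description`.
class VectorDBTool(BaseTool):
    """Tool wrapper around the ChromaDB semantic retrieval function."""
    name: str = "VectorDB"
    description: str = "Searches for answers in the embedded document database."

    def _run(self, query: str) -> str:
        # Join the top three chunks with a visible separator
        chunks = semantic_retrieval(query, top_k=3)
        return "\n---\n".join(chunks)

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async not implemented")

class WikipediaTool(BaseTool):
    """Tool wrapper around the Wikipedia agent."""
    name: str = "Wikipedia"
    description: str = "Performs Wikipedia search to answer questions."

    def _run(self, query: str) -> str:
        try:
            return wikipedia_agent.run(query)
        except Exception as e:
            # Return the error as text so the calling agent can react to it
            return f"Wikipedia Agent error: {e}"

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async not implemented")

class SerpApiTool(BaseTool):
    """Tool wrapper around the SerpAPI web search."""
    name: str = "SerpAPI"
    description: str = "Uses SerpAPI for live web search."

    def _run(self, query: str) -> str:
        try:
            return serp_api_wrapper.run(query)
        except Exception as e:
            # Return the error as text so the calling agent can react to it
            return f"SerpAPI Agent error: {e}"

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async not implemented")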

tools = [VectorDBTool(), WikipediaTool(), SerpApiTool()]

### 🤖 Prioritized Orchestrator Function

Our orchestrator queries each tool in priority order:  
- VectorDB first, checking for substantial content,  
- then Wikipedia,  
- finally SerpAPI for live web results.  

This ordering tries the trusted local content first and makes the slower network calls only when an earlier source returns nothing usable.


In [6]:
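def prioritized_orchestrator(query):
    """Try VectorDB, then Wikipedia, then SerpAPI; return the first usable result."""
    # Query VectorDB first
    vec_res = VectorDBTool()._run(query)
    # Confidence heuristic: require a minimum length; can be extended to semantic quality or keywords
    if vec_res and len(vec_res.strip()) > 50:
        return vec_res

    # Fall back to Wikipedia search.
    # The tool returns an error string on failure, so skip those instead of returning them.
    wiki_res = WikipediaTool()._run(query)
    if (wiki_res and not wiki_res.startswith("Wikipedia Agent error")
            and any(char.isalnum() for char in wiki_res)):
        return wiki_res

    # Final fallback to SerpAPI web search, again skipping error strings
    serp_res = SerpApiTool()._run(query)
    if (serp_res and not serp_res.startswith("SerpAPI Agent error")
            and any(char.isalnum() for char in serp_res)):
        return serp_res

    return "Sorry, no good answer found from any source."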


# # Example usage
# user_query = "What is gradient descent?"
# answer = prioritized_orchestrator(user_query)
# print(answer)
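
The length check in the orchestrator accepts almost any non-empty retrieval, because a vector store returns its nearest chunks even when none of them is relevant. The code comment suggests extending the check; one possible direction is to look at the distances that come back with the documents. The sketch below assumes the query result exposes a list of distances alongside the documents, and the `max_distance=0.8` threshold is an arbitrary placeholder that would need tuning for the metric in use. Both helper names are hypothetical.

```python
def filter_by_distance(documents, distances, max_distance=0.8):
    """Keep only documents whose distance to the query is at or below max_distance."""
    if len(documents) != len(distances):
        raise ValueError("documents and distances must have the same length")
    return [doc for doc, dist in zip(documents, distances) if dist <= max_distance]

def is_confident(documents, distances, max_distance=0.8, min_chars=50):
    """True when at least one close-enough document has a reasonable amount of text."""
    kept = filter_by_distance(documents, distances, max_distance)
    return any(len(doc.strip()) > min_chars for doc in kept)

# Hypothetical values shaped like one query's documents and distances
docs = [
    "PCA determines the eigenvalues and eigenvectors of a covariance matrix and orders them.",
    "Unrelated chunk about something else entirely, long enough to pass a length check.",
]
dists = [0.42, 1.35]

print(filter_by_distance(docs, dists))    # only the first chunk is close enough
print(is_confident(docs, dists))          # a close, substantial chunk exists
print(is_confident(docs[1:], dists[1:]))  # the distant chunk alone is rejected
```

With a check like this, a long but irrelevant chunk no longer short-circuits the fallback to Wikipedia or SerpAPI.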

### ✍️ Design a Custom Prompt Template

We craft a carefully worded prompt that instructs ChatGroq how to prioritize tools for answering:  

1) Check VectorDB first, 
2) Fall back to Wikipedia, 
3) Finally try SerpAPI if needed.  

This prompt guides the model to provide accurate and concise answers.


In [12]:
from langchain.prompts import PromptTemplate

custom_prompt = """
You are an expert AI assistant with access to three powerful tools:

- **VectorDB:** An internal vector database containing high-quality, trusted documents and knowledge, preferred for all answers when possible.
- **Wikipedia:** A tool to search Wikipedia for up-to-date factual content when the internal database does not fully answer the question.
- **SerpAPI:** A live internet search tool for real-time and broad web knowledge, used only if neither VectorDB nor Wikipedia provide sufficient information.

**For every user request:**
1. Carefully consider the user's question and reduce it to its core intent if needed.
2. Query VectorDB for the most semantically relevant content chunks.
      - If the retrieved content directly and confidently answers the user's question, use ONLY this content. Compose a precise, well-written answer by summarizing these chunks.
3. If VectorDB does not contain enough or sufficiently relevant information, use the Wikipedia tool.
      - Integrate the best Wikipedia result(s) clearly and accurately with any helpful context you have from earlier steps.
4. If both VectorDB and Wikipedia fail to contain the required answer, use SerpAPI as a last resort.
      - When using SerpAPI, prioritize concise, factual, and up-to-date responses.
5. In all cases, create a direct, complete answer that directly addresses the user's actual question.
      - Explain reasoning step by step if helpful for clarity, but avoid any filler, speculation, or unsupported content.
      - Ground the answer in the retrieved sources.
      - Do NOT hallucinate or invent information beyond what is retrieved.
6. If no sources provide a sufficient answer, honestly state that the information could not be found.

**Remember:**  
- Always try VectorDB first and prefer it for trusted, detailed content.
- Only use external search (Wikipedia, SerpAPI) if necessary, and integrate results transparently and carefully.
- Answer in a more natural manner, and do not mention things like "According to VectorDB", "Wikipedia states...", etc.

**User Question:**  
{input}

{agent_scratchpad}
"""

prompt = PromptTemplate(template=custom_prompt, input_variables=["input", "agent_scratchpad"])

### 🤖 Instantiate ChatGroq Agent Executor

Now we start up our ChatGroq LLM with the tools and prompt, assembling the complete multi-agent orchestrator.  

This agent intelligently decides which knowledge source to query for the best answer.
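
Conceptually, the executor runs a loop: the model picks a tool, the executor runs it, and the observation is fed back until the model produces a final answer. The sketch below imitates that loop in plain Python. Everything in it is hypothetical: `scripted_decide` is a hard-coded stand-in for the LLM, and the stub tools return canned strings. It is meant only to show the control flow, not LangChain's implementation.

```python
def run_agent(question, decide, tools, max_steps=3):
    """Loop: ask `decide` for the next move, run the chosen tool, feed the result back."""
    scratchpad = []  # (tool_name, tool_input, observation) triples seen so far
    for _ in range(max_steps):
        move = decide(question, scratchpad)
        if move["type"] == "final":
            return move["answer"], scratchpad
        tool_name, tool_input = move["tool"], move["input"]
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"Unknown tool: {tool_name}"
        scratchpad.append((tool_name, tool_input, observation))
    return "Stopped: step limit reached.", scratchpad

def scripted_decide(question, scratchpad):
    """Hypothetical stand-in for the LLM: try VectorDB, then Wikipedia, then answer."""
    if not scratchpad:
        return {"type": "tool", "tool": "VectorDB", "input": question}
    last_tool, _, last_obs = scratchpad[-1]
    if last_tool == "VectorDB" and not last_obs:
        return {"type": "tool", "tool": "Wikipedia", "input": question}
    return {"type": "final", "answer": f"Based on {last_tool}: {last_obs}"}

stub_tools = {
    "VectorDB": lambda q: "",                     # simulate an empty retrieval
    "Wikipedia": lambda q: f"summary about {q}",  # canned fallback result
}
answer, steps = run_agent("PCA", scripted_decide, stub_tools)
print(answer)
print([s[0] for s in steps])
```

The `max_steps` cap plays the role of a safety limit so that a model which never produces a final answer cannot loop forever.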


In [21]:
groq_api_key = os.environ.get("GROQ_API_KEY")

llm = ChatGroq(groq_api_key=groq_api_key, model="llama-3.3-70b-versatile")

# Build the tool-calling agent from the LLM, the tool list, and the custom prompt
agent = create_openai_tools_agent(llm=llm, tools=tools, prompt=prompt)

# verbose=True prints each tool call; return_intermediate_steps=True keeps them in the response
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)

### ✅ Test Direct Q&A with Multi-Agent RAG

Let’s put it all together by asking a question and watching which knowledge sources the agent queries.  

The verbose trace below shows each tool call, followed by the content it returned.

In [22]:
user_question = "What is PCA?"

response = agent_executor.invoke({"input": user_question})

print("Agent response:\n", response)



> Entering new AgentExecutor chain...

Invoking: `VectorDB` with `PCA`


Introduction to Machine Learning • 33
There are two advantages to PCA: 1) reduced computation time due to 
far fewer features and 2) the ability to graph the components when there are 
at most three components. If you have four or five components, you won’t be 
able to display them visually, but you could select subsets of three components 
for visualization, and perhaps gain some additional insight into the dataset.
PCA uses the variance as a measure of information: the higher the variance, 
the more important the component. In fact, just to jump ahead slightly: PCA 
determines the eigenvalues and eigenvectors of a covariance matrix (discussed 
later), and constructs a new matrix whose columns are eigenvectors, ordered 
from left-to-right based on the maximum eigenvalue in the left-most column, 
decreasing until the right-most eigenvector also has the smallest eigenvalue
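
Because the executor was created with `return_intermediate_steps=True`, the response dictionary should also carry an `intermediate_steps` list of (action, observation) pairs. The sketch below shows one way to summarize which tools were called. The `fake_response` dictionary is a hypothetical stand-in shaped like such a response, with `SimpleNamespace` objects in place of the real action objects; only the `tool` and `tool_input` attributes are assumed.

```python
from types import SimpleNamespace

def summarize_steps(response):
    """List (tool name, tool input, observation preview) for each recorded step."""
    summary = []
    for action, observation in response.get("intermediate_steps", []):
        # Flatten newlines and truncate so each step fits on one line
        preview = str(observation).replace("\n", " ")[:80]
        summary.append((action.tool, action.tool_input, preview))
    return summary

# Hypothetical response; the "output" value is a placeholder, not a real model answer.
fake_response = {
    "input": "What is PCA?",
    "output": "(final answer text)",
    "intermediate_steps": [
        (SimpleNamespace(tool="VectorDB", tool_input="PCA"),
         "There are two advantages to PCA:\n1) reduced computation time"),
    ],
}
for tool, tool_input, preview in summarize_steps(fake_response):
    print(f"{tool}({tool_input!r}) -> {preview}")
```

On the real response, the same function could be called as `summarize_steps(response)`, and the final answer read from `response["output"]`.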

### 🚀 Next Steps

- **Modularize the codebase** to enhance maintainability and scalability.  
- **Integrate the logic into an interactive Streamlit application** featuring a multi-tab user interface for an enhanced user experience.  
- **Develop MCQ creation and automated answer evaluation features** by leveraging similar orchestrator design patterns.  
- **Experiment with advanced confidence metrics and agent-based LLM conversation chains** to improve accuracy and interactivity.  
- **Deploy your Retrieval-Augmented Generation (RAG) assistant** either as a cloud-hosted API or a fully containerized application for seamless accessibility and scalability.