# 🤖 Building Autonomous Agents: Adding Retrieval

Welcome to the next step in building autonomous agents! In this notebook, we'll focus on expanding an LLM's knowledge by allowing it to query a knowledge base filled with the OpenSearch documentation.

In a typical Retrieval Augmented Generation (RAG) system, you programmatically query the knowledge base and pass the results to the LLM as context. We'll be doing something very similar here, with one caveat: the model decides whether to query the knowledge base, and it does so through a function call.
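To make the difference concrete, here is a minimal sketch that contrasts the two styles. Everything in it is a hypothetical stand-in: `search_kb` is a toy lookup rather than the lab's ChromaDB client, and `toy_model_decision` fakes the model's choice to emit a tool call.

```python
from typing import Callable, List

# Hypothetical stand-in for a knowledge base lookup (not the lab's ChromaDB client).
def search_kb(query: str) -> List[str]:
    docs = {"opensearch": "OpenSearch is a search and analytics suite."}
    return [text for key, text in docs.items() if key in query.lower()]

def classic_rag_prompt(query: str) -> str:
    """Classic RAG: always retrieve, then place the results in the prompt."""
    context = "\n".join(search_kb(query))
    return f"<context>{context}</context>\n<query>{query}</query>"

def agentic_rag_prompt(query: str, wants_retrieval: Callable[[str], bool]) -> str:
    """Agentic RAG: retrieval only happens when the model asks for it.

    `wants_retrieval` is a toy stand-in for the model emitting a tool call.
    """
    if wants_retrieval(query):
        return classic_rag_prompt(query)
    return f"<query>{query}</query>"

# Toy decision rule standing in for the LLM's tool-use decision.
def toy_model_decision(query: str) -> bool:
    return "opensearch" in query.lower()

print(agentic_rag_prompt("What is OpenSearch?", toy_model_decision))
print(agentic_rag_prompt("What is 7+7?", toy_model_decision))
```

The second call skips the knowledge base entirely, which is the behavior we want from the agent when a question has nothing to do with the documentation.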

## Objectives:
- Extend the existing agent we've built to query the ChromaDB instance we've been using throughout this lab.
- Register the retrieval tool so that the agent can decide to query the knowledge base
- Observe how the agent can use retrieval to solve problems

At the end of this module, we'll have a fully fledged agent complete with memory, retrieval, and tools.

Let's begin by setting up our environment and importing all the types, clients, and converters we've built so far. 🚀

In [None]:
# Let's import all the types, clients, and converters we've used so far.

from agentic_platform.core.models.prompt_models import BasePrompt
from agentic_platform.core.models.memory_models import Message, ToolResult, ToolCall
from agentic_platform.core.models.llm_models import LLMResponse, LLMRequest
from agentic_platform.core.models.tool_models import ToolSpec

# Import our decorator for tools like we discussed in the previous lab. 
from agentic_platform.core.decorator.toolspec_decorator import tool_spec

# import our sample tools from the tool lab.
from agentic_platform.core.tool.sample_tools import weather_report, handle_calculation

# Import our chroma client we've been using throughout all the modules.
# If you have not gone through the setup.ipynb file, please do so before continuing 
from utils.retrieval_client import get_chroma_os_docs_collection, RetrievalResult, ChromaDBRetrievalClient


# Import Converse API converter to convert the raw JSON into our own types.
from agentic_platform.core.converter.llm_request_converters import ConverseRequestConverter
from agentic_platform.core.converter.llm_response_converters import ConverseResponseConverter
from typing import Dict, Any
import boto3

bedrock = boto3.client('bedrock-runtime')
# Helper function to call Bedrock. Passing around JSON is messy and error prone.
def call_bedrock(request: LLMRequest) -> LLMResponse:
    kwargs: Dict[str, Any] = ConverseRequestConverter.convert_llm_request(request)
    # Call Bedrock
    converse_response: Dict[str, Any] = bedrock.converse(**kwargs)
    # Convert the raw Converse response into our own LLMResponse type
    return ConverseResponseConverter.to_llm_response(converse_response)

chroma_client = get_chroma_os_docs_collection()
print("Make sure our chroma client is working")
print(len(chroma_client.retrieve(query_text="How do I install OpenSearch on AWS?", n_results=1)) == 1)


# Create a Knowledge Base Client wrapper
Let's create some types around knowledge base retrieval and implement our client. We're using the VectorSearchRequest and VectorSearchResponse types from the agentic platform here. Even though they're geared towards more complex searches, we can still use them for our simple ChromaDB implementation. Putting an abstraction layer between the actual DB and the rest of the code means that swapping in a different DB only requires changing the client.
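As a rough illustration of why the abstraction layer helps, the sketch below uses a `typing.Protocol` so calling code depends only on a `search` method. The names `SearchHit`, `SearchBackend`, `InMemoryBackend`, and `KnowledgeBase` are hypothetical and are not part of the agentic platform.

```python
from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class SearchHit:
    text: str
    score: float

class SearchBackend(Protocol):
    """Anything with this method can serve as a backend."""
    def search(self, query: str, limit: int) -> List[SearchHit]: ...

class InMemoryBackend:
    """Toy backend that scores documents by word overlap with the query."""
    def __init__(self, docs: List[str]):
        self.docs = docs

    def search(self, query: str, limit: int) -> List[SearchHit]:
        words = set(query.lower().split())
        hits = [SearchHit(d, float(len(words & set(d.lower().split())))) for d in self.docs]
        hits.sort(key=lambda h: h.score, reverse=True)
        return hits[:limit]

class KnowledgeBase:
    """Calling code only sees the SearchBackend interface, never the concrete DB."""
    def __init__(self, backend: SearchBackend):
        self.backend = backend

    def top_text(self, query: str) -> str:
        hits = self.backend.search(query, limit=1)
        return hits[0].text if hits else ""

kb = KnowledgeBase(InMemoryBackend(["install opensearch with docker", "weather in seattle"]))
print(kb.top_text("how to install opensearch"))
```

Swapping the vector store would then mean writing another class with the same `search` method and passing it to `KnowledgeBase`, with no changes to the calling code.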

In [None]:
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any

from agentic_platform.core.models.vectordb_models import VectorSearchRequest, VectorSearchResponse, VectorSearchResult, FilterCondition

# Let's see what the request looks like.
VectorSearchRequest??

In [None]:
# Create a simple VectorSearch client that abstracts away which vector DB we're using.
class VectorSearchClient:
    """Client for vector search."""
    def __init__(self):
        self.client: ChromaDBRetrievalClient = get_chroma_os_docs_collection()

    def search(self, request: VectorSearchRequest) -> VectorSearchResponse:
        response: List[RetrievalResult] = self.client.retrieve(request.query, n_results=1)
        # Create a VectorSearchResult for each retrieval result

        results: List[VectorSearchResult] = []
        for result in response:
            results.append(VectorSearchResult(text=result.document, score=result.distance, metadata=result.metadata))
        
        
        return VectorSearchResponse(results=results)
    
vector_search_client: VectorSearchClient = VectorSearchClient()

# Add Retrieval Tool
There are a couple of ways to implement this, but the simplest is to build retrieval as a tool that essentially just does RAG. You could pass the retrieval results directly back to the model, but in multi-turn conversations it generally works better to make a one-off LLM call that summarizes the results first. This keeps token usage down, because the short summary rather than the raw documents gets carried through the rest of the conversation.
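The back-of-the-envelope sketch below shows why the summary helps. It assumes the full history is resent on every turn, uses character counts as a rough proxy for tokens, and uses made-up sizes (a 2,000-character raw chunk versus a 200-character summary). It also ignores the one-time cost of the summarization call itself.

```python
def total_chars_sent(tool_output_size: int, turns: int) -> int:
    """Total tool-output characters sent to the model across a conversation.

    Assumes one retrieval per turn and that the whole history is resent each
    turn, so turn k carries k tool outputs.
    """
    return sum(tool_output_size * k for k in range(1, turns + 1))

# Hypothetical sizes, chosen only for illustration.
RAW_CHUNK_CHARS = 2000
SUMMARY_CHARS = 200
TURNS = 5

raw_total = total_chars_sent(RAW_CHUNK_CHARS, TURNS)
summary_total = total_chars_sent(SUMMARY_CHARS, TURNS)

print(f"raw chunks in history: {raw_total} chars")
print(f"summaries in history:  {summary_total} chars")
```

Because the history grows with every turn, the gap between the two approaches widens the longer the conversation runs.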

In [None]:
from pydantic import BaseModel, Field
from typing import List, Dict, Any

SYSTEM_PROMPT = "You are a RAG bot. You are given a query and context. Your job is to answer the query using ONLY the context provided."

USER_PROMPT = """
For the user's query:
<query>
{user_message}
</query>

And context below

<context>
{context}
</context>

Answer the query using ONLY the context provided. Avoid saying "according to the context provided" or anything similar.
Be very direct and straightforward with your answer. It's returning to an agent, not a human.
"""    

class RAGPrompt(BasePrompt):
    system_prompt: str = SYSTEM_PROMPT
    user_prompt: str = USER_PROMPT

class RagInput(BaseModel):
    query_text: str

def retrieve_and_answer(input: RagInput) -> str:
    """Search the knowledge base for relevant information based on a query."""
    # Perform the search
    response: VectorSearchResponse = vector_search_client.search(VectorSearchRequest(query=input.query_text))
    # Aggregate the results into a single string
    context: str = "\n".join([result.text for result in response.results])
    # Create the RAG prompt
    rag_prompt: RAGPrompt = RAGPrompt(inputs={"user_message": input.query_text, "context": context})
    # Build out the LLMRequest.
    llm_request: LLMRequest = LLMRequest(
        system_prompt=rag_prompt.system_prompt,
        messages=[Message(role="user", text=rag_prompt.user_prompt)],
        model_id=rag_prompt.model_id,
        hyperparams=rag_prompt.hyperparams
    )
    # Call Bedrock through the client.
    rag_response: LLMResponse = call_bedrock(llm_request)  
    # Return the results 
    return rag_response.text

In [None]:
test_tool_result: str = retrieve_and_answer(RagInput(query_text='How does the aggregate function work in OpenSearch?'))

test_tool_result

Awesome! Now that we have our RAG tool, let's add it to our agent. 

We can actually reuse the same agent from our previous workshop and just import it / add the extra tool. 

In [None]:
# Reuse from the previous lab.
from pydantic import BaseModel

# Import our agent request and response types.
from agentic_platform.core.models.api_models import AgenticRequest, AgenticResponse
from agentic_platform.core.models.memory_models import TextContent
from agentic_platform.core.models.memory_models import SessionContext


# Clients.
class MemoryClient:
    """Manages conversations"""
    def __init__(self):
        self.conversations: Dict[str, SessionContext] = {}

    def upsert_conversation(self, conversation: SessionContext) -> None:
        self.conversations[conversation.session_id] = conversation

    def get_or_create_conversation(self, conversation_id: Optional[str] = None) -> SessionContext:
        return self.conversations.get(conversation_id, SessionContext()) if conversation_id else SessionContext()

# Let's reuse this from the previous lab.
memory_client: MemoryClient = MemoryClient()

# Create a prompt with our system and user messages.
class AgentPrompt(BasePrompt):
    system_prompt: str = "You are a helpful assistant. You are given tools to help you accomplish your task. You can choose to use them or not."
    user_prompt: str = "{user_message}"


class ToolCallingAgent:
    # This is new, we're adding tools in the constructor to bind them to the agent.
    # Don't get too attached to this idea, it'll change as we get into MCP.
    def __init__(self, tools: List[ToolSpec], prompt: BasePrompt):
        self.tools: List[ToolSpec] = tools
        self.conversation: SessionContext = SessionContext()
        self.prompt: BasePrompt = prompt

    def call_llm(self) -> LLMResponse:
        # Create LLM request
        request: LLMRequest = LLMRequest(
            system_prompt=self.prompt.system_prompt,
            messages=self.conversation.get_messages(),
            model_id=self.prompt.model_id,
            hyperparams=self.prompt.hyperparams,
            tools=self.tools
        )

        # Call the LLM.
        response: LLMResponse = call_bedrock(request)
        # Append the LLM's response to the conversation.
        self.conversation.add_message(Message(
            role="assistant",
            text=response.text,
            tool_calls=response.tool_calls
        ))
        # Return the response.
        return response
    
    def execute_tools(self, llm_response: LLMResponse) -> List[ToolResult]:
        """Call tools and return the results."""
        # It's possible that the model will call multiple tools.
        tool_results: List[ToolResult] = []
        # Iterate over the tool calls and call the tool.
        for tool_invocation in llm_response.tool_calls:
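            # Get the tool spec for the tool call.
            tool: Optional[ToolSpec] = next((t for t in self.tools if t.name == tool_invocation.name), None)
            if tool is None:
                # The model asked for a tool we never registered; report that back as an error.
                tool_results.append(ToolResult(
                    id=tool_invocation.id,
                    content=[TextContent(text=f"Unknown tool: {tool_invocation.name}")],
                    isError=True
                ))
                continue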
            # Call the tool.
            input_data: BaseModel = tool.model.model_validate(tool_invocation.arguments)
            function_result: str = str(tool.function(input_data))
            tool_response: ToolResult = ToolResult(
                id=tool_invocation.id,
                content=[TextContent(text=function_result)],
                isError=False
            )

            print(f"Tool response: {tool_response}")

            # Add the tool result to the list.
            tool_results.append(tool_response)

        # Add the tool results to the conversation
        message: Message = Message(role="user", tool_results=tool_results)
        self.conversation.add_message(message)
        
        # Return the tool results even though we don't use them.
        return tool_results
        
    def invoke(self, request: AgenticRequest) -> AgenticResponse:
        # Get or create conversation
        self.conversation = memory_client.get_or_create_conversation(request.session_id)
        # Add user message to conversation
        self.conversation.add_message(request.message)

        # Keep calling LLM until we get a final response
        while True:
            # Call the LLM
            response: LLMResponse = self.call_llm()
            
            # If the model wants to use tools
            if response.stop_reason == "tool_use":
                # Execute the tools
                self.execute_tools(response)
                # Continue the loop to get final response
                continue
            
            # If we get here, it's a final response 
            break

        # Save updated conversation
        memory_client.upsert_conversation(self.conversation)

        # Return our own type.
        return AgenticResponse(
            message=self.conversation.messages[-1],
            session_id=self.conversation.session_id
        )

# Add retrieval to our agent
We'll create a new instance of our agent and bind the retrieval tool to it. We'll ask a question related to OpenSearch and see if the model routes it to the right tool.

In [None]:
from agentic_platform.core.tool.sample_tools import weather_report, handle_calculation, WeatherReportInput, Calculator
# Helper to construct request
def construct_request(user_message: str, conversation_id: Optional[str] = None) -> AgenticRequest:
    return AgenticRequest.from_text(
        text=user_message,
        session_id=conversation_id
    )
tools = [
    ToolSpec(
        name="WeatherReport",
        description="Useful for getting the weather in a given location",
        function=weather_report,
        model=WeatherReportInput
    ),
    ToolSpec(
        name="Calculator",
        description="Useful for calculating the result of a mathematical operation",
        function=handle_calculation,
        model=Calculator
    ),
    ToolSpec(
        name="RetrieveAndAnswer",
        description="Useful for retrieving and answering questions about the OpenSearch documentation",
        function=retrieve_and_answer,
        model=RagInput
    )
]

agent: ToolCallingAgent = ToolCallingAgent(
    tools=tools,
    prompt=AgentPrompt()
)

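# Invoke the agent with an OpenSearch question so it has a reason to call the retrieval tool.
user_message: str = "How does the aggregate function work in OpenSearch?"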
request: AgenticRequest = construct_request(user_message)
response: AgenticResponse = agent.invoke(request)

# Print the response
print(response.message)

# What did we just do?
We essentially built a ReAct-style agent: an LLM that calls tools in a while loop, augmented with retrieval, memory, and tools.
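Stripped of the platform types, the control flow looks roughly like the sketch below. It is a toy under stated assumptions: `fake_llm` is a scripted stand-in that never calls a model, `add` is a hypothetical tool, and the loop is bounded by `max_steps` rather than using `while True` as the notebook's agent does.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Scripted stand-in for the LLM: returns (tool_name, tool_arg) to request a tool,
# or (None, answer) once it is ready to give a final answer.
def fake_llm(history: List[str]) -> Tuple[Optional[str], str]:
    if not any(line.startswith("tool:") for line in history):
        return "add", "7,7"
    last_result = history[-1].split(":", 1)[1]
    return None, f"The answer is {last_result}"

# Hypothetical tool: adds two comma-separated integers.
def add(arg: str) -> str:
    a, b = arg.split(",")
    return str(int(a) + int(b))

TOOLS: Dict[str, Callable[[str], str]] = {"add": add}

def run_agent(user_message: str, max_steps: int = 5) -> str:
    history: List[str] = [f"user:{user_message}"]
    for _ in range(max_steps):
        tool_name, payload = fake_llm(history)
        if tool_name is None:
            # No tool requested, so this is the final response.
            return payload
        # Run the requested tool and feed the result back into the history.
        history.append(f"tool:{TOOLS[tool_name](payload)}")
    raise RuntimeError("Agent did not finish within max_steps")

print(run_agent("What is 7+7?"))
```

Capping the number of steps is a design choice worth considering for the real agent too, since a model that keeps requesting tools would otherwise loop indefinitely.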

While this was a great exercise, building a ReAct agent is fairly undifferentiated work at this point. There are lots of great frameworks out there that can do it for you. With the right level of abstraction, you can use an existing framework and let it do the heavy lifting.

Next, we'll explore using agent frameworks to do similar things 🚀