# Building an Agentic RAG with Amazon Bedrock Converse API

> ⚠️ **Important**: Complete [01-metadata-extraction-and-kb-creation.ipynb](./01-metadata-extraction-and-kb-creation.ipynb) before starting this notebook.

This notebook guides you through building an agentic RAG system that makes intelligent decisions about information retrieval. The system:
- Uses a two-step retrieval process where an agent:
  1. First analyzes document summaries to determine which documents are most relevant
  2. Then specifically queries chunks from selected documents using metadata filters
- Creates a sophisticated question-answering system that understands document context and relevance

## Prerequisites
- Completed Notebook 1
- Two Knowledge Bases populated with summaries and document chunks
- Amazon Bedrock access configured. [Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes base models from Amazon and third-party model providers accessible through an API.

<div class="alert alert-block alert-warning">
<b>Note:</b> Amazon Bedrock users need to request access to models before they are available for use. If you want to add additional models for text, chat, and image generation, you need to request access to models in Amazon Bedrock. To request access to additional models, select the Model access link in the left side navigation panel in the Amazon Bedrock console. For more information see: <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html">https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html</a>
</div>

In this example we will use two **Anthropic Claude** models on Amazon Bedrock. The larger model, Claude 3.5 Sonnet, plans the execution, while the smaller and faster Claude 3 Haiku executes the plan. You will need to request access to:

- Planning model: **Claude 3.5 Sonnet**
- Execution model: **Claude 3 Haiku**

## Set up the environment

First, we'll install and import the necessary libraries. The Amazon Bedrock Converse API requires a boto3 version later than 1.34.123.

In [None]:
%pip install --upgrade boto3

In [None]:
import boto3
from botocore.exceptions import ClientError

Verify the boto3 version:

In [None]:
print(boto3.__version__)
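If you prefer an explicit check over reading the printed version, the comparison can be sketched as below. The helper names `parse_version` and `supports_converse` are hypothetical, and the minimum of 1.34.123 is taken from the requirement stated earlier in this notebook.

```python
import re

def parse_version(version):
    """Turn a version string such as '1.34.131' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits of each part
        if match:
            parts.append(int(match.group()))
    return tuple(parts)

def supports_converse(version, minimum="1.34.123"):
    """Return True when `version` is strictly newer than `minimum`."""
    return parse_version(version) > parse_version(minimum)

print(supports_converse("1.34.131"))
```

In the notebook you could call `supports_converse(boto3.__version__)` and upgrade boto3 if it returns `False`.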

## Initialize Amazon Bedrock clients

Set up the necessary AWS clients for interacting with Bedrock services.

In [None]:
region1="us-west-2"
region2="us-east-1"

client = boto3.client("bedrock-runtime", region_name=region1)
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=region2)

## Test Amazon Bedrock Converse API

Let's test the API with a simple query to ensure everything is set up correctly.

In [None]:
messages = [{"role": "user", "content": [{"text": "What is your name?"}]}]

MODEL_NAME_1 = "anthropic.claude-3-5-sonnet-20240620-v1:0"
MODEL_NAME_2 = "anthropic.claude-3-haiku-20240307-v1:0"

model_arn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

response = client.converse(
    modelId=MODEL_NAME_1,
    messages=messages,
)

print(response)
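Printing the whole response shows the full structure, including metadata. To read only the generated text, a small helper along these lines may be convenient. `extract_text` is a hypothetical name, and the sample dictionary is a hand-written stand-in that mirrors the shape of a Converse response rather than real model output.

```python
def extract_text(response):
    """Concatenate the text blocks of a Converse response message."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return "\n".join(block["text"] for block in content if "text" in block)

# Hand-written stand-in with the same nesting as a Converse response
sample_response = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Hello there."}]}},
    "stopReason": "end_turn",
}
print(extract_text(sample_response))
```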

## Define the client-side tools

Next, we'll define the client-side tools that our agent will use to answer questions. We'll create two tools: **get_filename** and **process_query**.

These tools represent the capabilities that the agent has access to when answering a query. Each tool is defined with a specific purpose and input schema, allowing the agent to use it appropriately based on the user's needs.

The two tools work as follows:

- **get_filename** takes a user query and returns the most relevant document's filename, title, and summary. 
- **process_query** then uses this filename along with the original query to extract specific, relevant information from the identified document, returning a set of pertinent text chunks. 

Together, these tools enable a two-step information retrieval process, first identifying the most appropriate document and then extracting the most relevant information from it, thereby providing targeted responses to user queries.

It's important to note that this code snippet only defines the specifications for these tools. The actual implementation of these functions will be created later in this notebook. These specifications serve as a blueprint for what each tool can do and what input it requires.

For more information on building AI agents with Amazon Bedrock using tools, you might want to refer to the Amazon Bedrock Converse API tool use documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use.html 

This two-step process enhances the RAG's ability to handle complex queries, navigate large document repositories, and deliver tailored information, ultimately improving the overall quality and relevance of its responses.

**Note:** Tool use with models is also known as **Function calling**.

We'll create two tools for our agent: one to retrieve filenames and another to process queries.

In [None]:
tools = [
    {
        "toolSpec": {
            "name": "get_filename",
            "description": "Useful to retrieve the filename of the document associated to the user's query. Returns the filename, title and summary",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The decomposed query"
                        }
                    },
                    "required": ["query"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "process_query",
            "description": "Retrieves specific information related to the user's query using a filter containing the name of the filename in order to get specific details from the relevant document. Returns a set of relevant chunks",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "filename": {
                            "type": "string",
                            "description": "The name of the filename corresponding to the relevant document"
                        },
                        "query": {
                            "type": "string",
                            "description": "The decomposed query"
                        }
                    },
                    "required": ["filename", "query"]
                }
            }
        }
    }
]
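Because each specification lists its required inputs, you can check a tool request against it before running anything. The following is a minimal sketch, not part of the Bedrock API: `missing_required` is a hypothetical helper, and `example_spec` repeats the shape of the `process_query` specification so that the snippet runs on its own.

```python
def missing_required(tool_spec, tool_input):
    """Return the required input keys that are absent from `tool_input`."""
    schema = tool_spec["toolSpec"]["inputSchema"]["json"]
    return [key for key in schema.get("required", []) if key not in tool_input]

# Same shape as the process_query specification defined in this notebook
example_spec = {
    "toolSpec": {
        "name": "process_query",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "filename": {"type": "string"},
                    "query": {"type": "string"},
                },
                "required": ["filename", "query"],
            }
        },
    }
}

print(missing_required(example_spec, {"query": "LongLoRA evaluation results"}))
```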

## Implement tool functions

Now we'll implement the functions that our tools will use to retrieve information.

Here's how the process works:

1. The LLM analyzes the user's request and formulates a plan using the available tools.
2. When the LLM determines that a tool should be used, it doesn't execute the function itself. Instead, it indicates which tool should be used and with what parameters.
3. The agent framework (which is separate from the LLM) then executes the corresponding function and returns the result.
4. The LLM receives the result of the tool execution and uses this information to continue its plan or formulate a response to the user.
5. This process of planning, tool use, and result interpretation continues until the LLM can generate a final answer without needing additional tool use.

Remember, the LLM's role is to understand the user's request, plan the necessary steps using these tools, and interpret the results to provide a coherent response. The actual execution of these functions is handled by the agent framework, which keeps the model's decisions separate from the code that touches your data.
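The numbered steps above can be illustrated without calling any model. In the sketch below everything is a stand-in: `fake_model` imitates an LLM that requests one tool and then answers, `lookup` is a dummy tool, and the message format is simplified compared with the Converse API.

```python
def fake_model(messages):
    """Stand-in for the LLM: request a tool once, then answer from its result."""
    last = messages[-1]
    if "tool_result" not in last:
        return {"stop_reason": "tool_use", "tool": "lookup", "input": {"query": last["text"]}}
    return {"stop_reason": "end_turn", "text": f"Answer based on: {last['tool_result']}"}

def lookup(query):
    """Dummy tool that pretends to search documents."""
    return f"documents about {query}"

TOOLS = {"lookup": lookup}

def run_agent(user_text, max_rounds=5):
    """Plan, execute tools, and feed results back until the model stops asking."""
    messages = [{"role": "user", "text": user_text}]
    for _ in range(max_rounds):
        reply = fake_model(messages)
        if reply["stop_reason"] != "tool_use":
            return reply["text"], messages
        # The framework, not the model, executes the requested tool
        result = TOOLS[reply["tool"]](**reply["input"])
        messages.append({"role": "assistant", "tool": reply["tool"], "input": reply["input"]})
        messages.append({"role": "user", "tool_result": result})
    raise RuntimeError("Tool loop did not finish within max_rounds")

answer, history = run_agent("LoRA")
print(answer)
```

The `max_rounds` guard is an addition for this sketch; it prevents an endless loop if the model keeps requesting tools.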

### Load variables saved in prior Notebook

At the end of Notebook 1 we saved several variables that are needed to continue. The following cell will load those variables into this lab environment.

In [None]:
%store -r

In [None]:
def get_filename(text):
    # Search the summaries Knowledge Base for the documents most relevant to the query
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=summaries_kb_id,
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5
            }
        },
        retrievalQuery={
            'text': text
        }
    )
    return response

def construct_metadata_filter(filename):
    # Build an "equals" filter on the filename metadata key;
    # return None when no usable filename is given
    if not filename or filename == 'unknown':
        return None
    return {
        "equals": {
            "key": "filename",
            "value": filename
        }
    }

def process_query(text, filename):
    metadata_filter = construct_metadata_filter(filename)
    print('Here is the prepared metadata filter:')
    print(metadata_filter)

    vector_search_config = {"numberOfResults": 5}
    # Attach the filter only when one was built; passing None fails parameter validation
    if metadata_filter:
        vector_search_config["filter"] = metadata_filter

    # Search the chunks Knowledge Base, restricted to the selected document when possible
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalConfiguration={
            "vectorSearchConfiguration": vector_search_config
        },
        retrievalQuery={
            'text': text
        }
    )
    return response

## Test the tools

Let's test our implemented tools to ensure they're working correctly.

In [None]:
text = "Which are the Retrieval-based Evaluation results for LongLora"
get_filename(text)

In [None]:
process_query("Which are the Retrieval-based Evaluation results for LongLora", "longlora.pdf")
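The raw responses are verbose. A compact view of each hit can be sketched as follows. `summarize_results` is a hypothetical helper, and `sample` is a hand-written stand-in that follows the `retrievalResults` layout of a Retrieve response; the bucket name and text in it are made up for illustration.

```python
def summarize_results(response, max_chars=200):
    """Reduce a Retrieve response to score, source URI, and a text preview per hit."""
    rows = []
    for result in response.get("retrievalResults", []):
        text = result.get("content", {}).get("text", "")
        uri = result.get("location", {}).get("s3Location", {}).get("uri")
        rows.append({"score": result.get("score"), "source": uri, "text": text[:max_chars]})
    return rows

# Hand-written stand-in, not real retrieval output
sample = {
    "retrievalResults": [
        {
            "content": {"text": "Example chunk text about evaluation."},
            "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/longlora.pdf"}},
            "score": 0.71,
        }
    ]
}
for row in summarize_results(sample, max_chars=20):
    print(row)
```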

## Process tool calls and return results

We'll create a function to process the tool calls made by Claude and return the appropriate results. The `process_tool_call` function is crucial for bridging the gap between the language model's decisions and the actual execution of tools in our system. Here's why it's important and how it works:

### What is a tool call?
A tool call is a request made by the language model (Claude) to use a specific tool with certain input parameters. It's the model's way of indicating that it needs to use a particular function to gather information or perform an action in response to a user's query.

### Why do we need to process tool calls?
We need to process tool calls for several reasons:
1. The language model doesn't directly execute code or access data sources.
2. We need to translate the model's high-level requests into actual function calls in our system.
3. It allows us to control and monitor what actions are being taken on behalf of the model.
4. We can add error handling, logging, or additional logic as needed.

### How the model selects tools
The model selects tools based on its understanding of:
- The user's query
- The available tools and their descriptions
- The current context of the conversation
- What information it needs to answer the query or perform the requested task

The model uses its reasoning to determine which tool is most appropriate for gathering the necessary information or performing the required action.

### The process_tool_call function
This function acts as a bridge between the model's high-level tool requests and the actual function calls in our system. It:
- Takes the tool name and input parameters as arguments
- Maps these to the corresponding Python functions we defined earlier
- Calls the appropriate function with the given inputs
- Returns the result back to the model

This setup allows the language model to make decisions about what information or actions are needed, while keeping the actual execution of these actions under the control of our system. It's a way of giving the model access to external data and capabilities without giving it direct control over the system's resources.

In [None]:
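def process_tool_call(tool_name, tool_input):
    # Map the tool name requested by the model to the matching Python function
    if tool_name == "get_filename":
        return get_filename(tool_input["query"])
    elif tool_name == "process_query":
        return process_query(tool_input["query"], tool_input["filename"])
    else:
        # Fail loudly instead of silently returning None for an unknown tool
        raise ValueError(f"Unknown tool: {tool_name}")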
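The error handling mentioned above could take the form of a dispatch table that reports failures back to the model instead of raising. This is a sketch under assumptions: `safe_tool_call`, `registry`, and the `echo` tool are hypothetical names, and the returned dictionary imitates the `status` and `content` fields of a Converse `toolResult` block (the caller would still add the `toolUseId`).

```python
def safe_tool_call(tool_name, tool_input, registry):
    """Run a registered tool and wrap the outcome as a success or error payload."""
    handler = registry.get(tool_name)
    if handler is None:
        return {"status": "error", "content": [{"text": f"Unknown tool: {tool_name}"}]}
    try:
        result = handler(**tool_input)
    except Exception as exc:  # report any tool failure to the model
        return {"status": "error", "content": [{"text": f"{type(exc).__name__}: {exc}"}]}
    return {"status": "success", "content": [{"json": {"result": result}}]}

def echo(query):
    """Dummy tool used only for this illustration."""
    return {"echo": query}

registry = {"echo": echo}
print(safe_tool_call("echo", {"query": "hi"}, registry))
print(safe_tool_call("missing", {}, registry))
```

Returning an error payload lets the model see what went wrong and decide whether to retry with different inputs.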

## Interact with the chatbot

Now, we'll create a function to manage the interaction between the user and our agentic RAG. This function will encapsulate the core logic of our agent, handling the flow of information between the user, the language model (Claude), and our defined tools.

### Agent's Logic Flow

1. **Initialization**: 
   - Set up the initial context and system prompt for the agent.
   - This prompt will guide the agent's behavior.

2. **User Input**: 
   - Receive the user's message or query.

3. **LLM Processing**:
   - Send the user's input, along with the current context and system prompt, to Claude.
   - Claude analyzes the input and determines the next action (either responding directly or using a tool or a set of tools).

4. **Tool Execution (if needed)**:
   - If Claude decides to use a tool, our function will:
     a. Extract the tool name and parameters from Claude's response.
     b. Call the `process_tool_call` function to execute the appropriate tool.
     c. Capture the tool's output.

5. **Result Interpretation**:
   - Send the tool's output back to Claude for interpretation.
   - Claude may decide to use another tool or formulate a final response.

6. **Response Generation**:
   - Once Claude has gathered all necessary information, it generates a final response to the user.

7. **Conversation Update**:
   - Update the conversation history with the user's input and the agent's response.

8. **Repeat**:
   - The process repeats for each user input, maintaining context throughout the conversation.

### System Prompt

The system prompt is crucial as it sets the tone, capabilities, and limitations of our agent. Our system prompt includes:

- The agent's role as an AI assistant to provide meaningful responses
- A step-by-step guide for the agent to follow when answering user queries
- Rules for interaction and information gathering
- Guidelines for tool usage and response formulation

By encapsulating this logic in a single function, we create a modular and maintainable structure for our chatbot. This allows for easy updates to the agent's behavior and capabilities as needed.

Let's first create the system message variable:

In [None]:
system_message = """
You are an advanced AI assistant, designed to process user queries efficiently using a ReAct (Reasoning and Acting) approach. Your task is to break down complex queries, reason about each step, and utilize appropriate tools to provide accurate and comprehensive responses.

Core Process

    For each user query, follow these steps:

    - Query Analysis and Decomposition
    - Metadata Search
    - Filtered Information Retrieval
    - Response Formulation

    After each step, engage in explicit reasoning to justify your actions and plan your next move.

Detailed Instructions

1. Query Analysis and Decomposition

    - Carefully analyze the user's query.
    - Break it down into smaller, more specific sub-queries.
    - Reformulate each sub-query to optimize for semantic search.

    Reasoning: Explain why you decomposed the query as you did and how it will help in the information retrieval process.

2. Metadata Search

    - Use the metadata Knowledge Base to identify the most relevant filename(s) for the query.
    - If multiple filenames are relevant, prioritize them based on their likely relevance.

    Reasoning: Justify your choice of filename(s) and explain how they relate to the user's query.

3. Filtered Information Retrieval

    - Use the process_query tool to retrieve information.
    - Apply the identified filename(s) as a filter parameter.
    - If multiple sub-queries exist, perform separate retrievals for each.

    Reasoning: Explain why the retrieved information is relevant and how it addresses the sub-queries.

4. Response Formulation

    - Synthesize the retrieved information into a coherent response.
    - Ensure your answer directly addresses the user's original query.
    - If any aspects of the query remain unanswered, acknowledge this and explain why.

    Reasoning: Justify how your response addresses the user's query and identify any potential gaps or areas for further exploration.

5. Comprehensive Answer Compilation

    - Before finalizing your response, review all subqueries and their corresponding answers.
    - Ensure that each subquery has been addressed in your final response.
    - If any subquery remains unanswered, explicitly state this and explain why (e.g., lack of information, ambiguity in the query).
    - Organize your response to clearly address each part of the original query, using subheadings if necessary for clarity.

    Reasoning: Explain how your final response comprehensively addresses all aspects of the user's original query, referencing each subquery explicitly.

Key Principles

- Always start with query decomposition, even for seemingly simple queries.
- Continuously refine and reformulate sub-queries to enhance semantic search effectiveness.
- Use explicit reasoning after each step to justify your actions and plan subsequent steps.
- Prioritize relevance and accuracy in your information retrieval and response formulation.
- Be transparent about the process you're following and any limitations encountered.

By adhering to this ReAct approach, you will provide users with well-reasoned, accurate, and comprehensive responses while demonstrating your thought process throughout the interaction.
"""


Now, let's create the main chatbot interaction function. It accepts the user message and, if available, the chat history. Including the chat history matters because the model needs the entire context before deciding the next step.

## Chatbot Interaction Function

This function encapsulates the core logic of our chatbot interaction:

1. **Initialization**: It sets up the conversation with the user's message and any existing chat history by extending the messages list.

2. **Initial Planning**: It makes a call to a larger language model (`MODEL_NAME_1`) for comprehensive response planning.

3. **Tool Use Loop**: If tools are required, it enters a loop where it:
   a. Extracts tool use information
   b. Processes the tool call
   c. Prepares the result for the next model call
   d. Calls a smaller, quicker model (`MODEL_NAME_2`) for subsequent interactions

4. **Iteration**: This loop continues until no more tool use is required.

5. **Response Generation**: Finally, it extracts the final response text and returns it along with the updated message history.

### Optimization Strategy

Using two different models (a larger one for initial planning and a smaller one for subsequent interactions) is an optimization technique. It aims to balance:

- Comprehensive planning on the first call
- Faster responses on follow-up calls
- Lower cost

By using a larger model for initial planning and a smaller, faster model for follow-up actions, we can potentially achieve better performance and efficiency in our chatbot interactions.
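To see how the two models share the work, you can sum the token counts reported in the `usage` field of each Converse response. The sketch below uses hypothetical names (`add_usage`, `totals`) and made-up token numbers; the model labels are placeholders, not real model IDs.

```python
from collections import defaultdict

def add_usage(totals, model_id, response):
    """Accumulate the token counts of one Converse response under its model ID."""
    usage = response.get("usage", {})
    for key in ("inputTokens", "outputTokens", "totalTokens"):
        totals[model_id][key] += usage.get(key, 0)
    return totals

totals = defaultdict(lambda: defaultdict(int))

# Made-up numbers, shaped like the usage field of a Converse response
add_usage(totals, "planner", {"usage": {"inputTokens": 900, "outputTokens": 300, "totalTokens": 1200}})
add_usage(totals, "executor", {"usage": {"inputTokens": 1500, "outputTokens": 200, "totalTokens": 1700}})
add_usage(totals, "executor", {"usage": {"inputTokens": 2100, "outputTokens": 400, "totalTokens": 2500}})

for model_id, counts in totals.items():
    print(model_id, dict(counts))
```

Multiplying these totals by the per-token price of each model gives a rough cost comparison between a two-model setup and a single-model one.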


In [None]:
import json

def chatbot_interaction(user_message, chat_history=None):
    print(f"\n{'='*50}\nUser Message: {user_message}\n{'='*50}")

    messages = []
    if chat_history:
        messages.extend(chat_history)
    messages.append({"role": "user", "content": [{"text": user_message}]})

    response = client.converse(
        modelId=MODEL_NAME_1,
        inferenceConfig={
            'maxTokens': 4096,
            'temperature': 0,
        },
        messages=messages,
        system=[
            {
                 'text': system_message
            },
        ],
        toolConfig={"tools": tools}
    )

    print(f"\nInitial Response:")
    print(f"Stop Reason: {response['stopReason']}")
    print(f"Content: {response['output']['message']['content']}")

    while response['stopReason'] == "tool_use":
        assistant_content = response['output']['message']['content']
        tool_results = []

        # The model may request several tools in one turn;
        # every toolUse block needs a matching toolResult
        for block in assistant_content:
            if 'toolUse' not in block:
                continue
            tool_name = block['toolUse']['name']
            tool_input = block['toolUse']['input']
            tool_use_id = block['toolUse']['toolUseId']

            print(f"\nTool Used: {tool_name}")
            print("Tool Input:")
            print(json.dumps(tool_input, indent=2))

            tool_result = process_tool_call(tool_name, tool_input)

            print("\nTool Result:")
            print(json.dumps(tool_result, indent=2, default=str))

            tool_results.append({
                "toolResult": {
                    "toolUseId": tool_use_id,
                    "content": [{"json": {"result": tool_result}}]
                }
            })

        # Record the model's tool request followed by the tool results
        messages_temp = [
            {"role": "assistant", "content": assistant_content},
            {"role": "user", "content": tool_results},
        ]

        messages.extend(messages_temp)

        response = client.converse(
            modelId=MODEL_NAME_2,
            inferenceConfig={
                'maxTokens': 4096,
                'temperature': 0,
            },
            messages=messages,
            system=[
                {
                    'text': system_message
                },
            ],
            toolConfig={"tools": tools}
        )

        print(f"\nResponse:")
        print(f"Stop Reason: {response['stopReason']}")
        print(f"Content: {response['output']['message']['content']}")

    # Take the first text block of the final message, or None if there is none
    final_response = next(
        (block['text'] for block in response['output']['message']['content'] if 'text' in block),
        None,
    )

    print(f"\nFinal Response: {final_response}")

    # Store the model's final message as returned so the history stays valid for the next turn
    messages.append(response['output']['message'])

    return final_response, messages

For better understanding, see the flowchart for function chatbot_interaction:

<div style="text-align: center;">
    <img src="static/function-flowchart.png" alt="Flowchart of the function" width="70%">
</div>

## Test the chatbot

Let's test our agent with a few sample queries.

In [None]:
# Start with an empty history; the following cells reuse and extend it
chat_history = []
user_message = "Hello!"
final_response, chat_history = chatbot_interaction(user_message, chat_history)

In [None]:
user_message = "Compare and contrast the LoRA papers (LongLoRA, LoftQ) and the metagpt paper. Analyze the approach in each paper first."
final_response, chat_history = chatbot_interaction(user_message, chat_history)

## Build a chatbot widget

Let's create an interactive widget for our chatbot.

In [None]:
%pip install ipywidgets

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

def handle_user_input(user_message):
    global chat_history

    response, messages = chatbot_interaction(user_message, chat_history)

    chat_history = messages

    with chat_output:
        clear_output()
        print(f"Response: {response}")

    user_input.value = ''

def handle_button_click(sender):
    handle_user_input(user_input.value)

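# Messages are sent with the Send button; `on_submit` is a method of the
# Text widget, not a constructor argument, so it is not passed here
user_input = widgets.Text(
    placeholder='Type your message here...',
    description='User:',
    disabled=False,
    continuous_update=False
)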

chat_output = widgets.Output()

chat_history = []

send_button = widgets.Button(description='Send')
send_button.on_click(handle_button_click)

display(widgets.HBox([user_input, send_button]))
print("\n")
display(chat_output)
print("\n")

You can now interact with the chatbot using the widget above. Try asking questions about the LoRA papers or the MetaGPT paper, such as:

- "Compare and contrast the LoRA papers (LongLoRA, LoftQ) and the one from metagpt. Analyze the approach in each paper first."
- "What are the evaluation metrics used in each study?"
- "Which are the Retrieval-based Evaluation results for LongLora?"

To start a new conversation from scratch, re-run the cell above; it resets `chat_history`.

## Clean up

Run the following cell to delete the created resources and avoid unnecessary costs. This should take about 2-3 minutes to complete.

In [None]:
import time

# First, set up the session with the correct profile
session = boto3.Session()

# Now, create all clients using this session
s3_client = session.client('s3')
cloudformation = session.client('cloudformation')

# Delete all objects in the bucket (paginated, since each listing call returns at most 1,000 keys)
try:
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=s3_bucket):
        for obj in page.get('Contents', []):
            s3_client.delete_object(Bucket=s3_bucket, Key=obj['Key'])
    print(f"All objects in {s3_bucket} have been deleted.")
except Exception as e:
    print(f"Error deleting objects from {s3_bucket}: {e}")

time.sleep(60) # Wait until the objects have been deleted

# Define the stack names to delete
stack_names = ["KB-E2E-KB-{}".format(solution_id),"KB-E2E-Base-{}".format(solution_id)]

# Iterate over the stack names and delete each stack
for stack_name in stack_names:
    try:
        # Retrieve the stack information
        stack_info = cloudformation.describe_stacks(StackName=stack_name)
        stack_status = stack_info['Stacks'][0]['StackStatus']

        # Check if the stack exists and is in a deletable state
        if stack_status != 'DELETE_COMPLETE':
            # Delete the stack
            cloudformation.delete_stack(StackName=stack_name)
            print(f'Deleting stack: {stack_name}')

            # Wait for the stack deletion to complete
            waiter = cloudformation.get_waiter('stack_delete_complete')
            waiter.wait(StackName=stack_name)
            print(f'Stack {stack_name} deleted successfully.')
        else:
            print(f'Stack {stack_name} does not exist or has already been deleted.')

    except cloudformation.exceptions.ClientError as e:
        print(f'Error deleting stack {stack_name}: {e.response["Error"]["Message"]}')