## Building a Q&A application using Amazon Bedrock Knowledge Bases with Strands Agents

### Context

In this notebook, we will dive deep into building a Q&A application using Amazon Bedrock Knowledge Bases - Retrieve API. Here, we will query the Knowledge Base to get the desired number of document chunks based on similarity search. We will then augment the prompt with relevant documents. The prompt will be the input to Amazon Nova models for generating the response.
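To make "similarity search" concrete, the sketch below ranks chunks by the cosine similarity between a query vector and each chunk's embedding, then keeps the top k. It is a conceptual illustration under stated assumptions. The three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical, and the vector store behind a Knowledge Base runs its own (often approximate) nearest-neighbor search on real embeddings.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best (text, score) pairs."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real embeddings have hundreds or thousands of dimensions.
toy_chunks = [
    ("chunk about AWS revenue", [0.9, 0.1, 0.0]),
    ("chunk about retail stores", [0.1, 0.9, 0.0]),
    ("chunk about cloud infrastructure", [0.7, 0.3, 0.1]),
]
for text, score in top_k([1.0, 0.0, 0.0], toy_chunks, k=2):
    print(f"{score:.3f}  {text}")
```

The number of chunks kept (k) plays the same role as the `numberOfResults` setting passed to the Retrieve API later in this notebook.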

With a Knowledge Base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant, context-specific, and accurate responses without continuously retraining the FM. All information retrieved from Knowledge Bases comes with source attribution to improve transparency and minimize hallucinations. For more information on creating a Knowledge Base using the console, refer to the [Amazon Bedrock Knowledge Bases documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html).

We will cover 2 parts in the notebook:

- Part 1: We will use a Strands-based retrieve-first approach that maintains deterministic behavior.
- Part 2: We will showcase the Strands SDK integration with custom tools.

### Pattern

We will implement the solution using the Retrieval Augmented Generation (RAG) pattern. RAG retrieves data from outside the language model and augments the prompt by adding the relevant retrieved data as context. Here, we perform RAG on the Knowledge Base created using the console or the SDK.
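As a minimal sketch of the augmentation step, the hypothetical helper `augment_prompt` below joins the retrieved chunks and places them ahead of the question, mirroring the `Context: ... Question: ...` layout that `answer_with_context` uses in Part 1. The chunk strings are placeholders, not real retrieval output.

```python
from typing import List

def augment_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Combine retrieved chunks (the context) with the user question into one prompt."""
    # Separate chunks with a blank line so the model can tell them apart.
    context = "\n\n".join(retrieved_chunks)
    return f"Context: {context}\n\nQuestion: {question}"

prompt = augment_prompt(
    "By what percentage did AWS revenue grow year-over-year in 2022?",
    ["<retrieved chunk 1>", "<retrieved chunk 2>"],
)
print(prompt)
```

The model never sees the Knowledge Base directly; it only sees whatever text ends up in this prompt.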

### Pre-requisite

Before the questions can be answered, the documents must be processed and ingested into a vector database. The notebook [01_create_ingest_documents_test_kb.ipynb](./01_create_ingest_documents_test_kb.ipynb) takes care of this for you:

1. Load the documents into the Knowledge Base by connecting your Amazon S3 bucket (data source).
2. Ingestion - Knowledge Bases splits the documents into smaller chunks (based on the selected chunking strategy), generates embeddings, and stores them in the associated vector store.
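Knowledge Bases performs chunking for you according to the strategy configured on the data source. Purely to illustrate the idea behind fixed-size chunking with overlap, here is a sketch that uses whitespace-separated words as a stand-in for tokens. The function `chunk_words` and its defaults are hypothetical and are not the service's implementation.

```python
from typing import List

def chunk_words(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into chunks of at most max_words words; consecutive chunks share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, max_words=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary partially visible in both neighboring chunks, which helps retrieval when the relevant fact sits near a split.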

#### Use case

In this example, you will use several years of Amazon's Letter to Shareholders as a text corpus to perform Q&A on. This data is already ingested into Amazon Bedrock Knowledge Bases. You will need the Knowledge Base ID (`kb_id`) to run this example.
For your own use case, you can sync files covering different domain topics and query them in the same way to evaluate model responses using the Retrieve API from Knowledge Bases.

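#### Restart the kernel with the updated packages that are installed through the dependencies above

In [None]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [None]:
# restore stored variables (such as kb_id) from the prerequisite notebook;
# run this after the restart, otherwise the restart clears the restored values
%store -r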

In [None]:
import boto3
import pprint
from botocore.client import Config
from strands import Agent, tool
from strands.models import BedrockModel

pp = pprint.PrettyPrinter(indent=2)
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
boto3_session = boto3.Session()
bedrock_agent_client = boto3.client("bedrock-agent-runtime", config=bedrock_config)

In [None]:
# Create interactive model selector
import sys
sys.path.append('../')
from util.model_selector import create_text_model_selector
model_selector = create_text_model_selector().display()
selected_model = model_selector.get_model_id()

## Part 1 - Strands-Based Retrieve-First Implementation

This section demonstrates a retrieve-first function built with Strands. Retrieval always runs before generation, so the control flow is deterministic, while the Strands `Agent` and `BedrockModel` classes handle the model call.

In this approach we:

1. **Always retrieve context first** - Deterministic retrieval behavior
2. **Then prompt the LLM** - Generate response with retrieved context
3. **Use explicit workflow control** - No agent decision-making about when to retrieve
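The three steps above can be sketched as a function whose retriever and generator are passed in, so the control flow is visible and runnable without AWS access. In this notebook, `retrieve` corresponds to `retrieve_context` and `generate` to calling a Strands `Agent`; here both are hypothetical stubs (`fake_retrieve`, `fake_generate`) used only to show that retrieval always happens exactly once, before generation.

```python
from typing import Callable, List

def retrieve_then_generate(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve-first RAG: always retrieve, then prompt the model once."""
    # Step 1: retrieval always runs; the model never decides whether to retrieve.
    chunks = retrieve(query)
    context = "\n\n".join(chunks)
    # Step 2: a single generation call with the context already in the prompt.
    return generate(f"Context: {context}\n\nQuestion: {query}")

calls = []  # records the order in which the stubs are invoked

def fake_retrieve(q: str) -> List[str]:
    calls.append("retrieve")
    return ["chunk A", "chunk B"]

def fake_generate(prompt: str) -> str:
    calls.append("generate")
    return f"answer based on {prompt.count('chunk')} chunks"

print(retrieve_then_generate("example question", fake_retrieve, fake_generate))
print(calls)
```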

### Benefits of Retrieve-First Pattern:

- **Deterministic Behavior** - Always retrieves relevant context before generating responses
- **Explicit Control** - You control exactly when and how retrieval happens
- **Consistent Context** - Every response is grounded in retrieved documents
- **Reduced Hallucination** - Responses are grounded in retrieved content, which reduces (though does not eliminate) hallucination
- **Simplified Architecture** - Clear separation between retrieval and generation steps

This pattern follows the approach described in the Strands Agents Knowledge Base example, where retrieval happens first, then the LLM generates a response based on the retrieved context.

### Extract the text chunks from the Retrieve API response

In the cell below, we extract the text of each retrieved chunk from the retrieval results.

Each result also carries a `score` that indicates how closely the chunk matches the query; the helper below keeps only the text.

In [None]:
# fetch context from the response
def get_contexts(retrievalResults):
    contexts = []
    for retrievedResult in retrievalResults: 
        contexts.append(retrievedResult['content']['text'])
    return contexts
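If you also want to inspect the `score` of each chunk and where it came from, a variant of `get_contexts` can keep those fields. This is a sketch that assumes each result has the `content.text`, `score`, and (for S3 data sources) `location.s3Location.uri` fields; the helper `get_scored_contexts` and the `sample_results` list are hypothetical, hand-written stand-ins for a real `response['retrievalResults']`.

```python
from typing import Any, Dict, List

def get_scored_contexts(retrieval_results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return text, score, and source URI for each Retrieve API result."""
    rows = []
    for result in retrieval_results:
        location = result.get("location", {})
        rows.append({
            "text": result["content"]["text"],
            "score": result.get("score"),
            # Only S3 locations are handled here; other location types yield None.
            "uri": location.get("s3Location", {}).get("uri"),
        })
    return rows

# Hand-written mock results shaped like a Retrieve API response (not real data).
sample_results = [
    {"content": {"text": "chunk one"}, "score": 0.82,
     "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter-2022.pdf"}}},
    {"content": {"text": "chunk two"}, "score": 0.61,
     "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter-2021.pdf"}}},
]
for row in get_scored_contexts(sample_results):
    print(f"{row['score']:.2f}  {row['uri']}  {row['text']}")
```

In the notebook you would call `get_scored_contexts(response['retrievalResults'])` on an actual `bedrock_agent_client.retrieve` response.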

In [None]:
# Create Bedrock model
bedrock_model = BedrockModel(
    model_id=selected_model,
    temperature=0.5
)

def retrieve_context(query, num_results=4):
    """Call the Retrieve API and return the retrieved chunks joined into one string."""
    response = bedrock_agent_client.retrieve(
        retrievalQuery={'text': query},
        knowledgeBaseId=kb_id,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': num_results,
                # HYBRID combines semantic and keyword search; only some vector stores support it
                'overrideSearchType': "HYBRID"
            }
        }
    )
    contexts = get_contexts(response['retrievalResults'])
    return "\n\n".join(contexts)

def answer_with_context(query):
    # Step 1: always retrieve first (deterministic)
    context = retrieve_context(query)
    # Step 2: create a fresh Agent per question so no conversation history carries over
    agent = Agent(
        system_prompt="You are a financial advisor AI system...",
        model=bedrock_model,
        callback_handler=None,  # default is PrintingCallbackHandler
    )
    # The retrieved context is placed directly in the prompt; this agent has no tools
    return agent(f"Context: {context}\n\nQuestion: {query}")
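Part 1 requests `HYBRID` search, while the tool in Part 2 requests `SEMANTIC` search. To switch between them without duplicating the configuration dictionary, a small builder can construct the `retrievalConfiguration` argument shown in `retrieve_context`. This is a sketch: `make_retrieval_config` is a hypothetical helper that only accepts the two search-type values used in this notebook, and whether `HYBRID` is accepted at request time depends on your Knowledge Base's vector store.

```python
from typing import Any, Dict

# The two overrideSearchType values used in this notebook.
VALID_SEARCH_TYPES = {"HYBRID", "SEMANTIC"}

def make_retrieval_config(num_results: int = 4, search_type: str = "SEMANTIC") -> Dict[str, Any]:
    """Build the retrievalConfiguration argument for bedrock_agent_client.retrieve."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(VALID_SEARCH_TYPES)}")
    if num_results < 1:
        raise ValueError("num_results must be at least 1")
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": num_results,
            "overrideSearchType": search_type,
        }
    }

print(make_retrieval_config(4, "HYBRID"))
```

For example, pass `retrievalConfiguration=make_retrieval_config(4, "HYBRID")` to `bedrock_agent_client.retrieve` in place of the inline dictionary. Validating locally catches typos in the search type before a request is sent.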

### Testing the Strands Retrieve-First Implementation

In [None]:
# Test with a single query
test_query = "By what percentage did AWS revenue grow year-over-year in 2022?"

print("=== Testing Strands Retrieve-First Implementation ===")
print(f"Query: {test_query}")
print("\nProcessing...")

result = answer_with_context(test_query)
print(f"\nAnswer: {result}")

## Part 2 - Strands SDK integration
In this section, we will build a Q&A application using the Retrieve API provided by Amazon Bedrock Knowledge Bases and the Strands SDK. We will query the Knowledge Base to get the desired number of document chunks based on similarity search, wrap that query in a custom retrieval tool for a Strands Agent, and use the model selected earlier (for example, Amazon Nova Lite) to answer questions.

Create a custom retrieval tool using the Strands SDK that calls the `Retrieve API` provided by Amazon Bedrock Knowledge Bases. The Retrieve API converts the user query into an embedding, searches the Knowledge Base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results. The output of the `Retrieve API` includes the `retrieved text chunks`, the `location type` and `URI` of the source data, and the relevance `scores` of the retrievals; the tool below returns only the text chunks.

In [None]:
# Create a custom retrieval tool for Strands Agent
@tool
def knowledge_base_retriever(query: str) -> str:
    """
    Retrieve relevant documents from Amazon Bedrock Knowledge Base.
    This tool searches the Knowledge Base and returns relevant context for answering questions.

    Args:
        query: The natural-language question or search phrase to look up in the Knowledge Base.
    """
    try:
        response = bedrock_agent_client.retrieve(
            retrievalQuery={'text': query},
            knowledgeBaseId=kb_id,
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'numberOfResults': 4,
                    'overrideSearchType': "SEMANTIC"
                }
            }
        )
        
        # Extract and format the retrieved contexts
        contexts = []
        for result in response['retrievalResults']:
            contexts.append(result['content']['text'])
        
        return "\n\n".join(contexts)
    except Exception as e:
        return f"Error retrieving from Knowledge Base: {str(e)}"

# Test the retrieval tool
query = "By what percentage did AWS revenue grow year-over-year in 2022?"
retrieved_context = knowledge_base_retriever(query)
print("Retrieved context:")
print(retrieved_context)
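The `knowledge_base_retriever` tool returns only the chunk text, so the agent cannot tell which document a fact came from. If you want answers with source attribution, one option (a sketch, not part of the original notebook) is to format each result with its S3 URI and score before returning it from the tool. The helper `format_results_with_sources` and the sample results are hypothetical; only S3 locations are handled, and any other location type falls back to `unknown source`.

```python
from typing import Any, Dict, List

def format_results_with_sources(retrieval_results: List[Dict[str, Any]]) -> str:
    """Join retrieved chunks, prefixing each with a numbered source tag the model can cite."""
    blocks = []
    for i, result in enumerate(retrieval_results, start=1):
        uri = result.get("location", {}).get("s3Location", {}).get("uri", "unknown source")
        score = result.get("score")
        score_text = f"{score:.2f}" if score is not None else "n/a"
        blocks.append(f"[{i}] (source: {uri}, score: {score_text})\n{result['content']['text']}")
    return "\n\n".join(blocks)

# Hand-written mock results (not real data); the second lacks location and score.
sample = [
    {"content": {"text": "chunk one"}, "score": 0.82,
     "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter-2022.pdf"}}},
    {"content": {"text": "chunk two"}},
]
print(format_results_with_sources(sample))
```

Inside the tool, you could return `format_results_with_sources(response['retrievalResults'])` instead of the plain join, and ask the model in the system prompt to cite the bracketed numbers.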

### Prompt specific to the model to personalize responses
Here, we use the prompt below to have the model act as a financial advisor AI system that answers questions using fact-based and statistical information when possible. The prompt instructs the agent to use the `knowledge_base_retriever` tool that we created above.

In [None]:
system_prompt = """You are a financial advisor AI system that provides answers to questions by using fact-based and statistical information when possible.
Use the retriever tool to search for relevant information to answer user questions.

Retrieval guidelines:
- Only call the retriever tool ONCE per user question
- If the first retrieval doesn't contain the exact answer, work with the information provided
- Do NOT retry the same or similar queries - analyze what you received first
- If the retrieved information is insufficient, clearly state what information is missing

If the Knowledge Base does not contain the answer, and you don't know the answer, just say that you don't know, don't try to make up an answer.
The response should be specific and use statistics or numbers when possible."""

In [None]:
# Create a Strands Agent with the retrieval tool
financial_advisor_agent = Agent(
    system_prompt=system_prompt,
    tools=[knowledge_base_retriever],
    model=bedrock_model,
    callback_handler=None,  # default is PrintingCallbackHandler
)

### Using the Strands Agent to answer questions with Knowledge Base retrieval

In [None]:
# Test the financial advisor agent with a question
query = "By what percentage did AWS revenue grow year-over-year in 2022?"
answer = financial_advisor_agent(query)
print("Question:", query)
print("\nAnswer:")
print(answer)

In [None]:
# Test with another question to demonstrate the agent's capabilities
query2 = "What are Amazon's key investments in generative AI?"
answer2 = financial_advisor_agent(query2)
print("Question:", query2)
print("\nAnswer:")
print(answer2)

## Conclusion

We showed two retrieval approaches for building a RAG-based Q&A application with Amazon Bedrock Knowledge Bases and Strands Agents. Each approach offers distinct advantages:

**Retrieve-First Pattern (Part 1):**

- **Deterministic behavior** - Always retrieves context before generating responses
- **Explicit workflow control** - You control exactly when retrieval happens
- **Consistent grounding** - Every response is based on retrieved factual content
- **Production reliability** - Predictable, controllable RAG workflows

**Agent-Based Tools (Part 2):**

- **Dynamic decision-making** - Agent decides when and how to use tools
- **Simplified orchestration** - Strands SDK handles tool coordination automatically
- **Conversational capabilities** - Natural interaction with built-in context management
- **Extensible architecture** - Easy to add multiple tools and complex workflows

### Recommendation:

- **Use retrieve-first pattern** (Part 1) when you need deterministic, controllable RAG workflows with consistent context grounding
- **Use agent-based tools** (Part 2) when you need dynamic decision-making, conversational flows, or complex tool orchestration