<div style="text-align: center;">
    <h1 style="color: #347AB7;">Building a Q&A application using Knowledge Bases for Amazon <code style="background-color: #f5f5f5; color: #EB5424;">Bedrock</code> - Retrieve API</h1>
</div>

In this notebook, we dive into building a Q&A application using the Retrieve API of Knowledge Bases for Amazon Bedrock. We query the knowledge base to fetch the desired number of document chunks through similarity search. We then augment the prompt with the relevant chunks and the user query, and pass it as input to Anthropic Claude V2 to generate a response.

With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company
data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant,
context-specific, and accurate responses without continuously retraining the FM. All information retrieved from
knowledge bases comes with source attribution to improve transparency and minimize hallucinations. For more information on creating a knowledge base using the console, refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html).


<h1 style="color: #347AB7;">Introduction</h1>


<h3 style="color: #D35400;">Pattern</h3>

<p>We can implement the solution using the Retrieval Augmented Generation (RAG) pattern. RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context. Here, we perform RAG on a knowledge base created using the console or SDK.</p>

<h3 style="color: #D35400;">Pre-requisite</h3>

<p>Before being able to answer the questions, the documents must be processed and ingested into a vector database.</p>

<ol>
    <li>Load the documents into the knowledge base by connecting your S3 bucket (data source).</li>
    <li>Ingestion - Knowledge bases splits the documents into smaller chunks (based on the strategy selected), generates embeddings, and stores them in the associated vector store.</li>
</ol>
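The chunking step of ingestion can be pictured with a small sketch. This is only an illustration of fixed-size chunking with overlap, not the service's actual implementation: the function name `chunk_text`, the character-based sizes, and the sample text are all assumptions made for the example (the managed service applies whichever chunking strategy you select when creating the knowledge base).

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into fixed-size character chunks that overlap.

    Hypothetical helper for illustration only; sizes are in characters
    to keep the example dependency-free.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 10  # 100 characters of made-up text
chunks = chunk_text(sample, chunk_size=40, overlap=10)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, which helps keep sentences that straddle a boundary retrievable from either side.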

<p><img src="./images/data_ing.png" alt="data ingestion" style="max-width: 100%; display: block; margin-left: auto; margin-right: auto;"/></p>

<h4 style="color: #D35400;">Notebook Walkthrough</h4>

<p>For our notebook, we will use the <code style="background-color: #f5f5f5; color: #EB5424;">Retrieve API</code> provided by Knowledge Bases for Amazon <code style="background-color: #f5f5f5; color: #EB5424;">Bedrock</code> which converts user queries into embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results. The output of the <code style="background-color: #f5f5f5; color: #EB5424;">Retrieve API</code> includes the retrieved text chunks, the location type, and URI of the source data, as well as the relevance scores of the retrievals.</p>
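To make the "embed the query, then search" idea concrete, here is a toy sketch of a top-k similarity search. It assumes cosine similarity and uses hand-written three-dimensional vectors in place of real embeddings; the actual metric and index are determined by the vector store behind your knowledge base, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up "embeddings" standing in for the output of an embedding model.
toy_index = [
    ("chunk about generative AI", [0.9, 0.1, 0.0]),
    ("chunk about logistics", [0.0, 0.2, 0.9]),
    ("chunk about AWS chips", [0.7, 0.6, 0.1]),
]

toy_results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
for score, text in toy_results:
    print(f"{score:.3f}  {text}")
```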

<p>We will then augment the original prompt with the retrieved text chunks and pass it to the <code style="background-color: #f5f5f5; color: #EB5424;">anthropic.claude-v2</code> model, using prompt engineering patterns suited to your use case.</p>


<h1 style="color: #347AB7;">Getting started with the <code style="background-color: #f5f5f5; color: #EB5424;">Data source</code></h1>


<p>In this example, you will use several years of Amazon's Letters to Shareholders as a text corpus to perform Q&A on. This data is already ingested into Knowledge Bases for Amazon <code style="background-color: #f5f5f5; color: #EB5424;">Bedrock</code>. You will need the <code style="background-color: #f5f5f5; color: #EB5424;">knowledge base id</code> to run this example.
In your own use case, you can sync different files for different domain topics and run this notebook in the same manner to evaluate model responses using the Retrieve API from knowledge bases.</p>

<p>Let's go to the <a href="https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/knowledge-bases" style="color: #347AB7;">Amazon Bedrock Knowledge base console</a></p>

<h1 style="color: #347AB7;">Initialize the Bedrock <code style="background-color: #f5f5f5; color: #EB5424;">client</code></h1>

1. Import the necessary libraries, along with LangChain for Bedrock model selection and LlamaIndex to store the service context containing the LLM and embedding model instances. We will use this service context later in the notebook for evaluating the responses from our Q&A application.

2. Initialize `anthropic.claude-v2` as our large language model to perform query completions using the RAG pattern with the given knowledge base, once we get all text chunk searches through the `retrieve` API.

In [1]:
import boto3
import pprint
from botocore.client import Config
from langchain.llms.bedrock import Bedrock

pp = pprint.PrettyPrinter(indent=2)

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                                    config=bedrock_config)

model_kwargs_claude = {
                            "temperature": 0,
                            "top_k": 10,
                            "max_tokens_to_sample": 3000
                        }

llm = Bedrock(model_id="anthropic.claude-v2",
              model_kwargs=model_kwargs_claude,
              client = bedrock_client,)

<h1 style="color: #347AB7;">Retrieve API: <code style="background-color: #f5f5f5; color: #EB5424;">Process flow</code></h1>

Define a retrieve function that calls the `Retrieve API` provided by Knowledge Bases for Amazon Bedrock, which converts user queries into
embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom
workflows on top of the semantic search results. The output of the `Retrieve API` includes the `retrieved text chunks`, the `location type` and `URI` of the source data, as well as the relevance `scores` of the retrievals.
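As a rough sketch of how those output fields can be read, the helper below flattens each result into its score, location type, source URI, and a short text preview. It assumes an S3 data source, so the URI is looked up under `location.s3Location.uri`; the helper name `summarize_results`, the bucket name, and the sample record are hypothetical and only mimic the shape of `retrievalResults`.

```python
def summarize_results(retrieval_results, preview_chars=60):
    """Flatten Retrieve API results into small dicts for quick inspection.

    Assumes an S3-backed data source; other location types would keep
    their URI under a different key and yield uri=None here.
    """
    summary = []
    for result in retrieval_results:
        location = result.get("location", {})
        summary.append({
            "score": result.get("score"),
            "location_type": location.get("type"),
            "uri": location.get("s3Location", {}).get("uri"),
            "preview": result["content"]["text"][:preview_chars],
        })
    return summary


# Hypothetical record shaped like one entry of response['retrievalResults'].
sample_results = [
    {
        "content": {"text": "Amazon is investing heavily in LLMs."},
        "location": {
            "type": "S3",
            "s3Location": {"uri": "s3://example-bucket/letter-2022.pdf"},
        },
        "score": 0.61,
    },
]

summary = summarize_results(sample_results)
print(summary)
```

Keeping the URI next to each chunk makes it straightforward to show source attribution alongside the generated answer.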

![retrieveAPI](./images/arch_rag_kb.png)

<h2 style="color: #347AB7;"><span style="color: #000000;">Step 1 🏁:</span> Initialize your <code style="background-color: #f5f5f5; color: #EB5424;">Knowledge base id</code> before querying responses from the initialized LLM</h2>


In [2]:
kb_id = "YWNES8HIIH" # replace it with your Knowledge base id.

In [3]:
def retrieve(query, kbId, numberOfResults=5):
    
    response = bedrock_agent_client.retrieve(
                                                retrievalQuery= {
                                                    'text': query
                                                },
                                                knowledgeBaseId=kbId,
                                                retrievalConfiguration= {
                                                    'vectorSearchConfiguration': {
                                                        'numberOfResults': numberOfResults
                                                    }
                                                }
                                            )
    
    return response

Next, we will call the `Retrieve API` and pass the `knowledge base id`, the `number of results`, and the `query` as parameters.

`score`: Each returned text chunk comes with an associated score that indicates how closely the chunk matches the query.
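One way to use the score is to drop weak matches before building the prompt. The sketch below is a hypothetical post-processing step, not part of the Retrieve API: the helper name `filter_by_score`, the `min_score` threshold of 0.5, and the sample records are assumptions, and a sensible threshold depends on your embedding model and data.

```python
def filter_by_score(retrieval_results, min_score=0.5):
    """Keep results scoring at least min_score, highest score first.

    Results without a 'score' key are treated as scoring 0.0.
    """
    kept = [r for r in retrieval_results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


# Made-up records shaped like entries of response['retrievalResults'].
sample_scored = [
    {"content": {"text": "loosely related chunk"}, "score": 0.31},
    {"content": {"text": "closely related chunk"}, "score": 0.72},
    {"content": {"text": "somewhat related chunk"}, "score": 0.55},
]

strong = filter_by_score(sample_scored, min_score=0.5)
for r in strong:
    print(r["score"], r["content"]["text"])
```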

In [4]:
query = "What is Amazon doing in the field of generative AI?"

response = retrieve(query, kb_id, 3)
retrievalResults = response['retrievalResults']

pp.pprint(retrievalResults)

[ { 'content': { 'text': 'One final investment area that I’ll mention, that’s '
                         'core to setting Amazon up to invent in every area of '
                         'our business for many decades to come, and where '
                         'we’re investing heavily is Large Language Models '
                         '(“LLMs”) and Generative AI. Machine learning has '
                         'been a technology with high promise for several '
                         'decades, but it’s only been the last five to ten '
                         'years that it’s started to be used more pervasively '
                         'by companies. This shift was driven by several '
                         'factors, including access to higher volumes of '
                         'compute capacity at lower prices than was ever '
                         'available. Amazon has been using machine learning '
                         'extensively for 25 years, employ

<h2 style="color: #347AB7;"><span style="color: #000000;">Step 2 🔍:</span> Extract the <code style="background-color: #f5f5f5; color: #EB5424;">text chunks</code> from the <code style="background-color: #f5f5f5; color: #EB5424;">retrieveAPI</code> response</h2>


In the cell below, we will fetch the context from the retrieval results.

In [5]:
# Fetch context from the response
def get_contexts(retrievalResults):
    
    contexts = []
    for retrievedResult in retrievalResults: 
        contexts.append(retrievedResult['content']['text'])
        
    return contexts

contexts = get_contexts(retrievalResults)

print(contexts)
#pp.pprint(contexts)

['One final investment area that I’ll mention, that’s core to setting Amazon up to invent in every area of our business for many decades to come, and where we’re investing heavily is Large Language Models (“LLMs”) and Generative AI. Machine learning has been a technology with high promise for several decades, but it’s only been the last five to ten years that it’s started to be used more pervasively by companies. This shift was driven by several factors, including access to higher volumes of compute capacity at lower prices than was ever available. Amazon has been using machine learning extensively for 25 years, employing it in everything from personalized ecommerce recommendations, to fulfillment center pick paths, to drones for Prime Air, to Alexa, to the many machine learning services AWS offers (where AWS has the broadest machine learning functionality and customer base of any cloud provider). More recently, a newer form of machine learning, called Generative AI, has 

<h2 style="color: #347AB7;"><span style="color: #000000;">Step 3 ✍️:</span> Prompt specific to the model to personalize <code style="background-color: #f5f5f5; color: #EB5424;">responses</code></h2>


Here, we will use the specific prompt below for the model to act as a financial advisor AI system that will provide answers to questions by using fact based and statistical information when possible. We will provide the `Retrieve API` responses from above as a part of the `{context_str}` in the prompt for the model to refer to, along with the user `{query_str}`.  

In [6]:
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """
                    Human: You are a financial advisor AI system that provides answers to questions by using fact-based and statistical information when possible. 
                    Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags. 
                    If you don't know the answer, just say that you don't know, don't try to make up an answer.
                    <context>
                    {context_str}
                    </context>

                    <question>
                    {query_str}
                    </question>

                    The response should be specific and use statistics or numbers when possible.

                    Assistant:"""

claude_prompt = PromptTemplate(template=PROMPT_TEMPLATE, 
                               input_variables=["context_str","query_str"])

<h2 style="color: #347AB7;"><span style="color: #000000;">Step 4 🚀:</span> Initiate the <code style="background-color: #f5f5f5; color: #EB5424;">user prompt</code> and <code style="background-color: #f5f5f5; color: #EB5424;">response</code> via the LLM</h2>


Here, we format our prompt with the context returned by the Retrieve API for our knowledge base, together with the user query, to get the final response.

In [7]:
import json
prompt = claude_prompt.format(context_str=contexts, 
                              query_str=query)
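Note that `contexts` is a Python list, so formatting it into the template inserts the list's string representation, including brackets and quotes. A possible alternative, sketched here under the assumption that plain text separated by blank lines suits your prompt, is to join the chunks first. The helper name `build_context_str` and the sample chunks are hypothetical.

```python
def build_context_str(contexts, separator="\n\n"):
    """Join retrieved chunks into one plain-text block for the prompt.

    Empty or whitespace-only chunks are skipped.
    """
    cleaned = [c.strip() for c in contexts if c and c.strip()]
    return separator.join(cleaned)


# Made-up chunks standing in for the output of get_contexts().
sample_contexts = [
    "  Amazon is investing heavily in LLMs. ",
    "",
    "AWS offers Trainium and Inferentia chips.",
]

context_block = build_context_str(sample_contexts)
print(context_block)
```

The result could then be passed as `context_str=build_context_str(contexts)` in the `format` call above.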

In [8]:
response = llm(prompt)
print(response)

 Based on the context provided, Amazon is investing heavily in large language models (LLMs) and generative AI. Specifically:

- Amazon has been working on its own LLMs for a while and believes they will transform and improve virtually every customer experience. 

- Amazon will continue to invest substantially in LLMs and generative AI across all of its consumer, seller, brand, and creator experiences.

- Amazon is offering LLMs and generative AI capabilities through AWS so companies of all sizes can leverage this technology. This includes offerings like Trainium and Inferentia chips, a variety of LLMs, and services like CodeWhisperer.

- Amazon sees LLMs and generative AI as having huge potential and being very transformative for customers, shareholders, and the company itself. 

In summary, Amazon is heavily investing in developing its own LLMs, providing LLMs and generative AI services through AWS, and sees this as a critical area for innovation and growth going forward.
