## Building a Q&A application using Knowledge Bases for Amazon Bedrock - RetrieveAndGenerate API
### Context

With knowledge bases, you can securely connect foundation models (FMs) in Amazon Bedrock to your company
data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant,
context-specific, and accurate responses without continuously retraining the FM. All information retrieved from
knowledge bases comes with source attribution to improve transparency and minimize hallucinations. For more information on creating a knowledge base using the console, refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html).

In this notebook, we dive deep into building a Q&A application using the `RetrieveAndGenerate` API provided by Knowledge Bases for Amazon Bedrock. This API queries the knowledge base to get the desired number of document chunks based on similarity search, and passes them to a large language model (LLM) to answer questions.


### Pattern

We can implement the solution using the Retrieval Augmented Generation (RAG) pattern. RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context. Here, we perform RAG on the knowledge base created in the previous notebook or through the console.
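
To make the retrieve-then-augment idea concrete, here is a minimal, self-contained sketch. It is not how Knowledge Bases for Amazon Bedrock works internally: the word-overlap scoring is a toy stand-in for embedding similarity, and the function names and sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (toy stand-in for vector similarity)."""
    query_tokens = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_tokens & tokenize(doc)), reverse=True)
    return ranked[:top_k]

def augment(query, passages):
    """Place the retrieved passages in the prompt ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical mini corpus, for illustration only.
documents = [
    "The knowledge base stores document chunks as embeddings.",
    "Pirates sail the seven seas.",
    "Embeddings let a knowledge base run similarity search.",
]
question = "How does a knowledge base search embeddings?"
prompt = augment(question, retrieve(question, documents))
print(prompt)
```

The augmented prompt is what would then be sent to the LLM; the model itself is left out of this sketch.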

### Pre-requisite

Before it can answer questions, the documents must be processed and stored in the knowledge base.

1. Load the documents into the knowledge base by connecting your Amazon S3 bucket (data source).
2. Ingestion - The knowledge base splits the documents into smaller chunks (based on the strategy selected), generates embeddings, and stores them in the associated vector store. The notebook [0_create_ingest_documents_test_kb.ipynb](./0\_create_ingest_documents_test_kb.ipynb) takes care of this for you.

![data_ingestion.png](./images/data_ingestion.png)
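
As a rough illustration of the chunking step during ingestion, the sketch below splits text into fixed-size, overlapping pieces. It is a simplification under stated assumptions: it counts characters, whereas the service's chunking strategies may work differently (for example on tokens), and `chunk_text` is a hypothetical helper, not part of any AWS SDK.

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "word " * 200  # 1000 characters of placeholder text
chunks = chunk_text(sample, chunk_size=300, overlap=50)
print(len(chunks), [len(chunk) for chunk in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.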


#### Notebook Walkthrough

For our notebook we will use the `RetrieveAndGenerate` API provided by Knowledge Bases for Amazon Bedrock, which converts user queries into
embeddings, searches the knowledge base, gets the relevant results, augments the prompt, and then invokes an LLM to generate the response.

We will use the following workflow for this notebook. 

![retrieveAndGenerate.png](./images/retrieveAndGenerate.png)


### USE CASE:

#### Dataset

In this example, you will use several years of Amazon's Letters to Shareholders as a text corpus to perform Q&A on. This data is already ingested into the knowledge base. You will need the `knowledge base id` and `model ARN` to run this example. We are using the `Anthropic Claude Instant` model to generate responses to user questions.
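
The model ARN used later in this notebook is assembled from the region and the model ID. The helper below is a small sketch of that step; `build_model_arn` is a hypothetical name, and the ARN pattern is the one the notebook's own functions use.

```python
def build_model_arn(model_id, region):
    """Return the foundation-model ARN for a model ID in a given region."""
    if not model_id or not region:
        raise ValueError("both model_id and region are required")
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"

model_arn_example = build_model_arn("anthropic.claude-instant-v1", "us-east-1")
print(model_arn_example)
```

Failing early on an empty region is useful because the session's region can be unset when no default region is configured.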

### Python 3.10

⚠  For this lab we need to run the notebook based on a Python 3.10 runtime. ⚠

### Setup

Install the following packages.

In [None]:
%pip install --upgrade pip
%pip install boto3==1.34.91 --force-reinstall --quiet
%pip install botocore==1.34.91 --force-reinstall --quiet

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [None]:
# You may have to look up the KB ID manually if the restored value is empty.
%store -r kb_id

In [None]:
import boto3
import pprint
from botocore.client import Config

pp = pprint.PrettyPrinter(indent=2)

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config)
boto3_session = boto3.session.Session()
region_name = boto3_session.region_name

model_id = "anthropic.claude-instant-v1" # try both Claude Instant and Claude v2; for Claude v2 use "anthropic.claude-v2"
region_id = region_name # replace with the region where your SageMaker notebook is running

## RetrieveAndGenerate API
Behind the scenes, the `RetrieveAndGenerate` API converts queries into embeddings, searches the knowledge base, augments the foundation model prompt with the search results as additional context, and returns the FM-generated response to the question. For multi-turn conversations, Knowledge Bases manages the short-term memory of the conversation to provide more contextual results.

The output of the `RetrieveAndGenerate` API includes the `generated response`, `source attribution` as well as the `retrieved text chunks`. 
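
For multi-turn conversations, the usual approach is to pass the `sessionId` returned by one call into the next call. The sketch below shows that flow under some assumptions: `ask` is a hypothetical helper, the client is passed in as a parameter, and `FakeClient` is a stand-in used only so the example runs offline. In the notebook you would pass `bedrock_agent_client` instead.

```python
def ask(client, question, kb_id, model_arn, session_id=None):
    """Send one question; return (answer text, session ID to reuse on the next turn)."""
    request = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {"knowledgeBaseId": kb_id, "modelArn": model_arn},
        },
    }
    # Only send sessionId when continuing an existing conversation.
    if session_id:
        request["sessionId"] = session_id
    response = client.retrieve_and_generate(**request)
    return response["output"]["text"], response.get("sessionId")


class FakeClient:
    """Hypothetical stand-in for the bedrock-agent-runtime client (offline use only)."""

    def __init__(self):
        self.calls = []

    def retrieve_and_generate(self, **kwargs):
        self.calls.append(kwargs)
        return {
            "output": {"text": f"answer {len(self.calls)}"},
            "sessionId": kwargs.get("sessionId", "session-1"),
        }


fake_client = FakeClient()
first_answer, session_id = ask(fake_client, "first question", "KB_ID", "MODEL_ARN")
followup_answer, session_id = ask(fake_client, "follow-up question", "KB_ID", "MODEL_ARN", session_id=session_id)
print(first_answer, followup_answer, session_id)
```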

In [None]:
def retrieveAndGenerate(input, kbId, sessionId=None, model_id = "anthropic.claude-instant-v1", region_id = "us-east-1"):
    # Build the foundation-model ARN from the region and model ID.
    model_arn = f'arn:aws:bedrock:{region_id}::foundation-model/{model_id}'
    request = {
        'input': {
            'text': input
        },
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': model_arn
            }
        }
    }
    # Include sessionId only when continuing an existing conversation.
    if sessionId:
        request['sessionId'] = sessionId
    return bedrock_agent_client.retrieve_and_generate(**request)

In [None]:
query = "What is Amazon doing in the field of generative AI?"
response = retrieveAndGenerate(query, kb_id, model_id=model_id,region_id=region_id)
generated_text = response['output']['text']
pp.pprint(generated_text)

In [None]:
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])

pp.pprint(contexts)
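
Besides the retrieved text, each reference also carries the location of its source document, which is what makes source attribution possible. The sketch below collects the unique S3 URIs from a response-shaped dictionary. `list_sources` is a hypothetical helper, and the sample response and bucket names are made up for illustration; only the keys read by the code are assumed to be present.

```python
def list_sources(response):
    """Return the unique S3 URIs cited in a RetrieveAndGenerate-style response, in order."""
    uris = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris

# Made-up response fragment; real responses contain more fields.
sample_response = {
    "output": {"text": "example answer"},
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "chunk A"},
             "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter-2022.pdf"}}},
            {"content": {"text": "chunk B"},
             "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter-2021.pdf"}}},
        ]},
        {"retrievedReferences": [
            {"content": {"text": "chunk C"},
             "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter-2022.pdf"}}},
        ]},
    ],
}
sources = list_sources(sample_response)
print(sources)
```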

## Using a prompt template with the RetrieveAndGenerate API
You can customize the default prompt that the `RetrieveAndGenerate` API uses when querying the knowledge base. This allows you to tailor the prompt to your specific use case. In the following example, we create a prompt template that tells the model to verify the user's assertions and to acknowledge when it doesn't have enough information to answer the user's question. For fun, it responds to the user in Pirate English. You can also modify the prompt to give the model specific instructions on how to format its output. For example, you can tell it to provide precise answers in JSON format. For additional information, see [Knowledge Bases for Amazon Bedrock now supports custom prompts for the RetrieveAndGenerate API and configuration of the maximum number of retrieved results](https://aws.amazon.com/blogs/machine-learning/knowledge-bases-for-amazon-bedrock-now-supports-custom-prompts-for-the-retrieveandgenerate-api-and-configuration-of-the-maximum-number-of-retrieved-results/). Experiment on your own by modifying the template and re-running the example.

In [None]:
promptTemplate = """Human: You are a question answering agent. I will provide you with a set of search results and a user's question. Your job is to answer the user's question using only the search results. If the results do not contain enough information to answer the question, state that you could not find the answer. When the user asserts a fact, it does not mean that it is true. Always make sure to double-check the search results to validate a user's assertions.

Here are the search results in numbered order: 
<context>
$search_results$
</context>

Here is the user's question: 
<question>
$query$
</question>

Generate a response in Pirate english.

Assistant:"""
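
The service fills in the `$search_results$` and `$query$` placeholders at request time. To get a feel for the final prompt layout, you can substitute them locally, as in the sketch below. This is only a preview aid: `preview_prompt` is a hypothetical helper, and the numbered formatting of the search results is an assumption, not the service's documented format.

```python
def preview_prompt(template, query, search_results):
    """Fill the two placeholders locally so the final prompt layout can be inspected."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(search_results, start=1))
    return template.replace("$search_results$", numbered).replace("$query$", query)

# Shortened template using the same placeholders as the notebook's template.
example_template = """Here are the search results in numbered order:
<context>
$search_results$
</context>

Here is the user's question:
<question>
$query$
</question>"""

preview = preview_prompt(
    example_template,
    "What is AWS?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(preview)
```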

In [None]:
def retrieveAndGenerateWithTemplate(input, kbId, sessionId = None, promptTemplate = promptTemplate, model_id = "anthropic.claude-instant-v1", region_id = "us-east-1"):
    # Build the foundation-model ARN from the region and model ID.
    model_arn = f'arn:aws:bedrock:{region_id}::foundation-model/{model_id}'
    request = {
        'input': {
            'text': input
        },
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': model_arn,
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': promptTemplate
                    }
                }
            }
        }
    }
    # Include sessionId only when continuing an existing conversation.
    if sessionId:
        request['sessionId'] = sessionId
    return bedrock_agent_client.retrieve_and_generate(**request)

In [None]:
query = "What is Amazon doing in the field of generative AI?"
response = retrieveAndGenerateWithTemplate(query, kb_id, model_id=model_id,region_id=region_id)
generated_text = response['output']['text']
pp.pprint(generated_text)
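
The blog post linked earlier in this section also covers configuring the maximum number of retrieved results. The sketch below builds a `retrieveAndGenerateConfiguration` dictionary that sets it alongside an optional prompt template. `build_rag_configuration` is a hypothetical helper, and the `retrievalConfiguration` / `vectorSearchConfiguration` / `numberOfResults` nesting reflects my understanding of the request shape; check it against the boto3 version you have installed.

```python
def build_rag_configuration(kb_id, model_arn, number_of_results=5, prompt_template=None):
    """Build a retrieveAndGenerateConfiguration dict with a cap on retrieved results."""
    if number_of_results < 1:
        raise ValueError("number_of_results must be at least 1")
    kb_config = {
        "knowledgeBaseId": kb_id,
        "modelArn": model_arn,
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    }
    # Add a custom prompt template only when one is supplied.
    if prompt_template:
        kb_config["generationConfiguration"] = {
            "promptTemplate": {"textPromptTemplate": prompt_template}
        }
    return {"type": "KNOWLEDGE_BASE", "knowledgeBaseConfiguration": kb_config}

rag_config = build_rag_configuration("KB_ID", "MODEL_ARN", number_of_results=3)
print(rag_config)
```

The resulting dictionary could be passed as `retrieveAndGenerateConfiguration=rag_config` in a `retrieve_and_generate` call.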

## Next Steps

If you want a more customized experience, you can use the `Retrieve` API. This API converts user queries into embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results.
For sample code, try the following notebooks:
- [2_customized-rag-retrieve-api-claude-v2.ipynb](./2\_customized-rag-retrieve-api-claude-v2.ipynb) - it calls the `retrieve` API to get relevant contexts and then adds them to the prompt, which you can provide as input to any text-to-text model available in Amazon Bedrock.
  
- You can use the `RetrievalQA` chain from LangChain and add the knowledge base as a retriever. For sample code, try the notebook [3_customized-rag-retrieve-api-langchain-claude-v2.ipynb](./3\_customized-rag-retrieve-api-langchain-claude-v2.ipynb).

- If you are interested in evaluating your RAG application, try the notebook [4_customized-rag-retrieve-api-titan-lite-evaluation](https://github.com/aws-samples/amazon-bedrock-samples/blob/bedrock-kb-images-update/knowledge-bases/4_customized-rag-retrieve-api-titan-lite-evaluation.ipynb/), where we use the `Amazon Titan Lite` model to generate responses and `Anthropic Claude V2` to evaluate them.
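
As a taste of the `Retrieve` API mentioned above, the sketch below fetches only the matching chunks and leaves generation to you. It rests on a few assumptions: `retrieve_chunks` is a hypothetical helper, the client is passed in as a parameter, and `FakeRetrieveClient` is a stand-in so the example runs offline. In the notebook you would pass `bedrock_agent_client` and your own `kb_id`.

```python
def retrieve_chunks(client, query, kb_id, number_of_results=3):
    """Call the Retrieve API and return the text of each retrieved chunk."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


class FakeRetrieveClient:
    """Hypothetical stand-in for the bedrock-agent-runtime client (offline use only)."""

    def retrieve(self, knowledgeBaseId, retrievalQuery, retrievalConfiguration):
        limit = retrievalConfiguration["vectorSearchConfiguration"]["numberOfResults"]
        fake_results = [{"content": {"text": f"chunk {i}"}} for i in range(1, 6)]
        return {"retrievalResults": fake_results[:limit]}


retrieved = retrieve_chunks(FakeRetrieveClient(), "generative AI", "KB_ID", number_of_results=2)
print(retrieved)
```

The returned chunks can then be inserted into any prompt you design, which is the extra control the `Retrieve` API offers over `RetrieveAndGenerate`.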


<div class="alert alert-block alert-warning">
<b>Next steps:</b> Proceed to the next labs to learn how to use Knowledge Bases for Amazon Bedrock with LangChain and Claude. Remember to CLEAN_UP at the end of your session.
</div>