## Building a Q&A application using Knowledge Bases for Amazon Bedrock - RetrieveAndGenerate API
### Context

With knowledge bases, you can securely connect foundation models (FMs) in Amazon Bedrock to your company
data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant,
context-specific, and accurate responses without continuously retraining the FM. All information retrieved from
knowledge bases comes with source attribution to improve transparency and minimize hallucinations. For more information on creating a knowledge base using the console, please refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html).

In this notebook, we will dive deep into building a Q&A application using the `RetrieveAndGenerate` API provided by Knowledge Bases for Amazon Bedrock. This API queries the knowledge base to get the desired number of document chunks based on similarity search, then passes them to a Large Language Model (LLM) to answer the question.
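
As a rough illustration of how the "desired number of document chunks" can be requested, the sketch below builds the configuration dictionary passed to `retrieve_and_generate`. The helper name `build_rag_config` and the identifier values are hypothetical; the nested `retrievalConfiguration` / `vectorSearchConfiguration` / `numberOfResults` keys are assumed to match the request shape accepted by your boto3 version, so check the SDK documentation before relying on them.

```python
def build_rag_config(kb_id, model_arn, number_of_results=5):
    """Build a retrieveAndGenerateConfiguration dict for a knowledge base query."""
    return {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
            # Number of chunks the similarity search should return.
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {"numberOfResults": number_of_results}
            },
        },
    }


# Placeholder identifiers, for illustration only.
config = build_rag_config(
    kb_id="EXAMPLEKBID",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
    number_of_results=3,
)
print(config["knowledgeBaseConfiguration"]["retrievalConfiguration"])
```

The resulting dictionary would be passed as the `retrieveAndGenerateConfiguration` argument of the API call.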


### Pattern

We can implement the solution using the Retrieval Augmented Generation (RAG) pattern. RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context. Here, we perform RAG on the knowledge base created in the previous notebook or through the console.
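
To make the "augment the prompt" step concrete, here is a minimal sketch of how retrieved chunks might be placed ahead of a question. The function name `augment_prompt` and the prompt wording are illustrative assumptions; the managed API builds its own prompt internally and this is not its actual template.

```python
def augment_prompt(question, retrieved_chunks):
    """Place retrieved text chunks ahead of the question in a single prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up chunks, for illustration only.
chunks = ["AWS offers Trainium and Inferentia chips.", "Amazon is investing in LLMs."]
prompt = augment_prompt("What is Amazon doing in generative AI?", chunks)
print(prompt)
```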

### Pre-requisite

Before the questions can be answered, the documents must be processed and stored in a knowledge base.

1. Load the documents into the knowledge base by connecting your S3 bucket (data source).
2. Ingestion: the knowledge base splits the documents into smaller chunks (based on the strategy selected), generates embeddings, and stores them in the associated vector store. The notebook [0_create_ingest_documents_test_kb.ipynb](./0_create_ingest_documents_test_kb.ipynb) takes care of this for you.

![data_ingestion.png](./images/data_ingestion.png)
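
The chunking step of ingestion can be pictured with a toy splitter. This is only a sketch under simplifying assumptions: it counts characters rather than tokens, and the name `chunk_text` and its parameters are hypothetical. The knowledge base applies its own chunking strategy during ingestion.

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not fully contained in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

Overlapping chunks keep a sentence that straddles a boundary visible in both neighbours, at the cost of storing some text twice.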


#### Notebook Walkthrough

For our notebook we will use the `RetrieveAndGenerate` API provided by Knowledge Bases for Amazon Bedrock, which converts user queries into
embeddings, searches the knowledge base, gets the relevant results, augments the prompt, and then invokes an LLM to generate the response.

We will use the following workflow for this notebook. 

![retrieveAndGenerate.png](./images/retrieveAndGenerate.png)
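
The retrieval step in this workflow can be pictured as a nearest-neighbour search over embedding vectors. The sketch below uses hand-made 3-dimensional vectors and cosine similarity purely for illustration; the function names are hypothetical, and the actual embedding model and vector store used by the knowledge base are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hand-made "embeddings", for illustration only.
index = [
    ("generative AI and LLMs", [0.9, 0.1, 0.0]),
    ("Prime shipping", [0.0, 0.2, 0.9]),
    ("machine learning chips", [0.7, 0.6, 0.1]),
]
results = top_k([1.0, 0.2, 0.0], index, k=2)
print(results)
```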


### USE CASE:

#### Dataset

In this example, you will use several years of Amazon's Letter to Shareholders as a text corpus to perform Q&A on. This data has already been ingested into the knowledge base. You will need the `knowledge base id` and `model ARN` to run this example. We are using the `Anthropic Claude 3 Haiku` model to generate responses to user questions.

### Python 3.10

⚠ For this lab, run the notebook on a Python 3.10 runtime. ⚠

### Setup

Install the following packages.

In [1]:
%pip install --force-reinstall -q -r ./requirements.txt

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autogluon-multimodal 1.1.1 requires nvidia-ml-py3==7.352.0, which is not installed.
dash 2.18.1 requires dash-core-components==2.0.0, which is not installed.
dash 2.18.1 requires dash-html-components==2.0.0, which is not installed.
dash 2.18.1 requires dash-table==5.0.0, which is not installed.
aiobotocore 2.13.3 requires botocore<1.34.163,>=1.34.70, but you have botocore 1.35.68 which is incompatible.
amazon-sagemaker-sql-magic 0.1.3 requires sqlparse==0.5.0, but you have sqlparse 0.5.1 which is incompatible.
autogluon-core 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.5.2 which is incompatible.
autogluon-core 1.1.1 requires scipy<1.13,>=1.5.4, but you have scipy 1.14.1 which is incompatible.
autogluon-features 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.5.

In [2]:
# Restart the kernel so the newly installed packages are picked up
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [3]:
%store -r

In [4]:
import boto3
import pprint
from botocore.client import Config

pp = pprint.PrettyPrinter(indent=2)

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config)
boto3_session = boto3.session.Session()
region_name = boto3_session.region_name

# Claude 3 Haiku by default; to try Claude 3 Sonnet, use "anthropic.claude-3-sonnet-20240229-v1:0"
model_id = "anthropic.claude-3-haiku-20240307-v1:0"
region_id = region_name  # replace with the region where you are running the SageMaker notebook

## RetrieveAndGenerate API
Behind the scenes, the `RetrieveAndGenerate` API converts queries into embeddings, searches the knowledge base, augments the foundation model prompt with the search results as context, and returns the FM-generated response to the question. For multi-turn conversations, Knowledge Bases manage short-term memory of the conversation to provide more contextual results.

The output of the `RetrieveAndGenerate` API includes the `generated response`, the `source attribution`, and the `retrieved text chunks`.
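
As a sketch of how those three parts might be pulled out of a response, the helper below walks a dictionary shaped like the API output. The function name `summarize_response` and the sample values are hypothetical, and the `location` / `s3Location` / `uri` keys are assumed to be present for S3 data sources, so the code falls back gracefully when they are missing.

```python
def summarize_response(response):
    """Collect the answer, the retrieved chunks, and their de-duplicated source URIs."""
    chunks, sources = [], []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            chunks.append(ref["content"]["text"])
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return {"answer": response["output"]["text"], "chunks": chunks, "sources": sources}


# Hand-written sample shaped like an API response; all values are placeholders.
sample_response = {
    "output": {"text": "Amazon is investing in LLMs."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "chunk one"},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter.pdf"}},
                },
                {
                    "content": {"text": "chunk two"},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/letter.pdf"}},
                },
            ]
        }
    ],
}
summary = summarize_response(sample_response)
print(summary["sources"])
```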

In [5]:
def retrieveAndGenerate(input, kbId, sessionId=None, model_id="anthropic.claude-3-haiku-20240307-v1:0", region_id="us-east-1"):
    model_arn = f'arn:aws:bedrock:{region_id}::foundation-model/{model_id}'
    # Build the request once; both the first turn and follow-up turns share it.
    request = {
        'input': {
            'text': input
        },
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': model_arn
            }
        }
    }
    # Pass sessionId only when continuing an existing conversation.
    if sessionId:
        request['sessionId'] = sessionId
    return bedrock_agent_client.retrieve_and_generate(**request)

In [6]:
query = "What is Amazon doing in the field of generative AI?"
response = retrieveAndGenerate(query, kb_id, model_id=model_id, region_id=region_id)
generated_text = response['output']['text']
pp.pprint(generated_text)

('Amazon has been investing heavily in Large Language Models (LLMs) and '
 'Generative AI, which they believe will transform and improve virtually every '
 'customer experience. Amazon has been working on their own LLMs and plans to '
 'continue investing substantially in these models across all of their '
 'consumer, seller, brand, and creator experiences. Additionally, Amazon is '
 'democratizing this technology so companies of all sizes can leverage '
 'Generative AI. AWS is offering the most price-performant machine learning '
 'chips in Trainium and Inferentia so small and large companies can afford to '
 'train and run their LLMs in production. AWS also enables companies to choose '
 'from various LLMs and build applications with all of the AWS security, '
 'privacy and other features that customers are accustomed to using. For '
 'example, AWS has released CodeWhisperer, which revolutionizes developer '
 'productivity by generating code suggestions in real-time.')
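
For a follow-up question in the same conversation, the session identifier returned with a response can be passed back on the next call. The sketch below is an assumption-laden illustration: `ask_in_session` is a hypothetical wrapper, and `fake_ask` is a stand-in with the same signature as the notebook's `retrieveAndGenerate` so that the example runs without AWS access.

```python
def ask_in_session(ask, question, kb_id, session_id=None):
    """Call `ask` and return (answer_text, session_id) so the id can be reused."""
    response = ask(question, kb_id, sessionId=session_id)
    return response["output"]["text"], response.get("sessionId")


# Offline stand-in for retrieveAndGenerate; it echoes the question back.
def fake_ask(input, kbId, sessionId=None):
    return {"output": {"text": f"echo: {input}"}, "sessionId": sessionId or "session-1"}


answer, session_id = ask_in_session(fake_ask, "What is Amazon doing in generative AI?", "EXAMPLEKBID")
follow_up, session_id = ask_in_session(fake_ask, "Which chips were mentioned?", "EXAMPLEKBID", session_id)
print(session_id, follow_up)
```

In the notebook, `retrieveAndGenerate` would take the place of `fake_ask`, and `kb_id` the place of the placeholder knowledge base id.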


In [7]:
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])

pp.pprint(contexts)

[ 'We believe that we’ve only scratched the surface of what’s possible to '
  'date, and plan to keep building the features our business customers tell us '
  'they need and want.     While many brands and merchants successfully sell '
  'their products on Amazon’s marketplace, there are also a large number of '
  'brands and sellers who have launched their own direct-to-consumer websites. '
  'One of the challenges for these merchants is driving conversion from views '
  'to purchases. We invented Buy with Prime to help with this challenge. Buy '
  'with Prime allows third-party brands and sellers to offer their products on '
  'their own websites to our large Amazon Prime membership, and offer those '
  'customers fast, free Prime shipping and seamless checkout with their Amazon '
  'account. Buy with Prime provides merchants several additional benefits, '
  'including Amazon handling the product storage, picking, packing, delivery, '
  'payment, and any returns, all through Amazon P

## Next Steps

If you want a more customized experience, you can use the `Retrieve` API. This API converts user queries into embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results.
For sample code, try the following notebooks:
- [2_Langchain-rag-retrieve-api-mistral-and-claude-3-haiku.ipynb](./2_Langchain-rag-retrieve-api-mistral-and-claude-3-haiku.ipynb) - it calls the `retrieve` API to get relevant contexts and then adds them to the prompt, which you can provide as input to any text-to-text model available in Amazon Bedrock.

- You can use the RetrievalQA chain from LangChain and add the knowledge base as a retriever. For sample code, try the notebook: [3_Langchain-rag-retrieve-api-claude-3.ipynb](./3_Langchain-rag-retrieve-api-claude-3.ipynb)
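
A call to the `Retrieve` API might look roughly like the sketch below. The wrapper name `retrieve_chunks` is hypothetical, and the request and response field names (`retrievalQuery`, `retrievalResults`, `score`) are assumed to match your boto3 version. A small fake client stands in for `bedrock_agent_client` so that the example runs without AWS access.

```python
def retrieve_chunks(client, kb_id, query, number_of_results=5):
    """Return (text, score) pairs for the chunks most relevant to `query`."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    )
    return [
        (result["content"]["text"], result.get("score"))
        for result in response["retrievalResults"]
    ]


class FakeAgentClient:
    """Offline stand-in that returns canned results in the assumed response shape."""

    def retrieve(self, knowledgeBaseId, retrievalQuery, retrievalConfiguration):
        limit = retrievalConfiguration["vectorSearchConfiguration"]["numberOfResults"]
        canned = [
            {"content": {"text": "chunk about LLMs"}, "score": 0.9},
            {"content": {"text": "chunk about chips"}, "score": 0.7},
            {"content": {"text": "chunk about shipping"}, "score": 0.2},
        ]
        return {"retrievalResults": canned[:limit]}


results = retrieve_chunks(FakeAgentClient(), "EXAMPLEKBID", "generative AI", number_of_results=2)
print(results)
```

With real credentials, `bedrock_agent_client` from the setup cell would replace `FakeAgentClient()`, and the returned chunks could then be placed into a prompt of your own design.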



<div class="alert alert-block alert-warning">
<b>Next steps:</b> Proceed to the next labs to learn how to use Knowledge Bases for Amazon Bedrock with LangChain and Claude. Remember to CLEAN_UP at the end of your session.
</div>