## Building a Q&A application using Amazon Bedrock Knowledge Bases - RetrieveAndGenerate API
### Context

With Amazon Bedrock Knowledge Bases, you can securely connect foundation models (FMs) in Amazon Bedrock to your company
data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant,
context-specific, and accurate responses without continuously retraining the FM. All information retrieved from
Knowledge Bases comes with source attribution to improve transparency and minimize hallucinations. For more information on creating a Knowledge Base using the console, please refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html).

In this notebook, we will dive deep into building a Q&A application using the `RetrieveAndGenerate` API provided by Amazon Bedrock Knowledge Bases. This API queries the Knowledge Base to get the desired number of document chunks based on similarity search, then passes them to a Large Language Model (LLM) to answer the question.


### Pattern

We can implement the solution using the Retrieval Augmented Generation (RAG) pattern. RAG retrieves data from outside the language model and augments the prompt by adding the relevant retrieved data as context. Here, we perform RAG on the Knowledge Base created in the previous notebook or through the console.
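
The augmentation step described above can be sketched in a few lines. This is only an illustration: the function name `build_augmented_prompt` and the prompt wording are hypothetical and are not the template that Knowledge Bases uses internally.

```python
def build_augmented_prompt(question, chunks):
    """Place retrieved chunks ahead of the question so the model answers from them."""
    # Number each chunk so an answer can point back to its source passage.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks; in a real run these come from the retrieval step.
chunks = ["Placeholder chunk one.", "Placeholder chunk two."]
print(build_augmented_prompt("What is Amazon doing in the field of generative AI?", chunks))
```

With the managed API you never build this prompt yourself; the sketch only shows what "augmenting the prompt" means.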

### Prerequisites

Before questions can be answered, the documents must be processed and stored in the Knowledge Base.

1. Load the documents into the Knowledge Base by connecting your S3 bucket (data source).
2. Ingestion: the Knowledge Base splits the documents into smaller chunks (based on the selected chunking strategy), generates embeddings, and stores them in the associated vector store. The notebook [01_create_ingest_documents_test_kb.ipynb](./01_create_ingest_documents_test_kb.ipynb) takes care of this for you.
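
To make the chunking step in the list above concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption-laden illustration, not the service's implementation: the helper `chunk_text` is hypothetical, and it counts whitespace-separated words for simplicity, whereas a real chunking strategy is typically configured in tokens.

```python
def chunk_text(text, chunk_size=300, overlap=60):
    """Split text into chunks of at most chunk_size words; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


for chunk in chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, at the cost of storing some text twice.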

![data_ingestion.png](./images/data_ingestion.png)


#### Notebook Walkthrough

For this notebook we will use the `RetrieveAndGenerate` API provided by Amazon Bedrock Knowledge Bases, which converts user queries into
embeddings, searches the Knowledge Base, gets the relevant results, augments the prompt, and then invokes an LLM to generate the response.

We will use the following workflow for this notebook. 

![retrieveAndGenerate.png](./images/retrieveAndGenerate.png)
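
The search step in this workflow can be pictured as a nearest-neighbour lookup over embeddings. The sketch below assumes cosine similarity and uses made-up three-dimensional vectors; the actual metric and index depend on the vector store behind your Knowledge Base, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=3):
    """Return the k (similarity, text) pairs closest to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed_chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy vectors, made up for illustration; real embeddings have far more dimensions.
index = [
    ("chunk about generative AI", [0.9, 0.1, 0.0]),
    ("chunk about logistics", [0.1, 0.9, 0.0]),
    ("chunk about devices", [0.0, 0.2, 0.9]),
]
for score, text in top_k([1.0, 0.0, 0.0], index, k=2):
    print(f"{score:.3f}  {text}")
```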

#### Use Case

In this example, you will use several years of Amazon's Letters to Shareholders as the text corpus for Q&A. This data is already ingested into the Knowledge Base. You will need the `Knowledge Base id` and `model ARN` to run this example. We are using the `Amazon Nova Lite` model to generate responses to user questions.
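
As a rough sketch, a foundation model ARN can be assembled from a Region and a model ID. The helper name `foundation_model_arn` is hypothetical, and the model ID `amazon.nova-lite-v1:0` is an assumption about the Nova Lite identifier; check the model catalog in your Region for the exact value. In some Regions a model may only be available through an inference profile, which uses a different ARN format.

```python
def foundation_model_arn(region, model_id):
    """Build the ARN of an Amazon Bedrock foundation model from its Region and model ID."""
    # Foundation model ARNs carry no account ID, hence the empty field between the colons.
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


# The model ID below is an assumed value used for illustration only.
print(foundation_model_arn("us-east-1", "amazon.nova-lite-v1:0"))
```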

In [None]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [None]:
%store -r

In [None]:
import boto3
import pprint
from botocore.client import Config
import os

pp = pprint.PrettyPrinter(indent=2)
# Generous timeouts because retrieval plus generation can be slow; automatic retries are disabled.
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime", config=bedrock_config)
region_name = os.environ.get("AWS_DEFAULT_REGION", "us-east-1")

import sys
sys.path.append('../')
from util.model_selector import create_text_model_selector

# Create interactive model selector
model_selector = create_text_model_selector().display()
# Get the selected model from our unified selector
selected_model = model_selector.get_model_id()

### Retrieve API
The Retrieve API converts user queries into embeddings, searches the Knowledge Base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results. The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, and the relevance scores of the retrievals.

In [None]:
def retrieve(input, kb_id):
    # Call the Retrieve API to fetch only the relevant context (no generation step).
    relevant_documents = bedrock_agent_client.retrieve(
        retrievalQuery= {
            'text': input
        },
        knowledgeBaseId=kb_id,
        retrievalConfiguration= {
            'vectorSearchConfiguration': {
                'numberOfResults': 3 # return the 3 chunks that most closely match the query
            }
        }
    )
    return relevant_documents["retrievalResults"]


In [None]:
query = "What is Amazon doing in the field of generative AI?"

response = retrieve(query, kb_id)
pp.pprint(response)
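
The raw results printed above are nested dictionaries. A small helper can flatten them into the fields the text mentions: relevance score, location type, source URI, and a preview of the chunk. This is a sketch under assumptions: `summarize_results` is a hypothetical name, the URI lookup only covers S3 data sources, and the sample data is made up in the shape of `retrievalResults`.

```python
def summarize_results(results, min_score=0.0, preview_chars=80):
    """Flatten Retrieve API results into (score, location type, URI, preview) tuples."""
    rows = []
    for result in results:
        score = result.get("score", 0.0)
        if score < min_score:
            continue  # drop weak matches
        location = result.get("location", {})
        uri = location.get("s3Location", {}).get("uri")  # None for non-S3 sources
        text = result.get("content", {}).get("text", "")
        rows.append((score, location.get("type"), uri, text[:preview_chars]))
    # Highest relevance first
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Made-up sample; with a live Knowledge Base, pass the output of retrieve() instead.
sample = [
    {"content": {"text": "Low relevance chunk."},
     "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/a.pdf"}},
     "score": 0.42},
    {"content": {"text": "High relevance chunk."},
     "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/b.pdf"}},
     "score": 0.71},
]
for row in summarize_results(sample, min_score=0.5):
    print(row)
```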

### RetrieveAndGenerate API
Behind the scenes, the `RetrieveAndGenerate` API converts queries into embeddings, searches the Knowledge Base, augments the foundation model prompt with the search results as context, and returns the FM-generated response to the question. For multi-turn conversations, Knowledge Bases manages short-term memory of the conversation to provide more contextual results.

The output of the `RetrieveAndGenerate` API includes the `generated response`, the `source attribution`, and the `retrieved text chunks`.

In [None]:
def retrieveAndGenerate(input, kb_id, sessionId=None, model_id = selected_model):
    # Build the request once; sessionId is added only when continuing a conversation.
    request = {
        'input': {
            'text': input
        },
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_id
            }
        }
    }
    if sessionId:
        request['sessionId'] = sessionId
    return bedrock_agent_client.retrieve_and_generate(**request)

In [None]:
response = retrieveAndGenerate(query, kb_id, model_id=selected_model)
generated_text = response['output']['text']
pp.pprint(generated_text)
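
For multi-turn conversations, the `sessionId` returned with one response can be passed back on the next call so the service can use the earlier turns. The following is a minimal sketch, assuming a hypothetical helper `ask_in_session` that takes the client as a parameter (which also makes it easy to try with a stub); the follow-up question in the usage comment is invented for illustration.

```python
def ask_in_session(client, question, kb_id, model_arn, session_id=None):
    """Send one conversational turn; return (answer text, session ID for the next turn)."""
    request = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id:
        # Only include sessionId when continuing an existing conversation.
        request["sessionId"] = session_id
    response = client.retrieve_and_generate(**request)
    return response["output"]["text"], response.get("sessionId")


# Usage with the client and variables defined earlier in this notebook:
# answer, session = ask_in_session(bedrock_agent_client, query, kb_id, selected_model)
# follow_up, session = ask_in_session(
#     bedrock_agent_client, "How has that changed over the years?", kb_id, selected_model, session
# )
```

Returning the session ID alongside the answer keeps the conversation state explicit in the caller instead of hiding it in a global variable.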

In [None]:
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])

pp.pprint(contexts)
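
Beyond the chunk text, each citation also links a passage of the generated answer to the documents that support it. The sketch below pairs each answer passage with its source URIs. It assumes S3 data sources, uses a hypothetical helper name `collect_sources`, and runs on a made-up sample shaped like the `RetrieveAndGenerate` response.

```python
def collect_sources(response):
    """Pair each cited passage of the answer with the S3 URIs that support it."""
    attributions = []
    for citation in response.get("citations", []):
        part = citation.get("generatedResponsePart", {}).get("textResponsePart", {})
        uris = []
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:  # skip non-S3 sources and duplicates
                uris.append(uri)
        attributions.append({"text": part.get("text", ""), "sources": uris})
    return attributions


# Made-up sample; with a live Knowledge Base, pass the `response` from retrieveAndGenerate().
sample_response = {
    "citations": [
        {
            "generatedResponsePart": {"textResponsePart": {"text": "Placeholder answer passage."}},
            "retrievedReferences": [
                {"content": {"text": "chunk 1"},
                 "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/a.pdf"}}},
                {"content": {"text": "chunk 2"},
                 "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/a.pdf"}}},
            ],
        }
    ]
}
for item in collect_sources(sample_response):
    print(item["text"], item["sources"])
```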