# Amazon Bedrock Knowledge Base with a Kendra GenAI Index

The new Amazon Kendra GenAI Enterprise Edition index offers the highest accuracy for the Retrieve API operation and for Retrieval Augmented Generation (RAG) use cases. It is powered by the latest information retrieval technologies, such as hybrid search (keyword and vector), semantic embedding, and re-ranker models, and has been tested across a variety of datasets. The Amazon Kendra GenAI Enterprise Edition index also lets you reuse your indexed data across AWS generative AI services, so you can build on your existing investment without rebuilding indexes. For more details on the new index type, see [Kendra index types](https://docs.aws.amazon.com/kendra/latest/dg/hiw-index-types.html#kendra-gen-ai-index).


With the new Amazon Kendra GenAI Enterprise Edition index, you can now use Kendra as a managed retriever in an Amazon Bedrock knowledge base and integrate it with Amazon Bedrock tools such as agents and prompt flows to build advanced AI assistants.


## Notebook Overview

In this notebook, we will create an Amazon Bedrock Knowledge Base with an associated Kendra GenAI index and test retrieval with the familiar Bedrock __retrieve__ and __retrieve_and_generate__ APIs. We will first create the policies and roles needed for the Kendra index and the Bedrock Knowledge Base. Then we will create the Kendra GenAI index and the Bedrock Knowledge Base, associating the newly created Kendra GenAI index with the knowledge base.

For more details, see [Building a Bedrock Knowledge Base with a Kendra GenAI index](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-kendra-genai-index-create.html).

## 1. Setup
Before running the rest of this notebook, run the cells below to install the necessary libraries and connect to Bedrock.

Please ignore any pip dependency errors you may see while installing libraries.

In [None]:
%pip install --upgrade pip --quiet
%pip install -r ../requirements.txt --no-deps --quiet
%pip install -r ../requirements.txt --upgrade --quiet

In [None]:
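# Restart the kernel so the newly installed packages are picked up.
# This JavaScript works in the classic Jupyter Notebook; in JupyterLab use Kernel > Restart Kernel.
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")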

In [None]:
import boto3
print(boto3.__version__)

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
import sys
import time
import boto3
import logging
import pprint
import json

# Set the path to import module
from pathlib import Path
current_path = Path().resolve()
current_path = current_path.parent
if str(current_path) not in sys.path:
    sys.path.append(str(current_path))
# Print sys.path to verify
# print(sys.path)

from utils.managed_knowledge_base import BedrockManagedKnowledgeBase
from utils.knowledge_base_operators import create_kedra_genai_index_role, create_bucket, upload_to_s3

In [None]:
# Clients
s3_client = boto3.client('s3')
kendra_client = boto3.client('kendra')
sts_client = boto3.client('sts')
session = boto3.session.Session()
iam_client = session.client('iam')
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)
region, account_id

## 2. Create Kendra GenAI Index

Note: If you already have a Kendra GenAI index, you can skip to section 3, __Create Knowledge Base with Kendra GenAI Index__, and use your existing index ID as `kendra_index_id`.

In [None]:
kendra_role_name = f'kendra-role-genai-{account_id}'
kendra_ds_name = 'kendra-ds-genai-index-kb'
# S3 bucket names must be globally unique; the account ID suffix helps with that
bucket_name = f'kendra-kb-genai-datasource-{account_id}'

### Create role and policies needed for Kendra-GenAI index

In [None]:
kedra_genai_index_role = create_kedra_genai_index_role(kendra_role_name = kendra_role_name, bucket_name=bucket_name, account_id = account_id)
kedra_genai_index_role_arn = kedra_genai_index_role['Role']['Arn']
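The `create_kedra_genai_index_role` helper comes from this repository's `utils` module, so its exact policies are not shown here. As a rough, hypothetical sketch of two pieces such a role needs for this notebook, the functions below (names are our own) build a trust policy that lets the `kendra.amazonaws.com` service principal assume the role, and a minimal read-only policy for the data source bucket. A real Kendra role typically also needs CloudWatch logging and `kendra:BatchPutDocument`/`kendra:BatchDeleteDocument` permissions, which this sketch leaves out.

```python
import json


def kendra_trust_policy() -> dict:
    """Trust policy allowing the Amazon Kendra service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "kendra.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def kendra_s3_read_policy(bucket_name: str) -> dict:
    """Hypothetical minimal policy letting Kendra list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level read access applies to the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
            {
                # ListBucket applies to the bucket ARN itself, not to its objects
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
        ],
    }


# Print both documents for an example bucket name (illustrative only)
print(json.dumps(kendra_trust_policy(), indent=2))
print(json.dumps(kendra_s3_read_policy("example-bucket"), indent=2))
```

Documents like these could be passed to `iam_client.create_role(RoleName=..., AssumeRolePolicyDocument=json.dumps(...))` and `iam_client.put_role_policy(...)`; compare them with what the helper actually creates before relying on them.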

### Creating a Kendra-GenAI Index

For Amazon Bedrock knowledge bases, you can use an existing Kendra GenAI index or create a new one. For more details on creating an index, see [Kendra index creation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/create_index.html). In this notebook, we will create a new index.

 
<div class="alert alert-block alert-info">
<b>Note:</b> The index creation process may take up to 40 minutes to complete.
</div>


In [None]:
response = kendra_client.create_index(
    Name='kendra-genai-index-kb',  # enter a name for the index as a str
    Edition='GEN_AI_ENTERPRISE_EDITION',
    RoleArn=kedra_genai_index_role_arn,
)


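`create_index` returns as soon as the request is accepted, while the index keeps building in the background (its progress is reported by `describe_index`). Let's grab the Kendra index ID from the response.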

In [None]:
kendra_index_id=response['Id']
print(kendra_index_id)
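Because index creation is asynchronous, the data source cells below may run before the index is ready. A minimal polling sketch, assuming you are fine with blocking the notebook: it calls the Kendra `describe_index` operation until `Status` is `ACTIVE`. The function name, poll interval, and timeout are our own choices, not part of the original notebook.

```python
import time


def wait_for_index_active(kendra, index_id, poll_seconds=60, timeout_seconds=3600):
    """Poll describe_index until the index is ACTIVE; raise on FAILED or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = kendra.describe_index(Id=index_id)
        status = description["Status"]
        print(f"Index {index_id} status: {status}")
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError(f"Index {index_id} failed: {description.get('ErrorMessage')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Index {index_id} not ACTIVE after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Usage in this notebook (uncomment to block until the index is ready):
# wait_for_index_active(kendra_client, kendra_index_id)
```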

### Data source creation and ingestion

In [None]:
# Create S3 bucket
response=create_bucket(bucket_name)

# Upload the data to S3 bucket
upload_to_s3("../synthetic_dataset", bucket_name)
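The `upload_to_s3` helper also lives in the repository's `utils` module. To show what such a directory upload involves, here is a hypothetical equivalent built on the boto3 S3 client's `upload_file(Filename, Bucket, Key)` method. The function name `upload_directory` and the key layout (relative paths under an optional prefix) are assumptions, not necessarily what the helper does.

```python
import os


def upload_directory(s3, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to s3://bucket/prefix, keeping relative paths as keys."""
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Use forward slashes in S3 keys regardless of the local OS separator
            relative = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix}{relative}"
            s3.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded


# Usage in this notebook (equivalent in spirit to the helper call above):
# upload_directory(s3_client, "../synthetic_dataset", bucket_name)
```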

In [None]:
# Create an S3 data source that points the Kendra index at the bucket
response = kendra_client.create_data_source(
    Name=kendra_ds_name,
    IndexId=kendra_index_id,
    Type='S3',
    RoleArn=kedra_genai_index_role_arn,
    Configuration={
        'S3Configuration': {
            'BucketName': bucket_name
        }
    }
)

In [None]:
kendra_data_source_id=response['Id']
kendra_data_source_id

In [None]:
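# Start a sync job that ingests the documents from S3 into the index
response = kendra_client.start_data_source_sync_job(
    Id=kendra_data_source_id,
    IndexId=kendra_index_id
)

# Give the sync job a head start; ingestion can take longer than this
time.sleep(60)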
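A fixed `time.sleep(60)` does not guarantee that the sync has finished. A hedged alternative is to poll `list_data_source_sync_jobs` for the execution started above until it reaches a terminal status. The function name and the set of terminal states (`SUCCEEDED`, `FAILED`, `INCOMPLETE`, `ABORTED`) are our reading of the API's status values; the sketch also ignores `NextToken` pagination, which should be fine for a new data source with a single sync job.

```python
import time

# Sync job statuses after which no further progress is expected (our assumption)
TERMINAL_SYNC_STATES = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}


def wait_for_sync_job(kendra, index_id, data_source_id, execution_id,
                      poll_seconds=30, timeout_seconds=3600):
    """Poll the data source's sync history until execution_id reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        ).get("History", [])
        job = next((j for j in history if j["ExecutionId"] == execution_id), None)
        # The job may not appear in the history immediately after it starts
        status = job["Status"] if job else "NOT_LISTED_YET"
        print(f"Sync job {execution_id} status: {status}")
        if status in TERMINAL_SYNC_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Sync job {execution_id} still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Usage in this notebook, with `response` from start_data_source_sync_job:
# wait_for_sync_job(kendra_client, kendra_index_id, kendra_data_source_id, response['ExecutionId'])
```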

## 3. Create Knowledge Base with Kendra GenAI Index

Here we will create the policies and roles for the Amazon Bedrock Knowledge Base. Then we will pass the new __knowledgeBaseConfiguration__ values to a helper class. An example is shown below:

    knowledgeBaseConfiguration = {
        "type": "KENDRA",
        "kendraKnowledgeBaseConfiguration": {
            "kendraIndexArn": f"arn:aws:kendra:{region}:{account_id}:index/{kendra_index_id}"
        }
    }
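The `BedrockManagedKnowledgeBase` helper wraps the Bedrock Agent `create_knowledge_base` call. As a hedged sketch of the raw call without the helper: a Kendra-backed knowledge base is created with a `KENDRA` configuration and, as far as we understand, no `storageConfiguration`. `role_arn` is assumed to be an existing service role that Bedrock can assume and that is allowed to query the index; both function names are hypothetical.

```python
def build_kendra_kb_configuration(region, account_id, kendra_index_id):
    """Return the knowledgeBaseConfiguration dict for a Kendra GenAI index."""
    return {
        "type": "KENDRA",
        "kendraKnowledgeBaseConfiguration": {
            "kendraIndexArn": f"arn:aws:kendra:{region}:{account_id}:index/{kendra_index_id}"
        },
    }


def create_kendra_knowledge_base(bedrock_agent, name, role_arn, kb_configuration, description=None):
    """Create a Bedrock knowledge base backed by Kendra and return its ID."""
    kwargs = {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": kb_configuration,
    }
    if description:
        kwargs["description"] = description
    response = bedrock_agent.create_knowledge_base(**kwargs)
    return response["knowledgeBase"]["knowledgeBaseId"]


# Usage in this notebook (the role ARN is a placeholder you would need to supply):
# cfg = build_kendra_kb_configuration(region, account_id, kendra_index_id)
# kb_id = create_kendra_knowledge_base(bedrock_agent_client, "my-kendra-kb", "<kb-role-arn>", cfg)
```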

In [None]:
# [OPTIONAL]: Uncomment the lines below if you want to use an existing Kendra GenAI index

# kendra_index_id = '<insert-your-existing-kendra-index-id>'
# kendra_data_source_id = '<insert-your-kendra-data-source-id>'

In [None]:
import time

# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"

In [None]:
knowledge_base_name = 'kb-with-kendra-index-filters-kb'
knowledge_base_description = "Knowledge Base with Kendra GenAI index."
foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

# kb Configuration
kbConfigParam = {
    "type": "KENDRA",
    "kendraKnowledgeBaseConfiguration": {
        "kendraIndexArn": f"arn:aws:kendra:{region}:{account_id}:index/{kendra_index_id}"
    },
}

In [None]:
knowledge_base = BedrockManagedKnowledgeBase(
    kb_name=f'{knowledge_base_name}',
    kb_description=knowledge_base_description,
    kendra_index=kendra_index_id,
    kbConfigParam = kbConfigParam,
    suffix= f'{suffix}-f'
)

In [None]:
# keep the kb_id for invocation later in the invoke request
kb_id = knowledge_base.get_knowledge_base_id()

## 4. Testing the Kendra-GenAI Bedrock Knowledge Base
Now that the Knowledge Base with a Kendra GenAI index is available, we can test it using the [**retrieve**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve.html) and [**retrieve_and_generate**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html) functions.



### Testing Knowledge Base with Retrieve and Generate API

Let's first test the knowledge base using the retrieve and generate API. With this API, Bedrock takes care of retrieving the necessary references from the knowledge base and generating the final answer using a Bedrock LLM.

In [None]:
query = "Provide a summary of consolidated statements of cash flows of Octank Financial for the fiscal years ended December 31, 2019?"

In [None]:
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": f"arn:aws:bedrock:{region}::foundation-model/{foundation_model}",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

As you can see, because we printed only `response['output']['text']`, the retrieve and generate API gives us the final answer directly, without showing which sources were used to generate it (the response returns those separately in its `citations` field). Let's now retrieve the source information from the knowledge base with the retrieve API.
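The `retrieve_and_generate` response also carries a `citations` list, where each citation pairs a part of the generated answer (`generatedResponsePart`) with the `retrievedReferences` it was based on. A small sketch for listing those sources; the helper name `citation_sources` is hypothetical, and it uses `.get` so that missing keys are skipped rather than raising.

```python
def citation_sources(rag_response):
    """Return (cited answer text, source location) pairs from a retrieve_and_generate response."""
    pairs = []
    for citation in rag_response.get("citations", []):
        cited_text = (
            citation.get("generatedResponsePart", {})
            .get("textResponsePart", {})
            .get("text", "")
        )
        for reference in citation.get("retrievedReferences", []):
            pairs.append((cited_text, reference.get("location", {})))
    return pairs


# Usage in this notebook, with `response` from retrieve_and_generate:
# for cited_text, location in citation_sources(response):
#     print(location, "->", cited_text[:80])
```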

### Testing Knowledge Base with Retrieve API
If you need an extra layer of control, you can retrieve the chunks that best match your query using the retrieve API. In this setup, you can configure the desired number of results and control the final answer with your own application logic. The API returns the matching content, its S3 location, its score, and the chunk metadata.

In [None]:
# nextToken is only needed to request further pages of results, so it is omitted here
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 3,
        }
    },
    retrievalQuery={
        'text': "What is the patients name?"
    }
)

def response_print(retrieve_resp):
    # 'retrievalResults' is a list of chunks; each chunk has content, location, score, and metadata
    for num, chunk in enumerate(retrieve_resp['retrievalResults'], 1):
        print('-----------------------------------------------------------------------------------------')
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Score: ', chunk.get('score'), end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk.get('metadata'), end='\n'*2)

response_print(response_ret)
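Because `retrieve` returns raw chunks, your application decides how to turn them into an answer. A hypothetical sketch of that step: keep the top chunks (optionally only those above a minimum score) and wrap them in a prompt. The `<document>` tags, the template wording, and any `min_score` threshold are illustrative assumptions; sending the prompt to a model is not shown.

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the documents below.\n\n"
    "{context}\n\n"
    "Question: {question}"
)


def build_context(retrieval_results, max_chunks=3, min_score=None):
    """Join the text of the top retrieved chunks, optionally skipping low-scoring ones."""
    selected = []
    for chunk in retrieval_results:
        if len(selected) >= max_chunks:
            break
        score = chunk.get("score")
        if min_score is not None and (score is None or score < min_score):
            continue
        selected.append(chunk["content"]["text"])
    return "\n\n".join(f"<document>{text}</document>" for text in selected)


def build_prompt(question, context):
    """Fill the illustrative prompt template with the question and retrieved context."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Usage in this notebook, with `response_ret` from retrieve:
# context = build_context(response_ret['retrievalResults'])
# print(build_prompt("What is the patients name?", context))
```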

In this notebook, we created an Amazon Bedrock Knowledge Base with a Kendra GenAI index and tested it with the __retrieve__ and __retrieve_and_generate__ APIs. We invite you to go deeper and experiment with the benefits that an Amazon Bedrock Knowledge Base backed by a Kendra GenAI index can bring to your use case. For additional resources, see [Amazon Bedrock Samples](https://github.com/aws-samples/amazon-bedrock-samples/tree/main/rag/knowledge-bases).

### Clean Up
Run the cell below to delete the knowledge base and the IAM roles and policies created for it.

In [None]:
# Delete resources
print("=============================== Deleting resources ==============================\n")
knowledge_base.delete_kb(delete_iam_roles_and_policies=True)
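The helper call above is focused on the knowledge base. The Kendra index, its data source, and the S3 bucket created in section 2 are separate resources, and a Kendra index is billed while it exists. A hypothetical cleanup sketch using standard boto3 calls: `delete_index` on the Kendra client (we assume this also removes the index's data sources), then emptying and deleting the bucket through the S3 resource API, since a bucket must be empty before it can be deleted. The Kendra IAM role (`kendra_role_name`) is not removed here.

```python
def delete_kendra_and_s3_resources(kendra, s3_resource, index_id, bucket_name):
    """Delete the Kendra index, then empty and delete the S3 data source bucket."""
    # Index deletion is asynchronous; the index may show DELETING for a while
    kendra.delete_index(Id=index_id)
    bucket = s3_resource.Bucket(bucket_name)
    # Remove all objects first because S3 refuses to delete a non-empty bucket
    bucket.objects.all().delete()
    bucket.delete()


# Usage in this notebook (uncomment to run):
# delete_kendra_and_s3_resources(kendra_client, boto3.resource('s3'), kendra_index_id, bucket_name)
```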