# Structured RAG using Amazon Bedrock Knowledge Bases: end-to-end example using an Amazon Redshift database

Structured RAG allows Amazon Bedrock Knowledge Bases customers to query structured data in Redshift using natural language and receive natural language responses that summarize the data, thereby answering the user's question.

Using advanced natural language processing, Amazon Bedrock Knowledge Bases can transform natural language queries into SQL queries, allowing users to retrieve data directly from the source without the need to move or preprocess the data. To generate accurate SQL queries, Amazon Bedrock Knowledge Bases uses the database schema, previous query history, and other contextual information provided about the data sources. For more details, see the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-structured.html).


This notebook provides sample code for building a Structured RAG application using Amazon Bedrock Knowledge Bases with Redshift.


#### Steps:
- Create a knowledge base execution role with the necessary policies for accessing data from Amazon Redshift.
- Create a knowledge base backed by a structured database (a Redshift database).
- Create data source(s) within the knowledge base.
- Start ingestion jobs using the knowledge base APIs, which read metadata about the structured database.

Once the metadata is extracted and ingested, users can interact with the structured database through the Amazon Bedrock Knowledge Bases APIs using natural language queries.

#### Pre-requisites
This notebook requires:
- A Redshift Serverless workgroup [OR] a Redshift provisioned cluster
- Your workgroup or cluster already set up, with your structured data ingested
- An IAM role [OR] a Secrets Manager secret with user credentials [OR] a DB user already set up

To read more about the prerequisites, see the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq-structured.html).


### 0 - Setup
Before running the rest of this notebook, run the cells below to install the necessary libraries and connect to Bedrock.

You can ignore any pip dependency errors that appear while installing the libraries.

In [None]:
%pip install --upgrade pip --quiet
%pip install -r ../requirements.txt --no-deps --quiet
%pip install -r ../requirements.txt --upgrade --quiet

In [None]:
# %pip install --upgrade boto3
import boto3
print(boto3.__version__)

In [None]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [4]:
%load_ext autoreload
%autoreload 2

In [5]:
import warnings
warnings.filterwarnings('ignore')

This code is part of the setup and is used to:
- Add the parent directory to the Python system path
- Import a custom module (`BedrockStructuredKnowledgeBase`) from `utils`, which is needed for later cells

In [None]:
import sys
import logging
from pathlib import Path

current_path = Path().resolve()
current_path = current_path.parent

if str(current_path) not in sys.path:
    sys.path.append(str(current_path))

# Print sys.path to verify
print(sys.path)

from utils.structured_knowledge_base import BedrockStructuredKnowledgeBase

Set up and initialize the boto3 clients.

In [None]:
s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session(region_name='us-east-1')
region =  session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime') 
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)
region, account_id

Initialize and configure the knowledge base name and the foundation model. This foundation model is used to generate the natural language response from the records retrieved from the structured data store.

In [29]:
import time

# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"

knowledge_base_name = f"bedrock-sample-structured-kb-{suffix}"
knowledge_base_description = "Sample Structured KB"


foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

Amazon Bedrock Knowledge Bases uses a service role to connect knowledge bases to structured data stores, retrieve data from these data stores, and generate SQL queries based on user queries and the structure of the data stores. The available access patterns depend on whether you're using Redshift Serverless or a Redshift provisioned cluster.

<div class="alert alert-block alert-warning">
<b>NOTE:</b> The supported access patterns for Redshift Serverless and Redshift provisioned clusters are:
<br><br>
1.   Secrets Manager + Redshift Serverless WorkGroup
<br> 
2.   IAM Role + Redshift Serverless WorkGroup 
<br>
3.   IAM Role + Redshift Cluster
<br>
4.   Secrets Manager + Redshift Cluster 
<br>
5.   DB user + Redshift Cluster
<br>

This notebook covers all of these access patterns.
</div>

Note that you only need to initialize the variables required by your chosen access pattern:

1. Secrets Manager + Redshift Serverless WorkGroup  
`workgroup_id`  
`secretArn`

2.   IAM Role + Redshift Serverless WorkGroup  
`workgroup_id`  
`redshiftDBName`

3.   IAM Role + Redshift Provisioned Cluster  
`provisioned_cluster_identifier`  
`provisioned_cluster_dbname`

4.   Secrets Manager + Redshift Provisioned Cluster  
`provisioned_cluster_identifier`  
`provisioned_cluster_dbname`  
`provisionedSecretArn`

5.   DB user + Redshift Provisioned Cluster  
`provisioned_cluster_identifier`  
`provisioned_cluster_dbname`  
`databaseUser`

The knowledge base configuration and the roles created later use these parameters. Any additional steps that a given pattern requires are covered in the corresponding execution steps.
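As a quick sanity check before creating the knowledge base, the mapping above can be written down as a lookup table. This is a minimal sketch, not part of the original notebook: `REQUIRED_VARIABLES` and `missing_variables` are hypothetical helper names, and the check assumes that an unset variable still holds one of the `<enter-...>` placeholder strings used in this notebook.

```python
# Required notebook variables for each access pattern, as listed above.
REQUIRED_VARIABLES = {
    1: ("workgroup_id", "secretArn"),
    2: ("workgroup_id", "redshiftDBName"),
    3: ("provisioned_cluster_identifier", "provisioned_cluster_dbname"),
    4: ("provisioned_cluster_identifier", "provisioned_cluster_dbname", "provisionedSecretArn"),
    5: ("provisioned_cluster_identifier", "provisioned_cluster_dbname", "databaseUser"),
}


def missing_variables(choice, values):
    """Return required variable names that are unset or still hold a '<...' placeholder."""
    if choice not in REQUIRED_VARIABLES:
        raise ValueError(f"Unknown access pattern: {choice}")
    missing = []
    for name in REQUIRED_VARIABLES[choice]:
        value = values.get(name)
        # Assumption: placeholders in this notebook always start with '<'
        if not value or str(value).startswith("<"):
            missing.append(name)
    return missing
```

In the notebook, a call such as `missing_variables(2, globals())` would list any variables for pattern 2 that still need a real value.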

#### Redshift access pattern choice
Select the choice that aligns with your setup. 

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Display the choices to the user
print("Choose your Redshift access pattern:")
print("1. Secrets Manager + Redshift Serverless WorkGroup")
print("2. IAM Role + Redshift Serverless WorkGroup")
print("3. IAM Role + Redshift Provisioned Cluster")
print("4. Secrets Manager + Redshift Provisioned Cluster")
print("5. DB User + Redshift Provisioned Cluster")

# Create a text input widget
choice_widget = widgets.Text(placeholder="Enter your choice (1-5)")

# Display the widget
display(choice_widget)


In [31]:
# Change below variables as needed

#Redshift Serverless Cluster configuration details
workgroup_id = '<enter-redshift-serverless-workgroup-id>'
redshiftDBName = "<enter-redshift-serverless-database-name>"
workgroupArn =  f"arn:aws:redshift-serverless:{region}:{account_id}:workgroup/{workgroup_id}"
secretArn = "<enter-secret-key-arn>"

#Redshift Provisioned Cluster configuration details
provisioned_cluster_identifier = "<enter-provisioned-cluster-identifier>"
provisioned_cluster_dbname = "<enter-provisioned-cluster-database-name>"
#Secrets manager ARN , this value is required to use "Secrets Manager + Redshift Provisioned Cluster" access pattern 
provisionedSecretArn = "<enter-secret-key-arn>"
#Redshift Database UserName, this value is required to use "DB User + Redshift Provisioned Cluster" access pattern
databaseUser = "<enter-db-username>"

# kb Configuration
kbConfigParam = {
            "type": "SQL",
            "sqlKnowledgeBaseConfiguration": {
                "type": "REDSHIFT",
                "redshiftConfiguration": {
                    "storageConfigurations": [{
                        "type": "REDSHIFT",
                        "redshiftConfiguration": {
                            "databaseName": redshiftDBName
                        }
                    }],
                    "queryEngineConfiguration": {
                        "type": "SERVERLESS",
                        "serverlessConfiguration": {
                            "workgroupArn": workgroupArn,
                            "authConfiguration": {}
                        }
                    }
                }
            }
        }

kbProvisionedConfigParam = {
    "type": "SQL",
    "sqlKnowledgeBaseConfiguration": {
        "type": "REDSHIFT",
            "redshiftConfiguration": {
                "queryEngineConfiguration": {
                    "type": "PROVISIONED",
                    "provisionedConfiguration": {
                        "authConfiguration": {},
                        "clusterIdentifier": provisioned_cluster_identifier
                    }
                },
                "storageConfigurations": [{
                        "redshiftConfiguration": {
                            "databaseName": provisioned_cluster_dbname
                        },
                        "type": "REDSHIFT"
                }]
            } 
        }  
}

### 1 - Create Knowledge Base

Based on the selected access pattern:
1. This code updates the knowledge base configuration with additional parameters. For example, if the selected access pattern is `IAM Role + Redshift Serverless WorkGroup`, the `['authConfiguration']['type']` parameter is set to "IAM".
2. Then it creates the knowledge base.

In [None]:
# Read the entered value. int() is called inside the try block so that
# non-numeric input is reported by the except clause at the end of the cell.
try:
    access_pattern_choice = int(choice_widget.value)
    if access_pattern_choice ==1:
        print(f"Access pattern:{choice_widget.value}. Secrets Manager + Redshift Serverless WorkGroup")

        kbConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['serverlessConfiguration']['authConfiguration']['type'] = "USERNAME_PASSWORD"
        kbConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['serverlessConfiguration']['authConfiguration']['usernamePasswordSecretArn'] = secretArn

        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        workgroup_arn=workgroupArn,
                        secrets_arn = secretArn,
                        kbConfigParam = kbConfigParam,
                        suffix = f'{suffix}-f'
                    )
       
    elif access_pattern_choice ==2:
        print(f"Access pattern:{choice_widget.value} IAM Role + Redshift Serverless WorkGroup")

        kbConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['serverlessConfiguration']['authConfiguration']['type'] = "IAM"
        
        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        workgroup_arn=workgroupArn,
                        kbConfigParam = kbConfigParam,
                        suffix = f'{suffix}-f'
                    )
    
    elif access_pattern_choice == 3:
        print(f"Access pattern:{choice_widget.value} IAM Role + Redshift Provisioned Cluster")

        kbProvisionedConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['provisionedConfiguration']['authConfiguration']['type'] = "IAM"
        
        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        cluster_identifier=provisioned_cluster_identifier,
                        db_name=provisioned_cluster_dbname,
                        kbConfigParam = kbProvisionedConfigParam,
                        suffix = f'{suffix}-f'
                    )
        
    elif access_pattern_choice == 4:
        print(f"Access pattern:{choice_widget.value} Secrets Manager + Redshift Provisioned Cluster")

        kbProvisionedConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['provisionedConfiguration']['authConfiguration']['type'] = "USERNAME_PASSWORD"
        kbProvisionedConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['provisionedConfiguration']['authConfiguration']['usernamePasswordSecretArn'] = provisionedSecretArn
        
        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        cluster_identifier=provisioned_cluster_identifier,
                        db_name=provisioned_cluster_dbname,
                        secrets_arn = provisionedSecretArn,
                        kbConfigParam = kbProvisionedConfigParam,
                        suffix = f'{suffix}-f'
                    )  
    elif access_pattern_choice == 5:
        print(f"Access pattern:{choice_widget.value} DB User + Redshift Provisioned Cluster")

        kbProvisionedConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['provisionedConfiguration']['authConfiguration']['type'] = "USERNAME"
        kbProvisionedConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['provisionedConfiguration']['authConfiguration']['databaseUser'] = databaseUser

        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        cluster_identifier=provisioned_cluster_identifier,
                        db_name=provisioned_cluster_dbname,
                        db_user=databaseUser,
                        kbConfigParam = kbProvisionedConfigParam,
                        suffix = f'{suffix}-f'
                    )   
    else:
        print("Invalid choice. Please enter a number between 1 and 5.")
except ValueError:
    print("Invalid input. Please enter a number.")
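The per-pattern `authConfiguration` updates in the cell above follow a small table: patterns 1 and 4 use a Secrets Manager secret, patterns 2 and 3 use IAM, and pattern 5 uses a database user. The sketch below restates that table as a single function. `build_auth_configuration` is a hypothetical helper introduced here for illustration; the keys and values are the ones the cell above already sets.

```python
def build_auth_configuration(choice, secret_arn=None, database_user=None):
    """Return the authConfiguration dict for access patterns 1-5."""
    if choice in (1, 4):
        # Secrets Manager + Serverless workgroup / provisioned cluster
        if not secret_arn:
            raise ValueError("Secrets Manager patterns need a secret ARN")
        return {"type": "USERNAME_PASSWORD", "usernamePasswordSecretArn": secret_arn}
    if choice in (2, 3):
        # IAM role + Serverless workgroup / provisioned cluster
        return {"type": "IAM"}
    if choice == 5:
        # DB user + provisioned cluster
        if not database_user:
            raise ValueError("The DB user pattern needs a database user name")
        return {"type": "USERNAME", "databaseUser": database_user}
    raise ValueError("Access pattern must be a number between 1 and 5")
```

The returned dict could then be assigned to the `authConfiguration` key of `serverlessConfiguration` (patterns 1 and 2) or `provisionedConfiguration` (patterns 3 to 5).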

### For IAM access patterns, grant database access to the role you use for authentication

You can use the sample SQL statements below to create the database user and grant it access. Update the GRANT statement to match the level of access your dataset requires.

For more detailed steps, see the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq-structured.html#knowledge-base-prereq-structured-db-access).

In [None]:
#Following SQL commands should be executed in Redshift Query Editor
print(f'CREATE USER "IAMR:{knowledge_base.bedrock_kb_execution_role_name}" WITH PASSWORD DISABLE;')
print(f'GRANT SELECT ON ALL tables IN SCHEMA public TO "IAMR:{knowledge_base.bedrock_kb_execution_role_name}";')
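If your tables live in schemas other than `public`, the same two statements can be generated per schema. The following is a rough sketch under stated assumptions: `grant_statements` is a hypothetical helper, the role name in the usage note is made up, and the extra `GRANT USAGE ON SCHEMA` statement is an addition not present in the cell above (Redshift generally requires schema usage before tables in a non-public schema can be read). Check the linked documentation for the grants your setup actually needs.

```python
def grant_statements(role_name, schemas=("public",)):
    """Build CREATE USER and GRANT statements for an IAM-role database user."""
    if '"' in role_name:
        raise ValueError("role name must not contain double quotes")
    user = f'"IAMR:{role_name}"'
    statements = [f"CREATE USER {user} WITH PASSWORD DISABLE;"]
    for schema in schemas:
        # Assumption: USAGE on the schema is granted alongside SELECT
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};")
    return statements
```

For example, `grant_statements("my-kb-role", schemas=("public", "sales"))` returns five statements that can be pasted into the Redshift query editor.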

### 2 - Start the ingestion job

This step starts the ingestion job, which syncs the data sources.

In [None]:
# ensure that the kb is available
time.sleep(60)
# sync knowledge base
knowledge_base.start_ingestion_job()
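The fixed `time.sleep(60)` above is a simple way to wait for the knowledge base to become available. An alternative is to poll the status until it reaches the desired value. The helper below is a minimal sketch: `wait_for_status` is a hypothetical function, and the timeout and interval defaults are arbitrary. It takes any zero-argument callable that returns the current status string.

```python
import time


def wait_for_status(get_status, target="ACTIVE", failed=("FAILED",),
                    timeout_s=300, interval_s=10, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Resource entered status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"Still {status} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Possible use in this notebook (not executed here):
# wait_for_status(
#     lambda: bedrock_agent_client.get_knowledge_base(
#         knowledgeBaseId=knowledge_base.get_knowledge_base_id()
#     )["knowledgeBase"]["status"]
# )
```

Passing the status lookup as a callable keeps the helper independent of any particular client, so the same loop could be reused for other resources that expose a status field.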

In [None]:
# keep the kb_id for invocation later in the invoke request
kb_id = knowledge_base.get_knowledge_base_id()
%store kb_id

### 3 -  Test the Structured Knowledge Base
Now that the knowledge base is available, we can test it using the [**retrieve**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve.html), [**retrieve_and_generate**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html), and [**generate_query**](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_GenerateQuery.html) functions.

When you use [**retrieve**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve.html), the response returns the result of the SQL query execution. 

When you use [**retrieve_and_generate**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html), the generated response is based on the result of the SQL query execution.

The [**generate_query**](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_GenerateQuery.html) API transforms a natural language query into SQL without executing it.



In [None]:
query = "<Enter your sample query here>"

#### 3.1 - Using RetrieveAndGenerate API

In [None]:
foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

#### 3.2 -  Using Retrieve API

In [40]:
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id,
    # nextToken is omitted on the first call; pass it only when paginating
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults":5,
        } 
    },
    retrievalQuery={
        "text": query
    }
)


In [None]:
import json
import pandas as pd

# Function to extract retrieved results from Retrieve API response into a pandas dataframe.

def response_print(retrieve_resp):

    # Extract the retrievalResults list
    retrieval_results = retrieve_resp['retrievalResults']

    # Dictionary to store the extracted data
    extracted_data = {}

    # Iterate through each item in retrievalResults
    for item in retrieval_results:
        row = item['content']['row']
        for col in row:
            column_name = col['columnName']
            column_value = col['columnValue']
            
            # If this column hasn't been seen before, create a new list for it
            if column_name not in extracted_data:
                extracted_data[column_name] = []
            
            # Append the value to the appropriate list
            extracted_data[column_name].append(column_value)

    # Create a DataFrame from the extracted data
    df = pd.DataFrame(extracted_data)

    return df
    

In [None]:
# Display the Retrieved results records
df = response_print(response_ret)
print(df.head())
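`response_print` collects one list per column, so `pd.DataFrame` raises an error if the rows do not all contain the same columns. A row-oriented variant avoids this. The sketch below uses only the standard library; `rows_to_records` is a hypothetical helper, and `sample_response` is made-up data that mimics the `retrievalResults` / `content` / `row` shape the function above reads.

```python
def rows_to_records(retrieve_resp):
    """Convert a Retrieve response into a list of {column name: value} dicts."""
    records = []
    for item in retrieve_resp.get("retrievalResults", []):
        row = item.get("content", {}).get("row", [])
        records.append({col["columnName"]: col.get("columnValue") for col in row})
    return records


# Hypothetical response, for illustration only
sample_response = {
    "retrievalResults": [
        {"content": {"row": [
            {"columnName": "city", "columnValue": "Paris"},
            {"columnName": "total", "columnValue": "3"},
        ]}},
        {"content": {"row": [
            {"columnName": "city", "columnValue": "Rome"},
        ]}},
    ]
}

records = rows_to_records(sample_response)
print(records)
```

Passing `records` to `pd.DataFrame` yields one row per result, with missing columns filled as NaN rather than raising.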

#### 3.3 - Using Generate Query

In [None]:
query_response = bedrock_agent_runtime_client.generate_query(
    queryGenerationInput={
        "text": query,
        "type": "TEXT"
    },
    transformationConfiguration={
        "mode" : "TEXT_TO_SQL",
        "textToSqlConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseArn": knowledge_base.knowledge_base['knowledgeBaseArn']
            }
        }
    }
)

generated_sql = query_response['queries'][0]['sql']
generated_sql

### Clean Up
When you are finished, uncomment and run the cell below to delete all the resources created by this notebook.

In [None]:
# # Delete resources
# print("===============================Deleting resources ==============================\n")
# knowledge_base.delete_kb(delete_iam_roles_and_policies=True)