# Structured RAG using Amazon Bedrock Knowledge Bases - End to end example using Amazon Redshift DB

Structured RAG allows Amazon Bedrock Knowledge Bases customers to query structured data in Amazon Redshift using natural language and to receive natural language responses that summarize the data and answer the user's question.

Using advanced natural language processing, Bedrock Knowledge Bases can transform natural language queries into SQL queries, allowing users to retrieve data directly from the source without the need to move or preprocess the data. To generate accurate SQL queries, Bedrock Knowledge Bases leverages the database schema, previous query history, and other contextual information provided about the data sources.


This notebook provides sample code for building a Structured RAG application using Amazon Bedrock Knowledge Bases with Amazon Redshift.


#### Steps: 
- Create a Knowledge Base execution role with the necessary policies for accessing data from Amazon Redshift.
- Create a knowledge base with a structured database (Redshift database).
- Create data source(s) within the knowledge base.
- Start ingestion jobs using the KB APIs, which read metadata about the structured database and ingest it into QDBC.

Once the metadata is extracted and ingested, users can interact with the structured database through the Amazon Bedrock Knowledge Bases APIs using natural language queries.
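
As a rough illustration of the first step, the sketch below builds an IAM trust policy that lets the Amazon Bedrock service assume the execution role. The helper name `build_kb_trust_policy` and the account ID in the example are hypothetical. The `BedrockStructuredKnowledgeBase` helper used later in this notebook creates its own role, and the policies it attaches may differ from this sketch.

```python
import json


def build_kb_trust_policy(account_id: str, region: str) -> dict:
    """Return a trust policy that allows Amazon Bedrock to assume the role.

    Hypothetical helper for illustration only; it is not part of this
    notebook's BedrockStructuredKnowledgeBase class.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Limit the role to knowledge bases in this account and region.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": (
                            f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/*"
                        )
                    },
                },
            }
        ],
    }


# Example with placeholder values (not a real account).
trust_policy = build_kb_trust_policy("111122223333", "us-east-1")
print(json.dumps(trust_policy, indent=2))
```

The role also needs permission policies for Redshift and, for the Secrets Manager access pattern, for reading the secret. Those are omitted here because they depend on the access pattern you choose.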




<div class="alert alert-block alert-warning">
<b>NOTE:</b> There are multiple access patterns for Redshift (Serverless workgroups and provisioned clusters):
<br><br>
1.   Secrets Manager + Redshift Serverless WorkGroup
<br> 
2.   IAM Role + Redshift Serverless WorkGroup 
<br>
3.   DB user + Redshift Cluster
<br>
4.   IAM Role + Redshift Cluster
<br>
5.   Secrets Manager + Redshift Cluster 
<br>


In this notebook, we'll focus on the `Secrets Manager + Redshift Serverless WorkGroup` access pattern.
</div>
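
The access patterns above differ mainly in the `authConfiguration` block of the knowledge base configuration. The sketch below maps each pattern to such a block. The `USERNAME_PASSWORD` and `IAM` values and the `usernamePasswordSecretArn` key are the ones used later in this notebook; the `USERNAME` type with a `databaseUser` key and the `PROVISIONED` engine type are assumptions taken from the Bedrock API reference and are not exercised in this notebook. The function name, the pattern labels, and the example ARN are hypothetical.

```python
from typing import Optional

# Access pattern number -> (auth kind, query engine type).
# "PROVISIONED" is an assumption; this notebook only uses "SERVERLESS".
ACCESS_PATTERNS = {
    1: ("SECRETS_MANAGER", "SERVERLESS"),
    2: ("IAM_ROLE", "SERVERLESS"),
    3: ("DB_USER", "PROVISIONED"),
    4: ("IAM_ROLE", "PROVISIONED"),
    5: ("SECRETS_MANAGER", "PROVISIONED"),
}


def build_auth_configuration(
    auth_kind: str,
    secret_arn: Optional[str] = None,
    database_user: Optional[str] = None,
) -> dict:
    """Return an authConfiguration dict for the given auth kind."""
    if auth_kind == "SECRETS_MANAGER":
        if not secret_arn:
            raise ValueError("secret_arn is required for the Secrets Manager pattern")
        return {"type": "USERNAME_PASSWORD", "usernamePasswordSecretArn": secret_arn}
    if auth_kind == "IAM_ROLE":
        return {"type": "IAM"}
    if auth_kind == "DB_USER":
        # Assumption: provisioned clusters accept a database user name.
        if not database_user:
            raise ValueError("database_user is required for the DB user pattern")
        return {"type": "USERNAME", "databaseUser": database_user}
    raise ValueError(f"Unknown auth kind: {auth_kind!r}")


# Pattern 1, the one this notebook focuses on (placeholder secret ARN).
auth_kind, engine_type = ACCESS_PATTERNS[1]
example_secret = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-AbCdEf"
print(engine_type, build_auth_configuration(auth_kind, secret_arn=example_secret))
```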

#### Pre-requisites
This notebook requires:
- A Redshift cluster or workgroup that is already set up, with your structured data loaded into it.
- A Secrets Manager secret configured for the `Secrets Manager + Redshift Serverless WorkGroup` access pattern.
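
Because the workgroup ARN and the secret ARN are entered by hand later in the notebook, a quick sanity check can catch a region or account mismatch before any resources are created. The sketch below is one way to do that with plain string parsing; `parse_arn` and `arn_mismatches` are hypothetical helper names, and the ARN shown is a placeholder.

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN into its standard components."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account_id": account_id,
        "resource": resource,
    }


def arn_mismatches(arn: str, expected_region: str, expected_account_id: str) -> list:
    """Return human-readable mismatches between an ARN and the session."""
    parsed = parse_arn(arn)
    problems = []
    if parsed["region"] != expected_region:
        problems.append(
            f"region: expected {expected_region}, found {parsed['region']}"
        )
    if parsed["account_id"] != expected_account_id:
        problems.append(
            f"account: expected {expected_account_id}, found {parsed['account_id']}"
        )
    return problems


# Placeholder ARN checked against a session in a different region.
secret_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-AbCdEf"
print(arn_mismatches(secret_arn, "us-west-2", "111122223333"))
```

An empty list means the ARN matches the region and account of the current session.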


### 0 - Setup
Before running the rest of this notebook, run the cells below to ensure the necessary libraries are installed and to connect to Bedrock.

Please ignore any pip dependency errors that appear while installing the libraries.

In [None]:
# %pip install --force-reinstall -q -r ../../requirements.txt --quiet

# %pip install boto3 opensearch-py botocore awscli retrying ragas==0.1.9 ipywidgets iprogress langchain langchain_aws langchain_community --quiet

In [None]:
# %pip install --upgrade boto3
import boto3
print(boto3.__version__)

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
import sys
import time
import boto3
import logging
import pprint
import json

# SCRIPT_DIR = os.path.dirname(os.path.abspath("./"))
# sys.path.append(os.path.dirname(SCRIPT_DIR))

from structured_knowledge_base import BedrockStructuredKnowledgeBase


In [None]:
#Clients
s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session()
region =  session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime') 
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)
region, account_id

In [6]:
import time

# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"

knowledge_base_name = f"bedrock-sample-structured-kb-{suffix}"
knowledge_base_description = "Sample Structured KB"


foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

In [20]:
# Change below variables as needed

workgroup_id = 'ad3a0804-6ecf-4636-a798-a00c7f1293a7'

redshiftDBName = "sample_data_dev"
workgroupArn =  f"arn:aws:redshift-serverless:{region}:{account_id}:workgroup/{workgroup_id}"
SecretArn = "arn:aws:secretsmanager:us-east-1:017444429555:secret:redshift-riv-test-user-h86sJH"

# kb Configuration
kbConfigParam = {
            "type": "SQL",
            "sqlKnowledgeBaseConfiguration": {
                "type": "REDSHIFT",
                "redshiftConfiguration": {
                    "storageConfigurations": [{
                        "type": "REDSHIFT",
                        "redshiftConfiguration": {
                            "databaseName": redshiftDBName
                        }
                    }],
                    "queryEngineConfiguration": {
                        "type": "SERVERLESS",
                        "serverlessConfiguration": {
                            "workgroupArn": workgroupArn,
                            "authConfiguration": {}
                        }
                    }
                }
            }
        }

Redshift access pattern choice

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Display the choices to the user
print("Choose your Redshift access pattern:")
print("1. Secrets Manager + Redshift Serverless WorkGroup")
print("2. IAM Role + Redshift Serverless WorkGroup")

# Create a text input widget
choice_widget = widgets.Text(placeholder="Enter your choice (1 Only)")

# Display the widget
display(choice_widget)


### 1 - Create Knowledge Base

In [None]:
# Read the entered value and build the knowledge base for the chosen access pattern
try:
    # Parse inside the try block so that non-numeric input is reported, not raised
    access_pattern_choice = int(choice_widget.value)
    if access_pattern_choice == 1:
        print(f"Access pattern: {choice_widget.value}. Secrets Manager + Redshift Serverless WorkGroup")

        kbConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['serverlessConfiguration']['authConfiguration']['type'] = "USERNAME_PASSWORD"
        kbConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['serverlessConfiguration']['authConfiguration']['usernamePasswordSecretArn'] = SecretArn

        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        workgroup_arn=workgroupArn,
                        secrets_arn = SecretArn,
                        kbConfigParam = kbConfigParam,
                        suffix = f'{suffix}-f'
                    )
       
    # Use elif so that choice 1 does not fall through to the "Invalid choice" branch
    elif access_pattern_choice == 2:
        print(f"Access pattern: {choice_widget.value}. IAM Role + Redshift Serverless WorkGroup")

        kbConfigParam['sqlKnowledgeBaseConfiguration']['redshiftConfiguration']['queryEngineConfiguration']['serverlessConfiguration']['authConfiguration']['type'] = "IAM"
        
        knowledge_base = BedrockStructuredKnowledgeBase(
                        kb_name=f'{knowledge_base_name}',
                        kb_description=knowledge_base_description,
                        workgroup_arn=workgroupArn,
                        kbConfigParam = kbConfigParam,
                        suffix = f'{suffix}-f'
                    )
        
    else:
        print("Invalid choice. Please enter a number between 1 and 2.")
except ValueError:
    print("Invalid input. Please enter a number.")

### 2 - Sync query engine
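The sync cell in this section waits a fixed 60 seconds before starting the ingestion job. As an alternative, a polling loop can wait only as long as needed and fail early if the resource enters an error state. The sketch below is a generic helper, not part of `BedrockStructuredKnowledgeBase`; the name `wait_for_status` and its parameters are hypothetical, and it is demonstrated here with a fake status source rather than a live AWS call.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    success: Iterable[str],
    failure: Iterable[str] = (),
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> str:
    """Poll get_status() until it returns a success or failure status."""
    success_set, failure_set = set(success), set(failure)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in success_set:
            return status
        if status in failure_set:
            raise RuntimeError(f"Resource entered failure status: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Timed out waiting; last status: {status}")
        time.sleep(poll_s)


# Demonstration with a fake sequence of statuses (no AWS call is made).
fake_statuses = iter(["CREATING", "CREATING", "ACTIVE"])
print(wait_for_status(lambda: next(fake_statuses), success={"ACTIVE"},
                      failure={"FAILED"}, poll_s=0))

# With a live knowledge base, the status source could be (untested here):
#   lambda: bedrock_agent_client.get_knowledge_base(
#       knowledgeBaseId=knowledge_base.get_knowledge_base_id()
#   )["knowledgeBase"]["status"]
```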

In [None]:
# ensure that the kb is available
time.sleep(60)
# sync knowledge base
knowledge_base.start_ingestion_job()

In [None]:
# keep the kb_id for invocation later in the invoke request
kb_id = knowledge_base.get_knowledge_base_id()
%store kb_id

### 4 -  Test the Structured Knowledge Base
Now that the Knowledge Base is available, we can test it using the [**retrieve**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve.html), [**retrieve_and_generate**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html), and [**generate_query**](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_GenerateQuery.html) functions.

When you use [**retrieve**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve.html), the response returns the result of the SQL query execution. 

When you use [**retrieve_and_generate**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html), the generated response is based on the result of the SQL query execution.

When you use the [**generate_query**](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_GenerateQuery.html) API, the natural language query is transformed into SQL.
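
To make the shape of a structured `retrieve` result concrete, the sketch below converts a response into a list of plain dictionaries, one per row. It relies on the same keys that this notebook's `response_print` function reads (`retrievalResults`, `content`, `row`, `columnName`, `columnValue`). The function name `rows_from_retrieve` is hypothetical, and the sample response is hand-written for illustration rather than captured from a real call.

```python
def rows_from_retrieve(retrieve_resp: dict) -> list:
    """Convert a structured retrieve response into a list of row dicts."""
    rows = []
    for item in retrieve_resp.get("retrievalResults", []):
        columns = item.get("content", {}).get("row", [])
        # Each column entry carries its name and value as separate keys.
        rows.append({col["columnName"]: col["columnValue"] for col in columns})
    return rows


# Hand-written sample in the expected shape (values are made up).
sample_response = {
    "retrievalResults": [
        {"content": {"row": [
            {"columnName": "venuename", "columnValue": "Example Hall"},
            {"columnName": "venuecity", "columnValue": "New York City"},
        ]}},
        {"content": {"row": [
            {"columnName": "venuename", "columnValue": "Sample Arena"},
            {"columnName": "venuecity", "columnValue": "New York City"},
        ]}},
    ]
}

for row in rows_from_retrieve(sample_response):
    print(row)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame(...)` when a tabular view is preferred.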



In [28]:
query = "What venues are there in New York City?"

#### 4.1 - Using RetrieveAndGenerate API

In [None]:
foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

#### 4.2 -  Using Retrieve API

In [30]:
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults":5,
        } 
    },
    retrievalQuery={
        "text": query
    }
)

In [31]:
import json
import pandas as pd

#Function to extract retrieved results from Retrieve API response

def response_print(retrieve_resp):

    # Extract the retrievalResults list
    retrieval_results = retrieve_resp['retrievalResults']

    # Dictionary to store the extracted data
    extracted_data = {}

    # Iterate through each item in retrievalResults
    for item in retrieval_results:
        row = item['content']['row']
        for col in row:
            column_name = col['columnName']
            column_value = col['columnValue']
            
            # If this column hasn't been seen before, create a new list for it
            if column_name not in extracted_data:
                extracted_data[column_name] = []
            
            # Append the value to the appropriate list
            extracted_data[column_name].append(column_value)

    # Create a DataFrame from the extracted data
    df = pd.DataFrame(extracted_data)

    return df
    

In [None]:
# Display the Retrieved results records
df = response_print(response_ret)
print(df.head())

#### 4.3 - Using Generate Query
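The cell in this section reads `query_response['queries'][0]['sql']` directly, which raises an error if no query comes back. A slightly more defensive way to read the result is sketched below. The helper name `first_generated_sql` is hypothetical, and the sample response is hand-written; only the `queries` and `sql` keys used by this notebook are assumed.

```python
from typing import Optional


def first_generated_sql(query_response: dict) -> Optional[str]:
    """Return the first non-empty SQL string in the response, or None."""
    for generated in query_response.get("queries") or []:
        sql = generated.get("sql")
        if sql:
            return sql
    return None


# Hand-written sample responses (the SQL text is illustrative only).
sample_with_sql = {
    "queries": [{"sql": "SELECT venuename FROM venue WHERE venuecity = 'New York City'"}]
}
sample_empty = {"queries": []}

print(first_generated_sql(sample_with_sql))
print(first_generated_sql(sample_empty))
```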

In [None]:
query_response = bedrock_agent_runtime_client.generate_query(
    queryGenerationInput={
        "text": query,
        "type": "TEXT"
    },
    transformationConfiguration={
        "mode" : "TEXT_TO_SQL",
        "textToSqlConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseArn": knowledge_base.knowledge_base['knowledgeBaseArn']
            }
        }
    }
)

generated_sql = query_response['queries'][0]['sql']
generated_sql

### Clean Up
Run the cell below to delete all the resources created by this notebook.

In [None]:
# Delete resources
print("=============================== Deleting resources ==============================\n")
knowledge_base.delete_kb(delete_iam_roles_and_policies=True)