## Naive RAG Lab

In this lab, we will build a naive RAG solution using pre-trained foundation models and the knowledge base feature of Amazon Bedrock. Along the way, you will learn about the components that make up a RAG system.

## Pre-req
You must run the notebook in [lab01](../lab01-managed-rag/) to load the necessary data to S3 and store the parameters for this lab.

In [12]:
import warnings
warnings.warn("Warning: if you did not run lab01, please go back and run the lab01 notebook!") 



## Setup

In [13]:
%pip install -U opensearch-py==2.3.1
%pip install -U retrying==1.3.4

Collecting opensearch-py==2.3.1
  Downloading opensearch_py-2.3.1-py2.py3-none-any.whl.metadata (6.9 kB)
Downloading opensearch_py-2.3.1-py2.py3-none-any.whl (327 kB)
Installing collected packages: opensearch-py
Successfully installed opensearch-py-2.3.1
Note: you may need to restart the kernel to use updated packages.
Collecting retrying==1.3.4
  Downloading retrying-1.3.4-py3-none-any.whl.metadata (6.9 kB)
Downloading retrying-1.3.4-py3-none-any.whl (11 kB)
Installing collected packages: retrying
  Attempting uninstall: retrying
    Found existing installation: retrying 1.3.3
    Uninstalling retrying-1.3.3:
      Successfully uninstalled retrying-1.3.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dash 2.17.0

In [14]:
import boto3
import time
import random
import pprint as pp
import uuid
import json
from retrying import retry
from utility import create_bedrock_execution_role, create_oss_policy_attach_bedrock_execution_role, create_policies_in_oss
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# auth for opensearch
boto3_session = boto3.Session()
region_name = boto3_session.region_name
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()["Account"]
credentials = boto3_session.get_credentials()

# opensearch service
service = 'aoss'
awsauth = auth = AWSV4SignerAuth(credentials, region_name, service)


bedrock_agent_client = boto3_session.client('bedrock-agent', region_name=region_name)

# bucket and parameter stored from lab01
%store -r bucket
%store -r prefix
%store -r data_dir
%store -r yml_dir
%store -r uml_dir
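
As a small sketch, the restored names can be checked explicitly instead of relying only on the warning at the top of the notebook. The helper name `missing_lab01_variables` is hypothetical, and the stand-in namespace below exists only so the snippet runs on its own; in the notebook you would pass `globals()`.

```python
def missing_lab01_variables(namespace,
                            required=("bucket", "prefix", "data_dir", "yml_dir", "uml_dir")):
    """Return the names in `required` that are absent from `namespace`."""
    return [name for name in required if name not in namespace]

# Stand-in namespace (made-up values); in the notebook, use globals() instead.
example_namespace = {"bucket": "example-bucket", "prefix": "example-prefix"}

missing = missing_lab01_variables(example_namespace)
if missing:
    print(f"Not restored from lab01: {missing}")
```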

## Create a vector store - OpenSearch Serverless index

### Step 1 - Create OSS policies and collection
First, we have to create a vector store. In this section we will use *Amazon OpenSearch Serverless*.

Amazon OpenSearch Serverless is a serverless option in Amazon OpenSearch Service. As a developer, you can use OpenSearch Serverless to run petabyte-scale workloads without configuring, managing, and scaling OpenSearch clusters. You get the same interactive millisecond response times as OpenSearch Service with the simplicity of a serverless environment. Pay only for what you use by automatically scaling resources to provide the right amount of capacity for your application—without impacting data ingestion.

In [15]:
suffix = random.randrange(200, 900)
vector_store_name = f'swagger-api-{suffix}'
index_name = f"swagger-api-{suffix}"
aoss_client = boto3_session.client('opensearchserverless')
bedrock_kb_execution_role = create_bedrock_execution_role(bucket_name=bucket)
bedrock_kb_execution_role_arn = bedrock_kb_execution_role['Role']['Arn']

In [16]:
# create security, network and data access policies within OSS
encryption_policy, network_policy, access_policy = create_policies_in_oss(vector_store_name=vector_store_name,
                       aoss_client=aoss_client,
                       bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn)
collection = aoss_client.create_collection(name=vector_store_name,type='VECTORSEARCH')
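
The `create_policies_in_oss` helper comes from the lab's `utility` module, whose source is not shown here. As a rough sketch, and assuming a typical OpenSearch Serverless setup rather than the helper's actual contents, the three policy documents could look like the following. The function name, collection name, and principal ARN are all hypothetical.

```python
import json

def build_oss_policy_documents(collection_name, principal_arns):
    """Return encryption, network, and data access policy documents as JSON strings.

    Assumption: this mirrors a typical OpenSearch Serverless setup, not
    necessarily what the lab's `utility` helper creates.
    """
    collection_resource = f"collection/{collection_name}"
    encryption_policy = {
        "Rules": [{"Resource": [collection_resource], "ResourceType": "collection"}],
        "AWSOwnedKey": True,  # encrypt with an AWS-owned KMS key
    }
    network_policy = [{
        "Rules": [{"Resource": [collection_resource], "ResourceType": "collection"}],
        "AllowFromPublic": True,  # public endpoint; tighten this outside of a lab
    }]
    data_access_policy = [{
        "Rules": [
            {
                "Resource": [collection_resource],
                "Permission": ["aoss:CreateCollectionItems", "aoss:DescribeCollectionItems",
                               "aoss:UpdateCollectionItems", "aoss:DeleteCollectionItems"],
                "ResourceType": "collection",
            },
            {
                "Resource": [f"index/{collection_name}/*"],
                "Permission": ["aoss:CreateIndex", "aoss:DescribeIndex", "aoss:UpdateIndex",
                               "aoss:DeleteIndex", "aoss:ReadDocument", "aoss:WriteDocument"],
                "ResourceType": "index",
            },
        ],
        "Principal": list(principal_arns),  # e.g. the Bedrock execution role
    }]
    return {
        "encryption": json.dumps(encryption_policy),
        "network": json.dumps(network_policy),
        "data": json.dumps(data_access_policy),
    }

documents = build_oss_policy_documents(
    "swagger-api-123",                                   # hypothetical collection name
    ["arn:aws:iam::111122223333:role/example-kb-role"],  # hypothetical principal
)
print(documents["encryption"])
```

Strings like these are what the `opensearchserverless` client's `create_security_policy` (types `encryption` and `network`) and `create_access_policy` (type `data`) accept as their `policy` argument.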

In [17]:
pp.pprint(collection)
time.sleep(10)

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '307',
                                      'content-type': 'application/x-amz-json-1.0',
                                      'date': 'Sat, 01 Jun 2024 04:04:52 GMT',
                                      'x-amzn-requestid': 'd5c1c7e3-a23e-4657-ada8-68c0dd2cee31'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'd5c1c7e3-a23e-4657-ada8-68c0dd2cee31',
                      'RetryAttempts': 0},
 'createCollectionDetail': {'arn': 'arn:aws:aoss:us-west-2:974171775829:collection/ruq5qwuvpwnqdehuz76d',
                            'createdDate': 1717214692104,
                            'id': 'ruq5qwuvpwnqdehuz76d',
                            'kmsKeyArn': 'auto',
                            'lastModifiedDate': 1717214692104,
                            'name': 'swagger-api-828',
                            'standbyReplicas': 'ENABLED',

In [18]:
collection_id = collection['createCollectionDetail']['id']
host = collection_id + '.' + region_name + '.aoss.amazonaws.com'
print(host)

ruq5qwuvpwnqdehuz76d.us-west-2.aoss.amazonaws.com


In [19]:
# create oss policy and attach it to Bedrock execution role
create_oss_policy_attach_bedrock_execution_role(collection_id=collection_id,
                                                bedrock_kb_execution_role=bedrock_kb_execution_role)

Opensearch serverless arn:  arn:aws:iam::974171775829:policy/AmazonBedrockOSSPolicyForKnowledgeBase_209


### Step 2 - Create vector index

In [20]:
index_name = f"bedrock-sample-index-{suffix}"
body_json = {
   "settings": {
      "index.knn": "true"
   },
   "mappings": {
      "properties": {
         "vector": {
            "type": "knn_vector",
            "dimension": 1536,
            "method": {
                "name": "hnsw",
                "space_type": "innerproduct",
                "engine": "faiss",
                "parameters": {
                  "ef_construction": 256,
                  "m": 48
                }
             }
         },
         "text": {
            "type": "text"
         },
         "text-metadata": {
            "type": "text"         
         }
      }
   }
}
# Build the OpenSearch client
oss_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)
# It can take up to a minute for data access rules to be enforced, so wait before creating the index
time.sleep(100)

In [23]:
# Create index
response = oss_client.indices.create(index=index_name, body=json.dumps(body_json))
print('\nCreating index:')
print(response)


Creating index:
{'acknowledged': True, 'shards_acknowledged': True, 'index': 'bedrock-sample-index-828'}
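
To make the mapping more concrete, here is a hedged sketch of what a k-NN search body against this index could look like. The helper name `build_knn_query` is hypothetical, and the zero vector is only a placeholder; a real query vector would have to come from the same embedding model used at ingestion time, with the same 1536 dimensions declared in the mapping.

```python
def build_knn_query(query_vector, k=5, vector_field="vector", dimension=1536):
    """Build an OpenSearch k-NN query body for the index mapping defined above."""
    if len(query_vector) != dimension:
        raise ValueError(
            f"expected a vector of length {dimension}, got {len(query_vector)}"
        )
    return {
        "size": k,
        "_source": ["text", "text-metadata"],  # return the stored text, not the vector
        "query": {
            "knn": {
                vector_field: {"vector": list(query_vector), "k": k}
            }
        },
    }

body = build_knn_query([0.0] * 1536, k=3)  # placeholder vector, not a real embedding
print(body["size"], len(body["query"]["knn"]["vector"]["vector"]))
```

A body like this could be passed to `oss_client.search(index=index_name, body=body)`. Right after index creation it would return no hits, since nothing has been ingested yet.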


## Create Knowledge Base
Steps:
- Initialize the OpenSearch Serverless configuration, which includes the collection ARN, index name, vector field, text field, and metadata field.
- Initialize the chunking strategy, which determines how KB splits documents into chunks. This lab sets `chunkingStrategy` to `NONE` in `chunkingStrategyConfiguration`, so each file is treated as a single chunk.
- Initialize the S3 configuration, which will be used to create the data source object later.
- Initialize the Titan embeddings model ARN, which will be used to create the embeddings for each text chunk.
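
For comparison with the chunking strategy used in this lab, the sketch below shows a fixed-size chunking configuration and a toy illustration of what fixed-size chunking with overlap does. The toy function `fixed_size_chunks` is hypothetical and counts whitespace-separated words as a stand-in for tokens; it is not how Bedrock tokenizes or splits documents, and the values 300 and 20 are arbitrary examples.

```python
# Alternative chunking configuration (example values, not used in this lab)
fixed_size_chunking_configuration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
}

def fixed_size_chunks(words, max_tokens, overlap_percentage):
    """Split `words` into windows of `max_tokens` items that overlap by a percentage."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # always >= 1 because overlap < max_tokens
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_tokens])
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the input
    return chunks

sample_words = "paths define the routes exposed by the petstore api".split()
for chunk in fixed_size_chunks(sample_words, max_tokens=4, overlap_percentage=50):
    print(" ".join(chunk))
```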

In [24]:
opensearchServerlessConfiguration = {
            "collectionArn": collection["createCollectionDetail"]['arn'],
            "vectorIndexName": index_name,
            "fieldMapping": {
                "vectorField": "vector",
                "textField": "text",
                "metadataField": "text-metadata"
            }
        }

chunkingStrategyConfiguration = {
    "chunkingStrategy": "NONE",
}

s3Configuration = {
    "bucketArn": f"arn:aws:s3:::{bucket}",
    "inclusionPrefixes":[f"{prefix}/yml_questions/"] # you can use this if you want to create a KB using data within s3 prefixes.
}

embeddingModelArn = f"arn:aws:bedrock:{region_name}::foundation-model/amazon.titan-embed-text-v1"

kb_name = f"bedrock-sample-knowledge-base-{suffix}"
description = "Swagger OpenAPI knowledge base."
roleArn = bedrock_kb_execution_role_arn

Pass the above configurations to the `create_knowledge_base` method to create the knowledge base.

In [25]:
# Create a KnowledgeBase
from retrying import retry

@retry(wait_random_min=1000, wait_random_max=2000,stop_max_attempt_number=7)
def create_knowledge_base_func():
    create_kb_response = bedrock_agent_client.create_knowledge_base(
        name = kb_name,
        description = description,
        roleArn = roleArn,
        knowledgeBaseConfiguration = {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embeddingModelArn
            }
        },
        storageConfiguration = {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration":opensearchServerlessConfiguration
        }
    )
    return create_kb_response["knowledgeBase"]
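
The `@retry` decorator above waits a random 1 to 2 seconds between attempts and gives up after 7 attempts. The notebook does not say why the retry is needed; a plausible reason is that the newly created role and policies take a moment to propagate. As an illustration only, roughly the same behaviour can be sketched with the standard library. The name `call_with_retries` is hypothetical and this is not the `retrying` implementation.

```python
import random
import time

def call_with_retries(func, max_attempts=7, wait_min_s=1.0, wait_max_s=2.0,
                      sleep=time.sleep):
    """Call `func` until it succeeds, waiting a random interval between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(wait_min_s, wait_max_s))

# Demo with a function that fails twice before succeeding (no real waiting here).
calls = {"count": 0}

def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("not ready yet")
    return "created"

print(call_with_retries(flaky_operation, sleep=lambda seconds: None))
```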

In [26]:
try:
    kb = create_knowledge_base_func()
except Exception as err:
    print(f"{err=}, {type(err)=}")

Next we need to create a data source, which will be associated with the knowledge base created above. Once the data source is ready, we can then start to ingest the documents.

In [27]:
# Get KnowledgeBase 
get_kb_response = bedrock_agent_client.get_knowledge_base(knowledgeBaseId = kb['knowledgeBaseId'])

In [28]:
# Create a DataSource in KnowledgeBase 
create_ds_response = bedrock_agent_client.create_data_source(
    name = kb_name,
    description = description,
    knowledgeBaseId = kb['knowledgeBaseId'],
    dataSourceConfiguration = {
        "type": "S3",
        "s3Configuration":s3Configuration
    },
    vectorIngestionConfiguration = {
        "chunkingConfiguration": chunkingStrategyConfiguration
    }
)
ds = create_ds_response["dataSource"]
# Wait briefly before inspecting the new data source
time.sleep(20)
pp.pprint(ds)

{'createdAt': datetime.datetime(2024, 6, 1, 4, 13, 2, 187233, tzinfo=tzlocal()),
 'dataSourceConfiguration': {'s3Configuration': {'bucketArn': 'arn:aws:s3:::sagemaker-us-west-2-974171775829',
                                                 'inclusionPrefixes': ['swagger_codegen/yml_questions/']},
                             'type': 'S3'},
 'dataSourceId': '532DYM9UHD',
 'description': 'Swagger OpenAPI knowledge base.',
 'knowledgeBaseId': 'TN24JCHRAN',
 'name': 'bedrock-sample-knowledge-base-828',
 'status': 'AVAILABLE',
 'updatedAt': datetime.datetime(2024, 6, 1, 4, 13, 2, 187233, tzinfo=tzlocal()),
 'vectorIngestionConfiguration': {'chunkingConfiguration': {'chunkingStrategy': 'NONE'}}}


### Start ingestion job
Once the KB and data source are created, we can start the ingestion job.
During the ingestion job, KB fetches the documents from the data source, pre-processes them to extract text, chunks the text according to the configured chunking strategy, creates an embedding for each chunk, and writes the embeddings to the vector database, in this case OSS.

In [29]:
# Start an ingestion job
start_job_response = bedrock_agent_client.start_ingestion_job(knowledgeBaseId = kb['knowledgeBaseId'], dataSourceId = ds["dataSourceId"])

In [30]:
job = start_job_response["ingestionJob"]
pp.pprint(job)

{'dataSourceId': '532DYM9UHD',
 'ingestionJobId': 'HBSZLQINWL',
 'knowledgeBaseId': 'TN24JCHRAN',
 'startedAt': datetime.datetime(2024, 6, 1, 4, 13, 22, 754503, tzinfo=tzlocal()),
 'statistics': {'numberOfDocumentsDeleted': 0,
                'numberOfDocumentsFailed': 0,
                'numberOfDocumentsScanned': 0,
                'numberOfModifiedDocumentsIndexed': 0,
                'numberOfNewDocumentsIndexed': 0},
 'status': 'STARTING',
 'updatedAt': datetime.datetime(2024, 6, 1, 4, 13, 22, 754503, tzinfo=tzlocal())}


In [31]:
# Poll the ingestion job until it reaches a terminal state
while job['status'] not in ('COMPLETE', 'FAILED'):
    time.sleep(10)  # wait between polls instead of calling the API in a tight loop
    get_job_response = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId = kb['knowledgeBaseId'],
        dataSourceId = ds["dataSourceId"],
        ingestionJobId = job["ingestionJobId"]
    )
    job = get_job_response["ingestionJob"]
pp.pprint(job)
time.sleep(80)

{'dataSourceId': '532DYM9UHD',
 'ingestionJobId': 'HBSZLQINWL',
 'knowledgeBaseId': 'TN24JCHRAN',
 'startedAt': datetime.datetime(2024, 6, 1, 4, 13, 22, 754503, tzinfo=tzlocal()),
 'statistics': {'numberOfDocumentsDeleted': 0,
                'numberOfDocumentsFailed': 0,
                'numberOfDocumentsScanned': 5,
                'numberOfModifiedDocumentsIndexed': 0,
                'numberOfNewDocumentsIndexed': 5},
 'status': 'COMPLETE',
 'updatedAt': datetime.datetime(2024, 6, 1, 4, 13, 33, 344865, tzinfo=tzlocal())}
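
The polling step can also be wrapped in a small helper that stops on any terminal status and enforces a timeout. This is a sketch under assumptions: the helper name `wait_for_ingestion` is hypothetical, and `fetch_job` stands for any callable that returns the current job dictionary, for example a lambda around `bedrock_agent_client.get_ingestion_job(...)["ingestionJob"]`.

```python
import time

TERMINAL_STATUSES = {"COMPLETE", "FAILED"}

def wait_for_ingestion(fetch_job, poll_seconds=10, timeout_seconds=900,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_job()` until the job reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"ingestion job still {job['status']} after {timeout_seconds}s")
        sleep(poll_seconds)

# Demo with a fake job that completes on the third poll (no real waiting here).
fake_statuses = iter(["STARTING", "IN_PROGRESS", "COMPLETE"])
final_job = wait_for_ingestion(lambda: {"status": next(fake_statuses)},
                               sleep=lambda seconds: None)
print(final_job["status"])
```

Returning the job on `FAILED` as well as `COMPLETE` leaves it to the caller to inspect the result, rather than looping forever when ingestion fails.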


In [32]:
kb_id = kb["knowledgeBaseId"]
%store kb_id
pp.pprint(kb_id)

Stored 'kb_id' (str)
'TN24JCHRAN'


## Test the knowledge base
### Using RetrieveAndGenerate API
Behind the scenes, the RetrieveAndGenerate API converts the query into an embedding, searches the knowledge base, augments the foundation model prompt with the search results as context, and returns the FM-generated response. For multi-turn conversations, Knowledge Bases manages short-term memory of the conversation to provide more contextual results.
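
The retrieve-then-augment flow can be illustrated with a toy example. Everything here is an assumption made for illustration: the three-dimensional vectors are made up rather than produced by an embedding model, the prompt wording is invented, and the function names are hypothetical. The managed API's actual prompt template and retrieval internals are not shown in this notebook. Inner product is used for scoring to match the `space_type` of the index created earlier.

```python
def inner_product(a, b):
    """Similarity score between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vector, indexed_chunks, top_k=2):
    """Return the top_k chunks with the highest inner product against the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda chunk: inner_product(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:top_k]

def augment_prompt(question, chunks):
    """Place the retrieved text in front of the question as context."""
    context = "\n\n".join(chunk["text"] for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Made-up chunks and vectors standing in for the vector store contents.
indexed_chunks = [
    {"text": "POST /pets creates a new pet.", "vector": [1.0, 0.0, 0.0]},
    {"text": "DELETE /books/{id} removes a book.", "vector": [0.0, 1.0, 0.0]},
    {"text": "GET /flowers lists flowers.", "vector": [0.0, 0.0, 1.0]},
]
query_vector = [0.9, 0.1, 0.0]  # stand-in for the embedded question

top_chunks = retrieve(query_vector, indexed_chunks, top_k=1)
print(augment_prompt("How do I add a new pet?", top_chunks))
```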

The output of the RetrieveAndGenerate API includes the generated response, source attribution as well as the retrieved text chunks.

In [33]:
# try out KB using RetrieveAndGenerate API
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region_name)
model_id = "anthropic.claude-3-sonnet-20240229-v1:0" # other Bedrock text models can be tried here, for example "anthropic.claude-v2"
model_arn = f'arn:aws:bedrock:{region_name}::foundation-model/{model_id}'

In [34]:
from IPython.display import Markdown, display

query = "How do I add a new pet using the petstore api? Can you generate a test code in python?"
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        'text': query
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': model_arn
        }
    },
)

generated_text = response['output']['text']

display(Markdown(generated_text))

To add a new pet using the petstore API, you need to send a POST request to the /pets endpoint with a request body containing a NewPet object. The NewPet object must include the name property, and can optionally include the tag property.

Here's an example Python code to add a new pet using the requests library:

import requests

url = "https://petstore.example.com/pets"

payload = {
    "name": "Buddy",
    "tag": "dog"
}

headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    print("Pet created successfully!")
    print(response.json())
else:
    print(f"Error creating pet: {response.status_code} - {response.text}")
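
Besides `output.text`, the RetrieveAndGenerate response carries the source attribution and retrieved chunks under `citations`. The sketch below flattens that structure into something easier to print. The helper name `summarize_citations` is hypothetical, and the sample response is hand-written for illustration; it is not output captured from this notebook.

```python
def summarize_citations(response):
    """Pair each generated text span with the S3 sources and chunks that back it."""
    summary = []
    for citation in response.get("citations", []):
        part = citation.get("generatedResponsePart", {}).get("textResponsePart", {})
        sources = []
        for reference in citation.get("retrievedReferences", []):
            sources.append({
                "uri": reference.get("location", {}).get("s3Location", {}).get("uri"),
                "text": reference.get("content", {}).get("text", ""),
            })
        summary.append({"generated_text": part.get("text", ""), "sources": sources})
    return summary

# Hand-written sample with the same shape as a real response (values are made up).
sample_response = {
    "output": {"text": "Send a POST request to /pets."},
    "citations": [{
        "generatedResponsePart": {"textResponsePart": {"text": "Send a POST request to /pets."}},
        "retrievedReferences": [{
            "content": {"text": "paths: /pets: post: ..."},
            "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/petstore.yml"}},
        }],
    }],
}

for item in summarize_citations(sample_response):
    print(item["generated_text"], "<-", [source["uri"] for source in item["sources"]])
```

In the notebook, the same helper could be called on the `response` returned by `retrieve_and_generate`.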

## > Create a Simple Chat Interface

In [35]:
import ipywidgets as ipw
from IPython.display import display, clear_output

class ChatUX:
    """ A chat UX using IPWidgets
    """
    def __init__(self, qa):
        self.qa = qa
        self.name = None
        self.b=None
        self.out = ipw.Output()
        self.session_id = None

    def start_chat(self):
        print("Let's chat!")
        display(self.out)
        self.chat(None)

    def chat(self, _):
        if self.name is None:
            prompt = ""
        else:
            prompt = self.name.value
        if prompt in ('q', 'quit', 'Q'):
            print("Thank you, that was a nice chat!")
            return
        elif len(prompt) > 0:
            with self.out:
                thinking = ipw.Label(value=f"Thinking...")
                display(thinking)
                try:
                    if self.session_id:
                        response = self.qa.retrieve_and_generate(
                            sessionId=self.session_id,
                            input={
                                'text': prompt
                            },
                            retrieveAndGenerateConfiguration={
                                'type': 'KNOWLEDGE_BASE',
                                'knowledgeBaseConfiguration': {
                                    'knowledgeBaseId': kb_id,
                                    'modelArn': model_arn
                                }
                            }
                        )
                    else:
                        response = self.qa.retrieve_and_generate(
                            input={
                                'text': prompt
                            },
                            retrieveAndGenerateConfiguration={
                                'type': 'KNOWLEDGE_BASE',
                                'knowledgeBaseConfiguration': {
                                    'knowledgeBaseId': kb_id,
                                    'modelArn': model_arn
                                }
                            }
                        )

                    self.session_id = response['sessionId']
                    result = response['output']['text']

                except Exception as e:
                    print(e)
                    result = "No answer"
                thinking.value=""
                print(f"AI: {result}")
                self.name.disabled = True
                self.b.disabled = True
                self.name = None

        if self.name is None:
            with self.out:
                self.name = ipw.Text(description="You: ", placeholder='q to quit')
                self.b = ipw.Button(description="Send")
                self.b.on_click(self.chat)
                display(ipw.Box(children=(self.name, self.b)))
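
The two `retrieve_and_generate` branches in `chat` differ only in whether `sessionId` is passed. As a sketch, the request arguments could be built once and the session ID added only when one exists. The helper name `build_rag_request` and the placeholder IDs are hypothetical.

```python
def build_rag_request(prompt, kb_id, model_arn, session_id=None):
    """Build keyword arguments for retrieve_and_generate, adding sessionId only if set."""
    request = {
        "input": {"text": prompt},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id:
        request["sessionId"] = session_id  # continue an existing conversation
    return request

# Placeholder IDs for illustration only.
first_turn = build_rag_request("How do I create a new pet?", "EXAMPLEKBID", "example-model-arn")
next_turn = build_rag_request("And how do I delete it?", "EXAMPLEKBID", "example-model-arn",
                              session_id="example-session-id")
print("sessionId" in first_turn, "sessionId" in next_turn)
```

Inside `chat`, this would reduce the call to `self.qa.retrieve_and_generate(**build_rag_request(prompt, kb_id, model_arn, self.session_id))`.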

### > Sample Questions
- How many routes in flowerstore?
- How do I create a new pet in petstore?
- What is the response code if a pet is not available?
- How do I delete a book from bookstore?
- Can you generate a sample code in python to create a new author in bookstore?

In [37]:
chat = ChatUX(bedrock_agent_runtime_client)
chat.start_chat()

Let's chat!


Output()

### > Clean up