# Knowledge Bases for Amazon Bedrock
## Access Control Filtering - Example notebook

This notebook guides you through creating access controls for Knowledge Bases for Amazon Bedrock.

To demonstrate the access control capabilities enabled by metadata filtering in Knowledge Bases, let's consider a use case where a healthcare provider has a Knowledge Base containing conversation transcripts between doctors and patients. In this scenario, it is crucial to ensure that each doctor can only access and leverage transcripts from their own patient interactions during the search, and not have access to transcripts from other doctors' patient interactions.

This notebook contains the following sections:

1. **Amazon Cognito:** You will create an Amazon Cognito user pool with two doctors and three patients. We will use the unique identifiers that Cognito generates for each user to associate transcripts with the respective patients, and patients with their doctors.
2. **Doctor-patient association in Amazon DynamoDB:** You will create a DynamoDB table that stores the doctor-patient associations.
3. **Dataset download:** For this notebook we will use doctor-patient transcripts from the [following repository](https://github.com/nazmulkazi/dataset_automated_medical_transcription).
4. **Metadata association:** You will use the patient identifiers generated by Cognito to create a metadata file for each transcript file.
5. **Upload the dataset to Amazon S3:** You will create an Amazon S3 bucket and upload the dataset and metadata files.
6. **Create a Knowledge Base for Amazon Bedrock:** You will create a Knowledge Base and its associated components, such as the underlying Amazon OpenSearch Serverless index. Once it is configured and populated, you will also test the Knowledge Base.
7. **Call the Knowledge Base from AWS Lambda:** You will create an AWS Lambda function with the permissions needed to call the Knowledge Base.
8. **Create and run a Streamlit application:** You will create a simple Streamlit interface to showcase access control with metadata filtering.
9. **Clean up:** Delete all the resources created during this notebook to avoid unnecessary costs.

In [None]:
!pip install -qU opensearch-py streamlit streamlit-cognito-auth retrying boto3

In [None]:
!sudo apt-get install -y zip  # Needed to package the Lambda layer below; remove once the Lambda runtime's Boto3 is updated

Let's import necessary Python modules and libraries, and initialize AWS service clients required for the notebook.

In [None]:
import os
import json
import time
import uuid
import boto3
import requests
import random
import zipfile
from retrying import retry
from itertools import cycle
from botocore.exceptions import ClientError
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
# Helper functions from the local bedrock_utils module (it is not installed by the pip command above)
from bedrock_utils import create_bedrock_execution_role, create_oss_policy_attach_bedrock_execution_role, create_policies_in_oss, delete_iam_role_and_policies

session = boto3.session.Session()
region = session.region_name

s3_client = boto3.client('s3')
iam_client = boto3.client('iam')
sts_client = boto3.client('sts')
cloudwatch = boto3.client('cloudwatch')
dynamodb_client = boto3.client('dynamodb')
dynamodb_resource = boto3.resource('dynamodb')
lambda_client = boto3.client('lambda', region_name=region)
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock = boto3.client('bedrock', region_name=region)
opensearch_client = boto3.client('opensearchserverless')
cognito_client = boto3.client('cognito-idp', region_name=region)
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

# Identity of the caller running this notebook
caller_identity = sts_client.get_caller_identity()
account_id = caller_identity["Account"]
identity_arn = caller_identity["Arn"]

### 1. Create a Cognito User Pool with doctors and patients
In this section you will create an Amazon Cognito user pool to store the doctor and patient users, along with a user pool client that the Streamlit application will later use to authenticate them.
#### Create the Cognito User Pool

In [None]:
def createUserPool(userPoolName):
    # Create a Cognito client
    cognito_client = boto3.client('cognito-idp', region_name=region)

    # Create user pool
    response = cognito_client.create_user_pool(
        PoolName=userPoolName,
        AutoVerifiedAttributes=['email'],
        Schema=[
            {
                'Name': 'name',
                'AttributeDataType': 'String',
                'Required': True
            },
            {
                'Name': 'sub',
                'AttributeDataType': 'String',
                'Required': True
            }
        ]
    )
    user_pool_id = response['UserPool']['Id']

    # Create app client
    response = cognito_client.create_user_pool_client(
        UserPoolId=user_pool_id,
        ClientName=userPoolName + '_client',
        GenerateSecret=True
    )
    client_id = response['UserPoolClient']['ClientId']
    client_secret = response['UserPoolClient'].get('ClientSecret')

    return user_pool_id, client_id, client_secret

In [None]:
userPoolName = "kb_acl_userpool"
user_pool_id, client_id, client_secret = createUserPool(userPoolName)
cognito_arn = cognito_client.describe_user_pool(UserPoolId=user_pool_id)["UserPool"]["Arn"]

print("User Pool ID:", user_pool_id)
print("User Pool Arn:", cognito_arn)
print("Client ID:", client_id)
print("Client Secret:", client_secret)

#### Create doctors and patients into the user pool
We will create doctors and patients to test the use case, and store their user IDs for later use when retrieving information.
For the notebook to work, you need to replace the placeholders for the 2 doctors and 3 patients below. These users will be created in the Amazon Cognito user pool, and you will later need their credentials to log into the web application. Passwords must satisfy the user pool's password policy, otherwise `admin_set_user_password` will fail. While this is a dummy user creation for test purposes, in production use cases you should follow your organization's best practices and guidelines to create users.
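A quick check can catch placeholders that were left in place before any user is created. The sketch below is a minimal example, assuming the `INSERT_` prefix used in the placeholder values; `find_unreplaced_placeholders` and `sample_doctors` are hypothetical names, not part of the original notebook.

```python
def find_unreplaced_placeholders(users, prefix="INSERT_"):
    """Return (index, field) pairs whose value still starts with the placeholder prefix."""
    problems = []
    for index, user in enumerate(users):
        for field, value in user.items():
            if isinstance(value, str) and value.startswith(prefix):
                problems.append((index, field))
    return problems


# Hypothetical example: the password was never filled in
sample_doctors = [
    {"name": "Dr. Example", "email": "doctor@example.com", "password": "INSERT_DOCTOR_1_PASSWORD"},
]
print(find_unreplaced_placeholders(sample_doctors))  # [(0, 'password')]
```

After filling in the next cell, `find_unreplaced_placeholders(doctors + patients)` should return an empty list.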

In [None]:
doctors = [
    {
        'name': 'INSERT_DOCTOR_1_NAME',
        'email': 'INSERT_DOCTOR_1_EMAIL',
        'password': 'INSERT_DOCTOR_1_PASSWORD'
    },
    {
        'name': 'INSERT_DOCTOR_2_NAME',
        'email': 'INSERT_DOCTOR_2_EMAIL',
        'password': 'INSERT_DOCTOR_2_PASSWORD'
    }
]

patients = [
    {
        'name': 'INSERT_PATIENT_1_NAME',
        'email': 'INSERT_PATIENT_1_EMAIL',
        'password': 'INSERT_PATIENT_1_PASSWORD'
    },
    {
        'name': 'INSERT_PATIENT_2_NAME',
        'email': 'INSERT_PATIENT_2_EMAIL',
        'password': 'INSERT_PATIENT_2_PASSWORD'
    },
    {
        'name': 'INSERT_PATIENT_3_NAME',
        'email': 'INSERT_PATIENT_3_EMAIL',
        'password': 'INSERT_PATIENT_3_PASSWORD'
    }
]

def create_user(user_data, user_type):
    """Create Cognito users with permanent passwords and return their 'sub' identifiers."""
    user_ids = []
    for user in user_data:
        response = cognito_client.admin_create_user(
            UserPoolId=user_pool_id,
            Username=user['email'],
            UserAttributes=[
                {'Name': 'name', 'Value': user['name']},
                {'Name': 'email', 'Value': user['email']},
                {'Name': 'email_verified', 'Value': 'true'}
            ],
            ForceAliasCreation=False,
            MessageAction='SUPPRESS'  # Do not send an invitation email
        )
        # Set a permanent password so the user is not forced to change it at first login
        cognito_client.admin_set_user_password(
            UserPoolId=user_pool_id,
            Username=user['email'],
            Password=user['password'],
            Permanent=True
        )
        # Look up 'sub' by name rather than relying on the order of the returned attributes
        attributes = {attr['Name']: attr['Value'] for attr in response['User']['Attributes']}
        user_id = attributes['sub']
        print(f"{user_type.capitalize()} created:", response['User']['Username'])
        print(f"{user_type.capitalize()} id:", user_id)
        user_ids.append(user_id)
    return user_ids

doctor_ids = create_user(doctors, 'doctor')
patient_ids = create_user(patients, 'patient')

print("Doctor IDs:", doctor_ids)
print("Patient IDs:", patient_ids)

### 2. Doctor-patient association in DynamoDB
In this section we will create a DynamoDB table to store the doctor-patient associations. This will be useful later on to retrieve the list of patient ids a doctor is allowed to filter by. 

In [None]:
table_name = 'doctor_patient_list_association'
table = dynamodb_resource.create_table(
    TableName=table_name,
    KeySchema=[
        {
            'AttributeName': 'doctor_id',
            'KeyType': 'HASH'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'doctor_id',
            'AttributeType': 'S'
        }
    ],
    BillingMode='PAY_PER_REQUEST'  # Use on-demand capacity mode
)

# Wait for the table to be created
print(f'Creating table {table_name}...')
table.wait_until_exists()
print(f'Table {table_name} created successfully!')

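Now that the table has been created, we can populate it with the doctor-patient associations: the first doctor is associated with the first patient, and the second doctor with the remaining two.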

In [None]:
with table.batch_writer() as batch:
    batch.put_item(
        Item={
            'doctor_id': doctor_ids[0],
            'patient_id_list': [patient_ids[0]]
        }
    )
    batch.put_item(
        Item={
            'doctor_id': doctor_ids[1],
            'patient_id_list': patient_ids[1:]
        }
    )

print('Data inserted successfully!')
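The resource-level `put_item` calls above store each Python list as a DynamoDB list (`L`) of strings (`S`). The Lambda function and the Streamlit application later read the table through the low-level client, which returns that typed form. A minimal sketch of flattening it, using made-up IDs; `patient_ids_from_items` is a hypothetical helper:

```python
def patient_ids_from_items(items):
    """Flatten low-level DynamoDB items into a list of patient IDs.

    Each item is expected to look like:
    {'doctor_id': {'S': '...'}, 'patient_id_list': {'L': [{'S': '...'}, ...]}}
    """
    patient_ids = []
    for item in items:
        for entry in item.get("patient_id_list", {}).get("L", []):
            patient_ids.append(entry["S"])
    return patient_ids


example_items = [
    {"doctor_id": {"S": "doctor-2"},
     "patient_id_list": {"L": [{"S": "patient-2"}, {"S": "patient-3"}]}}
]
print(patient_ids_from_items(example_items))  # ['patient-2', 'patient-3']
print(patient_ids_from_items([]))             # []
```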

### 3. Dataset download
The dataset that we will be using can be found [here](https://github.com/nazmulkazi/dataset_automated_medical_transcription). It consists of PDF transcriptions of synthetic conversations. We will list the PDFs available in the GitHub repository, select the first 20, and download each one from its raw URL.

In [None]:
dataset_folder = "source_transcripts"
os.makedirs(dataset_folder, exist_ok=True)
abs_path = os.path.abspath(dataset_folder)

# List the transcripts folder through the GitHub contents API
repo_url = 'https://api.github.com/repos/nazmulkazi/dataset_automated_medical_transcription/contents/transcripts/source'
headers = {'Accept': 'application/vnd.github.v3+json'}
response = requests.get(repo_url, headers=headers, timeout=20)
response.raise_for_status()
json_data = response.json()

# Keep the first 20 PDF files
list_of_pdfs = [item for item in json_data if item['type'] == 'file' and item['name'].endswith('.pdf')][:20]
transcripts = [pdf_dict['name'] for pdf_dict in list_of_pdfs]

# Download each PDF from its raw URL
for pdf_dict in list_of_pdfs:
    pdf_name = pdf_dict['name']
    r = requests.get(pdf_dict['download_url'], timeout=20)
    r.raise_for_status()
    with open(os.path.join(dataset_folder, pdf_name), 'wb') as pdf_file:
        pdf_file.write(r.content)

### 4. Metadata association
These files will need to be uploaded to an Amazon S3 bucket for processing. To use metadata filtering, we need to create a separate metadata JSON file for each transcript file. The metadata file name is the full name of the corresponding PDF file, including the extension, followed by `.metadata.json`. For instance, if the transcript file is named `transcript_001.pdf`, the metadata file should be named `transcript_001.pdf.metadata.json`. This naming convention is how the Knowledge Base identifies the metadata for each file during ingestion.

The metadata JSON file will contain key-value pairs representing the relevant metadata fields associated with the transcript. In our healthcare provider use case, the most important metadata field is patient_id, which will be used to implement access control. We will assign each transcript to a specific patient by including their unique identifier from the Amazon Cognito User Pool in the patient_id field of the metadata file.
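The naming rule and metadata body described above can be captured in a few small functions. This is a minimal sketch; the helper names are hypothetical, and it assumes the source documents are PDFs, as in this dataset. Filtering on the `.pdf` extension also keeps metadata files from being mistaken for documents when a directory is listed again.

```python
import json


def metadata_file_name(document_name):
    """Metadata files reuse the full document name, extension included."""
    return f"{document_name}.metadata.json"


def metadata_body(patient_id):
    """Serialize the metadata attribute used for access-control filtering."""
    return json.dumps({"metadataAttributes": {"patient_id": patient_id}})


def is_source_document(file_name):
    """Treat only PDFs as documents, so metadata files are never re-processed."""
    return file_name.endswith(".pdf")


print(metadata_file_name("transcript_001.pdf"))  # transcript_001.pdf.metadata.json
print(metadata_body("patient-1"))  # {"metadataAttributes": {"patient_id": "patient-1"}}
```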

In [None]:
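# Assign transcripts to patients in round-robin order
patient_ids_cycle = cycle(patient_ids)
for name in sorted(os.listdir(dataset_folder)):
    # Skip existing metadata files so re-running the cell does not create metadata for them
    if not name.endswith('.pdf'):
        continue
    patient_id = next(patient_ids_cycle)
    metadata = json.dumps({"metadataAttributes": {"patient_id": patient_id}})
    with open(os.path.join(dataset_folder, f"{name}.metadata.json"), "w") as outfile:
        outfile.write(metadata)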

### 5. Upload to Amazon S3
Knowledge Bases for Amazon Bedrock currently require data to reside in an Amazon S3 bucket. In this section we will create an Amazon S3 bucket and upload both the transcripts and their metadata files.
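Two details matter when creating the bucket: bucket names must be globally unique across all AWS accounts, and S3 rejects an explicit `us-east-1` location constraint while expecting one in other regions. A minimal sketch of both, with hypothetical helper names and an assumed `kb-acl-test` prefix:

```python
import uuid


def unique_bucket_name(prefix="kb-acl-test"):
    """Append a random 8-character hex suffix to reduce the chance of a name collision."""
    name = f"{prefix}-{uuid.uuid4().hex[:8]}".lower()
    # S3 bucket names must be between 3 and 63 characters long
    if not 3 <= len(name) <= 63:
        raise ValueError(f"Invalid bucket name length: {name}")
    return name


def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for S3 create_bucket, omitting the constraint in us-east-1."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

For example, `s3_client.create_bucket(**create_bucket_kwargs(unique_bucket_name(), region))`.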
#### Create the Amazon S3 bucket

In [None]:
bucket_name = 'kb-acl-test-buzecd'  # Replace with your own name; S3 bucket names must be globally unique

if region != 'us-east-1':
    s3_client.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={'LocationConstraint': region}
    )
else:
    s3_client.create_bucket(Bucket=bucket_name)

#### Upload dataset to the Amazon S3 bucket

In [None]:
files = [f.name for f in os.scandir(abs_path) if f.is_file()]
for file in files:
    s3_client.upload_file(f'{abs_path}/{file}', bucket_name, f'{file}')

### 6. Create a Knowledge Base for Amazon Bedrock

In this section we will go through all the steps to create and test a Knowledge Base. 

These are the steps to complete:
    
- Create an Amazon OpenSearch Serverless index
- Define the Embeddings model ARN
- Create the Knowledge Base
- Create the data source
- Sync the Knowledge Base
- Test the Knowledge Base

#### Create an Amazon OpenSearch Serverless index

To create a new index in Amazon OpenSearch Serverless, you will need to complete the following steps:
- Create an Amazon Bedrock execution role
- Create the encryption, network and data access policies in OpenSearch Serverless
- Create a collection
- Define the index settings and mappings
- Create the index
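Several cells in this notebook wait for resources with fixed `time.sleep` calls or a hand-written loop. A more robust pattern is to poll until the resource reports a terminal state. The helper below is a generic sketch (the name `wait_for_status` is hypothetical); the commented example assumes the `opensearch_client` and `collectionName` defined in this notebook and the `batch_get_collection` status values `ACTIVE` and `FAILED`.

```python
import time


def wait_for_status(get_status, done_states, failed_states=(), interval_seconds=10,
                    timeout_seconds=600, sleep=time.sleep):
    """Poll get_status() until it returns a state in done_states.

    Raises RuntimeError on a failed state and TimeoutError once timeout_seconds
    have elapsed. `sleep` is injectable so the helper can be tested without waiting.
    """
    waited = 0
    while True:
        status = get_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"Resource entered failed state: {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(interval_seconds)
        waited += interval_seconds


# Example (requires the clients and names defined earlier in this notebook):
# wait_for_status(
#     lambda: opensearch_client.batch_get_collection(names=[collectionName])["collectionDetails"][0]["status"],
#     done_states={"ACTIVE"},
#     failed_states={"FAILED"},
# )
```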

In [None]:
def generate_short_uuid():
    """Return the first 8 characters of a random UUID, used to make resource names unique."""
    return str(uuid.uuid4())[:8]

short_uuid = generate_short_uuid()

collectionName = "kb-acl-collection-" + short_uuid
indexName = "kb-acl-index-" + short_uuid

print("Collection name:",collectionName)
print("Index name:",indexName)

In [None]:
bedrock_kb_execution_role = create_bedrock_execution_role(bucket_name=bucket_name)
bedrock_kb_execution_role_arn = bedrock_kb_execution_role['Role']['Arn']

In [None]:
# Create the encryption, network and data access policies in OpenSearch Serverless
encryption_policy, network_policy, access_policy = create_policies_in_oss(
    vector_store_name=collectionName,
    aoss_client=opensearch_client,
    bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn
)

# Create the collection in OpenSearch Serverless
collection = opensearch_client.create_collection(
    name=collectionName,
    type='VECTORSEARCH'
)

# Wait for collection creation to complete
time.sleep(10)

# Extract collection ID and host
collection_detail = collection.get('createCollectionDetail', {})
collection_id = collection_detail.get('id', '')
host = f"{collection_id}.{region}.aoss.amazonaws.com"

# Print collection details and host
print("Collection:", collection)
print("Host:", host)

# Create OSS policy and attach it to Bedrock execution role
create_oss_policy_attach_bedrock_execution_role(
    collection_id=collection_id,
    bedrock_kb_execution_role=bedrock_kb_execution_role
)
# Wait for all elements to be created
time.sleep(40) 

In [None]:
# Set up AWS SigV4 authentication for OpenSearch Serverless
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region, service)

# Define index settings and mappings
index_settings = {
    "settings": {
        "index.knn": "true"
    },
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                # Must match the embedding model's output size (1536 for Titan Text Embeddings v1)
                "dimension": 1536,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "innerproduct",
                    "parameters": {
                        "ef_construction": 512,
                        "m": 16
                    }
                }
            },
            "text": {
                "type": "text"
            },
            "text-metadata": {
                "type": "text"
            }
        }
    }
}

# Build the OpenSearch client
oss_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

# Create index
response = oss_client.indices.create(index=indexName,
                                     body=json.dumps(index_settings))
print(response)

#### Define the Embedding Model ARN
Define the ARN of the embeddings model that the Knowledge Base will use to embed the data it indexes.
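The index created above declares a 1536-dimension `vector` field, which matches the output size of Titan Text Embeddings v1. If you switch embedding models, the index dimension must change with it. A minimal sketch of that consistency check; the helper names are hypothetical and the dimension table only covers two Titan models (v2 at its default size of 1024).

```python
# Output dimensions for two Titan embedding models (v2 shown at its default size)
EMBEDDING_DIMENSIONS = {
    "amazon.titan-embed-text-v1": 1536,
    "amazon.titan-embed-text-v2:0": 1024,
}


def foundation_model_arn(region, model_id):
    """Build a Bedrock foundation-model ARN (foundation models have no account ID)."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def check_index_dimension(index_settings, model_id):
    """Raise if the knn_vector dimension does not match the model's output size."""
    dimension = index_settings["mappings"]["properties"]["vector"]["dimension"]
    expected = EMBEDDING_DIMENSIONS[model_id]
    if dimension != expected:
        raise ValueError(f"Index dimension {dimension} != {expected} for {model_id}")
    return dimension
```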

In [None]:
embeddingModelArn = "arn:aws:bedrock:{}::foundation-model/amazon.titan-embed-text-v1".format(region)

#### Create the Knowledge Base
In this section you will create the Knowledge Base, providing the Amazon Bedrock execution role, the embeddings model ARN and the OpenSearch Serverless configuration created earlier.

In [None]:
knowledge_base_name = "kb-acl-example"
description = "Doctor/Patient transcripts knowledge base."
opensearchServerlessConfiguration = {
    "collectionArn": collection["createCollectionDetail"]['arn'],
    "vectorIndexName": indexName,
    "fieldMapping": {
        "vectorField": "vector",
        "textField": "text",
        "metadataField": "text-metadata"
    }
}

In [None]:
@retry(wait_random_min=1000, wait_random_max=2000,stop_max_attempt_number=7)
def create_knowledge_base_func():
    create_kb_response = bedrock_agent_client.create_knowledge_base(
        name = knowledge_base_name,
        description = description,
        roleArn = bedrock_kb_execution_role_arn,
        knowledgeBaseConfiguration = {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embeddingModelArn
            }
        },
        storageConfiguration = {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration":opensearchServerlessConfiguration
        }
    )
    return create_kb_response["knowledgeBase"]

try:
    kb = create_knowledge_base_func()
    kb_id = kb["knowledgeBaseId"]
    print(kb)
except Exception as err:
    print(f"{err=}, {type(err)=}")

#### Create the data source (S3)

After creating the Knowledge Base, you add a data source that points to the S3 bucket. Once the data source is ingested (synced), its documents are indexed and can be queried.

In [None]:
chunkingStrategyConfiguration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 512,
        "overlapPercentage": 20
    }
}

s3Configuration = {
    "bucketArn": f"arn:aws:s3:::{bucket_name}"
}

data_source = bedrock_agent_client.create_data_source(
    knowledgeBaseId=kb_id,
    name='DoctorPatientTranscripts',
    description='Location of the transcripts',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': s3Configuration
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': chunkingStrategyConfiguration
    }
)

data_source_id = data_source["dataSource"]["dataSourceId"]
print("The data source id is: ", data_source_id)
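The data source above uses fixed-size chunking with `maxTokens` 512 and `overlapPercentage` 20, so consecutive chunks share roughly 102 tokens. The sketch below illustrates the idea on a list of integer "tokens"; it is a simplified model for intuition with a hypothetical helper, not Bedrock's implementation, whose tokenization and boundary handling may differ.

```python
def fixed_size_chunks(tokens, max_tokens=512, overlap_percentage=20):
    """Split tokens into windows of max_tokens that overlap by overlap_percentage."""
    overlap = max_tokens * overlap_percentage // 100  # 102 tokens for the defaults
    step = max_tokens - overlap                       # each chunk starts 410 tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end
    return chunks


chunks = fixed_size_chunks(list(range(1000)))
print([(c[0], c[-1]) for c in chunks])  # [(0, 511), (410, 921), (820, 999)]
```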

#### Sync the Knowledge Base
Now that the data source is associated with the Knowledge Base, we can sync the data.


Each time you add, modify, or remove files from the S3 bucket for a data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes the objects in your S3 bucket that have been added, modified, or deleted since the last sync.
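Conceptually, an incremental sync only has to act on objects that changed between two snapshots of the bucket. The sketch below shows that comparison using object keys and ETags; it illustrates the idea with a hypothetical helper and is not a description of how Bedrock detects changes internally.

```python
def diff_snapshots(previous, current):
    """Compare two {object_key: etag} snapshots of a bucket.

    Returns the sorted keys that were added, modified (ETag changed) and deleted.
    """
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(k for k in current.keys() & previous.keys() if current[k] != previous[k])
    return added, modified, deleted


previous = {"a.pdf": "e1", "b.pdf": "e2", "c.pdf": "e3"}
current = {"a.pdf": "e1", "b.pdf": "e9", "d.pdf": "e4"}
print(diff_snapshots(previous, current))  # (['d.pdf'], ['b.pdf'], ['c.pdf'])
```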

In [None]:
ingestion_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=data_source_id,
    description='Initial Ingestion'
)

In [None]:
%%time
# Poll the ingestion job every 30 seconds until it reaches a terminal state
ingestion_job = ingestion_job_response["ingestionJob"]
while True:
    status = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId=ingestion_job["knowledgeBaseId"],
        dataSourceId=ingestion_job["dataSourceId"],
        ingestionJobId=ingestion_job["ingestionJobId"]
    )["ingestionJob"]["status"]
    print(status)
    if status in ["COMPLETE", "FAILED", "STOPPED"]:
        break
    time.sleep(30)

#### Test the Knowledge Base
Now that the Knowledge Base is available, we can test it using the **retrieve** and **retrieve_and_generate** APIs, filtering on the `patient_id` metadata attribute.
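Both calls below restrict retrieval with a filter on the `patient_id` metadata attribute. An `equals` filter matches a single patient, while an `in` filter, which the Lambda function later in this notebook uses, matches any ID in a list. A minimal sketch that builds either form; `patient_filter` is a hypothetical helper:

```python
def patient_filter(patient_ids):
    """Build a vectorSearchConfiguration filter restricting results to patient_ids."""
    ids = list(dict.fromkeys(patient_ids))  # de-duplicate while keeping order
    if not ids:
        raise ValueError("At least one patient ID is required")
    if len(ids) == 1:
        return {"equals": {"key": "patient_id", "value": ids[0]}}
    return {"in": {"key": "patient_id", "value": ids}}
```

For example, `"filter": patient_filter(patient_ids[1:])` would restrict results to the second doctor's patients.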

In [None]:
# retrieve and generate API

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": "Who is Kelly?"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/anthropic.claude-v2:1".format(region),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5,
                    "filter": {
                        "equals": {
                            "key": "patient_id",
                            "value": patient_ids[2]
                        }
                    }
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

In [None]:
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {
                "equals": {
                    "key": "patient_id",
                    "value": patient_ids[2]
                }
            }
        }
    },
    retrievalQuery={
        'text': 'Who is Kelly?'
    }
)

def response_print(retrieve_resp):
    # 'retrievalResults' is a list of chunks; each has content, location, score and metadata
    for num, chunk in enumerate(retrieve_resp['retrievalResults'], 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Score: ', chunk['score'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk.get('metadata'), end='\n'*2)

response_print(response_ret)

### 7. Call the Knowledge Base from AWS Lambda
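In the following cells we will create an AWS Lambda execution role and a function that calls Knowledge Bases for Amazon Bedrock. The function expects **knowledgeBaseId**, **doctorId**, **patientIds** and **text** in the event. Before querying the Knowledge Base, it checks in DynamoDB that every requested patient ID is associated with the doctor.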
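The access-control logic in the handler can be exercised locally without AWS. This minimal sketch isolates two steps: reading the region from the function ARN (as the handler does with `context.invoked_function_arn`) and checking that the requested patient IDs are a subset of the doctor's associated IDs. The helper names are hypothetical, and this version also treats an empty request as unauthorized.

```python
def region_from_function_arn(function_arn):
    """Return the region field of arn:aws:lambda:<region>:<account>:function:<name>."""
    return function_arn.split(":")[3]


def authorize(requested_ids, allowed_ids):
    """Allow a request only if it names at least one patient and all of them are allowed."""
    requested = set(requested_ids)
    return bool(requested) and requested.issubset(allowed_ids)


arn = "arn:aws:lambda:us-west-2:123456789012:function:kb-acl-function-abcd1234"
print(region_from_function_arn(arn))                          # us-west-2
print(authorize(["patient-2"], ["patient-2", "patient-3"]))    # True
print(authorize(["patient-1"], ["patient-2", "patient-3"]))    # False
```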

In [None]:
# Define the IAM policy for Lambda execution role
lambda_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "BedrockAll",
            "Effect": "Allow",
            "Action": [
                "bedrock:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:Scan"
            ],
            "Resource": "arn:aws:dynamodb:{}:{}:table/{}".format(region, account_id, table_name)
        }
    ]
}

# Create IAM role for Lambda execution
execution_role_name = "kb_acl_lambda_execution_role_{}".format(short_uuid)
lambda_execution_policy_name = "kb_acl_lambda_execution_policy_{}".format(short_uuid)
lambda_role_response = iam_client.create_role(
    RoleName=execution_role_name,
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "lambda.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }),
    Description='IAM role for Lambda execution'
)

# Attach the policy to the IAM role
iam_client.put_role_policy(
    RoleName=execution_role_name,
    PolicyName=lambda_execution_policy_name,
    PolicyDocument=json.dumps(lambda_policy_document)
)

In [None]:
lambda_role_arn = lambda_role_response["Role"]["Arn"]
# New IAM roles can take a few seconds to propagate before Lambda can assume them
time.sleep(10)
lambda_role_arn

In [None]:
# Define the code for the Lambda function
lambda_code = '''
import boto3
import json

def lambda_handler(event, context):
    region = context.invoked_function_arn.split(':')[3]
    knowledge_base_id = event['knowledgeBaseId']
    doctor_id = event['doctorId']
    patient_ids = set(event['patientIds'])
    input_text = event['text']

    # Initialize the DynamoDB client
    dynamodb = boto3.client('dynamodb', region_name=region)

    # Query the doctor_patient_list_association table
    response = dynamodb.query(
        TableName='doctor_patient_list_association',
        KeyConditionExpression='doctor_id = :doctor_id',
        ExpressionAttributeValues={
            ':doctor_id': {'S': doctor_id}
        }
    )
    
    print(list(patient_ids))
    # Extract the associated patient IDs from the query result
    associated_patient_ids = set()
    for item in response['Items']:
        patient_id_list = item.get('patient_id_list', {}).get('L', [])
        associated_patient_ids.update(pid['S'] for pid in patient_id_list)


    # Reject empty requests and any patient ID that is not associated with this doctor
    if not patient_ids or not patient_ids.issubset(associated_patient_ids):
        return {
            'statusCode': 400,
            'body': json.dumps({
                'error': 'No patient IDs provided, or one or more patient IDs are not associated with the provided doctor ID.'
            })
        }

    # Initialize the Bedrock Agent client
    bedrock_agent = boto3.client('bedrock-agent-runtime')

    # Call the Bedrock Agent API to retrieve and generate response
    response = bedrock_agent.retrieve_and_generate(
        input={
            "text": input_text
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                'knowledgeBaseId': knowledge_base_id,
                "modelArn": "arn:aws:bedrock:{}::foundation-model/anthropic.claude-v2:1".format(region),
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5,
                        "filter": {
                            "in": {
                                "key": "patient_id",
                                "value": list(patient_ids)
                            }
                        }
                    }
                }
            }
        }
    )

    # Get the response text
    output_text = response['output']['text']

    # Return the response
    return {
        'statusCode': 200,
        'body': output_text
    }
'''

# Create a zip file containing the Lambda function code
with zipfile.ZipFile('lambda_code.zip', 'w') as zipf:
    zipf.writestr('lambda_function.py', lambda_code)

# Read the zip file as bytes
with open('lambda_code.zip', 'rb') as f:
    zip_bytes = f.read()

# Now, create the Lambda function with the inline code
function_name = 'kb-acl-function-{}'.format(short_uuid)
response = lambda_client.create_function(
    FunctionName=function_name,
    Runtime='python3.12',
    Role=lambda_role_arn,
    Handler='lambda_function.lambda_handler',
    Code={
        'ZipFile': zip_bytes
    },
    Description='Lambda to call KB with filters',
    Timeout=40,
)

print("Lambda Function ARN:", response['FunctionArn'])
lambda_arn = response['FunctionArn']

#### Create a Lambda layer with the latest SDK (needed until the Lambda runtime's bundled Boto3 is updated)

In [None]:
!mkdir -p latest-sdk-layer
%cd latest-sdk-layer
!pip install -qU boto3 botocore -t python/lib/python3.12/site-packages/
!zip -rq latest-sdk-layer.zip .
%cd ..

In [None]:
def publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes):
    with open(zip_file_path, 'rb') as f:
        response = lambda_client.publish_layer_version(
            LayerName=layer_name,
            Description=description,
            Content={
                'ZipFile': f.read(),
            },
            CompatibleRuntimes=compatible_runtimes
        )
    return response['LayerVersionArn']

In [None]:
layer_name = 'latest-sdk-layer'
description = 'Layer with the latest boto3 version.'
zip_file_path = 'latest-sdk-layer/latest-sdk-layer.zip'
compatible_runtimes = ['python3.12']

In [None]:
layer_version_arn = publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes)
print("Layer version ARN:", layer_version_arn)

In [None]:
try:
    # Add the layer to the Lambda function
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer_version_arn]
    )
    print("Layer added to the Lambda function successfully.")

except ClientError as e:
    print(f"Error adding layer to Lambda function: {e.response['Error']['Message']}")
    
except Exception as e:
    print(f"An unexpected error occurred: {e}")

### 8. Create Streamlit Application
To showcase the interaction between doctors and the Knowledge Bases, we can develop a user-friendly web application using Streamlit, a popular open-source Python library for building interactive data apps. Streamlit provides a simple and intuitive way to create custom interfaces that can seamlessly integrate with the various AWS services involved in this solution.

Here are the details the application will need to run:

In [None]:
print("pool_id:", user_pool_id)
print("app_client_id:", client_id)
print("app_client_secret:", client_secret)
print("kb_id:", kb_id)
print("Lambda ARN:", lambda_arn)

Copy the values above into the application code in the cell below:

In [1]:
%%writefile app.py
import boto3
import json
import streamlit as st
from streamlit_cognito_auth import CognitoAuthenticator

pool_id = "<<POOL_ID>>"
app_client_id = "<<app_client_id>>"
app_client_secret = "<<app_client_secret>>"
kb_id = "<<kb_id>>"
lambda_function_arn = '<<lambda_function_arn>>'

authenticator = CognitoAuthenticator(
    pool_id=pool_id,
    app_client_id=app_client_id,
    app_client_secret= app_client_secret,
    use_cookies=False
)

is_logged_in = authenticator.login()

if not is_logged_in:
    st.stop()

def logout():
    authenticator.logout()

def get_user_sub(user_pool_id, username):
    """Return the Cognito 'sub' attribute for username, or None if the user does not exist."""
    cognito_client = boto3.client('cognito-idp')
    try:
        response = cognito_client.admin_get_user(
            UserPoolId=user_pool_id,
            Username=username
        )
    except cognito_client.exceptions.UserNotFoundException:
        print("User not found.")
        return None
    for attr in response['UserAttributes']:
        if attr['Name'] == 'sub':
            return attr['Value']
    return None

def get_patient_ids(doctor_id):
    """Return the patient IDs associated with doctor_id (an empty list if there are none)."""
    dynamodb = boto3.client('dynamodb')
    response = dynamodb.query(
        TableName='doctor_patient_list_association',
        KeyConditionExpression='doctor_id = :doctor_id',
        ExpressionAttributeValues={
            ':doctor_id': {'S': doctor_id}
        }
    )
    patient_id_list = []
    for item in response['Items']:
        patient_id_list.extend(patient_id['S'] for patient_id in item.get('patient_id_list', {}).get('L', []))
    return patient_id_list

def search_transcript(doctor_id, kb_id, text, patient_ids):
    """Invoke the Lambda function that queries the Knowledge Base with a patient filter."""
    lambda_client = boto3.client('lambda')

    # Payload for the Lambda function
    payload = json.dumps({
        "doctorId": doctor_id,
        "knowledgeBaseId": kb_id,
        "text": text,
        "patientIds": patient_ids
    }).encode('utf-8')

    try:
        # Invoke the Lambda function synchronously
        response = lambda_client.invoke(
            FunctionName=lambda_function_arn,
            InvocationType='RequestResponse',
            Payload=payload
        )
        response_payload = json.loads(response['Payload'].read().decode('utf-8'))

        # A 200 invocation status can still carry an unhandled function error
        if response['StatusCode'] == 200 and 'FunctionError' not in response:
            return response_payload
        return {'error': 'Failed to fetch data'}

    except Exception as e:
        # Handle exception
        return {'error': str(e)}

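sub = get_user_sub(pool_id, authenticator.get_username())
print(sub)
if sub is None:
    # Without a Cognito sub there is no doctor ID to filter on, so stop here
    st.error("Could not find the signed-in user in the Cognito user pool.")
    st.stop()
patient_ids = get_patient_ids(sub)
print(patient_ids)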

# Application Front

with st.sidebar:
    st.header("User Information")
    st.markdown("## Doctor")
    st.text(authenticator.get_username())
    st.markdown("## Doctor Id")
    st.text(sub)
    selected_patient = st.selectbox("Select a patient (or 'All' for all patients)", ['All'] + patient_ids)
    st.button("Logout", key="logout_btn", on_click=logout)

st.header("Transcript Search Tool")

# Text input for the search query
query = st.text_input("Enter your search query:")

if st.button("Search"):
    if query:
        # Search one selected patient, or all of the doctor's patients
        patient_ids_filter = [selected_patient] if selected_patient != 'All' else patient_ids
        results = search_transcript(sub, kb_id, query, patient_ids_filter)
        print(results)
        if 'error' in results:
            st.error(results['error'])
        elif results.get("body"):
            st.subheader("Search Results:")
            st.markdown(results["body"], unsafe_allow_html=True)
        else:
            st.write("No matching results found.")
    else:
        st.write("Please enter a search query.")

Overwriting app.py
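The app sends the doctor's ID and the selected patient IDs to the Lambda function, which is expected to turn them into a Knowledge Base metadata filter. The sketch below shows one way such a filter could be built for the `retrieve` API of `bedrock-agent-runtime`. It assumes the metadata files use the attribute keys `doctor_id` and `patient_id` (hypothetical names; use the keys written in the metadata association step), and the helper `build_access_filter` is likewise hypothetical. It also intersects the requested patient IDs with the doctor's own list, because the payload comes from the client and should not be trusted on its own.

```python
from typing import Any, Dict, List

# Hypothetical metadata keys; they must match the attributes in your metadata files
DOCTOR_KEY = "doctor_id"
PATIENT_KEY = "patient_id"


def build_access_filter(
    doctor_id: str,
    requested_patient_ids: List[str],
    allowed_patient_ids: List[str],
) -> Dict[str, Any]:
    """Build a Knowledge Base retrieval filter restricted to one doctor's patients."""
    if not doctor_id:
        raise ValueError("doctor_id is required")

    # Keep only patients that actually belong to this doctor, preserving request order
    allowed = set(allowed_patient_ids)
    patient_ids = [p for p in requested_patient_ids if p in allowed]
    if not patient_ids:
        raise ValueError("none of the requested patients belong to this doctor")

    # A result must come from this doctor AND from one of the permitted patients
    return {
        "andAll": [
            {"equals": {"key": DOCTOR_KEY, "value": doctor_id}},
            {"in": {"key": PATIENT_KEY, "value": patient_ids}},
        ]
    }
```

Inside a Lambda handler, the returned dictionary could be passed as `retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5, "filter": access_filter}}` in a `retrieve` call. Note that `andAll` expects at least two conditions, which is satisfied here by the doctor and patient checks.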


#### Run the Streamlit application locally
Execute the cell below to run the Streamlit application.

In [2]:
!streamlit run app.py


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8502
  Network URL: http://169.255.255.2:8502
  External URL: http://52.4.240.77:8502

^C
  Stopping...


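If you are running this notebook in SageMaker Studio, you can access the Streamlit application at the following URL. Replace `8501` with the port shown in the Streamlit output if it differs; the run above, for example, used `8502`.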

```
https://<<STUDIOID>>.studio.<<REGION>>.sagemaker.aws/jupyterlab/default/proxy/8501/
```
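The Studio URL follows a fixed pattern, so a small helper can assemble it from the Studio ID, the AWS Region, and the port Streamlit reports. This is a sketch: `studio_proxy_url` is a hypothetical helper, the ID and Region in the example are placeholders you must replace with your own values, and the default port assumes Streamlit started on 8501.

```python
def studio_proxy_url(studio_id: str, region: str, port: int = 8501) -> str:
    """Build the SageMaker Studio JupyterLab proxy URL for a locally running web app."""
    # Same pattern as the URL template above, including the trailing slash
    return (
        f"https://{studio_id}.studio.{region}.sagemaker.aws"
        f"/jupyterlab/default/proxy/{port}/"
    )


# Placeholder values, for illustration only
print(studio_proxy_url("example-studio-id", "us-east-1", 8502))
# → https://example-studio-id.studio.us-east-1.sagemaker.aws/jupyterlab/default/proxy/8502/
```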

### 9. Clean up
Run the following cell to delete the created resources and avoid unnecessary costs.

In [None]:
# Delete the Cognito user pool client app -- 
try:
    response = cognito_client.delete_user_pool_client(
        UserPoolId=user_pool_id,
        ClientId=client_id
    )
    print(f"User pool client app deleted: {response}")
except Exception as e:
    print(f"Error deleting user pool client app: {e}")

# Delete the Cognito user pool -- 
try:
    response = cognito_client.delete_user_pool(
        UserPoolId=user_pool_id
    )
    print(f"User pool deleted: {response}")
except Exception as e:
    print(f"Error deleting user pool: {e}")

# Delete all objects in the bucket (paginated, since list_objects_v2 returns at most 1,000 keys per call) --
try:
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            s3_client.delete_object(Bucket=bucket_name, Key=obj['Key'])
    print(f"All objects in {bucket_name} have been deleted.")
except Exception as e:
    print(f"Error deleting objects from {bucket_name}: {e}")

# Delete the bucket -- 
try:
    response = s3_client.delete_bucket(Bucket=bucket_name)
    print(f"Bucket {bucket_name} has been deleted.")
except Exception as e:
    print(f"Error deleting bucket {bucket_name}: {e}")

# Delete Knowledge Base -- 
try:
    response = bedrock_agent_client.delete_knowledge_base(knowledgeBaseId=kb_id)
    print(f"Knowledge Base {kb_id} has been deleted.")
except Exception as e:
    print(f"Error deleting Knowledge Base {kb_id}: {e}")

# Delete Opensearch Collection -- 
try:
    response = opensearch_client.delete_collection(id=collection_id)
    print(f"Collection {collection_id} has been deleted.")
except Exception as e:
    print(f"Error deleting Collection {collection_id}: {e}")

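# Delete Lambda Role --
# List the role's inline policies; fall back to an empty list so the rest of the cell keeps running
try:
    role_policies = iam_client.list_role_policies(RoleName=execution_role_name)['PolicyNames']
except Exception as e:
    print(f"Error listing role policies for {execution_role_name}: {e}")
    role_policies = []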

# Delete the role's policies -- 
for policy_name in role_policies:
    try:
        iam_client.delete_role_policy(RoleName=execution_role_name, PolicyName=policy_name)
        print(f"Deleted policy {policy_name} from role {execution_role_name}")
    except Exception as e:
        print(f"Error deleting policy {policy_name} from role {execution_role_name}: {e}")

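# Detach any managed policies, since IAM will not delete a role that still has them attached --
try:
    attached = iam_client.list_attached_role_policies(RoleName=execution_role_name)['AttachedPolicies']
    for policy in attached:
        iam_client.detach_role_policy(RoleName=execution_role_name, PolicyArn=policy['PolicyArn'])
        print(f"Detached policy {policy['PolicyArn']} from role {execution_role_name}")
except Exception as e:
    print(f"Error detaching managed policies from role {execution_role_name}: {e}")

# Delete the role --
try:
    iam_client.delete_role(RoleName=execution_role_name)
    print(f"Role {execution_role_name} deleted")
except Exception as e:
    print(f"Error deleting role {execution_role_name}: {e}")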

# Delete Lambda Function --
try:
    response = lambda_client.delete_function(FunctionName=lambda_arn)
    print(f"Lambda {lambda_arn} has been deleted.")
except Exception as e:
    print(f"Error deleting Lambda {lambda_arn}: {e}")

# Delete the DynamoDB table --

try:
    response = dynamodb_client.delete_table(TableName=table_name)
    print(f"Table {table_name} is being deleted...")
    waiter = dynamodb_client.get_waiter('table_not_exists')
    waiter.wait(TableName=table_name)
    print(f"Table {table_name} has been deleted.")
except Exception as e:
    print(f"Error deleting table {table_name}: {e}")