# <span style="color:DarkSeaGreen">Lab 1 - Knowledge Base</span>
*With Knowledge Bases for Amazon Bedrock, you can give FMs and agents contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG) to deliver more relevant, accurate, and customized responses*  

- This notebook creates the following:
  - S3 bucket
    - holds the source documents (Markdown files and their metadata JSON files)
    - used as the resource store for the knowledge base
  - IAM
    - roles
    - policies
  - S3 vector store
    - vector bucket
    - vector index that stores the embeddings
  - knowledge base
    - backed by the S3 vector index
    - supported data formats include .pdf, .txt, .md, .html, .doc and .docx, .csv, .xls, and .xlsx files
    - supporting metadata JSON files
- Includes clean-up cells to delete all of the above  

# <span style="color:DarkSeaGreen">Prepare Your Environment</span>
### Requirements for this Jupyter Notebook Lab if running in VSCode or equivalent local IDE
##### Note these are macOS specific
- Credentials
  - You need credentials to your AWS account to execute this Jupyter Lab if running locally from your laptop
    - Locally: Credentials, and therefore permissions, associated with the IAM user (with CLI access enabled) are provided by the `aws configure` connection to your AWS account
    - Cloud: Permissions provided via logged in user
- Installers:
  - Pip
    - Python libraries
    - Works inside Python envs
  - homebrew (brew) (mac)
    - System software, tools, and dependencies
    - Works at OS level

- Run the commands of the cell below in a terminal window to create a virtual environment if you need one
  - Note: check your Python version first, then, if it is OK, copy the rest and run it in a terminal window
  - Note: if you paste multiple lines and run them as one, you will get `zsh: command not found: #` errors because of the comments; you can ignore these
  - Remember to restart the kernel to pick up the new venv
  - The venv can be deleted via the last cell in this notebook if no longer needed
- If you already have a virtual environment, then just activate it as shown in the second cell below
  - Venv (can be created below) used by this notebook is *venv-agentcore*
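
Once the kernel has been restarted, a quick way to confirm the notebook is really running inside the venv is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the helper name `in_virtualenv` is made up for illustration and is not part of the lab.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Python version : {sys.version.split()[0]}")
print(f"Interpreter    : {sys.executable}")
print(f"Inside a venv  : {in_virtualenv()}")
```

If the last line prints `False`, the kernel selected for the notebook is probably not *venv-agentcore*.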

In [None]:
# Check your credentials (AWS identity) to confirm you are using the right credentials, can also run in a terminal window (remove the !)
!aws sts get-caller-identity

In [None]:
### STOP ###
### IF USING THIS NOTEBOOK IN AN AWS (eg SAGEMAKER) JUPYTER NOTEBOOK INSTANCE, THEN SKIP TO THE NEXT CELL ###
### OTHERWISE, IF USING VSCODE OR EQUIVALENT LOCAL IDE, THEN CONTINUE BELOW ###
### This script is for setting up your environment for the JumpStart Lab 1 ###
# do you need to upgrade python first? Your available version of Python is used to create the virtual environment
python3 --version

### STOP ###
### DO YOU NEED TO UPGRADE PYTHON ###
# upgrade to the latest version of python if required
brew install python
# restart vscode to pickup new version of python
python3 --version

### STOP ###
### OK IF YOU HAVE THE CORRECT VERSION OF PYTHON, CONTINUE ###
# create a virtual environment
python3 -m venv venv-agentcore
# activate the virtual environment
source venv-agentcore/bin/activate
### COPY TO HERE ONLY IF RUNNING AS ONE COPY AND PASTE ###

### STOP ###
### MAKE SURE ABOVE VENV GETS ACTIVATED BEFORE RUNNING THE REST ###
# upgrade pip
pip install --upgrade pip
# jupyter kernel support
pip install ipykernel
# add the virtual environment to jupyter
python  -m ipykernel install --user --name=venv-agentcore --display-name "Python (venv-agentcore)"
# install the required packages - may need to specify the path here if not in the correct folder in terminal window
pip install -r requirements_lab1.txt
# pip install -r Documents/github/labs-sagemaker/jumpstart/requirements_lab1.txt
# verify the installation
pip list

### RESTART VSCODE TO PICKUP THE NEW VENV ###

In [None]:
### STOP ###
### This command is for activating an environment that already exists; it is for use in a terminal window if you need it ###
source venv-agentcore/bin/activate
pip list

# use pip freeze if you prefer a requirements-style format
### ALSO MAKE SURE YOU SELECT IT AS YOUR KERNEL FOR THIS JUPYTER NOTEBOOK ###

In [None]:
### STOP ###
### IF USING THIS NOTEBOOK IN AN AWS (eg SAGEMAKER) JUPYTER NOTEBOOK INSTANCE, THEN EXECUTE THIS CELL ###
!pip install --upgrade pip

# Lab 1 Starts Here!

# <span style="color:DarkSeaGreen">Setup</span>

In [None]:
import random

# region - we use us-east-1 because Bedrock availability is limited in other regions
myRegion='us-east-1'

# bucket - MUST BE A UNIQUE NAME
myBucket='doit-agentcore-bucket-' + str(random.randint(0, 1000)) + '-' + str(random.randint(0, 1000))
# iam
myRoleKB="doit-agentcore-kb-execution-role"
myPolicyKB1="doit-agentcore-kb-fm-model-policy"
myPolicyKB3="doit-agentcore-kb-s3vector-policy"
myRoleKBARN='RETRIEVED FROM ROLE BELOW ONCE CREATED'

# knowledge base
myVectorIndex='doit-agentcore-kb-embeddings-index'
myVectorIndexARN='RETRIEVED BELOW ONCE QUERIED'
myVectorBucket='doit-agentcore-kb-embeddings-' + str(random.randint(0, 1000)) + '-' + str(random.randint(0, 1000))
myVectorBucketARN='RETRIEVED BELOW ONCE QUERIED'
myKB='doit-agentcore-kb'
myKBdatasource='doit-agentcore-kb-crypto'

# knowledge base models we will use
myEmbeddingModel='amazon.titan-embed-text-v2:0'
myQueryingModel='amazon.nova-pro-v1:0'
myEmbeddingModelARN='RETRIEVED BELOW ONCE QUERIED'
myQueryingModelARN='RETRIEVED BELOW ONCE QUERIED'

print (f'Make sure you have requested model access via the AWS console to your selected models:\n {myEmbeddingModel} and {myQueryingModel}')
print ('✅ Done! Move to the next cell ->')
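
The bucket names above rely on two random numbers between 0 and 1000, which can still collide with an existing bucket. As a hedged alternative, the sketch below derives the suffix from a UUID and checks the result against a simplified naming rule (3-63 characters, lowercase letters, digits and hyphens). The helper `unique_bucket_name` and the simplified rule are assumptions for illustration, not something the lab requires.

```python
import re
import uuid

# Assumed, simplified rule: 3-63 characters, lowercase letters, digits and
# hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def unique_bucket_name(prefix: str) -> str:
    """Return '<prefix>-<12 hex chars>' or raise if the name looks invalid."""
    suffix = uuid.uuid4().hex[:12]
    name = f"{prefix}-{suffix}".lower()
    if not _BUCKET_NAME.fullmatch(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


example = unique_bucket_name("doit-agentcore-bucket")
print(example)
```

In the Setup cell this could replace the `random.randint` expressions, e.g. `myBucket = unique_bucket_name('doit-agentcore-bucket')`.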

In [None]:
import boto3
from certifi import where
import json

# Configure boto3 to use certifi's certificates
sts_client = boto3.client('sts', verify=where())
myAccountNumber = sts_client.get_caller_identity()["Account"]
print(myAccountNumber)
print(sts_client.get_caller_identity()["Arn"])

print ('✅ Done! Move to the next cell ->')

In [None]:
# s3
s3 = boto3.client('s3', region_name=myRegion, verify=where())
s3vectors = boto3.client('s3vectors', region_name=myRegion, verify=where())

# rds
rds = boto3.client('rds', region_name=myRegion, verify=where())
rdsData = boto3.client('rds-data', region_name=myRegion, verify=where())
# iam
iam = boto3.client('iam', region_name=myRegion, verify=where())
# secrets manager
secrets = boto3.client('secretsmanager', region_name=myRegion, verify=where())
# logs (cloudwatch)
logs = boto3.client('logs', region_name=myRegion, verify=where())
# bedrock
bedrockChk = boto3.client(service_name='bedrock', region_name=myRegion, verify=where())
bedrockKB = boto3.client(service_name='bedrock-agent', region_name=myRegion, verify=where())
bedrockKBRun = boto3.client(service_name='bedrock-agent-runtime', region_name=myRegion, verify=where())
bedrockRun = boto3.client(service_name='bedrock-runtime', region_name=myRegion, verify=where())

print ('✅ Done! Move to the next cell ->')

-  <span style="color:greenyellow">REMEMBER TO CHECK THIS PATH TO THE RESOURCES!</span>
-  <span style="color:greenyellow">IF IN AWS JUPYTER MAKE SURE THE 2ND IS UNCOMMENTED</span>

In [None]:
# local client path for resources
myLocalPathForDataSources='/Users/simondavies/Documents/GitHub/labs-bedrock/agentcore/resources/kb-datasource/'
# jupyter notebook path if notebook is used in AWS for example
#myLocalPathForDataSources='/home/ec2-user/SageMaker/labs-bedrock/agentcore/resources/kb-datasource/'

print ('✅ Done! Move to the next cell ->')

In [None]:
# define tags added to all services we create
myTags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit_agentcore_lab"},
    {"Key": "project", "Value": "doit_agentcore_crypto"},
    {"Key": "author", "Value": "simon"},
]
myTagsDct = {
    "env": "non_prod",
    "owner": "doit_agentcore_lab",
    "project": "doit_agentcore_crypto",
    "author": "simon",
}

print ('✅ Done! Move to the next cell ->')
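
`myTags` and `myTagsDct` hold the same information in two shapes, so they can drift apart if only one is edited. A small sketch of deriving one from the other is shown here; the helper names `tags_to_dict` and `dict_to_tags` are hypothetical.

```python
def tags_to_dict(tag_list):
    """Convert [{'Key': k, 'Value': v}, ...] into {k: v, ...}."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}


def dict_to_tags(tag_dict):
    """Convert {k: v, ...} into [{'Key': k, 'Value': v}, ...]."""
    return [{"Key": key, "Value": value} for key, value in tag_dict.items()]


# Same tag values as the cell above, repeated so the sketch is self-contained.
sample_tags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit_agentcore_lab"},
]
sample_dict = tags_to_dict(sample_tags)
print(sample_dict)
```

With this in place, `myTagsDct = tags_to_dict(myTags)` keeps a single source of truth.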

# <span style="color:DarkSeaGreen">S3</span>
- defaults used, will use sse-s3 encryption and block public access

In [None]:
# create bucket
if myRegion=='us-east-1':
    s3.create_bucket(
        Bucket=myBucket
    )
else:
    s3.create_bucket(
        Bucket=myBucket, CreateBucketConfiguration={"LocationConstraint": myRegion}
    )

s3.put_bucket_tagging(Bucket=myBucket, Tagging={"TagSet": myTags})

# create a "folder" - really keys as S3 is flat
s3.put_object(Bucket=myBucket, Key="crypto/")

print ('✅ Done! Move to the next cell ->')

- upload the resource files to S3 that will be used to create the knowledge base
  - includes metadata file
  - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html#kb-ds-metadata
  - If you're adding metadata to a vector index in an Amazon Aurora database cluster, you must add a column to the table for each metadata attribute in your metadata files before starting ingestion. The metadata attribute values will be written to these columns.

In [None]:
# Upload each file to the S3 bucket
files = [
    {
        's3key': 'crypto/Crypto Bubble.md',
        'localpath': '{}Crypto Bubble.md'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Crypto Bubble.md.metadata.json',
        'localpath': '{}Crypto Bubble.md.metadata.json'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Crypto Scams.md',
        'localpath': '{}Crypto Scams.md'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Crypto Scams.md.metadata.json',
        'localpath': '{}Crypto Scams.md.metadata.json'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Finding Crypto To Invest In.md',
        'localpath': '{}Finding Crypto To Invest In.md'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Finding Crypto To Invest In.md.metadata.json',
        'localpath': '{}Finding Crypto To Invest In.md.metadata.json'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Mechanics of Cryptocurrency.md',
        'localpath': '{}Mechanics of Cryptocurrency.md'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Mechanics of Cryptocurrency.md.metadata.json',
        'localpath': '{}Mechanics of Cryptocurrency.md.metadata.json'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Token Supply.md',
        'localpath': '{}Token Supply.md'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Token Supply.md.metadata.json',
        'localpath': '{}Token Supply.md.metadata.json'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Verifying Token Legitimacy.md',
        'localpath': '{}Verifying Token Legitimacy.md'.format(myLocalPathForDataSources)
    },
    {
        's3key': 'crypto/Verifying Token Legitimacy.md.metadata.json',
        'localpath': '{}Verifying Token Legitimacy.md.metadata.json'.format(myLocalPathForDataSources)
    }
]

for file in files:
    print ('uploading: {}'.format(file['s3key']))
    s3.upload_file(file['localpath'], myBucket, file['s3key'], ExtraArgs={'StorageClass': 'STANDARD'})
    print ('uploaded: {}'.format(file['s3key']))

print ('✅ Done! Move to the next cell ->')
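
Each document in the list above appears twice, once for the `.md` file and once for its `.metadata.json` companion. The sketch below builds the same kind of list from the document names alone. `build_upload_manifest` is a hypothetical helper, and it assumes every document has a metadata file sitting next to it.

```python
import os


def build_upload_manifest(local_dir, doc_names, prefix="crypto/"):
    """Return upload entries for each document and its metadata file."""
    manifest = []
    for name in doc_names:
        # Every document is paired with '<name>.metadata.json'.
        for filename in (name, f"{name}.metadata.json"):
            manifest.append(
                {
                    "s3key": f"{prefix}{filename}",
                    "localpath": os.path.join(local_dir, filename),
                }
            )
    return manifest


doc_names = ["Crypto Bubble.md", "Crypto Scams.md", "Token Supply.md"]
manifest = build_upload_manifest("/tmp/kb-datasource/", doc_names)
for entry in manifest:
    print(entry["s3key"])
```

The resulting entries have the same `s3key` and `localpath` fields the upload loop already expects, so `files = build_upload_manifest(myLocalPathForDataSources, doc_names)` would be a drop-in replacement.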

# <span style="color:DarkSeaGreen">S3 Vector Store</span>
- S3 vector store for kb

In [None]:
# Create vector bucket
response = s3vectors.create_vector_bucket(
    vectorBucketName=myVectorBucket,
    encryptionConfiguration={
        'sseType': 'AES256' 
    }
)
response = s3vectors.get_vector_bucket(
    vectorBucketName=myVectorBucket
)
myVectorBucketARN = response['vectorBucket']['vectorBucketArn']

print ('✅ Done! Move to the next cell ->')

- In the create index below, ensure you include any keys you have created in your metadata descriptions of your knowledge base datasources
- embedding dimension (typically 1024) may need to change according to the embedding model used 
  - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html

| Model                      | Dimensions              |
|---------------------------|-------------------------|
| Titan G1 Embeddings - Text | 1,536                   |
| Titan V2 Embeddings - Text | 1,024, 512, and 256     |
| Cohere Embed English       | 1,024                   |
| Cohere Embed Multilingual  | 1,024                   |
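
The table can be turned into a small lookup so a mismatch between the index dimension and the embedding model is caught before the index is created. This is a sketch under assumptions: the model IDs are my mapping of the table's model names to Bedrock identifiers, and `check_dimension` and `titan_v2_body` are hypothetical helpers. The `dimensions` and `normalize` fields are request options of Titan Text Embeddings V2.

```python
import json

# Dimensions from the table above, keyed by (assumed) Bedrock model ID.
SUPPORTED_DIMENSIONS = {
    "amazon.titan-embed-text-v1": {1536},
    "amazon.titan-embed-text-v2:0": {1024, 512, 256},
    "cohere.embed-english-v3": {1024},
    "cohere.embed-multilingual-v3": {1024},
}


def check_dimension(model_id, dimension):
    """Return the dimension if the model supports it, else raise ValueError."""
    allowed = SUPPORTED_DIMENSIONS.get(model_id)
    if allowed is None:
        raise ValueError(f"unknown embedding model: {model_id}")
    if dimension not in allowed:
        raise ValueError(f"{model_id} supports {sorted(allowed)}, not {dimension}")
    return dimension


def titan_v2_body(text, dimension=1024):
    """Build an invoke_model request body for Titan Text Embeddings V2."""
    check_dimension("amazon.titan-embed-text-v2:0", dimension)
    return json.dumps({"inputText": text, "dimensions": dimension, "normalize": True})


print(titan_v2_body("hello", 512))
```

Whatever dimension is chosen here has to match the `dimension` passed to `create_index`.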

In [None]:
# Create vector index in the bucket
response = s3vectors.create_index(
    vectorBucketArn=myVectorBucketARN,
    indexName=myVectorIndex,
    dataType='float32',
    dimension=1024,  # Match your embedding model
    distanceMetric='cosine',  # or 'euclidean'
    metadataConfiguration={
        'nonFilterableMetadataKeys': ['source_text']  # Optional
    }
)

response = s3vectors.get_index(
    vectorBucketName=myVectorBucket,
    indexName=myVectorIndex
)
myVectorIndexARN = response['index']['indexArn']

print("✅ Done! Move to the next cell ->")


In [None]:
# Read markdown files from S3 bucket and embed into S3 Vectors
# List all markdown files from S3 bucket where we uploaded them
response = s3.list_objects_v2(Bucket=myBucket, Prefix="crypto")
md_files = [
    obj["Key"] for obj in response.get("Contents", []) if obj["Key"].endswith(".md")
]

vectors_to_insert = []

# Process each file
for md_file in md_files:
    print(f"Processing: {md_file}")

    # Read markdown content
    md_obj = s3.get_object(Bucket=myBucket, Key=md_file)
    text_content = md_obj["Body"].read().decode("utf-8")

    # Read metadata if exists
    metadata_file = f"{md_file}.metadata.json"
    metadata = {}
    try:
        meta_obj = s3.get_object(Bucket=myBucket, Key=metadata_file)
        metadata = json.loads(meta_obj["Body"].read().decode("utf-8"))
        print(f"  Found metadata: {metadata_file}")
    except s3.exceptions.NoSuchKey:
        print("  No metadata found")

    # Generate embedding
    response = bedrockRun.invoke_model(
        modelId=myEmbeddingModel, body=json.dumps({"inputText": text_content})
    )
    embedding = json.loads(response["body"].read())["embedding"]

    # Prepare vector with metadata
    vector_data = {
        "key": md_file,
        "data": {"float32": embedding},
        "metadata": {
            "source_text": text_content,
            **metadata.get("metadataAttributes", {}),
        },
    }
    vectors_to_insert.append(vector_data)

# Insert all vectors
s3vectors.put_vectors(
    indexArn=myVectorIndexARN, vectors=vectors_to_insert
)

print(f"✅ Embedded {len(vectors_to_insert)} documents into S3 Vectors")
print("✅ Done! Move to the next cell ->")
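
The cell above embeds each file as a single vector, which works for short documents but can exceed the embedding model's input limit on longer ones. A common alternative is to split the text into overlapping chunks and embed each chunk separately. The sketch below shows only the splitting step; `chunk_text`, the chunk sizes and the `#chunk-N` key format are illustrative assumptions, not part of the lab.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into chunks of up to max_chars that overlap by `overlap`."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    if not text:
        return []

    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        # Stop once this chunk reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += step
    return chunks


for i, chunk in enumerate(chunk_text("abcdefghij", max_chars=4, overlap=1)):
    # Hypothetical vector key: one key per chunk of the source file.
    print(f"crypto/Example.md#chunk-{i}", chunk)
```

Each chunk would then get its own embedding and its own entry in `vectors_to_insert`, with the chunk text stored in `source_text`.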

In [None]:
# test you can query the vectors
query_text = "What are the risks of investing in cryptocurrency?"

# Generate query embedding
response = bedrockRun.invoke_model(
    modelId=myEmbeddingModel,
    body=json.dumps({'inputText': query_text})
)

# Extract embedding from response
query_embedding = json.loads(response['body'].read())['embedding']

# Query vectors
response = s3vectors.query_vectors(
    vectorBucketName=myVectorBucket,
    indexName=myVectorIndex,
    queryVector={'float32': query_embedding},
    topK=3,
    returnDistance=True,
    returnMetadata=True
)

print(json.dumps(response["vectors"], indent=2))
print("✅ Done! Move to the next cell ->")
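
The query returns a `distance` for each match because `returnDistance=True`. To help read those numbers, the sketch below computes a cosine distance in plain Python, assuming the usual definition of 1 minus cosine similarity, so 0 means same direction and larger values mean less similar. Whether the service uses exactly this formula is an assumption.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Under that definition, the first results printed by the query cell (smallest distance) are the closest matches to the question.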

# <span style="color:DarkSeaGreen">IAM</span>

- bedrock iam
  - https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-rds

In [None]:
# define kb-fm-model-policy json
policyJson = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:ListFoundationModels",
                "bedrock:ListCustomModels"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:{}::foundation-model/{}".format(myRegion, myEmbeddingModel)
            ]
        }
    ]
}

# create kb-fm-model-policy policy
policy1 = iam.create_policy(
    PolicyName=myPolicyKB1,
    PolicyDocument=json.dumps(policyJson),
    Description="Policy allowing Bedrock KB to use the specified foundation model",
    Tags=[
        *myTags,
    ],
)

# define kb-s3-vector-policy json - a different vector store will need a different policy
policyJson = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3vectors:*"
            ],
            "Resource": [
                myVectorBucketARN, 
                myVectorIndexARN
            ]
        }
    ]
}

# create kb-s3vector-policy policy
policy3 = iam.create_policy(
    PolicyName=myPolicyKB3,
    PolicyDocument=json.dumps(policyJson),
    Description="Policy allowing Bedrock KB to use s3 vector store as its vector database",
    Tags=[
        *myTags,
    ],
)

# trust policy for the role
roleTrust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "{}".format(myAccountNumber)
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:bedrock:{}:{}:knowledge-base/*".format(myRegion, myAccountNumber)
                }
            }
        }
    ],
}

# create role
role = iam.create_role(
    RoleName=myRoleKB,
    AssumeRolePolicyDocument=json.dumps(roleTrust),
    Description="Service role for Bedrock Knowledge Base use",
    Tags=[
        *myTags,
    ],
)

# attach policies to role
iam.attach_role_policy(
    RoleName=role["Role"]["RoleName"], PolicyArn=policy1["Policy"]["Arn"]
)
iam.attach_role_policy(
    RoleName=role["Role"]["RoleName"], PolicyArn=policy3["Policy"]["Arn"]
)
myRoleKBARN = role['Role']['Arn']

print ('✅ Done! Move to the next cell ->')
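
The role above can invoke the embedding model and use the vector store, but it has no permission to read the source bucket. That is fine while the notebook embeds the files itself. If the bucket is later attached to the knowledge base as a data source, the role would also need read access. The sketch below builds such a policy document; the helper name and statement IDs are made up, and the statement layout is my assumption of a least-privilege read policy.

```python
import json


def kb_s3_datasource_policy(bucket_name, account_id, prefix=""):
    """Build a read-only policy document for a knowledge base data source bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    same_account = {"StringEquals": {"aws:ResourceAccount": account_id}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListDataSourceBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
                "Condition": same_account,
            },
            {
                "Sid": "ReadDataSourceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/{prefix}*"],
                "Condition": same_account,
            },
        ],
    }


# Placeholder bucket name and account number, for illustration only.
policy_doc = kb_s3_datasource_policy("example-bucket", "123456789012", prefix="crypto/")
print(json.dumps(policy_doc, indent=2))
```

The document could be passed to `iam.create_policy(PolicyDocument=json.dumps(policy_doc), ...)` and attached to the role in the same way as the two policies above.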

# <span style="color:DarkSeaGreen">Knowledge Base</span>
Create the knowledge base
* find the embedding model ARN
* find the model to use for KB-generated responses
* use the IAM role created above
* use the S3 vector bucket and index created above
* create the knowledge base
* wait for the knowledge base to become ACTIVE

- find an embedding model to use - this will be used to create the kb

In [None]:
# find the arn of the embedding model we need (this model converts your data into vectors)
# We will be using Titan Text Embeddings V2 (Cohere Embed is also available as an embedding model for KBs)
# look in the list to get the ARN of the model we want to use
# use in the bedrockKB.create_knowledge_base if we create the kb via code

# this lists all models based on the filter
response = bedrockChk.list_foundation_models(
    byProvider='Amazon',
    byOutputModality='EMBEDDING',
    byInferenceType='PROVISIONED'
)

# but we know what we want so let's just find it so we can get the arn
response = bedrockChk.get_foundation_model(modelIdentifier=myEmbeddingModel)
myEmbeddingModelARN=response['modelDetails']['modelArn']

print('Embedding model ARN: {}'.format(myEmbeddingModelARN))
print ('✅ Done! Move to the next cell ->')

- find a foundation model to use - this will be used when we want to query the kb

In [None]:
# find the arn of the model to use for kb generated responses (parses the data retrieved from the knowledge base)
# look in the list to get the ARN of the model we want to use
# use in the bedrockKBRun.retrieve_and_generate when you query the kb

# this lists all models based on the filter
response = bedrockChk.list_foundation_models(
    byProvider='Amazon',
    byOutputModality='TEXT',
    byInferenceType='ON_DEMAND'
)

# but we know what we want so let's just find it so we can get the arn
response = bedrockChk.get_foundation_model(modelIdentifier=myQueryingModel)
myQueryingModelARN=response['modelDetails']['modelArn']

print('Querying model ARN: {}'.format(myQueryingModelARN))
print ('✅ Done! Move to the next cell ->')

- create the knowledge base

In [None]:
# Create Knowledge Base with S3 Vectors
response = bedrockKB.create_knowledge_base(
    name=myKB,
    description='Crypto knowledge base using S3 Vectors',
    roleArn=myRoleKBARN,
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': myEmbeddingModelARN
        }
    },
    storageConfiguration={
        'type': 'S3_VECTORS',
        's3VectorsConfiguration': {
            'vectorBucketArn': myVectorBucketARN,
            'indexArn': myVectorIndexARN
        }
    }
)
myKBid=response['knowledgeBase']['knowledgeBaseId']

print (f'Knowledge Base ID: {myKBid}')
print ('✅ Done! Move to the next cell ->')

In [None]:
import time

timeout = 300  # total seconds to wait
interval = 10  # seconds between checks
start_time = time.time()

while True:
    kb = bedrockKB.get_knowledge_base(knowledgeBaseId=myKBid)['knowledgeBase']
    status = kb['status']
    print(f"Current KB status: {status}")
    
    if status == 'ACTIVE':
        print('✅ Done! Move to the next cell ->')
        break
    
    if time.time() - start_time > timeout:
        print("⏰ Timeout reached. KB is still not ACTIVE.")
        break
    
    time.sleep(interval)
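
The polling loop above can be factored into a reusable helper that also stops early on a failure status and raises on timeout instead of only printing. This is a minimal sketch: `wait_for_status` is a hypothetical helper, and the status callable is injected so the logic can be exercised without calling AWS.

```python
import time


def wait_for_status(get_status, target="ACTIVE", failed=("FAILED",),
                    timeout=300, interval=10,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns target, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        print(f"Current status: {status}")
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"resource entered status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)


# Simulated status sequence standing in for get_knowledge_base calls.
fake_statuses = iter(["CREATING", "CREATING", "ACTIVE"])
result = wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)
```

Against the real service, the callable would be something like `lambda: bedrockKB.get_knowledge_base(knowledgeBaseId=myKBid)['knowledgeBase']['status']`.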


In [None]:
print(f"You will need this KB ID in the next lab, make a note of it: {myKBid}")
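
This notebook fills the vector index directly and never attaches a data source to the knowledge base, even though `myKBdatasource` is defined in Setup. If you want Bedrock to ingest and sync the S3 files itself, a data source is needed. The sketch below only builds the request for that call. `s3_data_source_request` is a hypothetical helper, the role would also need read access to the bucket, and mixing managed ingestion with the manually inserted vectors has not been verified here.

```python
def s3_data_source_request(kb_id, name, bucket_name, prefix="crypto/"):
    """Build keyword arguments for bedrock-agent create_data_source (S3 type)."""
    return {
        "knowledgeBaseId": kb_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {
                "bucketArn": f"arn:aws:s3:::{bucket_name}",
                # Only objects under this prefix are ingested.
                "inclusionPrefixes": [prefix],
            },
        },
    }


# Placeholder knowledge base ID and bucket name, for illustration only.
request = s3_data_source_request("KB1234567890", "doit-agentcore-kb-crypto", "example-bucket")
print(request)

# Possible usage in this notebook (not executed here):
#   ds = bedrockKB.create_data_source(**request)
#   bedrockKB.start_ingestion_job(
#       knowledgeBaseId=request["knowledgeBaseId"],
#       dataSourceId=ds["dataSource"]["dataSourceId"],
#   )
```

The data source ID returned by `create_data_source` is the value the clean-up section refers to as `myDatasourceId`.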

# <span style="color:DarkSeaGreen">Example Use of Knowledge Base</span>
- the following code can be used in your projects to invoke the knowledge base we just created  
  - this is just using the bedrock knowledge base run api
  - there is NO strands SDK in use
  - there is NO agent being used
  
<br>

*You are able to query the knowledge base in the following ways*  
<br>  
1. Retrieve - query a knowledge base and only return relevant text from data sources.  
2. RetrieveAndGenerate - query a knowledge base and use a foundation model to generate responses based off the results from the data sources.  
https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-api-query.html#w116aac45c37c35c11
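
The cell further down uses RetrieveAndGenerate. For the first option, Retrieve, the request has the shape sketched below. `retrieve_request` is a hypothetical helper and the metadata key `topic` is only an example; use a key that exists in your own metadata files.

```python
def retrieve_request(kb_id, query, top_k=3, metadata_filter=None):
    """Build keyword arguments for bedrock-agent-runtime retrieve."""
    vector_search = {"numberOfResults": top_k}
    if metadata_filter is not None:
        # Optional metadata filter, e.g. {"equals": {"key": ..., "value": ...}}
        vector_search["filter"] = metadata_filter
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": vector_search},
    }


# Placeholder knowledge base ID and metadata key, for illustration only.
request = retrieve_request(
    "KB1234567890",
    "What are the risks of investing in cryptocurrency?",
    top_k=3,
    metadata_filter={"equals": {"key": "topic", "value": "scams"}},
)
print(request)

# Possible usage in this notebook (not executed here):
#   results = bedrockKBRun.retrieve(**request)["retrievalResults"]
```

Retrieve returns only the matching text chunks with their scores and locations; no foundation model is invoked to write an answer.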

Start querying!
- If you get a response stating that it can't answer the question, etc., the index is probably still propagating

In [None]:
# NOTE good examples of use of the KB
promptkb='What are the risks of investing in cryptocurrency?'
#promptkb='How do I choose which coins to invest in?'

response = bedrockKBRun.retrieve_and_generate(
    input={
        'text': promptkb,
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': myKBid,
            'modelArn': myQueryingModelARN
        }
    }
)

print("GENERATED RESPONSE:\n{}".format(response['output']['text']))
print("---------------------------------------\n")

# A list of segments of the generated response that are based on sources in the knowledge base
numCitations=len(response.get('citations', []))
print("NUMBER OF CITATIONS: {}".format(numCitations))
print("---------------------------------------\n")

ic=0
while ic <= numCitations-1:
    print("CITATION: {}".format(ic+1))
    print("---------------------------------------")

    numReferences = len(response['citations'][ic].get('retrievedReferences', []))
    print("   NUMBER OF REFERENCES FOR CITATION {}: {}".format(ic+1, numReferences))
    print("   ---------------------------------------")

    print("   GENERATED TEXT: {}".format(response['citations'][ic]['generatedResponsePart']['textResponsePart']['text']))
    print("   ---------------------------------------")

    ir=0
    while ir <= numReferences-1:
        print("   REFERENCE: {}".format(ir+1))
        print("   ---------------------------------------")

        # reference cited text used
        print("      CITED TEXT: {}".format(response['citations'][ic]['retrievedReferences'][ir]['content']))
        print("      ---------------------------------------")

        # json metadata used as a filter
        print("      METADATA USED: {}".format(response['citations'][ic]['retrievedReferences'][ir]['metadata']))
        print("      ---------------------------------------")

        # data source s3 file
        print("      S3 FILE: {}".format(response['citations'][ic]['retrievedReferences'][ir]['location']))
        print("      ---------------------------------------")

        ir +=1

    ic +=1
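
The nested `while` loops above can also be written as a small function that flattens the citations into one row per reference, which is easier to reuse in your own projects. This is a sketch: `flatten_citations` and its output field names are made up, and it reads the same response fields as the cell above.

```python
def flatten_citations(response):
    """Return one dict per retrieved reference in a retrieve_and_generate response."""
    rows = []
    for c_idx, citation in enumerate(response.get("citations", []), start=1):
        generated = (
            citation.get("generatedResponsePart", {})
            .get("textResponsePart", {})
            .get("text", "")
        )
        references = citation.get("retrievedReferences", [])
        for r_idx, ref in enumerate(references, start=1):
            rows.append(
                {
                    "citation": c_idx,
                    "reference": r_idx,
                    "generated_text": generated,
                    "cited_text": ref.get("content", {}).get("text", ""),
                    "metadata": ref.get("metadata", {}),
                    "s3_uri": ref.get("location", {}).get("s3Location", {}).get("uri"),
                }
            )
    return rows


# Possible usage in this notebook (not executed here):
#   for row in flatten_citations(response):
#       print(row["citation"], row["reference"], row["s3_uri"])
```

Missing keys fall back to empty values, so a response without citations simply yields an empty list.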

# <span style="color:DarkSeaGreen">Move to Lab 2</span>
# <span style="color:DarkSeaGreen">OR...</span>
# <span style="color:DarkSeaGreen">Clean Up Architecture</span>
### <span style="color:Red">Only do this if you have finished with this lab and any labs that depend on it!</span>
##### It will delete all architecture created, make sure you no longer need any of it!!!

In [None]:
# NOTE STOP STOP
# NOTE only run this if you have lost the contents of your variables!!
# NOTE if you have lost the kernel, you will need to manually get the dataSourceId, knowledgeBaseId and bucket names from your account
myKBid='DO NOT RUN UNLESS YOU HAVE LOST THE VARIABLE - GET FROM YOUR ACCOUNT'
myDatasourceId='DO NOT RUN UNLESS YOU HAVE LOST THE VARIABLE - GET FROM YOUR ACCOUNT'
myBucket='DO NOT RUN UNLESS YOU HAVE LOST THE VARIABLE - GET FROM YOUR ACCOUNT'
myVectorBucket='DO NOT RUN UNLESS YOU HAVE LOST THE VARIABLE - GET FROM YOUR ACCOUNT'

- Start deleting from here - don't need to run the above if you still have kernel variables populated

In [None]:
# delete knowledge base
bedrockKB.delete_knowledge_base(
    knowledgeBaseId=myKBid
)

In [None]:
# can take about a minute to delete the kb
try:
    print(bedrockKB.get_knowledge_base(knowledgeBaseId=myKBid)['knowledgeBase']['status'])
except bedrockKB.exceptions.ResourceNotFoundException:
    print("Deleted!")

In [None]:
# delete s3 vector store index
s3vectors.delete_index(
    vectorBucketName=myVectorBucket,
    indexName=myVectorIndex
)

In [None]:
# delete vector bucket
s3vectors.delete_vector_bucket(
    vectorBucketName=myVectorBucket
)

In [None]:
# delete roles and policies
iam.detach_role_policy(
    RoleName=myRoleKB, PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyKB1)
)
iam.detach_role_policy(
    RoleName=myRoleKB, PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyKB3)
)
iam.delete_role(RoleName=myRoleKB)
iam.delete_policy(PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyKB1))
iam.delete_policy(PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyKB3))

In [None]:
# delete s3 bucket
# NOTE WARNING - this will delete all objects in the bucket with NO prompt or confirmation
s3r = boto3.resource('s3', region_name=myRegion, verify=where())
bucket = s3r.Bucket(myBucket)
bucket.objects.all().delete()

# delete the bucket
response = s3.delete_bucket(Bucket=myBucket)

# <span style="color:DarkSeaGreen">Clean Up venv</span>
### Clean up if finished with this lab and running in VSCode or equivalent local IDE
#### Note these are macOS specific
- Run the commands of the cell below in a terminal window if you need to clean up a local venv
  - Note: if you copy and paste the entire cell and run it as one, you will get `zsh: command not found: #` errors because of the comments; you can ignore these
  - Remember to restart the kernel to refresh what's available

In [None]:
# if you have local host in your terminal prompt
unset HOST
# deactivate the venv
deactivate 
# remove it and its contents if not needed
rm -rf venv-agentcore 