# Creating a vectorstore with Amazon Bedrock multimodal-embeddings

This notebook is a step-by-step tutorial for populating a vector database in [OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/). The Bedrock Agent uses the stored image embeddings to search the vectorstore for similar images.

This notebook is required only if you would like the agent to be able to take the `/image_look_up` action; otherwise, you can run the `Create_Fashion_Agent.ipynb` notebook directly.

### Environment setup

This notebook has been tested with the `conda_python3` Jupyter kernel on an `ml.t3.medium` instance.

### Prerequisite

Ensure you have an AWS account with permission to:

- Create security policies, access policies, collections, indexes, and index mappings on OpenSearch Serverless

- BatchGetCollection on OpenSearchServerless
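The permissions listed above could be expressed as an IAM policy document along the following lines. This is only a sketch under assumptions: `build_prerequisite_policy` is a hypothetical helper, the action list is our best mapping of the prerequisites to `aoss:` IAM actions, and `Resource` is left as `*` for brevity. Index-level operations are additionally governed by the collection's data access policy, so check the result against your own account setup.

```python
import json


def build_prerequisite_policy() -> dict:
    """Hypothetical helper: IAM policy covering the prerequisite operations."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "aoss:CreateSecurityPolicy",  # encryption and network policies
                    "aoss:CreateAccessPolicy",    # data access policy
                    "aoss:CreateCollection",
                    "aoss:BatchGetCollection",
                    "aoss:APIAccessAll",          # data-plane calls (index, mapping, ingest)
                ],
                "Resource": "*",  # assumption: narrow this to specific collections in practice
            }
        ],
    }


prerequisite_policy = build_prerequisite_policy()
print(json.dumps(prerequisite_policy, indent=2))
```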

#### Install the requirements

In [1]:
!pip install --quiet opensearch-py
!pip install --quiet requests_aws4auth

#### Download the dataset locally

In [3]:
!git clone https://github.com/orbitalsonic/Fashion-Dataset-Images-Western-Dress.git

Cloning into 'Fashion-Dataset-Images-Western-Dress'...
remote: Enumerating objects: 594, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 594 (delta 1), reused 1 (delta 0), pack-reused 589 (from 1)
Receiving objects: 100% (594/594), 118.65 MiB | 35.36 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Updating files: 100% (584/584), done.


### Add all the dependencies/imports

In [3]:
import os
import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection
from dependencies.opensearch_utils import OpensearchIngestion
from dependencies.build_infrastructure_aoss import (
    createEncryptionPolicy, 
    createNetworkPolicy, 
    createAccessPolicy, 
    createCollection, 
    waitForCollectionCreation 
)
from dependencies.config import collection_name, index_name

In [4]:
boto3_session = boto3.Session()
identity_arn = boto3_session.client('sts').get_caller_identity()['Arn']
print("Current IAM Role ARN:", identity_arn)

Current IAM Role ARN: arn:aws:sts::525407566630:assumed-role/AmazonSageMaker-ExecutionRole-20240627T165261/SageMaker
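Note that the value printed above is an STS assumed-role session ARN rather than an IAM role ARN. If the plain role ARN is ever needed, a conversion might look like the sketch below. `assumed_role_to_role_arn` is a hypothetical helper, not part of this repository, and it assumes the role has no IAM path: a session ARN does not carry the path (for example `service-role/`), so the result can be wrong for roles that have one.

```python
import re


def assumed_role_to_role_arn(arn: str) -> str:
    """Hypothetical helper: map an STS assumed-role ARN to an IAM role ARN.

    Assumes the role has no path; the session ARN does not include it.
    """
    match = re.fullmatch(
        r"arn:(aws[\w-]*):sts::(\d{12}):assumed-role/([^/]+)/[^/]+", arn
    )
    if match is None:
        # Already a user/role ARN, or a format this sketch does not handle.
        return arn
    partition, account_id, role_name = match.groups()
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"


example_arn = "arn:aws:sts::123456789012:assumed-role/MyRole/MySession"
print(assumed_role_to_role_arn(example_arn))
```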


In [5]:
# Create a boto3 client for OpenSearch Serverless and a SigV4 signer for the "aoss" service
client = boto3.client('opensearchserverless')
service = 'aoss'
region = boto3_session.region_name
credentials = boto3_session.get_credentials()
AWSAUTH = AWSV4SignerAuth(credentials, region, service)

## Create a vector database using OpenSearch Serverless

#### Create a collection

In [6]:
createEncryptionPolicy(client, collection_name)
createNetworkPolicy(client, collection_name)
createAccessPolicy(client, collection_name, identity_arn)
createCollection(client, collection_name)
host, collection_id = waitForCollectionCreation(client, collection_name)


Encryption policy created:
{'securityPolicyDetail': {'createdDate': 1727109120677, 'description': 'Encryption policy for fashion-collection-new collections', 'lastModifiedDate': 1727109120677, 'name': 'fashion-collection-new-policy', 'policy': {'Rules': [{'Resource': ['collection/fashion-collection-new*'], 'ResourceType': 'collection'}], 'AWSOwnedKey': True}, 'policyVersion': 'MTcyNzEwOTEyMDY3N18x', 'type': 'encryption'}, 'ResponseMetadata': {'RequestId': 'ce894741-e2d9-4348-a321-77a263bc7502', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'ce894741-e2d9-4348-a321-77a263bc7502', 'date': 'Mon, 23 Sep 2024 16:32:00 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '378', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}

Network policy created:
{'securityPolicyDetail': {'createdDate': 1727109120796, 'description': 'Network policy for fashion-collection-new collections', 'lastModifiedDate': 1727109120796, 'name': 'fashion-collection-new-policy', 'policy':
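The helper functions in `dependencies/build_infrastructure_aoss.py` are not shown in this notebook. As a rough illustration of what they might send to the service, the sketch below builds the three policy documents as JSON strings. The encryption policy mirrors the output printed above; the network and data access policy shapes, the function names, and the public-access setting are assumptions and may differ from the actual helpers.

```python
import json


def encryption_policy(collection_name: str) -> str:
    """Encryption policy using an AWS-owned key (shape taken from the output above)."""
    return json.dumps({
        "Rules": [{"ResourceType": "collection",
                   "Resource": [f"collection/{collection_name}*"]}],
        "AWSOwnedKey": True,
    })


def network_policy(collection_name: str) -> str:
    """Assumed network policy: allow public access to the collection endpoint."""
    return json.dumps([{
        "Rules": [{"ResourceType": "collection",
                   "Resource": [f"collection/{collection_name}*"]}],
        "AllowFromPublic": True,
    }])


def data_access_policy(collection_name: str, principal_arn: str) -> str:
    """Assumed data access policy: full access for one principal."""
    return json.dumps([{
        "Rules": [
            {"ResourceType": "collection",
             "Resource": [f"collection/{collection_name}"],
             "Permission": ["aoss:*"]},
            {"ResourceType": "index",
             "Resource": [f"index/{collection_name}/*"],
             "Permission": ["aoss:*"]},
        ],
        "Principal": [principal_arn],
    }])


# With a boto3 'opensearchserverless' client, these would typically be passed as:
#   client.create_security_policy(name=..., type="encryption", policy=encryption_policy(name))
#   client.create_security_policy(name=..., type="network", policy=network_policy(name))
#   client.create_access_policy(name=..., type="data", policy=data_access_policy(name, arn))
#   client.create_collection(name=name, type="VECTORSEARCH")
print(encryption_policy("demo-collection"))
```

Granting `aoss:*` to a single principal keeps the example short; a production setup would usually list only the permissions it needs.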

In [7]:
# Save collection_id to config.py file, which will be used when deleting resources
with open('dependencies/config.py', 'a') as file:
    file.write('\n# These 2 lines are imported from collection creation\n')
    file.write(f'\naoss_collection_id = "{collection_id}"\naoss_host = "{host}"\n')

#### Initialize an OpenSearch client

In [8]:
# Create the client with SSL/TLS enabled and SigV4 request signing.
OSSclient = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=AWSAUTH,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20,
    timeout=3000,  # request timeout, in seconds
)

### Create an index for the OpenSearch ingestion
The `OpensearchIngestion` class (defined in `dependencies/opensearch_utils.py`) contains helper functions for processing documents and ingesting them into the index.

In [9]:
oss_instance = OpensearchIngestion(
    client=OSSclient,
    session=boto3_session
)

In [10]:
oss_instance.create_index(index_name)
oss_instance.create_index_mapping(index_name)

[2024-09-23 16:49:42,124] p11340 {base.py:258} INFO - PUT https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new [status:200 request:0.399s]
[2024-09-23 16:49:42,204] p11340 {base.py:258} INFO - PUT https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_mapping [status:200 request:0.079s]


True
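The index definition itself lives inside `OpensearchIngestion` and is not shown here. A k-NN index for this use case might be defined roughly as in the sketch below. The field names `vector_field` and `image_b64` come from the ingestion code in this notebook; the dimension of 1024, the HNSW/faiss method settings, and the helper name `build_knn_index_body` are assumptions for illustration.

```python
def build_knn_index_body(dimension: int = 1024) -> dict:
    """Hypothetical index body for storing image embeddings.

    dimension=1024 is an assumption and must match the embedding model's output length.
    """
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "vector_field": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                    },
                },
                # Base64-encoded source image, returned alongside search hits.
                "image_b64": {"type": "text"},
            }
        },
    }


index_body = build_knn_index_body()
# With the client created earlier, this could be applied in one call:
#   OSSclient.indices.create(index=index_name, body=index_body)
print(index_body["mappings"]["properties"]["vector_field"])
```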

### Ingest the images

In [11]:
dataset_path = "Fashion-Dataset-Images-Western-Dress/WesternDress_Images/"
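The ingestion loop in this section calls `oss_instance.create_titan_multimodal_embeddings`, whose implementation is in `dependencies/opensearch_utils.py` and not reproduced here. The sketch below shows one way such a helper could work, judging from the `data["inputImage"]` and `embedding["embedding"]` keys the loop reads. The model ID `amazon.titan-embed-image-v1` and the function names are assumptions, not the repository's code.

```python
import base64
import json

MODEL_ID = "amazon.titan-embed-image-v1"  # assumed Titan Multimodal Embeddings model ID


def build_titan_request(image_bytes: bytes) -> dict:
    """Build the request payload: the image as a base64-encoded string."""
    return {"inputImage": base64.b64encode(image_bytes).decode("utf-8")}


def embed_image(bedrock_runtime, image_bytes: bytes):
    """Hypothetical helper: return (request payload, parsed model response)."""
    data = build_titan_request(image_bytes)
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(data),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing JSON with an "embedding" list.
    embedding = json.loads(response["body"].read())
    return data, embedding


# Typical usage (requires AWS credentials and Bedrock model access):
#   bedrock_runtime = boto3.client("bedrock-runtime")
#   with open(image_path, "rb") as f:
#       data, embedding = embed_image(bedrock_runtime, f.read())
```

Passing the client in as an argument keeps the function easy to exercise with a stub instead of a live Bedrock endpoint.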

In [None]:
failed = []
for image_name in os.listdir(dataset_path):
    image = os.path.join(dataset_path, image_name)
    try:
        # Embed the image with Titan Multimodal Embeddings.
        (data, embedding) = oss_instance.create_titan_multimodal_embeddings(image_path=image)
        body = {
            "vector_field": embedding["embedding"],
            "image_b64": data["inputImage"],
        }
        # Ingest the images one by one.
        status = oss_instance.client.index(
            index=index_name,
            body=body,
        )
    except Exception as e:
        # Record the failure so it appears in the final summary.
        print(f"Exception thrown in image {image}: {e}")
        failed.append(image)
        continue
    if status["result"] != "created":
        failed.append(image)

print(f"Ingestion complete. Failed ingestion for the following: {failed}")

[2024-09-23 16:49:53,404] p11340 {base.py:258} INFO - POST https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_doc [status:201 request:1.728s]
[2024-09-23 16:49:54,240] p11340 {base.py:258} INFO - POST https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_doc [status:201 request:0.704s]
[2024-09-23 16:49:55,257] p11340 {base.py:258} INFO - POST https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_doc [status:201 request:0.877s]
[2024-09-23 16:49:55,764] p11340 {base.py:258} INFO - POST https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_doc [status:201 request:0.369s]
[2024-09-23 16:49:56,274] p11340 {base.py:258} INFO - POST https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_doc [status:201 request:0.340s]
[2024-09-23 16:49:56,672] p11340 {base.py:258} INFO - POST https://gbwzu10kwdcxw1kqiu2d.us-east-1.aoss.amazonaws.com:443/images-index-new/_doc [s
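Once the images are ingested, similar images can be retrieved with a k-NN query against `vector_field`. The sketch below only builds the query body; `build_knn_query` is a hypothetical helper, and the choice of `k` and of returning only `image_b64` are assumptions rather than the agent's actual search logic.

```python
def build_knn_query(query_vector, k: int = 3) -> dict:
    """Hypothetical helper: k-NN query body for the image index."""
    return {
        "size": k,
        "_source": ["image_b64"],  # return only the stored image payload
        "query": {
            "knn": {
                "vector_field": {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
    }


query_body = build_knn_query([0.0, 0.5, 1.0], k=2)
# With the client and index from this notebook, the search could be run as:
#   hits = OSSclient.search(index=index_name, body=query_body)["hits"]["hits"]
print(query_body)
```

The query vector must have the same dimension as the vectors stored in the index, so in practice it would come from the same embedding model used during ingestion.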

##### Cleanup is handled together with all other agent assets