# Creating a vector store with Amazon Bedrock multimodal embeddings

This notebook is a step-by-step tutorial for populating a vector store in [OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/). The Bedrock Agent uses the resulting vector embeddings to search for similar images in the vector store.


**NOTE**: This notebook is required only if you want the agent to be able to take the `/image_lookup` action. Otherwise, you can run the `Create_Agent.ipynb` notebook directly.

### Environment setup

This notebook has been tested with the `conda_python3` Jupyter Notebook kernel on an `ml.t3.medium` instance.

### Prerequisite

Ensure you have an AWS account with permission to:

- Create a security policy, access policy, collection, index, and index mapping on OpenSearch Serverless

- Call `BatchGetCollection` on OpenSearch Serverless


#### Install the requirements
We need two libraries to run this notebook:
1. `opensearch-py`, the Python client for OpenSearch, and
2. `requests_aws4auth`, which signs requests to the OpenSearch service with AWS Signature Version 4.

In [None]:
!pip install -q opensearch-py
!pip install -q requests_aws4auth

#### Download the dataset locally

In [None]:
!git clone https://github.com/alexeygrigorev/clothing-dataset.git

### Add all the dependencies/imports

In [None]:
import os
import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection
from dependencies.opensearch_utils import OpenSearchIngestion
from dependencies.build_infrastructure_aoss import (
    createEncryptionPolicy, 
    createNetworkPolicy, 
    createAccessPolicy, 
    createCollection, 
    waitForCollectionCreation 
)
from dependencies.config import collection_name, index_name

In [None]:
boto3_session = boto3.Session()
identity_arn = boto3_session.client('sts').get_caller_identity()['Arn']
print("Current IAM Role ARN:", identity_arn)

In [None]:
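# Create a boto3 client for OpenSearch Serverless (OSS) from the same session
client = boto3_session.client('opensearchserverless')
service = 'aoss'  # SigV4 service name for OpenSearch Serverless
region = boto3_session.region_name
credentials = boto3_session.get_credentials()
# Sign every OpenSearch request with the session credentials
AWSAUTH = AWSV4SignerAuth(credentials, region, service)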

## Create a vector database using OpenSearch Serverless

#### Create an OSS Collection 
A collection is a group of OpenSearch indexes that work together to support a specific workload. We chose the Serverless option for scalability. Read more about [OSS collections here](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vector-search.html).

In [None]:
createEncryptionPolicy(client, collection_name)
createNetworkPolicy(client, collection_name)
createAccessPolicy(client, collection_name, identity_arn)
createCollection(client, collection_name)
host, collection_id = waitForCollectionCreation(client, collection_name)
print(f"Host: {host}")
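
The helpers called above live in `dependencies/build_infrastructure_aoss.py`, which is not shown in this notebook. As a rough sketch of what such helpers typically send to OpenSearch Serverless, the function below builds the three policy documents (encryption, network, data access) as JSON strings. The function name `build_aoss_policies`, the public network access, and the broad `aoss:*` permissions are assumptions made for illustration; the actual helpers may differ.

```python
import json


def build_aoss_policies(collection_name: str, principal_arn: str) -> dict:
    """Return illustrative policy documents for a collection as JSON strings.

    Hypothetical helper: this is not the code in build_infrastructure_aoss.py.
    """
    collection_resource = f"collection/{collection_name}"

    # Encryption policy: encrypt the collection with an AWS-owned key.
    encryption_policy = {
        "Rules": [{"ResourceType": "collection", "Resource": [collection_resource]}],
        "AWSOwnedKey": True,
    }

    # Network policy: allow public access to the collection endpoint (assumption).
    network_policy = [
        {
            "Rules": [{"ResourceType": "collection", "Resource": [collection_resource]}],
            "AllowFromPublic": True,
        }
    ]

    # Data access policy: grant the principal access to the collection and its indexes.
    data_access_policy = [
        {
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [collection_resource],
                    "Permission": ["aoss:*"],
                },
                {
                    "ResourceType": "index",
                    "Resource": [f"index/{collection_name}/*"],
                    "Permission": ["aoss:*"],
                },
            ],
            "Principal": [principal_arn],
        }
    ]

    return {
        "encryption": json.dumps(encryption_policy),
        "network": json.dumps(network_policy),
        "data": json.dumps(data_access_policy),
    }


# Possible usage with the boto3 `opensearchserverless` client (policy names are placeholders):
# policies = build_aoss_policies(collection_name, identity_arn)
# client.create_security_policy(name="demo-enc", type="encryption", policy=policies["encryption"])
# client.create_security_policy(name="demo-net", type="network", policy=policies["network"])
# client.create_access_policy(name="demo-access", type="data", policy=policies["data"])
# client.create_collection(name=collection_name, type="VECTORSEARCH")
```

For anything beyond a demo, you would likely narrow the permissions and restrict network access rather than use the wide-open values assumed here.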

In [None]:
# Save collection_id and host to config.py; they are used later when deleting resources
with open('dependencies/config.py', 'a') as file:
    file.write('\n# These 2 lines are imported from collection creation\n')
    file.write(f'\naoss_collection_id = "{collection_id}"\naoss_host = "{host}"\n')

#### Initialize an OpenSearch client

In [None]:
# Create the OpenSearch client with SSL/TLS enabled.
OSSclient = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=AWSAUTH,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20,
    timeout=300,
)
# Note: It can take up to a minute for data access rules to be enforced


### Create an index for the OpenSearch ingestion
The `OpenSearchIngestion` class (defined in `dependencies/opensearch_utils.py`) contains helper functions for defining and creating an index and for ingesting documents into it.

In [None]:
oss_instance = OpenSearchIngestion(
    client=OSSclient,
    session=boto3_session
)

In [None]:
# create an index within the collection
oss_instance.create_index(index_name)
# index-mapping defines the fields, field-types and the search approach
oss_instance.create_index_mapping(index_name)
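
The index definition itself is hidden inside `OpenSearchIngestion`. To give an idea of what a k-NN index for this use case can look like, here is a minimal sketch of an index body. The field names `vector_field` and `image_b64` come from the ingestion cell in this notebook; the function name `build_knn_index_body`, the dimension of 1024 (the default output length of Titan Multimodal Embeddings), and the HNSW/faiss/L2 settings are assumptions and may not match the helper class.

```python
def build_knn_index_body(vector_field: str = "vector_field", dimension: int = 1024) -> dict:
    """Build an illustrative index body with one k-NN vector field.

    Hypothetical helper: the settings here are assumptions, not the notebook's own.
    """
    return {
        # Enable k-NN search on this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # The embedding; its dimension must match the embedding model's output.
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                # The base64-encoded image stored alongside its embedding.
                "image_b64": {"type": "text"},
            }
        },
    }


# Possible usage with the opensearch-py client created earlier:
# OSSclient.indices.create(index=index_name, body=build_knn_index_body())
```

Settings and mappings are combined into a single body here, whereas the notebook splits them into `create_index` and `create_index_mapping`; either arrangement describes the same information.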

### Ingest the images

In [None]:
dataset_path = "clothing-dataset/images/"

# We limit the number of images for demo purposes - the entire dataset takes more than 20 minutes to ingest.
num_images_to_be_ingested = 100
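
The ingestion loop relies on `oss_instance.create_titan_multimodal_embeddings`, whose implementation is not shown here. The sketch below illustrates one plausible way to produce the same `(data, embedding)` pair with the Bedrock runtime API: base64-encode the image, send it as `inputImage`, and parse the JSON response. The function name `embed_image` and the model ID `amazon.titan-embed-image-v1` are assumptions; the helper class may use different settings.

```python
import base64
import json


def embed_image(bedrock_runtime, image_path: str,
                model_id: str = "amazon.titan-embed-image-v1"):
    """Return (request_data, response_body) for one image.

    Hypothetical helper; `bedrock_runtime` is expected to behave like
    boto3.client("bedrock-runtime").
    """
    # Bedrock expects the image as a base64-encoded string.
    with open(image_path, "rb") as image_file:
        image_b64 = base64.b64encode(image_file.read()).decode("utf-8")

    data = {"inputImage": image_b64}

    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps(data),
        accept="application/json",
        contentType="application/json",
    )

    # The response body is a stream; the vector sits under the "embedding" key.
    embedding = json.loads(response["body"].read())
    return data, embedding


# Possible usage:
# bedrock_runtime = boto3_session.client("bedrock-runtime")
# data, embedding = embed_image(bedrock_runtime, "clothing-dataset/images/<some-image>.jpg")
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without calling Bedrock.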

In [None]:
failed = []
num_ingested_imgs = 0

for image_name in os.listdir(dataset_path):
    # Stop as soon as the demo limit is reached instead of scanning the whole directory.
    if num_ingested_imgs >= num_images_to_be_ingested:
        break
    if not image_name.endswith(".jpg"):
        continue

    image = os.path.join(dataset_path, image_name)
    try:
        # Embed the image, then ingest it. Images are ingested one by one.
        (data, embedding) = oss_instance.create_titan_multimodal_embeddings(image_path=image)
        body = {
            "vector_field": embedding["embedding"],
            "image_b64": data["inputImage"],
        }
        status = oss_instance.client.index(
            index=index_name,
            body=body,
        )
    except Exception as e:
        # Record the failure so it appears in the final report, then move on.
        print(f"Exception thrown in image {image}: {e}")
        failed.append(image)
        continue

    if status["result"] != "created":
        failed.append(image)
    else:
        num_ingested_imgs += 1

print(f"Ingestion complete. Ingested {num_ingested_imgs} images. Failed ingestion for the following: {failed}")
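
Once images are ingested, a similarity search against the index is a k-NN query on the vector field. The sketch below builds such a query body for a given embedding. The function name `build_knn_query` and the choice to exclude the vector from the returned documents are assumptions for illustration; the agent's `/image_lookup` action may build its query differently.

```python
def build_knn_query(query_vector, k: int = 5, vector_field: str = "vector_field") -> dict:
    """Build an illustrative k-NN search body returning the k nearest images.

    Hypothetical helper, not part of the notebook's dependencies.
    """
    return {
        "size": k,
        # Skip the stored vector in the hits to keep responses small.
        "_source": {"excludes": [vector_field]},
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
    }


# Possible usage with the client and an embedding from this notebook:
# response = OSSclient.search(index=index_name, body=build_knn_query(embedding["embedding"]))
# for hit in response["hits"]["hits"]:
#     print(hit["_id"], hit["_score"])
```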

### Clean up
Cleanup is done together with all other agent assets in the `Create_Agent.ipynb` notebook.