# Guide to Using Cohere's Embed V3 Multimodal Model on Amazon SageMaker

Cohere's embeddings model, Embed 3, is an industry-leading AI search model designed to transform semantic search and generative AI applications. Embed 3 is now multimodal: it can generate embeddings from both text and images. This lets enterprises unlock value from the large amounts of data they hold in image form. Businesses can now build systems that accurately search important multimodal assets such as complex reports, ecommerce product catalogs, and design files to boost workforce productivity. According to Cohere, this upgrade makes Embed 3 the most generally capable multimodal embedding model on the market.

## Getting Started

This sample notebook uses the Cohere Embed v3 family of models on Amazon SageMaker:
[Cohere Embed Model v3 - English](https://aws.amazon.com/marketplace/pp/prodview-qd64mji3pbnvk)


> **Note**: This is a reference notebook. It cannot run unless you make the changes suggested in the notebook.

### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Use one of these kernels: *conda_python3*, *conda_pytorch_p310*, or *conda_tensorflow2_p310*.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to one of the models listed above. If so, skip step: [Subscribe to the model package](#1.-Subscribe-to-the-model-package)


#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### Step 1: Imports and Install Dependencies

In [46]:
!pip install striprtf hnswlib --quiet
!pip install boto3 --quiet
!pip install cohere-aws==0.8.16 --quiet

In [None]:
import base64
import json
import os
from io import BytesIO

import boto3
import hnswlib
import numpy as np
from cohere_aws import Client
from PIL import Image
from striprtf.striprtf import rtf_to_text

### Step 2: Create an endpoint and perform real-time inference

Once you subscribe to the model on AWS Marketplace, a model ARN will be available for you to copy and use, as seen below.

In [47]:
# Set model_package variables for endpoint creation
model_package = "arn:aws:sagemaker:us-east-1:865070037744:model-package/cohere-embed-english-v3-7-6d097a095fdd314d90a8400a620cac54"

In [48]:
# List existing IAM roles to identify the existing SageMaker execution role
iam = boto3.client('iam')
roles = iam.list_roles(
    PathPrefix='/service-role/',
    MaxItems=100
)
for role in roles['Roles']:
    if 'sagemaker.amazonaws.com' in role['AssumeRolePolicyDocument']['Statement'][0]['Principal']['Service']:
        execution_role_arn = role['Arn']
        break
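
The loop above inspects only the first statement of each trust policy and leaves `execution_role_arn` undefined when no role matches. A slightly more defensive sketch follows, assuming the role dictionaries have the shape returned by `iam.list_roles`. The helper names `_as_list` and `find_sagemaker_role_arn` are hypothetical, and the sample ARNs are made up for illustration.

```python
def _as_list(value):
    """Normalize a value that IAM may return as a single item or as a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_sagemaker_role_arn(roles):
    """Return the ARN of the first role SageMaker is trusted to assume, or None."""
    for role in roles:
        document = role.get("AssumeRolePolicyDocument", {})
        for statement in _as_list(document.get("Statement")):
            principal = statement.get("Principal", {})
            if not isinstance(principal, dict):
                continue  # e.g. a wildcard principal given as a plain string
            if "sagemaker.amazonaws.com" in _as_list(principal.get("Service")):
                return role["Arn"]
    return None


# Illustrative input in the shape returned by iam.list_roles (values are made up)
sample_roles = [
    {"Arn": "arn:aws:iam::111122223333:role/service-role/other",
     "AssumeRolePolicyDocument": {"Statement": [
         {"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}}]}},
    {"Arn": "arn:aws:iam::111122223333:role/service-role/sm-exec",
     "AssumeRolePolicyDocument": {"Statement": [
         {"Effect": "Allow", "Principal": {"Service": ["sagemaker.amazonaws.com"]}}]}},
]
print(find_sagemaker_role_arn(sample_roles))
```

In the notebook, calling `find_sagemaker_role_arn(roles['Roles'])` and stopping with a clear error when it returns `None` would surface a missing role before `create_model` is attempted.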

In [49]:
# Create the SageMaker and SageMaker Runtime clients used to create and invoke the real-time inference endpoint
sagemaker = boto3.client('sagemaker')
sagemaker_runtime = boto3.client('sagemaker-runtime')

---
**Start of section to only run cells once if endpoint does not exist yet**

The next three cells show how to create a model, an endpoint configuration, and then the SageMaker endpoint after you have subscribed to the Embed V3 model in AWS Marketplace. If you already have an endpoint, or it was created in the AWS console, skip these three cells and use your existing endpoint name in the cells that follow.

In [None]:
# Create model
sagemaker.create_model(ModelName='Model-Cohere-Embed-Model-v3-English-1',
    ExecutionRoleArn=execution_role_arn,
    PrimaryContainer={
        'ModelPackageName': model_package
    },
    EnableNetworkIsolation=True)

In [None]:
# Create endpoint config
sagemaker.create_endpoint_config(EndpointConfigName='EndpointConfig-Cohere-Embed-Model-v3-English-1',
    ProductionVariants=[
        {
            'VariantName': 'variant-1',
            'ModelName': 'Model-Cohere-Embed-Model-v3-English-1',
            'InstanceType': 'ml.g5.xlarge',
            'InitialInstanceCount': 1
        }
    ])

In [None]:
# Create endpoint
sagemaker.create_endpoint(
    EndpointName='Endpoint-Cohere-Embed-Model-v3-English-1',
    EndpointConfigName='EndpointConfig-Cohere-Embed-Model-v3-English-1'
)

---
**End of section to only run cells once**

Next, we want to confirm that the endpoint status is "InService".

In [None]:
# Check endpoint status, keep running the cell for new updates!
def check_endpoint_status(endpoint_name):
    try:
        response = sagemaker.describe_endpoint(EndpointName=endpoint_name)
        return response['EndpointStatus']
    except Exception as e:
        print(f"Error checking endpoint status: {e}")
        return None

# Example usage
endpoint_name = 'Endpoint-Cohere-Embed-Model-v3-English-1'
status = check_endpoint_status(endpoint_name)
if status:
    print(f"Endpoint status: {status}")
else:
    print("Error getting endpoint status.")
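
Rather than re-running the cell by hand, the status check can be wrapped in a polling loop. The sketch below is one possible approach: `wait_for_status` is a hypothetical helper, the 30-second interval and 30-minute timeout are arbitrary assumptions, and the status source is injected so the example runs without AWS access.

```python
import time


def wait_for_status(get_status, target="InService", failed=("Failed",),
                    poll_seconds=30, timeout_seconds=1800, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        print(f"Endpoint status: {status}")
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Endpoint entered status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint not {target} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with canned statuses and no real sleeping
fake_statuses = iter(["Creating", "Creating", "InService"])
final_status = wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
```

In the notebook this would be called as `wait_for_status(lambda: check_endpoint_status(endpoint_name))`. boto3 also provides a built-in waiter, `sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)`, which blocks until the endpoint is ready.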

### Step 2: SageMaker Embed Function

Amazon SageMaker does not include a native embedding method like Cohere's co.embed(), because it is a hosting platform for many models, which allows flexibility in model choice and provider. The cell below defines a reusable function that runs the embeddings model on the Amazon SageMaker endpoint.


In [60]:
# Define the SageMaker embed function, which will be used later
def sagemaker_embed(texts, model="embed-english-v3.0", input_type="search_document", truncate="END"):
    payload = {
        "texts": texts,
        "model": model,
        "input_type": input_type,
        "truncate": truncate
    }
    
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload)
    )
    
    result = json.loads(response['Body'].read().decode("utf-8"))
    return result['embeddings']

In [61]:
# If you want to test the above function, uncomment the code below
#texts = ["Testing the embeddings"]
#sagemaker_embed(texts)
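
When embedding many texts, sending them in batches keeps each request payload small. The sketch below assumes a limit of 96 texts per request, which is the limit documented for Cohere's hosted Embed API and should be confirmed for your endpoint. `embed_in_batches` is a hypothetical helper, and `fake_embed` stands in for `sagemaker_embed` so the example runs without an endpoint.

```python
def embed_in_batches(texts, embed_fn, batch_size=96):
    """Call embed_fn on consecutive slices of texts and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(embed_fn(texts[start:start + batch_size]))
    return embeddings


# Stand-in for sagemaker_embed: returns one single-element vector per text
def fake_embed(batch):
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```

With a live endpoint, `embed_in_batches(texts, sagemaker_embed)` would return the embeddings in the same order as the input texts.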

### Step 3: Use Cohere's Multimodal Embeddings V3 to Embed Images
For this notebook we have generated 5 images that we will embed, one at a time, with the multimodal embeddings model. We will then show how to run a query against these embeddings to return the most relevant images for a sample natural language query.

You will see a folder called "content", which contains .png images from a sample e-commerce site within the "image_files" folder.

In [86]:
# Function to convert an image to a base64 data URL and embed it through the endpoint
def image_to_base64_data_url(image_path):
    with open(image_path, "rb") as f:
        enc_img = base64.b64encode(f.read()).decode('utf-8')
        enc_img = f"data:image/png;base64,{enc_img}"

    payload = {
        "model": "embed-english-v3.0",
        "input_type": 'image',
        "embedding_types": ["float"],
        "images": [enc_img]
    }

    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload)
    )
    result = json.loads(response['Body'].read().decode("utf-8"))
    return result
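
The function above hard-codes `image/png` in the data URL, which matches the sample images. If the folder might hold other formats, the MIME type can be guessed from the file extension instead. This is a minimal sketch with a hypothetical helper name, `to_data_url`. Which image formats the endpoint accepts is not covered here and should be checked before relying on it.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_url(image_path):
    """Encode a file as a base64 data URL, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Unrecognized image type: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demonstration with a throwaway file; the bytes are a placeholder, not a real image
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.jpg")
    with open(demo_path, "wb") as f:
        f.write(b"placeholder")
    demo_url = to_data_url(demo_path)
print(demo_url)
```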

In [87]:
folder_path = 'content/image_files'
files = [f for f in os.listdir(folder_path) if '.ipynb_checkpoints' not in f]
#files = os.listdir(folder_path)
embedding_objects = []
embeddings = []
file_paths = []

for file in files:
    file_path = os.path.join(folder_path, file)
    res = image_to_base64_data_url(file_path)
    file_paths.append(file_path)
    embeddings.append(res['embeddings']['float'][0])  # Assuming the response structure matches Cohere's
    embedding_objects.append(res)

Next, for the purposes of this notebook, we will use HNSW (Hierarchical Navigable Small World), a graph-based algorithm that performs approximate nearest neighbor (ANN) searches in vector databases.

In [88]:
# Create the hnsw index for images
size = (200, 200)
image_index = hnswlib.Index(space='cosine', dim=1024)
image_index.init_index(max_elements=len(embeddings), ef_construction=512, M=64)
image_index.add_items(embeddings,list(range(len(embeddings))))
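
With `space='cosine'`, hnswlib reports distances as 1 minus the cosine similarity, so smaller values mean closer matches. For a handful of vectors, an exact brute-force ranking is a simple way to sanity-check the approximate results. The sketch below uses toy 2-D vectors in place of the 1024-dimensional embeddings, and the helper names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity hnswlib reports for space='cosine'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Return (index, distance) pairs for the k vectors closest to query."""
    scored = [(i, cosine_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy vectors standing in for the image embeddings
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_result = exact_top_k([1.0, 0.1], toy_vectors, k=2)
print(toy_result)
```

Running `exact_top_k` on a query embedding and the `embeddings` list should give the same ordering as `knn_query` for an index this small.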

Now let's assume for an ecommerce application, a user asked "Avocado Dog Toy" when searching for items.

In [89]:
# Set these parameters and query your database
query = ["Avocado Dog Toy"]
top_k=5

# Convert the natural language query into embeddings; search queries use input_type="search_query"
query_emb = sagemaker_embed(query, input_type="search_query")

In [90]:
# Comparing cosine similarity score between the query embedded above and the images that we previously embedded into vectors
res = image_index.knn_query(query_emb, k=top_k)

In [91]:
# Keep the returned labels in their own variable so the hnswlib index in image_index is not overwritten
image_ids=res[0][0]
image_scores=res[1][0]

In [None]:
# Iterate through the returned images in ranked order, printing each image's own distance
for x in range(0,len(image_ids)):
    print(f"Ranking of Relevance:{x+1} with a distance of: {image_scores[x]:.2f}")
    img = Image.open(file_paths[image_ids[x]])
    img_resized = img.resize(size)
    display(img_resized)

### Step 4: Clean Up

If the endpoint was created by running this notebook, make sure to delete it when you are finished to avoid charges. Skip the step below if you are connecting to an existing endpoint.

In [37]:
# Delete the endpoint
#Skip this step if created through the AWS console

sagemaker.delete_endpoint(EndpointName='Endpoint-Cohere-Embed-Model-v3-English-1')
sagemaker.close()

**Note**: If you need to create the same endpoint again, run only the create_endpoint() call from the earlier cell. There is no need to rerun create_model() and create_endpoint_config().
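
If you do not expect to recreate the endpoint, the endpoint configuration and model created earlier can be removed as well. The sketch below wraps the three boto3 SageMaker calls `delete_endpoint`, `delete_endpoint_config`, and `delete_model` in a hypothetical helper, `delete_endpoint_resources`. A recording stand-in replaces the real client so the example runs without AWS access.

```python
def delete_endpoint_resources(client, endpoint_name, config_name, model_name):
    """Delete the endpoint, then its endpoint configuration, then the model."""
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    client.delete_model(ModelName=model_name)


class RecordingClient:
    """Stand-in for boto3.client('sagemaker') that records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


fake_client = RecordingClient()
delete_endpoint_resources(
    fake_client,
    "Endpoint-Cohere-Embed-Model-v3-English-1",
    "EndpointConfig-Cohere-Embed-Model-v3-English-1",
    "Model-Cohere-Embed-Model-v3-English-1",
)
print(fake_client.calls)
```

In the notebook, passing the real `sagemaker` client in place of `fake_client` would perform the deletions. It would need to run before `sagemaker.close()`.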

## Conclusion

In this notebook we walked through how to leverage Cohere's Embed V3 multimodal model, capable of generating embeddings from both text and images. This enables enterprises to unlock value from their vast image data, allowing them to build powerful search and recommendation systems across multimodal assets like product catalogs, design files, and business reports. 

Cohere Embed 3 is now available on Amazon SageMaker, allowing customers to seamlessly deploy this state-of-the-art multimodal embeddings model and leverage it in their own applications. Key use cases include enhanced e-commerce search, efficient data-driven decision making with visual insights, and streamlined creative workflows. Cohere's multimodal embeddings can further improve semantic search when combined with Cohere's Rerank models, providing more contextual relevance to generative AI systems.