# Build a contextual text and image search engine for product recommendations using Amazon Bedrock (Titan Multimodal Embedding)

The solution presented here shows how to build a prototype search engine, powered by the Amazon Titan Multimodal Embeddings model, that retrieves and recommends products based on text or image queries. This is a step-by-step guide on how to use [Amazon Bedrock Titan models](https://aws.amazon.com/bedrock/titan) to encode images and text into embeddings, ingest the embeddings into a vector index, and query that index with k-nearest neighbors (KNN) search. The original sample stores the embeddings in an [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) index and queries it with the OpenSearch Service [k-nearest neighbors (KNN) functionality](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html). This notebook uses an in-memory FAISS vector store instead.
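The retrieval step can be made concrete with a minimal sketch. The code below performs exact (brute-force) k-nearest-neighbor search ranked by cosine similarity and needs no dependencies. The toy three-dimensional vectors and item ids are hypothetical. OpenSearch k-NN and FAISS indexes rank real Titan embeddings (256, 384 or 1,024 dimensions) in the same spirit, often with approximate index structures that avoid scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, index, k=2):
    """Return the k (item_id, score) pairs most similar to `query`.

    `index` maps an item id to its embedding; every vector is scored (exact search).
    """
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real Titan vectors are much longer
toy_index = {
    "glass-set": [0.9, 0.1, 0.0],
    "mug": [0.7, 0.3, 0.1],
    "phone-case": [0.0, 0.2, 0.95],
}
print(knn([1.0, 0.2, 0.0], toy_index, k=2))
```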


We recommend running this notebook in SageMaker Studio with the `Python 3 (Data Science)` kernel on an `ml.t3.medium` instance.

This notebook is adapted from the Amazon Bedrock samples repository: [MultiModal Embeddings](https://github.com/aws-samples/amazon-bedrock-samples/tree/main/multimodal/titan-multimodal-embeddings)

![LangChain logo](https://daxg39y63pxwu.cloudfront.net/images/blog/langchain/LangChain.webp)


Install dependencies

In [None]:
!pip install opensearch-py
!pip install requests-aws4auth
!pip install -U boto3
!pip install -U botocore
!pip install -U awscli
!pip install s3fs
# seaborn is imported as `sns`; the unrelated `sns` package on PyPI is not needed
!pip install seaborn
!pip install sagemaker

In [None]:
%pip install -U --no-cache-dir boto3
%pip install -U --no-cache-dir  \
    "langchain>=0.1.11" \
    sqlalchemy -U \
    "faiss-cpu>=1.7,<2" \
    "pypdf>=3.8,<4" \
    pinecone-client==2.2.4 \
    apache-beam==2.52.0 \
    tiktoken==0.5.2 \
    "ipywidgets>=7,<8" \
    matplotlib==3.8.2 \
    anthropic==0.9.0
%pip install -U --no-cache-dir transformers

In [None]:
!pip install sqlalchemy --upgrade

In [None]:
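# Restart the kernel so the upgraded packages are picked up
# (works in JupyterLab and SageMaker Studio, unlike the classic-notebook JavaScript approach)
import IPython
IPython.Application.instance().kernel.do_shutdown(True)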

## 1. Setup

Import the Python packages used in this POC. The helper functions used throughout the notebook are defined inline below instead of being imported from `utils.py`.

In [None]:
import os
import re
import json
import time
import base64
import logging
from io import BytesIO
from multiprocessing.pool import ThreadPool
from urllib.parse import urlparse

import boto3
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
from tqdm import tqdm
from sagemaker.s3 import S3Downloader as s3down
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth, helpers

In [None]:
import boto3
import os
from IPython.display import Markdown, display, Pretty

# getting boto3 clients for required AWS services
sts_client = boto3.client('sts')
s3_client = boto3.client('s3')
#aoss_client = boto3.client('opensearchserverless')



region = os.environ.get("AWS_REGION")
boto3_bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

## 2. Load publicly available dataset

For this notebook, you use the Amazon Berkeley Objects Dataset. It is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalog images. 8,222 of the listings also come with turntable photography. You will only use the item images and the item names in US English, which we treat as each product's short description. For demo purposes, you will work with about 1,600 products.

In [None]:
# Bedrock models
# Select Amazon titan-embed-image-v1 as Embedding model for multimodal indexing
multimodal_embed_model = f'amazon.titan-embed-image-v1'


""" 
Function to plot heatmap from embeddings
"""

def plot_similarity_heatmap(embeddings_a, embeddings_b):
    inner_product = np.inner(embeddings_a, embeddings_b)
    sns.set(font_scale=1.1)
    graph = sns.heatmap(
        inner_product,
        vmin=np.min(inner_product),
        vmax=1,
        cmap="OrRd",
    )

""" 
Function to fetch the image based on image id from dataset
"""
def get_image_from_item_id(item_id="0", dataset=None, return_image=True):
    # Expects a dataframe with `image_path` and `item_desc` columns.
    # Item ids are strings, so they are quoted in the query; .loc uses the index label.
    row = dataset.loc[dataset.query(f"item_id == '{item_id}'").index[0]]
    img_path = row.image_path

    if return_image:
        img = Image.open(img_path)
        return img, row.item_desc
    return img_path, row.item_desc


""" 
Function to fetch the image based on image id from S3 bucket
"""
    
def get_image_from_item_id_s3(item_id="B0896LJNLH", dataset=None, image_path=None, return_image=True):
    # `image_path` is unused; the full image location is read from the dataset
    row = dataset.loc[dataset.query(f"item_id == '{item_id}'").index[0]]
    img_loc = row.img_full_path

    if img_loc.startswith('s3'):
        # download and store images locally
        local_data_root = './data/images'
        local_file_name = img_loc.split('/')[-1]
        s3down.download(img_loc, local_data_root)
        local_image_path = f"{local_data_root}/{local_file_name}"
    else:
        # already a local file
        local_image_path = img_loc

    if return_image:
        img = Image.open(local_image_path)
        return img, row.item_name_in_en_us
    return local_image_path, row.item_name_in_en_us

""" 
Function to display the images.
"""
def display_images(images: list, columns=2, width=20, height=8, max_images=15, label_wrap_length=50, label_font_size=8):

    if not images:
        print("No images to display.")
        return

    if len(images) > max_images:
        print(f"Showing {max_images} images of {len(images)}:")
        images = images[0:max_images]

    # one grid row per `columns` images, rounded up
    rows = (len(images) + columns - 1) // columns
    height = max(height, rows * height)
    plt.figure(figsize=(width, height))
    for i, image in enumerate(images):
        plt.subplot(rows, columns, i + 1)
        plt.imshow(image)

        if hasattr(image, 'name_and_score'):
            plt.title(image.name_and_score, fontsize=label_font_size)
    plt.show()

### 2.1 Data overview and preparation

Load the metadata

You can use pandas to load the metadata and then select the products that have titles in US English. Later, you will use the `main_image_id` column to merge each item name with its image.
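Each listing's `item_name` is a list of localized entries. The selection keeps the first entry tagged `en_US`. The sketch below shows that logic in plain Python and uses a hypothetical record shaped like the entries the notebook reads (`language_tag` and `value` keys). Unlike the notebook's `func_`, it also returns `None` when a listing has no name list at all (for example a pandas `NaN`). That guard is an added assumption, not part of the original code.

```python
def first_us_english_name(localized_names):
    """Return the first value tagged en_US, or None if there is none."""
    if not isinstance(localized_names, list):
        return None  # defensive: missing names show up as NaN in pandas
    us_texts = [item["value"] for item in localized_names if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None

# Hypothetical listing names
sample_item_name = [
    {"language_tag": "de_DE", "value": "Trinkglas-Set"},
    {"language_tag": "en_US", "value": "Glass Drinkware Set"},
]
print(first_us_english_name(sample_item_name))
print(first_us_english_name(float("nan")))
```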

In [None]:
meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_0.json.gz", lines=True)
def func_(x):
    us_texts = [item["value"] for item in x if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None

meta = meta.assign(item_name_in_en_us=meta.item_name.apply(func_))
meta = meta[~meta.item_name_in_en_us.isna()][["item_id", "item_name_in_en_us", "main_image_id"]]
print(f"#products with US English title: {len(meta)}")
meta.head()

You should see over 1,600 products in the data frame.
Next, link the item names with the item images. `images/metadata/images.csv.gz` contains the image metadata. It is a gzip-compressed comma-separated values (CSV) file with the columns `image_id`, `height`, `width`, and `path`. Read this metadata file and merge it with the item metadata.

In [None]:
image_meta = pd.read_csv("s3://amazon-berkeley-objects/images/metadata/images.csv.gz")
dataset = meta.merge(image_meta, left_on="main_image_id", right_on="image_id")
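By default, `meta.merge(image_meta, left_on="main_image_id", right_on="image_id")` performs an inner join. Items whose main image is missing from the image metadata are therefore dropped. The plain-Python sketch below mirrors that join and the `img_full_path` construction from the next cell. The records are hypothetical, and the code assumes each `image_id` appears only once in the image metadata.

```python
ABO_SMALL_IMAGES = "s3://amazon-berkeley-objects/images/small/"

def join_items_with_images(items, images):
    """Inner-join item records to image records on main_image_id == image_id."""
    # Assumes image_id is unique, so a dict lookup is enough
    images_by_id = {image["image_id"]: image for image in images}
    joined = []
    for item in items:
        image = images_by_id.get(item.get("main_image_id"))
        if image is None:
            continue  # no matching image: dropped, as in an inner join
        row = {**item, **image}
        # Same rule as the img_full_path column: bucket prefix + relative path
        row["img_full_path"] = ABO_SMALL_IMAGES + str(image["path"])
        joined.append(row)
    return joined

# Hypothetical records shaped like the two metadata files
items = [
    {"item_id": "ITEM1", "item_name_in_en_us": "Glass Drinkware Set", "main_image_id": "img-a"},
    {"item_id": "ITEM2", "item_name_in_en_us": "Desk Lamp", "main_image_id": "img-missing"},
]
images = [{"image_id": "img-a", "height": 256, "width": 256, "path": "aa/img-a.jpg"}]

for row in join_items_with_images(items, images):
    print(row["item_id"], row["img_full_path"])
```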

In [None]:
# Create a new column in dataset with FULL PATH of the image
dataset = dataset.assign(img_full_path=f's3://amazon-berkeley-objects/images/small/' + dataset.path.astype(str))
dataset

You can have a look at one sample image from the dataset by running the following code

In [None]:
image, item_name = get_image_from_item_id_s3(item_id = "B0896LJNLH", dataset = dataset, image_path = f's3://amazon-berkeley-objects/images/small/' )
print(item_name)
image

## 3. Generate embedding from item images

Amazon Titan Multimodal Embeddings G1 projects both images and text into the same latent space. Items and queries can therefore be compared across modalities: an item embedding can be retrieved with either a text query or an image query. For a full catalog, you could use [batch inference](https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html) to encode the item images. Before creating the batch job, you would copy the item images from the Amazon Berkeley Objects Dataset public S3 bucket to your own S3 bucket. This copy should take less than 10 minutes.

In this notebook, however, we call the real-time `InvokeModel` API for each item instead of running a batch inference job, and we encode only a small subset of the items.
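Each real-time call sends one JSON document to `invoke_model`. The sketch below isolates the request-body construction that `get_titan_multimodal_embedding_fix` performs further down, so you can inspect the body without calling AWS. The function name is hypothetical, the image bytes are a placeholder rather than a real image, and the allowed dimensions come from the comments in this notebook.

```python
import base64
import json

# Output sizes listed in this notebook's comments (1,024 is the default)
SUPPORTED_DIMENSIONS = (256, 384, 1024)

def build_titan_multimodal_body(image_bytes=None, text=None, dimension=1024):
    """Build the JSON request body for a Titan Multimodal Embeddings call."""
    if image_bytes is None and text is None:
        raise ValueError("Provide image_bytes, text, or both.")
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"dimension must be one of {SUPPORTED_DIMENSIONS}")
    body = {"embeddingConfig": {"outputEmbeddingLength": dimension}}
    if image_bytes is not None:
        # Images travel as base64-encoded text inside the JSON document
        body["inputImage"] = base64.b64encode(image_bytes).decode("utf-8")
    if text is not None:
        body["inputText"] = text
    return json.dumps(body)

request_body = build_titan_multimodal_body(
    image_bytes=b"placeholder bytes", text="Glass Drinkware Set", dimension=384
)
print(sorted(json.loads(request_body)))
```

The resulting string is what you would pass as `body=` to `invoke_model` together with `modelId='amazon.titan-embed-image-v1'`.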

In [None]:

batch_size=10
dataset = dataset.iloc[:batch_size]
dataset

In [None]:
# Preview the (S3 image path, item name) pairs that will be embedded
for idx, (img_path, item_name) in enumerate(zip(dataset['img_full_path'], dataset['item_name_in_en_us'])):
    print(idx, (img_path, item_name))

In [None]:
%%time


def get_titan_multimodal_embedding_fix(
    image_path: str = None,   # maximum 2048 x 2048 pixels
    description: str = None,  # English only, max 128 input tokens
    dimension: int = 1024,    # 1,024 (default), 384, 256
    model_id: str = multimodal_embed_model
):
    payload_body = {}
    embedding_config = {
        "embeddingConfig": {
            "outputEmbeddingLength": dimension
        }
    }
    # You can specify either text or image or both
    if image_path:
        if image_path.startswith('s3'):
            # read the image straight from S3 (reusing the S3 client) and base64-encode it
            bucket_name, key = image_path.replace("s3://", "").split("/", 1)
            obj = s3_client.get_object(Bucket=bucket_name, Key=key)
            payload_body["inputImage"] = base64.b64encode(obj['Body'].read()).decode('utf-8')
        else:
            # local image file
            with open(image_path, "rb") as image_file:
                payload_body["inputImage"] = base64.b64encode(image_file.read()).decode('utf-8')
    if description:
        payload_body["inputText"] = description

    print(f" get_titan_multimodal_embedding_fix()::payload:keys={payload_body.keys()}::")
    response = boto3_bedrock.invoke_model(
        body=json.dumps({**payload_body, **embedding_config}),
        modelId=model_id,
        accept="application/json",
        contentType="application/json"
    )
    return json.loads(response.get("body").read())


multimodal_embeddings_img = []
# Each item is encoded from its image AND its name, giving one joint embedding per item
for img_path, item_name in zip(dataset['img_full_path'], dataset['item_name_in_en_us']):
    embedding = get_titan_multimodal_embedding_fix(description=item_name, image_path=img_path, dimension=1024)["embedding"]
    print(np.array(embedding).shape)
    multimodal_embeddings_img.append(embedding)

dataset = dataset.assign(embedding_img=multimodal_embeddings_img)
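The embedding helper splits S3 URIs with `str.replace` and `str.split`, which silently accepts malformed input. A slightly stricter alternative, sketched below, uses `urllib.parse.urlparse`, which the setup cell already imports. The helper name and the object key in the example are hypothetical.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # urlparse keeps the leading "/" on the path; S3 keys do not have it
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_uri("s3://amazon-berkeley-objects/images/small/aa/img-a.jpg")
print(bucket, key)
```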

In [None]:
dataset.head()

In [None]:
dataset['item_name_in_en_us'].to_list()

### 3.1 Visualize the Image Embedding
Let's visualize the pairwise similarity of the item embeddings as a heatmap.

In [None]:
plot_similarity_heatmap(multimodal_embeddings_img[:batch_size], multimodal_embeddings_img[:batch_size])
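`np.inner(embeddings_a, embeddings_b)` computes the dot product of every pair of rows. If the embeddings have unit length, which the heatmap's `vmax=1` assumes, each dot product is a cosine similarity. The diagonal is then 1.0, and less similar items score lower. The stdlib sketch below reproduces that matrix for three hypothetical 2-D vectors.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def inner_product_matrix(rows_a, rows_b):
    """Pairwise dot products, like np.inner on two 2-D inputs."""
    return [[sum(x * y for x, y in zip(a, b)) for b in rows_b] for a in rows_a]

vectors = [l2_normalize(v) for v in ([3.0, 4.0], [4.0, 3.0], [0.0, 1.0])]
matrix = inner_product_matrix(vectors, vectors)
for row in matrix:
    print([round(x, 2) for x in row])
```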

### 3.2 [OPTIONAL] Store dataset

In [None]:
# Store dataset
# dataset.to_csv('dataset.csv', index = False)

## 4. Create a vector store - FAISS In memory vector store

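The original OpenSearch Serverless version of this sample creates three OpenSearch policies before it creates the vector search collection and index: an encryption security policy, a network security policy, and a data access policy. This notebook uses an in-memory FAISS vector store, so none of these policies are needed.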

### 4.1 Create a new FAISS vector Database

We use the metadata to store each image's location so that we can read the image back from the vector store.

In [None]:
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.schema import Document



### 4.2 Setting up the In-Memory KNN search

In [None]:
# One metadata dict per item, mapping the item name to its S3 image path
metadata_dict = [{name: path} for name, path in zip(dataset['item_name_in_en_us'].to_list(), dataset['img_full_path'].to_list())]

metadata_dict

### 4.3 Ingest the image embeddings

Next, loop through your dataset and ingest the item embeddings, together with their metadata, into the FAISS index. For a more robust and scalable ingestion approach on OpenSearch, see [Ingesting enriched data into Amazon ES](https://aws.amazon.com/blogs/industries/novartis-ag-uses-amazon-elasticsearch-k-nearest-neighbor-knn-and-amazon-sagemaker-to-power-search-and-recommendation/). The data ingestion for this POC should finish within 60 seconds. The following cell then runs a simple query to verify that the data has been ingested into the index.
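With only ten items, the whole dataset fits in one `FAISS.from_embeddings` call. For larger catalogs, ingestion is usually done in fixed-size batches, which keeps memory bounded and limits a failure to a single batch. A minimal batching helper is sketched below. With LangChain's FAISS wrapper, you could create the store from the first batch and append later batches with `db.add_embeddings(...)`. Treat that wiring as an assumption to verify against your LangChain version. On Python 3.12+, `itertools.batched` offers similar behavior but yields tuples.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical (item name, embedding) pairs standing in for the real ones
pairs = [(f"item-{i}", [float(i)] * 3) for i in range(5)]
for batch in batched(pairs, 2):
    print([name for name, _ in batch])
```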

In [None]:
# Create the vector store
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import FAISS

multimodal_embed_model = 'amazon.titan-embed-image-v1'
# Embedding model used for text queries; it must be the same model that produced the item embeddings
embedding_model = BedrockEmbeddings(
    client=boto3_bedrock,
    model_id=multimodal_embed_model
)

# (item name, precomputed embedding) pairs, aligned with metadata_dict
text_embedding_pairs = zip(dataset['item_name_in_en_us'].to_list(), multimodal_embeddings_img)

db = FAISS.from_embeddings(text_embedding_pairs, embedding_model, metadatas=metadata_dict)

In [None]:
query_prompt = "drinkware glass"

v = embedding_model.embed_query(query_prompt)
print(v[0:10])
results = db.similarity_search_by_vector(v, k=2)
display(Markdown('Let us look at the documents which had the relevant information pertaining to our query'))
for r in results:
    display(Markdown(f'{r.page_content}'), Markdown(f'{r.metadata}'))
    display(Markdown(f'------------------------------------'))

In [None]:
print(results[0].metadata.values())
print(results[0].metadata.keys())

## 5. Perform a real-time Multimodal Search

Now that you have a working FAISS index containing embeddings for your inventory, let's look at how to generate embeddings for new queries. You'll use Amazon Titan Multimodal Embeddings G1 to extract text features and image features.

Let's look at the results of a simple query. After retrieving results from the FAISS index, we read each item's image path from the result metadata and load the image.

In [None]:
def get_image_from_faiss_results(results=None):
    image_list = []
    # each result's metadata maps the item name to its image path
    for img_path in [path for result in results for path in result.metadata.values()]:
        print(img_path)

        if img_path.startswith('s3'):
            # download and store images locally
            local_data_root = './data/images'
            local_file_name = img_path.split('/')[-1]
            s3down.download(img_path, local_data_root)
            local_image_path = f"{local_data_root}/{local_file_name}"
        else:
            # already a local file
            local_image_path = img_path

        img = Image.open(local_image_path)
        image_list.append(img)

    return image_list
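`get_image_from_faiss_results` downloads every result image again, even when an earlier query already fetched it. The hypothetical helpers below compute the same local path the function uses (`./data/images/<file name>`) and check whether a download is still needed. A caller could then skip `s3down.download` for cached files. The helpers assume file names are unique across the S3 prefixes you use, because only the last path segment is kept.

```python
import os
import tempfile

def local_cache_path(image_uri, local_root="./data/images"):
    """Local file path an S3 image is downloaded to (root + last path segment)."""
    return os.path.join(local_root, image_uri.rsplit("/", 1)[-1])

def needs_download(image_uri, local_root="./data/images"):
    """True if the image is on S3 and not yet in the local cache."""
    return image_uri.startswith("s3://") and not os.path.exists(local_cache_path(image_uri, local_root))

# Demonstrate with a throwaway cache directory
with tempfile.TemporaryDirectory() as cache_dir:
    uri = "s3://amazon-berkeley-objects/images/small/aa/img-a.jpg"
    print(needs_download(uri, cache_dir))  # nothing cached yet
```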

### 5.1 Perform Image Search based on Text Input

Let's look at the results of a simple query. In the example below, we take a text input from the user, "drinkware glass", and send it to the search engine to find similar items.

The search finds glass drinkware in our dataset that matches the input query, which is the behavior we want.

In [None]:
query_prompt = "drinkware glass"
v = embedding_model.embed_query(query_prompt)
results = db.similarity_search_by_vector(v, k=2)

all_images = get_image_from_faiss_results(results)

display_images(all_images)
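`similarity_search_by_vector` ranks results by distance rather than by similarity score. As far as we know, LangChain's FAISS wrapper defaults to Euclidean (L2) distance, and FAISS's flat L2 index reports squared distances. For unit-length vectors the two views agree, because `||a - b||^2 = 2 - 2 * cos(a, b)`, so the smallest distance corresponds to the highest cosine similarity. The sketch below checks that identity on two hypothetical unit vectors.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_from_squared_l2(distance):
    """Convert a squared L2 distance to cosine similarity (unit-length vectors only)."""
    return 1.0 - distance / 2.0

a = [0.6, 0.8]  # both vectors have length 1
b = [0.8, 0.6]
d = squared_l2(a, b)
print(round(d, 4), round(cosine_from_squared_l2(d), 4), round(cosine(a, b), 4))
```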



### 5.2 Perform Image Search based on Image Input

Now let's look at the results for an image query. The input image is converted into a vector embedding, and the similarity search returns the closest items.

You can use any image. In the example below, we select an image from the dataset above by its item_id (for example, item_id = "B0896LJNLH") and send it to the search engine to find similar items. First, let's get the image and its location based on the item id.

In [None]:
item_id = "B0896LJNLH"

image, item_name = get_image_from_item_id_s3(item_id = item_id, dataset = dataset, image_path = f's3://amazon-berkeley-objects/images/small/' )
print(item_name)
image

Then, get the similar items based on the image above

In [None]:
""" 
Function for semantic search capability using knn on input image prompt.
"""
def find_similar_items_from_image(image_path: str, k_nn: int) -> list:
    """
    Main semantic search capability using knn on input image prompt.
    Args:
        image_path: S3 URI or local path of the query image
        k_nn: number of top-k similar vectors to retrieve from the FAISS index
    Returns:
        list of PIL images for the k_nn most similar items
    """
    # embed the query image only (no text) with the same model used for the items
    query_emb = get_titan_multimodal_embedding_fix(image_path=image_path, dimension=1024)["embedding"]
    results = db.similarity_search_by_vector(query_emb, k=k_nn)
    print(results)
    image_list = get_image_from_faiss_results(results)
    return image_list


In [None]:
item_id = "B0896LJNLH"
search_image_path = dataset[dataset['item_id']==item_id]['img_full_path'].iloc[0]
print(search_image_path)

image_list = find_similar_items_from_image(search_image_path, 2)
display_images(image_list)

## Query an Image with a Multimodal Model (Claude 3 Sonnet)

Now let us send in a query based on an image. The image, `./images/departure_rate.jpg`, is a generic flights dashboard.

In [None]:
from PIL import Image

with open("./images/departure_rate.jpg", "rb") as image_file:
    content_image = base64.b64encode(image_file.read()).decode('utf8')

In [None]:
type(content_image)

In [None]:
import json 
import boto3
import os
from IPython.display import Markdown, display



region = os.environ.get("AWS_REGION")
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

with open("./images/departure_rate.jpg", "rb") as image_file:
    content_image = base64.b64encode(image_file.read()).decode('utf8')

body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [{
                "role": "user",
                "content": [{
                  "type": "image",
                  "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": content_image,
                  }
                },
                {
                  "type": "text",
                  "text": "Give me the flight timings from here."
                }
                ]
            }],
            "temperature": 0.5,
            "top_p": 0.9
        }  
    )  
modelId = "anthropic.claude-3-sonnet-20240229-v1:0"
accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())
response_body
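`response_body` is the raw Messages API response. The generated text sits in its `content` list, and `stop_reason` tells you whether the answer was cut off. With `max_tokens` set to 100, a list of flight timings may well be truncated. The helpers below are a hypothetical sketch that extracts the text blocks and flags truncation. The sample response is made up and only mimics the response structure.

```python
def extract_text(response_body):
    """Concatenate the text blocks of a Messages API response."""
    return "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )

def was_truncated(response_body):
    """True if generation stopped because max_tokens was reached."""
    return response_body.get("stop_reason") == "max_tokens"

# Made-up response with the same structure as a real one
sample_response = {
    "role": "assistant",
    "content": [{"type": "text", "text": "Departures: 06:00, 06:30"}],
    "stop_reason": "max_tokens",
}
print(extract_text(sample_response))
print("truncated:", was_truncated(sample_response))
```

On the real response, call `extract_text(response_body)`. If `was_truncated(response_body)` is true, raise `max_tokens` and send the request again.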



## 6. Clean up

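When you finish this exercise, remove your resources. This notebook keeps its vector index in memory, so there is no OpenSearch vector index, collection, or data, network, and encryption access policy to delete. The remaining steps are:

- Optionally, delete the images downloaded to `./data/images`.
- Delete the SageMaker Studio user profile and domain.
- Optionally, empty and delete the S3 bucket if you created one, or keep whatever you want.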

In [None]:
# Nothing to delete here: the FAISS index and embeddings live in memory and are released when the kernel stops
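This notebook leaves two local artifacts behind: the images downloaded to `./data/images`, and the in-memory `db`, which you can release with `del db`. Below is a minimal, hypothetical cleanup helper for the image cache. The demo runs it against a throwaway directory rather than the real cache.

```python
import shutil
import tempfile
from pathlib import Path

def remove_local_image_cache(local_root="./data/images"):
    """Delete the downloaded image directory; return True if something was removed."""
    path = Path(local_root)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True

# Demonstrate on a throwaway directory instead of the real ./data/images
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "images"
    demo_root.mkdir()
    (demo_root / "img-a.jpg").write_bytes(b"")
    print(remove_local_image_cache(demo_root))  # removes the directory
    print(remove_local_image_cache(demo_root))  # nothing left to remove
```

To clean up the real cache, call `remove_local_image_cache()` with no arguments.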