In this notebook, you will build an image search application with [Titan Multimodal Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html) and [LangChain](https://python.langchain.com/docs/get_started/introduction). You will generate embeddings for a set of images, then store them in a [FAISS](https://python.langchain.com/docs/integrations/vectorstores/faiss/) vector database for later search.

You will learn how to:
- Generate embeddings for images.
- Build an image vector database.
- Query the vector database.


![Diagram](build_images_vector_db.jpg)

Related workshop lab: https://catalog.workshops.aws/building-with-amazon-bedrock/en-US/image-labs/bedrock-image-search


## Requirements
- Install boto3, the [AWS SDK for Python](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingTheBotoAPI.html), which lets you interact with AWS services. Install with `pip install boto3`.
- [Configure AWS credentials](https://docs.aws.amazon.com/braket/latest/developerguide/braket-using-boto3.html). Boto3 needs credentials to make API calls to AWS.
- Install [LangChain](https://python.langchain.com/docs/get_started/introduction), a framework for developing applications powered by large language models (LLMs). Install with `pip install langchain`.
- The code cells also import `langchain_community`, FAISS, and Pillow (`PIL`). If they are missing from your environment, `pip install langchain-community faiss-cpu pillow` should cover them.

In [None]:
#!pip install boto3
#!pip install langchain
#!pip install langchain-community faiss-cpu pillow

In [2]:
import os
import boto3
import json
import base64
from langchain_community.vectorstores import FAISS
from io import BytesIO
from PIL import Image

The `get_multimodal_vector` function takes text, an image, or both, and calls the Amazon Titan model through the [Amazon Bedrock](https://aws.amazon.com/bedrock/) [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) API to generate an embedding vector. Text and images are embedded into the same vector space, which is what lets a text query match images later on.

In [4]:
#calls Bedrock to get a vector from either an image, text, or both
def get_multimodal_vector(input_image_base64=None, input_text=None):
    
    session = boto3.Session()

    bedrock = session.client(service_name='bedrock-runtime') #creates a Bedrock client
    
    request_body = {}
    
    if input_text:
        request_body["inputText"] = input_text
        
    if input_image_base64:
        request_body["inputImage"] = input_image_base64
    
    body = json.dumps(request_body)
    
    response = bedrock.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )
    
    response_body = json.loads(response.get('body').read())
    
    embedding = response_body.get("embedding")
    
    return embedding
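
The request body that `get_multimodal_vector` assembles can be inspected without calling Bedrock. The sketch below uses a hypothetical helper, `build_titan_request`, which is not part of this notebook. It builds the same JSON and optionally adds `embeddingConfig.outputEmbeddingLength`. The allowed sizes (256, 384, 1024) are taken from the Titan Multimodal Embeddings documentation as an assumption; check the current model documentation before relying on them.

```python
import json

# Output sizes assumed from the Titan Multimodal Embeddings docs; verify before use.
ALLOWED_LENGTHS = (256, 384, 1024)


def build_titan_request(input_text=None, input_image_base64=None, output_length=None):
    """Return the JSON request body as a string (hypothetical helper)."""
    if not input_text and not input_image_base64:
        raise ValueError("Provide input_text, input_image_base64, or both")

    request_body = {}
    if input_text:
        request_body["inputText"] = input_text
    if input_image_base64:
        request_body["inputImage"] = input_image_base64
    if output_length is not None:
        if output_length not in ALLOWED_LENGTHS:
            raise ValueError(f"output_length must be one of {ALLOWED_LENGTHS}")
        request_body["embeddingConfig"] = {"outputEmbeddingLength": output_length}

    return json.dumps(request_body)


print(build_titan_request(input_text="dog", output_length=384))
```

Whatever output length is chosen must be used for both the indexed images and the queries, because vectors of different lengths cannot be compared.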

The `get_vector_from_file` function takes an image file path, encodes the image as base64, generates an embedding vector with [Titan Multimodal Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html), and returns the vector. This is how each image comes to be represented as a vector.

In [5]:
#creates a vector from a file
def get_vector_from_file(file_path):
    with open(file_path, "rb") as image_file:
        input_image_base64 = base64.b64encode(image_file.read()).decode('utf8')
    
    vector = get_multimodal_vector(input_image_base64 = input_image_base64)
    
    return vector
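
The `.decode('utf8')` step in `get_vector_from_file` matters because `base64.b64encode` returns `bytes`, while the JSON request body needs a `str`. A minimal sketch of that round trip follows. It uses made-up sample bytes rather than a real image, and `encode_image_bytes` is a hypothetical name.

```python
import base64
import json


def encode_image_bytes(image_bytes):
    """Base64-encode raw bytes and return a str that json.dumps accepts."""
    return base64.b64encode(image_bytes).decode("utf8")


# Sample bytes standing in for the contents of an image file.
raw = b"\xff\xd8\xff\xe0 not a real image, just sample bytes"
encoded = encode_image_bytes(raw)

# Without .decode(), json.dumps would reject the bytes value with a TypeError.
payload = json.dumps({"inputImage": encoded})

# Decoding restores the original bytes exactly.
restored = base64.b64decode(json.loads(payload)["inputImage"])
print(restored == raw)
```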

The `get_image_vectors_from_directory` function creates a list of `(path, vector)` tuples from the `.jpg` files in a directory and its immediate subdirectories.

A sample of the [Kaggle Animal Image Dataset (90 Different Animals)](https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals) is used in this app.

In [6]:
def get_image_vectors_from_directory(path_name):
    # Collect (file_path, vector) tuples for .jpg files in path_name
    # and in its immediate subdirectories.
    items = []
    for n in os.listdir(path_name):
        entry_path = os.path.join(path_name, n)
        if os.path.isdir(entry_path):
            # One level of subdirectories, e.g. one folder per animal
            candidates = [os.path.join(entry_path, n_2) for n_2 in os.listdir(entry_path)]
        else:
            candidates = [entry_path]
        for file_path in candidates:
            if file_path.endswith('.jpg'):
                check_size_image(file_path)  # downscale oversized images in place
                vector = get_vector_from_file(file_path)
                items.append((file_path, vector))
            else:
                print("not a .jpg file: ", file_path)

    return items
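
The function above looks one level below `path_name`. If the images were nested more deeply, `os.walk` could collect the paths first. This is a sketch under that assumption, not the notebook's method. `list_jpg_files` is a hypothetical helper, and unlike the original it also accepts `.jpeg` and upper-case extensions.

```python
import os
import tempfile


def list_jpg_files(root_dir):
    """Return sorted paths of all .jpg/.jpeg files under root_dir, at any depth."""
    matches = []
    for current_dir, _subdirs, file_names in os.walk(root_dir):
        for name in file_names:
            if name.lower().endswith((".jpg", ".jpeg")):
                matches.append(os.path.join(current_dir, name))
    return sorted(matches)


# Demo on a throwaway directory tree with two images and one text file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cat"))
    for rel in ("a.jpg", os.path.join("cat", "b.JPG"), os.path.join("cat", "notes.txt")):
        open(os.path.join(root, rel), "w").close()
    found = [os.path.relpath(p, root) for p in list_jpg_files(root)]

print(found)
```

Each returned path could then be passed to `check_size_image` and `get_vector_from_file` in the same way as in the cell above.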


In [7]:
def check_size_image(file_path):
    # Maximum image size supported is 2048 x 2048 pixels.
    # Larger images are downscaled (aspect ratio kept) and the file is overwritten.
    image = Image.open(file_path) # open image
    width, height = image.size # width and height of the image in pixels
    if width > 2048 or height > 2048:
        print(f"Big file: {file_path}, width: {width}, height: {height} px")
        # Scale so that the longer side becomes 2048 px
        scale = 2048 / max(width, height)
        new_width = int(width * scale)
        new_height = int(height * scale)
        print(f"New file: {file_path}, width: {new_width}, height: {new_height} px")

        new_image = image.resize((new_width, new_height))
        # Save the resized image over the original file
        new_image.save(file_path)

    return
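
The resize arithmetic in `check_size_image` can be checked on its own, without opening any files. The sketch below uses a hypothetical helper, `fit_within`, which computes the target dimensions with integer arithmetic so that the longer side lands exactly on the limit.

```python
def fit_within(width, height, max_side=2048):
    """Return (new_width, new_height) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, keep as-is
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


print(fit_within(4096, 2048))  # landscape image, too wide
print(fit_within(3000, 6000))  # portrait image, too tall
print(fit_within(1000, 500))   # within the limit, unchanged
```

Pillow's `Image.thumbnail((2048, 2048))` performs a similar aspect-preserving downscale in place and could replace the manual calculation.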


### Create and return an in-memory vector store to be used in the application

In [9]:
def create_vector_db(path_name):
    image_vectors = get_image_vectors_from_directory(path_name)
        
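    # FAISS.from_embeddings expects (text, vector) pairs; the text is left empty
    # because only the image path (kept in the metadata) is needed.
    text_embeddings = [("", item[1]) for item in image_vectors]
    metadatas = [{"image_path": item[0]} for item in image_vectors]
        
    db = FAISS.from_embeddings(
        text_embeddings=text_embeddings,
        embedding=None,  # no embedding function: all queries use similarity_search_by_vector
        metadatas=metadatas
    )
    print(f"Vector database: {db.index.ntotal} docs")
    return db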

In [None]:
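path_name = "animals/animals"  # assumed from the image paths used in the queries; point this at your image directory
db = create_vector_db(path_name)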
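
A FAISS index needs every vector to have the same length, and `get_multimodal_vector` returns `None` when the response carries no `embedding` key. A defensive filter can be applied to the `(path, vector)` list before the index is built. This is a sketch only; `keep_valid_vectors` is a hypothetical helper and not part of the notebook.

```python
def keep_valid_vectors(items):
    """Split (path, vector) pairs into usable pairs and rejected paths."""
    valid, rejected = [], []
    expected_length = None
    for path, vector in items:
        if not vector:
            rejected.append(path)  # None or empty: no embedding was returned
            continue
        if expected_length is None:
            expected_length = len(vector)  # first good vector sets the dimension
        if len(vector) != expected_length:
            rejected.append(path)  # dimension differs from the first good vector
            continue
        valid.append((path, vector))
    return valid, rejected


# Toy data standing in for the output of get_image_vectors_from_directory.
sample = [("a.jpg", [0.1, 0.2]), ("b.jpg", None), ("c.jpg", [0.3]), ("d.jpg", [0.4, 0.5])]
valid, rejected = keep_valid_vectors(sample)
print(valid)
print(rejected)
```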

### [Save to a local vector database](https://python.langchain.com/docs/integrations/vectorstores/faiss/#as-a-retriever)

In [105]:
db_file = "animals.vdb"
db.save_local(db_file)
print(f"vectordb was saved in {db_file}")

vectordb was saved in animals.vdb
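
Despite the file-like name, `save_local` treats `animals.vdb` as a folder. With the default `index_name="index"`, LangChain's FAISS wrapper writes `index.faiss` (the vectors) and `index.pkl` (the pickled docstore and id mapping) into it. The sketch below checks for those two files. `missing_index_files` is a hypothetical helper, and the demo uses a stand-in temporary folder rather than a real saved store.

```python
import os
import tempfile


def missing_index_files(folder_path, index_name="index"):
    """Return the expected FAISS store files that are absent from folder_path."""
    expected = [f"{index_name}.faiss", f"{index_name}.pkl"]
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder_path, name))]


# Demo with a stand-in folder; a real one would come from db.save_local(...).
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "index.faiss"), "wb").close()
    missing = missing_index_files(folder)

print(missing)
```

The pickle file is the reason `FAISS.load_local` asks for `allow_dangerous_deserialization=True`.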


For more on vector stores in LangChain, see https://python.langchain.com/docs/modules/data_connection/vectorstores/

### [Query](https://python.langchain.com/docs/integrations/vectorstores/faiss/#querying) by text

In [None]:
query = "dog"
search_vector = get_multimodal_vector(input_text=query)
results = db.similarity_search_by_vector(embedding=search_vector)
for res in results:
    with open(res.metadata['image_path'], "rb") as f:
        img = BytesIO(f.read())
        image = Image.open(img)
        image.show()
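
Conceptually, `similarity_search_by_vector` ranks the stored vectors by their distance to the query vector and returns the closest `k` (4 by default). The sketch below reproduces that idea in pure Python with toy 2-D vectors. It assumes the wrapper's default Euclidean (L2) distance and is not how FAISS is implemented internally; `l2_distance` and `nearest` are hypothetical names.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, stored, k=4):
    """Return the k (path, distance) pairs closest to query, nearest first."""
    scored = [(path, l2_distance(query, vector)) for path, vector in stored]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-D vectors standing in for real embeddings.
stored = [("cat.jpg", [1.0, 0.0]), ("dog.jpg", [0.0, 1.0]), ("lion.jpg", [0.9, 0.1])]
top = nearest([1.0, 0.0], stored, k=2)
print(top)
```

Because text and images share one embedding space, the same ranking works whether the query vector came from the word "dog" or from a photo.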

### [Query](https://python.langchain.com/docs/integrations/vectorstores/faiss/#querying) by image

In [None]:
query_path = 'animals/animals/cat/9d21019336.jpg'
vector = get_vector_from_file(query_path)
results = db.similarity_search_by_vector(embedding=vector)
for res in results:
    with open(res.metadata['image_path'], "rb") as f:
        img = BytesIO(f.read())
        image = Image.open(img)
        image.show()
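
When the query image is itself part of the indexed directory, it will normally appear among its own results. If that is unwanted, the result paths can be filtered before display. This is a minimal sketch; `drop_query_image` is a hypothetical helper, and the file names are the ones used elsewhere in this notebook.

```python
import os


def drop_query_image(result_paths, query_path):
    """Return result_paths without entries that point at query_path."""
    query_norm = os.path.normpath(query_path)
    return [p for p in result_paths if os.path.normpath(p) != query_norm]


result_paths = [
    "animals/animals/cat/9d21019336.jpg",
    "animals/animals/leopard/8a0751ed41.jpg",
    "animals/animals/cat/578d493138.jpg",
]
others = drop_query_image(result_paths, "animals/animals/cat/9d21019336.jpg")
print(others)
```

With real results, `result_paths` would be built as `[res.metadata['image_path'] for res in results]`.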

### Load and query the local vector database

In [11]:
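from langchain_community.embeddings import BedrockEmbeddings # embeddings object required by load_local
db_file = "animals.vdb"
bedrock_client = boto3.client("bedrock-runtime")
# load_local requires an embeddings object, but it is not used below because every query is made by vector.
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_client)
# The store is pickled on disk, hence allow_dangerous_deserialization=True; only load files you created or trust.
new_db = FAISS.load_local(db_file, bedrock_embeddings, allow_dangerous_deserialization=True)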

In [12]:
query = "cat"
search_vector = get_multimodal_vector(input_text=query)
results = new_db.similarity_search_by_vector(embedding=search_vector)
for res in results:
    with open(res.metadata['image_path'], "rb") as f:
        img = BytesIO(f.read())
        image = Image.open(img)
        image.show()

In [132]:
query_path = 'animals/animals/leopard/9cc45df890.jpg'
vector = get_vector_from_file(query_path)
results = new_db.similarity_search_by_vector(embedding=vector)
for res in results:
    with open(res.metadata['image_path'], "rb") as f:
        img = BytesIO(f.read())
        image = Image.open(img)
        image.show()

In [18]:
results

[Document(page_content='', metadata={'image_path': 'animals/animals/cat/9d21019336.jpg'}),
 Document(page_content='', metadata={'image_path': 'animals/animals/leopard/8a0751ed41.jpg'}),
 Document(page_content='', metadata={'image_path': 'animals/animals/leopard/9cc45df890.jpg'}),
 Document(page_content='', metadata={'image_path': 'animals/animals/cat/578d493138.jpg'})]

### [Delete from the vector database](https://python.langchain.com/docs/integrations/vectorstores/faiss/#delete)

You can also delete records from the vector store.

In [None]:
print("count before:", new_db.index.ntotal)
new_db.delete([new_db.index_to_docstore_id[0]])
print("count after:", new_db.index.ntotal)
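
The cell above deletes whichever record sits at index position 0. To delete the record for a specific image, the docstore ids can be looked up by the `image_path` metadata first. The sketch below relies only on the `index_to_docstore_id` mapping and the docstore's `search` method. `find_ids_by_image_path` is a hypothetical helper, and the demo uses stand-in objects so it runs without FAISS.

```python
from types import SimpleNamespace


def find_ids_by_image_path(db, image_path):
    """Return docstore ids whose metadata 'image_path' equals image_path."""
    matches = []
    for doc_id in db.index_to_docstore_id.values():
        doc = db.docstore.search(doc_id)
        # A missing id yields a plain string, which has no metadata attribute.
        metadata = getattr(doc, "metadata", None) or {}
        if metadata.get("image_path") == image_path:
            matches.append(doc_id)
    return matches


# Stand-in objects that mimic the two attributes used above.
_docs = {
    "id-0": SimpleNamespace(metadata={"image_path": "animals/animals/cat/9d21019336.jpg"}),
    "id-1": SimpleNamespace(metadata={"image_path": "animals/animals/leopard/9cc45df890.jpg"}),
}
fake_db = SimpleNamespace(
    index_to_docstore_id={0: "id-0", 1: "id-1"},
    docstore=SimpleNamespace(search=_docs.get),
)
ids = find_ids_by_image_path(fake_db, "animals/animals/leopard/9cc45df890.jpg")
print(ids)
```

With the real store, the returned list could be passed to `new_db.delete(...)` after checking that it is not empty.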

Delete the entire database

In [None]:
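# delete() needs explicit ids, so pass every stored id to empty the index
new_db.delete(list(new_db.index_to_docstore_id.values()))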